Jupyter Notebooks: Your Secret Weapon for Salesforce Org Discovery
Every Salesforce consultant has been there. You get the keys to a new org, log in, and immediately feel that familiar mix of curiosity and dread. What’s in here? How bad is it? Where are the land mines?
You could click around Setup for a few hours. You could ask the client a bunch of questions (and get a bunch of “I think maybe someone set that up a few years ago?” answers). Or, you could do what I’ve been doing lately: fire up a Jupyter notebook and let the org tell you its own story.
Why a Notebook?
I’ve been reaching for Jupyter notebooks more and more during client discovery, and at this point I’m not sure how I ever lived without them. The appeal boils down to a few things.
First, notebooks are iterative. You can run a query, look at the results, decide what to dig into next, and keep going without losing context. It’s the closest thing to having a conversation with a Salesforce org.
Second, they’re self-documenting. When you’re done poking around, you don’t just have a pile of SOQL queries in a text file. You have a narrative (think markdown headings, inline results, and charts) that you can hand to a colleague or even share with the client. It’s a living artifact of your discovery process.
Third, and this is the one that sold me, they’re reusable. I’ve built up a standard org analysis notebook that I bring to every new engagement. I connect it to the org, hit “Run All,” and within a few minutes I have a high-level picture of what I’m working with. It’s not a replacement for real discovery, but it’s a phenomenal starting point.
The Setup
The tooling is straightforward. I use simple_salesforce as a Python client for the Salesforce REST API, pandas for wrangling data into DataFrames, and matplotlib (with seaborn for aesthetics because I have some self-respect) for visualization.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from simple_salesforce import Salesforce
seaborn.set()
The notebook is designed to run either locally or in Google Colab, which is nice when you’re pairing with someone who doesn’t have a local Python environment set up.
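One simple way to make a single notebook handle both environments (one pattern among several, not the only option) is to sniff for Colab at the top and branch the credential loading on the result:

try:
    from google.colab import userdata  # only importable inside Colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False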
A Quick but Important Note About Credentials
This is the part where I get on a small soapbox. Never hard-code your Salesforce credentials. Not in the notebook, not in a config file you commit to Git, not anywhere that isn’t explicitly designed for secrets management.
When running in Google Colab, I pull credentials from Colab’s userdata secrets store:
from google.colab import userdata
sf = Salesforce(
    username=userdata.get('SF_USERNAME'),
    password=userdata.get('SF_PASSWORD'),
    security_token=userdata.get('SF_SECURITY_TOKEN'),
    consumer_key=userdata.get('SF_CLIENT_ID'),
    consumer_secret=userdata.get('SF_CLIENT_SECRET'),
    domain=userdata.get('SF_DOMAIN')
)
When running locally, I load from an environment file that is .gitignored into oblivion:
import json

env_name = 'client-prod'  # hypothetical name; points at a file under ./envs/
filename = f'./envs/{env_name}.json'
with open(filename, 'r') as f:
    creds = json.load(f)

sf = Salesforce(
    username=creds['username'],
    password=creds['password'],
    security_token=creds['security_token'],
    consumer_key=creds['client_id'],
    consumer_secret=creds['client_secret'],
    domain=creds['host']
)
The environment file itself is just a simple JSON blob with your connected app credentials and login info. The key point is that it stays local. It never touches version control. If you’re using dotenv or OS-level environment variables instead, even better. The principle is the same: secrets stay secret.
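For reference, the file is shaped to match the keys the loader above expects; something like this, with placeholder values (the host is typically login, test, or your My Domain prefix):

{
  "username": "integration.user@example.com",
  "password": "********",
  "security_token": "********",
  "client_id": "<connected app consumer key>",
  "client_secret": "<connected app consumer secret>",
  "host": "login"
}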
(Stepping off soapbox now.)
What the Notebook Actually Does
Here’s the rough structure of my standard discovery notebook. Think of it as a checklist of the things I want to know about any org before I start making promises.
1. API Limits: Am I About to Ruin Someone’s Day?
The very first thing the notebook does after authenticating is check the org’s daily API request usage. This matters because the notebook itself will consume API calls, and if you’re connecting to a production org that’s already running hot, you want to know before you fire off a few hundred SOQL queries.
limits = sf.limits()
api_usage = limits['DailyApiRequests']
consumed = api_usage['Max'] - api_usage['Remaining']
pct = round(consumed / api_usage['Max'] * 100, 2)

if pct > 80:
    print(f'WARNING: {pct}% consumed. Halting.')
    raise StopExecution  # custom exception (sketch below)
else:
    print(f'{pct}% of daily API requests consumed. Proceeding.')
I built in a StopExecution exception that halts the notebook if API consumption is above 80%. It’s a small thing, but it’s saved me from an awkward conversation at least once.
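If you want the same guardrail, a minimal version of that exception (defined near the top of the notebook, before this check runs) can also suppress the ugly traceback Jupyter would otherwise print:

class StopExecution(Exception):
    """Halts the current cell without dumping a full traceback."""
    def _render_traceback_(self):
        # IPython calls this instead of printing the usual stack trace.
        return ['Execution halted.']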
2. Apex Code Size: How Much Custom Code Am I Inheriting?
Next up: getting a sense of the codebase. I query all active Apex classes (excluding managed packages and test classes) and calculate each one’s size as a percentage of Salesforce’s character limit. This gives me a quick feel for how much custom code exists and where the big files live.
# Find test classes via SOSL
test_classes = sf.search(
    "FIND {@isTest} IN ALL FIELDS RETURNING ApexClass(Id, Name)"
)
# Query all active, unpackaged Apex classes
all_classes = sf.query(
    "SELECT Id, Name, Body, LengthWithoutComments "
    "FROM ApexClass "
    "WHERE Status = 'Active' AND NamespacePrefix = null"
)
From there, it’s just a matter of calculating byte sizes and sorting. A class sitting at 3% of the org’s total Apex limit is worth investigating. A handful of large utility classes from an open-source logging library? Probably fine. A 50KB class called DataFactory with no clear naming convention? That’s where you start asking questions.
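The post-processing is a few lines of pandas. Roughly, assuming the standard 6 MB org-wide limit on unpackaged, non-test Apex characters:

# Flag the test classes found via SOSL, then size everything else.
test_ids = {r['Id'] for r in test_classes['searchRecords']}

apex_df = pd.DataFrame(all_classes['records']).drop(columns='attributes')
apex_df = apex_df[~apex_df['Id'].isin(test_ids)]

APEX_CHAR_LIMIT = 6_000_000  # 6 MB of non-test, unpackaged Apex
apex_df['pct_of_limit'] = (apex_df['LengthWithoutComments'] / APEX_CHAR_LIMIT * 100).round(2)
apex_df.sort_values('pct_of_limit', ascending=False)[
    ['Name', 'LengthWithoutComments', 'pct_of_limit']
].head(10)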
3. Automation Inventory: Scanning for Land Mines
This section is the one I find most valuable in early discovery. Salesforce orgs have a tendency to accumulate automation like a junk drawer accumulates takeout menus. Process Builders, record-triggered flows, Apex triggers… they all pile up. Understanding what fires when, and on which objects, is critical.
Process Builders:
process_builders = sf.query(
    "SELECT ApiName, Label, InstalledPackageName, VersionNumber "
    "FROM FlowDefinitionView "
    "WHERE IsActive = TRUE AND ProcessType = 'Workflow'"
)
Just seeing the count here tells you something. A handful of Process Builders on a mature org? Normal. Fourteen of them, some from managed packages, some clearly hand-rolled years ago? That’s a migration conversation waiting to happen (since Salesforce is sunsetting Process Builder in favor of Flows).
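A couple of lines of pandas surfaces both the count and the managed-versus-hand-rolled split:

pb_df = pd.DataFrame(process_builders['records']).drop(columns='attributes')
print(f'{len(pb_df)} active Process Builders')
print(pb_df['InstalledPackageName'].fillna('(unpackaged)').value_counts())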
Record-Triggered Flows:
flows = sf.query(
    "SELECT ApiName, TriggerType, Label, TriggerObjectOrEventLabel, "
    "RecordTriggerType, TriggerOrder "
    "FROM FlowDefinitionView "
    "WHERE IsActive = TRUE "
    "AND (TriggerType='RecordAfterSave' OR TriggerType='RecordBeforeSave')"
)
I chart these by object label to see which objects are the busiest. If an object has multiple before-save flows, after-save flows, and Apex triggers, that’s a land mine. You need to understand the order of execution and whether they’re stepping on each other.
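The chart itself is a quick stacked horizontal bar, something along these lines:

flow_df = pd.DataFrame(flows['records']).drop(columns='attributes')

# Count flows per object, split by before-save vs. after-save.
flow_counts = (
    flow_df
    .groupby(['TriggerObjectOrEventLabel', 'TriggerType'])
    .size()
    .unstack(fill_value=0)
)
flow_counts = flow_counts.loc[flow_counts.sum(axis=1).sort_values().index]

flow_counts.plot(kind='barh', stacked=True, figsize=(8, 6))
plt.title('Active record-triggered flows by object')
plt.xlabel('Number of flows')
plt.tight_layout()
plt.show()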
Apex Triggers:
triggers = sf.query(
    "SELECT Name, TableEnumOrId, NamespacePrefix, Status, "
    "UsageAfterInsert, UsageAfterUpdate, UsageBeforeInsert, UsageBeforeUpdate "
    "FROM ApexTrigger WHERE Status = 'Active'"
)
I split these into managed package triggers versus unpackaged triggers. Managed package triggers (like the ones from CPQ, which can have dozens) are largely out of your control but important to be aware of. Unpackaged triggers are the ones you’ll be maintaining, and visualizing them by object and trigger event gives you an instant sense of complexity.
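The split is one filter on NamespacePrefix, and because the Usage* fields are booleans they sum directly into per-object event counts:

trig_df = pd.DataFrame(triggers['records']).drop(columns='attributes')
managed = trig_df[trig_df['NamespacePrefix'].notna()]
unpackaged = trig_df[trig_df['NamespacePrefix'].isna()]
print(f'{len(managed)} managed package triggers, {len(unpackaged)} unpackaged')

# True/False sums to a count of trigger events per object.
events = ['UsageBeforeInsert', 'UsageAfterInsert', 'UsageBeforeUpdate', 'UsageAfterUpdate']
unpackaged.groupby('TableEnumOrId')[events].sum().plot(kind='barh', stacked=True)
plt.title('Unpackaged Apex triggers by object and event')
plt.tight_layout()
plt.show()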
4. Storage and Object Usage: What’s Actually in This Org?
I query every customizable SObject in the org and get record counts for each. This takes a minute (it’s a lot of queries), but the result is a clear picture of which objects are heavily used and which are ghost towns.
entities = sf.query_all(
    "SELECT QualifiedApiName "
    "FROM EntityDefinition "
    "WHERE IsCustomSetting = FALSE AND IsCustomizable = TRUE"
)
# Then, for each object, pull a record count:
for object_api_name in [r['QualifiedApiName'] for r in entities['records']]:
    count = sf.query(f"SELECT COUNT() FROM {object_api_name}")['totalSize']
Objects with zero records are interesting. Did someone build this out and never use it? Objects with millions of records are interesting too. Is storage going to be a concern? I also plot record growth over time for the top objects by querying CreatedDate on recent records. A sudden spike in record creation usually has a story behind it.
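The growth plot is nothing fancy: for each of the top objects, one query and a monthly bucket. A sketch for a single object (Opportunity is just a stand-in here):

recent = sf.query_all(
    "SELECT CreatedDate FROM Opportunity WHERE CreatedDate = LAST_N_DAYS:365"
)
created = pd.to_datetime(
    [r['CreatedDate'] for r in recent['records']], utc=True
).tz_convert(None)  # drop the timezone so monthly bucketing stays simple

per_month = pd.Series(created).dt.to_period('M').value_counts().sort_index()
per_month.plot(kind='bar')
plt.title('Opportunity records created per month')
plt.ylabel('New records')
plt.tight_layout()
plt.show()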
5. User Adoption: Is Anyone Actually Using This Thing?
Finally, I look at licensing and user activity. License utilization tells you whether the org is right-sized or if you’re paying for seats nobody’s sitting in. User activity (specifically LastLoginDate) tells you who’s engaged and who’s not.
licenses = sf.query(
    "SELECT Name, TotalLicenses, UsedLicenses, Status "
    "FROM UserLicense"
)
users = sf.query(
    "SELECT Name, IsActive, Profile.Name, UserRole.Name, "
    "LastLoginDate, Profile.UserLicense.Name "
    "FROM User "
    "WHERE IsActive = TRUE AND UserType != 'AutomatedProcess'"
)
A user who hasn’t logged in for 90+ days is worth flagging. A pile of unused Salesforce Platform licenses? That’s a cost optimization conversation.
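Both checks fall out of a couple of DataFrame operations. Roughly (the 90-day cutoff is just my default, adjust to taste):

user_df = pd.DataFrame(users['records']).drop(columns='attributes')
user_df['LastLoginDate'] = pd.to_datetime(user_df['LastLoginDate'], utc=True)
cutoff = pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=90)
stale = user_df[user_df['LastLoginDate'].isna() | (user_df['LastLoginDate'] < cutoff)]
print(f'{len(stale)} of {len(user_df)} active users have not logged in for 90+ days')

lic_df = pd.DataFrame(licenses['records']).drop(columns='attributes')
lic_df['pct_used'] = (lic_df['UsedLicenses'] / lic_df['TotalLicenses'] * 100).round(1)
lic_df[['Name', 'TotalLicenses', 'UsedLicenses', 'pct_used']].sort_values('pct_used')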
The Bigger Picture
None of this replaces talking to people. You still need to sit down with admins, developers, and end users to understand why things are the way they are. But running this notebook gives you a massive head start. You walk into those conversations already knowing which objects are busiest, where the automation is tangled, and which corners of the org haven’t been touched in years.
It also earns you a bit of credibility. There’s something about saying “I noticed you have fourteen active Process Builders and a before-save flow on Opportunity that appears to overlap with an Apex trigger” that signals you’ve done your homework.
If you’re a Salesforce consultant or developer who hasn’t dabbled in Jupyter notebooks yet, I’d genuinely encourage you to give it a try. The learning curve for basic pandas and simple_salesforce is small, and the payoff in terms of discovery speed is enormous. And once you have your template notebook, every new org is just a Shift+Enter away from being a lot less mysterious.
Happy exploring.