SQLite3 and Other Critical Tools for Salesforce Data Work
SQLite? More Like SQLightning!
Introduction
Working with Salesforce data can be daunting. Whether you’re migrating data from one org to another, transforming data for a new integration, or simply cleaning up data, the process can be time-consuming and error-prone. In this post, we’ll explore some tools and techniques that can help streamline the process and make your life easier.
Guiding Principles
Like any other development work, a clear set of goals is essential to success. Here are a few goals to consider when working with Salesforce data:
- Efficiency: Minimize the time and effort required to complete the task. Automation is key here.
- Accuracy: Ensure that the data is migrated, transformed, or cleaned up correctly. This means clear tests need to be defined.
And, most importantly:
- Repeatability: Make the process repeatable so that it can be run multiple times without errors, in an idempotent manner.
Meet SQLite3, the Swiss Army Knife of Databases
SQLite3 is a lightweight, serverless, and self-contained SQL database engine. It’s perfect for this kind of work, as we don’t need to worry about concurrent database writes or setting up a database server. Lastly, it’s extremely portable, as it’s just a single file that can be easily shared.
A common practice is to use SQLite3 as a staging area for the data we’ll work with. We can easily extract/import data from a source system or series of files and persist the data in dedicated SQLite tables. Then, we can run our transformations and validations against this data before loading it into Salesforce.
And a table JOIN beats the pants off of a VLOOKUP any day of the week.
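To make that concrete, here's a minimal sketch of matching staged Salesforce Accounts against a hypothetical legacy_accounts table (the legacy table and its columns are made up for illustration):
import sqlite3

# assumes an Account table and a (hypothetical) legacy_accounts table
# have already been staged in the database
db = sqlite3.connect('salesforce_db.sqlite3')

rows = db.execute("""
    SELECT a.Id, a.Name, l.legacy_id
    FROM Account AS a
    JOIN legacy_accounts AS l ON l.account_name = a.Name
""").fetchall()

for row in rows:
    print(row)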
Python, Pandas, and Simple-Salesforce
In addition to SQLite, we’ll need a few other tools. We’ll leverage Python throughout this post, but Ruby (with the Restforce gem) or Node.js (with the jsforce library) could be used as well.
- Simple-Salesforce is a Python library that provides a simple and easy-to-use interface to interact with Salesforce.
- Pandas is a powerful data manipulation library that provides data structures and functions to quickly and efficiently manipulate data.
Friendly reminder: never store your Salesforce credentials in your code. Use environment variables or a secure configuration file.
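If you go the environment variable route, a minimal sketch might look like this (the variable names are just placeholders):
import os

# read credentials from environment variables; the names here are placeholders
sf_username = os.environ["SF_USERNAME"]
sf_password = os.environ["SF_PASSWORD"]
sf_security_token = os.environ["SF_SECURITY_TOKEN"]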
Setting up your Python Environment
To get started, you’ll need to install the required Python libraries. You can do this using pip:
pip install simple-salesforce pandas
Heads up: it’s always a good idea to use a virtual environment to manage your Python dependencies.
More detail on setting up a virtual environment can be found in the official Python venv documentation.
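If you haven’t set one up before, the built-in venv module does the job (commands shown for macOS/Linux):
python3 -m venv .venv
source .venv/bin/activate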
Putting It All Together
Let’s walk through a simple example to illustrate how these tools can be used together.
Step 1
Create a new Connected App in Salesforce and obtain the client ID and client secret. You’ll also need your user’s security token.
You’ll need to enable OAuth settings for the app, and the scope for the connected app should include full and refresh_token.
Step 2
Store your credentials in a secure manner. I recommend using environment variables, but we’ll use a configuration file for this example:
{
"username": "",
"password": "",
"security_token": "",
"client_id": "",
"client_secret": "",
"api_version": "61.0",
"host": "test"
}
Step 3
Authenticate with Salesforce using Simple-Salesforce. Here’s a simple snippet to build the connection from the configuration file:
from simple_salesforce import Salesforce
import json

# load credentials from the configuration file
with open("./myconfigurationfile.json", "r") as env_file:
    sf_creds = json.load(env_file)

# build the connection (the api_version from the config is passed through as well)
sf = Salesforce(
    username=sf_creds["username"],
    password=sf_creds["password"],
    security_token=sf_creds["security_token"],
    consumer_key=sf_creds["client_id"],
    consumer_secret=sf_creds["client_secret"],
    domain=sf_creds["host"],
    version=sf_creds["api_version"],
)
(Yes, it’s really that straightforward.)
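As a quick sanity check, you can run a small SOQL query against the new connection:
# confirm the connection works with a small SOQL query
result = sf.query("SELECT Id, Name FROM Account LIMIT 5")
print(f"Retrieved {result['totalSize']} records")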
Step 4
Extract data from Salesforce. In this example, we’ll extract data from the Account SObject and dynamically retrieve all fields:
sobject_type = "Account"
sobject = getattr(sf, sobject_type)
description = sobject.describe()

# collect field names, tracking compound fields so we can exclude them
field_names = []
compound_fields = set()
for field in description["fields"]:
    field_names.append(field["name"])
    if field["compoundFieldName"] is not None and field["name"] != "Name":
        # compound fields should not be queried directly
        compound_fields.add(field["compoundFieldName"])

# remove compound fields
field_names = [field for field in field_names if field not in compound_fields]

# dynamically define the bulk2 type and download the results as CSV files
sobject = getattr(sf.bulk2, sobject_type)
query = f"SELECT {', '.join(field_names)} FROM {sobject_type}"
sobject.download(query, path='./temp_data')
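One small gotcha: the download call writes CSV files into the path you give it, and depending on your setup the directory may need to exist first, so creating it up front before the extract is cheap insurance:
import os

# make sure the download directory exists before running the extract
os.makedirs('./temp_data', exist_ok=True)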
Step 5
Now that we have our data locally as CSV files, we can load them into a SQLite3 table:
import sqlite3
import pandas as pd
import os
import shutil
from pathlib import Path

# create a new SQLite3 database
db = sqlite3.connect('salesforce_db.sqlite3')

# for each CSV file in the data directory, load it into a SQLite3 table
for csv in Path('./temp_data').glob("*.csv"):
    df = pd.read_csv(csv)
    # index=False keeps the DataFrame index out of the staging table
    df.to_sql("Account", db, if_exists='append', index=False)

# then, we can clean up the CSV files
for filename in os.listdir('./temp_data'):
    file_path = os.path.join('./temp_data', filename)
    try:
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    except Exception as e:
        print('Failed to delete %s. Reason: %s' % (file_path, e))
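A quick row count against the staged table is a cheap way to confirm the load worked:
# sanity check: how many Account rows landed in the staging table?
row_count = db.execute("SELECT COUNT(*) FROM Account").fetchone()[0]
print(f"Loaded {row_count} Account records into SQLite")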
Step 6
Now that we have our data in SQLite3, we can run our transformations and validate the outputs. There are a number of ways to do this, including running additional Python scripts, but we’ll leverage plain ol’ SQL for this example.
Here, we’ll create a new table Account_Clean and remove any records where the Website or Industry field is empty:
CREATE TABLE Account_Clean AS
SELECT * FROM Account
WHERE Website IS NOT NULL AND Industry IS NOT NULL;
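SQL also works nicely for validation. For example, a quick check that no empty Website or Industry values made it into the cleaned table:
# validation: the cleaned table should have no NULL Website or Industry values
bad_rows = db.execute(
    "SELECT COUNT(*) FROM Account_Clean WHERE Website IS NULL OR Industry IS NULL"
).fetchone()[0]
assert bad_rows == 0, f"Found {bad_rows} records with missing Website or Industry"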
Step 7
Finally, once we’ve modified the data, we can load the records back into Salesforce via the Bulk API:
# export the cleaned table to CSV, then upsert it via the Bulk API 2.0
pd.read_sql_query("SELECT * FROM Account_Clean", db).to_csv("temp-data.csv", sep=",", index=False)
sobject = getattr(sf.bulk2, "Account")
results = sobject.upsert("temp-data.csv", external_id_field="Id")
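The upsert returns result information from the Bulk API job; printing it out is an easy way to spot failures (the exact shape of the results depends on your simple-salesforce version):
# inspect the job results returned by the Bulk API call
for job_result in results:
    print(job_result)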
Closing Thoughts
SQLite3, Python, and Simple-Salesforce are incredibly powerful tools that can make your data work more efficient, repeatable, and accurate. By leveraging these tools, you can streamline your data migration, transformation, and cleanup processes and focus on the more important (and fun) aspects of your work.