Populating Salesforce environments with test data using OpenAI GPT3 and Azure Logic Apps

I've recently spent some time playing with a Salesforce developer environment and wanted to understand how I can populate it with test data. As it turns out, developer environments are already populated with test data, it's just I'm a newbie to Salesforce so couldn't find it! 😂

Putting my lack of "Salesforce-fu" to one side, I did find a good overview of how you can push test data into non-production Salesforce environments from the Salesforce Ben website. I have to say the use of Apex code and CSV files felt a bit clunky, especially as a new Salesforce developer. The blog post made use of Mockaroo to generate the test data which looks pretty cool, and it turns out Mockaroo has a REST API you can use for programmatic access.

TL;DR - find the sample project for this on GitHub.

Problem solved? Not quite!

If you can solve a problem, why not solve it with AI?!

It's safe to say none of us has avoided the recent AI hype train which has been driven primarily by Microsoft's use of OpenAI's powerful models in various products. Despite the hype, it is worth looking at practical examples of how you can use these models in your solutions, so I thought... instead of Mockaroo, why don't we use an AI model to generate our test data? But which one?

To help with the selection process I started with what I wanted:

👉 Test data for Salesforce Account and Contact objects in a JSON format.

As I don't actually know what the JSON structure looks like for these objects, I spun up a version of the gpt-35-turbo model in Azure OpenAI Services and asked it to provide a sample Account and a Sample Contact object. Thankfully it's clever enough to deal with my typos!

gpt-35-turbo answering question on a Salesforce Contact object

gpt-35-turbo answering question on a Salesforce Account object

I've seen a few people use the public preview of ChatGPT to generate test data, so my first solution used the same model. More on this in a moment.

No code like low code

I've written a lot about how I'm a fan of using Azure Logic Apps to solve problems, and yet again I find myself turning to them for this situation.

I started by looking at the Logic Apps Connectors list to see what was available for OpenAI and found one that is aimed at the native OpenAI REST APIs. As I'm using the Azure-published versions of the OpenAI services this Connector wasn't going to work for me, so I decided to use the in-built Logic Apps HTTP Connector and call the Azure OpenAI REST API directly.

My other requirement was to insert the generated data into Salesforce. Thankfully there is an existing Logic Apps Salesforce Connector which does the job. Probably the only missing feature is the ability to insert multiple records in a single call, but I can live with that for now.

Let's go ahead and build out a solution.

Chat completions versus Completions for GPT3

Earlier in the post I mentioned I'd return to using the gpt-35-turbo model for generating test data. I did use it in my Logic App but immediately hit an issue. You'll notice in the screenshots above that not only did I get a JSON structure returned, but I also got a text description of the JSON structure. This is because the Chat interface is designed specifically for what it says - chat! I thought I could ask it to just return the JSON structure, but it always returned some human-friendly text along with the JSON object. You can see a sample in the content field below.

{
  "body": {
    "id": "chatcmpl-7ByJeZcKdXv8lv65R8EajWtNK0eZj",
    "object": "chat.completion",
    "created": 1683087818,
    "model": "gpt-35-turbo",
    "choices": [
      {
        "index": 0,
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Here's an example JSON object for a Salesforce Account:\n\n```\n{\n    \"Name\": \"Acme Inc\",\n    \"Description\": \"Manufacturing and distribution of widgets\",\n    \"Industry\": \"Manufacturing\"\n}\n```\n\nThis object represents an Account object with a name, description, and industry. You can add more fields as needed for your specific use case."
        }
      }
    ],
    "usage": {
      "completion_tokens": 74,
      "prompt_tokens": 46,
      "total_tokens": 120
    }
  }
}

I could have written a solution to parse out the JSON object, but that felt a bit flakey and unnecessary. In doing a bit more research (and finding this great video on using GPT3 for test data) I found what I actually needed was to use the GPT3 Davinci model which just returns a completion without the chat wrapper.

In my existing Azure OpenAI service I went ahead and deployed an instance of the text-davinci-003 model which gives me just Completions without the Chat wrapper.

View of deployed models in my Azure OpenAI service

Now I have the model deployed I can build my Logic App to call the REST API and generate my test data. The way to interact with the Completions API is documented on Microsoft Learn. Because I want the Completion to use a specific format for the JSON object to return I have to provide a sample. Embedding and escaping JSON in JSON is a pain, so I cheated a bit and defined the structure in a variable in the Logic App 😁.

Here's the Azure OpenAI REST API call in the Logic App. The temperature value is an attempt to try and create some variability in the responses.

Azure Logic App HTTP action to call the Azure OpenAI REST API

You'll notice that a) I'm using Key Vault to store my API key and b) I have a Logic App variable that holds my JSON object scaffold. That variable looks like this:

Azure Logic App variable definition for JSON object scaffold

Now I don't need to try and escape JSON in JSON!

When I run the request for the API this is a sample response of what I get back:

{
  "id": "cmpl-7Cetr8dbnU9Pwbm67uLasIsv4hB6e",
  "object": "text_completion",
  "created": 1683251511,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": "\n\n{  \n  \"Name\": \"ABC Corporation\",  \n  \"Type\": \"Technology\",  \n  \"Industry\": \"Information Technology\",  \n  \"BillingAddress\": {  \n    \"street\": \"123 Main Street\",  \n    \"city\": \"Mountain View\",  \n    \"state\": \"California\",  \n    \"postalCode\": \"94043\",  \n    \"country\": \"United States\"  \n  },  \n  \"Phone\": \"555-555-5555\",  \n  \"Website\": \"http://www.abc-corporation.com\"  \n}",
      "index": 0,
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "completion_tokens": 132,
    "prompt_tokens": 119,
    "total_tokens": 251
  }
}

Then a little bit of Parse JSON magic and I can covert the text into the object I need.

{
  "Name": "ABC Corporation",
  "Type": "Technology",
  "Industry": "Information Technology",
  "BillingAddress": {
    "street": "123 Main Street",
    "city": "Mountain View",
    "state": "California",
    "postalCode": "94043",
    "country": "United States"
  },
  "Phone": "555-555-5555",
  "Website": "http://www.abc-corporation.com"
}

Finally I can feed this into the Salesforce action I have and populate the Account object in my environment.

Azure Logic App HTTP action to create Account in Salesforce

Creating linked entities

As I wanted to create Contacts linked to Accounts, I created a parallel branch in my Logic App to use the same OpenAI REST API call to create a Contact Object. I then use this JSON object and the Account ID of the inserted Account to create a new linked Contact.

Here's what the entire Logic Apps looks like.

Wrapping up

There are a few things to be aware of out the back of this solution:

The Azure OpenAPI Service is currently available only via approved use. You have to apply for access and meet certain criteria before access is granted.
The Azure OpenAI REST APIs have limited response lengths - up to 4,096 tokens (effectively characters) which also includes your prompt. This means generating large datasets in a single call isn't currently possible.
The cost of using the Azure OpenAI APIs is not that expense... unless you are generating massive amounts of text. I'm using the Standard Tier and my limited testing (more than you see in this post) generated the grand total cost of $0.12 AUD in costs. Billing is via your standard Azure payment method, so it's super easy to get going when compared to connecting directly to OpenAI's service APIs.
The OpenAI service doesn't generate guaranteed unique data. I've had multiple instances in testing where it created the same Account multiple causing Salesforce to return an error due to duplicate detection. I could probably improve this by better crafting my prompt. I'll put that on the to-do list!
The Logic Apps Salesforce connector doesn't have pre-defined bulk insert/update actions which means you need to look at other methods for achieving this. Looping over data in a Logic App can quickly result in ballooning costs, which is something that you might be unaware of. It's worth reading the documentation to understand how billing works.

If you want to see how this solution hangs together you can find the Bicep deployment and Logic App Standard definition on GitHub. Manually building this Logic App is about 20 minutes work if you're keen 😜.

There's a lot of concern floating around about the current generation of AI services and their impact on jobs (and broader society), but when AI is treated as Augmented Intelligence, where it is additive to human capabilities (like it is here), then hopefully we can see that the future is a lot brighter with AI than without.

Happy Days! 😎