Gathering insights from customer calls with AI

How I used LLMs to improve my understanding of my customers

Introduction

Recently I wanted to review ~100 customer call transcripts at my startup to look for any interesting insights I may have missed in my own notes.

I’m not an AI/ML engineer, but this seemed like a fun reason to explore the current capabilities of large language models (LLMs), so I’m experimenting and sharing my results.

(For the curious, the code is on GitHub.)

Method

I used a text-generation LLM (e.g. an OpenAI or Anthropic model) to analyze my customer conversation transcripts in three stages:

  1. Summarize the problems faced by the users in each conversation into an aggregate list of all conversation topics.
  2. Synthesize those conversation topics into a set of core problems.
  3. Analyze each conversation with respect to the core problems to retrieve specific insights.

At the end, I wanted a coherent set of customer-specific problems, anecdotes, and their current solutions/workarounds.
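To make the dataflow concrete, here's the shape of the pipeline as I think of it. This is only a sketch; the function names are illustrative, not the actual code from the repo:

    # Illustrative shape of the three-stage pipeline (names are hypothetical).
    def summarize(transcripts: list[str]) -> list[str]:
        """Stage 1: one LLM call per transcript -> topics discussed in each call."""
        ...

    def synthesize(all_topics: list[str]) -> str:
        """Stage 2: one LLM call over all topics -> the 5-10 core problems."""
        ...

    def analyze(transcripts: list[str], core_problems: str) -> list[str]:
        """Stage 3: one LLM call per transcript, guided by the core problems."""
        ...

    transcripts: list[str] = []  # one string per call transcript
    topics = summarize(transcripts)                 # all transcripts in, all topics out
    core_problems = synthesize(topics)              # condensed set of core problems
    insights = analyze(transcripts, core_problems)  # per-call problems, anecdotes, details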

Stages

Let’s walk through how each stage works and some challenges I encountered along the way.

One challenge is that LLMs lack an inherent understanding of importance, so this process is largely a set of techniques for telling the model what is important to us.1

Stage 1 — Summarize

Goal: Summarize common themes from each conversation.

Input → All conversation transcripts
Output → All conversation topics

Early attempts

At first, I tried a prompt like “return a list of topics discussed.” However, this fails quickly because the model treats “I’m on my 5th cup of coffee today and should stop now.” as just as valid a problem as “We’ve been having trouble structuring roles at scale.”

To improve on this naive attempt, I broke the prompt down into clauses that help the model better understand the summarization task:

  1. Persona clause: context for how the model should behave.
  2. Input clause: how to read and clean the transcripts.
  3. Output clause: how to process the transcripts, transform them, and write the output.

Now let’s look at these one at a time for summarization.

Persona clause (setting the context)

This preamble helps the model focus on the problems of our business. It’s a general-purpose clause I’ll add to every prompt, setting the tone for the model:

You are a product manager for Spyglass, a security platform for data teams.

Your job is to help me better understand my user and their problems.

Spyglass is a security platform that helps your data team continuously improve the security of your data by providing access exploration, easy-to-build access rules, change management, and automated compliance.

This discussion should be succinct and very related to our product as a security tool for data teams.

This clause alone helps the model filter out small talk problems like “too many meetings today” and “I wish it wasn’t raining.”

Input clause (reading and cleaning)

This clause describes the input data: how it’s formatted, what the model should ignore, and what it should focus on.

When dealing with transcripts, I want to make sure the model ignores our own problem statements and opinions, as well as anything that feels like fluff:

Analyze this transcript of a conversation between our team and our customer. This is a VTT file, you can ignore timestamps.

In transcripts, you should ignore compliments, opinions, or things that feel subjective.

Nick Coffee and Tyler Julian are founders of our company, not users. Do not include problems mentioned by Nick Coffee or Tyler Julian.
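As an aside, a VTT file is just a header line plus cue timestamps followed by the spoken text. The instruction above is enough for the model to skip the timestamps on its own, but if you wanted to trim tokens you could strip them yourself first. A rough sketch (not part of my pipeline, just to illustrate the format):

    # Optional pre-cleaning: drop the VTT header and timestamp cue lines.
    # Cue lines look like "00:01:15.000 --> 00:01:18.500"; speaker text follows them.
    def strip_vtt_timestamps(vtt_text: str) -> str:
        kept = []
        for line in vtt_text.splitlines():
            if line.strip() == "WEBVTT" or "-->" in line:
                continue
            kept.append(line)
        return "\n".join(kept)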

Output clause (processing and transforming)

This clause describes the output: how to process the input, transform it, and write it out. For this task, it should summarize the call into a list of topics discussed with a bit of detail about each one.

Return a list of topics discussed in the conversation as plain text list without any formatting, with each topic on a new line, and some detail about each topic, separated by a semicolon.

(If the processing is significant, I imagine breaking this down into a “transform” clause and an “output format” clause, but we’ll combine them for now.)

Summarization prompt

Our full prompt is a composition of the above clauses, plus our input data:

<Persona clause>
<Input clause>
<Output clause>
<Transcript from a single call>

The output of this stage is a file with a combined list of all relevant topics discussed across every conversation.
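In code, this stage boils down to string concatenation plus one model call per transcript. Here's a minimal sketch using the OpenAI Python client; the model name, file paths, and variable names are placeholders, not the actual code from the repo:

    from pathlib import Path
    from openai import OpenAI  # any chat-completion client would work similarly

    client = OpenAI()

    # These hold the clause text shown above (abbreviated here).
    PERSONA_CLAUSE = "You are a product manager for Spyglass, a security platform for data teams. ..."
    INPUT_CLAUSE = "Analyze this transcript of a conversation between our team and our customer. ..."
    OUTPUT_CLAUSE = "Return a list of topics discussed in the conversation as plain text list ..."

    def summarize_call(transcript: str) -> str:
        prompt = "\n\n".join([PERSONA_CLAUSE, INPUT_CLAUSE, OUTPUT_CLAUSE, transcript])
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # One call per transcript; every "topic; detail" line goes into one combined file.
    with open("all_topics.txt", "w") as out:
        for vtt_path in sorted(Path("transcripts").glob("*.vtt")):
            out.write(summarize_call(vtt_path.read_text()) + "\n")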

Stage 2 — Synthesize

Goal: Synthesize themes across all conversations into a condensed list of the most important and recurring themes.

Input → All conversation topics
Output → Set of core problems

Synthesis prompt

This task is fairly simple, so the input and output clauses fit into a single clause:

<Persona clause>

This is a list of topics and some detail about each topic, separated by a semicolon. Please return the top 5-10 most common and important topics.

<Insert full list of topics>

The output of this stage is a file with a highly condensed list of topics discussed across all conversations.
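As a sketch, this stage is a single call over the combined topics file, reusing the client and persona clause from the Stage 1 sketch above (again, file paths and model name are placeholders):

    # Stage 2: condense the combined topic list into 5-10 core problems.
    SYNTHESIS_INSTRUCTION = (
        "This is a list of topics and some detail about each topic, separated by "
        "a semicolon. Please return the top 5-10 most common and important topics."
    )

    all_topics = Path("all_topics.txt").read_text()
    prompt = "\n\n".join([PERSONA_CLAUSE, SYNTHESIS_INSTRUCTION, all_topics])

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    Path("core_problems.txt").write_text(response.choices[0].message.content)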

Stage 3 — Analyze

Goal: Analyze each conversation through the lens of the core problems.

Input → All transcripts, plus set of core problems
Output → For each call, a list of specific problems, anecdotes, and details

Analysis prompt

Now that we have a set of core problems, we can go back through each transcript and retrieve insights based on the core problems.

Most notably, the set of core problems is fed into the output clause, which significantly improves the model’s ability to retrieve important details (and is the whole point of this multi-stage approach).

<Persona clause>
<Input clause>

You should make clear notes about the problems a user actually experiences, quoting anecdotes word for word where possible,
and noting any ways they are currently solving the problem.

Please return detailed notes on the problems faced by the users in the conversation, as they relate to our core product themes:

<Insert list of synthesized topics>

<Insert transcript from a single call>

(There’s a bit more this stage does, but I’ve omitted it for brevity.)
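Structurally, this is the Stage 1 loop again with the core problems spliced into the prompt. Continuing the same hypothetical sketch from the earlier stages:

    # Stage 3: re-read every transcript through the lens of the core problems.
    core_problems = Path("core_problems.txt").read_text()

    ANALYSIS_INSTRUCTION = (
        "You should make clear notes about the problems a user actually experiences, "
        "quoting anecdotes word for word where possible, and noting any ways they are "
        "currently solving the problem.\n\n"
        "Please return detailed notes on the problems faced by the users in the "
        "conversation, as they relate to our core product themes:\n\n" + core_problems
    )

    Path("analyses").mkdir(exist_ok=True)
    for vtt_path in sorted(Path("transcripts").glob("*.vtt")):
        prompt = "\n\n".join(
            [PERSONA_CLAUSE, INPUT_CLAUSE, ANALYSIS_INSTRUCTION, vtt_path.read_text()]
        )
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        # One analysis file per call, so each can be reviewed on its own.
        (Path("analyses") / f"{vtt_path.stem}.txt").write_text(
            response.choices[0].message.content
        )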

Results

Overall, I’m pretty happy with the results. It’s a much faster way to review a large amount of information than I could’ve managed otherwise.

Some snippets from the final analysis:

Access Management and Exploration

  • Challenge: Understanding who has access to sensitive data across various platforms to ensure security.
  • Current approach: “I usually maintain a couple of spreadsheets to track who has access to what, but it's a hassle updating them all the time.”
  • Quote: "I need more visibility into who has access to our sensitive data especially in platforms like Snowflake."

Automated Compliance

  • Challenge: Ensuring ongoing compliance with data protection regulations through automated features.
  • Current approach: “We conduct periodic manual audits to ensure compliance, but it's time-consuming and prone to error.”
  • Quote: "Automating compliance checks would save us a lot of hassle and reduce the risk of overlooking something."

Easy-to-Build Access Rules

  • Challenge: Creating and managing access control rules can be difficult without user-friendly interfaces.
  • Current approach: “Right now, I use a combination of scripts and manual processes to set up access controls, which isn't very efficient.”
  • Quote: "It would be great if there was an easier way to build these rules without technical complexity."

Discussion

The SQL of free-form text! This process of chaining prompts together feels a lot like writing SQL, except we’re querying and summarizing free-form text instead of database records. I’ve since learned that LCEL (LangChain Expression Language) could be the right long-term tool for scaling up these sorts of analyses.
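For reference, Stage 1 expressed as an LCEL chain would look roughly like this. This is a sketch assuming the langchain-openai package, not something I've adopted yet:

    # Roughly the Stage 1 prompt expressed as an LCEL chain (sketch only).
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_template(
        "{persona}\n\n{input_clause}\n\n{output_clause}\n\n{transcript}"
    )
    chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()

    topics = chain.invoke({
        "persona": PERSONA_CLAUSE,      # the clause strings from earlier
        "input_clause": INPUT_CLAUSE,
        "output_clause": OUTPUT_CLAUSE,
        "transcript": transcript_text,  # a single call transcript
    })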

Surprising outputs. While you can be explicit about the desired output format, the format above (challenge, current approach, quote) actually appeared spontaneously in some analyses; I liked it so much that I only later enforced it as my desired format!

Cherry-picked quotes. Sometimes a quote is picked that doesn’t tell the full story, and can even be misleading without the surrounding context. It’d be nice to be able to quickly double-check a quote’s context and confirm what the person meant (which would involve some form of retrieval).

The issue of bias. On our calls, we tend to talk about problems that we’ve already identified. Even if the model ignores our team’s speech in the transcripts, we’ve still already “led” our customers to talk about specific things. So the set of “core problems” is likely very biased based on the questions we ask. We’re unlikely to stumble on any unexpected problems — they are mostly problems we know and expect.

Interesting insights…? In the introduction, I said I was looking for interesting insights I may have missed. I don’t think I’ve achieved that goal yet, but we did get a lot more concrete stories to back up our current hypotheses. And I’m expecting to improve this to get closer to that goal over time.

Can we condense the final per-call analysis? The final analysis of each call is much easier to review than the full transcript, but that’s still ~100 analyses to review. I’m looking for ways to reduce the amount of information to the most important pieces, without losing too much nuance.

Is this cost effective? At roughly 10k tokens per transcript and ~100 transcripts, each pass over the transcripts is about 1M tokens; Stages 1 and 3 each make a pass, so at $5 / 1M tokens the full analysis comes to around $10. That’s fine at this scale, but it feels like it’ll need some optimizations to work at larger scales.

Changing the analysis. Say I want to re-run the entire analysis, but ask for something slightly different. Instead of asking for their “current problems,” I’d like to be able to ask about their “objections to Spyglass.” In the future, I’m planning to make it easy to make these sorts of changes without too much re-processing work.

Special thanks to Francisco Arceo for inspiring this approach and feedback.



*1 We use Grain to record customer calls (not affiliated with them), which is nice for transcripts, but we’ve found that their built-in summaries don’t understand our business context well enough to be very useful.
