Gathering insights from customer calls with AI
How I used LLMs to improve my understanding of my customers
Introduction
Recently I wanted to review ~100 customer call transcripts at my startup to look for any interesting insights I may have missed in my own notes.
I’m not an AI/ML engineer, but this seemed like a fun reason to explore the current capabilities of large language models (LLMs), so I’m experimenting and sharing my results.
(For the curious, the code is on GitHub.)
Method
I used a text-generation LLM (e.g. a model from OpenAI or Anthropic) to analyze my customer conversation transcripts in three stages:
- Summarize the problems faced by the users in each conversation into an aggregate list of all conversation topics.
- Synthesize those conversation topics into a set of core problems.
- Analyze each conversation with respect to the core problems to retrieve specific insights.
At the end, I wanted a coherent set of customer-specific problems, anecdotes, and their current solutions/workarounds.
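To make the three stages concrete, here's a minimal sketch of the pipeline, assuming an OpenAI-style chat completions client and placeholder prompts (the sections below break the real prompts down into clauses, and the actual code on GitHub differs in its details):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Placeholder prompts; the sections below break these down into proper clauses.
SUMMARIZE_PROMPT = "Summarize the problems the customer discusses in this transcript:\n\n"
SYNTHESIZE_PROMPT = "Condense these conversation topics into a short list of core problems:\n\n"
ANALYZE_PROMPT = "Given these core problems:\n\n{problems}\n\nAnalyze this transcript against them:\n\n"

def ask(prompt: str) -> str:
    """Send one prompt to the model and return the text of its reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable text-generation model will do
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

transcripts = [path.read_text() for path in sorted(Path("transcripts").glob("*.txt"))]

# Stage 1: summarize each conversation into a list of topics.
topics = [ask(SUMMARIZE_PROMPT + t) for t in transcripts]

# Stage 2: synthesize all topics into a condensed set of core problems.
core_problems = ask(SYNTHESIZE_PROMPT + "\n\n".join(topics))

# Stage 3: re-analyze each transcript through the lens of the core problems.
analyses = [ask(ANALYZE_PROMPT.format(problems=core_problems) + t) for t in transcripts]
```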
Stages
Let’s walk through how each stage works and some challenges I encountered along the way.
One challenge is that LLMs lack an inherent understanding of importance, so this process largely consists of techniques for telling the model what is important to us.
Stage 1 — Summarize
Goal: Summarize common themes from each conversation.
Input → All conversation transcripts
Output → All conversation topics
Early attempts
At first, I tried a prompt like “return a list of topics discussed.” However, this fails quickly, because the model treats “I’m on my 5th cup of coffee today and should stop now” as just as valid a problem as “We’ve been having trouble structuring roles at scale.”
To improve on this naive attempt, I broke the prompt down into a few clauses to help the model better understand the summarization task:
- Persona clause: context for how the model should behave.
- Input clause: how to read and clean the transcripts.
- Output clause: how to process the input, transform it, and write the output.
Now let’s look at these one at a time for summarization.
Persona clause (setting the context)
This preamble gives the model context so it focuses on the problems relevant to our business. It’s a general-purpose clause I add to every prompt to set the model’s overall tone:
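(The clause below is my own illustrative sketch; the exact wording in my repo differs, and the product context here is inferred from the results later in this post.)

```python
# Illustrative persona clause (my wording, not the exact prompt from the repo).
PERSONA_CLAUSE = """\
You are a product analyst at Spyglass, an early-stage startup building a
data access management product. You are reviewing transcripts of calls with
customers to understand the real problems they face in their work.
Only pay attention to problems that are relevant to our business.
"""
```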
This clause alone helps the model filter out small talk problems like “too many meetings today” and “I wish it wasn’t raining.”
Input clause (reading and cleaning)
This clause describes the input data: how it’s formatted, what the model should ignore, and what it should focus on.
When dealing with transcripts, I want to make sure the model ignores our own problem statements and opinions, as well as anything that feels like fluff:
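(Again, an illustrative sketch of what such a clause might say, not the exact prompt.)

```python
# Illustrative input clause: describes the transcript format and what to ignore.
INPUT_CLAUSE = """\
The input is a raw call transcript. Each line begins with the speaker's name.
Ignore everything said by members of our own team, including our opinions and
our own problem statements. Ignore small talk, pleasantries, and filler.
Focus only on what the customer says about their problems and how they work.
"""
```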
Output clause (processing and transforming)
This clause describes the output: how to process the input, transform it, and write it out. For this task, it should summarize the call into a list of topics discussed with a bit of detail about each one.
(If the processing is significant, I imagine breaking this down into a “transform” clause and an “output format” clause, but we’ll combine them for now.)
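Here's a sketch of what the output clause for this stage might look like (illustrative wording once more):

```python
# Illustrative output clause for Stage 1 (summarization).
OUTPUT_CLAUSE = """\
Summarize the conversation into a list of the topics the customer discussed.
For each topic, add one or two sentences of detail about the problem and, if
mentioned, how the customer handles it today.
Return the topics as plain bullet points.
"""
```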
Summarization prompt
Our full prompt is a composition of the above clauses, plus our input data:
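(Continuing the illustrative sketch from earlier; `ask` and the clause constants are the hypothetical pieces defined above.)

```python
def build_summarize_prompt(transcript: str) -> str:
    """Compose the persona, input, and output clauses with one transcript."""
    return "\n".join([PERSONA_CLAUSE, INPUT_CLAUSE, OUTPUT_CLAUSE, "Transcript:", transcript])

# One summarization call per conversation; the results are then combined into a single file.
topics = [ask(build_summarize_prompt(t)) for t in transcripts]
```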
The output of this stage is a file with a combined list of all relevant topics discussed across every conversation.
Stage 2 — Synthesize
Goal: Synthesize themes across all conversations into a condensed list of the most important and recurring themes.
Input → All conversation topics
Output → Set of core problems
Synthesis prompt
This task is fairly simple, so the input and output clauses fit into a single clause:
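(Illustrative again: the persona clause is reused, and the combined input/output clause might read something like this.)

```python
# Illustrative combined input/output clause for Stage 2 (synthesis).
SYNTHESIZE_CLAUSE = """\
Below is a list of topics collected from many customer conversations.
Synthesize them into a short list of the most important and recurring core
problems. Merge near-duplicates, drop one-off topics, and give each core
problem a short name and a one-sentence description.
"""

core_problems = ask("\n".join([PERSONA_CLAUSE, SYNTHESIZE_CLAUSE, "\n\n".join(topics)]))
```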
The output of this stage is a file with a highly condensed list of topics discussed across all conversations.
Stage 3 — Analyze
Goal: Analyze each conversation through the lens of the core problems.
Input → All transcripts, plus set of core problems
Output → For each call, a list of specific problems, anecdotes, and details
Analysis prompt
Now that we have a set of core problems, we can go back through each transcript and retrieve insights based on the core problems.
Most notably, the set of core problems is fed into the output clause, which significantly improves the model’s ability to retrieve important details (this is the point of the multi-stage approach).
(There’s a bit more this stage does, but I’ve omitted it for brevity.)
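To sketch the structure (with my own illustrative wording, and the omitted details left out), the core problems from Stage 2 are interpolated into the output clause before each transcript is re-read:

```python
# Illustrative output clause for Stage 3 (analysis), parameterized by the core problems.
ANALYZE_OUTPUT_CLAUSE = """\
Below is the list of core problems we care about:

{core_problems}

Analyze the transcript with respect to these core problems. For each core
problem the customer touches on, return the specific challenge they described,
their current approach or workaround, and a direct supporting quote.
"""

analyses = [
    ask("\n".join([
        PERSONA_CLAUSE,
        INPUT_CLAUSE,
        ANALYZE_OUTPUT_CLAUSE.format(core_problems=core_problems),
        "Transcript:",
        t,
    ]))
    for t in transcripts
]
```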
Results
Overall, I’m pretty happy with the results. It let me review a large amount of information far faster than I could have on my own.
Some snippets from the final analysis:
Access Management and Exploration
- Challenge: Understanding who has access to sensitive data across various platforms to ensure security.
- Current approach: “I usually maintain a couple of spreadsheets to track who has access to what, but it's a hassle updating them all the time.”
- Quote: "I need more visibility into who has access to our sensitive data especially in platforms like Snowflake."
Automated Compliance
- Challenge: Ensuring ongoing compliance with data protection regulations through automated features.
- Current approach: “We conduct periodic manual audits to ensure compliance, but it's time-consuming and prone to error.”
- Quote: "Automating compliance checks would save us a lot of hassle and reduce the risk of overlooking something."
Easy-to-Build Access Rules
- Challenge: Creating and managing access control rules can be difficult without user-friendly interfaces.
- Current approach: “Right now, I use a combination of scripts and manual processes to set up access controls, which isn’t very efficient.”
- Quote: "It would be great if there was an easier way to build these rules without technical complexity."
Discussion
The SQL of free-form text! This process of chaining prompts together feels a lot like writing SQL, except we’re querying and summarizing free-form text instead of database records. I’ve since learned that LCEL could be the right long-term tool for scaling up these sorts of analyses.
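I haven't tried LCEL yet, but for a sense of what it might look like, here's a rough sketch of Stage 1 as an LCEL chain, assuming the langchain-openai package and reusing the transcripts list from the earlier sketch (untested, and not part of my current code):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Declare Stage 1 as a composable chain: prompt -> model -> plain-text output.
summarize_chain = (
    ChatPromptTemplate.from_template(
        "Summarize the problems the customer discusses in this transcript:\n\n{transcript}"
    )
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

# Run the chain over every transcript (LCEL handles batching).
topics = summarize_chain.batch([{"transcript": t} for t in transcripts])
```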
Surprising outputs. While you can be explicit about the desired output format, the format above (challenge, current approach, quote) actually appeared spontaneously in some analyses; I liked it so much that I only later enforced it as my desired format!
Cherry-picked quotes. Sometimes the model picks a quote that doesn’t tell the full story and can even be misleading without the surrounding context. It’d be nice to be able to quickly double-check a quote’s context and confirm what the person meant (which would involve some form of retrieval).
The issue of bias. On our calls, we tend to talk about problems that we’ve already identified. Even if the model ignores our team’s speech in the transcripts, we’ve still already “led” our customers to talk about specific things. So the set of “core problems” is likely very biased based on the questions we ask. We’re unlikely to stumble on any unexpected problems — they are mostly problems we know and expect.
Interesting insights…? In the introduction, I said I was looking for interesting insights I may have missed. I don’t think I’ve achieved that goal yet, but we did get a lot more concrete stories to back up our current hypotheses. And I’m expecting to improve this to get closer to that goal over time.
Can we condense the final per-call analysis? The final analysis of each call is much easier to review than the full transcript, but that’s still ~100 analyses to review. I’m looking for ways to reduce the amount of information to the most important pieces, without losing too much nuance.
Is this cost effective? Each transcript passes through the model twice (once in stage 1 and once in stage 3), so at roughly 10k tokens per transcript, ~100 transcripts, and $5 / 1M tokens, that’s about 2M input tokens, or around $10 for the full analysis. That’s fine at this scale, but it feels like it’ll need some optimizations to work at larger scales.
Changing the analysis. Say I want to re-run the entire analysis, but ask for something slightly different. Instead of asking for their “current problems,” I’d like to be able to ask about their “objections to Spyglass.” In the future, I’m planning to make it easy to make these sorts of changes without too much re-processing work.
Special thanks to Francisco Arceo for inspiring this approach and feedback.