Introducing Podverse: AI superpowers for your podcast
Bringing AI-powered transcripts, summaries, and chat to podcasts
I’ve launched and open-sourced Podverse, a personal project I’ve been working on to bring the power of generative AI to podcasts. The idea behind Podverse is to give podcasters the latest AI tech to make their podcasts really shine.
Podverse is definitely a beta product, and I’m sure there are a lot of bugs and missing features. I would really love your feedback and ideas — feel free to drop me a line with any suggestions! All of the code for Podverse is open source and can be found at https://github.com/mdwelsh/podverse.
Podverse is a personal project and has nothing to do with my employer. I’m releasing the code as open source in case it is helpful for others wanting a reference implementation for integrating LLMs into a web app like this one.
I’m a huge fan of podcasts, and listen to a bunch of them (shout out to Omnibus and The Rest is History, which I love), but I’ve long felt that there’s so much great knowledge contained in podcasts that you can really only get to by listening to an entire episode. What if we could unlock all of this knowledge and make it searchable, shareable, and even use the power of AI to let people ask questions and learn more about the topics in the podcast? That’s what Podverse does.
Here’s how Podverse works:
- You give it the RSS feed for a podcast.
- It slurps up all of the metadata about the podcast and its episodes.
- For each episode, it uses AI to generate a text transcript, a summary of the episode, and even figures out the names of the speakers in the episode.
- All of this information is stored and indexed so you can access it through an AI chatbot, ask questions, and learn more about any topic covered in the podcast.
For me, being able to get access to everything in these podcasts in text form, in my web browser, and to search all of it has been amazing. I no longer feel like the stuff I’m learning from these podcasts is lost in my (incredibly poor) memory. I could imagine this being really useful for students, journalists, researchers, and others who want to tap into all of the great podcast content out there, but don’t have time to listen to every episode that might be relevant to a given topic.
How Podverse is built
Given all the excitement around AI and LLMs these days, I thought it would be useful to share some of the technical details on how Podverse works. If you are at all familiar with building LLM-powered applications, you’ll see it’s pretty straightforward. I wouldn’t say there’s anything particularly novel or exciting about Podverse’s design, but bringing everything together into a single coherent product has involved some nuanced decisions.
Nuts and bolts
The entire app is built in TypeScript using Next.js, deployed on Vercel. Going all in on the Next.js architecture made Podverse really easy to build in a short period of time. There’s no separation between the “frontend” and “backend”; just a bunch of React components, some of which happen to get rendered on the server. Next.js’s Server Actions are used to perform any backend data fetching and processing. While these are essentially REST endpoints behind the scenes, from the perspective of the Podverse code, they just look like functions that get called from React components. No need to faff about with defining a REST API and parsing JSON and all that.
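As a rough sketch of what this looks like in practice (the file names, table helpers, and component here are hypothetical illustrations, not the actual Podverse code), a Server Action is just an async function in a file marked 'use server' that a client component can import and call like any other function:

```typescript
// app/actions.ts -- a hypothetical Server Action.
'use server';

import { getEpisodesForPodcast } from '@/lib/db'; // hypothetical data-access helper

// Runs only on the server, but can be imported and called directly
// from a React component -- no explicit REST endpoint to define.
export async function getEpisodes(podcastSlug: string) {
  return await getEpisodesForPodcast(podcastSlug);
}
```

```tsx
// app/podcast/[slug]/EpisodeList.tsx -- calling the action from a client component.
'use client';

import { useEffect, useState } from 'react';
import { getEpisodes } from '@/app/actions';

export function EpisodeList({ slug }: { slug: string }) {
  const [episodes, setEpisodes] = useState<any[]>([]);
  useEffect(() => {
    // Looks like a plain async function call; Next.js turns it into a POST under the hood.
    getEpisodes(slug).then(setEpisodes);
  }, [slug]);
  return (
    <ul>
      {episodes.map((episode) => (
        <li key={episode.id}>{episode.title}</li>
      ))}
    </ul>
  );
}
```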
I went with Supabase for both the database and blob storage, mainly because it’s so easy to get up and running. Supabase provides a really nice web UI for managing tables, debugging queries, creating edge functions, and so forth. I have a few gripes with Supabase — building and maintaining RLS policies is way harder than it should be, for example — but overall it’s been a great experience.
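For illustration, here is a minimal sketch of what a data layer on Supabase can look like with supabase-js; the table name, bucket name, and environment variables are made-up examples, not the actual Podverse schema:

```typescript
// Hypothetical data layer using supabase-js.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,        // assumed env var names
  process.env.SUPABASE_SERVICE_KEY!,
);

// Fetch all episodes for a podcast from a hypothetical "Episodes" table.
export async function getEpisodesForPodcast(podcastId: number) {
  const { data, error } = await supabase
    .from('Episodes')
    .select('*')
    .eq('podcast', podcastId)
    .order('pubDate', { ascending: false });
  if (error) throw error;
  return data;
}

// Store the raw transcript JSON in a hypothetical "transcripts" storage bucket.
export async function storeTranscript(episodeId: number, transcript: object) {
  const { error } = await supabase.storage
    .from('transcripts')
    .upload(`episode-${episodeId}.json`, JSON.stringify(transcript), {
      contentType: 'application/json',
      upsert: true,
    });
  if (error) throw error;
}
```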
For auth, I went with Clerk, which has been great and super easy to integrate with. It provides user account management and social auth integrations out of the box, and I really spent almost no time setting things up. Highly recommended.
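To give a flavor of why the setup is so quick, here is a minimal sketch of wiring Clerk into a Next.js App Router layout using its prebuilt components (a generic example, not the actual Podverse code):

```tsx
// app/layout.tsx -- minimal Clerk integration sketch.
import { ClerkProvider, SignedIn, SignedOut, SignInButton, UserButton } from '@clerk/nextjs';

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <ClerkProvider>
      <html lang="en">
        <body>
          <header>
            {/* Clerk's prebuilt components handle sign-in and account management. */}
            <SignedOut>
              <SignInButton />
            </SignedOut>
            <SignedIn>
              <UserButton />
            </SignedIn>
          </header>
          {children}
        </body>
      </html>
    </ClerkProvider>
  );
}
```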
pgvector — built into Supabase
Another thing Supabase provides out of the box is pgvector, a PostgreSQL extension for vector storage and search. This is used in Podverse to store embeddings for episode summaries and transcripts, which power the chatbot and search features described in more detail below. Setting up pgvector was as easy as toggling a switch in the Supabase UI. To be honest, I don’t know why anyone would use a separate vector database like Pinecone or Weaviate, at least at the scale I’m concerned with. Since it’s just part of the Postgres database, I can configure whatever schema I want for the vector-indexed data and everything is in one place. This one was a no-brainer.
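As a sketch of the query side, assuming a hypothetical match_chunks SQL function that orders rows by pgvector’s cosine-distance operator (the function, table, and embedding model here are illustrative assumptions), a similarity search looks something like this:

```typescript
// Hypothetical similarity search over chunk embeddings stored in pgvector.
// Assumes a SQL function roughly like:
//   create function match_chunks(query_embedding vector(1536), match_count int)
//   returns setof "Chunks" ...
//   order by embedding <=> query_embedding limit match_count;
import { createClient } from '@supabase/supabase-js';
import OpenAI from 'openai';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);
const openai = new OpenAI();

export async function searchChunks(query: string, matchCount = 10) {
  // Embed the query with the same model used to embed the transcript chunks.
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: query,
  });
  const { data, error } = await supabase.rpc('match_chunks', {
    query_embedding: embedding.data[0].embedding,
    match_count: matchCount,
  });
  if (error) throw error;
  return data;
}
```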
Inngest for background tasks
Background processing — such as generating episode transcripts and summaries — is implemented using Inngest, a really nice service that lets you define long-running workflows and background processing tasks that you can embed into your Next.js code. Inngest essentially runs your background tasks as serverless functions on Vercel, but takes care of details like concurrency, retry, sequencing, and so forth. I have a bunch of Inngest functions to do things like generate a transcript, identify speakers, and generate embeddings, all of which are coordinated by the Inngest cloud service.
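Here is a hedged sketch of what such a workflow looks like with Inngest. The event name, step names, and helper functions are made up for illustration (some of the helpers are sketched in the sections below):

```typescript
// Hypothetical Inngest workflow for processing a newly ingested episode.
import { Inngest } from 'inngest';
import {
  transcribeEpisode,
  identifySpeakers,
  summarizeEpisode,
  embedEpisode,
} from './podcast'; // hypothetical helpers

const inngest = new Inngest({ id: 'podverse' });

export const processEpisode = inngest.createFunction(
  { id: 'process-episode', concurrency: 4 },
  { event: 'podverse/episode.ingested' },
  async ({ event, step }) => {
    // Each step is checkpointed and retried independently by Inngest.
    const transcript = await step.run('transcribe', () =>
      transcribeEpisode(event.data.audioUrl),
    );
    const speakers = await step.run('identify-speakers', () =>
      identifySpeakers(transcript),
    );
    const summary = await step.run('summarize', () => summarizeEpisode(transcript));
    await step.run('embed', () =>
      embedEpisode(event.data.episodeId, transcript, summary),
    );
    return { episodeId: event.data.episodeId, speakers, summary };
  },
);
```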
Deepgram for transcripts
I use Deepgram for generating transcripts from the episode audio. Deepgram is amazing: it’s super fast, the quality is excellent, and it’s pretty affordable at about 25 cents per hour of audio. It is also way faster than Amazon’s and Google’s ASR models.
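A minimal sketch of the transcription call, assuming the v3-style Deepgram SDK; the specific model name and options shown are my illustrative assumptions, not necessarily the exact configuration used:

```typescript
// Hypothetical transcription helper using the Deepgram SDK.
import { createClient } from '@deepgram/sdk';

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

export async function transcribeEpisode(audioUrl: string) {
  const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
    { url: audioUrl },
    {
      model: 'nova-2',    // assumed model choice for illustration
      smart_format: true, // punctuation, paragraphs, etc.
      diarize: true,      // label utterances as "Speaker 0", "Speaker 1", ...
      utterances: true,   // per-utterance timestamps
    },
  );
  if (error) throw error;
  return result;
}
```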
Deepgram has a diarization option, which I use, that tags each utterance in the transcript with labels like “Speaker 0”, “Speaker 1”, and so on. To determine the real names of the speakers, I pass the transcript through GPT-4 with a prompt like:
The following is a transcript of a conversation between multiple
individuals, identified as "Speaker 0", "Speaker 1", and so forth.
Please identify the speakers in the conversation, based on the contents
of the transcript. Your response should be a JSON object, with the keys
representing the original speaker identifications (e.g., "Speaker 0",
"Speaker 1") and the values representing the identified speaker names
(e.g., "John Smith", "Jane Doe").
For example, your response might be:
{ "Speaker 0": "John Smith", "Speaker 1": "Jane Doe" }
ONLY return a JSON formatted response. DO NOT return any other information
or context. DO NOT prefix your response with backquotes.
I love this example because it encodes a fairly sophisticated “program” for the LLM to follow, but is written entirely in English. Based on my experimentation, this prompt works very well, although I’m sure there are about a dozen ways it could be improved.
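Wiring this up is just a single chat completion call followed by a JSON.parse. A hedged sketch, with the function name and prompt variable being illustrative:

```typescript
// Hypothetical speaker-identification step: send the diarized transcript
// through GPT-4 with the prompt shown above and parse the JSON response.
import OpenAI from 'openai';

const openai = new OpenAI();

const SPEAKER_ID_PROMPT = `The following is a transcript of a conversation ...`; // the prompt shown above

export async function identifySpeakers(transcriptText: string): Promise<Record<string, string>> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: SPEAKER_ID_PROMPT },
      { role: 'user', content: transcriptText },
    ],
  });
  // The prompt instructs the model to return bare JSON, e.g.
  // { "Speaker 0": "John Smith", "Speaker 1": "Jane Doe" }
  return JSON.parse(completion.choices[0].message.content ?? '{}');
}
```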
AI chat
The language model used for both background processing and the AI chat component is GPT-4. I am fully aware that I might get better results and lower costs using a different model like Llama 3 or something else. But model quality, latency, and cost have not been the bottleneck on this project, so I haven’t taken the time to explore alternatives: GPT-4 works great, and I haven’t had a need to replace it yet.
The AI chat component is basically just a simple RAG pipeline. I did not use LangChain or LlamaIndex or any other framework for this; I just implemented it directly, and it’s pretty straightforward to do so. Once we have the speaker-identified transcript and episode summary, we split the content into chunks of 512 tokens and embed each one into pgvector.
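A sketch of this indexing step, under the assumption of a naive character-based chunker (roughly four characters per token) and a made-up Chunks table with a pgvector column:

```typescript
// Hypothetical indexing side of the RAG pipeline: chunk, embed, and store.
import OpenAI from 'openai';
import { createClient } from '@supabase/supabase-js';

const openai = new OpenAI();
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// Crude approximation: ~4 characters per token, so ~2048 characters per 512-token chunk.
function chunkText(text: string, maxChars = 2048): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

export async function embedEpisode(episodeId: number, transcriptText: string, summary: string) {
  // Embed the summary plus each transcript chunk into a hypothetical "Chunks" table.
  const pieces = [summary, ...chunkText(transcriptText)];
  for (const [index, content] of pieces.entries()) {
    const embedding = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: content,
    });
    const { error } = await supabase.from('Chunks').insert({
      episode: episodeId,
      chunkIndex: index,
      content,
      embedding: embedding.data[0].embedding,
    });
    if (error) throw error;
  }
}
```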
In addition, we ask the LLM to generate a set of suggested queries for the episode, based on the summary and transcript. So, the Omnibus episode on Einstein’s Brain has example queries like:
What were Einstein’s views on pacifism and nuclear disarmament?
What happened to Einstein’s brain and how was it handled after his death?
which are of course discussed in the episode.
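Generating these suggestions follows the same pattern as speaker identification: a prompt over the summary (and/or transcript) that asks for a JSON list. A rough sketch, with the prompt wording being my own illustration:

```typescript
// Hypothetical generation of suggested queries for an episode.
import OpenAI from 'openai';

const openai = new OpenAI();

export async function suggestQueries(summary: string): Promise<string[]> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content:
          'Given the following podcast episode summary, suggest three questions a listener ' +
          'might ask about the episode. Return ONLY a JSON array of strings.',
      },
      { role: 'user', content: summary },
    ],
  });
  return JSON.parse(completion.choices[0].message.content ?? '[]');
}
```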
The AI chat component takes the user’s query and uses GPT-4’s function calling API to fetch relevant chunks from pgvector. One interesting aspect here is that I augment the metadata for each chunk with information about the particular episode and the timestamp within that episode’s audio where the relevant information is found — that way we can generate a nice UI letting the user listen directly to the segment of the episode where the answer is found.
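A hedged sketch of that flow using the chat completions tools API; the tool name, parameter schema, and the searchChunks helper (from the pgvector sketch above) are illustrative assumptions:

```typescript
// Hypothetical RAG loop: let GPT-4 decide when to search the vector index.
import OpenAI from 'openai';
import { searchChunks } from './search'; // hypothetical module from the pgvector sketch

const openai = new OpenAI();

const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'search_episode',
      description: 'Search the transcript and summary of this episode for relevant passages.',
      parameters: {
        type: 'object',
        properties: {
          query: { type: 'string', description: 'The search query.' },
        },
        required: ['query'],
      },
    },
  },
];

export async function answerQuestion(question: string) {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: 'system', content: 'You answer questions about a podcast episode.' },
    { role: 'user', content: question },
  ];
  const first = await openai.chat.completions.create({ model: 'gpt-4', messages, tools });
  const toolCall = first.choices[0].message.tool_calls?.[0];
  if (toolCall) {
    const { query } = JSON.parse(toolCall.function.arguments);
    // Each returned chunk carries episode and timestamp metadata, so the UI can
    // link the answer back to the exact segment of audio.
    const chunks = await searchChunks(query, 10);
    messages.push(first.choices[0].message);
    messages.push({
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(chunks),
    });
    const second = await openai.chat.completions.create({ model: 'gpt-4', messages });
    return second.choices[0].message.content;
  }
  return first.choices[0].message.content;
}
```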
The UI for the AI chat interface is based on Vercel’s really slick AI chat library.
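Assuming this refers to the Vercel AI SDK, the relevant piece is the useChat hook, which handles message state and streaming from an API route. A minimal sketch of a chat component built on it:

```tsx
// Hypothetical chat UI component built on the Vercel AI SDK's useChat hook.
'use client';

import { useChat } from 'ai/react'; // '@ai-sdk/react' in newer versions of the SDK

export function Chat() {
  // By default, useChat streams responses from a /api/chat route.
  const { messages, input, handleInputChange, handleSubmit } = useChat();
  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <b>{m.role}:</b> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask about this episode..."
        />
      </form>
    </div>
  );
}
```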
What’s next
I have a ton of ideas for features and improvements to Podverse, but I wanted to get it out there and get feedback from users and podcasters. Some of the things I’m tinkering with are automatic podcast recommendations, an automatically generated wiki of the content in podcasts, and links out to other reference materials, such as books, videos, and news articles about topics covered by a podcast.
I’d really like to hear from you if you have suggestions or bug reports. Please drop me a line at matt@ziggylabs.ai or message me on Twitter.