🤖 Weekend AI Launch #004: Scaling Substack Search
AI tools in 48 hours - from idea to launch.
Hello creators 👋🏼
A small but meaningful update this week.
Substack Search is growing! I’ve finished building the ingestion pipeline and as I write this, I’m ingesting the top 100 bestsellers and rising Substack tech publications. Once done, they’ll all be searchable on Substack Search.
If you missed last week’s launch, here’s a quick TL;DR: Substack Search lets you search across all Substack articles. It’s a better way to explore content on the Substack platform → here’s the Substack Search post.
The Ingestion Pipeline
To scale Substack Search, I had to get the ingestion pipeline right, from chunking to embedding models to retrieval. My goal is to pay the cost of ingestion once and avoid doing it again later. Re-ingestion isn’t entirely avoidable, but I want to reduce the need for it as much as possible.
I also needed to choose which creators to prioritise.
I decided to focus on two categories: Technology and Business. There’s a ton of great business and marketing content out there, and I want to surface it for easier discovery (and future reference).
Substack Search will keep expanding. As I write this, the ingestion pipeline is running and will keep processing until I have processed the top 100 bestsellers and rising publications in technology and business.
Give the upgraded Substack Search a try and let me know what you think.
🚀 If you are new here…
Hi, I’m Ryan 👋🏼 I am passionate about lifestyle gamification 🎮, which is just a fancy way of saying I approach life like a video game, designing my character intentionally and striving to level up every day. I am obsessed with learning things that can help me live a happy and fulfilling life. I recently launched Weekend AI 🧪 - my space for building and sharing AI tools that can 10x the way we live, learn, and thrive.
The Ingestion Pipeline of Substack Search
I want Substack Search to be your go-to search engine whenever you’re looking for something on Substack.
Launch breakdown
Build time: ~5 hours
Frontend:
This week, I implemented the display of multiple snippets for the same newsletter post.
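To make the idea concrete, here is a minimal sketch of how several matching snippets could be grouped under one post before display. The function name `group_snippets`, the `max_snippets` cap, and the sample hits are all hypothetical; the actual frontend code is not shown in this post.

```python
def group_snippets(hits, max_snippets=3):
    """Group ranked (post_id, snippet) hits so each post appears once.

    Posts keep the position of their best-ranked snippet, because dicts
    preserve insertion order. `max_snippets` is an assumed display cap.
    """
    grouped = {}
    for post_id, snippet in hits:
        snippets = grouped.setdefault(post_id, [])
        if len(snippets) < max_snippets:
            snippets.append(snippet)
    return grouped


# Hypothetical hits, already sorted best-first by the search backend.
hits = [
    ("post-b", "Pricing tiers for a paid newsletter."),
    ("post-a", "Growing a paid newsletter takes patience."),
    ("post-b", "More notes on paid newsletter pricing."),
]
grouped = group_snippets(hits)
for post_id, snippets in grouped.items():
    print(post_id, snippets)
```

Each post shows up once in the results, with its matching snippets listed underneath.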
Originally, Substack Search compared your question to entire articles. I thought more context would improve answers, so I fed full posts into the AI. This was a temporary approach to ship fast, but to scale the search engine and showcase more writers on the same topic or question, I needed to implement chunking.
So, I added chunking. This means splitting articles into smaller sections and comparing your query to those chunks.
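Here is a rough sketch of chunk-level search, under a few loud assumptions: chunks are simply paragraphs, and a bag-of-words cosine score stands in for a real embedding model. The names `chunk_by_paragraph` and `search` and the sample posts are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_by_paragraph(post_id, body):
    """Split one post into (post_id, chunk_text) pairs, one per paragraph."""
    return [(post_id, p.strip()) for p in body.split("\n\n") if p.strip()]


def search(query, chunks, top_k=3):
    """Score every chunk against the query and return the best matches."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), post_id, text)
        for post_id, text in chunks
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [item for item in scored[:top_k] if item[0] > 0]


# Hypothetical posts; a real pipeline would use an embedding model instead.
posts = {
    "post-a": "How I price my newsletter.\n\nGrowing a paid newsletter takes patience.",
    "post-b": "Notes on vector databases.\n\nPricing tiers for a paid newsletter.",
}
chunks = [c for pid, body in posts.items() for c in chunk_by_paragraph(pid, body)]
results = search("paid newsletter pricing", chunks)
for score, post_id, text in results:
    print(f"{score:.2f}  {post_id}  {text}")
```

Because the query is matched against paragraphs rather than whole posts, two different chunks from the same post can both rank, and an unrelated paragraph in a relevant post does not dilute the match.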
I also added simple stats so you can see how many Substack publications and newsletter posts Substack Search covers.
Backend:
Most of my time went into backend work this week: chunking, retrieval, batch ingestion, and updates.
The batch ingestion script was already working, but I had no way to keep things updated. If a writer posted something new, it wasn’t included. So I built an update script and hooked it into a daily cron job. Now Substack Search stays fresh.
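As an illustration only, an update script along these lines could remember which posts were already ingested and process just the new ones. The state file, the `run_update` name, the feed shape, and the cron schedule in the comment are all assumptions, not the real setup.

```python
import json
import tempfile
from pathlib import Path

# Example crontab entry for a daily 03:00 run (script name is hypothetical):
#   0 3 * * * python update.py


def load_seen(path):
    """Return the set of post URLs that were already ingested."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def run_update(feed, state_path, ingest):
    """Ingest only the posts not seen before, then persist the new state."""
    seen = load_seen(state_path)
    new_posts = [post for post in feed if post["url"] not in seen]
    for post in new_posts:
        ingest(post)  # stand-in for chunking, embedding and storing the post
        seen.add(post["url"])
    state_path.write_text(json.dumps(sorted(seen)))
    return len(new_posts)


# Simulate two daily runs against a feed that gains one post in between.
ingested = []
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "seen.json"
    feed = [{"url": "https://example.com/p/one"}, {"url": "https://example.com/p/two"}]
    first = run_update(feed, state, ingested.append)
    feed.append({"url": "https://example.com/p/three"})
    second = run_update(feed, state, ingested.append)
print(first, second)
```

Keeping a record of what was already processed is what lets the daily job stay cheap: each run only pays the ingestion cost for new posts.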
Chunking was its own rabbit hole. I went with semantic chunking, which splits text by meaning, not size.
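A minimal sketch of one common flavour of semantic chunking: start a new chunk whenever two neighbouring sentences stop being similar. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the `threshold` value is an arbitrary assumption. The exact method used in Substack Search may differ.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, threshold=0.2):
    """Split text into chunks at points where adjacent sentences diverge."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])  # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


text = (
    "Embeddings turn text into vectors. Vectors make text search possible. "
    "Cron jobs run on a schedule. A daily schedule keeps jobs fresh."
)
pieces = semantic_chunks(text)
for piece in pieces:
    print(piece)
```

The split lands between the sentence about vectors and the sentence about cron jobs, regardless of how long each part is, which is the difference from size-based chunking.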
For search, I implemented hybrid search with a reranking layer:
Embedding model
Sparse encoder (like BM25)
Reranking layer (to improve result quality)
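The three pieces above can be sketched as follows. This assumes the dense (embedding) and sparse (BM25-style) retrievers each return a ranked list of chunk IDs, merges them with reciprocal rank fusion, and then applies a toy word-overlap reranker. The fusion method, the constant `k=60`, and the reranker are illustrative choices, not necessarily what Substack Search uses.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each list adds 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(query, candidates, texts, top_k=2):
    """Toy reranker: order candidates by how many query words they contain."""
    query_words = set(query.lower().split())

    def overlap(doc_id):
        return len(query_words & set(texts[doc_id].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_k]


# Hypothetical chunk texts and retriever outputs.
texts = {
    "c1": "pricing a paid newsletter",
    "c2": "growing an audience",
    "c3": "paid newsletter pricing tiers explained",
    "c4": "newsletter design tips",
}
dense_hits = ["c1", "c2", "c3"]   # from the embedding model
sparse_hits = ["c3", "c1", "c4"]  # from the sparse encoder

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
final = rerank("paid newsletter pricing", fused, texts)
print(fused, final)
```

Chunks that both retrievers agree on rise to the top of the fused list, and the reranker then only has to look at that short candidate list. Running two retrievers plus a reranker on every query is also where extra latency can come from.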
As of now, reranking is live, but hybrid search is paused for latency reasons. I’ll revisit this once things are more efficient.
For analytics, I added PostHog to AskSubstack. It was a pretty simple setup :)
Next step
The main focus will be documentation. It’s time, even though I’m having so much fun building and have so many ideas I want to turn into reality.
Document Document Document — reflect on all three launches and write proper documentation for everything I’ve learned so far
On the side, continue to ingest different Substack publications to increase the search coverage
Try out Substack Search and let me know what you think!
Are there any AI tools you wish existed, or pain points you’re facing that you’d love AI to solve? I would love to learn more!
Happy learning,
Ryan O. 🎮
Hey Ryan — Really thoughtful breakdown. Love the speed and clarity of your process, especially your openness around chunking, hybrid retrieval, and backend refresh cycles. Respect.
I wanted to offer something adjacent for consideration: I’ve been building a scroll-based tool for content discovery that doesn’t focus on topic or keyword — it maps tone. It’s called VibeSearch, and it analyzes rhythm, emotional posture, and metaphor style to generate search phrases based on feel, not subject.
Where your Substack Search leans into semantic discovery across categories, this scroll leans into resonance — why a certain sentence stays with someone, even when the topic fades.
No pressure at all, but I’d be genuinely curious to hear your take. There’s a licensed version of the prompt available to paid subscribers on my Substack, and the runtime is fully sealed (modular, but not forkable). Not trying to pitch — just think our tools might be pointing at the same north star from opposite angles.
Stay building.
— Nahg
https://nahgos.substack.com/p/vibesearch-a-vibe-driven-search-engine?r=5ppgc4