2026
Feedback Genie
Topic extraction for open-ended survey responses: a hybrid embeddings-plus-LLM pipeline, with an interactive codeframe for refining the results.

Context
Manually coding survey responses is slow and inconsistent. Pure-LLM approaches break on cost and recall once you push past a few hundred rows. I wanted a tool that scales to 10k responses, supports multi-theme assignment, and lets the analyst stay in control.
Approach
- Smart routing. Small datasets go straight to pure-LLM labelling; larger ones run a hybrid pipeline of Gemini embeddings, k-means clustering, then LLM theme labelling per cluster.
- Multi-theme assignment, so a single response can belong to several themes. Closer to how humans actually code qualitative data.
- Interactive codeframe view: drag responses between themes, merge or split themes, edit labels. The model surfaces a first pass; the analyst keeps final control over the codeframe.
- FastAPI backend handles the ML pipeline. Next.js front end handles upload, review, and CSV/JSON export.
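The routing and hybrid steps above can be sketched roughly as follows. This is a minimal offline sketch, not the project's actual code: the row threshold and similarity cutoff are illustrative values, and `embed` is a deterministic stand-in for the Gemini embeddings call so the example runs without an API key. Multi-theme assignment falls out naturally by comparing each response to every cluster centroid instead of keeping only the hard k-means label.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Illustrative routing threshold; the real cutoff is a product decision.
PURE_LLM_MAX_ROWS = 300

def route(responses):
    """Smart routing: small datasets go to pure-LLM labelling,
    larger ones to the hybrid embed -> cluster -> label pipeline."""
    return "pure_llm" if len(responses) <= PURE_LLM_MAX_ROWS else "hybrid"

def embed(texts):
    # Stand-in for Gemini embeddings: sum of fixed random token vectors,
    # unit-normalised. The real pipeline calls the embeddings API here.
    rng = np.random.default_rng(0)
    vocab = {}
    vecs = []
    for text in texts:
        v = np.zeros(64)
        for tok in text.lower().split():
            if tok not in vocab:
                vocab[tok] = rng.standard_normal(64)
            v += vocab[tok]
        vecs.append(v)
    return normalize(np.array(vecs))

def assign_themes(responses, n_themes=3, sim_threshold=0.5):
    """Hybrid path: embed, cluster, then multi-theme assignment by
    cosine similarity of each response to every cluster centroid."""
    X = embed(responses)
    km = KMeans(n_clusters=n_themes, n_init=10, random_state=0).fit(X)
    centroids = normalize(km.cluster_centers_)
    sims = X @ centroids.T  # cosine similarity (rows are unit vectors)
    assignments = []
    for row in sims:
        themes = [i for i, s in enumerate(row) if s >= sim_threshold]
        if not themes:  # always keep at least the best match
            themes = [int(row.argmax())]
        assignments.append(themes)
    return assignments

responses = [
    "delivery was late again",
    "late delivery, very annoying",
    "the app keeps crashing",
    "crashes on login every time",
    "great customer support team",
    "support was friendly and fast",
]
print(route(responses), assign_themes(responses))
```

In the real pipeline the per-cluster LLM labelling step would then name each cluster from a sample of its responses; here the clusters stay as integer theme ids.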
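The codeframe edits the analyst performs (merge themes, drag a response between themes) reduce to a few small operations on a label-to-rows mapping. The structure and helper names below are illustrative, not the project's API:

```python
# A codeframe maps theme labels to the row indices assigned to them.
Codeframe = dict[str, list[int]]

def merge_themes(cf: Codeframe, keep: str, absorb: str) -> Codeframe:
    """Merge theme `absorb` into `keep`, de-duplicating row indices."""
    merged = dict(cf)
    rows = set(merged.get(keep, [])) | set(merged.pop(absorb, []))
    merged[keep] = sorted(rows)
    return merged

def move_response(cf: Codeframe, row: int, src: str, dst: str) -> Codeframe:
    """Drag one response from `src` to `dst`; with multi-theme
    assignment the row may still belong to other themes."""
    moved = {label: list(rows) for label, rows in cf.items()}
    if row in moved.get(src, []):
        moved[src].remove(row)
    if row not in moved.setdefault(dst, []):
        moved[dst].append(row)
    return moved

cf = {"delivery": [0, 1], "crashes": [2, 3], "support": [4, 5]}
cf = merge_themes(cf, "delivery", "crashes")
cf = move_response(cf, 4, "support", "delivery")
print(cf)  # {"delivery": [0, 1, 2, 3, 4], "support": [5]}
```

Returning new dictionaries rather than mutating in place keeps each edit easy to undo, which matters once an analyst is reshaping the codeframe interactively.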
Outcome
An analyst-in-the-loop workflow that takes a CSV to a coded dataset in 15 to 60 seconds. It shows how to combine deterministic ML with LLM judgment and human review.
Stack
- Next.js
- FastAPI
- Gemini embeddings
- scikit-learn
- TypeScript