2026
Feedback Genie
Topic extraction for open-ended survey responses: a hybrid embeddings-plus-LLM pipeline, with an interactive codeframe for refining the results.

Context
Manually coding survey responses is slow and inconsistent. Pure-LLM approaches break on cost and recall once you push past a few hundred rows. I wanted a tool that scales to 10k responses, supports multi-theme assignment, and lets the analyst stay in control.
Approach
- Smart routing. Small datasets go straight to pure-LLM labelling; larger ones run a hybrid pipeline of Gemini embeddings, k-means clustering, then LLM theme labelling per cluster.
- Multi-theme assignment, so a single response can belong to several themes. Closer to how humans actually code qualitative data.
- Interactive codeframe view: drag responses between themes, merge or split themes, edit labels. The model surfaces a first pass; the analyst keeps final control over the codeframe.
- FastAPI backend handles the ML pipeline. Next.js front end handles upload, review, and CSV/JSON export.
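The routing and hybrid steps above can be sketched roughly as follows. This is a minimal offline sketch, not the project's actual code: the row threshold and similarity cutoff are illustrative values, and `embed` is a deterministic stand-in for the Gemini embeddings call so the example runs without an API key. Multi-theme assignment falls out naturally by comparing each response to every cluster centroid instead of keeping only the hard k-means label.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# Illustrative routing threshold; the real cutoff is a product decision.
PURE_LLM_MAX_ROWS = 300

def route(responses):
    """Smart routing: small datasets go to pure-LLM labelling,
    larger ones to the hybrid embed -> cluster -> label pipeline."""
    return "pure_llm" if len(responses) <= PURE_LLM_MAX_ROWS else "hybrid"

def embed(texts):
    # Stand-in for Gemini embeddings: sum of fixed random token vectors,
    # unit-normalised. The real pipeline calls the embeddings API here.
    rng = np.random.default_rng(0)
    vocab = {}
    vecs = []
    for text in texts:
        v = np.zeros(64)
        for tok in text.lower().split():
            if tok not in vocab:
                vocab[tok] = rng.standard_normal(64)
            v += vocab[tok]
        vecs.append(v)
    return normalize(np.array(vecs))

def assign_themes(responses, n_themes=3, sim_threshold=0.5):
    """Hybrid path: embed, cluster, then multi-theme assignment by
    cosine similarity of each response to every cluster centroid."""
    X = embed(responses)
    km = KMeans(n_clusters=n_themes, n_init=10, random_state=0).fit(X)
    centroids = normalize(km.cluster_centers_)
    sims = X @ centroids.T  # cosine similarity (rows are unit vectors)
    assignments = []
    for row in sims:
        themes = [i for i, s in enumerate(row) if s >= sim_threshold]
        if not themes:  # always keep at least the best match
            themes = [int(row.argmax())]
        assignments.append(themes)
    return assignments

responses = [
    "delivery was late again",
    "late delivery, very annoying",
    "the app keeps crashing",
    "crashes on login every time",
    "great customer support team",
    "support was friendly and fast",
]
print(route(responses), assign_themes(responses))
```

In the real pipeline the per-cluster LLM labelling step would then name each cluster from a sample of its responses; here the clusters stay as integer theme ids.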
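The codeframe edits the analyst performs (merge themes, drag a response between themes) reduce to a few small operations on a label-to-rows mapping. The structure and helper names below are illustrative, not the project's API:

```python
# A codeframe maps theme labels to the row indices assigned to them.
Codeframe = dict[str, list[int]]

def merge_themes(cf: Codeframe, keep: str, absorb: str) -> Codeframe:
    """Merge theme `absorb` into `keep`, de-duplicating row indices."""
    merged = dict(cf)
    rows = set(merged.get(keep, [])) | set(merged.pop(absorb, []))
    merged[keep] = sorted(rows)
    return merged

def move_response(cf: Codeframe, row: int, src: str, dst: str) -> Codeframe:
    """Drag one response from `src` to `dst`; with multi-theme
    assignment the row may still belong to other themes."""
    moved = {label: list(rows) for label, rows in cf.items()}
    if row in moved.get(src, []):
        moved[src].remove(row)
    if row not in moved.setdefault(dst, []):
        moved[dst].append(row)
    return moved

cf = {"delivery": [0, 1], "crashes": [2, 3], "support": [4, 5]}
cf = merge_themes(cf, "delivery", "crashes")
cf = move_response(cf, 4, "support", "delivery")
print(cf)  # {"delivery": [0, 1, 2, 3, 4], "support": [5]}
```

Returning new dictionaries rather than mutating in place keeps each edit easy to undo, which matters once an analyst is reshaping the codeframe interactively.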
Outcome
An analyst-in-the-loop workflow that takes a CSV to a coded dataset in 15 to 60 seconds. It shows how to combine deterministic ML with LLM judgment and human review.
Stack
- Next.js
- FastAPI
- Gemini embeddings
- scikit-learn
- TypeScript