It's the first thing every researcher tries now: paste a few transcripts into ChatGPT, ask "what are the main themes?", and watch it produce a confident, well-written answer in seconds. And for a first impression of your data, it's genuinely useful.
Then you try to use the output in an actual study, and it falls apart. Here is exactly where general-purpose chatbots break for qualitative analysis, when they're fine, and what to use when your findings need to survive review.
Where ChatGPT breaks for qualitative research
1. The quotes aren't real
Ask a chatbot for supporting quotes and it will give you fluent, plausible participant voices — often lightly paraphrased, sometimes invented outright. In a student paper that's an integrity problem; in commercial research it's a credibility time bomb. If you can't Ctrl+F a quote in your transcript, you can't defend it.
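The Ctrl+F test is easy to automate. Here is a minimal Python sketch, assuming you have the transcript as plain text and the chatbot's quotes as a list of strings. The function name `split_verbatim` and the sample data are hypothetical, and collapsing whitespace before matching is an assumption made so that line wraps don't cause false misses.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace into single spaces (assumption: wraps are not meaningful)."""
    return re.sub(r"\s+", " ", text).strip()


def split_verbatim(quotes, transcript):
    """Return (verified, rejected): quotes found verbatim in the transcript vs. not found."""
    source = normalize(transcript)
    verified, rejected = [], []
    for quote in quotes:
        candidate = normalize(quote)
        # An empty string is a substring of everything, so reject it explicitly.
        if candidate and candidate in source:
            verified.append(quote)
        else:
            rejected.append(quote)
    return verified, rejected


# Hypothetical sample data for illustration only.
transcript = "I stopped using the app after the update.\nIt felt slower, and I lost my notes."
quotes = [
    "It felt slower, and I lost my notes.",                    # verbatim
    "The update made everything slower and I lost my notes.",  # paraphrase
]
verified, rejected = split_verbatim(quotes, transcript)
```

Anything that lands in `rejected` is a paraphrase or an invention, and it should not appear in your report as a quotation.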
2. Nothing is traceable
A theme summary without coded segments is an opinion. Reviewers, supervisors, and clients ask the same questions: How many participants said this? Which ones? What exactly did they say? A chat conversation gives you none of that structure — no codebook, no code-to-segment mapping, no frequencies.
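To make that concrete, here is a rough sketch of what a code-to-segment mapping could look like as data, and how it answers "how many participants, and which ones?". The `CodedSegment` type, the code names, and the sample quotes are all hypothetical and only illustrate the structure a chat conversation lacks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedSegment:
    participant: str  # e.g. "P01"
    code: str         # a code name from the codebook
    quote: str        # the excerpt the code was applied to


def participants_per_code(segments):
    """Map each code to the sorted list of distinct participants who voiced it."""
    by_code = defaultdict(set)
    for seg in segments:
        by_code[seg.code].add(seg.participant)
    return {code: sorted(people) for code, people in by_code.items()}


# Hypothetical coded segments for illustration only.
segments = [
    CodedSegment("P01", "data_loss", "I lost my notes."),
    CodedSegment("P02", "data_loss", "Everything I had saved was gone."),
    CodedSegment("P02", "performance", "It takes ages to open."),
    CodedSegment("P01", "data_loss", "My notes never came back."),
]
summary = participants_per_code(segments)
```

Counting distinct participants rather than segments is a deliberate choice here: one talkative participant mentioning a theme five times is not the same finding as five participants mentioning it once.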
3. Parts of long transcripts get silently dropped
Paste 40,000 words into a chat and the model will happily answer — based on whatever portion it actually attended to. You have no way of knowing which responses were never considered. There is no coverage accounting, and the gaps don't announce themselves.
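Coverage accounting itself is simple once every response has an ID and every coded segment points back to one. A minimal sketch under that assumption follows; the function name `coverage_report` and the IDs are hypothetical.

```python
def coverage_report(response_ids, coded_ids):
    """Summarize how many responses received at least one code and list the ones that did not."""
    coded = set(coded_ids)
    uncoded = [rid for rid in response_ids if rid not in coded]
    total = len(response_ids)
    covered = total - len(uncoded)
    return {
        "total": total,
        "coded": covered,
        "uncoded": uncoded,  # kept in original order so they can be reviewed by hand
        "rate": covered / total if total else 0.0,
    }


# Hypothetical IDs for illustration only; "R3" was coded twice.
report = coverage_report(["R1", "R2", "R3", "R4"], ["R1", "R3", "R3"])
```

The point of the `uncoded` list is that the gaps are named rather than hidden: you can read those responses yourself and decide where they belong.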
4. It's usually a GDPR violation
Interview data is personal data, almost by definition. Consumer chatbot tiers typically offer no data processing agreement, may retain conversations, and may use them for training. If your participants signed a consent form, or your client signed an NDA, pasting their words into a free chatbot likely breaches that agreement. EU ethics boards increasingly ask about this explicitly.
5. Results aren't reproducible
Run the same prompt twice, get two different theme lists. Without a fixed codebook applied consistently across the whole dataset, you can't demonstrate the systematic, rule-guided procedure that qualitative methods require.
When ChatGPT is perfectly fine
To be fair: for brainstorming interview questions, summarizing a single anonymized transcript for your own orientation, or drafting the discussion section from findings you already validated, a general chatbot is a great assistant. The problems start when it becomes your analysis instrument.
What a purpose-built tool does differently
Themera uses the same class of AI models — wrapped in the structure that research requires:
- Verbatim-enforced quotes. Every coded excerpt must be an exact substring of your source text. Quotes that fail this check are discarded automatically. What you cite is what your participant said.
- A real codebook. Names, definitions, and anchor examples — built inductively from your data, then applied consistently to every single response. You can rename, merge, and re-run with your edited codebook.
- Coverage you can see. A meter shows exactly how many responses were coded and lists the ones that weren't, so you place them yourself. Nothing is silently skipped.
- GDPR posture by default. EU data storage, no training on your data, DPA available — the paperwork your ethics board actually asks for.
- An exportable audit trail. DOCX and CSV exports with codebook, frequencies, and evidence quotes — ready for an appendix or a client deliverable.
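To give a feel for what an audit trail contains, here is a small Python sketch that writes a codebook and its evidence quotes to CSV. It is not Themera's implementation or export format: the column names, the `export_audit_csv` function, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def export_audit_csv(codebook, segments):
    """Write one CSV row per coded segment, joined to its code definition."""
    unknown = {code for _, code, _ in segments} - set(codebook)
    if unknown:
        # Fail loudly instead of silently dropping segments with undefined codes.
        raise ValueError(f"segments use codes missing from the codebook: {sorted(unknown)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["code", "definition", "participant", "quote"])
    for code, definition in codebook.items():
        for participant, seg_code, quote in segments:
            if seg_code == code:
                writer.writerow([code, definition, participant, quote])
    return buffer.getvalue()


# Hypothetical codebook and (participant, code, quote) segments for illustration only.
codebook = {
    "data_loss": "Participant reports losing saved content.",
    "performance": "Participant describes the product as slow.",
}
segments = [
    ("P01", "data_loss", "It felt slower, and I lost my notes."),
    ("P02", "performance", "It takes ages to open."),
    ("P02", "data_loss", "Everything I had saved was gone."),
]
csv_text = export_audit_csv(codebook, segments)
```

A flat file like this is what lets a reviewer go from a theme to its definition to the exact words behind it, without having to take your summary on trust.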
The honest comparison
ChatGPT gives you a fast, fluent impression of your data and zero defensibility. Legacy tools like NVivo give you full defensibility and weeks of manual coding. Purpose-built AI analysis sits in between: minutes to a structured first pass, with the human-in-the-loop controls that make the result something you can stand behind.
If you want to see what that looks like on real data, view a sample analysis — codebook, coded segments, coverage and all. Or start free with your own transcripts: 3 analyses, no credit card.