A forensic teardown of Google’s “Gemini Apps Activity” export, runaway JSON blobs, and why GDPR alone won’t save your chat history.
TL;DR
- Takeout works… sort of. You’ll get a ZIP, but the “Gemini Apps Activity” payload is one flat JSON array of 1‑turn Q&A pairs. No conversation IDs, no thread order, no titles.
- Re‑assembling whole chats is impossible without guessing. Timestamp proximity heuristics? Fun for a hack‑night, useless for reliable recovery.
- GDPR Articles 15 and 20 do entitle you to a copy and to portability, but Google’s current export barely clears the legal bar while missing every UX one.
- OpenAI does it better. ChatGPT exports include per‑conversation metadata, mapping, and Markdown‑ready dialogs.
- If you care about archives, log locally or pick a different tool. And yes, you can (and should) still file a formal DSAR if Takeout comes back empty.
1. The Experiment
“Under EU law my data is mine. Surely I can download all my Gemini chats, right?” - me, last Friday 18 Jul 2025
- Turn chat history on (Google auto‑purges transient logs after 72 h if it’s off - ask me how I know).
- Head to takeout.google.com → Deselect all → tick Gemini Apps Activity → Create export.
- Wait. Grab coffee. Wait some more.
- Receive email, unzip archive, and locate the .json file inside the nested Gemini-Apps folder.
- Let’s call this JSON file all_chats.json from here on.
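If you would rather not click through the nested folders, the "locate the .json file" step can be scripted. This is a minimal sketch, not part of my original workflow: it assumes the activity file is the first .json member in the ZIP whose top level is a JSON array, and the function names are made up for illustration.

```python
import json
import zipfile
from pathlib import Path


def find_json_members(zip_path):
    """Return the names of all .json members in the archive, in archive order."""
    with zipfile.ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(".json")]


def extract_activity(zip_path, dest="all_chats.json"):
    """Write the first .json member that parses to a list into `dest`.

    Returns the member name that was used, or None if nothing matched.
    Assumption: the activity export is a top-level JSON array.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".json"):
                continue
            try:
                data = json.loads(zf.read(name).decode("utf-8"))
            except ValueError:  # covers bad UTF-8 and invalid JSON
                continue
            if isinstance(data, list):
                Path(dest).write_text(
                    json.dumps(data, ensure_ascii=False, indent=2),
                    encoding="utf-8",
                )
                return name
    return None
```

Call `extract_activity("takeout.zip")` with the path to your own download (the file name here is a placeholder) and check the returned member name before trusting the result.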
Expectation: neatly nested objects per conversation.
Reality: see below. 🙃
2. Example Anatomy of all_chats.json
$ jq 'length' all_chats.json
2448
$ jq '.[0] | keys' all_chats.json
[
  "activityControls",
  "header",
  "products",
  "safeHtmlItem",
  "time",
  "title"
]
Each element is exactly one prompt‑response turn:
{
  "title": "Eingegebener Prompt: Why does TCP need a three-way handshake?",
  "time": "2025-06-07T14:52:31.927Z",
  "safeHtmlItem": [{
    "html": "<p>Short answer: reliability & SYN flooding mitigation...</p>"
  }],
  "header": "Gemini-Apps",
  "products": ["Gemini-Apps"],
  "activityControls": ["Gemini-Apps-Aktivitäten"]
}
Missing in action:
| What a grown‑up export should have | Status |
|---|---|
| conversation_id / session_id | ❌ |
| turn_index or parent_id | ❌ |
| Chat title | ❌ (title is just your prompt) |
| Markdown / HTML transcript | ❌ |
| Attachment blobs | partially, but detached |
Net result: 2 448 isolated islands; the export can’t tell which belong together.
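To check that I was not just unlucky with the first element, the whole array can be audited for anything that looks like a threading field. A rough sketch: the `ID_HINTS` substrings are my own guess at plausible field names, not something Google documents.

```python
from collections import Counter

# Assumed naming patterns for a threading field; purely a guess.
ID_HINTS = ("conversation", "session", "thread", "parent", "turn")


def key_census(entries):
    """Count how often each top-level key occurs across all entries."""
    counts = Counter()
    for entry in entries:
        if isinstance(entry, dict):
            counts.update(entry.keys())
    return counts


def threading_candidates(entries):
    """Return the top-level keys whose names hint at conversation structure."""
    return sorted(
        key for key in key_census(entries)
        if any(hint in key.lower() for hint in ID_HINTS)
    )


# Two entries shaped like the sample above
sample = [
    {"title": "Eingegebener Prompt: A", "time": "2025-06-07T14:52:31.927Z",
     "header": "Gemini-Apps", "products": ["Gemini-Apps"],
     "safeHtmlItem": [{"html": "<p>...</p>"}],
     "activityControls": ["Gemini-Apps-Aktivitäten"]},
    {"title": "Eingegebener Prompt: B", "time": "2025-06-07T14:55:02.500Z",
     "header": "Gemini-Apps", "products": ["Gemini-Apps"],
     "safeHtmlItem": [{"html": "<p>...</p>"}],
     "activityControls": ["Gemini-Apps-Aktivitäten"]},
]
print(threading_candidates(sample))
```

For entries with only the six keys listed earlier, the candidate list comes back empty. Run it over the parsed contents of all_chats.json to audit the real thing.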
3. Why Reconstruction Fails
Tried → Timestamp clustering
Chats spread across multiple tabs, or across phone and laptop sessions, overlap in time, so neighbouring turns often belong to different conversations. Breaks instantly.
Tried → Prompt similarity
Regexing “Next question” ≠ science.
Tried → Stable headers
Every entry says "Gemini‑Apps". Thanks, Google.
Even a best‑effort heuristic gives you, at best, probabilistic clusters. Good luck diffing that in six months.
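For the record, the timestamp heuristic looks roughly like this. A minimal sketch: the 30‑minute gap is an arbitrary threshold I picked, and the demo turns are invented to show the failure mode.

```python
import datetime as dt


def parse_time(ts):
    """Parse the export's ISO-8601 timestamp; a trailing 'Z' means UTC."""
    return dt.datetime.fromisoformat(ts.replace("Z", "+00:00"))


def cluster_by_gap(entries, max_gap_minutes=30):
    """Guess sessions: start a new cluster whenever the pause since the
    previous turn exceeds max_gap_minutes. Pure guesswork, no ground truth."""
    max_gap = dt.timedelta(minutes=max_gap_minutes)
    ordered = sorted(entries, key=lambda e: parse_time(e["time"]))
    clusters = []
    previous = None
    for entry in ordered:
        current = parse_time(entry["time"])
        if previous is not None and current - previous <= max_gap:
            clusters[-1].append(entry)
        else:
            clusters.append([entry])
        previous = current
    return clusters


# Invented demo: two chats interleaved on laptop and phone, plus a later one
demo = [
    {"title": "laptop: TCP handshake?", "time": "2025-06-07T14:52:31.927Z"},
    {"title": "phone: pasta recipe?", "time": "2025-06-07T14:53:10.000Z"},
    {"title": "laptop: and QUIC?", "time": "2025-06-07T14:55:02.500Z"},
    {"title": "laptop: DNS over HTTPS?", "time": "2025-06-07T19:00:00.000Z"},
]
for cluster in cluster_by_gap(demo):
    print([turn["title"] for turn in cluster])
```

The pasta question lands in the same cluster as the two TCP turns, because nothing in the data distinguishes the phone chat from the laptop one. Tightening the threshold only trades merged chats for split ones.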
4. GDPR vs. Reality
| GDPR Right | What it says | How Google complies |
|---|---|---|
| Art. 15 Access | “A copy of the personal data undergoing processing.” (Art. 15(3)) | Yes - you get the raw text you typed plus model output. |
| Art. 20 Portability | “In a structured, commonly used and machine‑readable format.” (Art. 20(1)) | Technically - JSON is machine‑readable. |
| User expectation | Restorable full conversation history | Nope |
Legally defensible ≠ user‑friendly. If your archive arrives empty or truncated, submit a Data Subject Access Request via Google’s privacy form and attach evidence. Under Art. 12(3) they have one month to respond, extendable by two further months for complex or numerous requests.
5. How Other LLMs Do It
| Platform | Export granularity | Conversation IDs? | Human‑readable? |
|---|---|---|---|
| OpenAI ChatGPT | Per‑chat JSON + Markdown | ✅ | ✅ |
| Anthropic Claude (API) | Your own logging | n/a | Up to you |
| Google Gemini | One JSON blob of Q&A | ❌ | ❌ |
Google is the outlier.
6. Making the Best of a Bad Dump
You can slice the blob into per‑turn files for grep‑ability:
# hn_script_fixed.py
import datetime as dt
import html
import json
import pathlib
import re

SRC = "all_chats.json"  # <- adjust if the file lives elsewhere
DEST = pathlib.Path("gemini_turns")
DEST.mkdir(exist_ok=True)


def slugify(text: str, maxlen: int = 40) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', truncate for filenames."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:maxlen] or "prompt"


with open(SRC, encoding="utf-8") as src:
    data = json.load(src)

for turn in data:
    # ----- 1. skip entries that clearly are not prompt/answer pairs
    if "title" not in turn:
        continue

    # ----- 2. extract prompt text (prefix as seen in my German-locale export)
    prompt_text = re.sub(r"^Eingegebener Prompt:\s*", "", turn["title"], flags=re.I).strip()

    # ----- 3. collect ALL responses that exist
    responses = [
        html.unescape(item["html"])
        for item in turn.get("safeHtmlItem", [])
        if isinstance(item, dict) and "html" in item
    ]
    if not responses:
        continue  # nothing to write (could log this if you like)

    # ----- 4. timestamp → filename
    try:
        ts = dt.datetime.fromisoformat(turn["time"].replace("Z", "+00:00"))
    except (KeyError, ValueError):
        # missing or unparsable timestamp: fall back to the current UTC time
        ts = dt.datetime.now(dt.timezone.utc)
    fname = DEST / f"{ts:%Y%m%dT%H%M%S}_{slugify(prompt_text)}.md"

    # ----- 5. write markdown
    with open(fname, "w", encoding="utf-8") as f:
        f.write(f"## Prompt\n\n{prompt_text}\n\n")
        for idx, resp in enumerate(responses, 1):
            f.write(f"## Response {idx}\n\n{resp}\n\n")
But that still leaves you with 2K lone Markdown snippets, not conversations.
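One more wrinkle: the script above writes the raw HTML fragments into the .md files, so grep hits come with tags attached. A rough tag stripper using only the standard library could look like this. It is a sketch, not a real HTML‑to‑Markdown converter; `TextExtractor` and `html_to_text` are names I made up, and the set of block tags is a guess at what the responses contain.

```python
from html.parser import HTMLParser

# Tags treated as paragraph-like breaks (an assumption about the export's markup)
BLOCK_TAGS = {"p", "div", "ul", "ol", "h1", "h2", "h3", "h4", "pre", "tr"}


class TextExtractor(HTMLParser):
    """Collect text content, inserting newlines around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags from an HTML fragment and collapse runs of blank lines."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = [line.rstrip() for line in "".join(parser.parts).splitlines()]
    kept = []
    for line in lines:
        if line or (kept and kept[-1]):  # keep at most one blank line in a row
            kept.append(line)
    return "\n".join(kept).strip()
```

Wrapping each response in `html_to_text(...)` before writing gives plain text with list items as `- ` bullets. Anything fancier (tables, nested lists, code highlighting) is lost or flattened.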
7. Why Google Might Do This (Speculation)
- Internal data model stores atomic turns, not threaded sessions.
- Risk minimization: less context = less chance of leaking other users’ data.
- “Good enough for compliance” mindset: meet the letter of GDPR, ignore spirit.
- Product lock‑in: if exporting is painful, fewer users leave. Cynical, but proven.
8. What Needs to Happen
- Google: add conversation_id, parent_id, and a Markdown export. It’s 2025, not 2005.
- EU DPAs: clarify that “structured” also means re‑usable in context.
- Developers: log your own chats (browser extensions, local proxies, whatever) if you care about reproducibility.
- Users: keep filing DSARs - noise creates pressure.
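To make the "log your own chats" point concrete, here is a minimal sketch of a local append‑only log that records exactly the fields the Takeout export lacks. Everything in it is hypothetical: the `ChatLog` class, the JSONL layout, and the field names `conversation_id` and `turn_index` are my own choices, not any vendor's format. How you capture the prompt and response (extension, proxy, API wrapper) is up to you.

```python
import datetime as dt
import json
import uuid
from collections import defaultdict
from pathlib import Path


class ChatLog:
    """Append-only JSONL log with one line per prompt/response turn."""

    def __init__(self, path):
        self.path = Path(path)

    def new_conversation(self):
        """Create a fresh conversation ID (random hex string)."""
        return uuid.uuid4().hex

    def read(self):
        """Return all logged turns in file order."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def append(self, conversation_id, prompt, response):
        """Log one turn; turn_index counts turns already logged for this chat.

        Re-reads the file on every call, which is fine for a personal log.
        """
        turn_index = sum(
            1 for t in self.read() if t["conversation_id"] == conversation_id
        )
        record = {
            "conversation_id": conversation_id,
            "turn_index": turn_index,
            "time": dt.datetime.now(dt.timezone.utc).isoformat(),
            "prompt": prompt,
            "response": response,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
        return record

    def conversations(self):
        """Group turns by conversation_id, ordered by turn_index."""
        grouped = defaultdict(list)
        for turn in self.read():
            grouped[turn["conversation_id"]].append(turn)
        return {
            cid: sorted(turns, key=lambda t: t["turn_index"])
            for cid, turns in grouped.items()
        }
```

With those two fields present, reconstruction is a plain group‑and‑sort, even when turns from different chats are interleaved in the file. That is all I am asking the export to carry.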
9. Open Questions
- Why do Gemini API users get better audit trails than users of the consumer Gemini app? Unclear.
- A cottage industry now sells ‘we’ll‑export‑your‑LLM‑chats‑for‑you’ services via browser extensions. What the heck?
10. Closing Thoughts
Exporting my Gemini history felt less like a backup and more like opening the black box, only to find 2 448 unrelated sticky notes. Google met the bare minimum, but fell miles short of the usability bar set by its competitors - and by its own “Don’t be evil” lineage.
Until they fix it, remember: the cloud is just someone else’s computer - and they won’t even give you a proper .zip of your chats.
Did you manage to reconstruct Gemini sessions more elegantly? Ping me on HN (@krausality) - curious minds want to know.