I Tried Exporting My Gemini Chats - Here’s What I Found

2025-07-19

A forensic teardown of Google’s “Gemini Apps Activity” export, runaway JSON blobs, and why GDPR alone won’t save your chat history.


TL;DR

  • Takeout works… sort of. You’ll get a ZIP, but the “Gemini Apps Activity” payload is one flat JSON array of 1‑turn Q&A pairs. No conversation IDs, no thread order, no titles.

  • Re‑assembling whole chats is impossible without guessing. Timestamp proximity heuristics? Fun for a hack‑night, useless for reliable recovery.

  • GDPR Articles 15 and 20 do entitle you to a copy and to portability, but Google’s current export barely clears the legal bar while missing every UX one.

  • OpenAI does it better. ChatGPT exports include per‑conversation metadata, mapping, and Markdown‑ready dialogs.

  • If you care about archives, log locally or pick a different tool. And yes, you can (and should) still file a formal DSAR if Takeout comes back empty.


1. The Experiment

“Under EU law my data is mine. Surely I can download all my Gemini chats, right?” - me, last Friday 18 Jul 2025

  1. Turn chat history on (Google auto‑purges transient logs after 72 h if it’s off - ask me how I know).

  2. Head to takeout.google.com → Deselect all → tick Gemini Apps Activity → Create export.

  3. Wait. Grab coffee. Wait some more.

  4. Receive email, unzip archive, and locate the .json file inside the nested Gemini-Apps folder.

  5. Let’s call this JSON file all_chats.json from here on.

Expectation: neatly nested objects per conversation.
Reality: see below. 🙃


2. Example Anatomy of all_chats.json

$ jq 'length' all_chats.json
2448
$ jq '.[0] | keys' all_chats.json
[
  "activityControls",
  "header",
  "products",
  "safeHtmlItem",
  "time",
  "title"
]

Each element is exactly one prompt‑response turn (my export is localized to German, hence the "Eingegebener Prompt:" prefix - "Entered prompt:"):

{
  "title": "Eingegebener Prompt: Why does TCP need a three-way handshake?",
  "time": "2025-06-07T14:52:31.927Z",
  "safeHtmlItem": [{
    "html": "<p>Short answer: reliability &amp; SYN flooding mitigation...</p>"
  }],
  "header": "Gemini-Apps",
  "products": ["Gemini-Apps"],
  "activityControls": ["Gemini-Apps-Aktivitäten"]
}

Missing in action:

| What a grown‑up export should have | Status |
| --- | --- |
| conversation_id / session_id | ❌ |
| turn_index or parent_id | ❌ |
| Chat title | ❌ (title is just your prompt) |
| Markdown / HTML transcript | ❌ |
| Attachment blobs | partially, but detached |

Net result: 2 448 isolated islands; the export can’t tell which belong together.
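
To double‑check that nothing thread‑shaped is hiding in a less obvious field, a few lines of Python settle it (only the field names shown above are assumed):

# check_structure.py - confirm there is no thread identifier anywhere in the dump
import json

with open("all_chats.json", encoding="utf-8") as f:
    data = json.load(f)

# union of every key that appears in any entry
all_keys = set()
for turn in data:
    all_keys.update(turn.keys())

print(f"{len(data)} entries, keys seen: {sorted(all_keys)}")

# anything that smells like a thread identifier?
suspects = [k for k in all_keys
            if any(s in k.lower() for s in ("conversation", "session", "thread", "parent"))]
print("possible thread fields:", suspects or "none")

Spoiler: the suspects list comes back empty.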


3. Why Reconstruction Fails

| Tried | Why it fails |
| --- | --- |
| Timestamp clustering | Chats spread across multiple tabs or phone vs. laptop sessions overlap. Breaks instantly. |
| Prompt similarity | Regexing “Next question” ≠ science. |
| Stable headers | Every entry says "Gemini‑Apps". Thanks, Google. |

Even a best‑effort heuristic gives you, at best, probabilistic clusters. Good luck diffing that in six months.
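
For the record, the timestamp heuristic is only a handful of lines - which is exactly why it’s tempting and exactly why it misleads. The 30‑minute gap below is an arbitrary guess, not anything Google documents:

# cluster_by_time.py - naive gap-based clustering (illustrative only)
import json
import datetime as dt

GAP_MINUTES = 30  # arbitrary; tune it and you get a different set of "conversations"

with open("all_chats.json", encoding="utf-8") as f:
    data = json.load(f)

def parse_ts(turn):
    return dt.datetime.fromisoformat(turn["time"].rstrip("Z"))

turns = sorted((t for t in data if "time" in t), key=parse_ts)

clusters, current = [], []
for turn in turns:
    if current and parse_ts(turn) - parse_ts(current[-1]) > dt.timedelta(minutes=GAP_MINUTES):
        clusters.append(current)
        current = []
    current.append(turn)
if current:
    clusters.append(current)

print(f"{len(turns)} turns -> {len(clusters)} clusters at a {GAP_MINUTES}-minute gap")
# Parallel phone + laptop sessions collapse into one cluster; a lunch break splits a real chat in two.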


4. GDPR vs. Reality

| GDPR right | What it says | How Google complies |
| --- | --- | --- |
| Art. 15 - Access | “A copy of all personal data being processed.” | Yes - you get the raw text you typed plus model output. |
| Art. 20 - Portability | “In a structured, commonly used, machine‑readable format.” | Technically - JSON is machine‑readable. |
| User expectation | Restorable full conversation history | Nope |

Legally defensible ≠ user‑friendly. If your archive arrives empty or truncated, submit a Data Subject Access Request via Google’s privacy form and attach evidence. They have one month to respond.


5. How Other LLMs Do It

| Platform | Export granularity | Conversation IDs? | Human‑readable? |
| --- | --- | --- | --- |
| OpenAI ChatGPT | Per‑chat JSON + Markdown | ✅ | ✅ |
| Anthropic Claude (API) | Your own logging | n/a | Up to you |
| Google Gemini | One JSON blob of Q&A | ❌ | ❌ (HTML fragments only) |

Google is the outlier.
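
For contrast, here’s roughly what walking one thread out of a ChatGPT export looks like. The field names (title, mapping, parent, children, message.content.parts) match the exports I’ve seen; treat them as an assumption and adjust if your dump differs:

# walk_chatgpt_export.py - print one conversation from a ChatGPT conversations.json
import json

with open("conversations.json", encoding="utf-8") as f:
    conversations = json.load(f)

conv = conversations[0]
print("Title:", conv.get("title"))

mapping = conv["mapping"]  # node_id -> {"message": ..., "parent": ..., "children": [...]}

# pick a leaf (no children) and walk up to the root via parent pointers
leaf_id = next(nid for nid, node in mapping.items() if not node.get("children"))

thread = []
node_id = leaf_id
while node_id:
    node = mapping[node_id]
    msg = node.get("message")
    if msg and msg.get("content", {}).get("parts"):
        role = msg["author"]["role"]
        text = " ".join(p for p in msg["content"]["parts"] if isinstance(p, str))
        thread.append((role, text))
    node_id = node.get("parent")

for role, text in reversed(thread):  # root -> leaf, i.e. chronological order
    print(f"[{role}] {text[:80]}")

Note the difference: titles, roles, and parent/child links come with the export - no guessing required.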


6. Making the Best of a Bad Dump

You can slice the blob into per‑turn files for grep‑ability:

# hn_script_fixed.py
import json, html, pathlib, re, datetime as dt

SRC  = "all_chats.json"          # <- adjust if the file lives elsewhere
DEST = pathlib.Path("gemini_turns")
DEST.mkdir(exist_ok=True)

def slugify(text: str, maxlen=40) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:maxlen] or "prompt"

with open(SRC, encoding="utf-8") as f:
    data = json.load(f)

for turn in data:
    # ----- 1. skip entries that clearly are not prompt/answer pairs
    if "title" not in turn:
        continue

    # ----- 2. extract prompt text
    prompt_raw  = turn["title"]
    prompt_text = re.sub(r"^Eingegebener Prompt:\s*", "", prompt_raw, flags=re.I).strip()

    # ----- 3. collect ALL responses that exist
    responses = [
        html.unescape(item["html"])
        for item in turn.get("safeHtmlItem", [])
        if isinstance(item, dict) and "html" in item
    ]
    if not responses:
        # nothing to write (could log this if you like)
        continue

    # ----- 4. timestamp → filename
    ts_str = turn.get("time", dt.datetime.utcnow().isoformat(timespec="seconds") + "Z")
    try:
        ts = dt.datetime.fromisoformat(ts_str.rstrip("Z"))
    except ValueError:
        ts = dt.datetime.utcnow()
    fname = DEST / f"{ts:%Y%m%dT%H%M%S}_{slugify(prompt_text)}.md"

    # ----- 5. write markdown
    with open(fname, "w", encoding="utf-8") as f:
        f.write(f"## Prompt\n\n{prompt_text}\n\n")
        for idx, resp in enumerate(responses, 1):
            f.write(f"## Response {idx}\n\n{resp}\n\n")

But that still leaves you with 2K lone Markdown snippets, not conversations.
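
If you do run hn_script_fixed.py, a flat index over the same dump makes the pile at least grep‑able by date and prompt (a small add‑on, not something the export gives you):

# build_index.py - one line per turn: timestamp <TAB> prompt
import json, re
from pathlib import Path

SRC = "all_chats.json"
OUT = Path("gemini_index.tsv")

with open(SRC, encoding="utf-8") as f:
    data = json.load(f)

rows = []
for turn in data:
    if "title" not in turn:
        continue
    prompt = re.sub(r"^Eingegebener Prompt:\s*", "", turn["title"], flags=re.I).strip()
    rows.append((turn.get("time", ""), prompt.replace("\t", " ")[:120]))

rows.sort()  # ISO timestamps sort lexicographically
OUT.write_text("\n".join(f"{ts}\t{prompt}" for ts, prompt in rows), encoding="utf-8")
print(f"wrote {len(rows)} rows to {OUT}")

A plain grep -i handshake gemini_index.tsv then at least tells you when you asked about something, even if it can’t tell you in which conversation.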


7. Why Google Might Do This (Speculation)

  1. Internal data model stores atomic turns, not threaded sessions.

  2. Risk minimization: less context = less chance of leaking other users’ data.

  3. “Good enough for compliance” mindset: meet the letter of GDPR, ignore spirit.

  4. Product lock‑in: if exporting is painful, fewer users leave. Cynical, but proven.


8. What Needs to Happen

  • Google: add conversation_id, parent_id, and a Markdown export. It’s 2025, not 2005.

  • EU DPAs: clarify that “structured” also means re‑usable in context.

  • Developers: log your own chats (browser extensions, local proxies, whatever) if you care about reproducibility - a minimal sketch follows this list.

  • Users: keep filing DSARs - noise creates pressure.
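
By way of example, “log your own chats” can be as small as an append‑only JSONL file with a real conversation_id. The ask callable below is a stand‑in for whatever SDK or HTTP call you actually make - a sketch, not a Gemini API wrapper:

# local_log.py - keep your own thread-aware archive, one JSON object per line
import json
import uuid
import datetime as dt
from pathlib import Path

LOG = Path("my_chats.jsonl")

def log_turn(conversation_id: str, prompt: str, response: str) -> None:
    entry = {
        "conversation_id": conversation_id,
        "time": dt.datetime.now(dt.timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
    }
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

def chat(ask, prompts):
    """ask: any callable str -> str (your model call); prompts: the user turns of one session."""
    cid = str(uuid.uuid4())  # the conversation_id Google's export never gives you
    for p in prompts:
        log_turn(cid, p, ask(p))
    return cid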


9. Open Questions

  • Why do Gemini API users get better audit trails than the consumer Gemini UI? Unclear.

  • A cottage industry now sells ‘we’ll‑export‑your‑LLM‑chats‑for‑you’ services via browser extensions. What the heck?


10. Closing Thoughts

Exporting my Gemini history felt less like a backup and more like opening the black box, only to find 2 448 unrelated sticky notes. Google met the bare minimum, but fell miles short of the usability bar set by its competitors - and by its own “Don’t be evil” lineage.

Until they fix it, remember: the cloud is just someone else’s computer - and they won’t even give you a proper .zip of your chats.


Did you manage to reconstruct Gemini sessions more elegantly? Ping me on HN (@krausality) - curious minds want to know.