brain-structure

Structure of my digital brain - conversations with LLMs


This documents the structure of my brain project for collecting and analyzing my conversations with LLMs. The brain itself is obviously private. Here are some statistics from my use case:

Get the raw data

It would be great if there were simply a button “Download conversations as JSON or MD”, but there isn’t. There are plugins, but they sometimes require clicking through every single conversation. It can’t be that complicated, now can it?

Google Gemini

The export is currently only HTML, plus generated images as JPG.

And since I have a few Android phones: every time you say “OK Google” to a Google Home or Android phone, you get an entry that later has to be cleaned out. Better to use the following two:

Claude

The steps are: claude.ai → Settings → Export data → JSON. Claude will then send you an email with a zip file that contains four JSON files.
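To see what you actually got, it’s enough to list the archive contents before ingesting it. A minimal sketch; the path is simply wherever you saved the attachment:

import zipfile

# List the contents of the Claude export without unpacking it.
with zipfile.ZipFile("claude-export.zip") as zf:
    for info in zf.infolist():
        print(f"{info.filename:40} {info.file_size:>10} bytes")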

ChatGPT

Instructions will follow. It should be JSON that you get via email.

Ideas

I have some ideas about reflecting on my conversations with AI and learning from them. The structure:

brain/
├── sources/
│   ├── chatgpt_2024.json        # normalized, year-split source files
│   ├── chatgpt_2025.json
│   ├── gemini_2024.json
│   ├── gemini_2025.json
│   └── claude_2024.json
├── vault/
│   ├── chatgpt/
│   │   └── 2024-11-14 How Python decorators work.md
│   ├── gemini/
│   └── claude/
└── brain.py                     # the single CLI tool for everything

The script will have four commands: ingest, stats, clean, and export.

The -v flag is on the subparser, not the parent; I need to either add --verbose to each subcommand or move it to the parent parser. Drop brain.py in the root of your repo and you’re good to go. Here’s the full workflow:
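A minimal sketch of what that CLI skeleton could look like, assuming argparse and the four subcommands from the workflow below; here --verbose sits on the parent parser so every subcommand inherits it:

import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="brain.py")
    # On the parent parser: "python brain.py -v ingest ..." works for every subcommand.
    parser.add_argument("-v", "--verbose", action="store_true")

    sub = parser.add_subparsers(dest="command", required=True)
    ingest = sub.add_parser("ingest", help="normalize raw exports into sources/")
    ingest.add_argument("archives", nargs="+")
    sub.add_parser("stats", help="show per-file statistics")
    clean = sub.add_parser("clean", help="interactively review flagged messages")
    clean.add_argument("--source", required=True)
    sub.add_parser("export", help="write the Obsidian vault")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)

The trade-off: on the parent, the flag has to come before the subcommand (python brain.py -v ingest ...); to accept it after the subcommand, each subparser needs its own --verbose, for example via a shared parent parser passed with parents=[...].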

Step 1 — Ingest your exports into sources/

python brain.py ingest ~/Downloads/chatgpt_export.zip ~/Downloads/takeout.zip ~/Downloads/claude.dms

This auto-detects each source, splits by year, and writes e.g. sources/chatgpt_2024.json, sources/gemini_2025.json. Re-running is safe — duplicates are skipped by ID.
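The merge step could look roughly like this. A sketch only; the normalized record shape with id and date fields is an assumption, not the actual brain.py code:

import json
from collections import defaultdict
from pathlib import Path

def merge_into_sources(records, source_name, sources_dir=Path("sources")):
    """Split normalized records by year and merge them into sources/<source>_<year>.json,
    skipping any record whose id is already present."""
    by_year = defaultdict(list)
    for rec in records:                       # rec: {"id": ..., "date": "2024-11-14", ...}
        by_year[rec["date"][:4]].append(rec)

    sources_dir.mkdir(exist_ok=True)
    for year, new_recs in by_year.items():
        path = sources_dir / f"{source_name}_{year}.json"
        existing = json.loads(path.read_text()) if path.exists() else []
        seen = {r["id"] for r in existing}
        existing.extend(r for r in new_recs if r["id"] not in seen)
        path.write_text(json.dumps(existing, ensure_ascii=False, indent=2))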

Step 2 — Check stats

python brain.py stats

Shows per-file: conversation count, your word count, AI word count, average message length, and how many messages are flagged as noise pending review.
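Per file, that boils down to something like this. A sketch, assuming each conversation carries a list of messages with role, text, and an optional noise flag:

import json
from pathlib import Path

def file_stats(path: Path) -> dict:
    conversations = json.loads(path.read_text())
    messages = [m for c in conversations for m in c["messages"]]
    my_words = sum(len(m["text"].split()) for m in messages if m["role"] == "user")
    ai_words = sum(len(m["text"].split()) for m in messages if m["role"] != "user")
    return {
        "conversations": len(conversations),
        "my_words": my_words,
        "ai_words": ai_words,
        "avg_message_words": (my_words + ai_words) / len(messages) if messages else 0,
        "flagged_for_review": sum(1 for m in messages if m.get("noise")),
    }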

Step 3 — Clean interactively, one file at a time

python brain.py clean --source sources/gemini_2024.json

For each flagged message you see the context (the previous message is shown above it) and the reason it was flagged, and you choose what to do with it.
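The review loop is roughly this. A sketch; the keep/drop/skip choices are placeholders for whatever the real prompt ends up offering:

def review_flagged(conversations):
    """Walk through flagged messages, showing the previous message as context."""
    for conv in conversations:
        msgs = conv["messages"]
        for i, msg in enumerate(msgs):
            if not msg.get("noise"):
                continue
            if i > 0:
                print("context:", msgs[i - 1]["text"][:200])
            print("flagged:", msg["text"][:200])
            print("reason: ", msg.get("noise_reason", "unknown"))
            choice = input("[k]eep / [d]rop / [s]kip? ").strip().lower()
            if choice == "d":
                msg["deleted"] = True
            elif choice == "k":
                msg["noise"] = False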

Step 4 — Export to Obsidian vault

python brain.py export

Writes to vault/chatgpt/, vault/gemini/, vault/claude/ — one .md per conversation, named YYYY-MM-DD Title.md, with YAML frontmatter (source, date, model) that Smart Connections and other Obsidian plugins can index.
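The exporter is mostly bookkeeping. A sketch; the frontmatter keys follow the description above, while the filename sanitization and message layout are assumptions:

import json
from pathlib import Path

def export_vault(sources_dir=Path("sources"), vault_dir=Path("vault")):
    for src in sorted(sources_dir.glob("*.json")):
        source = src.stem.rsplit("_", 1)[0]            # chatgpt_2024 -> chatgpt
        for conv in json.loads(src.read_text()):
            title = "".join(c for c in conv["title"] if c not in '\\/:*?"<>|').strip()
            out = vault_dir / source / f'{conv["date"]} {title}.md'
            out.parent.mkdir(parents=True, exist_ok=True)
            frontmatter = (
                "---\n"
                f"source: {source}\n"
                f"date: {conv['date']}\n"
                f"model: {conv.get('model', '')}\n"
                "---\n\n"
            )
            body = "\n\n".join(f'**{m["role"]}**: {m["text"]}' for m in conv["messages"])
            out.write_text(frontmatter + body, encoding="utf-8")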

Summary

The sources/*.json files stay as the canonical source of truth — the vault is always regeneratable from them. When you later want a different output format (Genspark, Notion, whatever), you just write a new exporter on top of the same source files.
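Concretely, every exporter can share one reader over sources/. A small sketch; export_notion is just a hypothetical name to show the shape:

import json
from pathlib import Path

def load_sources(sources_dir=Path("sources")):
    """Single reader shared by all exporters; the Obsidian vault is only one consumer."""
    for src in sorted(sources_dir.glob("*.json")):
        yield src.stem, json.loads(src.read_text())

# A future target is just another consumer of the same records:
# def export_notion():
#     for name, conversations in load_sources():
#         ...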

Using Codex

The conversion to TypeScript was done with Codex, but Codex was not automatically credited as part of the commit. Let’s change that.

Let’s try this:

git commit -m "Refactor auth logic

Co-authored-by: Codex <codex@openai.com>"