Conversing with 15 Years of Journal Entries

Using natural language processing to tag and semantically analyse unstructured content, and making it queryable through LLMs.

Personal build

The problem

Fifteen years of writing, drafts, entries, and notes, locked in formats that cannot be queried. It is the same problem every organization has with its customer feedback, meeting notes, and research archives.

The approach

An ingestion pipeline pulls the material and groups it into semantic buckets, an LLM tags each entry against an evolving theme taxonomy, and embeddings make the whole archive semantically searchable. The results are then exposed through a dashboard and conversational AI assistants.
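As a rough illustration of the search step, here is a minimal sketch of ranking entries by cosine similarity to a query. It is not the project's actual code. To keep it runnable with the standard library alone, a plain word-count vector (`toy_embed`, a hypothetical stand-in) replaces the embedding model, so matches here are lexical rather than semantic. In the real pipeline, vectors from a sentence-transformers model would presumably take its place. The entry texts, IDs, and function names are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Illustrative entries keyed by a made-up entry ID.
ENTRIES = {
    1: "Long hike in the mountains today, legs are tired.",
    2: "Deadline stress at work again, barely slept.",
    3: "Cooked dinner for friends and laughed all evening.",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts):
    """Collect every distinct token in the corpus, in a stable order."""
    return sorted({token for text in texts for token in tokenize(text)})

def toy_embed(text, vocab):
    """Stand-in embedding: one word count per vocabulary slot.

    Assumption: a real embedding model would return a dense vector
    that captures meaning rather than exact word overlap.
    """
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, entries, top_k=3):
    """Return up to top_k (entry_id, score) pairs, best match first."""
    vocab = build_vocab(entries.values())
    query_vec = toy_embed(query, vocab)
    scored = [
        (cosine(query_vec, toy_embed(text, vocab)), entry_id)
        for entry_id, text in entries.items()
    ]
    scored.sort(reverse=True)
    return [(entry_id, score) for score, entry_id in scored[:top_k]]

if __name__ == "__main__":
    for entry_id, score in search("walking in the mountains", ENTRIES):
        print(entry_id, round(score, 3))
```

Only `toy_embed` would need to change to move from word overlap to semantic matching. The ranking logic stays the same for any function that maps text to a fixed-length vector.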

What it revealed

Themes and sentiment trace arcs that are invisible at the entry level. With this architecture, any archive of unstructured text, notes, charts, or visuals can become a meaningful, queryable asset.
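One way such arcs could surface is by aggregating tagged entries per year. The sketch below assumes a hypothetical SQLite table `entries(written_on, theme, sentiment)` holding one row per tagged entry, with sentiment stored as a score between -1 and 1. The schema, the function names, and the sample rows are invented for illustration and are not taken from the project.

```python
import sqlite3

# Made-up sample rows: (ISO date, theme tag, sentiment score in [-1, 1]).
SAMPLE_ROWS = [
    ("2010-03-02", "work", -0.4),
    ("2010-07-19", "work", -0.2),
    ("2010-11-05", "family", 0.6),
    ("2018-01-14", "work", 0.3),
    ("2018-06-30", "family", 0.7),
    ("2018-09-09", "family", 0.5),
]

def build_demo_db(rows=SAMPLE_ROWS):
    """Create an in-memory database with the assumed schema and load the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (written_on TEXT, theme TEXT, sentiment REAL)"
    )
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def theme_arcs(conn):
    """Return (year, theme, entry_count, mean_sentiment) rows ordered by year, then theme."""
    cursor = conn.execute(
        """
        SELECT substr(written_on, 1, 4) AS year,
               theme,
               COUNT(*) AS entry_count,
               AVG(sentiment) AS mean_sentiment
        FROM entries
        GROUP BY year, theme
        ORDER BY year, theme
        """
    )
    return [
        (year, theme, count, round(mean, 2))
        for year, theme, count, mean in cursor.fetchall()
    ]

if __name__ == "__main__":
    for row in theme_arcs(build_demo_db()):
        print(row)
```

No single entry shows a trend. Grouping by year and theme is what turns individual tags into a trajectory that a dashboard could plot.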

The same architecture applies to research repositories, presentation deck archives, customer feedback, meeting notes, or field survey notes, reviving years of unstructured material that would otherwise risk being forgotten.

Under the hood

Python · Gmail API · SQLite · LLM tagging · sentence-transformers · Natural Language Processing