Karsten Lehmann 19 May 2026 19:50:00
A 200 MB video pinned to a project card. A 12 MB scan of a signed contract. A daily log file that has been growing for nine months. Three photos of a whiteboard. From the outside, all four of them look the same in Haven - a paperclip next to a document. From the inside, they have very little in common with the “blob in a column” that most database engines treat attachments as.
This post takes the lid off the attachment store. It is the same store the four sample apps in Haven (Mindoo Vega, Mindoo TeamEdit, Mindoo TeamGrid, Mindoo TodoManager) write into every time you press Upload, and the same store the document scanner drops a scanned PDF into. The mechanics underneath are what make features like application time travel work for files and not just for fields - and they are also the reason a long video streams smoothly in Haven without ever materialising as a single blob in browser memory.
Attachments are a chain of encrypted chunks, not a blob
An attachment in MindooDB is not stored as one binary lump anywhere. The file is sliced into chunks - 256 KB each by default - and each chunk is independently:
- encrypted with the same key as the document itself (the tenant’s default key, or a named key when the author picked one explicitly),
- signed by the author’s key, so the chunk carries proof of who wrote it,
- identified by a content hash computed over its encrypted bytes - so the store can spot identical chunks and store the bytes only once,
- linked back to its predecessor, so the chunks form an ordered chain.
The document itself only carries a small reference. A MindooDoc’s _attachments array looks like this:
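```javascript
_attachments: [
  {
    attachmentId: "0193ec...",
    fileName: "report.pdf",
    mimeType: "application/pdf",
    size: 5242880,                          // bytes (5 MiB)
    lastChunkId: "<id of the last chunk>",  // tail of the chunk chain
    decryptionKeyId: "default",             // tenant default key, or a named key
    createdAt: 1747680000000,               // epoch milliseconds
    createdBy: "<author's public key>"
  }
]
```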
Notice what is not in there: the bytes. Not the chunks. Not a hash of them. Just a single lastChunkId - the id of the last chunk in the chain - plus a bit of metadata so the UI can render a row in the attachments panel without going anywhere near the binary data.
To actually read the file, MindooDB starts at the last chunk and walks the chain backwards until it reaches the first one. A full read assembles the chain in order; a random byte range fetches only the chunks that cover that range; a stream yields chunks one after another. The full attachment is never assembled into memory unless an app explicitly asks for it.
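To make the backwards walk concrete, here is a minimal TypeScript sketch. It assumes a hypothetical in-memory `ChunkStore` keyed by chunk id and a hypothetical `prevChunkId` link on each entry; the real MindooDB chunk format, decryption, and signature checks are left out. The walk touches only chunk links, and the streamed read fetches payloads one at a time.

```typescript
// Hypothetical chunk entry: just enough fields to walk the chain.
interface ChunkEntry {
  prevChunkId: string | null; // null marks the first chunk of the attachment
  data: Uint8Array;           // stands in for the decrypted payload
}

type ChunkStore = Map<string, ChunkEntry>;

// Walk backwards from the tail by following prevChunkId links,
// then reverse so the ids come out in file order.
function collectChainIds(store: ChunkStore, lastChunkId: string): string[] {
  const ids: string[] = [];
  let currentId: string | null = lastChunkId;
  while (currentId !== null) {
    const entry: ChunkEntry | undefined = store.get(currentId);
    if (entry === undefined) throw new Error(`missing chunk ${currentId}`);
    ids.push(currentId);
    currentId = entry.prevChunkId;
  }
  return ids.reverse();
}

// Streamed read: fetch and yield one chunk at a time, never concatenating them.
function* streamChunks(store: ChunkStore, lastChunkId: string): Generator<Uint8Array> {
  for (const id of collectChainIds(store, lastChunkId)) {
    yield store.get(id)!.data;
  }
}

// Full read: only for callers that explicitly want the whole file in memory.
function readAll(store: ChunkStore, lastChunkId: string): Uint8Array {
  const parts = Array.from(streamChunks(store, lastChunkId));
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```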
Why “last chunk”, not “first chunk”? Because attachments grow
Pointing at the last chunk rather than the first looks like a small design choice. It is the choice that most of the rest of this post rests on.
A chunk chain that grows by appending - each new chunk pointing back at the previous tail, and _attachments[].lastChunkId updated to the new tail - is append-only. Existing chunks never move, never get rewritten, never even get re-encrypted. To extend a nine-month-old log file by one more line you write one new chunk and update one tiny field in the document.
This composes beautifully with everything else MindooDB already does.
Time travel sees the right bytes, automatically. Every revision of the document carries its own _attachments array with its own lastChunkId. Open the document as of last Tuesday - either through the Database Browser’s time travel slider or through Haven’s application time travel - and the attachment chain walks back from last Tuesday’s tail. The chunks that have been appended since then are silently ignored, because the chain just does not reach them. There is no separate “historical attachment” pipeline; the historical pipeline is the regular pipeline, fed with an older lastChunkId. That is the property the TeamEdit launch post called out: without it, a historical read would happily include bytes that did not exist yet at the moment you are looking at.
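Appending and time travel fall out of the same structure. The sketch below is illustrative only: the `Chunk` shape, the `appendChunk` and `readFrom` helpers, and the two-revision scenario are hypothetical, and a plain `Map` stands in for the attachment store. An append writes one new chunk and returns the new tail; an older revision keeps its own older tail, so a walk from it never reaches the chunks added later.

```typescript
interface Chunk {
  prevChunkId: string | null;
  text: string; // stands in for the decrypted chunk payload
}

// Hypothetical append: one new chunk pointing at the old tail.
// Existing chunks are never touched.
function appendChunk(store: Map<string, Chunk>, tail: string | null, newId: string, text: string): string {
  store.set(newId, { prevChunkId: tail, text });
  return newId; // the caller saves this as the document's new lastChunkId
}

// Walk backwards from a given tail and return the content in file order.
function readFrom(store: Map<string, Chunk>, tail: string | null): string {
  const parts: string[] = [];
  let id: string | null = tail;
  while (id !== null) {
    const chunk: Chunk | undefined = store.get(id);
    if (chunk === undefined) throw new Error(`missing chunk ${id}`);
    parts.push(chunk.text);
    id = chunk.prevChunkId;
  }
  return parts.reverse().join("");
}

const store = new Map<string, Chunk>();
let lastChunkId: string | null = null;
lastChunkId = appendChunk(store, lastChunkId, "c1", "line 1\n");
lastChunkId = appendChunk(store, lastChunkId, "c2", "line 2\n");
const tuesdayRevision = lastChunkId; // last Tuesday's revision remembers this tail
lastChunkId = appendChunk(store, lastChunkId, "c3", "line 3\n");

// Reading from tuesdayRevision yields only lines 1 and 2: c3 is unreachable from there.
const historical = readFrom(store, tuesdayRevision);
```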
Append-only logs are first-class. Adding one more line to a log attachment is literally one new chunk plus one updated last-chunk pointer. There is no compaction step, no special log file type, no separate log API - it is the regular attachment API used in append-only style. Combined with MindooDB’s signed change history, that gives you a tamper-evident audit log you can attach to any document and still scrub through with time travel.
Moving and duplicating attachments costs no I/O. The _attachments array is just JSON inside the document. Move an entry from one document to another and the new document now “owns” the same chain of chunks. The chunks themselves do not move; nothing is re-uploaded, nothing is re-downloaded, nothing is re-encrypted. Duplicate the entry into a second document and now both documents reference the same chain - again, with zero bytes copied. From the outside it looks like you carried a 300 MB video from one document to two; from the inside you carried a 200-byte JSON object.
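A short sketch shows why moving and duplicating cost no I/O: the only thing copied is the reference object. The `Doc` and `AttachmentRef` types are hypothetical simplifications of the `_attachments` entry shown earlier, and keeping the same `attachmentId` on a duplicate is an assumption of this sketch. Note that no chunk store appears anywhere.

```typescript
// Hypothetical, trimmed-down version of an _attachments entry.
interface AttachmentRef {
  attachmentId: string;
  fileName: string;
  mimeType: string;
  size: number;
  lastChunkId: string;
}

interface Doc {
  _attachments: AttachmentRef[];
}

// Duplicate: copy the small reference; both docs now point at the same chain.
function duplicateAttachment(from: Doc, to: Doc, attachmentId: string): void {
  const ref = from._attachments.find((a) => a.attachmentId === attachmentId);
  if (ref === undefined) throw new Error(`no attachment ${attachmentId}`);
  to._attachments.push({ ...ref });
}

// Move: duplicate, then drop the entry from the source document.
function moveAttachment(from: Doc, to: Doc, attachmentId: string): void {
  duplicateAttachment(from, to, attachmentId);
  from._attachments = from._attachments.filter((a) => a.attachmentId !== attachmentId);
}
```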
Content-hash deduplication, inside the database
Each chunk’s contentHash is a hash of its encrypted payload. The store treats that hash as a deduplication key: two chunks with the same contentHash are still two distinct entries (each with its own metadata and its own place in its chain), but the underlying encrypted bytes are stored exactly once.
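The split between per-chunk entries and deduplicated bytes can be sketched as two maps. This is a minimal sketch, assuming SHA-256 as the content hash (the post does not name the hash function) and a hypothetical `DedupChunkStore` class; it is not MindooDB's storage code.

```typescript
import { createHash } from "node:crypto";

// Each chunk keeps its own entry and its place in its chain...
interface StoredChunk {
  chunkId: string;
  prevChunkId: string | null;
  contentHash: string;
}

// ...while encrypted bytes are stored once per contentHash.
class DedupChunkStore {
  private entries = new Map<string, StoredChunk>();
  private blobs = new Map<string, Uint8Array>(); // contentHash -> encrypted bytes

  put(chunkId: string, prevChunkId: string | null, encrypted: Uint8Array): StoredChunk {
    const contentHash = createHash("sha256").update(encrypted).digest("hex");
    if (!this.blobs.has(contentHash)) {
      this.blobs.set(contentHash, encrypted); // first time these bytes are seen
    }
    const entry: StoredChunk = { chunkId, prevChunkId, contentHash };
    this.entries.set(chunkId, entry);
    return entry;
  }

  getBytes(chunkId: string): Uint8Array | undefined {
    const entry = this.entries.get(chunkId);
    return entry ? this.blobs.get(entry.contentHash) : undefined;
  }

  get entryCount(): number {
    return this.entries.size;
  }

  get blobCount(): number {
    return this.blobs.size;
  }
}
```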
For attachments the encryption is set up to be deterministic: the same plaintext encrypted with the same key always produces the same ciphertext, and therefore the same contentHash. That extends deduplication from “the same file inside one document” to every document, every user, and every device sharing the same MindooDB database: as long as the bytes land inside the same database, the same logo, the same boilerplate contract, the same training dataset, the same firmware blob occupy one copy of the encrypted bytes on disk and travel the wire once during sync. Bandwidth and storage drop together.
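To illustrate the deterministic property (not MindooDB's actual scheme, which the post does not specify), the sketch below derives the AES-GCM nonce from an HMAC of the plaintext, a synthetic-IV style construction. Equal plaintexts under equal keys then produce equal ciphertexts and equal content hashes, while different plaintexts never share a nonce except through an HMAC collision. The function names are hypothetical; a production system would rather use a vetted misuse-resistant mode such as AES-GCM-SIV (RFC 8452).

```typescript
import { createCipheriv, createDecipheriv, createHash, createHmac, randomBytes } from "node:crypto";

// Nonce = first 12 bytes of HMAC-SHA256(macKey, plaintext), so it depends only on the content.
function encryptDeterministic(encKey: Buffer, macKey: Buffer, plaintext: Buffer): Buffer {
  const iv = createHmac("sha256", macKey).update(plaintext).digest().subarray(0, 12);
  const cipher = createCipheriv("aes-256-gcm", encKey, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, body, cipher.getAuthTag()]); // iv | ciphertext | 16-byte tag
}

function decryptDeterministic(encKey: Buffer, payload: Buffer): Buffer {
  const iv = payload.subarray(0, 12);
  const tag = payload.subarray(payload.length - 16);
  const body = payload.subarray(12, payload.length - 16);
  const decipher = createDecipheriv("aes-256-gcm", encKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

// The dedup key is computed over the encrypted payload, as described above.
const contentHash = (payload: Buffer): string => createHash("sha256").update(payload).digest("hex");

const encKey = randomBytes(32);
const macKey = randomBytes(32);
const a = encryptDeterministic(encKey, macKey, Buffer.from("same boilerplate contract"));
const b = encryptDeterministic(encKey, macKey, Buffer.from("same boilerplate contract"));
// contentHash(a) equals contentHash(b): the store keeps one copy of these bytes.
```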
The dedup window stops at the database boundary - each MindooDB database carries its own attachment store both on the server and in the browser, so copying the same file into a second database lands a second copy of the chunks, even within the same tenant. The contentHash itself, however, is computed identically across the whole tenant, which leaves the door open for future cross-database cache layers without changing anything visible to apps.
It is an explicit trade-off. Deterministic encryption reveals one metadata pattern - whether two encrypted chunks are byte-for-byte identical inside the same tenant, and therefore whether their plaintexts are identical - but beyond that equality it leaks nothing about the plaintext, the server still only ever sees ciphertext, and the savings in storage and bandwidth are large. For individual chunks that need stronger guarantees, the non-deterministic mode is still available, at the cost of dedup.
The same content-addressing trick also makes attachment uploads resumable rather than restart-from-scratch. Because each chunk is independently named by its contentHash and arrives as a self-contained entry on the server, the sync protocol does not have to ship the file as one opaque blob - it asks the server which chunk ids it already holds and transfers only the ones still missing. If the laptop lid closes mid-upload, the train enters a tunnel, or the phone goes to sleep, the next sync run picks up exactly where the previous one stopped rather than restarting the upload from the first byte. The same negotiation also answers the dedup question (“the server already has this chunk because another teammate uploaded it last week”), so a resumed upload of a partly-duplicated file can finish almost instantly.
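The "which ids do you already have?" negotiation can be sketched in a few lines. The `ChunkServer` interface, `uploadMissing`, and the in-memory `MemoryServer` with its simulated connection drop are all hypothetical stand-ins, not the MindooDB sync protocol. The key property is that a second run after an interruption only sends what is still missing.

```typescript
// Hypothetical transport: ask which chunk ids the server holds, push one chunk at a time.
interface ChunkServer {
  has(ids: string[]): Promise<Set<string>>;
  put(id: string, encrypted: Uint8Array): Promise<void>;
}

// Push only the missing chunks. Chunks delivered by an earlier, interrupted run
// (or uploaded by a teammate) are reported by has() and skipped.
async function uploadMissing(server: ChunkServer, local: Map<string, Uint8Array>): Promise<number> {
  const ids = Array.from(local.keys());
  const present = await server.has(ids);
  let sent = 0;
  for (const id of ids) {
    if (present.has(id)) continue;
    await server.put(id, local.get(id)!);
    sent++;
  }
  return sent;
}

// In-memory stand-in for the server, with an optional simulated connection drop.
class MemoryServer implements ChunkServer {
  readonly stored = new Map<string, Uint8Array>();
  dropAfterPuts = Number.POSITIVE_INFINITY;

  async has(ids: string[]): Promise<Set<string>> {
    return new Set(ids.filter((id) => this.stored.has(id)));
  }

  async put(id: string, encrypted: Uint8Array): Promise<void> {
    if (this.dropAfterPuts <= 0) throw new Error("connection lost");
    this.dropAfterPuts--;
    this.stored.set(id, encrypted);
  }
}
```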
Random access and streaming
Because attachment chunks ride the same infrastructure as document changes, they inherit the rest of the platform for free. Sync transports them with the same protocol. The chunked encryption uses the same key model. The same signed-change semantics apply.
An app can read an attachment in three flavours: pull the full content of a small file in one call, fetch a precise byte range from the middle of a large one without touching the rest, or stream the file chunk by chunk for memory-bounded reads. A range request walks the chain, identifies which chunks cover the requested range, fetches only those, decrypts them, and trims the edges. A streamed read starts mid-chain and yields chunks one after another.
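The chunk arithmetic behind a range request is simple when chunks have a fixed size. This sketch assumes that every chunk except possibly the last is exactly the 256 KB default; `chunksForRange` and `trimToRange` are hypothetical helper names.

```typescript
const CHUNK_SIZE = 256 * 1024; // the default chunk size mentioned above

// Indices (0-based, in file order) of the chunks that cover bytes [start, end).
function chunksForRange(start: number, end: number, chunkSize: number = CHUNK_SIZE): { first: number; last: number } {
  if (start < 0 || end <= start) throw new Error("invalid or empty range");
  return { first: Math.floor(start / chunkSize), last: Math.floor((end - 1) / chunkSize) };
}

// `covering` holds the decrypted chunks first..last back to back; cut away the edges.
function trimToRange(covering: Uint8Array, start: number, end: number, chunkSize: number = CHUNK_SIZE): Uint8Array {
  const base = Math.floor(start / chunkSize) * chunkSize; // file offset where chunk `first` begins
  return covering.subarray(start - base, end - base);
}

// Usage: bytes 1,000,000 to 1,099,999 of a 5 MB file touch only chunks 3 and 4.
const { first, last } = chunksForRange(1000000, 1100000);
```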
Two stores - on purpose
A small architectural detail that pays off here: a MindooDB database actually keeps two separate stores under the hood, not one - one for document changes, one for attachment chunks. They use the same encryption model, the same signing model, and the same wire protocol; they simply live side by side as two distinct stores.
In Haven Community today they are usually backed by the same underlying technology - IndexedDB in the browser, the same on-disk store on the server - which keeps the deployment story simple. The split is there so future versions of Haven can give attachment bytes their own lifecycle without touching the rest of the database.
A few directions we are actively planning around it:
- Partial attachment sync. Today a database either syncs everything or nothing - one big “Sync All”. With a separate attachment store we can sync only the chunks that match a policy: below a configurable size, recently used, or manually pinned to a device. The Sync page already has the first half of this idea wired in: every per-row, per-tenant, and global Sync button has a sibling “Docs only, no attachments” action. That is genuinely useful today when a teammate has just pushed a large file you do not want to wait for before catching up on the document changes you actually need to read.
- Server-only attachments. Some teams want the documents replicated to every device for offline browsing, but want attachment bytes to live exclusively on the server and stream on demand.
- Tiered attachment storage. On the server side, a layered attachment store can keep hot chunks on local disk and migrate cold ones (least-recently-used, beyond a tenant quota) into S3-compatible object storage, transparently to MindooDB. Because chunks are content-addressed and immutable, moving one to cold storage copies its bytes unchanged and only updates where they live - no rewrites, no risk of mismatch.
None of these change the chunk format on the wire or how a document references its attachments.
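Since partial sync is still a plan, the following is only a guess at what a per-attachment policy check could look like. The `AttachmentInfo` and `SyncPolicy` shapes and `shouldSyncChunks` are hypothetical; they encode the three criteria listed above (size threshold, recent use, manual pinning). Today's “Docs only, no attachments” action would correspond to a policy that always answers no.

```typescript
// Hypothetical per-attachment facts a device might know.
interface AttachmentInfo {
  attachmentId: string;
  size: number;                // bytes, from the _attachments entry
  lastOpenedAt: number | null; // epoch ms on this device, null if never opened
  pinned: boolean;             // manually pinned to this device
}

interface SyncPolicy {
  maxSize: number;  // sync anything at or below this size...
  recentMs: number; // ...or anything opened within this window
}

function shouldSyncChunks(a: AttachmentInfo, policy: SyncPolicy, now: number): boolean {
  if (a.pinned) return true;
  if (a.size <= policy.maxSize) return true;
  return a.lastOpenedAt !== null && now - a.lastOpenedAt <= policy.recentMs;
}
```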
Previewing the regular file types
Haven’s attachment preview dialog is wired to the chunk pipeline through a single small detection layer that maps the file’s MIME type and extension to a preview mode - image, pdf, text, markdown, docx, pptx, spreadsheet, video, or audio - and then renders accordingly:
- Images open inline at their native resolution.
- PDFs open in the browser’s built-in PDF viewer - it is good enough that we do not ship a heavier viewer and pay the bundle cost.
- Markdown is rendered with footnotes and task lists, inside a sandboxed frame so links and images in untrusted attachments stay isolated from the rest of Haven.
- Word documents (DOCX) render in place with page breaks and embedded images, no round trip through Microsoft Office.
- Excel workbooks (XLSX and friends) render as a sheet-tabbed table - the same renderer TeamGrid uses for clipboard interop.
- PowerPoint decks (PPTX) render slide by slide, with a windowed slide list and strict size limits because user-uploaded archives are treated as untrusted input.
- Plain text, JSON, XML, CSV, YAML, SVG, and log files are shown in a syntax-aware viewer.
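A detection layer like the one described above can be sketched as an extension table plus MIME-type fallbacks. The exact extension list, the extension-first ordering, and the `"none"` fallback are assumptions of this sketch, not Haven's code. Extensions are checked first because uploads often arrive as `application/octet-stream`.

```typescript
// The nine preview modes named above, plus a hypothetical "none" for download-only files.
type PreviewMode =
  | "image" | "pdf" | "text" | "markdown" | "docx"
  | "pptx" | "spreadsheet" | "video" | "audio" | "none";

const BY_EXTENSION = new Map<string, PreviewMode>([
  ["pdf", "pdf"],
  ["md", "markdown"], ["markdown", "markdown"],
  ["docx", "docx"], ["pptx", "pptx"],
  ["xlsx", "spreadsheet"], ["xlsm", "spreadsheet"], ["xls", "spreadsheet"],
  ["txt", "text"], ["json", "text"], ["xml", "text"], ["csv", "text"],
  ["yaml", "text"], ["yml", "text"], ["svg", "text"], ["log", "text"],
]);

function detectPreviewMode(mimeType: string, fileName: string): PreviewMode {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  const byExt = BY_EXTENSION.get(ext);
  if (byExt !== undefined) return byExt;

  const mime = mimeType.toLowerCase();
  if (mime === "image/svg+xml") return "text"; // SVG goes to the text viewer
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("video/")) return "video";
  if (mime.startsWith("audio/")) return "audio";
  if (mime === "application/pdf") return "pdf";
  if (mime === "text/markdown") return "markdown";
  if (mime.startsWith("text/") || mime === "application/json" || mime === "application/xml") return "text";
  return "none";
}
```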
All of those load as blobs, decrypted on-device from the chain we just walked, and the preview component never sees the raw store. The pipeline is the same whether the bytes come from the local replica (the offline case) or are fetched on demand from the server (the case where the user has not synced this database fully). The chunk layer underneath does not care.
Streaming long media without ever assembling the blob
Video and audio are the place where “load the whole file into one big blob and call it done” stops working. Even a short HD screencast or a phone-recorded clip is easily a few hundred megabytes of encrypted chunks; loading all of them into memory on a phone is a fairly reliable way to make Safari evict the tab, and on every device it pushes start-of-playback out to “wait for the whole file” rather than “wait for the first few seconds”.
So the media preview path skips the blob entirely. Decrypted chunks are fed straight into the browser’s streaming-video pipeline as they come off the chain; the player starts decoding while the rest is still being pulled. Memory stays bounded and playback starts well before the file is anywhere near complete.
That works beautifully for fragmented MP4 - the layout most modern phones, webcams, and video tools produce by default. The file is already shaped like a stream the browser is happy to play as it arrives.
Non-fragmented MP4: the case where naive streaming breaks
A surprisingly large number of MP4 files in the wild use an older “single-pass” layout that the browser’s streaming pipeline refuses to accept. Many editor exports, screen recorders, “Save As MP4” from desktop tools, and lots of stock-footage downloads still produce them - anything that wrote the file in one pass without rewriting its header to make it streamable afterwards.
The browser will happily play the same file once it is fully downloaded - the regular media pipeline tolerates the older layout; it just will not stream it as it arrives. Which means no progressive playback, no decrypt-as-you-go, no bounded memory. The naive answer is “download the whole file and play it from there”, which on any reasonably long video is exactly the failure mode we are trying to avoid.
Transmuxing on the fly with mp4box.js
So Haven transmuxes them. As the encrypted chunks come off the chain and get decrypted, they are fed through mp4box.js - the JavaScript port of GPAC’s MP4Box tool - which rebuilds the file on the fly into the streaming-friendly layout the browser expects and pushes the result straight into the player. No intermediate file, no temporary blob; from the player’s point of view it is just a stream that started playing, and from the storage’s point of view nothing changed - we are still walking the same chunk chain we would have walked for any other read.
The mp4box.js library is loaded lazily, so the transmuxer is only downloaded the first time a user actually opens a non-streamable video. Users who only ever work with documents, spreadsheets, and phone-recorded clips never pay the cost.
Before the pipeline opens, Haven does a tiny probe step. It asks for the first ~1 MB of the file - served from the local replica when available and fetched from the server otherwise - and inspects it just enough to decide whether the browser can already stream the file, whether transmuxing is needed, and which codecs are involved. The result is cached, so a later restart during seek does not re-probe.
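The streamability half of such a probe can be sketched with a tiny ISO BMFF box scanner. Each box starts with a 32-bit big-endian size and a four-character type; a size of 1 means a 64-bit size follows, and 0 means the box runs to the end of the file. A fragmented file carries an `mvex` box inside `moov` (and `moof` boxes later), while a single-pass file typically shows `mdat` before any `moov`. The function names are hypothetical, and codec detection, which the probe described above also does, is left out.

```typescript
interface Box {
  type: string;
  start: number;      // offset of the box header in the buffer
  size: number;       // total box size including the header
  headerSize: number; // 8, or 16 when a 64-bit size is used
}

// Iterate boxes laid out back to back in buf[start, end). Boxes running past `end`
// are still reported, since the probe only sees the head of the file.
function* boxes(buf: Uint8Array, start: number, end: number): Generator<Box> {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let pos = start;
  while (pos + 8 <= end) {
    let size = view.getUint32(pos); // DataView reads big-endian by default
    const type = String.fromCharCode(buf[pos + 4], buf[pos + 5], buf[pos + 6], buf[pos + 7]);
    let headerSize = 8;
    if (size === 1) {
      if (pos + 16 > end) return;
      size = view.getUint32(pos + 8) * 4294967296 + view.getUint32(pos + 12); // 64-bit size
      headerSize = 16;
    } else if (size === 0) {
      yield { type, start: pos, size: Infinity, headerSize }; // runs to end of file
      return;
    }
    if (size < headerSize) return; // malformed header: stop scanning
    yield { type, start: pos, size, headerSize };
    pos += size;
  }
}

interface ProbeResult {
  topLevel: string[];  // top-level box types seen in the head, in order
  fragmented: boolean; // moov carries mvex, or a moof box appears
  moovInHead: boolean; // a complete moov box was found in the head
}

function probeMp4Head(head: Uint8Array): ProbeResult {
  const topLevel: string[] = [];
  let fragmented = false;
  let moovInHead = false;
  for (const box of boxes(head, 0, head.length)) {
    topLevel.push(box.type);
    if (box.type === "moof") fragmented = true;
    if (box.type === "moov" && box.start + box.size <= head.length) {
      moovInHead = true;
      for (const child of boxes(head, box.start + box.headerSize, box.start + box.size)) {
        if (child.type === "mvex") fragmented = true;
      }
    }
  }
  return { topLevel, fragmented, moovInHead };
}
```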
Seek
Scrubbing the timeline restarts the stream from a new position rather than buffering the whole file: the player estimates which byte offset corresponds to the scrubber position, rewinds a small safety margin for decoder stability, and resumes streaming from there. Each restart streams in only the chunks that cover the new region. In the small number of cases where seeking would risk loading too much of the file into browser memory, Haven refuses the seek with a clearly worded message recommending Download instead, rather than quietly bringing the tab down.
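One simple way to turn a scrubber position into a restart offset is sketched below. It assumes a roughly constant bitrate, subtracts a safety margin, and aligns down to the 256 KB chunk boundary so the restart reads whole chunks. The function name and the two-chunk default margin are hypothetical, and Haven's actual estimator may well use the file's own sample index instead.

```typescript
const CHUNK_SIZE = 256 * 1024;

// Map a target time to a byte offset, back off a safety margin,
// and align down to a chunk boundary.
function estimateSeekOffset(
  targetSec: number,
  durationSec: number,
  fileSize: number,
  safetyMarginBytes: number = 2 * CHUNK_SIZE
): number {
  if (durationSec <= 0) return 0;
  const fraction = Math.min(Math.max(targetSec / durationSec, 0), 1);
  const rough = Math.floor(fraction * fileSize);
  const withMargin = Math.max(0, rough - safetyMarginBytes);
  return Math.floor(withMargin / CHUNK_SIZE) * CHUNK_SIZE;
}
```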
Everything above runs in the browser tab on top of the regular chunk pipeline. Decrypted bytes never leave the device, so the server never sees the video in cleartext; the page never holds the full file in memory; and the bytes on disk on the server are still the original non-streamable MP4 the user uploaded - we did not rewrite it, just read it differently.
None of this is new in this week’s release - the chunked attachment store has been quietly carrying every Upload button in Haven since the platform shipped. What we expect to evolve over the next few releases is what gets layered on top: partial sync policies on the Sync page, server-side tiered storage, and in-place editing of common attachment formats - including Word documents and spreadsheets - directly in the browser, without round-tripping them through Microsoft 365 or a self-hosted document server. The chunk format itself, the _attachments reference, and the resumable upload protocol are already where we want them.