X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Flei-store-format.pod;fp=Documentation%2Flei-store-format.pod;h=a42c770ee7d4232bd2e5698db321894794a5614a;hb=ba0c73ae03214e57004af4192b57141c1a0fff9f;hp=0000000000000000000000000000000000000000;hpb=6b3ba59d4bfdf20507fd890df6ff1454a93435e4;p=public-inbox.git

diff --git a/Documentation/lei-store-format.pod b/Documentation/lei-store-format.pod
new file mode 100644
index 00000000..a42c770e
--- /dev/null
+++ b/Documentation/lei-store-format.pod
@@ -0,0 +1,91 @@
+% public-inbox developer manual
+
+=head1 NAME
+
+lei-store-format - lei/store format description
+
+=head1 DESCRIPTION
+
+C is a hybrid store based on L
+("extindex") combined with L ("v2") for blob
+storage.  While v2 is ideal for archiving a single public mailing list,
+it was never intended for personal mail or for storing multiple
+blobs of the "same" message.
+
+As with extindex, it can index disparate C headers
+belonging to the "same" message with different git blob OIDs.
+Unlike v2 and extindex, C headers are NOT required,
+allowing unsent draft messages to be stored and indexed.
+
+=head1 DIRECTORY LAYOUT
+
+Blob storage exists in the form of v2-style epochs.  These epochs
+are under the C directory (instead of C) to
+prevent them from being accidentally treated as a v2 inbox.
+
+=head2 INDEX OVERVIEW AND DEFINITIONS
+
+  $EPOCH - Integer starting with 0 based on time
+  $SCHEMA_VERSION - DB schema version (for Xapian)
+  $SHARD - Integer starting with 0 based on parallelism
+
+  ~/.local/share/lei/store
+  - ipc.lock                         # lock file for internal lei IPC
+  - local/$EPOCH.git                 # normal bare git repositories
+
+Additionally, the following share the same roles they do in extindex:
+
+  - ei.lock                          # lock file to protect global state
+  - ALL.git                          # empty, alternates for local/*.git
+  - ei$SCHEMA_VERSION/$SHARD         # per-shard Xapian DB
+  - ei$SCHEMA_VERSION/over.sqlite3   # overview DB for WWW, IMAP
+  - ei$SCHEMA_VERSION/misc           # misc Xapian DB
+
+=head2 XREF3 DEDUPLICATION
+
+Index deduplication follows extindex; see
+L for
+more information.
+
+=head2 BLOB DEDUPLICATION
+
+The contents of C repos are deduplicated by git blob
+object IDs (currently SHA-1).  This allows multiple copies of
+cross-posted and personally Cc-ed messages to be stored with
+different C, C and similar headers to
+allow troubleshooting.
+
+=head2 VOLATILE METADATA
+
+Keywords and label information (as described in RFC 8621 for JMAP)
+are stored in existing Xapian shards (C).
+It is possible to search for messages matching labels and
+keywords using C and C, respectively.  As with all data
+stored in Xapian indices, volatile metadata is associated with
+the Xapian document and is thus shared across different blobs of
+the "same" message.
+
+=head1 IPC
+
+When L is run in daemon mode, L is used on
+C to serialize writes to C across
+multiple internal lei workers while minimizing commits.
+
+=head1 CAVEATS
+
+Reindexing and synchronization are not yet supported.
+
+=head1 THANKS
+
+Thanks to the Linux Foundation for sponsoring the development
+and testing.
+
+=head1 COPYRIGHT
+
+Copyright 2021 all contributors L
+
+License: AGPL-3.0+ L
+
+=head1 SEE ALSO
+
+L, L
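
The BLOB DEDUPLICATION section above keys storage on git blob object IDs. As a rough illustration only (this is not code from lei or public-inbox, and the helper names `git_blob_oid` and `dedup_by_oid` are hypothetical), the sketch below computes a SHA-1 blob OID the way git does, by hashing a `blob <size>\0` header followed by the raw bytes. It then shows why two copies of a message that differ only in a trace header are kept as separate blobs, while byte-identical copies collapse into one.

```python
import hashlib


def git_blob_oid(raw: bytes) -> str:
    """Return the SHA-1 git blob object ID for the given raw bytes.

    git hashes the header b"blob <size>\\0" followed by the content.
    """
    header = b"blob %d\0" % len(raw)
    return hashlib.sha1(header + raw).hexdigest()


def dedup_by_oid(messages):
    """Hypothetical helper: keep one copy per distinct blob OID.

    Returns a dict mapping OID -> raw bytes, in first-seen order.
    """
    seen = {}
    for raw in messages:
        seen.setdefault(git_blob_oid(raw), raw)
    return seen


# Two deliveries of the "same" message that differ only in a trace header,
# plus a byte-identical duplicate of the first delivery.
body = b"Subject: hi\n\nhello\n"
copy_a = b"Received: from a.example\n" + body
copy_b = b"Received: from b.example\n" + body
stored = dedup_by_oid([copy_a, copy_b, copy_a])
```

Here `stored` holds two entries: the differing `Received:` lines (chosen purely for illustration) give `copy_a` and `copy_b` different OIDs, and the repeated `copy_a` is dropped. Associating both blobs with one indexed message is a separate step, covered under XREF3 DEDUPLICATION.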