X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-v2-format.pod;h=10c63090721d3a238e6ec19896fb8d7ce3d8ce62;hb=77dc49f8978467f318192feccf067745328a1d82;hp=7dfe3296363b1c5032eca127ee13e901a364a9a8;hpb=96d4a98d1a28ec64b5abd8289ddd4177ff87ad7e;p=public-inbox.git

diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index 7dfe3296..10c63090 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -16,21 +16,21 @@ Message-IDs.
 The key change in v2 is the inbox is no longer a bare git
 repository, but a directory with two or more git repositories.
 v2 divides git repositories by time "epochs" and Xapian
-databases for parallelism by "partitions".
+databases for parallelism by "shards".
 
 =head2 INBOX OVERVIEW AND DEFINITIONS
 
-$EPOCH - Integer starting with 0 based on time
-$SCHEMA_VERSION - PublicInbox::Search::SCHEMA_VERSION used by Xapian
-$PART - Integer (0..NPROCESSORS)
+  $EPOCH - Integer starting with 0 based on time
+  $SCHEMA_VERSION - DB schema version (for Xapian)
+  $SHARD - Integer starting with 0 based on parallelism
 
-foo/                              # assuming "foo" is the name of the list
-- inbox.lock                      # lock file (flock) to protect global state
-- git/$EPOCH.git                  # normal git repositories
-- all.git                         # empty git repo, alternates to git/$EPOCH.git
-- xap$SCHEMA_VERSION/$PART        # per-partition Xapian DB
-- xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP and threading
-- msgmap.sqlite3                  # same the v1 msgmap
+  foo/                              # "foo" is the name of the inbox
+  - inbox.lock                      # lock file to protect global state
+  - git/$EPOCH.git                  # normal git repositories
+  - all.git                         # empty, alternates to $EPOCH.git
+  - xap$SCHEMA_VERSION/$SHARD       # per-shard Xapian DB
+  - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP, threading
+  - msgmap.sqlite3                  # same the v1 msgmap
 
 For blob lookups, the reader only needs to open the "all.git"
 repository with $GIT_DIR/objects/info/alternates which
references
@@ -95,16 +95,16 @@ are documented at:
 
 L
 
-=head2 XAPIAN PARTITIONS
+=head2 XAPIAN SHARDS
 
 Another second scalability problem in v1 was the inability to
 utilize multiple CPU cores for Xapian indexing. This is
-addressed by using partitions in Xapian to perform import
+addressed by using shards in Xapian to perform import
 indexing in parallel.
 
 As with git alternates, Xapian natively supports a read-only
 interface which transparently abstracts away the knowledge of
-multiple partitions. This allows us to simplify our read-only
+multiple shards. This allows us to simplify our read-only
 code paths.
 
 The performance of the storage device is now the bottleneck on
@@ -117,6 +117,11 @@ Rotational storage devices are NOT recommended for indexing of
 large mail archives; but are fine for backup and usable for
 small instances.
 
+Our use of the L requires Xapian document IDs to
+remain stable. Using L and
+L wrappers are recommended over tools
+provided by Xapian.
+
 =head2 OVERVIEW DB
 
 Towards the end of v2 development, it became apparent Xapian did
@@ -130,10 +135,10 @@ The overview DB maintains all the header information necessary to
 implement the NNTP OVER/XOVER commands and non-search endpoints of
 of the PSGI UI.
 
-In the future, Xapian will become completely optional for v2 (as
-it is for v1) as SQLite turns out to be powerful enough to
-maintain overview information. Most of the PSGI and all of the
-NNTP functionality will be possible with only SQLite in addition
+Xapian has become completely optional for v2 (as it is for v1), but
+SQLite remains required for v2. SQLite turns out to be powerful
+enough to maintain overview information. Most of the PSGI and all
+of the NNTP functionality is possible with only SQLite in addition
 to git.
 
 The overview DB was an instrumental piece in maintaining near