X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-v2-format.pod;h=10c63090721d3a238e6ec19896fb8d7ce3d8ce62;hb=77dc49f8978467f318192feccf067745328a1d82;hp=05ef32a9b6782cf79469d2206c20d51d4bf636bd;hpb=cf35d38e7f845393659dfce0249a76d529a2c92c;p=public-inbox.git

diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index 05ef32a9..10c63090 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -16,21 +16,21 @@ Message-IDs.
 The key change in v2 is the inbox is no longer a bare git
 repository, but a directory with two or more git repositories.
 v2 divides git repositories by time "epochs" and Xapian
-databases for parallelism by "partitions".
+databases for parallelism by "shards".

 =head2 INBOX OVERVIEW AND DEFINITIONS

-$EPOCH - Integer starting with 0 based on time
-$SCHEMA_VERSION - PublicInbox::Search::SCHEMA_VERSION used by Xapian
-$PART - Integer (0..NPROCESSORS)
+  $EPOCH - Integer starting with 0 based on time
+  $SCHEMA_VERSION - DB schema version (for Xapian)
+  $SHARD - Integer starting with 0 based on parallelism

-foo/                              # assuming "foo" is the name of the list
-- inbox.lock                      # lock file (flock) to protect global state
-- git/$EPOCH.git                  # normal git repositories
-- all.git                         # empty git repo, alternates to git/$EPOCH.git
-- xap$SCHEMA_VERSION/$PART        # per-partition Xapian DB
-- xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP and threading
-- msgmap.sqlite3                  # same the v1 msgmap
+  foo/                              # "foo" is the name of the inbox
+  - inbox.lock                      # lock file to protect global state
+  - git/$EPOCH.git                  # normal git repositories
+  - all.git                         # empty, alternates to $EPOCH.git
+  - xap$SCHEMA_VERSION/$SHARD       # per-shard Xapian DB
+  - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP, threading
+  - msgmap.sqlite3                  # same the v1 msgmap

 For blob lookups, the reader only needs to open the "all.git"
 repository with $GIT_DIR/objects/info/alternates which references
@@ -95,21 +95,21 @@ are documented at: L
-=head2 XAPIAN PARTITIONS
+=head2 XAPIAN SHARDS

 Another second scalability problem in v1 was the inability to
 utilize multiple CPU cores for Xapian indexing. This is
-addressed by using partitions in Xapian to perform import
+addressed by using shards in Xapian to perform import
 indexing in parallel.

 As with git alternates, Xapian natively supports a read-only
 interface which transparently abstracts away the knowledge of
-multiple partitions. This allows us to simplify our read-only
+multiple shards. This allows us to simplify our read-only
 code paths.

 The performance of the storage device is now the bottleneck on
 larger multi-core systems. In our experience, performance is
-improves with high-quality and high-quantity solid-state storage.
+improved with high-quality and high-quantity solid-state storage.
 Issuing TRIM commands with L was necessary to maintain
 consistent performance while developing this feature.

@@ -117,6 +117,11 @@ Rotational storage devices are NOT recommended for indexing of
 large mail archives; but are fine for backup and usable for small
 instances.

+Our use of the L requires Xapian document IDs to
+remain stable. Using L and
+L wrappers are recommended over tools
+provided by Xapian.
+
 =head2 OVERVIEW DB

 Towards the end of v2 development, it became apparent Xapian did
@@ -130,10 +135,10 @@ The overview DB maintains all the header information necessary to
 implement the NNTP OVER/XOVER commands and non-search endpoints of
 of the PSGI UI.

-In the future, Xapian will become completely optional for v2 (as
-it is for v1) as SQLite turns out to be powerful enough to
-maintain overview information. Most of the PSGI and all of the
-NNTP functionality will be possible with only SQLite in addition
+Xapian has become completely optional for v2 (as it is for v1), but
+SQLite remains required for v2. SQLite turns out to be powerful
+enough to maintain overview information. Most of the PSGI and all
+of the NNTP functionality is possible with only SQLite in addition
 to git.

 The overview DB was an instrumental piece in maintaining near
@@ -210,7 +215,7 @@ for all non-atomic operations.

 =head1 HEADERS

-Same handling as with v1, except the Message-ID header will will
+Same handling as with v1, except the Message-ID header will
 be generated if not provided or conflicting. "Bytes", "Lines"
 and "Content-Length" headers are stripped and not allowed, they
 can interfere with further processing.
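
The header handling described under HEADERS can be illustrated with a minimal Python sketch. It is not the public-inbox implementation (which is written in Perl). The function name `normalize_headers`, the `seen_ids` set, and the `example.invalid` domain are hypothetical, and `email.utils.make_msgid()` is only a stand-in for however a replacement Message-ID is really generated. As a simplification, a Message-ID counts as "conflicting" here whenever it has been seen before.

```python
from email import message_from_string
from email.message import Message
from email.utils import make_msgid

# Headers the documentation says are stripped and not allowed.
STRIPPED_HEADERS = ("Bytes", "Lines", "Content-Length")


def normalize_headers(msg: Message, seen_ids: set) -> Message:
    """Strip disallowed headers and make sure msg has a usable Message-ID.

    ASSUMPTION: "conflicting" is simplified to "already in seen_ids".
    """
    for name in STRIPPED_HEADERS:
        # Deletes every occurrence; does nothing if the header is absent.
        del msg[name]

    mid = msg.get("Message-ID")
    if mid is None or mid in seen_ids:
        # ASSUMPTION: make_msgid() stands in for the real generator.
        del msg["Message-ID"]
        mid = make_msgid(domain="example.invalid")
        msg["Message-ID"] = mid

    seen_ids.add(mid)
    return msg


if __name__ == "__main__":
    raw = (
        "From: a@example.com\n"
        "Message-ID: <1@example.com>\n"
        "Lines: 3\n"
        "Content-Length: 10\n"
        "\n"
        "body\n"
    )
    seen = set()
    first = normalize_headers(message_from_string(raw), seen)
    second = normalize_headers(message_from_string(raw), seen)
    print(first["Message-ID"])   # the original ID is kept
    print(second["Message-ID"])  # a generated replacement ID
```

Passing the same raw message twice shows both cases: the first call keeps the supplied Message-ID, the second receives a generated one because the ID is already in `seen_ids`.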