X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-v2-format.pod;h=730f663381069df0b31957a7771f248349ec29c5;hb=95bdac7f09c69036efed537a4d03d5bdd2ae4eb6;hp=65a85c19470835ef6529515421cdefca6575538a;hpb=666f1b8f5c7c76333df4e1296c1668abf04f210f;p=public-inbox.git

diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index 65a85c19..730f6633 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -16,21 +16,21 @@ Message-IDs.
 The key change in v2 is the inbox is no longer a bare git
 repository, but a directory with two or more git repositories.
 v2 divides git repositories by time "epochs" and Xapian
-databases for parallelism by "partitions".
+databases for parallelism by "shards".
 
 =head2 INBOX OVERVIEW AND DEFINITIONS
 
-$EPOCH - Integer starting with 0 based on time
-$SCHEMA_VERSION - PublicInbox::Search::SCHEMA_VERSION used by Xapian
-$PART - Integer (0..NPROCESSORS)
+ $EPOCH - Integer starting with 0 based on time
+ $SCHEMA_VERSION - DB schema version (for Xapian)
+ $SHARD - Integer starting with 0 based on parallelism
 
-foo/ # assuming "foo" is the name of the list
-- inbox.lock # lock file (flock) to protect global state
-- git/$EPOCH.git # normal git repositories
-- all.git # empty git repo, alternates to git/$EPOCH.git
-- xap$SCHEMA_VERSION/$PART # per-partition Xapian DB
-- xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP and threading
-- msgmap.sqlite3 # same the v1 msgmap
+ foo/ # "foo" is the name of the inbox
+ - inbox.lock # lock file to protect global state
+ - git/$EPOCH.git # normal git repositories
+ - all.git # empty, alternates to $EPOCH.git
+ - xap$SCHEMA_VERSION/$SHARD # per-shard Xapian DB
+ - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP, threading
+ - msgmap.sqlite3 # same the v1 msgmap
 
 For blob lookups, the reader only needs to open the "all.git"
 repository with $GIT_DIR/objects/info/alternates which references
@@ -95,16 +95,16 @@ are documented at:
 
 L
 
-=head2 XAPIAN PARTITIONS
+=head2 XAPIAN SHARDS
 
 Another second scalability problem in v1 was the inability to
 utilize multiple CPU cores for Xapian indexing. This is
-addressed by using partitions in Xapian to perform import
+addressed by using shards in Xapian to perform import
 indexing in parallel.
 
 As with git alternates, Xapian natively supports a read-only
 interface which transparently abstracts away the knowledge of
-multiple partitions. This allows us to simplify our read-only
+multiple shards. This allows us to simplify our read-only
 code paths.
 
 The performance of the storage device is now the bottleneck on
@@ -135,10 +135,10 @@ The overview DB maintains all the header information necessary to
 implement the NNTP OVER/XOVER commands and non-search endpoints of
 of the PSGI UI.
 
-In the future, Xapian will become completely optional for v2 (as
-it is for v1) as SQLite turns out to be powerful enough to
-maintain overview information. Most of the PSGI and all of the
-NNTP functionality will be possible with only SQLite in addition
+Xapian has become completely optional for v2 (as it is for v1), but
+SQLite remains required for v2. SQLite turns out to be powerful
+enough to maintain overview information. Most of the PSGI and all
+of the NNTP functionality is possible with only SQLite in addition
 to git.
 
 The overview DB was an instrumental piece in maintaining near
@@ -168,7 +168,7 @@ easier. object_id and Message-ID are already known.
 =item object_id
 
 The blob identifier git uses (currently SHA-1). No need to
-publically expose this outside of normal git ops (cloning) and
+publicly expose this outside of normal git ops (cloning) and
 there's no need to make this searchable. As with v1 of
 public-inbox, this is stored as part of the Xapian document so
 expensive name lookups can be avoided for document retrieval.
@@ -230,7 +230,7 @@ and testing of the v2 repository format.
 
 =head1 COPYRIGHT
 
-Copyright 2018-2019 all contributors L
+Copyright 2018-2020 all contributors L
 
 License: AGPL-3.0+ L