X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-v2-format.pod;h=58e6b6d318b327f68c1a089a1444b546db6828ae;hb=6f43429b7722412373043e84757f55a533f94e10;hp=65a85c19470835ef6529515421cdefca6575538a;hpb=666f1b8f5c7c76333df4e1296c1668abf04f210f;p=public-inbox.git

diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod
index 65a85c19..58e6b6d3 100644
--- a/Documentation/public-inbox-v2-format.pod
+++ b/Documentation/public-inbox-v2-format.pod
@@ -16,7 +16,7 @@ Message-IDs.
 The key change in v2 is the inbox is no longer a bare git
 repository, but a directory with two or more git repositories.
 v2 divides git repositories by time "epochs" and Xapian
-databases for parallelism by "partitions".
+databases for parallelism by "shards".
 
 =head2 INBOX OVERVIEW AND DEFINITIONS
 
@@ -24,13 +24,13 @@ $EPOCH - Integer starting with 0 based on time
 $SCHEMA_VERSION - PublicInbox::Search::SCHEMA_VERSION used by Xapian
 $PART - Integer (0..NPROCESSORS)
 
-foo/ # assuming "foo" is the name of the list
-- inbox.lock # lock file (flock) to protect global state
-- git/$EPOCH.git # normal git repositories
-- all.git # empty git repo, alternates to git/$EPOCH.git
-- xap$SCHEMA_VERSION/$PART # per-partition Xapian DB
-- xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP and threading
-- msgmap.sqlite3 # same the v1 msgmap
+ foo/ # assuming "foo" is the name of the list
+ - inbox.lock # lock file (flock) to protect global state
+ - git/$EPOCH.git # normal git repositories
+ - all.git # empty git repo, alternates to git/$EPOCH.git
+ - xap$SCHEMA_VERSION/$SHARD # per-shard Xapian DB
+ - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP and threading
+ - msgmap.sqlite3 # same the v1 msgmap
 
 For blob lookups, the reader only needs to open the "all.git"
 repository with $GIT_DIR/objects/info/alternates which references
@@ -95,16 +95,16 @@ are documented at:
 
 L
 
-=head2 XAPIAN PARTITIONS
+=head2 XAPIAN SHARDS
 
 Another second scalability problem in v1 was the inability to
 utilize multiple CPU cores for Xapian indexing. This is
-addressed by using partitions in Xapian to perform import
+addressed by using shards in Xapian to perform import
 indexing in parallel.
 
 As with git alternates, Xapian natively supports a read-only
 interface which transparently abstracts away the knowledge of
-multiple partitions. This allows us to simplify our read-only
+multiple shards. This allows us to simplify our read-only
 code paths.
 
 The performance of the storage device is now the bottleneck on
@@ -135,10 +135,10 @@ The overview DB maintains all the header information necessary to
 implement the NNTP OVER/XOVER commands and non-search endpoints of
 of the PSGI UI.
 
-In the future, Xapian will become completely optional for v2 (as
-it is for v1) as SQLite turns out to be powerful enough to
-maintain overview information. Most of the PSGI and all of the
-NNTP functionality will be possible with only SQLite in addition
+Xapian has become completely optional for v2 (as it is for v1), but
+SQLite remains required for v2. SQLite turns out to be powerful
+enough to maintain overview information. Most of the PSGI and all
+of the NNTP functionality is possible with only SQLite in addition
 to git.
 
 The overview DB was an instrumental piece in maintaining near