=head1 NAME
-public-inbox v2 repository description
+public-inbox v2 format description
=head1 DESCRIPTION
The key change in v2 is the inbox is no longer a bare git
repository, but a directory with two or more git repositories.
v2 divides git repositories by time "epochs" and Xapian
-databases for parallelism by "partitions".
+databases for parallelism by "shards".
=head2 INBOX OVERVIEW AND DEFINITIONS
-$EPOCH - Integer starting with 0 based on time
-$SCHEMA_VERSION - PublicInbox::Search::SCHEMA_VERSION used by Xapian
-$PART - Integer (0..NPROCESSORS)
+ $EPOCH - Integer starting with 0 based on time
+ $SCHEMA_VERSION - DB schema version (for Xapian)
+ $SHARD - Integer starting with 0 based on parallelism
-foo/ # assuming "foo" is the name of the list
-- inbox.lock # lock file (flock) to protect global state
-- git/$EPOCH.git # normal git repositories
-- all.git # empty git repo, alternates to git/$EPOCH.git
-- xap$SCHEMA_VERSION/$PART # per-partition Xapian DB
-- xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP and threading
-- msgmap.sqlite3 # same the v1 msgmap
+ foo/ # "foo" is the name of the inbox
+ - inbox.lock # lock file to protect global state
+ - git/$EPOCH.git # normal git repositories
+ - all.git # empty, alternates to $EPOCH.git
+ - xap$SCHEMA_VERSION/$SHARD # per-shard Xapian DB
+ - xap$SCHEMA_VERSION/over.sqlite3 # OVER-view DB for NNTP, threading
+ - msgmap.sqlite3 # same as the v1 msgmap
For blob lookups, the reader only needs to open the "all.git"
repository with $GIT_DIR/objects/info/alternates which references
L<https://public-inbox.org/meta/20180209205140.GA11047@dcvr/>
-=head2 XAPIAN PARTITIONS
+=head2 XAPIAN SHARDS
A second scalability problem in v1 was the inability to
utilize multiple CPU cores for Xapian indexing. This is
-addressed by using partitions in Xapian to perform import
+addressed by using shards in Xapian to perform import
indexing in parallel.
As with git alternates, Xapian natively supports a read-only
interface which transparently abstracts away the knowledge of
-multiple partitions. This allows us to simplify our read-only
+multiple shards. This allows us to simplify our read-only
code paths.
The performance of the storage device is now the bottleneck on
=item object_id
The blob identifier git uses (currently SHA-1). No need to
-publically expose this outside of normal git ops (cloning) and
+publicly expose this outside of normal git ops (cloning) and
there's no need to make this searchable. As with v1 of
public-inbox, this is stored as part of the Xapian document so
expensive name lookups can be avoided for document retrieval.
=head1 THANKS
Thanks to the Linux Foundation for sponsoring the development
-and testing of the v2 repository format.
+and testing of the v2 format.
=head1 COPYRIGHT
-Copyright 2018-2019 all contributors L<mailto:meta@public-inbox.org>
+Copyright 2018-2020 all contributors L<mailto:meta@public-inbox.org>
License: AGPL-3.0+ L<http://www.gnu.org/licenses/agpl-3.0.txt>