X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-index.pod;h=8a37580c01eabe08f16f397ec38e7d5b4fe4dbc0;hb=46742d95647c7a80cb2f60d5c134717dd91e22e2;hp=838a206919932e43061705981a240c1e55557146;hpb=3d41aa23f35501ca92aab8aa42980fa73f7fa74f;p=public-inbox.git

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 838a2069..8a37580c 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -4,14 +4,15 @@ public-inbox-index - create and update search indices
 
 =head1 SYNOPSIS
 
-public-inbox-index [OPTIONS] GIT_DIR
+public-inbox-index [OPTIONS] INBOX_DIR...
 
 =head1 DESCRIPTION
 
-public-inbox-index creates and updates the search and NNTP
-article number database used by the read-only public-inbox HTTP
-and NNTP interfaces. Currently, this requires L
-and L and L Perl modules.
+public-inbox-index creates and updates the search, overview and
+NNTP article number database used by the read-only public-inbox
+HTTP and NNTP interfaces. Currently, this requires
+L and L Perl modules. L
+is optional, only to support the PSGI search interface.
 
 Once the initial indices are created by public-inbox-index,
 L and L will
@@ -22,73 +23,80 @@ relying on L to mirror an existing public-inbox;
 or if upgrading to a new version of public-inbox using
 the C<--reindex> option.
 
-Having a search and article number database is essential to
+Having the overview and article number database is essential to
 running the NNTP interface, and strongly recommended for the
-HTTP interface as it provides thread grouping in addition
-to normal search functionality.
+HTTP interface as it provides thread grouping in addition to
+normal search functionality.
 
 =head1 OPTIONS
 
 =over
 
+=item --compact / -c
+
+Compacts the Xapian DBs after indexing. This is recommended
+when using C<--reindex> to avoid running out of disk space
+while indexing multiple inboxes.
+
+While this option takes a negligible amount of time compared to
+C<--reindex>, it requires temporarily duplicating the entire
+contents of the Xapian DB.
+
+This switch may be specified twice, in which case compaction
+happens both before and after indexing to minimize the temporal
+footprint of the (re)indexing operation.
+
 =item --reindex
 
-Forces a search engine re-index of all messages in the
-repository. This can be used for in-place upgrades while
+Forces a re-index of all messages in the inbox.
+This can be used for in-place upgrades and bugfixes while
 NNTP/HTTP server processes are utilizing the index. Keep in mind
 this roughly doubles the size of the already-large
-Xapian database.
+Xapian database. Using this with C<--compact> or running
+L afterwards is recommended to
+release free space.
 
-This does not touch the NNTP article number database.
+This does not touch the NNTP article number database or
+affect threading.
 
-=back
+=item --prune
 
-=head1 FILES
+Run L to prune and expire reflogs if discontiguous history
+is detected. This is intended to be used in mirrors after running
+L or L to ensure data
+is expunged from mirrors.
 
-All public-inbox-specific files are contained within the
-C<$GIT_DIR/public-inbox/> directory. All files are expected to
-grow in size as more messages are archived, so using compaction
-commands (e.g. L) is not recommended unless
-the list is no longer active.
+=item --max-size SIZE
 
-=over
+Sets or overrides L on a
+per-invocation basis. See L
+below.
 
-=item $GIT_DIR/public-inbox/msgmap.sqlite3
+=back
 
-The stable NNTP article number to Message-ID mapping is
-stored in an SQLite3 database.
+=head1 FILES
 
-This is required for users of L, but
-users of the L interface will find it
-useful for attempting recovery from copy-paste truncations of
-URLs containing long Message-IDs.
+For v1 (ssoma) repositories described in L,
+all public-inbox-specific files are contained within the
+C<$GIT_DIR/public-inbox/> directory.
 
-Avoid removing this file and regenerating it; it may cause
-existing NNTP readers to lose sync and miss (or see duplicate)
-messages.
+v2 inboxes are described in L.
 
-This file is relatively small, and typically less than 5%
-of the space of the mail stored in a packed git repository.
+=head1 CONFIGURATION
 
-=item $GIT_DIR/public-inbox/xapian*
+=over 8
 
-The database used by L. This directory name is
-followed by a number indicating the index schema version this
-installation of public-inbox uses.
+=item publicinbox.indexMaxSize
 
-These directories may be safely deleted or removed in full
-while the NNTP and HTTP interfaces are no longer accessing
-them.
+Prevents indexing of messages larger than the specified size
+value. A single suffix modifier of C, C or C is
+supported, thus a value of C<1m> prevents indexing of
+messages larger than one megabyte.
 
-In addition to providing a search interface for the HTTP
-interface, the Xapian database is used to group and combine
-related messages into threads. For NNTP servers, it also
-provides a cache of metadata and header information often
-requested by NNTP clients.
+This is useful for avoiding memory exhaustion in mirrors.
+This option is only available in public-inbox 1.5 or later.
 
-This directory is large, often two to three times the size of
-the objects stored in a packed git repository. Using the
-C<--reindex> option makes it larger, still.
+Default: none
 
 =back
 
@@ -100,8 +108,24 @@ C<--reindex> option makes it larger, still.
 
 Used to override the default "~/.public-inbox/config" value.
 
+=item XAPIAN_FLUSH_THRESHOLD
+
+The number of documents to update before committing changes to
+disk. This environment variable is handled directly by Xapian; refer to
+the Xapian API documentation for more details.
+
+Default: our indexing code flushes every megabyte of mail seen
+to keep memory usage low. Setting this environment variable to
+any positive value will switch to a document count-based
+threshold in Xapian.
+
 =back
 
+=head1 UPGRADING
+
+Occasionally, public-inbox will update its schema version and
+require a full index by running this command.
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L
@@ -111,7 +135,7 @@ and L
 
 =head1 COPYRIGHT
 
-Copyright 2016-2018 all contributors L
+Copyright 2016-2020 all contributors L
 
 License: AGPL-3.0+ L
 
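The publicinbox.indexMaxSize and XAPIAN_FLUSH_THRESHOLD entries added in this patch describe two simple policies: skip messages above a size limit, and commit index changes either per megabyte of mail or per fixed document count. The Python sketch below models both under stated assumptions. It is not public-inbox's (Perl) implementation, and the names parse_max_size, should_index and FlushPolicy are hypothetical. It assumes the suffix letters lost from this copy are k, m and g with binary (1024-based) multiples, as git-config(1) uses for integer values; only the "m" suffix is confirmed by the C<1m> example. The document-count branch is only an assumed stand-in for the threshold Xapian itself applies.

```python
# Hypothetical sketch of two indexing policies from the public-inbox-index
# documentation; not public-inbox's actual (Perl) implementation.
import re
from typing import Optional

# Assumption: the suffixes are k, m and g (case-insensitive) with binary
# multiples, as git-config(1) uses for integers. Only "m" is confirmed by
# the documented C<1m> example.
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE_RE = re.compile(r"\A([0-9]+)([kmg]?)\Z", re.IGNORECASE)


def parse_max_size(value: str) -> int:
    """Convert a size such as '1m' into a byte count."""
    match = _SIZE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"invalid size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix.lower()]


def should_index(message_size: int, max_size: Optional[int]) -> bool:
    """Skip messages larger than max_size; None means no limit (the default)."""
    return max_size is None or message_size <= max_size


class FlushPolicy:
    """Decide when pending index changes should be committed to disk.

    Without a document threshold, flush once about one megabyte of mail
    has been seen (the documented default); with one, flush after that
    many documents instead (assumed to model XAPIAN_FLUSH_THRESHOLD).
    """

    def __init__(self, doc_threshold: Optional[int] = None,
                 byte_threshold: int = 1024 * 1024) -> None:
        self.doc_threshold = doc_threshold
        self.byte_threshold = byte_threshold
        self.pending_docs = 0
        self.pending_bytes = 0

    def add(self, message_size: int) -> bool:
        """Record one indexed message and return True when a flush is due."""
        self.pending_docs += 1
        self.pending_bytes += message_size
        if self.doc_threshold is not None:
            due = self.pending_docs >= self.doc_threshold
        else:
            due = self.pending_bytes >= self.byte_threshold
        if due:
            # Counters restart after each simulated commit.
            self.pending_docs = 0
            self.pending_bytes = 0
        return due
```

For example, parse_max_size("1m") yields 1048576 bytes, so under these assumptions a 1048577-byte message would be skipped. Keeping both policies as pure functions of sizes and counts makes them easy to test without a Xapian database.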