=head1 NAME

public-inbox-index - create and update search indices

=head1 SYNOPSIS

public-inbox-index [OPTIONS] INBOX_DIR...

=head1 DESCRIPTION

public-inbox-index creates and updates the search, overview and
NNTP article number database used by the read-only public-inbox
HTTP and NNTP interfaces. Currently, this requires L and L Perl
modules. L is optional, only to support the PSGI search interface.

Once the initial indices are created by public-inbox-index,
L and L will automatically maintain them.

Running this manually to update indices is only required if
relying on L to mirror an existing public-inbox;
or if upgrading to a new version of public-inbox using
the C<--reindex> option.

Having the overview and article number database is essential to
running the NNTP interface, and strongly recommended for the
HTTP interface as it provides thread grouping in addition to
normal search functionality.

=head1 OPTIONS

=over

=item --jobs=JOBS, -j

Influences the number of Xapian indexing shards in a
(L) inbox.

C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
to disable parallel indexing.

If the inbox has not been indexed, C shards
will be created (one job is always needed for indexing the overview
and article number mapping).

Default: the number of existing Xapian shards

=item --compact / -c

Compacts the Xapian DBs after indexing. This is recommended
when using C<--reindex> to avoid running out of disk space
while indexing multiple inboxes.

While this option takes a negligible amount of time compared to
C<--reindex>, it requires temporarily duplicating the entire
contents of the Xapian DB.

This switch may be specified twice, in which case compaction
happens both before and after indexing to minimize the temporal
footprint of the (re)indexing operation.

Available since public-inbox 1.4.0.

=item --reindex

Forces a re-index of all messages in the inbox.
This can be used for in-place upgrades and bugfixes while
NNTP/HTTP server processes are utilizing the index.
Keep in mind this roughly doubles the size of the already-large
Xapian database. Using this with C<--compact> or running
L afterwards is recommended to release free space.

public-inbox protects writes to various indices with L,
so it is safe to reindex (and rethread) while L, L or L run.

This does not touch the NNTP article number database.
It does not affect threading unless C<--rethread> is used.

=item --rethread

Regenerate internal THREADID and message thread associations
when reindexing.

This fixes some bugs in older versions of public-inbox. While
it is possible to use this without C<--reindex>, it makes little
sense to do so.

Available in public-inbox 1.6.0 (PENDING).

=item --prune

Run L to prune and expire reflogs if discontiguous history
is detected. This is intended to be used in mirrors after running
L or L to ensure data is expunged from mirrors.

Available since public-inbox 1.2.0.

=item --max-size SIZE

Sets or overrides L on a per-invocation basis. See L below.

Available since public-inbox 1.5.0.

=item --batch-size SIZE

Sets or overrides L on a per-invocation basis. See L below.

Available in public-inbox 1.6.0 (PENDING).

=item --no-fsync

Disables L and L operations on SQLite and Xapian.
This is only effective with Xapian 1.4+.

Available in public-inbox 1.6.0 (PENDING).

=item --sequential-shard

Sets or overrides L on a per-invocation basis. See L below.

Available in public-inbox 1.6.0 (PENDING).

=back

=head1 FILES

For v1 (ssoma) repositories described in L, all
public-inbox-specific files are contained within the
C<$GIT_DIR/public-inbox/> directory.

v2 inboxes are described in L.

=head1 CONFIGURATION

=over 8

=item publicinbox.indexMaxSize

Prevents indexing of messages larger than the specified size
value. A single suffix modifier of C, C or C is supported,
thus a value of C<1m> prevents indexing of messages larger than
one megabyte.

This is useful for avoiding memory exhaustion in mirrors.

This option is only available in public-inbox 1.5 or later.
Default: none

=item publicinbox.indexBatchSize

Flushes changes to the filesystem and releases locks
after indexing the given number of bytes. The default value of
C<1m> (one megabyte) is low to minimize memory use and reduce
contention with parallel invocations of L, L, and L.

Increase this value on powerful systems to improve throughput at
the expense of memory use. The reduction of lock granularity may
not be noticeable on fast systems.

This option is available in public-inbox 1.6 or later.
public-inbox 1.5 and earlier used the current default, C<1m>.

For L inboxes, this value is multiplied by the number of
Xapian shards. Thus a typical v2 inbox with 3 shards will flush
every 3 megabytes by default.

Default: 1m (one megabyte)

=item publicinbox.indexSequentialShard

=item publicinbox..indexSequentialShard

For L inboxes, setting this to C allows indexing Xapian shards
in multiple passes. This speeds up indexing on rotational storage
with high seek latency by allowing individual shards to fit into
the kernel page cache.

Using a higher-than-normal number of C<--jobs> with L may be
required to ensure individual shards are small enough to fit
into cache.

Available in public-inbox 1.6.0 (PENDING).

This is ignored on L inboxes.
Default: false, shards are indexed in parallel

=back

=head1 ENVIRONMENT

=over 8

=item PI_CONFIG

Used to override the default "~/.public-inbox/config" value.

=item XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to
disk. This environment variable is handled directly by Xapian;
refer to the Xapian API documentation for more details.

For public-inbox 1.6 and later, use C instead.

Setting C for a large C<--reindex> may cause L, L and L tasks
to wait long periods of time during C<--reindex>.

Default: none, uses C

=back

=head1 UPGRADING

Occasionally, public-inbox will update its schema version and
require a full index by running this command.

=head1 CONTACT

Feedback welcome via plain-text mail to L

The mail archives are hosted at L and L

=head1 COPYRIGHT

Copyright 2016-2020 all contributors L

License: AGPL-3.0+ L

=head1 SEE ALSO

L, L
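
=head1 EXAMPLES

The options described above can be combined on one command line.
The following is a minimal sketch, not an official recipe: the
inbox path, the C<INBOX_DIR> and C<RUN> variables and the two
function names are hypothetical and exist only for illustration.
By default the script prints the commands instead of running them.
It uses only the C<--reindex>, C<--compact> and C<--max-size>
switches documented in OPTIONS.

```shell
#!/bin/sh
# Illustrative sketch only. INBOX_DIR, RUN and the function names are
# assumptions made for this example; they are not part of public-inbox.
INBOX_DIR=${INBOX_DIR:-/path/to/inbox}

# RUN defaults to "echo", so commands are printed rather than executed.
# Export RUN as an empty string to actually run them.
RUN=${RUN-echo}

# Re-index every message, then compact the Xapian DBs to release the
# extra space that --reindex consumes.
reindex_and_compact() {
	$RUN public-inbox-index --reindex --compact "$1"
}

# Skip messages larger than one megabyte for this invocation only,
# leaving publicinbox.indexMaxSize in the config file untouched.
index_with_size_cap() {
	$RUN public-inbox-index --max-size 1m "$1"
}

reindex_and_compact "$INBOX_DIR"
index_with_size_cap "$INBOX_DIR"
```

Printing first makes it possible to review the exact command lines
before committing to a long C<--reindex> run.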