X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-index.pod;h=3bdd5efc63fcde9d674dd9679e278df7bf29e751;hb=0b15dfc58ceaecdcb1c9285c3ad55813006c8338;hp=398ac516bf9c729ff595cc0d810749faa85091df;hpb=4bebfa0c80ad7f4596a7dca98b39121470a42af0;p=public-inbox.git diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod index 398ac516..3bdd5efc 100644 --- a/Documentation/public-inbox-index.pod +++ b/Documentation/public-inbox-index.pod @@ -6,6 +6,8 @@ public-inbox-index - create and update search indices public-inbox-index [OPTIONS] INBOX_DIR... +public-inbox-index [OPTIONS] --all + =head1 DESCRIPTION public-inbox-index creates and updates the search, overview and @@ -32,6 +34,24 @@ normal search functionality. =over +=item --jobs=JOBS, -j + +Influences the number of Xapian indexing shards in a +(L) inbox. + +See L for a full description +of sharding. + +C<--jobs=0> is accepted as of public-inbox 1.6.0 +to disable parallel indexing regardless of the number of +pre-existing shards. + +If the inbox has not been indexed or initialized, C +shards will be created (one job is always needed for indexing +the overview and article number mapping). + +Default: the number of existing Xapian shards + =item --compact / -c Compacts the Xapian DBs after indexing. This is recommended @@ -46,6 +66,8 @@ This switch may be specified twice, in which case compaction happens both before and after indexing to minimize the temporal footprint of the (re)indexing operation. +Available since public-inbox 1.4.0. + =item --reindex Forces a re-index of all messages in the inbox. @@ -56,8 +78,31 @@ Xapian database. Using this with C<--compact> or running L afterwards is recommended to release free space. -This does not touch the NNTP article number database or -affect threading. +public-inbox protects writes to various indices with +L, so it is safe to reindex (and rethread) while +L, L or +L run. 
+ +This does not touch the NNTP article number database. +It does not affect threading unless C<--rethread> is +used. + +=item --all + +Index all inboxes configured in ~/.public-inbox/config. +This is an alternative to specifying individual inbox directories +on the command-line. + +=item --rethread + +Regenerate internal THREADID and message thread associations +when reindexing. + +This fixes some bugs in older versions of public-inbox. While +it is possible to use this without C<--reindex>, it makes little +sense to do so. + +Available in public-inbox 1.6.0+. =item --prune @@ -66,21 +111,83 @@ is detected. This is intended to be used in mirrors after running L or L to ensure data is expunged from mirrors. +Available since public-inbox 1.2.0. + =item --max-size SIZE Sets or overrides L on a per-invocation basis. See L below. +Available since public-inbox 1.5.0. + +=item --batch-size SIZE + +Sets or overrides L on a +per-invocation basis. See L +below. + +When using rotational storage but abundant RAM, using a large +value (e.g. C<500m>) with C<--sequential-shard> can +significantly speed up and reduce fragmentation during the +initial index and full C<--reindex> invocations (but not +incremental updates). + +Available in public-inbox 1.6.0+. + +=item --no-fsync + +Disables L and L operations on SQLite +and Xapian. This is only effective with Xapian 1.4+. This is +primarily intended for systems with low RAM and the small +(default) C<--batch-size=1m>. Users of large C<--batch-size> +may even find disabling L causes too much dirty +data to accumulate, resulting in latency spikes from writeback. + +Available in public-inbox 1.6.0+. + +=item --sequential-shard + +Sets or overrides L on a +per-invocation basis. See L +below. + +Available in public-inbox 1.6.0+. + +=item --skip-docdata + +Stop storing document data in Xapian on an existing inbox. + +See L for description and caveats. + +Available in public-inbox 1.6.0+. 
+ +=item --update-extindex=EXTINDEX, -E + +Update the given external index (L. +Either the configured section name (e.g. C) or a directory name +may be specified. + +Defaults to C if C<[extindex "all"]> is configured, +otherwise no external indices are updated. + +May be specified multiple times in rare cases where multiple +external indices are configured. + +=item --no-update-extindex + +Do not update the C external index by default. This negates +all uses of C<-E> / C<--update-extindex=> on the command-line. + =back =head1 FILES -For v1 (ssoma) repositories described in L. +For v1 (ssoma) repositories described in L. All public-inbox-specific files are contained within the C<$GIT_DIR/public-inbox/> directory. -v2 inboxes are described in L. +v2 inboxes are described in L. =head1 CONFIGURATION @@ -93,10 +200,71 @@ value. A single suffix modifier of C, C or C is supported, thus the value of C<1m> to prevents indexing of messages larger than one megabyte. -This is useful for avoiding memory exhaustion in mirrors. +This is useful for avoiding memory exhaustion in mirrors +via git. It does not prevent L or +L from importing (and indexing) +a message. + +This option is only available in public-inbox 1.5 or later. Default: none +=item publicinbox.indexBatchSize + +Flushes changes to the filesystem and releases locks after +indexing the given number of bytes. The default value of C<1m> +(one megabyte) is low to minimize memory use and reduce +contention with parallel invocations of L, +L, and L. + +Increase this value on powerful systems to improve throughput at +the expense of memory use. The reduction of lock granularity +may not be noticeable on fast systems. With SSDs, values above +C<4m> have little benefit. + +For L inboxes, this value is +multiplied by the number of Xapian shards. Thus a typical v2 +inbox with 3 shards will flush every 3 megabytes by default +unless parallelism is disabled via C<--sequential-shard> +or C<--jobs=0>. 
+ +This influences memory usage of Xapian, but it is not exact. +The actual memory used by Xapian and Perl has been observed +in excess of 10x this value. + +This option is available in public-inbox 1.6 or later. +public-inbox 1.5 and earlier used the current default, C<1m>. + +Default: 1m (one megabyte) + +=item publicinbox.indexSequentialShard + +For L inboxes, setting this to C +allows indexing Xapian shards in multiple passes. This speeds up +indexing on rotational storage with high seek latency by allowing +individual shards to fit into the kernel page cache. + +Using a higher-than-normal number of C<--jobs> with +L may be required to ensure individual +shards are small enough to fit into cache. + +Warning: interrupting C while this option +is in use may leave the search indices out-of-date with respect +to SQLite databases. WWW and IMAP users may notice incomplete +search results, but it is otherwise non-fatal. Using C<--reindex> +will bring everything back up-to-date. + +Available in public-inbox 1.6.0+. + +This is ignored on L inboxes. + +Default: false, shards are indexed in parallel + +=item publicinbox..indexSequentialShard + +Identical to L, +but only affects the inbox matching E<lt>nameE<gt>. + =back =head1 ENVIRONMENT @@ -113,10 +281,16 @@ The number of documents to update before committing changes to disk. This environment is handled directly by Xapian, refer to Xapian API documentation for more details. -Default: our indexing code flushes every megabyte of mail seen -to keep memory usage low. Setting this environment variable to -any positive value will switch to a document count-based -threshold in Xapian. +For public-inbox 1.6 and later, use C +instead. + +Setting C or +C for a large C<--reindex> may cause +L, L and +L tasks to wait for long and unpredictable +periods of time during C<--reindex>. + +Default: none, uses C =back @@ -129,15 +303,15 @@ require a full index by running this command. 
Feedback welcome via plain-text mail to L -The mail archives are hosted at L -and L +The mail archives are hosted at L and +L =head1 COPYRIGHT -Copyright 2016-2020 all contributors L +Copyright 2016-2021 all contributors L License: AGPL-3.0+ L =head1 SEE ALSO -L, L +L, L, L
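The publicinbox.indexBatchSize text added by this patch says that, for inboxes with multiple Xapian shards, the batch size is multiplied by the number of shards unless parallelism is disabled via --sequential-shard or --jobs=0. The arithmetic can be sketched as below. This is an illustration only and not public-inbox code: the function name and parameters are hypothetical, sizes are kept in whole megabytes, and the non-parallel case is assumed to flush at the plain batch size.

```python
def flush_threshold_mb(batch_size_mb: int, shards: int, parallel: bool = True) -> int:
    """Approximate megabytes of mail indexed between flushes.

    Hypothetical helper mirroring the documented rule: the batch size
    is multiplied by the shard count when shards are indexed in parallel.
    """
    if batch_size_mb <= 0 or shards <= 0:
        raise ValueError("batch_size_mb and shards must be positive")
    if parallel:
        # Documented case: 1m batch size with 3 shards flushes every 3 megabytes.
        return batch_size_mb * shards
    # Assumption: with --sequential-shard or --jobs=0 the plain batch size applies.
    return batch_size_mb


if __name__ == "__main__":
    print(flush_threshold_mb(1, 3))
    print(flush_threshold_mb(500, 3, parallel=False))
```

Per the same section, actual memory used by Xapian and Perl has been observed in excess of 10x the configured value, so the result is a flush interval and not a memory bound.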