X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-index.pod;h=0848e8604c8aa842a9fdcda6aec4026a45f9442b;hb=8e9d4f877730dbdf4ebbd59cbd73a7a921c640e0;hp=aeb1b3a39fbfaf9c4da3144c3f18b98014774895;hpb=0e68dbad3dc5e3fbc44e8ba8be576b81455d3359;p=public-inbox.git

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index aeb1b3a3..0848e860 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -6,6 +6,8 @@ public-inbox-index - create and update search indices
 
 public-inbox-index [OPTIONS] INBOX_DIR...
 
+public-inbox-index [OPTIONS] --all
+
 =head1 DESCRIPTION
 
 public-inbox-index creates and updates the search, overview and
@@ -34,11 +36,19 @@ normal search functionality.
 
 =item --jobs=JOBS, -j
 
-Control the number of Xapian indexing jobs in a
+Influences the number of Xapian indexing shards in a
 (L) inbox.
 
-C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+See L for a full description
+of sharding.
+
+C<--jobs=0> is accepted as of public-inbox 1.6.0
+to disable parallel indexing regardless of the number of
+pre-existing shards.
+
+If the inbox has not been indexed or initialized, C
+shards will be created (one job is always needed for indexing
+the overview and article number mapping).
 
 Default: the number of existing Xapian shards
 
@@ -77,6 +87,12 @@ This does not touch the NNTP article number database.
 
 It does not affect threading unless C<--rethread> is used.
 
+=item --all
+
+Index all inboxes configured in ~/.public-inbox/config.
+This is an alternative to specifying individual inboxes directories
+on the command-line.
+
 =item --rethread
 
 Regenerate internal THREADID and message thread associations
@@ -86,7 +102,7 @@ This fixes some bugs in older versions of public-inbox. While
 it is possible to use this without C<--reindex>, it makes little
 sense to do so.
 
-Available in public-inbox 1.6.0 (PENDING).
+Available in public-inbox 1.6.0+.
 
 =item --prune
 
@@ -111,24 +127,50 @@ Sets or overrides L on a
 per-invocation basis. See L
 below.
 
-Available in public-inbox 1.6.0 (PENDING).
+When using rotational storage but abundant RAM, using a large
+value (e.g. C<500m>) with C<--sequential-shard> can
+significantly speed up and reduce fragmentation during the
+initial index and full C<--reindex> invocations (but not
+incremental updates).
+
+Available in public-inbox 1.6.0+.
 
-=item --no-sync
+=item --no-fsync
 
 Disables L and L operations on SQLite
-and Xapian. This is only effective with Xapian 1.4+.
+and Xapian. This is only effective with Xapian 1.4+. This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>. Users of large C<--batch-size>
+may even find disabling L causes too much dirty
+data to accumulate, resulting on latency spikes from writeback.
+
+Available in public-inbox 1.6.0+.
+
+=item --sequential-shard
 
-Available in public-inbox 1.6.0 (PENDING).
+Sets or overrides L on a
+per-invocation basis. See L
+below.
+
+Available in public-inbox 1.6.0+.
+
+=item --skip-docdata
+
+Stop storing document data in Xapian on an existing inbox.
+
+See L for description and caveats.
+
+Available in public-inbox 1.6.0+.
 
 =back
 
 =head1 FILES
 
-For v1 (ssoma) repositories described in L.
+For v1 (ssoma) repositories described in L.
 All public-inbox-specific files are contained within the
 C<$GIT_DIR/public-inbox/> directory.
 
-v2 inboxes are described in L.
+v2 inboxes are described in L.
 
 =head1 CONFIGURATION
 
@@ -141,7 +183,11 @@ value. A single suffix modifier of C, C or C is
 supported, thus the value of C<1m> to prevents indexing of
 messages larger than one megabyte.
 
-This is useful for avoiding memory exhaustion in mirrors.
+This is useful for avoiding memory exhaustion in mirrors
+via git. It does not prevent L or
+L from importing (and indexing)
+a message.
+
 This option is only available in public-inbox 1.5 or later.
 
 Default: none
@@ -156,17 +202,52 @@ L, and L.
 
 Increase this value on powerful systems to improve throughput at
 the expense of memory use. The reduction of lock granularity
-may not be noticeable on fast systems.
-
-This option is available in public-inbox 1.6 or later.
-public-inbox 1.5 and earlier used the current default, C<1m>.
+may not be noticeable on fast systems. With SSDs, values above
+C<4m> have little benefit.
 
 For L inboxes, this value is multiplied by
 the number of Xapian shards. Thus a typical v2
-inbox with 3 shards will flush every 3 megabytes by default.
+inbox with 3 shards will flush every 3 megabytes by default
+unless parallelism is disabled via C<--sequential-shard>
+or C<--jobs=0>.
+
+This influences memory usage of Xapian, but it is not exact.
+The actual memory used by Xapian and Perl has been observed
+in excess of 10x this value.
+
+This option is available in public-inbox 1.6 or later.
+public-inbox 1.5 and earlier used the current default, C<1m>.
 
 Default: 1m (one megabyte)
 
+=item publicinbox.indexSequentialShard
+
+For L inboxes, setting this to C
+allows indexing Xapian shards in multiple passes. This speeds up
+indexing on rotational storage with high seek latency by allowing
+individual shards to fit into the kernel page cache.
+
+Using a higher-than-normal number of C<--jobs> with
+L may be required to ensure individual
+shards are small enough to fit into cache.
+
+Warning: interrupting C while this option
+is in use may leave the search indices out-of-date with respect
+to SQLite databases. WWW and IMAP users may notice incomplete
+search results, but it is otherwise non-fatal. Using C<--reindex>
+will bring everything back up-to-date.
+
+Available in public-inbox 1.6.0+.
+
+This is ignored on L inboxes.
+
+Default: false, shards are indexed in parallel
+
+=item publicinbox..indexSequentialShard
+
+Identical to L,
+but only affect the inbox matching EnameE.
+
 =back
 
 =head1 ENVIRONMENT
@@ -184,10 +265,13 @@ disk.
 This environment is handled directly by Xapian, refer to
 Xapian API documentation for more details. For public-inbox 1.6
 and later, use C
-instead. Setting C for a large C<--reindex>
-may cause L, L and
-L tasks to wait long periods of time
-during C<--reindex>.
+instead.
+
+Setting C or
+C for a large C<--reindex> may cause
+L, L and
+L tasks to wait long and unpredictable
+periods of time during C<--reindex>.
 
 Default: none, uses C