X-Git-Url: http://www.git.stargrave.org/?a=blobdiff_plain;f=Documentation%2Fpublic-inbox-index.pod;h=3ae3b0089275a5a60195cd169f12d65c6019fdbe;hb=c1e29b1a8ef1aeafc21c5524ae7e20b467627cf5;hp=ff2e5486748889667667de1ef0f86d0d6f390115;hpb=2ca7db34a51b858c9d7f6f7366afb9fffee86b6e;p=public-inbox.git

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index ff2e5486..3ae3b008 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -34,12 +34,16 @@ normal search functionality.

 =item --jobs=JOBS, -j

-Control the number of Xapian indexing jobs in a
+Influences the number of Xapian indexing shards in a
 (L) inbox.

 C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
 to disable parallel indexing.

+If the inbox has not been indexed or initialized, C
+shards will be created (one job is always needed for indexing
+the overview and article number mapping).
+
 Default: the number of existing Xapian shards

 =item --compact / -c
@@ -68,12 +72,25 @@ Xapian database. Using this with C<--compact> or running
 L afterwards is recommended to
 release free space.

-public-inbox protects writes to various indices with L,
-so it is safe to reindex while L,
-L or L run.
+public-inbox protects writes to various indices with
+L, so it is safe to reindex (and rethread) while
+L, L or
+L run.
+
+This does not touch the NNTP article number database.
+It does not affect threading unless C<--rethread> is
+used.
+
+=item --rethread

-This does not touch the NNTP article number database or
-affect threading.
+Regenerate internal THREADID and message thread associations
+when reindexing.
+
+This fixes some bugs in older versions of public-inbox. While
+it is possible to use this without C<--reindex>, it makes little
+sense to do so.
+
+Available in public-inbox 1.6.0 (PENDING).

 =item --prune

@@ -98,17 +115,37 @@ Sets or overrides L on a
 per-invocation basis. See L
 below.

+When using rotational storage but abundant RAM, using a large
+value (e.g. C<500m>) with C<--sequential-shard> can
+significantly speed up the initial index and full C<--reindex>
+invocations (but not incremental updates).
+
+Available in public-inbox 1.6.0 (PENDING).
+
+=item --no-fsync
+
+Disables L and L operations on SQLite
+and Xapian. This is only effective with Xapian 1.4+.
+
+Available in public-inbox 1.6.0 (PENDING).
+
+=item --sequential-shard
+
+Sets or overrides L on a
+per-invocation basis. See L
+below.
+
 Available in public-inbox 1.6.0 (PENDING).

 =back

 =head1 FILES

-For v1 (ssoma) repositories described in L.
+For v1 (ssoma) repositories described in L.
 All public-inbox-specific files are contained within the
 C<$GIT_DIR/public-inbox/> directory.

-v2 inboxes are described in L.
+v2 inboxes are described in L.

 =head1 CONFIGURATION

@@ -136,17 +173,52 @@ L, and L.

 Increase this value on powerful systems to improve throughput at
 the expense of memory use. The reduction of lock granularity
-may not be noticeable on fast systems.
-
-This option is available in public-inbox 1.6 or later.
-public-inbox 1.5 and earlier used the current default, C<1m>.
+may not be noticeable on fast systems. With SSDs, values above
+C<4m> have little benefit.

 For L inboxes, this value is multiplied by the
 number of Xapian shards. Thus a typical v2
-inbox with 3 shards will flush every 3 megabytes by default.
+inbox with 3 shards will flush every 3 megabytes by default
+unless parallelism is disabled via C<--sequential-shard>
+or C<--jobs=0>.
+
+This influences memory usage of Xapian, but it is not exact.
+The actual memory used by Xapian and Perl has been observed
+in excess of 10x this value.
+
+This option is available in public-inbox 1.6 or later.
+public-inbox 1.5 and earlier used the current default, C<1m>.

 Default: 1m (one megabyte)

+=item publicinbox.indexSequentialShard
+
+For L inboxes, setting this to C
+allows indexing Xapian shards in multiple passes.
This speeds up
+indexing on rotational storage with high seek latency by allowing
+individual shards to fit into the kernel page cache.
+
+Using a higher-than-normal number of C<--jobs> with
+L may be required to ensure individual
+shards are small enough to fit into cache.
+
+Warning: interrupting C while this option
+is in use may leave the search indices out-of-date with respect
+to SQLite databases. WWW and IMAP users may notice incomplete
+search results, but it is otherwise non-fatal. Using C<--reindex>
+will bring everything back up-to-date.
+
+Available in public-inbox 1.6.0 (PENDING).
+
+This is ignored on L inboxes.
+
+Default: false, shards are indexed in parallel
+
+=item publicinbox.<name>.indexSequentialShard
+
+Identical to L,
+but only affects the inbox matching E<lt>nameE<gt>.
+
 =back

 =head1 ENVIRONMENT
@@ -164,10 +236,13 @@ disk. This environment is handled directly by Xapian, refer to
 Xapian API documentation for more details.

 For public-inbox 1.6 and later, use C
-instead. Setting C for a large C<--reindex>
-may cause L, L and
-L tasks to wait long periods of time
-during C<--reindex>.
+instead.
+
+Setting C or
+C for a large C<--reindex> may cause
+L, L and
+L tasks to wait long and unpredictable
+periods of time during C<--reindex>.

 Default: none, uses C