public-inbox-index [OPTIONS] INBOX_DIR...
+public-inbox-index [OPTIONS] --all
+
=head1 DESCRIPTION
public-inbox-index creates and updates the search, overview and
=over
-=item --jobs=JOBS, -j
+=item -j JOBS
+
+=item --jobs=JOBS
-Control the number of Xapian indexing jobs in a
+Influences the number of Xapian indexing shards in a
(L<public-inbox-v2-format(5)>) inbox.
-C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+See L<public-inbox-init(1)/--jobs> for a full description
+of sharding.
+
+C<--jobs=0> is accepted as of public-inbox 1.6.0
+to disable parallel indexing regardless of the number of
+pre-existing shards.
+
+If the inbox has not been indexed or initialized, C<JOBS - 1>
+shards will be created (one job is always needed for indexing
+the overview and article number mapping).
Default: the number of existing Xapian shards
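The shard-count rule for a fresh v2 inbox is simple arithmetic. The sketch below models only the rule stated above: one job is reserved for the overview and article number mapping, and the rest become Xapian shards. The helper name `new_inbox_shard_count` is hypothetical and not part of public-inbox. The sketch deliberately refuses `--jobs` values below 2, because the text above does not say how those map to shards on an uninitialized inbox.

```python
def new_inbox_shard_count(jobs: int) -> int:
    """Hypothetical helper: Xapian shards a fresh v2 inbox would get
    for a given --jobs value, following the JOBS - 1 rule above."""
    if jobs < 2:
        # Assumption: --jobs=0 and --jobs=1 are not modeled here.
        raise ValueError("this sketch only models --jobs >= 2")
    # One job is always reserved for the overview and article number mapping.
    return jobs - 1


for jobs in (2, 4, 8):
    print(f"--jobs={jobs} -> {new_inbox_shard_count(jobs)} shard(s)")
```

On an inbox that already has shards, the default is simply the existing shard count, so this calculation only matters for the initial index.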
-=item --compact / -c
+=item -c
+
+=item --compact
Compacts the Xapian DBs after indexing. This is recommended
when using C<--reindex> to avoid running out of disk space
It does not affect threading unless C<--rethread> is
used.
+=item --all
+
+Index all inboxes configured in C<~/.public-inbox/config>.
+This is an alternative to specifying individual inbox directories
+on the command-line.
+
=item --rethread
Regenerate internal THREADID and message thread associations
it is possible to use this without C<--reindex>, it makes little
sense to do so.
-Available in public-inbox 1.6.0 (PENDING).
+Available in public-inbox 1.6.0+.
=item --prune
per-invocation basis. See L</publicinbox.indexBatchSize>
below.
-Available in public-inbox 1.6.0 (PENDING).
+When using rotational storage but abundant RAM, using a large
+value (e.g. C<500m>) with C<--sequential-shard> can
+significantly speed up and reduce fragmentation during the
+initial index and full C<--reindex> invocations (but not
+incremental updates).
+
+Available in public-inbox 1.6.0+.
+
+=item --no-fsync
+
+Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
+and Xapian. This is only effective with Xapian 1.4+. This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>. Users of a large C<--batch-size>
+may even find that disabling L<fdatasync(2)> causes too much dirty
+data to accumulate, resulting in latency spikes from writeback.
+
+Available in public-inbox 1.6.0+.
+
+=item --dangerous
+
+Speeds up the initial index by using in-place updates and denying support
+for concurrent readers. This is only effective with Xapian 1.4+.
+
+Available in public-inbox 1.8.0+.
+
+=item --sequential-shard
+
+Sets or overrides L</publicinbox.indexSequentialShard> on a
+per-invocation basis. See L</publicinbox.indexSequentialShard>
+below.
+
+Available in public-inbox 1.6.0+.
+
+=item --skip-docdata
+
+Stops storing document data in Xapian on an existing inbox.
+
+See L<public-inbox-init(1)/--skip-docdata> for description and caveats.
+
+Available in public-inbox 1.6.0+.
+
+=item -E EXTINDEX
+
+=item --update-extindex=EXTINDEX
+
+Update the given external index (L<public-inbox-extindex-format(5)>).
+Either the configured section name (e.g. C<all>) or a directory name
+may be specified.
+
+Defaults to C<all> if C<[extindex "all"]> is configured,
+otherwise no external indices are updated.
+
+May be specified multiple times in rare cases where multiple
+external indices are configured.
+
+=item --no-update-extindex
+
+Do not update the C<all> external index by default. This negates
+all uses of C<-E> / C<--update-extindex=> on the command-line.
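The selection rules for C<-E>, C<--update-extindex=> and C<--no-update-extindex> can be summarized as a small decision procedure. The sketch below is an illustration of the rules described above, not public-inbox's own code. The function name and the dictionary standing in for the C<[extindex "..."]> configuration are hypothetical, and it assumes an explicit C<-E> replaces (rather than adds to) the default C<all>.

```python
from typing import Dict, List


def extindexes_to_update(configured: Dict[str, str], cli_values: List[str],
                         no_update: bool) -> List[str]:
    """Hypothetical model of which external indices get updated.

    configured maps [extindex "NAME"] section names to directories;
    cli_values holds each -E / --update-extindex= value in order.
    """
    if no_update:
        # --no-update-extindex negates the default and every -E given.
        return []
    if cli_values:
        # Each value may be a configured section name or a directory name.
        # Unknown names are passed through unchanged (no validation here).
        return [configured.get(v, v) for v in cli_values]
    if "all" in configured:
        # Default: update "all" only when [extindex "all"] is configured.
        return [configured["all"]]
    return []


cfg = {"all": "/srv/extindex/all"}
print(extindexes_to_update(cfg, [], no_update=False))
print(extindexes_to_update(cfg, ["all"], no_update=True))
```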
+
+=item --since=DATESTRING
+
+=item --after=DATESTRING
+
+=item --until=DATESTRING
+
+=item --before=DATESTRING
+
+Passed directly to L<git-log(1)> to limit changes for C<--reindex>.
=back
=head1 FILES
-For v1 (ssoma) repositories described in L<public-inbox-v1-format>.
+For v1 (ssoma) repositories described in L<public-inbox-v1-format(5)>.
All public-inbox-specific files are contained within the
C<$GIT_DIR/public-inbox/> directory.
-v2 inboxes are described in L<public-inbox-v2-format>.
+v2 inboxes are described in L<public-inbox-v2-format(5)>.
=head1 CONFIGURATION
supported, thus the value of C<1m> to prevents indexing of
messages larger than one megabyte.
-This is useful for avoiding memory exhaustion in mirrors.
+This is useful for avoiding memory exhaustion in mirrors
+via git. It does not prevent L<public-inbox-mda(1)> or
+L<public-inbox-watch(1)> from importing (and indexing)
+a message.
+
This option is only available in public-inbox 1.5 or later.
Default: none
Increase this value on powerful systems to improve throughput at
the expense of memory use. The reduction of lock granularity
-may not be noticeable on fast systems.
-
-This option is available in public-inbox 1.6 or later.
-public-inbox 1.5 and earlier used the current default, C<1m>.
+may not be noticeable on fast systems. With SSDs, values above
+C<4m> have little benefit.
For L<public-inbox-v2-format(5)> inboxes, this value is
multiplied by the number of Xapian shards. Thus a typical v2
-inbox with 3 shards will flush every 3 megabytes by default.
+inbox with 3 shards will flush every 3 megabytes by default
+unless parallelism is disabled via C<--sequential-shard>
+or C<--jobs=0>.
+
+This influences memory usage of Xapian, but it is not exact.
+The actual memory used by Xapian and Perl has been observed
+in excess of 10x this value.
+
+This option is available in public-inbox 1.6 or later.
+public-inbox 1.5 and earlier used the current default, C<1m>.
Default: 1m (one megabyte)
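The shard multiplication described above is easy to check by hand. The sketch below computes the approximate amount of message data indexed between flushes. It assumes git-config style size suffixes where C<k>, C<m> and C<g> are powers of 1024, and the helper names are hypothetical. The result is not a memory estimate: as noted above, actual Xapian and Perl memory use has been observed well in excess of this value.

```python
import re

# Assumption: git-config style suffixes, k/m/g as powers of 1024.
_SUFFIX = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Parse a size such as "1m" or "500m" into bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unparseable size: {value!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2)]


def flush_threshold(batch_size: str, shards: int, parallel: bool = True) -> int:
    """Approximate bytes indexed between flushes for a v2 inbox.

    parallel=False models --sequential-shard or --jobs=0, where the
    batch size is not multiplied by the shard count.
    """
    base = parse_size(batch_size)
    return base * shards if parallel else base


print(flush_threshold("1m", 3))                  # 3 shards in parallel
print(flush_threshold("1m", 3, parallel=False))  # parallelism disabled
```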
+=item publicinbox.indexSequentialShard
+
+For L<public-inbox-v2-format(5)> inboxes, setting this to C<true>
+allows indexing Xapian shards in multiple passes. This speeds up
+indexing on rotational storage with high seek latency by allowing
+individual shards to fit into the kernel page cache.
+
+Using a higher-than-normal number of C<--jobs> with
+L<public-inbox-init(1)> may be required to ensure individual
+shards are small enough to fit into cache.
+
+Warning: interrupting C<public-inbox-index> while this option
+is in use may leave the search indices out-of-date with respect
+to the SQLite databases. WWW and IMAP users may notice incomplete
+search results, but it is otherwise non-fatal. Using C<--reindex>
+will bring everything back up-to-date.
+
+Available in public-inbox 1.6.0+.
+
+This is ignored on L<public-inbox-v1-format(5)> inboxes.
+
+Default: false, shards are indexed in parallel
+
+=item publicinbox.<name>.indexSequentialShard
+
+Identical to L</publicinbox.indexSequentialShard>,
+but only affects the inbox matching E<lt>nameE<gt>.
+
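The interaction between the global key, the per-inbox key and C<--sequential-shard> can be sketched as a lookup with precedence. This is a hedged illustration, not public-inbox's implementation. It assumes the command-line flag wins, then the per-inbox key, then the global key, and it uses a plain dictionary in place of parsed git-config output. The function names are hypothetical, and v1 inboxes (where the setting is ignored) are not modeled.

```python
from typing import Dict

# git-config boolean spellings (an empty value, which git treats as true,
# is not modeled here).
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def _git_bool(value: str) -> bool:
    v = value.strip().lower()
    if v in _TRUE:
        return True
    if v in _FALSE:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def use_sequential_shard(config: Dict[str, str], inbox_name: str,
                         cli_flag: bool = False) -> bool:
    """Hypothetical resolution of indexSequentialShard for one v2 inbox."""
    if cli_flag:
        # --sequential-shard sets or overrides the configured value.
        return True
    per_inbox = config.get(f"publicinbox.{inbox_name}.indexSequentialShard")
    if per_inbox is not None:
        # Assumption: the per-inbox key takes precedence over the global key.
        return _git_bool(per_inbox)
    # Default: false, shards are indexed in parallel.
    return _git_bool(config.get("publicinbox.indexSequentialShard", "false"))


cfg = {
    "publicinbox.indexSequentialShard": "true",
    "publicinbox.meta.indexSequentialShard": "false",
}
print(use_sequential_shard(cfg, "meta"), use_sequential_shard(cfg, "other"))
```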
=back
=head1 ENVIRONMENT
Xapian API documentation for more details.
For public-inbox 1.6 and later, use C<publicinbox.indexBatchSize>
-instead. Setting C<XAPIAN_FLUSH_THRESHOLD> for a large C<--reindex>
-may cause L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
-L<public-inbox-watch(1)> tasks to wait long periods of time
-during C<--reindex>.
+instead.
+
+Setting C<XAPIAN_FLUSH_THRESHOLD> or
+C<publicinbox.indexBatchSize> for a large C<--reindex> may cause
+L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
+L<public-inbox-watch(1)> tasks to wait long and unpredictable
+periods of time during C<--reindex>.
Default: none, uses C<publicinbox.indexBatchSize>
Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
-The mail archives are hosted at L<https://public-inbox.org/meta/>
-and L<http://hjrcffqmbrq6wope.onion/meta/>
+The mail archives are hosted at L<https://public-inbox.org/meta/> and
+L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
=head1 COPYRIGHT
-Copyright 2016-2020 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
=head1 SEE ALSO
-L<Search::Xapian>, L<DBD::SQLite>
+L<Search::Xapian>, L<DBD::SQLite>, L<public-inbox-extindex-format(5)>