=head1 NAME

public-inbox-index - create and update search indices

=head1 SYNOPSIS

public-inbox-index [OPTIONS] INBOX_DIR...

public-inbox-index [OPTIONS] --all

=head1 DESCRIPTION

public-inbox-index creates and updates the search, overview and
NNTP article number databases used by the read-only public-inbox
HTTP and NNTP interfaces.  Currently, this requires the
L<DBD::SQLite> and L<DBI> Perl modules.  L<Search::Xapian>
is optional and only needed to support the PSGI search interface.

Once the initial indices are created by public-inbox-index,
L<public-inbox-mda(1)> and L<public-inbox-watch(1)> will
automatically maintain them.

Running this manually to update indices is only required when
relying on L<git-fetch(1)> to mirror an existing public-inbox,
or when upgrading to a new version of public-inbox using
the C<--reindex> option.
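
The mirror case above can be sketched as a small shell helper.  This
is a minimal illustration, not part of public-inbox: the function name
C<update_mirror> is hypothetical, and it assumes a v1 inbox whose
INBOX_DIR is a bare git repository already set up as a mirror, so that
a plain fetch brings in new messages.

```shell
# Fetch new messages into a v1 inbox mirror (assumed to be a bare
# git repository), then index whatever the fetch brought in.
# Indexing is skipped if the fetch fails.
update_mirror() {
	git --git-dir="$1" fetch && public-inbox-index "$1"
}
```

Calling C<update_mirror /path/to/inbox.git> from a cron job would keep
the indices of such a mirror current; the path is a placeholder.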

Having the overview and article number databases is essential to
running the NNTP interface, and strongly recommended for the
HTTP interface as it provides thread grouping in addition to
normal search functionality.

=head1 OPTIONS

=over

=item --jobs=JOBS

Influences the number of Xapian indexing shards in a
v2 (L<public-inbox-v2-format(5)>) inbox.

C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
to disable parallel indexing.

If the inbox has not been indexed or initialized, C<JOBS - 1>
shards will be created (one job is always needed for indexing
the overview and article number mapping).

Default: the number of existing Xapian shards

=item --compact

Compacts the Xapian DBs after indexing.  This is recommended
when using C<--reindex> to avoid running out of disk space
while indexing multiple inboxes.

While this option takes a negligible amount of time compared to
C<--reindex>, it requires temporarily duplicating the entire
contents of the Xapian DB.

This switch may be specified twice, in which case compaction
happens both before and after indexing to minimize the temporal
footprint of the (re)indexing operation.

Available since public-inbox 1.4.0.

=item --reindex

Forces a re-index of all messages in the inbox.
This can be used for in-place upgrades and bugfixes while
NNTP/HTTP server processes are using the index.  Keep in
mind this roughly doubles the size of the already-large
Xapian database.  Using this with C<--compact> or running
L<public-inbox-compact(1)> afterwards is recommended to
release free space.

public-inbox protects writes to various indices with
L<flock(2)>, so it is safe to reindex (and rethread) while
L<public-inbox-watch(1)>, L<public-inbox-mda(1)> or
L<public-inbox-learn(1)> run.

This does not touch the NNTP article number database.
It does not affect threading unless C<--rethread> is
used.

=item --all

Index all inboxes configured in ~/.public-inbox/config.
This is an alternative to specifying individual inbox directories
on the command line.

=item --rethread

Regenerate internal THREADID and message thread associations
when reindexing.

This fixes some bugs in older versions of public-inbox.  While
it is possible to use this without C<--reindex>, it makes little
sense to do so.

Available in public-inbox 1.6.0 (PENDING).

=item --prune

Run L<git-gc(1)> to prune and expire reflogs if discontiguous history
is detected.  This is intended to be used in mirrors after running
L<public-inbox-edit(1)> or L<public-inbox-purge(1)> to ensure data
is expunged from mirrors.

Available since public-inbox 1.2.0.

=item --max-size SIZE

Sets or overrides L</publicinbox.indexMaxSize> on a
per-invocation basis.  See L</publicinbox.indexMaxSize>
below.

Available since public-inbox 1.5.0.

=item --batch-size SIZE

Sets or overrides L</publicinbox.indexBatchSize> on a
per-invocation basis.  See L</publicinbox.indexBatchSize>
below.

With rotational storage and abundant RAM, using a large
value (e.g. C<500m>) with C<--sequential-shard> can
significantly speed up the initial index and full C<--reindex>
invocations (but not incremental updates).

Available in public-inbox 1.6.0 (PENDING).

=item --no-fsync

Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
and Xapian.  This is only effective with Xapian 1.4+.

Available in public-inbox 1.6.0 (PENDING).

=item --sequential-shard

Sets or overrides L</publicinbox.indexSequentialShard> on a
per-invocation basis.  See L</publicinbox.indexSequentialShard>
below.

Available in public-inbox 1.6.0 (PENDING).

=back
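
Two of the option combinations recommended above can be sketched as
shell helpers.  The function names are hypothetical and exist only for
illustration; the switches and the C<500m> value are the ones named in
the C<--reindex>, C<--compact>, C<--batch-size> and
C<--sequential-shard> descriptions.

```shell
# Full reindex of one inbox, compacting the Xapian DBs afterwards
# to release the space that --reindex roughly doubles.
reindex_and_compact() {
	public-inbox-index --reindex --compact "$1"
}

# Initial index on rotational storage with plenty of RAM: a large
# batch size combined with indexing one shard at a time.  Extra
# arguments (e.g. --reindex and the inbox directory) pass through.
index_on_rotational_storage() {
	public-inbox-index --batch-size 500m --sequential-shard "$@"
}
```

For example, C<index_on_rotational_storage --reindex /path/to/inbox>
would run a full reindex with both settings; the path is a placeholder.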

=head1 FILES

For v1 (ssoma) repositories described in L<public-inbox-v1-format(5)>,
all public-inbox-specific files are contained within the
C<$GIT_DIR/public-inbox/> directory.

v2 inboxes are described in L<public-inbox-v2-format(5)>.

=head1 CONFIGURATION

=over

=item publicinbox.indexMaxSize

Prevents indexing of messages larger than the specified size
value.  A single suffix modifier of C<k>, C<m> or C<g> is
supported, thus a value of C<1m> prevents indexing of
messages larger than one megabyte.

This is useful for avoiding memory exhaustion in mirrors
updated via git.  It does not prevent L<public-inbox-mda(1)> or
L<public-inbox-watch(1)> from importing (and indexing)
a message.

This option is only available in public-inbox 1.5 or later.

=item publicinbox.indexBatchSize

Flushes changes to the filesystem and releases locks after
indexing the given number of bytes.  The default value of C<1m>
(one megabyte) is low to minimize memory use and reduce
contention with parallel invocations of L<public-inbox-mda(1)>,
L<public-inbox-learn(1)>, and L<public-inbox-watch(1)>.

Increase this value on powerful systems to improve throughput at
the expense of memory use.  The reduction of lock granularity
may not be noticeable on fast systems.  With SSDs, values above
C<4m> have little benefit.

For L<public-inbox-v2-format(5)> inboxes, this value is
multiplied by the number of Xapian shards.  Thus a typical v2
inbox with 3 shards will flush every 3 megabytes by default
unless parallelism is disabled via C<--sequential-shard>
or C<--jobs=0>.

This influences memory usage of Xapian, but it is not exact.
The actual memory used by Xapian and Perl has been observed
in excess of 10x this value.

This option is available in public-inbox 1.6 or later.
public-inbox 1.5 and earlier used the current default, C<1m>.

Default: 1m (one megabyte)

=item publicinbox.indexSequentialShard

For L<public-inbox-v2-format(5)> inboxes, setting this to C<true>
allows indexing Xapian shards in multiple passes.  This speeds up
indexing on rotational storage with high seek latency by allowing
individual shards to fit into the kernel page cache.

Using a higher-than-normal number of C<--jobs> with
L<public-inbox-init(1)> may be required to ensure individual
shards are small enough to fit into cache.

Warning: interrupting C<public-inbox-index(1)> while this option
is in use may leave the search indices out-of-date with respect
to SQLite databases.  WWW and IMAP users may notice incomplete
search results, but it is otherwise non-fatal.  Using C<--reindex>
will bring everything back up-to-date.

Available in public-inbox 1.6.0 (PENDING).

This is ignored on L<public-inbox-v1-format(5)> inboxes.

Default: false, shards are indexed in parallel

=item publicinbox.<name>.indexSequentialShard

Identical to L</publicinbox.indexSequentialShard>,
but only affects the inbox matching E<lt>nameE<gt>.

=back
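
As a sketch of how the settings above might be written, the helper
below uses L<git-config(1)>, on the assumption that the public-inbox
config file uses the git-config file format.  The function name
C<configure_indexing>, the inbox name C<example> and the chosen values
are illustrative assumptions, not recommendations.

```shell
# Write example indexing settings to the given public-inbox config
# file.  "example" is a placeholder inbox name.
configure_indexing() {
	cfg=$1
	# Skip indexing of messages larger than one megabyte.
	git config -f "$cfg" publicinbox.indexMaxSize 1m &&
	# Flush after 4m instead of the default 1m; for v2 inboxes
	# this value is multiplied by the number of Xapian shards.
	git config -f "$cfg" publicinbox.indexBatchSize 4m &&
	# Index shards one at a time, but only for inbox "example".
	git config -f "$cfg" publicinbox.example.indexSequentialShard true
}
```

A typical call would be C<configure_indexing ~/.public-inbox/config>,
after which C<git config -f ~/.public-inbox/config -l> shows the result.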

=head1 ENVIRONMENT

=over

=item PI_CONFIG

Used to override the default "~/.public-inbox/config" value.

=item XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to
disk.  This environment variable is handled directly by Xapian;
refer to the Xapian API documentation for more details.

For public-inbox 1.6 and later, use C<publicinbox.indexBatchSize>
instead.

Setting C<XAPIAN_FLUSH_THRESHOLD> or
C<publicinbox.indexBatchSize> for a large C<--reindex> may cause
L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
L<public-inbox-watch(1)> tasks to wait long and unpredictable
periods of time during C<--reindex>.

Default: none, uses C<publicinbox.indexBatchSize>

=back
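
Overriding the config file location for a single run can be sketched
as follows.  This assumes the override variable is C<PI_CONFIG>, the
name public-inbox uses for this purpose; the function name
C<index_all_from> is hypothetical.

```shell
# Index every inbox listed in an alternate config file.  The
# variable is set for this one invocation only.
index_all_from() {
	PI_CONFIG=$1 public-inbox-index --all
}
```

For example, C<index_all_from /path/to/alternate/config> leaves the
default "~/.public-inbox/config" untouched; the path is a placeholder.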

=head1 UPGRADING

Occasionally, public-inbox will update its schema version and
require a full index by running this command.

=head1 CONTACT

Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>

The mail archives are hosted at L<https://public-inbox.org/meta/>
and L<http://hjrcffqmbrq6wope.onion/meta/>

=head1 COPYRIGHT

Copyright 2016-2020 all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>

=head1 SEE ALSO

L<Search::Xapian>, L<DBD::SQLite>