public-inbox-index - create and update search indices

public-inbox-index [OPTIONS] INBOX_DIR...

public-inbox-index [OPTIONS] --all

public-inbox-index creates and updates the search, overview and
NNTP article number databases used by the read-only public-inbox
HTTP and NNTP interfaces.  Currently, this requires the
L<DBD::SQLite> and L<DBI> Perl modules.  L<Search::Xapian>
is optional and only needed to support the PSGI search interface.

Once the initial indices are created by public-inbox-index,
L<public-inbox-mda(1)> and L<public-inbox-watch(1)> will
automatically maintain them.

Running this manually to update indices is only required if
relying on L<git-fetch(1)> to mirror an existing public-inbox,
or if upgrading to a new version of public-inbox using
the C<--reindex> option.
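
As a minimal sketch of the mirroring case above, the following fetches
new messages and then updates the indices.  The path in C<INBOX_DIR> is
hypothetical, a v1 (single git repository) inbox is assumed, and the
C<RUN=echo> dry-run switch is only a convenience of this example, not
part of public-inbox.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN to the empty string to execute them for real.
RUN=echo

# Hypothetical location of a v1 inbox mirrored with git-fetch(1).
INBOX_DIR=/path/to/mirror.git

# Fetch new messages, then bring the indices up to date.
$RUN git --git-dir="$INBOX_DIR" fetch &&
	$RUN public-inbox-index "$INBOX_DIR"
```

Both steps could be placed in a periodic job so that the mirror and its
indices stay in sync.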
Having the overview and article number database is essential to
running the NNTP interface, and strongly recommended for the
HTTP interface as it provides thread grouping in addition to
normal search functionality.

Influences the number of Xapian indexing shards in a
v2 (L<public-inbox-v2-format(5)>) inbox.

See L<public-inbox-init(1)/--jobs> for a full description

C<--jobs=0> is accepted as of public-inbox 1.6.0
to disable parallel indexing regardless of the number of

If the inbox has not been indexed or initialized, C<JOBS - 1>
shards will be created (one job is always needed for indexing
the overview and article number mapping).

Default: the number of existing Xapian shards
Compacts the Xapian DBs after indexing.  This is recommended
when using C<--reindex> to avoid running out of disk space
while indexing multiple inboxes.

While this option takes a negligible amount of time compared to
C<--reindex>, it requires temporarily duplicating the entire
contents of the Xapian DB.

This switch may be specified twice, in which case compaction
happens both before and after indexing to minimize the temporal
footprint of the (re)indexing operation.

Available since public-inbox 1.4.0.
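
The combination described above might be invoked as in the following
sketch.  C<INBOX_DIR> is a hypothetical path and the C<RUN=echo>
dry-run switch belongs to this example only.

```shell
#!/bin/sh
RUN=echo                   # dry run; set RUN to the empty string to execute
INBOX_DIR=/path/to/inbox   # hypothetical inbox location

# Reindex, then compact the Xapian DBs once indexing is done.
$RUN public-inbox-index --reindex --compact "$INBOX_DIR"

# Specifying the switch twice compacts both before and after indexing.
$RUN public-inbox-index --reindex --compact --compact "$INBOX_DIR"
```

Since compaction temporarily duplicates the Xapian DB, checking free
disk space before running either form is prudent.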
Forces a re-index of all messages in the inbox.
This can be used for in-place upgrades and bugfixes while
NNTP/HTTP server processes are using the index.  Keep in
mind this roughly doubles the size of the already-large
Xapian database.  Using this with C<--compact> or running
L<public-inbox-compact(1)> afterwards is recommended to

public-inbox protects writes to various indices with
L<flock(2)>, so it is safe to reindex (and rethread) while
L<public-inbox-watch(1)>, L<public-inbox-mda(1)> or
L<public-inbox-learn(1)> run.

This does not touch the NNTP article number database.
It does not affect threading unless C<--rethread> is

Index all inboxes configured in ~/.public-inbox/config.
This is an alternative to specifying individual inbox directories

Regenerate internal THREADID and message thread associations

This fixes some bugs in older versions of public-inbox.  While
it is possible to use this without C<--reindex>, it makes little

Available in public-inbox 1.6.0+.

Run L<git-gc(1)> to prune and expire reflogs if discontiguous history
is detected.  This is intended to be used in mirrors after running
L<public-inbox-edit(1)> or L<public-inbox-purge(1)> to ensure data
is expunged from mirrors.

Available since public-inbox 1.2.0.
=item --max-size SIZE

Sets or overrides L</publicinbox.indexMaxSize> on a
per-invocation basis.  See L</publicinbox.indexMaxSize>

Available since public-inbox 1.5.0.

=item --batch-size SIZE

Sets or overrides L</publicinbox.indexBatchSize> on a
per-invocation basis.  See L</publicinbox.indexBatchSize>

When using rotational storage but abundant RAM, using a large
value (e.g. C<500m>) with C<--sequential-shard> can
significantly speed up indexing and reduce fragmentation during
the initial index and full C<--reindex> invocations (but not
incremental updates).

Available in public-inbox 1.6.0+.
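
For the rotational-storage case above, an invocation might look like
the following sketch.  C<INBOX_DIR> is a hypothetical path, C<500m> is
simply the example value from the text, and the C<RUN=echo> dry-run
switch belongs to this example only.

```shell
#!/bin/sh
RUN=echo                   # dry run; set RUN to the empty string to execute
INBOX_DIR=/path/to/inbox   # hypothetical v2 inbox on rotational storage

# Initial index with a large batch size, one shard at a time.
$RUN public-inbox-index --batch-size=500m --sequential-shard "$INBOX_DIR"

# The same settings applied to a full reindex.
$RUN public-inbox-index --reindex --batch-size=500m --sequential-shard "$INBOX_DIR"
```

Incremental updates are not expected to benefit, so later runs can
drop both options.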
Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
and Xapian.  This is only effective with Xapian 1.4+.  This is
primarily intended for systems with low RAM and the small
(default) C<--batch-size=1m>.  Users of large C<--batch-size>
may even find disabling L<fdatasync(2)> causes too much dirty
data to accumulate, resulting in latency spikes from writeback.

Available in public-inbox 1.6.0+.

=item --sequential-shard

Sets or overrides L</publicinbox.indexSequentialShard> on a
per-invocation basis.  See L</publicinbox.indexSequentialShard>

Available in public-inbox 1.6.0+.

Stop storing document data in Xapian on an existing inbox.

See L<public-inbox-init(1)/--skip-docdata> for description and caveats.

Available in public-inbox 1.6.0+.

=item --update-extindex=EXTINDEX

Update the given external index (L<public-inbox-extindex-format(5)>).
Either the configured section name (e.g. C<all>) or a directory name

Defaults to C<all> if C<[extindex "all"]> is configured,
otherwise no external indices are updated.

May be specified multiple times in rare cases where multiple
external indices are configured.

=item --no-update-extindex

Do not update the C<all> external index by default.  This negates
all uses of C<-E> / C<--update-extindex=> on the command-line.
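
The two external index switches might be used as in the following
sketch.  C<INBOX_DIR> is a hypothetical path, the C<all> section is
assumed to be configured, and the C<RUN=echo> dry-run switch belongs to
this example only.

```shell
#!/bin/sh
RUN=echo                   # dry run; set RUN to the empty string to execute
INBOX_DIR=/path/to/inbox   # hypothetical inbox location

# Update the inbox and the external index configured as [extindex "all"].
$RUN public-inbox-index --update-extindex=all "$INBOX_DIR"

# Update only the inbox itself, leaving external indices untouched.
$RUN public-inbox-index --no-update-extindex "$INBOX_DIR"
```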
=item --since=DATESTRING

=item --after=DATESTRING

=item --until=DATESTRING

=item --before=DATESTRING

Passed directly to L<git-log(1)> to limit changes for C<--reindex>

For v1 (ssoma) repositories described in L<public-inbox-v1-format(5)>.
All public-inbox-specific files are contained within the
C<$GIT_DIR/public-inbox/> directory.

v2 inboxes are described in L<public-inbox-v2-format(5)>.

=item publicinbox.indexMaxSize

Prevents indexing of messages larger than the specified size
value.  A single suffix modifier of C<k>, C<m> or C<g> is
supported, thus a value of C<1m> prevents indexing of
messages larger than one megabyte.

This is useful for avoiding memory exhaustion in mirrors
via git.  It does not prevent L<public-inbox-mda(1)> or
L<public-inbox-watch(1)> from importing (and indexing)

This option is only available in public-inbox 1.5 or later.
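
To make the suffix modifiers concrete, the following sketch expands a
size value into bytes.  C<size_to_bytes> is a hypothetical helper
written for this example, not part of public-inbox, and it assumes the
suffixes are powers of 1024 as with L<git-config(1)> integer values.
It does not validate its input.

```shell
#!/bin/sh
# size_to_bytes SIZE: expand an optional k, m or g suffix into bytes.
# Assumption: each suffix is a power of 1024 (git-config convention).
size_to_bytes() {
	case $1 in
	*k) echo $(( ${1%?} * 1024 )) ;;
	*m) echo $(( ${1%?} * 1024 * 1024 )) ;;
	*g) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
	*) echo "$1" ;;	# no suffix: already a byte count
	esac
}

size_to_bytes 1m	# prints 1048576
size_to_bytes 512k	# prints 524288
```

Under that assumption, C<--max-size 1m> would skip indexing any
message larger than 1048576 bytes.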
=item publicinbox.indexBatchSize

Flushes changes to the filesystem and releases locks after
indexing the given number of bytes.  The default value of C<1m>
(one megabyte) is low to minimize memory use and reduce
contention with parallel invocations of L<public-inbox-mda(1)>,
L<public-inbox-learn(1)>, and L<public-inbox-watch(1)>.

Increase this value on powerful systems to improve throughput at
the expense of memory use.  The reduction of lock granularity
may not be noticeable on fast systems.  With SSDs, values above
C<4m> have little benefit.

For L<public-inbox-v2-format(5)> inboxes, this value is
multiplied by the number of Xapian shards.  Thus a typical v2
inbox with 3 shards will flush every 3 megabytes by default
unless parallelism is disabled via C<--sequential-shard>

This influences memory usage of Xapian, but it is not exact.
The actual memory used by Xapian and Perl has been observed
in excess of 10x this value.

This option is available in public-inbox 1.6 or later.
public-inbox 1.5 and earlier used the current default, C<1m>.

Default: 1m (one megabyte)
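
The shard multiplication described above can be worked through with
shell arithmetic.  The variable names are made up for this sketch; the
values are the default batch size and the 3-shard example from the
text.

```shell
#!/bin/sh
batch_mb=1	# default publicinbox.indexBatchSize of 1m, in megabytes
shards=3	# the typical v2 inbox from the text above

# With parallel shards, a flush happens after batch size times shards.
echo "$(( batch_mb * shards )) MB indexed between flushes"
```

Raising the batch size to C<4m> on the same inbox would give 12
megabytes between flushes by the same calculation.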
=item publicinbox.indexSequentialShard

For L<public-inbox-v2-format(5)> inboxes, setting this to C<true>
allows indexing Xapian shards in multiple passes.  This speeds up
indexing on rotational storage with high seek latency by allowing
individual shards to fit into the kernel page cache.

Using a higher-than-normal number of C<--jobs> with
L<public-inbox-init(1)> may be required to ensure individual
shards are small enough to fit into cache.

Warning: interrupting C<public-inbox-index(1)> while this option
is in use may leave the search indices out-of-date with respect
to SQLite databases.  WWW and IMAP users may notice incomplete
search results, but it is otherwise non-fatal.  Using C<--reindex>
will bring everything back up-to-date.

Available in public-inbox 1.6.0+.

This is ignored on L<public-inbox-v1-format(5)> inboxes.

Default: false, shards are indexed in parallel

=item publicinbox.<name>.indexSequentialShard

Identical to L</publicinbox.indexSequentialShard>,
but only affects the inbox matching E<lt>nameE<gt>.
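
As an illustration of how the global and per-inbox settings relate, the
following sketch writes a config fragment to a temporary file and prints
it.  The inbox name C<example> and the chosen values are hypothetical,
and a real inbox section would carry its other settings as well.

```shell
#!/bin/sh
# Temporary stand-in for the real config file.
cfg=$(mktemp)

cat >"$cfg" <<'EOF'
[publicinbox]
    indexMaxSize = 1m
    indexBatchSize = 4m
    indexSequentialShard = false
[publicinbox "example"]
    indexSequentialShard = true
EOF

# Show the fragment; remove "$cfg" once it is no longer needed.
cat "$cfg"
```

Here only the inbox named C<example> would index its shards
sequentially, while all other inboxes keep the parallel default.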
Used to override the default "~/.public-inbox/config" value.

=item XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to
disk.  This environment variable is handled directly by Xapian;
refer to the Xapian API documentation for more details.

For public-inbox 1.6 and later, use C<publicinbox.indexBatchSize>

Setting C<XAPIAN_FLUSH_THRESHOLD> or
C<publicinbox.indexBatchSize> for a large C<--reindex> may cause
L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
L<public-inbox-watch(1)> tasks to wait for long and unpredictable
periods of time during C<--reindex>.

Default: none, uses C<publicinbox.indexBatchSize>

Occasionally, public-inbox will update its schema version and
require a full index by running this command.

Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>

The mail archives are hosted at L<https://public-inbox.org/meta/> and
L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

Copyright 2016-2021 all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>

L<Search::Xapian>, L<DBD::SQLite>, L<public-inbox-extindex-format(5)>