=head1 NAME

public-inbox-index - create and update search indices

=head1 SYNOPSIS

public-inbox-index [OPTIONS] INBOX_DIR...

=head1 DESCRIPTION

public-inbox-index creates and updates the search, overview and
NNTP article number database used by the read-only public-inbox
HTTP and NNTP interfaces. Currently, this requires the
L<DBD::SQLite> and L<DBI> Perl modules. L<Search::Xapian>
is optional, only to support the PSGI search interface.

Once the initial indices are created by public-inbox-index,
L<public-inbox-mda(1)> and L<public-inbox-watch(1)> will
automatically maintain them.

Running this manually to update indices is only required if
relying on L<git-fetch(1)> to mirror an existing public-inbox,
or if upgrading to a new version of public-inbox using
the C<--reindex> option.

Having the overview and article number database is essential to
running the NNTP interface, and strongly recommended for the
HTTP interface as it provides thread grouping in addition to
normal search functionality.
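
As a minimal sketch of the mirroring case described above, assuming a v1
inbox whose C<INBOX_DIR> is a bare git repository with a remote already
configured, the fetch and the index update can be chained. The
C<update_mirror> function and the C<INDEX_CMD> override are hypothetical
names used for illustration; they are not part of public-inbox.

```shell
#!/bin/sh
# Sketch only: update_mirror and INDEX_CMD are illustrative names,
# not part of public-inbox.
# Assumes INBOX_DIR is a v1 inbox, i.e. a bare git repository that
# already has a remote configured for git-fetch(1).
update_mirror() {
	inbox_dir=$1
	# fetch new messages first; only index if the fetch succeeded
	git --git-dir="$inbox_dir" fetch &&
		${INDEX_CMD:-public-inbox-index} "$inbox_dir"
}
```

Running C<update_mirror /path/to/inbox> periodically (for example, from
cron) would keep the indices current. Setting C<INDEX_CMD=echo> turns the
indexing step into a dry run.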

=head1 OPTIONS

=over

=item --jobs=JOBS

Influences the number of Xapian indexing shards in a
(L<public-inbox-v2-format(5)>) inbox.

C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
to disable parallel indexing.

If the inbox has not been indexed, C<JOBS - 1> shards
will be created (one job is always needed for indexing
the overview and article number mapping).

Default: the number of existing Xapian shards

=item --compact

Compacts the Xapian DBs after indexing. This is recommended
when using C<--reindex> to avoid running out of disk space
while indexing multiple inboxes.

While this option takes a negligible amount of time compared to
C<--reindex>, it requires temporarily duplicating the entire
contents of the Xapian DB.

This switch may be specified twice, in which case compaction
happens both before and after indexing to minimize the temporal
footprint of the (re)indexing operation.

Available since public-inbox 1.4.0.

=item --reindex

Forces a re-index of all messages in the inbox.
This can be used for in-place upgrades and bugfixes while
NNTP/HTTP server processes are utilizing the index. Keep in
mind this roughly doubles the size of the already-large
Xapian database. Using this with C<--compact> or running
L<public-inbox-compact(1)> afterwards is recommended to
release free space.

public-inbox protects writes to various indices with
L<flock(2)>, so it is safe to reindex (and rethread) while
L<public-inbox-watch(1)>, L<public-inbox-mda(1)> or
L<public-inbox-learn(1)> run.

This does not touch the NNTP article number database.
It does not affect threading unless C<--rethread> is
used.

=item --rethread

Regenerate internal THREADID and message thread associations
when reindexing.

This fixes some bugs in older versions of public-inbox. While
it is possible to use this without C<--reindex>, it makes little
sense to do so.

Available in public-inbox 1.6.0 (PENDING).

=item --prune

Run L<git-gc(1)> to prune and expire reflogs if discontiguous history
is detected. This is intended to be used in mirrors after running
L<public-inbox-edit(1)> or L<public-inbox-purge(1)> to ensure data
is expunged from mirrors.

Available since public-inbox 1.2.0.

=item --max-size SIZE

Sets or overrides L</publicinbox.indexMaxSize> on a
per-invocation basis. See L</publicinbox.indexMaxSize>
below.

Available since public-inbox 1.5.0.

=item --batch-size SIZE

Sets or overrides L</publicinbox.indexBatchSize> on a
per-invocation basis. See L</publicinbox.indexBatchSize>
below.

Available in public-inbox 1.6.0 (PENDING).

=item --no-fsync

Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
and Xapian. This is only effective with Xapian 1.4+.

Available in public-inbox 1.6.0 (PENDING).

=item --sequential-shard

Sets or overrides L</publicinbox.indexSequentialShard> on a
per-invocation basis. See L</publicinbox.indexSequentialShard>
below.

Available in public-inbox 1.6.0 (PENDING).

=back
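
The interaction of C<--reindex>, C<--rethread> and C<--compact> described
above could be scripted as follows. This is a sketch, not an official
procedure: C<reindex_all> and the C<INDEX_CMD> override are hypothetical
names, and handling one inbox per invocation is an assumption made here so
that each inbox is compacted before the next one starts. C<--rethread>
requires public-inbox 1.6.0 or later.

```shell
#!/bin/sh
# Sketch only: reindex_all and INDEX_CMD are illustrative names,
# not part of public-inbox.
# Reindexes each inbox in turn, regenerating thread associations
# (--rethread) and compacting the Xapian DBs afterwards (--compact).
reindex_all() {
	for inbox_dir in "$@"
	do
		# stop at the first failure instead of continuing blindly
		${INDEX_CMD:-public-inbox-index} \
			--reindex --rethread --compact "$inbox_dir" || return 1
	done
}
```

For example, C<reindex_all /path/to/inbox1 /path/to/inbox2> processes both
inboxes in order. With C<INDEX_CMD=echo>, the function only prints the
arguments it would pass.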

=head1 FILES

For v1 (ssoma) repositories described in L<public-inbox-v1-format(5)>,
all public-inbox-specific files are contained within the
C<$GIT_DIR/public-inbox/> directory.

v2 inboxes are described in L<public-inbox-v2-format(5)>.

=head1 CONFIGURATION

=over

=item publicinbox.indexMaxSize

Prevents indexing of messages larger than the specified size
value. A single suffix modifier of C<k>, C<m> or C<g> is
supported, thus a value of C<1m> prevents indexing of
messages larger than one megabyte.

This is useful for avoiding memory exhaustion in mirrors.
This option is only available in public-inbox 1.5 or later.

=item publicinbox.indexBatchSize

Flushes changes to the filesystem and releases locks after
indexing the given number of bytes. The default value of C<1m>
(one megabyte) is low to minimize memory use and reduce
contention with parallel invocations of L<public-inbox-mda(1)>,
L<public-inbox-learn(1)>, and L<public-inbox-watch(1)>.

Increase this value on powerful systems to improve throughput at
the expense of memory use. The reduction of lock granularity
may not be noticeable on fast systems.

This option is available in public-inbox 1.6 or later.
public-inbox 1.5 and earlier used the current default, C<1m>.

For L<public-inbox-v2-format(5)> inboxes, this value is
multiplied by the number of Xapian shards. Thus a typical v2
inbox with 3 shards will flush every 3 megabytes by default.

Default: 1m (one megabyte)

=item publicinbox.indexSequentialShard

=item publicinbox.<inbox_name>.indexSequentialShard

For L<public-inbox-v2-format(5)> inboxes, setting this to C<true>
allows indexing Xapian shards in multiple passes. This speeds up
indexing on rotational storage with high seek latency by allowing
individual shards to fit into the kernel page cache.

Using a higher-than-normal number of C<--jobs> with
L<public-inbox-init(1)> may be required to ensure individual
shards are small enough to fit into cache.

Available in public-inbox 1.6.0 (PENDING).

This is ignored on L<public-inbox-v1-format(5)> inboxes.

Default: false, shards are indexed in parallel

=back
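
Because the public-inbox config file uses the git-config format, the
settings above can be written with L<git-config(1)>. The following is a
sketch under stated assumptions: the C<set_index_options> function name,
the inbox name C<example> and the chosen values are illustrative, not
recommendations.

```shell
#!/bin/sh
# Sketch only: set_index_options, the inbox name "example" and the
# values below are illustrative, not recommendations.
# $1 is the config file to edit, e.g. ~/.public-inbox/config
set_index_options() {
	cfg=$1
	# skip indexing of messages larger than one megabyte
	git config --file "$cfg" publicinbox.indexMaxSize 1m || return 1
	# flush less often than the 1m default (uses more memory)
	git config --file "$cfg" publicinbox.indexBatchSize 8m || return 1
	# index shards one at a time for the hypothetical inbox "example"
	git config --file "$cfg" \
		publicinbox.example.indexSequentialShard true || return 1
}
```

Calling C<set_index_options "$HOME/.public-inbox/config"> would apply the
values to the default config file. The C<--max-size>, C<--batch-size> and
C<--sequential-shard> switches remain available for per-invocation
overrides.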

=head1 ENVIRONMENT

=over

=item PI_CONFIG

Used to override the default "~/.public-inbox/config" value.

=item XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to
disk. This environment variable is handled directly by Xapian;
refer to the Xapian API documentation for more details.

For public-inbox 1.6 and later, use C<publicinbox.indexBatchSize>
instead. Setting C<XAPIAN_FLUSH_THRESHOLD> for a large C<--reindex>
may cause L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
L<public-inbox-watch(1)> tasks to wait long periods of time
during the reindex.

Default: none, uses C<publicinbox.indexBatchSize>

=back

=head1 UPGRADING

Occasionally, public-inbox will update its schema version and
require a full reindex by running this command.

=head1 CONTACT

Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>

The mail archives are hosted at L<https://public-inbox.org/meta/>
and L<http://hjrcffqmbrq6wope.onion/meta/>

=head1 COPYRIGHT

Copyright 2016-2020 all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>

=head1 SEE ALSO

L<Search::Xapian>, L<DBD::SQLite>