public-inbox-index - create and update search indices
public-inbox-index [OPTIONS] INBOX_DIR...
public-inbox-index creates and updates the search, overview and
NNTP article number database used by the read-only public-inbox
HTTP and NNTP interfaces. Currently, this requires the
L<DBD::SQLite> and L<DBI> Perl modules. L<Search::Xapian>
is optional and only needed to support the PSGI search interface.
Once the initial indices are created by public-inbox-index,
L<public-inbox-mda(1)> and L<public-inbox-watch(1)> will
automatically maintain them.
Running this manually to update indices is only required when
relying on L<git-fetch(1)> to mirror an existing public-inbox,
or when upgrading to a new version of public-inbox with
the C<--reindex> option.
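A minimal sketch of refreshing such a mirror by hand follows. It assumes a v1 inbox whose INBOX_DIR is itself a bare git repository with a configured remote (for example, one created with C<git clone --mirror>); the function name is hypothetical, and v2 inboxes, which keep one git repository per epoch, would need each of those fetched instead.

```shell
# Hypothetical helper: refresh a v1 mirror, then update its indices.
# ASSUMPTION: INBOX_DIR is a bare git repository with a configured remote.
refresh_v1_mirror() {
	inbox_dir=$1
	# Pull new messages from the upstream public-inbox; stop on failure.
	git --git-dir="$inbox_dir" fetch --quiet || return
	# Index whatever git-fetch brought in.
	public-inbox-index "$inbox_dir"
}

# Usage (the path is an example):
#   refresh_v1_mirror /path/to/inbox.git
```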
Having the overview and article number database is essential to
running the NNTP interface, and strongly recommended for the
HTTP interface as it provides thread grouping in addition to
normal search functionality.
Compacts the Xapian DBs after indexing. This is recommended
when using C<--reindex> to avoid running out of disk space
while indexing multiple inboxes.
While this option takes a negligible amount of time compared to
C<--reindex>, it requires temporarily duplicating the entire
contents of the Xapian DB.
This switch may be specified twice, in which case compaction
happens both before and after indexing to minimize the temporary
disk footprint of the (re)indexing operation.
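Because compaction temporarily needs roughly as much free space again as the Xapian DB occupies, a cautious wrapper could check for that before passing C<--compact>. This is a minimal sketch with a hypothetical function name; it only compares the current size of the directory given to it (for example, an inbox's Xapian directory) with the space available on that filesystem.

```shell
# Hypothetical helper: succeed only if the filesystem holding XAPIAN_DIR has
# at least as much free space as XAPIAN_DIR currently occupies, since
# compaction temporarily duplicates the Xapian DB.
can_compact() {
	xap_dir=$1
	# Space used by the Xapian DB, in 1024-byte blocks.
	used_kb=$(du -sk "$xap_dir" | awk '{ print $1 }')
	# With -P, df prints one line per filesystem; field 4 is the space available.
	avail_kb=$(df -Pk "$xap_dir" | awk 'NR == 2 { print $4 }')
	[ -n "$used_kb" ] && [ -n "$avail_kb" ] || return 1
	[ "$avail_kb" -ge "$used_kb" ]
}

# Usage (paths are examples):
#   can_compact /path/to/xapian-dir && public-inbox-index --compact /path/to/inbox
```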
Available since public-inbox 1.4.0.
Forces a re-index of all messages in the inbox.
This can be used for in-place upgrades and bugfixes while
NNTP/HTTP server processes are utilizing the index. Keep in
mind this roughly doubles the size of the already-large
Xapian database. Using this with C<--compact> or running
L<public-inbox-compact(1)> afterwards is recommended to
reclaim disk space.
public-inbox protects writes to various indices with L<flock(2)>,
so it is safe to reindex while L<public-inbox-watch(1)>,
L<public-inbox-mda(1)> or L<public-inbox-learn(1)> run.
This does not touch the NNTP article number database.
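For example, an upgrade reindex of several inboxes that keeps disk usage down by compacting both before and after indexing might be wrapped as below. This is only a sketch: the function name is hypothetical, and it relies solely on the C<--reindex> and C<--compact> options documented here.

```shell
# Hypothetical wrapper: reindex every inbox given as an argument.
# Passing --compact twice compacts the Xapian DBs both before and after
# indexing (public-inbox 1.4.0 or later).
reindex_and_compact() {
	public-inbox-index --reindex --compact --compact "$@"
}

# Usage (paths are examples):
#   reindex_and_compact /path/to/inbox1 /path/to/inbox2
```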
Run L<git-gc(1)> to prune and expire reflogs if discontiguous history
is detected. This is intended to be used in mirrors after running
L<public-inbox-edit(1)> or L<public-inbox-purge(1)> to ensure data
is expunged from mirrors.
Available since public-inbox 1.2.0.
Sets or overrides L</publicinbox.indexMaxSize> on a
per-invocation basis. See L</publicinbox.indexMaxSize>
for details.
Available since public-inbox 1.5.0.
=item --batch-size SIZE
Sets or overrides L</publicinbox.indexBatchSize> on a
per-invocation basis. See L</publicinbox.indexBatchSize>
for details.
Available in public-inbox 1.6.0 (PENDING).
For v1 (ssoma) repositories described in L<public-inbox-v1-format>,
all public-inbox-specific files are contained within the
C<$GIT_DIR/public-inbox/> directory.
v2 inboxes are described in L<public-inbox-v2-format>.
=item publicinbox.indexMaxSize
Prevents indexing of messages larger than the specified size
value. A single suffix modifier of C<k>, C<m> or C<g> is
supported, so a value of C<1m> prevents indexing of
messages larger than one megabyte.
This is useful for avoiding memory exhaustion in mirrors.
This option is only available in public-inbox 1.5 or later.
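To make the suffix rule concrete, here is a sketch of how such a value might be converted to bytes. The function is hypothetical and assumes the suffixes are lowercase binary multiples (C<k> = 1024 bytes); check the behavior of your public-inbox version if the exact cutoff matters.

```shell
# Hypothetical helper: convert a size such as "4096", "512k", "1m" or "2g"
# to bytes and print it; returns 1 for malformed input.
# ASSUMPTION: suffixes are lowercase binary multiples (k = 1024 bytes).
parse_size() {
	value=$1
	case $value in
	*k) num=${value%k}; mult=1024 ;;
	*m) num=${value%m}; mult=1048576 ;;
	*g) num=${value%g}; mult=1073741824 ;;
	*) num=$value; mult=1 ;;
	esac
	# Only a non-empty run of digits may precede the optional suffix.
	case $num in
	''|*[!0-9]*) return 1 ;;
	esac
	# Strip leading zeros so $(( )) does not read the number as octal.
	while [ "${num#0}" != "$num" ] && [ -n "${num#0}" ]; do
		num=${num#0}
	done
	echo $((num * mult))
}
```

Under that assumption, C<parse_size 1m> prints 1048576, the byte limit corresponding to the C<1m> example above.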
=item publicinbox.indexBatchSize
Flushes changes to the filesystem and releases locks after
indexing the given number of bytes. The default value of C<1m>
(one megabyte) is low to minimize memory use and reduce
contention with parallel invocations of L<public-inbox-mda(1)>,
L<public-inbox-learn(1)>, and L<public-inbox-watch(1)>.
Increase this value on powerful systems to improve throughput at
the expense of memory use. The reduction of lock granularity
may not be noticeable on fast systems.
This option is available in public-inbox 1.6 or later.
public-inbox 1.5 and earlier used the current default, C<1m>.
For L<public-inbox-v2-format(5)> inboxes, this value is
multiplied by the number of Xapian shards. Thus a typical v2
inbox with 3 shards will flush every 3 megabytes by default.
Default: 1m (one megabyte)
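The shard multiplication is simple arithmetic; the sketch below also counts shards on disk. Both function names are hypothetical, and the assumption that shards appear as numbered subdirectories of the inbox's Xapian directory (for example C<xap15/0>, C<xap15/1>, C<xap15/2>) should be checked against L<public-inbox-v2-format(5)> for your version.

```shell
# Hypothetical helper: count Xapian shards of a v2 inbox.
# ASSUMPTION: each shard is a numbered subdirectory (0, 1, 2, ...) of the
# Xapian directory passed in, e.g. INBOX_DIR/xap15.
count_shards() {
	n=0
	for shard in "$1"/[0-9]*; do
		# An unmatched glob is left as-is, so check the entry really exists.
		if [ -d "$shard" ]; then
			n=$((n + 1))
		fi
	done
	echo "$n"
}

# Bytes indexed between flushes for a v2 inbox:
# indexBatchSize (in bytes) multiplied by the number of shards.
v2_flush_bytes() {
	echo $(($1 * $2))
}

# Usage (the path is an example):
#   v2_flush_bytes 1048576 "$(count_shards /path/to/inbox/xap15)"
```

With the default of one megabyte (1048576 bytes) and 3 shards this yields 3145728 bytes, the "every 3 megabytes" figure above.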
Used to override the default "~/.public-inbox/config" value.
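The environment variable described here is C<PI_CONFIG>. A sketch of the lookup rule stated above, using a hypothetical function name, is below; it mirrors only that rule and ignores any other variables public-inbox may consult. It can be handy for pointing other tools, such as L<git-config(1)>, at the same file.

```shell
# Hypothetical helper: print the config file path implied by the rule above:
# PI_CONFIG when it is set and non-empty, otherwise ~/.public-inbox/config.
# Other variables public-inbox may consult are deliberately ignored.
pi_config_file() {
	printf '%s\n' "${PI_CONFIG:-$HOME/.public-inbox/config}"
}

# Usage: set a batch size in that file (the 10m value is only an example):
#   git config -f "$(pi_config_file)" publicinbox.indexBatchSize 10m
```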
=item XAPIAN_FLUSH_THRESHOLD
The number of documents to update before committing changes to
disk. This environment variable is handled directly by Xapian; refer
to the Xapian API documentation for more details.
For public-inbox 1.6 and later, use C<publicinbox.indexBatchSize>
instead. Setting C<XAPIAN_FLUSH_THRESHOLD> for a large C<--reindex>
may cause L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
L<public-inbox-watch(1)> tasks to wait for long periods of time.
Default: none, uses C<publicinbox.indexBatchSize>
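On public-inbox 1.6 or later, a large C<--reindex> can therefore be run with the variable removed from the environment and C<--batch-size> used instead. The wrapper name and the C<10m> value below are illustrative assumptions, not recommendations.

```shell
# Hypothetical wrapper: reindex with byte-based batching instead of
# Xapian's document-count threshold. The subshell keeps the unset local,
# so the caller's environment is left unchanged.
reindex_batched() {
	(
		unset XAPIAN_FLUSH_THRESHOLD
		public-inbox-index --reindex --batch-size 10m "$@"
	)
}

# Usage (the path is an example):
#   reindex_batched /path/to/inbox
```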
Occasionally, public-inbox will update its schema version and
require a full reindex by running this command with C<--reindex>.
Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
The mail archives are hosted at L<https://public-inbox.org/meta/>
and L<http://hjrcffqmbrq6wope.onion/meta/>
Copyright 2016-2020 all contributors L<mailto:meta@public-inbox.org>
License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
L<Search::Xapian>, L<DBD::SQLite>