=head1 NAME

public-inbox-tuning - tuning public-inbox

=head1 DESCRIPTION

public-inbox intends to support a wide variety of hardware. While
we strive to provide the best out-of-the-box performance possible,
tuning knobs are an unfortunate necessity in some cases.

=over 4

=item 1

New inboxes: public-inbox-init -V2

=item 2

Optional Inline::C use

=item 3

Performance on rotational hard disk drives

=item 4

Btrfs (and possibly other copy-on-write filesystems)

=item 5

Performance on solid state drives

=item 6

Read-only daemons

=item 7

Other OS tuning knobs

=item 8

Scalability to many inboxes

=back

=head2 New inboxes: public-inbox-init -V2

If you're starting a new inbox (and not mirroring an existing one),
the L<-V2|public-inbox-v2-format(5)> format requires L<DBD::SQLite>,
but is orders of magnitude more scalable than the original C<-V1>
format.

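For example, a new C<-V2> inbox may be initialized as follows (the
inbox name, path, URL, and address here are placeholders):

    public-inbox-init -V2 mylist /path/to/mylist \
        https://example.com/mylist/ mylist@example.com
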
=head2 Optional Inline::C use

Our optional use of L<Inline::C> speeds up subprocess spawning from
large daemon processes.

To enable L<Inline::C>, either set the C<PERL_INLINE_DIRECTORY>
environment variable to point to a writable directory, or create
C<~/.cache/public-inbox/inline-c> for any user(s) running
public-inbox processes.

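Either approach is a one-time setup; a minimal sketch:

    # per-user cache directory checked by public-inbox
    mkdir -p ~/.cache/public-inbox/inline-c

    # or point all public-inbox processes at a shared writable
    # directory (path is a placeholder)
    export PERL_INLINE_DIRECTORY=/path/to/writable/dir
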
If libgit2 development files are installed and L<Inline::C>
is enabled (described above), per-inbox C<git cat-file --batch>
processes are replaced with a single L<perl(1)> process running
C<PublicInbox::Gcf2::loop> in read-only daemons. libgit2 use
will be available in public-inbox 1.7.0+.

More (optional) L<Inline::C> use will be introduced in the future
to lower memory use and improve scalability.

Note: L<Inline::C> is required for L<lei(1)>, but not public-inbox-*

=head2 Performance on rotational hard disk drives

Random I/O performance is poor on rotational HDDs. Xapian indexing
performance degrades significantly as DBs grow larger than available
RAM. Attempts to parallelize random I/O on HDDs lead to pathological
slowdowns as inboxes grow.

While C<-V2> introduced Xapian shards as a parallelization
mechanism for SSDs, enabling C<publicInbox.indexSequentialShard>
repurposes sharding as a mechanism to reduce the kernel page cache
footprint when indexing on HDDs.

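For example, assuming the default config file location documented in
L<public-inbox-config(5)>, the knob may be set with:

    git config -f ~/.public-inbox/config \
        publicInbox.indexSequentialShard true
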
Initializing a mirror with a high C<--jobs> count to create more
shards (in C<-V2> inboxes) will keep each shard smaller and
reduce its kernel page cache footprint. Keep in mind that excessive
sharding imposes a performance penalty on read-only queries.

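A sketch, assuming the shard count is fixed at initial index time
(the job count and path here are only illustrations):

    public-inbox-index --jobs=16 $INBOX_DIR
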
Users with large amounts of RAM are advised to set a large value
for C<publicInbox.indexBatchSize> as documented in
L<public-inbox-index(1)>.

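For example (the C<128m> value is only an illustration; see
L<public-inbox-index(1)> for accepted values and tradeoffs):

    git config -f ~/.public-inbox/config publicInbox.indexBatchSize 128m
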
C<dm-crypt> users on Linux 4.0+ are advised to try the
C<--perf-same_cpu_crypt> and C<--perf-submit_from_crypt_cpus>
switches of L<cryptsetup(8)> to reduce I/O contention from
kernel workqueue threads.

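A sketch with cryptsetup 2.x, assuming C<$NAME> is an already-open
LUKS2 mapping (C<--persistent> stores the flags in the header so they
survive reboots):

    cryptsetup refresh --persistent \
        --perf-same_cpu_crypt --perf-submit_from_crypt_cpus $NAME
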
=head2 Btrfs (and possibly other copy-on-write filesystems)

L<btrfs(5)> performance degrades from fragmentation when using
large databases and random writes. The Xapian + SQLite indices
used by public-inbox are no exception to that.

public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
indices on btrfs to achieve acceptable performance (even on SSD).
Disabling copy-on-write also disables checksumming, thus C<raid1>
(or higher) configurations may be corrupted after unsafe shutdowns.

Fortunately, these SQLite and Xapian indices are designed to be
recoverable from git if missing.

Disabling CoW does not prevent all fragmentation. Large values
of C<publicInbox.indexBatchSize> also limit fragmentation during
the initial index.

Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
Snapshots use CoW despite our efforts to disable it, resulting
in fragmentation.

L<filefrag(8)> can be used to monitor fragmentation, and
C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.

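For example (C<msgmap.sqlite3> is a typical C<-V2> index file; adjust
paths for your layout):

    filefrag $INBOX_DIR/msgmap.sqlite3
    btrfs filesystem defragment -fr $INBOX_DIR
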
Large filesystems benefit significantly from the C<space_cache=v2>
mount option documented in L<btrfs(5)>.

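A hypothetical L<fstab(5)> entry (device and mountpoint are
placeholders):

    /dev/sdb1  /srv/public-inbox  btrfs  space_cache=v2  0 0
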
Older, non-CoW filesystems generally work well out-of-the-box
for our Xapian and SQLite indices.

=head2 Performance on solid state drives

While SSD read performance is generally good, SSD write performance
degrades as the drive ages and/or gets full. Issuing C<TRIM> commands
via L<fstrim(8)> or similar is required to sustain write performance.

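For example, a one-off trim, or a periodic one on systemd-based
systems (the mountpoint is a placeholder; availability of the timer
unit depends on the distribution):

    fstrim /srv/public-inbox
    systemctl enable --now fstrim.timer
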
Users of the Flash-Friendly File System
L<F2FS|https://en.wikipedia.org/wiki/F2FS> may benefit from
optimizations found in SQLite 3.21.0+. Benchmarks are greatly
appreciated.

=head2 Read-only daemons

L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, and
L<public-inbox-nntpd(1)> are all designed for C10K (or higher)
levels of concurrency from a single process. SMP systems may
use C<--worker-processes=NUM> as documented in
L<public-inbox-daemon(8)> for parallelism.

The open file descriptor limit (C<RLIMIT_NOFILE>, C<ulimit -n> in L<sh(1)>,
C<LimitNOFILE=> in L<systemd.exec(5)>) may need to be raised to
accommodate many concurrent clients.

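For example, when starting a daemon from L<sh(1)> (the C<10000> value
is only an illustration):

    ulimit -n 10000
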
Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
increases memory use of client sockets; be sure to account for that in
capacity planning.

=head2 Other OS tuning knobs

Linux users: the C<vm.max_map_count> sysctl may need to be increased if
handling thousands of inboxes (with L<public-inbox-extindex(1)>) to avoid
out-of-memory errors from git.

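A sketch (the value is only an illustration; suitable values depend on
the number of inboxes):

    sysctl -w vm.max_map_count=262144
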
Other OSes may have similar tuning knobs (patches appreciated).

=head2 Scalability to many inboxes

L<public-inbox-extindex(1)> allows any number of public-inboxes
to share the same Xapian indices.

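For example, an external index covering all configured inboxes may be
created with (the path is a placeholder):

    public-inbox-extindex --all /path/to/extindex
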
git 2.33+ startup time is orders-of-magnitude faster and uses
less memory when dealing with thousands of alternates required
for thousands of inboxes with L<public-inbox-extindex(1)>.

Frequent packing (via L<git-gc(1)>) both improves performance
and reduces the need to increase C<vm.max_map_count>.

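A minimal sketch for packing every epoch of a C<-V2> inbox (the
C<$INBOX_DIR/git/*.git> layout is standard for C<-V2>):

    for g in "$INBOX_DIR"/git/*.git
    do
        git --git-dir="$g" gc
    done
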
=head1 CONTACT

Feedback encouraged via plain-text mail to L<mailto:meta@public-inbox.org>

Information for *BSDs and non-traditional filesystems is especially
welcome.

Our archives are hosted at L<https://public-inbox.org/meta/>,
L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>,
and other places

=head1 COPYRIGHT

Copyright all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>