Documentation/public-inbox-tuning.pod

=head1 NAME

public-inbox-tuning - tuning public-inbox

=head1 DESCRIPTION

public-inbox intends to support a wide variety of hardware.  While
we strive to provide the best out-of-the-box performance possible,
tuning knobs are an unfortunate necessity in some cases.

=over 4

=item 1

New inboxes: public-inbox-init -V2

=item 2

Process spawning

=item 3

Performance on rotational hard disk drives

=item 4

Btrfs (and possibly other copy-on-write filesystems)

=item 5

Performance on solid state drives

=item 6

Read-only daemons

=back

=head2 New inboxes: public-inbox-init -V2

If you're starting a new inbox (and not mirroring an existing one),
the L<-V2|public-inbox-v2-format(5)> format requires L<DBD::SQLite>, but
is orders of magnitude more scalable than the original C<-V1> format.

=head2 Process spawning

Our optional use of L<Inline::C> speeds up subprocess spawning from
large daemon processes.

To enable L<Inline::C>, either set the C<PERL_INLINE_DIRECTORY>
environment variable to point to a writable directory, or create
C<~/.cache/public-inbox/inline-c> for any user(s) running
public-inbox processes.

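As a minimal sketch of the second option, the shell function below
creates the per-user directory named above and checks that it is
writable.  The function name C<enable_inline_c> is hypothetical and
not part of public-inbox.  Its optional argument targets another
path, which you would then export as C<PERL_INLINE_DIRECTORY>.

```shell
# Hypothetical helper: the function name is an assumption for
# illustration, not something shipped with public-inbox.
# Creates the Inline::C build directory and checks it is writable.
enable_inline_c() {
    dir=${1:-$HOME/.cache/public-inbox/inline-c}
    mkdir -p "$dir" || return 1
    if [ -w "$dir" ]; then
        echo "Inline::C directory ready: $dir"
    else
        echo "not writable: $dir" >&2
        return 1
    fi
}
```

Run C<enable_inline_c> with no arguments as each user that runs
public-inbox processes.
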
More (optional) L<Inline::C> use will be introduced in the future
to lower memory use and improve scalability.

=head2 Performance on rotational hard disk drives

Random I/O performance is poor on rotational HDDs.  Xapian indexing
performance degrades significantly as DBs grow larger than available
RAM.  Attempts to parallelize random I/O on HDDs lead to pathological
slowdowns as inboxes grow.

While C<-V2> introduced Xapian shards as a parallelization
mechanism for SSDs, enabling C<publicInbox.indexSequentialShard>
repurposes sharding as a mechanism to reduce the kernel page cache
footprint when indexing on HDDs.

Initializing a mirror with a high C<--jobs> count to create more
shards (in C<-V2> inboxes) will keep each shard smaller and
reduce its kernel page cache footprint.  Keep in mind excessive
sharding imposes a performance penalty for read-only queries.

Users with large amounts of RAM are advised to set a large value
for C<publicinbox.indexBatchSize> as documented in
L<public-inbox-index(1)>.

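The two settings named in this section could be written with
L<git-config(1)>, as in the sketch below.  It assumes the config
file lives at C<~/.public-inbox/config> unless C<PI_CONFIG> points
elsewhere.  The function name C<pi_config_set> is hypothetical.

```shell
# Hypothetical helper: the function name is an assumption for
# illustration.  Writes one key to the public-inbox config file
# using git-config(1).  The default path is assumed to be
# ~/.public-inbox/config, overridable via PI_CONFIG or a 3rd argument.
pi_config_set() {
    key=$1
    val=$2
    cfg=${3:-${PI_CONFIG:-$HOME/.public-inbox/config}}
    mkdir -p "$(dirname "$cfg")" || return 1
    git config -f "$cfg" "$key" "$val"
}
```

For example (the C<500m> value is purely illustrative; pick a size
that fits your RAM):

    pi_config_set publicinbox.indexSequentialShard true
    pi_config_set publicinbox.indexBatchSize 500m
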
C<dm-crypt> users on Linux 4.0+ are advised to try the
C<--perf-same_cpu_crypt> C<--perf-submit_from_crypt_cpus>
switches of L<cryptsetup(8)> to reduce I/O contention from
kernel workqueue threads.

=head2 Btrfs (and possibly other copy-on-write filesystems)

L<btrfs(5)> performance degrades from fragmentation when using
large databases and random writes.  The Xapian + SQLite indices
used by public-inbox are no exception to that.

public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
indices on btrfs to achieve acceptable performance (even on SSD).
Disabling copy-on-write also disables checksumming, thus C<raid1>
(or higher) configurations may be corrupt after unsafe shutdowns.

Fortunately, these SQLite and Xapian indices are designed to be
recoverable from git if missing.

Disabling CoW does not prevent all fragmentation.  Large values
of C<publicInbox.indexBatchSize> also limit fragmentation during
the initial index.

Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
Snapshots use CoW despite our efforts to disable it, resulting
in fragmentation.

L<filefrag(8)> can be used to monitor fragmentation, and
C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.

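One way to watch for fragmentation is to filter the per-file
summary lines that L<filefrag(8)> prints (C<PATH: N extents found>).
The sketch below does that.  The function name C<fragmented_files>
and the default limit of 1000 extents are assumptions for
illustration, not recommendations from public-inbox.

```shell
# Hypothetical helper: name and default limit are assumptions.
# Reads "PATH: N extents found" lines, as printed by filefrag(8),
# on stdin and prints "N<TAB>PATH" for files above the extent limit.
fragmented_files() {
    limit=${1:-1000}
    awk -v limit="$limit" '
        / extents? found$/ {
            n = $(NF - 2)
            path = $0
            sub(/: [0-9]+ extents? found$/, "", path)
            if (n + 0 > limit) print n "\t" path
        }'
}
```

It could be fed from L<find(1)>, for example:

    find "$INBOX_DIR" -type f -exec filefrag {} + | fragmented_files 1000
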
Large filesystems benefit significantly from the C<space_cache=v2>
mount option documented in L<btrfs(5)>.

Older, non-CoW filesystems generally work well out-of-the-box
for our Xapian and SQLite indices.

=head2 Performance on solid state drives

While SSD read performance is generally good, SSD write performance
degrades as the drive ages and/or gets full.  Issuing C<TRIM> commands
via L<fstrim(8)> or similar is required to sustain write performance.

Users of the Flash-Friendly File System
L<F2FS|https://en.wikipedia.org/wiki/F2FS> may benefit from
optimizations found in SQLite 3.21.0+.  Benchmarks are greatly
appreciated.

=head2 Read-only daemons

L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, and
L<public-inbox-nntpd(1)> are all designed for C10K (or higher)
levels of concurrency from a single process.  SMP systems may
use C<--worker-processes=NUM> as documented in L<public-inbox-daemon(8)>
for parallelism.

The open file descriptor limit (C<RLIMIT_NOFILE>, C<ulimit -n> in L<sh(1)>,
C<LimitNOFILE=> in L<systemd.exec(5)>) may need to be raised to
accommodate many concurrent clients.

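For daemons started from a shell, a minimal sketch of raising the
limit follows.  It lifts the soft limit up to the existing hard
limit, which needs no extra privileges.  The function name
C<raise_nofile> is hypothetical.

```shell
# Hypothetical helper: the function name is an assumption.
# Raises the soft RLIMIT_NOFILE of the current shell to its hard
# limit, so daemons started from this shell inherit the higher limit.
raise_nofile() {
    hard=$(ulimit -Hn) || return 1
    ulimit -Sn "$hard" || return 1
    echo "open file limit: soft=$(ulimit -Sn) hard=$hard"
}
```

Raising the hard limit itself usually requires privileges.  Daemons
managed by systemd would use C<LimitNOFILE=> in the unit file
instead.
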
Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
increases memory use of client sockets; be sure to account for that in
capacity planning.

=head1 CONTACT

Feedback encouraged via plain-text mail to L<mailto:meta@public-inbox.org>

Information for *BSDs and non-traditional filesystems especially
welcome.

Our archives are hosted at L<https://public-inbox.org/meta/>,
L<http://hjrcffqmbrq6wope.onion/meta/>, and other places.

=head1 COPYRIGHT

Copyright 2020-2021 all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>