Documentation/technical/whyperl.txt

why public-inbox is currently implemented in Perl 5
---------------------------------------------------

While Perl has many detractors and there's a lot not to like
about Perl, we use it anyway because it offers benefits not
(yet) available from other languages.

This document is somewhat inspired by https://sqlite.org/whyc.html

Other languages and runtimes may eventually be a possibility
for us, and this document can serve as our requirements list
for possible replacements.

As always, comments, corrections, and additions are welcome at
<meta@public-inbox.org>.  We're not Perl experts, either.

Good Things
-----------

* Availability

  Perl 5 is installed on many, if not most, GNU/Linux and
  BSD-based servers and workstations.  It is likely the most
  widely installed programming environment that offers a
  significant amount of POSIX functionality.  Users won't
  have to waste bandwidth or space on giant toolchains or
  architecture-specific binaries.

  Furthermore, Perl documentation is typically installed
  locally as manpages, allowing users to quickly refer
  to documentation as needed.

* Scripted, always editable by the end user

  Users cannot lose access to the source code.  Code written
  entirely in any scripting language automatically satisfies
  the source-distribution requirements of the GPL-2.0, making
  it easier to satisfy the AGPL-3.0.

  Use of a scripting language makes it easier to audit for
  malicious changes.  It also reduces storage and bandwidth
  requirements for distributors, as the same scripts can be
  shared across multiple OSes and architectures.

  Perl's availability and the low barrier to entry of
  scripting ensure it's easy for users to exercise their
  software freedom.

* Predictable performance

  While Perl is neither fast nor memory-efficient, its
  performance and memory use are predictable and do not
  require GC tuning by the user.

  public-inbox is developed for (and mostly on) old
  hardware.  Perl was fast enough to power the web of the
  late 1990s, and any cheap VPS today has more than enough
  RAM and CPU for handling plain-text email.

  Low hardware requirements increase the reach of our software
  to more users, improving centralization resistance.

* Compatibility

  Unlike similarly powerful scripting languages, there is no
  forced migration to a major new version.  From 2000 to 2020,
  Perl had fewer breaking changes than Python or Ruby; we
  expect that trend to continue given the inertia of Perl 5.

  As of April 2021, the Perl Steering Committee has confirmed
  that Perl 7 will require `use v7.0' and that existing code
  should continue working unchanged:
  https://nntp.perl.org/group/perl.perl5.porters/259789
  <CAMvkq_SyTKZD=1=mHXwyzVYYDQb8Go0N0TuE5ZATYe_M4BCm-g@mail.gmail.com>

* Built for text processing

  Our focus is plain-text mail, and Perl has many built-ins
  optimized for text processing.  It also has good support
  for UTF-8 and legacy encodings found in old mail archives.

* Integration with distros and non-Perl libraries

  Perl modules and bindings to common libraries such as
  SQLite and Xapian are already distributed by many
  GNU/Linux distros and BSD ports.

  There should be no need to rely on language-specific
  package managers such as cpan(1); those systems increase
  the learning curve for users and systems administrators.

* Compactness and terseness

  Less code generally means fewer bugs.  We try to avoid the
  "line noise" stereotype of some Perl codebases, yet still
  manage to write less code than one would with
  non-scripting languages.

* Performance ceiling and escape hatch

  With optional Inline::C, we can be "as fast as C" in some
  cases.  Inline::C is widely packaged by distros, and it
  gives us an escape hatch for dealing with missing bindings
  or performance problems should they arise.  Inline::C use
  (as opposed to XS) also preserves the software freedom and
  auditability benefits for all users.

  Unfortunately, most C toolchains are big, so Inline::C
  will always be optional for users who cannot afford the
  bandwidth or space.


Bad Things
----------

* Slow startup time.  Tokenization, parsing, and compilation of
  pure Perl are not cached.  Inline::C does cache its results,
  however.

  We work around slow startup times in tests by preloading
  code, similar to how mod_perl works for CGI.

* High space overhead and poor locality of small data
  structures, including the optree.  This may not be fixable
  in Perl itself given compatibility requirements of the C API.

  These problems are exacerbated on modern 64-bit platforms,
  where pointers double in size, though the Linux x32 ABI
  (32-bit pointers on x86-64) offers promise.

* Lack of vectored I/O support (writev, sendmmsg, and similar
  syscalls) and "newer" POSIX functions in general.  APIs end
  up being slurpy, favoring large buffers and memory copies for
  concatenation rather than rope (aka "cord") structures.

* While mmap(2) is available via PerlIO::mmap, string ops
  (m//, substr(), index(), etc.) still require memory copies
  into userspace, negating a benefit of zero-copy.

* The XS/C API makes it difficult to improve internals while
  preserving compatibility.

* Lack of optional type checking.  This may be a blessing in
  disguise, though, as it encourages us to simplify our data
  models and lowers cognitive overhead.

* SMP support is mostly limited to fork(), since many
  libraries (including much of the standard library) are not
  thread-safe.  Even with threads.pm, sharing data between
  interpreters within the same process is inefficient due to
  the lack of lock-free and wait-free data structures from
  projects such as Userspace RCU.

* Process spawning speed degrades as memory use increases,
  since fork(2) must duplicate the parent's page tables.
  We work around this optionally via Inline::C and vfork(2),
  since Perl lacks an approximation of posix_spawn(3).

  We also use `undef' and `delete' ops to free large buffers
  as soon as we're done with them, reducing memory use.
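
The vectored I/O point in the list above is easiest to see
from C.  The sketch below is not public-inbox code:
write_hdr_body() is a hypothetical helper, assuming a POSIX
system with writev(2).  It hands a header and a body to the
kernel as two separate segments, so neither has to be copied
into a combined buffer first, and it resumes correctly after
a partial write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not public-inbox code): write a header and a
 * body to fd with writev(2), without concatenating them first.
 * Returns the total number of bytes written, or -1 on error.
 */
ssize_t write_hdr_body(int fd, const char *hdr, const char *body,
			size_t body_len)
{
	struct iovec iov[2];
	struct iovec *v = iov;
	int cnt = 2;
	ssize_t total = 0;

	assert(hdr != NULL);
	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len = strlen(hdr);
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = body_len;

	while (cnt > 0) {
		ssize_t w = writev(fd, v, cnt);

		if (w < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal, retry */
			return -1;
		}
		total += w;
		/* drop segments that were written completely */
		while (cnt > 0 && (size_t)w >= v->iov_len) {
			w -= (ssize_t)v->iov_len;
			v++;
			cnt--;
		}
		/* trim the front of a partially written segment */
		if (cnt > 0) {
			v->iov_base = (char *)v->iov_base + w;
			v->iov_len -= (size_t)w;
		}
	}
	return total;
}
```

For example, write_hdr_body(fd, "Subject: hi\n\n", "body\n", 5)
returns 18 once every byte is written.  Without writev, Perl code
typically either builds one large scalar with `.' or join()
before calling syswrite, or issues one syswrite per buffer.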
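
For contrast with the mmap(2) point in the list above, here is a
minimal C sketch, assuming a POSIX system, of scanning a mapped
file in place.  count_lines_mmap() is a hypothetical helper, not
part of public-inbox.  memchr(3) reads the mapped pages directly,
so no copy into a separate buffer is needed; that is the
zero-copy benefit Perl's string ops give up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not public-inbox code): count '\n' bytes in
 * a file by scanning a read-only mmap(2) view with memchr(3).
 * Returns the count, or -1 on error.
 */
long count_lines_mmap(const char *path)
{
	struct stat st;
	long n = 0;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) { /* mmap(2) rejects zero-length maps */
		close(fd);
		return 0;
	}

	size_t len = (size_t)st.st_size;
	const char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

	close(fd); /* the mapping remains valid after close(2) */
	if (base == MAP_FAILED)
		return -1;

	/* search the mapped bytes directly; nothing is copied out */
	const char *p = base, *end = base + len;

	while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
		n++;
		p++;
	}
	munmap((void *)base, len);
	return n;
}
```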
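
The process-spawning point in the list above concerns fork(2)
duplicating page tables for a large parent.  In C,
posix_spawnp(3) does not ask for a full copy; glibc, for
example, implements it with a vfork(2)-style clone(2) that
shares the parent's memory until the child execs.  The sketch
below uses a hypothetical spawn_and_wait() helper and is not
public-inbox's Inline::C code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper (not public-inbox code): run argv[0],
 * searched in PATH, with the current environment, and wait for it.
 * Returns the child's exit status, or -1 on failure or if the
 * child did not exit normally.
 */
int spawn_and_wait(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);
	/* no file actions or attributes: inherit fds and signal state */
	if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
		return -1;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```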


Red herrings to ignore when evaluating other runtimes
-----------------------------------------------------

These don't disqualify a language or runtime from being
used; they're just not interesting.

* Lightweight threading

  While lightweight threading implementations are
  convenient, they tend to be significantly heavier than
  pure event-loop systems (or multi-threaded event-loop
  systems).

  Lightweight threading implementations have stack overhead
  and growth typically measured in kilobytes.  The userspace
  state overhead of event-based systems is an order of
  magnitude less, and a sunk cost regardless of concurrency
  model.
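
To illustrate the per-connection cost described above, here is
a minimal single-threaded poll(2) loop in C, assuming a POSIX
system.  struct conn and drain_all() are hypothetical names, not
public-inbox code.  Each client costs one small struct plus a
pollfd entry, and the whole loop runs on a single stack; a
lightweight-threading design would instead reserve a stack per
client.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical per-connection state: a few words per client. */
struct conn {
	int fd;
	size_t nread; /* bytes consumed so far */
};

/*
 * Hypothetical helper: read from every connection until each one
 * reaches EOF, using one poll(2) loop and one stack.
 * Returns the total number of bytes read, or -1 on error.
 */
long drain_all(struct conn *c, nfds_t n)
{
	struct pollfd pfd[64];
	char buf[4096];
	nfds_t open_cnt = n;
	long total = 0;

	assert(n <= 64);
	for (nfds_t i = 0; i < n; i++) {
		pfd[i].fd = c[i].fd;
		pfd[i].events = POLLIN;
	}
	while (open_cnt > 0) {
		if (poll(pfd, n, -1) < 0)
			return -1;
		for (nfds_t i = 0; i < n; i++) {
			if (!(pfd[i].revents & (POLLIN | POLLHUP | POLLERR)))
				continue;
			ssize_t r = read(pfd[i].fd, buf, sizeof(buf));

			if (r < 0)
				return -1;
			if (r == 0) { /* EOF: poll(2) ignores negative fds */
				pfd[i].fd = -1;
				open_cnt--;
				continue;
			}
			c[i].nread += (size_t)r;
			total += r;
		}
	}
	return total;
}
```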