why public-inbox is currently implemented in Perl 5
---------------------------------------------------

While Perl has many detractors and there's a lot not to like
about Perl, we use it anyway because it offers benefits not
(yet) available from other languages.

This document is somewhat inspired by https://sqlite.org/whyc.html

Other languages and runtimes may eventually be a possibility
for us, and this document can serve as our requirements list
for possible replacements.

As always, comments, corrections, and additions are welcome at
<meta@public-inbox.org>. We're not Perl experts, either.

Perl 5 is installed on many, if not most, GNU/Linux and
BSD-based servers and workstations. It is likely the most
widely installed programming environment that offers a
significant amount of POSIX functionality. Users won't
have to waste bandwidth or space with giant toolchains or
architecture-specific binaries.

Furthermore, Perl documentation is typically installed
locally as manpages, allowing users to quickly refer
to documentation as needed.

* Scripted, always editable by the end user

Users cannot lose access to the source code. Code written
entirely in any scripting language automatically satisfies
the GPL-2.0, making it easier to satisfy the AGPL-3.0.

Use of a scripting language improves auditability for
malicious changes. It also reduces storage and bandwidth
requirements for distributors, as the same scripts can be
shared across multiple OSes and architectures.

Perl's availability and the low barrier to entry of
scripting ensure it's easy for users to exercise their

* Predictable performance

While Perl is neither fast nor memory-efficient, its
performance and memory use are predictable and do not
require GC tuning by the user.

public-inbox is developed for (and mostly on) old
hardware. Perl was fast enough to power the web of the
late 1990s, and any cheap VPS today has more than enough
RAM and CPU for handling plain-text email.

Low hardware requirements increase the reach of our software
to more users, improving centralization resistance.

Unlike similarly powerful scripting languages, there is no
forced migration to a major new version. From 2000-2020,
Perl had fewer breaking changes than Python or Ruby; we
expect that trend to continue given the inertia of Perl 5.

As of April 2021, the Perl Steering Committee has confirmed
Perl 7 will require `use v7.0' and existing code should
continue working unchanged:
https://nntp.perl.org/group/perl.perl5.porters/259789
<CAMvkq_SyTKZD=1=mHXwyzVYYDQb8Go0N0TuE5ZATYe_M4BCm-g@mail.gmail.com>

* Built for text processing

Our focus is plain-text mail, and Perl has many built-ins
optimized for text processing. It also has good support
for UTF-8 and legacy encodings found in old mail archives.
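
As a rough illustration of the encoding support mentioned above,
the core Encode module can convert bytes in a declared legacy
charset to UTF-8. This is only a sketch: to_utf8() is a
hypothetical helper and not something taken from public-inbox,
and the ISO-8859-1 input is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert raw bytes in a declared charset
# into UTF-8 encoded bytes.
sub to_utf8 {
	my ($bytes, $charset) = @_;
	my $str = decode($charset, $bytes); # now characters, not bytes
	encode('UTF-8', $str);
}

my $latin1 = "caf\xe9"; # "cafe" with e-acute, as ISO-8859-1 bytes
my $utf8 = to_utf8($latin1, 'ISO-8859-1');
printf "%d bytes -> %d bytes\n", length($latin1), length($utf8);
# prints "4 bytes -> 5 bytes"
```

Real mail often declares a charset that does not match its
contents, so production code would likely need error handling
that this sketch omits.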

* Integration with distros and non-Perl libraries

Perl modules and bindings to common libraries such as
SQLite and Xapian are already distributed by many
GNU/Linux distros and BSD ports.

There should be no need to rely on language-specific
package managers such as cpan(1); those systems increase
the learning curve for users and systems administrators.

* Compactness and terseness

Less code generally means fewer bugs. We try to avoid the
"line noise" stereotype of some Perl codebases, yet still
manage to write less code than one would with
non-scripting languages.

* Performance ceiling and escape hatch

With optional Inline::C, we can be "as fast as C" in some
cases. Inline::C is widely packaged by distros and it
gives us an escape hatch for dealing with missing bindings
or performance problems should they arise. Inline::C use
(as opposed to XS) also preserves the software freedom and
auditability benefits for all users.

Unfortunately, most C toolchains are big, so Inline::C
will always be optional for users who cannot afford the

* Slow startup time. Tokenization, parsing, and compilation of
pure Perl is not cached. Inline::C does cache its results,

We work around slow startup times in tests by preloading
code, similar to how mod_perl works for CGI.
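
The preloading idea can be sketched roughly as follows, assuming
a runner that forks once per job so each child inherits code the
parent has already compiled. run_preloaded() is a hypothetical
name, and POSIX merely stands in for modules that are slow to
compile; this is not meant to describe our actual test runner.

```perl
use strict;
use warnings;
use POSIX (); # stand-in for modules that are slow to compile

# Hypothetical runner: each job is a code ref run in a forked
# child, which reuses the parent's already-compiled code.
# Returns the number of jobs that failed.
sub run_preloaded {
	my (@jobs) = @_;
	my $failed = 0;
	for my $job (@jobs) {
		defined(my $pid = fork) or die "fork: $!";
		if ($pid == 0) { # child
			my $ok = eval { $job->(); 1 };
			POSIX::_exit($ok ? 0 : 1); # skip END blocks
		}
		waitpid($pid, 0);
		$failed++ if $? != 0;
	}
	$failed;
}

my $failed = run_preloaded(sub { 1 }, sub { die "boom\n" });
print "failed jobs: $failed\n"; # prints "failed jobs: 1"
```

Jobs run one at a time here to keep the sketch short; the point
is only that the compilation cost is paid once, in the parent.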

* High space overhead and poor locality of small data
structures, including the optree. This may not be fixable
in Perl itself given compatibility requirements of the C API.

These problems are exacerbated on modern 64-bit platforms,
though the Linux x32 ABI offers promise.

* Lack of vectored I/O support (writev, sendmmsg, etc. syscalls)
and "newer" POSIX functions in general. APIs end up being
slurpy, favoring large buffers and memory copies for
concatenation rather than rope (aka "cord") structures.
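
To illustrate the copying described above: without writev(2),
code holding several separate chunks typically joins them into
one buffer before calling syswrite. write_chunks() below is a
hypothetical helper written for this example, not a public-inbox
API.

```perl
use strict;
use warnings;

# Hypothetical helper: write several chunks to $fh.  Lacking
# writev(2), the chunks are first copied into one contiguous
# buffer, then written in a loop that handles short writes.
# Returns the number of bytes written.
sub write_chunks {
	my ($fh, @chunks) = @_;
	my $buf = join('', @chunks); # the extra memory copy
	my $off = 0;
	while ($off < length($buf)) {
		my $w = syswrite($fh, $buf, length($buf) - $off, $off);
		defined($w) or die "syswrite: $!";
		$off += $w;
	}
	$off;
}

pipe(my $rd, my $wr) or die "pipe: $!";
my $n = write_chunks($wr, "Subject: hi\r\n", "\r\n", "body\n");
close($wr) or die "close: $!";
my $msg = do { local $/; <$rd> };
print "wrote $n bytes, read ", length($msg), "\n";
```

With vectored I/O, the three chunks could be handed to the
kernel as-is; here the join() doubles peak memory use for the
duration of the write.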

* While mmap(2) is available via PerlIO::mmap, string ops
(m//, substr(), index(), etc.) still require memory copies
into userspace, negating a benefit of zero-copy.

* The XS/C API makes it difficult to improve internals while
preserving compatibility.

* Lack of optional type checking. This may be a blessing in
disguise, though, as it encourages us to simplify our data
models and lowers cognitive overhead.

* SMP support is mostly limited to fork(), since many
libraries (including much of the standard library) are not
thread-safe. Even with threads.pm, sharing data between
interpreters within the same process is inefficient due to
the lack of lock-free and wait-free data structures from
projects such as Userspace RCU.

* Process spawning speed degrades as memory use increases.
We work around this optionally via Inline::C and vfork(2),
since Perl lacks an approximation of posix_spawn(3).

We also use `undef' and `delete' ops to free large buffers
as soon as we're done using them to save memory.
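
A minimal sketch of the `undef' and `delete' usage mentioned
above. The %client hash and its field names are hypothetical and
only serve the example.

```perl
use strict;
use warnings;

# Hypothetical per-client state with a large read buffer
my %client = (fd => 3, rbuf => 'x' x (1024 * 1024));

my $big = 'y' x (1024 * 1024);
my $len = length($big); # ... done using $big after this ...

# Frees the string buffer right away; assigning '' instead would
# keep the allocation attached to $big for later reuse.
undef $big;

# Drops the only reference to the read buffer, so it is freed
# immediately while the rest of the client state stays intact.
delete $client{rbuf};

print "freed $len byte buffer; fd=$client{fd} kept\n";
```

Whether freed memory is returned to the OS depends on the
malloc implementation, but it at least becomes available for
reuse within the process.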

Red herrings to ignore when evaluating other runtimes
-----------------------------------------------------

These don't disqualify a language or runtime from being
used; they're just not interesting.

* Lightweight threading

While lightweight threading implementations are
convenient, they tend to be significantly heavier than
pure event-loop systems (or multi-threaded event-loop

Lightweight threading implementations have stack overhead
and growth typically measured in kilobytes. The userspace
state overhead of event-based systems is an order of
magnitude less, and a sunk cost regardless of concurrency