Eric Wong [Fri, 12 Aug 2022 09:14:48 +0000 (09:14 +0000)]
pop3: quiet warning for cached active statements
Setting the $if_active parameter of ->prepare_cached to `1'
seemed to be the best option many years ago, so it's probably
the best option going forward when caching prepared statements.
Fixes: cab36ebd00ca72f8 ("pop3: remove untouched rows on QUIT/disconnect")
Eric Wong [Thu, 11 Aug 2022 20:13:09 +0000 (20:13 +0000)]
examples: consolidate systemd socket examples
systemd.socket(5) files can actually contain multiple listen
sockets, so shave down inode overhead and simplify config
file management by consolidating all applicable ports into
a single file for each daemon.
Eric Wong [Thu, 11 Aug 2022 20:13:08 +0000 (20:13 +0000)]
doc: drop ancient Apache and WEBrick examples
Having old, unmaintained docs for other HTTP servers is likely
harmful at this point. public-inbox-httpd is specifically
designed to handle git repos on slow storage and stream giant
mbox.gz files fairly to slow clients.
Eric Wong [Thu, 11 Aug 2022 20:33:39 +0000 (20:33 +0000)]
devel/syscall-list: support non-Linux, show sizeof(pid_t)
While I have no intention of using syscall numbers for
non-Linux, sizeof(pid_t) was useful for OpenBSD. And maybe
Linux can have real competition from other OSes with stable
syscall numbers someday.
Eric Wong [Thu, 11 Aug 2022 20:00:21 +0000 (20:00 +0000)]
pop3d: enable native fcntl locks on all *BSDs
...as we've already done for the simpler case of mbox locking in lei.
I've just confirmed NetBSD and OpenBSD share the same "struct flock"
with FreeBSD, and assume DragonFlyBSD is the same. sizeof(pid_t) == 4
in all places I've checked, and it's unlikely we'll need 64-bit
pid_t any time soon...
Eric Wong [Thu, 11 Aug 2022 20:00:20 +0000 (20:00 +0000)]
www: inbox: favor "pop3://" over "pop://"
curl only supports "pop3://" and "pop3s://", despite RFC 2384
existing for "pop://". AFAIK, there are no RFCs for "pop3://"
and "pop3s://", but please let us know if there are.
In any case, real-world cases like curl are more relevant.
Eric Wong [Wed, 10 Aug 2022 15:58:01 +0000 (15:58 +0000)]
daemon: rely on $SIG{__WARN__} for error output
warn/carp usage is unavoidable given Perl itself and standard
libraries, so just rely on localized $SIG{__WARN__} from
60d262483a4d6ddf (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08)
for all error reporting.
While we're in the area, make some of the error handling more
consistent between IMAP/NNTP/POP3.
Eric Wong [Wed, 10 Aug 2022 07:40:31 +0000 (07:40 +0000)]
www_text: add AUTH=ANONYMOUS to IMAP URLs
While the ';' requires escaping on the command-line, the
presence of ";AUTH=ANONYMOUS" communicates clearly that
anonymous access is supported in accordance with RFC 4505.
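To illustrate the escaping concern, here is a minimal Python sketch
(not part of public-inbox). The host and mailbox name in the URL are
placeholders, and shlex.quote() stands in for whatever quoting a user
would do by hand in a POSIX shell.

```python
import shlex

# Hypothetical URL: the host and mailbox name are placeholders.
url = "imaps://;AUTH=ANONYMOUS@example.com/inbox.example.0"

# ';' normally terminates a command in a POSIX shell, so shlex.quote()
# wraps the whole URL in single quotes to pass it as one argument.
quoted = shlex.quote(url)
print(quoted)
```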
Eric Wong [Wed, 10 Aug 2022 06:00:53 +0000 (06:00 +0000)]
pop3: remove untouched rows on QUIT/disconnect
Some POP3 clients may connect and never retrieve messages nor
trigger deletes. In that case, save some storage by removing
unused rows from the `deletes' and `users' tables.
Eric Wong [Mon, 8 Aug 2022 23:53:10 +0000 (23:53 +0000)]
imap: share mailbox lists across listeners
Since IMAP mailbox lists are tied to the PublicInbox::Config
object, we can share them the same way the config object is
shared when an -imapd or -netd instance has multiple listeners.
This ought to reduce memory use and startup time when binding
multiple sockets which share a common config file.
Eric Wong [Mon, 8 Aug 2022 23:53:09 +0000 (23:53 +0000)]
daemon: cleanup internal data structures
This avoids dangling {''} entries in $xnetd and
%tls_opt hashes. Furthermore, we can safely undef
%tls_opt once it's associated with each $xnetd object.
Eric Wong [Mon, 8 Aug 2022 23:53:08 +0000 (23:53 +0000)]
daemon: use per-listener SIG{__WARN__} callbacks
This allows "-l $ADDRESS?err=/path/to/err.log" to isolate normal
warn() (and carp()) messages for a particular listen address to
track down errors more easily.
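A rough Python sketch of the idea follows. It is not the Perl
implementation (which localizes $SIG{__WARN__}); the Listener class
and split_listener helper are hypothetical names used only to show a
per-listener "err" option being split off the address and used to
route warnings.

```python
import sys
from urllib.parse import parse_qs


def split_listener(spec):
    """Split 'ADDRESS?key=val' into (address, options dict)."""
    address, _, query = spec.partition("?")
    # Keep the last value if an option is repeated.
    opts = {key: vals[-1] for key, vals in parse_qs(query).items()}
    return address, opts


class Listener:
    """Hypothetical per-listener state holding its own warning sink."""

    def __init__(self, spec):
        self.address, opts = split_listener(spec)
        self.err_path = opts.get("err")

    def warn(self, msg):
        # Append to the listener's own log if one was given, else stderr.
        if self.err_path:
            with open(self.err_path, "a") as fh:
                fh.write(msg + "\n")
        else:
            print(msg, file=sys.stderr)
```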
Eric Wong [Mon, 8 Aug 2022 23:53:07 +0000 (23:53 +0000)]
daemon: use default address + well-known ports for scheme
This ensures the "bound $URL" diagnostic message at startup
always shows the URL scheme handled if not relying on socket
inheritance.
This also avoids duplicate/unused data structures when binding
sockets ourselves, as bound socket names can expand from short
names to longer names (e.g. "0:119" => "0.0.0.0:119").
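As a sketch of the defaulting logic only, in Python rather than the
daemon's Perl: the port numbers are the IANA well-known ports, while
the bound_url name and the "0.0.0.0" default address are assumptions
made for illustration.

```python
# IANA well-known ports for the schemes the daemons can serve.
WELL_KNOWN = {
    "nntp": 119, "nntps": 563,
    "imap": 143, "imaps": 993,
    "pop3": 110, "pop3s": 995,
    "http": 80, "https": 443,
}


def bound_url(scheme, host=None, port=None, default_host="0.0.0.0"):
    """Fill in a default address and the scheme's well-known port."""
    host = host or default_host          # assumed default address
    port = port or WELL_KNOWN[scheme]    # fall back to the well-known port
    return f"{scheme}://{host}:{port}"
```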
Eric Wong [Mon, 8 Aug 2022 23:16:47 +0000 (23:16 +0000)]
imap: prioritize AUTH=ANONYMOUS clients
...by deprioritizing clients using a username + password.
As IMAP provides AUTH=ANONYMOUS for designating anonymous
access, we'll rely on it as a heuristic for favoring "good"
clients. Clients using a username + password seem to (more
often than not) be malicious and looking for info which doesn't
belong in public inboxes.
This copies the technique used by WWW + -httpd to deprioritize
expensive mbox.gz downloads.
Eric Wong [Mon, 8 Aug 2022 23:16:46 +0000 (23:16 +0000)]
imap: only give AUTH=ANONYMOUS clients prefetch
Looking at IMAP traffic on public-inbox.org, it seems there is a
fair amount of traffic coming from malicious clients assuming
the IMAP server is compromised and searching for private
information. Since AUTH=ANONYMOUS clients are more likely to
be legitimate clients looking for publicly-archived mail,
give them priority.
Eric Wong [Fri, 5 Aug 2022 08:29:54 +0000 (08:29 +0000)]
daemon: dedupe PublicInbox::Config objects by pathname
This means all Inbox, Git, Over, Msgmap, Search objects also get
deduplicated if they belong to the same config file, reducing
memory and FD usage. This helps save memory and improve cache
hit rates in -netd setups where NNTP, IMAP, HTTP, and POP3
servers run in the same process.
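The deduplication can be pictured with this small Python sketch. The
Config class is a stand-in, and keying the cache on
os.path.realpath() is an assumption; the actual code may normalize
pathnames differently.

```python
import os


class Config:
    """Stand-in for a parsed config file (hypothetical)."""

    def __init__(self, path):
        self.path = path


_configs = {}  # normalized pathname -> shared Config object


def config_for(path):
    """Return one shared Config per config file, creating it on first use."""
    key = os.path.realpath(path)  # assumption: dedupe on the resolved path
    cfg = _configs.get(key)
    if cfg is None:
        cfg = _configs[key] = Config(key)
    return cfg
```

Every listener asking for the same file gets the same object back, so
anything hanging off it is shared as well.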
InboxIdle was the only bit which needed adjustment, but there
may be other bugs lurking despite all tests passing.
Eric Wong [Thu, 4 Aug 2022 20:08:21 +0000 (20:08 +0000)]
www: gzip_filter: avoid errors after ->write failure
->zflush must return a string to its caller, not undef.
Additionally, {http_out} may be deleted on ->write if ->close
recurses.
This should fix the following errors:
Use of uninitialized value $_[1] in string eq at PublicInbox/HTTP.pm line 211.
E: Can't call method "close" on an undefined value at GzipFilter.pm line 167.
Eric Wong [Thu, 4 Aug 2022 08:17:02 +0000 (08:17 +0000)]
feed: avoid unnecessary map loop in non-over path
We can bless objects while doing the initial insertion to avoid
the extra map iteration and temporary array(s). Fewer ops
means memory savings for the likely case of ->over users, too.
Eric Wong [Thu, 4 Aug 2022 08:17:01 +0000 (08:17 +0000)]
imap: ensure_slices_exist: drop needless map and array
We can reduce ops and temporary objects here by folding the
stringification into the `for' loop and push directly into the
{mailboxlist} array; relying on autovivification to turn it into
a noop for the initial population.
Eric Wong [Thu, 4 Aug 2022 07:23:49 +0000 (07:23 +0000)]
TODO: remove done items, adjust/add/abandon some
public-inbox-pop3d (and -netd) gives us POP3 support, and
it seems to work. Proxy support can come independently,
probably after JMAP.
public-inbox-netd provides the multi-protocol "super server"
which allows code memory savings. Work is ongoing to further
reduce memory use...
Automatically updating on TLS cert and key changes on
inotify/EVFILT_VNODE won't be done, since (IMHO) there's too
much risk of inadvertent updates on incomplete changes.
My same train-of-thought applies to auto-reloading on config
file changes: an admin may save a file halfway through a
multi-step change and auto-reloading can be too surprising and
break things.
I don't think lei+FUSE will be as portable or useful as a
local IMAP server (and maybe JMAP, eventually); but r/w IMAP
support would be nice.
Finally, git SHA-256 repo support will need to be taken into
account.
Eric Wong [Thu, 4 Aug 2022 06:27:39 +0000 (06:27 +0000)]
daemon: handle per-listener options on inherited, well-known ports
We must not clobber already-parsed per-listener options when
handling inherited sockets which are well-known. Unfortunately,
this isn't easy to test in a non-intrusive way for regular
users.
Eric Wong [Wed, 3 Aug 2022 20:03:56 +0000 (20:03 +0000)]
nntp: speed up group listings via ->ALL->misc
By taking advantage of the new ART_MIN/ART_MAX value in MiscIdx,
we can avoid the overhead of opening per-inbox msgmap DB files.
The result gives us a ~40 speedup with 50K newsgroups.
Eric Wong [Wed, 3 Aug 2022 08:06:03 +0000 (08:06 +0000)]
daemon: reload TLS certs and keys on SIGHUP
This allows new TLS certificates to be loaded for new clients
without having to time out or drop existing clients with
established connections made with the old certs. This should
benefit users with admins who expire certificates frequently (as
encouraged by Let's Encrypt).
Socket ->write failures are expected and common for TCP traffic,
especially if it's facing unreliable remote connections. So
just bail out silently if our {gz} field was already clobbered
during the small bit of recursion we hit on ->write failures
from async responses.
This ought to fix some GzipFilter::zflush errors (via $forward
->close from PublicInbox::HTTP) I've been noticing on
deployments running -netd. I'm still unsure as to why I hadn't
seen them before, but it might've only been ignorance on my
part...
Eric Wong [Mon, 1 Aug 2022 21:24:47 +0000 (21:24 +0000)]
daemon: share FDs for identical log paths
We rely on the %logs hash for SIGUSR1 log reopening. Without this sharing,
some FDs would be hidden inside their respective {HTTP,IMAP,POP3}D
objects and not reopened on SIGUSR1.
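A minimal Python sketch of the sharing, assuming a single table of
open logs keyed by path. The open_log and reopen_logs names are
hypothetical, and os.dup2() is just one way to swap the underlying
descriptor without touching the holders of the shared handle.

```python
import os

logs = {}  # path -> open file object, shared by every listener


def open_log(path):
    """Return the shared handle for path, opening it only once."""
    fh = logs.get(path)
    if fh is None:
        fh = logs[path] = open(path, "a")
    return fh


def reopen_logs(signum=None, frame=None):
    """Reopen every known log in place, e.g. after rotation on SIGUSR1."""
    for path, fh in logs.items():
        fh.flush()                          # old data goes to the old file
        new = open(path, "a")
        os.dup2(new.fileno(), fh.fileno())  # same FD number, new file
        new.close()
```

A handle that never made it into the table would keep writing to the
rotated file, which is the failure mode described above.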
Eric Wong [Mon, 1 Aug 2022 21:24:42 +0000 (21:24 +0000)]
httpd: make internals slightly more generic
This brings the HTTP server closer to the IMAP/NNTP/POP3
implementations and eliminates package-wide globals in
PublicInbox::HTTPD. The end goal is to be able to host
completely different PSGI applications on different listen
ports.
Eric Wong [Sat, 30 Jul 2022 09:38:24 +0000 (09:38 +0000)]
solver: avoid deprecation warnings in git 2.36.0+
git deprecated core.fsyncObjectFiles in favor of core.fsync
with 2.36.0+, while GIT_TEST_FSYNC was added in 2.35.0. So
use the environment variable since it's been supported slightly
longer than the new configuration knob.
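In Python terms, passing the variable to a child git process could
look like the sketch below. The git_env helper is hypothetical, and
the value "0" (which asks git to skip fsync) is an assumption about
what the solver wants, not something stated above.

```python
import os


def git_env(base=None):
    """Copy an environment and add GIT_TEST_FSYNC for child git processes."""
    env = dict(os.environ if base is None else base)
    env["GIT_TEST_FSYNC"] = "0"  # assumed value: ask git to skip fsync
    return env
```

The result would be passed as env=git_env() to subprocess.run(); git
versions which predate the variable simply ignore it.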
Eric Wong [Fri, 22 Jul 2022 20:18:09 +0000 (20:18 +0000)]
www: drop --subject from "git send-email" instructions
Apparently, --subject doesn't work[1] with "git send-email" in
this context. So drop the CLI arg and add a note to tell the
user to set a "Subject:" line in their response body, instead.
[1] I'm not sure if --subject ever worked as I thought it would,
or if it's a regression. In either case, there are current
versions of git where it doesn't, so just tell users to use
the currently supported method.
Eric Wong [Sat, 23 Jul 2022 15:52:09 +0000 (15:52 +0000)]
add xt/mem-nntpd-tls maintainer test
This ensures memory usage is reasonable when DEFLATE and TLS are
enabled. It's also our only coverage for NNTP COMPRESS since
Net::NNTP has yet to implement compression support:
Eric Wong [Sat, 23 Jul 2022 06:12:16 +0000 (06:12 +0000)]
pop3: reduce memory use while generating the mailbox cache
While the cache itself is relatively compact for 50K messages,
generating it was inefficient due to our schema and Over.pm APIs
being designed for NNTP. While we won't change our schema for
now, we can choose better DBI APIs to use and limit our ephemeral
memory use.
This amounts to a 60% reduction in memory usage and a 5-10%
speedup against org.kernel.vger.git.0:
Eric Wong [Sat, 23 Jul 2022 04:41:54 +0000 (04:41 +0000)]
nntp: resolve inboxes immediately on group listings
This prevents potential races between SIGHUP config reloads
while gigantic group listings are streaming, allowing us to
avoid many invalidation checks.
This also reduces send(2) syscalls and avoids Perl internal pad
allocations in a few places where it's not beneficial. There
might be a slight (0.5%) speedup, but I'm not sure if that's
down to system noise, power/thermal management, or other users
on my VM.
Eric Wong [Sat, 23 Jul 2022 04:41:51 +0000 (04:41 +0000)]
nntp: listgroup_range_i: remove useless `map' op
No need to iterate through the array twice; and this even seems
a hair faster than what I got with commit 726d6e71aee5d974
(nntp: small speed up for multi-line responses, 2020-12-04)
Eric Wong [Thu, 21 Jul 2022 05:36:12 +0000 (05:36 +0000)]
pop3: drop File::FcntlLock requirement for FreeBSD and Linux
I know Linux has a stable ABI for this, and FreeBSD seems to,
too (*BSDs don't have stable syscall numbers, though).
I suspect this is safe enough for all *BSDs.
This is stricter than the MboxLock one since we use exact byte
ranges with these locks.
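For readers unfamiliar with byte-range locks, a small Python sketch
follows. Python's fcntl.lockf() packs "struct flock" itself, so this
only illustrates locking an exact range, not the native packing done
here; the function names and offsets are illustrative.

```python
import fcntl
import os


def lock_range(fd, start, length):
    """Take an exclusive, non-blocking lock on bytes [start, start+length)."""
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)


def unlock_range(fd, start, length):
    """Release the lock on the same byte range."""
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
```

Another process trying to lock an overlapping range would get an
OSError instead of blocking, while disjoint ranges of the same file
can be held independently.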
Eric Wong [Wed, 20 Jul 2022 22:57:07 +0000 (22:57 +0000)]
www: note "x=m" and "t=1" (mis)use for GET requests
We require "x=m" (requests for mboxes) to be POST requests to
avoid unnecessary traffic from crawlers. "t=1" only collapses
threads in the summary view, which isn't normally accessible
from <form> elements.
This also fixes the missing "[summary|nested]" element when
"x=m" is used.
Eric Wong [Wed, 20 Jul 2022 09:24:11 +0000 (09:24 +0000)]
pop3: TOP requests do not expire messages
RFC 2449 only documents "EXPIRE 0" behavior for RETR requests
which fetch the whole message. TOP requests only fetch
the headers and top $N lines of the body, so it's probably
harmful for deletions to be triggered in those cases.
Eric Wong [Wed, 20 Jul 2022 09:24:09 +0000 (09:24 +0000)]
public-inbox-pop3d - a mostly read-only POP3 server
Old account expiry has not been implemented, but it seems to
work well with both mpop(1) and getmail(1). The strictness of
mpop was particularly helpful in ironing out bugs in our
implementation of (dreaded) message sequence numbers.
"EXPIRE 0" (RFC 2449) can theoretically save numerous "DELE"
commands, but that's untested by real-world clients. mpop
supports PIPELINING which is effective in hiding latency,
and the core networking functionality is already well-tested
from our NNTP and IMAP implementations.
Configuration requires "publicinbox.pop3state" to point to
a directory writable by the otherwise read-only daemon.
See public-inbox-pop3d(1) manpage for more usage details.
Eric Wong [Wed, 20 Jul 2022 01:22:04 +0000 (01:22 +0000)]
netd: load modules for well-known ports
When inheriting well-known ports from systemd (or similar),
we can auto-load the proper *D.pm file based on the port number
without requiring command-line args.
load_mod also gets fixed to use its argument, instead of implicit
$1 since that won't work for our well-known ports.
Eric Wong [Tue, 19 Jul 2022 22:42:53 +0000 (22:42 +0000)]
lei note-event: inline note_event_arm_done
This was a single-caller sub since 47d4e53734820b4e
(lei_mail_sync: rely on flock(2), avoid IPC, 2021-09-18)
and unlikely to be used further, so inline it and save
a few KB of memory.
Eric Wong [Tue, 19 Jul 2022 22:42:52 +0000 (22:42 +0000)]
lei: avoid deadlock on inotify/EVFILT_VNODE wakeups
Enqueuing "note-event" requests from the DS event loop must
not wait on workers being able to drain the queue quickly
enough. Thus we make the SOCK_SEQPACKET writes nonblocking
and rely on the lei-daemon event loop to enqueue writes.
This is a unique problem for "note-event" since it reuses
workers in between commands, while most lei commands currently
fork off new workers.
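The pattern can be sketched in Python as below, assuming a platform
with AF_UNIX SOCK_SEQPACKET support such as Linux. The EventQueue
class is hypothetical; the point is only that a full socket buffer
leaves the message queued instead of blocking the event loop.

```python
import socket
from collections import deque


class EventQueue:
    """Queue messages for a worker without ever blocking the caller."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.pending = deque()

    def enqueue(self, msg):
        self.pending.append(msg)
        self.flush()

    def flush(self):
        """Send what we can; call again once the socket is writable."""
        while self.pending:
            try:
                self.sock.send(self.pending[0])
            except BlockingIOError:
                return False  # buffer full: keep the message queued
            self.pending.popleft()
        return True
```

An event loop would watch the socket for writability and call flush()
again whenever pending is non-empty.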
Eric Wong [Mon, 20 Jun 2022 19:27:30 +0000 (19:27 +0000)]
search: do not index base-85 binary patches
Base-85 binary patches generated by git lead to many false
positives, so skip over gibberish words which may occur in them.
To avoid regressions in search results, continue to allow
searching for exact size matches (via "literal $SIZE") and the
phrase "GIT binary patch" for the mere presence of a binary
patch.
Eric Wong [Mon, 20 Jun 2022 19:27:29 +0000 (19:27 +0000)]
search: support "patchid:" prefix (git patch-id --stable)
This allows easy searching via patch-id from a git commit.
Currently, abbreviations are not supported, and it seems
needless to support them since AFAIK (git) doesn't generate
nor resolve abbreviated patch-ids anywhere.
t/spawn: Find invalid PID to try to join its process group
In the container used to build packages of the GNU Guix distribution, PID 1
runs as the same user as the test so this spawn that should fail actually
succeeds.
Fix the problem by going through different PIDs and picking one that
either doesn't exist or we aren't allowed to signal.
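The search can be sketched in Python as follows. This is not the test
code itself: the function name and the candidate range are
assumptions, and signal 0 is used purely as an existence and
permission probe.

```python
import os


def find_unsignalable_pid(candidates=range(1, 1 << 22)):
    """Return the first PID that does not exist or that we may not signal."""
    for pid in candidates:
        try:
            os.kill(pid, 0)  # signal 0 delivers nothing, it only checks
        except (ProcessLookupError, PermissionError):
            return pid
    return None
```

There is an inherent race if a process with the chosen PID appears
before it is used, which seems acceptable for a test.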