Eric Wong [Sun, 21 Aug 2022 22:21:00 +0000 (22:21 +0000)]
www: support `+' in inbox names
`+' already seemed to work for IMAP mailboxes and NNTP newsgroup
names and git-config doesn't complain, either. So allow it as
the path components of WWW URLs so projects like `libstdc++' can
use it.
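
A minimal sketch of the kind of check this implies, in Python rather than
the project's Perl. The pattern and the `is_valid_inbox_name' helper are
hypothetical; the real validation may accept a different character set.

```python
import re

# Hypothetical pattern: word characters, '.', '-' and (newly) '+'.
# This is an illustration, not the project's actual rule.
INBOX_NAME_RE = re.compile(r"\A[\w.\-+]+\Z")

def is_valid_inbox_name(name: str) -> bool:
    """Return True if `name` can be used as a single URL path component."""
    return INBOX_NAME_RE.match(name) is not None

print(is_valid_inbox_name("libstdc++"))  # name containing '+'
print(is_valid_inbox_name("a/b"))        # '/' would split the path
```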
Eric Wong [Sat, 20 Aug 2022 08:01:35 +0000 (08:01 +0000)]
www: mbox* drop unneeded {base_url} memoizations
That field is not needed since List-* and Archived-At headers
are no longer appended as of commit 1bf653ad139bf7bb
(nntp+www: drop List-* and Archived-At headers, 2020-12-10).
Eric Wong [Sat, 20 Aug 2022 08:01:33 +0000 (08:01 +0000)]
view: do not show pagination footer for small inboxes
For new public inboxes with few messages, the dead pagination
footer is a worthless and confusing waste of space: "page: \n";
without `next' or `prev' links for users to follow.
Eric Wong [Wed, 17 Aug 2022 09:33:15 +0000 (09:33 +0000)]
lei inspect: less scary exception for invalid "docid:" inspect
It still says "Exception:", but doesn't pointlessly print out
the line number and file of the exception when it's a data/input
problem, and not a code problem on our end.
Eric Wong [Tue, 16 Aug 2022 03:44:03 +0000 (03:44 +0000)]
lei: do not wait for sto->done on disconnected EOF
lei-daemon (the top-level daemon process) should not have
synchronous waits, and this was causing a deadlock with
interrupted commands. There may still be a bug lurking in
lei/store despite this fix, though. I originally thought commit
fd261b9e65674505 (lei_store_err: use level-trigger for error pipe, 2022-08-15)
was sufficient, but at least this change is needed, as well.
Eric Wong [Fri, 12 Aug 2022 22:09:19 +0000 (22:09 +0000)]
pop3: speed up STAT slightly (~1%)
We can calculate the total size of the mailbox while generating
the cache, which allows us to avoid iterating the cache again to
calculate the size of the mailbox slice. While we're in the
area, simplify the loop and avoid needlessly updating the `$beg'
variable.
This adds a small amount of constant time overhead to DELE,
however that is amortized across multiple requests for fairness.
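
One reading of this change, sketched in Python: keep a running total while
the cache is built so STAT is a lookup, and pay a constant-time subtraction
on each DELE. The `Mailbox' class and its fields are hypothetical and do not
mirror the actual POP3 code.

```python
class Mailbox:
    """Hypothetical per-session mailbox cache: sequence number -> size."""

    def __init__(self, sizes):
        self.cache = {}
        self.total = 0
        for seq, size in enumerate(sizes, start=1):
            self.cache[seq] = size
            self.total += size  # summed here so STAT never re-walks the cache

    def stat(self):
        # POP3 STAT reply: message count and total size in octets
        return "+OK %d %d" % (len(self.cache), self.total)

    def dele(self, seq):
        # the small constant-time cost added to DELE
        self.total -= self.cache.pop(seq)

mb = Mailbox([100, 250, 50])
print(mb.stat())
mb.dele(2)
print(mb.stat())
```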
Eric Wong [Fri, 12 Aug 2022 09:14:48 +0000 (09:14 +0000)]
pop3: quiet warning for cached active statements
Setting the $if_active parameter of ->prepare_cached to `1'
seemed to be the best option many years ago, so it's probably
the best option going forward when caching prepared statements.
Fixes: cab36ebd00ca72f8 ("pop3: remove untouched rows on QUIT/disconnect")
Eric Wong [Thu, 11 Aug 2022 20:13:09 +0000 (20:13 +0000)]
examples: consolidate systemd socket examples
systemd.socket(5) files can actually contain multiple listen
sockets, so shave down inode overhead and simplify config
file management by consolidating all applicable ports into
a single file for each daemon.
Eric Wong [Thu, 11 Aug 2022 20:13:08 +0000 (20:13 +0000)]
doc: drop ancient Apache and WEBrick examples
Having old, unmaintained docs for other HTTP servers is likely
harmful at this point. public-inbox-httpd is specifically
designed to handle git repos on slow storage and stream giant
mbox.gz files fairly to slow clients.
Eric Wong [Thu, 11 Aug 2022 20:33:39 +0000 (20:33 +0000)]
devel/syscall-list: support non-Linux, show sizeof(pid_t)
While I have no intention of using syscall numbers for
non-Linux, sizeof(pid_t) was useful for OpenBSD. And maybe
Linux can have real competition from other OSes with stable
syscall numbers someday.
Eric Wong [Thu, 11 Aug 2022 20:00:21 +0000 (20:00 +0000)]
pop3d: enable native fcntl locks on all *BSDs
...as we've already done for the simpler case of mbox locking in lei.
I've just confirmed NetBSD and OpenBSD share the same "struct flock"
with FreeBSD, and assume DragonflyBSD is the same. sizeof(pid_t) == 4
in all places I've checked, and it's unlikely we'll need 64-bit
pid_t any time soon...
Eric Wong [Thu, 11 Aug 2022 20:00:20 +0000 (20:00 +0000)]
www: inbox: favor "pop3://" over "pop://"
curl only supports "pop3://" and "pop3s://", despite RFC 2384
existing for "pop://". AFAIK, there are no RFCs for "pop3://"
and "pop3s://", but please let us know if there are.
In any case, real-world cases like curl are more relevant.
Eric Wong [Wed, 10 Aug 2022 15:58:01 +0000 (15:58 +0000)]
daemon: rely on $SIG{__WARN__} for error output
warn/carp usage is unavoidable given Perl itself and standard
libraries, so just rely on localized $SIG{__WARN__} from
60d262483a4d6ddf (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08)
for all error reporting.
While we're in the area, make some of the error handling more
consistent between IMAP/NNTP/POP3.
Eric Wong [Wed, 10 Aug 2022 07:40:31 +0000 (07:40 +0000)]
www_text: add AUTH=ANONYMOUS to IMAP URLs
While the ';' requires escaping on the command-line, the
presence of ";AUTH=ANONYMOUS" communicates clearly that
anonymous access is supported in accordance with RFC 4505.
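
To illustrate the escaping concern, a small Python sketch that builds such a
URL and quotes it for a POSIX shell. The host and mailbox names are
placeholders, and `imap_url' is a hypothetical helper.

```python
import shlex

def imap_url(host, mailbox, tls=True):
    """Build an IMAP URL advertising anonymous access (placeholder values)."""
    scheme = "imaps" if tls else "imap"
    return "%s://;AUTH=ANONYMOUS@%s/%s" % (scheme, host, mailbox)

url = imap_url("example.com", "inbox.example.0")
# ';' is a command separator in the shell, so the URL must be quoted
print(shlex.quote(url))
```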
Eric Wong [Wed, 10 Aug 2022 06:00:53 +0000 (06:00 +0000)]
pop3: remove untouched rows on QUIT/disconnect
Some POP3 clients may connect and never retrieve messages nor
trigger deletes. In that case, save some storage by removing
unused rows from the `deletes' and `users' tables.
Eric Wong [Mon, 8 Aug 2022 23:53:10 +0000 (23:53 +0000)]
imap: share mailboxes list across listeners
Since IMAP mailbox lists are tied to the PublicInbox::Config
object, we can share them the same way the config object is
shared when an -imapd or -netd instance has multiple listeners.
This ought to reduce memory use and startup time when binding
multiple sockets which share a common config file.
Eric Wong [Mon, 8 Aug 2022 23:53:09 +0000 (23:53 +0000)]
daemon: cleanup internal data structures
This avoids dangling {''} entries in $xnetd and
%tls_opt hashes. Furthermore, we can safely undef
%tls_opt once it's associated with each $xnetd object.
Eric Wong [Mon, 8 Aug 2022 23:53:08 +0000 (23:53 +0000)]
daemon: use per-listener SIG{__WARN__} callbacks
This allows "-l $ADDRESS?err=/path/to/err.log" to isolate normal
warn() (and carp()) messages for a particular listen address to
track down errors more easily.
Eric Wong [Mon, 8 Aug 2022 23:53:07 +0000 (23:53 +0000)]
daemon: use default address + well-known ports for scheme
This ensures the "bound $URL" diagnostic message at startup
always shows the URL scheme handled if not relying on socket
inheritance.
This also avoids duplicate/unused data structures when binding
sockets ourselves, as bound socket names can expand from short
names to longer names (e.g. "0:119" => "0.0.0.0:119").
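
A rough Python sketch of the idea: map a well-known port back to its scheme
and expand a short IPv4 listen address. The table holds the standard port
assignments; `bound_url' and its output format are hypothetical, and the real
daemon presumably learns the expanded name from the bound socket rather than
from a hardcoded rule.

```python
# Standard well-known ports for the protocols involved.
WELL_KNOWN_PORTS = {
    80: "http", 443: "https",
    110: "pop3", 995: "pop3s",
    119: "nntp", 563: "nntps",
    143: "imap", 993: "imaps",
}

def bound_url(addr):
    """Expand a short IPv4 "host:port" and add a scheme for well-known ports."""
    host, _, port = addr.rpartition(":")
    if host in ("", "0"):
        host = "0.0.0.0"  # assumption: stands in for what getsockname reports
    expanded = "%s:%d" % (host, int(port))
    scheme = WELL_KNOWN_PORTS.get(int(port))
    return "%s://%s" % (scheme, expanded) if scheme else expanded

print(bound_url("0:119"))
```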
Eric Wong [Mon, 8 Aug 2022 23:16:47 +0000 (23:16 +0000)]
imap: prioritize AUTH=ANONYMOUS clients
...by deprioritizing clients using a username + password.
As IMAP provides AUTH=ANONYMOUS for designating anonymous
access, we'll rely on it as a heuristic for favoring "good"
clients. Clients using a username + password seem to (more
often than not) be malicious and looking for info which doesn't
belong in public inboxes.
This copies the technique used by WWW + -httpd to deprioritize
expensive mbox.gz downloads.
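
The general shape of such deprioritization, as a hedged Python sketch: two
queues, with the low-priority one only served when the other is empty. The
`Scheduler' class is hypothetical and the actual event-loop mechanics differ.

```python
from collections import deque

class Scheduler:
    """Hypothetical two-level queue favoring AUTH=ANONYMOUS clients."""

    def __init__(self):
        self.normal = deque()  # AUTH=ANONYMOUS clients
        self.low = deque()     # clients that sent a username + password

    def enqueue(self, client, anonymous):
        (self.normal if anonymous else self.low).append(client)

    def next_client(self):
        # low-priority clients only run once no anonymous client is waiting
        if self.normal:
            return self.normal.popleft()
        if self.low:
            return self.low.popleft()
        return None

s = Scheduler()
s.enqueue("login-client", anonymous=False)
s.enqueue("anon-client", anonymous=True)
print(s.next_client(), s.next_client())
```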
Eric Wong [Mon, 8 Aug 2022 23:16:46 +0000 (23:16 +0000)]
imap: only give AUTH=ANONYMOUS clients prefetch
Looking at IMAP traffic on public-inbox.org, it seems there is a
fair amount of traffic coming from malicious clients assuming
the IMAP server is compromised and searching for private
information. Since AUTH=ANONYMOUS clients are more likely to
be legitimate clients looking for publicly-archived mail,
give them priority.
Eric Wong [Fri, 5 Aug 2022 08:29:54 +0000 (08:29 +0000)]
daemon: dedupe PublicInbox::Config objects by pathname
This means all Inbox, Git, Over, Msgmap, Search objects also get
deduplicated if they belong to the same config file, reducing
memory and FD usage. This helps save memory and improve cache
hit rates in -netd setups where NNTP, IMAP, HTTP, and POP3
servers run in the same process.
InboxIdle was the only bit which needed adjustment, but there
may be other bugs lurking despite all tests passing.
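
The deduplication pattern, sketched in Python under the assumption that
objects are keyed by a normalized path. `Config' here is a stand-in class and
`config_for' a hypothetical helper; whether the real code normalizes paths is
not stated above.

```python
import os

class Config:
    """Stand-in for a parsed config; everything hanging off it is shared too."""
    def __init__(self, path):
        self.path = path

_configs = {}  # normalized pathname -> Config

def config_for(path):
    """Return the one shared Config for `path`, creating it on first use."""
    key = os.path.abspath(path)  # assumption: normalize before comparing
    cfg = _configs.get(key)
    if cfg is None:
        cfg = _configs[key] = Config(key)
    return cfg

# two listeners naming the same file end up with the same object
print(config_for("cfg") is config_for("./cfg"))
```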
Eric Wong [Thu, 4 Aug 2022 20:08:21 +0000 (20:08 +0000)]
www: gzip_filter: avoid errors after ->write failure
->zflush must return a string to its caller, not undef.
Additionally, {http_out} may be deleted on ->write if ->close
recurses.
This should fix the following errors:
Use of uninitialized value $_[1] in string eq at PublicInbox/HTTP.pm line 211.
E: Can't call method "close" on an undefined value at GzipFilter.pm line 167.
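
The two rules above ("always return a string" and "tolerate state that was
torn down during a failed write") can be sketched in Python with zlib. The
class reuses the names from this message for readability, but the body is an
illustration and not a port of the Perl code.

```python
import zlib

class GzipFilter:
    """Illustrative gzip stream wrapper with a flush that never returns None."""

    def __init__(self):
        self.gz = zlib.compressobj(wbits=31)  # wbits=31 selects gzip framing

    def close(self):
        # may run re-entrantly after a failed write; drops the stream state
        self.gz = None

    def zflush(self, data=b""):
        gz = self.gz
        if gz is None:
            return b""  # already torn down: still hand back bytes, not None
        out = gz.compress(data) + gz.flush()
        self.gz = None
        return out

f = GzipFilter()
print(len(f.zflush(b"hello")) > 0, f.zflush() == b"")
```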
Eric Wong [Thu, 4 Aug 2022 08:17:02 +0000 (08:17 +0000)]
feed: avoid unnecessary map loop in non-over path
We can bless objects while doing the initial insertion to avoid
the extra map iteration and temporary array(s). Fewer ops
means memory savings for the likely case of ->over users, too.
Eric Wong [Thu, 4 Aug 2022 08:17:01 +0000 (08:17 +0000)]
imap: ensure_slices_exist: drop needless map and array
We can reduce ops and temporary objects here by folding the
stringification into the `for' loop and push directly into the
{mailboxlist} array; relying on autovivification to turn it into
a noop for the initial population.
Eric Wong [Thu, 4 Aug 2022 07:23:49 +0000 (07:23 +0000)]
TODO: remove done items, adjust/add/abandon some
public-inbox-pop3d (and -netd) gives us POP3 support, and
it seems to work. Proxy support can come independently,
probably after JMAP.
public-inbox-netd provides the multi-protocol "super server"
which allows code memory savings. Work is ongoing to further
reduce memory use...
Automatically updating on TLS cert and key changes on
inotify/EVFILT_VNODE won't be done, since (IMHO) there's too
much risk of inadvertent updates on incomplete changes.
My same train-of-thought applies to auto-reloading on config
file changes: an admin may save a file halfway through a
multi-step change and auto-reloading can be too surprising and
break things.
I don't think lei+FUSE will be as portable or useful as a
local IMAP server (and maybe JMAP, eventually); but r/w IMAP
support would be nice.
Finally, git SHA-256 repo support will need to be taken into
account.
Eric Wong [Thu, 4 Aug 2022 06:27:39 +0000 (06:27 +0000)]
daemon: handle per-listener options on inherited, well-known ports
We must not clobber already-parsed per-listener options when
handling inherited sockets which are well-known. Unfortunately,
this isn't easy to test in a non-intrusive way for regular
users.
Eric Wong [Wed, 3 Aug 2022 20:03:56 +0000 (20:03 +0000)]
nntp: speed up group listings via ->ALL->misc
By taking advantage of the new ART_MIN/ART_MAX value in MiscIdx,
we can avoid the overhead of opening per-inbox msgmap DB files.
The result gives us a ~40 speedup with 50K newsgroups.
Eric Wong [Wed, 3 Aug 2022 08:06:03 +0000 (08:06 +0000)]
daemon: reload TLS certs and keys on SIGHUP
This allows new TLS certificates to be loaded for new clients
without having to time out or drop existing clients with
established connections made with the old certs. This should
benefit users with admins who expire certificates frequently (as
encouraged by Let's Encrypt).
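
A sketch of the reload pattern in Python's ssl module, assuming a new context
is built on each reload and only used for connections accepted afterwards.
`TLSListener' and `install_sighup' are hypothetical names, and the `make_ctx'
hook exists only so the example can be exercised without real certificates.

```python
import signal
import ssl

class TLSListener:
    """Hypothetical listener state holding the current TLS context."""

    def __init__(self, cert, key, make_ctx=None):
        self.cert, self.key = cert, key
        self._make_ctx = make_ctx or self._default_ctx
        self.ctx = self._make_ctx(cert, key)

    @staticmethod
    def _default_ctx(cert, key):
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        ctx.load_cert_chain(cert, key)
        return ctx

    def reload(self):
        # established connections keep the context they were wrapped with;
        # only sockets wrapped after this point see the new cert and key
        self.ctx = self._make_ctx(self.cert, self.key)

    def wrap(self, sock):
        return self.ctx.wrap_socket(sock, server_side=True)

def install_sighup(listener):
    """Reload on SIGHUP (call from the main thread of the daemon)."""
    signal.signal(signal.SIGHUP, lambda signum, frame: listener.reload())
```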
Socket ->write failures are expected and common for TCP traffic,
especially if it's facing unreliable remote connections. So
just bail out silently if our {gz} field was already clobbered
during the small bit of recursion we hit on ->write failures
from async responses.
This ought to fix some GzipFilter::zflush errors (via $forward
->close from PublicInbox::HTTP) I've been noticing on
deployments running -netd. I'm still unsure as to why I hadn't
seen them before, but it might've only been ignorance on my
part...
Eric Wong [Mon, 1 Aug 2022 21:24:47 +0000 (21:24 +0000)]
daemon: share FDs for identical log paths
We rely on the %logs hash for SIGUSR1 log reopening. Without this sharing,
some FDs would be hidden inside their respective {HTTP,IMAP,POP3}D
object and not reopened on USR1.
Eric Wong [Mon, 1 Aug 2022 21:24:42 +0000 (21:24 +0000)]
httpd: make internals slightly more generic
This brings the HTTP server closer to the IMAP/NNTP/POP3
implementations and eliminates package-wide globals in
PublicInbox::HTTPD. The end goal is to be able to host
completely different PSGI applications on different listen
ports.
Eric Wong [Sat, 30 Jul 2022 09:38:24 +0000 (09:38 +0000)]
solver: avoid deprecation warnings in git 2.36.0+
git deprecated core.fsyncObjectFiles in favor of core.fsync
with 2.36.0+, while GIT_TEST_FSYNC was added in 2.35.0. So
use the environment variable since it's been supported slightly
longer than the new configuration knob.
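
A hedged sketch of passing the variable to a child git process from Python.
The value "0" and the intent (skipping fsync for a throwaway repository) are
assumptions on my part; `git_env_without_fsync' is a hypothetical helper.

```python
import os

def git_env_without_fsync(base=None):
    """Copy of the environment with GIT_TEST_FSYNC=0 for child git processes."""
    env = dict(os.environ if base is None else base)
    env["GIT_TEST_FSYNC"] = "0"  # assumed value; read by git 2.35.0+
    return env

env = git_env_without_fsync({"PATH": "/usr/bin"})
print(env["GIT_TEST_FSYNC"])
```

The returned dict can be handed to subprocess.run(..., env=env), which avoids
touching any config key and so sidesteps the deprecation warning.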
Eric Wong [Fri, 22 Jul 2022 20:18:09 +0000 (20:18 +0000)]
www: drop --subject from "git send-email" instructions
Apparently, --subject doesn't work[1] with "git send-email" in
this context. So drop the CLI arg and add a note to tell the
user to set a "Subject:" line in their response body, instead.
[1] I'm not sure if --subject ever worked as I thought it would,
or if it's a regression. In either case, there are current
versions of git where it doesn't, so just tell users to use
the currently supported method.
Eric Wong [Sat, 23 Jul 2022 15:52:09 +0000 (15:52 +0000)]
add xt/mem-nntpd-tls maintainer test
This ensures memory usage is reasonable when DEFLATE and TLS are
enabled. It's also our only coverage for NNTP COMPRESS since
Net::NNTP has yet to implement compression support:
Eric Wong [Sat, 23 Jul 2022 06:12:16 +0000 (06:12 +0000)]
pop3: reduce memory use while generating the mailbox cache
While the cache itself is relatively compact for 50K messages,
generating it was inefficient due to our schema and Over.pm APIs
being designed for NNTP. While we won't change our schema for
now, we can choose better DBI APIs to use and limit our ephemeral
memory use.
This amounts to a 60% reduction in memory usage and a 5-10%
speedup against org.kernel.vger.git.0:
Eric Wong [Sat, 23 Jul 2022 04:41:54 +0000 (04:41 +0000)]
nntp: resolve inboxes immediately on group listings
This prevents potential races between SIGHUP config reloads
while gigantic group listings are streaming, allowing us to
avoid many invalidation checks.
This also reduces send(2) syscalls and avoids Perl internal pad
allocations in a few places where it's not beneficial. There
might be a slight (0.5%) speedup, but I'm not sure if that's
down to system noise, power/thermal management, or other users
on my VM.
Eric Wong [Sat, 23 Jul 2022 04:41:51 +0000 (04:41 +0000)]
nntp: listgroup_range_i: remove useless `map' op
No need to iterate through the array twice; and this even seems
a hair faster than what I got with commit 726d6e71aee5d974
(nntp: small speed up for multi-line responses, 2020-12-04)
Eric Wong [Thu, 21 Jul 2022 05:36:12 +0000 (05:36 +0000)]
pop3: drop File::FcntlLock requirement for FreeBSD and Linux
I know Linux has a stable ABI for this, and FreeBSD seems to,
too (*BSDs don't have stable syscall numbers, though).
I suspect this is safe enough for all *BSDs.
This is stricter than the MboxLock one since we use exact byte
ranges with these locks.
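
For comparison, a byte-range lock in Python, where fcntl.lockf wraps the same
fcntl(2) record locks and hides the "struct flock" layout that the Perl code
presumably has to pack by hand. The helper names and the offset used in
`demo' are arbitrary.

```python
import fcntl
import os
import tempfile

def try_lock_range(fd, start, length):
    """Try a non-blocking exclusive lock on bytes [start, start + length)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
    except OSError:
        return False  # another process holds a conflicting lock
    return True

def unlock_range(fd, start, length):
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)

def demo():
    """Lock and release a single byte of a scratch file."""
    with tempfile.TemporaryFile() as fh:
        fd = fh.fileno()
        got = try_lock_range(fd, 42, 1)
        if got:
            unlock_range(fd, 42, 1)
        return got
```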
Eric Wong [Wed, 20 Jul 2022 22:57:07 +0000 (22:57 +0000)]
www: note "x=m" and "t=1" (mis)use for GET requests
We require "x=m" (requests for mboxes) to be POST requests to
avoid unnecessary traffic from crawlers. "t=1" only collapses
threads in the summary view, which isn't normally accessible
from <form> elements.
This also fixes the missing "[summary|nested]" element when
"x=m" is used.