Eric Wong [Fri, 2 Sep 2022 10:11:48 +0000 (10:11 +0000)]
solver: do not count duplicates in patch count
We're considering duplicate patches from cross-posted lists
identical, so don't double-count them when displaying the
"applying [X/Y]" message since (successful) duplicates get
skipped.
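
A minimal sketch of the counting idea above, in Python rather than the
project's Perl. The `blob_pair` key used to decide that two patches are
duplicates is a hypothetical stand-in; the actual solver may compare
patches differently.

```python
def unique_patches(patches, key=lambda p: p["blob_pair"]):
    """Return patches in order, keeping only the first of each duplicate."""
    seen = set()
    out = []
    for p in patches:
        k = key(p)  # hypothetical duplicate key, not the solver's real one
        if k in seen:
            continue
        seen.add(k)
        out.append(p)
    return out


def progress_labels(patches):
    """Build "applying [X/Y]" labels where Y excludes duplicates."""
    total = len(unique_patches(patches))
    return [f"applying [{i}/{total}]" for i in range(1, total + 1)]
```

With one cross-posted duplicate among three patches, Y is 2 rather than 3.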
Eric Wong [Fri, 2 Sep 2022 09:12:54 +0000 (09:12 +0000)]
extmsg: shorten partial Message-IDs minimum to 14
Gnus seems to start Message-IDs with 10 random characters
followed by ".fsf@$DOMAIN". In case of mis-linkification or
mis-selection from stopping at the `@', ensuring the first 14
characters are accepted as a search parameter for the truncated
Message-ID improves usability.
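
The arithmetic behind the new minimum can be sketched as follows (Python,
for illustration only). The function names are hypothetical; only the
numbers 10 and 14 and the ".fsf" suffix come from the message above.

```python
MIN_PARTIAL_MID = 14   # new minimum, from the commit subject
GNUS_RANDOM_LEN = 10   # "10 random characters", per the message above


def gnus_prefix_len():
    """Length of a Gnus Message-ID truncated just before the `@'."""
    return GNUS_RANDOM_LEN + len(".fsf")


def is_searchable_partial_mid(partial):
    """Hypothetical check: accept a truncated Message-ID of 14+ chars."""
    return len(partial) >= MIN_PARTIAL_MID
```

A selection which stops at the `@' leaves exactly 10 + 4 = 14 characters,
which this check accepts.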
Eric Wong [Fri, 2 Sep 2022 09:10:54 +0000 (09:10 +0000)]
www: omit [thread overview] link for unindexed v1
Unindexed v1 inboxes do not have the thread overview skeleton
at the bottom of /$MSGID/ pages, so do not link to it.
And for rare messages without a Date: header (or any headers!),
this also ensures the [thread overview] is shown regardless.
Eric Wong [Fri, 2 Sep 2022 09:10:53 +0000 (09:10 +0000)]
www: fix top nav bar for unindexed v1 inboxes
For /$INBOX/$MSGID/ pages, we need to point all nav bar links
../ regardless of whether ->over exists. I've also verified
this doesn't affect /$INBOX/new.html at all.
Eric Wong [Mon, 29 Aug 2022 09:26:47 +0000 (09:26 +0000)]
viewvcs: show "blob $OID" rather than "$OID blob"
This is more consistent with the rest of the output where it's
"$TYPE $OID" rather than "$OID $TYPE". The former also allows
easy copy+pasting into commands for both "git cat-file blob $OID"
and "lei blob $OID".
Eric Wong [Mon, 29 Aug 2022 09:26:43 +0000 (09:26 +0000)]
solver: early make hints detection more robust
Hints fields can change, so we'll use a simple boolean rather
than checking a static count. We'll also short-circuit out
reliably regardless of hints when a full OID is given.
Eric Wong [Mon, 29 Aug 2022 09:26:41 +0000 (09:26 +0000)]
www: atom: fix "changed" href to nowhere
The HTML generated for the Atom feed doesn't have the footer
of /T/ and /t/ HTML-only views, so just make "changed" in
the diffstat go directly to the permalink #related anchor.
Fixes: 66512e177390 ("view: generate query in single-message and commit views")
Eric Wong [Mon, 29 Aug 2022 09:26:39 +0000 (09:26 +0000)]
view: /$INBOX/: show "messages from $old to $new"
With the ViewVCS commit view using /$INBOX/?t=YYYYMMDDhhmmss-
links, the use of `t=' may not be immediately obvious to a
reader and confuse them into thinking the inbox hasn't been
updated in a while.
So add a header to the top of the page whenever the `t=' query
parameter is used.
And kill a couple of redundant variable assignments while we're
at it.
Eric Wong [Mon, 29 Aug 2022 09:26:38 +0000 (09:26 +0000)]
treewide: ditch inbox->recent method
It's a needless wrapper, nowadays. Originally, ->over was added
on an experimental basis to optimize for /$INBOX/ where Xapian
->search is slower on gigantic (LKML-sized) inboxes.
Nowadays with extindex, ->over is here to stay given NNTP and
IMAP both benefit from it. So reduce the interpreter stack
overhead and just access ->over directly.
lxs->recent was never used outside of tests, anyways.
And while we're in the area, avoid needlessly bumping the
refcount of $ctx->{ibx} in View::paginate_recent.
Eric Wong [Mon, 29 Aug 2022 09:26:37 +0000 (09:26 +0000)]
view: speed up /$INBOX/ landing page by 0.5-1.0%
Array lookups and extra arithmetic in Perl are slower than
bumping the internal array offset inside the interpreter.
Fwiw, using: my ($level, $subj) = splice(@extra, 0, 2)
did not result in a performance improvement.
Eric Wong [Mon, 29 Aug 2022 09:26:34 +0000 (09:26 +0000)]
viewvcs: use array for highlighted blob display
This can avoid at least one expensive copy for displaying
large blobs with syntax highlighting.
However, we cannot blindly change everything to arrays, either:
the cost of invoking Compress::Raw::Zlib->deflate must be taken
into account. Joining short strings via `.=', `.', `join' or
interpolation is typically faster since it avoids ->deflate
method calls (and non-magic perlops are the fastest dispatches
in Perl).
Eric Wong [Mon, 29 Aug 2022 09:26:31 +0000 (09:26 +0000)]
viewvcs: share File::Temp::Dir with solver
This allows reusing inodes for /$COMMIT_OID/s/ requests.
We'll also replace `log' with `lh' in the field name to
avoid confusion with the `log' perlop.
Eric Wong [Sun, 28 Aug 2022 03:59:50 +0000 (03:59 +0000)]
linkify: avoid digits and dashes in placeholders
The `highlight' module seems to highlight every digit in
YAML (and possibly other) source files. This causes problems
in linkify_2 which replaces the placeholders with proper URIs.
I suspect `-' and other punctuation characters will cause
similar problems, so we must stick to [A-Za-z].
Thus transliterate 0-9 to A-J in the hex key to ensure highlight
doesn't see digit characters, and rename the prefix to be
project-name independent.
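
The transliteration described above can be sketched in Python (the
project itself is Perl, and `placeholder_key` is a hypothetical name):

```python
# Map each digit to a letter so the result matches [A-Za-z] only.
DIGITS_TO_LETTERS = str.maketrans("0123456789", "ABCDEFGHIJ")


def placeholder_key(hex_key):
    """Replace 0-9 with A-J in a lowercase hex string."""
    return hex_key.translate(DIGITS_TO_LETTERS)
```

Since lowercase hex only uses a-f and the digits map to uppercase A-J,
distinct hex keys still yield distinct placeholders.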
Unindexed v1 inboxes were leaving $smsg objects unpopulated when
using public-inbox-httpd (but not generic PSGI servers) and
causing missing HTML content and uninitialized value warnings.
Our existing tests for unindexed v1 inboxes only assumed generic
PSGI servers and synchronous blob retrieval. Due to changes
several years ago to make git blob retrieval async for slow
storage using public-inbox-httpd, our tests were insufficient to
detect this regression.
So ensure $smsg->populate runs in a few places and rewrite
t/plack.t to test against both generic PSGI and -httpd
implementations.
Fortunately, unindexed v1 inboxes are uncommon, and this
bug was only (finally) discovered while developing other
features.
For ensuring we can test (and not blindly follow) redirects with
-httpd, we now provide our own LWP::UserAgent (used internally
by Plack::Test::ExternalServer) with redirect following
disabled to P:T:ES::test_psgi.
Eric Wong [Fri, 26 Aug 2022 03:20:20 +0000 (03:20 +0000)]
view: add "this message" link above dfblob: textarea
When jumping to #related from /T/ or /t/ views, it could be
disconcerting to not have the current message as context.
So add a "this message" link back up to #t as we have always
done with the reply instructions.
Eric Wong [Tue, 23 Aug 2022 08:32:01 +0000 (08:32 +0000)]
ibx_async_cat: access ->{git} directly
This will enable callers to pass non-Inbox-ish hashrefs as the
arg. This benefits existing Inbox-ish objects, too, as it
avoids a slow method dispatch for both ExtSearch and Inbox.
Eric Wong [Tue, 23 Aug 2022 08:31:58 +0000 (08:31 +0000)]
viewvcs: remove patchid line from commit header
I'm considering dropping this entirely since dfpre:, dfpost:,
dfn:, and s: can be just as powerful, if not more. patchid: is
inaccurate if either non-standard diff generation options are
used (e.g. -W or -U6), or if MUAs mangle whitespace.
We'll keep patchid: at the top search input box for now, but the
textarea at the bottom (and possibly another textarea for a more
exact match) is probably more useful and flexible.
Eric Wong [Tue, 23 Aug 2022 08:31:57 +0000 (08:31 +0000)]
viewdiff: linkify diffstats for non-format-patch emails
Some folks unfortunately use "git diff --stat -p" to generate
patches. These messages lack the /^---$/ line, causing
diffstats to not get linkified properly. We now treat the
/^---$/ as optional and rely on the presence of file lines with
/ \| / preceding a /\d+ files? changed,/ line.
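
A rough Python sketch of this detection, using the two patterns quoted
above. `find_diffstat` and its return value are hypothetical; the real
ViewDiff code is Perl and is structured differently.

```python
import re

FILE_LINE = re.compile(r" \| ")                # "path | 4 ++--" lines
SUMMARY = re.compile(r"\d+ files? changed,")   # closing summary line


def find_diffstat(lines):
    """Return (start, end) of the file lines before a summary, or None.

    No /^---$/ separator is required: a run of file lines directly
    followed by the summary line is enough.
    """
    start = None
    for i, line in enumerate(lines):
        if FILE_LINE.search(line):
            if start is None:
                start = i
        elif SUMMARY.search(line) and start is not None:
            return (start, i)
        else:
            start = None   # run broken by an unrelated line
    return None
```

The same function handles format-patch output, since a "---" line before
the file lines simply resets the search.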
Eric Wong [Tue, 23 Aug 2022 08:31:56 +0000 (08:31 +0000)]
view: generate query in single-message and commit views
The dfblob: search prefix is probably under-utilized, but is
extremely powerful IMHO. To make it easier to use, add a search
textarea with it prefilled with values for the existing patch
message. This allows users to easily run a query for all
patches which alter or result in either pre or post-image
blobs in the current patch.
Behavior changes are as follows: "changed" in the diffstat
jumps to the bottom of the message. For /T/ and /t/, it
goes to the "related" anchor which is just above the reply
instructions in the single-message view. For the single
message view, it'll jump to the textarea search form.
I initially wanted to use a normal `<a href=' link, but
figured the textarea is advantageous for two reasons:
1) users should be able to edit the query before submitting
2) crawlers are less likely to waste CPU/disk on forms
It's probably too noisy to add this directly to the /T/ and /t/
views, but seems like a good place to put above the reply
instructions in the single message view.
Note that the queries used by the /$COMMIT_OID/s/ view are
subtly different from the /$MSGID/ view since git will lengthen
its abbreviations over time, while emails are immutable.
I tried adding dfn: (filename) and s: (subject) support, but
couldn't come up with cases where it really made sense for
/$MSGID/. /$COMMIT_OID/s/ may benefit from it, since patchid:
could be flaky due to non-standard diff generation options.
Eric Wong [Mon, 22 Aug 2022 02:33:46 +0000 (02:33 +0000)]
viewvcs: start improving display of git commits
We already have ViewDiff for displaying patch emails, so we
can reuse it to display non-merge git commits.
AFAIK, this is the first web-based git repository viewer
to display the output of "git-patch-id --stable".
It currently fills in the search form box with "patchid:",
but maybe it'll do more than that.
More work will be done to support bidirectional mapping
of commits to emails in the future.
Eric Wong [Mon, 22 Aug 2022 02:33:44 +0000 (02:33 +0000)]
qspawn: improve error reporting and handling
First off, avoid potential circular references (via {qx_arg}) by
dropping the {-qsp} field from $ctx and SolverGit objects.
Instead, we only share a reference to an optional error buffer
string {qsp_err}.
We'll also attempt to call qspawn.wcb if qx_cb fails, and warn
in more places w/o checking for $env since we now rely on warn()
instead of $env->{'psgi.errors'}.
This makes error handling simpler and safer in future callers.
Eric Wong [Sun, 21 Aug 2022 22:21:00 +0000 (22:21 +0000)]
www: support `+' in inbox names
`+' already seems to work for IMAP mailboxes and NNTP newsgroup
names and git-config doesn't complain, either. So allow it as
the path components of WWW URLs so projects like `libstdc++' can
use it.
Eric Wong [Sat, 20 Aug 2022 08:01:35 +0000 (08:01 +0000)]
www: mbox* drop unneeded {base_url} memoizations
That field is not needed since List-* and Archived-At headers
are no longer appended as of commit 1bf653ad139bf7bb
(nntp+www: drop List-* and Archived-At headers, 2020-12-10)
Eric Wong [Sat, 20 Aug 2022 08:01:33 +0000 (08:01 +0000)]
view: do not show pagination footer for small inboxes
For new public inboxes with few messages, the dead pagination
footer is a worthless and confusing waste of space: "page: \n";
without `next' or `prev' links for users to follow.
Eric Wong [Wed, 17 Aug 2022 09:33:15 +0000 (09:33 +0000)]
lei inspect: less scary exception for invalid "docid:" inspect
It still says "Exception:", but doesn't pointlessly print out
the line number and file of the exception when it's a data/input
problem, and not a code problem on our end.
Eric Wong [Tue, 16 Aug 2022 03:44:03 +0000 (03:44 +0000)]
lei: do not wait for sto->done on disconnected EOF
lei-daemon (the top-level daemon process) should not have
synchronous waits, and this was causing a deadlock with
interrupted commands. There may still be a bug lurking in
lei/store despite this fix, though. I originally thought commit
fd261b9e65674505 (lei_store_err: use level-trigger for error pipe, 2022-08-15)
was sufficient, but at least this change is needed, as well.
Eric Wong [Fri, 12 Aug 2022 22:09:19 +0000 (22:09 +0000)]
pop3: speed up STAT slightly (~1%)
We can calculate the total size of the mailbox while generating
the cache, which allows us to iterate the cache again to
calculate the size of the mailbox slice. While we're in the
area, simplify the loop and avoid needlessly updating the `$beg'
variable.
This adds a small amount of constant time overhead to DELE;
however, that is amortized across multiple requests for fairness.
Eric Wong [Fri, 12 Aug 2022 09:14:48 +0000 (09:14 +0000)]
pop3: quiet warning for cached active statements
Setting the $if_active parameter of ->prepare_cached to `1'
seemed to be the best option many years ago, so it's probably
the best option going forward when caching prepared statements.
Fixes: cab36ebd00ca72f8 ("pop3: remove untouched rows on QUIT/disconnect")
Eric Wong [Thu, 11 Aug 2022 20:13:09 +0000 (20:13 +0000)]
examples: consolidate systemd socket examples
systemd.socket(5) files can actually contain multiple listen
sockets, so shave down inode overhead and simplify config
file management by consolidating all applicable ports into
a single file for each daemon.
Eric Wong [Thu, 11 Aug 2022 20:13:08 +0000 (20:13 +0000)]
doc: drop ancient Apache and WEBrick examples
Having old, unmaintained docs for other HTTP servers is likely
harmful at this point. public-inbox-httpd is specifically
designed to handle git repos on slow storage and stream giant
mbox.gz files fairly to slow clients.
Eric Wong [Thu, 11 Aug 2022 20:33:39 +0000 (20:33 +0000)]
devel/syscall-list: support non-Linux, show sizeof(pid_t)
While I have no intention of using syscall numbers for
non-Linux, sizeof(pid_t) was useful for OpenBSD. And maybe
Linux can have real competition from other OSes with stable
syscall numbers someday.
Eric Wong [Thu, 11 Aug 2022 20:00:21 +0000 (20:00 +0000)]
pop3d: enable native fcntl locks on all *BSDs
...as we've already done for the simpler case of mbox locking in lei.
I've just confirmed NetBSD and OpenBSD share the same "struct flock"
with FreeBSD, and assume DragonflyBSD is the same. sizeof(pid_t) == 4
in all places I've checked, and it's unlikely we'll need 64-bit
pid_t any time soon...
Eric Wong [Thu, 11 Aug 2022 20:00:20 +0000 (20:00 +0000)]
www: inbox: favor "pop3://" over "pop://"
curl only supports "pop3://" and "pop3s://", despite RFC 2384
existing for "pop://". AFAIK, there are no RFCs for "pop3://"
and "pop3s://", but please let us know if there are.
In any case, real-world cases like curl are more relevant.
Eric Wong [Wed, 10 Aug 2022 15:58:01 +0000 (15:58 +0000)]
daemon: rely on $SIG{__WARN__} for error output
warn/carp usage is unavoidable given Perl itself and standard
libraries, so just rely on localized $SIG{__WARN__} from
60d262483a4d6ddf (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08)
for all error reporting.
While we're in the area, make some of the error handling more
consistent between IMAP/NNTP/POP3.
Eric Wong [Wed, 10 Aug 2022 07:40:31 +0000 (07:40 +0000)]
www_text: add AUTH=ANONYMOUS to IMAP URLs
While the ';' requires escaping on the command-line, the
presence of ";AUTH=ANONYMOUS" communicates clearly that
anonymous access is supported in accordance with RFC 4505.
Eric Wong [Wed, 10 Aug 2022 06:00:53 +0000 (06:00 +0000)]
pop3: remove untouched rows on QUIT/disconnect
Some POP3 clients may connect and never retrieve messages nor
trigger deletes. In that case, save some storage by removing
unused rows from the `deletes' and `users' tables.
Eric Wong [Mon, 8 Aug 2022 23:53:10 +0000 (23:53 +0000)]
imap: mailboxes list across listeners
Since IMAP mailbox lists are tied to the PublicInbox::Config
object, we can share them the same way the config object is
shared when an -imapd or -netd instance has multiple listeners.
This ought to reduce memory use and startup time when binding
multiple sockets which share a common config file.
Eric Wong [Mon, 8 Aug 2022 23:53:09 +0000 (23:53 +0000)]
daemon: cleanup internal data structures
This avoids dangling {''} entries in $xnetd and
%tls_opt hashes. Furthermore, we can safely undef
%tls_opt once it's associated with each $xnetd object.
Eric Wong [Mon, 8 Aug 2022 23:53:08 +0000 (23:53 +0000)]
daemon: use per-listener SIG{__WARN__} callbacks
This allows "-l $ADDRESS?err=/path/to/err.log" to isolate normal
warn() (and carp()) messages for a particular listen address to
track down errors more easily.
Eric Wong [Mon, 8 Aug 2022 23:53:07 +0000 (23:53 +0000)]
daemon: use default address + well-known ports for scheme
This ensures the "bound $URL" diagnostic message at startup
always shows the URL scheme handled if not relying on socket
inheritance.
This also avoids duplicate/unused data structures when binding
sockets ourselves, as bound socket names can expand from short
names to longer names (e.g. "0:119" => "0.0.0.0:119").
Eric Wong [Mon, 8 Aug 2022 23:16:47 +0000 (23:16 +0000)]
imap: prioritize AUTH=ANONYMOUS clients
...by deprioritizing clients using a username + password.
As IMAP provides AUTH=ANONYMOUS for designating anonymous
access, we'll rely on it as a heuristic for favoring "good"
clients. Clients using a username + password seem to (more
often than not) be malicious and looking for info which doesn't
belong in public inboxes.
This copies the technique used by WWW + -httpd to deprioritize
expensive mbox.gz downloads.
Eric Wong [Mon, 8 Aug 2022 23:16:46 +0000 (23:16 +0000)]
imap: only give AUTH=ANONYMOUS clients prefetch
Looking at IMAP traffic on public-inbox.org, it seems there is a
fair amount of traffic coming from malicious clients assuming
the IMAP server is compromised and searching for private
information. Since AUTH=ANONYMOUS clients are more likely to
be legitimate clients looking for publicly-archived mail,
give them priority.
Eric Wong [Fri, 5 Aug 2022 08:29:54 +0000 (08:29 +0000)]
daemon: dedupe PublicInbox::Config objects by pathname
This means all Inbox, Git, Over, Msgmap, Search objects also get
deduplicated if they belong to the same config file, reducing
memory and FD usage. This helps save memory and improve cache
hit rates in -netd setups where NNTP, IMAP, HTTP, and POP3
servers run in the same process.
InboxIdle was the only bit which needed adjustment, but there
may be other bugs lurking despite all tests passing.
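
The deduplication idea above amounts to a cache keyed by pathname. A
minimal Python sketch follows; the `Config` class and `config_for` are
hypothetical stand-ins, and normalizing with os.path.abspath is an
assumption, not something the commit states.

```python
import os


class Config:
    """Stand-in for a parsed config object (hypothetical)."""

    def __init__(self, path):
        self.path = path


_configs = {}   # pathname => shared Config object


def config_for(path):
    """Return the one shared Config for a given config file path."""
    key = os.path.abspath(path)   # assumption: normalize before lookup
    cfg = _configs.get(key)
    if cfg is None:
        cfg = _configs[key] = Config(key)
    return cfg
```

Every listener asking for the same path receives the same object, so
anything hanging off that object is shared as well.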
Eric Wong [Thu, 4 Aug 2022 20:08:21 +0000 (20:08 +0000)]
www: gzip_filter: avoid errors after ->write failure
->zflush must return a string to its caller, not undef.
Additionally, {http_out} may be deleted on ->write if ->close
recurses.
This should fix the following errors:
Use of uninitialized value $_[1] in string eq at PublicInbox/HTTP.pm line 211.
E: Can't call method "close" on an undefined value at GzipFilter.pm line 167.
Eric Wong [Thu, 4 Aug 2022 08:17:02 +0000 (08:17 +0000)]
feed: avoid unnecessary map loop in non-over path
We can bless objects while doing the initial insertion to avoid
the extra map iteration and temporary array(s). Fewer ops
means memory savings for the likely case of ->over users, too.
Eric Wong [Thu, 4 Aug 2022 08:17:01 +0000 (08:17 +0000)]
imap: ensure_slices_exist: drop needless map and array
We can reduce ops and temporary objects here by folding the
stringification into the `for' loop and push directly into the
{mailboxlist} array; relying on autovivification to turn it into
a noop for the initial population.