Eric Wong [Fri, 30 Sep 2022 09:21:39 +0000 (09:21 +0000)]
lei_to_mail: propagate errors to script/lei
We need to rely on lei->fail to propagate errors in lei workers
to the script/lei client, otherwise tests and other scripts can
stumble forward with incomplete/incorrect/broken outputs.
This helps me focus on occasional t/lei-up.t failures I see on
CentOS 7.x where OverIdx->adj_counter fails on "lei up --all"...
Eric Wong [Fri, 30 Sep 2022 09:21:38 +0000 (09:21 +0000)]
t/lei-up: improve diagnostics for this test
I'm getting occasional failures for this test on CentOS 7.x (but
not on FreeBSD nor Debian 10/11). I'm not sure why, yet, so just
improve diagnostics for now.
Eric Wong [Thu, 29 Sep 2022 17:48:29 +0000 (17:48 +0000)]
treewide: use --globoff with curl(1)
curl 7.29.0 (on CentOS 7.x) seems to mishandle square-bracketed
IPv6 addresses, at least. Furthermore, we don't actually need
nor use the globbing in curl for lei when forwarding requests
from the lei command-line. lei has its own globbing and
`--globoff' behavior for externals and none of it is intended
for curl.
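
As a rough illustration (in Python, not the project's Perl), the effect of
`--globoff' can be sketched by building a curl(1) argument list for a
square-bracketed IPv6 URL. The helper name `curl_argv' and the example URL
are hypothetical; only the `--globoff' and `-sS' options are real curl flags.

```python
def curl_argv(url, extra=()):
    """Build a curl(1) command line with URL globbing disabled."""
    # --globoff stops curl from interpreting [] and {} in the URL as
    # glob patterns, so a bracketed IPv6 literal is passed through as-is.
    return ["curl", "--globoff", "-sS", *extra, url]

# hypothetical URL with a square-bracketed IPv6 address
argv = curl_argv("http://[::1]:8080/inbox/?q=test")
```

Any globbing the caller wants would have to happen before the URL reaches
this helper, which matches the idea that none of it is intended for curl.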
Eric Wong [Mon, 26 Sep 2022 10:17:15 +0000 (10:17 +0000)]
git: reduce early bare-bones memory use
The {-git_path} cache can rely on auto-vivification, and
{alt_st} may not be needed for short-lived repos. So don't
populate those fields until they're needed, since we can
expect to handle thousands of git repos, too.
Eric Wong [Mon, 26 Sep 2022 10:17:14 +0000 (10:17 +0000)]
viewvcs: load blobs asynchronously
This actually leads to a nice 3-5% speedup under parallel loads
when using git(1) w/o SHA-1 collision detection enabled. Gcf2
is slower since libgit2 has SHA-1 collision detection enabled
on my system.
Since we're in the area, improve location of comments w.r.t.
cgit CSS class names and note the reliance on scratchpad for
performance in a tight loop.
Eric Wong [Mon, 26 Sep 2022 10:17:13 +0000 (10:17 +0000)]
gcf2: support worktree $GIT_DIR
We must use `git rev-parse --git-path objects' instead of
blindly appending '/objects' to $GIT_DIR, since appending
doesn't work when $GIT_DIR is a worktree.
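
A minimal sketch of the lookup described above, in Python rather than the
Perl/C++ used by Gcf2. The function names are hypothetical; the git
invocation itself (`git rev-parse --git-path objects') is the one named in
this commit.

```python
import subprocess

def objects_dir_cmd(git_dir):
    """Command asking git where the object store for git_dir lives."""
    return ["git", f"--git-dir={git_dir}", "rev-parse", "--git-path", "objects"]

def objects_dir(git_dir):
    """Run the command instead of assuming git_dir + '/objects'.

    For a worktree $GIT_DIR, git resolves this to the shared object
    directory, which plain string appending cannot do.
    """
    out = subprocess.run(objects_dir_cmd(git_dir), check=True,
                         capture_output=True, text=True)
    return out.stdout.rstrip("\n")
```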
Eric Wong [Mon, 26 Sep 2022 10:17:12 +0000 (10:17 +0000)]
viewdiff: save memory by eliminating two captures
Avoid relying on $DIGIT captures when @- and @+ can give us the
last match start and end, respectively. The elimination of
the post capture ought to allow the use of sv_chop to advance
the string start pointer without memory copies.
This ought to save 1-2MB of memory on my system since I've
noticed the captures were using a big chunk of scratchpad
space.
This avoids `Wide character in print' warnings and ensures the
UTF-8 characters in `Signed-off-by' trailers are properly rendered
in HTML even when attempting to decode and display
application/octet-stream mbox attachments as HTML.
Linkification and reconstruction for coderepos is probably
still broken, but that is a much bigger task to fix, I think.
Fixes: ab9c03ff4aa369b3 ("www: use PerlIO::scalar (zfh) for buffering")
Eric Wong [Sat, 10 Sep 2022 20:10:24 +0000 (20:10 +0000)]
solver: do not show redundant URLs in log
Messages in /all/ can get duplicated at times due to
list-appended signatures or buggy/malicious clients.
They'll all show up based on /$INBOX/$MSGID/,
so deduplicate the URLs to avoid noise.
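
The deduplication could look roughly like this; it is an order-preserving
sketch in Python, not the solver's actual Perl code, and the URLs are
made-up examples.

```python
def dedup_urls(urls):
    """Drop repeated URLs while keeping first-seen order."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(urls))

log_urls = dedup_urls([
    "https://example.com/inbox/msgid@example/",
    "https://example.com/inbox/msgid@example/",  # duplicate message, same URL
    "https://example.com/other/msgid@example/",
])
```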
Eric Wong [Sat, 10 Sep 2022 20:10:23 +0000 (20:10 +0000)]
view: fix solver links with multiple messages
For redundant messages sharing Message-IDs, the link to solver
(/$INBOX/$OID/s/) was going up too many levels for /$INBOX/$MSGID/
when there were multiple messages sharing the same $MSGID.
Unfortunately, redundant messages are common with /all/
due to signature trailers. So dynamically assigning {-spfx}
is tricky and error prone from counting `/'.
So simplify the code a bit by setting {-spfx} once per HTTP
request, instead of every single message.
Eric Wong [Sat, 10 Sep 2022 08:17:29 +0000 (08:17 +0000)]
viewvcs: switch to `print $zfh'
Again, ->zmore has proven expensive due to the overhead of
calling ->deflate on small strings, so print directly to the
file handle and let the PerlIO::scalar layer take care of
buffering. One of the ->zmore calls was a no-op, even, so
drop that entirely.
Eric Wong [Sat, 10 Sep 2022 08:17:28 +0000 (08:17 +0000)]
www_listing: switch to `print $zfh'
Again, ->deflate (and thus ->zmore) calls are relatively
expensive compared to `print' ops using PerlIO::scalar
behind-the-scenes. While I can likely optimize the `join' away
here, too, that will happen in a future commit.
Eric Wong [Sat, 10 Sep 2022 08:17:26 +0000 (08:17 +0000)]
feed: new_html_i: switch from zmore to `print $zfh'
eml_entry will enable zfh (PerlIO::scalar) buffering, anyways,
so there's no point in calling ->zmore to compress small
strings. The use of zfh for the skeleton is debatable, but
probably of no consequence given html_footer will hit it,
anyways.
Eric Wong [Sat, 10 Sep 2022 08:17:23 +0000 (08:17 +0000)]
httpd/async: describe which ->write subs it can call
I initially wanted to rename GzipFilter->write to
GzipFilter->writev to reflect the multi-argument nature of the
sub, but it wasn't worth the memory to maintain an alias.
Eric Wong [Sat, 10 Sep 2022 08:17:20 +0000 (08:17 +0000)]
viewdiff: diff_before_or_after: avoid extra capture
/(.*?)\z/ will capture the "$X insertions(+), $Y deletions(-)"
bit anyways, along with whatever extra notes before the
/^diff --git / line. So just rely on /(.*?)\z/ and avoid
the special case before it.
Eric Wong [Sat, 10 Sep 2022 08:17:19 +0000 (08:17 +0000)]
www: use PerlIO::scalar (zfh) for buffering
Calling Compress::Raw::Zlib::deflate is fairly expensive.
Relying on the `.=' (concat) operator inside the ->zadd method is
faster, but the method dispatch overhead is noticeable compared
to the original code where we had bare `.=' littered throughout.
Fortunately, `print' and `say' with the PerlIO::scalar IO layer
appears to offer better performance without high method dispatch
overhead. This doesn't allow us to save as much memory as I
originally hoped, but does allow us to rely less on concat
operators in other places and just pass a list of args to
`print' and `say' as appropriate.
This does reduce scratchpad use, however, allowing for large
memory savings, and we still ->deflate every single $eml.
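
The trade-off can be sketched in Python (an analogy only; the real code
uses Perl's PerlIO::scalar and Compress::Raw::Zlib). One variant calls the
compressor for every small string, the other prints into an in-memory
buffer and compresses once. Both function names are hypothetical.

```python
import io
import zlib

def deflate_each(chunks):
    """Feed every small string to the compressor separately (many calls)."""
    z = zlib.compressobj(wbits=31)  # wbits=31 selects gzip framing
    out = [z.compress(c.encode()) for c in chunks]
    out.append(z.flush())
    return b"".join(out)

def deflate_buffered(chunks):
    """Collect small strings in an in-memory file, then compress once."""
    buf = io.StringIO()
    for c in chunks:
        print(c, end="", file=buf)  # cheap append, no compressor call
    z = zlib.compressobj(wbits=31)
    return z.compress(buf.getvalue().encode()) + z.flush()
```

Both variants decompress to the same text; the second simply trades the
per-string compressor calls for buffered writes.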
Eric Wong [Sat, 10 Sep 2022 08:17:16 +0000 (08:17 +0000)]
view: switch a few things to ctx->zmore
Unfortunately, this is actually slower. However, this
hopefully makes it easier to improve the internals and
make performance improvements down the line.
I'm not sure if Devel::Size::total_size can be trusted due
to the regexps and crashes[1], but when it works, it's showing
around a 900 byte size reduction, too.
Eric Wong [Sat, 10 Sep 2022 08:17:11 +0000 (08:17 +0000)]
view: reduce ascii_html calls and {obuf} use
We can rely on {-html_tip} for some things at the top of the
page, and reduce ascii_html and obfuscate_addrs calls by
working on the whole buffer at once.
Eric Wong [Sat, 10 Sep 2022 08:17:09 +0000 (08:17 +0000)]
view: _th_index_lite: avoid one s///, improve symmetry
We can replace an expensive `s///' substitution with a simpler
`chop'. Furthermore, we can delay the "</b>\n" replacement
to ensure it's on the same line of Perl code as the `<b>'
opening tag for readability.
Eric Wong [Sat, 10 Sep 2022 08:17:06 +0000 (08:17 +0000)]
view: reduce subroutine calls for submsg_hdr
Favor fewer, yet more expensive operations over many smaller
ones. While we're still directly manipulating ctx->{obuf} after
this, this change makes it easier for us to avoid doing so in
the future.
Eric Wong [Sat, 10 Sep 2022 08:17:02 +0000 (08:17 +0000)]
view: simplify _parent_headers
Having References but lacking In-Reply-To is an uncommon case
with email, nowadays. So just rely on ->linkify_mids to handle
linkification and HTML escaping. Furthermore, headers are short
enough to return as-is (and rely on CoW improvements in Perl
5.1x) since linkify_mids needs to operate on an independent
string, anyways.
Eric Wong [Sat, 10 Sep 2022 08:17:00 +0000 (08:17 +0000)]
www_listing: avoid unnecessary work for common cases
We need to branch for non-empty `q=' parameters anyways, but
`q=' is usually empty/unset. While we're in the area, `chomp'
reads `$/' while `chop' is simpler. Furthermore, we can shave
a few bytes off the form HTML by omitting spaces before `/>'
and placing `\n' to wrap long lines before attribute names.
Eric Wong [Sat, 10 Sep 2022 08:16:58 +0000 (08:16 +0000)]
viewvcs: use shorter and simpler ctx->html_done
We only return 200s for any response large enough to warrant
->html_done, so we can just assume it. ViewVCS can also take
advantage of it with some tweaking to avoid an extra method
dispatch.
Eric Wong [Sat, 10 Sep 2022 08:16:52 +0000 (08:16 +0000)]
xt: fold perf-obfuscate into perf-msgview, future-proof
perf-obfuscate was close enough to perf-msgview that it only
required setting the `obfuscate' field of the inbox.
Then update perf-msgview to account for upcoming internal
changes. The current use of {obuf} and concat ops results in
excessive scratchpad space and I may be able to even get
speedups by avoiding concat ops.
Eric Wong [Sat, 10 Sep 2022 01:18:59 +0000 (01:18 +0000)]
lei: bail out earlier on IMAP writer failures
Excessive IMAP connections can overload IMAP servers and cause
clients to be disconnected without diagnostic messages.
Use $lei->fail on these exceptions to propagate errors to the
CLI ASAP to avoid further errors down the line.
This ought to make problems more apparent for users using IMAP
destinations.
Eric Wong [Sun, 4 Sep 2022 04:27:49 +0000 (04:27 +0000)]
prepare HTML rendering maintainer tests for upcoming changes
There'll be a number of upcoming changes to HTML rendering
of messages to hopefully reduce memory usage and gain speedups
by writing out to the gzip buffer earlier.
Update the tests now so it'll be easier to test before
and after results.
Eric Wong [Fri, 2 Sep 2022 10:11:48 +0000 (10:11 +0000)]
solver: do not count duplicates in patch count
We're considering duplicate patches from cross-posted lists
identical, so don't double-count them when displaying the
"applying [X/Y]" message since (successful) duplicates get
skipped.
Eric Wong [Fri, 2 Sep 2022 09:12:54 +0000 (09:12 +0000)]
extmsg: shorten partial Message-IDs minimum to 14
Gnus seems to start Message-IDs with 10 random characters
followed by ".fsf@$DOMAIN". In case of mis-linkification or
mis-selection from stopping at the `@', ensuring the first 14
characters are accepted as a search parameter for the truncated
Message-ID improves usability.
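
The arithmetic behind the new minimum, as a small Python sketch. The
constant and function names are hypothetical, and the Message-ID is a
made-up example in the Gnus style described above; the actual check in
extmsg may differ.

```python
MIN_PARTIAL_LEN = 14  # 10 random characters + len(".fsf")

def partial_mid_ok(fragment):
    """Accept a truncated Message-ID only if it is long enough to search on."""
    return len(fragment) >= MIN_PARTIAL_LEN

# hypothetical Gnus-style Message-ID, cut off at the `@'
fragment = "87abcdefgh.fsf@example.com".split("@", 1)[0]
```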
Eric Wong [Fri, 2 Sep 2022 09:10:54 +0000 (09:10 +0000)]
www: omit [thread overview] link for unindexed v1
Unindexed v1 inboxes do not have the thread overview skeleton
at the bottom of /$MSGID/ pages, so do not link to it.
And for rare messages without a Date: header (or any headers!),
this also ensures the [thread overview] is shown regardless.
Eric Wong [Fri, 2 Sep 2022 09:10:53 +0000 (09:10 +0000)]
www: fix top nav bar for unindexed v1 inboxes
For /$INBOX/$MSGID/ pages, we need to point all nav bar links
../ regardless of whether ->over exists. I've also verified
this doesn't affect /$INBOX/new.html at all.
Eric Wong [Mon, 29 Aug 2022 09:26:47 +0000 (09:26 +0000)]
viewvcs: show "blob $OID" rather than "$OID blob"
This is more consistent with the rest of the output where it's
"$TYPE $OID" rather than "$OID $TYPE". The former also allows
easy copy+pasting into commands for both "git cat-file blob $OID"
and "lei blob $OID".
Eric Wong [Mon, 29 Aug 2022 09:26:43 +0000 (09:26 +0000)]
solver: early make hints detection more robust
Hints fields can change, so we'll use a simple boolean rather
than checking a static count. We'll also short-circuit out
reliably regardless of hints when a full OID is given.
Eric Wong [Mon, 29 Aug 2022 09:26:41 +0000 (09:26 +0000)]
www: atom: fix "changed" href to nowhere
The HTML generated for the Atom feed doesn't have the footer
of /T/ and /t/ HTML-only views, so just make "changed" in
the diffstat go directly to the permalink #related anchor.
Fixes: 66512e177390 ("view: generate query in single-message and commit views")
Eric Wong [Mon, 29 Aug 2022 09:26:39 +0000 (09:26 +0000)]
view: /$INBOX/: show "messages from $old to $new"
With the ViewVCS commit view using /$INBOX/?t=YYYYMMDDhhmmss-
links, the use of `t=' may not be immediately obvious to a
reader and confuse them into thinking the inbox hasn't been
updated in a while.
So add a header to the top of the page whenever the `t=' query
parameter is used.
And kill a couple of redundant variable assignments while we're
at it.
Eric Wong [Mon, 29 Aug 2022 09:26:38 +0000 (09:26 +0000)]
treewide: ditch inbox->recent method
It's a needless wrapper, nowadays. Originally, ->over was added
on an experimental basis to optimize for /$INBOX/ where Xapian
->search is slower on gigantic (LKML-sized) inboxes.
Nowadays with extindex, ->over is here to stay given NNTP and
IMAP both benefit from it. So reduce the interpreter stack
overhead and just access ->over directly.
lxs->recent was never used outside of tests, anyways.
And while we're in the area, avoid needlessly bumping the
refcount of $ctx->{ibx} in View::paginate_recent.
Eric Wong [Mon, 29 Aug 2022 09:26:37 +0000 (09:26 +0000)]
view: speed up /$INBOX/ landing page by 0.5-1.0%
Array lookups and extra arithmetic in Perl are slower than
bumping the internal array offset inside the interpreter.
Fwiw, using: my ($level, $subj) = splice(@extra, 0, 2)
did not result in a performance improvement.
Eric Wong [Mon, 29 Aug 2022 09:26:34 +0000 (09:26 +0000)]
viewvcs: use array for highlighted blob display
This can avoid at least one expensive copy for displaying
large blobs with syntax highlighting.
However, we cannot blindly change everything to arrays, either:
the cost of invoking Compress::Raw::Zlib->deflate must be taken
into account. Joining short strings via `.=', `.', `join' or
interpolation is typically faster since it avoids ->deflate
method calls (and non-magic perlops are the fastest dispatches
in Perl).
Eric Wong [Mon, 29 Aug 2022 09:26:31 +0000 (09:26 +0000)]
viewvcs: share File::Temp::Dir with solver
This allows reusing inodes for /$COMMIT_OID/s/ requests.
We'll also replace `log' with `lh' in the field name to
avoid confusion with the `log' perlop.
Eric Wong [Sun, 28 Aug 2022 03:59:50 +0000 (03:59 +0000)]
linkify: avoid digits and dashes in placeholders
The `highlight' module seems to highlight every digit in
YAML (and possibly other) source files. This causes problems
in linkify_2 which replaces the placeholders with proper URIs.
I suspect `-' and other punctuation characters will cause
similar problems, so we must stick to [A-Za-z].
Thus transliterate 0-9 to A-J in the hex key to ensure highlight
doesn't see digit characters, and rename the prefix to be
project-name independent.
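
The 0-9 to A-J transliteration can be illustrated as follows. This is a
Python sketch of the mapping only; the function name and sample key are
hypothetical, and the real placeholder format is not shown here.

```python
DIGITS_TO_LETTERS = str.maketrans("0123456789", "ABCDEFGHIJ")

def placeholder_key(hex_key):
    """Map 0-9 to A-J so the key contains only [A-Za-z]."""
    return hex_key.translate(DIGITS_TO_LETTERS)

key = placeholder_key("0a1b2c9f")  # sample lowercase hex key
```

Since lowercase hex digits a-f are left alone and the decimal digits map to
uppercase A-J, the mapping stays unambiguous and a highlighter that colors
digits has nothing to match.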
Unindexed v1 inboxes were leaving $smsg objects unpopulated when
using public-inbox-httpd (but not generic PSGI servers) and
causing missing HTML content and uninitialized value warnings.
Our existing tests for unindexed v1 inboxes only assumed generic
PSGI servers and synchronous blob retrieval. Due to changes
several years ago to make git blob retrieval async for slow
storage using public-inbox-httpd, our tests were insufficient to
detect this regression.
So ensure $smsg->populate runs in a few places and rewrite
t/plack.t to test against both generic PSGI and -httpd
implementations.
Fortunately, unindexed v1 inboxes are uncommon, and this
bug was only (finally) discovered while developing other
features.
For ensuring we can test (and not blindly follow) redirects with
-httpd, we now provide our own LWP::UserAgent (used internally
by Plack::Test::ExternalServer) with redirect following
disabled to P:T:ES::test_psgi.
Eric Wong [Fri, 26 Aug 2022 03:20:20 +0000 (03:20 +0000)]
view: add "this message" link above dfblob: textarea
When jumping to #related from /T/ or /t/ views, it could be
disconcerting to not have the current message as context.
So add a "this message" link back up to #t as we have always
done with the reply instructions.
Eric Wong [Tue, 23 Aug 2022 08:32:01 +0000 (08:32 +0000)]
ibx_async_cat: access ->{git} directly
This will enable callers to pass non-Inbox-ish hashrefs as the
arg. This benefits existing Inbox-ish objects, too, as it
avoids a slow method dispatch for both ExtSearch and Inbox.