Eric Wong [Mon, 28 Nov 2022 05:31:24 +0000 (05:31 +0000)]
lei_mirror: allow --epoch on mixed v1/v2 clones
It's entirely possible an instance will have both v1 and v2
inboxes (or v2 inboxes and coderepos). Don't punish --epoch
users by forcing them to run multiple commands.
Eric Wong [Mon, 28 Nov 2022 05:31:20 +0000 (05:31 +0000)]
lei_mirror: reduce noise on interrupted clones
We don't need git-config or other commands failing loudly.
`git clone' and subcommands it spawns may still spew, but it's no
worse than interrupting `git clone' itself, now.
We accomplish this by localizing $LIVE (formerly %LIVE) and
detecting when its auto-vivification into a hashref goes
out-of-scope during the `DESTRUCT' ${^GLOBAL_PHASE}.
We can't use ${^GLOBAL_PHASE}, yet, either, since it appeared in
Perl 5.14 and we're still migrating slowly to Perl 5.12 before
going to 5.14.
Eric Wong [Mon, 28 Nov 2022 05:31:18 +0000 (05:31 +0000)]
lei_mirror: initialize placeholders with "head" from manifest
This only affects v2 epochs, but ensures our bases are covered,
at least. We'll have to update PublicInbox::Fetch later to
deal with "head" entries in manifest.js.gz, too.
Eric Wong [Mon, 28 Nov 2022 05:31:17 +0000 (05:31 +0000)]
clone: support --dry-run / -n flag
It still makes HTTP(S) requests to retrieve the manifest or
scrape HTML, but doesn't make permanent changes to the FS
(aside from modifying {acm}time of ${TMPDIR-/tmp}).
Eric Wong [Mon, 28 Nov 2022 05:31:15 +0000 (05:31 +0000)]
lei_mirror: load most modules up-front
lei lazy loads LeiMirror itself lazily, anyways, and it only
supports HTTP(S) mirrors, so there's no point in delaying most
of the modules it loads. Some of the inbox-specific and
v2-specific stuff can be lazy-loaded, however, since this
will support mirroring non-inbox repositories, too.
Eric Wong [Mon, 28 Nov 2022 05:31:13 +0000 (05:31 +0000)]
lei_mirror: consolidate clone process management
This simplifies our code by having fewer places check process
limits and perform reaping. We'll also print command names
immediately before executing, instead of right before waiting
for running processes.
Eric Wong [Mon, 28 Nov 2022 05:31:04 +0000 (05:31 +0000)]
clone: support parallel v1 clones
This opens the door to parallel cloning of coderepos, too. We
can also get rid of needless AutoReap usage, here, too since
it's usage has been 100% synchronous and not DESTROY-based as
they are in tests.
Eric Wong [Mon, 28 Nov 2022 05:31:03 +0000 (05:31 +0000)]
lei_mirror: rely on global process reaper
We no longer rely on SIGCHLD for predictability, and instead
call waitpid at safe points. This will make it easier for us to
do parallel mirroring of multiple inboxes while preserving
proper dependencies via ->DESTROY callbacks.
Eric Wong [Mon, 28 Nov 2022 05:31:00 +0000 (05:31 +0000)]
clone: parallelize v2 epoch clones
This is a first step in supporting completely parallelized
clones. Eventually, everything will be parallelized and
dependencies will be managed via callbacks.
Eric Wong [Mon, 28 Nov 2022 05:30:58 +0000 (05:30 +0000)]
clone: support multi-inbox clone
This is to ensure we can do `public-inbox-clone https://yhbt.net/lore'
or `public-inbox-clone https://lore.kernel.org/' and clone all
inboxes (and whatever else git stores).
mephi42 [Mon, 28 Nov 2022 20:25:21 +0000 (21:25 +0100)]
nntpd: fix LISTGROUP with range
This reverts 0c62cffc2389 ("nntp: listgroup_range_i: remove useless
`map' op") and adds a test that demonstrates the breakage: the server
returns lines like
Eric Wong [Sun, 27 Nov 2022 09:15:47 +0000 (09:15 +0000)]
content_hash: handle References as octets
The alsa-devel archives on lore has some UTF-8 References:
headers, so we need to treat them as octets, again, otherwise
(re)indexing triggers cascading failures.
Fixes: 5198c976ce8b "eml: header_raw converts octets to Perl UTF-8"
Eric Wong [Thu, 24 Nov 2022 21:31:55 +0000 (21:31 +0000)]
eml: header_raw converts octets to Perl UTF-8
This fixes the display of raw (non-RFC 2047) names and subjects
in HTML message views.
SMTPUTF8 (RFC 6531) allows raw UTF-8 in headers without RFC 2047
encoding, so let Perl handle it as a character sequence for the
rest of our consumers. Thus, the old special case in
PublicInbox::Smsg->populate is no longer necessary and gone.
The one regression notice so far (and fixed here) is compressed
IMAP envelope responses still needs raw bytes since the zlib
wrapper is designed for octets, not Perl UTF-8 chars. Thus we
reverse utf8::decode with utf8::encode in PublicInbox::IMAP::_esc.
->header_set also forces encoding to bytes, since all existing
callers would either be dealing with ->header_raw results or
be RFC-2047-encoded anyways.
Reindexing is not necessary with this change due to the prior
PublicInbox::Smsg->populate special case.
Eric Wong [Wed, 23 Nov 2022 04:09:58 +0000 (04:09 +0000)]
lei_curl: use http.proxy config from git if available
Since HTTP(S) URLs hit by lei or public-inbox-{clone,fetch} are
expected to be git endpoints anyways, fall back to using
http.proxy from git configs to save the user from having to
maintain the same configuration for different things.
Eric Wong [Mon, 14 Nov 2022 08:07:02 +0000 (08:07 +0000)]
lei q|up: limit default write --jobs for IMAP(S)
Eric Wong <e@80x24.org> wrote:
> Thanks for confirming things work as intended. I think the
> default should be clamped, though... 15 seems a bit high for
> smaller IMAP servers *shrug*
--------8<-------
Subject: [PATCH] lei q|up: limit default write --jobs for IMAP(S)
IMAP(S) servers often limit per-user connections, so avoid
bumping into limits to improve the out-of-the-box experience.
4 seems like a conservative default, since we already chose
that number for remote HTTP(S) endpoints.
Eric Wong [Tue, 1 Nov 2022 09:36:12 +0000 (09:36 +0000)]
lei: fix globbing semantics to match end-of-filename
Globs such as `*/foo' should not match `*/foobar'. I noticed
this while adding glob support to public-inbox-clone.
This may subtly break some existing cases, but there aren't many
lei users, yet, and globbing semantics should match what most
other glob-using programs, do...
We'll also make `lei ls-mail-sync' behave more consistently with
`lei ls-external', as far as the basename matching fallback
goes.
Eric Wong [Thu, 20 Oct 2022 08:43:12 +0000 (08:43 +0000)]
treewide: replace /^I: / prefix with /^# /
This is like more familiar to readers of TAP (Test Anywhere
Protocol) output, as well as shell and Perl scripters which also
use `#' for comments.
AFAIK, nobody is parsing our stderr, and I'm not sure how
standardized the `I:' prefix is (nor `W:' and `E:' are). It's
already the prevailing style in Lei* code, too, so things have
been moving in that direction for a bit.
Eric Wong [Thu, 20 Oct 2022 08:43:09 +0000 (08:43 +0000)]
clone|fetch: preserve mtime of modified manifest.js.gz
When we cull manifest.js.gz for ignored epochs, attempt to
preserve mtime of the updated manifest.js.gz since it can
be used to optimize future fetches.
Eric Wong [Mon, 17 Oct 2022 09:30:53 +0000 (09:30 +0000)]
sigfd: set SIGWINCH for MIPS and PA-RISC on Linux
SIGWINCH is actually different for these architectures on Linux
according to the signal(7) man page.
Note: AFAICS there's no parisc machine in the GCC Farm[1],
so it remains untested. I've only tested mips64 for mips,
but I expect them to both work.
OpenBSD (on gcc231) octeon defines SIGWINCH as the common `28',
so it appears Linux is the only one with arch-dependent signal
numbers (ditto with syscalls).
Eric Wong [Mon, 10 Oct 2022 21:34:22 +0000 (21:34 +0000)]
www: viewvcs: display annotated tags as discreet objects
This emphasizes annotated tags as their own object type in the
web UI while being able to link to the existing show_commit()
linkification and dfblob: search.
Eric Wong [Mon, 10 Oct 2022 21:34:21 +0000 (21:34 +0000)]
xt/solver: skip on missing publicinbox.git.coderepo
Solver tests can never succeed without coderepos configured,
since that's the whole point of solver. And improve the
original skip message to note that it's about the `git'
public-inbox, not `git' itself.
Eric Wong [Sat, 8 Oct 2022 08:24:48 +0000 (08:24 +0000)]
www_coderepo: allow searching one extindex|inbox
I'm not sure how to best make a UI for one coderepo to many
inboxes/extindices, yet; but at least allow a simple 1:1
mapping, for now. This ensures /$CODEREPO/$OID/s/ can work
as effectively as /$INBOX/$OID/s/ when looking for emails
associated with a git commit.
Eric Wong [Sat, 8 Oct 2022 08:24:47 +0000 (08:24 +0000)]
www: cgit: fix fallback to WwwCoderepo on array responses
For fast PSGI responses which don't require returning a coderef,
just reuse qspawn.wcb directly on the arrayref to avoid an undef
$wcb from firing in psgi_return_init_cb.
I only noticed this because the ViewVCS search form is broken
for /$CODEREPO/$OID/s/ endpoints at the moment.
Eric Wong [Sat, 8 Oct 2022 08:24:46 +0000 (08:24 +0000)]
www_coderepo: update blurb on the goal/purpose of this
I think putting too much functionality in web services leads
to ignorance of local/offline tools, so this web UI will give
hints here and there for web users. Things like diff options
can get expensive and become cache-unfriendly on the web server,
so promoting local tools can reduce overall network traffic
and server load.
Eric Wong [Wed, 5 Oct 2022 22:29:41 +0000 (22:29 +0000)]
www: support publicinbox.cgit knob
For backwards-compatibility, this defaults to `first'. When set
to `fallback', PublicInbox::WwwCoderepo is favored and cgit is
only used as a fallback. Eventually, `rewrite' will also be
supported to rewrite cgit URLs to WwwCoderepo ones.
Of course, WwwCoderepo is still missing search and other key
features, but that's being worked on...
Eric Wong [Wed, 5 Oct 2022 22:29:40 +0000 (22:29 +0000)]
www: cgit: fall back to WwwCoderepo on 404s
We can't rely on 3-element array response when calling
WwwCoderepo for ViewVCS endpoints since that uses Qspawn
internally. Thus, we have to allow two Qspawn objects to run in
parallel and ensure `qspawn.wcb' only gets called once, so we
end up duplicating the entire $ctx to ensure this.
Eric Wong [Tue, 4 Oct 2022 19:12:35 +0000 (19:12 +0000)]
www_coderepo: an alternative to cgit
This will allow it to easily map a single coderepo to multiple
inboxes (or multiple coderepos to any number of inboxes).
For now, this is just a summary, but $REPO/$OID/s/ support
will be added, along with archive downloads.
Indexing of coderepos will probably be supported via -extindex,
only.
Eric Wong [Tue, 4 Oct 2022 19:12:32 +0000 (19:12 +0000)]
cgit: use Perl 5.10-isms, optimize, and golf
We can reduce variable assignments in a few places and filter
keys more quickly using the `grep' Perl op rather than relying on
`m// or next' inside a loop. Similar changes to the NNTP and IMAP
(e.g. b700fce60f25038e (nntp: NEWNEWS: speed up filtering, 2020-11-27))
yielded good improvements.
Eric Wong [Sun, 2 Oct 2022 15:11:01 +0000 (15:11 +0000)]
viewdiff: fix parts of diff being appended after signature
I'm not sure what kind of brain fart introduced this in c1e7a048be9d32cd, but it happened :x. We'll undef the $x
variable ASAP to save memory and make future errors like this
one more noticeable.
Eric Wong [Sat, 1 Oct 2022 18:52:50 +0000 (15:52 -0300)]
www_stream: use DESTROY to cleanup temporary gits
Relying on a timer to handle cleanup in f9ac22a4b485 was
sub-optimal since the delay could prove expensive under heavy
traffic. So rely on ->DESTROY instead since we we no longer
hold reference cycles by the time the show_blob callback
executes.
Eric Wong [Sat, 1 Oct 2022 00:33:15 +0000 (00:33 +0000)]
lei: force --jobs=1,1 for SQLite < 3.8.3
SQLite prior to 3.8.3 did not reset its PRNG for generating
unique temporary file names, so it would barf on t/lei-up.t
occasionally due to O_EXCL -> EEXIST conflicts.
This fixes occasional test failures under CentOS 7.x which ships
SQLite 3.7.17.
Eric Wong [Fri, 30 Sep 2022 09:21:40 +0000 (09:21 +0000)]
t/altid_v2: improve test style
Favor `is' for equality checks since it reports differences,
and `xbail' over `BAIL_OUT' since it's easier-to-type w/o caps
and more powerful.
These are just things noticed while I was looking at another
odd failure on CentOS 7.x with this test, but I suspect it
was a transient failure caused by running the test suite
from multiple terminals in parallel.
Eric Wong [Fri, 30 Sep 2022 09:21:39 +0000 (09:21 +0000)]
lei_to_mail: propagate errors to script/lei
We need to rely on lei->fail to propagate errors in lei workers
to the script/lei client, otherwise tests and other scripts can
stumble forward with incomplete/incorrect/broken outputs.
This helps me focus on occasional t/lei-up.t failures I see on
CentOS 7.x where OverIdx->adj_counter fails on "lei up --all"...
Eric Wong [Fri, 30 Sep 2022 09:21:38 +0000 (09:21 +0000)]
t/lei-up: improve diagnostics for this test
I'm getting occasional failures for this test on CentOS 7.x (but
not on FreeBSD nor Debian 10/11). I'm not why, yet, so just
improve diagnostics for now.
Eric Wong [Thu, 29 Sep 2022 17:48:29 +0000 (17:48 +0000)]
treewide: use --globoff with curl(1)
curl 7.29.0 (on CentOS 7.x) seems to mishandle square-bracketed
IPv6 addresses, at least. Furthermore, we don't actually need
nor use the globbing in curl for lei when forwarding requests
from the lei command-line. lei has its own globbing and
`--globoff' behavior for externals and none of it is intended
for curl.
Eric Wong [Mon, 26 Sep 2022 10:17:15 +0000 (10:17 +0000)]
git: reduce early bare-bones memory use
The {-git_path} cache can rely on auto-vivification, and
{alt_st} may not be needed for short-lived repos. So don't
populate those fields until they're needed, since we can
expect to handle thousands of git repos, too.
Eric Wong [Mon, 26 Sep 2022 10:17:14 +0000 (10:17 +0000)]
viewvcs: load blobs asynchronously
This actually leads to a nice 3-5% speedup under parallel loads
when using git(1) w/o SHA-1 collision detection enabled. Gcf2
is slower since libgit2 has SHA-1 collision detection enabled
on my system.
Since we're in the area, improve location of comments w.r.t.
cgit CSS class names and note the reliance on scratchpad for
performance in a tight loop.
Eric Wong [Mon, 26 Sep 2022 10:17:13 +0000 (10:17 +0000)]
gcf2: support worktree $GIT_DIR
We must use `git rev-parse --git-path objects' instead of
blindly appending '/objects' to $GIT_DIR, since appending
doesn't work when $GIT_DIR is a worktree.
Eric Wong [Mon, 26 Sep 2022 10:17:12 +0000 (10:17 +0000)]
viewdiff: save memory by eliminating two captures
Avoid relying on $DIGIT captures when @- and @+ to access
last match start and end, respectively. The elimination of
the post capture ought to allow the use of sv_chop to advance
the string start pointer without memory copies.
This ought to save 1-2MB of memory on my system since I've
noticed the captures was using a big chunk of scratchpad
space.
This avoids `Wide character in print' warnings and ensures the
UTF-8 characters in `Signed-off-by' trailers are properly rendered
in HTML even when attempting to decode and display
application/octet-stream mbox attachments as HTML.
Linkification and reconstruction for coderepos is probably
still broken, but that is a much bigger task to fix, I think.
Fixes: ab9c03ff4aa369b3 ("www: use PerlIO::scalar (zfh) for buffering")