Eric Wong [Mon, 28 Nov 2022 05:32:17 +0000 (05:32 +0000)]
lei_mirror: remove janky mirror.done stamp file
This makes a fundamental (and overdue) change to the core of
lei in how it handles child errors. Every process which
generates or receives a child error will remember it before
passing it on. This ensures _wq_done_wait callbacks will
know of prior errors aside from $? when they run.
Eric Wong [Mon, 28 Nov 2022 05:32:14 +0000 (05:32 +0000)]
lei_mirror: respect `./' and `../' prefixes for CLI args
Users may wish to keep objstore and manifest files at
a higher level to prevent direct access via HTTP(S),
so those relative paths probably make sense.
Eric Wong [Mon, 28 Nov 2022 05:32:10 +0000 (05:32 +0000)]
clone|fetch: support passing --prune(-tags) to `git fetch'
We need to be able to get rid of removed branches and tags on
the remote. --prune-tags is implied for non-objstore repos,
and incompatible with objstore repos.
Eric Wong [Mon, 28 Nov 2022 05:32:08 +0000 (05:32 +0000)]
lei_mirror: delay configuring forkgroups
When relying on `public-inbox-clone --manifest=', idempotent
`git config' invocations can take a considerable amount of
time. We still configure inboxes idempotently since it
allows quickly changing URLs to mirrors, but we just defer
it until an update is actually needed.
Eric Wong [Mon, 28 Nov 2022 05:32:02 +0000 (05:32 +0000)]
lei_mirror: properly pack-refs in non-forkgroup repos
We need to ensure `git update-ref --stdin' is complete
before running `git pack-refs', otherwise loose refs can
remain while update-ref is still running.
Eric Wong [Mon, 28 Nov 2022 05:32:01 +0000 (05:32 +0000)]
fetch: eliminate File::Temp->filename var
File::Temp objects are overloaded to automatically
call ->filename when stringified, so there's no need
to store the ->filename result on the Perl stack.
Eric Wong [Mon, 28 Nov 2022 05:31:54 +0000 (05:31 +0000)]
lei_mirror: forkgroups use `git fetch --multiple'
This offloads network parallelization and safety to git
itself while reducing the amount of unnecessary process spawning
we do. This also improves readability of pack-refs invocations
and reduces the need for them.
To prevent heavily-forked repos from hitting system command-line
size limits, we group refs to be updated in the "fgrptmp" group.
Eric Wong [Mon, 28 Nov 2022 05:31:51 +0000 (05:31 +0000)]
lei_mirror: drop git <1.8.5 support
Supporting git <1.8.5 via fetch on non-forkgroup repos would
make auto-GC dangerous, and I want to support auto-GC instead
of relying on the preciousObjects extension.
Since git 1.8.5 is 9 years old at this point, and grokmirror
(used by the only CentOS 7.x user I know of) already relies on
newer git, simplify our code and only fetch into forkgroups.
Eric Wong [Mon, 28 Nov 2022 05:31:49 +0000 (05:31 +0000)]
lei_mirror: preserve permissions of existing alternates file
We don't want to be clobbering permissions when changing to
relative paths. Furthermore, we can avoid writing to the
alternates file if there are no changes.
Eric Wong [Mon, 28 Nov 2022 05:31:45 +0000 (05:31 +0000)]
clone: flesh out --objstore behavior and document
We can support absolute paths to avoid surprising behaviors,
but relative paths are preferred since the goal is to be
accessible over the "dumb" HTTP git transport (the dumb
transport uses less memory and CPU on the server).
Eric Wong [Mon, 28 Nov 2022 05:31:40 +0000 (05:31 +0000)]
lei_mirror: avoid convoluted lazy_cb usage
lazy_cb should only be used for lei command dispatch and
completion callbacks when the method isn't known at startup.
There's zero reason to use it when the method is known
ahead-of-time, especially when there's a comment pointing
reviewers towards the only possible method it can dispatch.
Eric Wong [Mon, 28 Nov 2022 05:31:33 +0000 (05:31 +0000)]
clone: support --inbox-version
This is part of `lei add-external --mirror', and it makes
sense to have for development and testing. We'll also add
a fallback in case somebody tries --inbox-version and fails
due to a newer remote instance of public-inbox.
Eric Wong [Mon, 28 Nov 2022 05:31:27 +0000 (05:31 +0000)]
lei_mirror: do not fetch descriptions if using manifest
If a manifest exists, we can expect the description to always be
present, thus there's no need to make a separate HTTP(S) request
since we can use it as-is from the manifest for v1||coderepos
and strip / \[epoch [0-9]+\]\z/ from v2.
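The suffix stripping described above can be sketched as follows. This is a Python illustration rather than the Perl code from the commit: the pattern is copied from the message, while the function name and the sample description are hypothetical.

```python
import re

# Pattern from the commit message. Perl's \z (absolute end of string)
# is spelled \Z in Python's re module.
EPOCH_SUFFIX = re.compile(r" \[epoch [0-9]+\]\Z")

def inbox_description(manifest_description):
    """Drop a trailing " [epoch N]" marker, if there is one.

    Hypothetical helper: descriptions without the marker pass
    through unchanged.
    """
    return EPOCH_SUFFIX.sub("", manifest_description)

print(inbox_description("example list [epoch 3]"))
```

Anchoring at the end of the string keeps a literal "[epoch N]" elsewhere in a description intact.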
Eric Wong [Mon, 28 Nov 2022 05:31:24 +0000 (05:31 +0000)]
lei_mirror: allow --epoch on mixed v1/v2 clones
It's entirely possible an instance will have both v1 and v2
inboxes (or v2 inboxes and coderepos). Don't punish --epoch
users by forcing them to run multiple commands.
Eric Wong [Mon, 28 Nov 2022 05:31:20 +0000 (05:31 +0000)]
lei_mirror: reduce noise on interrupted clones
We don't need git-config or other commands failing loudly.
`git clone' and subcommands it spawns may still spew, but it's no
worse than interrupting `git clone' itself, now.
We accomplish this by localizing $LIVE (formerly %LIVE) and
detecting when its auto-vivification into a hashref goes
out-of-scope during the `DESTRUCT' ${^GLOBAL_PHASE}.
We can't use ${^GLOBAL_PHASE}, yet, either, since it appeared in
Perl 5.14 and we're still migrating slowly to Perl 5.12 before
going to 5.14.
Eric Wong [Mon, 28 Nov 2022 05:31:18 +0000 (05:31 +0000)]
lei_mirror: initialize placeholders with "head" from manifest
This only affects v2 epochs, but ensures our bases are covered,
at least. We'll have to update PublicInbox::Fetch later to
deal with "head" entries in manifest.js.gz, too.
Eric Wong [Mon, 28 Nov 2022 05:31:17 +0000 (05:31 +0000)]
clone: support --dry-run / -n flag
It still makes HTTP(S) requests to retrieve the manifest or
scrape HTML, but doesn't make permanent changes to the FS
(aside from modifying {acm}time of ${TMPDIR-/tmp}).
Eric Wong [Mon, 28 Nov 2022 05:31:15 +0000 (05:31 +0000)]
lei_mirror: load most modules up-front
lei loads LeiMirror itself lazily, anyways, and it only
supports HTTP(S) mirrors, so there's no point in delaying most
of the modules it loads. Some of the inbox-specific and
v2-specific stuff can be lazy-loaded, however, since this
will support mirroring non-inbox repositories, too.
Eric Wong [Mon, 28 Nov 2022 05:31:13 +0000 (05:31 +0000)]
lei_mirror: consolidate clone process management
This simplifies our code by having fewer places check process
limits and perform reaping. We'll also print command names
immediately before executing, instead of right before waiting
for running processes.
Eric Wong [Mon, 28 Nov 2022 05:31:04 +0000 (05:31 +0000)]
clone: support parallel v1 clones
This opens the door to parallel cloning of coderepos, too. We
can also get rid of needless AutoReap usage here, since
its usage has been 100% synchronous and not DESTROY-based as
it is in tests.
Eric Wong [Mon, 28 Nov 2022 05:31:03 +0000 (05:31 +0000)]
lei_mirror: rely on global process reaper
We no longer rely on SIGCHLD for predictability, and instead
call waitpid at safe points. This will make it easier for us to
do parallel mirroring of multiple inboxes while preserving
proper dependencies via ->DESTROY callbacks.
Eric Wong [Mon, 28 Nov 2022 05:31:00 +0000 (05:31 +0000)]
clone: parallelize v2 epoch clones
This is a first step in supporting completely parallelized
clones. Eventually, everything will be parallelized and
dependencies will be managed via callbacks.
Eric Wong [Mon, 28 Nov 2022 05:30:58 +0000 (05:30 +0000)]
clone: support multi-inbox clone
This is to ensure we can do `public-inbox-clone https://yhbt.net/lore'
or `public-inbox-clone https://lore.kernel.org/' and clone all
inboxes (and whatever else git stores).
mephi42 [Mon, 28 Nov 2022 20:25:21 +0000 (21:25 +0100)]
nntpd: fix LISTGROUP with range
This reverts 0c62cffc2389 ("nntp: listgroup_range_i: remove useless
`map' op") and adds a test that demonstrates the breakage: the server
returns lines like
Eric Wong [Sun, 27 Nov 2022 09:15:47 +0000 (09:15 +0000)]
content_hash: handle References as octets
The alsa-devel archives on lore have some UTF-8 References:
headers, so we need to treat them as octets, again, otherwise
(re)indexing triggers cascading failures.
Fixes: 5198c976ce8b "eml: header_raw converts octets to Perl UTF-8"
Eric Wong [Thu, 24 Nov 2022 21:31:55 +0000 (21:31 +0000)]
eml: header_raw converts octets to Perl UTF-8
This fixes the display of raw (non-RFC 2047) names and subjects
in HTML message views.
SMTPUTF8 (RFC 6531) allows raw UTF-8 in headers without RFC 2047
encoding, so let Perl handle it as a character sequence for the
rest of our consumers. Thus, the old special case in
PublicInbox::Smsg->populate is no longer necessary and gone.
The one regression noticed so far (and fixed here) is that compressed
IMAP envelope responses still need raw bytes since the zlib
wrapper is designed for octets, not Perl UTF-8 chars. Thus we
reverse utf8::decode with utf8::encode in PublicInbox::IMAP::_esc.
->header_set also forces encoding to bytes, since all existing
callers would either be dealing with ->header_raw results or
be RFC-2047-encoded anyways.
Reindexing is not necessary with this change due to the prior
PublicInbox::Smsg->populate special case.
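The octets-versus-characters point about compressed IMAP responses has a close analogue in Python, sketched below. This is not the code in PublicInbox::IMAP::_esc: the function name is hypothetical, and Python's str/bytes split only approximates Perl's utf8::decode/utf8::encode model.

```python
import zlib

def compress_header(value):
    """Compress a header value for the wire (hypothetical helper).

    zlib works on octets, so a decoded character string must be
    turned back into UTF-8 bytes before it is compressed.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")  # characters -> octets
    return zlib.compress(value)

packed = compress_header("Ævar <avar@example.com>")
print(zlib.decompress(packed).decode("utf-8"))
```

Passing a str directly to zlib.compress raises TypeError in Python. Per the message above, the Perl-side regression showed up only for compressed responses.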
Eric Wong [Wed, 23 Nov 2022 04:09:58 +0000 (04:09 +0000)]
lei_curl: use http.proxy config from git if available
Since HTTP(S) URLs hit by lei or public-inbox-{clone,fetch} are
expected to be git endpoints anyways, fall back to using
http.proxy from git configs to save the user from having to
maintain the same configuration for different things.
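A minimal sketch of this kind of fallback, in Python rather than the project's Perl. `git config --get http.proxy` is a real git invocation, but whether lei reads the value this way is an assumption, and both function names are hypothetical.

```python
import subprocess

def git_http_proxy(run=subprocess.run):
    """Return git's http.proxy value, or None if unset or git is missing.

    `run` is injectable so the lookup can be exercised without git.
    """
    try:
        res = run(["git", "config", "--get", "http.proxy"],
                  capture_output=True, text=True)
    except OSError:
        return None
    value = res.stdout.strip()
    return value if res.returncode == 0 and value else None

def choose_proxy(explicit=None, run=subprocess.run):
    # An explicitly configured proxy wins; git's setting is only a fallback.
    return explicit or git_http_proxy(run)
```

Here `choose_proxy()` consults git only when no proxy was given explicitly, so the user need not repeat the setting in a second place.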
Eric Wong [Mon, 14 Nov 2022 08:07:02 +0000 (08:07 +0000)]
lei q|up: limit default write --jobs for IMAP(S)
Eric Wong <e@80x24.org> wrote:
> Thanks for confirming things work as intended. I think the
> default should be clamped, though... 15 seems a bit high for
> smaller IMAP servers *shrug*
--------8<-------
Subject: [PATCH] lei q|up: limit default write --jobs for IMAP(S)
IMAP(S) servers often limit per-user connections, so avoid
bumping into limits to improve the out-of-the-box experience.
4 seems like a conservative default, since we already chose
that number for remote HTTP(S) endpoints.
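The clamping can be sketched as below. Only the limit of 4 and the IMAP(S) schemes come from the message above. The function name, the URL-scheme check, and the idea that the unclamped default is passed in as a number are assumptions for illustration.

```python
IMAP_MAX_JOBS = 4  # the conservative default named in the commit message

def default_write_jobs(unclamped, dest_url):
    """Pick a default --jobs value for a write destination.

    Hypothetical helper: `unclamped` is whatever default would
    otherwise apply (15 in the quoted report).
    """
    scheme = dest_url.split(":", 1)[0].lower()
    if scheme in ("imap", "imaps"):
        return min(unclamped, IMAP_MAX_JOBS)
    return unclamped

print(default_write_jobs(15, "imaps://mail.example.com/INBOX"))
```

Only the default is clamped in this sketch. A value the user passes explicitly would bypass it.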
Eric Wong [Tue, 1 Nov 2022 09:36:12 +0000 (09:36 +0000)]
lei: fix globbing semantics to match end-of-filename
Globs such as `*/foo' should not match `*/foobar'. I noticed
this while adding glob support to public-inbox-clone.
This may subtly break some existing cases, but there aren't many
lei users, yet, and globbing semantics should match what most
other glob-using programs do...
We'll also make `lei ls-mail-sync' behave more consistently with
`lei ls-external', as far as the basename matching fallback
goes.
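The end-of-filename rule can be illustrated with a small glob-to-regex translation. This is a simplified Python sketch, not lei's implementation: it handles only `*' and `?', lets `*' cross `/', and the function name is hypothetical.

```python
import re

def glob_to_regex(glob):
    """Translate a simple glob into a regex anchored at end-of-string."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Without the trailing \Z, `*/foo' would also match `.../foobar'.
    return re.compile("".join(parts) + r"\Z")

pat = glob_to_regex("*/foo")
print(bool(pat.search("https://example.com/foo")))
print(bool(pat.search("https://example.com/foobar")))
```

The trailing anchor alone decides whether `*/foo' matches `*/foobar'. The rest of the translation is the same either way.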