Eric Wong [Mon, 12 Dec 2022 04:22:01 +0000 (04:22 +0000)]
t/httpd-unix: eliminate some busy waits
A small step towards making our test suite as sleep-less as
possible. We can use FIFOs to coordinate processes in a few
places, while other spots can take advantage of disabling
FD_CLOEXEC to further eliminate back-and-forth traffic between
processes.
This speeds up t/httpd-unix.t by ~20 ms on my system.
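As a rough illustration of the FIFO technique (a Python sketch, not the Perl code in t/httpd-unix.t; the helper name and the one-line "ready" message are assumptions made for the example), a parent can block on a FIFO until its child signals readiness instead of polling in a sleep loop:

```python
import os
import subprocess
import sys
import tempfile


def wait_for_child_ready() -> str:
    """Start a child and block until it signals readiness through a FIFO."""
    with tempfile.TemporaryDirectory() as tmpdir:
        fifo = os.path.join(tmpdir, "ready.fifo")
        os.mkfifo(fifo)
        # Hypothetical child: finishes its setup, then writes one line.
        child_code = (
            "import sys\n"
            "with open(sys.argv[1], 'w') as w:\n"
            "    w.write('ready\\n')\n"
        )
        child = subprocess.Popen([sys.executable, "-c", child_code, fifo])
        try:
            # open() blocks until the child opens the write end and
            # readline() blocks until data arrives: no sleep/poll loop.
            with open(fifo) as r:
                line = r.readline()
        finally:
            child.wait()
        return line.strip()
```

The kernel wakes the reader as soon as the writer is ready, so the wait is exactly as long as the child's setup takes.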
Eric Wong [Thu, 1 Dec 2022 11:21:32 +0000 (11:21 +0000)]
lei_saved_search: expand only/include/exclude to absolute paths
While users may specify relative paths for convenience on the
command-line, absolute paths are required for `lei up' since
that (especially `lei up --all') could run from anywhere.
Note that we need to do this when parsing the command-line
options, since shortcuts for URL matching on URL path components
are allowed for `lei q', and those same shortcuts may remain
in effect across to `lei up' as the underlying external may
be moved to a different URI host.
Eric Wong [Mon, 28 Nov 2022 05:32:27 +0000 (05:32 +0000)]
lei_mirror: omit trailing slash for git remote.*.url
While PublicInbox::WWW URLs have a trailing slash in them
for compatibility with static web server mirrors, URLs
intended for `git clone' don't benefit from this and the
trailing `/' just looks awkward.
Eric Wong [Mon, 28 Nov 2022 05:32:24 +0000 (05:32 +0000)]
lei_mirror: eliminate circular references
...by using local-ized globals. While non-globals could work,
eliminating the {todo} and {fgrp_todo} refs in all sub-refs
is more error-prone and the `local' construct is convenient.
This allows us to get rid of the `delete $fgrp->{-fini}' call
in pack_refs and eliminates the indiscriminate reaping of all
processes before calling fgrp_fetch_all. This means we can
fully depend on DESTROY to provide predictable dependency
handling while supporting parallelization.
Global $TODO and $FGRP_TODO now become SCALAR refs on
consumption so they can act as assertions to detect future bugs.
Eric Wong [Mon, 28 Nov 2022 05:32:23 +0000 (05:32 +0000)]
lei_mirror: support {symlinks} from manifest
It's part of grokmirror, and useful for keeping compatibility.
We can make use of File::Spec->abs2rel here to ensure our
symlinks are relative and the entire mirror can be copied
as a whole.
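A minimal sketch of the relative-symlink idea, using Python's os.path.relpath as a stand-in for File::Spec->abs2rel (the function name and directory layout are hypothetical, not taken from lei_mirror):

```python
import os


def relative_symlink(target: str, link_path: str) -> str:
    """Create link_path as a symlink to target, stored as a relative path."""
    # Compute the target relative to the directory holding the link, so the
    # link stays valid when the whole tree is copied or moved elsewhere.
    rel = os.path.relpath(target, start=os.path.dirname(link_path))
    os.symlink(rel, link_path)
    return rel
```

Because the stored link text never mentions the mirror's top-level directory, copying or renaming that directory leaves the link intact.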
Eric Wong [Mon, 28 Nov 2022 05:32:22 +0000 (05:32 +0000)]
lei_mirror: set {head} from manifest
We handle symbolic refs properly, at least. It's also possible
for $GIT_DIR/HEAD to contain a full SHA-1/SHA-256, and we'll
support that by using `git update-ref --no-deref'.
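The two cases can be sketched as follows. This is a Python illustration that only builds the git command line. It assumes the manifest "head" value mirrors the contents of $GIT_DIR/HEAD (either "ref: <name>" or a bare object ID), and the use of `git symbolic-ref' for the symbolic case is an assumption about one reasonable approach, not a statement of what lei_mirror runs:

```python
import re

# A full SHA-1 (40 hex digits) or SHA-256 (64 hex digits) object ID.
OID_RE = re.compile(r"\A(?:[0-9a-f]{40}|[0-9a-f]{64})\Z")


def head_update_cmd(git_dir: str, head: str) -> list:
    """Return the git argv which would set HEAD to the manifest value."""
    head = head.strip()
    base = ["git", f"--git-dir={git_dir}"]
    if head.startswith("ref: "):
        # Symbolic ref: point HEAD at the named branch.
        return base + ["symbolic-ref", "HEAD", head[len("ref: "):]]
    if OID_RE.match(head):
        # Detached HEAD: --no-deref writes the object ID into HEAD itself
        # instead of updating whatever branch HEAD currently points to.
        return base + ["update-ref", "--no-deref", "HEAD", head]
    raise ValueError(f"unrecognized HEAD value: {head!r}")
```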
Eric Wong [Mon, 28 Nov 2022 05:32:20 +0000 (05:32 +0000)]
lei_mirror: simplify forkgroup-related subs
We can pass fewer variables around on stack since $fgrp is just
a copy of $self. We can also rely more on explicit callback
passing rather than relying on OnDestroy and ->cancel for
conditional calls.
Eric Wong [Mon, 28 Nov 2022 05:32:19 +0000 (05:32 +0000)]
lei_mirror: run v1_done earlier on forkgroup done
There's likely a circular reference somewhere which was
preventing v1_done from running early. In any case, this allows
v1_done to run in parallel with the pack-refs process since
there's no ordering dependency between ref-packing and v1_done.
Eric Wong [Mon, 28 Nov 2022 05:32:17 +0000 (05:32 +0000)]
lei_mirror: remove janky mirror.done stamp file
This makes a fundamental (and overdue) change to the core of
lei in how it handles child errors. Every process which
generates or receives a child error will remember it before
passing it on. This ensures _wq_done_wait callbacks will
know of prior errors aside from $? when it runs.
Eric Wong [Mon, 28 Nov 2022 05:32:14 +0000 (05:32 +0000)]
lei_mirror: respect `./' and `../' prefixes for CLI args
Users may wish to keep objstore and manifest files at
a higher level to prevent direct access via HTTP(S),
so those relative paths probably make sense.
Eric Wong [Mon, 28 Nov 2022 05:32:10 +0000 (05:32 +0000)]
clone|fetch: support passing --prune(-tags) to `git fetch'
We need to be able to get rid of removed branches and tags on
the remote. --prune-tags is implied for non-objstore repos,
and incompatible with objstore repos.
Eric Wong [Mon, 28 Nov 2022 05:32:08 +0000 (05:32 +0000)]
lei_mirror: delay configuring forkgroups
When relying on `public-inbox-clone --manifest=', idempotent
`git config' invocations can take a considerable amount of
time. We still configure inboxes idempotently since it
allows quickly changing URLs to mirrors, but we just defer
it until an update is actually needed.
Eric Wong [Mon, 28 Nov 2022 05:32:02 +0000 (05:32 +0000)]
lei_mirror: properly pack-refs in non-forkgroup repos
We need to ensure `git update-ref --stdin' is complete
before running `git pack-refs', otherwise loose refs can
remain while update-ref is still running.
Eric Wong [Mon, 28 Nov 2022 05:32:01 +0000 (05:32 +0000)]
fetch: eliminate File::Temp->filename var
File::Temp objects are overloaded to automatically
call ->filename when stringified, so there's no need
to store the ->filename result on the Perl stack.
Eric Wong [Mon, 28 Nov 2022 05:31:54 +0000 (05:31 +0000)]
lei_mirror: forkgroups use `git fetch --multiple'
This offloads network parallelization and safety to git
itself while reducing the amount of unnecessary process spawning
we do. This also improves readability of pack-refs invocations
and reduces the need for them.
To prevent heavily-forked repos from hitting system command-line
size limits, we group refs to be updated in the "fgrptmp" group.
Eric Wong [Mon, 28 Nov 2022 05:31:51 +0000 (05:31 +0000)]
lei_mirror: drop git <1.8.5 support
Supporting git <1.8.5 via fetch on non-forkgroup repos would
make auto-GC dangerous, and I want to support auto-GC instead
of relying on the preciousObjects extension.
Since git 1.8.5 is 9 years old at this point, and grokmirror
(used by the only CentOS 7.x user I know of) already relies on
newer git, simplify our code and only fetch into forkgroups.
Eric Wong [Mon, 28 Nov 2022 05:31:49 +0000 (05:31 +0000)]
lei_mirror: preserve permissions of existing alternates file
We don't want to be clobbering permissions when changing to
relative paths. Furthermore, we can avoid writing to the
alternates file if there are no changes.
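A hedged Python sketch of the same pattern: skip the write when nothing changed, otherwise replace the file while keeping its existing mode. The function name is hypothetical and the temp-file-plus-rename step is an assumption of this example, not necessarily how lei_mirror writes the file:

```python
import os
import stat
import tempfile


def rewrite_preserving_mode(path: str, new_content: str) -> bool:
    """Replace path's content, keeping its permissions; skip if unchanged."""
    with open(path) as f:
        if f.read() == new_content:
            return False  # no changes, so avoid touching the file at all
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # mkstemp creates the file with mode 0600, so restore the original mode.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_content)
        os.chmod(tmp, mode)
        os.replace(tmp, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```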
Eric Wong [Mon, 28 Nov 2022 05:31:45 +0000 (05:31 +0000)]
clone: flesh out --objstore behavior and document
We can support absolute paths to avoid surprising behaviors,
but relative paths are preferred since the goal is to be
accessible over the "dumb" HTTP git transport (the dumb
transport uses less memory and CPU on the server).
Eric Wong [Mon, 28 Nov 2022 05:31:40 +0000 (05:31 +0000)]
lei_mirror: avoid convoluted lazy_cb usage
lazy_cb should only be used for lei command dispatch and
completion callbacks when the method isn't known at startup.
There's zero reason to use it when the method is known
ahead-of-time, especially when there's a comment pointing
reviewers towards the only possible method it can dispatch.
Eric Wong [Mon, 28 Nov 2022 05:31:33 +0000 (05:31 +0000)]
clone: support --inbox-version
This is part of `lei add-external --mirror', and it makes
sense to have for development and testing. We'll also add
a fallback in case somebody tries --inbox-version and fails
due to a newer remote instance of public-inbox.
Eric Wong [Mon, 28 Nov 2022 05:31:27 +0000 (05:31 +0000)]
lei_mirror: do not fetch descriptions if using manifest
If a manifest exists, we can expect the description to always be
present, thus there's no need to make a separate HTTP(S) request
since we can use it as-is from the manifest for v1||coderepos
and strip / \[epoch [0-9]+\]\z/ from v2.
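For illustration, the suffix-stripping regex quoted above translates to Python as below (Perl's \z becomes \Z; the function name and sample descriptions are made up for the example):

```python
import re

# Matches a trailing " [epoch N]" suffix, anchored at the end of the string.
EPOCH_SUFFIX = re.compile(r" \[epoch [0-9]+\]\Z")


def strip_epoch(description: str) -> str:
    """Return the description with any trailing " [epoch N]" removed."""
    return EPOCH_SUFFIX.sub("", description)
```

A description without the suffix passes through unchanged, so the same helper is safe to call on any manifest entry.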
Eric Wong [Mon, 28 Nov 2022 05:31:24 +0000 (05:31 +0000)]
lei_mirror: allow --epoch on mixed v1/v2 clones
It's entirely possible an instance will have both v1 and v2
inboxes (or v2 inboxes and coderepos). Don't punish --epoch
users by forcing them to run multiple commands.
Eric Wong [Mon, 28 Nov 2022 05:31:20 +0000 (05:31 +0000)]
lei_mirror: reduce noise on interrupted clones
We don't need git-config or other commands failing loudly.
`git clone' and subcommands it spawns may still spew, but it's no
worse than interrupting `git clone' itself, now.
We accomplish this by localizing $LIVE (formerly %LIVE) and
detecting when its auto-vivification into a hashref goes
out-of-scope during the `DESTRUCT' ${^GLOBAL_PHASE}.
We can't use ${^GLOBAL_PHASE}, yet, either, since it appeared in
Perl 5.14 and we're still migrating slowly to Perl 5.12 before
going to 5.14.
Eric Wong [Mon, 28 Nov 2022 05:31:18 +0000 (05:31 +0000)]
lei_mirror: initialize placeholders with "head" from manifest
This only affects v2 epochs, but ensures our bases are covered,
at least. We'll have to update PublicInbox::Fetch later to
deal with "head" entries in manifest.js.gz, too.
Eric Wong [Mon, 28 Nov 2022 05:31:17 +0000 (05:31 +0000)]
clone: support --dry-run / -n flag
It still makes HTTP(S) requests to retrieve the manifest or
scrape HTML, but doesn't make permanent changes to the FS
(aside from modifying {acm}time of ${TMPDIR-/tmp}).
Eric Wong [Mon, 28 Nov 2022 05:31:15 +0000 (05:31 +0000)]
lei_mirror: load most modules up-front
lei loads LeiMirror itself lazily, anyways, and it only
supports HTTP(S) mirrors, so there's no point in delaying most
of the modules it loads. Some of the inbox-specific and
v2-specific stuff can be lazy-loaded, however, since this
will support mirroring non-inbox repositories, too.
Eric Wong [Mon, 28 Nov 2022 05:31:13 +0000 (05:31 +0000)]
lei_mirror: consolidate clone process management
This simplifies our code by having fewer places check process
limits and perform reaping. We'll also print command names
immediately before executing, instead of right before waiting
for running processes.
Eric Wong [Mon, 28 Nov 2022 05:31:04 +0000 (05:31 +0000)]
clone: support parallel v1 clones
This opens the door to parallel cloning of coderepos, too. We
can also get rid of needless AutoReap usage here, since its
usage has been 100% synchronous and not DESTROY-based as it
is in tests.
Eric Wong [Mon, 28 Nov 2022 05:31:03 +0000 (05:31 +0000)]
lei_mirror: rely on global process reaper
We no longer rely on SIGCHLD for predictability, and instead
call waitpid at safe points. This will make it easier for us to
do parallel mirroring of multiple inboxes while preserving
proper dependencies via ->DESTROY callbacks.
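A small Python sketch of reaping at explicit safe points rather than from a SIGCHLD handler. The helper names and the pid-to-label table are hypothetical, os.waitstatus_to_exitcode needs Python 3.9 or newer, and this shows only the general waitpid pattern, not lei's actual reaper:

```python
import os
import sys


def spawn_exit(code: int) -> int:
    """Start a child which exits with `code` at once; return its PID."""
    argv = [sys.executable, "-c", f"import sys; sys.exit({code})"]
    return os.spawnv(os.P_NOWAIT, sys.executable, argv)


def reap(pending: dict, block: bool = False) -> dict:
    """Reap children listed in pending (pid -> label) at a safe point.

    Returns {label: exit_code} for every child which has finished and
    removes those children from pending.
    """
    done = {}
    for pid in list(pending):
        wpid, status = os.waitpid(pid, 0 if block else os.WNOHANG)
        if wpid == 0:
            continue  # still running; check again at the next safe point
        done[pending.pop(pid)] = os.waitstatus_to_exitcode(status)
    return done
```

Calling reap() only where the program's state is consistent keeps completion handling in a predictable order, which a signal handler firing at arbitrary times cannot guarantee.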
Eric Wong [Mon, 28 Nov 2022 05:31:00 +0000 (05:31 +0000)]
clone: parallelize v2 epoch clones
This is a first step in supporting completely parallelized
clones. Eventually, everything will be parallelized and
dependencies will be managed via callbacks.
Eric Wong [Mon, 28 Nov 2022 05:30:58 +0000 (05:30 +0000)]
clone: support multi-inbox clone
This is to ensure we can do `public-inbox-clone https://yhbt.net/lore'
or `public-inbox-clone https://lore.kernel.org/' and clone all
inboxes (and whatever else git stores).