Sergey Matveev's repositories - public-inbox.git/log
public-inbox.git
16 months ago lei_mirror: break circular references
Eric Wong [Mon, 12 Dec 2022 09:58:54 +0000 (09:58 +0000)]
lei_mirror: break circular references

It seems more graceful than dying and breaking a mirror, since
the {reference} in util-linux was irrelevant anyways with the
move to forkgroups.

16 months ago lei_mirror: trim current symlinks from warning
Eric Wong [Mon, 12 Dec 2022 09:58:53 +0000 (09:58 +0000)]
lei_mirror: trim current symlinks from warning

This quiets needless warnings from current symlinks, while still
complaining about out-of-date ones.

16 months ago t/httpd-unix: eliminate some busy waits
Eric Wong [Mon, 12 Dec 2022 04:22:01 +0000 (04:22 +0000)]
t/httpd-unix: eliminate some busy waits

A small step towards making our test suite as sleep-less as
possible.  We can use FIFOs to coordinate processes in a few
places, while other spots can take advantage of disabling
FD_CLOEXEC to further eliminate back-and-forth traffic between
processes.

This speeds up t/httpd-unix.t by ~20 ms on my system.
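
To illustrate the FIFO approach, here is a minimal Python sketch (not
code from t/httpd-unix.t; the function name and the "ready" message
are assumptions made for this example).  The parent blocks on a FIFO
until the child signals readiness, instead of polling in a sleep loop:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: it opens the FIFO named in argv[1]
# and writes one line to announce that it is ready.
CHILD_CODE = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    f.write('ready\\n')\n"
)

def wait_for_child_ready():
    """Start a child and block until it reports readiness via a FIFO."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "ready.fifo")
        os.mkfifo(fifo)
        child = subprocess.Popen([sys.executable, "-c", CHILD_CODE, fifo])
        # open() for reading blocks until the child opens the FIFO
        # for writing, so no sleep or busy wait is needed here.
        with open(fifo) as f:
            msg = f.readline()
        child.wait()
        return msg, child.returncode
```

The kernel wakes the parent as soon as the child writes, so the wait
costs no more time than the child actually needs.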

16 months ago tests: replace select/usleep calls with tick()
Eric Wong [Mon, 12 Dec 2022 04:22:00 +0000 (04:22 +0000)]
tests: replace select/usleep calls with tick()

This makes it easier to identify places in tests which cause
unnecessary slowdowns doing busy waits.

17 months ago lei_saved_search: expand only/include/exclude to absolute paths
Eric Wong [Thu, 1 Dec 2022 11:21:32 +0000 (11:21 +0000)]
lei_saved_search: expand only/include/exclude to absolute paths

While users may specify relative paths for convenience on the
command-line, absolute paths are required for `lei up' since
that (especially `lei up --all') could run from anywhere.

Note that we need to do this when parsing the command-line
options, since shortcuts for URL matching on URL path components
are allowed for `lei q', and those same shortcuts may remain
in effect across to `lei up' as the underlying external may
be moved to a different URI host.

17 months ago lei: stricter external checks for valid $GIT_DIR/objects
Eric Wong [Thu, 1 Dec 2022 11:21:31 +0000 (11:21 +0000)]
lei: stricter external checks for valid $GIT_DIR/objects

I ended up with my $HOME in
~/.cache/lei/all_locals_ever.git/objects/info/alternates
and am trying to avoid that in the future.

17 months ago lei_mirror: handle forkgroup changes
Eric Wong [Mon, 28 Nov 2022 05:32:32 +0000 (05:32 +0000)]
lei_mirror: handle forkgroup changes

Forkgroups for projects are not static and may change at
the whim of the remote sysadmin.  Ensure we can migrate
to the new forkgroup.

Old forkgroups do not get pruned, yet, and their entries
stay in alternates.

17 months ago clone: support --project-list= for cgit
Eric Wong [Mon, 28 Nov 2022 05:32:31 +0000 (05:32 +0000)]
clone: support --project-list= for cgit

grokmirror supports it, and we also support cgit, so this should
make running mirrors easier.  This will be useful for scripting
purposes, too.

17 months ago lei_mirror: break out of fgrp fetch iteration early
Eric Wong [Mon, 28 Nov 2022 05:32:30 +0000 (05:32 +0000)]
lei_mirror: break out of fgrp fetch iteration early

Don't queue up more work if we already have a failure somewhere.

17 months ago lei_mirror: don't clobber inbox.config.example if it exists
Eric Wong [Mon, 28 Nov 2022 05:32:29 +0000 (05:32 +0000)]
lei_mirror: don't clobber inbox.config.example if it exists

Users may save notes or edits in there, and it's only an
example, so there's no need to mindlessly clobber it.

17 months ago lei_mirror: set info/web/last-modified from manifest
Eric Wong [Mon, 28 Nov 2022 05:32:28 +0000 (05:32 +0000)]
lei_mirror: set info/web/last-modified from manifest

The grokmirror manifest sets {modified}, so we might as well use
it to make life easier for users of cgit (and compatible)
front-ends.

17 months ago lei_mirror: omit trailing slash for git remote.*.url
Eric Wong [Mon, 28 Nov 2022 05:32:27 +0000 (05:32 +0000)]
lei_mirror: omit trailing slash for git remote.*.url

While PublicInbox::WWW URLs have a trailing slash in them
for compatibility with static web server mirrors, URLs
intended for `git clone' don't benefit from this and the
trailing `/' just looks awkward.

17 months ago lei_mirror: avoid redundant curl `-f' use
Eric Wong [Mon, 28 Nov 2022 05:32:26 +0000 (05:32 +0000)]
lei_mirror: avoid redundant curl `-f' use

All of our curl invocations use the `-f' (--fail) switch
anyways, and I can't imagine a time when we'd want silent
failures.

17 months ago lei_mirror: use curl -z/--timecond if manifest exists
Eric Wong [Mon, 28 Nov 2022 05:32:25 +0000 (05:32 +0000)]
lei_mirror: use curl -z/--timecond if manifest exists

This lets us save cycles and avoid scanning + comparing manifest
contents by relying on the Last-Modified HTTP response header.
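
For readers unfamiliar with `curl -z FILE': it makes the request
conditional on the modification time of FILE.  A rough Python sketch
of the same idea follows (the helper name is an assumption for
illustration; lei_mirror itself simply invokes curl):

```python
import os
from email.utils import formatdate

def if_modified_since(path):
    """Return conditional-request headers for `path`, or {} if it is missing.

    A server which honors If-Modified-Since can answer 304 Not Modified,
    so an unchanged manifest is neither downloaded nor compared.
    """
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return {}  # initial clone: nothing to compare against
    # HTTP dates are always expressed in GMT
    return {"If-Modified-Since": formatdate(mtime, usegmt=True)}
```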

17 months ago lei_mirror: eliminate circular references
Eric Wong [Mon, 28 Nov 2022 05:32:24 +0000 (05:32 +0000)]
lei_mirror: eliminate circular references

...by using local-ized globals.  While non-globals could work,
eliminating the {todo} and {fgrp_todo} refs in all sub-refs
is more error-prone and the `local' construct is convenient.

This allows us to get rid of the `delete $fgrp->{-fini}' call
in pack_refs and eliminates the indiscriminate reaping of all
processes before calling fgrp_fetch_all.  This means we can
fully depend on DESTROY to provide predictable dependency
handling while supporting parallelization.

Global $TODO and $FGRP_TODO now become SCALAR refs on
consumption so they can act as assertions to detect future bugs.

17 months ago lei_mirror: support {symlinks} from manifest
Eric Wong [Mon, 28 Nov 2022 05:32:23 +0000 (05:32 +0000)]
lei_mirror: support {symlinks} from manifest

It's part of grokmirror, and useful for keeping compatibility.
We can make use of File::Spec->abs2rel here to ensure our
symlinks are relative and the entire mirror can be copied
as a whole.
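
The relative-symlink idea can be sketched in Python, with
os.path.relpath standing in for File::Spec->abs2rel (the function
name is hypothetical and this is not the LeiMirror code):

```python
import os

def relative_symlink(target, link_path):
    """Create link_path pointing at target via a relative path.

    The link text is computed relative to the directory containing
    the link, so the link keeps working if the whole tree is moved.
    """
    link_dir = os.path.dirname(link_path) or "."
    rel = os.path.relpath(target, start=link_dir)
    os.symlink(rel, link_path)
    return rel
```

An absolute link target would break as soon as the mirror is copied
to another prefix; a relative one only depends on the layout inside
the mirror.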

17 months ago lei_mirror: set {head} from manifest
Eric Wong [Mon, 28 Nov 2022 05:32:22 +0000 (05:32 +0000)]
lei_mirror: set {head} from manifest

We handle symbolic refs properly, at least.  It's also possible
for $GIT_DIR/HEAD to contain a full SHA-1/SHA-256, and we'll
support that by using update-ref --no-deref.

17 months ago lei_mirror: shorten scope mirror objects
Eric Wong [Mon, 28 Nov 2022 05:32:21 +0000 (05:32 +0000)]
lei_mirror: shorten scope mirror objects

We may be able to save some memory this way.

17 months ago lei_mirror: simplify forkgroup-related subs
Eric Wong [Mon, 28 Nov 2022 05:32:20 +0000 (05:32 +0000)]
lei_mirror: simplify forkgroup-related subs

We can pass fewer variables around on stack since $fgrp is just
a copy of $self.  We can also rely more on explicit callback
passing rather than relying on OnDestroy and ->cancel for
conditional calls.

17 months ago lei_mirror: run v1_done earlier on forkgroup done
Eric Wong [Mon, 28 Nov 2022 05:32:19 +0000 (05:32 +0000)]
lei_mirror: run v1_done earlier on forkgroup done

There's likely a circular reference somewhere which was
preventing v1_done from running early.  In any case, this allows
v1_done to run in parallel with the pack-refs process since
there's no ordering dependency between ref-packing and v1_done.

17 months ago lei_mirror: simplify most process spawning
Eric Wong [Mon, 28 Nov 2022 05:32:18 +0000 (05:32 +0000)]
lei_mirror: simplify most process spawning

For commands where we rely on successful exit codes to continue,
start_cmd() generalizes well enough to be used in a variety
of places.

17 months ago lei_mirror: remove janky mirror.done stamp file
Eric Wong [Mon, 28 Nov 2022 05:32:17 +0000 (05:32 +0000)]
lei_mirror: remove janky mirror.done stamp file

This makes a fundamental (and overdue) change to the core of
lei in how it handles child errors.  Every process which
generates or receives a child error will remember it before
passing it on.  This ensures _wq_done_wait callbacks will
know of prior errors aside from $? when it runs.

17 months ago lei_mirror: update fingerprints when writing local manifest.js.gz
Eric Wong [Mon, 28 Nov 2022 05:32:16 +0000 (05:32 +0000)]
lei_mirror: update fingerprints when writing local manifest.js.gz

We need our local manifest to match the actual data we store,
not what we're mirroring.

17 months ago lei_mirror: --manifest= affects destination, too
Eric Wong [Mon, 28 Nov 2022 05:32:15 +0000 (05:32 +0000)]
lei_mirror: --manifest= affects destination, too

This probably makes the most sense: if a user wants to
use an alternate path to read from, it's likely they
want to write it there, too.

17 months ago lei_mirror: respect `./' and `../' prefixes for CLI args
Eric Wong [Mon, 28 Nov 2022 05:32:14 +0000 (05:32 +0000)]
lei_mirror: respect `./' and `../' prefixes for CLI args

Users may wish to keep objstore and manifest files at
a higher level to prevent direct access via HTTP(S),
so those relative paths probably make sense.

17 months ago lei_mirror: don't warn on missing manifest on initial clone
Eric Wong [Mon, 28 Nov 2022 05:32:13 +0000 (05:32 +0000)]
lei_mirror: don't warn on missing manifest on initial clone

Users may choose to specify a manifest on the initial clone,
so don't complain if it's missing in that case.

17 months ago clone: support --keep-going/-k like make(1)
Eric Wong [Mon, 28 Nov 2022 05:32:12 +0000 (05:32 +0000)]
clone: support --keep-going/-k like make(1)

This can be useful for intermittent network errors,
and the required code changes make it less dependent
on global state.

17 months ago lei_mirror: avoid needless FD passing
Eric Wong [Mon, 28 Nov 2022 05:32:11 +0000 (05:32 +0000)]
lei_mirror: avoid needless FD passing

Most git processes we invoke don't care about stdin nor stdout,
so don't waste cycles and memory dealing with it.

stderr passing is added to the `git config --unset-all remotes.fgrptmp'
invocation, though, since that can fail due to I/O errors or OOM.

17 months ago clone|fetch: support passing --prune(-tags) to `git fetch'
Eric Wong [Mon, 28 Nov 2022 05:32:10 +0000 (05:32 +0000)]
clone|fetch: support passing --prune(-tags) to `git fetch'

We need to be able to get rid of removed branches and tags on
the remote.  --prune-tags is implied for non-objstore repos,
and incompatible with objstore repos.

17 months ago clone: canonicalize destination path from CLI
Eric Wong [Mon, 28 Nov 2022 05:32:09 +0000 (05:32 +0000)]
clone: canonicalize destination path from CLI

We'll probably save the destination path somewhere, so
ensure the path doesn't have redundant slashes and such.

17 months ago lei_mirror: delay configuring forkgroups
Eric Wong [Mon, 28 Nov 2022 05:32:08 +0000 (05:32 +0000)]
lei_mirror: delay configuring forkgroups

When relying on `public-inbox-clone --manifest=', idempotent
`git config' invocations can take a considerable amount of
time.  We still configure inboxes idempotently since it
allows quickly changing URLs to mirrors, but we just defer
it until an update is actually needed.

17 months ago clone: support loading manifest.js.gz from destination
Eric Wong [Mon, 28 Nov 2022 05:32:07 +0000 (05:32 +0000)]
clone: support loading manifest.js.gz from destination

This will allow us to quickly check fingerprints against
remotes with a single HTTP(S) request, saving us numerous
`git show-ref' invocations.

17 months ago lei_mirror: check fingerprints before fetching
Eric Wong [Mon, 28 Nov 2022 05:32:06 +0000 (05:32 +0000)]
lei_mirror: check fingerprints before fetching

While we currently don't check an existing on-disk manifest,
using `git show-ref' can still save us precious network traffic.

17 months ago lei_mirror: support resuming multi-repo clones
Eric Wong [Mon, 28 Nov 2022 05:32:05 +0000 (05:32 +0000)]
lei_mirror: support resuming multi-repo clones

This is actually a combination of clone and fetch, and I don't
think `public-inbox-fetch' will be used to update multiple git
repos (inbox or not).

Our use of `git update-ref --stdin -z' was broken for
incremental updates, but now fixed to properly NUL-terminate
commands.
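
For reference, git-update-ref(1) documents the -z input for an update
as `update SP <ref> NUL <new-oid> NUL [<old-oid>] NUL'.  The Python
sketch below (a hypothetical helper, not the LeiMirror code) builds
such input; note that the third NUL is required even when no old
value is given:

```python
def update_ref_stdin_z(updates):
    """Serialize updates for `git update-ref --stdin -z`.

    updates: iterable of (ref, new_oid, old_oid_or_None) tuples.
    Every field, including an empty old-oid, is NUL-terminated.
    """
    out = bytearray()
    for ref, new_oid, old_oid in updates:
        out += b"update " + ref.encode() + b"\0"
        out += new_oid.encode() + b"\0"
        # an empty string means "no old value to verify"
        out += (old_oid or "").encode() + b"\0"
    return bytes(out)
```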

17 months ago on_destroy: support ->cancel callback
Eric Wong [Mon, 28 Nov 2022 05:32:04 +0000 (05:32 +0000)]
on_destroy: support ->cancel callback

We probably use this idiom elsewhere, but having this method
around to make future use cases more readable is probably prudent.

17 months ago lei_mirror: show child error error code
Eric Wong [Mon, 28 Nov 2022 05:32:03 +0000 (05:32 +0000)]
lei_mirror: show child error error code

Just passing the exit value of the child process to our
parent process isn't very useful when multiple commands
are failing at once.

17 months ago lei_mirror: properly pack-refs in non-forkgroup repos
Eric Wong [Mon, 28 Nov 2022 05:32:02 +0000 (05:32 +0000)]
lei_mirror: properly pack-refs in non-forkgroup repos

We need to ensure `git update-ref --stdin' is complete
before running `git pack-refs', otherwise loose refs can
remain while update-ref is still running.

17 months ago fetch: eliminate File::Temp->filename var
Eric Wong [Mon, 28 Nov 2022 05:32:01 +0000 (05:32 +0000)]
fetch: eliminate File::Temp->filename var

File::Temp objects are overloaded to automatically
call ->filename when stringified, so there's no need
to store the ->filename result on the Perl stack.

17 months ago fetch: use v5.12
Eric Wong [Mon, 28 Nov 2022 05:32:00 +0000 (05:32 +0000)]
fetch: use v5.12

Another tiny step towards improved startup performance by
avoiding one .pm file.

17 months ago lei_mirror: shorten remote names
Eric Wong [Mon, 28 Nov 2022 05:31:59 +0000 (05:31 +0000)]
lei_mirror: shorten remote names

The lengthy-but-human-meaningful remote names are more expensive
at runtime and increase packed-refs space.

17 months ago clone: require `--objstore=' for default location
Eric Wong [Mon, 28 Nov 2022 05:31:58 +0000 (05:31 +0000)]
clone: require `--objstore=' for default location

Allowing just `--objstore' without `=' was confusing,
since it could eat one of the required parameters (URL or
DESTINATION).

17 months ago clone: use v5.12
Eric Wong [Mon, 28 Nov 2022 05:31:57 +0000 (05:31 +0000)]
clone: use v5.12

Another small step in what will probably be a decades-long
quest to reduce startup time by a few milliseconds.

17 months ago clone: drop unnecessary requires
Eric Wong [Mon, 28 Nov 2022 05:31:56 +0000 (05:31 +0000)]
clone: drop unnecessary requires

These packages are all require-ed elsewhere.

17 months ago clone: move --dry-run handling to lei_mirror
Eric Wong [Mon, 28 Nov 2022 05:31:55 +0000 (05:31 +0000)]
clone: move --dry-run handling to lei_mirror

lei will probably support dry-run in more places, too.

17 months ago lei_mirror: forkgroups use `git fetch --multiple'
Eric Wong [Mon, 28 Nov 2022 05:31:54 +0000 (05:31 +0000)]
lei_mirror: forkgroups use `git fetch --multiple'

This offloads network parallelization and safety to git
itself while reducing the amount of unnecessary process spawning
we do.  This also improves readability of pack-refs invocations
and reduces the need for them.

To prevent heavily-forked repos from hitting system command-line
size limits, we group refs to be updated in the "fgrptmp" group.

17 months ago lei_mirror: fix --dry-run for forkgroups
Eric Wong [Mon, 28 Nov 2022 05:31:53 +0000 (05:31 +0000)]
lei_mirror: fix --dry-run for forkgroups

We must not make permanent changes to the FS if --dry-run is in use.

17 months ago lei_mirror: make basename more descriptive
Eric Wong [Mon, 28 Nov 2022 05:31:52 +0000 (05:31 +0000)]
lei_mirror: make basename more descriptive

This makes it easier for humans to distinguish between
"Alice/project.git" and "Bob/project.git"

17 months ago lei_mirror: drop git <1.8.5 support
Eric Wong [Mon, 28 Nov 2022 05:31:51 +0000 (05:31 +0000)]
lei_mirror: drop git <1.8.5 support

Supporting git <1.8.5 via fetch on non-forkgroup repos would
make auto-GC dangerous, and I want to support auto-GC instead
of relying on the preciousObjects extension.

Since git 1.8.5 is 9 years old at this point, and grokmirror
(used by the only CentOS 7.x user I know of) already relies on
newer git, simplify our code and only fetch into forkgroups.

17 months ago lei_mirror: do not show ref updates w/o --verbose
Eric Wong [Mon, 28 Nov 2022 05:31:50 +0000 (05:31 +0000)]
lei_mirror: do not show ref updates w/o --verbose

It's too noisy IMHO, and UIs are always opinionated.

17 months ago lei_mirror: preserve permissions of existing alternates file
Eric Wong [Mon, 28 Nov 2022 05:31:49 +0000 (05:31 +0000)]
lei_mirror: preserve permissions of existing alternates file

We don't want to be clobbering permissions when changing to
relative paths.  Furthermore, we can avoid writing to the
alternates file if there are no changes.
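
A minimal Python sketch of both points, assuming a write-to-temp and
rename strategy (the function name and the strategy are assumptions
for illustration, not taken from the commit):

```python
import os
import stat
import tempfile

def rewrite_preserving_mode(path, new_content):
    """Replace the contents of path, keeping its permission bits.

    Returns False, and writes nothing, if the content is unchanged.
    """
    with open(path) as f:
        if f.read() == new_content:
            return False
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # mkstemp creates the file with mode 0600, so the original
    # mode must be restored before renaming into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_content)
        os.chmod(tmp, mode)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```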

17 months ago lei_mirror: force --no-tags when fetching forkgroups
Eric Wong [Mon, 28 Nov 2022 05:31:48 +0000 (05:31 +0000)]
lei_mirror: force --no-tags when fetching forkgroups

We can't have multiple remotes writing to refs/tags/*
(instead of refs/remotes/*/tags) due to potential conflicts.

17 months ago lei_mirror: set description for non-inboxes, too
Eric Wong [Mon, 28 Nov 2022 05:31:47 +0000 (05:31 +0000)]
lei_mirror: set description for non-inboxes, too

We can still set $GIT_DIR/description when cloning coderepos with
--inbox-config=never

17 months ago lei_mirror: always pack refs for coderepos
Eric Wong [Mon, 28 Nov 2022 05:31:46 +0000 (05:31 +0000)]
lei_mirror: always pack refs for coderepos

Unlike object packing, ref packing is cheap and fast.

17 months ago clone: flesh out --objstore behavior and document
Eric Wong [Mon, 28 Nov 2022 05:31:45 +0000 (05:31 +0000)]
clone: flesh out --objstore behavior and document

We can support absolute paths to avoid surprising behaviors,
but relative paths are preferred since the goal is to be
accessible over the "dumb" HTTP git transport (the dumb
transport uses less memory and CPU on the server).

17 months ago lei_mirror: ensure git <1.8.5 fallback can use torsocks
Eric Wong [Mon, 28 Nov 2022 05:31:44 +0000 (05:31 +0000)]
lei_mirror: ensure git <1.8.5 fallback can use torsocks

Since we fall back to `git fetch' on versions of git without
`git update-ref --stdin' support, we must also support
torsocks use on Tor .onion URLs.

17 months ago lei_mirror: cleanup process reaping logic
Eric Wong [Mon, 28 Nov 2022 05:31:43 +0000 (05:31 +0000)]
lei_mirror: cleanup process reaping logic

We can put more of the default --jobs logic and loop handling
inside a sub to simplify callers.

17 months ago lei_mirror: support --objstore and forkgroups
Eric Wong [Mon, 28 Nov 2022 05:31:42 +0000 (05:31 +0000)]
lei_mirror: support --objstore and forkgroups

The {forkgroup} directive of grokmirror 2.x manifest.js.gz
can facilitate more space savings and improved pack performance
with pack.islands.

17 months ago lei_mirror: simplify clone_v2_prep
Eric Wong [Mon, 28 Nov 2022 05:31:41 +0000 (05:31 +0000)]
lei_mirror: simplify clone_v2_prep

Since everything relies on the instance-specific {todo} queue,
there's no need to have sub-specific queues.

17 months ago lei_mirror: avoid convoluted lazy_cb usage
Eric Wong [Mon, 28 Nov 2022 05:31:40 +0000 (05:31 +0000)]
lei_mirror: avoid convoluted lazy_cb usage

lazy_cb should only be used for lei command dispatch and
completion callbacks when the method isn't known at startup.
There's zero reason to use it when the method is known
ahead-of-time, especially when there's a comment pointing
reviewers towards the only possible method it can dispatch.

17 months ago lei_mirror: hoist out dump_manifest sub
Eric Wong [Mon, 28 Nov 2022 05:31:39 +0000 (05:31 +0000)]
lei_mirror: hoist out dump_manifest sub

We can reuse it in PublicInbox::Fetch, too.

17 months ago lei_mirror: do not write Makefile for --inbox-config=never
Eric Wong [Mon, 28 Nov 2022 05:31:38 +0000 (05:31 +0000)]
lei_mirror: do not write Makefile for --inbox-config=never

We want to be able to clone non-inbox git repos, too.

17 months ago lei_mirror: add `index' target to generated Makefile
Eric Wong [Mon, 28 Nov 2022 05:31:37 +0000 (05:31 +0000)]
lei_mirror: add `index' target to generated Makefile

It can probably be a useful hint to avoid misleading users
into always using `--reindex'.

17 months ago lei_mirror: cleanup File::Temp OO usage
Eric Wong [Mon, 28 Nov 2022 05:31:36 +0000 (05:31 +0000)]
lei_mirror: cleanup File::Temp OO usage

There's no need to capture or rely on the File::Temp->filename
in most cases since most Perl functions accept file handles all
the same.

17 months ago lei_mirror: ensure curl exits 22 on HTTP 404 responses
Eric Wong [Mon, 28 Nov 2022 05:31:35 +0000 (05:31 +0000)]
lei_mirror: ensure curl exits 22 on HTTP 404 responses

Oops, this is actually a long-standing bug :x

17 months ago lei_mirror: require Perl v5.12+
Eric Wong [Mon, 28 Nov 2022 05:31:34 +0000 (05:31 +0000)]
lei_mirror: require Perl v5.12+

Another tiny step towards improved startup performance by
relying on Perl 5.12 strictness and avoiding strict.pm

17 months ago clone: support --inbox-version
Eric Wong [Mon, 28 Nov 2022 05:31:33 +0000 (05:31 +0000)]
clone: support --inbox-version

This is part of `lei add-external --mirror', and it makes
sense to have for development and testing.  We'll also add
a fallback in case somebody tries --inbox-version and fails
due to a newer remote instance of public-inbox.

17 months ago lei_mirror: simplify v2 code paths
Eric Wong [Mon, 28 Nov 2022 05:31:32 +0000 (05:31 +0000)]
lei_mirror: simplify v2 code paths

We can simply reuse the parallelization of the manifest
code path for non-manifest v2 clones, now.

17 months ago lei_mirror: support manifest {references} for v2 epochs
Eric Wong [Mon, 28 Nov 2022 05:31:31 +0000 (05:31 +0000)]
lei_mirror: support manifest {references} for v2 epochs

This may be useful in case a v1 inbox gets forked into v2
(untested).

17 months ago lei_mirror: differentiate -entv vs -ent
Eric Wong [Mon, 28 Nov 2022 05:31:30 +0000 (05:31 +0000)]
lei_mirror: differentiate -entv vs -ent

It makes the code easier-to-follow when we have a single
versus multiple entities (`v' for vector, à la `argv').

17 months ago lei_mirror: fix glob semantics to match end-of-path
Eric Wong [Mon, 28 Nov 2022 05:31:29 +0000 (05:31 +0000)]
lei_mirror: fix glob semantics to match end-of-path

Globs such as `*/foo' should not match `*/foobar'.  This
allows cloning only `git' and not
`gitolite-transparency-log' off lore.
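
The difference between anchored and unanchored glob matching can be
shown with a small Python sketch (a simplified glob-to-regex
translation written for this example; the helper names are
assumptions and this is not the LeiMirror implementation):

```python
import re

def glob_to_regex(glob, anchor_end=True):
    """Translate a simple glob (`*' and `?' only) into a regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Without the end anchor, `*/git' also matches any path which
    # merely contains `/git' as a prefix of its last component.
    return re.compile("".join(parts) + (r"\Z" if anchor_end else ""))

def matches(glob, path, anchor_end=True):
    return glob_to_regex(glob, anchor_end).search(path) is not None
```

With the anchor, matches("*/git", "/gitolite-transparency-log") is
false; without it, the same call is true.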

17 months ago lei_mirror: require PublicInbox::Lock at use
Eric Wong [Mon, 28 Nov 2022 05:31:28 +0000 (05:31 +0000)]
lei_mirror: require PublicInbox::Lock at use

It's easier to understand why we lazy-load Lock for v2-only
code paths when we require it near its first use.

17 months ago lei_mirror: do not fetch descriptions if using manifest
Eric Wong [Mon, 28 Nov 2022 05:31:27 +0000 (05:31 +0000)]
lei_mirror: do not fetch descriptions if using manifest

If a manifest exists, we can expect the description to always be
present, thus there's no need to make a separate HTTP(S) request
since we can use it as-is from the manifest for v1||coderepos
and strip / \[epoch [0-9]+\]\z/ from v2.
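
The regex quoted above removes a trailing " [epoch N]" marker from an
epoch description.  An equivalent Python sketch (Perl's \z is spelled
\Z in Python; the function name is an assumption for illustration):

```python
import re

# Matches only at the very end of the string, like Perl's \z
EPOCH_SUFFIX = re.compile(r" \[epoch [0-9]+\]\Z")

def strip_epoch_suffix(description):
    """Drop a trailing ` [epoch N]' marker, if there is one."""
    return EPOCH_SUFFIX.sub("", description)
```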

17 months ago lei_mirror: defend against infinite loops
Eric Wong [Mon, 28 Nov 2022 05:31:26 +0000 (05:31 +0000)]
lei_mirror: defend against infinite loops

A reference chain of 1000 ought to be enough, I think...
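
A bounded chain walk of this kind might look as follows in Python
(a sketch only: the data layout, a dict mapping each repo to the repo
it references, and the function name are assumptions):

```python
def resolve_reference_chain(references, start, limit=1000):
    """Follow start -> references[start] -> ... to the end of the chain.

    Raises RuntimeError instead of looping forever when the chain is
    cyclic or longer than `limit` entries.
    """
    chain = [start]
    cur = start
    while cur in references:
        cur = references[cur]
        chain.append(cur)
        if len(chain) > limit:
            raise RuntimeError(
                f"reference chain from {start!r} exceeds {limit} entries")
    return chain
```

A hard cap is cruder than tracking visited entries, but it also
catches pathologically long (non-cyclic) chains.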

17 months ago lei_mirror: fix infinite loop in dependency resolution
Eric Wong [Mon, 28 Nov 2022 05:31:25 +0000 (05:31 +0000)]
lei_mirror: fix infinite loop in dependency resolution

We need to account for dependencies which are marked `done'.

17 months ago lei_mirror: allow --epoch on mixed v1/v2 clones
Eric Wong [Mon, 28 Nov 2022 05:31:24 +0000 (05:31 +0000)]
lei_mirror: allow --epoch on mixed v1/v2 clones

It's entirely possible an instance will have both v1 and v2
inboxes (or v2 inboxes and coderepos).  Don't punish --epoch
users by forcing them to run multiple commands.

17 months ago lei_mirror: reduce scope of v2 lock
Eric Wong [Mon, 28 Nov 2022 05:31:23 +0000 (05:31 +0000)]
lei_mirror: reduce scope of v2 lock

Guarding against parallel clones isn't realistic, really, only
setting up all.git, and even then, I'm not 100% sure the lock
is useful.

17 months ago lei_mirror: retrieve v2 description properly
Eric Wong [Mon, 28 Nov 2022 05:31:22 +0000 (05:31 +0000)]
lei_mirror: retrieve v2 description properly

17 months ago clone: support --inbox-config option
Eric Wong [Mon, 28 Nov 2022 05:31:21 +0000 (05:31 +0000)]
clone: support --inbox-config option

This allows avoiding 404s when trying _/text/config/raw on code
repositories.

17 months ago lei_mirror: reduce noise on interrupted clones
Eric Wong [Mon, 28 Nov 2022 05:31:20 +0000 (05:31 +0000)]
lei_mirror: reduce noise on interrupted clones

We don't need git-config or other commands failing loudly.
`git clone' and subcommands it spawns may still spew, but it's no
worse than interrupting `git clone' itself, now.

We accomplish this by localizing $LIVE (formerly %LIVE) and
detecting when its auto-vivification into a hashref goes
out-of-scope during the `DESTRUCT' ${^GLOBAL_PHASE}.

We can't use ${^GLOBAL_PHASE}, yet, either, since it appeared in
Perl 5.14 and we're still migrating slowly to Perl 5.12 before
going to 5.14.

17 months ago lei_mirror: support {reference} for v1 manifest clones
Eric Wong [Mon, 28 Nov 2022 05:31:19 +0000 (05:31 +0000)]
lei_mirror: support {reference} for v1 manifest clones

This will be generalized to v2, as well.

17 months ago lei_mirror: initialize placeholders with "head" from manifest
Eric Wong [Mon, 28 Nov 2022 05:31:18 +0000 (05:31 +0000)]
lei_mirror: initialize placeholders with "head" from manifest

This only affects v2 epochs, but ensures our bases are covered,
at least.  We'll have to update PublicInbox::Fetch later to
deal with "head" entries in manifest.js.gz, too.

17 months ago clone: support --dry-run / -n flag
Eric Wong [Mon, 28 Nov 2022 05:31:17 +0000 (05:31 +0000)]
clone: support --dry-run / -n flag

It still makes HTTP(S) requests to retrieve the manifest or
scrape HTML, but doesn't make permanent changes to the FS
(aside from modifying {acm}time of ${TMPDIR-/tmp}).

17 months ago lei_mirror: set gitweb.owner from manifest
Eric Wong [Mon, 28 Nov 2022 05:31:16 +0000 (05:31 +0000)]
lei_mirror: set gitweb.owner from manifest

This is mainly for coderepos, but sometimes public-inboxes
get shared via cgit/gitweb, too.

17 months ago lei_mirror: load most modules up-front
Eric Wong [Mon, 28 Nov 2022 05:31:15 +0000 (05:31 +0000)]
lei_mirror: load most modules up-front

lei loads LeiMirror itself lazily, anyways, and it only
supports HTTP(S) mirrors, so there's no point in delaying most
of the modules it loads.  Some of the inbox-specific and
v2-specific stuff can be lazy-loaded, however, since this
will support mirroring non-inbox repositories, too.

17 months ago lei_mirror: load File::Path unconditionally
Eric Wong [Mon, 28 Nov 2022 05:31:14 +0000 (05:31 +0000)]
lei_mirror: load File::Path unconditionally

File::Temp already uses it, so there's no sense in conditionally
require-ing it to save startup time.

17 months ago lei_mirror: consolidate clone process management
Eric Wong [Mon, 28 Nov 2022 05:31:13 +0000 (05:31 +0000)]
lei_mirror: consolidate clone process management

This simplifies our code by having fewer places check process
limits and perform reaping.  We'll also print command names
immediately before executing, instead of right before waiting
for running processes.

17 months ago lei_mirror: add a hint for skipped epoch permissions
Eric Wong [Mon, 28 Nov 2022 05:31:12 +0000 (05:31 +0000)]
lei_mirror: add a hint for skipped epoch permissions

Some users may think it's a git-specific thing to enable
writability, rather than a *nix permissions thing.  Clarify that
it's a standard *nix thing.

17 months ago lei_mirror: elide description retrieval for v1|coderepo
Eric Wong [Mon, 28 Nov 2022 05:31:11 +0000 (05:31 +0000)]
lei_mirror: elide description retrieval for v1|coderepo

manifest.js.gz can provide the description without an extra
HTTP(S) request, so attempt to use it whenever we're using
the manifest.

17 months ago lei_mirror: simplify _get_txt_start callers
Eric Wong [Mon, 28 Nov 2022 05:31:10 +0000 (05:31 +0000)]
lei_mirror: simplify _get_txt_start callers

We can avoid needless select()-based sleeps by always
using TMPDIR for temporary files, and just slurping the
small config or description file.

This will make it easier to reuse the description from
the manifest in the next commit.

17 months ago manifest: update module blurb + v5.12
Eric Wong [Mon, 28 Nov 2022 05:31:09 +0000 (05:31 +0000)]
manifest: update module blurb + v5.12

Helps steer new contributors (or forgetful old ones) in the
right direction.

17 months ago switch inotify/kevent stuff to v5.12
Eric Wong [Mon, 28 Nov 2022 05:31:08 +0000 (05:31 +0000)]
switch inotify/kevent stuff to v5.12

Another tiny step towards eventual startup time improvements
by avoiding strict.pm

17 months ago lei_mirror: retrieve description text asynchronously, too
Eric Wong [Mon, 28 Nov 2022 05:31:07 +0000 (05:31 +0000)]
lei_mirror: retrieve description text asynchronously, too

We can easily parallelize this, so do it.

17 months ago lei_mirror: move directory creation to v2-only path
Eric Wong [Mon, 28 Nov 2022 05:31:06 +0000 (05:31 +0000)]
lei_mirror: move directory creation to v2-only path

We rely on `git clone' to create the destination directory
for v1 and coderepos, so having it in _try_config_start was
senseless.

17 months ago lei_mirror: default to single job by default
Eric Wong [Mon, 28 Nov 2022 05:31:05 +0000 (05:31 +0000)]
lei_mirror: default to single job by default

Parallel git clones are expensive on the server-side, and
smaller machines (which we encourage) can't handle them well.

We'll also set `-q' since parallel clones will have their output
step all over each other.

17 months ago clone: support parallel v1 clones
Eric Wong [Mon, 28 Nov 2022 05:31:04 +0000 (05:31 +0000)]
clone: support parallel v1 clones

This opens the door to parallel cloning of coderepos, too.  We
can also get rid of needless AutoReap usage here, since
its usage has been 100% synchronous and not DESTROY-based as
it is in tests.

17 months ago lei_mirror: rely on global process reaper
Eric Wong [Mon, 28 Nov 2022 05:31:03 +0000 (05:31 +0000)]
lei_mirror: rely on global process reaper

We no longer rely on SIGCHLD for predictability, and instead
call waitpid at safe points.  This will make it easier for us to
do parallel mirroring of multiple inboxes while preserving
proper dependencies via ->DESTROY callbacks.
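
Reaping at a chosen point, rather than from a SIGCHLD handler, can be
sketched in Python as below.  This is only an illustration of the
general waitpid pattern: the function names and the pid-to-label
table are assumptions, and none of this is lei code.

```python
import os

def spawn(exit_code):
    """Fork a child which exits immediately with exit_code."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: skip the parent's cleanup handlers
    return pid

def reap(live):
    """Block in waitpid until every PID in `live` has been reaped.

    live: dict mapping pid -> label; entries are removed as children
    exit.  Returns a dict mapping label -> exit code.
    """
    results = {}
    while live:
        pid, status = os.waitpid(-1, 0)
        label = live.pop(pid, None)
        if label is not None and os.WIFEXITED(status):
            results[label] = os.WEXITSTATUS(status)
    return results
```

Because reap() only runs where the caller invokes it, the order in
which results are observed no longer depends on signal delivery.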

17 months ago lei_mirror: rely on DESTROY to index v2 inbox
Eric Wong [Mon, 28 Nov 2022 05:31:02 +0000 (05:31 +0000)]
lei_mirror: rely on DESTROY to index v2 inbox

This will give us more freedom in upcoming commits
to ensure indexing only happens after all epochs
are cloned.

17 months ago lei_mirror: async config retrieval for v2 w/ manifest
Eric Wong [Mon, 28 Nov 2022 05:31:01 +0000 (05:31 +0000)]
lei_mirror: async config retrieval for v2 w/ manifest

Another step towards being able to minimize mirror time by
supporting parallelization.

17 months ago clone: parallelize v2 epoch clones
Eric Wong [Mon, 28 Nov 2022 05:31:00 +0000 (05:31 +0000)]
clone: parallelize v2 epoch clones

This is a first step in supporting completely parallelized
clones.  Eventually, everything will be parallelized and
dependencies will be managed via callbacks.

17 months ago clone: support --include and --exclude with multi-clone
Eric Wong [Mon, 28 Nov 2022 05:30:59 +0000 (05:30 +0000)]
clone: support --include and --exclude with multi-clone

These will be handy when someone is interested in a subset of
inboxes on a large hosting site.