Eric Wong [Mon, 2 Jan 2023 08:20:13 +0000 (08:20 +0000)]
qspawn: fix process finalization for generic PSGI server
This fixes the inability to fall back to WwwCoderepo on cgit 404s
with generic PSGI servers. Unfortunately, this doesn't seem to
get tested with generic PSGI tests, and doesn't happen on
public-inbox-httpd, obviously.
Eric Wong [Mon, 2 Jan 2023 08:18:47 +0000 (08:18 +0000)]
t/httpd-unix.t: stop tail(1) before stopping server
When using the `TAIL' environment, the tail(1) process
inherits the non-FD_CLOEXEC pipe we introduced in commit 5f9baf725106 (t/httpd-unix: eliminate some busy waits, 2022-12-12).
We must ensure that pipe is gone before waiting on -httpd's
death by destroying the tail(1) process, first.
Eric Wong [Sun, 1 Jan 2023 10:54:40 +0000 (10:54 +0000)]
t/solver_git.t: avoid redundant work for snapshot test
We only have to generate the expected tarball and checksum once
for testing both -httpd and generic PSGI. And drop the redundant
length check since the SHA-256 check is sufficient.
Eric Wong [Sun, 25 Dec 2022 13:24:12 +0000 (13:24 +0000)]
syscall: fix i386/i686 detection
Both __ILP32__ and __x86_64__ need to be defined for a system to
be considered x32. Without this, my 32-bit Debian VM on a
64-bit kernel would fail after upgrading to Perl 5.32.1 on
Debian 11 (bullseye).
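A minimal sketch of the two-symbol check described above, assuming the
symbols are read from a `$Config{cppsymbols}'-style string of NAME=VALUE
pairs. The helper name `is_x32' is hypothetical and this is not
necessarily how the syscall module spells it:

```perl
use v5.12;
use Config;

# Hypothetical helper: true only if BOTH symbols are defined, since
# __ILP32__ on its own can also describe plain i386/i686.
sub is_x32 {
	my ($cppsymbols) = @_;
	$cppsymbols =~ /\b__ILP32__=1\b/ &&
		$cppsymbols =~ /\b__x86_64__=1\b/ ? 1 : 0;
}

# cppsymbols may be missing on some builds, hence the `// ""'
say is_x32($Config{cppsymbols} // '') ? 'x32' : 'not x32';
```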
Eric Wong [Sat, 24 Dec 2022 10:40:47 +0000 (10:40 +0000)]
test_common: avoid needless fcntl in start_script
POSIX::dup2 does not do anything in addition to dup2(2) and is
thus immune to Perl automatically setting FD_CLOEXEC on FDs it
makes into IO objects/globs. We only need to account for the
case when both args for dup2 are identical, in which case the
kernel treats it as a no-op and we need to clear FD_CLOEXEC
ourselves.
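The two cases can be sketched roughly as follows. `inherit_as' is a
hypothetical name for illustration, not the actual start_script code:

```perl
use v5.12;
use POSIX ();
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# Hypothetical helper: make $io visible to an exec'd child as $newfd.
sub inherit_as {
	my ($io, $newfd) = @_;
	my $oldfd = fileno($io) // die "fileno: $!";
	if ($oldfd == $newfd) {
		# dup2(2) would be a no-op, so clear FD_CLOEXEC by hand
		my $fl = fcntl($io, F_GETFD, 0) // die "F_GETFD: $!";
		fcntl($io, F_SETFD, $fl & ~FD_CLOEXEC) //
			die "F_SETFD: $!";
	} else {
		# dup2(2) leaves FD_CLOEXEC clear on the new descriptor
		POSIX::dup2($oldfd, $newfd) // die "dup2: $!";
	}
	$newfd;
}
```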
Eric Wong [Sat, 24 Dec 2022 07:17:07 +0000 (07:17 +0000)]
spawn_pp: cleanup, error checks and descriptive errors
The pipe(2) call needs to be checked for failure. While we're
at it, none of this is affected by unicode_strings, so Perl v5.12
is safe to use and gets rid of the strict.pm overhead.
We can also `die' directly since it's pure Perl and not contort
our Perl code to the assumptions of the Inline::C version.
`die' already implies a failure, so follow existing conventions
of just having the failing function or op name.
We can also rely on the grep op for filtering out non-system
signals to avoid writing a loop ourselves.
Finally, drop a needless `undef' on the read side of the pipe
since it's already closed immediately in the child.
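As a rough illustration of two of the points above (checking pipe(2)
and using grep to skip non-system signals), assuming that Perl's
pseudo-signals are the `__WARN__'/`__DIE__' style keys. Both sub names
are hypothetical:

```perl
use v5.12;

# Hypothetical helper: keep real signal names, dropping Perl's
# pseudo-signals such as __WARN__ and __DIE__.
sub system_signals { grep(!/\A__/, @_) }

# Per the convention above, the message is just the failing
# function name followed by $!
sub checked_pipe {
	pipe(my $r, my $w) or die "pipe: $!";
	($r, $w);
}

my ($r, $w) = checked_pipe();
say join(' ', sort(system_signals(keys %SIG)));
```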
Eric Wong [Fri, 23 Dec 2022 22:11:01 +0000 (22:11 +0000)]
cleanup pure Perl use
This quiets down tests when the optional Inline::C is missing.
We do not currently have a hard dependency on Inline::C; and we
should not leave PERL_INLINE_DIRECTORY set in PublicInbox::Spawn
if Inline fails to build.
Leaving PERL_INLINE_DIRECTORY set by Spawn after it fails (due
to missing Inline::C) would cause downstream failures in Gcf2
builds for the same reason. So we should bail out of the Gcf2
build early if Spawn already failed due to missing Inline::C.
The only time we want to be noisy is if a user explicitly sets
PERL_INLINE_DIRECTORY and Inline::C is missing.
Eric Wong [Fri, 23 Dec 2022 12:51:08 +0000 (12:51 +0000)]
syscall: drop syscall.ph support
h2ph-generated *.ph files are often wrong or incomplete and IME
they cause more problems than they solve. Furthermore, we need
knowledge of struct layouts which h2ph-generated files can't get
us. So trim down some bloat and leave a note for porters.
Eric Wong [Fri, 23 Dec 2022 11:05:15 +0000 (11:05 +0000)]
httpd/async + qspawn: rename {fh} fields
Use names that are more unique within the project, since these
packages interact quite a bit and identical names lead to
needless confusion.
Eric Wong [Fri, 23 Dec 2022 11:05:13 +0000 (11:05 +0000)]
httpd/async: remove useless undef
Assigning `undef' to a scalar doesn't free its memory;
we need to call `undef($var)' in the caller. It's also
been pointless since we simplified ->async_pass in commit
b7fbffd1f8c12556 (httpd/async: get rid of ephemeral main_cb, 2019-12-25).
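The difference can be observed with the core B module, which exposes
the allocated length of a scalar's string buffer. This is only a
sketch: `buf_len' and `undef_demo' are hypothetical names, and whether
plain assignment keeps the buffer may vary with the Perl version:

```perl
use v5.12;
use B ();

# Hypothetical helper: bytes allocated for a scalar's string buffer.
sub buf_len { B::svref_2object($_[0])->LEN }

sub undef_demo {
	my $buf = '';
	$buf .= 'x' x 65536;	# grown at runtime, so $buf owns it
	my $before = buf_len(\$buf);

	$buf = undef;	# value is gone; perl may keep the buffer
	my $after_assign = buf_len(\$buf);

	undef($buf);	# explicitly releases the buffer
	my $after_undef = buf_len(\$buf);
	($before, $after_assign, $after_undef);
}

say join(' ', undef_demo());
```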
Eric Wong [Fri, 23 Dec 2022 11:05:12 +0000 (11:05 +0000)]
httpd: avoid crash on cgit -> coderepo 404 fallback
A trickled cgit response can cause HTTPD::Async->event_step to
fire an extra time after header parsing. We need to account for
the lack of async_pass call populating ->{fh} and ->{http} in
that case and avoid calling $self->{fh}->write when there's
no {fh}.
Eric Wong [Wed, 21 Dec 2022 23:22:10 +0000 (23:22 +0000)]
git: cap MAX_INFLIGHT value to POSIX minimum
This ensures we get consistent pipelining behavior across
platforms. Furthermore, a smaller value is probably more
reasonable since "git cat-file" can usually outpace indexing and
lower values allow us to react to user interaction (e.g. Ctrl-C)
more quickly.
The previous value based on Linux PIPE_BUF (4096) allowed a
value of 189 which worked fine on non-musl Linux systems, but
failed on musl-based Void and Alpine Linux. Mysteriously, this
works on musl up to a value of 114 and starts locking up at 115.
The reason for this failure is currently unexplained and will
hopefully be discovered soon.
Regardless, capping the value to 23 based on the universal
PIPE_BUF minimum (512) seems reasonable, anyways.
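The two figures above are consistent with the following arithmetic.
The formula itself is an assumption (three pipe buffers' worth of
65-byte requests, i.e. a 64-digit SHA-256 hex object ID plus a
newline); it is not stated in this message:

```perl
use v5.12;

# Assumed formula, not taken from the text above: how many 65-byte
# requests fit into three pipe buffers of $pipe_buf bytes each.
sub max_inflight {
	my ($pipe_buf) = @_;
	int($pipe_buf * 3 / 65);
}

say max_inflight(4096);	# Linux PIPE_BUF => 189
say max_inflight(512);	# POSIX minimum PIPE_BUF => 23
```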
Eric Wong [Wed, 14 Dec 2022 22:24:08 +0000 (22:24 +0000)]
search_query: fix warnings on empty "o=" query
This fixes the following warnings from bad URLs:
Odd number of elements in anonymous hash at <>/PublicInbox/SearchQuery.pm line 22.
Argument "l" isn't numeric in numeric lt (<) at <>/PublicInbox/SearchView.pm line 39.
Eric Wong [Mon, 12 Dec 2022 04:22:01 +0000 (04:22 +0000)]
t/httpd-unix: eliminate some busy waits
A small step towards making our test suite as sleep-less as
possible. We can use FIFOs to coordinate processes in a few
places, while other spots can take advantage of disabling
FD_CLOEXEC to further eliminate back-and-forth traffic between
processes.
This speeds up t/httpd-unix.t by ~20 ms on my system.
Eric Wong [Thu, 1 Dec 2022 11:21:32 +0000 (11:21 +0000)]
lei_saved_search: expand only/include/exclude to absolute paths
While users may specify relative paths for convenience on the
command-line, absolute paths are required for `lei up' since
that (especially `lei up --all') could run from anywhere.
Note that we need to do this when parsing the command-line
options, since shortcuts for URL matching on URL path components
are allowed for `lei q', and those same shortcuts may carry
over to `lei up' as the underlying external may be moved to a
different URI host.
Eric Wong [Mon, 28 Nov 2022 05:32:27 +0000 (05:32 +0000)]
lei_mirror: omit trailing slash for git remote.*.url
While PublicInbox::WWW URLs have a trailing slash in them
for compatibility with static web server mirrors, URLs
intended for `git clone' don't benefit from this and the
trailing `/' just looks awkward.
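A minimal sketch of the normalization, using a hypothetical helper name
and an example.org URL purely for illustration:

```perl
use v5.12;

# Hypothetical helper: drop trailing slashes before the URL is
# stored as remote.*.url
sub git_remote_url {
	my ($url) = @_;
	$url =~ s!/+\z!!;
	$url;
}

say git_remote_url('https://example.org/inbox/');
```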
Eric Wong [Mon, 28 Nov 2022 05:32:24 +0000 (05:32 +0000)]
lei_mirror: eliminate circular references
...by using local-ized globals. While non-globals could work,
eliminating the {todo} and {fgrp_todo} refs in all sub-refs
is more error-prone and the `local' construct is convenient.
This allows us to get rid of the `delete $fgrp->{-fini}' call
in pack_refs and eliminates the indiscriminate reaping of all
processes before calling fgrp_fetch_all. This means we can
fully depend on DESTROY to provide predictable dependency
handling while supporting parallelization.
Global $TODO and $FGRP_TODO now become SCALAR refs on
consumption so they can act as assertions to detect future bugs.
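The pattern can be sketched as below. Apart from $TODO, every name here
is hypothetical and the real lei_mirror code is more involved:

```perl
use v5.12;
our $TODO;	# only populated while run_mirror() is on the stack

sub todo_add { push @{$TODO->{$_[0]}}, $_[1] }

# Take ownership of the queue and leave a SCALAR ref behind, so a
# late todo_add() dies ("Not a HASH reference") instead of silently
# queueing work nobody will run.
sub todo_consume {
	my $todo = $TODO;
	$TODO = \'consumed';
	$todo;
}

sub run_mirror {
	local $TODO = {};	# restored automatically on return
	todo_add('a.git', 'fetch');
	my $todo = todo_consume();
	eval { todo_add('b.git', 'fetch') };	# trips the assertion
	($todo, $@);
}
```

Since nothing stores $TODO inside a closure, no sub-ref keeps the queue
alive and there is no cycle to break by hand.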
Eric Wong [Mon, 28 Nov 2022 05:32:23 +0000 (05:32 +0000)]
lei_mirror: support {symlinks} from manifest
It's part of grokmirror, and useful for keeping compatibility.
We can make use of File::Spec->abs2rel here to ensure our
symlinks are relative and the entire mirror can be copied
as a whole.
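For example, a relative symlink target can be computed like this. The
helper name and paths are hypothetical; File::Spec->abs2rel and
File::Basename::dirname are core:

```perl
use v5.12;
use File::Spec;
use File::Basename qw(dirname);

# Hypothetical helper: express $target relative to the directory
# that will contain the symlink $link.
sub relative_target {
	my ($target, $link) = @_;
	File::Spec->abs2rel($target, dirname($link));
}

say relative_target('/srv/mirror/a/b.git', '/srv/mirror/c/d.git');
```

The result would then be passed to symlink() as the link contents, so
the link keeps working after /srv/mirror is copied elsewhere.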
Eric Wong [Mon, 28 Nov 2022 05:32:22 +0000 (05:32 +0000)]
lei_mirror: set {head} from manifest
We handle symbolic refs properly, at least. It's also possible
for $GIT_DIR/HEAD to contain a full SHA-1/SHA-256, and we'll
support that by using update-ref --no-deref.
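A sketch of how the two cases could map onto git commands, assuming the
manifest value mirrors the contents of $GIT_DIR/HEAD (either
"ref: refs/heads/..." or a bare object ID). `head_cmd' is a
hypothetical helper:

```perl
use v5.12;

# Hypothetical helper: build the git command for a {head} value.
sub head_cmd {
	my ($git_dir, $head) = @_;
	my @git = ('git', "--git-dir=$git_dir");
	if ($head =~ /\Aref: (\S+)\z/) {
		# symbolic ref, e.g. "ref: refs/heads/master"
		return [ @git, 'symbolic-ref', 'HEAD', $1 ];
	} elsif ($head =~ /\A[0-9a-f]{40}(?:[0-9a-f]{24})?\z/) {
		# detached HEAD: full SHA-1 (40) or SHA-256 (64) hex
		return [ @git, qw(update-ref --no-deref HEAD), $head ];
	}
	die "unrecognized head: $head";
}
```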
Eric Wong [Mon, 28 Nov 2022 05:32:20 +0000 (05:32 +0000)]
lei_mirror: simplify forkgroup-related subs
We can pass fewer variables around on stack since $fgrp is just
a copy of $self. We can also rely more on explicit callback
passing rather than relying on OnDestroy and ->cancel for
conditional calls.
Eric Wong [Mon, 28 Nov 2022 05:32:19 +0000 (05:32 +0000)]
lei_mirror: run v1_done earlier on forkgroup done
There's likely a circular reference somewhere which was
preventing v1_done from running early. In any case, this allows
v1_done to run in parallel with the pack-refs process since
there's no ordering dependency between ref-packing and v1_done.
Eric Wong [Mon, 28 Nov 2022 05:32:17 +0000 (05:32 +0000)]
lei_mirror: remove janky mirror.done stamp file
This makes a fundamental (and overdue) change to the core of
lei in how it handles child errors. Every process which
generates or receives a child error will remember it before
passing it on. This ensures _wq_done_wait callbacks will
know of prior errors aside from $? when it runs.
Eric Wong [Mon, 28 Nov 2022 05:32:14 +0000 (05:32 +0000)]
lei_mirror: respect `./' and `../' prefixes for CLI args
Users may wish to keep objstore and manifest files at
a higher level to prevent direct access via HTTP(S),
so those relative paths probably make sense.
Eric Wong [Mon, 28 Nov 2022 05:32:10 +0000 (05:32 +0000)]
clone|fetch: support passing --prune(-tags) to `git fetch'
We need to be able to get rid of removed branches and tags on
the remote. --prune-tags is implied for non-objstore repos,
and incompatible with objstore repos.
Eric Wong [Mon, 28 Nov 2022 05:32:08 +0000 (05:32 +0000)]
lei_mirror: delay configuring forkgroups
When relying on `public-inbox-clone --manifest=', idempotent
`git config' invocations can take a considerable amount of
time. We still configure inboxes idempotently since it
allows quickly changing URLs to mirrors, but we just defer
it until an update is actually needed.
Eric Wong [Mon, 28 Nov 2022 05:32:02 +0000 (05:32 +0000)]
lei_mirror: properly pack-refs in non-forkgroup repos
We need to ensure `git update-ref --stdin' is complete
before running `git pack-refs', otherwise loose refs can
remain while update-ref is still running.
Eric Wong [Mon, 28 Nov 2022 05:32:01 +0000 (05:32 +0000)]
fetch: eliminate File::Temp->filename var
File::Temp objects are overloaded to automatically
call ->filename when stringified, so there's no need
to store the ->filename result on the Perl stack.
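A small demonstration of the overload (`tmp_names' is a hypothetical
name):

```perl
use v5.12;
use File::Temp ();

# Hypothetical demo: returns the stringified object and ->filename
sub tmp_names {
	my $tmp = File::Temp->new;
	("$tmp", $tmp->filename);	# both are the same path
}

my ($str, $name) = tmp_names();
say $str eq $name ? 'same' : 'different';
```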
Eric Wong [Mon, 28 Nov 2022 05:31:54 +0000 (05:31 +0000)]
lei_mirror: forkgroups use `git fetch --multiple'
This offloads network parallelization and safety to git
itself while reducing the amount of unnecessary process spawning
we do. This also improves readability of pack-refs invocations
and reduces the need for them.
To prevent heavily-forked repos from hitting system command-line
size limits, we group refs to be updated in the "fgrptmp" group.
Eric Wong [Mon, 28 Nov 2022 05:31:51 +0000 (05:31 +0000)]
lei_mirror: drop git <1.8.5 support
Supporting git <1.8.5 via fetch on non-forkgroup repos would
make auto-GC dangerous, and I want to support auto-GC instead
of relying on the preciousObjects extension.
Since git 1.8.5 is 9 years old at this point, and grokmirror
(used by the only CentOS 7.x user I know of) already relies on
newer git, simplify our code and only fetch into forkgroups.
Eric Wong [Mon, 28 Nov 2022 05:31:49 +0000 (05:31 +0000)]
lei_mirror: preserve permissions of existing alternates file
We don't want to be clobbering permissions when changing to
relative paths. Furthermore, we can avoid writing to the
alternates file if there are no changes.
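One way to get both properties is sketched below. The helper name is
hypothetical and the temporary file naming is an assumption for
illustration only:

```perl
use v5.12;

# Hypothetical helper: returns 1 if $path was rewritten, 0 if not.
sub update_preserving_mode {
	my ($path, $new) = @_;
	open(my $in, '<', $path) or die "open($path): $!";
	my $mode = (stat($in))[2] & 07777;
	my $old = do { local $/; <$in> };
	$old //= '';
	return 0 if $old eq $new;	# unchanged, skip the write

	my $tmp = "$path.tmp.$$";
	open(my $out, '>', $tmp) or die "open($tmp): $!";
	print $out $new or die "print($tmp): $!";
	chmod($mode, $out) or die "chmod($tmp): $!";	# keep old perms
	close($out) or die "close($tmp): $!";
	rename($tmp, $path) or die "rename($tmp, $path): $!";
	1;
}
```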
Eric Wong [Mon, 28 Nov 2022 05:31:45 +0000 (05:31 +0000)]
clone: flesh out --objstore behavior and document
We can support absolute paths to avoid surprising behaviors,
but relative paths are preferred since the goal is to be
accessible over the "dumb" HTTP git transport (the dumb
transport uses less memory and CPU on the server).
Eric Wong [Mon, 28 Nov 2022 05:31:40 +0000 (05:31 +0000)]
lei_mirror: avoid convoluted lazy_cb usage
lazy_cb should only be used for lei command dispatch and
completion callbacks when the method isn't known at startup.
There's zero reason to use it when the method is known
ahead-of-time, especially when there's a comment pointing
reviewers towards the only possible method it can dispatch.
Eric Wong [Mon, 28 Nov 2022 05:31:33 +0000 (05:31 +0000)]
clone: support --inbox-version
This is part of `lei add-external --mirror', and it makes
sense to have for development and testing. We'll also add
a fallback in case somebody tries --inbox-version and fails
due to a newer remote instance of public-inbox.
Eric Wong [Mon, 28 Nov 2022 05:31:27 +0000 (05:31 +0000)]
lei_mirror: do not fetch descriptions if using manifest
If a manifest exists, we can expect the description to always be
present, thus there's no need to make a separate HTTP(S) request
since we can use it as-is from the manifest for v1||coderepos
and strip / \[epoch [0-9]+\]\z/ from v2.
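The substitution can be sketched as follows, using a hypothetical
helper name and a made-up description:

```perl
use v5.12;

# Hypothetical helper: drop a trailing " [epoch N]" suffix from a
# manifest description; other descriptions pass through as-is.
sub strip_epoch {
	my ($desc) = @_;
	$desc =~ s/ \[epoch [0-9]+\]\z//;
	$desc;
}

say strip_epoch('example list [epoch 3]');
```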