Sergey Matveev's repositories - public-inbox.git/log
public-inbox.git
Eric Wong [Tue, 10 Jan 2023 11:49:19 +0000 (11:49 +0000)]
www_coderepo: show tree root as "(root)"

We'll use the `b=' parameter as a hint.  I originally considered
`b=/', but a single slash `/' isn't used in git for paths.
For $refname:$path resolution where $path is an empty string,
`git cat-file -t $refname:' resolves to the root tree, so
special-casing the empty string seems fine in the web UI, too.
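A minimal Python sketch of that display rule (public-inbox itself is Perl; `tree_label' and `object_spec' are hypothetical helper names): when `b=' is present but empty, the page labels the root tree "(root)", while the REF:PATH spec handed to git keeps the empty path, just as `git cat-file -t $refname:' does.

```python
def tree_label(path):
    """Label for a tree path taken from a present `b=` query parameter.

    An empty path is the root tree (as in `git cat-file -t REF:`), so it is
    shown as "(root)"; any other path is shown unchanged.
    """
    return "(root)" if path == "" else path


def object_spec(refname, path):
    """REF:PATH spec for git; an empty PATH still names the root tree."""
    return f"{refname}:{path}"
```

Keeping the special case in the label only means git never sees a made-up `/' path.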

Eric Wong [Tue, 10 Jan 2023 11:49:18 +0000 (11:49 +0000)]
www_coderepo: handle "?h=$tip" in summary view

This makes sense at least as far as the README and `git log' output goes.
We'll also add the `b=' query parameter to the $OID/s/ href for
the README blob.

Eric Wong [Sun, 8 Jan 2023 08:04:13 +0000 (08:04 +0000)]
www_coderepo: do not copy {-code_repos} from config

Avoiding 2 extra hash lookups per request when we already do
plenty more isn't worth the static memory overhead.  This shaves another chunk
off our memory use:

$ perl -MDevel::Size=total_size -I lib -MPublicInbox::WwwCoderepo -E \
  'say total_size(PublicInbox::WwwCoderepo->new(PublicInbox::Config->new))'

before: 1184385
 after: 1020878

Eric Wong [Sun, 8 Jan 2023 08:04:12 +0000 (08:04 +0000)]
config: do not implicitly set coderepo.*.cgiturl

It's a needless waste of memory, and this change reduces the
WwwCoderepo object size by over 25% with over 1K repos, as
measured with the following check:

  perl -MDevel::Size=total_size -I lib -MPublicInbox::WwwCoderepo -E \
  'say total_size(PublicInbox::WwwCoderepo->new(PublicInbox::Config->new))'

before: 1612515
 after: 1184385

Eric Wong [Fri, 6 Jan 2023 11:51:39 +0000 (11:51 +0000)]
qspawn: use Perl 5.12 and rely on `perl -w' for warnings

Another step towards making our startup performance faster.

Eric Wong [Fri, 6 Jan 2023 11:51:33 +0000 (11:51 +0000)]
lei_mirror: do not needlessly rewrite project-list

No need to cause extra wear on storage devices.

Eric Wong [Fri, 6 Jan 2023 10:10:53 +0000 (10:10 +0000)]
qspawn: fix EINTR with generic PSGI servers

Using the `next' operator doesn't work with `do {} (until|while)'
loops, so change it to use `until {}'.  I've never encountered
this problem in the wild, but I only use -(netd|httpd).

Eric Wong [Fri, 6 Jan 2023 10:10:52 +0000 (10:10 +0000)]
qspawn: consistently return 500 on premature EOF

If the {parse_hdr} callback doesn't handle it, we need to break the
loop if the CGI process dies prematurely.  This doesn't fix a
currently known problem, but theoretically a SIGKILL could hit
(cgit || git-http-backend) while -netd or -httpd survives.

Eric Wong [Fri, 6 Jan 2023 10:10:51 +0000 (10:10 +0000)]
httpd/async: retry reads properly when parsing headers

While git-http-backend sends headers with one write syscall,
upstream cgit still trickles them out line-by-line and we need to
account for that and retry Qspawn {parse_hdr} callbacks.
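A minimal Python sketch of retrying header parsing across reads (public-inbox itself is Perl; `HeaderParser' is a hypothetical name, and defaulting to 200 when no Status: header appears is an assumption of this sketch): each read(2) result is appended to a buffer, and parsing only succeeds once the blank line ending the header block has arrived, so line-by-line output from cgit and a single write from git-http-backend are handled alike.

```python
import re

# End of the CGI header block: a blank line, with CRLF or bare LF endings.
_END = re.compile(rb"\r?\n\r?\n")


class HeaderParser:
    def __init__(self):
        self.buf = b""

    def feed(self, chunk):
        """Add one read's worth of data.

        Returns None until the headers are complete (the caller should retry
        after the next read), then (status, headers, start_of_body).
        """
        self.buf += chunk
        m = _END.search(self.buf)
        if m is None:
            return None
        head, body = self.buf[:m.start()], self.buf[m.end():]
        status, headers = 200, []
        for line in re.split(rb"\r?\n", head):
            name, _, value = line.partition(b":")
            value = value.strip()
            if name.lower() == b"status":
                status = int(value.split()[0])  # "404 Not Found" -> 404
            else:
                headers.append((name.decode(), value.decode()))
        return status, headers, body
```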

Eric Wong [Fri, 6 Jan 2023 10:10:50 +0000 (10:10 +0000)]
qspawn: use fallback response code from CGI program

Prefer to use the original (cgit||git-http-backend) HTTP
response code if our fallback to WwwCoderepo fails.  A 404
code is typically more appropriate than 500 for these things.

Eric Wong [Thu, 5 Jan 2023 11:41:57 +0000 (11:41 +0000)]
clone: implement --exit-code

Since public-inbox-clone is now useful for incremental updates
with a manifest, --exit-code belongs here, too.

Eric Wong [Thu, 5 Jan 2023 11:41:56 +0000 (11:41 +0000)]
clone: document --project-list and --post-update-hook

I forgot to document these when I implemented them :x

Eric Wong [Wed, 4 Jan 2023 10:34:05 +0000 (10:34 +0000)]
www: make coderepo URL generation more consistent

WwwStream and WwwText basically show the same thing, except the
latter relies on Linkify to create links.

Eric Wong [Wed, 4 Jan 2023 10:34:04 +0000 (10:34 +0000)]
git: pub_urls shows base_url default

Since we have native coderepo viewing support without cgit,
configuring coderepo.$FOO.cgitUrl shouldn't be necessary anymore
and we can infer the public name based on the project nickname
(or whatever's in the generated project.list).

Eric Wong [Wed, 4 Jan 2023 10:34:03 +0000 (10:34 +0000)]
git: fix non-empty SCRIPT_NAME handling for PSGI mounts

When using the `mount' directive in PSGI (Plack::App::URLMap),
SCRIPT_NAME still needs to use a trailing slash before it can
be joined with another URL.
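A minimal Python sketch of the join (public-inbox itself is Perl; `join_script_name' is a hypothetical helper): SCRIPT_NAME is empty at the root and looks like "/mnt" under a `mount', so a trailing slash is added before the relative path is appended.

```python
def join_script_name(script_name, path):
    """Join a PSGI/CGI SCRIPT_NAME with a relative path.

    SCRIPT_NAME is "" at the root and e.g. "/mnt" under a mount; give it a
    trailing slash, then append the path without any leading slash.
    """
    base = script_name if script_name.endswith("/") else script_name + "/"
    return base + path.lstrip("/")
```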

Eric Wong [Thu, 5 Jan 2023 01:44:59 +0000 (01:44 +0000)]
git: write_all: remove leftover debug messages

I used these messages during development to verify Alpine was
triggering the intended codepaths.  They're no longer necessary
and just noise at this point.

Reported-by: Chris Brannon <chris@the-brannons.com>
Fixes: d4ba8828ab23 ("git: fix asynchronous batching for deep pipelines")
Eric Wong [Tue, 3 Jan 2023 11:35:15 +0000 (11:35 +0000)]
www_coderepo: implement /$CODE_REPO/atom/ endpoint

This should be similar or identical to what cgit provides,
and it ties into the rest of the www_coderepo stuff.

Eric Wong [Wed, 4 Jan 2023 03:49:34 +0000 (03:49 +0000)]
git: fix asynchronous batching for deep pipelines

...By using non-blocking pipe writes.  This avoids problems for
musl (and other libc) where getdelim(3) used by `git cat-file --batch*'
uses a smaller input buffer than glibc or FreeBSD libc.

My key mistake was that our check against MAX_INFLIGHT is only
useful for the initial batch of requests.  It is not useful for
subsequent requests since git drains the pipe at
unpredictable rates due to libc differences.

To fix this problem, I initially tried to drain the read pipe
as long as readable data was pending.  However, reading git
output without giving git more work would also limit parallelism
opportunities since we don't want git to sit idle, either.  This
change ensures we keep both pipes reasonably full to reduce
stalls and maximize parallelism between git and public-inbox.
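A minimal Python sketch of keeping both pipes busy (public-inbox itself is Perl with its own event loop; `pipelined_exchange' is a hypothetical name, and `cat' stands in for `git cat-file --batch'): writes to the child's stdin are non-blocking, so a full pipe never stops us from draining its stdout, and draining stdout keeps the child from stalling on its own writes.

```python
import os
import selectors
import subprocess


def pipelined_exchange(cmd, requests):
    """Feed all requests to cmd's stdin while concurrently reading its stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    wfd, rfd = proc.stdin.fileno(), proc.stdout.fileno()
    os.set_blocking(wfd, False)  # a full stdin pipe must not block us
    pending = memoryview(b"".join(requests))
    out = bytearray()
    sel = selectors.DefaultSelector()
    sel.register(wfd, selectors.EVENT_WRITE)
    sel.register(rfd, selectors.EVENT_READ)
    while sel.get_map():
        for key, _ in sel.select():
            if key.fd == wfd:
                try:
                    pending = pending[os.write(wfd, pending):]  # partial writes OK
                except BlockingIOError:
                    continue  # pipe filled up again; wait for the next event
                if not pending:
                    sel.unregister(wfd)
                    proc.stdin.close()  # EOF: no more requests
            else:
                chunk = os.read(rfd, 65536)
                if chunk:
                    out += chunk
                else:
                    sel.unregister(rfd)  # child closed its stdout
    sel.close()
    proc.stdout.close()
    proc.wait()
    return bytes(out)
```

Writing everything first and reading afterwards would deadlock here once both pipe buffers fill, which is the situation the commit avoids.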

While the limit set a few weeks ago in commit
56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21)
remains in place, any higher or lower limit will work.  It may
be worth it to use an even lower limit to improve interactivity
w.r.t. Ctrl-C interrupts.

I've tested the pre-56e6e587745c value and even higher values on
an Alpine VM in the GCC Farm <https://cfarm.tetaneutral.net>.

Reported-by: Chris Brannon <chris@the-brannons.com>
Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
Eric Wong [Tue, 3 Jan 2023 00:05:06 +0000 (00:05 +0000)]
daemon: don't bother checking for existing FD flags

FD_CLOEXEC is the only currently defined FD flag, and that has
been the case for decades at this point.  I highly doubt any default
FD flag will ever be forced on us by the kernel, init system, or
Perl.  So save ourselves a syscall and just call F_SETFD with
the assumption FD_CLOEXEC is the only FD flag that we'd ever
care about.
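The same single-syscall approach sketched in Python (public-inbox itself is Perl; `set_cloexec' and `clear_cloexec' are hypothetical names): F_SETFD overwrites all FD flags at once, so skipping F_GETFD is only safe under the stated assumption that FD_CLOEXEC is the only flag that matters.

```python
import fcntl
import os


def set_cloexec(fd):
    # One fcntl(2) call; any other FD flags (none are expected) are dropped.
    fcntl.fcntl(fd, fcntl.F_SETFD, fcntl.FD_CLOEXEC)


def clear_cloexec(fd):
    # Likewise, clearing is a single F_SETFD with no flags at all.
    fcntl.fcntl(fd, fcntl.F_SETFD, 0)
```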

Eric Wong [Tue, 3 Jan 2023 00:03:00 +0000 (00:03 +0000)]
githttpbackend: avoid copying PSGI env

We can stash qspawn.wcb before we fallback to WwwCoderepo to
ensure the qspawn re-dispatch works as expected.  This is still
hacky and I want to tweak it further down the line.  Meanwhile,
let's make it less expensive to do hacky things...

Eric Wong [Mon, 2 Jan 2023 08:20:13 +0000 (08:20 +0000)]
qspawn: fix process finalization for generic PSGI server

This fixes the inability to fall back to WwwCoderepo on cgit 404s
with generic PSGI servers.  Unfortunately, this doesn't seem to
get tested with generic PSGI tests, and doesn't happen on
public-inbox-httpd, obviously.

Eric Wong [Mon, 2 Jan 2023 08:18:47 +0000 (08:18 +0000)]
t/httpd-unix.t: stop tail(1) before stopping server

When using the `TAIL' environment variable, the tail(1) process
inherits the non-FD_CLOEXEC pipe we introduced in commit
5f9baf725106 (t/httpd-unix: eliminate some busy waits, 2022-12-12).
We must ensure that pipe is gone before waiting on -httpd's
death by destroying the tail(1) process, first.

Eric Wong [Sun, 1 Jan 2023 10:54:40 +0000 (10:54 +0000)]
t/solver_git.t: avoid redundant work for snapshot test

We only have to generate the expected tarball and checksum once
for testing both -httpd and generic PSGI. And drop the redundant
length check since the SHA-256 check is sufficient.

This saves 20-30ms on my system.

Eric Wong [Fri, 30 Dec 2022 22:07:28 +0000 (22:07 +0000)]
t/run.perl: drop branch for a small set of test cases

It's not worth it, since our test count is only going to
increase over time.

Eric Wong [Sat, 31 Dec 2022 06:17:20 +0000 (06:17 +0000)]
www: load cgitrc for coderepos for solver

Loading cgitrc (and associated projects.list) can save users
from having to define as many individual coderepos.

xt/solver.t needs a use of `$_' replaced since that
gets clobbered while parsing cgitrc.

Eric Wong [Fri, 30 Dec 2022 10:59:39 +0000 (10:59 +0000)]
clone: fix --post-update-hook behavior

Only run hooks if we've done a fetch (which may be a no-op), and
add some tests to ensure it works as advertised with and without
--objstore=

Eric Wong [Fri, 30 Dec 2022 10:59:38 +0000 (10:59 +0000)]
clone: --dry-run unconditionally runs show-ref

It's useful to show what's being updated, of course.

Eric Wong [Wed, 28 Dec 2022 02:56:56 +0000 (02:56 +0000)]
clone: support --post-update-hook= from grokmirror

This should be compatible with both grokmirror 1 and 2 behavior
and serialized on a per-repo basis.

Eric Wong [Tue, 27 Dec 2022 12:51:55 +0000 (12:51 +0000)]
qspawn: more generic command chaining

Move the chaining logic into qspawn so we can gracefully
try other commands when cgit or git-http-backend refuses
to service a request for us.

Eric Wong [Sun, 25 Dec 2022 13:24:12 +0000 (13:24 +0000)]
syscall: fix i386/i686 detection

Both __ILP32__ and __x86_64__ need to be defined for a system to
be considered x32.  Without this, my 32-bit Debian VM on a
64-bit kernel would fail after upgrading to Perl 5.32.1 on
Debian 11 (bullseye).
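The corrected check, sketched in Python over a set of predefined C macro names (an assumption for illustration, e.g. as listed by `cc -dM -E'; public-inbox's Perl detection works from its own inputs, and `is_x32' is a hypothetical name): __ILP32__ alone also matches plain i386/i686 builds, so x32 requires __x86_64__ as well.

```python
def is_x32(defined_macros):
    """True only for the x32 ABI: 64-bit x86 instructions with ILP32 types."""
    return "__x86_64__" in defined_macros and "__ILP32__" in defined_macros
```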

Eric Wong [Sat, 24 Dec 2022 10:40:47 +0000 (10:40 +0000)]
test_common: avoid needless fcntl in start_script

POSIX::dup2 does not do anything in addition to dup2(2) and is
thus immune to Perl automatically setting FD_CLOEXEC on FDs it
makes into IO objects/globs.  We only need to account for the
case when both args for dup2 are identical, in which case the
kernel treats it as a no-op, so we need to clear
FD_CLOEXEC ourselves.
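A minimal Python sketch of that special case (public-inbox itself is Perl; `dup2_for_child' is a hypothetical name): Python's os.pipe() and os.dup() return non-inheritable descriptors, much like Perl setting FD_CLOEXEC, while os.dup2() makes the target inheritable by default. When both descriptors are the same, dup2(2) does nothing, so the flag must be cleared by hand.

```python
import fcntl
import os


def dup2_for_child(oldfd, newfd):
    """Make oldfd usable as newfd across exec, handling oldfd == newfd."""
    if oldfd == newfd:
        # dup2(2) would be a no-op here, leaving FD_CLOEXEC set.
        flags = fcntl.fcntl(newfd, fcntl.F_GETFD)
        fcntl.fcntl(newfd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)
    else:
        os.dup2(oldfd, newfd)  # inheritable=True by default: no FD_CLOEXEC
    return newfd
```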

Eric Wong [Sat, 24 Dec 2022 07:17:07 +0000 (07:17 +0000)]
spawn_pp: cleanup, error checks and descriptive errors

The pipe(2) call needs to be checked for failure.  While we're
at it, none of this is affected by unicode_strings, so Perl v5.12
is safe to use and gets rid of the strict.pm overhead.

We can also `die' directly since it's pure Perl, and we need not
contort our Perl code to the assumptions of the Inline::C version.

`die' already implies a failure, so follow existing conventions
of just having the failing function or op name.

We can also rely on the grep op for filtering out non-system
signals to avoid writing a loop ourselves.

Finally, drop a needless `undef' on the read side of the pipe
since it's already closed immediately in the child.

Eric Wong [Fri, 23 Dec 2022 22:11:01 +0000 (22:11 +0000)]
cleanup pure Perl use

This quiets down tests when the optional Inline::C is missing.

We do not currently have a hard dependency on Inline::C; and we
should not leave PERL_INLINE_DIRECTORY set in PublicInbox::Spawn
if Inline fails to build.

Leaving PERL_INLINE_DIRECTORY set by Spawn after it fails (due
to missing Inline::C) would cause downstream failures in Gcf2
builds for the same reason.  So we should bail out of the Gcf2
build early if Spawn already failed due to missing Inline::C.

The only time we want to be noisy is if a user explicitly sets
PERL_INLINE_DIRECTORY and Inline::C is missing.

This reverts commit ad8acf7d6484d0a489499742cadadbd4f890ab53.
ad8acf7d6484d0a4 (Gcf2: Create cache folder if missing, 2022-09-08)

Eric Wong [Fri, 23 Dec 2022 12:51:08 +0000 (12:51 +0000)]
syscall: drop syscall.ph support

h2ph-generated *.ph files are often wrong or incomplete and IME
they cause more problems than they solve.  Furthermore, we need
knowledge of struct layouts which h2ph-generated files can't get
us.  So trim down some bloat and leave a note for porters.

Eric Wong [Fri, 23 Dec 2022 12:51:07 +0000 (12:51 +0000)]
syscall: get rid of epoll_defined() sub

We can just check defined() on the `our' var itself and
save the process several kilobytes of memory.

Eric Wong [Fri, 23 Dec 2022 11:05:15 +0000 (11:05 +0000)]
httpd/async + qspawn: rename {fh} fields

Use more distinctive names within the project, since these
packages interact quite a bit and identical names are
needlessly confusing.

Eric Wong [Fri, 23 Dec 2022 11:05:14 +0000 (11:05 +0000)]
qspawn: shorten life of {hdr_buf} in generic code path

No point in keeping the old buffer around if we don't need to.

Eric Wong [Fri, 23 Dec 2022 11:05:13 +0000 (11:05 +0000)]
httpd/async: remove useless undef

Assigning `undef' to a scalar doesn't free its memory;
we need to call `undef($var)' in the caller.  It's also
been pointless since we simplified ->async_pass in commit
b7fbffd1f8c12556 (httpd/async: get rid of ephemeral main_cb, 2019-12-25)

Eric Wong [Fri, 23 Dec 2022 11:05:12 +0000 (11:05 +0000)]
httpd: avoid crash on cgit -> coderepo 404 fallback

A trickled cgit response can cause HTTPD::Async->event_step to
fire an extra time after header parsing.  We need to account for
the lack of async_pass call populating ->{fh} and ->{http} in
that case and avoid calling $self->{fh}->write when there's
no {fh}.

Eric Wong [Fri, 23 Dec 2022 06:05:39 +0000 (06:05 +0000)]
lei_mirror: allow `git show-ref' failures

`git show-ref' may fail on initialized-but-empty repositories.
So just unconditionally fetch those repos if we're in that
situation.

Eric Wong [Thu, 22 Dec 2022 10:43:42 +0000 (10:43 +0000)]
tests: add tests for cloning coderepos w/ manifest

It's not much, yet, but it's something for the corner cases
which I may not be hitting under normal use.

Eric Wong [Wed, 21 Dec 2022 23:22:10 +0000 (23:22 +0000)]
git: cap MAX_INFLIGHT value to POSIX minimum

This ensures we get consistent pipelining behavior across
platforms.  Furthermore, a smaller value is probably more
reasonable since "git cat-file" can usually outpace indexing and
lower values allow us to react to user interaction (e.g. Ctrl-C)
more quickly.

The previous value based on Linux PIPE_BUF (4096) allowed a
value of 189 which worked fine on non-musl Linux systems, but
failed on musl-based Void and Alpine Linux.  Mysteriously, this
works on musl up to a value of 114 and starts locking up at 115.
The reason for this failure is currently unexplained and will
hopefully be discovered soon.

Regardless, capping the value to 23 based on the universal
PIPE_BUF minimum (512) seems reasonable anyway.
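The message doesn't spell out how 189 and 23 are derived.  One formula that reproduces both quoted values is floor(PIPE_BUF * 3 / 65); the Python sketch below assumes 65 bytes per request line (a 64-character SHA-256 hex object ID plus a newline) and three queue slots per in-flight request.  Both constants are assumptions for illustration, not taken from the source.

```python
PER_REQUEST_BYTES = 65  # assumption: 64 hex chars of a SHA-256 OID + "\n"
SLOTS_PER_REQUEST = 3   # assumption: queue slots used by each in-flight request


def max_inflight(pipe_buf):
    """Illustrative MAX_INFLIGHT for a given PIPE_BUF, truncated to an int."""
    return pipe_buf * SLOTS_PER_REQUEST // PER_REQUEST_BYTES
```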

Reported-by: Chris Brannon <chris@the-brannons.com>
Tested-by: Chris Brannon <chris@the-brannons.com>
Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
Eric Wong [Thu, 15 Dec 2022 19:34:18 +0000 (19:34 +0000)]
relnotes: 2.0.0 work-in-progress

I'm thinking the -nntpd regression fix will push this release
out sooner rather than later...

Eric Wong [Wed, 14 Dec 2022 22:34:15 +0000 (22:34 +0000)]
www_listing: drop "sort options + mbox downloads" bit

The sort options and mbox downloads only apply to individual
inbox search endpoints, and they make no sense for the listing
of inboxes themselves.

Eric Wong [Wed, 14 Dec 2022 22:24:08 +0000 (22:24 +0000)]
search_query: fix warnings on empty "o=" query

This fixes the following warnings from bad URLs:

  Odd number of elements in anonymous hash at <>/PublicInbox/SearchQuery.pm line 22.
  Argument "l" isn't numeric in numeric lt (<) at <>/PublicInbox/SearchView.pm line 39.

Eric Wong [Wed, 14 Dec 2022 22:23:50 +0000 (22:23 +0000)]
solver_git: more descriptive error for "git apply" failures

This happens quite often on my systems due to scrapers,
unfortunately.

Eric Wong [Mon, 12 Dec 2022 09:58:54 +0000 (09:58 +0000)]
lei_mirror: break circular references

It seems more graceful than dying and breaking a mirror, since
the {reference} in util-linux was irrelevant anyway with the
move to forkgroups.

Eric Wong [Mon, 12 Dec 2022 09:58:53 +0000 (09:58 +0000)]
lei_mirror: trim current symlinks from warning

This quiets needless warnings from current symlinks, while still
complaining about out-of-date ones.

Eric Wong [Mon, 12 Dec 2022 04:22:01 +0000 (04:22 +0000)]
t/httpd-unix: eliminate some busy waits

A small step towards making our test suite as sleep-less as
possible.  We can use FIFOs to coordinate processes in a few
places, while other spots can take advantage of disabling
FD_CLOEXEC to further eliminate back-and-forth traffic between
processes.

This speeds up t/httpd-unix.t by ~20 ms on my system.

Eric Wong [Mon, 12 Dec 2022 04:22:00 +0000 (04:22 +0000)]
tests: replace select/usleep calls with tick()

This makes it easier to identify places in tests which cause
unnecessary slowdowns doing busy waits.

Eric Wong [Thu, 1 Dec 2022 11:21:32 +0000 (11:21 +0000)]
lei_saved_search: expand only/include/exclude to absolute paths

While users may specify relative paths for convenience on the
command-line, absolute paths are required for `lei up' since
that (especially `lei up --all') could run from anywhere.

Note that we need to do this when parsing the command-line
options, since shortcuts for URL matching on URL path components
are allowed for `lei q', and those same shortcuts may remain
in effect for `lei up' as the underlying external may
be moved to a different URI host.

Eric Wong [Thu, 1 Dec 2022 11:21:31 +0000 (11:21 +0000)]
lei: stricter external checks for valid $GIT_DIR/objects

I ended up with my $HOME in
~/.cache/lei/all_locals_ever.git/objects/info/alternates
and am trying to avoid that in the future.

Eric Wong [Mon, 28 Nov 2022 05:32:32 +0000 (05:32 +0000)]
lei_mirror: handle forkgroup changes

Forkgroups for projects are not static and may change at
the whim of the remote sysadmin.  Ensure we can migrate
to the new forkgroup.

Old forkgroups do not get pruned, yet, and their entries
stay in alternates.

Eric Wong [Mon, 28 Nov 2022 05:32:31 +0000 (05:32 +0000)]
clone: support --project-list= for cgit

grokmirror supports it, and we also support cgit, so this should
make running mirrors easier.  This will be useful for scripting
purposes, too.

Eric Wong [Mon, 28 Nov 2022 05:32:30 +0000 (05:32 +0000)]
lei_mirror: break out of fgrp fetch iteration early

Don't queue up more work if we already have a failure somewhere.

Eric Wong [Mon, 28 Nov 2022 05:32:29 +0000 (05:32 +0000)]
lei_mirror: don't clobber inbox.config.example if it exists

Users may save notes or edits in there, and it's only an
example, so there's no need to mindlessly clobber it.

Eric Wong [Mon, 28 Nov 2022 05:32:28 +0000 (05:32 +0000)]
lei_mirror: set info/web/last-modified from manifest

The grokmirror manifest sets {modified}, so we might as well use
it to make life easier for users of cgit (and compatible)
front-ends.

Eric Wong [Mon, 28 Nov 2022 05:32:27 +0000 (05:32 +0000)]
lei_mirror: omit trailing slash for git remote.*.url

While PublicInbox::WWW URLs have a trailing slash in them
for compatibility with static web server mirrors, URLs
intended for `git clone' don't benefit from this and the
trailing `/' just looks awkward.

Eric Wong [Mon, 28 Nov 2022 05:32:26 +0000 (05:32 +0000)]
lei_mirror: avoid redundant curl `-f' use

All of our curl invocations use the `-f' (--fail) switch
anyway, and I can't imagine a time when we'd want silent
failures.

Eric Wong [Mon, 28 Nov 2022 05:32:25 +0000 (05:32 +0000)]
lei_mirror: use curl -z/--timecond if manifest exists

This lets us save cycles and avoid scanning + comparing manifest
contents by relying on the Last-Modified HTTP response header.
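A Python sketch of building such a conditional download (public-inbox itself is Perl and its exact curl flags may differ; `manifest_fetch_argv' is a hypothetical name).  curl's -z/--time-cond accepts a file name and then uses that file's modification time for the If-Modified-Since request, and -R/--remote-time stamps the saved file with the server's Last-Modified time so the next comparison is meaningful.

```python
import os


def manifest_fetch_argv(url, dest):
    """curl command line that skips the download when `dest' is up to date."""
    argv = ["curl", "-f", "-sS", "-R", "-o", dest]
    if os.path.exists(dest):
        argv += ["-z", dest]  # file name: compare against its mtime
    argv.append(url)
    return argv
```

The `-z' argument is only added when a previous manifest exists, matching the "if manifest exists" condition above.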

Eric Wong [Mon, 28 Nov 2022 05:32:24 +0000 (05:32 +0000)]
lei_mirror: eliminate circular references

...by using local-ized globals.  While non-globals could work,
eliminating the {todo} and {fgrp_todo} refs in all sub-refs
is more error-prone and the `local' construct is convenient.

This allows us to get rid of the `delete $fgrp->{-fini}' call
in pack_refs and eliminates the indiscriminate reaping of all
processes before calling fgrp_fetch_all.  This means we can
fully depend on DESTROY to provide predictable dependency
handling while supporting parallelization.

Global $TODO and $FGRP_TODO now become SCALAR refs on
consumption so they can act as assertions to detect future bugs.

Eric Wong [Mon, 28 Nov 2022 05:32:23 +0000 (05:32 +0000)]
lei_mirror: support {symlinks} from manifest

It's part of grokmirror, and useful for keeping compatibility.
We can make use of File::Spec->abs2rel here to ensure our
symlinks are relative and the entire mirror can be copied
as a whole.
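The same idea sketched in Python, where os.path.relpath plays the role of File::Spec->abs2rel (public-inbox itself is Perl; `relative_link_target' and `make_relative_symlink' are hypothetical names): the link stores a path relative to its own directory, so it keeps working after the whole mirror is copied elsewhere.

```python
import os


def relative_link_target(target, link):
    """Path to store in `link' so it resolves to `target' relatively."""
    return os.path.relpath(os.path.abspath(target),
                           os.path.dirname(os.path.abspath(link)))


def make_relative_symlink(target, link):
    rel = relative_link_target(target, link)
    os.symlink(rel, link)
    return rel
```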

Eric Wong [Mon, 28 Nov 2022 05:32:22 +0000 (05:32 +0000)]
lei_mirror: set {head} from manifest

We handle symbolic refs properly, at least.  It's also possible
for $GIT_DIR/HEAD to contain a full SHA-1/SHA-256, and we'll
support that by using update-ref --no-deref.
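A Python sketch of choosing the git command for a manifest {head} value (public-inbox itself is Perl; `head_update_cmd' is a hypothetical name, and the "ref: " prefix is assumed to follow the $GIT_DIR/HEAD file format): symbolic refs go through `git symbolic-ref', while a bare SHA-1 or SHA-256 hex object ID detaches HEAD via `git update-ref --no-deref'.

```python
import re

_OID = re.compile(r"\A(?:[0-9a-f]{40}|[0-9a-f]{64})\Z")  # SHA-1 or SHA-256 hex


def head_update_cmd(head):
    """git argv that points HEAD at the manifest's {head} value."""
    head = head.strip()
    if head.startswith("ref: "):
        return ["git", "symbolic-ref", "HEAD", head[len("ref: "):]]
    if _OID.match(head):
        return ["git", "update-ref", "--no-deref", "HEAD", head]
    raise ValueError(f"unrecognized head value: {head!r}")
```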

Eric Wong [Mon, 28 Nov 2022 05:32:21 +0000 (05:32 +0000)]
lei_mirror: shorten scope of mirror objects

We may be able to save some memory this way.

Eric Wong [Mon, 28 Nov 2022 05:32:20 +0000 (05:32 +0000)]
lei_mirror: simplify forkgroup-related subs

We can pass fewer variables around on stack since $fgrp is just
a copy of $self.  We can also rely more on explicit callback
passing rather than relying on OnDestroy and ->cancel for
conditional calls.

Eric Wong [Mon, 28 Nov 2022 05:32:19 +0000 (05:32 +0000)]
lei_mirror: run v1_done earlier on forkgroup done

There's likely a circular reference somewhere which was
preventing v1_done from running early.  In any case, this allows
v1_done to run in parallel with the pack-refs process since
there's no ordering dependency between ref-packing and v1_done.

Eric Wong [Mon, 28 Nov 2022 05:32:18 +0000 (05:32 +0000)]
lei_mirror: simplify most process spawning

For commands where we rely on successful exit codes to continue,
start_cmd() generalizes well enough to be used in a variety
of places.

Eric Wong [Mon, 28 Nov 2022 05:32:17 +0000 (05:32 +0000)]
lei_mirror: remove janky mirror.done stamp file

This makes a fundamental (and overdue) change to the core of
lei in how it handles child errors.  Every process which
generates or receives a child error will remember it before
passing it on.  This ensures _wq_done_wait callbacks will
know of prior errors aside from $? when they run.

Eric Wong [Mon, 28 Nov 2022 05:32:16 +0000 (05:32 +0000)]
lei_mirror: update fingerprints when writing local manifest.js.gz

We need our local manifest to match the actual data we store,
not what we're mirroring.

Eric Wong [Mon, 28 Nov 2022 05:32:15 +0000 (05:32 +0000)]
lei_mirror: --manifest= affects destination, too

This probably makes the most sense; if a user wants to
use an alternate path to read from, it's likely they
want to write it there, too.

Eric Wong [Mon, 28 Nov 2022 05:32:14 +0000 (05:32 +0000)]
lei_mirror: respect `./' and `../' prefixes for CLI args

Users may wish to keep objstore and manifest files at
a higher level to prevent direct access via HTTP(S),
so those relative paths probably make sense.

Eric Wong [Mon, 28 Nov 2022 05:32:13 +0000 (05:32 +0000)]
lei_mirror: don't warn on missing manifest on initial clone

Users may choose to specify a manifest on the initial clone,
so don't complain if it's missing in that case.

Eric Wong [Mon, 28 Nov 2022 05:32:12 +0000 (05:32 +0000)]
clone: support --keep-going/-k like make(1)

This can be useful for intermittent network errors,
and the required code changes make it less dependent
on global state.

Eric Wong [Mon, 28 Nov 2022 05:32:11 +0000 (05:32 +0000)]
lei_mirror: avoid needless FD passing

Most git processes we invoke don't care about stdin nor stdout,
so don't waste cycles and memory dealing with it.

stderr passing is added for the `git config --unset-all remotes.fgrptmp'
invocation, though, since that can fail due to I/O errors or OOM.

Eric Wong [Mon, 28 Nov 2022 05:32:10 +0000 (05:32 +0000)]
clone|fetch: support passing --prune(-tags) to `git fetch'

We need to be able to get rid of removed branches and tags on
the remote.  --prune-tags is implied for non-objstore repos,
and incompatible with objstore repos.

Eric Wong [Mon, 28 Nov 2022 05:32:09 +0000 (05:32 +0000)]
clone: canonicalize destination path from CLI

We'll probably save the destination path somewhere, so
ensure the path doesn't have redundant slashes and such.

Eric Wong [Mon, 28 Nov 2022 05:32:08 +0000 (05:32 +0000)]
lei_mirror: delay configuring forkgroups

When relying on `public-inbox-clone --manifest=', idempotent
`git config' invocations can take a considerable amount of
time.  We still configure inboxes idempotently since it
allows quickly changing URLs to mirrors, but we just defer
it until an update is actually needed.

Eric Wong [Mon, 28 Nov 2022 05:32:07 +0000 (05:32 +0000)]
clone: support loading manifest.js.gz from destination

This will allow us to quickly check fingerprints against
remotes with a single HTTP(S) request, saving us numerous
`git show-ref' invocations.

Eric Wong [Mon, 28 Nov 2022 05:32:06 +0000 (05:32 +0000)]
lei_mirror: check fingerprints before fetching

While we currently don't check an existing on-disk manifest,
using `git show-ref' can still save us precious network traffic.
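A Python sketch of the pre-fetch check (public-inbox itself is Perl; `show_ref_fingerprint' and `needs_fetch' are hypothetical names).  It assumes the grokmirror manifest fingerprint is the SHA-1 hex digest of raw `git show-ref' output; treat that scheme as an assumption here.  A failing `git show-ref', as on an initialized-but-empty repository, simply means "fetch".

```python
import hashlib
import subprocess


def show_ref_fingerprint(show_ref_output):
    """SHA-1 hex digest of raw `git show-ref' output (assumed scheme)."""
    return hashlib.sha1(show_ref_output).hexdigest()


def needs_fetch(git_dir, manifest_fingerprint):
    """True unless the local refs already match the manifest fingerprint."""
    res = subprocess.run(["git", f"--git-dir={git_dir}", "show-ref"],
                         capture_output=True)
    if res.returncode != 0:
        return True  # e.g. no refs yet: fetch unconditionally
    return show_ref_fingerprint(res.stdout) != manifest_fingerprint
```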

Eric Wong [Mon, 28 Nov 2022 05:32:05 +0000 (05:32 +0000)]
lei_mirror: support resuming multi-repo clones

This is actually a combination of clone and fetch, and I don't
think `public-inbox-fetch' will be used to update multiple git
repos (inbox or not).

Our use of `git update-ref --stdin -z' was broken for
incremental updates, but is now fixed to properly NUL-terminate
commands.
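For reference, in `git update-ref --stdin -z' mode every field is NUL-terminated, including an empty <old-oid> (the empty string means "no old value to verify"), so each `update' command ends with a NUL.  A minimal Python encoder (`update_ref_z_input' is a hypothetical name; public-inbox itself is Perl):

```python
def update_ref_z_input(updates):
    """Encode (ref, new_oid, old_oid_or_None) tuples for update-ref --stdin -z.

    Format per command: update SP <ref> NUL <new-oid> NUL [<old-oid>] NUL
    """
    out = bytearray()
    for ref, new, old in updates:
        out += b"update " + ref.encode() + b"\0" + new.encode() + b"\0"
        out += (old or "").encode() + b"\0"  # empty old-oid still gets its NUL
    return bytes(out)
```

The result can be passed as the `input=' of subprocess.run(["git", "update-ref", "--stdin", "-z"], ...).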

Eric Wong [Mon, 28 Nov 2022 05:32:04 +0000 (05:32 +0000)]
on_destroy: support ->cancel callback

We probably use this idiom elsewhere, but having this method
around to make future use cases more readable is probably prudent.
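An analogous guard sketched in Python; it is not PublicInbox::OnDestroy's Perl API, and `OnDestroy' here is a hypothetical class.  The callback fires when the guard is destroyed unless ->cancel ran first; prompt destruction relies on CPython's reference counting.

```python
class OnDestroy:
    """Run cb(*args) when this guard is destroyed, unless cancelled."""

    def __init__(self, cb, *args):
        self._cb = cb
        self._args = args

    def cancel(self):
        self._cb = None  # destruction becomes a no-op

    def __del__(self):
        cb, self._cb = self._cb, None  # run at most once
        if cb is not None:
            cb(*self._args)
```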

Eric Wong [Mon, 28 Nov 2022 05:32:03 +0000 (05:32 +0000)]
lei_mirror: show child error code

Just passing the exit value of the child process to
our parent process isn't very useful when multiple commands
are failing at once.

Eric Wong [Mon, 28 Nov 2022 05:32:02 +0000 (05:32 +0000)]
lei_mirror: properly pack-refs in non-forkgroup repos

We need to ensure `git update-ref --stdin' is complete
before running `git pack-refs', otherwise loose refs can
remain while update-ref is still running.

Eric Wong [Mon, 28 Nov 2022 05:32:01 +0000 (05:32 +0000)]
fetch: eliminate File::Temp->filename var

File::Temp objects are overloaded to automatically
call ->filename when stringified, so there's no need
to store the ->filename result on the Perl stack.

Eric Wong [Mon, 28 Nov 2022 05:32:00 +0000 (05:32 +0000)]
fetch: use v5.12

Another tiny step towards improved startup performance by
avoiding one .pm file.

17 months ago lei_mirror: shorten remote names
Eric Wong [Mon, 28 Nov 2022 05:31:59 +0000 (05:31 +0000)]
lei_mirror: shorten remote names

The lengthy-but-human-meaningful remote names are more expensive
at runtime and increase packed-refs space.

17 months ago clone: require `--objstore=' for default location
Eric Wong [Mon, 28 Nov 2022 05:31:58 +0000 (05:31 +0000)]
clone: require `--objstore=' for default location

Allowing just `--objstore' without `=' was confusing,
since it could eat one of the required parameters (URL or
DESTINATION).

17 months ago clone: use v5.12
Eric Wong [Mon, 28 Nov 2022 05:31:57 +0000 (05:31 +0000)]
clone: use v5.12

Another small step in what will probably be a decades-long
quest to reduce startup time by a few milliseconds.

17 months ago clone: drop unnecessary requires
Eric Wong [Mon, 28 Nov 2022 05:31:56 +0000 (05:31 +0000)]
clone: drop unnecessary requires

These packages are all require-ed elsewhere.

17 months ago clone: move --dry-run handling to lei_mirror
Eric Wong [Mon, 28 Nov 2022 05:31:55 +0000 (05:31 +0000)]
clone: move --dry-run handling to lei_mirror

lei will probably support dry-run in more places, too.

17 months ago lei_mirror: forkgroups use `git fetch --multiple'
Eric Wong [Mon, 28 Nov 2022 05:31:54 +0000 (05:31 +0000)]
lei_mirror: forkgroups use `git fetch --multiple'

This offloads network parallelization and safety to git
itself while reducing the amount of unnecessary process spawning
we do.  This also improves readability of pack-refs invocations
and reduces the need for them.

To prevent heavily-forked repos from hitting system command-line
size limits, we group refs to be updated in the "fgrptmp" group.
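
A rough sketch of the idea with stock git, not lei_mirror's code: member
remotes are listed under `remotes.fgrptmp' in the forkgroup's config, so the
`git fetch --multiple' command line stays short however many forks exist, and
git runs the per-remote fetches itself. Only the group name comes from the
message above; the repository names and layout are hypothetical.

```shell
set -e
top=$(mktemp -d)
cd "$top"
# two hypothetical forks, each with one commit
for who in alice bob; do
	git init -q "$who"
	git -C "$who" -c user.name="$who" -c user.email="$who@example.com" \
		commit -q --allow-empty -m "$who's commit"
done

git init -q --bare fgrp.git
cd fgrp.git
for who in alice bob; do
	git remote add "$who" "$top/$who"
	# the group lives in config, not on the command line
	git config --add remotes.fgrptmp "$who"
done

# git expands the group name and fetches each member remote
git fetch -q --multiple --no-tags fgrptmp
git for-each-ref --format='%(refname)' refs/remotes/
```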

17 months ago lei_mirror: fix --dry-run for forkgroups
Eric Wong [Mon, 28 Nov 2022 05:31:53 +0000 (05:31 +0000)]
lei_mirror: fix --dry-run for forkgroups

We must not make permanent changes to the FS if --dry-run is in use.

17 months ago lei_mirror: make basename more descriptive
Eric Wong [Mon, 28 Nov 2022 05:31:52 +0000 (05:31 +0000)]
lei_mirror: make basename more descriptive

This makes it easier for humans to distinguish between
"Alice/project.git" and "Bob/project.git".

17 months ago lei_mirror: drop git <1.8.5 support
Eric Wong [Mon, 28 Nov 2022 05:31:51 +0000 (05:31 +0000)]
lei_mirror: drop git <1.8.5 support

Supporting git <1.8.5 via fetch on non-forkgroup repos would
make auto-GC dangerous, and I want to support auto-GC instead
of relying on the preciousObjects extension.

Since git 1.8.5 is 9 years old at this point, and grokmirror
(used by the only CentOS 7.x user I know of) already relies on
newer git, simplify our code and only fetch into forkgroups.

17 months ago lei_mirror: do not show ref updates w/o --verbose
Eric Wong [Mon, 28 Nov 2022 05:31:50 +0000 (05:31 +0000)]
lei_mirror: do not show ref updates w/o --verbose

It's too noisy IMHO, and UIs are always opinionated.

17 months ago lei_mirror: preserve permissions of existing alternates file
Eric Wong [Mon, 28 Nov 2022 05:31:49 +0000 (05:31 +0000)]
lei_mirror: preserve permissions of existing alternates file

We don't want to be clobbering permissions when changing to
relative paths.  Furthermore, we can avoid writing to the
alternates file if there are no changes.
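
One way to get both properties in shell, offered only as a hypothetical
sketch (the paths are invented, and lei_mirror does this in Perl): compare the
desired contents with `cmp' and skip the write if nothing changed; otherwise
copy the old file with `cp -p' so the replacement inherits its mode, rewrite
the copy, then rename it over the original.

```shell
set -e
top=$(mktemp -d)
cd "$top"
git init -q --bare objstore.git
git init -q --bare repo.git
alt=repo.git/objects/info/alternates
mkdir -p repo.git/objects/info
# start from an absolute path and a non-default mode
printf '%s\n' "$top/objstore.git/objects" > "$alt"
chmod 640 "$alt"

new=../../objstore.git/objects   # relative to repo.git/objects
if ! printf '%s\n' "$new" | cmp -s - "$alt"; then
	tmp="$alt.tmp.$$"
	cp -p "$alt" "$tmp"           # the copy keeps the original mode
	printf '%s\n' "$new" > "$tmp" # truncating keeps that mode, too
	mv -f "$tmp" "$alt"           # rename over the original
fi
cat "$alt"
```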

17 months ago lei_mirror: force --no-tags when fetching forkgroups
Eric Wong [Mon, 28 Nov 2022 05:31:48 +0000 (05:31 +0000)]
lei_mirror: force --no-tags when fetching forkgroups

We can't have multiple remotes writing to refs/tags/*
(instead of refs/remotes/*/tags) due to potential conflicts.
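
A sketch of the conflict and one way around it with stock git: two
hypothetical forks each carry a `v1.0' tag pointing at different commits.
With `--no-tags', git stops auto-following tags into refs/tags/*, while an
explicit per-remote refspec (an assumption here, not necessarily lei_mirror's
layout) keeps each fork's tags under refs/remotes/$REMOTE/tags/*.

```shell
set -e
top=$(mktemp -d)
cd "$top"
for who in alice bob; do
	git init -q "$who"
	git -C "$who" -c user.name="$who" -c user.email="$who@example.com" \
		commit -q --allow-empty -m "$who's release"
	git -C "$who" tag v1.0   # same tag name, different commit per fork
done

git init -q --bare fgrp.git
cd fgrp.git
for who in alice bob; do
	git remote add "$who" "$top/$who"
	# map each fork's tags into its own namespace
	git config --add "remote.$who.fetch" "+refs/tags/*:refs/remotes/$who/tags/*"
done

# --no-tags disables automatic tag following into refs/tags/*;
# the explicit refspecs above are still honored
git fetch -q --multiple --no-tags alice bob
git for-each-ref --format='%(refname)' refs/tags refs/remotes
```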

17 months ago lei_mirror: set description for non-inboxes, too
Eric Wong [Mon, 28 Nov 2022 05:31:47 +0000 (05:31 +0000)]
lei_mirror: set description for non-inboxes, too

We can still set $GIT_DIR/description when cloning coderepos with
--inbox-config=never

17 months ago lei_mirror: always pack refs for coderepos
Eric Wong [Mon, 28 Nov 2022 05:31:46 +0000 (05:31 +0000)]
lei_mirror: always pack refs for coderepos

Unlike object packing, ref packing is cheap and fast.

17 months ago clone: flesh out --objstore behavior and document
Eric Wong [Mon, 28 Nov 2022 05:31:45 +0000 (05:31 +0000)]
clone: flesh out --objstore behavior and document

We can support absolute paths to avoid surprising behaviors,
but relative paths are preferred since the goal is to be
accessible over the "dumb" HTTP git transport (the dumb
transport uses less memory and CPU on the server).
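
A minimal sketch of a relative alternate, with hypothetical repository names:
gitrepository-layout(5) notes that relative paths in objects/info/alternates
are resolved from the object directory (not the repository), and that such
relative paths are what usually let the dumb HTTP fetcher follow alternates
remotely, whereas absolute paths generally do not.

```shell
set -e
top=$(mktemp -d)
cd "$top"
git init -q --bare objstore.git
git init -q --bare repo.git

# resolved from repo.git/objects, so two ".." components reach $top
mkdir -p repo.git/objects/info
echo ../../objstore.git/objects > repo.git/objects/info/alternates

# write a blob only into the shared object store...
oid=$(echo hello | git -C objstore.git hash-object -w --stdin)
# ...and repo.git can still read it through the alternate
git -C repo.git cat-file -t "$oid"
```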