Eric Wong [Tue, 14 Feb 2023 02:42:32 +0000 (02:42 +0000)]
lei q: do not collapse threads with `-tt'
While having Xapian collapse threads is an easy way to reduce
the amount of deduplication work we need to do when writing
out threads, we can't rely on it when using `lei q -tt` since
that needs to flag all hits.
Eric Wong [Mon, 13 Feb 2023 01:02:12 +0000 (01:02 +0000)]
imap: quiet Parse::RecDescent errors on bad search queries
Parse::RecDescent emits giant errors to STDERR by default
(bypassing $SIG{__WARN__}, even). Shut it up since there's
no good way to pass those back to a client, and we don't want
clients flooding logs with bogus requests.
Eric Wong [Sun, 12 Feb 2023 23:18:28 +0000 (23:18 +0000)]
lei_mirror: fetch most-recently-updated repos, first
Within the same forkgroup, we can assume the most recently updated
repo has the most data, so fetch those, first. We'll save new clones
for last since we can preserve {reference} ordering for them.
Eric Wong [Sun, 12 Feb 2023 23:18:27 +0000 (23:18 +0000)]
lei_mirror: further reduce `git config' calls
We can parse the config at once and avoid clobbering variables
which do not need changing. We'll also do some prep work for
the fetch.hideRefs proposal being discussed at
<https://public-inbox.org/git/20230209122857.M669733@dcvr/>
Eric Wong [Sun, 12 Feb 2023 03:12:03 +0000 (03:12 +0000)]
t/lei-refresh-mail-sync: avoid kill+sleep loop
While we can't waitpid() on a daemonized process, we can abuse the
lack of FD_CLOEXEC to detect a process death. This saves
roughly 400ms for this slow test.
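The pipe trick described above can be sketched roughly as follows. This is a Python analogue, not the test's actual Perl; the helper name `wait_for_exit` and the use of `subprocess` are assumptions made purely for illustration.

```python
import os
import subprocess


def wait_for_exit(argv):
    """Block until the child (and anything else holding the pipe) is gone."""
    r, w = os.pipe()
    # pass_fds keeps the write end open across exec in the child,
    # i.e. it is inherited without FD_CLOEXEC.
    proc = subprocess.Popen(argv, pass_fds=(w,))
    os.close(w)  # the parent must drop its own copy of the write end
    # read() returns b"" (EOF) only once every holder of the write end
    # has exited, so no kill(pid, 0) + sleep polling loop is needed.
    data = os.read(r, 1)
    os.close(r)
    proc.wait()  # reaps our direct child; a real daemon would not be ours to reap
    return data
```

The EOF arrives as soon as the last process holding the write end dies, which is why this can be faster than polling with a sleep interval.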
Eric Wong [Fri, 10 Feb 2023 08:56:41 +0000 (08:56 +0000)]
git_async_cat: use awaitpid
While awaitpid already registered a no-op callback in
_bidi_pipe, we can still call it again when registering it into
our event loop to ensure EPOLL_CTL_DEL fires.
Eric Wong [Fri, 10 Feb 2023 03:58:52 +0000 (03:58 +0000)]
lei_mirror: avoid dir/file conflicts in update-ref
Using the files ref backend for git, `delete' and `create'
operations for `update-ref --stdin' need to be processed in
separate transactions to avoid conflicts in cases where a file
becomes a directory (or presumably, vice versa).
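A minimal sketch of the two-transaction idea, written in Python rather than the project's Perl. The function name `update_ref_batches` is hypothetical; only the `delete <ref>` and `create <ref> <newvalue>` line formats come from the `git update-ref --stdin` documentation.

```python
def update_ref_batches(deletes, creates):
    """Build two `git update-ref --stdin` payloads: deletes, then creates.

    deletes: iterable of ref names
    creates: iterable of (ref name, object ID) pairs
    """
    delete_input = "".join(f"delete {ref}\n" for ref in deletes)
    create_input = "".join(f"create {ref} {oid}\n" for ref, oid in creates)
    return delete_input, create_input
```

Each payload would be fed to its own `git update-ref --stdin` invocation, so that a ref such as refs/heads/topic is fully removed before refs/heads/topic/v2 needs a directory at the same path.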
Eric Wong [Sat, 4 Feb 2023 20:41:10 +0000 (20:41 +0000)]
www: sort all /$INBOX/ topics by Received: timestamp
Our previous pinning prevention only worked to prevent older
(non-most-recent) topics from being pinned to the landing page,
but not the most recent window of messages.
We still sort messages within threads by Date: because that
makes git-send-email patchsets display more nicely, but we
don't want recent topics pinned due to future Date: headers.
I nearly switched sort_ds() back to sorting by Received: until
I looked back on commit 8e52e5fdea416d6fda0b8d301144af0c043a5a76
(use both Date: and Received: times, 2018-03-21) and was reminded
git-send-email relies on Date: for large series, so I added a
note about it for sort_ds().
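The two sort keys can be illustrated with a small Python sketch. The `Msg` class, the `landing_page` function, and the choice of "newest Received: time in the thread" as the topic key are assumptions for illustration, not the actual implementation; `ds` and `ts` stand for the Date: and Received: times respectively.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    subject: str
    ds: int  # Date: header, sender-controlled (may lie in the future)
    ts: int  # Received: time, assigned on arrival


def landing_page(threads):
    """Order topics by newest Received: time; order messages by Date:."""
    topics = sorted(threads, key=lambda t: max(m.ts for m in t), reverse=True)
    return [sorted(t, key=lambda m: m.ds) for t in topics]
```

With this ordering, a message carrying a far-future Date: header cannot pin its topic to the top, while a patch series that arrives out of order is still displayed in Date: order.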
Eric Wong [Fri, 3 Feb 2023 03:46:03 +0000 (03:46 +0000)]
lei_mirror: use --no-write-fetch-head on git 2.29+
This avoids unnecessary writes to the FETCH_HEAD file, which is
worthless in multi-remote mirrors. Actually, I haven't found
FETCH_HEAD useful anywhere since the `/remotes/' namespace
became popular...
Eric Wong [Tue, 31 Jan 2023 10:31:57 +0000 (10:31 +0000)]
www: diff: fix encoding problems when showing diff
We need to use the utf8 layer when writing files to be diffed,
and utf8::decode the `git diff' output. Furthermore, do the
CRLF -> LF conversion early to avoid showing CRLF vs LF
differences in the diff, since that doesn't matter to MUAs
(nor our normal HTML views)
Eric Wong [Tue, 31 Jan 2023 00:05:15 +0000 (00:05 +0000)]
lei: drop -watches and -lei_note_event from workers
I noticed these while tracking down circular refs for commit 7b654d175cf2e31b (ipc: drop awaitpid_init to avoid circular refs, 2023-01-30).
While they're not the cause of circular refs, they're still
a waste of memory in worker processes.
Eric Wong [Mon, 30 Jan 2023 22:50:07 +0000 (22:50 +0000)]
tests: make require_git and require_cmd easier-to-use
We'll rely on defined(wantarray) to implicitly skip subtests,
and memoize these to reduce syscalls, since tests should
be short-lived enough to not be affected by new installations or
removals of git/xapian-compact/curl/etc...
Eric Wong [Mon, 30 Jan 2023 04:30:57 +0000 (04:30 +0000)]
ipc: drop awaitpid_init to avoid circular refs
This brings t/lei-index.t back down from ~8 to ~3s. I didn't
notice this before because the LeiNoteEvent timer was firing
every 5s and clearing circular refs, and parallel testing meant
the delay got hidden.
Fixes: 4a2a95bbc78f99c8 (ipc+lei: switch to awaitpid, 2023-01-17)
Eric Wong [Sun, 29 Jan 2023 10:30:41 +0000 (10:30 +0000)]
use Net::SSLeay (OpenSSL) for SHA-(1|256) if installed
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as
the Digest::SHA implementation from Perl, most likely due to an
optimized assembly implementation. SHA-1 is a few percent
faster, too.
Eric Wong [Sun, 29 Jan 2023 09:45:11 +0000 (09:45 +0000)]
spawn_pp: use `which()' properly for pure-Perl spawn
I have no idea if mod_perl/mod_perl2 is used nowadays, but
we're stuck supporting it as long as mod_perl exists. So
add some tests and make minor updates to existing ones to
ensure it stays working.
Eric Wong [Sat, 28 Jan 2023 11:02:54 +0000 (11:02 +0000)]
www_coderepo: support $REPO/refs/{heads,tags}/ endpoints
These are also in cgit, but we'll include CLI hints to show
viewers how our data is generated. We don't have "$REPO/refs/"
without (heads|tags) yet, though...
Eric Wong [Thu, 26 Jan 2023 09:32:57 +0000 (09:32 +0000)]
git: drop needless checks for old git
`ambiguous' was added in git 2.21, and `dangling' was the only
other possible phrase which was inadvertently slipped in prior
to 2.21. Thus there's no need to check for `notdir' or `loop'
responses since we aren't using `git cat-file --follow-symlinks'
anywhere.
Eric Wong [Thu, 26 Jan 2023 09:32:56 +0000 (09:32 +0000)]
git: use --batch-command in git 2.36+ to save processes
`git cat-file --batch-command' combines the functionality of
`--batch' and `--batch-check' into a single process. This
reduces the amount of running processes and is primarily
useful for coderepos (e.g. solver).
This also fixes prior use of `print { $git->{out} }' which is
a potential (but unlikely) bug since commit d4ba8828ab23f278
(git: fix asynchronous batching for deep pipelines, 2023-01-04)
Lack of libgit2 on one of my test machines also uncovered fixes
necessary for t/imapd.t, t/nntpd.t and t/nntpd-v2.t.
Eric Wong [Wed, 25 Jan 2023 10:18:33 +0000 (10:18 +0000)]
process_pipe: warn hackers off using it for bidirectional pipes
While most uses of ->DESTROY happen in a predictable order in
long-lived daemons, process teardown on exit is chaotic and not
subject to ordering guarantees, so we must keep both ends of a
`git cat-file --batch*' pipe at the same level in the object
hierarchy.
Eric Wong [Tue, 24 Jan 2023 09:49:40 +0000 (09:49 +0000)]
viewvcs: improve tree glossary view
Adding an <hr> helps delineate the glossary, note that
submodules are rare, and avoid needlessly defining the
commits-in-trees case since the extra information is likely
to overwhelm new users.
Eric Wong [Tue, 24 Jan 2023 09:49:34 +0000 (09:49 +0000)]
www_coderepo: eliminate debug log footer
WwwCoderepo is for viewing blobs already in code repositories,
so there's no place for a debug log showing which mails were
used to arrive at a given blob. The debug footer remains for
/$INBOX/$OID/s/ URLs, of course.
Eric Wong [Tue, 24 Jan 2023 09:49:33 +0000 (09:49 +0000)]
www_coderepo: show /$INBOX/?t=$DATE link for commits
While we can't inexpensively search for git commits based on the
timestamp, coderepos configured for inboxes can still look up
messages based on the inbox URL.
Eric Wong [Tue, 24 Jan 2023 09:49:30 +0000 (09:49 +0000)]
qspawn: drop lineno from command failure warning
git, cgit, or any other command failing isn't an error
we can do anything about in qspawn, so don't have Perl
emit line number info and needlessly pollute logs.
Eric Wong [Sat, 21 Jan 2023 08:58:19 +0000 (08:58 +0000)]
ds: awaitpid: do not clobber entries for reaped processes
We must only write to $AWAIT_PIDS on the initial reap attempt.
While we're at it, avoid triggering an extra wakeup if we're
doing synchronous awaitpid. This seems to eliminate most
reliance on Qspawn->DESTROY to call Qspawn->finalize.
Eric Wong [Tue, 17 Jan 2023 07:19:10 +0000 (07:19 +0000)]
ipc+lei: switch to awaitpid
This avoids awkwardly stuffing an arrayref into callbacks
which expect multiple arguments. IPC->awaitpid_init now
allows pre-registering callbacks before spawning workers.
Eric Wong [Tue, 17 Jan 2023 07:19:07 +0000 (07:19 +0000)]
eofpipe: drop {arg} support for now
The only user of EOFpipe has no args, so avoid wasting a hash
slot on it. If we need it again in the future, EOFpipe will
allow an array of args, instead.
Eric Wong [Tue, 17 Jan 2023 07:19:05 +0000 (07:19 +0000)]
watch: switch to awaitpid
-watch relies on our event_loop anyways, and awaitpid lets us
avoid the extra overhead of EOFpipe. Add an extra {quit} check
in imap_idle_fork while we're at it.
Eric Wong [Wed, 18 Jan 2023 02:10:11 +0000 (02:10 +0000)]
qspawn: use ->DESTROY to force ->finalize
There are apparently a few places where we do not call ->finalize
or ->finish and leave dangling limiter slots occupied. I can't
reproduce this easily, so it's likely in error-handling paths.
I already made ->finalize idempotent when switching to awaitpid
since I wanted to rely entirely on DESTROY. However, DESTROY
doesn't always fire soon enough (and the client has already seen
a response), but using DESTROY as a fallback seems reasonable.
This does the minimum to ensure the limiter is freed up on
process exit, but ensuring a finish/finalize call always happens
is the goal.
Eric Wong [Tue, 17 Jan 2023 07:19:03 +0000 (07:19 +0000)]
ds: introduce awaitpid, switch ProcessPipe users
awaitpid is the new API which will eventually replace dwaitpid.
It enables early registration of callback handlers. Eventually
(once dwaitpid is gone) it'll be able to use fewer waitpid
calls.
The avoidance of waitpid(-1) in our earlier days was driven by
the belief that threads may eventually become relevant for Perl 5,
but that's extremely unlikely at this stage. I will still
introduce optional threads via C, but they definitely won't be
spawning/reaping processes.
Argument order to callbacks is swapped (PID first) to allow
flattened multiple arguments more naturally. The previous API
(allowing only a single argument, as influenced by
pthread_create(3)) was more tedious as it involved packing
multiple arguments into yet another array.
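The callback convention can be sketched in Python as follows. This is only an analogue of the idea: the names `AWAIT_PIDS`, `awaitpid` and `reap_ready` are borrowed or invented for illustration, the per-PID waitpid loop is one possible choice, and the exit status (available as `$?` in Perl) is simply dropped here for brevity.

```python
import os

AWAIT_PIDS = {}  # pid -> (callback, extra args)


def awaitpid(pid, cb, *args):
    """Register cb early; it later runs as cb(pid, *args)."""
    AWAIT_PIDS[pid] = (cb, args)


def reap_ready():
    """Try a non-blocking waitpid on each registered PID and fire callbacks."""
    for pid in list(AWAIT_PIDS):
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            cb, args = AWAIT_PIDS.pop(pid)
            cb(pid, *args)  # PID first, remaining arguments flattened
```

Because the PID comes first, any number of extra arguments can follow without being packed into a single container, which is the tedium the older single-argument API imposed.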
Eric Wong [Fri, 13 Jan 2023 10:35:50 +0000 (10:35 +0000)]
coderepo: consolidate git --batch-check users
And another opportunity to simplify our code between different
PSGI-ish implementations. The snapshot retrieval is simpler,
but potentially slower since we waste cycles scanning for tags
even after we've found one. It's probably not a big deal since
it's only short info lines and we can utilize pipelining.
Eric Wong [Fri, 13 Jan 2023 04:01:32 +0000 (04:01 +0000)]
www_coderepo: tree: do not break #n$LINENO
We can't use 302 redirects at the /tree/ endpoint as originally
intended since "#n$LINENO" fragment links aren't preserved
across redirects (since clients don't typically send that part
of the URL in requests).
So we'll have to make sure we handle prefixes properly and show
trees directly. Oh well :< At least the history-aware 404
handling remains :>
Eric Wong [Thu, 12 Jan 2023 14:14:35 +0000 (14:14 +0000)]
www_coderepo: /tree/ 404s search git history
Displaying git trees over the web with pathnames in the URLs
has the unfortunate consequence of URLs getting out-of-date
if files are renamed or deleted from the latest tree.
We can utilize `git log' here to search history and find the
commit which led to the rename or deletion. Of course, we'll
show a suitable command to the user as well, another small
step towards covertly teaching users the git CLI :>
`git log' is not especially fast, here, but Qspawn limiters can
do their job and renames and deletions aren't too common in most
codebases.
Eric Wong [Thu, 12 Jan 2023 14:14:33 +0000 (14:14 +0000)]
www_stream: coderepo-specific top bar
It gets nasty when multiple, non-ALL lists point to the same
coderepo, but I guess ALL exists for that. Only lightly-tested
with various PSGI prefix mounts, but it seems to be working...
Eric Wong [Wed, 11 Jan 2023 10:55:39 +0000 (10:55 +0000)]
www: /$INBOX/$MSGID/d/ to diff reused Message-IDs
To ensure users aren't abusing the ability to reuse Message-IDs,
provide a convenient front-end to `lei mail-diff' from WWW.
Most of the time it's just list-appended signatures, so I expect
this to be useful for /all/ users.
Eric Wong [Tue, 10 Jan 2023 11:49:19 +0000 (11:49 +0000)]
www_coderepo: show tree root as "(root)"
We'll use the `b=' parameter as a hint. I originally considered
`b=/', but a singular slash `/' isn't used in git for paths.
With $refname:$path resolution where $path is an empty string,
`git cat-file -t $refname:' resolves to the tree, so it seems
special-casing the empty string is fine in the web UI, too.
Eric Wong [Tue, 10 Jan 2023 11:49:18 +0000 (11:49 +0000)]
www_coderepo: handle "?h=$tip" in summary view
This makes sense at least as far as the README and `git log' output goes.
We'll also add the `b=' query parameter to the $OID/s/ href for
the README blob.
Eric Wong [Fri, 6 Jan 2023 10:10:53 +0000 (10:10 +0000)]
qspawn: fix EINTR with generic PSGI servers
Using the `next' operator doesn't work with `do {} (until|while)'
loops, so change it to use `until {}'. I've never encountered
this problem in-the-wild, but I only use -(netd|httpd).
Eric Wong [Fri, 6 Jan 2023 10:10:52 +0000 (10:10 +0000)]
qspawn: consistently return 500 on premature EOF
If the {parse_hdr} callback doesn't handle it, we need to break the
loop if the CGI process dies prematurely. This doesn't fix a
currently known problem, but theoretically a SIGKILL could hit
(cgit || git-http-backend) while -netd or -httpd survives.
Eric Wong [Fri, 6 Jan 2023 10:10:51 +0000 (10:10 +0000)]
httpd/async: retry reads properly when parsing headers
While git-http-backend sends headers with one write syscall,
upstream cgit still trickles them out line-by-line and we need to
account for that and retry Qspawn {parse_hdr} callbacks.
Eric Wong [Fri, 6 Jan 2023 10:10:50 +0000 (10:10 +0000)]
qspawn: use fallback response code from CGI program
Prefer to use the original (cgit||git-http-backend) HTTP
response code if our fallback to WwwCoderepo fails. 404
codes are typically more appropriate than 500 for these things.
Eric Wong [Wed, 4 Jan 2023 10:34:04 +0000 (10:34 +0000)]
git: pub_urls shows base_url default
Since we have native coderepo viewing support without cgit,
configuring coderepo.$FOO.cgitUrl shouldn't be necessary anymore
and we can infer the public name based on the project nickname
(or whatever's in the generated project.list)
Eric Wong [Wed, 4 Jan 2023 10:34:03 +0000 (10:34 +0000)]
git: fix non-empty SCRIPT_NAME handling for PSGI mounts
When using the `mount' directive in PSGI (Plack::App::URLMap),
SCRIPT_NAME still needs to use a trailing slash before it can
be joined with another URL.
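Why the trailing slash matters can be shown with standard relative-URL resolution. The sketch below uses Python's `urljoin` as a stand-in for whatever URL joining the Perl code does; `mount_url` and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin


def mount_url(base, script_name, path):
    """Join a PSGI mount prefix and a relative path onto a base URL."""
    if not script_name.endswith("/"):
        script_name += "/"  # without this, the last path segment is replaced
    return urljoin(urljoin(base, script_name), path)
```

For example, `mount_url("http://example.com/", "/mnt", "repo.git")` keeps the mount prefix, whereas joining "repo.git" directly onto "http://example.com/mnt" would replace the "mnt" segment and lose it.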
Eric Wong [Thu, 5 Jan 2023 01:44:59 +0000 (01:44 +0000)]
git: write_all: remove leftover debug messages
I used these messages during development to verify Alpine was
triggering the intended codepaths. They're no longer necessary
and just noise at this point.
Reported-by: Chris Brannon <chris@the-brannons.com>
Fixes: d4ba8828ab23 ("git: fix asynchronous batching for deep pipelines")
Eric Wong [Wed, 4 Jan 2023 03:49:34 +0000 (03:49 +0000)]
git: fix asynchronous batching for deep pipelines
...By using non-blocking pipe writes. This avoids problems for
musl (and other libc) where getdelim(3) used by `git cat-file --batch*'
uses a smaller input buffer than glibc or FreeBSD libc.
My key mistake was that our check against MAX_INFLIGHT is only useful
for the initial batch of requests. It is not useful for
subsequent requests since git will drain the pipe at
unpredictable rates due to libc differences.
To fix this problem, I initially tried to drain the read pipe
as long as readable data was pending. However, reading git
output without giving git more work would also limit parallelism
opportunities since we don't want git to sit idle, either. This
change ensures we keep both pipes reasonably full to reduce
stalls and maximize parallelism between git and public-inbox.
While the limit set a few weeks ago in commit 56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21)
remains in place, any higher or lower limit will work. It may
be worth it to use an even lower limit to improve interactivity
w.r.t. Ctrl-C interrupts.
I've tested the pre-56e6e587745c and even higher values on an
Alpine VM in the GCC Farm <https://cfarm.tetaneutral.net>
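The general technique of keeping both pipes busy with non-blocking I/O can be sketched as below. This is a Python analogue under stated assumptions: `pump` is a hypothetical helper, `cat` stands in for `git cat-file --batch*`, and the 65536-byte chunk size is arbitrary. It does not reproduce the MAX_INFLIGHT accounting of the real code.

```python
import os
import selectors
import subprocess


def pump(argv, payload):
    """Feed payload to argv's stdin and collect stdout without deadlocking."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    wfd, rfd = proc.stdin.fileno(), proc.stdout.fileno()
    os.set_blocking(wfd, False)
    os.set_blocking(rfd, False)
    sel = selectors.DefaultSelector()
    sel.register(wfd, selectors.EVENT_WRITE)
    sel.register(rfd, selectors.EVENT_READ)
    pending = memoryview(payload)
    out = bytearray()
    reading = True
    while reading:
        for key, _events in sel.select():
            if key.fd == wfd:
                try:
                    # Write only what the pipe accepts right now.
                    n = os.write(wfd, pending[:65536])
                except BlockingIOError:
                    continue
                pending = pending[n:]
                if not pending:
                    sel.unregister(wfd)
                    proc.stdin.close()  # signal EOF to the child
            else:
                try:
                    chunk = os.read(rfd, 65536)
                except BlockingIOError:
                    continue
                if chunk:
                    out += chunk
                else:  # EOF: the child closed its stdout
                    sel.unregister(rfd)
                    reading = False
    sel.close()
    proc.stdout.close()
    proc.wait()
    return bytes(out)
```

A blocking "write everything, then read" loop would stall once the payload exceeds what the two pipe buffers can hold; interleaving reads and writes avoids depending on how quickly the child drains its input.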
Eric Wong [Tue, 3 Jan 2023 00:05:06 +0000 (00:05 +0000)]
daemon: don't bother checking for existing FD flags
FD_CLOEXEC is the only currently defined FD flag, and that has been
the case for decades at this point. I highly doubt any default
FD flag will ever be forced on us by the kernel, init system, or
Perl. So save ourselves a syscall and just call F_SETFD with
the assumption FD_CLOEXEC is the only FD flag that we'd ever
care for.
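The saved syscall looks roughly like this in a Python analogue (the helper name `set_cloexec` is hypothetical): instead of reading the flags with F_GETFD, modifying them, and writing them back, the flag word is written directly.

```python
import fcntl


def set_cloexec(fd, enabled):
    """Set or clear FD_CLOEXEC with one F_SETFD call, skipping F_GETFD.

    Assumes FD_CLOEXEC is the only FD flag that matters, so overwriting
    the whole flag word is safe.
    """
    fcntl.fcntl(fd, fcntl.F_SETFD, fcntl.FD_CLOEXEC if enabled else 0)
```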
Eric Wong [Tue, 3 Jan 2023 00:03:00 +0000 (00:03 +0000)]
githttpbackend: avoid copying PSGI env
We can stash qspawn.wcb before we fallback to WwwCoderepo to
ensure the qspawn re-dispatch works as expected. This is still
hacky and I want to tweak it further down the line. Meanwhile,
let's make it less expensive to do hacky things...
Eric Wong [Mon, 2 Jan 2023 08:20:13 +0000 (08:20 +0000)]
qspawn: fix process finalization for generic PSGI server
This fixes the inability to fallback to WwwCoderepo on cgit 404s
with generic PSGI servers. Unfortunately, this doesn't seem to
get tested with generic PSGI tests, and doesn't happen on
public-inbox-httpd, obviously.
Eric Wong [Mon, 2 Jan 2023 08:18:47 +0000 (08:18 +0000)]
t/httpd-unix.t: stop tail(1) before stopping server
When using the `TAIL' environment, the tail(1) process
inherits the non-FD_CLOEXEC pipe we introduced in commit 5f9baf725106 (t/httpd-unix: eliminate some busy waits, 2022-12-12).
We must ensure that pipe is gone before waiting on -httpd's
death by destroying the tail(1) process, first.