Eric Wong [Mon, 28 Nov 2022 05:32:27 +0000 (05:32 +0000)]
lei_mirror: omit trailing slash for git remote.*.url
While PublicInbox::WWW URLs have a trailing slash in them
for compatibility with static web server mirrors, URLs
intended for `git clone' don't benefit from this and the
trailing `/' just looks awkward.
Eric Wong [Mon, 28 Nov 2022 05:32:26 +0000 (05:32 +0000)]
lei_mirror: avoid redundant curl `-f' use
All of our curl invocations use the `-f' (--fail) switch
anyways, and I can't imagine a time when we'd want silent
failures.
Eric Wong [Mon, 28 Nov 2022 05:32:25 +0000 (05:32 +0000)]
lei_mirror: use curl -z/--timecond if manifest exists
This lets us save cycles and avoid scanning + comparing manifest
contents by relying on the Last-Modified HTTP response header.
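A minimal sketch of the idea, written in Python rather than the project's
Perl: add `-z <file>' only when a local manifest already exists, so curl
makes the request conditional on that file's modification time. The helper
name, its parameters, and the choice of a separate output path are
assumptions for illustration, not the actual lei_mirror code.

```python
import os


def manifest_curl_cmd(url, out_path, local_manifest):
    """Build a curl argv list (hypothetical helper, illustration only)."""
    # -f: fail on HTTP errors, -R: keep the remote mtime, -sS: quiet but show errors
    cmd = ["curl", "-f", "-R", "-sS", "-o", out_path]
    if os.path.exists(local_manifest):
        # -z FILE: only transfer if the remote copy is newer than FILE's mtime
        cmd += ["-z", local_manifest]
    cmd.append(url)
    return cmd
```

On the initial clone no manifest exists, so the request is unconditional.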
Eric Wong [Mon, 28 Nov 2022 05:32:24 +0000 (05:32 +0000)]
lei_mirror: eliminate circular references
...by using local-ized globals. While non-globals could work,
eliminating the {todo} and {fgrp_todo} refs in all sub-refs
is more error-prone and the `local' construct is convenient.
This allows us to get rid of the `delete $fgrp->{-fini}' call
in pack_refs and eliminates the indiscriminate reaping of all
processes before calling fgrp_fetch_all. This means we can
fully depend on DESTROY to provide predictable dependency
handling while supporting parallelization.
Global $TODO and $FGRP_TODO now become SCALAR refs on
consumption so they can act as assertions to detect future bugs.
Eric Wong [Mon, 28 Nov 2022 05:32:23 +0000 (05:32 +0000)]
lei_mirror: support {symlinks} from manifest
It's part of grokmirror, and useful for keeping compatibility.
We can make use of File::Spec->abs2rel here to ensure our
symlinks are relative and the entire mirror can be copied
as a whole.
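As a rough Python analogue of File::Spec->abs2rel (a sketch only; the
function name and the example paths are hypothetical), the symlink target
can be expressed relative to the directory that holds the link:

```python
import os.path


def relative_symlink_target(target, link_path):
    """Return `target' relative to the directory containing `link_path'."""
    return os.path.relpath(target, start=os.path.dirname(link_path))


# e.g. a link at /srv/mirror/b/alias.git pointing at /srv/mirror/a/project.git
# would store a relative target instead of an absolute one
rel = relative_symlink_target("/srv/mirror/a/project.git",
                              "/srv/mirror/b/alias.git")
```

Because the stored target never mentions the mirror's absolute location,
moving or copying the whole tree keeps the links valid.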
Eric Wong [Mon, 28 Nov 2022 05:32:22 +0000 (05:32 +0000)]
lei_mirror: set {head} from manifest
We handle symbolic refs properly, at least. It's also possible
for $GIT_DIR/HEAD to contain a full SHA-1/SHA-256, and we'll
support that by using update-ref --no-deref.
Eric Wong [Mon, 28 Nov 2022 05:32:21 +0000 (05:32 +0000)]
lei_mirror: shorten scope mirror objects
We may be able to save some memory this way.
Eric Wong [Mon, 28 Nov 2022 05:32:20 +0000 (05:32 +0000)]
lei_mirror: simplify forkgroup-related subs
We can pass fewer variables around on stack since $fgrp is just
a copy of $self. We can also rely more on explicit callback
passing rather than relying on OnDestroy and ->cancel for
conditional calls.
Eric Wong [Mon, 28 Nov 2022 05:32:19 +0000 (05:32 +0000)]
lei_mirror: run v1_done earlier on forkgroup done
There's likely a circular reference somewhere which was
preventing v1_done from running early. In any case, this allows
v1_done to run in parallel with the pack-refs process since
there's no ordering dependency between ref-packing and v1_done.
Eric Wong [Mon, 28 Nov 2022 05:32:18 +0000 (05:32 +0000)]
lei_mirror: simplify most process spawning
For commands where we rely on successful exit codes to continue,
start_cmd() generalizes well enough to be used in a variety
of places.
Eric Wong [Mon, 28 Nov 2022 05:32:17 +0000 (05:32 +0000)]
lei_mirror: remove janky mirror.done stamp file
This makes a fundamental (and overdue) change to the core of
lei in how it handles child errors. Every process which
generates or receives a child error will remember it before
passing it on. This ensures _wq_done_wait callbacks will
know of prior errors aside from $? when it runs.
Eric Wong [Mon, 28 Nov 2022 05:32:16 +0000 (05:32 +0000)]
lei_mirror: update fingerprints when writing local manifest.js.gz
We need our local manifest to match the actual data we store,
not what we're mirroring.
Eric Wong [Mon, 28 Nov 2022 05:32:15 +0000 (05:32 +0000)]
lei_mirror: --manifest= affects destination, too
This probably makes the most sense, if a user wants to
use an alternate path to read from, it's likely they
want to write it there, too.
Eric Wong [Mon, 28 Nov 2022 05:32:14 +0000 (05:32 +0000)]
lei_mirror: respect `./' and `../' prefixes for CLI args
Users may wish to keep objstore and manifest files at
a higher level to prevent direct access via HTTP(S),
so those relative paths probably make sense.
Eric Wong [Mon, 28 Nov 2022 05:32:13 +0000 (05:32 +0000)]
lei_mirror: don't warn on missing manifest on initial clone
Users may choose to specify a manifest on the initial clone,
so don't complain if it's missing in that case.
Eric Wong [Mon, 28 Nov 2022 05:32:12 +0000 (05:32 +0000)]
clone: support --keep-going/-k like make(1)
This can be useful for intermittent network errors,
and the required code changes make it less dependent
on global state.
Eric Wong [Mon, 28 Nov 2022 05:32:11 +0000 (05:32 +0000)]
lei_mirror: avoid needless FD passing
Most git processes we invoke don't care about stdin nor stdout,
so don't waste cycles and memory dealing with it.
stderr passing is added to the `git config --unset-all remotes.fgrptmp'
invocation, though, since that can fail due to I/O errors or OOM.
Eric Wong [Mon, 28 Nov 2022 05:32:10 +0000 (05:32 +0000)]
clone|fetch: support passing --prune(-tags) to `git fetch'
We need to be able to get rid of removed branches and tags on
the remote. --prune-tags is implied for non-objstore repos,
and incompatible with objstore repos.
Eric Wong [Mon, 28 Nov 2022 05:32:09 +0000 (05:32 +0000)]
clone: canonicalize destination path from CLI
We'll probably save the destination path somewhere, so
ensure the path doesn't have redundant slashes and such.
Eric Wong [Mon, 28 Nov 2022 05:32:08 +0000 (05:32 +0000)]
lei_mirror: delay configuring forkgroups
When relying on `public-inbox-clone --manifest=', idempotent
`git config' invocations can take a considerable amount of
time. We still configure inboxes idempotently since it
allows quickly changing URLs to mirrors, but we just defer
it until an update is actually needed.
Eric Wong [Mon, 28 Nov 2022 05:32:07 +0000 (05:32 +0000)]
clone: support loading manifest.js.gz from destination
This will allow us to quickly check fingerprints against
remotes with a single HTTP(S) request, saving us numerous
`git show-ref' invocations.
Eric Wong [Mon, 28 Nov 2022 05:32:06 +0000 (05:32 +0000)]
lei_mirror: check fingerprints before fetching
While we currently don't check an existing on-disk manifest,
using `git show-ref' can still save us precious network traffic.
Eric Wong [Mon, 28 Nov 2022 05:32:05 +0000 (05:32 +0000)]
lei_mirror: support resuming multi-repo clones
This is actually a combination of clone and fetch, and I don't
think `public-inbox-fetch' will be used to update multiple git
repos (inbox or not).
Our use of `git update-ref --stdin -z' was broken for
incremental updates, but now fixed to properly NUL-terminate
commands.
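For reference, with `-z' each update-ref command takes the form
`update SP <ref> NUL <new-oid> NUL [<old-oid>] NUL', so every field,
including an empty old value, needs its own terminating NUL. The Python
sketch below builds such a payload; the function name is hypothetical and
this is not the project's code.

```python
def update_ref_stdin(updates):
    """Build stdin for `git update-ref --stdin -z'.

    updates: iterable of (ref, new_oid, old_oid); old_oid may be '' to
    skip verification of the current value.
    """
    buf = bytearray()
    for ref, new_oid, old_oid in updates:
        # every field ends in NUL, including the (possibly empty) old value
        buf += f"update {ref}\0{new_oid}\0{old_oid}\0".encode()
    return bytes(buf)
```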
Eric Wong [Mon, 28 Nov 2022 05:32:04 +0000 (05:32 +0000)]
on_destroy: support ->cancel callback
We probably use this idiom elsewhere, but having this method
around to make future use cases more readable is probably prudent.
Eric Wong [Mon, 28 Nov 2022 05:32:03 +0000 (05:32 +0000)]
lei_mirror: show child error error code
Just passing the exit value of the child process to our
parent process isn't very useful when multiple commands
are failing at once.
Eric Wong [Mon, 28 Nov 2022 05:32:02 +0000 (05:32 +0000)]
lei_mirror: properly pack-refs in non-forkgroup repos
We need to ensure `git update-ref --stdin' is complete
before running `git pack-refs', otherwise loose refs can
remain while update-ref is still running.
Eric Wong [Mon, 28 Nov 2022 05:32:01 +0000 (05:32 +0000)]
fetch: eliminate File::Temp->filename var
File::Temp objects are overloaded to automatically
call ->filename when stringified, so there's no need
to store the ->filename result on the Perl stack.
Eric Wong [Mon, 28 Nov 2022 05:32:00 +0000 (05:32 +0000)]
fetch: use v5.12
Another tiny step towards improved startup performance by
avoiding one .pm file.
Eric Wong [Mon, 28 Nov 2022 05:31:59 +0000 (05:31 +0000)]
lei_mirror: shorten remote names
The lengthy-but-human-meaningful remote names are more expensive
at runtime and increase packed-refs space.
Eric Wong [Mon, 28 Nov 2022 05:31:58 +0000 (05:31 +0000)]
clone: require `--objstore=' for default location
Allowing just `--objstore' without `=' was confusing,
since it could eat one of the required parameters (URL or
DESTINATION).
Eric Wong [Mon, 28 Nov 2022 05:31:57 +0000 (05:31 +0000)]
clone: use v5.12
Another small step in what will probably be a decades-long
quest to reduce startup time by a few milliseconds.
Eric Wong [Mon, 28 Nov 2022 05:31:56 +0000 (05:31 +0000)]
clone: drop unnecessary requires
These packages are all require-ed elsewhere.
Eric Wong [Mon, 28 Nov 2022 05:31:55 +0000 (05:31 +0000)]
clone: move --dry-run handling to lei_mirror
lei will probably support dry-run in more places, too.
Eric Wong [Mon, 28 Nov 2022 05:31:54 +0000 (05:31 +0000)]
lei_mirror: forkgroups use `git fetch --multiple'
This offloads network parallelization and safety to git
itself while reducing the amount of unnecessary process spawning
we do. This also improves readability of pack-refs invocations
and reduces the need for them.
To prevent heavily-forked repos from hitting system command-line
size limits, we group refs to be updated in the "fgrptmp" group.
Eric Wong [Mon, 28 Nov 2022 05:31:53 +0000 (05:31 +0000)]
lei_mirror: fix --dry-run for forkgroups
We must not make permanent changes to the FS if --dry-run is in use.
Eric Wong [Mon, 28 Nov 2022 05:31:52 +0000 (05:31 +0000)]
lei_mirror: make basename more descriptive
This makes it easier for humans to distinguish between
"Alice/project.git" and "Bob/project.git"
Eric Wong [Mon, 28 Nov 2022 05:31:51 +0000 (05:31 +0000)]
lei_mirror: drop git <1.8.5 support
Supporting git <1.8.5 via fetch on non-forkgroup repos would
make auto-GC dangerous, and I want to support auto-GC instead
of relying on the preciousObjects extension.
Since git 1.8.5 is 9 years old at this point, and grokmirror
(used by the only CentOS 7.x user I know of) already relies on
newer git, simplify our code and only fetch into forkgroups.
Eric Wong [Mon, 28 Nov 2022 05:31:50 +0000 (05:31 +0000)]
lei_mirror: do not show ref updates w/o --verbose
It's too noisy IMHO, and UIs are always opinionated.
Eric Wong [Mon, 28 Nov 2022 05:31:49 +0000 (05:31 +0000)]
lei_mirror: preserve permissions of existing alternates file
We don't want to be clobbering permissions when changing to
relative paths. Furthermore, we can avoid writing to the
alternates file if there are no changes.
Eric Wong [Mon, 28 Nov 2022 05:31:48 +0000 (05:31 +0000)]
lei_mirror: force --no-tags when fetching forkgroups
We can't have multiple remotes writing to refs/tags/*
(instead of refs/remotes/*/tags) due to potential conflicts.
Eric Wong [Mon, 28 Nov 2022 05:31:47 +0000 (05:31 +0000)]
lei_mirror: set description for non-inboxes, too
We can still set $GIT_DIR/description when cloning coderepos with
--inbox-config=never
Eric Wong [Mon, 28 Nov 2022 05:31:46 +0000 (05:31 +0000)]
lei_mirror: always pack refs for coderepos
Unlike object packing, ref packing is cheap and fast.
Eric Wong [Mon, 28 Nov 2022 05:31:45 +0000 (05:31 +0000)]
clone: flesh out --objstore behavior and document
We can support absolute paths to avoid surprising behaviors,
but relative paths are preferred since the goal is to be
accessible over the "dumb" HTTP git transport (the dumb
transport uses less memory and CPU on the server).
Eric Wong [Mon, 28 Nov 2022 05:31:44 +0000 (05:31 +0000)]
lei_mirror: ensure git <1.8.5 fallback can use torsocks
Since we fall back to `git fetch' on versions of git without
`git update-ref --stdin' support, we must also support
torsocks use on Tor .onion URLs
Eric Wong [Mon, 28 Nov 2022 05:31:43 +0000 (05:31 +0000)]
lei_mirror: cleanup process reaping logic
We can put more of the default --jobs logic and loop handling
inside a sub to simplify callers.
Eric Wong [Mon, 28 Nov 2022 05:31:42 +0000 (05:31 +0000)]
lei_mirror: support --objstore and forkgroups
The {forkgroup} directive of grokmirror 2.x manifest.js.gz
can facilitate more space savings and improved pack performance
with pack.islands.
Eric Wong [Mon, 28 Nov 2022 05:31:41 +0000 (05:31 +0000)]
lei_mirror: simplify clone_v2_prep
Since everything relies on the instance-specific {todo} queue,
there's no need to have sub-specific queues.
Eric Wong [Mon, 28 Nov 2022 05:31:40 +0000 (05:31 +0000)]
lei_mirror: avoid convoluted lazy_cb usage
lazy_cb should only be used for lei command dispatch and
completion callbacks when the method isn't known at startup.
There's zero reason to use it when the method is known
ahead-of-time, especially when there's a comment pointing
reviewers towards the only possible method it can dispatch.
Eric Wong [Mon, 28 Nov 2022 05:31:39 +0000 (05:31 +0000)]
lei_mirror: hoist out dump_manifest sub
We can reuse it in PublicInbox::Fetch, too.
Eric Wong [Mon, 28 Nov 2022 05:31:38 +0000 (05:31 +0000)]
lei_mirror: do not write Makefile for --inbox-config=never
We want to be able to clone non-inbox git repos, too.
Eric Wong [Mon, 28 Nov 2022 05:31:37 +0000 (05:31 +0000)]
lei_mirror: add `index' target to generated Makefile
It can probably be a useful hint to avoid misleading users
into always using `--reindex'.
Eric Wong [Mon, 28 Nov 2022 05:31:36 +0000 (05:31 +0000)]
lei_mirror: cleanup File::Temp OO usage
There's no need to capture or rely on the File::Temp->filename
in most cases since most Perl functions accept file handles all
the same.
Eric Wong [Mon, 28 Nov 2022 05:31:35 +0000 (05:31 +0000)]
lei_mirror: ensure curl exits 22 on HTTP 404 responses
Oops, this is actually a long-standing bug :x
Eric Wong [Mon, 28 Nov 2022 05:31:34 +0000 (05:31 +0000)]
lei_mirror: require Perl v5.12+
Another tiny step towards improved startup performance by
relying on Perl 5.12 strictness and avoiding strict.pm
Eric Wong [Mon, 28 Nov 2022 05:31:33 +0000 (05:31 +0000)]
clone: support --inbox-version
This is part of `lei add-external --mirror', and it makes
sense to have for development and testing. We'll also add
a fallback in case somebody tries --inbox-version and fails
due to a newer remote instance of public-inbox.
Eric Wong [Mon, 28 Nov 2022 05:31:32 +0000 (05:31 +0000)]
lei_mirror: simplify v2 code paths
We can simply reuse the parallelization of the manifest
code path for non-manifest v2 clones, now.
Eric Wong [Mon, 28 Nov 2022 05:31:31 +0000 (05:31 +0000)]
lei_mirror: support manifest {references} for v2 epochs
This may be useful in case a v1 inbox gets forked into v2
(untested).
Eric Wong [Mon, 28 Nov 2022 05:31:30 +0000 (05:31 +0000)]
lei_mirror: differentiate -entv vs -ent
It makes the code easier-to-follow when we have a single
versus multiple entities (`v' for vector, à la `argv').
Eric Wong [Mon, 28 Nov 2022 05:31:29 +0000 (05:31 +0000)]
lei_mirror: fix glob semantics to match end-of-path
Globs such as `*/foo' should not match `*/foobar'.
This allows cloning only `git' and not
`gitolite-transparency-log' off lore.
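The end-of-path rule can be sketched in Python as follows. This is a
simplified translation that handles only `*' and `?', and the function
names are hypothetical; the point is the trailing `\Z' anchor, which
prevents a glob from matching a mere prefix of the last path component.

```python
import re


def glob_to_re(glob):
    """Translate a simple glob (only * and ?) into an end-anchored regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # \Z anchors the match at the very end of the path
    return re.compile("".join(parts) + r"\Z")


def glob_matches(glob, path):
    return glob_to_re(glob).search(path) is not None
```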
Eric Wong [Mon, 28 Nov 2022 05:31:28 +0000 (05:31 +0000)]
lei_mirror: require PublicInbox::Lock at use
It's easier to understand why we lazy-load Lock for v2-only
code paths when we require it near its first use.
Eric Wong [Mon, 28 Nov 2022 05:31:27 +0000 (05:31 +0000)]
lei_mirror: do not fetch descriptions if using manifest
If a manifest exists, we can expect the description to always be
present, thus there's no need to make a separate HTTP(S) request
since we can use it as-is from the manifest for v1||coderepos
and strip / \[epoch [0-9]+\]\z/ from v2.
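A small Python sketch of the suffix stripping described above, assuming
the manifest description of an epoch ends in " [epoch N]". Perl's `\z'
corresponds to `\Z' in Python; the function name and sample strings are
made up for illustration.

```python
import re

# trailing " [epoch N]" marker, anchored at the end of the string
EPOCH_SUFFIX = re.compile(r" \[epoch [0-9]+\]\Z")


def strip_epoch_suffix(description):
    """Drop the per-epoch suffix; other descriptions pass through as-is."""
    return EPOCH_SUFFIX.sub("", description)
```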
Eric Wong [Mon, 28 Nov 2022 05:31:26 +0000 (05:31 +0000)]
lei_mirror: defend against infinite loops
A reference chain of 1000 ought to be enough, I think...
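One way to picture such a guard, as a Python sketch under the assumption
that references form a simple name-to-name mapping (the data layout and
names here are hypothetical, not lei_mirror's):

```python
MAX_CHAIN = 1000  # limit taken from the commit message above


def resolve_reference(name, references):
    """Follow name -> references[name] until reaching a repo without one."""
    hops = 0
    while name in references:
        hops += 1
        if hops > MAX_CHAIN:
            # a cycle (or an absurdly long chain) ends here instead of spinning
            raise RuntimeError(f"reference chain too long at {name!r}")
        name = references[name]
    return name
```

A cyclic mapping such as {"a": "b", "b": "a"} raises instead of looping
forever.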
Eric Wong [Mon, 28 Nov 2022 05:31:25 +0000 (05:31 +0000)]
lei_mirror: fix infinite loop in dependency resolution
We need to account for dependencies which are marked `done'.
Eric Wong [Mon, 28 Nov 2022 05:31:24 +0000 (05:31 +0000)]
lei_mirror: allow --epoch on mixed v1/v2 clones
It's entirely possible an instance will have both v1 and v2
inboxes (or v2 inboxes and coderepos). Don't punish --epoch
users by forcing them to run multiple commands.
Eric Wong [Mon, 28 Nov 2022 05:31:23 +0000 (05:31 +0000)]
lei_mirror: reduce scope of v2 lock
Guarding against parallel clones isn't realistic, really, only
setting up all.git, and even then, I'm not 100% sure the lock
is useful.
Eric Wong [Mon, 28 Nov 2022 05:31:22 +0000 (05:31 +0000)]
lei_mirror: retrieve v2 description properly
Eric Wong [Mon, 28 Nov 2022 05:31:21 +0000 (05:31 +0000)]
clone: support --inbox-config option
This allows avoiding 404s when trying _/text/config/raw on code
repositories.
Eric Wong [Mon, 28 Nov 2022 05:31:20 +0000 (05:31 +0000)]
lei_mirror: reduce noise on interrupted clones
We don't need git-config or other commands failing loudly.
`git clone' and subcommands it spawns may still spew, but it's no
worse than interrupting `git clone' itself, now.
We accomplish this by localizing $LIVE (formerly %LIVE) and
detecting when its auto-vivification into a hashref goes
out-of-scope during the `DESTRUCT' ${^GLOBAL_PHASE}.
We can't use ${^GLOBAL_PHASE}, yet, either, since it appeared in
Perl 5.14 and we're still migrating slowly to Perl 5.12 before
going to 5.14.
Eric Wong [Mon, 28 Nov 2022 05:31:19 +0000 (05:31 +0000)]
lei_mirror: support {reference} for v1 manifest clones
This will be generalized to v2, as well.
Eric Wong [Mon, 28 Nov 2022 05:31:18 +0000 (05:31 +0000)]
lei_mirror: initialize placeholders with "head" from manifest
This only affects v2 epochs, but ensures our bases are covered,
at least. We'll have to update PublicInbox::Fetch later to
deal with "head" entries in manifest.js.gz, too.
Eric Wong [Mon, 28 Nov 2022 05:31:17 +0000 (05:31 +0000)]
clone: support --dry-run / -n flag
It still makes HTTP(S) requests to retrieve the manifest or
scrape HTML, but doesn't make permanent changes to the FS
(aside from modifying {acm}time of ${TMPDIR-/tmp}).
Eric Wong [Mon, 28 Nov 2022 05:31:16 +0000 (05:31 +0000)]
lei_mirror: set gitweb.owner from manifest
This is mainly for coderepos, but sometimes public-inboxes
get shared via cgit/gitweb, too.
Eric Wong [Mon, 28 Nov 2022 05:31:15 +0000 (05:31 +0000)]
lei_mirror: load most modules up-front
lei loads LeiMirror itself lazily, anyways, and it only
supports HTTP(S) mirrors, so there's no point in delaying most
of the modules it loads. Some of the inbox-specific and
v2-specific stuff can be lazy-loaded, however, since this
will support mirroring non-inbox repositories, too.
Eric Wong [Mon, 28 Nov 2022 05:31:14 +0000 (05:31 +0000)]
lei_mirror: load File::Path unconditionally
File::Temp already uses it, so there's no sense in conditionally
require-ing it to save startup time.
Eric Wong [Mon, 28 Nov 2022 05:31:13 +0000 (05:31 +0000)]
lei_mirror: consolidate clone process management
This simplifies our code by having fewer places check process
limits and perform reaping. We'll also print command names
immediately before executing, instead of right before waiting
for running processes.
Eric Wong [Mon, 28 Nov 2022 05:31:12 +0000 (05:31 +0000)]
lei_mirror: add a hint for skipped epoch permissions
Some users may think it's a git-specific thing to enable
writability, rather than a *nix permissions thing. Clarify that
it's a standard *nix thing.
Eric Wong [Mon, 28 Nov 2022 05:31:11 +0000 (05:31 +0000)]
lei_mirror: elide description retrieval for v1|coderepo
manifest.js.gz can provide the description without an extra
HTTP(S) request, so attempt to use it whenever we're using
the manifest.
Eric Wong [Mon, 28 Nov 2022 05:31:10 +0000 (05:31 +0000)]
lei_mirror: simplify _get_txt_start callers
We can avoid needless select()-based sleeps by always
using TMPDIR for temporary files, and just slurping the
small config or description file.
This will make it easier to reuse the description from
the manifest in the next commit.
Eric Wong [Mon, 28 Nov 2022 05:31:09 +0000 (05:31 +0000)]
manifest: update module blurb + v5.12
Helps steer new contributors (or forgetful old ones) in the
right direction.
Eric Wong [Mon, 28 Nov 2022 05:31:08 +0000 (05:31 +0000)]
switch inotify/kevent stuff to v5.12
Another tiny step towards eventual startup time improvements
by avoiding strict.pm
Eric Wong [Mon, 28 Nov 2022 05:31:07 +0000 (05:31 +0000)]
lei_mirror: retrieve description text asynchronously, too
We can easily parallelize this, so do it.
Eric Wong [Mon, 28 Nov 2022 05:31:06 +0000 (05:31 +0000)]
lei_mirror: move directory creation to v2-only path
We rely on `git clone' to create the destination directory
for v1 and coderepos, so having it in _try_config_start was
senseless.
Eric Wong [Mon, 28 Nov 2022 05:31:05 +0000 (05:31 +0000)]
lei_mirror: default to single job by default
Parallel git clones are expensive on the server-side, and
smaller machines (which we encourage) can't handle them well.
We'll also set `-q' since parallel clones will have their output step
all over each other.
Eric Wong [Mon, 28 Nov 2022 05:31:04 +0000 (05:31 +0000)]
clone: support parallel v1 clones
This opens the door to parallel cloning of coderepos, too. We
can also get rid of needless AutoReap usage here, since
its usage has been 100% synchronous and not DESTROY-based as
it is in tests.
Eric Wong [Mon, 28 Nov 2022 05:31:03 +0000 (05:31 +0000)]
lei_mirror: rely on global process reaper
We no longer rely on SIGCHLD for predictability, and instead
call waitpid at safe points. This will make it easier for us to
do parallel mirroring of multiple inboxes while preserving
proper dependencies via ->DESTROY callbacks.
Eric Wong [Mon, 28 Nov 2022 05:31:02 +0000 (05:31 +0000)]
lei_mirror: rely on DESTROY to index v2 inbox
This will give us more freedom in upcoming commits
to ensure indexing only happens after all epochs
are cloned.
Eric Wong [Mon, 28 Nov 2022 05:31:01 +0000 (05:31 +0000)]
lei_mirror: async config retrieval for v2 w/ manifest
Another step towards being able to minimize mirror time by
supporting parallelization.
Eric Wong [Mon, 28 Nov 2022 05:31:00 +0000 (05:31 +0000)]
clone: parallelize v2 epoch clones
This is a first step in supporting completely parallelized
clones. Eventually, everything will be parallelized and
dependencies will be managed via callbacks.
Eric Wong [Mon, 28 Nov 2022 05:30:59 +0000 (05:30 +0000)]
clone: support --include and --exclude with multi-clone
These will be handy when someone is interested in a subset of
inboxes on a large hosting site.
Eric Wong [Mon, 28 Nov 2022 05:30:58 +0000 (05:30 +0000)]
clone: support multi-inbox clone
This is to ensure we can do `public-inbox-clone https://yhbt.net/lore'
or `public-inbox-clone https://lore.kernel.org/' and clone all
inboxes (and whatever else git stores).
Eric Wong [Sat, 26 Nov 2022 07:24:02 +0000 (07:24 +0000)]
filter/rubylang: adjust filter for new list software
The host serving ruby-core and ruby-dev no longer sets
X-Mail-Count, but the serial number remains active in
the Subject.
mephi42 [Mon, 28 Nov 2022 20:25:21 +0000 (21:25 +0100)]
nntpd: fix LISTGROUP with range
This reverts
0c62cffc2389 ("nntp: listgroup_range_i: remove useless
`map' op") and adds a test that demonstrates the breakage: the server
returns lines like
ARRAY(0x556dace73f08)
instead of message numbers.
Fixes: 0c62cffc2389 ("nntp: listgroup_range_i: remove useless `map' op")
Eric Wong [Mon, 28 Nov 2022 20:34:06 +0000 (20:34 +0000)]
dskqxs:carp
Eric Wong [Sun, 27 Nov 2022 09:15:47 +0000 (09:15 +0000)]
content_hash: handle References as octets
The alsa-devel archives on lore have some UTF-8 References:
headers, so we need to treat them as octets, again, otherwise
(re)indexing triggers cascading failures.
Fixes: 5198c976ce8b "eml: header_raw converts octets to Perl UTF-8"
Eric Wong [Sat, 26 Nov 2022 09:55:16 +0000 (09:55 +0000)]
examples/nginx_proxy: recommend `proxy_buffering off'
public-inbox-httpd has always been designed to handle slow
clients efficiently via non-blocking sockets and epoll|kqueue.
Thus the proxy buffering capabilities of nginx are a needless
waste of memory and filesystem traffic, and increase response
latency.
nginx does provide an HTTPS-capable reverse-proxy to talk to
varnish, however, any other HTTPS-capable reverse proxy works,
too.
Eric Wong [Fri, 25 Nov 2022 11:44:35 +0000 (11:44 +0000)]
SaPlugin::ListMirror: follow RFC 2919 List-ID rules
List-ID headers are sometimes populated with a descriptive phrase
before the angle-bracketed value, making things difficult to
match.
Tweak our handling to allow checking the angle-bracketed portion
only in accordance with RFC 2919.
Handling of all other headers and senselessly non-bracketed
values for List-ID remains unchanged.
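RFC 2919 defines the header as an optional phrase followed by the
angle-bracketed list identifier. A Python sketch of extracting only the
bracketed portion, with a fallback for non-bracketed values, might look
like this (the function name and sample values are hypothetical, and this
is not the plugin's actual matching code):

```python
import re

# optional descriptive phrase, then <list-id> at the end of the value
_BRACKETED = re.compile(r"<([^<>]*)>\s*\Z")


def list_id_value(header_value):
    """Return the bracketed list-id, or the stripped value if unbracketed."""
    m = _BRACKETED.search(header_value)
    return m.group(1).strip() if m else header_value.strip()
```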
Eric Wong [Thu, 24 Nov 2022 21:31:55 +0000 (21:31 +0000)]
eml: header_raw converts octets to Perl UTF-8
This fixes the display of raw (non-RFC 2047) names and subjects
in HTML message views.
SMTPUTF8 (RFC 6531) allows raw UTF-8 in headers without RFC 2047
encoding, so let Perl handle it as a character sequence for the
rest of our consumers. Thus, the old special case in
PublicInbox::Smsg->populate is no longer necessary and gone.
The one regression noticed so far (and fixed here) is that compressed
IMAP envelope responses still need raw bytes since the zlib
wrapper is designed for octets, not Perl UTF-8 chars. Thus we
reverse utf8::decode with utf8::encode in PublicInbox::IMAP::_esc.
->header_set also forces encoding to bytes, since all existing
callers would either be dealing with ->header_raw results or
be RFC-2047-encoded anyways.
Reindexing is not necessary with this change due to the prior
PublicInbox::Smsg->populate special case.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20221124153715.3nenjpjzj43vqxr2@meerkat.local/
Eric Wong [Wed, 23 Nov 2022 04:09:58 +0000 (04:09 +0000)]
lei_curl: use http.proxy config from git if available
Since HTTP(S) URLs hit by lei or public-inbox-{clone,fetch} are
expected to be git endpoints anyways, fall back to using
http.proxy from git configs to save the user from having to
maintain the same configuration for different things.
Eric Wong [Wed, 23 Nov 2022 04:09:57 +0000 (04:09 +0000)]
config: urlmatch $? does not influence our exits
We don't want to leak $? from `git config' failures into
lei nor public-inbox-* processes.
Eric Wong [Wed, 23 Nov 2022 04:09:56 +0000 (04:09 +0000)]
lei_curl: set --proxy for curl(1) properly
curl(1) doesn't accept `--proxy=' with the `=', apparently :x