Eric Wong [Fri, 26 Feb 2021 09:41:38 +0000 (22:41 -1100)]
lei q: support mbox locking by default
While this diverges from mairix(1) behavior, it's the safer
option. We'll follow Debian policy by supporting fcntl and
dotlocks by default (in that order). Users who do not want
locking can use "--lock=none".
This will be used in a read-only capacity for watching
mailboxes for keyword updates via inotify or EVFILT_VNODE.
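The fcntl-then-dotlock order described above can be sketched as follows. This is a minimal illustration, not lei's implementation: the mbox_lock name and its "methods" parameter are hypothetical, and unlike a real locker it does not retry or time out when the dotlock is already held.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def mbox_lock(path, methods=("fcntl", "dotlock")):
    """Hold the requested locks on `path` for the duration of the block.

    Passing methods=() is the analogue of "--lock=none".
    """
    dotlock = None
    with open(path, "ab") as fh:
        try:
            if "fcntl" in methods:
                # Blocks until an exclusive fcntl-style lock is granted.
                fcntl.lockf(fh, fcntl.LOCK_EX)
            if "dotlock" in methods:
                # O_EXCL makes creation atomic: it fails with EEXIST
                # if another process already holds "$path.lock".
                fd = os.open(path + ".lock",
                             os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
                os.close(fd)
                dotlock = path + ".lock"
            yield fh
        finally:
            # Release in reverse order of acquisition.
            if dotlock is not None:
                os.unlink(dotlock)
            if "fcntl" in methods:
                fcntl.lockf(fh, fcntl.LOCK_UN)
```

Taking the fcntl lock first means a process only creates the dotlock file once it already holds the kernel-level lock.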
Eric Wong [Thu, 25 Feb 2021 10:11:06 +0000 (10:11 +0000)]
lei q: -tt marks direct hits as "flagged"
This can be used to quickly distinguish messages which were
direct hits when doing thread expansion vs messages that
were merely part of the same thread.
This is NOT mairix-derived behavior, but I occasionally found
it useful when looking at results in an MUA to know whether
a message was a direct hit or not.
This makes "-t" consistent with non-"-t" cases as far as keyword
reading goes.
Eric Wong [Thu, 25 Feb 2021 10:11:04 +0000 (10:11 +0000)]
lei import: use --in-format/-F for consistency
Since we recommend $IN_FORMAT:$LOCATION, this is hopefully not
intrusive (not that this is released software, yet). This is
to be consistent with "lei convert" usage.
We'll keep "-f" only for output formats, since that is used
for "lei q" and "lei convert" for outputs.
Eric Wong [Thu, 25 Feb 2021 10:11:03 +0000 (10:11 +0000)]
lei convert: support IMAP output and "-F eml" inputs
eml ("message/rfc822" MIME type) is supported by "lei import",
so it probably makes sense to support via convert, at least
for tests. And IMAP is supported in "lei q -o $MFOLDER",
so this only required renaming {nrd} => {net} and initializing
outputs before augment preparation (creating the IMAP folder).
Eric Wong [Wed, 24 Feb 2021 23:37:18 +0000 (05:37 +0600)]
lei q: auto-memoize remote messages into lei/store
This lets users avoid network traffic on subsequent searches at
the expense of local disk space. --no-import-remote may be
specified to reverse this trade-off for users with little
storage.
Eric Wong [Wed, 24 Feb 2021 23:37:17 +0000 (05:37 +0600)]
lei_external: don't treat IPv6 URLs as globs
IPv6 addresses are hexadecimals and colons inside brackets, so
add some DWIM-ery to ensure we don't attempt to treat addresses
like "http://[dead:beef]/foo/" as a glob.
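One way to picture this check is the sketch below. The looks_like_glob helper and both regexps are assumptions made for illustration, not the code in lei_external: bracketed runs of hex digits, colons and dots containing at least one colon are dropped before looking for glob metacharacters.

```python
import re

# A bracketed host made only of hex digits, colons and dots, containing
# at least one colon (so "[12]" still counts as a glob character class).
_IPV6_HOST = re.compile(r"\[[0-9A-Fa-f.]*:[0-9A-Fa-f:.]*\]")
_GLOB_CHARS = re.compile(r"[*?\[\]{}]")


def looks_like_glob(location):
    """Return True if `location` has glob characters outside an IPv6 host."""
    without_v6 = _IPV6_HOST.sub("", location)
    return _GLOB_CHARS.search(without_v6) is not None
```

With this heuristic, "http://[dead:beef]/foo/" is treated as a literal URL while "http://[dead:beef]/*" is still a glob. A character class such as "[a:b]" would be misread as an address, which is the kind of trade-off DWIM-ery implies.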
Uwe Kleine-König [Wed, 24 Feb 2021 08:54:56 +0000 (09:54 +0100)]
www: use PublicInbox::WwwStream
This prevents the following problem logged to the webserver's error log:
E: Undefined subroutine &PublicInbox::WwwStream::code_footer called at /usr/share/perl5/PublicInbox/WwwListing.pm line 102.
in PublicInbox::ConfigIter=ARRAY(0x557aea68b1a8)::each_section at /usr/share/perl5/PublicInbox/ConfigIter.pm line 37.
Fixes: 7a3946ef122e ("www: support listing of inboxes")
Eric Wong [Tue, 23 Feb 2021 10:01:15 +0000 (04:01 -0600)]
lei q: reduce default lei2mail workers
While disk I/O is typically buffered for good scheduling,
git blob decoding uses a non-trivial amount of CPU time
and it helps to leave some CPU available for it.
Eric Wong [Tue, 23 Feb 2021 10:01:14 +0000 (04:01 -0600)]
lei: support "-C" to chdir in all sub commands
We'll also support "-C" at the end of most commands to give
users a little more flexibility when building command-lines.
This conflicts with "lei daemon-kill -CHLD", so that's
special-cased since "-C" makes no sense with daemon-kill,
anyways.
Unlike "git show", the to-be-implemented "lei show" will diverge
and enable "--find-copies[=<n>]" by default, so "-C[<n>]" won't
be necessary.
Eric Wong [Mon, 22 Feb 2021 11:22:59 +0000 (08:22 -0300)]
lei_auth: trim and remove leftover worker code
LeiAuth is no longer a separate worker process. Instead, it's
used directly by LeiToMail and LeiImport for sharing auth info
from the first worker to the rest of the workers, using
lei-daemon as a message router. So drop the old code to reduce
human cognitive load and interpreter memory overhead.
Eric Wong [Mon, 22 Feb 2021 11:22:57 +0000 (08:22 -0300)]
net_reader: mic_get: reuse connections if cache enabled
We only enable {mic_cached} in WQ workers, and those
aren't expected to fork again going forward. So caching
here avoids a penalty for the non-augmenting (imap_delete_all)
call with "lei q".
Eric Wong [Mon, 22 Feb 2021 11:22:56 +0000 (08:22 -0300)]
lei q: reduce wasted IMAP connection for auth
We can rework the first lei2mail worker to authenticate, and
then share auth info with the rest of the lei2mail workers. As
with "lei import", this uses PktOp and lei-daemon to share
updated credentials between the first and subsequent l2m workers.
Eric Wong [Mon, 22 Feb 2021 11:22:53 +0000 (08:22 -0300)]
lei convert: auth directly from worker process
Since this only has one worker, we can auth directly in the
worker since the convert worker now has access to the script/lei
{sock} for running "git credential".
Eric Wong [Mon, 22 Feb 2021 11:22:52 +0000 (08:22 -0300)]
lei: _lei_cfg: return empty hashref if unconfigured
Existing callers in LeiExternal actually depend on this,
and LeiAuth shouldn't need to be creating a config file
just to do a conversion against an anonymous IMAP server.
Eric Wong [Mon, 22 Feb 2021 11:22:51 +0000 (08:22 -0300)]
lei: keep client {sock} in short-lived workers
For non-persistent workers, there's no harm in keeping the
client socket open. This means we can avoid dancing around
closing it in PublicInbox::LeiAuth::ipc_atfork_child.
Eventually, other WQ workers will trigger "git credential"
spawning in script/lei directly.
Eric Wong [Mon, 22 Feb 2021 06:18:55 +0000 (06:18 +0000)]
lei_store: populate ALL.git/alternates with new epochs
Since eidx_init updates ALL.git/objects/info/alternates, we need
to ensure new epochs we create from LeiStore->importer exist
before eidx_init writes alternates.
Eric Wong [Sun, 21 Feb 2021 18:28:15 +0000 (00:28 +0600)]
lei-daemon: prefer graceful shutdowns
We'll keep the daemon alive as long as a script/lei client
remains connected. This ought to improve user experience
and is in line with what -imapd/-httpd/-nntpd users have
expected over the years.
Kyle Meyer [Sun, 21 Feb 2021 21:46:11 +0000 (16:46 -0500)]
t/www_listing: require grok-pull version 2 or later
The grok-pull-based tests in www_listing are incompatible with
Grokmirror v2 in two ways: the generated configuration format and the
expected exit codes. Update the tests to work with v2, and skip them
for earlier versions.
This was tested with the latest release of Grokmirror, v2.0.7. Note
that the "pull" and "fsck" sections are required even though they're
empty.
Kyle Meyer [Sun, 21 Feb 2021 21:46:10 +0000 (16:46 -0500)]
t/www_listing: reword grok-pull skip message
Make it clear that this skip is because grok-pull isn't available at
all, since the next commit will add another skip for older versions
of Grokmirror.
Eric Wong [Sun, 21 Feb 2021 07:41:34 +0000 (07:41 +0000)]
lei2mail: parallel augment for lock-free stores
This lets us make use of multiple cores on IMAP and Maildir
backed by SSD (or better) storage. This benefits IMAP stores
with high network latency, but may still penalize IMAP servers
with rotational storage.
Eric Wong [Sun, 21 Feb 2021 07:41:32 +0000 (07:41 +0000)]
ipc: support setting a locked number of WQ workers
We can use this to ensure sharded work doesn't do unexpected
things if workers are added/removed. We currently don't
increase/decrease workers once a workqueue is started, but
non-lei code (-httpd/imapd) may start doing so.
This also fixes a bug where lei2mail workers could not
be adjusted via --jobs on the command-line.
Eric Wong [Sun, 21 Feb 2021 07:41:30 +0000 (07:41 +0000)]
ipc: add wq_broadcast
We'll give workqueues a broadcast mechanism to ensure all
workers see a certain message. We'll also tag each worker
with {-wq_worker_nr} in preparation for work distribution.
This is intended to avoid extra connection and fork() costs
from LeiAuth in a future commit.
Eric Wong [Fri, 19 Feb 2021 12:09:54 +0000 (05:09 -0700)]
net_writer: start implementing IMAP write support
Requiring TEST_IMAP_WRITE_URL to be set to a writable IMAP
server URL isn't ideal, but it works for now until we have time
to setup a mock dovecot/cyrus/etc... instance for testing.
Eric Wong [Fri, 19 Feb 2021 12:09:53 +0000 (05:09 -0700)]
net_reader: handle single-message IMAP mailboxes
Due to an off-by-one error, we were unable to read mailboxes
with only a single message of UID:1. Without this fix, the
message with UID:1 could only be read after UID:2 was created;
so there's no permanent data loss as long as a new message
showed up.
This affects all releases of public-inbox-watch with IMAP
support, though it probably went unnoticed because single
message inboxes are rare.
Eric Wong [Fri, 19 Feb 2021 00:58:32 +0000 (00:58 +0000)]
emergency: modernize and reduce syscalls
As with LeiToMail, we'll exclusively rely on O_EXCL and EEXIST
instead of "-f" (stat(2)) for file name collision checking.
Furthermore, we can rely on link(2) error handling instead of
using stat(2) to check the result of link(2).
We'll still keep the hostname in these filenames, but memoize it
on a per-instance basis since hostname changes are rare and we
can assume it won't change between "tmp" and "cur".
We'll also start embedding the PID as {"tmp.$$"} into the file
name to guard against accidental deletion in child processes,
instead of requiring an extra hash lookup.
Finally, avoid multiple getpid(2) syscalls in internal subs
since glibc no longer caches in getpid(3).
We'll also favor constant comparison of $! against EEXIST for
inlining, and stop doing ->autoflush when we only have a single
print + flush.
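The collision handling described above can be sketched in a few lines. This is an illustration under assumptions, not the Emergency.pm code: the deliver function and its "$PID.$SEQ.$HOST" naming scheme are hypothetical. It relies on O_EXCL and link(2) failing with EEXIST instead of checking with stat(2) first.

```python
import itertools
import os
import socket

_HOST = socket.gethostname()  # looked up once; assumed not to change
_SEQ = itertools.count()


def deliver(maildir, data):
    """Write `data` under maildir/tmp, then link it into maildir/cur."""
    pid = os.getpid()  # fetched once per delivery

    def new_name():
        return f"{pid}.{next(_SEQ)}.{_HOST}"

    while True:
        tmp = os.path.join(maildir, "tmp", new_name())
        try:
            # O_EXCL: the kernel reports EEXIST, no prior stat(2) needed.
            fd = os.open(tmp, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            break
        except FileExistsError:
            continue  # name collision: try the next sequence number

    with os.fdopen(fd, "wb") as fh:
        fh.write(data)

    try:
        while True:
            dst = os.path.join(maildir, "cur", new_name())
            try:
                os.link(tmp, dst)  # never overwrites; EEXIST on collision
                return dst
            except FileExistsError:
                continue
    finally:
        os.unlink(tmp)
```

Because both open(2) with O_EXCL and link(2) are atomic, there is no window between a check and the operation it guards.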
Eric Wong [Thu, 18 Feb 2021 20:22:24 +0000 (23:22 +0300)]
lei: consolidate the bulk of the IPC code
The backends for "lei add-external --mirror", "lei convert", and
"lei import" all share a similar pattern for spawning background
workers. Hoist out the common parts to slim down our code base
a bit.
The LeiXSearch and LeiToMail workers for "lei q" remain the
odd duck due to the deep pipelining and parallelization.
Eric Wong [Thu, 18 Feb 2021 20:22:22 +0000 (23:22 +0300)]
lei convert: mail format conversion sub-command
This will make testing IMAP support for other commands easier, as
it doesn't write to lei/store at all. Like the pager and MUA,
"git credential" is always spawned by script/lei (and not
lei-daemon) so it has a controlling terminal for password
prompts.
v2: fix missing requires, correct test ordering
v3: ensure config exists for IMAP auth
Eric Wong [Wed, 17 Feb 2021 10:07:02 +0000 (09:07 -0100)]
tests: setup_public_inboxes: use IMAP-friendly newsgroups
-imapd won't support newsgroups ending with /\.[0-9]+\z/ since
it reserves those for partitioning inboxes into 50K slices.
So bump the home[0-9]+ version and switch to IMAP-friendly
newsgroup names.
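As a quick illustration of the restriction, the pattern above translates directly to Python (Perl's \z corresponds to Python's \Z). The imap_friendly helper and the sample newsgroup names in the usage note are hypothetical.

```python
import re

# Same pattern as /\.[0-9]+\z/: a dot followed only by digits at the end.
_SLICE_SUFFIX = re.compile(r"\.[0-9]+\Z")


def imap_friendly(newsgroup):
    """Return True if the name cannot be confused with an -imapd slice."""
    return _SLICE_SUFFIX.search(newsgroup) is None
```

A name such as "t.v2" passes, while "t.2" would collide with the numeric slice suffix.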
Eric Wong [Thu, 11 Feb 2021 05:57:28 +0000 (12:57 +0700)]
search: query_approxidate: cleanup regexp, more tests
The cleanup doesn't seem to matter; I initially thought I needed
to handle "" (two double quotes) explicitly because that's what
Xapian does to escape a double quote inside a double-quoted
phrase. It turns out we only need to be able to pass phrases
through to Xapian unmodified, and the existing group of
["\x{201c}\x{201d}] is sufficient for our purposes.
Eric Wong [Fri, 12 Feb 2021 07:05:52 +0000 (00:05 -0700)]
mbox_reader: do not chomp non-blank EOL
It's conceivable some cases won't generate an empty line before
an mboxrd or mboxo From_ line. Ensure we can handle that case
and don't leave the Eml->{bdy} without a trailing LF character.
And drop an unnecessary alarm import while we're in the area.
Eric Wong [Fri, 12 Feb 2021 07:05:50 +0000 (00:05 -0700)]
filter/vger: kill trailing newlines aggressively
PublicInbox::MboxReader->(mboxrd|mboxo) only deletes the last
trailing newline, not every single trailing newline like
InboxWritable->import_mbox does.
While testing PublicInbox::MboxReader->mboxrd (next commit) with
scripts/import_vger_from_mbox on the LKML archive I got in 2018 for
v2 development, this difference was responsible for a single
spam message(*) out of 2722831 not being filtered correctly
and returning a different result.
Eric Wong [Wed, 10 Feb 2021 19:57:58 +0000 (18:57 -0100)]
search: use git approxidate in WWW and "lei q --stdin"
This greatly improves the usability of d:, dt:, and rt: search
prefixes for users already familiar with git's "approxidate" feature.
That is, users familiar with the --(since|after|until|before)=
options in git-log(1) and similar commands will be able to use
those dates in the WWW UI.
Eric Wong [Wed, 10 Feb 2021 07:07:49 +0000 (07:07 +0000)]
net_reader: new package split from -watch
We'll be using some of this for IMAP and NNTP support in lei,
too. More will need to be done to improve code sharing and
reusability, soon, but this is a start.
Eric Wong [Wed, 10 Feb 2021 07:07:44 +0000 (07:07 +0000)]
lei *external: glob improvements, ls-external filtering
The "ls-external" command now accepts the same glob patterns used by
"lei q --{include,only,exclude}". If no glob is detected, it
will be treated as a literal substring match (like "grep -F").
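The glob-or-substring fallback might look roughly like the sketch below. It uses Python's fnmatch, whose syntax is not identical to the globs lei accepts, and the filter_externals name is hypothetical. Whether lei anchors a glob to the whole location is also an assumption here.

```python
import fnmatch


def filter_externals(externals, pattern):
    """Filter locations by glob if `pattern` has one, else by substring."""
    if any(c in pattern for c in "*?["):
        # Glob detected: match against the whole location.
        return [e for e in externals if fnmatch.fnmatchcase(e, pattern)]
    # No glob: literal substring match, like "grep -F".
    return [e for e in externals if pattern in e]
```

For example, with the locations "https://example.com/meta/" and "/home/user/inbox", the pattern "*example.com*" selects only the first and the plain string "inbox" selects only the second.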
Eric Wong [Tue, 9 Feb 2021 08:09:37 +0000 (07:09 -0100)]
tests|lei: fixes for TEST_RUN_MODE=0 and lei oneshot
DESTROY callbacks can clobber $?, so we must take care to
preserve it when exiting. We'll also try to make an effort to
ensure better DESTROY ordering and delete as much as possible
before x_it finishes.
We also need to load PublicInbox::Config when setting up
public inboxes.
Eric Wong [Tue, 9 Feb 2021 08:09:34 +0000 (07:09 -0100)]
lei q: prefix --alert ops with ':' instead of '-'
Using dashed keywords confuses the option parser without
"=" signs (and bash completion doesn't yet work with "=").
So use ":" instead of "-" as the prefix for internal ops,
since ":" is just as unlikely to be the first character of
an executable file in a user's $PATH.
Eric Wong [Tue, 9 Feb 2021 08:09:33 +0000 (07:09 -0100)]
use MdirReader in -watch and InboxWritable
MdirReader now handles files in "$MAILDIR/new" properly and
is stricter about what it accepts. eml_from_path is also
made robust against FIFOs while eliminating TOCTOU races
between stat(2) and open(2) calls.
Eric Wong [Tue, 9 Feb 2021 08:09:32 +0000 (07:09 -0100)]
t/run.perl: fix for >128 tests
We need to explicitly close the write-end of the pipe in workers
to ensure they don't prevent each other from seeing EOF.
Also, make a note to keep using the pipe for now since
Linux <3.14 had broken read(2) semantics when file descriptions
are shared across threads/processes.
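The EOF behavior behind this fix can be demonstrated without forking. In the sketch below, a dup(2)'ed descriptor stands in for the copy of the write end that a forked worker would inherit; this is an illustration of pipe semantics, not the t/run.perl code.

```python
import os


def eof_visible(fd):
    """Return True if a non-blocking read on `fd` reports EOF."""
    try:
        return os.read(fd, 1) == b""
    except BlockingIOError:
        # No data yet, but at least one write end is still open.
        return False


r, w = os.pipe()
w_copy = os.dup(w)         # stands in for a worker's inherited write end
os.set_blocking(r, False)  # keep the demonstration from hanging

os.close(w)                          # one holder closes its write end...
eof_before = eof_visible(r)          # ...but the other copy blocks EOF

os.close(w_copy)                     # once every copy is closed,
eof_after = eof_visible(r)           # the reader finally sees EOF
os.close(r)
```

Here eof_before is False and eof_after is True: a reader only sees EOF once every descriptor referring to the write end has been closed.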
Eric Wong [Tue, 9 Feb 2021 08:09:31 +0000 (07:09 -0100)]
lei: split out MdirReader package, lazy-require earlier
We'll do more requires in the top-level lei-daemon process to
save work in workers. We can also work towards aborting on
user errors in lei-daemon rather than worker processes.
"lei import -f mbox*" is finally tested inside t/lei_to_mail.t
Eric Wong [Tue, 9 Feb 2021 08:09:30 +0000 (07:09 -0100)]
git: ->qx: respect caller's $/ in array context
This could lead to bad results when doing ls-tree -z
for v2 import in case there's multiple files. In any case,
the `local $/ = "\0"' in Import.pm is also eliminated to
reduce potential confusion and surprises.
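To show why the record separator matters for "ls-tree -z", here is a small parser for NUL-terminated output. The parse_ls_tree_z name and the sample object IDs in the usage note are hypothetical; the record layout (mode, type and object ID separated by spaces, then a tab and the path) follows the git-ls-tree(1) output format.

```python
def parse_ls_tree_z(out):
    """Split `git ls-tree -z` output (bytes) into (mode, type, oid, path)."""
    entries = []
    for rec in out.split(b"\0"):  # records end with NUL, not LF
        if not rec:
            continue  # skip the empty piece after the final NUL
        meta, path = rec.split(b"\t", 1)
        mode, typ, oid = meta.split(b" ")
        entries.append((mode.decode(), typ.decode(), oid.decode(), path))
    return entries
```

Given two records such as b"100644 blob " + b"a" * 40 + b"\tm\0" followed by a similar one for "d", splitting on NUL yields two entries, whereas splitting on newlines would return the whole output as a single record.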
Eric Wong [Tue, 9 Feb 2021 08:09:29 +0000 (07:09 -0100)]
t/cgi.t: modernizations and style updates
We prefer BAIL_OUT or fail over die in tests (I didn't know
BAIL_OUT existed when I started the project). We can also
depend on IO::Uncompress::Gunzip being available.
We'll keep the cgi_run wrapper since the .cgi could
use some coverage and remove the FIXME note. run_script
makes tests fast enough.