Eric Wong [Mon, 8 Feb 2021 09:05:18 +0000 (23:05 -1000)]
git: implement date_parse method
Users are expected to be familiar with git's "approxidate"
functionality for parsing dates, so we'll expose that
in our UIs. Xapian itself has limited date parsing functionality
and I can't expect users to learn it.
This takes around 4-5ms on my aging workstation, so it'll
probably be made acceptable for the WWW UI, even.
libgit2 has a git__date_parse function which I expect to have
less overhead, but it's only for internal use at the moment.
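As a rough illustration only (Python rather than the project's Perl, with hypothetical function names): git can resolve approxidate strings via "git rev-parse --since=DATE", which prints a "--max-age=EPOCH" line per date. The sketch assumes a git binary in PATH and an existing git directory to point it at.

```python
import subprocess


def parse_max_age(rev_parse_output):
    """Extract epoch seconds from `--max-age=N` lines printed by git rev-parse."""
    prefix = "--max-age="
    return [int(line[len(prefix):])
            for line in rev_parse_output.splitlines()
            if line.startswith(prefix)]


def date_parse(git_dir, *approxidates):
    """Resolve approxidate strings (e.g. "2.weeks.ago") to epoch seconds.

    Hypothetical helper: spawns one git process for all dates given.
    """
    cmd = ["git", "--git-dir", git_dir, "rev-parse"]
    cmd += ["--since=" + d for d in approxidates]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_max_age(out)
```

Spawning a process per call is where the few milliseconds of overhead mentioned above would come from.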
Eric Wong [Mon, 8 Feb 2021 09:05:17 +0000 (23:05 -1000)]
lei: drop BSD::Resource usage
It's no longer necessary with the changes to stop doing
FD passing in our backend.
cf. commits 5180ed0a1cd65139 and 7d440bf3667b8ef5
("lei q: eliminate $not_done temporary git dir hack")
("lei q: reorder internals to reduce FD passing")
Eric Wong [Mon, 8 Feb 2021 09:05:16 +0000 (23:05 -1000)]
lei: avoid racing on unlink + bind + listen
When multiple lei(1) processes are starting in parallel without
lei-daemon already running, it's possible for them to trample
each other's socket path trying to start lei-daemon. Lock
errors.log before unlink/bind/listen. We'll add an extra
connect(2) attempt to check if the starter lost the race.
Without this change, a stress script like the following could
easily cause problems:
lei q -o ~/tmp/a foo ... &
lei q -o ~/tmp/b bar ... &
lei q -o ~/tmp/c quux ... &
lei q -o ~/tmp/d baz ... &
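The lock-then-recheck pattern can be sketched roughly as follows. This is Python, not the actual Perl; connect_or_listen is a hypothetical name, and SOCK_STREAM is an assumption made for portability of the example.

```python
import fcntl
import os
import socket


def connect_or_listen(sock_path, lock_path):
    """Return ("client", sock) if a daemon is listening, else ("listener", sock)."""
    with open(lock_path, "a") as lock_fh:
        # Serialize would-be starters; released when lock_fh is closed.
        fcntl.flock(lock_fh, fcntl.LOCK_EX)
        # Extra connect attempt: another starter may have won the race
        # while we were waiting for the lock.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            client.connect(sock_path)
            return "client", client
        except (FileNotFoundError, ConnectionRefusedError):
            client.close()
        # Nobody is listening, so replacing a stale socket is safe here.
        try:
            os.unlink(sock_path)
        except FileNotFoundError:
            pass
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(sock_path)
        listener.listen(128)
        return "listener", listener
```

Because unlink, bind and listen all happen under the lock, a second starter can only ever observe "no socket" or "listening socket", never a half-initialized one.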
Eric Wong [Mon, 8 Feb 2021 09:05:15 +0000 (23:05 -1000)]
lei: start_pager: drop COLUMNS default
It shouldn't be needed since none of our subcommands will care
or attempt to format output. Once "lei show" is implemented,
we'll run "git show" directly on the result.
Eric Wong [Mon, 8 Feb 2021 09:05:14 +0000 (23:05 -1000)]
ds: improve add_timer usability
Packing args into an arrayref is awkward and we may be using
this API more in lei.
Eric Wong [Mon, 8 Feb 2021 09:05:13 +0000 (23:05 -1000)]
tests: favor IPv6
IPv4 gets plenty of real-world coverage, and apparently there are
Debian buildd hosts which lack IPv4(*). So ensure everything
can work on IPv6 and not cause problems for odd setups.
(*) https://bugs.debian.org/979432
Eric Wong [Mon, 8 Feb 2021 09:05:12 +0000 (23:05 -1000)]
lei q: support --alert=CMD for early MUA users
For --mua users writing to lock-free -o MFOLDER destinations,
we'll keep -WINCH and send an ASCII terminal bell when results
are complete. This is intended to let early MUA spawners know
when lei2mail is done writing results.
We'll also support running arbitrary commands. It may be used
to run play(1) (from SoX), handle pipelines+redirects
(e.g. "/bin/sh -c 'echo search done | wall'") or other commands.
Eric Wong [Mon, 8 Feb 2021 09:05:11 +0000 (23:05 -1000)]
lei q: SIGWINCH process group with the terminal
While using utime on the destination Maildir is enough for mutt
to eventually notice new mail, "eventually" isn't good enough.
Send a SIGWINCH to wake mutt (and likely other MUAs)
immediately. This is more portable than relying on MUAs to
support inotify or EVFILT_VNODE.
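A minimal sketch of the idea in Python (hypothetical names; which directory gets its timestamp bumped and how the process group ID is obtained are assumptions, not taken from the actual code):

```python
import os
import signal


def notify_mua(maildir, pgid):
    """Poke a Maildir and wake any MUA in the given process group."""
    # Bump atime/mtime to "now" so pollers eventually notice new mail.
    os.utime(maildir)
    # SIGWINCH is ignored by default, so processes in the group which
    # do not care about terminal resizes are unaffected.
    os.killpg(pgid, signal.SIGWINCH)
```

An MUA which already redraws on terminal resize will typically rescan its mailbox as a side effect, which is what makes this portable.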
Eric Wong [Mon, 8 Feb 2021 09:05:10 +0000 (23:05 -1000)]
lei_xsearch: quiet Eml warnings from remote mboxrds
This will probably cover full Atom/HTML feed generation or any
outputs which are order-dependent, but those aren't prioritized
at the moment.
Eric Wong [Mon, 8 Feb 2021 09:05:09 +0000 (23:05 -1000)]
lei q: improve remote mboxrd UX + MUA
For early MUA spawners using lock-free outputs, we need to
wait on the startq pipe to silence progress reporting. For
--augment users, we can start the MUA even earlier by
creating Maildirs in the pre-augment phase.
To improve progress reporting for non-MUA (or late-MUA)
spawners, we'll no longer blindly append "--compressed" to the
curl(1) command when POST-ing for the gzipped mboxrd.
Furthermore, we'll overload stringify ('""') in LeiCurl to
ensure the empty -d '' string shows up properly.
v2: fix startq waiting with --threads
mset_progress is never shown with early MUA spawning.
The plan is to still show progress when augmenting and
deduping. This fixes all local search cases.
A leftover debug bit is dropped, too.
Eric Wong [Mon, 8 Feb 2021 09:11:03 +0000 (09:11 +0000)]
INSTALL: depend on Text::ParseWords
It's been distributed with Perl since 1994, and we use it for
both -imapd and lei. It's split out as a separate package in
CentOS 7.x, so we'll depend on it to avoid surprising users
of RPM-based distros.
Eric Wong [Sun, 7 Feb 2021 10:40:02 +0000 (09:40 -0100)]
lei q: fix arbitrary --mua command handling
Perl doesn't seem to warn for shadowed variables, here :x
Eric Wong [Mon, 8 Feb 2021 06:06:51 +0000 (05:06 -0100)]
lei import: support Maildirs
It seems to be working trivially, though I'm probably
going to split out Maildir reading into a separate
package rather than using LeiToMail.
Eric Wong [Sun, 7 Feb 2021 08:52:01 +0000 (08:52 +0000)]
httpd/async: avoid unnecessary on-stack delete
While this doesn't fix a known problem, this was a risky
construct in case somebody uses confess/longmess inside
the user-supplied callback.
cf. commit 0795b0906cc81f40
("ds: guard against stack-not-refcounted quirk of Perl 5")
Eric Wong [Sun, 7 Feb 2021 08:52:00 +0000 (08:52 +0000)]
imap: avoid unnecessary on-stack delete
None of the Content-Type attributes are long-lived
(and unlikely to be memory intensive). While these
callsites won't trigger $DB::args segfaults via
confess or longmess, it'll make future code audits
easier.
cf. commit 0795b0906cc81f40
("ds: guard against stack-not-refcounted quirk of Perl 5")
Eric Wong [Sun, 7 Feb 2021 08:51:56 +0000 (08:51 +0000)]
lei: replace --thread with --threads
Nobody is expected to use long options, but for consistency
with mairix(1), we'll use the pluralized option throughout
(including existing PublicInbox::{Search,SearchView}).
Link: https://public-inbox.org/meta/20210206090119.GA14519@dcvr/
Eric Wong [Sun, 7 Feb 2021 08:51:55 +0000 (08:51 +0000)]
lei: remove --mua-cmd alias for --mua
While "mua-cmd" may be more accurate, nobody is expected
to type 4 extra characters. It's a needless ambiguity
with no precedent or prior art to follow.
Link: https://public-inbox.org/meta/20210206090119.GA14519@dcvr/
Eric Wong [Sun, 7 Feb 2021 08:51:54 +0000 (08:51 +0000)]
lei: more consistent IPC exit and error handling
We're able to propagate $? from wq_workers in a consistent
manner, now.
Eric Wong [Sun, 7 Feb 2021 08:51:53 +0000 (08:51 +0000)]
ipc: wq_do => wq_io_do
We will have a ->wq_do that doesn't pass FDs for I/O.
Eric Wong [Sun, 7 Feb 2021 08:51:52 +0000 (08:51 +0000)]
Revert "ipc: add support for asynchronous callbacks"
This reverts commit a7e6a8cd68fb6d700337d8dbc7ee2c65ff3d2fc1.
It turns out to be unworkable in the face of multiple producer
processes, since the lock we make has no effect when calculating
pipe capacity.
Eric Wong [Sun, 7 Feb 2021 08:51:51 +0000 (08:51 +0000)]
tests: guard setup_public_inboxes for SQLite and Xapian
This will need some work before it's generally applicable
to the rest of our code base.
Eric Wong [Sun, 7 Feb 2021 08:51:50 +0000 (08:51 +0000)]
xapcmd: avoid potential die surprise in children
Make some notes about sub usage; this may be converted
to use workqueues once the cmsg dependency is dropped.
Eric Wong [Sun, 7 Feb 2021 08:51:49 +0000 (08:51 +0000)]
Makefile.PL: depend on IO::Uncompress::Gunzip
It's another part of the Perl standard library and rarely
split out from Perl (though we can't depend on that fact).
Eric Wong [Sun, 7 Feb 2021 08:51:48 +0000 (08:51 +0000)]
ipc: trim down the Storable checks
It's distributed with Perl and our Makefile.PL even declares a
dependency on it, just like Encode and all the Compress::*
stuff.
Eric Wong [Sun, 7 Feb 2021 08:51:47 +0000 (08:51 +0000)]
ipc: do not die inside wq_worker child process
die() in a child zips up the stack into the parent, which is
undesirable behavior. We're going to exit anyways, just warn
and let exit(1) happen due to $@ being set.
Eric Wong [Sun, 7 Feb 2021 08:51:46 +0000 (08:51 +0000)]
spawn_pp: die more consistently in child
The default $SIG{__DIE__} inside a forked child doesn't actually
do what we want it to do. We don't want it to zip up the stack
the parent used, but instead want to exit the child process
after warning.
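The same hazard exists in any fork-based code, so here is a rough Python analogue (not the Perl implementation; run_in_child is a hypothetical name). The child warns and exits instead of letting an exception unwind through frames inherited from the parent:

```python
import os
import sys
import traceback


def run_in_child(func):
    """Run func() in a forked child; return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            func()
            status = 0
        except BaseException:
            # Warn, but never propagate up the stack the parent built.
            traceback.print_exc()
            sys.stderr.flush()
        finally:
            # _exit() skips atexit handlers and destructors which belong
            # to the parent's state.
            os._exit(status)
    _, wstatus = os.waitpid(pid, 0)
    if os.WIFEXITED(wstatus):
        return os.WEXITSTATUS(wstatus)
    return -os.WTERMSIG(wstatus)
```

With this shape, a failure in the child shows up in the parent only as a non-zero exit status.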
Eric Wong [Sun, 7 Feb 2021 08:51:45 +0000 (08:51 +0000)]
lei add-external: handle interrupts with --mirror
This also updates lei_xsearch to follow the same pattern for
stopping curl(1) and tail(1) processes it spawns.
Eric Wong [Sun, 7 Feb 2021 08:51:44 +0000 (08:51 +0000)]
spawn: pi_fork_exec: support "pgid"
We'll be using this to allow the "git clone" process hierarchy
to be killed via Ctrl-C. This also fixes a long-standing bug
in error reporting for the Inline::C version, because we're
actually testing for errors, now!
n.b. strlen(3) is officially async-signal-safe as of
POSIX.1-2016, but I can't think of a reason any earlier
implementation wouldn't be.
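For illustration, the effect of a "pgid" option can be sketched in Python (hypothetical helper names; this uses subprocess rather than pi_fork_exec). The child becomes leader of a new process group, so one signal reaches the whole hierarchy it spawns:

```python
import os
import signal
import subprocess


def spawn_in_own_pgrp(argv):
    """Start argv as the leader of a new process group."""
    # preexec_fn runs in the child between fork and exec;
    # os.setpgrp() is equivalent to setpgid(0, 0).
    return subprocess.Popen(argv, preexec_fn=os.setpgrp)


def stop_group(proc, sig=signal.SIGINT):
    """Signal every process in proc's group, then reap proc."""
    os.killpg(os.getpgid(proc.pid), sig)
    return proc.wait()
```

Signalling the group rather than a single PID matters for commands like "git clone", which run helper processes of their own.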
Eric Wong [Sun, 7 Feb 2021 08:51:43 +0000 (08:51 +0000)]
spawn: pi_fork_exec: restore parent sigmask in child
We continue to unblock SIGCHLD unconditionally, but also
any signals not blocked by the parent (wq_worker).
This will allow Ctrl-C (SIGINT) to stop "git clone" and other
long-running processes, and allow git-clone cleanup to be
performed, once pi_fork_exec supports setpgid(2). This won't
affect existing daemons on systems with signalfd(2) or
EVFILT_SIGNAL at all, since those run with signals blocked
anyways.
Eric Wong [Sat, 6 Feb 2021 12:18:44 +0000 (12:18 +0000)]
lei: remove short switch support for curl(1) options
In particular, -U and -u switches may conflict with diff(1)
options we may need for "lei show" which will use solver
remotely or locally.
Eric Wong [Sat, 6 Feb 2021 12:18:43 +0000 (12:18 +0000)]
lei_curl: replace -K/--config with --curl-config
Seeing --config in the command-line for lei may mislead users
into thinking we support config file overrides that way. Rename
the option to --curl-config and drop the short switch for now.
Eric Wong [Sat, 6 Feb 2021 12:18:42 +0000 (12:18 +0000)]
lei add-external: reject index and remote opts w/o mirror
Option combinations which make no sense should fail
to prevent misunderstandings and avoid surprises.
Eric Wong [Sat, 6 Feb 2021 12:18:41 +0000 (12:18 +0000)]
lei help: split out into separate file
We'll reword and improve formatting with non-breaking spaces
("\xa0") which are only replaced with SP after wrapping.
Some terminology is shortened (e.g. "URL_OR_PATHNAME" => "LOCATION")
to improve formatting.
This also enables completion for -h/--help and lets us
prioritize favored switch names while attempting to
satisfy users relying on muscle memory from other tools.
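The non-breaking space trick can be shown with Python's textwrap, which only breaks on ASCII whitespace. This is a sketch of the technique, not the help formatter itself; wrap_help is a hypothetical name.

```python
import textwrap


def wrap_help(text, width):
    """Wrap text, keeping "\xa0"-joined words together on one line."""
    wrapped = textwrap.fill(text, width=width,
                            break_long_words=False,
                            break_on_hyphens=False)
    # Only after wrapping do the placeholders become ordinary spaces.
    return wrapped.replace("\xa0", " ")


usage = "lei q --only\xa0LOCATION --include\xa0LOCATION --exclude\xa0LOCATION"
print(wrap_help(usage, 30))
```

Each switch stays on the same line as its argument, regardless of where the width limit falls.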
Eric Wong [Sat, 6 Feb 2021 12:18:40 +0000 (12:18 +0000)]
lei: add-external --mirror support
This can be useful for users who want to clone and
mirror an existing public-inbox. This doesn't have
update support, yet, so users will need to run
"git fetch && public-inbox-index" for now.
Eric Wong [Sat, 6 Feb 2021 12:18:39 +0000 (12:18 +0000)]
script/lei: avoid waitpid(-1, ...) to keep tests fast
We only spawn one process to be reaped at the moment. Tests
will run the contents of script/* in the same process if
possible, so any test scripts which spawn -httpd or other
read-only daemons can cause us to stall with waitpid(-1, ...).
Eric Wong [Sat, 6 Feb 2021 12:18:38 +0000 (12:18 +0000)]
treewide: replace confess with croak
The PublicInbox::Eml (and previously Email::MIME) use of confess
was the primary (or only) culprit behind the lei2mail segfaults
fixed by commit 0795b0906cc81f40
("ds: guard against stack-not-refcounted quirk of Perl 5").
We never care about a backtrace when dealing with Eml objects
anyways, so it was just a worthless waste of CPU cycles.
We can also drop confess in a few other places. Since we only
use Perl and Inline::C, users will never be without source
and can replace s/croak/Carp::confess/ on a per-callsite basis
to help report problems.
It's also possible to use PERL5OPT=-MCarp=verbose in the
environment, though that is still potentially risky.
Link: https://public-inbox.org/meta/20210201082833.3293-1-e@80x24.org/
Eric Wong [Sat, 6 Feb 2021 12:18:37 +0000 (12:18 +0000)]
tests: split out lei-daemon.t from lei.t
This makes it easier for hackers to find daemon-specific
tests and forces us to always test both daemon and
oneshot mode.
Eric Wong [Sat, 6 Feb 2021 12:18:36 +0000 (12:18 +0000)]
t/tests: split out setup_public_inboxes sub
We'll probably use this in many more existing places
and likely change non-lei tests to use it.
Eric Wong [Sat, 6 Feb 2021 12:18:35 +0000 (12:18 +0000)]
t/lei-externals: split out into separate test
This is still overloaded with "lei q" stuff, but that's
somewhat inevitable.
Eric Wong [Sat, 6 Feb 2021 12:18:34 +0000 (12:18 +0000)]
tests: add test_lei wrapper, split out t/lei-import.t
This will make it easier to maintain and test lei going forward;
we need to be testing against existing read-only daemons. We'll
also save ourselves some boilerplate by exporting all the
Test::More methods directly in TestCommon.
We'll start using this by splitting out the latest "lei import"
tests into its own file.
Eric Wong [Sat, 6 Feb 2021 12:18:33 +0000 (12:18 +0000)]
lei_query: trim curl options
Get rid of short options which will or may conflict with
some of our own. We may switch over to "git -c http.*"
options since we need to run "git clone" and "git fetch"
anyways.
Eric Wong [Sat, 6 Feb 2021 12:18:32 +0000 (12:18 +0000)]
init: lowercase -j for --jobs
This is taken from common implementations of make(1)
and only affected people using the command-line help
output.
Eric Wong [Sat, 6 Feb 2021 12:18:31 +0000 (12:18 +0000)]
lei: abort lei_import worker on client abort
We'll stuff all the common wq key fields into the
@WQ_KEYS array so it's easier to keep track of what
to kill or reap.
Eric Wong [Sat, 6 Feb 2021 12:18:30 +0000 (12:18 +0000)]
lei: fix completion of --no-kw / --no-keywords
We did not complete --no-* flags properly when multiple options
are allowed.
Eric Wong [Sat, 6 Feb 2021 12:18:29 +0000 (12:18 +0000)]
lei: favor "keywords" over "flags", test --no-kw
JMAP brain says "keywords", IMAP brain says "flags";
JMAP brain wins today.
Since "keywords" is a bit long, support "kw" as a shortcut since
there's no conflict and "kw:" will be our search prefix for
looking up messages by keyword.
Eric Wong [Sat, 6 Feb 2021 12:18:28 +0000 (12:18 +0000)]
lei_overview: drop unnecessary autoflush call
This was actually causing xt/lei-sigpipe.t failures,
presumably due to reused/recycled workers with many
externals.
Eric Wong [Fri, 5 Feb 2021 00:13:54 +0000 (05:13 +0500)]
httpd/async: set O_NONBLOCK correctly
While Perl tie is nice for some things, getting
IO::Handle->blocking to work transparently with it doesn't
seem possible at the moment.
Add some examples in t/spawn.t for future hackers.
Fixes: 22e51bd9da476fa9 ("qspawn: switch to ProcessPipe via popen_rd")
Eric Wong [Thu, 4 Feb 2021 09:59:30 +0000 (00:59 -0900)]
lei import: initial implementation
Only tested with .eml files so far, but Maildir + IMAP
will be supported.
Eric Wong [Thu, 4 Feb 2021 09:59:29 +0000 (00:59 -0900)]
lei_xsearch: drop unused imports
Reaping is handled by the parent PublicInbox::IPC, and we
have no business using PublicInbox::Import since LeiXSearch
won't write to git directly (it will write via LeiStore).
Eric Wong [Thu, 4 Feb 2021 09:59:28 +0000 (00:59 -0900)]
lei_query: remove uneeded dwaitpid import
All process management is handled elsewhere.
Eric Wong [Thu, 4 Feb 2021 09:59:27 +0000 (00:59 -0900)]
lei q: eliminate $not_done temporary git dir hack
Another step towards simplifying lei internals.
None of our current uses of ->wq_do involve FD passing, and the
plan is to only rely on FD passing between lei-daemon and lei(1).
Internally, it ought to be possible for lei-daemon internal bits
to be ordered properly to not need FD passing.
Eric Wong [Thu, 4 Feb 2021 09:59:26 +0000 (00:59 -0900)]
eml: handle warning ignores for lei
There's nothing we can do about bad emails in our search
results, so quiet things down and don't fight the MUA for
the terminal.
Eric Wong [Thu, 4 Feb 2021 09:59:25 +0000 (00:59 -0900)]
lei q: reinstate early MUA spawn for Maildir
Once all files are written, we can use utime() to poke Maildirs
to wake up MUAs that fail to account for nanosecond timestamp
resolution.
Eric Wong [Thu, 4 Feb 2021 09:59:24 +0000 (00:59 -0900)]
lei q: only start pager if output is to stdout
No need to be starting a pager if we're writing to a regular file.
Eric Wong [Thu, 4 Feb 2021 09:59:23 +0000 (00:59 -0900)]
lei q: reorder internals to reduce FD passing
While FD passing is critical for script/lei <=> lei-daemon,
lei-daemon doesn't need to use it internally if FDs are
created in the proper order before forking.
Eric Wong [Thu, 4 Feb 2021 09:59:22 +0000 (00:59 -0900)]
ipc: localize fields assignment
We don't want circular references giving surprising behavior
during worker exit.
Eric Wong [Thu, 4 Feb 2021 09:59:21 +0000 (00:59 -0900)]
lei q: delay worker spawn
Now that --stdin support is sorted, we can delay spawning
workers until we know the query is ready-to-run.
Eric Wong [Thu, 4 Feb 2021 02:10:07 +0000 (02:10 +0000)]
t/lei: skip "lei q" tests on missing dependencies
... for now. It's probably possible to just use send() and
recv() without CMSG_* eventually.
Eric Wong [Thu, 4 Feb 2021 02:06:54 +0000 (02:06 +0000)]
pkt_op: do not exit subroutine via "next"
"next" apparently doesn't work in "do {} while" loops,
so just use "while" as it makes no difference, here.
Kyle Meyer [Thu, 4 Feb 2021 02:54:46 +0000 (21:54 -0500)]
wwwaltid: add missing word to instructions
Kyle Meyer [Thu, 4 Feb 2021 02:54:45 +0000 (21:54 -0500)]
www: call curl with -d '' in the altid instructions
Nginx doesn't appear to be happy with just -XPOST, so use -d '' to
avoid potential confusion about why the instructions aren't working.
cf. commit 533e1234bc03a1ca8754d249aa8c2ce157e26780
(lei_xsearch: use curl -d '' for nginx compatibility, 2021-01-24)
Eric Wong [Wed, 3 Feb 2021 21:51:44 +0000 (15:51 -0600)]
tests: guard against missing DBD::SQLite
The features we use for SharedKV could probably be implemented
with GDBM_File or SDBM_File, but that doesn't seem worth it at
the moment since we depend on SQLite elsewhere.
Eric Wong [Wed, 3 Feb 2021 21:51:43 +0000 (15:51 -0600)]
doc: update dependencies (+Storable, Data::Dumper)
The new IPC stuff doesn't work without Storable or Sereal.
Storable is part of the standard library since Perl 5.8, so
we'll put a hard dependency on it for distros that package
it separately.
Data::Dumper is also part of the standard library, and
PublicInbox::MboxReader uses it, and it's frequently useful
during development.
We'll also trim down INSTALL for standard library modules so
it's hopefully less daunting for new users.
Development dependencies are noted in HACKING, now.
Email::MIME is only used for maintainer tests, so it's only
documented in HACKING.
Eric Wong [Wed, 3 Feb 2021 21:51:42 +0000 (15:51 -0600)]
spawn: merge common C code together
There'll probably be more things which work on both GNU and
*BSD systems which we don't need separate strings for.
Eric Wong [Wed, 3 Feb 2021 21:51:41 +0000 (15:51 -0600)]
HACKING: use "just-ahead-of-time" to describe Inline::C
Inline::C works during module load time, so "just-ahead-of-time"
is a better description of it than "just-in-time". I don't
think "JAOT" is a well-known enough acronym, so it's worth
spelling it out.
Eric Wong [Wed, 3 Feb 2021 08:11:43 +0000 (22:11 -1000)]
lei q: support reading queries from stdin
This will be useful on shared machines when a user doesn't want
search queries visible to other users looking at the ps(1)
output or similar.
Eric Wong [Wed, 3 Feb 2021 08:11:42 +0000 (22:11 -1000)]
lei: use sleep(1) loop for infinite sleep
Perl may internally race and miss signals due to a lack of
self-pipe / eventfd / signalfd / EVFILT_SIGNAL usage. While our
event loop paths avoid these problems by using signalfd or
EVFILT_SIGNAL, these sleep() calls are not within the event loop.
Eric Wong [Wed, 3 Feb 2021 08:11:41 +0000 (22:11 -1000)]
lei add-external: completion for existing URL basenames
Given the presence of one external on a certain host or prefix
path, it's logical other inboxes would share a common prefix.
For bash users, attempt to complete that using the "-o nospace"
option of bash.
Eric Wong [Wed, 3 Feb 2021 08:11:40 +0000 (22:11 -1000)]
lei: help starts pager
Because some commands have many options which take up
multiple screens.
Eric Wong [Wed, 3 Feb 2021 08:11:39 +0000 (22:11 -1000)]
lei: complete basenames for include|exclude|only
This will make it even easier for RSI-afflicted users to use,
since many externals may share a common prefix.
Eric Wong [Wed, 3 Feb 2021 08:11:38 +0000 (22:11 -1000)]
lei q: -I/--exclude/--only support globs and basenames
We can do basename matching when it's unambiguous. Since '*?[]'
characters are rare in URLs and pathnames, we'll do glob
matching by default and support a (curl-inspired) --globoff/-g
option to disable globbing.
And fix --exclude while we're at it.
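The matching order described above might look like the following Python sketch. It is an assumption-laden illustration: match_externals is a hypothetical name, the example locations are made up, and the real code may order or report things differently.

```python
import fnmatch
import posixpath


def match_externals(pattern, externals, globoff=False):
    """Select externals by exact name, unambiguous basename, or glob."""
    exact = [e for e in externals if e == pattern]
    if exact:
        return exact
    by_base = [e for e in externals
               if posixpath.basename(e.rstrip("/")) == pattern]
    if len(by_base) == 1:
        return by_base
    if len(by_base) > 1:
        raise ValueError("ambiguous basename: " + pattern)
    # Glob only when the pattern contains glob characters and
    # globbing has not been disabled.
    if not globoff and any(c in pattern for c in "*?[]"):
        return [e for e in externals if fnmatch.fnmatchcase(e, pattern)]
    return []


# Hypothetical externals for demonstration only.
EXTERNALS = [
    "https://example.org/meta/",
    "https://example.org/git/",
    "/home/user/mail/lkml",
]
```

For example, match_externals("lkml", EXTERNALS) selects the local path by basename, while "https://example.org/*" selects both remote entries unless globoff is set.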
Eric Wong [Wed, 3 Feb 2021 08:11:37 +0000 (22:11 -1000)]
lei: propagate curl errors, improve internal consistency
IO::Uncompress::Gunzip seems to be losing $? when closing
PublicInbox::ProcessPipe. To work around this, do a synchronous
waitpid ourselves to force proper $? reporting, and update tests
to use the new --only feature for testing invalid URLs.
This improves internal code consistency by having {pkt_op}
parse the same ASCII-only protocol script/lei understands.
We no longer pass {sock} to worker processes at all,
further reducing FD pressure on per-user limits.
Eric Wong [Wed, 3 Feb 2021 08:11:36 +0000 (22:11 -1000)]
lei: err: avoid uninitialized variable warnings
Eric Wong [Wed, 3 Feb 2021 08:11:35 +0000 (22:11 -1000)]
pkt_op: rely on DS::in_loop global
No reason to check for $lei->{oneshot} here.
Eric Wong [Wed, 3 Feb 2021 08:11:34 +0000 (22:11 -1000)]
lei: further reduce lei2mail FD pressure
We don't need to be sending errors directly to the client, but
instead go through lei-daemon or the top-level one-shot process.
Eric Wong [Wed, 3 Feb 2021 08:11:33 +0000 (22:11 -1000)]
lei: reduce FD pressure from lei2mail worker
lei2mail doesn't need stdin anymore, so we can use the [0] slot
for the $not_done keepalive purposes.
Eric Wong [Tue, 2 Feb 2021 11:47:02 +0000 (11:47 +0000)]
lei q: support --jobs [SEARCHERS],[WRITERS]
This comma-delimited parameter allows controlling the number of
lei_xsearch and lei2mail worker processes. With the change
to make IPC wq_* work use the event loop, it's now safe to
run fewer worker processes for searching with no risk of
deadlocks.
MAX_PER_HOST isn't configurable yet for remote hosts,
and maybe it shouldn't be due to potential for abuse.
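A possible reading of the [SEARCHERS],[WRITERS] syntax, sketched in Python. parse_jobs is a hypothetical name, and treating an omitted field as "use the default" (None here) is an assumption.

```python
def parse_jobs(spec):
    """Parse "[SEARCHERS],[WRITERS]" into a (searchers, writers) tuple.

    An omitted field is returned as None so the caller can pick a default.
    """
    parts = spec.split(",")
    if len(parts) > 2:
        raise ValueError("expected [SEARCHERS],[WRITERS]: " + spec)

    def num(field):
        if field == "":
            return None
        n = int(field)
        if n <= 0:
            raise ValueError("job count must be positive: " + field)
        return n

    searchers = num(parts[0])
    writers = num(parts[1]) if len(parts) == 2 else None
    return searchers, writers
```

So "--jobs 4,2" would mean four search workers and two writers, while "--jobs ,2" leaves the searcher count at its default.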
Eric Wong [Tue, 2 Feb 2021 11:47:01 +0000 (11:47 +0000)]
lei q: tidy up progress reporting
We won't be reporting progress when output is going to stdout
since it can clutter up the terminal unless stderr != stdout,
which probably isn't worth checking.
We'll also use a more agnostic mset_progress which may
make it easier to support worker-less invocations.
Eric Wong [Tue, 2 Feb 2021 11:47:00 +0000 (11:47 +0000)]
lei_overview: avoid unnecessary {l2m} delete
We may reuse these objects in the non-worker code paths.
Eric Wong [Tue, 2 Feb 2021 11:46:59 +0000 (11:46 +0000)]
doc: lei-q: note "-a" and link to Xapian QueryParser
"-a" is supported by mairix, too. We should also note somewhere
the query parsing features supported by Xapian.
Eric Wong [Tue, 2 Feb 2021 11:46:58 +0000 (11:46 +0000)]
lei_xsearch: ensure curl.err and tail(1) cleanup happens
We can safely rely on exit(0) here when interacting with curl(1)
and git(1), unlike query workers which hit Xapian directly,
where some badness happens when hit with a signal while
retrieving an mset.
Eric Wong [Tue, 2 Feb 2021 11:46:57 +0000 (11:46 +0000)]
pktop: fix potential undefined var
In case we have other bugs in our code.
Eric Wong [Tue, 2 Feb 2021 11:46:56 +0000 (11:46 +0000)]
cmd_ipc4: fix comments and formatting
Eric Wong [Tue, 2 Feb 2021 11:46:55 +0000 (11:46 +0000)]
lei q: do not leave temporary files after oneshot exit
Avoid on-stack shortcuts which may prevent destructors from
firing since we're not inside the event loop. We'll also tidy
up the unlink mechanism in LeiOverview while we're at it.
Eric Wong [Tue, 2 Feb 2021 11:46:54 +0000 (11:46 +0000)]
lib: explicitly distinguish oneshot use
The daemon must not be fooled into thinking it's in oneshot
after a lei client disconnects and erases {sock}.
Eric Wong [Tue, 2 Feb 2021 11:46:53 +0000 (11:46 +0000)]
lei_xsearch: truncate curl stderr after reading it
We may have further URLs to read in that process, so ensure
we don't end up having tail send stale data.
Eric Wong [Tue, 2 Feb 2021 11:46:52 +0000 (11:46 +0000)]
lei: q: shell completion for --(include|exclude|only)
Because .onion URL names are long!
Eric Wong [Tue, 2 Feb 2021 11:46:51 +0000 (11:46 +0000)]
lei: complete: do not complete non-arg options w/ help text
Some of our command-line switches take no arguments, and need
no completion for those arguments.
Eric Wong [Tue, 2 Feb 2021 11:46:50 +0000 (11:46 +0000)]
lei q: support --only, --include and --exclude
-I is short for --include since it's standard for C compilers
(along with Perl and Ruby). There are no single-character
shortcuts for --exclude or --only, since I don't expect
--exclude to be used very often and --only is already short (and
will support shell completion).
Eric Wong [Tue, 2 Feb 2021 11:46:49 +0000 (11:46 +0000)]
lei q: emit progress and counting via PktOp
Sometimes it can be confusing for "lei q" to finish writing to a
Maildir|mbox and not know if it did anything. So show some
per-external progress and stats.
These can be disabled via the new --quiet/-q switch.
We differ slightly from mairix(1) here, as we use stderr
instead of stdout for reporting totals (and we support
parallel queries from various sources).
Eric Wong [Tue, 2 Feb 2021 11:46:48 +0000 (11:46 +0000)]
lei_query: default to 10000 messages as documented
Otherwise, we were only getting 50 matches without (-t)
thread expansion.
Eric Wong [Tue, 2 Feb 2021 11:46:47 +0000 (11:46 +0000)]
lei: switch to use SEQPACKET socketpair instead of pipe
This will allow us to use larger messages and do progress
reporting to accumulate in the main daemon.
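To show why SEQPACKET helps here: unlike a pipe, it preserves message boundaries, so each recv() returns exactly one message. A small Python demonstration (assumes a platform such as Linux where AF_UNIX supports SOCK_SEQPACKET; the function name is hypothetical):

```python
import socket


def seqpacket_demo():
    """Send two messages and show that each arrives as its own unit."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        a.send(b"first")
        a.send(b"second message")
        # With a pipe, one read could return both writes run together;
        # here each recv() yields one complete message.
        return [b.recv(65536), b.recv(65536)]
    finally:
        a.close()
        b.close()
```

The receiver needs no framing or length prefix, which keeps progress messages from several workers simple to accumulate.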
Eric Wong [Mon, 1 Feb 2021 08:28:33 +0000 (22:28 -1000)]
doc: note optional BSD::Resource use
We've actually been capable of using this since 2019(*) in our
spawn code for PSGI limiters. And it's been used since 2016 in
our tests. It's a dependency of SpamAssassin, and Danga::Socket
used it, too.
(*) commit 721368cd04bfbd03c0d9173fff633ae34f16409a
("spawn: support RLIMIT_CPU, RLIMIT_DATA and RLIMIT_CORE")
Eric Wong [Mon, 1 Feb 2021 08:28:32 +0000 (22:28 -1000)]
lei: avoid ETOOMANYREFS, cleanup imports
As with PublicInbox::IPC, we'll attempt to bump RLIMIT_NOFILE
and transparently work around ETOOMANYREFS. If that fails,
we'll give the user a hint to bump RLIMIT_NOFILE since
ETOOMANYREFS is an uncommon error which users may be unfamiliar
with.
Found while stress testing for segfaults.
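Bumping the soft RLIMIT_NOFILE up to the hard limit can be sketched in Python as below. This is only an illustration of the limit-raising step with a hypothetical function name; the ETOOMANYREFS retry logic itself is not shown.

```python
import resource


def bump_nofile():
    """Raise the soft RLIMIT_NOFILE to the hard limit if possible.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Could not raise it; a caller might hint that the user
            # should bump RLIMIT_NOFILE themselves.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

No extra privileges are needed for this, since an unprivileged process may raise its soft limit as far as the hard limit.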
Eric Wong [Mon, 1 Feb 2021 08:28:31 +0000 (22:28 -1000)]
ds: next_tick: avoid $_ in top-level loop iterator
$_ at the top of a potentially deep stack below may cause
surprising behavior as I experienced with ExtSearchIdx. In the
future, we'll limit our $_ usage to easily-auditable bits (e.g.
map, grep, and small for loops).
Eric Wong [Mon, 1 Feb 2021 08:28:30 +0000 (22:28 -1000)]
ds: guard against stack-not-refcounted quirk of Perl 5
The Perl 5 stack is weakly-referenced for performance reasons.
This means it's possible for items in the stack to be freed
while executing further down the stack.
In lei (and perhaps public-facing read-only daemons in the
future), we'll fork and call PublicInbox::DS->Reset in the child
process. This causes %DescriptorMap to be clobbered, allowing
the $DescriptorMap{$fd} arg to be freed inside the child
process.
When Carp::confess or Carp::longmess is called to generate a
backtrace, it may access the @DB::args array. This array access
is not protected by reference counting and is known to cause
segfaults and other weird errors.
While the caller of an unnecessary Carp::confess may be
eliminated in a future commit, we can't guarantee our
dependencies will be free of @DB::args access attempts
in the future.
So guard against this Perl 5 quirk by defensively bumping the
refcount of any object we call ->event_step on.
cf. https://rt.perl.org/Public/Bug/Display.html?id=131046
https://github.com/Perl/perl5/issues/15928
Eric Wong [Mon, 1 Feb 2021 08:28:29 +0000 (22:28 -1000)]
import: reap git-config(1) synchronously
This avoids a zombie if another step of the event loop
takes too long.
Eric Wong [Mon, 1 Feb 2021 08:28:28 +0000 (22:28 -1000)]
sharedkv: do not set cache_size by default
These DBs will probably be too small to be worth increasing the
cache size of.
Eric Wong [Mon, 1 Feb 2021 08:28:27 +0000 (22:28 -1000)]
lei_to_mail: reduce spew on Maildir removal
At most, we'll only warn once per worker when a Maildir
disappears from under us. We'll also use the '!' OpPipe
to note the exceptional condition, and use '|' to SIGPIPE
so it'll be a bit easier for hackers to remember.
Eric Wong [Mon, 1 Feb 2021 08:28:26 +0000 (22:28 -1000)]
sharedkv: use lock_for_scope_fast
This allows us to avoid repeated open() and close() syscalls
and speeds up the new xt/stress-sharedkv.t maintainer test
by roughly 7%.