Eric Wong [Tue, 13 Apr 2021 10:54:42 +0000 (10:54 +0000)]
lei_xsearch: use per-external queries when not sorting
We only need the combined mset query when we care about sort
order. When writing to --output destinations intended for MUA
consumption, sort order is irrelevant as MUAs are expected to
offer their own sorting, so run queries to each external in
parallel.
This prepares us for docid-sort-based saved search support.
It will also become faster than the combined mset query for
users with many externals due to current Xapian exhibiting poor
performance with many shards (the same reason -extindex exists).
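
The trade-off described above can be sketched roughly as follows, in Python rather than the project's Perl. This is only an illustration under stated assumptions: `query_external`, the in-memory `externals` mapping, and the `rank` field are hypothetical stand-ins and not lei's API. When sort order matters, results are gathered and ranked together; otherwise each external is independent and can be queried in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def query_external(name, docs, term):
    """Hypothetical per-external search: return matches tagged with their source."""
    return [dict(doc, external=name) for doc in docs if term in doc["subject"]]

def search(externals, term, want_sort):
    """Query every external; only build a combined, ordered result when sorting."""
    if want_sort:
        # Combined path: gather everything, then rank once across all externals.
        hits = []
        for name, docs in externals.items():
            hits.extend(query_external(name, docs, term))
        return sorted(hits, key=lambda d: d["rank"], reverse=True)
    # Unsorted path: externals are independent, so query them in parallel.
    with ThreadPoolExecutor(max_workers=len(externals) or 1) as pool:
        futures = [pool.submit(query_external, n, d, term)
                   for n, d in externals.items()]
        return [hit for f in futures for hit in f.result()]
```

In the unsorted path the caller gets whatever each external returns, which is acceptable when the consumer (an MUA, per the message above) does its own sorting.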
Eric Wong [Sun, 11 Apr 2021 05:32:55 +0000 (05:32 +0000)]
www: do not obfuscate addresses in URLs
As they are likely Message-IDs. If an email address ends up in
a URL, then it's likely public, so there's even less reason to
obfuscate that particular address.
[km: add xt/perf-obfuscate.t]
[ew: modernize perf test (5.10.1), use diag instead of print]
This version of the patch avoids the massive slowdown noted by Kyle in
<https://public-inbox.org/meta/87wnt9or6t.fsf@kyleam.com/>.
Performance remains roughly the same, if not slightly faster
(which may be due to me testing this on a busy server). Results
from xt/perf-obfuscate.t against 6078 messages on a local mirror
of <https://public-inbox.org/meta/>:
before: 6.67 usr + 0.04 sys = 6.71 CPU
after: 6.64 usr + 0.04 sys = 6.68 CPU
import: convert init.defaultBranch to fully qualified ref
init.defaultBranch expects a branch name, not a fully qualified ref.
git-init prepends "refs/heads/" automatically and unconditionally.
PublicInbox::Import::default_branch, however, incorrectly passes on
the init.defaultBranch value as is, leading to it being used in spots
where a fully qualified ref is required. For example, with an
init.defaultBranch value of "master", public-inbox-index for a v2
repository would lead to an all.git repository where HEAD's content is
"ref: master" instead of "ref: refs/heads/master".
Prepend "refs/heads/" to the incoming init.defaultBranch value.
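
A minimal sketch of the fix described above, written in Python for illustration (the real change is in the Perl module PublicInbox::Import). The function names here are hypothetical; the only behavior taken from the message is that "refs/heads/" is prepended to the configured branch name unconditionally, as git-init does.

```python
def qualify_branch(name):
    """Turn an init.defaultBranch value (a branch name) into a fully qualified ref."""
    # Prepended unconditionally, matching what the message says git-init does.
    return "refs/heads/" + name

def head_content(default_branch):
    """Content of a symbolic HEAD file pointing at the default branch."""
    return "ref: " + qualify_branch(default_branch)
```

With an init.defaultBranch value of "master", `head_content("master")` yields "ref: refs/heads/master" rather than the broken "ref: master".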
Eric Wong [Mon, 5 Apr 2021 10:27:52 +0000 (10:27 +0000)]
lei q: fix auth IMAP --output with remote mboxrd
IMAP authentication info is only shared amongst lei2mail workers,
so we must ensure all IMAP writes go through lei2mail workers
even if we don't have to access the mail through git.
This allows us to decouple the latency of the remote mboxrd from
the latency of the IMAP --output at the expense of extra IPC
overhead within our own processes.
Eric Wong [Mon, 5 Apr 2021 10:27:51 +0000 (10:27 +0000)]
lei_to_mail: improve comments and reduce LoC
We don't need to waste LoC on corner cases, single-use internal
subs, or restoring SIG{__WARN__} when a process exits. All that
extra code contributes to memory use and startup time, especially
for users who can't use FD passing.
Eric Wong [Sun, 4 Apr 2021 17:38:07 +0000 (22:38 +0500)]
lei_search: ignore Resent-Message-ID for indexing
It currently conflicts with the way OverIdx and SearchIdx
index messages, ultimately leading to violating a NOT NULL
constraint on id2num.id in over.sqlite3.
We may allow searching Resent-* fields separately, though I'm
not sure how useful it'll be.
Since every command that writes to lei/store calls ->done
to commit its output, we can rely on that to return a
pathname for a readable file with errors in it.
Errors can still get crossed up if multiple lei commands
are writing to the store at once, but this reduces the delay
in seeing them and ensures they won't be seen when somebody
is attempting to use shell completion.
Eric Wong [Sat, 3 Apr 2021 10:48:26 +0000 (10:48 +0000)]
lei: improve handling of Message-ID-less draft messages
We need a stable fallback time for digest2mid in the presence
of messages without Received/Date headers. Furthermore, we
must avoid using uninitialized smsg->{mid} when parsing
References for draft replies.
Eric Wong [Sat, 3 Apr 2021 01:37:32 +0000 (22:37 -0300)]
lei q: don't show remote progress if MUA is running
Remote results can safely use the same mset progress reporting
as local results, despite not knowing the size of the result
set. We're assuming terminal MUAs, for now.
Eric Wong [Sat, 3 Apr 2021 02:24:24 +0000 (02:24 +0000)]
lei tag: fix tagging of IMAP inputs
We need net_merge_all and to lock the number of worker jobs.
Parallel inputs are not supported, yet (are they needed? I don't
expect this to be used for multiple files very often...).
Eric Wong [Sat, 3 Apr 2021 02:24:23 +0000 (02:24 +0000)]
lei q: ensure wq workers shutdown on IMAP auth failures
Leaving workers running after auth failures is bad and messy, so
clean up our process management to have consistent worker
teardowns. Improve error reporting, too, instead of letting
Mail::IMAPClient->exists fail due to undef.
Eric Wong [Fri, 2 Apr 2021 09:42:54 +0000 (05:42 -0400)]
lei: fix git-credential handling
I completely forgot about git-credential prompting when
making lei background the client process for MUA.
Now it backgrounds itself only for the MUA when no FDs are
passed, since the MUA is the final command run. Otherwise, it
relies on FD passing as before.
Fixes: c790a75439f3a1db ("script/lei: background ourselves on MUA/pager exec")
Eric Wong [Thu, 1 Apr 2021 12:10:41 +0000 (17:10 +0500)]
lei_store: quiet down git user info being unset
lei_store contents aren't intended to become public, so there's
no point in nagging users for their email address for git
committer information like git does.
Eric Wong [Thu, 1 Apr 2021 09:32:38 +0000 (02:32 -0700)]
lei sucks: sub-command to aid bug reporting
It's a bit of an Easter egg, though it's not possible to hide those
in Free Software... Anyways, it doesn't cost us an entry in %CMD
of LEI.pm and anybody frustrated enough with lei just might type
"lei sucks" on the command-line :>
Eric Wong [Wed, 31 Mar 2021 23:29:36 +0000 (23:29 +0000)]
script/lei: background ourselves on MUA/pager exec
This ought to give the MUA or pager exclusive access to the
controlling terminal. The downside is we can only exec the
pager or MUA once per invocation, but I can't imagine a valid
case for running those things multiple times, either.
Note: I'm no expert when it comes to terminal control matters,
but this allows a Ctrl-Z-ed mutt instance to come back and is
a nice code reduction, as well.
Eric Wong [Wed, 31 Mar 2021 01:53:18 +0000 (06:53 +0500)]
lei blob: "--mail" disables solver, use --include/only
Assume a user specifying --mail doesn't want to spend cycles
reconstructing a blob from a code repo. Also, don't require
users to use add-external or a previous -I or --only to ready an
external for use with ale.git.
Eric Wong [Wed, 31 Mar 2021 00:41:09 +0000 (00:41 +0000)]
doc: lei-overview: favor Maildir for mutt examples
mboxes are generally horrible for interactive read-write use due
to locking. Describe our parallel behavior with mutt, since
writing mail can take a long while and being able to read
results as they're written is nice.
We'll also use a gzipped mboxrd for the import example, since
we can decompress gzipped mboxrds automatically, now.
Eric Wong [Wed, 31 Mar 2021 00:41:08 +0000 (00:41 +0000)]
doc: add lei-mail-formats(5) manpage
While plenty of online documentation exists, it's good to have
a locally-available summary for users to look at offline.
Fix a URL in Watch.pm while we're at it, too.
Eric Wong [Tue, 30 Mar 2021 09:39:27 +0000 (09:39 +0000)]
lei tag: rename from "lei mark"
I've decided "tag" is a better verb since it seems to be a more
widely-used term for associating metadata with data.
Not only is it analogous to the "notmuch tag" command, but
also makes sense when compared to tooling for manipulating
metadata for non-mail data (e.g. audio metadata tags).
There's even a Wikipedia entry for it:
https://en.wikipedia.org/wiki/Tag_(metadata)
whereas "mark" is used in the description, but has no
entry of its own with regards to metadata.
Eric Wong [Tue, 30 Mar 2021 07:23:54 +0000 (12:23 +0500)]
lei_to_mail: update some comments and style
Note that update_kw_maybe is critical in preventing accidental
data loss with default "lei q --output" behavior.
Also avoid treating (proposed) MH support as lock-free, since it
appears to lack specifications for locking and be even worse
than mbox* in that regard...
Eric Wong [Mon, 29 Mar 2021 23:58:54 +0000 (23:58 +0000)]
git: local_nick: handle trailing or redundant '/' in git_dir
Some cgit configs use trailing slashes in pathnames
which we preserve internally.
Before this change, trailing slashes in cgit config files
were causing ViewVCS (SolverGit) output to show up as "???"
for coderepos without cgitUrl configured.
Eric Wong [Mon, 29 Mar 2021 07:08:25 +0000 (07:08 +0000)]
lei_input: treat ".eml" and ".patch" suffix as "eml"
".eml" is a suffix supported by (/usr/local)/etc/mime.types
on Debian and FreeBSD systems using the "mime-support" package.
".patch" is what "git format-patch" generates by default since
git v1.5.0 in 2007.
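
The suffix rule above could look something like this Python sketch. The function name is hypothetical, and matching the suffix case-insensitively is an assumption of this example, not something the message states.

```python
import os

def format_from_suffix(path):
    """Return "eml" for paths ending in .eml or .patch, else None (no guess)."""
    _, ext = os.path.splitext(path)
    # Assumption: compare case-insensitively so "MSG.EML" is also accepted.
    return "eml" if ext.lower() in (".eml", ".patch") else None
```

A caller would fall back to an explicit format option when this returns None.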
Eric Wong [Mon, 29 Mar 2021 07:08:24 +0000 (07:08 +0000)]
lei: use IO::Uncompress::Gunzip MultiStream
This is compatible with default gunzip(1) behavior and
future-proofs us against potential changes in PublicInbox::WWW
to save memory on public-inbox-httpd instances.
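
To illustrate what multi-stream handling means, here is a small Python sketch (Python's standard library is used as an analogy; it is not the IO::Uncompress::Gunzip API). A gzip file may consist of several concatenated members. gunzip(1) decompresses all of them, while a single-stream reader stops after the first.

```python
import gzip
import zlib

def first_member_only(data):
    """Single-stream behavior: stop at the end of the first gzip member."""
    d = zlib.decompressobj(wbits=31)  # 16 + 15 selects the gzip container
    return d.decompress(data)         # later members are left in d.unused_data

def all_members(data):
    """Multi-stream behavior: decompress every concatenated gzip member."""
    return gzip.decompress(data)

# Two independently compressed chunks, concatenated into one blob.
blob = gzip.compress(b"From a\n") + gzip.compress(b"From b\n")
```

Here `first_member_only(blob)` returns only the first chunk, whereas `all_members(blob)` returns both, which is the behavior a reader wants if a server ever emits its response as several gzip members.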
Eric Wong [Mon, 29 Mar 2021 08:04:14 +0000 (08:04 +0000)]
doc: lei q: add warning for --output clobbering
The behavior matching mairix still frightens me a bit when it
comes to supporting new users. On the other hand, I've rarely
ever used --augment with mairix, so I still think the current
(dangerous) behavior makes sense in the context of search results.
Eric Wong [Mon, 29 Mar 2021 08:04:13 +0000 (08:04 +0000)]
doc: lei q: drop NNTP from --output description
We only support NNTP as inputs for convert, import, and
mark|tag. I'm not sure if supporting NNTP output is worth
it, nor do we have a good way to test it.
Kyle Meyer [Mon, 29 Mar 2021 03:13:43 +0000 (23:13 -0400)]
doc config: don't render a to-do comment
In the public-inbox-config manpage, the match=domain item under
publicinbox.wwwlisting has a to-do comment that gets rendered as
"support showing cgit listing". That's potentially confusing to
readers, especially given that the "TODO" is dropped.
Change the markup so that the comment isn't rendered.
Kyle Meyer [Mon, 29 Mar 2021 03:11:13 +0000 (23:11 -0400)]
doc lei: don't render most to-do comments
The lei manpages have a number of to-dos, but with the exception of
the lei-q's -tt warning, none of them seem worth displaying to the
reader (and some might not be worth addressing at all).
Kyle Meyer [Mon, 29 Mar 2021 03:11:12 +0000 (23:11 -0400)]
doc lei: drop an unnecessary to-do comment
When a new command is implemented, it is probably clear that it should
be added to lei.pod, but either way, having a to-do comment in lei.pod
isn't likely to help.
Eric Wong [Sun, 28 Mar 2021 09:01:24 +0000 (09:01 +0000)]
treewide: shorten temporary filename
File::Temp only requires four 'X' characters (unlike mkstemp(3),
which requires six). So only give it 4 to avoid an
80-column violation and maybe save metadata space on FSes.
Eric Wong [Sun, 28 Mar 2021 09:01:23 +0000 (09:01 +0000)]
lei: drop coderepo placeholders, submodule TODO
"lei blob" supports --git-dir and -C, and checks if the
current directory has a git directory associated with it.
It will likely support submodules in the future.
I'm inclined to believe declaring coderepos in a command-line
tool is needless clutter and users will rarely want to search
for blobs across different projects when on the command-line.
Eric Wong [Sun, 28 Mar 2021 09:01:22 +0000 (09:01 +0000)]
lei blob: add remote external support
Introduce a new LeiRemote wrapper to provide an internal API
which SolverGit expects. This lets us use HTTP/HTTPS endpoints
to reconstruct blobs off patches as we would with local
endpoints, just more slowly...
Eric Wong [Sun, 28 Mar 2021 09:01:16 +0000 (09:01 +0000)]
lei blob: support --no-mail switch
It's possible for an abbreviated OID to be resolved unambiguously
to an email before we attempt to look at externals via xsearch;
so provide a way for a user to force searching coderepos.
If hints (--oid-a, --path-a, --path-b) are present, we'll
assume --no-mail by default, otherwise we'll assume the
user wants to look through mail for a matching blob.
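
The default described above can be expressed as a small decision function. This is a Python sketch with hypothetical names (`search_mail_first`, the `hints` dictionary keys); it only encodes the rule stated in the message: an explicit choice wins, otherwise the presence of any hint implies --no-mail.

```python
def search_mail_first(hints, no_mail=None):
    """Decide whether to look through mail before trying coderepos.

    hints   -- dict that may contain "oid_a", "path_a", "path_b"
    no_mail -- True/False if the user chose explicitly, None if unspecified
    """
    if no_mail is not None:
        return not no_mail          # explicit --no-mail / --mail wins
    has_hints = any(hints.get(k) for k in ("oid_a", "path_a", "path_b"))
    return not has_hints            # hints imply --no-mail by default
```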
Eric Wong [Sun, 28 Mar 2021 09:01:13 +0000 (09:01 +0000)]
lei: simplify PktOp callers
Provide a consistent ->op_wait_event method instead of
forcing callers to loop (or not) at each callsite.
This also avoids a possible leak by avoiding circular
references.
Eric Wong [Sun, 28 Mar 2021 00:17:25 +0000 (00:17 +0000)]
test_common: require_mods bundles
This makes it easier to manage test dependencies on systems
where optional stuff isn't installed. This fixes some lei tests
which didn't check for Plack before starting -httpd, and ensures
Parse::RecDescent is available for -imapd in case
Mail::IMAPClient stops using it.
Eric Wong [Fri, 26 Mar 2021 09:51:25 +0000 (09:51 +0000)]
lei: support /dev/fd/[0-2] inputs and outputs in daemon
Since lei-daemon won't have the same FDs as the client, we
need to special-case these mappings and won't be able to open
arbitrary, non-standard FDs.
We also won't attempt to support /proc/self/fd/[0-2] since
that's a Linux-ism. /dev/fd/[0-2] and /dev/std{in,out,err}
are portable to FreeBSD, at least. mawk(1) also supports
/dev/std{out,err}, as does gawk(1) (which supports everything
we can support, and arbitrary /dev/fd/$FD).
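
A rough Python sketch of the special-casing described above. The function name is hypothetical, and treating /dev/std{in,out,err} the same way as /dev/fd/[0-2] is an assumption of this example; the point is that only the three standard descriptors passed by the client can be mapped, and anything else (including /proc/self/fd/*) is not translated.

```python
import re

_STD_NAMES = {"/dev/stdin": 0, "/dev/stdout": 1, "/dev/stderr": 2}

def client_fd_for(path):
    """Return 0, 1, or 2 if path names one of the client's standard FDs, else None."""
    if path in _STD_NAMES:
        return _STD_NAMES[path]
    m = re.fullmatch(r"/dev/fd/([0-2])", path)
    return int(m.group(1)) if m else None
```

A daemon would then use the descriptor it received from the client for that slot instead of opening the path itself, since its own FDs 0-2 are unrelated to the client's.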
Eric Wong [Fri, 26 Mar 2021 09:51:24 +0000 (09:51 +0000)]
lei: do not blindly commit to lei/store on close
It may hide errors/bugs; instead, do it explicitly for each
worker that writes to it. For lei_xsearch, it will be better
to close before spawning the MUA for future use since we may
need it again once the user starts changing keywords.
Stavros Ntentos [Fri, 26 Mar 2021 16:31:46 +0000 (18:31 +0200)]
git-send-email-reply: Append subject
I keep copy-pasting the addresses provided,
I keep writing my plaintext reply in a file,
and I keep forgetting to add a subject
(because I am "just" writing a plaintext file)
Teach `git-send-email-reply` to append a `--subject` line.
[ew: avoid URI-encoded subject on command-line, adjust t/reply.t]
Eric Wong [Fri, 26 Mar 2021 04:29:35 +0000 (06:29 +0200)]
lei_xsearch: wait for kw updates for non-threaded case, too
We'll also hoist wait_startq out of the per-message loops
since it's not worth having to check every single message
when filling in smsg info is reasonably fast, anyways.
Eric Wong [Thu, 25 Mar 2021 04:20:25 +0000 (06:20 +0200)]
t/cmd_ipc: workaround signal handling raciness
Perl can't check for interrupts when inside a blocking syscall,
as there's no self-pipe mechanism inside Perl itself. So fork
a child and have it repeatedly kill(2) instead of relying on alarm(3).
Eric Wong [Thu, 25 Mar 2021 04:20:24 +0000 (06:20 +0200)]
lei import: force store, improve test diagnostics
"lei import" should never be without a {sto}, and *_done should
not be called multiple times, so ensure we can fail if it's
missing.
Update some existing tests to complain loudly by introducing a
handy "xbail" function which wraps "explain" and BAIL_OUT.
BAIL_OUT was painful to type and concatenating the result of
"explain" doesn't work as I thought it would since "explain"
always returns an array, and BAIL_OUT only accepts a single
scalar arg (unlike "die").
Eric Wong [Thu, 25 Mar 2021 04:20:21 +0000 (06:20 +0200)]
lei_mirror: don't show success on failure
While we were exiting with an error code, showing a successful
"# mirrored $URL" message is misleading and wrong. Don't show
success until everything is complete and the config is written.
Eric Wong [Wed, 24 Mar 2021 09:23:35 +0000 (14:23 +0500)]
lei-daemon: do not leak FDs on bogus requests
If a client passes us the incorrect number of FDs, we'll vivify
them into PerlIO objects so they can be auto-closed. Using
POSIX::close was considered, but it would've been more code to
handle an uncommon case.