Eric Wong [Mon, 22 Mar 2021 07:53:58 +0000 (07:53 +0000)]
lei: simplify workers_start and callers
Since workers_start is in the common PublicInbox::LEI
package, we can just use \&METHOD_NAME instead of relying
on UNIVERSAL->can, which avoids a method dispatch.
Most of our worker code can just use lei->dclose, so default
to doing that unless it's been overridden.
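A minimal sketch of the difference between the two ways of getting a
code reference, and of defaulting to a close callback unless one is
supplied. The package name My::Lei and the subs by_ref, by_can and
pick_cb are hypothetical stand-ins, not the actual PublicInbox::LEI
code:

```perl
use strict;
use warnings;

package My::Lei;    # hypothetical stand-in for the common package

sub new { bless {}, shift }
sub dclose { 'default dclose' }    # hypothetical default callback

# \&dclose names the sub in this package directly, with no method
# resolution; ->can('dclose') walks the method resolution order.
sub by_ref { \&dclose }
sub by_can { $_[0]->can('dclose') }

# Default to dclose unless the caller supplies an override.
sub pick_cb {
	my ($self, $override) = @_;
	$override // \&dclose;
}

package main;

my $lei = My::Lei->new;
print "same sub\n" if $lei->by_ref == $lei->by_can;
print $lei->pick_cb->(), "\n";
print $lei->pick_cb(sub { 'custom' })->(), "\n";
```

Note that \&dclose always refers to the sub in the package where it is
written, so a subclass override would not be picked up; that is only
acceptable when caller and callee live in the same package.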
Eric Wong [Mon, 22 Mar 2021 07:53:56 +0000 (07:53 +0000)]
net_reader: escape nasty chars from Net::NNTP->message
Net::Cmd::message (used by Net::NNTP) does no escaping at all,
so "\r" was causing confusing/nonsensical error messages when
I tried to import from the wrong group.
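As an illustration of the kind of escaping involved, here is a small
sketch that makes control characters visible before a server message
is printed. The helper name escape_ctl and the \xNN output format are
assumptions for this example and may differ from what net_reader
actually does:

```perl
use strict;
use warnings;

# Hypothetical helper: replace ASCII control characters (including
# "\r" and "\n") with a visible \xNN form so that a server-supplied
# message cannot move the cursor or overwrite our own error text.
sub escape_ctl {
	my ($msg) = @_;
	$msg =~ s/([\x00-\x1f\x7f])/sprintf('\\x%02x', ord($1))/ge;
	$msg;
}

my $raw = "411 no such group\r\n";    # made-up sample response
print escape_ctl($raw), "\n";
```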
Eric Wong [Sun, 21 Mar 2021 09:50:47 +0000 (15:50 +0600)]
lei: fix some warnings in tests
And then test the contents of $lei_err to ensure it doesn't
happen again.
We'll also make MboxLock emit nicer warnings without the line
number, since the line number is irrelevant to the user fixing
an mbox lock contention problem.
Finally, we'll also allow showing loud warnings via
TEST_LEI_ERR_LOUD=1
Eric Wong [Sun, 21 Mar 2021 09:50:45 +0000 (15:50 +0600)]
lei import: vivify external-only messages
Keyword storage for external-only messages was preventing
messages from being explicitly imported. Teach lei_store
to vivify keyword-only entries into fully-indexed messages
on import.
Eric Wong [Wed, 17 Mar 2021 18:14:08 +0000 (20:14 +0200)]
searchview: collapse Message-ID links in summary
There's no point in showing duplicate links to the same
Message-ID in summary view. The per-message page will
note the duplication (if any) separately.
Eric Wong [Sat, 20 Mar 2021 10:04:07 +0000 (19:04 +0900)]
lei: tie ALE lifetime to config file
This should make a future change to "lei import" work more
nicely, since we'll be needing ALE to vivify external-only
messages upon explicit "lei import".
Eric Wong [Sat, 20 Mar 2021 10:04:05 +0000 (19:04 +0900)]
lei q: put keywords on one line in --pretty output
Don't waste precious terminal space when there are only a small
number of possible keywords supported/reserved for JMAP. In the
future, we may implement more sophisticated wrapping for labels,
but we'll cross that bridge when we come to it.
Eric Wong [Sat, 20 Mar 2021 10:04:03 +0000 (19:04 +0900)]
lei: All Local Externals: bare git dir for alternates
This will be used for keyword (and label) storage for externals.
We'll be using this to ensure we don't redundantly auto-import
messages into lei/store if they're already in a local external
(they can still be imported explicitly via "lei import").
Eric Wong [Fri, 19 Mar 2021 22:38:49 +0000 (20:38 -0200)]
lei q: -I/--include overrides --no-(external|local|remote)
Assume that anybody using -I/--include for external locations
will want to override --no-$FOO if they're explicitly including
a location.
With some effort, we could make it order-dependent (e.g.
"-I $LOCATION --no-$FOO" and "--no-$FOO -I $LOCATION"
behave differently). However that's not straightforward
when using Getopt::Long to parse command-line options into
a hashref.
I'm also not sure if order-dependent switches are a desirable
UI/UX quality.
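The following sketch shows why ordering is lost when Getopt::Long
stores options into a hash: both argument orders produce the same
hash. The option names, the helper subs parse and locations, and the
reading of "override" used here (an included location is searched
even when configured externals are disabled) are assumptions for
illustration only:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a copy of the argument list into a hashref.
sub parse {
	my (@argv) = @_;
	my %opt;
	GetOptionsFromArray(\@argv, \%opt, 'include|I=s@', 'external!')
		or die "bad options\n";
	\%opt;
}

# Hypothetical policy: configured externals are dropped by
# --no-external, but explicitly included locations always survive.
sub locations {
	my ($opt, @configured) = @_;
	my @loc = ($opt->{external} // 1) ? @configured : ();
	push @loc, @{$opt->{include} // []};
	@loc;
}

my $x = parse('-I', '/tmp/ext', '--no-external');
my $y = parse('--no-external', '-I', '/tmp/ext');
print join(' ', locations($x, '/cfg/ext')), "\n";
print join(' ', locations($y, '/cfg/ext')), "\n";
```

Both calls print the same line, since the hash keeps no record of
which switch came first.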
Eric Wong [Fri, 19 Mar 2021 04:18:54 +0000 (04:18 +0000)]
examples: cgit-commit-filter: drop <tt> HTML tag, use title=
<tt> doesn't seem necessary and it's deprecated in HTML, nowadays.
In any case, dillo's CSS support seems to show it as fixed-width
even without <tt>. Use the title= attribute to highlight that
it goes to the mail thread, too.
In the future, we'll probably link to something like "lei p2q"
(patch-to-query) to include OIDs in the search.
Eric Wong [Wed, 17 Mar 2021 09:39:22 +0000 (15:39 +0600)]
lei_store: keywords => vmd (volatile metadata), prepare for labels
Since keywords and mailboxes (AKA labels) are separate things in
JMAP; and only keywords can map reliably to Maildir and mbox;
we'll keep them separate in our internal data representations,
too.
I initially wanted to call this just "meta" for "metadata", but
that might be confused with our mailing list name. "metadata"
is already used in Xapian's own API, to add another layer of
confusion.
"tags" was also considered, but probably confusing to notmuch
users since our "labels" are analogous to "tags" in notmuch,
and notmuch doesn't seem to cover "keywords" separately...
So "vmd" it is, since we haven't used this particular
three-letter-abbreviation anywhere before; and "volatile" seems
like a good description of this metadata since everything else
up to this point has been mostly WORM (write-once, read-many).
Eric Wong [Wed, 17 Mar 2021 07:02:17 +0000 (23:02 -0800)]
www: improve visibility of coderepos
By adding "+code" next to "mirror" at the top next to the search
box. Instead of showing "/path/to/$FOO", showing "$FOO.git"
makes it more obvious we're talking about a git repo, here,
instead of some random directory.
Eric Wong [Tue, 16 Mar 2021 10:28:50 +0000 (16:28 +0600)]
eml: decode Bcc, and Resent-* address variants
This is closer to matching RFC 8621 section 4.1.2.3,
though we don't support the "Any header field not defined in
RFC5322 or RFC2369" rule, since that could get tricky...
Eric Wong [Mon, 15 Mar 2021 11:58:26 +0000 (12:58 +0100)]
t/*: disable fsync on tests where create_inbox isn't worth it
Using create_inbox doesn't seem worth the trouble, here, at the
moment, but disabling fsync(2) gives a noticeable speedup on
my system even with an SSD.
Eric Wong [Mon, 15 Mar 2021 11:57:54 +0000 (12:57 +0100)]
test_common: minor simplifications to setup_public_inboxes
This will result in a small reduction in on-disk footprint
by removing Xapian docdata and reduction in code by removing
an unnecessary -index invocation.
Eric Wong [Mon, 15 Mar 2021 11:57:52 +0000 (12:57 +0100)]
test_common: add create_inbox helper sub
This saves over 100ms in t/lei-q-remote-import.t so far when
TMPDIR is on an SSD. If we can memoize inbox creation to save a
few dozen milliseconds every test, this could add up to
noticeable savings across our entire test suite.
Eric Wong [Sun, 14 Mar 2021 11:12:00 +0000 (13:12 +0200)]
lei q: do not import unnecessarily from externals
We only want to auto-import messages that are exclusively in
remote externals. To save space and reduce wear on storage
devices, messages in local externals are not auto-imported.
Eric Wong [Sat, 13 Mar 2021 15:40:27 +0000 (15:40 +0000)]
searchidx: fix -Lmedium for IDs and filenames
This fixes "m:", "l:", "f:", "t:", "c:", "dfn:", and "n:" search
prefixes under indexlevel=medium when mixed with indexlevel=full
inboxish. We need positional data for Message-IDs, List-Id,
email addresses and filenames for exact matches, though we still
want to support wildcards.
Fortunately the storage cost is still small as these prefixes
tend to be small compared to message bodies. These are NOT
boolean terms since wildcard support and partial matching are
desired.
Eric Wong [Fri, 12 Mar 2021 10:39:43 +0000 (10:39 +0000)]
lei q: mbox*: disable changing parallelism, add --rsyncable
Unfortunately, being mairix-compatible with --threads means we
can't change thread-count of gzip, bzip2, or xz when writing to
compressed mbox with a --threads= parameter. It's probably not
worth changing, anyways, so another switch or additional value
for --jobs= won't be added.
While we're in the area, add --rsyncable support since
most installations of gzip support it nowadays.
Fixes: 5beb4a5f6585acd ("lei: replace --thread with --threads")
Eric Wong [Fri, 12 Mar 2021 10:39:42 +0000 (10:39 +0000)]
lei: rearrange OPT_DESC and drop some TBD switches
It'll be easier for us to have the option-spec in front of the
command instead of the other way around. The option-spec in
front makes it easier to sort and keep track of potentially
confusing/ambiguous use of command-line switches between
different commands.
We'll also update some of the proposed switches while we're
at it.
Eric Wong [Thu, 11 Mar 2021 01:45:39 +0000 (19:45 -0600)]
msg_part_text: discover text in application/octet-stream
Some poorly-configured MUAs will send application/octet-stream
even for text-only attachments. We can't expect all MUAs
to be configured with proper MIME types, and there is plenty of
historical mail that falls into this unfortunate category.
v2: simplify the check and ensure the returned text is Perl "utf8"
Eric Wong [Thu, 11 Mar 2021 10:45:38 +0000 (02:45 -0800)]
v2writable: fix undocumented --xapian-only
We can't pass $self and GLOBs across IPC channels transparently.
I only noticed this because I'm testing the application/octet-stream
fallback with https://public-inbox.org/meta/20210311014539.19756-1-e@80x24.org/
Fixes: bf8df8160076d7a1 ("searchidxshard: use PublicInbox::IPC to kill lots of code")
Eric Wong [Wed, 10 Mar 2021 13:23:44 +0000 (13:23 +0000)]
lei import: skip trashed Maildir messages
This matches IMAP behavior in NetReader in skipping \Deleted
messages. Since lei may be used for personal, non-public mail,
Draft messages are NOT skipped by "lei import".
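A rough sketch of the flag check, based on the Maildir convention
that flags follow ":2," in the filename, with "T" meaning trashed
and "D" meaning draft. The helper names and sample filenames are made
up for this example and are not taken from the lei code:

```perl
use strict;
use warnings;

# Hypothetical helper: return the flag letters from a Maildir
# filename, or undef if the name has no ":2," info section.
sub maildir_flags {
	my ($fn) = @_;
	$fn =~ /:2,([A-Za-z]*)\z/ ? $1 : undef;
}

# Skip Trashed ("T") messages but keep Drafts ("D").
sub want_import {
	my ($fn) = @_;
	my $flags = maildir_flags($fn) // '';
	index($flags, 'T') < 0;
}

my @names = ('1615382624.M1P2.host:2,S',
	'1615382625.M3P4.host:2,ST',
	'1615382626.M5P6.host:2,D');
for my $fn (@names) {
	printf "%s => %s\n", $fn, want_import($fn) ? 'import' : 'skip';
}
```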
Eric Wong [Wed, 10 Mar 2021 13:23:43 +0000 (13:23 +0000)]
lei import: simplify Maildir handling
Having a one-off Maildir functionality in LeiStore doesn't seem
worth the maintenance burden, especially given an upcoming
change to skip trashed messages.
I expect this will hurt performance slightly with extra IPC
overhead for the socket copy, but "lei import" may eventually
become rare or at least not hit messages redundantly.
Eric Wong [Fri, 5 Mar 2021 01:38:29 +0000 (18:38 -0700)]
lei q: fix --import-before default and FIFO output
commit 6c551bffd75afb41d9b5e4774068abe7e06ed0e7
("lei q: --import-augment for mbox and mbox.gz") added a check
in _pre_augment_mbox for the option being a ref() to distinguish
between default values and user-supplied values (which are
non-ref SCALARs from Getopt::Long).
However, LeiQuery failed to use a SCALAR ref as the default
value, making the check in _pre_augment_mbox useless. We
now update LeiQuery to use \1 instead of 1 as the default
value so "lei q -f mboxrd ..." to stdout works once again.
Unfortunately, testing with redirects pointed to regular
files didn't trigger the code paths being updated. Testing
with a FIFO revealed further bugs in the FIFO handling code
which are also fixed in this commit.
We'll also update the $lei->out error message to be
less-specific about "stdout" and use the term "output", instead,
since LeiToMail replaces stdout for all mbox outputs.
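The ref()-based distinction between default and user-supplied values
described above can be sketched as follows. The hash key
'import-before' and the subs set_default and describe are assumptions
for illustration; the default is applied after option parsing, since
Getopt::Long treats a reference already present in the option hash as
a destination to write to:

```perl
use strict;
use warnings;

# Hypothetical post-parse step: install a SCALAR ref as the default
# so later code can tell it apart from a user-supplied plain scalar.
sub set_default {
	my ($opt) = @_;
	$opt->{'import-before'} //= \1;
	$opt;
}

# ref() is true only for the default, never for values from argv.
sub describe {
	my ($opt) = @_;
	my $v = $opt->{'import-before'};
	ref($v) ? "default (${$v})" : "user-supplied ($v)";
}

print describe(set_default({})), "\n";
print describe(set_default({ 'import-before' => 1 })), "\n";
print describe(set_default({ 'import-before' => 0 })), "\n";
```

Using a plain 1 as the default would make the first and second cases
indistinguishable, which is the bug this commit fixes.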
Eric Wong [Fri, 5 Mar 2021 03:10:58 +0000 (19:10 -0800)]
search: use "z:" instead of "bytes:" prefix
So far, searching by size has never been publicly documented,
and IMHO, of questionable utility. In any case, "z:" is what
mairix(1) uses, so it may be familiar to existing mairix users
(I've never used this prefix myself).
So far, this prefix is only used internally in tests and in
auto-translated queries from IMAP; thus this incompatible change
is unlikely to affect anyone.