Eric Wong [Wed, 9 Jun 2021 07:47:49 +0000 (07:47 +0000)]
lei tag: parallelize Maildir access
Since Maildir isn't guaranteed to have any sort of order, we
can parallelize inputs, here. On a 4-core system, this reduced
one of my tag invocations from 5.5 to 1.4s.
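As an illustration of why the lack of ordering helps, here is a minimal
Python sketch (not lei's Perl code, which uses worker processes rather
than threads). The names `maildir_files` and `scan_maildir` and the
`jobs` parameter are hypothetical:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def maildir_files(maildir):
    """Yield message paths under new/ and cur/ in directory order."""
    for sub in ("new", "cur"):
        d = os.path.join(maildir, sub)
        if not os.path.isdir(d):
            continue
        with os.scandir(d) as it:
            for ent in it:
                if ent.is_file():
                    yield ent.path

def scan_maildir(maildir, handle, jobs=4):
    """Apply handle(path) to every message using `jobs` worker threads.

    The results are returned as a set because no ordering is implied;
    handle() must therefore return hashable values.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return set(pool.map(handle, maildir_files(maildir)))
```

Since nothing depends on the order in which files are visited, the
work can be split across workers without any coordination beyond
collecting the results.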
Eric Wong [Wed, 9 Jun 2021 10:03:05 +0000 (10:03 +0000)]
lei/store: do eidx_init before creating R/W lms dbh
Sharing lms->{dbh} with eidx shards appears to be the cause of
the "Issuing rollback() due to DESTROY without explicit
disconnect() of DBD::SQLite::db handle" messages I've been
seeing from "lei up".
Eric Wong [Tue, 8 Jun 2021 23:56:13 +0000 (23:56 +0000)]
lei pmdir: fix nproc for <= 4 CPUs
I forgot my FreeBSD VM has 8 cores, actually, and tweaked the
nproc detection on that machine before finalizing commit 10b523eb017162240b1ac3647f8dcbbf2be348a7
("lei import: speed up repeated Maildir imports")
Fixes: 10b523eb01716224 ("lei import: speed up repeated Maildir imports")
Eric Wong [Tue, 8 Jun 2021 09:50:21 +0000 (09:50 +0000)]
lei import: speed up repeated Maildir imports
On a 4-core CPU, this speeds up "lei import" on a largish
Maildir inbox with 75K messages from ~8 minutes down to ~40s.
Parallelizing alone did not bring any improvement and may
even hurt performance slightly, depending on CPU availability.
However, creating the index on the "fid" and "name" columns in
blob2name yields us the same speedup we got.
Parallelizing IMAP makes more sense due to the fact most IMAP
stores are non-local and subject to network latency.
Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
Eric Wong [Tue, 8 Jun 2021 09:50:19 +0000 (09:50 +0000)]
lei: safety fix for multiple WQ classes
For commands utilizing multiple workers, this simple change
generalizes the persistence mechanism and prevents
lei->dclose from causing script/lei to exit if there are
still in-flight workers.
This ought to prevent read-after-write consistency problems that
occasionally manifest in scripts (e.g. test cases) but usually
go unnoticed in normal use.
Eric Wong [Mon, 7 Jun 2021 19:06:30 +0000 (19:06 +0000)]
lei/store: checkpoint commits mail_sync.sqlite3
We mainly rely on ->done with lei/store, but moving to
->checkpoint probably makes sense. Note: over, msgmap, and
mail_sync all have slightly different transaction behavior;
perhaps they can be unified in the future.
Eric Wong [Sat, 5 Jun 2021 21:04:50 +0000 (21:04 +0000)]
INSTALL: note about lei metadata storage
Since lei is for personal mailboxes, I don't think lei needs to
keep keyword and label changes in history. And fix a minor
wording problem ("or" => "nor") while we're at it.
Eric Wong [Thu, 3 Jun 2021 01:05:20 +0000 (01:05 +0000)]
lei import: speed up kw updates for old IMAP messages
On a 4-core CPU, this speeds up "lei import" on a largish IMAP
inbox with 75K messages from ~21 minutes down to 40s.
Parallelizing with the new LeiImportKw WQ worker class gives a
near-linear speedup and brought the runtime down to ~5:40.
The new idx_fid_uid index on the "fid" and "uid" columns of
blob2num in mail_sync.sqlite3 brought us the final speedup.
An additional index on over.sqlite3#xref3(oidbin) did not help,
since idx_nntp already exists and speeds up the new ->oidbin_exists
internal API.
I initially experimented with a separate "lei import-kw" command
but decided against it since it's useless outside of IMAP+JMAP
and would require extra cognitive overhead for both users and
hackers. So LeiImportKw is just a WQ worker used by "lei import"
and not its own user-visible command.
v2: fix ikw_done_wait arg handling (ugh, confusing API :x)
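The effect of an index like idx_fid_uid can be seen with SQLite's
EXPLAIN QUERY PLAN. This is a hedged Python sketch: the three-column
blob2num layout and the sample rows are assumptions, and the real
mail_sync.sqlite3 schema may differ.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Assumed, simplified layout; not copied from mail_sync.sqlite3.
db.execute("CREATE TABLE blob2num (oidbin BLOB, fid INTEGER, uid INTEGER)")
db.executemany(
    "INSERT INTO blob2num VALUES (?, ?, ?)",
    [(bytes([i % 256]) * 20, 1, i) for i in range(1000)],
)

def plan(sql):
    """Return the detail column of SQLite's query plan for sql."""
    return [row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT oidbin FROM blob2num WHERE fid = 1 AND uid = 42"
before = plan(lookup)  # no usable index: the whole table is scanned
db.execute("CREATE INDEX idx_fid_uid ON blob2num (fid, uid)")
after = plan(lookup)   # the lookup can now search via idx_fid_uid
```

Without the index, every per-message lookup walks the whole table, so
the total cost grows roughly with the square of the folder size.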
Eric Wong [Sun, 30 May 2021 11:45:44 +0000 (11:45 +0000)]
lei import: import IMAP flag changes from old messages
This makes "lei import" behavior with IMAP folders more
consistent with that with Maildir.
Opening IMAP folders read-write with "SELECT" (instead of
read-only with "EXAMINE") was necessary, since it lets an IMAP
server communicate to us as to whether or not it's worth
refetching IMAP flags of previously imported messages.
Fetching UID+FLAGS only is one of the fastest IMAP operations
with dovecot, our -imapd and presumably other common IMAP servers.
It is issued by common MUAs such as mutt after every SELECT.
Users may now rely on "lei import" exclusively to merge mail and
keywords into lei/store, and "lei export-kw" to propagate
keyword changes back to IMAP servers.
A sticks-and-stones workflow for personal mailboxes is currently:
    lei import imaps://$MY_PERSONAL_INBOX
    lei q --mua=$MUA -o /tmp/results SEARCH TERMS...
    # do stuff from within $MUA to /tmp/results
    lei import /tmp/results # read keyword changes from MUA
    lei export-kw imaps://$MY_PERSONAL_INBOX
    # repeat when new stuff shows up in personal inbox
The next goal is to automate repeated imports + export-kw
commands with inotify and IMAP IDLE.
Eric Wong [Sat, 29 May 2021 20:20:39 +0000 (20:20 +0000)]
lei q: --sort and --save|v2 are incompatible
Saved searches rely on (reverse) docid ordering for efficient
incremental results, and sorting any other way prevents that.
Update comment description in LeiQuery while we're at it:
"ls-query" and "rm-query" are "ls-search" and "forget-search",
respectively, and "mv-query" is implicit with "edit-search"
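One way to read the docid ordering requirement, as a hedged Python
sketch (this is not taken from LeiQuery; `new_results` and `last_max`
are hypothetical names):

```python
def new_results(docids_desc, last_max):
    """Collect docids newer than last_max from a descending iterator.

    Iteration stops at the first docid that was already seen, so older
    results are never revisited. This shortcut is only valid when the
    results arrive in reverse docid order.
    """
    fresh = []
    for docid in docids_desc:
        if docid <= last_max:
            break
        fresh.append(docid)
    return fresh
```

With any other sort order, a new result could appear after an old one
and the early exit would miss it.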
Eric Wong [Sat, 29 May 2021 20:20:38 +0000 (20:20 +0000)]
lei import|lcat: improve+fix single message IMAP support
lcat can now dump the memoized contents of entire IMAP folders,
not just a single UID. It's now parallelized and pipelined for
multiple lei2mail workers.
Furthermore, various forms of JSON output work consistently
with blob-only output, now.
While working on this, I noticed NetReader was passing UID URLs
to imap_each callbacks, which was causing mail_sync.sqlite3 to
store UIDs in `folders'. That was clearly wrong, so it's now fixed.
Eric Wong [Fri, 28 May 2021 22:37:21 +0000 (22:37 +0000)]
lei q|up: support v2:/path/to/inboxdir destination
This allows "lei-managed pseudo mailing lists" as described
by Konstantin.
Alternates use is optional and can be enabled via --shared.
This doesn't manage or edit ~/.public-inbox/config; presumably
there'll need to be some tweaking of search parameters before
finalizing and making the inbox publicly accessible via HTTP/NNTP.
Eric Wong [Fri, 28 May 2021 19:47:23 +0000 (19:47 +0000)]
lei: retry_reopen on read-only Xapian access
Xapian DBs may be modified by a parallel process while we're
reading it, and Xapian's MVCC model places the burden on readers
to retry operations.
We'll also have retry_reopen croak instead of die on errors,
which ought to help us track down some "Document not found"
errors I've occasionally seen when using "lei <q|up>".
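The reader-side retry pattern looks roughly like the Python sketch
below. It is not the Perl retry_reopen: the `DatabaseModified` class
stands in for Xapian's DatabaseModifiedError, and the `reopen`, `op`
and `max_tries` parameters are assumptions made for illustration.

```python
class DatabaseModified(Exception):
    """Stand-in for the error a reader sees after a concurrent write."""

def retry_reopen(reopen, op, max_tries=10):
    """Run op(), reopening the DB and retrying when it was modified."""
    for _ in range(max_tries):
        try:
            return op()
        except DatabaseModified:
            reopen()  # pick up the latest revision, then try again
    raise RuntimeError(f"gave up after {max_tries} attempts")
```

Only the "modified" error is retried; anything else propagates to the
caller so the failing call site stays visible.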
Eric Wong [Fri, 28 May 2021 00:07:54 +0000 (00:07 +0000)]
lei_mail_sync: debug code for uncommitted txn
I'm not 100% sure why, but "lei up" seems to cause uncommitted
transaction errors. LeiToMail calls sto->set_sync_info, but
LeiXSearch should call sto->done and lms_commit, so I'm not
sure where the uncommitted transaction is coming from...
Eric Wong [Fri, 28 May 2021 00:07:53 +0000 (00:07 +0000)]
viewdiff: escape '{' and '}' for regexp
Perl 5 doesn't warn on this, yet, but it warns on unescaped
'(' and ')' nowadays, so it's conceivable Perl could start
warning on this in the future. So future-proof our code and
reduce reader confusion.
Eric Wong [Fri, 28 May 2021 00:07:52 +0000 (00:07 +0000)]
viewdiff: make $UNSAFE a variable
There's no sense in using a constant here since it
gets copied into the uri_escape_utf8 function anyways.
Furthermore, inlined constants still leave behind a
subroutine and subs cost several KB of memory.
Finally, add a comment as to why it's different than the default
escape, since I just spent a minute wondering that.
Eric Wong [Wed, 26 May 2021 18:08:57 +0000 (18:08 +0000)]
lei: require Socket::MsgHdr or Inline::C, drop oneshot
The cost of supporting separate code paths between oneshot and
daemon isn't worth the trouble; especially if there are more
users to support. The test suite time nearly doubles with
oneshot, so that's hurting developer productivity.
FD passing is currently required to work efficiently with
remote HTTP(S) queries which return large messages, as seen in
commit 708b182a57373172f5523f3dc297659d58e03b58
("ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case").
Additionally, upcoming support for IMAP IDLE and inotify-based
monitoring of Maildirs cannot work properly without a background
daemon.
Eric Wong [Tue, 25 May 2021 22:20:01 +0000 (22:20 +0000)]
ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case
WQWorkers are limited roughly to MAX_ARG_STRLEN (the kernel
limit of argv + environ) to avoid excessive memory growth.
Occasionally, we need to send larger messages via workqueues
that are too small to hit EMSGSIZE on the sender.
This fixes "lei q" when using HTTP(S) externals, since that
code path sends large Eml objects from lei_xsearch workers
directly to lei2mail WQ workers.
Eric Wong [Tue, 25 May 2021 22:20:00 +0000 (22:20 +0000)]
ipc: avoid potential stack-not-refcounted bug
This fixes a potential problem with Carp::longmess
firing somewhere deeper in the stack. This is not a known
problem at this time, but something I noticed while chasing
something else.
Eric Wong [Tue, 25 May 2021 11:01:36 +0000 (11:01 +0000)]
lei forget-mail-sync: new command to drop sync information
Sometimes a user stops caring to sync an IMAP or Maildir
folder, or wants to force a resync. Let them run this
command to have lei forget all the sync information about
the mail folder.
This won't delete any stored messages in git, but will
leave "lei index" users with dangling references.
Eric Wong [Sun, 23 May 2021 21:36:50 +0000 (21:36 +0000)]
lei inspect: use LeiMailSync->match_imap_url
Move match_imap_url into LeiMailSync so it can be used in more
places, such as "lei inspect". Upcoming commands such as
"lei forget-mail-sync" and {add,forget,pause,resume}-watch will
also support relaxed IMAP matching rules since there's
no reasonable way to expect users to use ";UIDVALIDITY=" on the
command-line.
Eric Wong [Sun, 23 May 2021 08:01:16 +0000 (08:01 +0000)]
lei <q|up>: set \Recent on non-empty mbox and Maildir
Despite JMAP not supporting the equivalent of the IMAP \Recent
flag, it is useful for "lei q --augment" and "lei up" users to
be able to distinguish new results from old-but-unread messages
in an mbox or Maildir.
For mbox family messages, we'll drop the "O" status flag when
appending to mboxes, and we'll write to the "new" subdirectory
of Maildirs.
Behavior when writing to initially empty Maildirs and mboxes
remains unchanged since there's no need to distinguish between
new and old results in the initial case. Having users wait
for a rename(2) storm or complete mbox rewrite hurts UX.
With IMAP mailboxes, \Recent is already enforced by the IMAP
server and IMAP clients have no way of changing it(*)
(*) mutt uses the "Old" IMAP flag which isn't part of RFC 3501,
other MUAs may do similar things.
Eric Wong [Sun, 23 May 2021 01:38:28 +0000 (01:38 +0000)]
lei export-kw: relax IMAP URL matching
It's unreasonable to expect UIDVALIDITY= to be specified in
command-line arguments. We'll also check for cases without
"$USER@" or ";AUTH=", since we accept those forms on the
command-line.
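A hedged Python sketch of what relaxed matching could look like (not
the Perl implementation; `parse_imap_url` and `imap_url_matches` are
hypothetical names, and user names are assumed to be URL-encoded so
they contain no literal "@"):

```python
import re

_IMAP_RE = re.compile(
    r"(imaps?)://"
    r"(?:([^;@/]+)?(?:;AUTH=([^@/]+))?@)?"   # optional user and ;AUTH=
    r"([^;]+)"                               # host[:port]/folder
    r"(?:;UIDVALIDITY=(\d+))?\Z",
    re.I,
)

def parse_imap_url(url):
    """Split an IMAP URL into the parts relevant for matching."""
    m = _IMAP_RE.match(url)
    if m is None:
        raise ValueError(f"not an IMAP URL: {url!r}")
    scheme, user, auth, rest, uidvalidity = m.groups()
    return {"scheme": scheme.lower(), "user": user, "auth": auth,
            "rest": rest, "uidvalidity": uidvalidity}

def imap_url_matches(stored, given):
    """True if a command-line URL loosely matches a stored folder URL."""
    s, g = parse_imap_url(stored), parse_imap_url(given)
    if (s["scheme"], s["rest"]) != (g["scheme"], g["rest"]):
        return False
    # Parts omitted on the command line match whatever was stored.
    return all(g[k] is None or g[k] == s[k]
               for k in ("user", "auth", "uidvalidity"))
```

Parts that the user did supply must still agree, so two logins on the
same server remain distinguishable when the user name is given.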
Eric Wong [Sun, 23 May 2021 01:38:27 +0000 (01:38 +0000)]
lei export-kw: support exporting keywords to IMAP
We support writing to IMAP stores in other places (just like
Maildir), and it's actually less complex for us to write to
IMAP. Neither usability nor performance is ideal, but usability
will be addressed in the next commit to relax CLI argument
checking.
Performance is poor due to the synchronous Mail::IMAPClient
API and will need to be addressed with pipelining sometime
further in the future.
Eric Wong [Fri, 21 May 2021 10:28:32 +0000 (10:28 +0000)]
lei import: store IMAP user+auth in mail_sync folder URI
Just having UIDVALIDITY in the URI isn't enough, since a single
lei user may have multiple IMAP logins on the same server.
This leads to compatibility problems and forces a reimport for
the few users already using this lei functionality, but it's not
stable nor released, yet.
Eric Wong [Fri, 21 May 2021 10:28:26 +0000 (10:28 +0000)]
lei: drop EOFpipe in favor of PktOp
lei already uses PktOp and SOCK_SEQPACKET throughout; whereas
EOFpipe had one single use in lei. Since PktOp is a strict
superset of EOFpipe functionality, we may be able to get rid of
EOFpipe entirely.
However, lei is considered a portability canary and I'm not sure
if the stable public-inbox-* code can drop EOFpipe just yet.
Kyle Meyer [Fri, 21 May 2021 04:38:16 +0000 (00:38 -0400)]
lei rediff: fix construction of git-diff options
When generating git-diff options, lei-rediff extracts the single
character option from the lei option spec. However, there's no check
that the regular expression actually matches, leading to an
unintentional git-diff option when there isn't a short option (e.g.,
--inter-hunk-context=1 maps to the invalid `git diff --color -w1').
Check for a match before trying to extract the single character
option.
Fixes: cf0c7ce3ce81b5c3 (lei rediff: regenerate diffs from stdin)
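The shape of the fix, sketched in Python rather than Perl. The spec
strings follow Getopt::Long conventions ("name|alias=type");
`git_diff_arg` is a hypothetical helper, and falling back to the long
option when no short alias exists is an assumption, not necessarily
what lei-rediff does.

```python
import re

def git_diff_arg(spec, value=None):
    """Translate a Getopt::Long-style spec plus value to a git-diff arg."""
    name = re.match(r"[A-Za-z0-9-]+", spec).group(0)
    m = re.search(r"\|([A-Za-z])(?![A-Za-z0-9-])", spec)
    if m:  # a single-character alias exists, e.g. "ignore-all-space|w"
        opt = "-" + m.group(1)
        return opt if value is None else opt + str(value)
    # No short alias: use the long form instead of reusing whatever
    # an earlier, unrelated match happened to capture.
    opt = "--" + name
    return opt if value is None else f"{opt}={value}"
```

In Perl, a failed match leaves $1 holding the capture from the last
successful match, which is how a stale "w" could end up in front of
the value "1".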
Eric Wong [Wed, 19 May 2021 08:54:13 +0000 (08:54 +0000)]
lei: relax rules for "new" in Maildir
mbsync and offlineimap both use ":2," suffixes for filenames in
"new/", however my interpretation of the Maildir spec at
<https://cr.yp.to/proto/maildir.html> is that ":2," is only for
files in "cur/". My interpretation also matches that of
dovecot, but we'll allow what mbsync and offlineimap do given
their popularity.
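A relaxed parser for the ":2," info suffix could look like this Python
sketch (not lei's code; `maildir_flags` is a hypothetical name). It
treats files in "new/" and "cur/" alike:

```python
import re

_INFO_RE = re.compile(r":2,([A-Za-z]*)\Z")

def maildir_flags(path):
    """Return the set of flag letters from a Maildir filename.

    ":2,FLAGS" is accepted on files in both cur/ and new/; a file
    without the suffix simply has no flags.
    """
    name = path.rsplit("/", 1)[-1]
    m = _INFO_RE.search(name)
    return set(m.group(1)) if m else set()
```

For example, `maildir_flags("new/123.host:2,S")` yields `{"S"}` even
though a strict reading of the spec would not expect a suffix there.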
Kyle Meyer [Mon, 17 May 2021 03:37:00 +0000 (23:37 -0400)]
lei lcat: fix handling of multiple MSGID_OR_URL arguments
`lei lcat' is documented as being able to display multiple messages,
but this works only with --stdin because the positional argument
MSGID_OR_URL is missing a period.
Kyle Meyer [Mon, 17 May 2021 03:35:23 +0000 (23:35 -0400)]
doc lei: resort lei-tag entries
The command was renamed in 54da988cfb049ea2 (lei tag: rename from "lei
mark", 2021-03-30). Relocate its entries in txt2pre and Makefile.PL
to restore alphabetical sorting.
Kyle Meyer [Mon, 17 May 2021 03:35:21 +0000 (23:35 -0400)]
doc: split option variants into separate items
e226f18934eb7291 modified the lei-q manpage so that each variant of an
option gets a dedicated =item to make L</--xyz> look nicer and to
follow the Perl core documentation. Do the same for the other
manpages.
Note that this still leaves the variants of an option grouped in one
scenario: when a list of options without descriptions is presented as
a pointer to another location. Splitting the variants in that case
would make it harder for the reader to tell what the distinct options
are.
Kyle Meyer [Mon, 17 May 2021 03:35:20 +0000 (23:35 -0400)]
doc lei blob: avoid combined description of separate options
The next commit will update the manpages to split each option's
variants into separate items. This change won't mix well with
--oid-a, --path-a, and --path-b. These different options all share a
single description, and, if each form is on its own line, the link
between the variants of each option would no longer be clear.
Use a dedicated description for each option to avoid confusion.
Kyle Meyer [Sun, 16 May 2021 02:42:42 +0000 (22:42 -0400)]
lei rediff: handle stdin like other commands
`lei rediff' reads from stdin when no argument is specified, but this
is likely unintentional given that other lei commands instead have a
--stdin|- option and that `lei rediff --help' includes --stdin.
Eric Wong [Fri, 14 May 2021 20:14:47 +0000 (20:14 +0000)]
dir_idle: support IN_DELETE_SELF|IN_MOVE_SELF, too
We'll treat IN_MOVE_SELF as IN_DELETE_SELF since there
doesn't seem to be a reliable way to distinguish them
with FakeInotify, nor know the new name with kevent.
Eric Wong [Sun, 9 May 2021 11:16:13 +0000 (11:16 +0000)]
git: fix numerous bugs in git_quote and git_unquote
git always quotes with leading zeros to ensure the octal
representation is 3 characters long. We enforce that to match
low ASCII characters (e.g. [\x01-\x06]) that don't need the
range provided by 3 characters.
git_unquote now does a single pass so it won't get fooled by
decoded backslashes into parsing a digit as an octal character.
git_unquote is also capped to "\377" so we don't overflow a
byte.
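The three properties above (3-digit octal, single-pass decoding, and a
"\377" cap) can be sketched in Python as follows. This is an
illustration, not a port of the Perl functions; in particular, the
choice of which bytes get quoted is a simplifying assumption.

```python
import re

_ESC = {"a": 7, "b": 8, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11,
        "\\": 92, '"': 34}
_QUOTE = {v: "\\" + k for k, v in _ESC.items()}
# Octal escapes are exactly 3 digits and capped at \377 (one byte).
_UNQ_RE = re.compile(rb'\\([abfnrtv\\"]|[0-3][0-7]{2})')

def git_quote(s: bytes) -> bytes:
    """Quote s C-style, always using 3-digit octal for other bytes."""
    out, needs_quote = [], False
    for b in s:
        if b in _QUOTE:
            out.append(_QUOTE[b])
            needs_quote = True
        elif b < 0x20 or b >= 0x7f:
            out.append("\\%03o" % b)  # leading zeros, e.g. \001
            needs_quote = True
        else:
            out.append(chr(b))
    if not needs_quote:
        return s
    return ('"' + "".join(out) + '"').encode("ascii")

def git_unquote(s: bytes) -> bytes:
    """Undo git_quote in one left-to-right pass over the escapes."""
    if len(s) < 2 or not (s.startswith(b'"') and s.endswith(b'"')):
        return s
    def repl(m):
        tok = m.group(1).decode("ascii")
        return bytes([_ESC[tok] if tok in _ESC else int(tok, 8)])
    return _UNQ_RE.sub(repl, s[1:-1])
```

Because the substitution consumes each escape once, a decoded
backslash followed by digits (as in `"a\\101"`) stays literal instead
of being reinterpreted as an octal escape.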
Eric Wong [Thu, 6 May 2021 08:38:53 +0000 (08:38 +0000)]
syscall: minor yak-shaving updates
FreeBSD (and other *BSDs) do not have stable syscall numbers, so
drop no-op checks for it and add a note to use Inline::C,
instead. Drop an implicit return for the syscall.ph loading
while we're at it, too.
On Linux, epoll_create(2) ignores the size arg since Linux
2.6.8, so just hard code it to some non-zero value.
On a side note, we can probably drop epoll_create(2) support
soon and just use epoll_create1(2) which appeared in 2.6.27+
(2008-10-09). Our userspace (Perl and git) requirements are
already further ahead.
Eric Wong [Thu, 6 May 2021 02:28:19 +0000 (02:28 +0000)]
lei_xsearch: fix accounting bugs for remote mboxrd
We must not accumulate mset totals for messages which
have already been counted. Furthermore, the combined
search was being passed an extra arg and causing the
total to go missing.
We use trailing slashes internally, but should not increase
visual noise for users by exposing them in config files or
DB storage (and shell completion/listings).
This fixes a long-standing bug in $lei->rel2abs that prevented
absolute paths from being canonicalized.
Eric Wong [Thu, 6 May 2021 01:53:36 +0000 (01:53 +0000)]
lei_rediff: reduce overhead of tmp store
We don't need Xapian positional info when searching
for blob pre/post-images. Furthermore, rediff will
usually be used for a single email or at most, one
patchset. So there's little point in parallelizing
or having multiple shards.
Eric Wong [Wed, 5 May 2021 17:49:44 +0000 (17:49 +0000)]
lei rediff: do not automatically store patches/mails
We can use a temporary lei/store to avoid cluttering up
future search results. This is especially useful since
we expect "lei rediff" to be useful for non-email diffs
and individual attachments, too.
Eric Wong [Wed, 5 May 2021 10:46:38 +0000 (10:46 +0000)]
lei blob: support "lei index"-ed mail
Normal git retrieval doesn't work for Maildir blobs indexed using
"lei index". Fortunately, this oddness is limited to the
LeiStore class and we can override smsg_eml with a fallback
to read blobs from Maildirs.
Eric Wong [Wed, 5 May 2021 10:46:37 +0000 (10:46 +0000)]
lei rediff: regenerate diffs from stdin
Sometimes a mailed patch is generated with non-ideal output,
(lacking context, noisy whitespace changes, etc.), or a user
wants to use the same external diff viewer they've configured
git to use.
Since we have SolverGit to regenerate arbitrary blobs from
patches, this new command allows us to regenerate a diff with
different options using the blobs SolverGit gives us.
The number of git-diff(1) options is mind-numbing, so it's
likely I missed some favorites or botched the getopt spec
translation.
This also fixes Inbox::base_url to check psgi.url_scheme
before attempting to generate URLs and avoid uninitialized
variable warnings. Oddly, the "lei blob" tests did not
trigger these uninitialized warnings.
Note: this will automatically import+index the message(s)
it's regenerating, because solver relies on being able
to lookup pre/postimage OIDs and read blobs.
Eric Wong [Tue, 4 May 2021 09:49:12 +0000 (09:49 +0000)]
lei index: new command to index mail w/o git storage
Since completely purging blobs from git is slow, users may wish
to index messages in Maildirs (and eventually other local
storage) without storing data in git.
Much code from LeiImport and LeiInput is reused, and a new dummy
FakeImport class supplies a non-storing $im->add and minimizes
changes to LeiStore.
The tricky part of this command is to support "lei import"
after a message has gone through "lei index". Relying on
$smsg->{bytes} == 0 (as we do for external-only vmd storage)
does not work here, since it would break searching for "z:"
byte-ranges when not using externals.
This eventually required PublicInbox::Import::add to use a
SharedKV to keep track of imported blobs and prevent
duplication.
Eric Wong [Tue, 4 May 2021 05:14:19 +0000 (05:14 +0000)]
lei ls-mail-sync: fix handling of non-wildcard filters
If lei_ls_mail_sync() is given a filter without any wildcards
and --globoff is unspecified, glob2re() will return undef,
resulting in the final regular expression being undefined.
Always use a fallback value when there's no RE.
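The fallback can be pictured with this Python sketch. It is not the
Perl code: this `glob2re` only handles "*" and "?", `filter_re` is a
hypothetical wrapper, and matching the filter literally and unanchored
is an assumption about the fallback's behavior.

```python
import re

def glob2re(pattern):
    """Translate '*' and '?' to a regexp; return None if no wildcards."""
    if not re.search(r"[*?]", pattern):
        return None
    parts = {"*": ".*", "?": "."}
    return "".join(parts.get(c, re.escape(c)) for c in pattern)

def filter_re(pattern, globoff=False):
    """Always return a compiled regexp, even for wildcard-free filters."""
    src = None if globoff else glob2re(pattern)
    if src is None:
        src = re.escape(pattern)  # fallback: match the filter literally
    return re.compile(src)
```

The caller never sees an undefined pattern, whether the filter has no
wildcards or globbing was turned off.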
Kyle Meyer [Tue, 4 May 2021 04:45:57 +0000 (00:45 -0400)]
lei ls-mail-sync: accept a filter
lei_ls_mail_sync() is written to accept a filter, and ls-mail-sync has
related command-line options (--globoff, --invert-match), but a
positional argument isn't actually accepted. Add it.
Eric Wong [Tue, 4 May 2021 04:15:44 +0000 (04:15 +0000)]
doc: ignore onion URLs for 80-column check
This failure was also passing under FreeBSD make + /bin/sh,
so we also avoid the '&&' chain and use '>$@' as a
separate line in the Makefile.
Eric Wong [Tue, 4 May 2021 01:32:25 +0000 (01:32 +0000)]
treewide: update to v3 Tor onions
v2 onions are insecure, deprecated and going away. v3 names are
unfortunately longer and more difficult to remember, but should
be more resistant to attack than v2 ones.
Eric Wong [Mon, 3 May 2021 20:57:31 +0000 (20:57 +0000)]
lei up: fix dedupe with remote externals on Maildir + IMAP
LeiToMail Maildir and IMAP write callbacks need to account for
the caller-supplied smsg. We'll also make better use of the
user-supplied smsg object by ensuring blob deduplication happens
ASAP.
Fixes: e76683309ca4f254 ("lei <q|up>: distinguish between mset and l2m counts")