Eric Wong [Fri, 30 Apr 2021 09:24:31 +0000 (09:24 +0000)]
lei sucks: preserve utsname.machine, add "x86" where appropriate
It's helpful for us to distinguish x86 kernels from x86_64
kernels when using an x86 userspace. OSes are dropping i386
support and only support i486 and newer, so "x86" is a more
appropriate description for that platform than "i386".
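For illustration only, a Python sketch of the labeling idea (lei itself is Perl). The helper name machine_label() and the "i686 (x86)" output format are assumptions. The point is that the raw machine string is kept as-is, and "x86" is added only for the 32-bit i?86 family.

```python
import platform
import re

# Matches the 32-bit i386/i486/i586/i686 family reported by uname(2).
_I86 = re.compile(r"^i[3-6]86$")


def machine_label(machine: str) -> str:
    """Keep utsname.machine unchanged, but append "x86" for i?86
    machines so 32-bit x86 kernels stand out from x86_64 kernels."""
    if _I86.match(machine):
        return f"{machine} (x86)"
    return machine


if __name__ == "__main__":
    print(machine_label(platform.machine()))
```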
Eric Wong [Thu, 29 Apr 2021 19:49:57 +0000 (19:49 +0000)]
lei_store: fix locking w.r.t epoch creation
Prior to this change, it was possible for oneshot lei processes
to race on epoch creation/rollover. lei-daemon normally
prevents the problem by funnelling all writes to a single
socket, but oneshot lei has no such protection.
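A Python sketch of the locking idea (lei_store is Perl): hold an exclusive flock(2) across the whole "find the highest epoch, create the next one" step so concurrent oneshot processes serialize instead of racing. The local/<N>.git layout and the epoch.lock filename are assumptions for illustration, and a real epoch would also need a git repository initialized inside it.

```python
import fcntl
import os
import re

_EPOCH = re.compile(r"^([0-9]+)\.git$")


def create_next_epoch(store_dir: str) -> int:
    """Create the next <N>.git epoch directory under an exclusive lock so
    two concurrent processes cannot both claim the same N."""
    epochs_dir = os.path.join(store_dir, "local")  # hypothetical layout
    os.makedirs(epochs_dir, exist_ok=True)
    # Hypothetical lock file; the flock is released when the file closes.
    with open(os.path.join(store_dir, "epoch.lock"), "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        nums = [int(m.group(1))
                for m in map(_EPOCH.match, os.listdir(epochs_dir)) if m]
        n = max(nums, default=-1) + 1
        os.mkdir(os.path.join(epochs_dir, f"{n}.git"))
        return n
```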
Eric Wong [Thu, 29 Apr 2021 09:46:19 +0000 (09:46 +0000)]
lei import: support UIDVALIDITY in IMAP URL
Specifying a UIDVALIDITY value allows the user to enforce
a strict match and force a failure on mismatch. This necessitated changes
to NetReader to allow die() and make error reporting more
suitable for CLI usage rather than daemonized usage of -watch.
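A Python sketch of the strict-match behavior, assuming the ";UIDVALIDITY=" mailbox parameter from the IMAP URL scheme (RFC 5092). The function names are hypothetical, and raising ValueError stands in for die() in a CLI context.

```python
import re
from typing import Optional, Tuple

# IMAP URLs may carry ";UIDVALIDITY=<nz-number>" after the mailbox name,
# e.g. imap://user@example.com/INBOX;UIDVALIDITY=123
_UIDVALIDITY = re.compile(r";UIDVALIDITY=([1-9][0-9]*)", re.IGNORECASE)


def split_uidvalidity(url: str) -> Tuple[str, Optional[int]]:
    """Return (URL without the ;UIDVALIDITY part, requested value or None)."""
    m = _UIDVALIDITY.search(url)
    if m is None:
        return url, None
    return url[:m.start()] + url[m.end():], int(m.group(1))


def check_uidvalidity(url: str, server_uidvalidity: int) -> None:
    """Fail hard when the URL demands a UIDVALIDITY the server lacks."""
    _, wanted = split_uidvalidity(url)
    if wanted is not None and wanted != server_uidvalidity:
        raise ValueError(f"{url}: UIDVALIDITY mismatch "
                         f"(wanted {wanted}, server has {server_uidvalidity})")
```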
Eric Wong [Thu, 29 Apr 2021 09:46:18 +0000 (09:46 +0000)]
lei import: avoid IMAPTracker, use LeiMailSync more
IMAPTracker has a UNIQUE constraint on the `url' column,
which may cause compatibility and/or rollback problems
in attempting to deal with UIDVALIDITY changes.
Having multiple sources of truth leads to confusion and bugs,
so relying on LeiMailSync exclusively ought to simplify things.
Furthermore, since LeiMailSync is only written to by LeiStore,
it is safer in that it won't mark a UID or article as imported
until git-fast-import has seen it, and the SQLite commit always
happens after "done\n" is sent to fast-import.
This mostly reverts recent commits to IMAPTracker to support
lei, those are:
Eric Wong [Wed, 28 Apr 2021 19:37:29 +0000 (19:37 +0000)]
lei: avoid close(STD{IN,OUT,ERR}) in oneshot mode
This seems to fix the occasional "make check-run" failures I've
been chasing.
Some parts of our code assume we can close($lei->{1})
and similar, which causes IO::Handle::autoflush to behave
badly when STDOUT is the "select"-ed FH of the Perl process.
Since oneshot mode is (hopefully) the uncommon case, we'll
just accept the cost of extra FDs and minimize differences
between lei in oneshot vs daemon mode.
Eric Wong [Wed, 28 Apr 2021 07:52:04 +0000 (07:52 +0000)]
lei_view_text: translate background colors from git
This seems to work with or without attributes. We'll deal with
256-color terminal colors when/if somebody cares for it, but the
usual 16 ought to be more than enough.
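A Python sketch of translating git color-config words into ANSI SGR escapes. As in git's color syntax, the first color is the foreground and the second the background. Only the 8 basic colors, their "bright" variants, and a few attributes are handled here; the function name is hypothetical and this is not lei_view_text's actual code.

```python
# Git color words for the 8 basic ANSI colors (list index = SGR offset).
_COLORS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]
_ATTRS = {"bold": 1, "dim": 2, "italic": 3, "ul": 4, "blink": 5, "reverse": 7}


def _is_color(word: str) -> bool:
    return (word == "normal" or word in _COLORS
            or (word.startswith("bright") and word[6:] in _COLORS))


def _sgr(base: int, name: str) -> str:
    # base is 30 for foreground, 40 for background; "bright" adds 60.
    if name.startswith("bright"):
        return str(base + 60 + _COLORS.index(name[6:]))
    return str(base + _COLORS.index(name))


def git_color_to_sgr(spec: str) -> str:
    """Translate e.g. "bold red blue" (fg red, bg blue) to an escape."""
    attrs, colors = [], []
    for word in spec.split():
        if word in _ATTRS:
            attrs.append(str(_ATTRS[word]))
        elif _is_color(word):
            colors.append(word)
        else:
            raise ValueError(f"unsupported color word: {word!r}")
    if len(colors) > 2:
        raise ValueError("at most two colors: foreground, then background")
    codes = attrs + [_sgr(base, name)
                     for base, name in zip((30, 40), colors)
                     if name != "normal"]  # "normal" leaves a slot unchanged
    return f"\033[{';'.join(codes)}m" if codes else ""
```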
Eric Wong [Wed, 28 Apr 2021 07:52:03 +0000 (07:52 +0000)]
lei_view_text: improve attachment display
Support setting a color to distinguish attachment info from user-supplied text.
We'll also put the $BLOB:$IDX identifier on a separate line and
just put the entire corresponding lei command in the form of:
"[-- lei blob $BLOB:$IDX --]" to teach users how to access it.
Eric Wong [Wed, 28 Apr 2021 07:51:57 +0000 (07:51 +0000)]
view_diff: minor coding style fixes
Prefer "use v5.10", s/base/parent/, rely on "perl -w" for warnings.
We also pass a regexp to the split perlop rather than a literal
SV, since split() would otherwise compile a new RE every time.
Eric Wong [Wed, 28 Apr 2021 04:51:06 +0000 (04:51 +0000)]
doc: lei q: split =item aliases onto separate lines
It makes L</--augment> look nicer without resorting to
L<--augment|/-a, --augment> and similarly verbose nastiness.
Having each option as a separate =item (with a blank line in
between each =item) seems to be the preferred style used within
Perl core documentation (I used perlrun.pod as an example),
so we'll follow Perl core style, here.
This needs to be done for other manpages, at some point...
Eric Wong [Wed, 28 Apr 2021 06:55:22 +0000 (06:55 +0000)]
view: add [thread overview] anchor next to Date:
The existing Subject: anchor to #r may not be 100% obvious,
and we can't stick the phrase "[thread overview]" into the
same line as the Subject without introducing ambiguity.
Fortunately, we have the Date: header directly under it.
Adding "[thread overview]" after the Date: is unambiguous
and won't make the line too long for valid emails.
This hopefully improves navigation ever-so-slightly, and was
prompted by comments from Son Luong Ngoc.
Eric Wong [Tue, 27 Apr 2021 11:07:52 +0000 (11:07 +0000)]
lei lcat: extract Message-IDs from URLs and show them
It's a wrapper around "lei q" which extracts Message-IDs
from URLs, "<$MSGID>", and "id:$MSGID", then attempts to display the
local version of the message.
Its main purpose is to extract Message-IDs out of
commonly-understood URLs to save users bandwidth and time
by displaying the message locally. When reading from stdin,
it will discard things it doesn't understand, so you can just
pipe an entire "Link: $URL" line to it and it'll attempt to
pluck the Message-ID out of the URL.
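A Python sketch of the extraction step. The patterns are illustrative assumptions, not lei's actual parser. In particular, it assumes the Message-ID is the first URL path component containing '@' (after percent-decoding), and it silently ignores anything it does not recognize.

```python
import re
from typing import List
from urllib.parse import unquote, urlsplit

# Illustrative patterns only; these are assumptions, not lei's parser.
_URL = re.compile(r"https?://\S+")
_ANGLE = re.compile(r"<([^<>\s]+@[^<>\s]+)>")
_ID = re.compile(r"\bid:(\S+@\S+)")


def extract_mids(text: str) -> List[str]:
    """Pull Message-IDs from URLs, "<MSGID>" and "id:MSGID"; drop the rest."""
    mids = []
    for line in text.splitlines():
        for url in _URL.findall(line):
            # Assume the first path component containing '@' is the Message-ID.
            for seg in urlsplit(url.rstrip(">")).path.split("/"):
                seg = unquote(seg)
                if "@" in seg:
                    mids.append(seg)
                    break
        rest = _URL.sub(" ", line)  # keep URLs from matching <...> below
        mids.extend(m.group(1) for m in _ANGLE.finditer(rest))
        mids.extend(m.group(1) for m in _ID.finditer(rest))
    return mids
```

Piping a whole "Link: $URL" trailer works because the "Link:" prefix simply matches none of the patterns.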
Eric Wong [Sat, 24 Apr 2021 22:42:59 +0000 (22:42 +0000)]
lei_saved_search: avoid reentrancy in ->is_dup
Use a separate git process when calling xoids_for to prevent
reentrancy in ->is_dup. Reentrancy happens because LeiToMail
calls ->is_dup from inside callbacks while writing mail.
This fixes --dedupe=mid test failures in t/lei-q-save.t.
I could only reproduce this consistently on a uniprocessor VM.
"schedtool -a 0x1 -e ..." could not reproduce the problem on
2 and 4-core systems.
Eric Wong [Sat, 24 Apr 2021 10:23:30 +0000 (10:23 +0000)]
extindex: --gc: use escape pathnames for SQL LIKE properly
This allows us to handle odd inboxes w/o a newsgroup configured
if they also make the strange choice of having backslashes in
their path name. Also, ensure we use case-sensitive LIKE, since
case-insensitive FSes are not worth supporting.
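A Python/sqlite3 sketch of escaping a pathname prefix for LIKE with an explicit ESCAPE character, plus case-sensitive matching via SQLite's case_sensitive_like pragma. The inboxes table and path column are hypothetical stand-ins for whatever -extindex actually queries.

```python
import sqlite3
from typing import List


def like_escape(s: str, esc: str = "\\") -> str:
    """Escape LIKE wildcards (% and _) plus the escape character itself."""
    return (s.replace(esc, esc + esc)
             .replace("%", esc + "%")
             .replace("_", esc + "_"))


def paths_with_prefix(db: sqlite3.Connection, prefix: str) -> List[str]:
    # SQLite's LIKE is case-insensitive for ASCII by default; turn that off.
    db.execute("PRAGMA case_sensitive_like = ON")
    rows = db.execute("SELECT path FROM inboxes WHERE path LIKE ? ESCAPE '\\'",
                      (like_escape(prefix) + "%",))
    return [r[0] for r in rows]
```

Without escaping, the "_" in a prefix would match any character, and a backslash in the path would be misread as an escape.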
Eric Wong [Sat, 24 Apr 2021 09:28:46 +0000 (09:28 +0000)]
lei import: keep sync info for Maildir and IMAP folders
We aren't using it, yet, but the plan is to be able to use
this information to propagate keyword changes back to IMAP
and Maildir folders using some to-be-implemented command.
"lei inspect" is a half-baked new command to make testing this
change easier. It will be updated to support more SQLite+Xapian
introspection duties in the future, including public-inbox
things independent of lei.
Eric Wong [Fri, 23 Apr 2021 08:06:12 +0000 (04:06 -0400)]
lei_to_mail: cwd-agnostic Maildir wakeup
Since we don't have *at() syscalls readily available to us,
lei-daemon may call ->poke_dst in the wrong relative directory.
Despite not having *at() syscalls, we can still capture the
"$MAILDIR/cur" directory handle at pre_augment time so we can
reliably call futimes(2) on it using the `utime' perlop.
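The same trick sketched in Python (lei uses Perl's utime on a dirhandle): open the directory once while its possibly-relative path is still valid, then later bump its timestamps through the descriptor. This relies on os.utime() accepting a file descriptor, which holds on platforms listed in os.supports_fd. The helper names are hypothetical.

```python
import os


def capture_dir(path: str) -> int:
    """Open a directory fd while the (possibly relative) path is valid."""
    return os.open(path, os.O_RDONLY | getattr(os, "O_DIRECTORY", 0))


def poke_dir(dir_fd: int) -> None:
    """Set the directory's atime/mtime to "now" through the fd, so the
    current working directory no longer matters."""
    os.utime(dir_fd)
```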
Eric Wong [Fri, 23 Apr 2021 07:28:15 +0000 (07:28 +0000)]
net_reader: restart on first UID when UIDVALIDITY changes
In other words, treat the same IMAP folder with a different
UIDVALIDITY as a completely different folder. If the UIDVALIDITY
changes, we can start from UID=1 without falling behind or
losing data. If the UIDVALIDITY gets reset to a previously
known-good value, we can still resume where we left off
before the first UIDVALIDITY change.
This affects public-inbox-watch and "lei import".
One potential downside of this is for rare altid users, but
that's mainly intended for NNTP article numbers which are/were
often publicized; not IMAP UIDs which are rarely publicized.
The other potential downside is bandwidth waste in the rare
case UIDVALIDITY changes while IMAP folder contents remain
unchanged. There's no extra storage used due to existing
(v1|v2|lei/store) deduplication mechanisms.
Before this change, we were matching offlineimap behavior and
stopped synching an IMAP folder when its UIDVALIDITY changed.
offlineimap behavior made sense for IMAP <=> Maildir
synchronization since Maildirs had no sense of UIDVALIDITY and
could only rely on name mapping.
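A minimal Python sketch of "treat a new UIDVALIDITY as a new folder", assuming sync state is a map keyed by (folder, UIDVALIDITY). SyncState is a hypothetical in-memory stand-in for the real on-disk tracking.

```python
from typing import Dict, Tuple


class SyncState:
    """Last-seen UID keyed by (folder, UIDVALIDITY)."""

    def __init__(self) -> None:
        self._last_uid: Dict[Tuple[str, int], int] = {}

    def start_uid(self, folder: str, uidvalidity: int) -> int:
        # An unseen UIDVALIDITY is treated as a brand-new folder: start at 1.
        return self._last_uid.get((folder, uidvalidity), 0) + 1

    def record(self, folder: str, uidvalidity: int, uid: int) -> None:
        key = (folder, uidvalidity)
        if uid > self._last_uid.get(key, 0):
            self._last_uid[key] = uid
```

Because the old key is never discarded, a UIDVALIDITY that flips back to an earlier value resumes where it left off.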
Eric Wong [Fri, 23 Apr 2021 01:45:13 +0000 (01:45 +0000)]
lei up: support symlinked pathnames
On my default FreeBSD 11.x system, "/home" is a symlink to
"/usr/home", which causes "lei up" path resolution to fail when
I use outputs in $HOME. Fall back to a slow path of globbing
and matching pathnames based on st_ino+st_dev.
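A Python sketch of that slow path: glob the candidate output paths and compare (st_dev, st_ino) with the user-supplied path, so two spellings through a symlink resolve to the same file. The function name and the idea of passing a glob pattern are assumptions for illustration.

```python
import glob
import os
from typing import Optional


def find_by_inode(target: str, pattern: str) -> Optional[str]:
    """Return the globbed candidate whose (st_dev, st_ino) matches target,
    e.g. /home/me/out vs /usr/home/me/out when /home is a symlink."""
    try:
        st = os.stat(target)  # follows symlinks
    except FileNotFoundError:
        return None
    for cand in sorted(glob.glob(pattern)):
        try:
            cst = os.stat(cand)
        except FileNotFoundError:
            continue  # raced with removal
        if (cst.st_dev, cst.st_ino) == (st.st_dev, st.st_ino):
            return cand
    return None
```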
Eric Wong [Thu, 22 Apr 2021 09:08:21 +0000 (07:08 -0200)]
lei import: --incremental default for NNTP and IMAP
No point in burning through bandwidth to import stuff we already
saw. All this logic is shared with -watch but uses a different
pathname for lei since it's tied to lei/store (and not a
public-inbox).
Eric Wong [Wed, 21 Apr 2021 23:50:52 +0000 (23:50 +0000)]
lei: flesh out `forwarded' kw support for Maildir and IMAP
Maildir and IMAP can both handle `forwarded'. Ensure we don't
lose `forwarded' when reading from stores which do not support
it, but ensure we can set it when reading from IMAP and Maildir
stores.
Eric Wong [Wed, 21 Apr 2021 18:36:10 +0000 (18:36 +0000)]
lei: share common *done_wait callbacks
Code is the enemy, and there's no need to duplicate things, here.
There may be further opportunities along these lines to further
deduplicate things...
Eric Wong [Tue, 20 Apr 2021 07:16:54 +0000 (07:16 +0000)]
lei forget-search: new command to forget saved searches
Readers may lose interest in subscription topics. This lets
them avoid clutter by forgetting a saved search.
This does not and will not destroy the contents of an --output
mailbox. In other words, this is similar to unsubscribing
from an Atom/RSS feed or NNTP group.
I've also decided we won't support 'mv-search', since it'll
probably be rarely used and "lei convert" can be used, instead.
Eric Wong [Mon, 19 Apr 2021 23:49:01 +0000 (14:49 -0900)]
lei up: support --all=local
Users may wish to update several saved searches at once. We can
support parallel updates in lei-daemon so users won't have to do
it themselves via xargs or similar.
Supporting IMAP outputs would be significantly more involved
since we'd have to pre-authenticate for every single IMAP
output before entering the redispatch loop.
Eric Wong [Tue, 20 Apr 2021 09:01:00 +0000 (09:01 +0000)]
lei-sigpipe: update and move test from xt => t
We have "lei import" and better test infrastructure for lei,
now, so we can more easily test SIGPIPE without relying on
an already-configured instance.
Eric Wong [Mon, 19 Apr 2021 08:52:13 +0000 (08:52 +0000)]
config: git_config_dump blesses
I don't know if it's worth it to sub (or super)class
PublicInbox::Config into something more generic for
lei, but this change simplifies a good chunk of lei
code that reuses the public-inbox config parsing.
Eric Wong [Mon, 19 Apr 2021 08:52:10 +0000 (08:52 +0000)]
lei: support unlinked/missing saved searches
It's conceivable a user will want to erase all previous
results but still rerun/refresh a search to get new results.
We probably won't support prune functionality, here, and
instead require explicit removal of saved searches.
Eric Wong [Sat, 17 Apr 2021 19:00:53 +0000 (19:00 +0000)]
lei up: further improve Maildir canonicalization
We want to be able to use "lei up ." when inside a Maildir.
We'll also relax Maildir/mbox basenames to be any non-'/'
character after converting relative paths to absolute. The
old restriction on allowed characters was unnecessary and made
it impossible to reliably map "." when used as the sole argument
for "lei up".
Eric Wong [Sat, 17 Apr 2021 10:24:45 +0000 (10:24 +0000)]
lei up: fix canonicalization of Maildirs
We always represent --output destination directories with a
trailing slash to disambiguate directories from mbox filenames.
Therefore, we must use the trailing slash when mapping the
destination back from the lei/saved-search/* directory.
"lei up" now relies exclusively on the user's --output pathname
or URL for updates. This ought to be less confusing since
pathnames in ~/.local/store/lei/saved-searches aren't ideal.
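A small Python sketch of the trailing-slash convention, assuming POSIX "/" separators; canonical_output() is a hypothetical name. It makes relative names such as "." absolute and gives existing directories exactly one trailing slash. It deliberately does not resolve symlinks.

```python
import os


def canonical_output(path: str) -> str:
    """Absolute pathname; directories (e.g. Maildirs) get exactly one
    trailing '/' so they cannot be confused with an mbox file."""
    ap = os.path.abspath(path)  # resolves "." and drops any trailing '/'
    if os.path.isdir(ap):
        return ap.rstrip("/") + "/"
    return ap
```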
Eric Wong [Sat, 17 Apr 2021 19:00:01 +0000 (19:00 +0000)]
lei q: fix MUA spawn after reading query from stdin
Since "lei q" may read queries from stdin, we must reconnect a
known terminal before spawning terminal MUAs. Attempt to use
stdout as stdin for this purpose, since terminal MUAs tend to
expect stdout to be a terminal.
Eric Wong [Fri, 16 Apr 2021 23:10:35 +0000 (16:10 -0700)]
lei q --save: clobber config file on repeats
A user may wish to clobber/refine existing search parameters
by issuing "lei q --save" again. Support that by overwriting
the lei.saved-search state file entirely.
We continue to preserve over.sqlite3 for deduplication purposes.
Eric Wong [Sat, 17 Apr 2021 09:47:11 +0000 (09:47 +0000)]
lei_query: fix relative path handling on --stdin
Since --stdin could be waiting on user keyboard input or
something else slow, we handle it in the event loop. That
means other commands can change the working directory of
lei-daemon while a query is being trickled to us via stdin.
Rearranging query handling internals to delay opening the
--output destination in commit 26e0fe73de93f451 meant
another command could throw off our --output pathname if
it is relative.
Fixes: 26e0fe73de93f451 ("lei_query: rearrange internals to capture query early")
Eric Wong [Fri, 16 Apr 2021 23:10:27 +0000 (16:10 -0700)]
lei q: --save preserves relative time queries
Somebody may want a saved search which consistently asks for
messages within a rolling time period window. In other words,
we want "lei q --save dt:last.week.." to keep
the "dt:last.week.." relative to whenever "lei up" is run. This
ensures relative date-time specifications get used in the future
rather than converting into an absolute date-time from the
initial "lei q" invocation.
Eric Wong [Fri, 16 Apr 2021 23:43:06 +0000 (18:43 -0500)]
search: expand "d:" to "dt:" for precision with approxidate
If a user specifies "d:" with a higher precision than it was
traditionally able to handle, switch transparently to "dt:".
This lowers the learning curve and improves DWIM-ness.
Eric Wong [Tue, 13 Apr 2021 10:54:45 +0000 (10:54 +0000)]
lei q: start wiring up saved search
This will have an over.sqlite3 for content-based deduplication.
It may exhibit ibxish methods, so serving a read-only (or even
R/W) IMAP instance or displaying HTML isn't outside the realm
of possibility.
Eric Wong [Tue, 13 Apr 2021 10:54:42 +0000 (10:54 +0000)]
lei_xsearch: use per-external queries when not sorting
We only need the combined mset query when we care about sort
order. When writing to --output destinations intended for MUA
consumption, sort order is irrelevant as MUAs are expected to
offer their own sorting, so run queries to each external in
parallel.
This prepares us for docid-sort-based saved search support.
It will also become faster than the combined mset query for
users with many externals due to current Xapian exhibiting poor
performance with many shards (the same reason -extindex exists).
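A Python sketch of the unsorted path: query each external independently and emit hits as soon as any external finishes, since the MUA will sort anyway. Threads are used purely for illustration; this does not reflect lei's actual worker model, and search() is a hypothetical per-external query callback.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, List


def unsorted_results(externals: Iterable[str],
                     search: Callable[[str], List[str]],
                     max_workers: int = 4) -> Iterator[str]:
    """Yield hits from every external in completion order (unspecified)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(search, ext) for ext in externals]
        for fut in as_completed(futures):
            yield from fut.result()
```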
Eric Wong [Sun, 11 Apr 2021 05:32:55 +0000 (05:32 +0000)]
www: do not obfuscate addresses in URLs
Addresses in URLs are likely Message-IDs. If an email address ends up in
a URL, then it's likely public, so there's even less reason to
obfuscate that particular address.
[km: add xt/perf-obfuscate.t]
[ew: modernize perf test (5.10.1), use diag instead of print]
This version of the patch avoids the massive slowdown noted by Kyle in
<https://public-inbox.org/meta/87wnt9or6t.fsf@kyleam.com/>.
Performance remains roughly the same, if not slightly faster
(which may be due to me testing this on a busy server). Results
from xt/perf-obfuscate.t against 6078 messages on a local mirror
of <https://public-inbox.org/meta/>:
before: 6.67 usr + 0.04 sys = 6.71 CPU
after: 6.64 usr + 0.04 sys = 6.68 CPU
import: convert init.defaultBranch to fully qualified ref
init.defaultBranch expects a branch name, not a fully qualified ref.
git-init prepends "refs/heads/" automatically and unconditionally.
PublicInbox::Import::default_branch, however, incorrectly passes on
the init.defaultBranch value as is, leading to it being used in spots
where a fully qualified ref is required. For example, with an
init.defaultBranch value of "master", public-inbox-index for a v2
repository would lead to an all.git repository where HEAD's content is
"ref: master" instead of "ref: refs/heads/master".
Prepend "refs/heads/" to the incoming init.defaultBranch value.
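A Python sketch of the fix (PublicInbox::Import itself is Perl). The "master" fallback for an unset init.defaultBranch is an assumption; the part taken from the description is the unconditional "refs/heads/" prefix, mirroring git-init.

```python
from typing import Optional


def default_branch(init_default_branch: Optional[str]) -> str:
    """init.defaultBranch holds a branch *name*; return the full ref.
    Like git-init, prepend "refs/heads/" unconditionally."""
    # Assumption: fall back to "master" when the config key is unset.
    name = init_default_branch or "master"
    return "refs/heads/" + name
```

With this, writing HEAD as "ref: " + default_branch("master") yields "ref: refs/heads/master" rather than "ref: master".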