Eric Wong [Thu, 23 Sep 2021 00:46:25 +0000 (00:46 +0000)]
daemons: revamp periodic cleanup task
Neither Inboxes nor ExtSearch objects were retrying correctly
when there are live git processes, but the inboxes were getting
rescanned for search or other reasons. Ensure the scan retries
eventually if there are live processes.
We also need to update the cleanup task to detect Xapian shard
count changes, since Xapian ->reopen is enough to detect any
other Xapian changes. Otherwise, we just issue an inexpensive
->reopen call and let Xapian check whether there's anything
worth reopening.
This also lets us eliminate the Devel::Peek dependency.
Eric Wong [Wed, 22 Sep 2021 09:45:17 +0000 (09:45 +0000)]
gcf2 + extsearch: check for unlinked files on Linux
Check for unlinked mmap-ed files via /proc/$PID/maps every 60s
or so.
ExtSearch (extindex) is compatible-enough with Inbox objects to
be wired into the old per-inbox code, but the startup cost is
projected to be much higher down the line when there's >30K
inboxes, so we scan /proc/$PID/maps for deleted files before
unlinking. With old Inbox objects, it was (and is) simpler to
just kill processes w/o checking due to the low startup cost
(and non-portability of checking).
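A minimal sketch of the /proc/$PID/maps check, written in Python rather than the project's Perl. The function names are hypothetical; the only thing relied upon is that Linux appends " (deleted)" to the pathname of a mapping whose backing file has been unlinked.

```python
def deleted_mappings(maps_text):
    """Return pathnames of mapped files that have been unlinked.

    Linux appends " (deleted)" to the pathname field of /proc/$PID/maps
    once the backing file is gone from the filesystem.
    """
    suffix = " (deleted)"
    found = []
    for line in maps_text.splitlines():
        # fields: address perms offset dev inode pathname
        parts = line.split(None, 5)
        if len(parts) == 6 and parts[5].endswith(suffix):
            found.append(parts[5][:-len(suffix)])
    return found


def self_has_deleted_mappings():
    # Linux-only: inspect the map of the current process.
    with open("/proc/self/maps") as fh:
        return bool(deleted_mappings(fh.read()))
```

A periodic timer could call something like self_has_deleted_mappings() and only restart the process when it returns true.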
Eric Wong [Wed, 22 Sep 2021 02:24:34 +0000 (02:24 +0000)]
lei up: avoid excessively parallel --all
We shouldn't dispatch all outputs right away since they
can be expensive CPU-wise. Instead, rely on DESTROY to
trigger further redispatches.
This also fixes a circular reference bug for the single-output
case that could lead to a leftover script/lei after MUA exit.
I'm not sure how --jobs/-j should work when the actual xsearch
and lei2mail have their own parallelism ("--jobs=$X,$M"), but
it's better than having thousands of subtasks running.
Fixes: b34a267efff7b831 ("lei up: fix --mua with single output")
Eric Wong [Tue, 21 Sep 2021 09:29:45 +0000 (09:29 +0000)]
lei: umask(077) before opening errors.log
There's a chance some sensitive information (e.g. folder names)
can end up in errors.log, though $XDG_RUNTIME_DIR or
/tmp/lei-$UID/ will have 0700 permissions, anyways.
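A minimal sketch of the idea in Python (not the project's Perl). The helper name is hypothetical, and restoring the previous umask afterwards is a choice made for this sketch only.

```python
import os


def open_private_log(path):
    """Open path for appending with umask(077) in effect at creation time."""
    old = os.umask(0o077)  # strip all group/other permission bits
    try:
        return open(path, "a")  # a new file is created with mode 0600
    finally:
        os.umask(old)  # sketch-only: put the caller's umask back
```

With this in place a freshly created errors.log is readable and writable by its owner only, regardless of the permissions on the parent directory.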
Eric Wong [Tue, 21 Sep 2021 09:29:44 +0000 (09:29 +0000)]
script/lei: handle SIGTSTP and SIGCONT
Sometimes it's useful to pause an expensive query or
refresh-mail-sync to do something else. While lei-daemon and
lei/store can't be paused since they're shared across clients,
per-invocation WQ workers can be paused safely using the
unblockable SIGSTOP.
While we're at it, drop the ETOOMANYREFS hint since it
hasn't been a problem since we drastically reduced FD passing
early in development.
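A minimal sketch of pausing and resuming per-invocation workers, in Python rather than the project's Perl. The function names and the list of worker PIDs are hypothetical; only SIGSTOP and SIGCONT themselves come from the message above.

```python
import os
import signal


def pause_workers(pids):
    """Stop each worker; SIGSTOP cannot be caught, blocked or ignored."""
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)


def resume_workers(pids):
    """Let previously stopped workers continue."""
    for pid in pids:
        os.kill(pid, signal.SIGCONT)
```

A SIGTSTP handler in the client could trigger pause_workers() for its own workers, and a SIGCONT handler could trigger resume_workers().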
Eric Wong [Tue, 21 Sep 2021 07:41:59 +0000 (07:41 +0000)]
lei q: improve --limit behavior and progress
Avoid slurping gigantic (e.g. 100000) result sets into a single
response if a giant limit is specified, and instead use 10000
as a window for the mset with a given offset. We'll also warn
and hint about the --limit= switch when the estimated
result set is larger than the default limit.
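The windowing can be sketched as follows, in Python rather than the project's Perl. The generator name is hypothetical; the 10000 window size is the one named above.

```python
def windows(limit, window=10000):
    """Yield (offset, count) pairs that together cover `limit` results."""
    offset = 0
    while offset < limit:
        count = min(window, limit - offset)
        yield offset, count
        offset += count


print(list(windows(25000)))
# [(0, 10000), (10000, 10000), (20000, 5000)]
```

Each pair would drive one mset request, so no single response holds more than one window of results.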
Eric Wong [Tue, 21 Sep 2021 07:41:55 +0000 (07:41 +0000)]
lei: various completion improvements
"lei export-kw" no longer completes for anonymous sources.
More commands use "lei refresh-mail-sync" as a basis for their
completion work, as well.
";AUTH=ANONYMOUS@" is stripped from completions since it was
preventing bash completion from working on AUTH=ANONYMOUS IMAP
URLs. I'm not sure if there's a better way, but all of our code
works fine without specifying AUTH=ANONYMOUS as a command-line
arg.
Finally, we fall back to using more candidates if none can
be found, allowing multiple URLs to be completed.
Eric Wong [Tue, 21 Sep 2021 07:41:52 +0000 (07:41 +0000)]
lei lcat: use single queue for ordering
If lcat-ing multiple argument types (blobs vs folders),
maintain the original order of the arguments instead of
dumping all blobs before folder contents.
Eric Wong [Tue, 21 Sep 2021 07:41:51 +0000 (07:41 +0000)]
lei: simplify internal arg2folder usage
We can set opt->{quiet} for (internal) 'note-event' command
to quiet ->qerr, since we use ->qerr everywhere else. And
we'll just die() instead of setting a ->{fail} message, since
eval + die are more in line with the rest of our Perl code.
Eric Wong [Tue, 21 Sep 2021 07:41:50 +0000 (07:41 +0000)]
lei_mail_sync: account for non-unique cases
NNTP servers, IMAP servers, and various MUAs may recycle
"unique" identifiers due to software bugs or careless BOFHs.
Warn about them, but always be prepared to account for them.
Eric Wong [Mon, 20 Sep 2021 13:00:33 +0000 (13:00 +0000)]
gcf2: fix loading at runtime
We need to waitpid synchronously on pkg-config to use $?.
When loading Gcf2 inside the event loop, implicit dwaitpid
done by PublicInbox::ProcessPipe would not call waitpid in
time to zero $?. This was causing one of my -httpd to
occasionally fall back to git(1) instead of using Gcf2.
Eric Wong [Sun, 19 Sep 2021 12:50:32 +0000 (12:50 +0000)]
net_reader: no STARTTLS for IMAP localhost or onions
At least not by default, to match existing NNTP behavior.
Tor .onions are already encrypted, and there's no point
in encrypting traffic on localhost outside of testing.
Eric Wong [Sun, 19 Sep 2021 12:50:30 +0000 (12:50 +0000)]
xt: add fsck script over over.sqlite3
I'm not sure what caused it, but I've noticed two missing
messages that failed from "lei up" on an https:// external;
and I've also seen some duplicates in the past (which I
think I fixed...).
Eric Wong [Sun, 19 Sep 2021 12:50:29 +0000 (12:50 +0000)]
net_reader: fix single NNTP article fetch, test ranges
While NNTP ranges were already working, fetching a single message
was broken. We'll also simplify the code a bit and ensure
incremental synchronization is ignored when ranges are
specified.
Eric Wong [Sun, 19 Sep 2021 12:50:28 +0000 (12:50 +0000)]
lei ls-mail-source: pretty JSON support
As with other commands, we enable pretty JSON by default if
stdout is a terminal or if --pretty is specified. While the
->pretty JSON output has excessive vertical whitespace, too many
lines is preferable to having everything on one line.
Eric Wong [Sun, 19 Sep 2021 12:50:27 +0000 (12:50 +0000)]
lei ls-mail-source: use "high"/"low" for NNTP
The meanings of "hwm" and "lwm" may not be obvious abbreviations
for (high|low) water mark descriptions used by RFC 3977.
"high" and "low" should be obvious to anyone.
Eric Wong [Sun, 19 Sep 2021 12:50:25 +0000 (12:50 +0000)]
ipc: drop dynamic WQ process counts
In retrospect, I don't think it's needed; and trying to wire up
a user interface for lei to manage process counts doesn't seem
worthwhile. It could be resurrected for public-facing daemon
use in the future, but that's what version control systems are for.
This also lets us automatically avoid setting up broadcast
sockets.
Eric Wong [Sun, 19 Sep 2021 12:50:23 +0000 (12:50 +0000)]
lei: simplify sto_done_request
With the switch from pipes to sockets for lei-daemon =>
lei/store IPC, we can send the script/lei client socket to the
lei/store process and rely on reference counting in both Perl
and the kernel to persist the script/lei.
Eric Wong [Sun, 19 Sep 2021 00:36:04 +0000 (00:36 +0000)]
doc: tuning: note git 2.33+, move libgit2 into Inline::C section
git 2.33+ contains important optimizations for the
thousands-of-inboxes case. And combine the Inline::C stuff
with libgit2, since our use of libgit2 requires Inline::C.
Eric Wong [Sat, 18 Sep 2021 22:38:43 +0000 (22:38 +0000)]
t/lei-refresh-mail-sync: improve test reliability
We can't assume -imapd will be ready by the time we try to
connect to it after restart when using "-l $ADDR". So recreate
the (closed-for-testing) listen socket in the parent and hand it
off to -imapd as we do normally.
Eric Wong [Sat, 18 Sep 2021 09:33:32 +0000 (09:33 +0000)]
lei up: automatically use dt: for remote externals
Since we can't use maxuid for remote externals, automatically
maintaining the last time we got results and appending a dt:
range to the query will prevent HTTP(S) responses from getting
too big.
We could be using "rt:", but no stable release of public-inbox
supports it, yet, so we'll use dt:, instead.
By default, there's a two day fudge factor to account for MTA
downtime and delays; which is hopefully enough. The fudge
factor may be changed per-invocation with the
--remote-fudge-factor=INTERVAL option.
Since different externals can have different message transport
routes, "lastresult" entries are stored on a per-external basis.
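A minimal sketch of deriving the dt: term from a stored "lastresult" time, in Python rather than the project's Perl. The function name is hypothetical, and the YYYYmmddHHMMSS open-ended range form is an assumption about the dt: syntax, not something stated above.

```python
from datetime import datetime, timedelta, timezone


def dt_range(lastresult, fudge=timedelta(days=2)):
    """Build a dt: term starting `fudge` before `lastresult` (epoch seconds)."""
    start = datetime.fromtimestamp(lastresult, tz=timezone.utc) - fudge
    # assumed format: dt:YYYYmmddHHMMSS.. (open-ended, in UTC)
    return "dt:" + start.strftime("%Y%m%d%H%M%S") + ".."


print(dt_range(1631923200))  # dt:20210916000000..
```

The resulting term would be appended to the saved query for one external only, since each external keeps its own "lastresult".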
Eric Wong [Sat, 18 Sep 2021 09:33:31 +0000 (09:33 +0000)]
net_reader: set SO_KEEPALIVE on all Net::NNTP sockets
SO_KEEPALIVE can prevent stuck processes and is safe to enable
unconditionally on all TCP sockets (like git, and the rest of
public-inbox does). Verified via strace on both NNTP and NNTPS
with and without nntp.proxy=socks5h://...
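For illustration, enabling the option looks like this in Python (the project does the equivalent in Perl; the helper name is hypothetical):

```python
import socket


def enable_keepalive(sock):
    """Ask the kernel to probe an idle TCP connection for a dead peer."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```

The same setsockopt(2) call is what shows up in strace output when verifying the behavior.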
Eric Wong [Sat, 18 Sep 2021 09:33:30 +0000 (09:33 +0000)]
net_reader: support imaps:// w/ socks5h:// proxy
While non-TLS IMAP worked perfectly with IO::Socket::Socks
and Mail::IMAPClient, we need to wrap the IO::Socket::Socks
and Mail::IMAPClient; we need to wrap the IO::Socket::Socks
object with IO::Socket::SSL before handing it to
Mail::IMAPClient.
Eric Wong [Sat, 18 Sep 2021 09:33:29 +0000 (09:33 +0000)]
net_reader: detect IMAP failures earlier
A Mail::IMAPClient object may be returned even on connection
failure, so use IsConnected to check for it. This ensures
git-credential will no longer prompt for passwords when there's
no connection.
Eric Wong [Sat, 18 Sep 2021 09:33:27 +0000 (09:33 +0000)]
ds: support add unique timers
A common pattern we use is to arm a timer once and prevent
it from being armed until it fires. We'll be using it more
to do polling for saved searches and imports.
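The arm-once pattern can be sketched as below, in Python rather than the project's Perl. The class and method names are hypothetical, and time is passed in explicitly to keep the sketch deterministic; a real event loop would supply it.

```python
class UniqueTimers:
    """Timers keyed by name; a key cannot be re-armed until it has fired."""

    def __init__(self):
        self._armed = {}  # key -> (deadline, callback)

    def add_unique(self, key, deadline, callback):
        """Arm a timer for `key` unless one is already pending."""
        if key in self._armed:
            return False
        self._armed[key] = (deadline, callback)
        return True

    def run_due(self, now):
        """Fire and disarm every timer whose deadline has passed."""
        due = [k for k, (t, _) in self._armed.items() if t <= now]
        for key in due:
            _, callback = self._armed.pop(key)
            callback()
```

Callers can request the same poll repeatedly without stacking up duplicate timers, because only the first request before each firing takes effect.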
Eric Wong [Sat, 18 Sep 2021 09:33:25 +0000 (09:33 +0000)]
lei_mail_sync: rely on flock(2), avoid IPC
Since 44917fdd24a8bec1 ("lei_mail_sync: do not use transactions"),
relying on lei/store to serialize access was a pointless endeavor.
Rely on flock(2) to serialize multiple writers since (in my
experience) it's the easiest way to deal with parallel writers
when using SQLite. This allows us to simplify existing callers
while speeding up 'lei refresh-mail-sync --all=local' by 5% or
so.
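A minimal sketch of serializing SQLite writers with flock(2), in Python rather than the project's Perl. The lock file name, table and columns are hypothetical and do not reflect the real mail_sync.sqlite3 schema.

```python
import fcntl
import sqlite3
from contextlib import contextmanager


@contextmanager
def locked(lock_path):
    """Hold an exclusive flock(2) on lock_path for the duration of the block."""
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def record(db_path, folder, uid):
    """Insert one row while holding the lock (hypothetical schema)."""
    with locked(db_path + ".flock"):
        dbh = sqlite3.connect(db_path)
        try:
            dbh.execute(
                "CREATE TABLE IF NOT EXISTS seen (folder TEXT, uid INTEGER)")
            dbh.execute("INSERT INTO seen VALUES (?, ?)", (folder, uid))
            dbh.commit()
        finally:
            dbh.close()
```

Every writer takes the same lock file before touching the database, so no process has to route its writes through another one.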
Eric Wong [Sat, 18 Sep 2021 09:33:24 +0000 (09:33 +0000)]
lei: lock worker counts
It doesn't seem worthwhile to change worker counts dynamically
on a per-command-basis with lei, and I don't know how such an
interface would even work...
Eric Wong [Wed, 8 Sep 2021 16:42:29 +0000 (14:42 -0200)]
git_http_backend: forward HTTP_GIT_PROTOCOL in request headers
It looks like git-http-backend(1) will support
HTTP_GIT_PROTOCOL, soon, and we won't have to add GIT_PROTOCOL
support to support newer versions of the git protocol, either.
Eric Wong [Fri, 17 Sep 2021 12:38:36 +0000 (07:38 -0500)]
doc: add lei-security(7) manpage
It seems like a good idea to have a manpage where somebody
can quickly look up and address their concerns as to what
to put on an encrypted device/filesystem.
And I probably would've designed lei around make(1) for
parallelization if I didn't have to keep credentials off
the FS :P
Eric Wong [Fri, 17 Sep 2021 12:12:30 +0000 (07:12 -0500)]
script/lei: umask(077) before execve
While my MUA also runs umask(077) unconditionally, not all
MUAs do. Additionally, pagers may support writing their buffers
to disk, so ensure anything else we spawn has umask(077).
Eric Wong [Fri, 17 Sep 2021 11:00:23 +0000 (11:00 +0000)]
fetch: ignore non-writable epoch dirs
This will eventually be useful for maintaining partial mirrors.
Keeping in line with the original public-inbox-fetch philosophy,
there are no additional config files to manage:
the user merely needs to remove write permissions to an $N.git
directory to prevent it from being updated.
Re-enabling updates just requires restoring write permission.
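The permission check can be sketched like this, in Python rather than the project's Perl. The function name is hypothetical, and the $INBOX_DIR/git/$N.git layout is assumed for the sketch. Note that a superuser passes the access check regardless of mode bits.

```python
import os


def writable_epochs(inbox_dir, epochs):
    """Return the epoch numbers whose $N.git directory may be updated."""
    keep = []
    for n in epochs:
        path = os.path.join(inbox_dir, "git", f"{n}.git")  # assumed layout
        if os.path.isdir(path) and not os.access(path, os.W_OK):
            continue  # write permission removed: leave this epoch alone
        keep.append(n)  # writable, or not cloned yet
    return keep
```

Running chmod a-w on an epoch directory removes it from the returned list, and chmod u+w brings it back.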
Eric Wong [Fri, 17 Sep 2021 04:40:07 +0000 (13:40 +0900)]
search: fix rt: w/ approxidate when TZ != UTC
While git respects a user's local timezone and returns
seconds-since-the-Epoch, we were unnecessarily and incorrectly
calling gmtime+strftime on its result. So skip calling
gmtime+strftime when the strftime format is "%s", and just feed
the output time from git directly to Xapian.
This is mainly for lei, which will likely run in a variety of
timezones. While we're at it, add a recommendation to use
TZ=UTC in public-inbox-httpd, in case there are (misguided :P)
sysadmins who set a non-UTC TZ.
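The shift can be reproduced outside of Perl. This Python sketch assumes a Unix system (time.tzset) and uses mktime() to stand in for what a "%s" strftime format does with a broken-down time: it treats the fields as local time. The function name is hypothetical.

```python
import os
import time


def buggy_epoch(epoch):
    """Round-trip an epoch through gmtime + mktime, like gmtime+strftime("%s")."""
    # mktime() interprets the broken-down UTC fields as *local* time,
    # so the result is off by the local UTC offset.
    return int(time.mktime(time.gmtime(epoch)))


os.environ["TZ"] = "JST-9"  # POSIX TZ string: UTC+9, no DST
time.tzset()                # Unix-only
epoch = 1631853607
print(buggy_epoch(epoch) - epoch)  # -32400, i.e. nine hours off
```

With TZ=UTC the round trip is harmless, which is why the problem only showed up for non-UTC timezones.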
Eric Wong [Fri, 17 Sep 2021 01:56:44 +0000 (20:56 -0500)]
lei refresh-mail-sync: drop old IMAP folder info
Like with Maildir, IMAP folders can be deleted entirely.
Ensure they can be eliminated, but don't be fooled into
removing them if they're temporarily unreachable.
Eric Wong [Fri, 17 Sep 2021 01:56:43 +0000 (20:56 -0500)]
lei refresh-mail-sync: implicitly remove missing folders
There's no point in keeping mail_sync.sqlite3 entries around
if the folder is gone. We do keep saved-search configs around,
however, since somebody may decide to blow away a search and
start over.
Eric Wong [Fri, 17 Sep 2021 01:56:40 +0000 (20:56 -0500)]
lei_mail_sync: don't hold statement handle into callback
This can cause readers and writers to conflict since the
implicit transaction from SELECT in a LeiRefreshMailSync
worker would block the LeiStore process.
Eric Wong [Fri, 17 Sep 2021 01:56:39 +0000 (20:56 -0500)]
lei refresh-mail-sync: replace prune-mail-sync
Merely pruning mail synchronization information was
insufficient for Maildir: renames are common in Maildir
and we need to detect them after-the-fact when lei-daemon
isn't running.
Running this command could make "lei index" far more
useful...
v2: close R/O mail_sync.sqlite3 dbh before fork
Keeping the DB file handle open across fork can cause bad things
to happen even if we don't use it since sqlite3 itself still knows
about it (but doesn't know Perl code doesn't know about it).
Eric Wong [Thu, 16 Sep 2021 20:15:20 +0000 (14:15 -0600)]
lei_pmdir: do not attempt to trigger network auth
Since some commands access both Maildirs and IMAP/NNTP servers
at the same time, LeiPmdir may see the same lei->{auth} and
lei->{net} objects as the sibling LeiInput-based workers.
Delete those at fork and do not attempt to do authentication in
those cases, since "net_merge_continue" will not be a registered
op and cause PktOp to fail even if authentication /can/ work
from a LeiPmdir worker.
Eric Wong [Thu, 16 Sep 2021 07:45:45 +0000 (07:45 +0000)]
doc: lei-mail-formats: add "eml" and expand on git things
While "eml" is not an output format, it seems worthy
to document, here, since users are likely to have experience
with *.patch files from "git format-patch".
Eric Wong [Thu, 16 Sep 2021 09:41:16 +0000 (09:41 +0000)]
net_reader: load IO::Socket::Socks in all workers
This was previously undetected since SOCKS is mainly used for
read-only (single worker) tasks, and worker[0] always loaded
the module. However, "lei refresh-mail-sync" can bounce reads
to any worker, so we need to ensure worker[1..Inf] load it, too.
Eric Wong [Thu, 16 Sep 2021 02:19:43 +0000 (21:19 -0500)]
imapd: sort LIST response
While RFC 3501 doesn't require LIST responses be sorted,
it makes reading protocol dumps easier and we memoize it
once per-refresh, so it shouldn't be too expensive even
with thousands of folders.
Eric Wong [Thu, 16 Sep 2021 02:19:42 +0000 (21:19 -0500)]
lei ls-mail-source: sort IMAP folder names
Otherwise, public-inbox-imapd will emit mailboxes in random
order (as IMAP servers do not need to guarantee any sort of
ordering). We'll take into account numeric slice numbers
generated by -imapd if they exist, so slice "80" doesn't show up
next to "8".
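A minimal sketch of a sort that compares trailing slice numbers numerically, in Python rather than the project's Perl. The key function and the example folder names are hypothetical.

```python
import re


def folder_key(name):
    """Sort key: name stem first, then the trailing slice number as an int."""
    m = re.fullmatch(r"(.*?)\.(\d+)", name)
    if m:
        return (m.group(1), 0, int(m.group(2)))
    return (name, -1, 0)  # no numeric slice: sorts before slices of same stem


folders = ["inbox.example.80", "inbox.example.8", "inbox.example.9", "INBOX"]
print(sorted(folders, key=folder_key))
# ['INBOX', 'inbox.example.8', 'inbox.example.9', 'inbox.example.80']
```

A plain string sort would instead place "inbox.example.80" between ".8" and ".9".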
Eric Wong [Thu, 16 Sep 2021 00:26:53 +0000 (00:26 +0000)]
www_stream: note existence of IMAP and NNTP URLs
The "mirror" link may not clue users into the existence of
NNTP and IMAP servers, so add a note about them (but don't
list them, in case there are dozens of URLs :>).
Eric Wong [Wed, 15 Sep 2021 21:35:58 +0000 (21:35 +0000)]
fetch|clone|--mirror: shorten paths for progress output
The full pathname for "curl -o ..." was too noisy and confusing.
Reduce confusion by adding the ".tmp" suffix and relying on
"-C". We'll also avoid displaying "-C" in run_reap() and
rely on "--git-dir=" with "git fetch" to display progress for
users.
Since the beginning of time, I've been dropping Makefiles
in $INBOX_DIR (and above hierarchies) to organize groups
of commands.
make(1) is widely available in various flavors and a familiar
tool for our target audience. It is easy to run in the right
directory, typically has built-in shell completion, and doesn't
silently ignore errors by default like Bourne shell.
Eric Wong [Wed, 15 Sep 2021 21:35:55 +0000 (21:35 +0000)]
fetch: support --exit-code switch
As noted in the new manpage entry, this is useful for avoiding
public-inbox-index invocations when there's nothing to update.
We use 127 to match "grok-pull", and also because it doesn't
conflict with any of the current curl(1) exit codes.
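A caller could act on the exit status roughly as follows. This is a Python sketch under the assumption that 0 means something was fetched and 127 means there was nothing to update; the function name and returned labels are hypothetical.

```python
NOTHING_TO_UPDATE = 127  # exit status named above; meaning assumed here


def next_step(returncode):
    """Decide what to do after a fetch run, given its exit status."""
    if returncode == 0:
        return "index"  # new data arrived: indexing is worthwhile
    if returncode == NOTHING_TO_UPDATE:
        return "skip"   # nothing changed: avoid the indexing run
    return "error"      # anything else, e.g. a curl(1) failure
```

Keeping 127 distinct from curl(1) exit codes is what lets the third branch be treated as a real failure.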
Eric Wong [Wed, 15 Sep 2021 11:26:17 +0000 (11:26 +0000)]
multi_git: hoist out common epoch/alternates handling
IMHO, this greatly improves code sharing and organization
between v2, extindex, and lei/store. Common git-related
logic for these is lightly-refactored and easier to reason
about.
The impetus for this big change was to ensure inboxes
created+managed by public-inbox-{clone,fetch} could have
alternates and configs setup properly without depending on
SQLite (via V2Writable). This change does that while
making old code shorter and better factored.
Eric Wong [Tue, 14 Sep 2021 20:12:16 +0000 (20:12 +0000)]
doc: update authentication notes for lei
~/.netrc isn't used by default any more, and I'm not sure it's
worthwhile to document the --netrc switch since it's rare for
non-FTP clients to support.
Followup-to: 9d11ed460ce113dd ("lei: do not read ~/.netrc by default")
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Eric Wong [Tue, 14 Sep 2021 08:53:22 +0000 (08:53 +0000)]
spawn+gcf2: improve diagnostics for build failures
I'm not sure why, but I noticed that one of my latest restarts of
public-inbox-httpd wasn't loading the Inline::C .so for Gcf2 or
Spawn. I also can't reproduce the problem as both .so files are
loaded fine on a restart with zero config changes.
In any case, some extra, automatic diagnostics for build errors
won't hurt, as no extra noise is introduced for successful builds.
This will also make future development of C code more convenient,
hopefully.
Eric Wong [Tue, 14 Sep 2021 02:39:05 +0000 (02:39 +0000)]
lei up: fix env/cwd mismatches with multiple folders
By moving %ENV localization and fchdir into ->dispatch,
we can maintain a consistent environment across multiple
dispatches while having different clients.
Eric Wong [Tue, 14 Sep 2021 02:39:04 +0000 (02:39 +0000)]
test_common: remove non-hidden files, first
We want to remove any inotify-watched files before removing
~/.local/lei/store/ipc.lock, since sto_done_request was failing
on attempts to lock a non-existent lei/store/ipc.lock file.
Eric Wong [Tue, 14 Sep 2021 02:39:03 +0000 (02:39 +0000)]
t/run: TEST_LEI_DAEMON_PERSIST: die if pid changes
While persisting lei-daemon across different test cases isn't
the default anymore, we can notice problems more quickly if
the daemon PID changes since the daemon gets auto-restarted
after failures.
Eric Wong [Tue, 14 Sep 2021 02:39:02 +0000 (02:39 +0000)]
lei: sto_done_request: add eval guard
Failures here can cause the lei-daemon event loop to break
since PktOp doesn't guard dispatch. Add a guard here (and
not deeper in the stack) so we can use the $lei object to
report errors.
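The guard amounts to catching the failure at the call site that still has something to report errors with. A Python sketch of the same shape (the project uses Perl's eval; the names here are hypothetical):

```python
def dispatch_guarded(handler, report, *args):
    """Run handler(*args); report failures instead of letting them propagate."""
    try:
        handler(*args)
        return True
    except Exception as exc:
        # Report through the caller-supplied object so the event loop survives.
        report(f"dispatch failed: {exc}")
        return False
```

Because the exception never reaches the event loop, one failing request cannot stop later ones from being served.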
Eric Wong [Mon, 13 Sep 2021 20:53:50 +0000 (20:53 +0000)]
tests: add require_cmd, require curl when needed
t/v2mirror.t and t/lei-mirror.t are now skipped when curl
is missing (instead of failing in appropriate places).
A bunch of which() checks are updated to use require_cmd
to avoid explicitly loading Spawn.