Eric Wong [Sun, 19 Sep 2021 12:50:25 +0000 (12:50 +0000)]
ipc: drop dynamic WQ process counts
In retrospect, I don't think it's needed; and trying to wire up
a user interface for lei to manage process counts doesn't seem
worthwhile. It could be resurrected for public-facing daemon
use in the future, but that's what version control systems are for.
This also lets us automatically avoid setting up broadcast
sockets.
Eric Wong [Sun, 19 Sep 2021 12:50:23 +0000 (12:50 +0000)]
lei: simplify sto_done_request
With the switch from pipes to sockets for lei-daemon =>
lei/store IPC, we can send the script/lei client socket to the
lei/store process and rely on reference counting in both Perl
and the kernel to persist the script/lei.
Eric Wong [Sun, 19 Sep 2021 00:36:04 +0000 (00:36 +0000)]
doc: tuning: note git 2.33+, move libgit2 into Inline::C section
git 2.33+ contains important optimizations for the
thousands-of-inboxes case. And combine the Inline::C stuff
with libgit2, since our use of libgit2 requires Inline::C.
Eric Wong [Sat, 18 Sep 2021 22:38:43 +0000 (22:38 +0000)]
t/lei-refresh-mail-sync: improve test reliability
We can't assume -imapd will be ready by the time we try to
connect to it after restart when using "-l $ADDR". So recreate
the (closed-for-testing) listen socket in the parent and hand it
off to -imapd as we do normally.
Eric Wong [Sat, 18 Sep 2021 09:33:32 +0000 (09:33 +0000)]
lei up: automatically use dt: for remote externals
Since we can't use maxuid for remote externals, automatically
maintaining the last time we got results and appending a dt:
range to the query will prevent HTTP(S) responses from getting
too big.
We could be using "rt:", but no stable release of public-inbox
supports it, yet, so we'll use dt:, instead.
By default, there's a two-day fudge factor to account for MTA
downtime and delays, which is hopefully enough. The fudge
factor may be changed per-invocation with the
--remote-fudge-factor=INTERVAL option.
Since different externals can have different message transport
routes, "lastresult" entries are stored on a per-external basis.
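A minimal sketch of the idea above, not the lei implementation: given the
time of the last result for one external, subtract the fudge factor and
append a dt: range to the query. The function names are hypothetical, and
both the YYYYMMDDhhmmss rendering and the open-ended "START.." form are
assumptions about how the range would be written.

```python
from datetime import datetime, timedelta, timezone

# The two-day default fudge factor described in the commit message.
DEFAULT_FUDGE = timedelta(days=2)


def dt_range(last_result: float, fudge: timedelta = DEFAULT_FUDGE) -> str:
    """Return an open-ended dt: term starting `fudge` before last_result.

    `last_result` is seconds since the Epoch; the term is rendered in UTC.
    """
    start = datetime.fromtimestamp(last_result, tz=timezone.utc) - fudge
    return "dt:" + start.strftime("%Y%m%d%H%M%S") + ".."


def append_dt(query: str, last_result: float) -> str:
    """Append the dt: term to an existing query string."""
    return f"{query} {dt_range(last_result)}"
```

Since "lastresult" is tracked per external, a caller would compute one
such term for each remote rather than sharing a single timestamp.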
Eric Wong [Sat, 18 Sep 2021 09:33:31 +0000 (09:33 +0000)]
net_reader: set SO_KEEPALIVE on all Net::NNTP sockets
SO_KEEPALIVE can prevent stuck processes and is safe to enable
unconditionally on all TCP sockets (like git, and the rest of
public-inbox does). Verified via strace on both NNTP and NNTPS
with and without nntp.proxy=socks5h://...
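For illustration only (the actual change is in Perl and applies to
Net::NNTP sockets), enabling SO_KEEPALIVE on a TCP socket looks roughly
like this in Python; the helper name is hypothetical.

```python
import socket


def keepalive_socket(family: int = socket.AF_INET) -> socket.socket:
    """Create a TCP socket with SO_KEEPALIVE enabled before connecting."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    # Ask the kernel to probe idle connections so a dead peer is noticed
    # instead of leaving the process blocked forever.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```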
Eric Wong [Sat, 18 Sep 2021 09:33:30 +0000 (09:33 +0000)]
net_reader: support imaps:// w/ socks5h:// proxy
While non-TLS IMAP worked perfectly with IO::Socket::Socks
and Mail::IMAPClient, we need to wrap the IO::Socket::Socks
object with IO::Socket::SSL before handing it to
Mail::IMAPClient.
Eric Wong [Sat, 18 Sep 2021 09:33:29 +0000 (09:33 +0000)]
net_reader: detect IMAP failures earlier
A Mail::IMAPClient object may be returned even on connection
failure, so use IsConnected to check for it. This ensures
git-credential will no longer prompt for passwords when there's
no connection.
Eric Wong [Sat, 18 Sep 2021 09:33:27 +0000 (09:33 +0000)]
ds: support add unique timers
A common pattern we use is to arm a timer once and prevent
it from being armed until it fires. We'll be using it more
to do polling for saved searches and imports.
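The "arm once, ignore re-arming until it fires" pattern can be sketched
as below. This is a simplified model with a manually driven clock, not
the DS event loop; the class and method names are hypothetical.

```python
import heapq


class Timers:
    """Timer queue where each key can be armed at most once at a time."""

    def __init__(self):
        self._heap = []       # entries: (deadline, seq, key, callback)
        self._armed = set()   # keys with a pending timer
        self._seq = 0         # tie-breaker so callbacks are never compared

    def add_unique(self, key, deadline, callback) -> bool:
        """Arm a timer for `key`; return False if one is already pending."""
        if key in self._armed:
            return False
        self._armed.add(key)
        self._seq += 1
        heapq.heappush(self._heap, (deadline, self._seq, key, callback))
        return True

    def run_due(self, now) -> None:
        """Fire every timer whose deadline has passed."""
        while self._heap and self._heap[0][0] <= now:
            _, _, key, callback = heapq.heappop(self._heap)
            # Disarm before the callback runs so it may re-arm itself.
            self._armed.discard(key)
            callback()
```

A poller would call add_unique() every time new work shows up and rely
on the return value (or simply ignore it) instead of tracking the armed
state itself.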
Eric Wong [Sat, 18 Sep 2021 09:33:25 +0000 (09:33 +0000)]
lei_mail_sync: rely on flock(2), avoid IPC
Since 44917fdd24a8bec1 ("lei_mail_sync: do not use transactions"),
relying on lei/store to serialize access was a pointless endeavor.
Rely on flock(2) to serialize multiple writers since (in my
experience) it's the easiest way to deal with parallel writers
when using SQLite. This allows us to simplify existing callers
while speeding up 'lei refresh-mail-sync --all=local' by 5% or
so.
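A rough Python sketch of serializing SQLite writers with flock(2). The
lock file name, table layout, and function names are assumptions made
for the example and do not reflect the real mail_sync.sqlite3 schema.

```python
import fcntl
import sqlite3
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive flock(2) on lock_path for the duration of the block."""
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # blocks until other writers finish
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def record_blob(db_path, folder, oid):
    """Insert one (folder, oid) pair while holding the writer lock."""
    with exclusive_lock(db_path + ".lock"):
        db = sqlite3.connect(db_path)
        try:
            db.execute(
                "CREATE TABLE IF NOT EXISTS seen "
                "(folder TEXT, oid TEXT, UNIQUE (folder, oid))"
            )
            db.execute("INSERT OR IGNORE INTO seen VALUES (?, ?)", (folder, oid))
            db.commit()
        finally:
            db.close()
```

Because every writer takes the same lock, no process has to route its
writes through a single serializing process.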
Eric Wong [Sat, 18 Sep 2021 09:33:24 +0000 (09:33 +0000)]
lei: lock worker counts
It doesn't seem worthwhile to change worker counts dynamically
on a per-command-basis with lei, and I don't know how such an
interface would even work...
Eric Wong [Wed, 8 Sep 2021 16:42:29 +0000 (14:42 -0200)]
git_http_backend: forward HTTP_GIT_PROTOCOL in request headers
It looks like git-http-backend(1) will support
HTTP_GIT_PROTOCOL, soon, and we won't have to add GIT_PROTOCOL
support to support newer versions of the git protocol, either.
Eric Wong [Fri, 17 Sep 2021 12:38:36 +0000 (07:38 -0500)]
doc: add lei-security(7) manpage
It seems like a good idea to have a manpage where somebody
can quickly look up and address their concerns as to what
to put on encrypted device/filesystem.
And I probably would've designed lei around make(1) for
parallelization if I didn't have to keep credentials off
the FS :P
Eric Wong [Fri, 17 Sep 2021 12:12:30 +0000 (07:12 -0500)]
script/lei: umask(077) before execve
While my MUA also runs umask(077) unconditionally, not all
MUAs do. Additionally, pagers may support writing their buffers
to disk, so ensure anything else we spawn has umask(077).
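As an illustration (script/lei itself is Perl and simply sets the umask
before execve), the effect on spawned programs can be sketched like
this; the helper name is hypothetical, and restoring the old umask
afterwards is a choice made for the example only.

```python
import os
import subprocess


def run_private(argv):
    """Run argv with umask 077 so files it creates are owner-only."""
    old = os.umask(0o077)       # children inherit the process umask
    try:
        return subprocess.run(argv, check=False).returncode
    finally:
        os.umask(old)           # restore for the rest of this process
```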
Eric Wong [Fri, 17 Sep 2021 11:00:23 +0000 (11:00 +0000)]
fetch: ignore non-writable epoch dirs
This will eventually be useful for maintaining partial mirrors.
Keeping inline with the original public-inbox-fetch philosophy,
there are no additional config files to manage:
the user merely needs to remove write permissions to an $N.git
directory to prevent it from being updated.
Re-enabling updates just requires restoring write permission.
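A small sketch of the selection rule, assuming the v2 layout of
$INBOX_DIR/git/$N.git. It checks the owner write bit rather than doing a
real access check, which is a simplification for the example; the
function name is hypothetical.

```python
import os
import stat


def writable_epochs(inbox_dir):
    """Return the epoch numbers whose $N.git directory is owner-writable."""
    git_dir = os.path.join(inbox_dir, "git")
    epochs = []
    for name in os.listdir(git_dir):
        stem, ext = os.path.splitext(name)
        if ext != ".git" or not stem.isdigit():
            continue  # not an epoch directory
        mode = os.stat(os.path.join(git_dir, name)).st_mode
        if mode & stat.S_IWUSR:
            epochs.append(int(stem))
        # epochs without the write bit are silently skipped
    return sorted(epochs)
```

With this rule, "chmod a-w git/0.git" freezes epoch 0 and "chmod u+w"
brings it back, with no config file involved.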
Eric Wong [Fri, 17 Sep 2021 04:40:07 +0000 (13:40 +0900)]
search: fix rt: w/ approxidate when TZ != UTC
While git respects a user's local timezone and returns
seconds-since-the-Epoch, we were unnecessarily and incorrectly
calling gmtime+strftime on its result. So skip
gmtime+strftime when the strftime format is "%s", and feed
the output time from git directly to Xapian.
This is mainly for lei, which will likely run in a variety of
timezones. While we're at it, add a recommendation to use
TZ=UTC in public-inbox-httpd, in case there are (misguided :P)
sysadmins who set a non-UTC TZ.
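The class of bug can be reproduced in Python as an analogue (not the
original Perl): breaking an Epoch value into UTC fields and converting
those fields back as if they were local time shifts the result by the
UTC offset. The helper names are hypothetical, and "JST-9" is a POSIX TZ
string chosen only for the demonstration.

```python
import os
import time


def set_tz(tz):
    """Switch the process timezone (POSIX systems only)."""
    os.environ["TZ"] = tz
    time.tzset()


def via_broken_down_time(epoch):
    """Round-trip through gmtime(): the fields are UTC, but mktime()
    interprets them as local time, so the result is skewed when
    TZ != UTC."""
    return int(time.mktime(time.gmtime(epoch)))


def direct(epoch):
    """Seconds-since-the-Epoch is already timezone-independent."""
    return int(epoch)
```

Under TZ=JST-9 the round trip lands nine hours early, while the direct
value is unchanged; under UTC both agree, which is why the bug stays
hidden on UTC servers.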
Eric Wong [Fri, 17 Sep 2021 01:56:44 +0000 (20:56 -0500)]
lei refresh-mail-sync: drop old IMAP folder info
Like with Maildir, IMAP folders can be deleted entirely.
Ensure they can be eliminated, but don't be fooled into
removing them if they're temporarily unreachable.
Eric Wong [Fri, 17 Sep 2021 01:56:43 +0000 (20:56 -0500)]
lei refresh-mail-sync: implicitly remove missing folders
There's no point in keeping mail_sync.sqlite3 entries around
if the folder is gone. We do keep saved-search configs around,
however, since somebody may decide to blow away a search and
start over.
Eric Wong [Fri, 17 Sep 2021 01:56:40 +0000 (20:56 -0500)]
lei_mail_sync: don't hold statement handle into callback
This can cause readers and writers to conflict since the
implicit transaction from SELECT in a LeiRefreshMailSync
worker would block the LeiStore process.
Eric Wong [Fri, 17 Sep 2021 01:56:39 +0000 (20:56 -0500)]
lei refresh-mail-sync: replace prune-mail-sync
Merely pruning mail synchronization information was
insufficient for Maildir: renames are common in Maildir
and we need to detect them after-the-fact when lei-daemon
isn't running.
Running this command could make "lei index" far more
useful...
v2: close R/O mail_sync.sqlite3 dbh before fork
Keeping the DB file handle open across fork can cause bad things
to happen even if we don't use it since sqlite3 itself still knows
about it (but doesn't know Perl code doesn't know about it).
Eric Wong [Thu, 16 Sep 2021 20:15:20 +0000 (14:15 -0600)]
lei_pmdir: do not attempt to trigger network auth
Since some commands access both Maildirs and IMAP/NNTP servers
at the same time, LeiPmdir may see the same lei->{auth} and
lei->{net} objects as the sibling LeiInput-based workers.
Delete those at fork and do not attempt to do authentication in
those cases, since "net_merge_continue" will not be a registered
op and cause PktOp to fail even if authentication /can/ work
from a LeiPmdir worker.
Eric Wong [Thu, 16 Sep 2021 07:45:45 +0000 (07:45 +0000)]
doc: lei-mail-formats: add "eml" and expand on git things
While "eml" is not an output format, it seems worthy
to document, here, since users are likely to have experience
with *.patch files from "git format-patch".
Eric Wong [Thu, 16 Sep 2021 09:41:16 +0000 (09:41 +0000)]
net_reader: load IO::Socket::Socks in all workers
This was previously undetected since SOCKS is mainly used for
read-only (single worker) tasks, and worker[0] always loaded
the module. However, "lei refresh-mail-sync" can bounce reads
to any worker, so we need to ensure worker[1..Inf] load it, too.
Eric Wong [Thu, 16 Sep 2021 02:19:43 +0000 (21:19 -0500)]
imapd: sort LIST response
While RFC 3501 doesn't require LIST responses be sorted,
it makes reading protocol dumps easier and we memoize it
once per-refresh, so it shouldn't be too expensive even
with thousands of folders.
Eric Wong [Thu, 16 Sep 2021 02:19:42 +0000 (21:19 -0500)]
lei ls-mail-source: sort IMAP folder names
Otherwise, public-inbox-imapd will emit mailboxes in random
order (as IMAP servers do not need to guarantee any sort of
ordering). We'll take into account numeric slice numbers
generated by -imapd if they exist, so slice "80" doesn't show up
next to "8".
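One way to get that ordering is to sort on the name with a trailing
numeric slice compared as an integer. The sketch below assumes slices
appear as a final ".N" component; the mailbox names in the usage note
are made up.

```python
import re

# A trailing ".<digits>" component is treated as a numeric slice.
_SLICE = re.compile(r"^(.*)\.(\d+)$")


def folder_key(name):
    """Sort key: base name first, then the slice number as an integer."""
    m = _SLICE.match(name)
    if m:
        return (m.group(1), 1, int(m.group(2)))
    return (name, 0, 0)  # no slice: sorts before sliced names of same base


def sort_folders(names):
    return sorted(names, key=folder_key)
```

For example, sort_folders(["x.test.80", "x.test.8", "x.test.9"]) keeps
"x.test.9" between "x.test.8" and "x.test.80", where a plain string sort
would put "80" right after "8".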
Eric Wong [Thu, 16 Sep 2021 00:26:53 +0000 (00:26 +0000)]
www_stream: note existence of IMAP and NNTP URLs
The "mirror" link may not clue users into the existence of
NNTP and IMAP servers, so add a note about them (but don't
list them, in case there are dozens of URLs :>).
Eric Wong [Wed, 15 Sep 2021 21:35:58 +0000 (21:35 +0000)]
fetch|clone|--mirror: shorten paths for progress output
The full pathname for "curl -o ..." was too noisy and confusing.
Reduce confusion by adding the ".tmp" suffix and relying on
"-C". We'll also avoid displaying "-C" in run_reap() and
rely on "--git-dir=" with "git fetch" to display progress for
users.
Since the beginning of time, I've been dropping Makefiles
in $INBOX_DIR (and above hierarchies) to organize groups
of commands.
make(1) is widely available in various flavors and a familiar
tool for our target audience. It is easy to run in the right
directory, typically has built-in shell completion, and doesn't
silently ignore errors by default like Bourne shell.
Eric Wong [Wed, 15 Sep 2021 21:35:55 +0000 (21:35 +0000)]
fetch: support --exit-code switch
As noted in the new manpage entry, this is useful for avoiding
public-inbox-index invocations when there's nothing to update.
We use 127 to match "grok-pull", and also because it doesn't
conflict with any of the current curl(1) exit codes.
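A sketch of how a wrapper might use the exit status to skip indexing.
Only the value 127 comes from the text above; the function name, the
injected `run` callable, and the argv lists are placeholders supplied by
the caller.

```python
import subprocess

NOTHING_TO_UPDATE = 127  # exit status named in the commit message


def fetch_then_index(fetch_argv, index_argv, run=subprocess.call):
    """Run the fetch command; run the index command only if it updated.

    Returns True if indexing ran, False if there was nothing to update.
    """
    rc = run(fetch_argv)
    if rc == NOTHING_TO_UPDATE:
        return False             # skip the index invocation entirely
    if rc != 0:
        raise RuntimeError(f"fetch failed with exit status {rc}")
    if run(index_argv) != 0:
        raise RuntimeError("index failed")
    return True
```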
Eric Wong [Wed, 15 Sep 2021 11:26:17 +0000 (11:26 +0000)]
multi_git: hoist out common epoch/alternates handling
IMHO, this greatly improves code sharing and organization
between v2, extindex, and lei/store. Common git-related
logic for these is lightly-refactored and easier to reason
about.
The impetus for this big change was to ensure inboxes
created+managed by public-inbox-{clone,fetch} could have
alternates and configs setup properly without depending on
SQLite (via V2Writable). This change does that while
making old code shorter and better factored.
Eric Wong [Tue, 14 Sep 2021 20:12:16 +0000 (20:12 +0000)]
doc: update authentication notes for lei
~/.netrc isn't used by default any more, and I'm not sure it's
worthwhile to document the --netrc switch since it's rare for
non-FTP clients to support.
Followup-to: 9d11ed460ce113dd ("lei: do not read ~/.netrc by default")
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Eric Wong [Tue, 14 Sep 2021 08:53:22 +0000 (08:53 +0000)]
spawn+gcf2: improve diagnostics for build failures
I'm not sure why, but I noticed that one of my latest restarts of
public-inbox-httpd wasn't loading the Inline::C .so for Gcf2 nor
Spawn. I also can't reproduce the problem as both .so files are
loaded fine on a restart with zero config changes.
In any case, some extra, automatic diagnostics for build errors
won't hurt, as no extra noise is introduced for successful builds.
This will also make future development of C code more convenient,
hopefully.
Eric Wong [Tue, 14 Sep 2021 02:39:05 +0000 (02:39 +0000)]
lei up: fix env/cwd mismatches with multiple folders
By moving %ENV localization and fchdir into ->dispatch,
we can maintain a consistent environment across multiple
dispatches while having different clients.
Eric Wong [Tue, 14 Sep 2021 02:39:04 +0000 (02:39 +0000)]
test_common: remove non-hidden files, first
We want to remove any inotify-watched files before removing
~/.local/lei/store/ipc.lock, since sto_done_request was failing
on attempts to lock a non-existent lei/store/ipc.lock file.
Eric Wong [Tue, 14 Sep 2021 02:39:03 +0000 (02:39 +0000)]
t/run: TEST_LEI_DAEMON_PERSIST: die if pid changes
While persisting lei-daemon across different test cases isn't
the default anymore, we can notice problems more quickly if
the daemon PID changes since the daemon gets auto-restarted
after failures.
Eric Wong [Tue, 14 Sep 2021 02:39:02 +0000 (02:39 +0000)]
lei: sto_done_request: add eval guard
Failures here can cause the lei-daemon event loop to break
since PktOp doesn't guard dispatch. Add a guard here (and
not deeper in the stack) so we can use the $lei object to
report errors.
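The guard amounts to catching exceptions at the handler boundary, where
the object needed for error reporting is still in scope, so the event
loop calling it never sees them. A generic Python sketch with
hypothetical names:

```python
def guarded(handler, report):
    """Wrap handler so exceptions are reported instead of propagating
    into the (unguarded) dispatch loop that calls it."""
    def wrapper(*args):
        try:
            return handler(*args)
        except Exception as exc:
            report(exc)   # e.g. send the error back to the client
            return None
    return wrapper
```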
Eric Wong [Mon, 13 Sep 2021 20:53:50 +0000 (20:53 +0000)]
tests: add require_cmd, require curl when needed
t/v2mirror.t and t/lei-mirror.t are now skipped when curl
is missing (instead of failing in appropriate places).
A bunch of which() checks are updated to use require_cmd
to avoid explicitly loading Spawn.
Eric Wong [Sun, 12 Sep 2021 11:58:15 +0000 (11:58 +0000)]
fetch: drop 304 Not Modified support, simplify comparisons
Timestamp comparisons only have 1 second granularity, which
isn't nearly enough for our test cases, and probably not for
real world use for "git send-email" bursts and fast SMTP
servers.
We'll continue to check modification times inside the manifest,
though, in case an extremely rare SHA-1 collision is found...
Eric Wong [Sun, 12 Sep 2021 07:47:16 +0000 (07:47 +0000)]
fetch: use manifest.js.gz for v1
This is gentler to the remote HTTP server in the no-op case and
will allow client migrations to some v2-ish format without
forcing the client to redownload everything.
Eric Wong [Sun, 12 Sep 2021 07:47:15 +0000 (07:47 +0000)]
init: set a useful description
"Unnamed repository" for v1 inboxes was misleading, and having a
non-existent description for v2 was equally annoying, so set a
short description based on the primary address.
We remove descriptions when setting up new test inboxes to
preserve the behavior of the t/lei-mirror.t test case.
Eric Wong [Sun, 12 Sep 2021 07:47:12 +0000 (07:47 +0000)]
new public-inbox-{clone,fetch} commands
Setting up and maintaining git-only mirrors of v2 inboxes is
complex since multiple commands are required to clone and fetch
into epochs.
Unlike grokmirror, these commands do not require any
configuration. Instead, they rely on existing git config files
and work like "git clone --mirror" and "git fetch",
respectively.
Like grokmirror, they use manifest.js.gz, but only on a
per-inbox basis so users won't have to clone every inbox of a
large instance nor edit config files to include/exclude inboxes
they're interested in.
Eric Wong [Sat, 11 Sep 2021 23:30:46 +0000 (23:30 +0000)]
www: use ->ALL for per-inbox manifest.js.gz, too
With 11 epochs on LKML, the lkml/manifest.js.gz response time
goes from around 60ms to around 10ms, a significant improvement.
And improve test coverage while we're at it.
When generating per-inbox manifests, we were forgetting to
cleanup per-epoch "git cat-file --batch" processes. Our
previous method of generating modified times was also stupidly
inefficient, so replace the pipeline with a single
"git for-each-ref" invocation.
Eric Wong [Sat, 11 Sep 2021 08:33:19 +0000 (08:33 +0000)]
lei q|lcat: support "-f reply" output format
When composing replies in "git format-patch" cover letters,
I'd been relying on "lei q -f text ...", but that still requires
several steps to make it suitable for composing a reply:
* s/^/> / to quote the body
* drop existing In-Reply-To+References
* s/^Message-ID:/In-Reply-To:/;
* add an attribute line
...
"lei q -f reply" takes care of most of that and users will
only have to trim "From " lines, unnecessary results and
over-quoted text (and trimming is likely less error-prone
than doing all the steps above manually).
This should also be a good replacement for
"git format-patch --in-reply-to=...", since copying long
Message-IDs can be error-prone (and this lets you include
quoted text in replies).
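The manual steps listed above can be sketched as follows. This only
illustrates those steps on a simple, non-multipart text message; it is
not the output format of "lei q -f reply", and the wording of the
attribution line is an assumption.

```python
from email import message_from_string


def to_reply(raw: str) -> str:
    """Turn a plain-text message into a reply template."""
    msg = message_from_string(raw)
    out = []
    for name, value in msg.items():
        lname = name.lower()
        if lname in ("in-reply-to", "references"):
            continue                      # drop the old threading headers
        if lname == "message-id":
            name = "In-Reply-To"          # s/^Message-ID:/In-Reply-To:/
        out.append(f"{name}: {value}")
    out.append("")
    out.append(f"{msg['From']} wrote:")   # attribution line (assumed wording)
    body = msg.get_payload()              # assumes a non-multipart message
    out.extend("> " + line for line in body.splitlines())  # s/^/> /
    return "\n".join(out) + "\n"
```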
Eric Wong [Sat, 11 Sep 2021 00:19:17 +0000 (00:19 +0000)]
lei: normalize whitespace in remote queries
Having redundant "+" in URLs is ugly and can hurt cacheability
of queries. Even with "quoted phrase searches", Xapian seems
unaffected by redundant spaces, so just normalize the ASCII
white spaces to ' ' (%20) when fed via STDIN or saved-search
config file.
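A minimal sketch of the normalization, assuming runs of ASCII
whitespace collapse to a single space and that leading/trailing spaces
are also trimmed (the trimming is an assumption, not stated above). The
function names are hypothetical.

```python
import re
from urllib.parse import quote

# ASCII whitespace only; other Unicode spaces are left untouched.
_ASCII_WS = re.compile(r"[ \t\n\r\f\v]+")


def normalize_query(q: str) -> str:
    """Collapse runs of ASCII whitespace into single spaces."""
    return _ASCII_WS.sub(" ", q).strip(" ")


def encode_query(q: str) -> str:
    """Percent-encode the normalized query, so a space becomes %20."""
    return quote(normalize_query(q), safe="")
```

Two queries that differ only in line breaks or indentation then map to
the same URL, which is what makes them cacheable.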
Eric Wong [Fri, 10 Sep 2021 13:10:04 +0000 (13:10 +0000)]
INSTALL: depend on URI rather than URI::Escape
As far as I can tell, URI::Escape has always been a part of the
`URI' package (aka "distribution" on CPAN) and not distributed
separately (unlike URI::Escape::XS). So avoid confusing users
with `URI::Escape' and just document `URI' instead.
Along the same lines, we depend on the `Plack' package rather
than Plack::Util or Plack::Builder, after all.
Eric Wong [Fri, 10 Sep 2021 11:46:53 +0000 (11:46 +0000)]
lei up: only delay non-zero "# $NR written to ..."
"# 0 written to $FOLDER" messages aren't important to the
user, so we can show them in real time and allow them to
be lost in the terminal scroll. When >0 messages are
written to a folder, we'll show them last so a user
will know which folders to open with their MUA.
Eric Wong [Fri, 10 Sep 2021 09:08:49 +0000 (09:08 +0000)]
lei: do not read ~/.netrc by default
Since ~/.netrc isn't widely used by most (if any) NNTP and IMAP
clients, we won't read it by default for lei. AFAIK, ~/.netrc
is mainly used by FTP clients (e.g. ftp(1) and lftp(1)). wget uses
it by default for HTTP(S) (and FTP), but curl does not.
To avoid breaking stable release use cases, public-inbox-watch
continues to read ~/.netrc by default.
The --netrc switch is supported by all existing lei commands
which may use curl.
Eric Wong [Fri, 10 Sep 2021 09:15:36 +0000 (09:15 +0000)]
lei add-external --mirror: quiet unlink error on ENOENT
If the mirror.done file doesn't exist for unlink, it's because
we already got another error, so don't confuse users by noting
an unlink error since the ENOENT is expected in the face of
other errors.
Eric Wong [Fri, 10 Sep 2021 05:51:00 +0000 (05:51 +0000)]
lei add-external --mirror: deduce paths for PSGI mount prefixes
The current manifest.js.gz generation in WWW doesn't account for
PSGI mount prefixes (and grokmirror 1.x appears to work fine).
In other words, <https://yhbt.net/lore/lkml/manifest.js.gz>
currently has keys like "/lkml/git/0.git" and not
"/lore/lkml/git/0.git" where "/lore" is the PSGI mount prefix.
This works fine with the prefix accounted for in my grokmirror
(1.x) repos.conf like this:
site = https://yhbt.net/lore/
manifest = https://yhbt.net/lore/manifest.js.gz
Adding the PSGI mount prefix in manifest.js.gz is probably not
desirable since it would force the prefix into the locally
cloned path by grokmirror, and all the cloned directories
would have the remote PSGI mount prefix prepended to the
toplevel.
So, "lei add-external --mirror" needs to account for PSGI
mount prefixes by deducing the prefix based on available keys
in the manifest.js.gz hash table.
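One possible way to do that deduction, shown as a sketch rather than
the actual lei logic: strip leading components from the URL path until
the remainder matches the start of some manifest key. The function name
and return convention are hypothetical.

```python
def deduce_mount_prefix(url_path, manifest_keys):
    """Return the mount prefix ("" if none), or None if no key matches.

    url_path is the path of the inbox URL, e.g. "/lore/lkml/";
    manifest_keys are keys such as "/lkml/git/0.git".
    """
    parts = [p for p in url_path.split("/") if p]
    for i in range(len(parts)):
        candidate = "/" + "/".join(parts[i:])
        if any(k == candidate or k.startswith(candidate + "/")
               for k in manifest_keys):
            # Everything stripped before the match is the mount prefix.
            return "/" + "/".join(parts[:i]) if i else ""
    return None
```

With the example from the message, the URL path "/lore/lkml/" and the
key "/lkml/git/0.git" yield the prefix "/lore".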
Eric Wong [Thu, 9 Sep 2021 05:25:05 +0000 (05:25 +0000)]
net_reader: support Mail::IMAPClient Ignoresizeerrors
Some proprietary servers may do wacky things and give the
wrong size, so Mail::IMAPClient has a knob for this which
we can expose to users to workaround this.
Eric Wong [Thu, 9 Sep 2021 05:25:03 +0000 (05:25 +0000)]
net_reader: combine Net::NNTP and IMAPClient args
Since these are keyed by IMAP and NNTP URIs which can never
conflict, it simplifies our internals to keep them in one big
hash since we'll add POP3 and JMAP client support.
Eric Wong [Thu, 9 Sep 2021 05:25:02 +0000 (05:25 +0000)]
net_reader: imap_opt => cfg_opt
Since our internal IMAP options are keyed by URI section,
there's no need to have separate hashes for NNTP and IMAP
options since the URI already distinguishes them.
This will make future changes to support POP3 and JMAP and
arg caching with lei/store easier.