Eric Wong [Mon, 22 Nov 2021 18:38:09 +0000 (18:38 +0000)]
lei: always use 3-arg open perlop
Future-proofing in case future versions of Perl warn on this, since
2-arg forms of open may be subject to injection vulnerabilities
with non-literal args.
Eric Wong [Mon, 22 Nov 2021 18:23:52 +0000 (18:23 +0000)]
searchidx: avoid modification of read-only `$_'
This fixes the "Modification of a read-only value attempted at ..."
error in an initial run of t/reindex-time-range.t. It was
reproducible by running `rm -rf t/data-gen/reindex-time-range.v*'
before `make && prove -bvw t/reindex-time-range.t'. Thanks to
Jörg Rödel for providing the backtrace which helped find this.
Eric Wong [Wed, 10 Nov 2021 10:33:16 +0000 (10:33 +0000)]
t/lei-watch: test with with higher sleep
0.1s may not be enough for a task switch and inotify wakeup,
so try doubling it and see if it fixes test reliability, for
now. A future change may be to implement a watcher/tracer
for inotify -> lei/store events.
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei q: make HTTP(S) query strings even less ugly
Following commit 57fed2e4b78ed394 (lei: normalize whitespace in
remote queries, 2021-09-11), leaving the trailing `\n' from
stdin queries to be normalized to ` ' (SP) causes it to appear
as `+' in URLs, which Xapian ignores.
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei q: disallow "\n" in argv[] elements
I don't expect this to be hit in real-world use via normal
interactive shells. However, somebody could accidentally add
"\n" in languages (e.g. Perl, C) where it's easy to pass "\n"
in argv[].
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei up: infer rawstr from old searches via trailing "\n"
For --stdin searches created prior to commit 666dde69a3f6 (lei
q|up: fix saved searches for single-phrase search, 2021-11-08)
we still want to be able to run "lei up" on them without
regressions. So assume nobody manages to enter "\n" as an
argv[] element and consider the presence of "\n" as a previous
--stdin use.
This fixes errors from "lei up" such as:
lei_xsearch 2 wq_worker: Exception: Key too long: length was 840 bytes,
maximum length of a key is 255 bytes at ../PublicInbox/IPC.pm line 250.
Fixes: 666dde69a3f6 ("lei q|up: fix saved searches for single-phrase search")
Eric Wong [Tue, 9 Nov 2021 00:20:50 +0000 (00:20 +0000)]
build: do not repeatedly build some docs
Text versions of manpages do not need to be generated for normal
installations, they're only used for generating HTML and our
amazing, award-winning homepage.
We'll also rely on touch(1) instead of Perl utime to benefit
users w/o git-set-file-times in txt2pre. Perl numeric values
cannot represent nanosecond resolution accurately even with
Time::HiRes; which causes nanosecond-aware make(1)
implementations to repeatedly rebuild.
Eric Wong [Mon, 8 Nov 2021 23:39:26 +0000 (23:39 +0000)]
lei q|up: fix saved searches for single-phrase search
`"' (double-quote) needs to be quoted for stdin searches.
We also need to differentiate between "lei q --stdin" usage
when calling "lei up", do it by setting an internal "rawstr"
knob to ensure we can parse the config properly regardless
of whether the initial search used --stdin or not.
Eric Wong [Thu, 4 Nov 2021 07:03:01 +0000 (07:03 +0000)]
AUTHORS: clarify my title
Being an anti-centralization, anti-authority project; the
traditional meaning of "Benevolent Dictator" never sat well
with me.
Benevolence is relative; and I've never been benevolent towards
monopolist-types who try to consolidate power and influence.
Power corrupts, after all. In any case, I'll never be more than
a random idiot serving data which anybody can mirror and fork.
Eric Wong [Wed, 3 Nov 2021 20:35:55 +0000 (20:35 +0000)]
lei_curl: use http.proxy knob via URL match for curl
Using the --proxy on the command-line affects the entire
lei invocation, and users searching HTTP(S) remotes and
writing to an IMAP folder may want more fine-grained proxy
use:
lei q -o imap://no-proxy.example/foo -O https://need-proxy.example/bar ...
Eric Wong [Wed, 3 Nov 2021 21:01:21 +0000 (21:01 +0000)]
doc: add more 3rd-party refs, use Debian manpages for xapian
curl, torsocks, and gitglossary manpages are all newly
referenced, so make sure they're linkified properly in HTML.
We'll be using Debian's manpages as an ad-free, Tor-accessible
host for manpages as a fallback since hosting manpages for all
3rd-party projects we reference doesn't scale.
extindex is a far more important feature than libgit2 support
(which is actually underperforming and might go away). The
search results page is also improved (IMHO), nowadays.
Eric Wong [Wed, 3 Nov 2021 08:34:44 +0000 (08:34 +0000)]
doc: extindex: document current behavior + knobs
I'm not really sure if extindex writing to the config file
is a good idea (since -index doesn't, as -init exists).
Just document what it does and let the user handle it, since
the config file shouldn't be daunting to new users.
Eric Wong [Tue, 2 Nov 2021 18:14:45 +0000 (18:14 +0000)]
lei <rediff|rm|tag>: stdin implies `-F eml'
These commands are usually run on a single message, so saving
the user the trouble of typing `-F eml' on the command-line
seems reasonable. I don't think commands like "index" and
"import" will be too useful for single messages, though.
Eric Wong [Tue, 2 Nov 2021 18:14:43 +0000 (18:14 +0000)]
lei mail-diff: do not default to 'eml'
In retrospect, this doesn't make sense, since it needs at least
two messages to diff. So go about "normal" input rules and
require users to specify the format.
Eric Wong [Tue, 2 Nov 2021 09:24:39 +0000 (09:24 +0000)]
t/lei-refresh-mail-sync: speed up test on FreeBSD 12
And improve reliability while we're at it. It seems closing a
TCP listen socket on FreeBSD 12.2 doesn't cause connect()-ing
clients to fail. This happens regardless of whether a socket is
IPv4 or IPv6
This non-failure was causing tests to timeout slowly on the
client side instead of failing immediately. We now fork a new
process which does nothing but accept() + shutdown() to emulate
a dead server.
Reliability improves on all OSes since there's never a point in
time when another process can bind the socket.
I've been seeing the following error on occasion during "make check-run":
$PWD/t/data-gen/reindex-time-range.v1-master index failed: Modification of a read-only value attempted at $DIR/lib/PublicInbox/SearchIdx.pm line 899, <$r> line 1.
Perhaps this fixes it. In any case, a construct of:
$h->{k} //= do { $h->{x} = ...; $val };
seems wrong and may cause Perl to error out depending on how
hashes are randomized.
Eric Wong [Sat, 30 Oct 2021 08:11:43 +0000 (08:11 +0000)]
lei_to_mail: avoid SEGV on worker exit via SIGTERM
->DESTROY ordering via "exit()" calls is tricky, and dedupe
checks were causing problems.
AFAIK, this only affects users who manually enable WAL on
lei/store/ei*/over.sqlite3. Fortunately, there is no data
corruption as a result even though "read-only" WAL requires
write permissions.
Eric Wong [Sat, 30 Oct 2021 08:11:42 +0000 (08:11 +0000)]
lei_xsearch: quiet error message on SIG{PIPE,TERM}
SIGPIPE and SIGTERM are common and user-induced, so they're
not worth warning on. Add the value of "$?", though, since
it can help users notice other errors (e.g. SIGSEGV).
Eric Wong [Thu, 28 Oct 2021 11:15:01 +0000 (11:15 +0000)]
lei rm: move generic input_maildir_cb to LeiInput parent class
It's not much of a savings, right now, but maybe it can be in the
future. I wanted to eliminate the "lei convert" one, too, but
convert needs to preserve keywords which isn't possible with the
generic fallback, so new tests were written for convert, instead.
Eric Wong [Thu, 28 Oct 2021 11:14:57 +0000 (11:14 +0000)]
doc: lei blob: wording fixups, describe --remote
There's no current way to retrieve blobs by OID directly
from remote externals. Maybe the $INBOX_NAME/$OID/s/raw.eml
endpoint could be overloaded for that.
Eric Wong [Thu, 28 Oct 2021 11:14:54 +0000 (11:14 +0000)]
xt/net_writer_imap: test "lei convert" w/ IMAP source
I just did a double-take and nearly thought authentication
was broken while reading LeiConvert.pm. Add a comment in
LeiConvert.pm to clarify things, too.
Eric Wong [Wed, 27 Oct 2021 21:09:19 +0000 (21:09 +0000)]
lei q: fix remote import accounting
We need to update the {-nr_remote_eml} counter regardless
of progress display being enabled since it's needed for
saved searches. We'll also split out the {-imported} flag
separately and only call LeiStore->done if a new message
was imported.
Note: this change is NOT expected to fix errors reported by
Thomas in <ebf92218-1470-4602-b534-6dae59639dc6@t-8ch.de>
Eric Wong [Tue, 26 Oct 2021 21:18:05 +0000 (21:18 +0000)]
lei mail-diff: support more inputs, split newlines
Support --in-format like the rest of LeiInput users, and don't
default to .eml if a per-input format was specified. In any
case, I saved a bunch of messages from mutt which uses mboxcl2.
We'll also split newlines for diff, since it's a pain to read
diffs with escaped "\n" characters in them.
Eric Wong [Tue, 26 Oct 2021 10:35:57 +0000 (10:35 +0000)]
input_pipe: account for undefined {sock}
It's possible for ->event_step to fire twice due to ->requeue
with EPOLLET (but not EPOLLONESHOT). So account for that and
avoid causing event loop errors as a result.
Eric Wong [Tue, 26 Oct 2021 10:35:55 +0000 (10:35 +0000)]
lei p2q: use LeiInput for multi-patch series
The LeiInput backend now allows p2q to work like any other
command which reads .eml, .patch, mbox*, Maildir, IMAP, and NNTP
input. Running "git format-patch --stdout -1 $COMMIT" remains
supported.
This is intended to allow lower memory use while parsing
"git log --pretty=mboxrd -p" output. Previously, the entire
output of "git log" would be slurped into memory at once.
The intended use is to allow easy(-ish :P) searching for
unapplied patches as documented in the new example in the
manpage.
Eric Wong [Tue, 26 Oct 2021 10:35:52 +0000 (10:35 +0000)]
lei q: enable expensive Xapian flags
FLAG_PURE_NOT is too expensive for public-facing WWW use, but
lei isn't public-facing. We'll also unconditionally enable
phrase search on old "chert" DBs since lei doesn't need to
worry about fairness across 10K users.
Eric Wong [Tue, 26 Oct 2021 10:35:49 +0000 (10:35 +0000)]
doc: tuning: additional notes for many inboxes
-extindex is the most important piece for dealing with many
inboxes, so note it first. Also, frequent use of "git gc" is
important for both loose object performance and reducing memory
mappings.
Eric Wong [Mon, 25 Oct 2021 08:59:19 +0000 (08:59 +0000)]
lei_to_mail: write directly to mail_sync.sqlite3
No need to go through the lei/store process when we write
mail_sync.sqlite3. This ought to reduce ENOBUFS errors (and the
sleep workaround) on RAM-starved systems.
Eric Wong [Mon, 25 Oct 2021 17:53:51 +0000 (14:53 -0300)]
contrib/css/216light: add more contrast to foreground text
333 on dimmed displays doesn't show up well. I still
find 000 foregrounds too harsh, though, but 003 is available.
It seems dark enough to not cause problems while not being too
harsh.
003 should be available on more displays, even, and could fit
a 22-color "safest" color scheme.
Eric Wong [Sun, 24 Oct 2021 00:20:45 +0000 (18:20 -0600)]
git: avoid Perl5 internal scratchpad target cache
Creating a scalar ref directly off substr() seemed to be causing
the underlying non-ref scalar to end up in Perl's scratchpad.
Assign the substr result to a local variable seems sufficient to
prevent multi-megabyte SVs from lingering indefinitely when a
read-only daemon serves rare, oversized blobs.
The use of array-returning built-ins such as `grep' inside
arrayref declarations appears to result in permanently allocated
scratchpad space for caching according to my malloc inspector.
Thread skeletons get discarded every response, but multiple
skeletons can exist in memory at once, so do what we can to
prevent long-lived allocations from being made, here.
In other words, replacing constructs such as:
my $foo = [ grep(...) ];
with:
my @foo = grep(...);
Seems to ensure the mortality of the underlying array.
Eric Wong [Sun, 24 Oct 2021 00:20:43 +0000 (18:20 -0600)]
listener: emit warnings on EPERM
In retrospect, warnings for EPERM on accept4(2) failure may
help detect misconfigured firewalls, so start emitting warnings
for EPERM. Fwiw, I've never known excessive EPERM warnings
to be excessively noisy in other TCP services I've run over
the years.
Eric Wong [Sun, 24 Oct 2021 00:20:42 +0000 (18:20 -0600)]
http: use a larger buffer for ->getline responses
64K matches the Linux pipe default, and matches what we use in
httpd/async and qspawn. This should reduce syscalls used for
serving git packs via dumb HTTP and any ->getline code paths
used by other PSGI code.
This appears to speed up HTML rendering by w3m when serving
giant HTML responsees from the Devel::Mwrap::PSGI memory
debugger.
Eric Wong [Sun, 24 Oct 2021 00:20:40 +0000 (18:20 -0600)]
lei export-kw: skip read-only IMAP folders
Since we want to store IMAP flags asynchronously and not wait
for results, we can't check for IMAP errors this way and end up
wasting bandwidth on public-inbox-imapd. Now, we just check
PERMANENTFLAGS up front to ensure a folder can handle IMAP flag
storage before proceeding.
Eric Wong [Sat, 23 Oct 2021 21:53:46 +0000 (21:53 +0000)]
cmd_ipc4: retry sendmsg on ENOBUFS/ENOMEM/ETOOMANYREFS
I'm seeing ENOBUFS on a RAM-starved system, and slowing the
sender down enough for the receiver to drain the buffers seems
to work. ENOMEM and ETOOMANYREFS could be in the same boat
as ENOBUFS.
Watching for POLLOUT events via select/poll/epoll_wait doesn't
seem to work, since the kernel can already sleep (or return
EAGAIN) for cases where POLLOUT would work.
Eric Wong [Fri, 22 Oct 2021 08:22:45 +0000 (08:22 +0000)]
lei export-kw: don't recreate deleted IMAP folders
In case an IMAP folder is deleted, just set an error and
ignore it rather than creating an empty folder which we
attempt to export keywords to for non-existent messages.
Kyle Meyer [Fri, 22 Oct 2021 04:49:35 +0000 (00:49 -0400)]
wwwatomstream: call gmtime with scalar
When the gmtime() calls were moved from feed_entry() and atom_header()
into feed_updated() in c447bbbd, @_ rather than a scalar was passed to
gmtime(). As a result, feed <updated> values end up as
"1970-01-01T00:00:00Z".
Switch back to using a scalar argument to restore the correct
timestamps.
Eric Wong [Thu, 21 Oct 2021 21:10:32 +0000 (21:10 +0000)]
lei: use RENAME_NOREPLACE on Linux 3.15+
One syscall is better than two for atomicity in Maildirs. This
means there's no window where another process can see both the
old and new file at the same time (link && unlink), nor a window
where we might inadvertantly clobber an existing file if we were
to do `stat && rename'.