Eric Wong [Wed, 23 Mar 2022 21:08:19 +0000 (21:08 +0000)]
syscall: add sendmsg+recvmsg for remaining arches
aarch64, ppc64le, sparc64, loongarch64, and mips (32-bit userspace)
are all tested via machines from the GCC Farm Project
<https://cfarm.tetaneutral.net/>
Remaining syscall numbers are from musl <https://musl.libc.org/>
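A minimal Python sketch of the kind of per-architecture lookup this implies. The table covers only a few ABIs (x86-64, i386, and the asm-generic table that aarch64 and loongarch64 share); the dict and function names are hypothetical, not lei's actual code.

```python
import platform

# (sendmsg, recvmsg) syscall numbers for a small, illustrative subset of
# Linux ABIs. Keyed by uname machine name; note that a 32-bit userspace
# on a 64-bit kernel reports the kernel's machine, so real code needs
# more than this to pick the right table.
SENDMSG_RECVMSG = {
    "x86_64": (46, 47),
    "i386": (370, 372),         # direct socket syscalls on i386 (Linux 4.3+)
    "i686": (370, 372),
    "aarch64": (211, 212),      # asm-generic table
    "loongarch64": (211, 212),  # also asm-generic
}

def msg_syscalls(machine=None):
    """Return (sendmsg, recvmsg) numbers, or None for an unknown arch."""
    machine = machine or platform.machine()
    return SENDMSG_RECVMSG.get(machine)
```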
Eric Wong [Wed, 23 Mar 2022 08:54:35 +0000 (08:54 +0000)]
syscall: implement sendmsg+recvmsg in pure Perl
Socket::MsgHdr is only packaged for Debian and derivatives at
the moment, and Inline::C pulling in gcc/clang is a huge amount
of disk space and bandwidth for some users.
This enables disk space and/or bandwidth-limited users to use lei.
Only Linux guarantees a stable ABI and syscall numbers, but
that's the majority of our userbase. FreeBSD users will still
have to use Inline::C (or get Socket::MsgHdr packaged).
x86, x32, and x86-64 are all currently supported, more to be added.
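sendmsg/recvmsg are the calls that carry SCM_RIGHTS ancillary data, which is how one process hands open file descriptors to another over a Unix socket. Assuming that is what lei needs them for, here is a Python sketch of the same round-trip. Python exposes sendmsg/recvmsg directly, so no raw syscall numbers are involved; the helper names are hypothetical (Python 3.9+ also ships socket.send_fds/recv_fds).

```python
import array
import socket

def send_fds(sock, data, fds):
    """Send data along with file descriptors as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    return sock.sendmsg([data], ancillary)

def recv_fds(sock, bufsize, maxfds):
    """Receive data plus up to maxfds descriptors sent via send_fds()."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, kind, cdata in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # ignore any truncated trailing int
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)
```

The receiver gets new descriptor numbers referring to the same open files, so e.g. a pipe's write end sent this way can be written to by the receiving process.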
Eric Wong [Tue, 15 Mar 2022 20:45:02 +0000 (20:45 +0000)]
www: loosen deep-linking prevention
Apparently some browsers can set a Referer: header which fails
to match. I'm not certain why, but making "$schema://$HOST_PORT"
match case-insensitively seems more correct regardless.
In case that doesn't work, we'll also allow bypassing deep-link
prevention via a POST form button.
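A sketch of a case-insensitive Referer prefix check in Python (hypothetical function name; the POST-button bypass is not shown). Scheme and host names are case-insensitive, so lowercasing both sides before comparing the prefix is safe:

```python
def referer_matches(referer, scheme, host_port):
    """True if the Referer header points at scheme://host_port,
    comparing the scheme and host case-insensitively."""
    if not referer:
        return False
    prefix = f"{scheme}://{host_port}".lower()
    ref = referer.lower()
    # require end-of-string or "/" after host:port so that
    # "https://example.com.evil/" doesn't match "https://example.com"
    return ref == prefix or ref.startswith(prefix + "/")
```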
Eric Wong [Mon, 7 Mar 2022 10:57:37 +0000 (10:57 +0000)]
index|extindex: support --dangerous flag
This enables Xapian::DB_DANGEROUS to support in-place updates.
This can speed up the initial index and reduce I/O at the cost
of preventing concurrent readers and being unsafe in the face of
any abnormal terminations. This is more dangerous than
--no-fsync. --no-fsync is only unsafe in the event of a power
loss or kernel crash; --dangerous is unsafe even on SIGKILL.
Eric Wong [Sun, 27 Feb 2022 11:17:14 +0000 (11:17 +0000)]
t/lei-sigpipe: ensure SIGPIPE is unblocked for this test
Tests run under systemd (and similar) have SIGPIPE blocked by
default. This was causing this SIGPIPE test to get stuck when
run by automated builders used by Nix. Thanks to Julien
Moutinho and Dominique Martinet for tracking down this failure.
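A Python sketch of the idea: the blocked-signal mask is inherited across fork()+exec(), so a harness can remove SIGPIPE from it before spawning children. A blocked signal differs from an ignored one (SIG_IGN); this only touches the mask. Function names are hypothetical.

```python
import signal

def sigpipe_blocked():
    """True if SIGPIPE is in this thread's blocked-signal mask."""
    # SIG_BLOCK with an empty set changes nothing and returns the mask
    return signal.SIGPIPE in signal.pthread_sigmask(signal.SIG_BLOCK, [])

def unblock_sigpipe():
    """Remove SIGPIPE from the mask; children spawned afterwards inherit it."""
    signal.pthread_sigmask(signal.SIG_UNBLOCK, [signal.SIGPIPE])
```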
Eric Wong [Mon, 14 Feb 2022 05:37:25 +0000 (05:37 +0000)]
sharedkv: avoid ambiguity for numeric-like string keys
While we only store URLs and binary SHA-1/SHA-256 values in skv
at the moment, we may store potentially ambiguous keys/values in
the future. It's possible to store "02" and have it treated as
`2' unless explicitly binding parameters as SQL_BLOB. This
behavior was independent of the sqlite_unicode parameter as
evidenced by the new tests.
I only noticed this bug while hacking on another project using
DBD::SQLite, and not while hacking on public-inbox itself.
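Python's sqlite3 binds str as TEXT and does not guess types the way DBD::SQLite can, but SQLite's column affinity produces an analogous "02" becomes 2 effect, and binding the key as a BLOB avoids it. The table below is hypothetical and does not reflect the actual skv schema:

```python
import sqlite3

def store_and_inspect():
    """Insert the same "02" key once as TEXT and once as a BLOB and report
    how SQLite stored each one."""
    db = sqlite3.connect(":memory:")
    # NUMERIC affinity turns well-formed numeric TEXT like "02" into the
    # INTEGER 2, but leaves BLOB values untouched
    db.execute("CREATE TABLE kv (k NUMERIC, v TEXT)")
    db.execute("INSERT INTO kv VALUES (?, ?)", ("02", "bound as TEXT"))
    db.execute("INSERT INTO kv VALUES (?, ?)", (b"02", "bound as BLOB"))
    rows = db.execute("SELECT typeof(k), k FROM kv ORDER BY rowid").fetchall()
    db.close()
    return rows
```

Here store_and_inspect() returns [("integer", 2), ("blob", b"02")]: only the BLOB-bound key keeps its leading zero.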
Eric Wong [Sun, 13 Feb 2022 21:01:59 +0000 (21:01 +0000)]
t/lei-*watch: disable flaky tests by default for now
Properly fixing these tests is too difficult for me at the
moment, so just disable these tests for now. A proper fix and
fleshing out support for inotify will hopefully happen at some
point.
Eric Wong [Fri, 11 Feb 2022 20:22:17 +0000 (20:22 +0000)]
view: remove all CR before LF
While we've rendered CR-LF as LF-only in HTML for many years,
some messages end up as CR-CR-LF. So strip ALL CR bytes
preceding LF bytes, while preserving odd CR in the middle of
lines.
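The rule fits in one regex substitution; a Python sketch with a hypothetical function name:

```python
import re

# one or more CR bytes immediately followed by LF
_CR_BEFORE_LF = re.compile(r"\r+\n")

def strip_cr_before_lf(text):
    """Collapse CR-LF, CR-CR-LF, etc. into LF; keep CRs mid-line."""
    return _CR_BEFORE_LF.sub("\n", text)
```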
Eric Wong [Tue, 1 Feb 2022 23:34:28 +0000 (23:34 +0000)]
test_lei: use consistent locale for error messages
git-config(1) error messages are locale-dependent, so follow
the lead taken by git's own test suite and set LC_ALL=C and LANG=C
to ensure error messages we check against are not localized.
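A Python sketch of running a command with unlocalized messages, the same LC_ALL=C LANG=C approach (function names are hypothetical):

```python
import os
import subprocess

def c_locale_env(base=None):
    """Copy of the environment with messages forced to the C locale."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    env["LANG"] = "C"
    return env

def run_unlocalized(argv):
    """Run a command (e.g. git config ...) so its errors aren't translated."""
    return subprocess.run(argv, env=c_locale_env(), capture_output=True, text=True)
```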
Eric Wong [Tue, 1 Feb 2022 01:27:50 +0000 (01:27 +0000)]
syscall: FS_IOC_*FLAGS: define on per-architecture basis
It turns out these Linux ioctls are unfortunately
architecture-dependent, and not endian-dependent.
Fix up some warning messages while we're at it, too.
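To show why the numbers differ, here is a Python sketch of the kernel's _IOC() encoding for two layout families: most architectures put 2 direction bits at bit 30 (read=2, write=1), while powerpc, mips and sparc put 3 direction bits at bit 29 (read=2, write=4). FS_IOC_GETFLAGS/SETFLAGS are _IOR/_IOW('f', 1/2, long), so sizeof(long) matters as well. Other families aren't covered; names are hypothetical.

```python
# Two of Linux's _IOC() layouts; other architectures are not covered.
IOC_LAYOUTS = {
    # x86, x86-64, arm, aarch64, riscv, loongarch, ...
    "generic": {"dir_shift": 30, "read": 2, "write": 1},
    # powerpc, mips, sparc
    "ppc_mips_sparc": {"dir_shift": 29, "read": 2, "write": 4},
}
NRSHIFT, TYPESHIFT, SIZESHIFT = 0, 8, 16

def ioc(layout, direction, typ, nr, size):
    """Encode an ioctl number like the kernel's _IOC(dir, type, nr, size)."""
    lay = IOC_LAYOUTS[layout]
    return ((lay[direction] << lay["dir_shift"]) | (size << SIZESHIFT)
            | (ord(typ) << TYPESHIFT) | (nr << NRSHIFT))

def fs_ioc_flags(layout, sizeof_long):
    """(FS_IOC_GETFLAGS, FS_IOC_SETFLAGS) for a layout and sizeof(long)."""
    return (ioc(layout, "read", "f", 1, sizeof_long),
            ioc(layout, "write", "f", 2, sizeof_long))
```

For example, fs_ioc_flags("generic", 8) gives 0x80086601/0x40086602 (x86-64), while fs_ioc_flags("ppc_mips_sparc", 8) gives 0x40086601/0x80086602 (ppc64): same endianness-independent fields, different direction encoding.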
Dominique Martinet [Thu, 9 Dec 2021 02:50:51 +0000 (11:50 +0900)]
syscall: fallback to rename on renameat2 EINVAL
ZFS appears to incorrectly return EINVAL on renameat2 when the operation is not
supported:
renameat2(AT_FDCWD, "...", AT_FDCWD, "...", RENAME_NOREPLACE) = -1 EINVAL
Fall back to the racy rename in this case as well.
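A Python sketch of the fallback logic, assuming the pre-existing fallback already covered ENOSYS (kernels without renameat2). Python's os module has no renameat2 wrapper, so the caller passes one in as the hypothetical renameat2_noreplace callable; the check-then-rename fallback is racy, as noted above.

```python
import errno
import os

def rename_noreplace(src, dst, renameat2_noreplace=None):
    """Rename src to dst without clobbering dst if possible.

    renameat2_noreplace: hypothetical callable wrapping
    renameat2(..., RENAME_NOREPLACE); it should raise OSError on failure.
    """
    if renameat2_noreplace is not None:
        try:
            renameat2_noreplace(src, dst)
            return
        except OSError as e:
            # ENOSYS: old kernel; EINVAL: e.g. ZFS lacking support
            if e.errno not in (errno.ENOSYS, errno.EINVAL):
                raise
    # racy fallback: dst may appear between the check and the rename
    if os.path.lexists(dst):
        raise FileExistsError(errno.EEXIST, os.strerror(errno.EEXIST), dst)
    os.rename(src, dst)
```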
Eric Wong [Sun, 30 Jan 2022 21:49:08 +0000 (21:49 +0000)]
rewrite Linux nodatacow use in pure Perl w/o system
btrfs is Linux-only at the moment (and likely to remain that way
for practical purposes). So rely on Linux ABI stability and use
the `syscall' and `ioctl' perlops rather than relying on Inline::C.
Inline::C (and gcc||clang) are monstrous dependencies which we
can't expect users to have.
This makes supporting new architectures more difficult, but new
architectures come along rarely and this reduces the burden for
the majority of Linux users on popular architectures (while
still avoiding the distribution of pre-built binaries).
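A rough Python equivalent of setting the No_COW attribute (what `chattr +C` does) through the same ioctls. Assumptions: the constants are for the generic 64-bit layout (x86-64, aarch64), and the flag word exchanged with the kernel is an int. Per chattr(1), +C should be set on new or empty files. Function names are hypothetical.

```python
import array
import fcntl

FS_IOC_GETFLAGS = 0x80086601  # generic 64-bit layout only
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000      # "do not cow file" (chattr +C)

def with_nocow(flags):
    """Return inode flags with the No_COW bit set."""
    return flags | FS_NOCOW_FL

def set_nocow(fd):
    """Read the inode flags of fd, add No_COW, and write them back.
    Raises OSError (e.g. ENOTTY) on filesystems without these ioctls."""
    buf = array.array("i", [0])  # kernel reads/writes an int here
    fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf, True)
    buf[0] = with_nocow(buf[0])
    fcntl.ioctl(fd, FS_IOC_SETFLAGS, buf)
```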
Eric Wong [Mon, 22 Nov 2021 18:38:09 +0000 (18:38 +0000)]
lei: always use 3-arg open perlop
Future-proofing in case future versions of Perl warn on this, since
2-arg forms of open may be subject to injection vulnerabilities
with non-literal args.
Eric Wong [Mon, 22 Nov 2021 18:23:52 +0000 (18:23 +0000)]
searchidx: avoid modification of read-only `$_'
This fixes the "Modification of a read-only value attempted at ..."
error in an initial run of t/reindex-time-range.t. It was
reproducible by running `rm -rf t/data-gen/reindex-time-range.v*'
before `make && prove -bvw t/reindex-time-range.t'. Thanks to
Jörg Rödel for providing the backtrace which helped find this.
Eric Wong [Wed, 10 Nov 2021 10:33:16 +0000 (10:33 +0000)]
t/lei-watch: test with higher sleep
0.1s may not be enough for a task switch and inotify wakeup,
so try doubling it and see if it fixes test reliability, for
now. A future change may be to implement a watcher/tracer
for inotify -> lei/store events.
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei q: make HTTP(S) query strings even less ugly
Following commit 57fed2e4b78ed394 (lei: normalize whitespace in
remote queries, 2021-09-11), the trailing `\n' from stdin
queries was being normalized to ` ' (SP), which appears as a
trailing `+' in URLs that Xapian ignores.
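A Python sketch of collapsing whitespace before URL-encoding, so a trailing newline never turns into a trailing `+'. The `q' parameter name and the function name are assumptions for illustration:

```python
from urllib.parse import urlencode

def remote_query_string(query):
    """Build a q=... query string after collapsing every whitespace run
    (including a trailing newline from --stdin) into single spaces."""
    return urlencode({"q": " ".join(query.split())})
```

Without the split/join, urlencode({"q": "foo bar "}) yields "q=foo+bar+".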
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei q: disallow "\n" in argv[] elements
I don't expect this to be hit in real-world use via normal
interactive shells. However, somebody could accidentally add
"\n" in languages (e.g. Perl, C) where it's easy to pass "\n"
in argv[].
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei up: infer rawstr from old searches via trailing "\n"
For --stdin searches created prior to commit 666dde69a3f6 (lei
q|up: fix saved searches for single-phrase search, 2021-11-08)
we still want to be able to run "lei up" on them without
regressions. So assume nobody manages to enter "\n" as an
argv[] element and consider the presence of "\n" as a previous
--stdin use.
This fixes errors from "lei up" such as:
lei_xsearch 2 wq_worker: Exception: Key too long: length was 840 bytes,
maximum length of a key is 255 bytes at ../PublicInbox/IPC.pm line 250.
Fixes: 666dde69a3f6 ("lei q|up: fix saved searches for single-phrase search")
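The two "\n" commits work together: new argv elements may not contain "\n", so a "\n" in an old saved search can only have come from --stdin. A Python sketch with hypothetical names:

```python
def check_argv(argv):
    """Reject newlines in argv elements (the "lei q" restriction)."""
    for arg in argv:
        if "\n" in arg:
            raise ValueError(f"newline not allowed in argument: {arg!r}")

def infer_rawstr(saved_argv):
    """Guess that a saved search predating the fix came from --stdin
    if any saved argv element contains a newline."""
    return any("\n" in arg for arg in saved_argv)
```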
Eric Wong [Tue, 9 Nov 2021 00:20:50 +0000 (00:20 +0000)]
build: do not repeatedly build some docs
Text versions of manpages do not need to be generated for normal
installations; they're only used for generating HTML and our
amazing, award-winning homepage.
We'll also rely on touch(1) instead of Perl utime to benefit
users w/o git-set-file-times in txt2pre. Perl numeric values
cannot represent nanosecond resolution accurately even with
Time::HiRes, which causes nanosecond-aware make(1)
implementations to repeatedly rebuild.
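Python shows the same precision problem: a float count of seconds can't hold a full nanosecond timestamp, while integer-nanosecond APIs can. This is an analogue of the touch(1) approach with hypothetical names, not the build's actual code:

```python
import os

def float_roundtrip_ns(ns):
    """Convert integer nanoseconds to float seconds and back."""
    return round((ns / 1e9) * 1e9)

def copy_times_ns(src, dst):
    """Copy atime/mtime from src to dst as integer nanoseconds, so
    sub-microsecond precision survives (float seconds would not)."""
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

A present-day timestamp like 1636416050123456789 ns does not survive float_roundtrip_ns(), which is exactly the mismatch that makes a nanosecond-aware make(1) rebuild.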
Eric Wong [Mon, 8 Nov 2021 23:39:26 +0000 (23:39 +0000)]
lei q|up: fix saved searches for single-phrase search
`"' (double-quote) needs to be quoted for stdin searches.
We also need to differentiate "lei q --stdin" usage when
calling "lei up"; do so by setting an internal "rawstr"
knob to ensure we can parse the config properly regardless
of whether the initial search used --stdin or not.
Eric Wong [Thu, 4 Nov 2021 07:03:01 +0000 (07:03 +0000)]
AUTHORS: clarify my title
As this is an anti-centralization, anti-authority project, the
traditional meaning of "Benevolent Dictator" never sat well
with me.
Benevolence is relative, and I've never been benevolent towards
monopolist-types who try to consolidate power and influence.
Power corrupts, after all. In any case, I'll never be more than
a random idiot serving data which anybody can mirror and fork.
Eric Wong [Wed, 3 Nov 2021 20:35:55 +0000 (20:35 +0000)]
lei_curl: use http.proxy knob via URL match for curl
Using --proxy on the command-line affects the entire
lei invocation, and users searching HTTP(S) remotes and
writing to an IMAP folder may want more fine-grained proxy
use:
lei q -o imap://no-proxy.example/foo -O https://need-proxy.example/bar ...
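git-config supports per-URL settings such as http.<url>.proxy, and `git config --get-urlmatch http.proxy <url>` resolves them; whether lei queries it exactly that way isn't stated here. A simplified Python sketch of the matching (hypothetical names; git's real urlmatch rules for wildcards, user names and ports are more involved):

```python
from urllib.parse import urlsplit

def proxy_for(url, config):
    """Pick http.<url>.proxy for url, falling back to http.proxy.
    config maps flattened keys (as in `git config -l`) to values.
    Simplified: matches scheme + host, then the longest path prefix."""
    target = urlsplit(url)
    best, best_len = config.get("http.proxy"), -1
    for key, value in config.items():
        if not (key.startswith("http.") and key.endswith(".proxy")) or key == "http.proxy":
            continue
        pattern = urlsplit(key[len("http."):-len(".proxy")])
        if (pattern.scheme.lower(), pattern.netloc.lower()) != (
                target.scheme.lower(), target.netloc.lower()):
            continue
        prefix = pattern.path.rstrip("/")
        if not prefix or target.path == prefix or target.path.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best, best_len = value, len(prefix)
    return best
```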
Eric Wong [Wed, 3 Nov 2021 21:01:21 +0000 (21:01 +0000)]
doc: add more 3rd-party refs, use Debian manpages for xapian
curl, torsocks, and gitglossary manpages are all newly
referenced, so make sure they're linkified properly in HTML.
We'll be using Debian's manpages as an ad-free, Tor-accessible
host for manpages as a fallback since hosting manpages for all
3rd-party projects we reference doesn't scale.
extindex is a far more important feature than libgit2 support
(which is actually underperforming and might go away). The
search results page is also improved (IMHO), nowadays.
Eric Wong [Wed, 3 Nov 2021 08:34:44 +0000 (08:34 +0000)]
doc: extindex: document current behavior + knobs
I'm not really sure if extindex writing to the config file
is a good idea (since -index doesn't, as -init exists).
Just document what it does and let the user handle it, since
the config file shouldn't be daunting to new users.
Eric Wong [Tue, 2 Nov 2021 18:14:45 +0000 (18:14 +0000)]
lei <rediff|rm|tag>: stdin implies `-F eml'
These commands are usually run on a single message, so saving
the user the trouble of typing `-F eml' on the command-line
seems reasonable. I don't think commands like "index" and
"import" will be too useful for single messages, though.
Eric Wong [Tue, 2 Nov 2021 18:14:43 +0000 (18:14 +0000)]
lei mail-diff: do not default to 'eml'
In retrospect, this doesn't make sense, since it needs at least
two messages to diff. So follow the "normal" input rules and
require users to specify the format.
Eric Wong [Tue, 2 Nov 2021 09:24:39 +0000 (09:24 +0000)]
t/lei-refresh-mail-sync: speed up test on FreeBSD 12
And improve reliability while we're at it. It seems closing a
TCP listen socket on FreeBSD 12.2 doesn't cause connect()-ing
clients to fail. This happens regardless of whether a socket is
IPv4 or IPv6.
This non-failure was causing tests to time out slowly on the
client side instead of failing immediately. We now fork a new
process which does nothing but accept() + shutdown() to emulate
a dead server.
Reliability improves on all OSes since there's never a point in
time when another process can bind the socket.
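A POSIX-only Python sketch of such a "dead server" (hypothetical names): the forked child keeps the listening socket, so the port stays bound the whole time, and every client sees EOF immediately instead of waiting for a timeout.

```python
import os
import signal
import socket

def start_dead_server():
    """Fork a child that accepts connections and immediately shuts them
    down. Returns (child pid, (host, port))."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(16)
    addr = srv.getsockname()
    pid = os.fork()
    if pid == 0:  # child: loops until killed
        try:
            while True:
                conn, _ = srv.accept()
                try:
                    conn.shutdown(socket.SHUT_RDWR)
                except OSError:
                    pass  # client may already be gone
                conn.close()
        finally:
            os._exit(0)
    srv.close()  # parent drops its copy; the child still holds the port
    return pid, addr

def stop_dead_server(pid):
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
```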
I've been seeing the following error on occasion during "make check-run":
$PWD/t/data-gen/reindex-time-range.v1-master index failed: Modification of a read-only value attempted at $DIR/lib/PublicInbox/SearchIdx.pm line 899, <$r> line 1.
Perhaps this fixes it. In any case, a construct of:
$h->{k} //= do { $h->{x} = ...; $val };
seems wrong and may cause Perl to error out depending on how
hashes are randomized.
Eric Wong [Sat, 30 Oct 2021 08:11:43 +0000 (08:11 +0000)]
lei_to_mail: avoid SEGV on worker exit via SIGTERM
->DESTROY ordering via "exit()" calls is tricky, and dedupe
checks were causing problems.
AFAIK, this only affects users who manually enable WAL on
lei/store/ei*/over.sqlite3. Fortunately, there is no data
corruption as a result even though "read-only" WAL requires
write permissions.
Eric Wong [Sat, 30 Oct 2021 08:11:42 +0000 (08:11 +0000)]
lei_xsearch: quiet error message on SIG{PIPE,TERM}
SIGPIPE and SIGTERM are common and user-induced, so they're
not worth warning on. Add the value of "$?", though, since
it can help users notice other errors (e.g. SIGSEGV).
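Perl's "$?" holds a wait status just like the one C and Python expose, so the filtering can be sketched in Python (hypothetical names; the message format is an assumption):

```python
import os
import signal

QUIET_SIGNALS = {signal.SIGPIPE, signal.SIGTERM}

def describe_failure(status):
    """Return a warning for a wait() status, or None if the child exited
    successfully or died from a user-induced SIGPIPE/SIGTERM."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig in QUIET_SIGNALS:
            return None
        # include the raw status so e.g. SIGSEGV stands out
        return f"worker died: {signal.Signals(sig).name} (status={status})"
    code = os.WEXITSTATUS(status)
    return None if code == 0 else f"worker exited {code} (status={status})"
```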
Eric Wong [Thu, 28 Oct 2021 11:15:01 +0000 (11:15 +0000)]
lei rm: move generic input_maildir_cb to LeiInput parent class
It's not much of a savings right now, but it may be in the
future. I wanted to eliminate the "lei convert" one, too, but
convert needs to preserve keywords which isn't possible with the
generic fallback, so new tests were written for convert, instead.
Eric Wong [Thu, 28 Oct 2021 11:14:57 +0000 (11:14 +0000)]
doc: lei blob: wording fixups, describe --remote
There's no current way to retrieve blobs by OID directly
from remote externals. Maybe the $INBOX_NAME/$OID/s/raw.eml
endpoint could be overloaded for that.
Eric Wong [Thu, 28 Oct 2021 11:14:54 +0000 (11:14 +0000)]
xt/net_writer_imap: test "lei convert" w/ IMAP source
I just did a double-take and nearly thought authentication
was broken while reading LeiConvert.pm. Add a comment in
LeiConvert.pm to clarify things, too.
Eric Wong [Wed, 27 Oct 2021 21:09:19 +0000 (21:09 +0000)]
lei q: fix remote import accounting
We need to update the {-nr_remote_eml} counter regardless
of progress display being enabled since it's needed for
saved searches. We'll also split out the {-imported} flag
separately and only call LeiStore->done if a new message
was imported.
Note: this change is NOT expected to fix errors reported by
Thomas in <ebf92218-1470-4602-b534-6dae59639dc6@t-8ch.de>
Eric Wong [Tue, 26 Oct 2021 21:18:05 +0000 (21:18 +0000)]
lei mail-diff: support more inputs, split newlines
Support --in-format like the rest of LeiInput users, and don't
default to .eml if a per-input format was specified. In any
case, I saved a bunch of messages from mutt which uses mboxcl2.
We'll also split newlines for diff, since it's a pain to read
diffs with escaped "\n" characters in them.
Eric Wong [Tue, 26 Oct 2021 10:35:57 +0000 (10:35 +0000)]
input_pipe: account for undefined {sock}
It's possible for ->event_step to fire twice due to ->requeue
with EPOLLET (but not EPOLLONESHOT). So account for that and
avoid causing event loop errors as a result.
Eric Wong [Tue, 26 Oct 2021 10:35:55 +0000 (10:35 +0000)]
lei p2q: use LeiInput for multi-patch series
The LeiInput backend now allows p2q to work like any other
command which reads .eml, .patch, mbox*, Maildir, IMAP, and NNTP
input. Running "git format-patch --stdout -1 $COMMIT" remains
supported.
This is intended to allow lower memory use while parsing
"git log --pretty=mboxrd -p" output. Previously, the entire
output of "git log" would be slurped into memory at once.
The intended use is to allow easy(-ish :P) searching for
unapplied patches as documented in the new example in the
manpage.
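A Python sketch of the streaming idea: read mboxrd line by line (e.g. from the stdout of a "git log --pretty=mboxrd -p" subprocess opened in text mode) and yield one message at a time instead of slurping everything. Names are hypothetical, and blank separator lines are left in each message for simplicity.

```python
import re

# mboxrd quotes body lines matching ^>*From by adding one '>'
_MBOXRD_QUOTED_FROM = re.compile(r"^>(>*From )")

def iter_mboxrd(lines):
    """Yield each message as a list of lines (From_ line excluded) from
    an iterable of mboxrd lines, holding only one message in memory."""
    msg = None
    for line in lines:
        if line.startswith("From "):
            if msg is not None:
                yield msg
            msg = []
        elif msg is not None:
            # undo one level of '>From ' quoting
            msg.append(_MBOXRD_QUOTED_FROM.sub(r"\1", line))
    if msg is not None:
        yield msg
```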
Eric Wong [Tue, 26 Oct 2021 10:35:52 +0000 (10:35 +0000)]
lei q: enable expensive Xapian flags
FLAG_PURE_NOT is too expensive for public-facing WWW use, but
lei isn't public-facing. We'll also unconditionally enable
phrase search on old "chert" DBs since lei doesn't need to
worry about fairness across 10K users.
Eric Wong [Tue, 26 Oct 2021 10:35:49 +0000 (10:35 +0000)]
doc: tuning: additional notes for many inboxes
-extindex is the most important piece for dealing with many
inboxes, so note it first. Also, frequent use of "git gc" is
important for both loose object performance and reducing memory
mappings.
Eric Wong [Mon, 25 Oct 2021 08:59:19 +0000 (08:59 +0000)]
lei_to_mail: write directly to mail_sync.sqlite3
No need to go through the lei/store process when we write
mail_sync.sqlite3. This ought to reduce ENOBUFS errors (and the
sleep workaround) on RAM-starved systems.
Eric Wong [Mon, 25 Oct 2021 17:53:51 +0000 (14:53 -0300)]
contrib/css/216light: add more contrast to foreground text
333 on dimmed displays doesn't show up well. I still
find 000 foregrounds too harsh, but 003 is available.
It seems dark enough to not cause problems while not being too
harsh.
003 should be available on more displays, even, and could fit
a 22-color "safest" color scheme.