Sergey Matveev's repositories - public-inbox.git/log
20 months ago xt/mem-imapd-tls: update aliases to DSdeflate subs
Eric Wong [Sat, 23 Jul 2022 15:52:07 +0000 (15:52 +0000)]
xt/mem-imapd-tls: update aliases to DSdeflate subs

Fixes: 23af251dd607c4e7 (imap+nntp: share COMPRESS implementation, 2022-07-23)
20 months ago nntp: use substr to check for trailing CRLF
Eric Wong [Sat, 23 Jul 2022 06:13:07 +0000 (06:13 +0000)]
nntp: use substr to check for trailing CRLF

Regexps consume more CPU cycles and memory, and aren't
necessary here since we just converted the entire buffer
to CRLF.

20 months ago pop3: reduce memory use while generating the mailbox cache
Eric Wong [Sat, 23 Jul 2022 06:12:16 +0000 (06:12 +0000)]
pop3: reduce memory use while generating the mailbox cache

While the cache itself is relatively compact for 50K messages,
generating it was inefficient due to our schema and Over.pm APIs
being designed for NNTP.  While we won't change our schema for
now, we can choose better DBI APIs to use and limit our ephemeral
memory use.

This amounts to a 60% reduction in memory usage and a 5-10%
speedup against org.kernel.vger.git.0:

{
echo 'USER '$(uuidgen)'@org.kernel.vger.git.0'
echo PASS anonymous
echo STAT
echo QUIT
} | nc $HOST $PORT

20 months ago imap+nntp: share COMPRESS implementation
Eric Wong [Sat, 23 Jul 2022 04:41:55 +0000 (04:41 +0000)]
imap+nntp: share COMPRESS implementation

Their code was nearly identical to begin with, so save some
memory in -netd and disk space for all of our tarball/distro
users, at least.

And I seem to have used multiple inheritance successfully, here,
maybe...

20 months ago nntp: resolve inboxes immediately on group listings
Eric Wong [Sat, 23 Jul 2022 04:41:54 +0000 (04:41 +0000)]
nntp: resolve inboxes immediately on group listings

This prevents potential races between SIGHUP config reloads
while gigantic group listings are streaming, allowing us to
avoid many invalidation checks.

This also reduces send(2) syscalls and avoids Perl internal pad
allocations in a few places where it's not beneficial.  There
might be a slight (0.5%) speedup, but I'm not sure if that's
down to system noise, power/thermal management, or other users
on my VM.

20 months ago ds: share long_step between NNTP and IMAP
Eric Wong [Sat, 23 Jul 2022 04:41:53 +0000 (04:41 +0000)]
ds: share long_step between NNTP and IMAP

It's not actually used by our POP3 code at the moment,
but it may be soon to reduce memory usage when loading
50K smsg objects into memory.

20 months ago nntp: inline CRLF in all response lines
Eric Wong [Sat, 23 Jul 2022 04:41:52 +0000 (04:41 +0000)]
nntp: inline CRLF in all response lines

This brings NNTP closer to POP3 and IMAP implementations
to allow CoW avoidance on constants.

20 months ago nntp: listgroup_range_i: remove useless `map' op
Eric Wong [Sat, 23 Jul 2022 04:41:51 +0000 (04:41 +0000)]
nntp: listgroup_range_i: remove useless `map' op

No need to iterate through the array twice; and this even seems
a hair faster than what I got with commit 726d6e71aee5d974
(nntp: small speed up for multi-line responses, 2020-12-04)

20 months ago ds: move requeue_once
Eric Wong [Sat, 23 Jul 2022 04:41:50 +0000 (04:41 +0000)]
ds: move requeue_once

It's the same subroutine everywhere.

20 months ago ds: move no-op ->zflush to common base class
Eric Wong [Sat, 23 Jul 2022 04:41:49 +0000 (04:41 +0000)]
ds: move no-op ->zflush to common base class

More deduplication, and POP3 never needed it.

20 months ago ds: support greeting protocols
Eric Wong [Sat, 23 Jul 2022 04:41:48 +0000 (04:41 +0000)]
ds: support greeting protocols

We can share some common code between IMAP, NNTP, and POP3
without too much trouble, so cut down our LoC.

20 months ago nntp: remove more() wrapper
Eric Wong [Sat, 23 Jul 2022 04:41:47 +0000 (04:41 +0000)]
nntp: remove more() wrapper

Using PublicInbox::DS->msg_more directly can avoid unnecessary
CoW memory traffic since there's no appending "\r\n".

20 months ago nntp: start adding CRLF to responses natively
Eric Wong [Sat, 23 Jul 2022 04:41:46 +0000 (04:41 +0000)]
nntp: start adding CRLF to responses natively

With IMAP and POP3, I've started to embed CRLF into constant
response codes to avoid triggering CoW and extra memory traffic
in Perl.

The end goal is to enable more code sharing between IMAP, NNTP,
and POP3 inside one -netd process.

20 months ago nntp: pass regexp to split() callers
Eric Wong [Sat, 23 Jul 2022 04:41:45 +0000 (04:41 +0000)]
nntp: pass regexp to split() callers

Current implementations of Perl5 don't have optimizations for
single-character field separators.

20 months ago pop3: drop File::FcntlLock requirement for FreeBSD and Linux
Eric Wong [Thu, 21 Jul 2022 05:36:12 +0000 (05:36 +0000)]
pop3: drop File::FcntlLock requirement for FreeBSD and Linux

I know Linux has a stable ABI for this, and FreeBSD seems to,
too (*BSDs don't have stable syscall numbers, though).
I suspect this is safe enough for all *BSDs.

This is stricter than the MboxLock one since we use exact byte
ranges with these locks.

20 months ago www: note "x=m" and "t=1" (mis)use for GET requests
Eric Wong [Wed, 20 Jul 2022 22:57:07 +0000 (22:57 +0000)]
www: note "x=m" and "t=1" (mis)use for GET requests

We require "x=m" (requests for mboxes) to be POST requests to
avoid unnecessary traffic from crawlers.  "t=1" only collapses
threads in the summary view, which isn't normally accessible
from <form> elements.

This also fixes the missing "[summary|nested]" element when
"x=m" is used.

20 months ago gcf2: avoid excessive checks for unlinked files
Eric Wong [Wed, 20 Jul 2022 18:01:28 +0000 (18:01 +0000)]
gcf2: avoid excessive checks for unlinked files

We were misusing the timer and not expiring it before checking
for unlinked files.  Now, we check for unlinked files every 60s,
instead.

20 months ago pop3: advertise STLS in CAPA if appropriate
Eric Wong [Wed, 20 Jul 2022 09:24:13 +0000 (09:24 +0000)]
pop3: advertise STLS in CAPA if appropriate

This is documented in RFC 2595, and POP3 clients may rely on
seeing "STLS" in CAPA output to initiate TLS negotiation.
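
A client-side check for this could look like the sketch below.  It
assumes the CAPA response arrives on stdin with CRLF line endings;
capa_has_stls is a hypothetical helper name, not part of public-inbox.

```shell
# Hypothetical helper: read a POP3 CAPA response on stdin and succeed
# only if a line consisting of exactly "STLS" is present.
capa_has_stls() {
    # drop CRs so grep -x can match the whole line
    tr -d '\r' | grep -qx 'STLS'
}
```

For instance: printf 'CAPA\r\nQUIT\r\n' | nc $HOST $PORT | capa_has_stls
with $HOST and $PORT naming a POP3 listener.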

20 months ago netd: setup TLS bits for well-known STARTTLS ports
Eric Wong [Wed, 20 Jul 2022 09:24:12 +0000 (09:24 +0000)]
netd: setup TLS bits for well-known STARTTLS ports

Unfortunately, I can't think of an easy way to test this in
our test suite since binding these ports is privileged and
they are often in use, anyways.

20 months ago pop3: TOP requests do not expire messages
Eric Wong [Wed, 20 Jul 2022 09:24:11 +0000 (09:24 +0000)]
pop3: TOP requests do not expire messages

RFC 2449 only documents "EXPIRE 0" behavior for RETR requests
which fetch the whole message.  TOP requests only fetch
the headers and top $N lines of the body, so it's probably
harmful for deletions to be triggered in those cases.

20 months ago pop3: implement IN-USE from RESP-CODES (RFC 2449)
Eric Wong [Wed, 20 Jul 2022 09:24:10 +0000 (09:24 +0000)]
pop3: implement IN-USE from RESP-CODES (RFC 2449)

This may help clients communicate to users if they're
making parallel connections or if we have server bugs.

20 months ago public-inbox-pop3d - a mostly read-only POP3 server
Eric Wong [Wed, 20 Jul 2022 09:24:09 +0000 (09:24 +0000)]
public-inbox-pop3d - a mostly read-only POP3 server

Old account expiry has not been implemented, but it seems to
work well with both mpop(1) and getmail(1).  The strictness of
mpop was particularly helpful in ironing out bugs in our
implementation of (dreaded) message sequence numbers.

"EXPIRE 0" (RFC 2449) can theoretically save numerous "DELE"
commands, but that's untested by real-world clients.  mpop
supports PIPELINING which is effective in hiding latency,
and the core networking functionality is already well-tested
from our NNTP and IMAP implementations.

Configuration requires "publicinbox.pop3state" to point to
a directory writable by the otherwise read-only daemon.
See public-inbox-pop3d(1) manpage for more usage details.
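
As a rough sketch of that setup step: the helper name and both paths
are placeholders, and it relies on the config file using git-config
syntax, where appending a repeated section is valid.

```shell
# Hypothetical helper: create the state directory and append a
# publicinbox.pop3state entry to a git-config style file.
# $1 = config file path, $2 = state directory (both chosen by the admin)
setup_pop3state() {
    cfg=$1
    dir=$2
    mkdir -p "$dir" || return 1
    printf '[publicinbox]\n\tpop3state = %s\n' "$dir" >>"$cfg"
}
```

The directory still has to be made writable by whichever user the
daemon runs as; that part is left out here.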

20 months ago netd: load modules for well-known ports
Eric Wong [Wed, 20 Jul 2022 01:22:04 +0000 (01:22 +0000)]
netd: load modules for well-known ports

When inheriting well-known ports from systemd (or similar),
we can auto-load the proper *D.pm file based on the port number
without requiring command-line args.
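
The idea of picking a protocol from the port number can be sketched as
a lookup table.  The ports below are the IANA well-known assignments;
which of them -netd actually recognizes is an assumption, and
proto_for_port is a made-up name.

```shell
# Hypothetical lookup: map an inherited listener's port number to a
# protocol name, failing for ports that are not well-known.
proto_for_port() {
    case $1 in
    80) echo http ;;
    443) echo https ;;
    110) echo pop3 ;;
    995) echo pop3s ;;
    119) echo nntp ;;
    563) echo nntps ;;
    143) echo imap ;;
    993) echo imaps ;;
    *) return 1 ;;   # unknown port: caller must name the protocol
    esac
}
```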

load_mod also gets fixed to use its argument, instead of implicit
$1 since that won't work for our well-known ports.

20 months ago lei note-event: inline note_event_arm_done
Eric Wong [Tue, 19 Jul 2022 22:42:53 +0000 (22:42 +0000)]
lei note-event: inline note_event_arm_done

This was a single-caller sub since 47d4e53734820b4e
(lei_mail_sync: rely on flock(2), avoid IPC, 2021-09-18)
and unlikely to be used further, so inline it and save
a few KB of memory.

20 months ago lei: avoid deadlock on inotify/EVFILT_VNODE wakeups
Eric Wong [Tue, 19 Jul 2022 22:42:52 +0000 (22:42 +0000)]
lei: avoid deadlock on inotify/EVFILT_VNODE wakeups

Enqueuing "note-event" requests from the DS event loop must
not wait on workers being able to drain the queue quickly
enough.  Thus we make the SOCK_SEQPACKET writes nonblocking
and rely on the lei-daemon event loop to enqueue writes.

This is a unique problem for "note-event" since it reuses
workers in between commands, while most lei commands currently
fork off new workers.

21 months ago searchidx: skip "delta $N" sections for base-85
Eric Wong [Tue, 19 Jul 2022 02:36:04 +0000 (02:36 +0000)]
searchidx: skip "delta $N" sections for base-85

I don't deal with binary patches ever, so I failed to notice
binary deltas are supported in addition to the more common
literals.

A quick check of apply.c in git.git confirms "delta" and
"literal" are the only binary patch classes we can expect.

21 months ago test_common: avoid uninitialized warning on readlink
Eric Wong [Sat, 9 Jul 2022 08:08:57 +0000 (08:08 +0000)]
test_common: avoid uninitialized warning on readlink

Of course, waiting for inotify to become active can't rely on
inotify, so we need to do a busy loop here, instead...

21 months ago imap: STATUS: count messages properly
Eric Wong [Fri, 8 Jul 2022 11:36:37 +0000 (11:36 +0000)]
imap: STATUS: count messages properly

This only affects the rarely-used STATUS command; our message
count was consistently zero due to misusing ->imap_exists.

Noticed while implementing POP3 server.

21 months ago lei: track seen messages to note duplicates
Eric Wong [Thu, 7 Jul 2022 09:40:30 +0000 (09:40 +0000)]
lei: track seen messages to note duplicates

This may help track down deduplication or other bugs in lei
which lead to occasionally missing messages.

Link: https://public-inbox.org/meta/CAL_JsqJH8xx_2NyZffNsRXbGXiv3kjmCETvKXt3Yfb0uToLm9Q@mail.gmail.com/
21 months ago lei_xsearch: simplify lei/store import check
Eric Wong [Thu, 7 Jul 2022 09:40:29 +0000 (09:40 +0000)]
lei_xsearch: simplify lei/store import check

There's no need to check for two fields when one will suffice.

21 months ago tree-wide: Fix typo accomodate
Uwe Kleine-König [Fri, 1 Jul 2022 14:06:18 +0000 (16:06 +0200)]
tree-wide: Fix typo accomodate

This was pointed out by the Debian package linter "lintian".

21 months ago tree-wide: Fix typo likelyhood
Uwe Kleine-König [Fri, 1 Jul 2022 14:04:20 +0000 (16:04 +0200)]
tree-wide: Fix typo likelyhood

This was pointed out by the Debian package linter "lintian".

21 months ago searchthread: delete children early while ordering
Eric Wong [Wed, 22 Jun 2022 08:02:53 +0000 (08:02 +0000)]
searchthread: delete children early while ordering

This allows us to free up some memory sooner rather than later
in case ordersub is expensive.

21 months ago searchthread: remove + inline single-use cast sub
Eric Wong [Wed, 22 Jun 2022 08:02:52 +0000 (08:02 +0000)]
searchthread: remove + inline single-use cast sub

No point in wasting several kilobytes of memory for a single-use
one-line sub.

21 months ago doc: lei-q: regenerate for patchid: help
Eric Wong [Wed, 22 Jun 2022 07:47:59 +0000 (07:47 +0000)]
doc: lei-q: regenerate for patchid: help

21 months ago search: add help for patchid: prefix
Eric Wong [Tue, 21 Jun 2022 10:37:50 +0000 (10:37 +0000)]
search: add help for patchid: prefix

Noticed-by: Kyle Meyer <kyle@kyleam.com>
21 months ago search: do not index base-85 binary patches
Eric Wong [Mon, 20 Jun 2022 19:27:30 +0000 (19:27 +0000)]
search: do not index base-85 binary patches

Base-85 binary patches generated by git lead to many false
positives, so skip over gibberish words which may occur in them.
To avoid regressions in search results, continue to allow
searching for exact size matches (via "literal $SIZE") and the
phrase "GIT binary patch" for the mere presence of a binary
patch.

21 months ago search: support "patchid:" prefix (git patch-id --stable)
Eric Wong [Mon, 20 Jun 2022 19:27:29 +0000 (19:27 +0000)]
search: support "patchid:" prefix (git patch-id --stable)

This allows easy searching via patch-id from a git commit.

Currently, abbreviations are not supported, and it seems
needless to support them since AFAIK (git) doesn't generate
nor resolve abbreviated patch-ids anywhere.
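
To obtain a patch-id to search for, something like the sketch below may
work.  It assumes it runs inside a git work tree; commit_patch_id is a
made-up helper name, while `git show' and `git patch-id --stable' are
the stock git commands.

```shell
# Hypothetical helper: print the stable patch-id of commit $1.
# `git patch-id` prints "<patch-id> <commit-id>"; keep the first field.
commit_patch_id() {
    git show "$1" | git patch-id --stable | cut -d' ' -f1
}
```

The full (unabbreviated) result can then be used as a search term,
e.g. patchid:$(commit_patch_id HEAD).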

21 months ago searchidx: use regexp as first arg for `split' op
Eric Wong [Mon, 20 Jun 2022 19:27:28 +0000 (19:27 +0000)]
searchidx: use regexp as first arg for `split' op

Current implementations of Perl5 don't have optimizations for
single-character field separators (unlike another non-Perl5 VM
I'm familiar with).

22 months ago t/spawn: Find invalid PID to try to join its process group
Thiago Jung Bauermann [Fri, 10 Jun 2022 15:39:18 +0000 (12:39 -0300)]
t/spawn: Find invalid PID to try to join its process group

In the container used to build packages of the GNU Guix distribution, PID 1
runs as the same user as the test so this spawn that should fail actually
succeeds.

Fix the problem by going through different PIDs and picking one that
either doesn't exist or we aren't allowed to signal.
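
The search can be sketched in shell as below.  This is not the code
from the test itself; find_unsignalable_pid and its default bound of
32768 are assumptions made for illustration.

```shell
# Hypothetical helper: print the first PID that `kill -0` cannot
# signal, i.e. one that either does not exist or belongs to a user
# we are not allowed to signal.
find_unsignalable_pid() {
    pid=1
    max=${1:-32768}   # assumed search bound, override via $1
    while [ "$pid" -le "$max" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            printf '%s\n' "$pid"
            return 0
        fi
        pid=$((pid + 1))
    done
    return 1   # every PID up to $max was signalable
}
```

When PID 1 runs as the same user (as in the Guix build container),
the loop simply moves on to the next candidate.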

22 months ago Add EditorConfig file
Thiago Jung Bauermann [Fri, 10 Jun 2022 15:51:43 +0000 (12:51 -0300)]
Add EditorConfig file

This allows several editors to automatically use the correct settings when
editing public-inbox files.

[ew: add to MANIFEST, too]

22 months ago view: do not escape first `@' in mailto: URLs
Eric Wong [Thu, 9 Jun 2022 17:53:53 +0000 (17:53 +0000)]
view: do not escape first `@' in mailto: URLs

It's probably not a perfect match for RFC 6068 atm, but perfect
is the enemy of good.

Reported-by: Moritz Poldrack <moritz@poldrack.dev>
Link: https://public-inbox.org/meta/CKJSWGSZFKMX.3VUSIYE955Z9X@Archetype/
23 months ago imapd: update comment for PublicInbox::ConfigIter
Eric Wong [Fri, 13 May 2022 00:40:38 +0000 (00:40 +0000)]
imapd: update comment for PublicInbox::ConfigIter

config enumeration was split out to a separate class a long time ago.

23 months ago imap: remove unused args_ok sub
Eric Wong [Fri, 13 May 2022 00:40:37 +0000 (00:40 +0000)]
imap: remove unused args_ok sub

Noticed while reviewing pieces for POP3.

23 months ago daemon: fix uninitialized variable
Eric Wong [Sun, 8 May 2022 22:10:31 +0000 (22:10 +0000)]
daemon: fix uninitialized variable

And also replace an unnecessary substitution (s///) op with a
match (m//).

Fixes: 93a7b219d58aad86 ("public-inbox-netd: a multi-protocol superserver")
23 months ago doc: add missing "be" for --key description
Eric Wong [Sat, 7 May 2022 00:10:07 +0000 (00:10 +0000)]
doc: add missing "be" for --key description

Link: https://public-inbox.org/meta/87levfv7hs.fsf@kyleam.com/
Noticed-by: Kyle Meyer <kyle@kyleam.com>
23 months ago public-inbox-netd: a multi-protocol superserver
Eric Wong [Thu, 5 May 2022 10:52:15 +0000 (10:52 +0000)]
public-inbox-netd: a multi-protocol superserver

Since we'll be adding POP3 support as our 4th network protocol;
asking admins to run yet another daemon on top of existing
-httpd, -nntpd, -imapd is a maintenance burden and a waste of
memory.

The goal of public-inbox-netd is to be able to replace all
existing read-only daemons with a single process to save memory
and reduce administrative overhead; hopefully encouraging more
users to self-host their own mirrors.

It's barely-tested at the moment.  Eventually, multiple
PI_CONFIG and HOME directories will be supported, as will
per-listener .psgi config files.

23 months ago lei import: add label completions (+L:$LABEL)
Eric Wong [Mon, 2 May 2022 18:10:07 +0000 (18:10 +0000)]
lei import: add label completions (+L:$LABEL)

This can probably be added for "lei q", too, but we typically
import first.  Labels can probably be made persistent on a
per-folder basis in the future.

23 months ago lei_view_text: remove all CR before LF
Eric Wong [Mon, 2 May 2022 09:04:02 +0000 (09:04 +0000)]
lei_view_text: remove all CR before LF

This deals with CR-CR-LF messages, matching the HTML change in
7ee3643af9b72cad (view: remove all CR before LF, 2022-02-11)

23 months ago lei refresh-mail-sync: filter NNTP(S) from --all
Eric Wong [Sat, 30 Apr 2022 21:29:30 +0000 (21:29 +0000)]
lei refresh-mail-sync: filter NNTP(S) from --all

We currently do not support refresh from NNTP since deletes are
rare with public-inbox NNTP servers; but traditional Usenet
servers do delete/expire messages and we should probably support
that at some point.

23 months ago lei: improve diagnosis of errors from children
Eric Wong [Sat, 30 Apr 2022 21:04:12 +0000 (21:04 +0000)]
lei: improve diagnosis of errors from children

Not 100% sure what's going on, but maybe this helps.

23 months ago lei: move to v5.12 to avoid "use strict"
Eric Wong [Sat, 23 Apr 2022 22:03:41 +0000 (22:03 +0000)]
lei: move to v5.12 to avoid "use strict"

Socket.pm still loads strict.pm, unfortunately, which hurts
startup time; but we'll save some LoC this way.

23 months ago Makefile.PL: various updates for new versions
Eric Wong [Sat, 23 Apr 2022 22:03:40 +0000 (22:03 +0000)]
Makefile.PL: various updates for new versions

We'll still stick to v5.10.1, mainly, but use v5.12 in a few places...

23 months ago public-inbox 1.8.0 v1.8.0
Eric Wong [Sat, 23 Apr 2022 08:15:18 +0000 (08:15 +0000)]
public-inbox 1.8.0

23 months ago doc: update 1.8 WIP release notes
Eric Wong [Wed, 6 Apr 2022 00:19:21 +0000 (00:19 +0000)]
doc: update 1.8 WIP release notes

23 months ago lei: commit store on interrupted partial imports
Eric Wong [Thu, 21 Apr 2022 11:59:06 +0000 (11:59 +0000)]
lei: commit store on interrupted partial imports

This change prevents lingering shard and git-fast-import
processes from remaining after interrupted "lei import" (and
similar).  It also reduces the likelihood of data-loss in case
of subsequent abnormal termination of the daemon.

I think this is the least surprising way to handle users
prematurely aborting imports or other similar operations which
write to lei/store and will result in reduced bandwidth waste
for users with intermittent connections.  This is because the
lei/store processes may be shared by parallel "lei import"
callers, and commits done by any "lei import" caller will
inevitably trigger writes for all of them.

2 years ago syscall: golf + more idiomatic buffer initialization
Eric Wong [Mon, 18 Apr 2022 09:50:04 +0000 (09:50 +0000)]
syscall: golf + more idiomatic buffer initialization

While `vec' is useful for user-supplied buffers to avoid excess
memory traffic, it provides no benefit when we need to allocate
our own buffers as we do in nodatacow_fh, since Perl can't elide
memset(ptr, 0, len).  So just use the idiomatic `"\0" x $LEN' here.

2 years ago lei: wire up pure Perl sendmsg/recvmsg for Linux users
Eric Wong [Mon, 18 Apr 2022 09:50:03 +0000 (09:50 +0000)]
lei: wire up pure Perl sendmsg/recvmsg for Linux users

This enables lei-daemon to work without Inline::C nor
Socket::MsgHdr installed.  Prior to this, only the `lei' client
was using the pure Perl implementation.  Either C implementation
is still marginally faster, however.

2 years ago syscall: more idiomatic cmsghdr space allocation
Eric Wong [Mon, 18 Apr 2022 09:50:02 +0000 (09:50 +0000)]
syscall: more idiomatic cmsghdr space allocation

Since we know the space required under Linux, we can use the
same initialization as the Inline::C version instead of
hard-coding 256 as we do for Socket::MsgHdr.

2 years ago lei: clobber recvmsg buffer on errors
Eric Wong [Mon, 18 Apr 2022 09:50:01 +0000 (09:50 +0000)]
lei: clobber recvmsg buffer on errors

It will be necessary when we drop the Inline::C requirement
since the pure Perl Linux syscall recvmsg implementation.

This likely would've caused errors for Socket::MsgHdr users
without Inline::C, but I haven't tested it since it's a rare
configuration.

2 years ago lei_mail_sync: explicit bind for old SQL_VARCHAR compat
Eric Wong [Mon, 18 Apr 2022 09:44:01 +0000 (09:44 +0000)]
lei_mail_sync: explicit bind for old SQL_VARCHAR compat

This avoids repeated work for incremental "lei import" runs when
users upgrade from 1.7 to current public-inbox.git (and eventually
1.8).

We need the explicit bind_param for fallback calls because
previous bind_param calls are "sticky" for a given statement
handle.  The DBI(3pm) manpage states:

  The data type is 'sticky' in that bind values passed to execute()
  are bound with the data type specified by earlier bind_param()
  calls, if any.  Portable applications should not rely on being
  able to change the data type after the first "bind_param" call.

2 years ago lei: always open mail_sync.sqlite3 R/W
Eric Wong [Tue, 5 Apr 2022 08:18:24 +0000 (08:18 +0000)]
lei: always open mail_sync.sqlite3 R/W

This will make transparently upgrading from 1.7.0 -> 1.8.x
easier.  Only a single user has access to mail_sync.sqlite3,
and R/W at the kernel-level is required for WAL, anyways.

2 years ago view: remove unused $end variable
Eric Wong [Sat, 2 Apr 2022 04:38:43 +0000 (04:38 +0000)]
view: remove unused $end variable

Noticed while looking at something else completely unrelated...

2 years ago examples/unsubscribe.milter: RFC 8058 (List-Unsubscribe=One-Click)
Eric Wong [Sat, 2 Apr 2022 01:56:59 +0000 (01:56 +0000)]
examples/unsubscribe.milter: RFC 8058 (List-Unsubscribe=One-Click)

This allows unambiguous signaling to some MUAs and webmail clients
that the List-Unsubscribe header contains an instantaneous
unsubscribe option.

2 years ago examples/unsubscribe.milter: use IO::Socket, again
Eric Wong [Sat, 2 Apr 2022 01:40:34 +0000 (01:40 +0000)]
examples/unsubscribe.milter: use IO::Socket, again

Sendmail::PMilter requires an IO::Socket object, not a GLOB.

Fixes: e901a56b3b30b22f (treewide: favor open(..., '+<&=', $fd), 2021-05-21)
2 years ago lei_mail_sync: store OIDs and Maildir filenames as blobs
Eric Wong [Sat, 2 Apr 2022 01:13:52 +0000 (01:13 +0000)]
lei_mail_sync: store OIDs and Maildir filenames as blobs

DBD::SQLite doesn't seem to use SQL_BLOB automatically, which
can lead to ambiguity in some cases (especially interoperating
with other tools).

Downgrading to lei 1.7.0 will cause problems, but upgrading
appears transparent after weeks of tests.

2 years ago lei_mail_sync: ensure URLs and folder names are stored as binary
Eric Wong [Sat, 2 Apr 2022 01:13:51 +0000 (01:13 +0000)]
lei_mail_sync: ensure URLs and folder names are stored as binary

Apparently leaving {sqlite_unicode} unset isn't enough, and
there are subtle differences where BLOBs are stored differently
than TEXT when dealing with binary data.  We also want to avoid
odd cases where SQLite will attempt to treat a number-like value
as an integer.

This should avoid problems in case non-UTF-8 URLs and pathnames are
used.  They'll automatically be upgraded if not, but downgrades
to older lei would cause duplicates to appear.

2 years ago TODO: add item for auto-detecting TLS files in daemons
Eric Wong [Fri, 1 Apr 2022 09:09:58 +0000 (09:09 +0000)]
TODO: add item for auto-detecting TLS files in daemons

I forgot to restart my -imapd and -nntpd instances on
public-inbox.org after the cert expired :x

2 years ago doc: add WIP release notes for 1.8
Eric Wong [Fri, 11 Mar 2022 05:21:43 +0000 (05:21 +0000)]
doc: add WIP release notes for 1.8

1.8 will be a minor release, soon (I initially expected to
release it in December, but was side-tracked).  Major features
will be for 1.9.

2 years ago viewdiff: use defined checks in more places
Eric Wong [Wed, 30 Mar 2022 19:53:02 +0000 (19:53 +0000)]
viewdiff: use defined checks in more places

It's less cognitive overhead for future readers since I just
looked at it again and thought it was possible for "0" to be returned
(it isn't).

2 years ago syscall: add sendmsg+recvmsg for remaining arches
Eric Wong [Wed, 23 Mar 2022 21:08:19 +0000 (21:08 +0000)]
syscall: add sendmsg+recvmsg for remaining arches

aarch64, ppc64le, sparc64, loongarch64, and mips (32-bit userspace)
are all tested via machines from the GCC Farm Project
<https://cfarm.tetaneutral.net/>

Remaining syscall numbers are from musl <https://musl.libc.org/>

2 years ago syscall: implement sendmsg+recvmsg in pure Perl
Eric Wong [Wed, 23 Mar 2022 08:54:35 +0000 (08:54 +0000)]
syscall: implement sendmsg+recvmsg in pure Perl

Socket::MsgHdr is only packaged for Debian and derivatives at
the moment, and Inline::C pulling in gcc/clang is a huge amount
of disk space and bandwidth for some users.

This enables disk space and/or bandwidth-limited users to use lei.

Only Linux guarantees a stable ABI and syscall numbers, but
that's the majority of our userbase.  FreeBSD users will still
have to use Inline::C (or get Socket::MsgHdr packaged).

x86, x32, and x86-64 are all currently supported, more to be added.

2 years ago recv_cmd: do not undef recvmsg buffer arg on errors
Eric Wong [Wed, 23 Mar 2022 08:54:34 +0000 (08:54 +0000)]
recv_cmd: do not undef recvmsg buffer arg on errors

It's a waste of ops and cycles, and inconsistent with perl
sysread() behavior which doesn't touch the supplied buffer on
errors.

2 years ago syscall: drop unused EEXIST import
Eric Wong [Wed, 23 Mar 2022 08:54:33 +0000 (08:54 +0000)]
syscall: drop unused EEXIST import

We've never used it, actually.

2 years ago www: loosen deep-linking prevention
Eric Wong [Tue, 15 Mar 2022 20:45:02 +0000 (20:45 +0000)]
www: loosen deep-linking prevention

Apparently some browsers can set a Referer: header which fails
to match.  I'm not certain why, but making "$schema://$HOST_PORT"
matches case-insensitive seems more correct regardless.

In case that doesn't work, we'll also allow bypassing deep-link
prevention via a POST form button.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://public-inbox.org/meta/93ebfbd1-9924-481c-4edc-9b232d1e995c@suse.cz/
2 years ago t/lei-sigpipe.t: ensure SIGPIPE is not ignored instead of not blocked
Julien Moutinho [Fri, 11 Mar 2022 10:42:34 +0000 (11:42 +0100)]
t/lei-sigpipe.t: ensure SIGPIPE is not ignored instead of not blocked

Ignoring a signal is different than blocking a signal, and the
"IgnoreSIGPIPE" option of systemd ignores.

[ew: note systemd behavior]

Acked-by: Eric Wong <e@80x24.org>
2 years ago index|extindex: support --dangerous flag
Eric Wong [Mon, 7 Mar 2022 10:57:37 +0000 (10:57 +0000)]
index|extindex: support --dangerous flag

This enables Xapian::DB_DANGEROUS to support in-place updates.
This can speed up the initial index and reduce I/O at the cost
of preventing concurrent readers and being unsafe in the face of
any abnormal terminations.  This is more dangerous than
--no-fsync.  --no-fsync is only unsafe in the event of a power
loss or kernel crash; --dangerous is unsafe even on SIGKILL.

2 years ago t/lei-sigpipe: ensure SIGPIPE is unblocked for this test
Eric Wong [Sun, 27 Feb 2022 11:17:14 +0000 (11:17 +0000)]
t/lei-sigpipe: ensure SIGPIPE is unblocked for this test

Tests run under systemd (and similar) have SIGPIPE blocked by
default.  This was causing this SIGPIPE test to get stuck when
run by automated builders used by Nix.  Thanks to Julien
Moutinho and Dominique Martinet for tracking down this failure.

Reported-by: Julien Moutinho <julm+public-inbox@sourcephile.fr>
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Link: https://public-inbox.org/meta/20220227080422.gyqowrxomzu6gyin@sourcephile.fr/
2 years ago t/lei-sigpipe: attempt to improve diagnostics for stuck test
Eric Wong [Thu, 17 Feb 2022 21:02:33 +0000 (21:02 +0000)]
t/lei-sigpipe: attempt to improve diagnostics for stuck test

This may help diagnose a difficult-to-reproduce test failure on NixOS.

Link: https://public-inbox/meta/20211209013743.okzgim7bbrpahks7@sourcephile.fr/
2 years ago git: do not dereference undef as ARRAY ref
Eric Wong [Thu, 17 Feb 2022 20:27:12 +0000 (20:27 +0000)]
git: do not dereference undef as ARRAY ref

When aborting git processes, we must account for the lack of
inflight requests.

2 years ago sharedkv: avoid ambiguity for numeric-like string keys
Eric Wong [Mon, 14 Feb 2022 05:37:25 +0000 (05:37 +0000)]
sharedkv: avoid ambiguity for numeric-like string keys

While we only store URLs and binary SHA-1/SHA-256 values in skv
at the moment, we may store potentially ambiguous keys/values in
the future.  It's possible to store "02" and have it treated as
`2' unless explicitly binding parameters as SQL_BLOB.  This
behavior was independent of the sqlite_unicode parameter as
evidenced by the new tests.

I only noticed this bug while hacking on another project using
DBD::SQLite, and not while hacking on public-inbox itself.

2 years ago sharedkv: remove unused subs
Eric Wong [Mon, 14 Feb 2022 05:37:24 +0000 (05:37 +0000)]
sharedkv: remove unused subs

Some features didn't get used, and they're just getting in the
way of upcoming bugfixes.

2 years ago t/lei-*watch: disable flaky tests by default for now
Eric Wong [Sun, 13 Feb 2022 21:01:59 +0000 (21:01 +0000)]
t/lei-*watch: disable flaky tests by default for now

Properly fixing these tests is too difficult for me at the
moment, so just disable these tests for now.  A proper fix and
fleshing out support for inotify will hopefully happen at some
point.

2 years ago view: remove all CR before LF
Eric Wong [Fri, 11 Feb 2022 20:22:17 +0000 (20:22 +0000)]
view: remove all CR before LF

While we've rendered CR-LF as LF-only in HTML for many years,
some messages end up as CR-CR-LF.  So strip all CR bytes
preceding LF bytes, while preserving odd CR in the middle of
lines.
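
The transformation itself can be sketched outside of Perl with sed.
This is only an illustration, not the code used in the view;
strip_cr_before_lf is a made-up name.

```shell
# Hypothetical filter: drop every CR in a run that ends a line, so
# CR-LF and CR-CR-LF both become LF, while a lone CR in the middle of
# a line is preserved.  (A CR run at EOF without a final LF is also
# dropped.)
strip_cr_before_lf() {
    cr=$(printf '\r')        # literal CR for portable sed
    sed "s/${cr}*\$//"
}
```

For example, feeding it "a", CR, CR, LF yields "a", LF.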

Reported-by: Thomas Weißschuh <thomas@t-8ch.de>
Link: https://public-inbox.org/meta/8d13668f-cac7-4984-bb4e-ad90502dc46d@t-8ch.de/
2 years ago test_lei: use consistent locale for error messages
Eric Wong [Tue, 1 Feb 2022 23:34:28 +0000 (23:34 +0000)]
test_lei: use consistent locale for error messages

git-config(1) error messages are locale-dependent, so follow
the lead taken by git's own test suite and set LC_ALL=C and LANG=C
to ensure error messages we check against are not localized.
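
The same approach works for any command whose messages get compared
against fixed strings.  A minimal sketch, with in_c_locale being a
hypothetical wrapper name:

```shell
# Hypothetical wrapper: run a command with a fixed C locale so the
# error messages it prints are not localized.
in_c_locale() {
    LC_ALL=C LANG=C "$@"
}
```

A test would then run e.g. `in_c_locale git config ...' and match the
untranslated message.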

Reported-by: Julien Moutinho <julm+public-inbox@sourcephile.fr>
2 years ago syscall: FS_IOC_*FLAGS: define on per-architecture basis
Eric Wong [Tue, 1 Feb 2022 01:27:50 +0000 (01:27 +0000)]
syscall: FS_IOC_*FLAGS: define on per-architecture basis

It turns out these Linux ioctls are unfortunately
architecture-dependent, and not endian-dependent.
Fixup some warning messages while we're at it, too.

Fixes: 14fa0abdcc7b6513 ("rewrite Linux nodatacow use in pure Perl w/o system")
Link: https://public-inbox.org/meta/YfdYqLhDVQRQ9NGT@codewreck.org/
Noticed-by: Dominique Martinet <asmadeus@codewreck.org>
2 years ago syscall: fallback to rename on renameat2 EINVAL
Dominique Martinet [Thu, 9 Dec 2021 02:50:51 +0000 (11:50 +0900)]
syscall: fallback to rename on renameat2 EINVAL

ZFS appears to incorrectly return EINVAL on renameat2 when the operation is not
supported:
renameat2(AT_FDCWD, "...", AT_FDCWD, "...", RENAME_NOREPLACE) = -1 EINVAL

Fall back to the racy rename in this case as well.

2 years ago rewrite Linux nodatacow use in pure Perl w/o system
Eric Wong [Sun, 30 Jan 2022 21:49:08 +0000 (21:49 +0000)]
rewrite Linux nodatacow use in pure Perl w/o system

btrfs is Linux-only at the moment (and likely to remain that way
for practical purposes).  So rely on Linux ABI stability and use
the `syscall' and `ioctl' perlops rather than relying on Inline::C.
Inline::C (and gcc||clang) are monstrous dependencies which we
can't expect users to have.

This makes supporting new architectures more difficult, but new
architectures come along rarely and this reduces the burden for
the majority of Linux users on popular architectures (while
still avoiding the distribution of pre-built binaries).

Link: https://public-inbox.org/meta/YbCPWGaJEkV6eWfo@codewreck.org/
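
A rough sketch of what setting the flag with the `ioctl' perlop can look
like.  The request numbers below are assumed values for x86-64 Linux
only, and `set_nodatacow' is a hypothetical name; this is not the
project's implementation:

```perl
use strict;
use warnings;

# Assumed values for x86-64 Linux; other architectures may differ.
use constant {
    FS_IOC_GETFLAGS => 0x80086601,
    FS_IOC_SETFLAGS => 0x40086602,
    FS_NOCOW_FL => 0x00800000, # "do not copy-on-write" inode flag
};

# Hypothetical helper: try to set the nodatacow flag on $dir.
# Returns true on success, false if the path cannot be opened or the
# filesystem does not support the ioctl.
sub set_nodatacow {
    my ($dir) = @_;
    open(my $fh, '<', $dir) or return;
    my $buf = pack('l!', 0); # room for one native C long
    ioctl($fh, FS_IOC_GETFLAGS, $buf) or return;
    my $flags = unpack('l!', $buf);
    return 1 if $flags & FS_NOCOW_FL; # already set
    $buf = pack('l!', $flags | FS_NOCOW_FL);
    ioctl($fh, FS_IOC_SETFLAGS, $buf) ? 1 : 0;
}
```

Failure is treated as non-fatal here since the flag is only an
optimization on filesystems that understand it.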
2 years ago http: don't send chunk finalizer on HEAD responses
Eric Wong [Sun, 30 Jan 2022 22:31:34 +0000 (22:31 +0000)]
http: don't send chunk finalizer on HEAD responses

AFAIK this doesn't affect Varnish or nginx users, but those
should eventually become optional dependencies.
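
For illustration only, the rule can be sketched as below.  The sub name
`chunk_finalizer' is hypothetical and does not come from the codebase:

```perl
use strict;
use warnings;

# Hypothetical helper: bytes that terminate a chunked response body.
# A HEAD response carries no body, so nothing follows the headers.
sub chunk_finalizer {
    my ($request_method) = @_;
    $request_method eq 'HEAD' ? '' : "0\r\n\r\n";
}
```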

2 years ago t/eml.t: ignore newer Email::MIME behavior
Eric Wong [Thu, 30 Dec 2021 19:17:42 +0000 (19:17 +0000)]
t/eml.t: ignore newer Email::MIME behavior

Once again, our message parser class matches the more tolerant
behavior of older Email::MIME releases in order to handle
ancient messages.

This fixes <https://bugs.debian.org/1002219>, but dropping
Email::MIME entirely from the test suite may be prudent in
the future.

2 years ago Makefile.PL: fix useless use of push
Eric Wong [Mon, 6 Dec 2021 20:55:59 +0000 (20:55 +0000)]
Makefile.PL: fix useless use of push

2 years ago eliminate some unused subs
Eric Wong [Wed, 24 Nov 2021 15:45:39 +0000 (15:45 +0000)]
eliminate some unused subs

->newsgroup_matches was never used, and ->shard_over_check
was dropped in 89193578d21f (extindex: --gc checkpoints, 2021-10-06).

2 years ago lei: always use 3-arg open perlop
Eric Wong [Mon, 22 Nov 2021 18:38:09 +0000 (18:38 +0000)]
lei: always use 3-arg open perlop

Future-proofing in case future versions of Perl warn on this, since
2-arg forms of open may be subject to injection vulnerabilities
with non-literal args.
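
A short illustration of the 3-arg form (the sub name `slurp' is just an
example, not taken from the lei code):

```perl
use strict;
use warnings;

# The mode is a separate argument, so characters such as `|', `<' or
# `>' in $path are treated as part of the file name and never as a
# mode or a command to run.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "open $path: $!";
    local $/; # read the whole file at once
    my $buf = <$fh>;
    close($fh) or die "close $path: $!";
    $buf;
}
```

With the 2-arg form, a non-literal path ending in `|' would instead be
run as a command.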

2 years ago spawn: avoid C++ keyword `try'
Eric Wong [Mon, 22 Nov 2021 18:16:32 +0000 (18:16 +0000)]
spawn: avoid C++ keyword `try'

This is future-proofing in case we build against Xapian directly
in the future, which would require a C++ compiler.

2 years ago searchidx: avoid modification of read-only `$_'
Eric Wong [Mon, 22 Nov 2021 18:23:52 +0000 (18:23 +0000)]
searchidx: avoid modification of read-only `$_'

This fixes the "Modification of a read-only value attempted at ..."
error in an initial run of t/reindex-time-range.t.  It was
reproducible by running `rm -rf t/data-gen/reindex-time-range.v*'
before `make && prove -bvw t/reindex-time-range.t'.  Thanks to
Jörg Rödel for providing the backtrace which helped find this.

Debugged-by: Jörg Rödel <joro@8bytes.org>
Link: https://public-inbox.org/meta/YZuZEY+WSnm4wlrS@8bytes.org/
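
A generic illustration of this class of error, not the code path that
was actually fixed: inside `for', `$_' is an alias to the list element,
so modifying it dies when the element is a literal constant.  The sub
names are made up:

```perl
use strict;
use warnings;

sub broken {
    my @out;
    for ('a ', 'b ') { # $_ aliases read-only literals
        s/\s+\z//; # dies: Modification of a read-only value attempted
        push @out, $_;
    }
    @out;
}

sub fixed {
    my @out;
    for ('a ', 'b ') {
        my $copy = $_; # work on a writable copy instead
        $copy =~ s/\s+\z//;
        push @out, $copy;
    }
    @out;
}
```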
2 years ago t/lei-mirror: skip lei comparisons if lei missing
Eric Wong [Mon, 22 Nov 2021 07:42:41 +0000 (07:42 +0000)]
t/lei-mirror: skip lei comparisons if lei missing

We can't compare created_at times with lei if lei tests are
skipped due to Inline::C or Socket::MsgHdr unavailability.

Reported-by: Jörg Rödel <joro@8bytes.org>
Link: https://public-inbox.org/meta/YZebmAxlFJy4lqAw@8bytes.org/
2 years ago lei forget-search: add help for --prune
Eric Wong [Fri, 12 Nov 2021 11:08:57 +0000 (11:08 +0000)]
lei forget-search: add help for --prune

This enables tab-completion, since I'm using --prune quite a bit
and my fingers are about to fall off :<

2 years ago t/lei-watch: test with higher sleep
Eric Wong [Wed, 10 Nov 2021 10:33:16 +0000 (10:33 +0000)]
t/lei-watch: test with higher sleep

0.1s may not be enough for a task switch and inotify wakeup,
so try doubling it and see if it fixes test reliability, for
now.  A future change may be to implement a watcher/tracer
for inotify -> lei/store events.

Link: https://public-inbox.org/meta/20211104134327.zrf5jijfz7dsvb7l@meerkat.local/
2 years ago lei q: make HTTP(S) query strings even less ugly
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei q: make HTTP(S) query strings even less ugly

Following commit 57fed2e4b78ed394 (lei: normalize whitespace in
remote queries, 2021-09-11), leaving the trailing `\n' from
stdin queries to be normalized to ` ' (SP) causes it to appear
as `+' in URLs, which Xapian ignores.
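
A sketch of the idea, assuming a byte-string query and a hypothetical
helper name `query_for_url'; the real normalization and escaping code in
lei may differ:

```perl
use strict;
use warnings;

# Hypothetical helper: prepare a query read from stdin for use in a URL.
sub query_for_url {
    my ($q) = @_;
    $q =~ s/\s+\z//; # drop the trailing "\n" so it cannot become `+'
    $q =~ s/\s+/ /g; # normalize remaining whitespace runs to one SP
    # percent-encode bytes outside a conservative safe set, then SP => `+'
    $q =~ s/([^A-Za-z0-9_.~ -])/sprintf('%%%02X', ord($1))/ge;
    $q =~ tr/ /+/;
    $q;
}
```

Stripping the trailing newline before normalizing is what keeps the
resulting URL free of a dangling `+'.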

2 years ago lei q: disallow "\n" in argv[] elements
Eric Wong [Wed, 10 Nov 2021 10:28:37 +0000 (10:28 +0000)]
lei q: disallow "\n" in argv[] elements

I don't expect this to be hit in real-world use via normal
interactive shells.  However, somebody could accidentally add
"\n" in languages (e.g. Perl, C) where it's easy to pass "\n"
in argv[].
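
Such a check might look like the following sketch.  `check_argv' and
its error text are hypothetical, not lei's actual code or message:

```perl
use strict;
use warnings;

# Hypothetical check: reject any argument containing a newline.
sub check_argv {
    my (@argv) = @_;
    for my $i (0..$#argv) {
        die "argv[$i] contains a newline\n"
            if index($argv[$i], "\n") >= 0;
    }
    1;
}
```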