Eric Wong [Sun, 20 Jan 2019 13:46:28 +0000 (13:46 +0000)]
solver: force quoted-printable bodies to LF
..if the Email::MIME ->crlf is LF.
Email::MIME::Encodings forces everything to CRLF on
quoted-printable messages for RFC-compliance; and
git-apply --ignore-whitespace seems to miss a context
line which is just "\r\n" (w/o leading space).
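A minimal sketch of the normalization described above. The helper
name force_lf and the sample patch are hypothetical and not taken
from the solver code; the sketch only assumes the decoded body
arrives with CRLF line endings and that the caller wants LF:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the solver itself): convert every CRLF
# pair to a bare LF, but only when the desired line ending is LF.
sub force_lf {
    my ($body, $eol) = @_;
    $body =~ s/\r\n/\n/g if $eol eq "\n";
    return $body;
}

# A decoded quoted-printable patch whose empty context line is just "\r\n"
my $qp = "\@\@ -1,2 +1,2 \@\@\r\n\r\n-old\r\n+new\r\n";
my $lf = force_lf($qp, "\n");
print(($lf =~ /\r/) ? "CR remains\n" : "LF only\n");
```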
Eric Wong [Sun, 20 Jan 2019 10:42:07 +0000 (10:42 +0000)]
viewdiff: cleanup state transitions a bit
This makes things less error-prone and allows us to only
highlight the "@@ -\S+ \+\S+ @@" part of the hunk header
line, without highlighting the function context.
This more closely matches the coloring behavior of git-diff(1).
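A rough illustration of splitting a hunk header with the pattern
quoted above, so only the range part gets highlighted. The helper
name split_hunk_header is an assumption made for this sketch and is
not the viewdiff code itself:

```perl
use strict;
use warnings;

# Hypothetical helper: split a hunk header line into the range part
# (to be highlighted) and the trailing function context (left plain).
sub split_hunk_header {
    my ($line) = @_;
    return unless $line =~ /\A(\@\@ -\S+ \+\S+ \@\@)(.*)\z/s;
    return ($1, $2);
}

my ($range, $ctx) = split_hunk_header('@@ -10,7 +10,8 @@ sub foo {');
print "highlight: [$range]\n";
print "plain:     [$ctx]\n";
```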
Eric Wong [Sat, 19 Jan 2019 21:57:26 +0000 (21:57 +0000)]
solver: restore diagnostics and deal with CRLF
Apparently Email::MIME returns quoted-printable text
with CRLF. So use --ignore-whitespace with git-apply(1)
and ensure we don't capture '\r' in pathnames from
those emails.
And restore "$@" dumping when we die while solving.
Eric Wong [Sat, 19 Jan 2019 08:54:42 +0000 (08:54 +0000)]
view: enforce trailing slash for /$INBOX/$OID/s/ endpoints
As with our use of the trailing slash in the '$MESSAGE_ID/T/' and
'$MESSAGE_ID/t/' endpoints, this is for 'wget -r --mirror'
compatibility as well as allowing sysadmins to quickly stand up
a static directory with "index.html" in it to reduce load.
Eric Wong [Sat, 19 Jan 2019 08:27:44 +0000 (08:27 +0000)]
solver: add a TODO note about making this fully evented
Applying a 100+ patch series can be a pain and lead to a wayward
client monopolizing the connection. On the other hand, we'll
also need to be careful and limit the number of in-flight file
descriptors and parallel git-apply processes when we move to an
evented model, here.
Eric Wong [Sat, 19 Jan 2019 05:25:30 +0000 (05:25 +0000)]
solver: switch patch application to use a callback
A bit messy at the moment, but we need to break this up
into smaller steps for fairness with other clients, as
applying dozens of patches can take several hundred
milliseconds.
Eric Wong [Fri, 18 Jan 2019 11:17:58 +0000 (11:17 +0000)]
solver: operate directly on git index
No need to incur extra I/O traffic with a working-tree and
uncompressed files on the filesystem. git can handle patch
application in memory and we rely on exact blob matching
anyways, so no need for 3way patch application.
Eric Wong [Thu, 17 Jan 2019 11:50:57 +0000 (11:50 +0000)]
solver: various bugfixes and cleanups
Remove the make_path dependency and call mkdir directly.
Capture mode on new files, avoid referencing non-existent
functions and enhance the debug output for users to read.
Eric Wong [Fri, 18 Jan 2019 05:27:51 +0000 (05:27 +0000)]
git: check saves error on disambiguation
This will be useful for disambiguating short OIDs in older
emails when abbreviations were shorter.
Tested against the following script with /path/to/git.git
==> t.perl <==
use strict;
use PublicInbox::Git;
use Data::Dumper;
my $dir = shift or die "Usage: $0 GIT_DIR # (of git.git)";
my $git = PublicInbox::Git->new($dir);
my @res = $git->check('dead');
print Dumper({res => \@res, err=> $git->last_check_err});
@res = $git->check('5335669531d83d7d6c905bcfca9b5f8e182dc4d4');
print Dumper({res => \@res, err=> $git->last_check_err});
Eric Wong [Tue, 15 Jan 2019 08:22:41 +0000 (08:22 +0000)]
solver: initial Perl implementation
This will lookup git blobs from associated git source code
repositories. If the blobs can't be found, an attempt to
"solve" them via patch application will be performed.
Eventually, this may become the basis of a type-agnostic
frontend similar to "git show"
Eric Wong [Sat, 19 Jan 2019 02:10:09 +0000 (02:10 +0000)]
hval: force monospace for <form> elements, too
Same reasoning as commit 7b7885fc3be2719c068c0a2fc860d53f17a1d933,
because GUI browsers have a tendency to use a different
font-family (and thus a different size) than the rest of the page.
Eric Wong [Fri, 18 Jan 2019 12:50:59 +0000 (12:50 +0000)]
git: git_unquote handles double-quote and backslash
We need to work with 0x22 (double-quote) and 0x5c (backslash);
even if they're oddball characters in filenames which wouldn't
be used by projects I'd want to work on.
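A simplified sketch of unquoting a C-style quoted path, including
the 0x22 and 0x5c cases. The name unquote_path and its
copy-returning interface are assumptions for illustration; this is
not the real git_unquote:

```perl
use strict;
use warnings;

# Single-letter escapes for C-style quoted paths, plus the two
# discussed here: 0x22 (double-quote) and 0x5c (backslash).
my %ESC = (
    a => "\a", b => "\b", f => "\f", n => "\n",
    r => "\r", t => "\t", v => "\013",
    '"' => '"', '\\' => '\\',
);

# Hypothetical, copy-returning sketch: strip the surrounding quotes
# and decode each escape in a single left-to-right pass.
sub unquote_path {
    my ($s) = @_;
    return $s unless $s =~ /\A"(.*)"\z/s;
    $s = $1;
    $s =~ s{\\([0-7]{1,3}|[abfnrtv"\\])}
           {exists $ESC{$1} ? $ESC{$1} : chr(oct($1))}ge;
    return $s;
}

print unquote_path('"say \"hi\".txt"'), "\n";
print unquote_path('"back\\\\slash"'), "\n";
```

Decoding in one pass matters: an escaped backslash followed by "n"
must stay a backslash and an "n", not become a newline.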
Eric Wong [Fri, 18 Jan 2019 19:40:07 +0000 (19:40 +0000)]
t/git.t: avoid passing read-only value to git_unquote
Older versions of Perl (tested 5.14.2 on Debian wheezy(*),
reported by Konstantin on Perl 5.16.3) considered the result of
concatenating two string literals to be a constant value.
(*) not that other stuff works on wheezy, but t/git.t should.
Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
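A small sketch of the workaround pattern, assuming an in-place
modifier that writes through the @_ alias. The function name
strip_quotes_in_place is hypothetical and only stands in for
git_unquote:

```perl
use strict;
use warnings;

# Hypothetical in-place modifier: edits its argument through the @_
# alias, the way an in-place unquote function would.
sub strip_quotes_in_place {
    $_[0] =~ s/\A"(.*)"\z/$1/s;
    return;
}

# Copy the concatenated literals into a variable first, so the
# function receives a writable scalar regardless of Perl version.
my $path = '"foo' . 'bar"';
strip_quotes_in_place($path);
print "$path\n";
```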
Eric Wong [Wed, 16 Jan 2019 04:51:28 +0000 (04:51 +0000)]
nntp: header responses use CRLF consistently
Alpine is apparently stricter than other clients I've tried
w.r.t. using CRLF for headers. So do the same thing we do for
bodies to ensure we only emit CRLFs and no bare LFs.
Reported-by: Wang Kang <i@scateu.me>
https://public-inbox.org/meta/alpine.DEB.2.21.99.1901161043430.29788@la.scateu.me/
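A minimal sketch of turning bare LFs into CRLFs without doubling
existing CRs. The helper name crlf_only and the sample header are
made up for this example and are not the NNTP code itself:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every LF that is not already preceded
# by CR into CRLF, leaving existing CRLF pairs untouched.
sub crlf_only {
    my ($hdr) = @_;
    $hdr =~ s/(?<!\r)\n/\r\n/g;
    return $hdr;
}

my $hdr = "Subject: hi\nFrom: a\@example.com\r\n";
my $out = crlf_only($hdr);
my $bare = () = $out =~ /(?<!\r)\n/g;  # count remaining bare LFs
print "bare LFs: $bare\n";
```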
Eric Wong [Wed, 9 Jan 2019 11:43:26 +0000 (11:43 +0000)]
config: inbox name checking matches git.git more closely
Actually, it turns out git.git/remote.c::valid_remote_nick
rules alone are insufficient. More checking is performed as
part of the refname checks in git.git/refs.c::check_refname_component.
I also considered rejecting URL-unfriendly inbox names entirely,
but realized some users may intentionally configure names not
handled by our WWW endpoint for archives they don't want
accessible over HTTP.
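A rough approximation of such a name check, following the rules
documented in git-check-ref-format(1) for a single component plus
the "no slash" rule for remote nicknames. The function name
valid_name_sketch is hypothetical, and this is not the actual
config code:

```perl
use strict;
use warnings;

# Hypothetical validator approximating git's documented refname
# component rules; returns 1 if the name looks acceptable, else 0.
sub valid_name_sketch {
    my ($name) = @_;
    return 0 if !defined($name) || $name eq '';
    return 0 if $name =~ m!/!;          # no directory separators
    return 0 if $name =~ /\A\./;        # no leading dot ("." and ".." too)
    return 0 if $name =~ /\.lock\z/;    # must not end with ".lock"
    return 0 if $name =~ /\.\z/;        # must not end with a dot
    return 0 if $name =~ /\.\./;        # no consecutive dots
    return 0 if $name =~ /\@\{/;        # no "@{" sequence
    return 0 if $name eq '@';           # not the single character "@"
    # no control characters, space, DEL, or any of ~ ^ : ? * [ \
    return 0 if $name =~ /[\x00-\x20\x7f~^:?*\[\\]/;
    return 1;
}

for my $n ('meta', 'git.lock', 'a..b', 'has space') {
    printf "%-10s %s\n", $n, valid_name_sketch($n) ? 'ok' : 'rejected';
}
```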
Eric Wong [Tue, 15 Jan 2019 02:42:09 +0000 (02:42 +0000)]
git_unquote: perform modifications in-place
This function doesn't have a lot of callers at the moment so
none of them are affected by this change. But the plan is to
use this in our WWW code for things, so do it now before we
call it in more places.
Results from a Thinkpad X200 with a Core2Duo P8600 @ 2.4GHz:
Note: I mainly care about unquoted performance because
that's the common case for the target audience of public-inbox.
Script used to get benchmark results against the Linux source tree:
==> bench_unquote.perl <==
use strict;
use warnings;
use Benchmark ':hireswallclock';
my $nr = 50;
my %GIT_ESC = (
a => "\a",
b => "\b",
f => "\f",
n => "\n",
r => "\r",
t => "\t",
v => "\013",
);
Eric Wong [Thu, 10 Jan 2019 21:41:55 +0000 (21:41 +0000)]
Merge commit 'mem'
* commit 'mem':
view: more culling for search threads
over: cull unneeded fields for get_thread
searchmsg: remove unused fields for PSGI in Xapian results
searchview: drop unused {seen} hashref
searchmsg: remove Xapian::Document field
searchmsg: get rid of termlist scanning for mid
httpd: remove psgix.harakiri reference
Eric Wong [Thu, 10 Jan 2019 03:26:15 +0000 (03:26 +0000)]
t/v2writable.t: force more consistent "git log" output
This should probably use lower-level git plumbing, but until
then, consistently add a bunch of --no-* options to "git log"
to get more consistent output.
Noticed-by: Johannes Berg
https://public-inbox.org/meta/1538164205.14416.76.camel@sipsolutions.net/
Eric Wong [Tue, 8 Jan 2019 11:13:33 +0000 (11:13 +0000)]
over: cull unneeded fields for get_thread
On a certain ugly /$INBOX/$MESSAGE_ID/T/ endpoint with 1000
messages in the thread, this cuts memory usage from 2.5M to 1.9M
(which still isn't great, but it's a start).
Eric Wong [Tue, 8 Jan 2019 11:13:31 +0000 (11:13 +0000)]
searchmsg: remove unused fields for PSGI in Xapian results
These fields are only necessary in NNTP and not even stored in
Xapian; so keeping them around for the PSGI web UI search
results wastes nearly 80K when loading large result sets.
Eric Wong [Tue, 8 Jan 2019 11:13:26 +0000 (11:13 +0000)]
httpd: remove psgix.harakiri reference
We don't need to set "psgix." extension fields for things
we don't support. This saves 138 bytes per-client in $env
as measured by Devel::Size::total_size
Eric Wong [Tue, 8 Jan 2019 00:41:12 +0000 (00:41 +0000)]
view: stop storing all MIME objects on large threads
While we try to discard the $smsg (SearchMsg) objects quickly,
they remain referenced via $node (SearchThread::Msg) objects,
which are stored forever in $ctx->{mapping} to cull redundant
words out of subjects in the thread skeleton.
This significantly cuts memory bloat with large search results
with '&x=t'. Now, the search results overhead of
SearchThread::Msg and linked objects is stable at around 350K
instead of ~7M per response in a rough test (there's more
savings to be had in the same areas).
Several hundred kilobytes is still huge and a large per-client
cost; but it's far better than MEGABYTES per-client.
Eric Wong [Sat, 5 Jan 2019 10:41:15 +0000 (10:41 +0000)]
filter/rubylang: fix SQLite DB lifetime problems
Clearly the AltId stuff was never tested for v2. Ensure
this tricky filter (which reuses Msgmap to avoid introducing
new serial numbers) doesn't trigger SQLite deadlocks due
to opening a DB for writing multiple times.
I went through several iterations of this change before
going with this one, which is the least intrusive I could
find.
Eric Wong [Fri, 4 Jan 2019 11:53:02 +0000 (11:53 +0000)]
t/cgi.t: remove more redundant tests
Most of these test cases are in t/plack.t, already; and that
runs much faster. Just ensure the slashy corner case and search
stuff works. While we're at it, avoid using the
public-inbox-index command and just use the internal API to
index.
Eric Wong [Wed, 2 Jan 2019 00:50:55 +0000 (00:50 +0000)]
t/v2reindex: use the larger text to increase test reliability
libxapian30:amd64 1.4.9-1 on Debian sid seems to give an 8KB
position.glass database with "hello world" as the document
regardless of our indexlevel. Use the text of the AGPL-3.0 for
a more realistic Xapian database size.
And perhaps tying our tests to the AGPL will make life more
difficult for would-be copyright violators :>
With the mboxrd downloaded, mutt is able to view them without
difficulty.
Note: this change would require reindexing of Xapian to pick up
the changes. But it's only two ancient messages, the first was
resent by the original sender and the second is too old to be
relevant.
Eric Wong [Wed, 26 Dec 2018 09:07:49 +0000 (09:07 +0000)]
tests: consolidate process spawning code.
IPC::Run provides a nice simplification in several places; and
we already use it (optionally) on a lot of tests.
For the non-test code, we still rely on our vfork-capable
Inline::C stuff since real-world server processes can get large
enough to where vfork is an advantage. Maybe Perl5 can use
CLONE_VFORK somehow, one day:
Eric Wong [Fri, 28 Dec 2018 06:22:55 +0000 (06:22 +0000)]
examples/cgit-commit-filter.lua: update URLs
Let's Encrypt is working out nicely, so we can rely on HTTPS,
now. Use 80x24.org instead of bogomips.org while we're at it,
since I don't think the latter will remain.
Eric Wong [Fri, 28 Dec 2018 10:16:11 +0000 (10:16 +0000)]
init: allow --skip of old epochs for -V2 repos
This allows archivists to publish incomplete archives with newer
mail while allowing "0.git" (or "1.git" and so on) epochs to be
added-after-the-fact (without affecting "git clone" followers).
A reindex will be necessary for Xapian and SQLite to catch up
once the old epochs are added; but the reindexing code is also
capable of tolerating missing epochs.
Eric Wong [Wed, 12 Dec 2018 23:18:13 +0000 (23:18 +0000)]
doc/hosted: add glibc and bug-gnulib mirrors
These have existed for a while, actually, so, we might as well
publicize them. While we're at it, add a disclaimer to
discourage reliance on single points of failure.
Eric Wong [Thu, 6 Dec 2018 02:40:06 +0000 (02:40 +0000)]
nntp: prevent event_read from firing twice in a row
When a client starts pipelining requests to us which trigger
long responses, we need to keep socket readiness checks disabled
and only enable them when our socket rbuf is drained.
Failure to do this caused aborted clients with
"BUG: nested long response" when Danga::Socket calls event_read
for read-readiness after our "next_tick" sub fires in the
same event loop iteration.
Reported-by: Jonathan Corbet <corbet@lwn.net>
cf. https://public-inbox.org/meta/20181013124658.23b9f9d2@lwn.net/