Eric Wong [Sat, 21 May 2016 03:03:16 +0000 (03:03 +0000)]
http: reduce over-buffering for getline responses
By switching to a "pull"-based I/O model for reading
application responses, we should be able to throttle
buffering to slow clients more effectively and avoid
wasting precious RAM.
This will also allow us to move Danga::Socket-specific
knowledge out of the PSGI application and keep it
confined to PublicInbox::HTTP.
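
As a rough illustration of the "pull" model described above, here is a minimal Python sketch (not the project's Perl code). It assumes a PSGI-style body object exposing getline() and close(); the PullWriter and ListBody classes and the on_writable hook are hypothetical names standing in for the server and its event loop. The point is only that the next chunk is requested when the client can accept it, so nothing is buffered ahead of a slow client.

```python
class PullWriter:
    """Pulls one chunk from the body each time the client is writable."""

    def __init__(self, body):
        self.body = body   # hypothetical object with getline() and close()
        self.sent = []     # stands in for the client socket
        self.done = False

    def on_writable(self):
        """Hypothetical event-loop hook: the socket can accept more data."""
        if self.done:
            return False
        chunk = self.body.getline()
        if chunk is None:          # end of the response body
            self.body.close()
            self.done = True
            return False
        self.sent.append(chunk)
        return True                # call again on the next writable event


class ListBody:
    """Toy response body that hands out pre-made chunks on demand."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.pulled = 0
        self.closed = False

    def getline(self):
        if not self.chunks:
            return None
        self.pulled += 1
        return self.chunks.pop(0)

    def close(self):
        self.closed = True
```

With this shape, the application never needs to know which event loop drives on_writable; it only has to provide getline().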
Eric Wong [Thu, 19 May 2016 10:23:28 +0000 (10:23 +0000)]
msg_iter: workaround broken Email::MIME versions
Email::MIME >= 1.923 and < 1.935 would drop too many newlines
in attachments. This would lead to ugly text files without
a proper trailing newline if using quoted-printable, 7bit, or
8bit. Attachments encoded with base64 were not affected.
These versions of Email::MIME are widely available in Debian 8
(Jessie) and even Ubuntu LTS distros so we will need to support
this workaround for a while.
Eric Wong [Wed, 18 May 2016 20:30:31 +0000 (20:30 +0000)]
msg_iter: new internal API for iterating through MIME
Unlike Email::MIME::walk_parts, this is non-recursive and gives
depth + index offset information about the part for creating
links for later retrieval.
It is intended for read-only access and changes are not
propagated to the parent; however future versions of it
may clobber bodies or the original version as it iterates
to reduce memory overhead.
It is intended for making it easy to locate attachments within a
message in the WWW view.
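
A non-recursive walk of this kind can be sketched with an explicit stack. The following uses Python's standard email package purely for illustration; the iter_parts and build_sample names, and the tuple-of-indices path format, are assumptions and not the msg_iter API.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def iter_parts(msg):
    """Yield (part, depth, index_path) for each leaf part, without recursion."""
    stack = [(msg, 0, ())]
    while stack:
        part, depth, path = stack.pop()
        if part.is_multipart():
            children = part.get_payload()
            # Push in reverse so children are visited in document order.
            for i in range(len(children), 0, -1):
                stack.append((children[i - 1], depth + 1, path + (i,)))
        else:
            yield part, depth, path


def build_sample():
    """Build a small nested message: text, then a multipart with two texts."""
    outer = MIMEMultipart()
    outer.attach(MIMEText("hello"))
    inner = MIMEMultipart()
    inner.attach(MIMEText("a"))
    inner.attach(MIMEText("b"))
    outer.attach(inner)
    return outer
```

The index path (for example (2, 1) for the first part inside the second top-level part) is the kind of information that could be turned into a link to a specific attachment.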
Eric Wong [Tue, 17 May 2016 08:16:47 +0000 (08:16 +0000)]
http: release resources when idle
This lets us release old git processes so unlinked packs
(leftover from repacking) can be released. This may also
be helpful for Xapian as indices get rebuilt for tuning.
For SQLite (msgmap), there may be no benefit besides
reducing FD pressure.
Followup changes will unify the Inbox and NewsGroup
classes and allow better code-sharing between NNTP and
HTTP classes (as well as the planned POP3 class).
Eric Wong [Tue, 17 May 2016 05:39:06 +0000 (05:39 +0000)]
view: escape Message-ID for "next" link
Oops, we need to escape Message-IDs since they can contain
bad characters such as '%' in them. '@' actually seems fine
and does not need to be escaped; however, we've been
doing it forever.
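
For illustration only, the escaping can be sketched in Python with urllib.parse.quote. The mid_href name and the trailing-slash link shape are assumptions; the escape_at switch just shows the two behaviours the message mentions.

```python
from urllib.parse import quote


def mid_href(message_id, escape_at=True):
    """Percent-encode a Message-ID for use as a relative link target."""
    # With safe="", every reserved character is escaped, including '@'
    # and '/'. Passing safe="@" would leave '@' as-is.
    safe = "" if escape_at else "@"
    return quote(message_id, safe=safe) + "/"
```

A literal '%' becomes '%25', so a Message-ID such as 50%off@example.com no longer produces a malformed escape sequence in the link.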
Eric Wong [Sun, 15 May 2016 06:31:50 +0000 (06:31 +0000)]
www: fix for running under mount paths
We try to avoid issues like these by using relative URLs
in hrefs, but we can't avoid the problem with Location:
for redirects and Atom feeds which are likely to be
rehosted elsewhere.
We also reorder some of the code to work around a weird
issue on the psgi-plack mailing list:
<20160516073750.GA11931@dcvr.yhbt.net>
(Somewhere on https://groups.google.com/group/psgi-plack
but it's probably not bookmarkable)
Eric Wong [Sat, 14 May 2016 06:10:36 +0000 (06:10 +0000)]
declare Inbox object for reusability
From the beginning, we've avoided objects here in favor
of faster startup time; but it may not be worth it
since a persistent httpd/nntpd is faster and -mda
isn't hit as often.
Eric Wong [Sun, 15 May 2016 23:30:06 +0000 (23:30 +0000)]
mbox: support /$INBOX/all.mbox.gz endpoint
Allows easily downloading the entire archive without
special tools. In any case, it's not yet advertised via
HTML until we can test it better. It'll also support range
queries in the future to avoid wasting bandwidth.
Eric Wong [Sat, 14 May 2016 02:54:09 +0000 (02:54 +0000)]
nntp: use "newsgroup" instead of "name"
This reduces the cognitive overhead for mapping names of
configuration values to internal field names of our classes.
Further changes along these lines coming...
Eric Wong [Sat, 14 May 2016 02:17:47 +0000 (02:17 +0000)]
import ssoma-replay example script I've been using
Unfortunately, most users still prefer their mail delivered
over SMTP; so we'll at least document mlmmj integration for now
until we can popularize pull-based reading over POP3/NNTP/ssoma.
Eric Wong [Sat, 14 May 2016 01:45:29 +0000 (01:45 +0000)]
t/nntpd: test for wide characters and UTF-8 mangling
We'll need to test non-UTF-8 messages at some point, too.
There are lots of legacy-encoded messages in old archives
and I would not bet we behave sanely w.r.t. those.
Eric Wong [Sat, 14 May 2016 01:24:08 +0000 (01:24 +0000)]
t/nntpd: avoid fork+exec for search indexing
The Xapian search index is required for the NNTP server, so
there's no point in calling system() for it like we do in
other tests. This should speed up the test a small amount.
The raw, undecoded body is probably what should be sent over the
wire anyways for clients to deal with. We'll need this to avoid
deprecation warnings with Perl 5.24+ since we use
send()/recv()/sysread().
Eric Wong [Thu, 12 May 2016 09:06:56 +0000 (09:06 +0000)]
import: fallback to email if '<>' exists in author name
git doesn't handle '<' and '>' characters in the author
name at all regardless of quoting, not just matched pairs.
So fall back to using the email as the author name since
the commit info isn't critical, anyways (shallow clones
are fine).
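
The fallback amounts to a one-line check. A minimal Python sketch, with author_ident as a hypothetical helper name:

```python
def author_ident(name, email):
    """Build a git author identity, avoiding '<' and '>' in the name."""
    # git cannot represent either character in the name, even when quoted,
    # so a single '<' or '>' is enough to trigger the fallback.
    if "<" in name or ">" in name:
        name = email
    return f"{name} <{email}>"
```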
Eric Wong [Tue, 3 May 2016 02:34:57 +0000 (02:34 +0000)]
git-http-backend: reduce memory use for clone/fetch
When serving large static files or large packs, we may call
Danga::Socket::write directly to queue up callbacks to resume
reading and defer firing them until the socket is writable.
This prevents us from scheduling writes or buffering until we
know the socket is writable and prevents needless buffering by
Danga::Socket when faced with slow clients.
For smart clones, this comes at the cost of throttling the
output of "git pack-objects" to the speed of the client
connection. This is probably not ideal, but is the behavior of
the standard git-daemon, too; and is preferable to running the
httpd out-of-memory. Buffering to the filesystem may be an
option in the future...
Eric Wong [Tue, 3 May 2016 02:52:23 +0000 (02:52 +0000)]
http: move empty string check into write callback
This empty string check is for middlewares such as Deflater
which may write empty strings, not for direct real callers of
Danga::Socket who (presumably) know what they're doing.
Eric Wong [Tue, 3 May 2016 06:20:54 +0000 (06:20 +0000)]
spawnpp: use native perl %ENV outside of mod_perl
We only need to use env(1) under mod_perl; since mod_perl
is uncommon nowadays, support native %ENV for a teeny
speedup for folks uncomfortable with running vfork via an
Inline::C snippet.
Eric Wong [Mon, 2 May 2016 07:52:41 +0000 (07:52 +0000)]
t/*.t: reduce -mda calls
Process startup times are atrocious for fast tests and there's far
too much setup involved. Rely on git-fast-import instead; but
more work is needed in this area.
Eric Wong [Mon, 2 May 2016 04:22:40 +0000 (04:22 +0000)]
nntp: append Archived-At and List-Archive headers
For readers using NNTP, we should do our best to advertise the
clonable HTTP/HTTPS URLs and the message permalink URL for
ease-of-referencing messages, since we don't want the NNTP server
and its sequential article numbers to be relied on.
Eric Wong [Sun, 1 May 2016 10:14:28 +0000 (10:14 +0000)]
daemon: reduce timer-related allocations
We can reduce the allocation and overhead needed for
Danga::Socket timers for immediately-executed responses by
combining identical timers and reducing anonymous sub creation.
Eric Wong [Sun, 1 May 2016 08:54:10 +0000 (08:54 +0000)]
mda: export @BAD_HEADERS variable
This should allow users to change and add headers as needed.
While we're at it, add the X-Original-To header Postfix likes
to add; it seems like pointless bloat with the existence of
(important) Received: headers.
Eric Wong [Sat, 30 Apr 2016 02:57:40 +0000 (02:57 +0000)]
daemon: graceful shutdown warning and limit removal
git clones may take longer than 30s, much longer... So prepare
to wait almost indefinitely for sockets to timeout and document
the second signal behavior for immediate shutdown.
While we're at it, move parent death handling to a separate
class to avoid Danga::Socket->AddOtherFds, since that does not
allow proper handling of the parent pipe being closed and would
actually misterminate a worker prematurely. t/nntpd.t is updated
to illustrate the failure with workers enabled.
We will work to keep memory usage low and let clients take their
time without interrupting them.
Eric Wong [Fri, 29 Apr 2016 20:06:14 +0000 (20:06 +0000)]
TODO: add item for .mailmap support
Email addresses get out-of-date, so make sure they're mapped
properly for future readers. git and linux-kernel already have
an established convention for this, so we will follow it.
Eric Wong [Fri, 29 Apr 2016 03:32:20 +0000 (03:32 +0000)]
http: improve error handling for aborted responses
We need to abort connections properly if a response is prematurely
truncated. This includes problems with serving static files, since
a clumsy admin or broken FS could return truncated responses and
inadvertently leave a client waiting (since the client saw
"Content-Length" in the header and expected a certain length).
Eric Wong [Fri, 29 Apr 2016 04:00:24 +0000 (04:00 +0000)]
http: avoid corking on "Content-Length: 0" response
We must use a normal write instead of send(.., MSG_MORE)
when writing responses of "Content-Length: 0" to avoid
the corking effect MSG_MORE provides. We only want to
cork headers if we will send a non-empty body.
Fixes: c3eeaf664cf0 ("http: clarify intent for persistence")
This needs a proper test.
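
The flag choice can be sketched as follows in Python. header_flags is a hypothetical helper, and only the "Content-Length: 0" case from the message is covered; MSG_MORE is Linux-specific, so the sketch falls back to 0 where the socket module does not define it.

```python
import socket

# MSG_MORE is only defined on Linux; use 0 (a plain send) elsewhere.
MSG_MORE = getattr(socket, "MSG_MORE", 0)


def header_flags(headers):
    """Return send() flags for the header block of a response."""
    length = headers.get("Content-Length")
    if length is not None and int(length) == 0:
        # No body will follow, so corking would only delay the headers.
        return 0
    return MSG_MORE


# Intended use, assuming a connected socket `sock`:
#   sock.send(header_bytes, header_flags(headers))
```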
Eric Wong [Thu, 28 Apr 2016 01:56:08 +0000 (01:56 +0000)]
githttpbackend: clamp to one smart HTTP request at-a-time
Server admins may not be able to afford to have too many
git-pack-objects processes running at once. Since PSGI
HTTP servers should already be configured to use multiple
processes for other requests; limit concurrency of smart
backends to one; and fall back to dumb responses if we're
already generating a pack.
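
One way to picture the clamp is a non-blocking lock: whoever holds it gets the smart response, everyone else gets the dumb one immediately instead of waiting. This Python sketch is an assumption about the shape of the logic, not the project's implementation (which runs inside an event loop rather than threads); serve, run_smart and run_dumb are hypothetical names.

```python
import threading

# At most one smart (pack-generating) request per process.
_smart_slot = threading.Lock()


def serve(run_smart, run_dumb):
    """Run the smart backend if the slot is free, else fall back to dumb."""
    if _smart_slot.acquire(blocking=False):
        try:
            return run_smart()
        finally:
            _smart_slot.release()
    # A pack is already being generated; do not queue, serve static files.
    return run_dumb()
```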
Eric Wong [Thu, 28 Apr 2016 01:56:07 +0000 (01:56 +0000)]
githttpbackend: fall back to dumb if smart HTTP is off
Using http.getanyfile still keeps the http-backend process
alive, so it's better to break out of that process and
handle serving entirely within the HTTP server.
Eric Wong [Thu, 28 Apr 2016 01:03:31 +0000 (01:03 +0000)]
import: run git-update-server-info when done
We should update $GIT_DIR/info/refs for dumb HTTP clients
whenever we make changes to the repository. The best place
to update is immediately after making commits.
This fixes a bug where public-inbox-learn did not properly
update $GIT_DIR/info/refs after inserting or removing
messages.
Eric Wong [Mon, 25 Apr 2016 09:50:02 +0000 (09:50 +0000)]
remove GIT_DIR env usage in favor of --git-dir
No need to maintain per-block environment state when we can
localize it to per-command. We've had --git-dir= in git
since 1.4.2 (2006-08-12) and already use it all over the
place.
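
The per-command form is easy to wrap in a helper. A small Python sketch, with git_cmd as a hypothetical name:

```python
def git_cmd(git_dir, *args):
    """Build a git argv scoped to one repository via --git-dir=."""
    # No GIT_DIR is set in the environment, so nothing leaks into
    # unrelated commands run later from the same process.
    return ["git", "--git-dir=" + git_dir, *args]


# Intended use, e.g. with subprocess.run(git_cmd(path, "cat-file", "--batch"))
```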
Eric Wong [Mon, 25 Apr 2016 07:51:26 +0000 (07:51 +0000)]
nntp: reduce timers for weakening
Danga::Socket timers are not cheap, so avoid creating up
to 3 timers per-newsgroup by batching resource weakening.
This lets us reduce resource consumption for scheduling
additional resource consumption reduction :)
Eric Wong [Sun, 24 Apr 2016 23:52:00 +0000 (23:52 +0000)]
view: add extra newline in flat thread view for lynx
This shouldn't show up in other browsers (tested with w3m, too),
but the extra newline makes a difference for delineating
messages when viewed with lynx.
Eric Wong [Thu, 21 Apr 2016 22:46:04 +0000 (22:46 +0000)]
mda: reject multiple Message-IDs up front
While ssoma now documents it uses the first Message-ID, multiple
Message-IDs are confusing and could be a sign of broken mail
software, and broken mail software is often a sign of spam...
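
The check itself is simple to express. A Python sketch using the standard email package, with has_multiple_message_ids as a hypothetical name (the real check lives in the Perl -mda code and may differ in detail):

```python
from email import message_from_string


def has_multiple_message_ids(raw):
    """Return True if the raw message carries more than one Message-ID."""
    msg = message_from_string(raw)
    # Header lookup is case-insensitive; get_all returns every occurrence.
    return len(msg.get_all("Message-ID", [])) > 1
```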
Eric Wong [Sat, 16 Apr 2016 18:46:35 +0000 (18:46 +0000)]
view: show flat thread view in chronological order
Allowing readers new to a topic to follow in chronological order
probably makes the most sense. Reverse chronological order may
reduce scrolling (e.g. log view); but nearly all non-threaded
conversation displays seem to be chronological so perhaps
there's a good reason for that.
Eric Wong [Fri, 15 Apr 2016 20:50:56 +0000 (20:50 +0000)]
www: redirect /$MESSAGE_ID/f/ endpoints
Quote-folding was a major design mistake pre-1.0. Since this
project is still in its infancy and unlikely to be in wide
use at the moment, redirect the /f/ endpoints back to the
plain message.
Eric Wong [Wed, 13 Apr 2016 03:04:11 +0000 (03:04 +0000)]
www: stop generating /$MESSAGE_ID/f/ links
Quote-folding can be detrimental as it fails to hide the
real problem of over-quoting.
Over-quoting wastes bandwidth and space for all readers, not
just WWW readers of the public-inbox. So hopefully removing
quote-folding support from the WWW interface can shame those
repliers into quoting only relevant portions of what they reply
to.
Eric Wong [Sat, 9 Apr 2016 00:28:07 +0000 (00:28 +0000)]
import: initial module + test case
This will allow us to write fast importers for existing
archives as well as eventually removing the ssoma dependency
for performance and ease-of-installation.
Eric Wong [Wed, 6 Apr 2016 08:23:15 +0000 (08:23 +0000)]
view: account for threads lacking a common parent
In the per-message view, we still need to account for threads
lacking a common parent. This can happen when threads are
broken by some broken clients or if somebody sends the same
message twice to the same inbox with a different Message-ID.