Eric Wong [Mon, 30 May 2016 04:39:57 +0000 (04:39 +0000)]
http: yield body->getline running time
We cannot let a client monopolize the single-threaded server
even if it can drain the socket buffer faster than we can
emit data.
While we're at it, acknowledge the this behavior (which happens
naturally) in httpd/async.
The same idea is present in NNTP for the long_response code.
This is the HTTP followup to:
commit 0d0fde0bff97 ("nntp: introduce long response API for streaming")
commit 79d8bfedcdd2 ("nntp: avoid signals for long responses")
Eric Wong [Mon, 30 May 2016 02:10:36 +0000 (02:10 +0000)]
script/*{mda,learn}: no strict params for Email::MIME::ContentType
User input is imperfect, do not pollute our mail logs with
warnings we cannot fix. This is documented in the
Email::MIME::ContentType manpage so it should remain supported.
Eric Wong [Mon, 30 May 2016 00:51:44 +0000 (00:51 +0000)]
git-http-backend: remove dependency on Plack::Request
Plack::Request is unnecessary overhead for this given the
strictness of git-http-backend. Furthermore, having to make
commit 311c2adc8c63 ("avoid Plack::Request parsing body")
to avoid tempfiles should not have been necessary.
Eric Wong [Sat, 28 May 2016 01:57:14 +0000 (01:57 +0000)]
remove redundant NewsGroup class
Most of its functionality is in the PublicInbox::Inbox class.
While we're at it, we no longer auto-create newsgroup names
based on the inbox name, since newsgroup names probably deserve
some thought when it comes to hierarchy.
Eric Wong [Sat, 28 May 2016 01:57:08 +0000 (01:57 +0000)]
t/plack: ensure we can cascade on common endpoints
We don't serve things like robots.txt, favicon.ico, or
.well-known/ endpoints ourselves, but ensure we can be
used with Plack::App::Cascade for others.
Eric Wong [Fri, 27 May 2016 07:23:18 +0000 (07:23 +0000)]
httpd/async: do not needlessly weaken
The restart_read callback has no chance of circular reference,
and weakening $self before we create it can cause $self to
be undefined inside the callback (seen during stress testing).
Eric Wong [Fri, 27 May 2016 05:59:14 +0000 (05:59 +0000)]
httpd/async: prevent circular reference
We must avoid circular references which can cause leaks in
long-running processes. This callback is dangerous since
it may never be called to properly terminate everything.
Eric Wong [Wed, 25 May 2016 01:44:46 +0000 (01:44 +0000)]
remove Email::Address dependency
git has stricter requirements for ident names (no '<>')
which Email::Address allows.
Even in 1.908, Email::Address also has an incomplete fix for
CVE-2015-7686 with a DoS-able regexp for comments. Since we
don't care for or need all the RFC compliance of Email::Address,
avoiding it entirely may be preferable.
Email::Address will still be installed as a requirement for
Email::MIME, but it is only used by the
Email::MIME::header_str_set which we do not use
Eric Wong [Tue, 24 May 2016 03:41:53 +0000 (03:41 +0000)]
git-http-backend: use qspawn to limit running processes
Having an excessive amount of git-pack-objects processes is
dangerous to the health of the server. Queue up process spawning
for long-running responses and serve them sequentially, instead.
Eric Wong [Tue, 24 May 2016 03:41:52 +0000 (03:41 +0000)]
http: fix various race conditions
We no longer override Danga::Socket::event_write and instead
re-enable reads by queuing up another callback in the $close
response callback. This is necessary because event_write may not be
completely done writing a response, only the existing buffered data.
Furthermore, the {closed} field can almost be set at any time when
writing, so we must check it before acting on pipelined requests as
well as during write callbacks in more().
Eric Wong [Tue, 24 May 2016 03:41:51 +0000 (03:41 +0000)]
standardize timer-related event-loop code
Standardize the code we have in place to avoid creating too many
timer objects. We do not need exact timers for things that don't
need to be run ASAP, so we can play things fast and loose to avoid
wasting power with unnecessary wakeups.
We only need two classes of timers:
* asap - run this on the next loop tick, after operating on
@Danga::Socket::ToClose to close remaining sockets
* later - run at some point in the future. It could be as
soon as immediately (like "asap"), and as late as 60s into
the future.
In the future, we support an "emergency" switch to fire "later"
timers immediately.
Eric Wong [Mon, 23 May 2016 07:19:45 +0000 (07:19 +0000)]
http: chunk in the server, not middleware
Since PSGI does not require Transfer-Encoding: chunked or
Content-Length, we cannot expect random apps we host to chunk
their responses.
Thus, to improve interoperability, chunk at the HTTP layer like
other PSGI servers do. I'm chosing a more syscall-intensive method
(via multiple send(...MSG_MORE) for now to reduce copy + packet
overhead.
Eric Wong [Mon, 23 May 2016 04:01:14 +0000 (04:01 +0000)]
git-http-backend: refactor to support cleanup
We will have clients dropping connections during long clone
and fetch operations; so do not retain references holding
backend processes once we detect a client has dropped.
Eric Wong [Mon, 23 May 2016 01:17:28 +0000 (01:17 +0000)]
config: use popen_rd when spawning `git config'
We may spawn this in a large server process, so be sure
to take advantage of the optional vfork() support when
for folks who set PERL_INLINE_DIRECTORY.
Eric Wong [Sun, 22 May 2016 09:06:03 +0000 (09:06 +0000)]
git-http-backend: switch to async_pass
This simplifies the code somewhat; but it could probably
still be made simpler. It will need to support command
queueing for expensive commands so expensive processes
can be queued up.
Eric Wong [Sat, 21 May 2016 23:45:27 +0000 (23:45 +0000)]
http: support async_pass for Danga::Socket
This will allow us to minimize buffering after we wait
(possibly a long time) for readability. This also greatly
reduces the amount of Danga::Socket-specific knowledge we
have in our PSGI code, making it easier for others to
understand.
Eric Wong [Sat, 21 May 2016 03:03:17 +0000 (03:03 +0000)]
mbox: switch generation over to pull model
This allows us to easily provide gigantic inboxes
with proper backpressure handling for slow clients.
It also eliminates public-inbox-httpd and Danga::Socket-specific
knowledge from this class, making it easier to follow for
those used to generic PSGI applications.
Eric Wong [Sat, 21 May 2016 03:03:16 +0000 (03:03 +0000)]
http: reduce over-buffering for getline responses
By switching to a "pull"-based I/O model for reading
application responses, we should be able to throttle
buffering to slow clients more effectively and avoid
wasting precious RAM.
This will also allow us to more Danga::Socket-specific
knowledge out of the PSGI application and keep it
confined to PublicInbox::HTTP.
Eric Wong [Thu, 19 May 2016 10:23:28 +0000 (10:23 +0000)]
msg_iter: workaround broken Email::MIME versions
Email::MIME >= 1.923 and < 1.935 would drop too many newlines
in attachments. This would lead to ugly text files without
a proper trailing newline if using quoted-printable, 7bit, or
8bit. Attachments encoded with base64 were not affected.
These versions of Email::MIME are widely available in Debian 8
(Jessie) and even Ubuntu LTS distros so we will need to support
this workaround for a while.
Eric Wong [Wed, 18 May 2016 20:30:31 +0000 (20:30 +0000)]
msg_iter: new internal API for iterating through MIME
Unlike Email::MIME::walk_parts, this is non-recursive and gives
depth + index offset information about the part for creating
links for later retrieval
It is intended for read-only access and changes are not
propagated to the parent; however future versions of it
may clobber bodies or the original version as it iterates
to reduce memory overhead.
It is intended for making it easy to locate attachments within a
message in the WWW view.
Eric Wong [Tue, 17 May 2016 08:16:47 +0000 (08:16 +0000)]
http: release resources when idle
This lets us release old git processes so unlinked packs
(leftover from repacking) can be released. This may also
be helpful for Xapian as indices get rebuilt for tuning.
For SQLite (msgmap), the there may be no benefit besides
reducing FD pressure.
Followup changes will unify the Inbox and NewsGroup
classes and allow better code-sharing between NNTP and
HTTP classes (as well as the planned POP3 class).
Eric Wong [Tue, 17 May 2016 05:39:06 +0000 (05:39 +0000)]
view: escape Message-ID for "next" link
Oops, we need to escape Message-IDs since they can contain
bad characters such as '%' in them. '@' actually seems fine
and does not need to be escaped; however, but we've been
doing it forever.
Eric Wong [Sun, 15 May 2016 06:31:50 +0000 (06:31 +0000)]
www: fix for running under mount paths
We try to avoid issues like these by using relative URLs
in hrefs, but we can't avoid the problem with Location:
for redirects and Atom feeds which are likely to be
rehosted elsewhere.
We also reorder some of the code to work around a weird
issue on the psgi-plack mailing list:
<20160516073750.GA11931@dcvr.yhbt.net>
(Somewhere on https://groups.google.com/group/psgi-plack
but it's probably not bookmarkable)
Eric Wong [Sat, 14 May 2016 06:10:36 +0000 (06:10 +0000)]
declare Inbox object for reusability
From the beginning, we've avoided objects here in favor
of faster startup time; but it may not be worth it
since a persistent httpd/nntpd is faster and -mda
isn't hit as often.
Eric Wong [Sun, 15 May 2016 23:30:06 +0000 (23:30 +0000)]
mbox: support /$INBOX/all.mbox.gz endpoint
Allows easily downloading the entire archive without
special tools. In any case, it's not yet advertised to via
HTML until we can test it better. It'll also support range
queries in the future to avoid wasting bandwidth.
Eric Wong [Sat, 14 May 2016 02:54:09 +0000 (02:54 +0000)]
nntp: use "newsgroup" instead of "name"
This reduces the cognitive overhead for mapping names of
configuration values to internal field names of our classes.
Further changes along these lines coming...
Eric Wong [Sat, 14 May 2016 02:17:47 +0000 (02:17 +0000)]
import ssoma-replay example script I've been using
Unfortunately, most users still prefer their mail delivered
over SMTP; so we'll at least document mlmmj integration for now
until we can popularize pull-based reading over POP3/NNTP/ssoma.
Eric Wong [Sat, 14 May 2016 01:45:29 +0000 (01:45 +0000)]
t/nntpd: test for wide characters and UTF-8 mangling
We'll need to test non-UTF-8 messages at some point, too.
There are lots of legacy-encoded messages in old archives
and I would not bet we behave sanely w.r.t. those.
Eric Wong [Sat, 14 May 2016 01:24:08 +0000 (01:24 +0000)]
t/nntpd: avoid fork+exec for search indexing
The Xapian search index is required for the NNTP server, so
there's no point in calling system() for it like we do in
other tests. This should speed up the test a small amount.
The raw, undecoded body is probably what should be sent over the
wire anyways for clients to deal with. We'll need this to avoid
deprecation warnings with Perl 5.24+ since we use
send()/recv()/sysread().
Eric Wong [Thu, 12 May 2016 09:06:56 +0000 (09:06 +0000)]
import: fallback to email if '<>' exists in author name
git doesn't handle '<' and '>' characters in the author
name at all regardless of quoting, not just matched pairs.
So fall back to using the email as the author name since
the commit info isn't critical, anyways (shallow clones
are fine).
Eric Wong [Tue, 3 May 2016 02:34:57 +0000 (02:34 +0000)]
git-http-backend: reduce memory use for clone/fetch
When serving large static files or large packs, we may call
Danga::Socket::write directly to queue up callbacks to resume
reading and defer firing them until the socket is writable.
This prevents us from scheduling writes or buffering until we
know the socket is writable and prevents needless buffering by
Danga::Socket when faced with slow clients.
For smart clones, this comes at the cost of throttling the
output of "git pack-objects" to the speed of the client
connection. This is probably not ideal, but is the behavior of
the standard git-daemon, too; and is preferable to running the
httpd out-of-memory. Buffering to the filesystem may be an
option in the future...
Eric Wong [Tue, 3 May 2016 02:52:23 +0000 (02:52 +0000)]
http: move empty string check into write callback
This empty string check is for middlewares such as Deflater
which may write empty strings, not for direct real callers of
Danga::Socket who (presumably) know what they're doing.
Eric Wong [Tue, 3 May 2016 06:20:54 +0000 (06:20 +0000)]
spawnpp: use native perl %ENV outside of mod_perl
We only need to use env(1) under mod_perl; since mod_perl
is uncommon nowadays, support native %ENV for a teeny
speedup for folks uncomfortable with running vfork via
Inline::C snippet.
Eric Wong [Mon, 2 May 2016 07:52:41 +0000 (07:52 +0000)]
t/*.t: reduce -mda calls
Process startup times are atrocious for fast tests and there's far
too much setup involved. Rely on git-fast-import instead; but
more work is needed in this area.