Eric Wong [Sun, 22 May 2016 09:06:03 +0000 (09:06 +0000)]
git-http-backend: switch to async_pass
This simplifies the code somewhat, but it could probably
still be made simpler. It will need to support command
queueing so that expensive processes can be queued up
instead of all running at once.
Eric Wong [Sat, 21 May 2016 23:45:27 +0000 (23:45 +0000)]
http: support async_pass for Danga::Socket
This will allow us to minimize buffering after we wait
(possibly a long time) for readability. This also greatly
reduces the amount of Danga::Socket-specific knowledge we
have in our PSGI code, making it easier for others to
understand.
This unsubscribe PSGI endpoint should never incur enough load to
justify using multiple worker processes. If it's unstable and
crashes, systemd can automatically restart it.
Eric Wong [Sat, 21 May 2016 04:12:45 +0000 (04:12 +0000)]
unsubscribe: bad URL fixup
Fixup a comment about s/query string/PATH_INFO/ while
we're at it, as pre-published versions of this used
query strings before I determined it could be harder
to copy+paste URLs with query parameters in them.
Eric Wong [Sat, 21 May 2016 03:03:17 +0000 (03:03 +0000)]
mbox: switch generation over to pull model
This allows us to easily provide gigantic inboxes
with proper backpressure handling for slow clients.
It also eliminates public-inbox-httpd and Danga::Socket-specific
knowledge from this class, making it easier to follow for
those used to generic PSGI applications.
Eric Wong [Sat, 21 May 2016 03:03:16 +0000 (03:03 +0000)]
http: reduce over-buffering for getline responses
By switching to a "pull"-based I/O model for reading
application responses, we should be able to throttle
buffering to slow clients more effectively and avoid
wasting precious RAM.
This will also allow us to move more Danga::Socket-specific
knowledge out of the PSGI application and keep it
confined to PublicInbox::HTTP.
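A minimal Python sketch of the "pull" idea described above, not the actual PublicInbox::HTTP code: the server asks the body for the next chunk only while the client can accept it. The names LineBody, pump, client_writable and write are hypothetical and exist only for illustration.

```python
class LineBody:
    """Hypothetical response body that the server pulls from via getline()."""

    def __init__(self, chunks):
        self._iter = iter(chunks)

    def getline(self):
        # Return the next chunk, or None once the body is exhausted.
        return next(self._iter, None)


def pump(body, client_writable, write):
    """Pull chunks only while the client can accept them.

    Returns True when the body is finished, or False if we stopped
    early because the client stopped being writable (a real server
    would resume once the socket becomes writable again).
    """
    while client_writable():
        chunk = body.getline()
        if chunk is None:
            return True
        write(chunk)
    return False
```

Because nothing is read from the application until the client is ready, at most one chunk is held in memory per slow client in this sketch.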
Eric Wong [Wed, 18 May 2016 01:23:05 +0000 (01:23 +0000)]
unsubscribe: get off mah lawn^H^H^Hist
While public-inbox is intended primarily for archival,
SMTP list subscriptions are still in use in most places
and users are likely to want a good unsubscribe mechanism.
HTTP (or HTTPS) links in the List-Unsubscribe header are
often preferable since some users may use an incorrect
email address for mailto: links.
Thus, it is useful to provide an example which generates an
HTTPS link for users to click on. The default .psgi requires
a POST confirmation (as destructive actions with GET are
considered bad practice). However, the "confirm" parameter
may be disabled for a true "one-click" unsubscribe.
The generated URLs are hopefully short enough and both shell
and highlighting-friendly to reduce copy+paste errors.
Eric Wong [Thu, 19 May 2016 10:23:28 +0000 (10:23 +0000)]
msg_iter: workaround broken Email::MIME versions
Email::MIME >= 1.923 and < 1.935 would drop too many newlines
in attachments. This would lead to ugly text files without
a proper trailing newline if using quoted-printable, 7bit, or
8bit. Attachments encoded with base64 were not affected.
These versions of Email::MIME are widely available in Debian 8
(Jessie) and even Ubuntu LTS distros so we will need to support
this workaround for a while.
Eric Wong [Wed, 18 May 2016 20:30:31 +0000 (20:30 +0000)]
msg_iter: new internal API for iterating through MIME
Unlike Email::MIME::walk_parts, this is non-recursive and gives
depth + index offset information about the part for creating
links for later retrieval.
It is intended for read-only access and changes are not
propagated to the parent; however future versions of it
may clobber bodies or the original version as it iterates
to reduce memory overhead.
It is intended for making it easy to locate attachments within a
message in the WWW view.
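As a rough illustration of non-recursive iteration with depth and index information, here is a Python sketch using the standard email package. It is not the msg_iter API itself; the function name iter_parts and the 1-based index convention are assumptions made for this example.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def iter_parts(msg):
    """Yield (part, depth, index) for each leaf part, using an explicit stack."""
    stack = [(msg, 0, 0)]
    while stack:
        part, depth, index = stack.pop()
        if part.is_multipart():
            children = part.get_payload()
            # Push in reverse so children are visited in document order.
            for i in range(len(children) - 1, -1, -1):
                stack.append((children[i], depth + 1, i + 1))
        else:
            yield part, depth, index


# Sample message: one text part plus a nested multipart with one text part.
outer = MIMEMultipart()
outer.attach(MIMEText("hello"))
inner = MIMEMultipart()
inner.attach(MIMEText("nested"))
outer.attach(inner)

positions = [(depth, index) for _, depth, index in iter_parts(outer)]
```

The (depth, index) pairs are the kind of information a link to an individual attachment could be built from.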
Eric Wong [Tue, 17 May 2016 08:16:47 +0000 (08:16 +0000)]
http: release resources when idle
This lets us release old git processes so unlinked packs
(leftover from repacking) can be released. This may also
be helpful for Xapian as indices get rebuilt for tuning.
For SQLite (msgmap), there may be no benefit besides
reducing FD pressure.
Followup changes will unify the Inbox and NewsGroup
classes and allow better code-sharing between NNTP and
HTTP classes (as well as the planned POP3 class).
Eric Wong [Tue, 17 May 2016 05:39:06 +0000 (05:39 +0000)]
view: escape Message-ID for "next" link
Oops, we need to escape Message-IDs since they can contain
bad characters such as '%' in them. '@' actually seems fine
and does not need to be escaped; however, we've been
doing it forever.
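To illustrate why escaping matters, here is a small Python sketch using urllib.parse.quote. It is not the project's Perl code, and the helper name escape_mid is hypothetical.

```python
from urllib.parse import quote


def escape_mid(message_id, escape_at=True):
    """Percent-encode a Message-ID for use in a URL path segment.

    With escape_at=False, '@' is left as-is, since the message above
    notes it seems fine unescaped.
    """
    safe = "" if escape_at else "@"
    return quote(message_id, safe=safe)
```

A literal '%' must become '%25'; otherwise the following characters can be misread as a percent-escape when the URL is decoded.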
Eric Wong [Sun, 15 May 2016 06:31:50 +0000 (06:31 +0000)]
www: fix for running under mount paths
We try to avoid issues like these by using relative URLs
in hrefs, but we can't avoid the problem with Location:
for redirects and Atom feeds which are likely to be
rehosted elsewhere.
We also reorder some of the code to work around a weird
issue on the psgi-plack mailing list:
<20160516073750.GA11931@dcvr.yhbt.net>
(Somewhere on https://groups.google.com/group/psgi-plack
but it's probably not bookmarkable)
Eric Wong [Sat, 14 May 2016 06:10:36 +0000 (06:10 +0000)]
declare Inbox object for reusability
From the beginning, we've avoided objects here in favor
of faster startup time; but it may not be worth it
since a persistent httpd/nntpd is faster and -mda
isn't hit as often.
Eric Wong [Sun, 15 May 2016 23:30:06 +0000 (23:30 +0000)]
mbox: support /$INBOX/all.mbox.gz endpoint
Allows easily downloading the entire archive without
special tools. In any case, it's not yet advertised via
HTML until we can test it better. It'll also support range
queries in the future to avoid wasting bandwidth.
Eric Wong [Sat, 14 May 2016 02:54:09 +0000 (02:54 +0000)]
nntp: use "newsgroup" instead of "name"
This reduces the cognitive overhead for mapping names of
configuration values to internal field names of our classes.
Further changes along these lines coming...
Eric Wong [Sat, 14 May 2016 02:17:47 +0000 (02:17 +0000)]
import ssoma-replay example script I've been using
Unfortunately, most users still prefer their mail delivered
over SMTP; so we'll at least document mlmmj integration for now
until we can popularize pull-based reading over POP3/NNTP/ssoma.
Eric Wong [Sat, 14 May 2016 01:45:29 +0000 (01:45 +0000)]
t/nntpd: test for wide characters and UTF-8 mangling
We'll need to test non-UTF-8 messages at some point, too.
There are lots of legacy-encoded messages in old archives
and I would not bet we behave sanely w.r.t. those.
Eric Wong [Sat, 14 May 2016 01:24:08 +0000 (01:24 +0000)]
t/nntpd: avoid fork+exec for search indexing
The Xapian search index is required for the NNTP server, so
there's no point in calling system() for it like we do in
other tests. This should speed up the test a small amount.
The raw, undecoded body is probably what should be sent over the
wire anyways for clients to deal with. We'll need this to avoid
deprecation warnings with Perl 5.24+ since we use
send()/recv()/sysread().
Eric Wong [Thu, 12 May 2016 09:06:56 +0000 (09:06 +0000)]
import: fallback to email if '<>' exists in author name
git doesn't handle '<' and '>' characters in the author
name at all regardless of quoting, not just matched pairs.
So fall back to using the email as the author name since
the commit info isn't critical, anyways (shallow clones
are fine).
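The fallback can be sketched in Python as follows. The function name author_ident and the exact output format are assumptions for illustration, not the importer's real code.

```python
def author_ident(name, email):
    """Build a "Name <email>" author string for a commit.

    If the name contains '<' or '>' (matched or not), use the email
    address as the name instead.
    """
    if "<" in name or ">" in name:
        name = email
    return f"{name} <{email}>"
```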
Eric Wong [Tue, 3 May 2016 02:34:57 +0000 (02:34 +0000)]
git-http-backend: reduce memory use for clone/fetch
When serving large static files or large packs, we may call
Danga::Socket::write directly to queue up callbacks to resume
reading and defer firing them until the socket is writable.
This prevents us from scheduling writes or buffering until we
know the socket is writable and prevents needless buffering by
Danga::Socket when faced with slow clients.
For smart clones, this comes at the cost of throttling the
output of "git pack-objects" to the speed of the client
connection. This is probably not ideal, but is the behavior of
the standard git-daemon, too; and is preferable to running the
httpd out-of-memory. Buffering to the filesystem may be an
option in the future...
Eric Wong [Tue, 3 May 2016 02:52:23 +0000 (02:52 +0000)]
http: move empty string check into write callback
This empty string check is for middlewares such as Deflater
which may write empty strings, not for direct real callers of
Danga::Socket who (presumably) know what they're doing.
Eric Wong [Tue, 3 May 2016 06:20:54 +0000 (06:20 +0000)]
spawnpp: use native perl %ENV outside of mod_perl
We only need to use env(1) under mod_perl; since mod_perl
is uncommon nowadays, support native %ENV for a teeny
speedup for folks uncomfortable with running vfork via an
Inline::C snippet.
Eric Wong [Mon, 2 May 2016 07:52:41 +0000 (07:52 +0000)]
t/*.t: reduce -mda calls
Process startup times are atrocious for fast tests and there's far
too much setup involved. Rely on git-fast-import instead; but
more work is needed in this area.
Eric Wong [Mon, 2 May 2016 04:22:40 +0000 (04:22 +0000)]
nntp: append Archived-At and List-Archive headers
For readers using NNTP, we should do our best to advertise the
clonable HTTP/HTTPS URLs and the message permalink URL for
ease-of-referencing messages, since we don't want the NNTP server
and its sequential article numbers to be relied on.
Eric Wong [Sun, 1 May 2016 10:14:28 +0000 (10:14 +0000)]
daemon: reduce timer-related allocations
We can reduce the allocation and overhead needed for
Danga::Socket timers for immediately-executed responses by
combining identical timers and reducing anonymous sub creation.
Eric Wong [Sun, 1 May 2016 08:54:10 +0000 (08:54 +0000)]
mda: export @BAD_HEADERS variable
This should allow users to change and add headers as needed.
While we're at it, add the X-Original-To header Postfix likes
to add; it seems like pointless bloat with the existence of
(important) Received: headers.
Eric Wong [Sat, 30 Apr 2016 02:57:40 +0000 (02:57 +0000)]
daemon: graceful shutdown warning and limit removal
git clones may take longer than 30s, much longer... So prepare
to wait almost indefinitely for sockets to time out and document
the second signal behavior for immediate shutdown.
While we're at it, move parent death handling to a separate
class to avoid Danga::Socket->AddOtherFds, since that does not
allow proper handling of the parent pipe being closed and would
actually terminate a worker prematurely. t/nntpd.t is updated
to illustrate the failure with workers enabled.
We will work to keep memory usage low and let clients take their
time without interrupting them.
Eric Wong [Fri, 29 Apr 2016 20:06:14 +0000 (20:06 +0000)]
TODO: add item for .mailmap support
Email addresses get out-of-date, so make sure they're mapped
properly for future readers. git and linux-kernel already have
an established convention for this, so we will follow it.
Eric Wong [Fri, 29 Apr 2016 03:32:20 +0000 (03:32 +0000)]
http: improve error handling for aborted responses
We need to abort connections properly if a response is prematurely
truncated. This includes problems with serving static files, since
a clumsy admin or broken FS could return truncated responses and
inadvertently leave a client waiting (since the client saw
"Content-Length" in the header and expected a certain length).
Eric Wong [Fri, 29 Apr 2016 04:00:24 +0000 (04:00 +0000)]
http: avoid corking on "Content-Length: 0" response
We must use a normal write instead of send(.., MSG_MORE)
when writing responses of "Content-Length: 0" to avoid
the corking effect MSG_MORE provides. We only want to
cork headers if we will send a non-empty body.
Fixes: c3eeaf664cf0 ("http: clarify intent for persistence")
This needs a proper test.
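The rule above can be sketched in Python as a flag choice for the header write. The helper name header_send_flags is hypothetical, and treating an unknown length (None) as "a body will follow" is an assumption of this example. MSG_MORE is Linux-specific, so the sketch falls back to 0 where the constant is missing.

```python
import socket

# MSG_MORE is only defined on Linux; fall back to 0 (a plain write) elsewhere.
MSG_MORE = getattr(socket, "MSG_MORE", 0)


def header_send_flags(content_length):
    """Return send() flags for writing response headers.

    Cork with MSG_MORE only when a non-empty body will follow. A
    "Content-Length: 0" response has nothing left to send, so its
    headers must go out immediately.
    """
    if content_length == 0:
        return 0
    return MSG_MORE
```

A caller would then use something like sock.send(header_bytes, header_send_flags(length)).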
Eric Wong [Thu, 28 Apr 2016 01:56:08 +0000 (01:56 +0000)]
githttpbackend: clamp to one smart HTTP request at-a-time
Server admins may not be able to afford to have too many
git-pack-objects processes running at once. Since PSGI
HTTP servers should already be configured to use multiple
processes for other requests, limit concurrency of smart
backends to one and fall back to dumb responses if we're
already generating a pack.
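A minimal Python sketch of "one smart request at a time, otherwise dumb". It uses a non-blocking lock as a stand-in for whatever bookkeeping the real server uses; serve, smart_handler and dumb_handler are hypothetical names.

```python
import threading

# One slot: at most one smart (pack-generating) request at a time.
_pack_slot = threading.Lock()


def serve(smart_handler, dumb_handler):
    """Run smart_handler if the slot is free, else fall back to dumb_handler."""
    if _pack_slot.acquire(blocking=False):
        try:
            return smart_handler()
        finally:
            _pack_slot.release()
    # A pack is already being generated, so serve a dumb response instead.
    return dumb_handler()
```

The fallback never waits for the slot, so a busy server degrades to cheaper static responses rather than queueing up expensive work.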
Eric Wong [Thu, 28 Apr 2016 01:56:07 +0000 (01:56 +0000)]
githttpbackend: fall back to dumb if smart HTTP is off
Using http.getanyfile still keeps the http-backend process
alive, so it's better to break out of that process and
handle serving entirely within the HTTP server.
Eric Wong [Thu, 28 Apr 2016 01:03:31 +0000 (01:03 +0000)]
import: run git-update-server-info when done
We should update $GIT_DIR/info/refs for dumb HTTP clients
whenever we make changes to the repository. The best place
to update is immediately after making commits.
This fixes a bug where public-inbox-learn did not properly
update $GIT_DIR/info/refs after inserting or removing
messages.
Eric Wong [Mon, 25 Apr 2016 09:50:02 +0000 (09:50 +0000)]
remove GIT_DIR env usage in favor of --git-dir
No need to maintain per-block environment state when we can
localize it to per-command. We've had --git-dir= in git
since 1.4.2 (2006-08-12) and already use it all over the
place.
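For illustration, passing the repository per command instead of through the environment can be sketched in Python like this. The helper name git_cmd and the example path are hypothetical.

```python
def git_cmd(git_dir, *args):
    """Build a git argv that targets git_dir without touching the environment."""
    return ["git", "--git-dir=" + git_dir, *args]
```

The resulting list can be handed to subprocess.run(), so no GIT_DIR state has to be set and restored around the call.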
Eric Wong [Mon, 25 Apr 2016 07:51:26 +0000 (07:51 +0000)]
nntp: reduce timers for weakening
Danga::Socket timers are not cheap, so avoid creating up
to 3 timers per-newsgroup by batching resource weakening.
This lets us reduce resource consumption for scheduling
additional resource consumption reduction :)
Eric Wong [Sun, 24 Apr 2016 23:52:00 +0000 (23:52 +0000)]
view: add extra newline in flat thread view for lynx
This shouldn't show up in other browsers (tested with w3m, too),
but the extra newline makes a difference for delineating
messages when viewed with lynx.
Eric Wong [Thu, 21 Apr 2016 22:46:04 +0000 (22:46 +0000)]
mda: reject multiple Message-IDs up front
While ssoma now documents that it uses the first Message-ID,
multiple Message-IDs are confusing and could be a sign of broken
mail software, and broken mail software is often a sign of spam...