Eric Wong [Fri, 12 Aug 2016 19:52:35 +0000 (19:52 +0000)]
www: allow including links to NNTP sites in HTML footer
Improve the discoverability of NNTP endpoints for users
who still know what NNTP is.
==> ~/.public-inbox/config <==
; aliases for the locally-run nntpd can be specified in
; the "publicinbox" section:
[publicinbox]
    nntpserver = nntp://ou63pmih66umazou.onion/
    nntpserver = news.public-inbox.org
    ; NNTPS is not supported natively, yet,
    ; but one can use haproxy or similar
    ; nntpserver = nntps://news.public-inbox.invalid/
; mirrors for specific inboxes may be specified either as full
; NNTP (or NNTPS) URLs, or with the server name only if the
; newsgroup name is specified for a local NNTP server
[publicinbox "git"]
    ...
    newsgroup = inbox.a.b.c
    nntpmirror = nntp://czquwvybam4bgbro.onion/
    nntpmirror = hjrcffqmbrq6wope.onion
    ; there may be a mirror on a different server with a
    ; different name:
    nntpmirror = nntp://news.example.com/differently.named.group
; (And I really need to write manpages for all this...)
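A rough sketch of how the values above could be expanded into footer links. This is an illustration in Python, not the project's code: the helper name nntp_urls is hypothetical, and it assumes that a bare host name means plain NNTP and that the inbox's newsgroup is appended only when the configured URL names no group of its own.

```python
from urllib.parse import urlsplit


def nntp_urls(entries, newsgroup=None):
    """Expand nntpserver/nntpmirror values into full URLs (hypothetical helper)."""
    urls = []
    for entry in entries:
        # Assumption: a value without a scheme is a plain NNTP host name.
        if "://" not in entry:
            entry = "nntp://" + entry
        parts = urlsplit(entry)
        # Append the local newsgroup only when the URL carries no group itself.
        if parts.path in ("", "/") and newsgroup:
            entry = f"{parts.scheme}://{parts.netloc}/{newsgroup}"
        urls.append(entry)
    return urls
```

With the "git" section above, the two .onion entries would gain the inbox.a.b.c group, while the news.example.com mirror would keep its own group name.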
Eric Wong [Thu, 11 Aug 2016 00:23:48 +0000 (00:23 +0000)]
search: support alt-ID for mapping legacy serial numbers
For some existing mailing list archives, messages are identified
by serial number (such as NNTP article numbers in gmane). Those
links may become inaccessible (as is the current case for
gmane), so ensure users can still search based on old serial
numbers.
Now, I run the following periodically to get article numbers
from gmane (while news.gmane.org remains):
; relative pathnames expand to $mainrepo/public-inbox/$file
altid = serial:gmane:file=gmane.sqlite3
And run "public-inbox-index --reindex /path/to/git.vger.git"
periodically.
This ought to allow searching for "gmane:12345" to work for
Xapian-enabled instances.
Disclaimer: while public-inbox supports NNTP and stable article
serial numbers, use of those for public links is discouraged
since it encourages centralization.
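To make the altid value concrete, here is a minimal Python sketch of how such a spec could be split apart. The function name parse_altid is hypothetical, and the sketch assumes the only option is file=, as in the example above; relative paths are expanded under $mainrepo/public-inbox/ as the config comment describes.

```python
import posixpath


def parse_altid(spec, mainrepo):
    """Split "type:prefix:file=PATH" into its parts (hypothetical helper)."""
    kind, prefix, option = spec.split(":", 2)
    key, _, path = option.partition("=")
    if key != "file" or not path:
        raise ValueError("expected file=PATH in altid spec: " + spec)
    # Relative pathnames expand to $mainrepo/public-inbox/$file.
    if not posixpath.isabs(path):
        path = posixpath.join(mainrepo, "public-inbox", path)
    return kind, prefix, path
```

The prefix ("gmane" here) is what a user would type before the serial number in a search such as "gmane:12345".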
Eric Wong [Tue, 9 Aug 2016 23:59:10 +0000 (23:59 +0000)]
searchidx: allow searching Message-IDs in free-form text
It is not unheard of for users to attempt finding messages by
entering Message-IDs into the "Search" box instead of using the
existing URL structure. So make it possible for them.
FWIW, I've definitely encountered users who enter entire URLs
into generic search engines.
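One way to picture the idea is pulling Message-ID-like tokens out of free-form text so they can be matched as search terms. The Python sketch below is an assumption about the general approach, not the indexing code from this commit; the regular expression and the function name message_ids are hypothetical, and the Message-ID in the usage note is made up.

```python
import re

# Assumption: anything shaped like local@domain, with optional angle
# brackets, is treated as a candidate Message-ID.
MID_RE = re.compile(r"<?([^\s<>@]+@[^\s<>@]+)>?")


def message_ids(text):
    """Return Message-ID-like tokens found in free-form text."""
    return [m.group(1) for m in MID_RE.finditer(text)]
```

For instance, message_ids("patch <20160809235910.GA1234@example.com> review") would pick out only the bracketed token.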
Eric Wong [Tue, 9 Aug 2016 00:41:37 +0000 (00:41 +0000)]
searchidx: avoid holding Xapian lock in cat-file
We must ensure the cat-file process is launched before Xapian
grabs its lock, too. Our use of "git cat-file --batch" has
the same problem as "git log" did (which was fixed in
commit 3713c727cda431a0dc2865a7878c13ecf9f21851,
"searchidx: release Xapian FDs before spawning git log").
Eric Wong [Sat, 6 Aug 2016 01:58:47 +0000 (01:58 +0000)]
mbox: be fair to other HTTP clients
At least for public-inbox-httpd, this allows us to avoid having
a client monopolize one event loop tick of the server for too
long. It hurts throughput for the /all.mbox.gz endpoint, but I
doubt anybody cares and the latency improvement for other
clients would be appreciated.
We already do the same fairness thing for HTML pages.
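The fairness idea can be sketched as a round-robin over per-client chunk streams: each client gets one chunk per turn and then goes to the back of the line. This is a simplified Python model, not the public-inbox-httpd event loop; serve_fairly and the client names are hypothetical.

```python
from collections import deque


def serve_fairly(responses):
    """Round-robin over per-client chunk iterators, one chunk per turn."""
    queue = deque(responses.items())
    order = []
    while queue:
        client, chunks = queue.popleft()
        try:
            chunk = next(chunks)
        except StopIteration:
            continue  # this client's response is complete
        order.append((client, chunk))
        queue.append((client, chunks))  # back of the line for the next turn
    return order
```

A large mbox download emits the same chunks as before, but a short HTML response queued behind it is served after the first mbox chunk instead of after the last one.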
Eric Wong [Fri, 5 Aug 2016 22:07:25 +0000 (22:07 +0000)]
search: disable batching in newer versions of Xapian, for now
This warrants further investigation, but it appears we cannot
release Xapian reliably after forking "git log" due to the
lack of a close-on-exec flag on the Xapian flintlock FD.
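For readers unfamiliar with the flag: a descriptor without close-on-exec stays open in any program the process later execs, which is how a spawned "git log" could end up holding a lock FD. The Python sketch below only demonstrates the flag on an ordinary pipe (Unix only); it does not touch Xapian, and the helper names are hypothetical.

```python
import fcntl
import os


def has_cloexec(fd):
    """True if fd will be closed automatically across exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd):
    """Set the close-on-exec flag so exec'd children do not inherit fd."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


r, w = os.pipe()
os.set_inheritable(r, True)  # simulate an FD opened without the flag
set_cloexec(r)               # now a child started via exec will not see it
```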
Eric Wong [Fri, 5 Aug 2016 18:10:50 +0000 (18:10 +0000)]
view: use <hr> to delineate in /$MID/T/ view
The sacrifice in vertical space might be worth it to improve
ease-of-reading, as it's unreasonable to expect an entire
message thread to be able to fit into a single window.
Eric Wong [Thu, 4 Aug 2016 21:58:40 +0000 (21:58 +0000)]
view: do not fail on empty In-Reply-To
Sometimes messages have an empty In-Reply-To header which throws
threaders off. This actually causes public-inbox-httpd to die,
which is probably bad and will be fixed elsewhere.
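A defensive parse along these lines treats a missing, empty, or whitespace-only header as "no parent" rather than handing an empty string to the threader. This is a hedged Python sketch of the idea, not the view code; in_reply_to is a hypothetical helper.

```python
import re


def in_reply_to(header):
    """Return the first Message-ID in an In-Reply-To value, or None."""
    if not header:
        return None
    match = re.search(r"<([^<>\s]+)>", header)
    # An empty or malformed header yields None so callers can skip it.
    return match.group(1) if match else None
```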
Eric Wong [Tue, 2 Aug 2016 10:02:54 +0000 (10:02 +0000)]
searchmsg: add git object ID to doc_data
Doing git tree lookups based on the SHA-1 of the Message-ID
is expensive as trees get larger; instead, use the SHA-1
object ID directly. This drastically reduces the amount
of time spent in the "git cat-file --batch" process for
fetching the /$INBOX/all.mbox.gz endpoint on the ~800MB
git@vger.kernel.org mirror.
This retains backwards compatibility and allows existing
indices to be transparently upgraded without performance
degradation.
Eric Wong [Tue, 2 Aug 2016 01:47:55 +0000 (01:47 +0000)]
wwwstream: prioritize search in top title bar
Search is probably more useful, so users should be able to select
it sooner. Put it on its own line so it won't get scrolled off
the edge for non-CSS users.
Fix a minor spacing bug in the input tag while we're at it, too.
Eric Wong [Tue, 2 Aug 2016 01:47:54 +0000 (01:47 +0000)]
daemon: do not chdir unless daemonizing
Most process managers (e.g. systemd) should already start us
in '/'. So avoid making our daemon more complex to run by
requiring absolute paths during development.
Eric Wong [Sun, 31 Jul 2016 00:02:06 +0000 (00:02 +0000)]
search: support reindexing existing search indices
This should make tweaking the way we search more efficient
by allowing us to avoid destroying the index every
time we want to change something.
We also give priority to incremental indexing via
public-inbox-{watch,mda} and have manual invocations of
public-inbox-index perform batch updates while releasing
ssoma.lock.
Eric Wong [Sun, 31 Jul 2016 00:02:05 +0000 (00:02 +0000)]
msgmap: fix use of transactions
We want transactions to be the responsibility of the
caller when possible; this fixes the potential for
the msgmap to internally become inconsistent when
using it from inside searchidx.
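The principle of leaving transactions to the caller can be sketched as follows, assuming an SQLite-backed map. The table layout and function names here are hypothetical and written in Python for illustration: the low-level insert never begins or commits anything, so the outer indexing step decides when a whole batch becomes visible.

```python
import sqlite3

# isolation_level=None disables Python's implicit transactions, so the
# BEGIN/COMMIT/ROLLBACK statements below are the only ones issued.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute(
    "CREATE TABLE msgmap ("
    "num INTEGER PRIMARY KEY AUTOINCREMENT, mid TEXT UNIQUE NOT NULL)"
)


def mid_insert(db, mid):
    """Insert one Message-ID; transaction control is left to the caller."""
    return db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,)).lastrowid


def index_batch(db, mids):
    """Caller-owned transaction: either every mid is mapped, or none is."""
    db.execute("BEGIN")
    try:
        nums = [mid_insert(db, mid) for mid in mids]
        db.execute("COMMIT")
        return nums
    except Exception:
        db.execute("ROLLBACK")
        raise
```

If one insert in a batch fails, the rollback leaves the map exactly as it was before the batch started.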
Eric Wong [Sat, 30 Jul 2016 23:33:11 +0000 (23:33 +0000)]
t/config_limiter: fix check for identical Git object
If we completely undef an object, a new object may end up
with the same scalar address as the original object
even though they are different. So keep the same object
around and only force creation of the same reference.
Tested on Perl 5.14.2 on Debian 7.x wheezy.
Eric Wong [Fri, 29 Jul 2016 18:58:51 +0000 (18:58 +0000)]
daemon: re-enable SIGWINCH without setsid
This allows systemd users to use SIGWINCH to temporarily
(and gracefully) stop an instance of a service without
doing a code reload to bring it back up:
# start temporary new service code
systemctl start public-inbox-nntpd@2.service
# momentarily paralyze original service
systemctl kill -s WINCH public-inbox-nntpd@1.service
if new_code_at_2_sucks
then
    # restart original workers
    systemctl kill -s HUP public-inbox-nntpd@1.service
else # new is better than old, replace original instance
    systemctl restart public-inbox-nntpd@1.service
fi
# cleanup the temporary service
systemctl stop public-inbox-nntpd@2.service
Eric Wong [Thu, 21 Jul 2016 01:23:03 +0000 (01:23 +0000)]
view: split up --cc args for git-send-email
Having long Cc: lines is inevitable for large threads
with many participants, and git-send-email only gained
the ability to recognize ',' in the "--cc" arg recently
with the release of git v2.6.0 in September 2015.
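The resulting command line can be sketched as one --cc argument per recipient, which also works with git-send-email versions that do not split on ','. This Python helper is hypothetical and the addresses in the usage note are placeholders.

```python
def send_email_args(recipients):
    """Build a git-send-email argv with one --cc argument per address."""
    args = ["git", "send-email"]
    for addr in recipients:
        args.append("--cc=" + addr)
    return args
```

For example, send_email_args(["a@example.com", "b@example.com"]) gives two separate --cc= arguments rather than a single comma-joined one.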
Eric Wong [Thu, 21 Jul 2016 01:23:02 +0000 (01:23 +0000)]
www: label sections and hopefully improve navigation
Clearly label "Thread overview" and "Reply instructions"
so users can quickly skip stuff they're not interested in.
Additionally, note the fact the thread view allows quick
navigation within the thread to avoid extra network requests
and improve the display for single-message threads.
Finally, use <hr> to better-delineate sections of each page.
Eric Wong [Sun, 17 Jul 2016 23:27:02 +0000 (23:27 +0000)]
extmsg: favor user-provided URL on partial matches
While an inbox may have multiple URLs, we will favor
the existing URL for the current inbox on partial matches
to avoid confusing users or slowing them down by requiring
a new TCP connection.
Eric Wong [Sat, 9 Jul 2016 07:53:17 +0000 (07:53 +0000)]
view: improve grouping for topic view
This reduces the amount of mbox/Atom links while keeping
better track of overall thread count. We no longer loop
to fill up slots to simplify the code a bit and hopefully
get better grouping.
Eric Wong [Sat, 9 Jul 2016 04:51:37 +0000 (04:51 +0000)]
httpd/async: reinstate D::S timer usage for cleanup
EvCleanup::asap events are not guaranteed to run after
Danga::Socket closes sockets at the event loop. Thus we
must use slower Danga::Socket timers which are guaranteed
to run at the end of the event loop.
Eric Wong [Sat, 9 Jul 2016 04:51:36 +0000 (04:51 +0000)]
httpd/async: do not attempt future writes on closed sockets
Danga::Socket::close does not clear the write_buf_size field,
so it's conceivable we could attempt to queue up data and
callbacks we can never flush out.
Eric Wong [Sat, 9 Jul 2016 03:18:35 +0000 (03:18 +0000)]
www: add configurable limiters
Currently only for git-http-backend use, this allows limiting
the number of spawned processes per-inbox or by group, if there
are multiple large inboxes amidst a sea of small ones.
For example, a "big" repo limiter could be used for big inboxes,
which would be shared between multiple repos:
[limiter "big"]
    max = 4
[publicinbox "git"]
    address = git@vger.kernel.org
    mainrepo = /path/to/git.git
    ; shared limiter with giant:
    httpbackendmax = big
[publicinbox "giant"]
    address = giant@project.org
    mainrepo = /path/to/giant.git
    ; shared limiter with git:
    httpbackendmax = big
; This is a tiny inbox, use the default limiter with 32 slots:
[publicinbox "meta"]
    address = meta@public-inbox.org
    mainrepo = /path/to/meta.git
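The sharing behaviour in this config can be modelled as a counting limiter that several inboxes point at by name. The Python sketch below is a simplified model, not the qspawn implementation; the Limiter class and assign_limiters are hypothetical, and it assumes httpbackendmax always names a [limiter] section.

```python
from collections import deque

DEFAULT_MAX = 32  # default slot count mentioned in the config comment


class Limiter:
    """At most `max` jobs run at once; the rest wait in FIFO order."""

    def __init__(self, max_slots=DEFAULT_MAX):
        self.max = max_slots
        self.running = 0
        self.waiting = deque()

    def start(self, job):
        """Return True if job may run now, else queue it and return False."""
        if self.running < self.max:
            self.running += 1
            return True
        self.waiting.append(job)
        return False

    def finish(self):
        """Release a slot and hand it to the oldest waiting job, if any."""
        self.running -= 1
        if self.waiting:
            self.running += 1
            return self.waiting.popleft()
        return None


def assign_limiters(inboxes, named):
    """Map each inbox to its named limiter, or to one shared default."""
    default = Limiter()
    return {
        name: named.get(cfg.get("httpbackendmax"), default)
        for name, cfg in inboxes.items()
    }
```

With the config above, "git" and "giant" would share the same four slots, so a fifth clone of either waits, while "meta" draws from the separate 32-slot default.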
Eric Wong [Sat, 9 Jul 2016 03:18:34 +0000 (03:18 +0000)]
qspawn: allow configurable limiters
And bump the default limit to 32 so we match git-daemon
behavior. This shall allow us to configure different levels
of concurrency for different repositories and prevent clones
of giant repos from stalling service to small repos.
Eric Wong [Sat, 9 Jul 2016 00:00:11 +0000 (00:00 +0000)]
nntp: return if a client drops on us
Danga::Socket::write will set the closed flag on a socket,
automatically, and we do not need to bring down an entire
server when one client breaks the connection :P
Eric Wong [Thu, 7 Jul 2016 01:39:37 +0000 (01:39 +0000)]
www: remove old footer generation code and normalize new.html
We now generate all of our HTML using WwwStream which
forces us to have consistent headers and footers in
the HTML itself.
This also makes the search-capable vs search-less installs
go to the new.html endpoint to maintain consistency
(in case an admin decides to enable Xapian).
Eric Wong [Thu, 7 Jul 2016 01:39:36 +0000 (01:39 +0000)]
inbox: cleanup and consolidate object weakening
This fixes some layering violations and consolidates
the cleanup into the inbox object itself. Keep in
mind that weakening does not work at all without our PSGI
server.
Eric Wong [Wed, 6 Jul 2016 02:32:07 +0000 (02:32 +0000)]
www: use HTML <hr> instead of XHTML <hr />
We only need XHTML-compatibility inside Atom feeds, as
anecdotally, feed readers are stricter than normal browsers and
some do not support HTML, only XHTML. So we will continue to
accommodate them. However we favor HTML elsewhere since it
tends to be smaller than the equivalent well-formed XHTML.
Eric Wong [Wed, 6 Jul 2016 01:21:17 +0000 (01:21 +0000)]
extmsg: disable automatic inbox switching
Automatic inbox switching is a potentially deceptive pattern
that surprises readers who do not check the URL bar closely.
Furthermore, a message could be cross-posted to multiple lists,
too.
Eric Wong [Wed, 6 Jul 2016 00:36:59 +0000 (00:36 +0000)]
address: attempt to handle comments somewhat
They're uncommon, fortunately, but we make no attempt to
handle nested comments (which would open us up to things
like CVE-2015-7686) or use the comment in place of a
missing name.
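A single-pass, non-nested approach might look like the Python sketch below: only comments with no parentheses inside them are dropped, and nested ones are deliberately left partially in place instead of being parsed recursively. The pattern and the name strip_comments are hypothetical, not the project's address parser.

```python
import re

# Matches one parenthesized comment that contains no other parentheses.
COMMENT_RE = re.compile(r"\([^()]*\)")


def strip_comments(value):
    """Drop simple (non-nested) comments and collapse leftover whitespace."""
    return " ".join(COMMENT_RE.sub(" ", value).split())
```

Given "user@example.com (A User)" this returns just the address, and the comment text is discarded rather than reused as a display name.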
Eric Wong [Tue, 5 Jul 2016 13:05:41 +0000 (13:05 +0000)]
daemon: disable USR2/TTIN/TTOU/WINCH in workers
If using a master/worker setup, a careless user could be trying
to signal all processes using "killall". This may trigger bad
side-effects; but try to limit the side-effects as much as
possible.
Eric Wong [Sat, 2 Jul 2016 22:57:29 +0000 (22:57 +0000)]
wwwstream: wording/grammar tweaks in trailer
git.git documentation uses "clonable" so that's probably
a better term than "clone-able". Also, shorten the section
for retrieving our code and remove an obvious typo.
Eric Wong [Fri, 1 Jul 2016 02:09:45 +0000 (02:09 +0000)]
git: allow cloning from the URL root, too
This means we can still show non-git users a somewhat browseable
URL with a link to the README.html file while allowing git users
to type less when cloning.