Eric Wong [Sat, 19 Sep 2015 02:03:38 +0000 (02:03 +0000)]
nntp: fix ARTICLE/HEAD/BODY/STAT
Article number is optional, but we need to update the
article number of the client connection if it was specified
(but not if it was given a Message-ID) as stipulated by
RFC 977.
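The pointer rule described above can be sketched as follows. This is a minimal illustration in Python, not the project's Perl code; `Client`, `resolve_article`, `known_numbers` and `mid_to_num` are hypothetical names chosen for the example.

```python
import re


class Client:
    """Hypothetical per-connection state; names are illustrative only."""

    def __init__(self):
        self.article = None  # current article pointer for this connection


def resolve_article(client, arg, known_numbers, mid_to_num):
    """Return the article number to serve, or None if it cannot be found.

    known_numbers: set of article numbers in the selected group (assumed)
    mid_to_num: dict mapping "<Message-ID>" to an article number (assumed)
    """
    if arg is None:
        # No argument: use the current pointer and leave it unchanged.
        return client.article
    if re.fullmatch(r"\d+", arg):
        num = int(arg)
        if num not in known_numbers:
            return None
        client.article = num  # numeric argument: update the pointer
        return num
    # Message-ID argument: look it up, but do not touch the pointer.
    return mid_to_num.get(arg)
```

Only the numeric branch writes to `client.article`, so a later command without an argument continues from the last article selected by number.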
Eric Wong [Sat, 19 Sep 2015 02:03:34 +0000 (02:03 +0000)]
nntp: speed up XHDR for the Message-ID case
We can use our msgmap database to implement "XHDR Message-ID [range]"
commands quickly. This helps immensely with slrnpull, which prefers
XHDR to LISTGROUP for some reason...
Eric Wong [Sat, 19 Sep 2015 02:03:30 +0000 (02:03 +0000)]
nntp: introduce long response API for streaming
XOVER, NEWNEWS, and XHDR responses may be arbitrarily long and cause
excessive memory usage via buffering. This problem would exist even if we
were to optimize them by caching headers for fast retrieval and
search.
Introduce a generic API to handle all of these commands fairly
without triggering excessive buffering and unfairness of the
existing implementation.
Generating these responses is still expensive for now, but at least
the implementation is fair to other clients and prevents one client
from using one of these commands to monopolize resources away from
other clients.
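One way to picture the fairness goal is a generator that produces a long response in small batches, with a scheduler that rotates between clients. This is only a sketch in Python under assumed names (`long_response`, `serve_round_robin`, `fetch_line`); it does not describe the actual API introduced by this commit.

```python
from collections import deque


def long_response(first, last, fetch_line, batch=2):
    """Yield a long response in small batches instead of one big buffer."""
    cur = first
    while cur <= last:
        end = min(cur + batch - 1, last)
        yield [fetch_line(n) for n in range(cur, end + 1)]
        cur = end + 1


def serve_round_robin(responses):
    """Interleave batches from several clients so none monopolizes the loop.

    responses: dict of client name -> generator (hypothetical structure)
    Returns the order in which (client, batch) pairs were written.
    """
    queue = deque(responses.items())
    written = []
    while queue:
        name, gen = queue.popleft()
        batch = next(gen, None)
        if batch is None:
            continue  # this client's response is complete
        written.append((name, batch))
        queue.append((name, gen))  # requeue behind the other clients
    return written
```

Each client holds at most one batch in memory at a time, and a client asking for a huge range has to wait its turn after every batch.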
Eric Wong [Tue, 15 Sep 2015 01:08:03 +0000 (01:08 +0000)]
extmsg: wire up to use msgmap for prefixes
DBI + DBD::SQLite has much better handling of prefix lookups
than Xapian. While we're at it, avoid linking blatantly wrong
Message-IDs to external services.
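A prefix lookup against SQLite might look like the sketch below. It uses Python's sqlite3 module rather than DBI + DBD::SQLite, and the `msgmap` table layout shown here (columns `num` and `mid`) is an assumption made for the example, not the project's confirmed schema.

```python
import sqlite3


def find_by_prefix(db, prefix, limit=10):
    """Return Message-IDs starting with prefix, using an escaped LIKE."""
    # Escape LIKE wildcards so '%' and '_' in the prefix match literally.
    esc = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = db.execute(
        "SELECT mid FROM msgmap WHERE mid LIKE ? ESCAPE '\\' "
        "ORDER BY mid LIMIT ?",
        (esc + "%", limit),
    )
    return [row[0] for row in cur]


# Hypothetical schema and sample data, for illustration only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE msgmap (num INTEGER PRIMARY KEY, mid TEXT UNIQUE NOT NULL)"
)
db.executemany(
    "INSERT INTO msgmap (mid) VALUES (?)",
    [
        ("20150915010803.GA1@example.org",),
        ("20150915010802.GA2@example.org",),
        ("2015_x@example.org",),
        ("other@example.org",),
    ],
)
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, which may or may not be desirable for Message-IDs.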
Eric Wong [Tue, 15 Sep 2015 01:08:02 +0000 (01:08 +0000)]
searchidx: sync Msgmap database along with Xapian
We can avoid duplicating work of extracting messages from git if we
tie this to Xapian. Of course, this ties the two features together,
but it's probably reasonable to expect that anybody who wants to use
public-inbox to serve messages to front-end users will have both.
Eric Wong [Mon, 14 Sep 2015 00:04:30 +0000 (00:04 +0000)]
searchview: do not link Atom feed by relevance
Atom feeds only make sense when sorted by time, not when our
search indexing rules change and affect relevance. So do not
include the relevance option when linking to Atom feeds.
However, we shall still honor the 'r' query parameter in case
somebody wants to manually include that in the URL for
testing/experimental purposes. We simply will not advertise
it.
Eric Wong [Sat, 12 Sep 2015 01:23:32 +0000 (01:23 +0000)]
searchview: support displaying entire threads
This hopefully makes it easy to perform queries to display
an entire thread. Raise the limit in the threaded view to
display more results and hopefully improve the output of
thread display.
Eric Wong [Mon, 7 Sep 2015 07:02:20 +0000 (07:02 +0000)]
view: change References link to expand thread
The expanded thread view is generally more useful. Having links
to more links at the bottom seems to be a waste of navigation time.
However, keep the '#r' anchor in case people rely on it for
links.
Eric Wong [Sat, 5 Sep 2015 08:00:12 +0000 (08:00 +0000)]
search: tweak parsing for internal queries
We should not need to use QueryParser for internal queries,
but rather for external ones.
We'll also be exposing searching Message-IDs with the "mid:" prefix
for broken mids on some servers, and enabling partial searching
with 'm' to help with URL truncations.
Since thread IDs may be volatile, they cannot be exposed to the
public, so there's no reason to expose them to the query parser,
either.
Also, add 's:' as an alternative probabilistic prefix to 'subject'
as it is shorter.
Eric Wong [Sat, 5 Sep 2015 05:54:26 +0000 (05:54 +0000)]
searchview: improve footer navigation
Allow navigating backwards and forwards, as some pages will be
bookmarked or some browsers may not have history. Also add a
link back to the index where they presumably came from.
While we're at it, limit the number of results we have to 25
for now to avoid making the page too big and wasting clients'
memory for irrelevant results.
Eric Wong [Fri, 4 Sep 2015 08:49:29 +0000 (08:49 +0000)]
www: extra redirects for the '/'-challenged
Omitting a slash should not be fatal if unambiguous. Add
fallbacks so users who expect a directory structure-like
experience can have it at the cost of one extra HTTP
request/response pair.
Eric Wong [Fri, 4 Sep 2015 05:33:17 +0000 (05:33 +0000)]
view: reduce redundant attributions in permalink refs
No point in repeating authorship when PATCH messages are
threaded and it's obvious from the top message who the author
is of the series:
[this message] - John Smith @ 2015-09-04 00:04:20 UTC
` [PATCH 1/4] view: eliminate redundant [threaded|flat] link
` [PATCH 2/4] view: one line for thread subjects
` [PATCH 3/4] view: adjust spacing and indentation of index threads
` [PATCH 4/4] view: add missing newline to inline dump
Eric Wong [Fri, 4 Sep 2015 02:18:09 +0000 (02:18 +0000)]
view: avoid attempting to find "subject dummy"
This is an internal Message-ID used by Mail::Thread to group
messages with identical subjects but no common parent. Don't
attempt to redirect users to external sites when we cannot
find it.
Eric Wong [Fri, 4 Sep 2015 02:18:06 +0000 (02:18 +0000)]
index: use message threading if search is available
This lets us merge topics with different subjects with a common parent
(common in "[PATCH 0/X]" threads). This also lets us avoid forking for
the HTML index page, too.
Eric Wong [Thu, 3 Sep 2015 08:28:54 +0000 (08:28 +0000)]
www: move fallback after legacy matches
We do not want to get legacy URLs swallowed up by our workaround
for weird and wonky servers that attempt to unescape PATH_INFO
before the app sees it.
Eric Wong [Thu, 3 Sep 2015 04:23:21 +0000 (04:23 +0000)]
www: attempt to handle Message-IDs with slashes
Unfortunately, some HTTP servers will try to be clever
with %2F and unescape it to '/', making life difficult for
us. Fortunately, not many Message-IDs have slashes in
them.
Eric Wong [Thu, 3 Sep 2015 03:00:28 +0000 (03:00 +0000)]
get rid of Message-ID compression entirely
Provide a fallback for legacy SHA-1 messages, but do not
advertise shorter URLs anymore for data portability concerns.
This fixes a regression introduced in
commit 81a9c1b476987d845b340ab9013d26cf4487cb9a
("search: disable Message-ID compression in Xapian")
which ended up breaking thread-related endpoints for
large Message-IDs, as lookups on the SHA-1 message no longer
worked.
Eric Wong [Thu, 3 Sep 2015 01:57:11 +0000 (01:57 +0000)]
search: disable Message-ID compression in Xapian
We'll continue to compress long Message-IDs in URLs (which we know
about), but we will store entire Message-IDs in the Xapian database
to facilitate ease-of-lookups in external databases.
Eric Wong [Wed, 2 Sep 2015 02:37:23 +0000 (02:37 +0000)]
implement external Message-ID finder
Currently, this looks at other public-inbox configurations
served in the same process. In the future, it will generate
links to other Message-ID lookup endpoints.
Eric Wong [Wed, 2 Sep 2015 02:37:22 +0000 (02:37 +0000)]
view: avoid links to unknown compressed Message-IDs
Compressed Message-IDs are irreversible and may not be used
at other sites. So avoid compressing Message-IDs we do not
know about so users have a chance of finding the message in
other archives by doing a Message-ID lookup.
Eric Wong [Wed, 2 Sep 2015 02:37:17 +0000 (02:37 +0000)]
view: close possible race condition in thread view
It's possible that the Xapian index and git HEAD can be out-of-sync
and a message which existed when we did the search is no longer
accessible by the time we get to rendering it.
Eric Wong [Tue, 1 Sep 2015 08:55:28 +0000 (08:55 +0000)]
view: more robust link generation
We must avoid double-escaping in cases where we have URLs anchored
by "<>" in the plain-text, as is the common (and AFAIK recommended)
convention. So we must use a two-step linkification process
to prevent double-escaping.
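A two-step approach of this kind can be sketched as follows: first pull the raw URLs out of the unescaped text, then escape everything exactly once and put the links back. This is a Python illustration under assumptions (the `URL_RE` pattern, the NUL-delimited placeholders, and the `linkify` name are all invented for the example, and the input is assumed to contain no NUL bytes); it is not the project's implementation.

```python
import html
import re

# Simplified URL pattern for illustration; stops at whitespace and '<' or '>'.
URL_RE = re.compile(r"https?://[^\s<>]+")


def linkify(text):
    """Escape plain text as HTML and link URLs without double-escaping."""
    urls = []

    def stash(m):
        urls.append(m.group(0))
        return "\x00%d\x00" % (len(urls) - 1)

    # Step 1: replace raw URLs with placeholders in the unescaped text.
    marked = URL_RE.sub(stash, text)
    # Step 2: escape the remaining text once, then restore the links.
    escaped = html.escape(marked)

    def restore(m):
        url = html.escape(urls[int(m.group(1))])
        return '<a href="%s">%s</a>' % (url, url)

    return re.sub(r"\x00(\d+)\x00", restore, escaped)
```

Because the URL is matched before any escaping happens, the surrounding "<>" become `&lt;` and `&gt;` in the output while the `&` inside the URL is escaped only once.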
Eric Wong [Tue, 1 Sep 2015 08:55:18 +0000 (08:55 +0000)]
search: reduce redundant doc data
Redundant document data increases our database size. Pull the
smsg->mid off the unique term, the smsg->ts off the value, and
only generate the formatted display date off smsg->ts.
Eric Wong [Sun, 30 Aug 2015 10:12:54 +0000 (10:12 +0000)]
www: avoid BEGIN block for config loading
It fails the syntax check if a user does not have
~/.public-inbox/config set up. Anyway, we can safely
use ||= on a global since we do not support threads.
Eric Wong [Sun, 30 Aug 2015 01:51:22 +0000 (01:51 +0000)]
view: remove "threadlink" from thread view
We're already inside the thread, and our thread summary inside
/m/$MESSAGE_ID/ is already sufficient to navigate back to the
/t/$MESSAGE_ID/ page. So I think it's sufficient to keep the
/t/$MESSAGE_ID/ page lighter with fewer links and avoid
introducing strange terminology.
In contrast, "permalink" is relatively well-known and
not an alien term to readers:
Eric Wong [Sun, 30 Aug 2015 01:26:46 +0000 (01:26 +0000)]
mid2path: clean MID of angle brackets '<>'
We screwed up and needed to fix URL generation with '<>'
in them. Regardless, users may attempt to copy and paste
URLs with '<>' in them; do not punish them for that.
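The cleanup amounts to tolerating an optional pair of angle brackets around the Message-ID. A minimal Python sketch, assuming the value has already been URL-decoded and using a hypothetical `mid_clean` helper written for this example:

```python
import re


def mid_clean(mid):
    """Strip surrounding whitespace and one optional pair of angle brackets."""
    mid = mid.strip()
    m = re.fullmatch(r"<?([^<>]+)>?", mid)
    # Leave unrecognized input untouched rather than guessing.
    return m.group(1) if m else mid
```

With this, both the bracketed and the bare form of a Message-ID resolve to the same lookup key.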
Eric Wong [Sun, 30 Aug 2015 01:04:31 +0000 (01:04 +0000)]
public-inbox-index: resolve git directory if run inside one
I often forget to pass the correct path to a git directory
or run from inside one. Fortunately git is script-friendly
and allows easily resolving the correct GIT_DIR path.
Eric Wong [Sun, 30 Aug 2015 00:38:05 +0000 (00:38 +0000)]
search: do not index references and inreplyto terms
We no longer need them, as we can rely on index-time thread
resolution and thread merging. This allows us to index less
data and hopefully increase efficiency.