Eric Wong [Fri, 23 Jun 2017 02:25:33 +0000 (02:25 +0000)]
reply: handle address obfuscation :<
We can show users a lightly-obfuscated Bourne shell command
for invoking "git send-email" for address obfuscation. However,
I'm not sure if the mailto: arg will work, since URL encoding
is probably too well-known to be effective.
Eric Wong [Fri, 23 Jun 2017 01:43:21 +0000 (01:43 +0000)]
searchidx: fallback to lookup on pre-set article numbers
Yet another hiccup from reusing pre-set article numbers on
various ruby-lang.org mailing lists. This was causing messages
to not appear to NNTP readers which use XOVER.
Eric Wong [Fri, 23 Jun 2017 00:47:23 +0000 (00:47 +0000)]
watchmaildir: deal with rejected (100) messages
The RubyLang filter is strict about what messages it accepts, so
the spam learning path will not auto-train or remove messages
missing X-Mail-Count headers.
Eric Wong [Wed, 21 Jun 2017 23:33:49 +0000 (23:33 +0000)]
add filter for RubyLang lists
Unfortunately, it appears we have to reject this and instead add
support for filtering at View time(*), due to DKIM signatures in
messages from ruby-lang.org.
Eric Wong [Fri, 16 Jun 2017 02:03:32 +0000 (02:03 +0000)]
view: implement optional address obfuscation
This is lightly-tested and seems to work. I'm still
hesitant to support this, but the alternative of receiving death
threats for displaying unobfuscated addresses does not seem
worth it.
Eric Wong [Wed, 14 Jun 2017 00:14:47 +0000 (00:14 +0000)]
searchidx: switch to accounting by message bytes
Xapian memory usage is tied to the size of the indexed
text, so take the raw message size into account when
deciding when to flush Xapian data.
More importantly, we now flush Xapian before we have it
buffer beyond our maximum; and we do it unconditionally
to prevent even high priority processes from OOM-ing.
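A minimal sketch of byte-based flush accounting, in Python rather
than the project's Perl. The class name, the max_bytes budget and the
flush callback are all hypothetical stand-ins; the real indexer and
its limits are not shown in this log.

```python
class BatchIndexer:
    """Toy indexer that flushes by buffered message bytes, not message count."""

    def __init__(self, max_bytes, flush):
        self.max_bytes = max_bytes  # hypothetical budget, not the real limit
        self.flush = flush          # callback standing in for a Xapian flush
        self.buffered = 0

    def add(self, raw_message: bytes):
        size = len(raw_message)
        # Flush *before* this message would push the buffer past the budget.
        if self.buffered and self.buffered + size > self.max_bytes:
            self.flush()
            self.buffered = 0
        self.buffered += size
        # ... indexing of raw_message would happen here ...

    def done(self):
        # Flush whatever is left at the end of a batch.
        if self.buffered:
            self.flush()
            self.buffered = 0
```

The check happens before the new message is counted, so the buffer
only exceeds the budget when a single message is larger than the
budget by itself.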
Eric Wong [Fri, 12 May 2017 18:49:32 +0000 (18:49 +0000)]
filter/subjecttag: account for missing Subject: header
A missing Subject: header is a strong indicator of spam (which is
out-of-scope for this particular module), but sometimes people
legitimately forget to set a Subject: header at all.
Eric Wong [Tue, 23 May 2017 23:07:24 +0000 (23:07 +0000)]
searchview: retry queries if uri_unescape-able
It is possible to have double-escaped queries when copying and
pasting into browsers, so try to help users work around this
common error by automatically retrying after unescaping once.
Of course, we must inform the user when doing this results in
success, in case they really meant to search for a
double-escaped term which resulted in nothing.
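The retry described above could look roughly like this Python
sketch. search_with_retry and the run_query callback are hypothetical
names; only urllib.parse.unquote is a real library call. The second
return value tells the caller when to inform the user that the
unescaped query was the one that matched.

```python
from urllib.parse import unquote


def search_with_retry(query, run_query):
    """Return (results, retried_query).

    retried_query is None unless the original query found nothing and
    the once-unescaped query found something.
    """
    results = run_query(query)
    if results:
        return results, None
    unescaped = unquote(query)
    if unescaped == query:
        # Nothing to unescape, so there is no point in retrying.
        return results, None
    results = run_query(unescaped)
    return results, (unescaped if results else None)
```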
Eric Wong [Sun, 7 May 2017 10:49:00 +0000 (10:49 +0000)]
searchidx: fix ghost root vivification
Due to the asynchronous nature of SMTP, it is possible for the
root message of a thread (with no References/In-Reply-To)
to arrive last in a series. We must preserve the thread_id
of the ghost message in this case, as we do when vivifying
non-root ghosts.
Otherwise, this causes threads to be broken when the root
arrives last.
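A toy model of the fix, in Python. ThreadIndex and its fields are
hypothetical and much simpler than the real search index (it does not
merge threads, for instance); it only illustrates that a root which
arrives last must reuse the thread_id already given to its ghost.

```python
import itertools


class ThreadIndex:
    def __init__(self):
        self.docs = {}  # Message-ID -> {"thread_id": int, "ghost": bool}
        self._ids = itertools.count(1)

    def add(self, mid, references=()):
        known = [self.docs[r] for r in references if r in self.docs]
        existing = self.docs.get(mid)
        if known:
            thread_id = known[0]["thread_id"]
        elif existing and existing["ghost"]:
            # Root (or any message) arriving after its replies:
            # keep the thread_id the ghost was created with.
            thread_id = existing["thread_id"]
        else:
            thread_id = next(self._ids)
        for ref in references:
            # Create ghosts for referenced messages we have not seen yet.
            self.docs.setdefault(ref, {"thread_id": thread_id, "ghost": True})
        self.docs[mid] = {"thread_id": thread_id, "ghost": False}
        return thread_id
```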
Eric Wong [Tue, 11 Apr 2017 23:39:54 +0000 (23:39 +0000)]
search: fix help message for searching within quotes
I'm not sure if people use either and it's not in mairix
(which we base our abbreviations on). Let's go
with the shorter prefix since it's easier to type.
Eric Wong [Fri, 24 Mar 2017 01:41:11 +0000 (01:41 +0000)]
searchview: show full (&x=t) messages in ascending chronological order
When displaying search results with full messages, it makes
more sense to show them in ascending chronological order when
going by date. Reverse chronological order makes more sense
for search results which only show the subject.
This causes a mismatch between how our search indexer threads
and how our HTML view handles threading. In the future, View.pm
will use the smsg-parsed {references} field and avoid redoing
Email::MIME header parsing.
We will still need to figure out a way to deal with messages
with repeated Message-IDs, at some point, too.
Eric Wong [Mon, 6 Feb 2017 21:39:45 +0000 (21:39 +0000)]
search: schema version bump for empty References/In-Reply-To
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
Eric Wong [Mon, 6 Feb 2017 19:54:25 +0000 (19:54 +0000)]
searchidx: deal with empty In-Reply-To and References headers
In some messages, these headers exist, but have empty values.
Do not let empty values mislead our search indexer when it ties
threads together, as they can cause nonsensical threads to be
grouped under a Message-Id of "" (empty string).
See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
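A small Python sketch of ignoring empty threading headers.
parse_references and MID_RE are hypothetical names, and the regex is
a deliberately loose approximation of Message-ID syntax, not the
parser the indexer actually uses.

```python
import re

# Loose approximation: anything non-empty between angle brackets.
MID_RE = re.compile(r"<([^<>\s]+)>")


def parse_references(headers):
    """Collect Message-IDs from References and In-Reply-To, skipping empty values."""
    mids = []
    for name in ("References", "In-Reply-To"):
        value = (headers.get(name) or "").strip()
        if not value:
            # Header exists but is empty: treat it as absent.
            continue
        for mid in MID_RE.findall(value):
            if mid not in mids:
                mids.append(mid)
    return mids
```

With this, a message carrying only empty headers yields no
references at all, so it starts its own thread instead of joining
one keyed on the empty string.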
Eric Wong [Mon, 6 Feb 2017 02:38:37 +0000 (02:38 +0000)]
searchview: increase limit for displaying search results
We are in no danger of excessive buffering or OOM-ing:
the main page for every inbox already loads 200 results,
and thread page views even load 1000! Increase this to
200 for now.
Eric Wong [Mon, 6 Feb 2017 02:07:24 +0000 (02:07 +0000)]
searchview: clarify numeric summary at bottom
Xapian can only give estimated result counts when a result limit
is given to it, so make clear it is an estimate. Also avoid showing
non-sensical ranges when no results are returned.
Eric Wong [Thu, 26 Jan 2017 02:09:36 +0000 (02:09 +0000)]
add filter for Subject: tags
Some mailing lists add annoying tags into the Subject line which
discourage readers from doing proper mail organization on the
client side. They also waste precious screen space and
attention span.
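For illustration, stripping a literal list tag from a Subject line
might look like the Python sketch below. strip_subject_tag is a
hypothetical helper, and the tag is assumed to be a fixed string
supplied by configuration; the actual filter's matching rules are
not described in this log.

```python
import re


def strip_subject_tag(subject, tag):
    """Remove the first occurrence of a literal tag such as "[foo]"."""
    pattern = re.compile(r"\s*" + re.escape(tag) + r"\s*")
    # Replace the tag and its surrounding whitespace with one space,
    # then trim, so "Re: [foo] hi" becomes "Re: hi".
    return pattern.sub(" ", subject, count=1).strip()
```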
Eric Wong [Thu, 19 Jan 2017 00:31:30 +0000 (00:31 +0000)]
learn: implement "rm" only functionality
Do not consider this interface stable, but I just needed a
way to remove mis-imported multipart messages so
public-inbox-watch could pick them up again from my Maildir.
Eric Wong [Wed, 18 Jan 2017 23:50:57 +0000 (23:50 +0000)]
mime: avoid SUPER usage in Email::MIME subclass
We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method. Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.
Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
Eric Wong [Wed, 11 Jan 2017 10:13:00 +0000 (10:13 +0000)]
inbox: reinstate periodic cleanup of Xapian and SQLite objects
We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.
Eric Wong [Sat, 7 Jan 2017 02:10:23 +0000 (02:10 +0000)]
inbox: properly register cleanup timer for git processes
We still need to clean up git processes occasionally, since
"git cat-file --batch" does not release old packs (and
git processes are fairly expensive).
SQLite and Xapian file handles should be capable of managing
themselves without too much trouble, so let's try keeping
them for the lifetime of a process.
Eric Wong [Sat, 7 Jan 2017 01:44:50 +0000 (01:44 +0000)]
remove incorrect comment about strftime + locales
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
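The distinction can be sketched in Python (the function names are
hypothetical). Header dates contain day and month names, so they
should come from a formatter with fixed English names; a purely
numeric format has nothing to localize.

```python
import time
from email.utils import formatdate


def http_date(epoch):
    # formatdate() always emits English day/month names,
    # independent of the process locale.
    return formatdate(epoch, usegmt=True)


def numeric_date(epoch):
    # Digits and punctuation only, so strftime is acceptable here.
    return time.strftime("%Y-%m-%d %H:%M", time.gmtime(epoch))
```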
Avoiding weaken here is no more dangerous than the existing
circular refs (e.g. psgix.io) we create and manage throughout
the lifetime of the connection. So, trust ourselves to maintain
the data structure properly and avoid triggering extra memory
usage.
While we're at it, avoid having anonymous subroutines capture
more variables than necessary to simplify reference auditing.
Eric Wong [Wed, 4 Jan 2017 11:20:50 +0000 (11:20 +0000)]
httpd/async: remove weaken usage
We do not need to use weaken() here, so avoid it to simplify our
interactions with Perl, as weaken requires additional storage
and (it seems) adds time complexity.
Eric Wong [Mon, 26 Dec 2016 03:05:15 +0000 (03:05 +0000)]
evcleanup: ensure deferred close from timers are handled ASAP
Danga::Socket defers close() syscalls until the end of the event
loop to avoid FD recycling. Unfortunately, this is dependent on
IO events firing and waking the process up from
poll/kevent/epoll_wait.
Without any I/O activity, a socket could remain in the
@Danga::Socket::ToClose array indefinitely. Thus, we will
trigger a fake IO event after running all timers to trigger
the deferred close in Danga::Socket::PostEventLoop.
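A rough Python model of the idea, assuming a POSIX system. EventLoop
and all of its methods are hypothetical and only imitate the shape
of the problem: closes deferred by a timer wait for the next wakeup,
so the loop writes to a self-pipe to force one.

```python
import os
import selectors


class EventLoop:
    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.timers = []    # callbacks to run before the next wait
        self.to_close = []  # deferred close callbacks
        self._wake_r, self._wake_w = os.pipe()
        self.sel.register(self._wake_r, selectors.EVENT_READ)

    def defer_close(self, close_cb):
        self.to_close.append(close_cb)

    def run_timers(self):
        due, self.timers = self.timers, []
        for cb in due:
            cb()
        if self.to_close:
            # Fake I/O event: guarantees the wait below returns promptly.
            os.write(self._wake_w, b"\0")

    def post_event_loop(self):
        pending, self.to_close = self.to_close, []
        for close_cb in pending:
            close_cb()

    def run_once(self, timeout=None):
        self.run_timers()
        for key, _events in self.sel.select(timeout):
            if key.fd == self._wake_r:
                os.read(self._wake_r, 1)  # drain the wakeup byte
        self.post_event_loop()
```

Without the write in run_timers, run_once would sit in select()
until unrelated I/O arrived, leaving the deferred closes pending.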
Eric Wong [Sat, 24 Dec 2016 11:52:42 +0000 (11:52 +0000)]
view: stop chomping off whitespace at ends of messages
This allows a 3-4% speedup in $MESSAGE_ID/T/ page generation
for a 368+ message thread. It also more faithfully
preserves the message as intended, even if it makes the
sender look like a space-wasting slob :P
Eric Wong [Tue, 20 Dec 2016 03:03:57 +0000 (03:03 +0000)]
searchmsg: remove ensure_metadata
Instead, only preload the ->mid field for threading,
as we only need ->thread and ->path once in Search->get_thread
(but we will need the ->mid field repeatedly).
This more than doubles View->load_results performance
according to thread-all on an inbox with over 300K messages.
Eric Wong [Wed, 14 Dec 2016 20:58:00 +0000 (20:58 +0000)]
wwwtext: remove outdated comment
I originally envisioned wwwtext being more flexible and able to
serve arbitrary blobs; but at this point I consider it redundant
and public-inbox is not wiki software.
Eric Wong [Mon, 12 Dec 2016 12:14:02 +0000 (12:14 +0000)]
daemon: set $now time for NNTP shutdown
commit 6e238ee3396719e578d6a90e177a71ce9f8c1ca0
("nntp: respect 3 minute idle time for shutdown")
was incomplete, and needed this change to Daemon
to be effective.
In the future, there will be more common code between
NNTP.pm and HTTP.pm.
Eric Wong [Sat, 10 Dec 2016 23:35:43 +0000 (23:35 +0000)]
search: retry document loading from Xapian
In addition to needing to retry enquire queries, we also need
to protect document loading from the Xapian DB and retry on
modification, as it seems to throw the same errors.
Checking the $@ ref for Search::Xapian::DatabaseModifiedError
is actually in the test suite for both the XS and SWIG Xapian
bindings, so we should be good as far as forward/backwards
compatibility.
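The retry pattern, sketched in Python. The exception class here is a
local stand-in named after Xapian's error, retry_reopen and
max_tries are hypothetical, and db is assumed to be any object with
a reopen() method.

```python
class DatabaseModifiedError(Exception):
    """Local stand-in for the Xapian error of the same name."""


def retry_reopen(db, action, max_tries=10):
    """Run action(db), reopening the database and retrying if it was modified."""
    for attempt in range(max_tries):
        try:
            return action(db)
        except DatabaseModifiedError:
            if attempt == max_tries - 1:
                raise  # give up after the last allowed attempt
            db.reopen()
```

The same wrapper can protect both query execution and document
loading, since both are described as throwing the same error.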
Eric Wong [Sat, 10 Dec 2016 01:09:51 +0000 (01:09 +0000)]
search: always sort thread results in ascending time order
This makes life easier for the threading algorithm, as we can
use the implied ordering of timestamps to avoid temporary ghosts
and resulting container vivification.
This would've also allowed us to hide the bug (in most cases)
fixed by the patch titled "thread: last Reference always wins",
in case that needs to be reverted due to infinite looping.