Eric Wong [Thu, 22 Jun 2017 21:04:53 +0000 (21:04 +0000)]
filter/rubylang: reuse altid entry from inbox object
This allows users to DRY up their config a bit and avoid
specifying altid twice when reusing the NNTP-centric msgmap
for [ruby-*:\d+] serial numbers.
My current work-in-progress ~/.public-inbox/config entry
for the ruby-core list is:
------8<-------
[publicinbox "ruby-core"]
address = ruby-core@ruby-lang.org
url = //public-inbox.org/ruby-core
mainrepo = /path/to/ruby-core.git
newsgroup = inbox.comp.lang.ruby.core
watchheader = List-Id:<ruby-core.ruby-lang.org>
altid = serial:ruby-core:file=msgmap.sqlite3
watch = maildir:/path/to/Maildir/.INBOX.ruby
filter = PublicInbox::Filter::RubyLang
Eric Wong [Thu, 22 Jun 2017 19:51:23 +0000 (19:51 +0000)]
msgmap: mid_insert ignores duplicates instead of die-ing
This will allow smoother imports as occasional Message-ID
duplicates happen and the best we can do is ignore the
second one.
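A minimal sketch of the ignore-duplicates idea, written in Python with the standard sqlite3 module rather than the project's own Perl. The table layout and the Python `mid_insert` function are assumptions made for illustration and are not taken from the msgmap code.

```python
import sqlite3

# Hypothetical schema: one serial number per unique Message-ID.
SCHEMA = """
CREATE TABLE msgmap (
    num INTEGER PRIMARY KEY AUTOINCREMENT,
    mid TEXT NOT NULL UNIQUE
)
"""

def mid_insert(db, mid):
    """Insert a Message-ID; return its serial number, or None for a duplicate."""
    cur = db.execute("INSERT OR IGNORE INTO msgmap (mid) VALUES (?)", (mid,))
    # rowcount is 0 when the UNIQUE constraint caused the row to be ignored.
    return cur.lastrowid if cur.rowcount == 1 else None

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
```

With this shape, an importer can check for `None` and skip the second copy instead of aborting the whole run.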
Eric Wong [Wed, 21 Jun 2017 23:33:49 +0000 (23:33 +0000)]
add filter for RubyLang lists
Unfortunately, it appears we have to reject this and instead add
support for filtering at View time(*), due to DKIM signatures in
messages from ruby-lang.org.
(*) which may not be worth it
Eric Wong [Tue, 20 Jun 2017 22:06:54 +0000 (22:06 +0000)]
import: fix encoding issues from weird "raw" emails
This seems to allow weirdly-encoded "raw" emails in
blade.nagaokaut.ac.jp/ruby/ruby-core/*
to be handled without difficulties.
Eric Wong [Fri, 16 Jun 2017 02:03:32 +0000 (02:03 +0000)]
view: implement optional address obfuscation
This is lightly-tested and seems to work. I'm still
hesitant to support this, but the alternative of receiving death
threats for displaying unobfuscated addresses seems to
be not worth it.
Eric Wong [Wed, 14 Jun 2017 00:10:53 +0000 (00:10 +0000)]
reply: support Reply-To
Reply-To is common and probably should've been supported
since day one, but we won't omit other addresses, either.
Eric Wong [Wed, 14 Jun 2017 00:10:52 +0000 (00:10 +0000)]
replyto parameter support
This allows us to support centralized mailing lists (which suck,
but better than no mailing list at all).
Eric Wong [Wed, 14 Jun 2017 00:10:51 +0000 (00:10 +0000)]
view: split out reply logic into its own module
We'll be adding more reply options for centralized mailing
lists. So split out the logic so it's easy-to-find.
Organizing code is hard :<
Eric Wong [Thu, 15 Jun 2017 23:07:58 +0000 (23:07 +0000)]
searchidx: remove messages correctly from Xapian index
This fixes a bug introduced in
commit 7eeadcb62729b0efbcb53cd9b7b181897c92cf9a
("search: remove unnecessary abstractions and functionality")
Eric Wong [Wed, 14 Jun 2017 00:14:48 +0000 (00:14 +0000)]
search: allow searching within mail diffs
This can be tied into a repository browser to browse
in-flight topics on a mailing list.
Eric Wong [Wed, 14 Jun 2017 00:14:47 +0000 (00:14 +0000)]
searchidx: switch to accounting by message bytes
Xapian memory usage is tied to the size of the indexed
text, so take the raw message size into account when
deciding when to flush Xapian data.
More importantly, we now flush Xapian before we have it
buffer beyond our maximum; and we do it unconditionally
to prevent even high priority processes from OOM-ing.
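The flush-before-overflow accounting can be sketched as follows. This is an illustration in Python, not the searchidx code; the `BatchIndexer` class, its `max_bytes` budget, and the `flush` callback are all hypothetical names.

```python
class BatchIndexer:
    """Tracks pending raw message bytes and flushes before exceeding a budget."""

    def __init__(self, max_bytes, flush):
        self.max_bytes = max_bytes   # assumed byte budget for buffered text
        self.flush = flush           # callable that commits pending index data
        self.pending = 0

    def add(self, raw):
        size = len(raw)
        # Flush *before* buffering past the maximum, not after the fact.
        if self.pending and self.pending + size > self.max_bytes:
            self.flush()
            self.pending = 0
        self.pending += size
        # ... the message itself would be indexed here ...
```

Counting bytes rather than messages means a handful of huge messages triggers a flush as readily as many small ones.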
Eric Wong [Wed, 14 Jun 2017 00:14:46 +0000 (00:14 +0000)]
search: remove unnecessary abstractions and functionality
This simplifies the code a bit and reduces the translation
overhead for looking directly at data from tools shipped
with Xapian.
While we're at it, fix thread-all.t :)
Eric Wong [Fri, 12 May 2017 18:49:32 +0000 (18:49 +0000)]
filter/subjecttag: account for missing Subject: header
This is a strong indicator of spam (but out-of-scope for this
particular module) but sometimes it is not, and people
legitimately forget to set a Subject: header at all.
Eric Wong [Thu, 25 May 2017 02:24:16 +0000 (02:24 +0000)]
import: reset :raw mode for commit title (subject)
This was necessary for the presence of the 0xa0 byte(*)
in the Subject: of the message at:
http://blade.nagaokaut.ac.jp/ruby/ruby-core/3220
(*) That is 0xa0, not 0x0a ("\n"), so I wonder if the
nibbles got swapped somehow.
Eric Wong [Tue, 23 May 2017 23:07:24 +0000 (23:07 +0000)]
searchview: retry queries if uri_unescape-able
It is possible to have double-escaped queries when copy and
pasting into browsers, so try to help users work around this
common error by automatically retrying after unescaping once.
Of course, we must inform the user when doing this results in
success, in case they really meant to search for a
double-escaped term which resulted in nothing.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
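A rough sketch of the retry-once logic, in Python. The `search` callable and the returned tuple are assumptions for illustration; the only library call used is `urllib.parse.unquote`.

```python
from urllib.parse import unquote

def search_with_retry(query, search):
    """Run search(query); on no results, retry once with the query unescaped.

    Returns (results, query_used, retried) so the caller can tell the
    user when the unescaped form is what produced the results.
    """
    results = search(query)
    if results:
        return results, query, False
    unescaped = unquote(query)
    if unescaped != query:          # only retry if unescaping changed anything
        results = search(unescaped)
        if results:
            return results, unescaped, True
    return [], query, False
```

The `retried` flag is what lets the page say that the query was reinterpreted, in case the user really meant the escaped form.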
Eric Wong [Tue, 23 May 2017 21:53:57 +0000 (21:53 +0000)]
www: do not mangle characters from search queries
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/
Eric Wong [Tue, 9 May 2017 20:43:33 +0000 (20:43 +0000)]
www: avoid undefined warnings for query string parsing
Sometimes bots generate malformed queries with sequential
"&" and ";" characters.
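For illustration, a tolerant query-string parser might skip the empty pieces that sequential separators produce. This Python sketch is an assumption about the general approach, not a port of the www code; `parse_query` is a hypothetical name.

```python
import re
from urllib.parse import unquote_plus

def parse_query(qs):
    """Parse 'a=1&b=2' or 'a=1;b=2', ignoring empty pairs such as '&&'."""
    params = {}
    for pair in re.split(r"[&;]", qs):
        if not pair:
            continue                      # produced by "&&", ";;" or a trailing separator
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)   # a missing "=" yields ""
    return params
```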
Eric Wong [Tue, 9 May 2017 06:30:42 +0000 (06:30 +0000)]
watchmaildir: show $@ in warning message
It should be helpful to know what error happened.
Eric Wong [Tue, 9 May 2017 06:30:41 +0000 (06:30 +0000)]
searchidx: use cached local $@ copy
umask should never fail and set $@, but use the cached local
to be more explicit just in case.
Eric Wong [Sun, 7 May 2017 00:46:46 +0000 (00:46 +0000)]
spamassassin: update example ~/.spamassassin/user_prefs file
This is closer to what I run on the public-inbox.org servers.
Eric Wong [Sun, 7 May 2017 10:49:00 +0000 (10:49 +0000)]
searchidx: fix ghost root vivification
Due to the asynchronous nature of SMTP, it is possible for the
root message of a thread (with no References/In-Reply-To)
to arrive last in a series. We must preserve the thread_id
of the ghost message in this case, as we do when vivifying
non-root ghosts.
Otherwise, this causes threads to be broken when the root
arrives last.
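The idea can be sketched with a toy in-memory index. This Python example is heavily simplified (no thread merging, no Xapian) and every name in it is hypothetical; it only shows a late-arriving root inheriting the thread ID of its ghost.

```python
import itertools

class ThreadIndex:
    def __init__(self):
        self.docs = {}                      # mid -> {"thread_id": int, "ghost": bool}
        self._ids = itertools.count(1)

    def add(self, mid, parent=None):
        old = self.docs.get(mid)
        if old and old["ghost"]:
            # Vivify the ghost: keep its thread_id, even for a root message.
            thread_id = old["thread_id"]
        elif parent is not None:
            if parent not in self.docs:
                # Parent not seen yet: record a ghost placeholder for it.
                self.docs[parent] = {"thread_id": next(self._ids), "ghost": True}
            thread_id = self.docs[parent]["thread_id"]
        else:
            thread_id = next(self._ids)
        self.docs[mid] = {"thread_id": thread_id, "ghost": False}
        return thread_id
```

If the root branch allocated a fresh ID instead of reusing the ghost's, the replies indexed earlier would be left in a different thread.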
Eric Wong [Tue, 11 Apr 2017 23:39:54 +0000 (23:39 +0000)]
search: fix help message for searching within quotes
I'm not sure if people use either and it's not in mairix
(where we base our abbreviations off of). Let's go
with the shorter prefix since it's easier-to-type.
Eric Wong [Wed, 5 Apr 2017 01:41:28 +0000 (01:41 +0000)]
learn: scan all inboxes when learning spam
This matches the behavior of the -watch daemon since
6d534038285ddd760709ba76ea007f9108200097
("watch: watchspam affects all configured inboxes")
Eric Wong [Tue, 4 Apr 2017 18:25:47 +0000 (18:25 +0000)]
watchmaildir: do not reject lowercase flags on Maildir files
Dovecot uses 'a'..'z' (lowercase) to designate keywords
in Maildir flags. This was preventing certain messages
from being marked as spam.
https://wiki2.dovecot.org/MailboxFormat/Maildir
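As an illustration, a flag parser that accepts both cases could look like this in Python. The pattern and the `maildir_flags` name are assumptions for this sketch and are not copied from watchmaildir.

```python
import re

# Flags follow ":2," in a Maildir filename; uppercase letters are the
# standard flags, lowercase 'a'..'z' are Dovecot keywords.
MAILDIR_INFO = re.compile(r":2,([A-Za-z]*)\Z")

def maildir_flags(filename):
    """Return the flag string of a Maildir filename, or None if unparseable."""
    m = MAILDIR_INFO.search(filename)
    return m.group(1) if m else None
```

A pattern limited to `[A-Z]*` would return None for a name ending in `:2,Sa`, which is the kind of rejection described above.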
Eric Wong [Fri, 24 Mar 2017 01:41:11 +0000 (01:41 +0000)]
searchview: show full (&x=t) messages in ascending chronological order
When displaying search results with full messages, it makes
more sense to show them in ascending chronological order when
going by date. Reverse chronological order makes more sense
for search results which only show the subject.
Eric Wong [Fri, 24 Mar 2017 00:15:08 +0000 (00:15 +0000)]
searchview: add "t" id to link to thread overview
At least for the thread view (&x=t); this will make it
easy to link to the overview.
Eric Wong [Wed, 22 Mar 2017 02:14:19 +0000 (02:14 +0000)]
extmsg: use updated mail-archive.com URL
Apparently mid.mail-archive.com does not support HTTPS,
and the HTTP version redirects to the search query, anyways.
Eric Wong [Tue, 14 Mar 2017 21:23:39 +0000 (21:23 +0000)]
view: escape HTML description name
Otherwise funky filenames can cause HTML injection
vulnerabilities (hope you have JavaScript disabled!)
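A small Python sketch of the fix in principle: escape any sender-controlled name before it reaches HTML. The `attachment_label` helper and its output format are hypothetical; `html.escape` is the standard library call.

```python
from html import escape

def attachment_label(filename):
    """Render an attachment name for HTML output without allowing injection."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return "[attachment: " + escape(filename, quote=True) + "]"
```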
Eric Wong [Tue, 14 Feb 2017 22:45:15 +0000 (22:45 +0000)]
www: do not unescape PATH_INFO twice
PSGI specs already require PATH_INFO to be unescaped;
so our tests were wrong, too.
Eric Wong [Sun, 12 Feb 2017 02:41:22 +0000 (02:41 +0000)]
t/mime: quiet warnings for old versions of Email::Simple
This is fixed in the newest versions of Email::Simple,
but not the version in Debian jessie (2.203)
Eric Wong [Sat, 11 Feb 2017 23:54:48 +0000 (23:54 +0000)]
handle repeated References and In-Reply-To headers
It seems possible for git-send-email(1) to generate
repeated instances of References and In-Reply-To headers,
as evidenced in:
https://public-inbox.org/git/20161111124541.8216-17-vascomalmeida@sapo.pt/raw
This causes a mismatch between how our search indexer threads
and how our HTML view handles threading. In the future, View.pm
will use the smsg-parsed {references} field and avoid redoing
Email::MIME header parsing.
We will still need to figure out a way to deal with messages
with repeated Message-IDs, at some point, too.
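To illustrate, here is one way to collect Message-IDs from every instance of both headers, using Python's standard email package. The ordering and de-duplication policy shown is an assumption for this sketch, not necessarily what the indexer does.

```python
import re
from email import message_from_string

MID_RE = re.compile(r"<([^<>\s]+)>")

def references(msg):
    """Collect unique Message-IDs from all References/In-Reply-To headers."""
    seen, out = set(), []
    for name in ("References", "In-Reply-To"):
        # get_all() returns every instance of a repeated header.
        for value in msg.get_all(name, []):
            for mid in MID_RE.findall(value):
                if mid not in seen:
                    seen.add(mid)
                    out.append(mid)
    return out
```

Using one such function for both indexing and display avoids the two code paths disagreeing about a message's ancestors.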
Eric Wong [Wed, 8 Feb 2017 21:41:38 +0000 (21:41 +0000)]
config: do not slurp lines into memory
There's no need to hold everything in memory, here,
since apparently "foreach" will read everything at
once in array context
(for some reason, I thought Perl5 was smart enough
to avoid creating a temporary array, here...)
Eric Wong [Tue, 7 Feb 2017 22:27:52 +0000 (22:27 +0000)]
TODO: several updates
Always plenty to do while working on this...
Eric Wong [Mon, 6 Feb 2017 21:39:45 +0000 (21:39 +0000)]
search: schema version bump for empty References/In-Reply-To
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
Eric Wong [Mon, 6 Feb 2017 21:37:26 +0000 (21:37 +0000)]
Revert "searchidx: reindex clobbers old thread IDs"
Oops, that's broken, too. I guess the only way to reindex
after fixing the thread detection is to start from scratch.
This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
Eric Wong [Mon, 6 Feb 2017 21:08:13 +0000 (21:08 +0000)]
searchidx: reindex clobbers old thread IDs
We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.
Eric Wong [Mon, 6 Feb 2017 19:54:25 +0000 (19:54 +0000)]
searchidx: deal with empty In-Reply-To and References headers
In some messages, these headers exist, but have empty values.
Do not let empty values throw off our search indexer to tie
threads together, as it can make nonsensical threads grouped
to a Message-Id of "" (empty string).
See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
<https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
Eric Wong [Mon, 6 Feb 2017 02:38:37 +0000 (02:38 +0000)]
searchview: increase limit for displaying search results
We are in no danger of excessive buffering or OOM-ing:
the main page for every inbox already loads 200 results,
and thread page views even load 1000! Increase this to
200 for now.
Eric Wong [Mon, 6 Feb 2017 02:07:24 +0000 (02:07 +0000)]
searchview: clarify numeric summary at bottom
Xapian can only give estimated results when a result limit is
given to it, so make clear it is an estimate to avoid showing
nonsensical ranges when no results are returned.
Eric Wong [Thu, 26 Jan 2017 02:09:36 +0000 (02:09 +0000)]
add filter for Subject: tags
Some mailing lists add annoying tags into the Subject line which
discourages readers from doing proper mail organization on the
client side. They also waste precious screen space and
attention span.
Remove them from our archives to reduce clutter.
Eric Wong [Wed, 25 Jan 2017 21:39:06 +0000 (21:39 +0000)]
watchmaildir: allow arguments for filters
We'll want to allow some degree of configuration for
various mailing lists.
Eric Wong [Wed, 18 Jan 2017 19:13:09 +0000 (19:13 +0000)]
watchmaildir: limit live importer processes
We don't want to be triggering OOM or swapping on weaker
systems when we have dozens of inboxes as potential targets.
Eric Wong [Thu, 19 Jan 2017 00:31:30 +0000 (00:31 +0000)]
learn: implement "rm" only functionality
Do not consider this interface stable, but I just needed a
way to remove mis-imported multipart messages so
public-inbox-watch could pick them up again from my Maildir.
Eric Wong [Wed, 18 Jan 2017 23:50:57 +0000 (23:50 +0000)]
mime: avoid SUPER usage in Email::MIME subclass
We must call Email::Simple methods directly in our monkey patch
for Email::MIME to call the intended method. Using SUPER in our
subclass would instead hit a different, unintended method in
Email::MIME.
Reported-by: Junio C Hamano <gitster@pobox.com>
<xmqq4m0wb43w.fsf@gitster.mtv.corp.google.com>
Eric Wong [Wed, 11 Jan 2017 10:13:00 +0000 (10:13 +0000)]
inbox: reinstate periodic cleanup of Xapian and SQLite objects
We may need to do this even more aggressively, since the
Xapian database does not always give the latest results.
This time, we'll do it without relying on weak references,
and instead check refcounts.
Eric Wong [Tue, 10 Jan 2017 21:40:37 +0000 (21:40 +0000)]
introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where
text/plain parts lack a header.
cf. git clone --mirror https://github.com/rjbs/Email-MIME.git
refs/pull/28/head
In the future, we may still introduce a streaming
interface to reduce memory usage on large emails.
Eric Wong [Sat, 7 Jan 2017 02:10:23 +0000 (02:10 +0000)]
inbox: properly register cleanup timer for git processes
We still need to clean up git processes occasionally, since
"git cat-file --batch" does not release old packs (and
git processes are fairly expensive).
For SQLite and Xapian file handles, they should be capable
of managing themselves without too much trouble, so let's
try keeping them for the lifetime of a process.
Eric Wong [Sat, 7 Jan 2017 01:44:52 +0000 (01:44 +0000)]
search: remove subject_summary
Apparently it never actually got used, and the world seems
fine without it, so we can drop it.
While we're at it, consider removing our subject_path
usage from existence, too. We are not using fancy subject-line
based URLs, here.
Eric Wong [Sat, 7 Jan 2017 01:44:51 +0000 (01:44 +0000)]
searchmsg: favor direct hash access over accessor methods
This is faster, smaller, and more straightforward to me with
fewer layers of indirection.
Eric Wong [Sat, 7 Jan 2017 01:44:50 +0000 (01:44 +0000)]
remove incorrect comment about strftime + locales
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
Eric Wong [Sat, 7 Jan 2017 01:44:49 +0000 (01:44 +0000)]
config: allow per-inbox nntpserver
This allows certain inboxes to override the global nntpserver
(perhaps under a different domain).
Eric Wong [Sat, 7 Jan 2017 01:44:48 +0000 (01:44 +0000)]
inbox: eliminate weaken usage entirely
We can do a better job initializing the data structure
so we no longer need to rely on weak references to clean up
when we ditch the config on reload.
Eric Wong [Sat, 7 Jan 2017 01:44:47 +0000 (01:44 +0000)]
inbox: describe the full key name
Hopefully make this easier for future generations to understand.
Eric Wong [Sat, 7 Jan 2017 01:44:46 +0000 (01:44 +0000)]
config: remove unused get() method
This seems like an unnecessary abstraction, or an abstraction
on the wrong level.
Eric Wong [Sat, 7 Jan 2017 01:44:45 +0000 (01:44 +0000)]
config: always use namespaced "publicinboxlimiter"
I'm not sure if we'll ever support sharing a config file
with other tools, but maybe we will, and "limiter" is
too generic.
Eric Wong [Sat, 7 Jan 2017 01:44:44 +0000 (01:44 +0000)]
qspawn: prepare to support runtime reloading of Limiter
We may allow the {max} value of a limiter to be changed
in the future, so let's start accounting for it before we
spawn followup processes.
Eric Wong [Wed, 4 Jan 2017 11:20:51 +0000 (11:20 +0000)]
http: remove weaken usage, reduce anonsub capture scope
Avoiding weaken here is no more dangerous than the existing
circular refs (e.g. psgix.io) we create and manage throughout
the lifetime of the connection. So, trust ourselves to maintain
the data structure properly and avoid triggering extra memory
usage.
While we're at it, avoid having anonymous subroutines capture
more variables than necessary to simplify reference auditing.
Eric Wong [Wed, 4 Jan 2017 11:20:50 +0000 (11:20 +0000)]
httpd/async: remove weaken usage
We do not need to use weaken() here, so avoid it to simplify our
interactions with Perl; as weaken requires additional storage
and (it seems) time complexity.
Eric Wong [Wed, 4 Jan 2017 11:20:49 +0000 (11:20 +0000)]
http: fix spelling error
Oops. And we'll be fixing circular references from now...
Eric Wong [Mon, 2 Jan 2017 13:16:15 +0000 (13:16 +0000)]
watch: watchspam affects all configured inboxes
If a message is spam in one mailbox, it is spam in all others a
particular user/group will care about.
Eric Wong [Mon, 26 Dec 2016 21:41:15 +0000 (21:41 +0000)]
doc: minor updates to design notes
ssoma is not worth marketing, but perhaps our mirror of
the git mailing list archives is...
Eric Wong [Mon, 26 Dec 2016 03:05:15 +0000 (03:05 +0000)]
evcleanup: ensure deferred close from timers are handled ASAP
Danga::Socket defers close() syscalls until the end of the event
loop to avoid FD recycling. Unfortunately, this is dependent on
IO events firing and waking the process up from
poll/kevent/epoll_wait.
Without any I/O activity, a socket could remain in the
@Danga::Socket::ToClose array indefinitely. Thus, we will
trigger a fake IO event after running all timers to trigger
the deferred close in Danga::Socket::PostEventLoop.
Eric Wong [Sun, 25 Dec 2016 08:09:48 +0000 (08:09 +0000)]
httpd/async: improve variable naming
We only refer to PublicInbox::HTTP objects here, so '$io'
was a bad name.
Eric Wong [Sun, 25 Dec 2016 07:33:02 +0000 (07:33 +0000)]
githttpbackend: minor cleanups to improve readability
Fewer returns improves readability and the diffstat agrees.
Eric Wong [Sun, 25 Dec 2016 06:52:03 +0000 (06:52 +0000)]
githttpbackend: simplify compatibility code
Fewer conditionals means there are fewer code paths to test
and makes things easier-to-read.
Eric Wong [Sun, 25 Dec 2016 06:39:13 +0000 (06:39 +0000)]
githttpbackend: minor readability improvement
Use a more meaningful variable name for the Qspawn
object, since this module is the reference for its
use.
Eric Wong [Sun, 25 Dec 2016 09:40:25 +0000 (09:40 +0000)]
http: fix clobbering of $null_io
Oops, this would be disastrous if we started handling
bigger request bodies or slow clients.
Fixes: c008654229a9 ("avoid IO::File for anonymous temporary files")
Eric Wong [Sat, 24 Dec 2016 11:52:44 +0000 (11:52 +0000)]
linkify: modify argument in place
This results in over 1% speedup doing $MESSAGE_ID/T/ HTML
generation for a 368-message thread.
Eric Wong [Sat, 24 Dec 2016 11:52:43 +0000 (11:52 +0000)]
view: do not modify array during iteration
This results in a half percent speedup or so doing
$MESSAGE_ID/T/ HTML generation for a 368 message thread.
Eric Wong [Sat, 24 Dec 2016 11:52:42 +0000 (11:52 +0000)]
view: stop chomping off whitespace at ends of messages
This allows a 3-4% speedup in $MESSAGE_ID/T/ page generation
speed for a 368+ message thread. It also more faithfully
preserves the message as intended; even if it makes the
sender look like a space-wasting slob :P
Eric Wong [Sat, 24 Dec 2016 11:52:41 +0000 (11:52 +0000)]
view: remove unused parameter
And add a comment about it to remind our future selves.
Eric Wong [Thu, 22 Dec 2016 08:00:26 +0000 (08:00 +0000)]
search: lookup_mail handles modified DBs
We call lookup_mail all over the place, be sure we can handle
database modifications in those cases.
Eric Wong [Thu, 22 Dec 2016 07:29:17 +0000 (07:29 +0000)]
doc: various comments on async handling
Notes for future developers (myself included) since we
can't assume people can read my mind.
Eric Wong [Tue, 20 Dec 2016 23:42:36 +0000 (23:42 +0000)]
searchthread: simplify API and remove needless OO
This simplifies callers to prevent errors and avoids
needless object-orientation in favor of a single procedure
call to handle threading and ordering.
Eric Wong [Tue, 20 Dec 2016 23:42:35 +0000 (23:42 +0000)]
searchthread: update comment about loop prevention
It definitely is necessary to prevent looping with the
%seen hash.
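A generic Python sketch of why a seen-set matters when walking a thread tree that may contain cycles. The `children` mapping and the `walk_thread` function are hypothetical and unrelated to the actual searchthread data structures.

```python
def walk_thread(children, root):
    """Depth-first walk yielding (mid, depth), visiting each message once."""
    seen = set()
    order = []
    stack = [(root, 0)]
    while stack:
        mid, depth = stack.pop()
        if mid in seen:
            continue              # a cycle or duplicate edge: do not loop forever
        seen.add(mid)
        order.append((mid, depth))
        # Push in reverse so children are visited in their listed order.
        for child in reversed(children.get(mid, [])):
            stack.append((child, depth + 1))
    return order
```

Without the `seen` check, a pair of messages that reference each other would keep the walk going indefinitely.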
Eric Wong [Tue, 20 Dec 2016 03:03:57 +0000 (03:03 +0000)]
searchmsg: remove ensure_metadata
Instead, only preload the ->mid field for threading,
as we only need ->thread and ->path once in Search->get_thread
(but we will need the ->mid field repeatedly).
This more than doubles View->load_results performance
according to thread-all on an inbox with over 300K messages.
Eric Wong [Tue, 20 Dec 2016 03:03:56 +0000 (03:03 +0000)]
tests: add thread-all testing for benchmarking
I'll be using this to improve message threading performance.
Eric Wong [Sat, 17 Dec 2016 12:04:11 +0000 (12:04 +0000)]
searchmsg: do not memoize {date} field
We only generate the ->date once in NNTP, so creating
the hash entry is a waste.
Eric Wong [Sat, 17 Dec 2016 12:04:10 +0000 (12:04 +0000)]
searchmsg: remove locale-dependency for ->date
strftime is locale-dependent, which can cause surprising
failures for some users.
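One locale-independent approach is to format the date by hand from fixed English name tables, sketched here in Python. The `rfc822_date` helper is a hypothetical name and this is not the SearchMsg implementation.

```python
import time

# Fixed English names, so the output never depends on the process locale.
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def rfc822_date(epoch):
    """Format a Unix timestamp as an RFC 822 style date in UTC."""
    t = time.gmtime(epoch)
    return "%s, %02d %s %04d %02d:%02d:%02d +0000" % (
        DAYS[t.tm_wday], t.tm_mday, MONTHS[t.tm_mon - 1], t.tm_year,
        t.tm_hour, t.tm_min, t.tm_sec)
```

A `%a` or `%b` passed to strftime would instead emit localized day and month names under a non-English locale.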
Eric Wong [Sat, 17 Dec 2016 05:50:30 +0000 (05:50 +0000)]
t/config.t: fix feedmax default
Oops :x
Eric Wong [Wed, 14 Dec 2016 21:00:13 +0000 (21:00 +0000)]
wwwtext: link to RFC4685 (Atom Threading)
This should give this feature some more visibility.
Eric Wong [Tue, 13 Dec 2016 02:33:30 +0000 (02:33 +0000)]
atom: implement message threading per RFC 4685
This will allow certain feed readers to render a message thread
as described in <https://www.jwz.org/doc/threading.html>.
Feed readers with knowledge of RFC 4685 are unknown to us at
this time, but perhaps this will encourage future implementations.
Existing feed readers I've tested (newsbeuter, feed2imap) seem
to ignore these tags gracefully without degradation.
Eric Wong [Sat, 17 Dec 2016 04:27:52 +0000 (04:27 +0000)]
feed: support publicinbox.<name>.feedmax
This allows users to customize by using smaller or larger Atom
feeds than the default value of 25 entries.
Eric Wong [Wed, 14 Dec 2016 23:53:06 +0000 (23:53 +0000)]
TODO: note IO::KQueue for the ticket
Do not require users to have network access to know what
the link refers to.
Eric Wong [Wed, 14 Dec 2016 19:28:53 +0000 (19:28 +0000)]
t/thread-cycle: no need for Xapian to run this test
We don't actually use anything from SearchMsg,
just the class name.
Eric Wong [Wed, 14 Dec 2016 20:58:00 +0000 (20:58 +0000)]
wwwtext: remove outdated comment
I originally envisioned wwwtext being more flexible and able to
serve arbitrary blobs; but at this point I consider it redundant
and public-inbox is not wiki software.
Eric Wong [Tue, 13 Dec 2016 03:10:13 +0000 (03:10 +0000)]
searchmsg: remove unused EPOCH_822 constant
This hasn't been needed since our Email::Abstract removal
for message threading.
Eric Wong [Tue, 13 Dec 2016 03:10:12 +0000 (03:10 +0000)]
nntp: avoid useless use of strftime
There's no need to use strftime if we'll be converting the date
by hand, anyways.
Eric Wong [Tue, 13 Dec 2016 03:10:11 +0000 (03:10 +0000)]
nntp: add test case for the "DATE" command
We may not always use strftime and may implement caching.
But for now, just add a test.
Eric Wong [Mon, 12 Dec 2016 12:14:02 +0000 (12:14 +0000)]
daemon: set $now time for NNTP shutdown
commit 6e238ee3396719e578d6a90e177a71ce9f8c1ca0
("nntp: respect 3 minute idle time for shutdown")
was incomplete, and needed this change to Daemon
to be effective.
In the future, there will be more common code between
NNTP.pm and HTTP.pm
Eric Wong [Mon, 12 Dec 2016 12:07:21 +0000 (12:07 +0000)]
doc: simplify makefile snippet
We have these manpages, and will always have them, so stop
trying to pretend we're doing something about maintainability,
here.
Eric Wong [Mon, 12 Dec 2016 12:02:45 +0000 (12:02 +0000)]
init: preserve permissions of existing config file
This matches git-config(1) behavior, and implied user
intent when it comes to programmatically editing files.
Eric Wong [Sat, 10 Dec 2016 23:35:43 +0000 (23:35 +0000)]
search: retry document loading from Xapian
In addition to needing to retry enquire queries, we also need
to protect document loading from the Xapian DB and retry on
modification, as it seems to throw the same errors.
Checking the $@ ref for Search::Xapian::DatabaseModifiedError
is actually in the test suite for both the XS and SWIG Xapian
bindings, so we should be good as far as forward/backwards
compatibility.
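The retry-on-modification pattern can be sketched generically in Python. The exception class below is a stand-in defined only for this example, and `retry_reopen`, `max_tries`, and the `db.reopen()` method on the passed-in object are assumptions rather than the real binding's API.

```python
class DatabaseModifiedError(Exception):
    """Stand-in for the error raised when the DB changes under a reader."""

def retry_reopen(db, fn, max_tries=10):
    """Call fn(db); on DatabaseModifiedError, reopen the DB and try again."""
    for attempt in range(max_tries):
        try:
            return fn(db)
        except DatabaseModifiedError:
            if attempt == max_tries - 1:
                raise             # give up after max_tries attempts
            db.reopen()           # pick up the latest revision, then retry
```

Wrapping both queries and document loads in the same helper keeps the retry policy in one place.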
Eric Wong [Sat, 10 Dec 2016 01:09:51 +0000 (01:09 +0000)]
search: always sort thread results in ascending time order
This makes life easier for the threading algorithm, as we can
use the implied ordering of timestamps to avoid temporary ghosts
and resulting container vivification.
This would've also allowed us to hide the bug (in most cases)
fixed by the patch titled "thread: last Reference always wins",
in case that needs to be reverted due to infinite looping.
Eric Wong [Sat, 10 Dec 2016 01:09:50 +0000 (01:09 +0000)]
thread: last Reference always wins
Since we use SearchMsg from Xapian data, we can be
assured we do not get self-referential {references}
field.
However, we may need to be more careful when checking
has_descendent for loops, as blindly calling add_child
could open us up to that possibility...
Eric Wong [Sat, 10 Dec 2016 01:09:49 +0000 (01:09 +0000)]
view: skip ghosts with no direct children
Otherwise, a malicious or broken client could populate the
thread skeleton with invalid References. We only care about
ghosts which messages correctly refer to, not totally bogus ones
which may be the result of long line or token truncation +
wrapping in MUA headers.
Eric Wong [Sat, 10 Dec 2016 01:09:48 +0000 (01:09 +0000)]
view: reduce indentation for skeleton generation
This should reduce the number of subroutine calls needed
for the common case of real (non-ghost) messages as well
as shortening code.
Eric Wong [Sat, 10 Dec 2016 01:09:47 +0000 (01:09 +0000)]
thread: fix comment describing its existence
Mail::Thread is UNavailable on many distros, meaning ordinary
users will have to rely on CPAN, a Perl-specific packaging tool.
Eric Wong [Sat, 10 Dec 2016 03:21:29 +0000 (03:21 +0000)]
view: favor SearchMsg for In-Reply-To over Email::MIME
This should avoid warnings during thread skeleton generation if
ever the Xapian database disagrees with View.pm about which is
the proper direct parent of a message. We will treat the data
in Xapian as the truth (if Xapian is available).
Eric Wong [Sat, 10 Dec 2016 01:09:46 +0000 (01:09 +0000)]
search: favor In-Reply-To over last References iff IRT exists
Some email clients set the References headers backwards, so
trust the In-Reply-To header if (and only if) it exists and
is parseable as direct parent of the current message.
For affected repos, this will require reindexing (via
"public-inbox-index --reindex"), but there will be no
version bump for this bugfix.
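A minimal Python sketch of the parent-selection rule described above. The regular expression and the `direct_parent` function are assumptions for illustration; in particular, taking the first ID found in In-Reply-To is a simplification.

```python
import re

MID_RE = re.compile(r"<([^<>\s]+)>")

def direct_parent(in_reply_to, references):
    """Pick the direct parent Message-ID of a message, or None.

    In-Reply-To wins if (and only if) it exists and parses; otherwise
    fall back to the last entry in References.
    """
    irt = MID_RE.findall(in_reply_to or "")
    if irt:
        return irt[0]
    refs = MID_RE.findall(references or "")
    return refs[-1] if refs else None
```

Empty or missing headers fall through to None, so a message is never attached to a parent with an empty Message-ID.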