Eric Wong [Sat, 7 Jan 2017 01:44:52 +0000 (01:44 +0000)]
search: remove subject_summary
Apparently it never actually got used, and the world seems
fine without it, so we can drop it.
While we're at it, consider removing our subject_path
usage from existence, too. We are not using fancy subject-line
based URLs, here.
Eric Wong [Sat, 7 Jan 2017 01:44:51 +0000 (01:44 +0000)]
searchmsg: favor direct hash access over accessor methods
This is faster, smaller, and more straightforward to me with
fewer layers of indirection.
Eric Wong [Sat, 7 Jan 2017 01:44:50 +0000 (01:44 +0000)]
remove incorrect comment about strftime + locales
We only need strftime to be locale-independent when generating
dates for email and HTTP headers. Purely numeric dates can
use strftime for ease-of-readability.
Eric Wong [Sat, 7 Jan 2017 01:44:49 +0000 (01:44 +0000)]
config: allow per-inbox nntpserver
This allows certain inboxes to override the global nntpserver
(perhaps under a different domain).
Eric Wong [Sat, 7 Jan 2017 01:44:48 +0000 (01:44 +0000)]
inbox: eliminate weaken usage entirely
We can do a better job initializing the data structure
so we no longer need to rely on weak references to clean up
when we ditch the config on reload.
Eric Wong [Sat, 7 Jan 2017 01:44:47 +0000 (01:44 +0000)]
inbox: describe the full key name
Hopefully make this easier for future generations to understand.
Eric Wong [Sat, 7 Jan 2017 01:44:46 +0000 (01:44 +0000)]
config: remove unused get() method
This seems like an unnecessary abstraction, or an abstraction
on the wrong level.
Eric Wong [Sat, 7 Jan 2017 01:44:45 +0000 (01:44 +0000)]
config: always use namespaced "publicinboxlimiter"
I'm not sure if we'll ever support sharing a config file
with other tools, but maybe we will, and "limiter" is
too generic.
Eric Wong [Sat, 7 Jan 2017 01:44:44 +0000 (01:44 +0000)]
qspawn: prepare to support runtime reloading of Limiter
We may allow the {max} value of a limiter to be changed
in the future, so let's start accounting for it before we
spawn followup processes.
Eric Wong [Wed, 4 Jan 2017 11:20:51 +0000 (11:20 +0000)]
http: remove weaken usage, reduce anonsub capture scope
Avoiding weaken here is no more dangerous than the existing
circular refs (e.g. psgix.io) we create and manage throughout
the lifetime of the connection. So, trust ourselves to maintain
the data structure properly and avoid triggering extra memory
usage.
While we're at it, avoid having anonymous subroutines capture
more variables than necessary to simplify reference auditing.
Eric Wong [Wed, 4 Jan 2017 11:20:50 +0000 (11:20 +0000)]
httpd/async: remove weaken usage
We do not need to use weaken() here, so avoid it to simplify our
interactions with Perl, as weaken requires additional storage
and (it seems) additional time.
Eric Wong [Wed, 4 Jan 2017 11:20:49 +0000 (11:20 +0000)]
http: fix spelling error
Oops. And we'll be fixing circular references from now...
Eric Wong [Mon, 2 Jan 2017 13:16:15 +0000 (13:16 +0000)]
watch: watchspam affects all configured inboxes
If a message is spam in one mailbox, it is spam in all others a
particular user/group will care about.
Eric Wong [Mon, 26 Dec 2016 21:41:15 +0000 (21:41 +0000)]
doc: minor updates to design notes
ssoma is not worth marketing, but perhaps our mirror of
the git mailing list archives is...
Eric Wong [Mon, 26 Dec 2016 03:05:15 +0000 (03:05 +0000)]
evcleanup: ensure deferred close from timers are handled ASAP
Danga::Socket defers close() syscalls until the end of the event
loop to avoid FD recycling. Unfortunately, this is dependent on
IO events firing and waking the process up from
poll/kevent/epoll_wait.
Without any I/O activity, a socket could remain in the
@Danga::Socket::ToClose array indefinitely. Thus, we will
trigger a fake IO event after running all timers to trigger
the deferred close in Danga::Socket::PostEventLoop.
Eric Wong [Sun, 25 Dec 2016 08:09:48 +0000 (08:09 +0000)]
httpd/async: improve variable naming
We only refer to PublicInbox::HTTP objects here, so '$io'
was a bad name.
Eric Wong [Sun, 25 Dec 2016 07:33:02 +0000 (07:33 +0000)]
githttpbackend: minor cleanups to improve readability
Fewer returns improves readability and the diffstat agrees.
Eric Wong [Sun, 25 Dec 2016 06:52:03 +0000 (06:52 +0000)]
githttpbackend: simplify compatibility code
Fewer conditionals means there are fewer code paths to test
and makes things easier-to-read.
Eric Wong [Sun, 25 Dec 2016 06:39:13 +0000 (06:39 +0000)]
githttpbackend: minor readability improvement
Use a more meaningful variable name for the Qspawn
object, since this module is the reference for its
use.
Eric Wong [Sun, 25 Dec 2016 09:40:25 +0000 (09:40 +0000)]
http: fix clobbering of $null_io
Oops, this would be disastrous if we started handling
bigger request bodies or slow clients.
Fixes: c008654229a9 ("avoid IO::File for anonymous temporary files")
Eric Wong [Sat, 24 Dec 2016 11:52:44 +0000 (11:52 +0000)]
linkify: modify argument in place
This results in over 1% speedup doing $MESSAGE_ID/T/ HTML
generation for a 368-message thread.
Eric Wong [Sat, 24 Dec 2016 11:52:43 +0000 (11:52 +0000)]
view: do not modify array during iteration
This results in a half percent speedup or so doing
$MESSAGE_ID/T/ HTML generation for a 368 message thread.
Eric Wong [Sat, 24 Dec 2016 11:52:42 +0000 (11:52 +0000)]
view: stop chomping off whitespace at ends of messages
This allows a 3-4% speedup in $MESSAGE_ID/T/ page generation
speed for a 368+ message thread. It also more faithfully
preserves the message as intended, even if it makes the
sender look like a space-wasting slob :P
Eric Wong [Sat, 24 Dec 2016 11:52:41 +0000 (11:52 +0000)]
view: remove unused parameter
And add a comment about it to remind our future selves.
Eric Wong [Thu, 22 Dec 2016 08:00:26 +0000 (08:00 +0000)]
search: lookup_mail handles modified DBs
We call lookup_mail all over the place, be sure we can handle
database modifications in those cases.
Eric Wong [Thu, 22 Dec 2016 07:29:17 +0000 (07:29 +0000)]
doc: various comments on async handling
Notes for future developers (myself included) since we
can't assume people can read my mind.
Eric Wong [Tue, 20 Dec 2016 23:42:36 +0000 (23:42 +0000)]
searchthread: simplify API and remove needless OO
This simplifies callers to prevent errors and avoids
needless object-orientation in favor of a single procedure
call to handle threading and ordering.
Eric Wong [Tue, 20 Dec 2016 23:42:35 +0000 (23:42 +0000)]
searchthread: update comment about loop prevention
It definitely is necessary to prevent looping with the
%seen hash.
Eric Wong [Tue, 20 Dec 2016 03:03:57 +0000 (03:03 +0000)]
searchmsg: remove ensure_metadata
Instead, only preload the ->mid field for threading,
as we only need ->thread and ->path once in Search->get_thread
(but we will need the ->mid field repeatedly).
This more than doubles View->load_results performance
according to thread-all on an inbox with over 300K messages.
Eric Wong [Tue, 20 Dec 2016 03:03:56 +0000 (03:03 +0000)]
tests: add thread-all testing for benchmarking
I'll be using this to improve message threading performance.
Eric Wong [Sat, 17 Dec 2016 12:04:11 +0000 (12:04 +0000)]
searchmsg: do not memoize {date} field
We only generate the ->date once in NNTP, so creating
the hash entry is a waste.
Eric Wong [Sat, 17 Dec 2016 12:04:10 +0000 (12:04 +0000)]
searchmsg: remove locale-dependency for ->date
strftime is locale-dependent, which can cause surprising
failures for some users.
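As an illustration of the idea (in Python rather than the project's Perl), a date string can be built from fixed English names so that the process locale never affects it. The function name `rfc822_date` and the lookup tables are hypothetical, not taken from the commit:

```python
import time

# Fixed English abbreviations; never taken from the process locale.
_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc822_date(epoch):
    """Format epoch seconds as an RFC 822-style UTC date string."""
    t = time.gmtime(epoch)  # tm_wday is 0 for Monday
    return "%s, %02d %s %04d %02d:%02d:%02d +0000" % (
        _DAYS[t.tm_wday], t.tm_mday, _MONTHS[t.tm_mon - 1],
        t.tm_year, t.tm_hour, t.tm_min, t.tm_sec)
```

Only numeric fields come from the time structure; the day and month names are looked up in constant tables, which is what removes the locale dependency.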
Eric Wong [Sat, 17 Dec 2016 05:50:30 +0000 (05:50 +0000)]
t/config.t: fix feedmax default
Oops :x
Eric Wong [Wed, 14 Dec 2016 21:00:13 +0000 (21:00 +0000)]
wwwtext: link to RFC4685 (Atom Threading)
This should give this feature some more visibility.
Eric Wong [Tue, 13 Dec 2016 02:33:30 +0000 (02:33 +0000)]
atom: implement message threading per RFC 4685
This will allow certain feed readers to render a message thread
as described in <https://www.jwz.org/doc/threading.html>.
Feed readers with knowledge of RFC 4685 are unknown to us at
this time, but perhaps this will encourage future implementations.
Existing feed readers I've tested (newsbeuter, feed2imap) seem
to ignore these tags gracefully without degradation.
Eric Wong [Sat, 17 Dec 2016 04:27:52 +0000 (04:27 +0000)]
feed: support publicinbox.<name>.feedmax
This allows users to customize by using smaller or larger Atom
feeds than the default value of 25 entries.
Eric Wong [Wed, 14 Dec 2016 23:53:06 +0000 (23:53 +0000)]
TODO: note IO::KQueue for the ticket
Do not require users to have network access to know what
the link refers to.
Eric Wong [Wed, 14 Dec 2016 19:28:53 +0000 (19:28 +0000)]
t/thread-cycle: no need for Xapian to run this test
We don't actually use anything from SearchMsg,
just the class name.
Eric Wong [Wed, 14 Dec 2016 20:58:00 +0000 (20:58 +0000)]
wwwtext: remove outdated comment
I originally envisioned wwwtext being more flexible and able to
serve arbitrary blobs; but at this point I consider it redundant
and public-inbox is not wiki software.
Eric Wong [Tue, 13 Dec 2016 03:10:13 +0000 (03:10 +0000)]
searchmsg: remove unused EPOCH_822 constant
This hasn't been needed since our Email::Abstract removal
for message threading.
Eric Wong [Tue, 13 Dec 2016 03:10:12 +0000 (03:10 +0000)]
nntp: avoid useless use of strftime
There's no need to use strftime if we'll be converting the date
by hand, anyways.
Eric Wong [Tue, 13 Dec 2016 03:10:11 +0000 (03:10 +0000)]
nntp: add test case for the "DATE" command
We may not always use strftime and may implement caching.
But for now, just add a test.
Eric Wong [Mon, 12 Dec 2016 12:14:02 +0000 (12:14 +0000)]
daemon: set $now time for NNTP shutdown
commit
6e238ee3396719e578d6a90e177a71ce9f8c1ca0
("nntp: respect 3 minute idle time for shutdown")
was incomplete, and needed this change to Daemon
to be effective.
In the future, there will be more common code between
NNTP.pm and HTTP.pm
Eric Wong [Mon, 12 Dec 2016 12:07:21 +0000 (12:07 +0000)]
doc: simplify makefile snippet
We have these manpages, and will always have them, so stop
trying to pretend we're doing something about maintainability,
here.
Eric Wong [Mon, 12 Dec 2016 12:02:45 +0000 (12:02 +0000)]
init: preserve permissions of existing config file
This matches git-config(1) behavior, and implied user
intent when it comes to programmatically editing files.
Eric Wong [Sat, 10 Dec 2016 23:35:43 +0000 (23:35 +0000)]
search: retry document loading from Xapian
In addition to needing to retry enquire queries, we also need
to protect document loading from the Xapian DB and retry on
modification, as it seems to throw the same errors.
Checking the $@ ref for Search::Xapian::DatabaseModifiedError
is actually in the test suite for both the XS and SWIG Xapian
bindings, so we should be good as far as forward/backwards
compatibility.
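A rough sketch of the retry-on-modification pattern described above, written in Python for illustration only. `DatabaseModifiedError` here is a hypothetical stand-in for Search::Xapian::DatabaseModifiedError, and `retry_reopen`, `FakeDB`, and `max_tries` are invented names, not the project's API:

```python
class DatabaseModifiedError(Exception):
    """Hypothetical stand-in for Search::Xapian::DatabaseModifiedError."""


def retry_reopen(db, fn, max_tries=10):
    """Call fn(db); on DatabaseModifiedError, reopen the DB and try again."""
    for _ in range(max_tries - 1):
        try:
            return fn(db)
        except DatabaseModifiedError:
            db.reopen()  # pick up the latest revision before retrying
    return fn(db)  # final attempt; any error propagates to the caller


class FakeDB:
    """Toy database that raises until it has been reopened enough times."""

    def __init__(self, needed_reopens):
        self.needed_reopens = needed_reopens
        self.reopened = 0

    def reopen(self):
        self.reopened += 1

    def get_document(self, docid):
        if self.reopened < self.needed_reopens:
            raise DatabaseModifiedError("database changed under us")
        return {"docid": docid}
```

Wrapping both queries and document loads in the same helper keeps the retry policy in one place, which is one plausible way to cover every call site.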
Eric Wong [Sat, 10 Dec 2016 01:09:51 +0000 (01:09 +0000)]
search: always sort thread results in ascending time order
This makes life easier for the threading algorithm, as we can
use the implied ordering of timestamps to avoid temporary ghosts
and resulting container vivification.
This would've also allowed us to hide the bug (in most cases)
fixed by the patch titled "thread: last Reference always wins",
in case that needs to be reverted due to infinite looping.
Eric Wong [Sat, 10 Dec 2016 01:09:50 +0000 (01:09 +0000)]
thread: last Reference always wins
Since we use SearchMsg from Xapian data, we can be
assured we do not get self-referential {references}
field.
However, we may need to be more careful when checking
has_descendent for loops, as blindly calling add_child
could open us up to that possibility...
Eric Wong [Sat, 10 Dec 2016 01:09:49 +0000 (01:09 +0000)]
view: skip ghosts with no direct children
Otherwise, a malicious or broken client could populate the
thread skeleton with invalid References. We only care about
ghosts which messages correctly refer to, not totally bogus ones
which may be the result of long line or token truncation +
wrapping in MUA headers.
Eric Wong [Sat, 10 Dec 2016 01:09:48 +0000 (01:09 +0000)]
view: reduce indentation for skeleton generation
This should reduce the number of subroutine calls needed
for the common case of real (non-ghost) messages as well
as shortening code.
Eric Wong [Sat, 10 Dec 2016 01:09:47 +0000 (01:09 +0000)]
thread: fix comment describing its existence
Mail::Thread is UNavailable on many distros, meaning ordinary
users will have to rely on CPAN, a Perl-specific packaging tool.
Eric Wong [Sat, 10 Dec 2016 03:21:29 +0000 (03:21 +0000)]
view: favor SearchMsg for In-Reply-To over Email::MIME
This should avoid warnings during thread skeleton generation if
ever the Xapian database disagrees with View.pm about which is
the proper direct parent of a message. We will treat the data
in Xapian as the truth (if Xapian is available).
Eric Wong [Sat, 10 Dec 2016 01:09:46 +0000 (01:09 +0000)]
search: favor In-Reply-To over last References iff IRT exists
Some email clients set the References headers backwards, so
trust the In-Reply-To header if (and only if) it exists and
is parseable as direct parent of the current message.
For affected repos, this will require reindexing (via
"public-inbox-index --reindex"), but there will be no
version bump for this bugfix.
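The parent-selection rule can be sketched roughly as follows (Python for illustration; the real indexing code is Perl). The helper name `direct_parent`, the Message-ID regular expression, and the choice to take the first ID found in In-Reply-To are all assumptions of this sketch:

```python
import re

# Loose Message-ID matcher: anything between angle brackets (an assumption).
_MID_RE = re.compile(r"<([^<>\s]+)>")


def direct_parent(in_reply_to, references):
    """Return the Message-ID of the direct parent, or None if unknown."""
    irt = _MID_RE.findall(in_reply_to or "")
    if irt:
        # In-Reply-To exists and is parseable: trust it.
        return irt[0]
    refs = _MID_RE.findall(references or "")
    # Otherwise fall back to the last entry in References.
    return refs[-1] if refs else None
```

With a client that writes References backwards, such as `<b@x> <a@x>` for a reply to `<b@x>`, the In-Reply-To value still yields the right parent.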
Eric Wong [Tue, 6 Dec 2016 23:40:33 +0000 (23:40 +0000)]
linkify: implement Markdown link compatibility (again)
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
This fixes parentheses detection at sentence endings, as seen
in practice on emails.
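One possible way to handle this, sketched in Python and not taken from the actual linkify code: match URLs greedily, then trim trailing sentence punctuation and any closing parenthesis that has no matching opener inside the URL. The names `URL_RE` and `find_urls` and the punctuation set are assumptions:

```python
import re

# Greedy match; trailing punctuation and stray ")" are trimmed afterwards.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def find_urls(text):
    """Return URLs in text, trimming punctuation and unbalanced ')'."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        while url:
            if url[-1] in ".,;:!?":
                url = url[:-1]  # sentence punctuation is not part of the URL
            elif url[-1] == ")" and url.count(")") > url.count("("):
                url = url[:-1]  # ")" closes Markdown or prose, not the URL
            else:
                break
        urls.append(url)
    return urls
```

Balanced parentheses inside the URL survive, while the `)` that ends a Markdown link or a parenthetical sentence is dropped.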
Eric Wong [Tue, 6 Dec 2016 23:15:02 +0000 (23:15 +0000)]
Revert "linkify: implement Markdown link compatibility"
This reverts commit
130d0c4e33c5c73dc69e270fc698735d49e0f159.
Eric Wong [Tue, 6 Dec 2016 23:01:39 +0000 (23:01 +0000)]
linkify: implement Markdown link compatibility
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
Eric Wong [Sat, 3 Dec 2016 00:24:06 +0000 (00:24 +0000)]
atom: switch to getline/close for response bodies
This will let us stream larger Atom document bodies without
wasting too much memory and reduce the amount of round-trip
requests needed to get necessary information.
Hopefully clients are using streaming (SAX) parsers, too.
This is the final transition in the core public-inbox
code to allow migrating to a "pull"-based body streaming
scheme which allows an HTTP server to respond appropriately
to backpressure from slow clients.
Eric Wong [Sat, 3 Dec 2016 00:24:05 +0000 (00:24 +0000)]
wwwstream: improve documentation and variable naming
Hopefully this makes the code more readable for newbies.
Eric Wong [Sat, 3 Dec 2016 00:24:51 +0000 (00:24 +0000)]
searchview: fix <title> tag in Atom feed
This only affects the Atom feed for search results.
"xmlstarlet val" failed to detect or warn about this,
and I only noticed this bug while working on another
patch.
Eric Wong [Tue, 29 Nov 2016 21:40:35 +0000 (21:40 +0000)]
note the source code is AGPL for cloning
This should be adequate warning for folks who may be
uncomfortable or uncertain about even possessing AGPL
source code due to employer agreements and such.
Disclaimer: I remain completely in favor of AGPL and strong
copyleft, and am more than willing to risk my own future on it.
However, I refuse to even nudge people into downloading AGPL
source code if it presents any legal risk to them.
Eric Wong [Sat, 26 Nov 2016 08:52:50 +0000 (08:52 +0000)]
avoid IO::File for anonymous temporary files
We do not need to import IO::File into the main programs
since Perl 5.8+ supports literal "undef" for generating
anonymous temporary file handles.
Eric Wong [Sat, 26 Nov 2016 08:34:34 +0000 (08:34 +0000)]
githttpbackend: error checking for input handling
This was sloppy code, all calls need to be checked
for failure.
Eric Wong [Tue, 22 Nov 2016 02:49:40 +0000 (02:49 +0000)]
view: fix spaces in mailto: link
Some mail clients do not seem to handle '+' as a space in query
parameters for the mail subject, use the more common '%20' for
compatibility.
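The difference between the two encodings is easy to see with Python's standard library (used here purely as an illustration; the address and the function name `mailto_link` are hypothetical):

```python
from urllib.parse import quote, quote_plus


def mailto_link(addr, subject):
    """Build a mailto: URL whose subject uses %20 for spaces, never '+'."""
    return "mailto:%s?subject=%s" % (quote(addr, safe="@"),
                                     quote(subject, safe=""))


# quote_plus() shows the '+' form that some mail clients display literally.
plus_form = quote_plus("hello world")
```

Here `mailto_link("list@example.com", "Re: hello world")` percent-encodes every space, while `plus_form` holds the `+` variant the commit moves away from.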
Eric Wong [Fri, 4 Nov 2016 21:11:35 +0000 (21:11 +0000)]
index: allow indexing before configuration
One may build the initial index on a powerful host and transfer
it to a weaker one for incremental indexing. Thus there is
no requirement to have a configured public-inbox for building
the index unless a user needs altid support or some such.
Eric Wong [Sun, 16 Oct 2016 00:36:14 +0000 (00:36 +0000)]
import: failed GC runs are non-fatal
We should not completely kill a process if "git gc --auto"
errors out due to a warning or whatnot.
Eric Wong [Fri, 14 Oct 2016 09:00:01 +0000 (09:00 +0000)]
thread: reinstates stable ordering when ghosts are present
This reverts commit
3c9dd6619f825f0515e7e4afa1bd55c99c1a68d3
("thread: fix sorting without topmost")
and reinstates the "topmost" routine for sorting purposes.
Eric Wong [Thu, 13 Oct 2016 03:59:03 +0000 (03:59 +0000)]
thread: fix parent/child relationships
The ordering change in add_child is critical if $self == $parent
as the {children} hash was lost before this change.
has_descendent can be simplified by walking upwards from the child
instead of downwards from the parent.
This fixes threading regressions introduced in
commit
30100c46326e2eac275e0af13116636701d2537e
("thread: use hash + array instead of hand-rolled linked list")
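The upward walk can be sketched as below (Python for illustration; the class `Node` and its fields are hypothetical and only loosely modeled on the container described here). Each node keeps a single parent link, so checking ancestry from the child costs at most the depth of the tree instead of a traversal of every subtree:

```python
class Node:
    """Minimal thread container: one parent link, children keyed by ID."""

    def __init__(self, mid):
        self.mid = mid
        self.parent = None
        self.children = {}

    def has_descendent(self, other):
        """True if other is self or sits below self; walks up from other."""
        seen = set()
        node = other
        while node is not None:
            if node is self:
                return True
            if id(node) in seen:  # guard against a corrupted parent chain
                break
            seen.add(id(node))
            node = node.parent
        return False

    def add_child(self, child):
        """Attach child under self; refuse anything that would form a loop."""
        if child is self or child.has_descendent(self):
            return False
        if child.parent is not None:
            del child.parent.children[child.mid]  # detach from old parent
        self.children[child.mid] = child
        child.parent = self
        return True
```

In this sketch a node counts as its own descendent, so `add_child` rejects self-parenting through the same check that rejects cycles.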
Eric Wong [Thu, 13 Oct 2016 03:59:02 +0000 (03:59 +0000)]
thread: reduce indentation level
This should reduce differences from the original Mail::Thread
code and hopefully make things easier-to-follow.
Eric Wong [Wed, 5 Oct 2016 23:47:32 +0000 (23:47 +0000)]
thread: remove weaken dependency
We have to walk through all the messages after threading
anyways to build the rootset, so we can just delete all
the parent references at that point.
Eric Wong [Wed, 5 Oct 2016 23:47:31 +0000 (23:47 +0000)]
t/thread-cycle: test self-referential messages
Some broken (or malicious) mailers may include a generated
Message-ID in their References header, so be prepared for it.
Eric Wong [Wed, 5 Oct 2016 23:47:30 +0000 (23:47 +0000)]
view: remove redundant children array in thread views
Each node has an entire arrayref of its children nowadays, so
there's no need to waste time and memory creating another one.
Eric Wong [Wed, 5 Oct 2016 23:47:29 +0000 (23:47 +0000)]
thread: use hash + array instead of hand-rolled linked list
This starts to show noticeable performance improvements when
attempting to thread over 400 messages; but the improvement
may not be measurable with less.
However, the resulting code is much shorter and (IMHO)
much easier to understand.
Eric Wong [Wed, 5 Oct 2016 23:47:28 +0000 (23:47 +0000)]
thread: fix sorting without topmost
This bug was hidden, and we may not be able to efficiently
implement a topmost subroutine with the hash-based (vs
linked-list) container for threading in the next
commit.
Eric Wong [Wed, 5 Oct 2016 23:47:27 +0000 (23:47 +0000)]
thread: inline and remove recurse_down logic
We no longer recurse, and it's too hard to come up with
a new name for a sub we will only use once.
Eric Wong [Wed, 5 Oct 2016 23:47:26 +0000 (23:47 +0000)]
thread: order_children no longer cares about depth
We never use the depth anywhere in this sub.
Eric Wong [Wed, 5 Oct 2016 23:47:25 +0000 (23:47 +0000)]
thread: avoid incrementing undefined value
It is pointless to increment when setting a true value is
simpler as there is no need to read before writing.
Eric Wong [Wed, 5 Oct 2016 23:47:24 +0000 (23:47 +0000)]
thread: remove iterate_down
Unnecessary subs and complexity. This was hiding the fact
that $before is never used.
Eric Wong [Wed, 5 Oct 2016 23:47:23 +0000 (23:47 +0000)]
thread: simplify
Single use subroutines actually make the code more complex in
this case, and there's never a {seen} field in $self.
Eric Wong [Wed, 5 Oct 2016 23:47:22 +0000 (23:47 +0000)]
thread: remove rootset accessor method
It doesn't buy us much and copying to a new array is slower;
but probably not measurable in real-world use.
Eric Wong [Wed, 5 Oct 2016 23:47:21 +0000 (23:47 +0000)]
thread: remove Email::Abstract wrapping
This roughly doubles performance due to the reduction in
object creation and abstraction layers.
Eric Wong [Wed, 5 Oct 2016 23:47:20 +0000 (23:47 +0000)]
inbox: deal with ghost smsg
smsg will be undef for ghost messages in a subsequent commit.
Eric Wong [Wed, 5 Oct 2016 23:47:19 +0000 (23:47 +0000)]
thread: remove accessor usage in internals
This improves top-level index generation performance by 3-4%.
Eric Wong [Wed, 5 Oct 2016 23:47:18 +0000 (23:47 +0000)]
thread: pass array refs instead of entire arrays
Copying large arrays is expensive, so avoid it.
This reduces /$INBOX/ time by around 1%.
Eric Wong [Wed, 5 Oct 2016 23:47:17 +0000 (23:47 +0000)]
thread: remove Mail::Thread dependency
Introduce our own SearchThread class for threading messages.
This should allow us to specialize and optimize away objects
in future commits.
Eric Wong [Wed, 5 Oct 2016 23:47:16 +0000 (23:47 +0000)]
view: remove "subject dummy" references
We will not care for inexact threading by subject or pruning.
Eric Wong [Tue, 13 Sep 2016 01:18:30 +0000 (01:18 +0000)]
help: document new search prefixes
Support (and document) 'a:' after all, as "mairix -h" uses it,
so this should reduce the learning curve for mairix users.
Eric Wong [Fri, 9 Sep 2016 18:43:55 +0000 (18:43 +0000)]
nntp: cleanup: move use statements out of sub scope
This clarifies the code somewhat, and we don't care to lazy-load
in NNTP.pm anyways since this is only used for a long-lived
daemon.
Eric Wong [Fri, 9 Sep 2016 09:05:18 +0000 (09:05 +0000)]
TODO: updates for done items
The existing string -> number date range Xapian query is good
enough, and having too much flexibility is probably bad for
caching (as well as increasing our attack surface, because
parsing queries is tricky).
Tags-as-skiplists are probably not worth the effort given
Xapian, and we may have to import old messages after-the-fact,
anyways, and message delivery for mirrors is never orderly.
Other items are all done and need to be maintained (like the
search engine docs for the mairix-compatibility features that
just got pushed out)
Eric Wong [Fri, 9 Sep 2016 03:09:00 +0000 (03:09 +0000)]
t/httpd-unix: warn about connection failure
Output $! for diagnostic purposes since I've noticed this on
two slow machines, today (and seemingly, never prior).
Eric Wong [Fri, 9 Sep 2016 00:01:31 +0000 (00:01 +0000)]
search: index attachment filenames
And while we're at it, ensure searching inside displayable
attachment bodies works.
Eric Wong [Fri, 9 Sep 2016 00:01:30 +0000 (00:01 +0000)]
search: match the behavior of WWW for indexing text
The basic rule is that if it is displayable via our WWW
interface, it should be indexable text for Xapian search.
Eric Wong [Fri, 9 Sep 2016 00:01:29 +0000 (00:01 +0000)]
search: avoid mindlessly calling body_set
It's not worth entering a complex codepath in Email::MIME to
save some (probably immeasurable amount of) memory, here. We've
already stopped doing this in our WWW code a while back, too.
If we really cared enough about it, we'd prioritize work on a
streaming replacement for Email::MIME.
Eric Wong [Fri, 9 Sep 2016 00:01:28 +0000 (00:01 +0000)]
search: fix compatibility with Debian wheezy
Specifying the "d:" field only worked for
NumberValueRangeProcessor in older versions of Xapian, such
as the one in Debian wheezy (libsearch-xapian-perl=1.2.10.0-1)
This slipped through since I rarely use wheezy, anymore, and
perhaps nobody else does, either. Perhaps wheezy support may be
dropped, soon.
Unfortunately, this requires a schema version bump.
Eric Wong [Fri, 9 Sep 2016 00:01:27 +0000 (00:01 +0000)]
search: increase term positions for each quoted hunk
We pay a storage cost for storing positional information
in Xapian, make good use of it by attempting to preserve
it for (hopefully) better search results.
Eric Wong [Fri, 9 Sep 2016 00:01:26 +0000 (00:01 +0000)]
search: match quote detection behavior of view
This is stricter than the mutt quote_regexp default
("^([ \t]*[|>:}#])+" on Debian jessie),
but matches what we have in View.pm.
I prefer the stricter quote detection since it is less ambiguous
and less likely to hide/obscure important details.
Eric Wong [Fri, 9 Sep 2016 00:01:25 +0000 (00:01 +0000)]
search: fix space regressions from recent changes
As of Xapian 1.0.4 (from 2007) it is possible to use
Search::Xapian::QueryParser::add_prefix multiple times with the
same user field name but different term prefixes.
This brings my current git@vger mirror from 6.5GB to 2.1GB
(both sizes are after xapian-compact).
Eric Wong [Fri, 9 Sep 2016 00:01:24 +0000 (00:01 +0000)]
search: more granular message body searching
"bs:" and "b:" are adapted from mairix(1)
We will also support searching explicitly for quoted vs
non-quoted text via "q:" and "nq:" prefixes since sometimes
readers will not care for quoted text.
In the future, we will support parsing diffs (perhaps when
repobrowse integration is complete).
Note: this roughly doubles the size of the Xapian database due
to the additional information; so this change may not be worth
it.
Eric Wong [Fri, 9 Sep 2016 00:01:23 +0000 (00:01 +0000)]
search: drop longer subject: prefix for search
We only document the "s:" anyways. While the long name is more
descriptive, the ambiguity makes agnostic caching (by Varnish or
similar) slightly harder and longer URLs are more likely to be
accidentally truncated when shared.
Eric Wong [Fri, 9 Sep 2016 00:01:22 +0000 (00:01 +0000)]
search: allow searching user fields (To/Cc/From)
Sometimes it can be useful to search based on who the
message was sent to, sent by, or Cc:-ed. Of course,
headers can be faked, but they usually are not...
Anyways this mostly matches the behavior of mairix(1).
Eric Wong [Thu, 8 Sep 2016 22:42:42 +0000 (22:42 +0000)]
import: run "git gc --auto" when done
We need to prevent excessive repository growth for
public-inbox-watch and public-inbox-mda users.