Sergey Matveev's repositories - public-inbox.git/log
public-inbox.git
Eric Wong [Wed, 14 Dec 2016 23:53:06 +0000 (23:53 +0000)]
TODO: note IO::KQueue for the ticket

Do not require users to have network access to know what
the link refers to.

Eric Wong [Wed, 14 Dec 2016 19:28:53 +0000 (19:28 +0000)]
t/thread-cycle: no need for Xapian to run this test

We don't actually use anything from SearchMsg,
just the class name.

Eric Wong [Wed, 14 Dec 2016 20:58:00 +0000 (20:58 +0000)]
wwwtext: remove outdated comment

I originally envisioned wwwtext being more flexible and able to
serve arbitrary blobs; but at this point I consider it redundant
and public-inbox is not wiki software.

Eric Wong [Tue, 13 Dec 2016 03:10:13 +0000 (03:10 +0000)]
searchmsg: remove unused EPOCH_822 constant

This hasn't been needed since our Email::Abstract removal
for message threading.

Eric Wong [Tue, 13 Dec 2016 03:10:12 +0000 (03:10 +0000)]
nntp: avoid useless use of strftime

There's no need to use strftime if we'll be converting the date
by hand, anyways.

Eric Wong [Tue, 13 Dec 2016 03:10:11 +0000 (03:10 +0000)]
nntp: add test case for the "DATE" command

We may not always use strftime and may implement caching.
But for now, just add a test.

Eric Wong [Mon, 12 Dec 2016 12:14:02 +0000 (12:14 +0000)]
daemon: set $now time for NNTP shutdown

commit 6e238ee3396719e578d6a90e177a71ce9f8c1ca0
("nntp: respect 3 minute idle time for shutdown")
was incomplete, and needed this change to Daemon
to be effective.

In the future, there will be more common code between
NNTP.pm and HTTP.pm

Eric Wong [Mon, 12 Dec 2016 12:07:21 +0000 (12:07 +0000)]
doc: simplify makefile snippet

We have these manpages, and will always have them, so stop
trying to pretend we're doing something about maintainability,
here.

Eric Wong [Mon, 12 Dec 2016 12:02:45 +0000 (12:02 +0000)]
init: preserve permissions of existing config file

This matches git-config(1) behavior, and implied user
intent when it comes to programmatically editing files.

Eric Wong [Sat, 10 Dec 2016 23:35:43 +0000 (23:35 +0000)]
search: retry document loading from Xapian

In addition to needing to retry enquire queries, we also need
to protect document loading from the Xapian DB and retry on
modification, as it seems to throw the same errors.

Checking the $@ ref for Search::Xapian::DatabaseModifiedError
is actually in the test suite for both the XS and SWIG Xapian
bindings, so we should be good as far as forward/backwards
compatibility.
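
The retry-on-modification idea above can be sketched roughly as follows. This is a minimal Python illustration, not the Perl code from this commit: `DatabaseModifiedError` here is a local stand-in class, and `retry_reopen`, `max_tries`, and the `db.reopen()` method on the handle are assumptions made for the example.

```python
class DatabaseModifiedError(Exception):
    """Stand-in for the error raised when the DB changes under a reader."""


def retry_reopen(db, fn, max_tries=10):
    """Call fn(db); on DatabaseModifiedError, reopen the DB and try again.

    `db` is any object with a reopen() method (hypothetical handle type).
    """
    for attempt in range(max_tries):
        try:
            return fn(db)
        except DatabaseModifiedError:
            if attempt == max_tries - 1:
                raise  # out of attempts: let the caller see the error
            db.reopen()  # refresh our view of the database, then retry
```

Wrapping both query execution and document loading in the same helper keeps the two code paths from drifting apart.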

Eric Wong [Sat, 10 Dec 2016 01:09:51 +0000 (01:09 +0000)]
search: always sort thread results in ascending time order

This makes life easier for the threading algorithm, as we can
use the implied ordering of timestamps to avoid temporary ghosts
and resulting container vivification.

This would've also allowed us to hide the bug (in most cases)
fixed by the patch titled "thread: last Reference always wins",
in case that needs to be reverted due to infinite looping.

Eric Wong [Sat, 10 Dec 2016 01:09:50 +0000 (01:09 +0000)]
thread: last Reference always wins

Since we use SearchMsg from Xapian data, we can be
assured we do not get self-referential {references}
field.

However, we may need to be more careful when checking
has_descendent for loops, as blindly calling add_child
could open us up to that possibility...

Eric Wong [Sat, 10 Dec 2016 01:09:49 +0000 (01:09 +0000)]
view: skip ghosts with no direct children

Otherwise, a malicious or broken client could populate the
thread skeleton with invalid References.  We only care about
ghosts which messages correctly refer to, not totally bogus ones
which may be the result of long line or token truncation +
wrapping in MUA headers.

Eric Wong [Sat, 10 Dec 2016 01:09:48 +0000 (01:09 +0000)]
view: reduce indentation for skeleton generation

This should reduce the number of subroutine calls needed
for the common case of real (non-ghost) messages as well
as shortening code.

Eric Wong [Sat, 10 Dec 2016 01:09:47 +0000 (01:09 +0000)]
thread: fix comment describing its existence

Mail::Thread is UNavailable on many distros, meaning ordinary
users will have to rely on CPAN, a Perl-specific packaging tool.

Eric Wong [Sat, 10 Dec 2016 03:21:29 +0000 (03:21 +0000)]
view: favor SearchMsg for In-Reply-To over Email::MIME

This should avoid warnings during thread skeleton generation if
ever the Xapian database disagrees with View.pm about which is
the proper direct parent of a message.  We will treat the data
in Xapian as the truth (if Xapian is available).

Eric Wong [Sat, 10 Dec 2016 01:09:46 +0000 (01:09 +0000)]
search: favor In-Reply-To over last References iff IRT exists

Some email clients set the References headers backwards, so
trust the In-Reply-To header if (and only if) it exists and
is parseable as direct parent of the current message.

For affected repos, this will require reindexing (via
"public-inbox-index --reindex"), but there will be no
version bump for this bugfix.
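
A rough Python sketch of the parent-selection rule described above (the real indexer is Perl; `direct_parent` and `MID_RE` are hypothetical names, and the "parseable" check is simplified to "contains one <...> token"):

```python
import re

# A Message-ID token: anything between one pair of angle brackets.
MID_RE = re.compile(r"<([^<>\s]+)>")


def direct_parent(in_reply_to, references):
    """Return the Message-ID of the direct parent, or None.

    In-Reply-To wins if (and only if) it exists and parses;
    otherwise fall back to the last entry in References.
    """
    if in_reply_to:
        m = MID_RE.search(in_reply_to)
        if m:
            return m.group(1)
    refs = MID_RE.findall(references or "")
    return refs[-1] if refs else None
```

With a client that writes References backwards, e.g. `direct_parent("<p@x>", "<p@x> <root@x>")`, the In-Reply-To value is used instead of the (wrong) last reference.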

Eric Wong [Tue, 6 Dec 2016 23:40:33 +0000 (23:40 +0000)]
linkify: implement Markdown link compatibility (again)

Although unescaped parentheses in URLs are technically allowed,
they are uncommon.  However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.

This fixes parentheses detection at sentence endings, as seen
in practice on emails.
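
One way to picture the sentence-ending case is the sketch below. It is not the linkify implementation: `trim_trailing` is a hypothetical helper, and the rule "drop a trailing ')' only when it is unbalanced" is an assumption chosen so that Wikipedia-style URLs keep their parentheses.

```python
def trim_trailing(url):
    """Strip sentence punctuation and unbalanced ')' from the end of a URL."""
    while url:
        last = url[-1]
        if last in ".,;!?":
            url = url[:-1]  # sentence punctuation is not part of the link
        elif last == ")" and url.count(")") > url.count("("):
            url = url[:-1]  # closing paren belongs to the surrounding text
        else:
            break
    return url
```

For text such as "(see https://example.com/a)." the candidate match `https://example.com/a).` is trimmed back to `https://example.com/a`, while a balanced URL ending in `_(software)` is left alone.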

Eric Wong [Tue, 6 Dec 2016 23:15:02 +0000 (23:15 +0000)]
Revert "linkify: implement Markdown link compatibility"

This reverts commit 130d0c4e33c5c73dc69e270fc698735d49e0f159.

Eric Wong [Tue, 6 Dec 2016 23:01:39 +0000 (23:01 +0000)]
linkify: implement Markdown link compatibility

Although unescaped parentheses in URLs are technically allowed,
they are uncommon.  However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.

Eric Wong [Sat, 3 Dec 2016 00:24:06 +0000 (00:24 +0000)]
atom: switch to getline/close for response bodies

This will let us stream larger Atom document bodies without
wasting too much memory and reduce the amount of round-trip
requests needed to get necessary information.

Hopefully clients are using streaming (SAX) parsers, too.

This is the final transition in the core public-inbox
code to allow migrating to a "pull"-based body streaming
scheme which allows an HTTP server to respond appropriately
to backpressure from slow clients.

Eric Wong [Sat, 3 Dec 2016 00:24:05 +0000 (00:24 +0000)]
wwwstream: improve documentation and variable naming

Hopefully this makes the code more readable for newbies.

Eric Wong [Sat, 3 Dec 2016 00:24:51 +0000 (00:24 +0000)]
searchview: fix <title> tag in Atom feed

This only affects the Atom feed for search results.
"xmlstarlet val" failed to detect or warn about this,
and I only noticed this bug while working on another
patch.

Eric Wong [Tue, 29 Nov 2016 21:40:35 +0000 (21:40 +0000)]
note the source code is AGPL for cloning

This should be adequate warning for folks who may be
uncomfortable or uncertain about even possessing AGPL
source code due to employer agreements and such.

Disclaimer: I remain completely in favor of AGPL and strong
copyleft, and am more than willing to risk my own future on it.
However, I refuse to even nudge people into downloading AGPL
source code if it presents any legal risk to them.

Eric Wong [Sat, 26 Nov 2016 08:52:50 +0000 (08:52 +0000)]
avoid IO::File for anonymous temporary files

We do not need to import IO::File into the main programs
since Perl 5.8+ supports literal "undef" for generating
anonymous temporary file handles.

Eric Wong [Sat, 26 Nov 2016 08:34:34 +0000 (08:34 +0000)]
githttpbackend: error checking for input handling

This was sloppy code; all calls need to be checked
for failure.

Eric Wong [Tue, 22 Nov 2016 02:49:40 +0000 (02:49 +0000)]
view: fix spaces in mailto: link

Some mail clients do not seem to handle '+' as a space in query
parameters for the mail subject, so use the more common '%20' for
compatibility.
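
For illustration, the difference comes down to which escaping routine builds the query string. A small Python sketch (the actual change is in Perl; `mailto_link` is a hypothetical helper):

```python
from urllib.parse import quote


def mailto_link(addr, subject):
    """Build a mailto: URL whose subject uses %20, never '+', for spaces."""
    # quote() percent-encodes spaces as %20; quote_plus() would emit '+'.
    return "mailto:" + quote(addr, safe="@") + "?subject=" + quote(subject, safe="")
```

`mailto_link("meta@public-inbox.org", "Re: hello world")` yields `mailto:meta@public-inbox.org?subject=Re%3A%20hello%20world`.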

Eric Wong [Fri, 4 Nov 2016 21:11:35 +0000 (21:11 +0000)]
index: allow indexing before configuration

One may build the initial index on a powerful host and transfer
it to a weaker one for incremental indexing.  Thus there is
no requirement to have a configured public-inbox for building
the index unless a user needs altid support or some such.

Eric Wong [Sun, 16 Oct 2016 00:36:14 +0000 (00:36 +0000)]
import: failed GC runs are non-fatal

We should not completely kill a process if "git gc --auto"
errors out due to a warning or whatnot.

Eric Wong [Fri, 14 Oct 2016 09:00:01 +0000 (09:00 +0000)]
thread: reinstates stable ordering when ghosts are present

This reverts commit 3c9dd6619f825f0515e7e4afa1bd55c99c1a68d3
("thread: fix sorting without topmost")
and reinstates the "topmost" routine for sorting purposes.

Eric Wong [Thu, 13 Oct 2016 03:59:03 +0000 (03:59 +0000)]
thread: fix parent/child relationships

The ordering change in add_child is critical if $self == $parent
as the {children} hash was lost before this change.

has_descendent can be simplified by walking upwards from the child
instead of downwards from the parent.

This fixes threading regressions introduced in
commit 30100c46326e2eac275e0af13116636701d2537e
("thread: use hash + array instead of hand-rolled linked list")
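
The "walk upwards from the child" check can be sketched like this. It is a simplified Python model, not the Perl container class: `Node`, its field names, and the return convention of `add_child` are assumptions for the example.

```python
class Node:
    """Minimal thread container: one parent, children keyed by Message-ID."""

    def __init__(self, mid):
        self.mid = mid
        self.parent = None
        self.children = {}

    def has_descendent(self, child):
        """True if `child` is self or sits somewhere below self."""
        seen = set()
        while child is not None:
            if child is self:
                return True
            if id(child) in seen:
                break  # defensive: never loop forever on a corrupt chain
            seen.add(id(child))
            child = child.parent  # follow parent links upwards
        return False

    def add_child(self, child):
        """Attach child under self; refuse if that would create a cycle."""
        if child.has_descendent(self):
            return False
        if child.parent is not None:
            del child.parent.children[child.mid]
        self.children[child.mid] = child
        child.parent = self
        return True
```

Walking upwards visits at most one node per level, whereas walking downwards from the parent may have to visit every node in the subtree.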

Eric Wong [Thu, 13 Oct 2016 03:59:02 +0000 (03:59 +0000)]
thread: reduce indentation level

This should reduce differences from the original Mail::Thread
code and hopefully make things easier-to-follow.

Eric Wong [Wed, 5 Oct 2016 23:47:32 +0000 (23:47 +0000)]
thread: remove weaken dependency

We have to walk through all the messages after threading
anyways to build the rootset, so we can just delete all
the parent references at that point.

Eric Wong [Wed, 5 Oct 2016 23:47:31 +0000 (23:47 +0000)]
t/thread-cycle: test self-referential messages

Some broken (or malicious) mailers may include a generated
Message-ID in its References header, so be prepared for it.

Eric Wong [Wed, 5 Oct 2016 23:47:30 +0000 (23:47 +0000)]
view: remove redundant children array in thread views

Each node has an entire arrayref of its children nowadays, so
there's no need to waste time and memory creating another one.

Eric Wong [Wed, 5 Oct 2016 23:47:29 +0000 (23:47 +0000)]
thread: use hash + array instead of hand-rolled linked list

This starts to show noticeable performance improvements when
attempting to thread over 400 messages; but the improvement
may not be measurable with fewer.

However, the resulting code is much shorter and (IMHO)
much easier to understand.

Eric Wong [Wed, 5 Oct 2016 23:47:28 +0000 (23:47 +0000)]
thread: fix sorting without topmost

This bug was hidden, and we may not be able to efficiently
implement a topmost subroutine with the hash-based (vs
linked-list-based) container for threading in the next
commit.

Eric Wong [Wed, 5 Oct 2016 23:47:27 +0000 (23:47 +0000)]
thread: inline and remove recurse_down logic

We no longer recurse, and it's too hard to come up with
a new name for a sub we will only use once.

Eric Wong [Wed, 5 Oct 2016 23:47:26 +0000 (23:47 +0000)]
thread: order_children no longer cares about depth

We never use the depth anywhere in this sub

Eric Wong [Wed, 5 Oct 2016 23:47:25 +0000 (23:47 +0000)]
thread: avoid incrementing undefined value

It is pointless to increment when setting a true value is
simpler as there is no need to read before writing.

Eric Wong [Wed, 5 Oct 2016 23:47:24 +0000 (23:47 +0000)]
thread: remove iterate_down

Unnecessary subs and complexity.  This was hiding the fact
that $before is never used.

Eric Wong [Wed, 5 Oct 2016 23:47:23 +0000 (23:47 +0000)]
thread: simplify

Single use subroutines actually make the code more complex in
this case, and there's never a {seen} field in $self.

Eric Wong [Wed, 5 Oct 2016 23:47:22 +0000 (23:47 +0000)]
thread: remove rootset accessor method

It doesn't buy us much and copying to a new array is slower;
but probably not measurable in real-world use.

Eric Wong [Wed, 5 Oct 2016 23:47:21 +0000 (23:47 +0000)]
thread: remove Email::Abstract wrapping

This roughly doubles performance due to the reduction in
object creation and abstraction layers.

Eric Wong [Wed, 5 Oct 2016 23:47:20 +0000 (23:47 +0000)]
inbox: deal with ghost smsg

smsg will be undef for ghost messages in a subsequent commit

Eric Wong [Wed, 5 Oct 2016 23:47:19 +0000 (23:47 +0000)]
thread: remove accessor usage in internals

This improves top-level index generation performance by 3-4%.

Eric Wong [Wed, 5 Oct 2016 23:47:18 +0000 (23:47 +0000)]
thread: pass array refs instead of entire arrays

Copying large arrays is expensive, so avoid it.
This reduces /$INBOX/ time by around 1%.

Eric Wong [Wed, 5 Oct 2016 23:47:17 +0000 (23:47 +0000)]
thread: remove Mail::Thread dependency

Introduce our own SearchThread class for threading messages.
This should allow us to specialize and optimize away objects
in future commits.

Eric Wong [Wed, 5 Oct 2016 23:47:16 +0000 (23:47 +0000)]
view: remove "subject dummy" references

We will not care for inexact threading by subject or pruning.

Eric Wong [Tue, 13 Sep 2016 01:18:30 +0000 (01:18 +0000)]
help: document new search prefixes

Support (and document) 'a:' after all, as "mairix -h" uses it,
so this should reduce the learning curve for mairix users.

Eric Wong [Fri, 9 Sep 2016 18:43:55 +0000 (18:43 +0000)]
nntp: cleanup: move use statements out of sub scope

This clarifies the code somewhat, and we don't care to lazy-load
in NNTP.pm anyways since this is only used for a long-lived
daemon.

Eric Wong [Fri, 9 Sep 2016 09:05:18 +0000 (09:05 +0000)]
TODO: updates for done items

The existing string -> number date range Xapian query is good
enough, and having too much flexibility is probably bad for
caching (as well as increasing our attack surface, because
parsing queries is tricky).

Tags-as-skiplists are probably not worth the effort given
Xapian, and we may have to import old messages after-the-fact,
anyways, and message delivery for mirrors is never orderly.

Other items are all done and need to be maintained (like the
search engine docs for the mairix-compatibility features that
just got pushed out)

Eric Wong [Fri, 9 Sep 2016 03:09:00 +0000 (03:09 +0000)]
t/httpd-unix: warn about connection failure

Output $! for diagnostic purposes since I've noticed this on
two slow machines, today (and seemingly, never prior).

Eric Wong [Fri, 9 Sep 2016 00:01:31 +0000 (00:01 +0000)]
search: index attachment filenames

And while we're at it, ensure searching inside displayable
attachment bodies works.

Eric Wong [Fri, 9 Sep 2016 00:01:30 +0000 (00:01 +0000)]
search: match the behavior of WWW for indexing text

The basic rule is that if it is displayable via our WWW
interface, it should be indexable text for Xapian search.

Eric Wong [Fri, 9 Sep 2016 00:01:29 +0000 (00:01 +0000)]
search: avoid mindlessly calling body_set

It's not worth entering a complex codepath in Email::MIME to
save some (probably immeasurable amount of) memory, here.  We've
already stopped doing this in our WWW code a while back, too.
If we really cared enough about it, we'd prioritize work on a
streaming replacement for Email::MIME.

Eric Wong [Fri, 9 Sep 2016 00:01:28 +0000 (00:01 +0000)]
search: fix compatibility with Debian wheezy

Specifying the "d:" field only worked for
NumberValueRangeProcessor in older versions of Xapian, such
as the one in Debian wheezy (libsearch-xapian-perl=1.2.10.0-1)

This slipped through since I rarely use wheezy, anymore, and
perhaps nobody else does, either.  Perhaps wheezy support may be
dropped, soon.

Unfortunately, this requires a schema version bump.

Eric Wong [Fri, 9 Sep 2016 00:01:27 +0000 (00:01 +0000)]
search: increase term positions for each quoted hunk

We pay a storage cost for storing positional information
in Xapian, so make good use of it by attempting to preserve
it for (hopefully) better search results.

Eric Wong [Fri, 9 Sep 2016 00:01:26 +0000 (00:01 +0000)]
search: match quote detection behavior of view

This is stricter than the mutt quote_regexp default
("^([ \t]*[|>:}#])+" on Debian jessie),
but matches what we have in View.pm.

I prefer the stricter quote detection since it is less ambiguous
and less likely to hide/obscure important details.
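
To see why the looser pattern is more ambiguous, compare it with a stricter one. In this Python sketch `MUTT_QUOTE` is the mutt default quoted above; `STRICT_QUOTE` (a line beginning with '>') is only an assumed stand-in, since this message does not spell out the View.pm rule.

```python
import re

# mutt quote_regexp default quoted in this commit message
MUTT_QUOTE = re.compile(r"^([ \t]*[|>:}#])+")
# assumption: a stricter rule that only accepts a leading '>'
STRICT_QUOTE = re.compile(r"^>")


def is_quoted(line, pattern=STRICT_QUOTE):
    """Return True if `line` looks like quoted text under `pattern`."""
    return pattern.match(line) is not None
```

Under the mutt default, lines such as `#include <stdio.h>` or `: see below` count as quoted text; under the stricter pattern they do not, so code and prose are less likely to be indexed or folded as quotes.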

Eric Wong [Fri, 9 Sep 2016 00:01:25 +0000 (00:01 +0000)]
search: fix space regressions from recent changes

As of Xapian 1.0.4 (from 2007) it is possible to use
Search::Xapian::QueryParser::add_prefix multiple times with the
same user field name but different term prefixes.

This brings my current git@vger mirror from 6.5GB to 2.1GB
(both sizes are after xapian-compact).

Eric Wong [Fri, 9 Sep 2016 00:01:24 +0000 (00:01 +0000)]
search: more granular message body searching

"bs:" and "b:" are adapted from mairix(1)

We will also support searching explicitly for quoted vs
non-quoted text via "q:" and "nq:" prefixes since sometimes
readers will not care for quoted text.

In the future, we will support parsing diffs (perhaps when
repobrowse integration is complete).

Note: this roughly doubles the size of the Xapian database due
to the additional information; so this change may not be worth
it.

Eric Wong [Fri, 9 Sep 2016 00:01:23 +0000 (00:01 +0000)]
search: drop longer subject: prefix for search

We only document the "s:" anyways.  While the long name is more
descriptive, the ambiguity makes agnostic caching (by Varnish or
similar) slightly harder and longer URLs are more likely to be
accidentally truncated when shared.

Eric Wong [Fri, 9 Sep 2016 00:01:22 +0000 (00:01 +0000)]
search: allow searching user fields (To/Cc/From)

Sometimes it can be useful to search based on who the
message was sent to, sent by, or Cc:-ed.  Of course,
headers can be faked, but they usually are not...

Anyways this mostly matches the behavior of mairix(1).

Eric Wong [Thu, 8 Sep 2016 22:42:42 +0000 (22:42 +0000)]
import: run "git gc --auto" when done

We need to prevent excessive repository growth for
public-inbox-watch and public-inbox-mda users.

Eric Wong [Thu, 8 Sep 2016 22:41:01 +0000 (22:41 +0000)]
import: hoist out common run_die subroutine

We will be reusing this in the next commit, too.

Eric Wong [Thu, 8 Sep 2016 20:15:25 +0000 (20:15 +0000)]
doc: document PERL_INLINE_DIRECTORY usage

For now, we will document this since it allows better
performance without the burden of extensions.  Perhaps one day
far in the future Perl can natively support vfork(2) AND that
version of Perl will be widely available, but I suspect that day
is at least a decade away, if not two:

https://rt.perl.org/Ticket/Display.html?id=128227

Eric Wong [Thu, 8 Sep 2016 10:23:34 +0000 (10:23 +0000)]
import: hoist out _check_path function

This reduces duplication, slightly.  We may be using it
yet again in a to-be-introduced function (or we may not
introduce it).

Eric Wong [Thu, 8 Sep 2016 19:44:16 +0000 (19:44 +0000)]
view: handle missing Content-Type in message

Email::MIME internally assumes "text/plain" for messages
missing a Content-Type, but does not expose that in the
Email::MIME::content_type API method.  We must assume it
ourselves to avoid uninitialized value warnings for the
rare (nowadays) MUAs which do not set it.

Eric Wong [Wed, 7 Sep 2016 21:53:11 +0000 (21:53 +0000)]
doc: flesh out public-inbox-index documentation

And include it into the build + website

Eric Wong [Wed, 7 Sep 2016 00:47:15 +0000 (00:47 +0000)]
doc: new docs for user-level commands

Hopefully more folks can download and run public-inbox,
nowadays.

Eric Wong [Fri, 2 Sep 2016 21:00:50 +0000 (21:00 +0000)]
config: use "publicinboxlimiter" prefix

Just having "limiter" in the prefix may confuse
it with something else.  Use the full prefix to
avoid this confusion.

Eric Wong [Fri, 2 Sep 2016 01:06:42 +0000 (01:06 +0000)]
init: enable pack bitmaps by default

We want to encourage users to serve repositories.  So enable
bitmaps by default so performance suffers less with smart HTTP.

Eric Wong [Thu, 1 Sep 2016 19:31:12 +0000 (19:31 +0000)]
watch: use "publicinboxwatch" namespace

We'll keep supporting "publicinboxlearn" indefinitely,
but "publicinboxwatch" is probably more appropriate
at the moment.

Noticed while writing documentation.

Eric Wong [Wed, 31 Aug 2016 19:50:52 +0000 (19:50 +0000)]
doc: set release and section properly for manpages

This will be important as we will have more of them.

Eric Wong [Wed, 31 Aug 2016 19:32:12 +0000 (19:32 +0000)]
txt2pre: allow overriding title via env

This will allow reasonable titles to be generated for
manpages.

Eric Wong [Wed, 31 Aug 2016 19:25:15 +0000 (19:25 +0000)]
txt2pre: use public-inbox internal APIs

Since this is bundled with the source, we might as well use
internal APIs to avoid having duplicate code (and bugs :P)

Eric Wong [Tue, 23 Aug 2016 21:23:53 +0000 (21:23 +0000)]
www: give tor2web some exposure, too

Not everybody can run Tor; hopefully more can use Tor2web
even if it compromises their privacy.  This should help
make the system more resilient for users unable to use Tor.

Eric Wong [Sun, 21 Aug 2016 09:30:00 +0000 (09:30 +0000)]
doc: avoid conflicting with MakeMaker variable names

We want the pod2man(1) executable for handling certain
options.  Also, use the correct year while we're at it :P

Eric Wong [Sat, 20 Aug 2016 00:25:16 +0000 (00:25 +0000)]
avoid spaces after shell redirection operators

This makes us closer to git.git style (though I'm not quite sure
why we do this...)

Eric Wong [Sat, 20 Aug 2016 00:25:15 +0000 (00:25 +0000)]
doc: mda: remove vestigial pandoc comment

We use perlpod nowadays since it's Perl, like our code base.

Eric Wong [Sun, 21 Aug 2016 09:22:14 +0000 (09:22 +0000)]
README: add link to source code mirrors

Centralization sucks, so we mirror everything.

Eric Wong [Thu, 18 Aug 2016 09:25:21 +0000 (09:25 +0000)]
searchview: link to internal help text

The internal help text links to the Xapian query parser
documentation anyways, but also provides information
on which prefixes exist.

Eric Wong [Thu, 18 Aug 2016 04:44:07 +0000 (04:44 +0000)]
www: implement generic help text

Begin documenting some basic help functionality.
I may tweak the anchor names of the various HTML endpoints
to be more consistent with each other (old ones will be
supported for a short while), so I'm not documenting
those, for now.

This may become part of a builtin key-value store for
basic texts, but this probably shouldn't become a wiki
engine, either.

Eric Wong [Thu, 18 Aug 2016 02:02:50 +0000 (02:02 +0000)]
linkify: be stricter about matching RFC 3986

We're not to-the-letter about percent-encoding, but
we should allow all the characters.  This is mainly
so we can effectively use the link to some Wikipedia
pages with parentheses in them:

https://en.wikipedia.org/wiki/Atom_(standard)
https://en.wikipedia.org/wiki/Git_(software)

Eric Wong [Thu, 18 Aug 2016 01:10:35 +0000 (01:10 +0000)]
view: try assuming UTF-8 for bogus charsets

For some reason, Alpine will set X-UNKNOWN for valid UTF-8.
Since we favor UTF-8 HTML anyways, try forcing Email::MIME to
handle text/plain as UTF-8 which might show up better.

At least this change renders

<alpine.DEB.2.20.1608131214070.4924@virtualbox>

properly by showing "•" (&#8226;) instead of
"â ¢" (&#226;&#128;&#162;)

Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>

Eric Wong [Thu, 18 Aug 2016 00:54:25 +0000 (00:54 +0000)]
view: try to display bogus charsets for text/plain

Alpine seems to set charset=X-UNKNOWN for valid UTF-8 text,
which causes Email::MIME::body_str to fail as X-UNKNOWN
is not a valid encoding.  So, blindly display the body
as plain-text but warn users about possibly mangled text.

Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>

Eric Wong [Wed, 17 Aug 2016 23:07:43 +0000 (23:07 +0000)]
view: attach_link uses string concatenation

There is no point in using an array to join on an
empty string (my original intention was probably to
join on "\n").

This is only preparation for the next change to show
a warning in the attachment link.

Eric Wong [Tue, 16 Aug 2016 08:49:26 +0000 (08:49 +0000)]
search: add YYYYMMDD search range via "d:" prefix

This is similar to mairix in that it uses a "d:" prefix; but
only takes YYYYMMDD, for now.  Using custom date/time parsers
via Perl will be much more work:

nntp://news.gmane.org/20151005222157.GE5880@survex.com

Anyhow, this ought to be more human-friendly than searching by
Unix timestamps, but it requires reindexing to take advantage of.
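
As a rough model of why a fixed-width YYYYMMDD stamp works for range queries: plain string comparison of such stamps matches chronological order. The Python sketch below is not the Xapian range processor; `yyyymmdd`, `in_date_range`, and the use of UTC are assumptions for the example.

```python
import re
from datetime import datetime, timezone

# "d:" followed by two 8-digit dates joined by "..", e.g. d:20160101..20160816
RANGE_RE = re.compile(r"d:(\d{8})\.\.(\d{8})")


def yyyymmdd(ts):
    """Format a Unix timestamp as a YYYYMMDD string (UTC assumed)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")


def in_date_range(ts, query):
    """True if the message timestamp falls inside a d:START..END query."""
    m = RANGE_RE.fullmatch(query)
    if m is None:
        raise ValueError("expected d:YYYYMMDD..YYYYMMDD")
    start, end = m.groups()
    # Fixed-width digit strings sort the same way the dates do.
    return start <= yyyymmdd(ts) <= end
```

The stamp has to be computed and stored at index time, which is why existing inboxes need a reindex before the prefix is useful.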

Eric Wong [Tue, 16 Aug 2016 08:49:25 +0000 (08:49 +0000)]
search: drop pointless range processors for Unix timestamp

The Unix timestamp isn't meaningful for users searching, so
we will start indexing the YYYYMMDD date stamp which may
use StringValueRangeProcessor, instead.

Eric Wong [Tue, 16 Aug 2016 07:49:56 +0000 (07:49 +0000)]
HACKING: minor updates and add to the website

Also, at least add one of the Tor mirrors (the rest will
be discoverable through the mirrors themselves).

Eric Wong [Mon, 15 Aug 2016 01:54:51 +0000 (01:54 +0000)]
import: use common address parsing to drop unnecessary quotes

Not sure why or how I missed this before; but the common address
parsing routine we have should be more correct.

Add a test to ensure excessively quoted names don't make it
through, either.

Eric Wong [Sun, 14 Aug 2016 11:30:32 +0000 (11:30 +0000)]
TODO: updates based on git@vger mirror experience

Plenty more to do!

Eric Wong [Sun, 14 Aug 2016 10:21:10 +0000 (10:21 +0000)]
www: do not double-clean Message-IDs from internal DBs

Ensure we usually strip one level of '<>' from Message-IDs,
since our internal SQLite, Xapian, and SHA-1 storage all
assume that.

Realistically, we screw up if somebody has '<<' or '>>',
but those are screwed up mail clients and we can deal with
it another time.  Currently, this means some messages with
'>>' in References or Message-Id are not handled correctly,
yet, but we match the behavior of Mail::Thread in keeping
the extra '>'.
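
The "one level only" rule can be pictured with a small Python sketch. It is not the Perl helper used by the project: `mid_clean` here is a hypothetical function that removes exactly one surrounding pair of angle brackets and leaves anything else untouched.

```python
def mid_clean(mid):
    """Strip surrounding whitespace and at most one pair of '<' '>'."""
    mid = mid.strip()
    if mid.startswith("<") and mid.endswith(">"):
        return mid[1:-1]
    return mid
```

Calling it once on `<<a@b>>` leaves `<a@b>`; calling it again on a value that already came out of an internal DB would strip a second level, which is the double-cleaning to avoid.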

Eric Wong [Sun, 14 Aug 2016 10:21:09 +0000 (10:21 +0000)]
www: do not unnecessarily escape some chars in paths

Based on reading RFC 3986, it seems '@', ':', '!', '$', '&',
"'", '(', ')', '*', '+', ',', ';', '=' are all allowed
in path-absolute where we have the Message-ID.

In any case, it seems '@' is fairly common in path components
nowadays and too common in Message-IDs.

Eric Wong [Sun, 14 Aug 2016 10:21:17 +0000 (10:21 +0000)]
www: ensure XML validity for some odd ASCII chars

I've seen 0x1b (\e) in at least one message and some other
possibly non-printable chars.  In any case, make sure they're
valid XML with us-ascii encoding, at least as far as
xmlstarlet(1) is concerned.

Eric Wong [Sun, 14 Aug 2016 10:21:11 +0000 (10:21 +0000)]
mid: no wide characters for sha1_hex

Apparently there are some really screwed up In-Reply-To
fields out there.

Eric Wong [Sun, 14 Aug 2016 10:21:15 +0000 (10:21 +0000)]
search: gracefully handle lookup_message failure

We can't blindly assume a ghost even exists in the DB, as the
rules can change internally for some corner-case Message-IDs.

Eric Wong [Sun, 14 Aug 2016 10:21:13 +0000 (10:21 +0000)]
view: remove redundant pre closing tag

Eric Wong [Sun, 14 Aug 2016 10:21:12 +0000 (10:21 +0000)]
view: allow for missing In-Reply-To mapping

Because buggy mail clients exist and generate invalid
In-Reply-To headers we cannot handle across the board...

Eric Wong [Sun, 14 Aug 2016 10:21:08 +0000 (10:21 +0000)]
searchidx: do not release Xapian lock while (only) Msgmap is indexing

SQLite might index quickly, so we hold the lock used by Xapian
for the duration.  This probably needs to be reworked entirely,
actually.