Sergey Matveev's repositories - public-inbox.git/commit

author	Eric Wong (Contractor, The Linux Foundation) <e@80x24.org>
	Thu, 22 Feb 2018 01:49:08 +0000 (01:49 +0000)
committer	Eric Wong (Contractor, The Linux Foundation) <e@80x24.org>
	Thu, 22 Feb 2018 18:33:46 +0000 (18:33 +0000)
commit	9ecbfc09928dada28094fd3fc79e91a5472b27ea
tree	a829ab7765f45e139e8a9d5de1c3784fc26bbf69
parent	a81ad9c4b1b5d8c2ae8444b6dcb8710bd361f628

v2: parallelize Xapian indexing

The parallelization requires splitting Msgmap, text+term
indexing, and thread-linking out into separate processes.

git-fast-import is fast, so we don't bother parallelizing it.

Msgmap (SQLite) and thread-linking (Xapian) must be serialized
because they rely on monotonically increasing numbers (NNTP
article number and internal thread_id, respectively).
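
As a rough illustration of why number allocation has to go through a single writer, here is a minimal Python sketch of a Message-ID to article-number map backed by SQLite. The table layout and the names `open_msgmap` and `num_for` are assumptions made for this example, not the actual Msgmap schema or API.

```python
import sqlite3


def open_msgmap(path=":memory:"):
    """Open (or create) a hypothetical Message-ID -> article number map."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS msgmap ("
        "num INTEGER PRIMARY KEY AUTOINCREMENT, "
        "mid TEXT NOT NULL UNIQUE)"
    )
    return db


def num_for(db, mid):
    """Retrieve the article number for mid, generating one if it is new."""
    row = db.execute("SELECT num FROM msgmap WHERE mid = ?", (mid,)).fetchone()
    if row:
        return row[0]
    # AUTOINCREMENT hands out strictly increasing numbers. That only stays
    # simple when one process does all the inserts, as in this sketch.
    cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
    db.commit()
    return cur.lastrowid


if __name__ == "__main__":
    db = open_msgmap()
    for mid in ("a@example.com", "b@example.com", "a@example.com"):
        print(mid, num_for(db, mid))
```

A Message-ID that has been seen before gets its existing number back, which corresponds to the "retrieved/generated" distinction in this message.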

We handle msgmap in the main process which drives fast-import.
When the article number is retrieved/generated, we write the
entire message to per-partition subprocesses via pipes for
expensive text+term indexing.

When these per-partition subprocesses are done with the
expensive text+term indexing, they write SearchMsg (small data)
to a shared pipe (inherited from the main V2Writable process)
back to the threader, which runs in its own subprocess.
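
The data flow described above can be sketched in Python with `os.fork` and `os.pipe` (POSIX only). This is a simplified model under several assumptions, not the Perl implementation: the JSON line format, the modulo partition choice, the word count standing in for text+term indexing, the extra result pipe, and all function names are invented for the example. The threader here also gives every record a fresh thread_id instead of linking replies.

```python
import json
import os


def _spawn(fn, *args, close=()):
    """Fork a child that closes unneeded inherited fds, runs fn, then exits."""
    pid = os.fork()
    if pid == 0:
        try:
            for fd in close:
                os.close(fd)
            fn(*args)
        finally:
            os._exit(0)
    return pid


def _worker(part, rd_fd, thr_w):
    """Partition subprocess: do the expensive work, report small data."""
    with os.fdopen(rd_fd) as rd:
        for line in rd:
            msg = json.loads(line)
            small = {"num": msg["num"], "part": part,
                     "terms": len(set(msg["body"].split()))}
            # One short write per record: pipe writes of at most PIPE_BUF
            # bytes are atomic, so records from different workers do not mix.
            os.write(thr_w, (json.dumps(small) + "\n").encode())


def _threader(thr_r, res_w):
    """Single threader subprocess: the only place thread_id is assigned."""
    linked = []
    with os.fdopen(thr_r) as rd:
        for thread_id, line in enumerate(rd, 1):
            rec = json.loads(line)
            rec["thread_id"] = thread_id
            linked.append(rec)
    with os.fdopen(res_w, "w") as wr:
        json.dump(linked, wr)


def run_pipeline(messages, nparts=2):
    thr_r, thr_w = os.pipe()  # shared pipe: workers -> threader
    res_r, res_w = os.pipe()  # threader -> main, so the sketch can return data
    threader = _spawn(_threader, thr_r, res_w, close=(thr_w, res_r))
    os.close(thr_r)
    os.close(res_w)

    workers = []  # (pid, write end kept by the main process)
    for part in range(nparts):
        r, w = os.pipe()
        inherited = [w, res_r] + [fw.fileno() for _, fw in workers]
        pid = _spawn(_worker, part, r, thr_w, close=inherited)
        os.close(r)
        workers.append((pid, os.fdopen(w, "w")))
    os.close(thr_w)  # the main process never writes to the threader itself

    # The main process assigns article numbers serially, then hands the
    # whole message to one partition worker.
    for num, body in enumerate(messages, 1):
        _, wr = workers[num % nparts]
        wr.write(json.dumps({"num": num, "body": body}) + "\n")
    for pid, wr in workers:
        wr.close()  # EOF tells the worker to finish
        os.waitpid(pid, 0)
    with os.fdopen(res_r) as rd:
        linked = json.load(rd)
    os.waitpid(threader, 0)
    return linked


if __name__ == "__main__":
    for rec in run_pipeline(["hello world", "re: hello world", "a patch"]):
        print(rec)
```

The order in which records reach the threader depends on scheduling, so the thread_id values are monotonic in arrival order, not in article-number order.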

The number of text+term Xapian partitions is chosen at import
time and can be made equal to the number of cores on the machine.
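
A small Python sketch of picking the partition count from the core count and spreading article numbers across partitions. The modulo rule and both function names are assumptions for illustration; this message does not say how a partition is selected.

```python
import os


def default_partitions():
    """Use one partition per core, falling back to 1 if the count is unknown."""
    return os.cpu_count() or 1


def partition_for(article_num, nparts):
    """Hypothetical rule: spread article numbers round-robin over partitions."""
    return article_num % nparts


if __name__ == "__main__":
    nparts = default_partitions()
    for num in range(1, 9):
        print(num, "->", partition_for(num, nparts))
```

With a rule like this, the partition count would have to stay fixed after import, since changing it would map existing article numbers to different partitions.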

V2Writable --> Import -> git-fast-import
           \-> SearchIdxThread -> Msgmap (synchronous)
           \-> SearchIdxPart[n] -> SearchIdx[*]
           \-> SearchIdxThread -> SearchIdx ("threader", a subprocess)

[*] each subprocess writes to threader

MANIFEST
lib/PublicInbox/Import.pm
lib/PublicInbox/Search.pm
lib/PublicInbox/SearchIdx.pm
lib/PublicInbox/SearchIdxPart.pm	[new file with mode: 0644]
lib/PublicInbox/SearchIdxThread.pm	[new file with mode: 0644]
lib/PublicInbox/SearchMsg.pm
lib/PublicInbox/V2Writable.pm