Internal data structures of public-inbox
========================================

This is a guide for hackers new to our code base. Do not
consider our internal data structures stable for external
consumers; this document should be updated when internals
change. I recommend reading this document from the source tree,
with the source code easily accessible if you need examples.

This mainly documents in-memory data structures. If you're
interested in the stable on-filesystem formats, see the
public-inbox-config(5), public-inbox-v1-format(5) and
public-inbox-v2-format(5) manpages.

Common abbreviations when used outside of their packages are
documented. `$self' is the common variable name when used

PublicInbox::Config
-------------------

PublicInbox::Config is the root class which loads a
public-inbox-config file and instantiates PublicInbox::Inbox,
PublicInbox::WWW, PublicInbox::NNTPD, and other top-level

Outside of tests, this is typically a singleton.

* PublicInbox::Eml - Email::MIME-like class
  Common abbreviation: $mime, $eml
  Used by: PublicInbox::WWW, PublicInbox::SearchIdx

  A representation of an entire email, multipart or not.
  An option to use libgmime or libmailutils may be supported
  in the future for performance and memory use.

  This can be a memory hog with big messages and giant
  attachments, so our PublicInbox::WWW interface only keeps
  one object of this class in memory at a time.

  In other words, this is the "meat" of the message, whereas
  $smsg (below) is just the "skeleton".

  Our PublicInbox::V2Writable class may have two objects of this
  type in memory at a time for deduplication.

  In public-inbox 1.4 and earlier, Email::MIME and its subclass,
  PublicInbox::MIME, were used. Although it still slurps entire
  messages into memory, PublicInbox::Eml is faster and uses less
  memory due to lazy header parsing and lazy subpart
  instantiation with shorter object lifetimes.

* PublicInbox::Smsg - small message skeleton
  Used by: PublicInbox::{NNTP,WWW,SearchIdx}
  Common abbreviation: $smsg

  Represents headers shown in NNTP overview and PSGI message
  summaries (thread skeleton).

  This is loaded from either the overview DB (over.sqlite3) or
  the Xapian DB (docdata.glass), though the Xapian docdata
  won't hold NNTP-only fields (Cc:/To:).

  There may be hundreds or thousands of these objects in memory
  at a time, so fields are pruned if unneeded.

* PublicInbox::SearchThread::Msg - subclass of Smsg
  Common abbreviation: $cont or $node
  Used by: PublicInbox::WWW

  The structure we use for a non-recursive[1] variant of
  JWZ's algorithm: <https://www.jwz.org/doc/threading.html>.
  Nowadays, this is a re-blessed $smsg with additional fields.

  As with $smsg objects, there may be hundreds or thousands
  of these objects in memory at a time.

  We also do not use a linked list for storing children as JWZ
  describes, but instead a Perl hashref for {children} which
  becomes an arrayref upon sorting.

  [1] https://rt.cpan.org/Ticket/Display.html?id=116727
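
The {children} handling and non-recursive traversal described for
PublicInbox::SearchThread::Msg can be sketched in a few lines of
core Perl. This is a simplified illustration, not the code in
PublicInbox::SearchThread: the sub names (thread_msgs,
sort_children, flatten) and the {mid}, {parent_mid} and {ds}
fields are assumptions made for the example, and messages with an
unknown parent are simply treated as roots.

```perl
use strict;
use warnings;

# Link messages into a tree.  While linking, {children} is a hashref
# keyed by Message-ID, so linking the same child twice is harmless.
sub thread_msgs {
    my (@msgs) = @_;
    my %by_mid;
    for my $msg (@msgs) {
        $msg->{children} = {};
        $by_mid{$msg->{mid}} = $msg;
    }
    my @roots;
    for my $msg (@msgs) {
        my $pmid = $msg->{parent_mid};
        my $parent = defined($pmid) ? $by_mid{$pmid} : undef;
        if ($parent) {
            $parent->{children}->{$msg->{mid}} = $msg;
        } else {
            push @roots, $msg; # no known parent: treat as a root
        }
    }
    @roots;
}

# Walk the tree with an explicit work list instead of recursion and
# replace each {children} hashref with an arrayref sorted by {ds}.
sub sort_children {
    my ($roots) = @_;
    @$roots = sort { $a->{ds} <=> $b->{ds} } @$roots;
    my @todo = @$roots;
    while (defined(my $node = shift @todo)) {
        my @kids = sort { $a->{ds} <=> $b->{ds} }
                values %{$node->{children}};
        $node->{children} = \@kids; # hashref becomes arrayref
        push @todo, @kids;
    }
}

# Depth-first listing using an explicit stack, again without recursion.
sub flatten {
    my (@roots) = @_;
    my (@out, @stack);
    push @stack, [ 0, $_ ] for reverse @roots;
    while (defined(my $cur = pop @stack)) {
        my ($level, $node) = @$cur;
        my $indent = '  ' x $level;
        push @out, $indent . $node->{mid};
        push @stack, [ $level + 1, $_ ] for reverse @{$node->{children}};
    }
    @out;
}

# hypothetical sample data; {ds} stands in for a sortable date
my @msgs = (
    { mid => 'c@example', parent_mid => 'a@example', ds => 30 },
    { mid => 'a@example', parent_mid => undef,       ds => 10 },
    { mid => 'b@example', parent_mid => 'a@example', ds => 20 },
    { mid => 'd@example', parent_mid => 'b@example', ds => 40 },
);
my @roots = thread_msgs(@msgs);
sort_children(\@roots);
print "$_\n" for flatten(@roots);
```

With the sample data, a@example is printed first, followed by
b@example with d@example nested below it, and then c@example.
Using a work list and a stack keeps deep threads from exhausting
the call stack, which is the concern behind a non-recursive variant.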

* PublicInbox::Inbox - represents a single public-inbox
  Common abbreviation: $ibx

  This represents a "publicinbox" section in the config
  file; see public-inbox-config(5) for details.

* PublicInbox::Git - represents a single git repository
  Common abbreviation: $git, $ibx->git

  Each configured "publicinbox" or "coderepo" has one of these.

* PublicInbox::Msgmap - msgmap.sqlite3 read-write interface
  Common abbreviation: $mm, $ibx->mm
  Used everywhere if SQLite is available.

  Each indexed inbox has one of these; see the
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.

* PublicInbox::Over - over.sqlite3 read-only interface
  Common abbreviation: $over, $ibx->over
  Used everywhere if SQLite is available.

  Each indexed inbox has one of these; see the
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.

* PublicInbox::Search - Xapian read-only interface
  Common abbreviation: $srch, $ibx->search
  Used everywhere if Search::Xapian (or Xapian.pm) is available.

  Each indexed inbox has one of these; see the
  public-inbox-v1-format(5) and public-inbox-v2-format(5)
  manpages for details.
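
For orientation, the "publicinbox" section which a single $ibx
represents might look like the fragment below. This is only a
sketch: the inbox name, address, URL and path are hypothetical,
and public-inbox-config(5) remains the authoritative list of keys.

```
[publicinbox "example"]
	address = example@example.com
	url = https://example.com/example
	inboxdir = /path/to/example
	indexlevel = full
```

The $ibx->git, $ibx->mm, $ibx->over and $ibx->search
abbreviations listed above then refer to the git repository,
msgmap.sqlite3, over.sqlite3 and Xapian index belonging to that
one inbox.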

PublicInbox::WWW
----------------

The main PSGI web interface; it uses several other packages to
form our web interface.

PublicInbox::SolverGit
----------------------

This is instantiated from the $INBOX/$BLOB_OID/s/ WWW endpoint
and represents the stages and states for "solving" a blob by
searching for and applying patches. See the code and comments
in PublicInbox/SolverGit.pm.

PublicInbox::Qspawn
-------------------

This is instantiated from various WWW endpoints and represents
the stages and states for running and managing subprocesses
in a way which won't exceed configured process limits defined
via "publicinboxlimiter.*" directives in public-inbox-config(5).
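
As an illustration of such a directive, a named limiter could be
declared as in the fragment below. The limiter name "example" and
the value are hypothetical; consult public-inbox-config(5) for the
keys supported by your version.

```
[publicinboxlimiter "example"]
	max = 4
```

Under this setting, at most four subprocesses governed by that
limiter would run at once.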

ad-hoc structures shared across packages
----------------------------------------

* $ctx - PublicInbox::WWW app request context
  This holds the PSGI $env as well as any internal variables
  used by various modules of PublicInbox::WWW.

  As with the PSGI $env, there is one per active WWW
  request+response cycle. It does not exist for idle HTTP

* PublicInbox::NNTP - an NNTP client socket
  Common abbreviation: $nntp
  Used by: PublicInbox::DS, public-inbox-nntpd

  Unlike PublicInbox::HTTP, all of the logic for serving
  NNTP clients is here, including what would be
  in $ctx on the HTTP or WWW side.

  There may be thousands of these since we support thousands of

* PublicInbox::HTTP - an HTTP client socket
  Common abbreviation: $http
  Used by: PublicInbox::DS, public-inbox-httpd

  Unlike PublicInbox::NNTP, this class has no knowledge of any of
  the email or git-specific parts of public-inbox, only PSGI.
  However, it supports APIs and behaviors (e.g. streaming large
  responses) which PublicInbox::WWW may take advantage of.

  There may be thousands of these since we support thousands of

* PublicInbox::Listener - a SOCK_STREAM listen socket (TCP or Unix)
  Used by: PublicInbox::DS, public-inbox-httpd, public-inbox-nntpd
  Common abbreviation: @listeners in PublicInbox::Daemon

  This class calls non-blocking accept(2) or accept4(2) on a
  listen socket to create new PublicInbox::HTTP and
  PublicInbox::NNTP instances.

* PublicInbox::HTTPD
  Common abbreviation: $httpd

  Represents an HTTP daemon which creates PublicInbox::HTTP
  wrappers around client sockets accepted from
  PublicInbox::Listener.

  Since the SERVER_NAME and SERVER_PORT PSGI variables need to be
  exposed for HTTP/1.0 requests when Host: headers are missing,
  there is one of these per Listener socket.

* PublicInbox::HTTPD::Async
  Common abbreviation: $async

  Used for implementing an asynchronous "push" interface for
  slow, expensive responses which may require spawning
  git-http-backend(1), git-apply(1) or other commands.
  This will also be used for dealing with future asynchronous
  operations such as HTTP reverse proxying and slow storage
  retrieval operations.

* PublicInbox::NNTPD
  Common abbreviation: $nntpd

  Represents an NNTP daemon which creates PublicInbox::NNTP
  wrappers around client sockets accepted from
  PublicInbox::Listener.

  This is currently a singleton, but it is associated with a
  given PublicInbox::Config which may be instantiated more than

* PublicInbox::EOFpipe

  Used throughout to trigger a callback when a pipe(7) is closed.
  This is frequently used to portably detect process exit without
  relying on a catch-all waitpid(-1, ...) call.
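
The idea behind detecting process exit through a pipe can be
sketched with core Perl alone. This shows only the underlying
technique, not the interface of PublicInbox::EOFpipe: the sub names
spawn_with_eof_pipe and wait_for_exit are hypothetical, and a
blocking sysread stands in for what an event loop would do by
watching the read end for readability.

```perl
use strict;
use warnings;
use POSIX ();

# Fork a child which inherits the write end of a pipe.  The parent
# keeps only the read end, so a read returns EOF once every copy of
# the write end is closed, which happens when the child exits.
sub spawn_with_eof_pipe {
    my ($child_cb) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork();
    defined($pid) or die "fork: $!";
    if ($pid == 0) { # child
        close $r;
        $child_cb->();
        POSIX::_exit(0); # the write end is closed implicitly on exit
    }
    close $w; # the parent must not hold the write end open
    ($pid, $r);
}

# Wait for EOF on the read end, then reap only the PID we care about
# instead of using a catch-all waitpid(-1, ...).
sub wait_for_exit {
    my ($pid, $r) = @_;
    my $n = sysread($r, my $buf, 1);
    defined($n) or die "sysread: $!";
    # $n == 0 means EOF: all write ends are closed
    waitpid($pid, 0) == $pid or die "waitpid: $!";
    $? >> 8; # exit status of the child
}

# the child just sleeps briefly and never writes to the pipe
my ($pid, $r) = spawn_with_eof_pipe(sub {
    select(undef, undef, undef, 0.1);
});
my $status = wait_for_exit($pid, $r);
print "child $pid exited with status $status\n";
```

Because each child gets its own pipe, the parent learns which
process exited from which descriptor hit EOF, and can then call
waitpid on that one PID without disturbing other children.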