Heb12 supplies some formats for storing the Bible.
haplous (ἁπλοῦς) - simple
A simple, fast, and extensible format for reading the Bible.
Note: While nous (the official haplous parser) is being written, all of haplous should be considered unstable and may change completely from day to day.
These are the main principles of the format and their priority in comparison to each other.
The format should be optimized for quick parsing. It should be trivial to find text in one pass without having to read the whole file into memory.
All configuration of the format should be obvious and minimal to enable easy parsing and human readability. Everything should have an obvious purpose even without reading the spec.
Since the main format is just plain text, configuration can be added in many forms while still maintaining simplicity.
Metadata MUST be stored at the beginning of the file, in the form of #id:value.
Each Work MUST include this information:
lang: the language code for the document
title: the title of the document
id: the short ID of the document
public_domain: whether or not the document is in the public domain
type: the type of document (most often “bible”)
Metadata MUST NOT appear after the document.
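As a rough illustration of the metadata rules above, here is a minimal Python sketch that reads the leading #key:value lines and checks that the required keys are present. The name read_metadata and the choice to stop at the first #book line are assumptions made for this example; it is not taken from nous.

```python
REQUIRED_KEYS = {"lang", "title", "id", "public_domain", "type"}


def read_metadata(lines):
    """Collect the leading #key:value lines of a Work into a dict.

    Assumption: metadata ends at the first line that is not a
    #key:value line, or at the first #book marker.
    """
    metadata = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.startswith("#"):
            break
        # Split on the first colon only, so values may contain colons.
        key, _, value = line[1:].partition(":")
        if key == "book":
            break
        metadata[key] = value

    missing = REQUIRED_KEYS - metadata.keys()
    if missing:
        raise ValueError(f"missing required metadata: {sorted(missing)}")
    return metadata
```

Values are kept as plain strings here; turning public_domain into a boolean is left to the caller.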
The actual text of the Work MUST be divided into books. The start of a book is marked by #book:id, and the book ends when the next book is found.
#lang:en
#title:World English Bible
#id:WEB
#public_domain:true
#type:bible
#book:Gen
#chapter:1
In the beginning God created the heavens and the earth.
Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.
... rest of Genesis 1
^
#chapter:2
...
^
#book:exod
#chapter:1
Now these are the names of the children of Israel, who came into Egypt; every man and his household came with Jacob.
Reuben, Simeon, Levi, and Judah,
^
^
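To show what finding text in one pass can look like, the following Python sketch streams a file in the layout shown above and returns a single verse. It assumes that every plain text line inside a chapter is one verse, that lines containing only ^ are structural and not verses, and that book IDs compare case-insensitively. None of these details are confirmed by the spec text here, and find_verse is a hypothetical name.

```python
def find_verse(lines, book, chapter, verse):
    """Return the text of one verse, or None if it is not found.

    `lines` can be any iterable of strings, such as an open file,
    so the text is scanned once and never held in memory as a whole.
    """
    in_book = False
    current_chapter = None
    verse_number = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#book:"):
            # Assumption: book IDs are compared case-insensitively.
            in_book = line[len("#book:"):].lower() == book.lower()
            current_chapter = None
        elif line.startswith("#chapter:"):
            current_chapter = int(line[len("#chapter:"):])
            verse_number = 0
        elif not line or line == "^" or line.startswith("#"):
            # Assumption: blank lines, "^" lines, and other markers
            # are not verses.
            continue
        elif current_chapter is not None:
            verse_number += 1
            if in_book and current_chapter == chapter and verse_number == verse:
                return line
    return None
```

Because the function only iterates, passing an open file object keeps memory use to one line at a time.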
BibleC is a tiny format to store Bible text.
It is designed to be:
It was built with a “Make it simple and keep it simple” design philosophy.
The verses are simply stored in a file, separated by newlines. This allows for mass
grammar/spelling fixes without interference from formatting characters.
A separate data structure stored in memory (which can also be loaded from an index file)
is used to quickly calculate which line a verse or range of verses is on from its reference.
It does not require complicated parsing or memory allocations, so it is very easy to port to other languages and platforms.
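The idea of calculating a verse's line from its reference can be sketched in a few lines of Python. The index layout below (the first line of each book plus the number of verses in each chapter) is a hypothetical stand-in, not BibleC's actual index format, and the numbers are toy data covering only a few chapters.

```python
# Hypothetical index: book -> (line of the book's first verse,
#                              number of verses in each chapter).
# Toy data for illustration only.
INDEX = {
    "Gen": (0, [31, 25]),
    "Exod": (56, [22]),
}


def verse_line(index, book, chapter, verse):
    """Return the zero-based line number of a verse in the text file."""
    start, chapter_lengths = index[book]
    if not 1 <= chapter <= len(chapter_lengths):
        raise ValueError(f"no chapter {chapter} in {book}")
    if not 1 <= verse <= chapter_lengths[chapter - 1]:
        raise ValueError(f"no verse {verse} in {book} {chapter}")
    # Skip every verse in the earlier chapters, then step to the verse.
    return start + sum(chapter_lengths[: chapter - 1]) + (verse - 1)
```

With one verse per line, the lookup needs only integer arithmetic and no parsing of the text itself, which is what makes the approach easy to port.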
A test was done to see whether
BibleC or Haplous was faster. Each verse was grabbed
100 times, with a new instance set up each time.
Note that BibleC loaded the index file
Gen 1 1:1
haplous: ~0.000225ms
biblec: ~0.002346ms
Exod 1 1:1
haplous: ~0.040692ms
biblec: ~0.014129ms
Rev 1 1:1
haplous: ~0.759610ms
biblec: ~0.178406ms
Both tests leaked no memory.
As you can see, haplous is faster with Gen 1 1:1, since it didn’t have to parse a 4 KB index file before reading the first verse.
For verses further into the file, BibleC is faster than haplous. BibleC first
calculates the verse's line and then seeks to it, whereas haplous has to parse
every line it reads along the way.
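The measurement described above could be reproduced with a small harness like the one below. It is a sketch under assumptions: average_ms, make_reader, and lookup are hypothetical names, and including instance setup in the timed region is inferred from the remark about the index file rather than stated outright.

```python
import time


def average_ms(make_reader, lookup, repeats=100):
    """Average time in milliseconds for one setup plus one lookup.

    make_reader: callable that builds a fresh reader instance.
    lookup: callable that takes the reader and fetches one verse.
    """
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        reader = make_reader()  # new instance for every run
        lookup(reader)
        total += time.perf_counter() - start
    return total / repeats * 1000.0
```

Timing setup and lookup together means any index loading is counted against the format that needs it, which matters most for verses near the start of the file.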