|
- # Formats
-
- Heb12 supplies some formats for storing the Bible.
-
- ## haplous
-
- haplous (ἁπλοῦς) - simple
-
- A simple, fast, and extensible format for reading the Bible.
-
- **Note:** While nous (the official haplous parser) is being written, all of haplous should be considered unstable and may change completely from day to day.
-
- ### Principles
-
- These are the main principles of the format, listed in order of priority.
-
- 1. Speed
- 2. Simplicity
- 3. Flexibility
-
- #### Speed
-
- The format should be optimized for quick parsing. It should be trivial to find text in one pass without having to load the whole file into memory.
-
- #### Simplicity
-
- All configuration of the format should be obvious and minimal to enable easy parsing and human readability. Everything should have an obvious purpose even without reading the spec.
-
- #### Flexibility
-
- Since the main format is just plain text, configuration can be added in many forms while still maintaining simplicity.
-
- ### Spec
-
- #### Vocabulary
-
- - "Work" is a Bible document and related metadata
- - "Metadata" refers to the information above the document
- - "Document" is the text itself and related information
-
- #### Metadata
-
- Metadata MUST be stored at the beginning of the file, one entry per line, in the form `#id:value`, where `id` is the name of the field.
-
- Each Work MUST include this information:
-
- - `lang`: the language code for the document
- - `title`: the title of the document
- - `id`: the short ID of the document
- - `public_domain`: whether or not the document is in the public domain
- - `type`: the type of document (most often "bible")
-
- Metadata MUST NOT appear after the document.
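
As a rough illustration of the rules above, here is a minimal Python sketch that collects the metadata lines and checks that the required fields are present. It is not nous, and the names `parse_metadata` and `REQUIRED_KEYS` are made up for this example. It also assumes that the first `#book:` line marks the end of the metadata and that values stay plain strings.

```python
# Hypothetical helper, not part of nous: reads the metadata of a haplous Work.
REQUIRED_KEYS = {"lang", "title", "id", "public_domain", "type"}


def parse_metadata(lines):
    """Collect `#id:value` pairs that appear before the first `#book:` line."""
    metadata = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#book:"):
            break  # assumption: the Document starts here, so the Metadata is over
        if line.startswith("#") and ":" in line:
            key, value = line[1:].split(":", 1)
            metadata[key] = value  # values are kept as plain strings
    missing = REQUIRED_KEYS - set(metadata)
    if missing:
        raise ValueError("missing metadata: " + ", ".join(sorted(missing)))
    return metadata


sample = [
    "#lang:en",
    "#title:World English Bible",
    "#id:WEB",
    "#public_domain:true",
    "#type:bible",
    "",
    "#book:Gen",
    "#chapter:1",
]
metadata = parse_metadata(sample)
print(metadata["title"])
```

Because `parse_metadata` accepts any iterable of lines, an open file object can be passed in directly and reading stops at the first book.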
-
- #### Document
-
- The actual text of the Work MUST be divided into books. The start of a book is marked by `#book:id`, and the book ends where the next book begins.
-
- ### Examples
-
- ```
- #lang:en
- #title:World English Bible
- #id:WEB
- #public_domain:true
- #type:bible
-
- #book:Gen
- #chapter:1
- In the beginning God created the heavens and the earth.
- Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.
- ... rest of Genesis 1
- ^
- #chapter:2
- ...
- ^
-
- #book:exod
- #chapter:1
- Now these are the names of the children of Israel, who came into Egypt; every man and his household came with Jacob.
- Reuben, Simeon, Levi, and Judah,
- ^
- ```
-
- ### Parsing strategies
-
- #### Chapter
-
- 1. Find the requested book
- 2. Find the requested chapter
- 3. Collect lines until `^` is found
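
The three steps above could be sketched in Python roughly as follows. This is only an illustration, not the nous implementation: `read_chapter` is a hypothetical name, the verse text is placeholder data, and book IDs are compared case-insensitively as an assumption (the example above uses both `Gen` and `exod`).

```python
# Hypothetical sketch of the chapter strategy; not taken from nous.
def read_chapter(lines, book, chapter):
    """Return the lines of one chapter, scanning the input in a single pass."""
    in_book = False
    in_chapter = False
    collected = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#book:"):
            if in_book:
                break  # the next book started, so stop scanning
            # Assumption: book IDs match case-insensitively.
            in_book = line[len("#book:"):].lower() == book.lower()
        elif in_book and line.startswith("#chapter:"):
            in_chapter = line[len("#chapter:"):] == str(chapter)
        elif in_chapter:
            if line == "^":
                return collected  # `^` closes the chapter
            collected.append(line)
    return collected


sample = [
    "#book:Gen",
    "#chapter:1",
    "Gen 1:1 text",
    "Gen 1:2 text",
    "^",
    "#chapter:2",
    "Gen 2:1 text",
    "^",
    "",
    "#book:exod",
    "#chapter:1",
    "Exod 1:1 text",
    "^",
]
print(read_chapter(sample, "Gen", 2))
```

The function only ever holds the requested chapter in memory, so it can be fed a file object line by line.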
-
- #### Verses
-
- 1. Find the requested book
- 2. Find the requested chapter
- 3. Find the start verse
- 4. Collect lines until end verse is found
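
A verse range can be read the same way. The sketch below assumes that each line after `#chapter:n` holds exactly one verse, so that verse numbers can be counted rather than parsed. The example file suggests this but the spec does not state it. `read_verses` is a hypothetical name and the verse text is placeholder data.

```python
# Hypothetical sketch of the verse strategy; assumes one verse per line.
def read_verses(lines, book, chapter, start, end):
    """Return verses start..end (inclusive, 1-based) of the given chapter."""
    in_book = False
    in_chapter = False
    verse = 0
    found = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#book:"):
            if in_book:
                break  # requested book is over
            # Assumption: book IDs match case-insensitively.
            in_book = line[len("#book:"):].lower() == book.lower()
        elif in_book and line.startswith("#chapter:"):
            in_chapter = line[len("#chapter:"):] == str(chapter)
        elif in_chapter:
            if line == "^":
                break  # chapter ended before the end verse
            verse += 1
            if verse >= start:
                found.append(line)
            if verse >= end:
                break  # nothing after the end verse needs to be read
    return found


sample = [
    "#book:Gen",
    "#chapter:1",
    "Gen 1:1 text",
    "Gen 1:2 text",
    "Gen 1:3 text",
    "Gen 1:4 text",
    "^",
]
print(read_verses(sample, "Gen", 1, 2, 3))
```

Stopping as soon as the end verse is reached keeps the amount of text read proportional to how far into the file the reference is.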
-
- ## BibleC
-
- BibleC is a tiny format to store Bible text.
-
- It is designed to be:
- 1. Extremely minimal (one C source file, under 200 lines)
- 2. Flexible and hackable: it is easy to understand how the code works
-
- It was built with a "Make it simple and keep it simple" design philosophy.
-
- ### Design
- The verses are simply stored in a file separated by newlines. This allows for mass
- grammar/spelling fixes without interference from formatting characters.
-
- A separate data structure stored in memory (it can also be loaded from an index file)
- is used to quickly calculate which line a verse or range of verses is on from its reference.
-
- It does not require complicated parsing or memory allocations,
- so it is very easy to port to other languages and platforms.
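
To make the idea concrete, here is a small Python sketch of how a line number might be computed from a reference. The index layout (`start_line` plus a list of verse counts per chapter) and the names `TOY_INDEX` and `verse_line` are assumptions made for this example. The real BibleC index format may differ. The toy index only covers the first three chapters of Genesis.

```python
# Hypothetical index layout, for illustration only; not the BibleC format.
TOY_INDEX = {
    # start_line: zero-based line of the book's first verse in the text file
    # verses_per_chapter: number of verses in each chapter, in order
    "Gen": {"start_line": 0, "verses_per_chapter": [31, 25, 24]},
}


def verse_line(index, book, chapter, verse):
    """Return the zero-based line number of a verse, using arithmetic only."""
    entry = index[book]
    counts = entry["verses_per_chapter"]
    if not 1 <= chapter <= len(counts):
        raise ValueError("chapter out of range")
    if not 1 <= verse <= counts[chapter - 1]:
        raise ValueError("verse out of range")
    # Skip every verse of the earlier chapters, then step into this chapter.
    return entry["start_line"] + sum(counts[:chapter - 1]) + (verse - 1)


print(verse_line(TOY_INDEX, "Gen", 2, 3))  # 0 + 31 + 2 = 33
```

Since the lookup is plain arithmetic over a small table, no text has to be parsed before the reader jumps to the right line.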
-
- ## Comparison
- A test was run to see whether
- BibleC or haplous is faster. Each verse was fetched
- 100 times, with a new instance set up each time.
-
- Note that BibleC loaded its index from the index file in this test.
-
- ```
- Gen 1 1:1
- haplous: ~0.000225ms
- biblec: ~0.002346ms
-
- Exod 1 1:1
- haplous: ~0.040692ms
- biblec: ~0.014129ms
-
- Rev 1 1:1
- haplous: ~0.759610ms
- biblec: ~0.178406ms
- ```
-
- Neither test leaked memory.
- As you can see, haplous is faster with `Gen 1 1:1`, since
- it didn't have to parse a 4 KB index file before reading the first verse.
-
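- For verses further into the file, BibleC is faster than haplous (roughly 3 to 4
- times faster for `Exod` and `Rev` in this test). BibleC first
- calculates the verse line, then seeks to it. haplous has to parse every
- line it reads on the way to the verse.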
|