Concepts and Vocabulary
Earthstar is an:
- eventually consistent
- offline-first
- embedded
- NoSQL document database
- that syncs.
It...
- can sync peer-to-peer and peer-to-server
- can sync across untrusted peers
- uses cryptographic signatures to prevent data tampering
Specification
This is an informal introduction. See the Specification for more precise details.
Author
An author is a person identified by a public key. This is a "user account".
A person can use the same author name and key from multiple devices.
Author shortnames are exactly 4 characters long and contain only lowercase ASCII letters. They can't be changed later. They help protect against phishing attacks.
Authors also have human-readable long names which are saved as regular documents along with other profile information, and can contain any characters. They can be changed. This is like a "display name" in Slack or Twitter.
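The shortname rule can be checked with a single regular expression. This is a minimal sketch; `isValidShortname` is a hypothetical helper written for this page, not a function from the Earthstar library.

```javascript
// Shortname rule from the text above: exactly 4 lowercase ASCII letters.
// isValidShortname is a hypothetical helper, not part of the Earthstar API.
function isValidShortname(shortname) {
  return typeof shortname === "string" && /^[a-z]{4}$/.test(shortname);
}

console.log(isValidShortname("suzy")); // true
console.log(isValidShortname("Suzy")); // false: contains an uppercase letter
console.log(isValidShortname("suz"));  // false: only 3 characters
```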
Workspace
A workspace is a collection of documents that can be accessed by certain authors.
There are 2 kinds of workspaces:
- Unlisted workspaces can be written to by anyone who knows the workspace address. The address ends with 20 random characters to make it hard to guess.
- Invite-only workspaces have a public key at the end of the address. They can only be written to by authors who know the matching private key. Anyone can still sync and read the data, but authors can choose to encrypt their documents using the workspace key so only workspace members can read them. (NOT IMPLEMENTED YET)
Workspace names are short strings containing 1 to 15 lower-case ASCII letters.
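As a rough illustration, a workspace address such as `+gardening.mVkCjHbAcjEBddaZwxFV` can be split into its name and its trailing part. This is only a sketch: `parseWorkspaceAddress` is a hypothetical helper, and the assumption that the trailing part is purely alphanumeric is ours, not something this page specifies.

```javascript
// Split a workspace address into name and suffix.
// Assumed layout (based on the example address on this page): "+" name "." suffix.
// Assumption: the suffix contains only ASCII letters and digits.
function parseWorkspaceAddress(address) {
  const match = /^\+([a-z]{1,15})\.([A-Za-z0-9]+)$/.exec(address);
  if (match === null) {
    return null; // not in the assumed layout, or the name breaks the 1-15 letter rule
  }
  return { name: match[1], suffix: match[2] };
}

const parsed = parseWorkspaceAddress("+gardening.mVkCjHbAcjEBddaZwxFV");
console.log(parsed.name);          // gardening
console.log(parsed.suffix.length); // 20
```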
Document
A document is a JSON-style object with the following shape:
Format
This string identifies the feed format or schema used by Earthstar. It controls the rules used to validate signatures and documents. The format is versioned to help preserve old data.
Path
A string identifying this document, like a path in a filesystem.
Example paths:
Valid paths
Typically the first segment of the path represents the type of document, or the application used to make it. When syncing or querying data you will often give a path prefix to get a subset of the documents (like /wiki/ to get all wiki documents).
Paths start with a / and do not end with a /.
Paths are a sequence of one or more path segments, separated by /.
A path segment must have at least one character. These characters are the only ones allowed in path segments:
- ASCII uppercase and lowercase letters
- ASCII numbers
- These characters: ()'-._~!*$&+,:=?@%
Note that ~ has a special meaning (see the Path Ownership section).
Paths are case sensitive.
No spaces, no unicode characters, no unprintable ASCII characters. Use browser-style percent-encoding to embed these other characters.
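The path rules above fit into one regular expression. The following is a minimal sketch, not the library's own validator: `isValidPath` is a hypothetical helper, the example paths are made up, and using `encodeURIComponent` for the percent-encoding step is our assumption about what "browser-style" means here.

```javascript
// One or more segments, each "/" followed by at least one allowed character.
// Allowed: ASCII letters, ASCII digits, and ()'-._~!*$&+,:=?@%
// isValidPath is a hypothetical helper, not part of the Earthstar API.
const PATH_REGEX = /^(\/[A-Za-z0-9()'\-._~!*$&+,:=?@%]+)+$/;

function isValidPath(path) {
  return typeof path === "string" && PATH_REGEX.test(path);
}

console.log(isValidPath("/wiki/shared/Dolphins")); // true
console.log(isValidPath("/wiki/"));                // false: ends with "/"
console.log(isValidPath("wiki/shared"));           // false: does not start with "/"
console.log(isValidPath("/my notes"));             // false: contains a space

// Percent-encode a segment that contains disallowed characters.
const segment = encodeURIComponent("my notes");
console.log(segment);                      // my%20notes
console.log(isValidPath("/" + segment));   // true
```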
There are no "folders", only documents
Although these are like filesystem paths, "folders" or "directories" don't actually exist; there are only documents. So if there are two documents at these two paths:
Each of those two separate documents would contain its own data. There may or may not be a third document at /wiki.
Instead of folders we talk about "path prefixes", like "all paths starting with /wiki/".
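A path prefix is plain string matching, which the sketch below illustrates. The paths are made up for this example and `pathsWithPrefix` is a hypothetical helper, not an Earthstar function.

```javascript
// Select paths by prefix. There is no folder lookup, only a string comparison.
function pathsWithPrefix(allPaths, prefix) {
  return allPaths.filter((path) => path.startsWith(prefix));
}

// Hypothetical paths for illustration.
const allPaths = ["/about/name", "/wiki", "/wiki/kittens", "/wiki/puppies", "/wikipedia/home"];

console.log(pathsWithPrefix(allPaths, "/wiki/"));
// [ '/wiki/kittens', '/wiki/puppies' ]
```

Note that the prefix /wiki/ matches neither the separate document at /wiki nor /wikipedia/home, because the trailing slash is part of the comparison.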
Content
The main content of the document. User data goes here.
This must be a unicode string of any length. Empty strings are allowed. To store binary data you can base64-encode it first.
There is no way to store the MIME type of the data - consider using a file extension in the path, like /notes/todo.txt.
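For binary data, the base64 round trip might look like the sketch below. It assumes a Node.js environment (it uses Node's `Buffer`); the path and the four sample bytes are made up for illustration.

```javascript
// Encode binary data as a base64 string so it can be stored as document content.
// Assumes Node.js, where Buffer is available globally.
const bytes = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]); // sample bytes

const content = Buffer.from(bytes).toString("base64");
console.log(content); // iVBORw==

// A hypothetical document fragment; the file extension hints at the data type.
const fragment = { path: "/images/logo.png", content: content };

// Decode again when reading the document back.
const decoded = new Uint8Array(Buffer.from(fragment.content, "base64"));
console.log(decoded.length); // 4
```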
TODO: expand this to allow JSON-style nested objects, binary data, ?
TODO: add MIME type?
Timestamp
A document's timestamp is when it was created / modified by the most recent author. Higher timestamps allow newer documents to take precedence over older versions. Earthstar doesn't have version numbers or sequence numbers, only timestamps.
Timestamps are integers in Unix microseconds (millionths of a second). Javascript measures time in milliseconds, so multiply by 1,000 to get microseconds. If you have seconds, multiply by 1,000,000.
Authors set the timestamps themselves so we can't trust them completely. During sync, documents with timestamps too far in the future are silently skipped (not accepted or stored). This prevents authors from faking future timestamps and spreading them through the network. However, authors are able to dishonestly set old timestamps and we can't detect if they've done so.
"Too far in the future" is recommended to be 10 minutes.
Timestamps must be >= 10000000000000 (10^13). This number was chosen to reject timestamps accidentally set in milliseconds instead of microseconds.
Timestamps must be < 9007199254740991 (2^53-1). This is Javascript's Number.MAX_SAFE_INTEGER.
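The timestamp rules above can be sketched as a pair of checks. The function and constant names are hypothetical, chosen for this example; only the numeric limits and the 10-minute recommendation come from the text.

```javascript
// Limits taken from the text above.
const MIN_TIMESTAMP = 10 ** 13;                           // inclusive lower bound
const MAX_TIMESTAMP_EXCLUSIVE = Number.MAX_SAFE_INTEGER;  // 2^53 - 1, exclusive
const FUTURE_CUTOFF_MICROSECONDS = 10 * 60 * 1000 * 1000; // 10 minutes

// JavaScript clocks report milliseconds; multiply by 1,000 for microseconds.
function nowMicroseconds() {
  return Date.now() * 1000;
}

// True if the value is an integer inside the allowed range.
function isTimestampInRange(timestamp) {
  return Number.isInteger(timestamp)
    && timestamp >= MIN_TIMESTAMP
    && timestamp < MAX_TIMESTAMP_EXCLUSIVE;
}

// True if a document should be skipped during sync for being too far ahead.
function isTooFarInFuture(timestamp, now = nowMicroseconds()) {
  return timestamp > now + FUTURE_CUTOFF_MICROSECONDS;
}

console.log(isTimestampInRange(nowMicroseconds())); // true
console.log(isTimestampInRange(Date.now()));        // false: milliseconds by mistake
```

The lower bound does its job here: a present-day millisecond value is around 10^12, which falls below 10^13 and is rejected.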
Writing documents
Individual document versions are immutable. However, newer versions overwrite older versions at the same path, so overall the document at a certain path is mutable.
Document History
Within a certain path, Earthstar keeps the latest document from each author who has written to that path.
In other words, if 3 authors have ever written to /example/path, Earthstar will remember 3 documents there -- the newest from each author.
When new versions arrive, older versions are forgotten (locally deleted).
The newest version is returned by default; the others are available by querying with includeHistory: true in case you want to do more sophisticated conflict resolution.
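The "one version per author per path" rule can be modelled with nested maps. This is an in-memory sketch of the idea only, not how Earthstar's storage is implemented: `ingest`, `getVersions`, and `getLatest` are hypothetical names, documents are reduced to four fields, and the behaviour on exactly equal timestamps (keep the existing version) is our assumption.

```javascript
// history: Map of path -> Map of author -> newest document from that author.
function ingest(history, doc) {
  if (!history.has(doc.path)) {
    history.set(doc.path, new Map());
  }
  const versions = history.get(doc.path);
  const existing = versions.get(doc.author);
  if (existing === undefined || doc.timestamp > existing.timestamp) {
    versions.set(doc.author, doc); // the older version by this author is forgotten
    return true;
  }
  return false; // incoming document is not newer; ignore it
}

// All remembered versions at a path, newest first.
function getVersions(history, path) {
  const versions = history.get(path);
  if (versions === undefined) {
    return [];
  }
  return [...versions.values()].sort((a, b) => b.timestamp - a.timestamp);
}

// The default result: only the newest version across all authors.
function getLatest(history, path) {
  return getVersions(history, path)[0];
}

// Hypothetical author addresses and timestamps for illustration.
const history = new Map();
ingest(history, { path: "/example/path", author: "@suzy.xxxxx", timestamp: 1500000000000001, content: "a" });
ingest(history, { path: "/example/path", author: "@matt.xxxxx", timestamp: 1500000000000002, content: "b" });
ingest(history, { path: "/example/path", author: "@suzy.xxxxx", timestamp: 1500000000000003, content: "c" });

console.log(getVersions(history, "/example/path").length); // 2
console.log(getLatest(history, "/example/path").content);  // c
```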
Deleting documents
Documents can't be deleted but they can be overwritten by a newer version with content: "". Documents with empty-string content should generally be treated by applications as if they don't exist.
TODO: Storage queries should allow you to filter out empty documents.
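A minimal sketch of how an application might hide such documents itself. `isDeleted` and `visibleDocs` are hypothetical helpers, and the sample documents are made up.

```javascript
// A document with empty-string content is treated as deleted.
function isDeleted(doc) {
  return doc.content === "";
}

// Keep only the documents an application should show.
function visibleDocs(docs) {
  return docs.filter((doc) => !isDeleted(doc));
}

// Hypothetical documents for illustration.
const sampleDocs = [
  { path: "/notes/todo.txt", content: "buy seeds" },
  { path: "/notes/old.txt", content: "" },
];

console.log(visibleDocs(sampleDocs).map((doc) => doc.path)); // [ '/notes/todo.txt' ]
```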
Path ownership
Authors can own certain paths, which means that only they can write there.
If a path contains an author address prefixed with a tilde, only that author can write to that path.
If it contains multiple such tilde-authors, any of them can write.
Paths with no tildes are shared paths with no owners, and anyone in the workspace can write there:
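The three ownership rules can be sketched as a single check. `canWrite` is a hypothetical helper, and the author addresses below (an `@`, a shortname, a dot, and an abbreviated key) are an assumed format used only for illustration. The substring test relies on real author addresses having a fixed length, so that one address cannot be a prefix of another.

```javascript
// Decide whether an author may write to a path.
// - No tilde in the path: shared path, anyone in the workspace can write.
// - Otherwise: the path must contain "~" followed by this author's address.
function canWrite(authorAddress, path) {
  if (!path.includes("~")) {
    return true;
  }
  return path.includes("~" + authorAddress);
}

// Hypothetical addresses and paths.
console.log(canWrite("@suzy.xxxxx", "/wiki/shared/kittens"));             // true
console.log(canWrite("@suzy.xxxxx", "/about/~@suzy.xxxxx/name"));         // true
console.log(canWrite("@matt.xxxxx", "/about/~@suzy.xxxxx/name"));         // false
console.log(canWrite("@matt.xxxxx", "/chat/~@suzy.xxxxx~@matt.xxxxx/1")); // true
```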
Query
You can retrieve documents in several ways:
- Listing all paths, sorted by path
- Getting all documents, sorted by path
- Getting the document at one specific path
- Querying
To query, you supply a query object:
Pub servers
A pub is a server that helps sync workspaces. It holds a copy of the data and sits at a publicly accessible URL, usually on a cloud server.
Pubs have regular HTTP style URLs:
Pubs can be configured to accept any workspace that's pushed to them, or they can have allowlists or blocklists to limit which workspaces they'll host.
A workspace can be hosted by multiple pubs.
Pubs have no authority over users; they just help sync data.
Finding your friends
There is no centralized discovery or friend-finding system.
To join a workspace you need to know:
- The workspace address: +gardening.mVkCjHbAcjEBddaZwxFV
- The workspace private key, if it's an invite-only workspace
- One or more pubs that people in that workspace are using, so you can sync
Users are expected to share their workspace addresses and pubs with each other outside of Earthstar, such as by email or chat.
URLs and URIs
Here's how to combine different kinds of Earthstar addresses. In this documentation, xxxxx is an abbreviation for long keys in base58 format.
TODO: how to link to a specific version of a document?
Classes
This is specific to the Javascript implementation; other libraries might have different internal structures.
Replication / Syncing
These are two words for the same thing - trading data with other peers to bring each other up to date. This can be one-way (push or pull), or two-way.
Incoming and Outgoing Replication Queries
A peer's Incoming Replication Queries specify which data it wants from other peers.
Its Outgoing Replication Queries control which data it will give to other peers.
Both of these are lists of Query objects. Adding more clauses inside a Query object narrows down the results (logical AND). Adding more Query objects to the list broadens the results (logical OR).
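The AND/OR behaviour can be sketched as follows. The clause names `pathPrefix` and `author` are assumptions made for this example and may not match the fields of real Earthstar Query objects; `matchesQuery` and `matchesAnyQuery` are hypothetical helpers.

```javascript
// A document matches one query only if it satisfies every clause present (AND).
// Clause names here are assumed for illustration.
function matchesQuery(doc, query) {
  if (query.pathPrefix !== undefined && !doc.path.startsWith(query.pathPrefix)) {
    return false;
  }
  if (query.author !== undefined && doc.author !== query.author) {
    return false;
  }
  return true;
}

// A document matches a list of queries if it matches at least one of them (OR).
function matchesAnyQuery(doc, queries) {
  return queries.some((query) => matchesQuery(doc, query));
}

// Hypothetical replication queries: Suzy's wiki pages, plus everything under /about/.
const replicationQueries = [
  { pathPrefix: "/wiki/", author: "@suzy.xxxxx" },
  { pathPrefix: "/about/" },
];

const wikiByMatt = { path: "/wiki/kittens", author: "@matt.xxxxx" };
const aboutByMatt = { path: "/about/~@matt.xxxxx/name", author: "@matt.xxxxx" };

console.log(matchesAnyQuery(wikiByMatt, replicationQueries));  // false
console.log(matchesAnyQuery(aboutByMatt, replicationQueries)); // true
```

In this sketch an empty list matches nothing, while a query object with no clauses matches every document.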
Transactions, data integrity
There are no transactions or batch writes. The atomic unit is the document. If you update 2 documents at the same time, it's possible that peers will end up with just one of the updates -- because of an interrupted sync, or because one was filtered out by a replication query.
If certain pieces of state need to always be updated together, you can just design them to be part of the same document. But there's a tradeoff -- larger documents are more likely to have conflicts when multiple people edit at the same time. Smaller documents let people make narrower changes that sync together easily.
Conflict resolution
Earthstar does not have fancy conflict resolution.
Each document has a history of old versions. Earthstar keeps one version from each author. Older versions are forgotten.
When fetching a path, the latest version is returned (by author-asserted timestamp). You can also get all versions if you want to do manual conflict resolution.
Note that this image is simplified. A real document looks approximately like this: