Skip to main content

Automerge 2.2: Rich Text

· 6 min read

We are delighted to announce the release of rich text support in Automerge, including a fully supported ProseMirror binding as the initial reference implementation. This means that you can now build collaborative applications on Automerge with realtime and asynchronous editing of rich text including inline formatting, block elements, and more.

If you want to get started building right away, check out the library here: https://github.com/automerge/automerge-prosemirror

For everyone else, let's start with a demo, before moving on to discuss what rich text is and how Automerge helps you use it.

Demo

On its own, this should seem pretty boring: it's a rich text editor which supports most of the features users typically expect from a rich text editor. What makes this demo interesting is the support for real time collaboration which means that we can manage concurrent changes to complex formatting, like this:

The Automerge-ProseMirror binding is designed to be easy to integrate into any ProseMirror editor you might want to build. To see how it works, refer to the cookbook, but the short story is that it takes just a few lines of code.

Why is rich text a custom datatype in Automerge?

Automerge aims to make the experience of building production-ready collaborative applications as close as possible to the ease and speed of writing a local prototype. This is why the Automerge API focuses on giving you something that feels like just modifying a local Javascript object. Automerge provides a consistent abstraction for your data so that you can focus on your users' needs and not on the finer points of storage and synchronization.

In this context, rich text poses a problem. As we discuss at length in our past paper, Peritext, rich text doesn't map easily on to plain-text or tree structures. Attempting to do so can lead to incorrect behaviour during a merge.

For a real-world example of the kind of data-loss that is difficult to avoid with traditional approaches, here's an example using the yjs prosemirror bindings:

When the edits from the two sides come together, the representation of the data requires the editor to choose between either adding a list item, or converting the list into a paragraph. In this case, the extra list item is lost (though it could have been the opposite.)

This kind of conflict is very rare in online editing scenarios. It only occurs when two users manage to submit conflicting structural edits concurrently. This becomes much more likely during longer sessions of offline collaboration Automerge is designed to support. Our goal is to ensure consistent and correct behaviour under all network and collaboration conditions, so for us this was an important problem.

Our goal has been to provide an implementation of rich-text support which allows both edits to be kept.

How it works

Rather than representing rich text as a tree structure like HTML, we represent it as plain text annotatedwith spans and blocks:

examplesbehaviour
spans<a> <em>overlapping
blocks<p> <li>independent

The difference between the two is that text may appear in many spans, but should only ever be in a single block. A sentence may be bold and italic, but it cannot be simultaneously part of two paragraphs.

Formatting spans, originally described in Peritext are conceptually stored outside the text. A formatting span has a beginning and an end within the text sequence and a flag detailing whether the span should expand when characters are inserted at the boundaries of the span.

Block markers have a type - such as "ordered list item" - and parents - such as "blockquote". The parents represent the hierarchical structure of the document. Block markers are inserted into the sequence of text characters.

These elements map quite closely to user actions whilst editing. Typically a text editor allows you to highlight a sequence of characters and format them - regardless of whether they are in different regions of the document (try highlighting and bolding half of a list item and preceding paragraph in Google Docs for example). On the other hand, inserting a new list item is usually achieved by pressing Enter at the end of the current list item - inserting a block marker; and indenting a list item is done by pressing a button in the toolbar - inserting a new parent into the block parents.

Choosing operations on the underlying data structure which map well to typical actions performed while editing text means we can provide accurate representations of the difference between two versions of the text. Here's the same structure change example in automerge

We plan to write a more detailed description of these algorithms (which were developed in concert with Martin Kleppmann) in a future paper.

How can I try it?

Support for rich text landed in Automerge 2.2 and you can find a writeup of the API here. You can find several examples of how to use the Prosemirror bindings in the Automerge-ProseMirror repository. We've also made a simple starter project to a starter project you can fork. Please feel free to experiment with the playground above. If you find any behaviours that seem surprising, we'd love to hear about it. Whatever you're doing, we hope you'll join us in the Automerge Discord and let us know how you're getting on.

Commercial Support for Automerge

If you're a business building a commercial product on top of Automerge, we recommend becoming a commercial sponsor. Automerge is only available for production use thanks to our supporters and we are highly motivated to ensure their success.

Sponsors of the project receive ongoing support from our team, including architecture review early in a project, advice around scaling or launch issues, and extra visibility and influence into our roadmap. Sponsors also get a private Discord channel for asking questions specific to their project.

Email alex@inkandswitch.com or message us in the Automerge Discord if you'd like to learn more about sponsorship and support options.

Merry Commitmas - a year-end recap of what's new in Automerge

· 7 min read

Since releasing automerge-repo last month, we've been working closely with our users to improve the library based on real-world usage. One in-house project, the Tiny Essay Editor, is a Markdown editor with comment support which was used to write the latest Ink & Switch essay, Embark.

Like many Ink & Switch essays, Embark is a large piece: the final version is over 11,000 words and 60,000 characters. The full edit history is just shy of 200,000 edits. That means the team produced roughly 3x as much text as made it into the final version. The final version of the document with the full history and all the comments included is only 376kb and takes a little under 4s to load from disk, but from then on edits are reasonably snappy: most edits take 25ms (equivalent to 30fps) from keypress to paint on my desktop.

There's still plenty of room to improve here. Ultimately our goal is to reliably achieve single-frame updates even on very large documents and we still have a long way to go on memory usage. Still, we thought folks might enjoy hearing a little behind-the-scenes description of what we've been up to.

But first, a few feature updates:

CodeMirror Integration

Tiny Essay Editor is built around the automerge-codemirror integration and uses incremental updates to make sure it stays fast even on extremely large documents. We've managed to maintain next-frame performance for most edits and document sizes, but on very large documents we still have a few stalls caused by calculating network synchronization messages to work through.

That said, the CodeMirror integration is stable, efficient, and works well with both the marks and cursors APIs. If you need a well-supported plaintext editor (or want a reference to write your own integration for your favorite editor) start here. ProseMirror integration is coming too, more about that after Christmas.

updateText for easy integration

By default, Automerge's text fields update by replacement, much like they would with any web form. If your application submits the full value for a field, Automerge will replace the whole value. Under the hood, Automerge's strings default to being editable, but integrating a full text editor component in your application is a lot of complication for making a simple text field editable.

The reason is that the interface Automerge exposes for modifying text is Automerge.splice, which lets you insert or delete characters at a particular index in the string. Unfortunately, browsers (and most other platforms) don't give you this information very easily; instead they just give you the whole content of the text field and you have to figure out yourself what changed.

Figuring out what changed between two strings is actually quite fiddly. There are algorithms you can study, such as the Myers diff, and libraries that implement them... but we decided that it would be worthwhile to just build one into Automerge and spare you the hassle. We've therefore introduced a function Automerge.updateText, which looks like this:

let doc1 = Automerge.from({ text: "Hello world!" });

let doc2 = Automerge.clone(doc1);
doc2 = Automerge.change(doc2, (d) => {
/// Note we just pass the new value we want the whole field to have
Automerge.updateText(d, ["text"], "Goodbye world!");
});

doc1 = Automerge.change(doc1, (d) => {
Automerge.updateText(d, ["text"], "Hello friends!");
});

/// text is now "Goodbye friends!"
const merged = Automerge.merge(doc1, doc2);

This approach is really handy for places like form fields where a full rich-text editor would be overkill, but isn't as efficient at capturing inputs, particularly for larger documents. Let us know how it works for you!

Surfacing Sync State

As the Embark essay grew ever-larger, the team began to wonder whether they were up-to-date with each other, our storage server, and so on. We worked with them (thanks to Paul Sonnentag) to allow sync state to be forwarded among peers so that you could subscribe to the sync state of other systems. Right now TEE is just using this to confirm when changes are sent to (or received from) our storage server, particularly after editing offline, but the same infrastructure could be used to keep track of which of your devices were up-to-date, whether a collaborator had received your changes, or even to annotate a chat history. We're eager to see how you might use this. (And don't forget you can always send arbitrary messages to other peers with the ephemeral messaging API!)

Loading faster

Finally, let's wrap up with some performance work. As we described above, the Embark essay's Automerge document history got pretty large – roughly 200,000 operations, with around 1000 edit sessions (one per editor tab). This uncovered some performance problems in Automerge: when we started, loading the editor took around 40 seconds!

If you've ever encountered the "edit trace" benchmark, which is widely used to benchmark CRDT performance, this might be confusing. That benchmark is even larger. It contains around 270,000 operations and Automerge can load it in ~200ms. Why were we taking two orders of magnitude longer to load a similar-sized document?

Well, notice that I said loading the editor took 40s. In profiling this problem, we saw that the Tiny Essay Editor (TEE) created an empty Automerge document, and ran the sync protocol with our sync server to fetch the document. The sync protocol didn't send the whole document down the wire in one go – instead it would send the list of changes that the client doesn't have. In the case of the initial load, the client doesn't have any of the changes, and so each change was sent down the wire individually. TEE would then apply each change one after another. Applying invididual changes in this way is much slower than loading the compressed document format (which is produced by Automerge.save).

We solved this in a straightforward way: when a peer requests a document it doesn't have at all, we skip the elaborate sync protocol and simply send the whole compressed document. Future synchronizations are very fast: the peers remember their last sync state and can quickly calculate the comparison

Unfortunately, even loading the compressed version of this document was much slower than we expected: it was taking somewhere around 5s. That's about 5s too long.

Investigating where the time was being spent we spotted a few performance problems, including:

  • Automerge stores operations internally in a B-tree, which has a vector of operations on each node. We were losing some time allocating these little vectors every time we received a new change.
  • Each node in the B-tree has an index on it, where we store things like the number of ops and the number of characters in its subtree. When loading a document we were updating these indexes for every edit in the document's history.

We solved these problems by making several changes:

  1. Rather than storing the operations directly in the B-tree, we now store them in a separate table and just store indexes into this table in the B-tree. This consolidates allocations so we don't spend so much time making tiny allocations.
  2. When loading a document, we wait until we've inserted every op into the B-tree before generating indexes.

Putting this all together, the load time for the Embark essay is reduced to around ~4 seconds on my machine. This is still about 4s too slow but we've managed to shave the first 90% off of the loading time in this application.

That's it!

You can get all this good stuff by updating to Automerge 2.1.10 or later, as well as plenty of smaller improvements (like import/export) and bug fixes (like getting rid of a React hook race condition).

Automerge-Repo: A "batteries-included" toolkit for building local-first applications

· 10 min read

Today we are announcing our new library, automerge-repo, which makes it vastly easier to build local-first applications with Automerge. Take a look at our quickstart guide or read on for some background and examples.

For those new to this idea: local-first applications are a way of building software that allows both real-time collaboration (think Google Docs) and offline working (think Git). They work by storing the user's data locally, on their own device, and syncing it with collaborators in the background. You can read more about the motivation for local-first software in our essay, or watch a talk introducing the idea.

A challenge in local-first software is how to merge edits that were made independently on different devices, and CRDTs were developed to solve this problem. Automerge is a fairly mature CRDT implementation. In fact, we wrote this blog post using it! The API is quite low-level though, and Automerge-Core has no opinion about how networking or storage should be done. Often, the first thing developers ask after discovering Automerge was how to connect it into an actual application.

Our new library, automerge-repo, extends the collaboration engine of Automerge-Core with networking and storage adapters, and provides integrations with React and other UI frameworks. You can get to building your app straight away by taking advantage of default implementations that solve common problems such as how to send binary data over a WebSocket, how often to send synchronization messages, what network format to use, or how to store data in places like the browser's IndexedDB or on the filesystem.

If you've been intimidated by the effort of integrating Automerge into your application because of these choices, this library is for you. Now you can simply create a repo, point it to a sync server, and get to work on your app.

automerge-repo: a simple example

Let's start by taking a look at a simple example of how automerge-repo works. To begin, create and configure a repository for Automerge documents.

const repo = new Repo({
storage: new IndexedDBStorageAdapter("automerge-demo"),
network: [new WebsocketClientNetworkAdapter("wss://sync.automerge.org")]
})

The code in the example above creates a repository and adds a storage and network adapter to it. It tells automerge-repo to store all changes in an IndexedDB table called automerge-demo and to synchronize documents with the WebSocket server at sync.automerge.org. The library is designed to support a wide variety of network transports, and we include a simple client/server WebSocket adapter out of the box. Members of the community are already adding support for other transports, such as WebRTC.

In this example we're connecting to the public test server hosted by the Automerge team, but you can also run your own sync server. In fact, our sync server runs almost the same code as above, but with a different network and storage adapter.

note

The Automerge project provides a public sync server for you to experiment with sync.automerge.org. This is not a private instance, and as an experimental service has no reliability or data safety guarantees. Basically, it's good for demos and prototyping, but run your own sync server for production uses.

Next, create a document and make some changes to it:

   > const handle = repo.create()
> handle.change(doc => { doc.hello = "World." })
> console.log(handle.url)
automerge:2j9knpCseyhnK8izDmLpGP5WMdZQ

The code logs a URL to the document you created. On another computer, or in another browser, you could load this document using the same URL, as shown below:

   > const handle = repo.find("automerge:2j9knpCseyhnK8izDmLpGP5WMdZQ")
> console.log(await handle.doc())
// why don't you try it and find out?

What's happening here to make all this work? automerge-repo wraps the core Automerge library and handles all the work of moving the bytes around to make your application function.

Key Concepts & Basic Usage

Let's go into a bit more detail. For full documentation please see the docs.

Repo

Create a repo by initializing it with an optional storage plugin and any number of network adapters. These are the options for initializing a repo:

export interface RepoConfig {
// A unique identifier for this peer, the default is a random id
peerId?: PeerId
// Something which knows how to store and retrieve binary blobs
storage?: StorageAdapter
// Something which knows how to send and receive sync messages
network: NetworkAdapter[]
// A function which determines whether to share a document with a peer
sharePolicy?: SharePolicy
}

Don't let the usage of "peer" confuse you into thinking this is limited to peer to peer connectivity, automerge-repo works with both client-server and peer-to-peer network transports.

The main methods on Repo are find(url) and create(), both of which return a DocHandle you can work with.

Handle & Automerge URLs

A DocHandle is a reference to an Automerge document that a Repo syncs and stores . The Repo instance saves any changes you make to the document and syncs with connected peers. Likewise, you can listen over the network to a Repo for any changes it received.

Each DocHandle has a .url property. This is a string which uniquely identifies a document in the form automerge:<base58 encoded bytes>. Once you have a URL you can use it to request the document from other peers.

DocHandle.doc() and DocHandle.docSync()

These two methods return the current state of the document. doc() is an asynchronous method that resolves when a repository loads the document from storage or retrieves it from a peer (whichever happens first), and docSync() is a synchronous method that assumes the document is already available. The examples below illustrate asynchronously loading a document or synchronously loading a document and then interacting with it:

> const handle = repo.find("automerge:2j9knpCseyhnK8izDmLpGP5WMdZQ")
> const doc = await handle.doc()
> console.log(doc)

Or

> const handle = repo.find("automerge:2j9knpCseyhnK8izDmLpGP5WMdZQ")
> handle.whenReady().then(() => {
console.log(handle.docSync())
})

In this latter example we use DocHandle.whenReady, which returns a promise that the repository resolves when it loads a document from storage or fetches it from another peer in the network.

change() and on("change")

Use DocHandle.change when you modify a document.

> const handle = repo.find("automerge:2j9knpCseyhnK8izDmLpGP5WMdZQ")
> await handle.doc()
> handle.change(d => d.foo = "bar")

The Repo calls DocHandle.on("change") whenever the document is modified – either due to a local change or a sync message being received from another peer.

> const handle = repo.find("automerge:4CkUej7mAYnaFMfVnffDipc4Mtvn")
> await handle.doc()
> handle.on("change", ({doc}) => {
console.log("document changed")
console.log("New content: ", doc)
})

Integrations

automerge-repo provides a set of primitives that you can use to build a wide range of applications. To make this easier, we have built integrations with a few common UI frameworks. You can easily add further integrations and we welcome contributions which integrate with popular frameworks!

React Integration

@automerge/automerge-repo-react-hooks makes it easy to use automerge-repo in a React application. Once you've constructed a Repo you can make it available to your React application using RepoContext. Once available, call useHandle to obtain a DocHandle:

function TodoList(listUrl: AutomergeUrl) {
const handle = useHandle(listUrl)
// render the todolist
}

Note that when Repo receives changes over the network or registers local changes, the original Automerge document remains immutable, and any modified parts of the document get new objects. This means that you can continue to use reference equality checks you're used to for in-memory data, in places like React.memo or useMemo.

Svelte Integration

@automerge/automerge-repo-svelte-store provides setContextRepo to set the Repo which is used by the document store:

<script lang="ts">
import { document } from "@automerge/automerge-repo-svelte-store"
import { type AutomergeUrl } from "@automerge/automerge-repo"

export let documentUrl: AutomergeUrl

// Doc is an automerge store with a `change` method which accepts
// a standard automerge change function
const doc = document<HasCount>(documentUrl)
const increment = () => {
doc.change((d: HasCount) => (d.count = (d.count || 0) + 1))
}
</script>

<button on:click={increment}>
count is {$doc?.count || 0}
</button>

What about <X>?

We'd love to help you make automerge work in your favorite development environment! Please reach out to us on GitHub or via our Slack.

Extending automerge-repo

You can extend automerge-repo by writing new storage or network adapters.

Storage Adapters

A storage adapter represents some kind of backend that stores the data in a repo. Storage adapters can be implemented for any key/value store that allows you to query a range of keys with a given prefix. There is no concurrency control required (that's implemented in automerge-repo) so you can safely have multiple repos pointing at the same storage. For example, you could implement an adapter on top of Redis.

The automerge-repo library provides storage adapters for IndexedDB and the file system (on Node).

Network Adapters

A network adapter represents a way of connecting to other peers. Network adapters raise events when a new peer is discovered or when a message is recieved, and implement a send method for transmitting messages to another peer. automerge-repo assumes a reliable, in-order transport for each peer; as long as you can provide this (e.g. using a TCP connection), you can implement an adapter. You could implement an adapter for BLE, for example.

The automerge-repo library provides network adapters for WebSocket, MessageChannel, and BroadcastChannel.

Other languages/platforms

This release of automerge-repo is just for javascript. Automerge is a multi-language library though and there are efforts under way to implement automerge-repo on other platforms. The most mature of these is automerge-repo-rs. We welcome contributions and please reach out if you're starting to develop automerge-repo for a new platform.

Beta Quality

automerge-repo works pretty well – we're using it at Ink & Switch for a bunch of internal projects. The basic shape of the API is simple and useful, and not having to think about the plumbing makes it much, much faster to get a useful application off the ground. However, there are some performance problems we're working on:

  1. Documents with large histories (e.g. a collaboratively edited document with >60,000 edits) can be slow to sync.
  2. The sync protocol currently requires that a document it is syncing be loaded into memory. This means that a sync server can struggle to handle a lot of traffic on large documents.

These two points mean that we're not ready to say this project is ready for production.

We're working hard on fixing the performance so that we can say this is ready for production. But if you are interested in experimenting with the library now, or if you are only going to be working with relatively small documents or low traffic sync servers then you are good to go!

(If you want us to get to production faster, or you have some specific requirements, please consider sponsoring Automerge development 🙂)

Finally, we don't want to give the impression that everything is smooth sailing. automerge-repo solves a bunch of the hard problems people were encountering around networking and storage. There are still plenty of other difficult problems in local first software where we don't have turnkey solutions: authentication and authorization, end-to-end encryption, schema changes, version control workflows etc. automerge-repo makes many things much easier, but it's a frontier out here.

Automerge 2.0

· 12 min read
Contributor

Automerge 2.0 is here and ready for production. It’s our first supported release resulting from a ground-up rewrite. The result is a production-ready CRDT with huge improvements in performance and reliability. It's available in both JavaScript and Rust, and includes TypeScript types and C bindings for use in other ecosystems. Even better, Automerge 2.0 comes with improved documentation and, for the first time, support options for production users.

Automerge, CRDTs, and Local-first Software

Before getting into the details of why we're excited about Automerge 2.0, let's take a bit of time to explain what Automerge is for anyone unfamiliar with the project.

Automerge is a CRDT, or "conflict-free replicated data type", but if you're allergic to buzzwords you can just think of it as a version controlled data structure. Automerge lets you record changes made to data and then replay them in other places, reliably producing the same result in each. It supports JSON-like data, including arbitrarily nested maps and arrays, as well as some more advanced data types such as text and numeric counters.

This is useful for quite a few reasons: you can use it to implement real-time collaboration for an application without having to figure out tricky application-specific algorithms on the server. You can also use it to better support offline work. We think it has even more potential than just that.

Since the rise of the cloud, developers have largely had to choose between building cloud software or traditional installed software. Although installed software has some reliability and performance benefits, cloud software has dominated the market. Cloud software makes sharing data between users easy and includes ubiquitous access from any computing device. Unfortunately, the advantages of cloud software come at a high price. Cloud software is fragile and prone to outages, rarely supports offline use, and is expensive to scale to large audiences.

At Ink & Switch, we’ve been researching a model for developing software which we call local-first software, with the goal of combining the best of both worlds: reliable, locally-executed software paired with scalable offline-friendly collaboration infrastructure. We believe that a strong data model based on recording change over time for every application should be a cornerstone of that effort.

Automerge-RS: Rebuilt for Performance & Portability

Earlier versions of Automerge were implemented in pure JavaScript. Our initial implementations were theoretically sound but much too slow and used too much memory for most production use cases.

Furthermore, JavaScript support on mobile devices and embedded systems is limited. We wanted a fast and efficient version of Automerge that was available everywhere: in the browser, on any mobile device, and even microcontrollers like the ESP32.

Instead of trying to coordinate multiple distinct versions of Automerge, we decided to rewrite Automerge in Rust and use platform-specific wrappers to make it available in each language ecosystem. This way we can be confident that the core CRDT logic is identical across all platforms and that everyone benefits from new features and optimizations together.

For JavaScript applications, this means compiling the Rust to WebAssembly and providing a JavaScript wrapper that maintains the existing Automerge API. Rust applications can obviously use the library directly, and we're making sure that it's as easy as possible to implement support in other languages with well-designed traits and a comprehensive set of C bindings.

To deliver this new version, lab members Alex Good and Orion Henry teamed up with open source collaborators including Andrew Jeffery and Jason Kankiewicz to polish and optimize the Rust implementation and JavaScript wrapper. The result is a codebase that is hundreds of times faster than past releases, radically more memory efficient, better tested, and more reliable.

Documenting Automerge

With Automerge 2.0 we've made a big investment in improving documentation. In addition to sample code, we now have a quick-start guide that supports both Vite and create-react-app, as well as internals documentation, file format and sync protocol documentation. This work was led by lab alumnus Rae McKelvey and we hope it helps make getting started with Automerge much easier. Please let us know if there are other topics or areas you'd like to see covered!

Supporting Automerge

Those who have been following Automerge for a while may have noticed that we describe Automerge 2.0 as our first supported release. That’s because as part of the Automerge 2.0 release we’ve brought Alex Good onto the team full-time to provide support to external users, handle documentation, release management, and—of course—to continue implementing new Automerge features for the community.

This is a big moment for Ink & Switch and the Automerge project: we’re now able to provide support to our users thanks to sponsorship from enterprises like Fly.io, Prisma, and Bowtie as well as so many others who have contributed either directly to Automerge or through supporting Martin Kleppmann on Patreon.

If your business is interested in sponsoring Automerge, you can sponsor us directly, or get in touch with us for more information or other sponsorship methods. Every little bit helps, and the more sponsors we have, the more work we can do while still remaining an independent open source project.

At Bowtie we support Automerge because it's the best way to achieve the resilliency properties that we're delivering to globally distributed private networks. It's clear to me that our sponsorship has furthered our software, and that this crew are among the best distributed-systems thinkers in the business. -- Issac Kelly, CTO, Bowtie.

Performance: Speed, Memory and Disk

Using a CRDT inherently comes with overhead: we have to track additional information in order to be able to correctly merge work from different sources. The goal of all CRDT authors is to find the right trade-offs between preserving useful history, reducing CPU overhead, and efficiently storing data in memory and on disk.

With the Automerge project, our goal is to retain the full history of any document and allow an author to reconstruct any point in time on demand. As software developers we're accustomed to having this power: it's hard to imagine version control without history.

With Automerge 2.0, we've brought together an efficient binary data format with fast updates, save, and load performance. Without getting too into the details, we accomplish this by packing data efficiently in memory, ensuring that related data is stored close together for quick retrieval.

Let's take a look at some numbers. One of the most challenging benchmarks for CRDTs is realtime text collaboration. That's because a long editing session can result in hundreds of thousands of individual keystrokes to record and synchronize. Martin Kleppmann recorded the keystrokes that went into writing an academic paper and replaying that data has become a popular benchmark for CRDTs.

Insert ~260k operationsTiming (ms)Memory (bytes)
Automerge 0.14~500,000~1,100,000,000
Automerge 1.0.113,052184,721,408
Automerge 2.0.11,81644,523,520
Yjs1,07410,141,696
Automerge 2.0.2-unstable66122,953,984

Of course, even the most productive authors struggle to type an entire paper quite so quickly. Indeed, writing a paper can occur over months or even years, making both storage size on disk and load performance important as well.

Size on Diskbytes
plain text107,121
automerge 2.0129,062
automerge 0.14146,406,415

The binary format works wonders in this example, encoding a full history for the document with only 30% overhead. That's less than one additional byte per character! The naive JSON encoding often used circa automerge 0.14 could exceed 1,300 bytes per character. If you'd like to learn more about the file format, we have a specification document.

Load ~260k operationsTiming (ms)
Automerge 1.0.1590
Automerge 2.0.1593
Automerge 2.0.2-unstable438

Loading the compressed document is fast as well, ensuring the best possible start-up time.

While we are proud of these results, we will continue to invest in improved performance with each release as you can see with the preliminary numbers for the upcoming Automerge 2.0.2 release.

A few notes about methodology before we move on. The particular implementation we used to run the benchmarks can be found here. These numbers were produced on Ryzen 9 7900X. The "timing" column is how long it takes to apply every single edit in the trace, whilst the "memory" common is the peak memory usage during this process.

The improvements found in "2.0.2-unstable" mostly result from an upcoming improved API for text. Also note that the "automerge 1.0.1" here is actually the automerge@1.0.1-preview-7 release. Automerge 1.0.1 was a significant rewrite from 0.14 and has a similar architecture to the Rust implementation. Improvements between 1.0.1 and 2.0.1 are a result of both optimization and adopting WebAssembly rather than an architectural change.

Portability & Mobile Devices

Because the core logic of Automerge is now built in Rust, we're able to port it more easily to a wide variety of environments and bind it to almost any language. We have users today who directly build on Automerge using the Rust APIs (and the helpful autosurgeon library). We also have a C-bindings API designed and contributed by Jason Kankiewicz, and are excited to see the automerge-go implementation underway by Conrad Irwin.

In the future, we hope to provide bindings for other languages including Swift, Kotlin, and Python. If you're interested in getting involved in those projects please let us know!

One important note is that React-Native does not support WASM today. Developers building mobile applications will need to bind directly via C. If you're interested in either working on or sponsoring work on this problem, feel free to get in touch.

What’s Next

With the release of Automerge 2.0 out the door, we will of course be listening closely to the community about their experience with the release, but in the months ahead, we expect to work on at least some of the following features:

Native Rich Text Support

As with most CRDTs, Automerge originally focused on optimizing editing of plaintext. In the Peritext paper by Ink & Switch we discuss an algorithm for supporting rich text with good merging accuracy, and we are planning to integrate this algorithm into Automerge. Support for rich text will also make it easier to implement features like comments or cursor and selection sharing.

Automerge-Repo

We’ve worked hard to keep Automerge platform-agnostic and support a wide variety of deployment environments. We don’t require a particular network stack or storage system, and Automerge has been used successfully in, client-server web applications, peer-to-peer desktop software, and as a data synchronization engine for cloud services. Unfortunately, excluding network and storage from the library has left a lot of the busy-work up to application developers, and asked them to learn a lot about distributed systems just to get started.

Our new library, Automerge-Repo, is a modular batteries-included approach to building web applications with Automerge. It works both in the browser (desktop and mobile) and in Node, and supports a variety of networking and storage adapters. There are even text editor bindings for Quill and Prosemirror as well as React Hooks to make it easy to get started quickly.

It's under active development, and available in beta right now. We'll talk more about it when we announce GA, but if you're starting a browser-based application now, it's probably the right place to start.

Rust Developer Experience Improvements

We've seen tremendous enthusiasm for the native Rust experience of Automerge, and the current Rust API is powerful and fast. Unfortunately, it's also low-level and can be difficult to work with directly. To make building Rust applications against automerge easier, Alex built Autosurgeon, a library that helps bind Rust data structures to Automerge documents, and we'll continue to listen to our Rust users and improve on that experience.

Improved Synchronization

Automerge's current synchronization system has some great properties. In many cases it can bring two clients up to date with only a single round-trip each direction. That said, we see big potential to improve the CPU performance of this process, and also lots of opportunity to improve sync performance of many documents at once. We also expect to provide other optimizations our users and sponsors have requested, such as more efficient first-document loading, network compaction of related changes, and enabling something akin to a Git “shallow clone” for clients which don't need historical data.

Built-in Branches

While we retain the full history of Automerge documents and provide APIs to access it, we don’t currently provide an efficient way to reconcile many closely related versions of a given document. This feature is particularly valuable for supporting offline collaboration in professional environments and (combined with Rich Text Support) should make it much easier for our friends in journalism organizations to build powerful and accurate editing tools.

History Management

Today the best way to remove something from an Automerge document's history is to recreate the document from scratch or to reset to a time before that change went in. In the future, we plan to provide additional tools to give developers more control over document history. We expect this to include the ability to share just the latest version of a document (similar to a shallow clone in git), and to share updates that bypass changes you don't want to share (as when a developer squashes commits before publishing).

Conclusion

Automerge 2.0 is here, it’s ready for you, and we’re tremendously excited to share it with you. We’ve made Automerge faster, more memory efficient, and we’re bringing it to more platforms than ever. We’re adding features, making it easier to adopt, and have begun growing a team to support it. There has never been a better moment to start building local-first software: why not give it a try, and please feel welcome to join us in the Automerge Slack, too.

caution

A note to existing users: Automerge 2.0 is found on npm at @automerge/automerge. We have deprecated the automerge package.

Welcome

· One min read
Rae McKelvey
Contributor

You've reached the Automerge docs! We're so happy to have you.

We're using Docusaurus. Please help edit the docs on GitHub.