🚧 This documentation is for the developer preview of m-ld.


Architecture

m-ld is a decentralised live data sharing component with a JSON-based API.

m-ld is used by including it in a software application (an 'app'). The app gains the power to share live data with other copies of itself, without any need to store or coordinate the data centrally in a database, or otherwise cope with the hard problem of keeping distributed data up to date. This is a bit like Google Docs, but for any structured data specific to the app: designs, bookings, observations, plans or anything else; and without the need to entrust the data to a third party.

The set of data being shared is called a 'domain' (note the definitions of terms in the side-bar), for example one design, booking or plan. The data in the domain is copied or 'replicated' to wherever the app is, and after that, all the copies are kept synchronised with each other by m-ld.

Each physical copy is called a 'clone'. In the picture, the "browser", "microservice" and Australia are just possible environments for a clone – and there could be any number of clones, from a handful to hundreds. In reality, most of the time, most of the environments will be of the same type, like on a mobile device, but there is no fundamental need for this. A clone can be deployed with its host app on any platform that has a network connection and for which a clone 'engine' exists.

[Figure: clone environments]

All clones of the data can accept reads and writes without waiting for other clones to agree (consensus- and lock-free). Atomic read and write transactions are done locally using a JSON API. Communication between clones is via a pluggable messaging layer, with implementations provided.

The data may at any moment differ slightly between clones but, in the absence of any further writes and with a live connection, all clones will 'converge' on an identical set of data, as fast as the network can carry the latest changes.

The 'network' is the medium through which messages travel from one clone to another. Clones treat the messaging network as a resource. The core of m-ld needs to be able to publish messages to all online clones, and sometimes to address one other clone directly. This can be done using a number of possible technologies, and it's possible to add new ones by creating an adapter. At the moment, m-ld has adapters for MQTT and Ably. See the Messaging section below for more details.

To guarantee the data is safe from accidental loss, at least one clone in a domain should use reliable storage, or else enough clones must exist so that statistically, there is always someone with the data. Apps can choose to use as many domains as they need, to partition all the data that they want to share.

To keep the data safe from attackers, the app needs to ensure that the network and local storage (if used) can only be accessed by legitimate users. It does this by securing these resources before giving them to the local m-ld clone for its use. More details on this and other aspects of security can be found below.

m-ld can be used in many different system architectures, including fully decentralised systems having no back-end servers, local-first apps supporting offline use, and enterprise and cloud architectures having back-end databases or other centralised storage. For a summary of the motivations behind m-ld, read our Manifesto for Data.

🚧 Coming soon: example deployment diagrams for a selection of architectures. Let us know what your app looks like.

View the implementation-independent technical specification of m-ld on the specification portal.

Use-Cases

Here are just some of the uses for m-ld. If you have a use-case, please do let us know.

Apps

App data rarely exists in isolation on a single device, but in-house collaboration solutions are expensive and take years to get right. m-ld can be a solution to data sharing in collaborative apps. It can also be incorporated into architectures with existing database components, to support live data editing features.

Robotics

Robotics and AI are enabling labour-saving opportunities in industry. Fully centralised control of such devices can be impractical as decisions must be made based on data with a high degree of liveness. m-ld can be used to support a 'shared-consciousness' model of derived knowledge, in which autonomous units are able to keep each other instantly apprised of changes in their environment and to their common objectives.

Edge Computing

An approach to managing the deluge of data from traditional and IoT devices is to localise data processing to the 'edge', a network of compute nodes that are outside of dedicated data centres on the cloud. m-ld's decentralised approach is ideal for continuous maintenance of data state on unreliable or temporary compute nodes, while guaranteeing data consistency.

Personal Data

Consumer concerns about personal data include increasing awareness of data ownership. m-ld helps consumers to keep control over their data. This is because apps that use m-ld can provide useful features without having to always store the data in a single, central location controlled only by the app provider. Instead, they can allow the user to choose where data is stored; and can easily move it from place to place on their request.

Decentralisation

m-ld clones are low-cost replicas and can be created and destroyed with impunity. They are "cattle, not pets". This is unlike the traditional database approach of carefully looking after individual database instances.

Decentralisation in m-ld is technically enabled by the low cost of clone replication. The natural deployment pattern is to keep the clone engine close to the app, usually in-process, and so there is no need for a central database. In many deployments it may be desirable to have some clones on servers, to ensure data safety – but these are more like backups.

Resilience

A domain with sufficient clones is resilient to infrastructure failures. In this respect and others, clones are similar to microservices.

The permanent loss of an individual clone is typically inconsequential to the data safety of the domain. Updates are published and received by clones on a continuous basis, interruptible only by network unavailability.

Authority

In the absence of necessary centralisation, an app is at liberty (and has a responsibility) to decide an authority model that works best for the domain. m-ld is generally agnostic to who owns data and decides data correctness. However, it does collaborate with the app to secure the data against unauthorised access.

Realisation

Decentralisation is realised by a foundational protocol for data sharing which is:

  • convergent: all clones will eventually have the same data.
  • automatic: the app never needs to request or react to synchronisation with the domain (it does need to react to individual data changes, but this is irrespective of where they arise).
  • efficient: a new clone or a re-starting clone is able to quickly rev-up to equivalence with other live clones. Furthermore, it can accept updates before this process completes.

More detail on data sharing is found in the concurrency section.

Structured Data

In principle, a m-ld domain contains structured data made up of linked 'subjects' having properties of interest.

This model is fundamentally a graph of data, and is represented in the app programming interface (API) as JSON.

m-ld is schema-less in the sense that schema is not a first-class citizen (as it is in a relational model, for example). However, any schema or ontology can be embedded in the data, and enforced by the app.

These principles are realised by the internal adoption of the Resource Description Framework (RDF) as a data representation. Because of the consistent JSON syntax and the use of some sane defaults, an app does not typically need to be aware of the RDF; but it helps with interoperability, and ensures that future developments (for example, to supported data types) do not break compatibility.

These choices achieve an optimum of flexibility and familiarity for a broad selection of use-cases, while supporting the convergence guarantees.
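The graph model described in this section can be pictured with a small sketch. It is illustrative only: the `Subject` and `Triple` types and the `toTriples` function are hypothetical names, and the use of an `@id` key to identify and link subjects is an assumption borrowed from the JSON-LD convention, not a statement of the exact JSON accepted by a clone API. It shows how linked JSON subjects can be read as RDF-style (subject, property, value) statements.

```typescript
// Hypothetical shapes for illustration; not the m-ld API.
type Reference = { '@id': string };
type Value = string | number | boolean | Reference;

interface Subject {
  '@id': string;
  [property: string]: Value | Value[];
}

interface Triple {
  subject: string;
  property: string;
  value: Value; // a Reference value is a link to another subject
}

// Flatten a list of JSON subjects into one statement per property value.
function toTriples(subjects: Subject[]): Triple[] {
  const triples: Triple[] = [];
  for (const subject of subjects) {
    for (const property of Object.keys(subject)) {
      if (property === '@id') continue; // the identity is not itself a statement
      const raw = subject[property];
      if (raw === undefined) continue;
      const values = Array.isArray(raw) ? raw : [raw];
      for (const value of values) {
        triples.push({ subject: subject['@id'], property, value });
      }
    }
  }
  return triples;
}

// A booking that links to a guest; both are subjects in the same graph.
const domain: Subject[] = [
  { '@id': 'booking1', guest: { '@id': 'fred' }, nights: 2 },
  { '@id': 'fred', name: 'Fred', phone: ['555-0100', '555-0101'] },
];
const triples = toTriples(domain);
```

Note that a property with several values simply yields several statements, which is one reason a graph representation sits comfortably with concurrent edits.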

Realtime

In principle, a m-ld clone contains realtime domain data. Publication and receipt of updates is continuous and automatic. A clone may be offline, temporarily or for an extended period, but this is not logically distinct from extraordinary network latency.

Another way to look at this is to say that the domain is ephemeral and only exists as some hypothetical convergent state of all the clones. A clone can only ever be "up-to-date" with respect to another clone.

This means that an app does not in principle need to behave differently based on the currency of the domain data presented – for example, once initialised, a clone can always accept data writes.

In practice, the clone API does support an indication of status, including online/offline, to allow for special behaviours such as user notifications.

Concurrency

Any clone operation may be occurring concurrently with operations on other clones. Transaction operations combine to realise the final convergent state of every clone. This is unlike many existing data management technologies, which require serialisable transactions and so, frequently, various locking strategies to prevent concurrent edits.

A helpful way to think about this is to consider any gathering of human beings with an information goal, like a conference. At any moment, each participant could be receiving information or formulating information, or frequently, both. There could be multiple channels along which information flows. But ideally, everyone will eventually converge on the same set of information. m-ld applies this model to the information stored in clones – with the enhancement that the eventual convergence is guaranteed.

In this model, it is necessary to make a distinction between data consistency and integrity.

"Consistency" is meant slightly more broadly than the 'C' in the CAP theorem conceived by Eric Brewer in 2000.

m-ld guarantees eventual consistency. In the absence of updates and network partitions, all clones will report the same data (they will converge).

Another viewpoint is to imagine a hypothetical single data store for the domain. If no new updates are made, eventually all clones' responses to any query would be indistinguishable from the responses from this store.
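To make the idea of convergence concrete, here is a minimal sketch under stated assumptions. It is not the m-ld protocol: it uses a last-writer-wins rule as a deliberately simple stand-in, and `ToyClone`, `Update` and `wins` are hypothetical names. What it demonstrates is only the property described above: two clones that receive the same updates in different orders end up reporting the same data.

```typescript
// A stand-in convergence rule (last-writer-wins), chosen for brevity.
// This is NOT how m-ld resolves concurrent updates.
interface Update {
  key: string;     // e.g. "subject/property"
  value: string;
  tick: number;    // logical time at the originating clone
  cloneId: string; // tie-breaker when ticks are equal
}

// Total order on updates: higher tick wins, then higher clone id.
function wins(a: Update, b: Update): boolean {
  return a.tick !== b.tick ? a.tick > b.tick : a.cloneId > b.cloneId;
}

class ToyClone {
  private readonly state = new Map<string, Update>();

  // Applying is order-independent because only the winning update is kept.
  apply(update: Update): void {
    const current = this.state.get(update.key);
    if (current === undefined || wins(update, current)) {
      this.state.set(update.key, update);
    }
  }

  // A canonical rendering of the clone's data, for comparison.
  snapshot(): string {
    return Array.from(this.state.values())
      .map(u => `${u.key}=${u.value}`)
      .sort()
      .join(';');
  }
}

const updates: Update[] = [
  { key: 'fred/name', value: 'Fred', tick: 1, cloneId: 'a' },
  { key: 'fred/name', value: 'Frederick', tick: 1, cloneId: 'b' },
  { key: 'fred/height', value: '6', tick: 2, cloneId: 'a' },
];

const cloneA = new ToyClone();
const cloneB = new ToyClone();
updates.forEach(u => cloneA.apply(u));                   // delivery order 1
updates.slice().reverse().forEach(u => cloneB.apply(u)); // delivery order 2

const converged = cloneA.snapshot() === cloneB.snapshot();
```

Between deliveries the two snapshots can differ, which corresponds to the clones differing "at any moment"; they agree once both have seen every update.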

"Integrity" is meant as adherence of the domain data to a set of semantic rules like:

  • This property has one and only one value
  • This property is of a specific data type
  • This property refers to some other entity which exists

In a programming language, you might find these rules expressed in the type system. In a relational database, the rules are called "constraints".
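As an illustration of the three example rules above, the following sketch expresses each one as a check that reports violations. All names here (`Subjects`, `Rule`, `exactlyOne`, `hasType`, `refersToExisting`, `validate`) are hypothetical and the data shape is an assumption made for the example; this is app-side code, not a m-ld feature.

```typescript
type Scalar = string | number | boolean;
// Assumed shape: subject id -> property -> list of values.
type Subjects = Record<string, Record<string, Scalar[]>>;
// A rule returns a description of each violation it finds.
type Rule = (subjects: Subjects) => string[];

function forEachValueList(
  subjects: Subjects,
  property: string,
  visit: (id: string, values: Scalar[]) => void
): void {
  for (const id of Object.keys(subjects)) {
    const subject = subjects[id];
    const values = subject === undefined ? undefined : subject[property];
    if (values !== undefined) visit(id, values);
  }
}

// "This property has one and only one value" (for subjects that have it)
const exactlyOne = (property: string): Rule => subjects => {
  const violations: string[] = [];
  forEachValueList(subjects, property, (id, values) => {
    if (values.length !== 1) violations.push(`${id}.${property} has ${values.length} values`);
  });
  return violations;
};

// "This property is of a specific data type"
const hasType = (property: string, type: 'string' | 'number' | 'boolean'): Rule => subjects => {
  const violations: string[] = [];
  forEachValueList(subjects, property, (id, values) => {
    values.forEach(v => {
      if (typeof v !== type) violations.push(`${id}.${property} is not a ${type}`);
    });
  });
  return violations;
};

// "This property refers to some other entity which exists"
const refersToExisting = (property: string): Rule => subjects => {
  const violations: string[] = [];
  forEachValueList(subjects, property, (id, values) => {
    values.forEach(v => {
      if (typeof v !== 'string' || subjects[v] === undefined) {
        violations.push(`${id}.${property} refers to missing subject ${v}`);
      }
    });
  });
  return violations;
};

function validate(subjects: Subjects, rules: Rule[]): string[] {
  return rules.reduce<string[]>((all, rule) => all.concat(rule(subjects)), []);
}

const bookings: Subjects = {
  booking1: { guest: ['fred'], nights: [2, 3] }, // two values, e.g. after concurrent edits
  booking2: { guest: ['wilma'], nights: ['two'] },
  fred: { name: ['Fred'] },
};
const violations = validate(bookings, [
  exactlyOne('nights'),
  hasType('nights', 'number'),
  refersToExisting('guest'),
]);
```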

In m-ld, integrity is a collaboration between the domain, engine, app and user. m-ld supports constraints (see below), but first it is important to consider how integrity of this kind applies to a collaborative system and to your use-case.

App Integrity

Like most apps, those using m-ld will validate inputs. The validation rules are in the app's code, and are the app's responsibility. Most of the time, these rules will capture constraint violations before they reach the data domain.

However, sometimes a rule will be violated not by the immediate operation, but by an operation done on a different clone, which 'disagrees' in some way according to the natural semantics of the domain. This is a "conflict".

The likelihood of conflicts depends greatly on the domain. For example, an application that records some observed facts about the world may very rarely give rise to conflicts.

Unlike some client-server systems, a m-ld app is constantly notified of updates to the domain data. This gives the app the opportunity to handle conflicts when they arise. Strategies that could be applied by the app fall into the following categories:

  • Temporarily Ignore (recommended). In (surprisingly) many cases, human users will naturally correct conflicting values without significant impediment to their workflow. This depends on the user interface presenting the conflict; again, this may be easier than it sounds, for example by concatenating conflicting strings with a line break in the user interface. Ignored rule violations that are not immediately noticed by the user can be captured later, for example in the next user workflow step, or even by a housekeeping process which applies a procedural fix.
  • Procedure. A conflict may manifest as a nonsensical or ambiguous state for which it is possible to automatically make a correction or decide an outcome. The correction can be implemented in the app code. However, since each app instance may at any time 'see' a different state, these corrections can potentially compete with each other, so care must be taken not to create a cascade of competing updates.
  • Consensus. A conflict resolution may require out-of-band collaboration between instances of the app. Human to human interaction could be considered in this category, but other cases may require an automatic consensus protocol. Such a protocol is out of the scope of m-ld, but its result can be applied to the domain via any clone using a normal transaction.
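The first two strategies can be sketched in a few lines of app code. This is a hedged illustration with hypothetical function names (`presentValues`, `resolveSingleValue`), assuming that a conflict shows up to the app as a property holding more than one string value.

```typescript
// "Temporarily Ignore": show every conflicting value, one per line,
// and leave it to the user to tidy up.
function presentValues(values: string[]): string {
  return values.join('\n');
}

// "Procedure": pick an outcome automatically. The choice depends only on
// the set of values, not on arrival order or local time, so every app
// instance that sees the same values makes the same correction. This is
// one way to avoid a cascade of competing updates.
function resolveSingleValue(values: string[]): string | undefined {
  if (values.length === 0) return undefined;
  return values.slice().sort()[0];
}

const conflictingNames = ['Frederick', 'Fred'];
const shownToUser = presentValues(conflictingNames);
const corrected = resolveSingleValue(conflictingNames);
```

Choosing the lexically smallest value is arbitrary; any rule works as long as it is a pure function of the conflicting values, so that instances applying it independently agree.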

Constraints

The m-ld specification defines a set of declarative constraints that can be applied to a domain. Unlike app-based integrity rules, these do not require that the app recognise and handle rule violations – the app is able to rely on the data it perceives always being compliant with the rule.

🚧 Inclusion of declarative integrity constraints in m-ld is an experimental feature, and the subject of active research. The available constraints and the means by which they are declared for a domain is likely to change. Please do contact us with your requirements.

Multi-Platform

The m-ld 'protocol' is specified without reference to a platform. This protocol comprises the scheme of messages and signals by which m-ld clones talk to each other, how apps talk to their clones, and the logical graph representation.

This allows a clone engine to be implemented on any compute platform, as required by use-cases. See the available Platforms.

Security

The m-ld protocol is designed to ensure that an app using a clone engine can be secured against threats to the shared data. This is necessarily a collaboration with the app implementation, since the app must be allowed privileged access to the data in order to function.

Threats

The threats that must be controlled in collaboration with a clone engine are a subset of the threats to the app. These include threats to:

  • Confidentiality: reading of data to steal information
  • Integrity: modification of data to mislead information consumers
  • Availability: prevention of normal function for sabotage or coercion

The attack surface of an engine generally comprises:

  • The local device storage being used by the engine
  • The network between the engine and other clones
  • The clone API presented to the app

Trust Model

A decentralised data store does not have a privileged, trusted central authority (although the trust model of such a central authority may never have been as straightforward as it seemed, as evidenced by damaging data leakages by such authorities). Since an app must be free to decide its own security model, and since the m-ld engine is not itself afforded any special privilege because of its deployment, an engine trusts the app in principle. The consequences of this are explained in the sections below.

Authentication & Authorisation

It is the app's responsibility to authenticate and authorise its users. For the reason above, and unlike some centralised data management systems, m-ld does not have a first-class 'user' concept with special semantics. This includes any notion of credentials, such as passwords. (Note that this does not prevent an app from storing user information in m-ld, so long as it has suitable access controls.)

This means an app is free to authenticate its users by any chosen means, such as device-native login, or using a third-party single sign-on system. The app should then use that authentication to gate access to those of its functions which access the m-ld engine.
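One way an app might gate access is to wrap each function that touches the engine so that it refuses to run without an authenticated session. The sketch below is an assumption-laden illustration: `Session`, `gated` and `writeToClone` are hypothetical, and `writeToClone` is a stand-in for whatever call the app makes on its clone, not a real m-ld function.

```typescript
interface Session {
  userId: string;
  authenticated: boolean;
}

// Wrap an operation so that it only runs for an authenticated session.
function gated<Args extends unknown[], Result>(
  getSession: () => Session | undefined,
  operation: (...args: Args) => Result
): (...args: Args) => Result {
  return (...args: Args): Result => {
    const session = getSession();
    if (session === undefined || !session.authenticated) {
      throw new Error('Not authenticated: access to the clone is refused');
    }
    return operation(...args);
  };
}

// Stand-in for a function of the app that writes to its clone.
const written: string[] = [];
const writeToClone = (json: string): number => written.push(json);

let session: Session | undefined;
const guardedWrite = gated(() => session, writeToClone);
```

How the session is established (device login, single sign-on or otherwise) is left entirely to the app, as described above.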

🚧 In future, m-ld will allow an app to negotiate a 'local key' credential that the engine can use to:

  • confirm the identity of the app instance, for example after a re-start
  • selectively encrypt data in storage and on the network (see below)
  • identify and suppress malware (see below)

A specification document for future security features will shortly be available in this portal. Please feed back any specific concerns you have.

Auditing & Non-Repudiation

Once a user is authorised to the application, it may be important to record their activity, as well as that of any other system actor such as a bot, in a tamper-proof way, for later auditing. This can generally be achieved with the use of audit stamps (time & user) on updates. If necessary, these stamps could be digitally signed by the app.

Clones maintain a 'journal' of updates, so audit data of this kind is effectively distributed in the domain. However, the journal is subject to truncation based on a clone-internal strategy for managing storage. To ensure long-term archival, an app-specific strategy can be adopted to stream update events to some other storage.
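A minimal sketch of audit stamping and app-specific archival follows. The names `AuditStamp`, `StampedUpdate`, `stampUpdate` and `Archive` are hypothetical, the in-memory archive stands in for whatever long-term storage the app chooses, and signing of stamps is deliberately left out.

```typescript
interface AuditStamp {
  user: string;
  time: string; // ISO 8601 timestamp
}

interface StampedUpdate<T> {
  stamp: AuditStamp;
  update: T;
}

// Attach the acting user and the time to an update before it is written.
function stampUpdate<T>(update: T, user: string, now: Date = new Date()): StampedUpdate<T> {
  return { stamp: { user, time: now.toISOString() }, update };
}

// Stand-in for long-term storage that outlives any journal truncation.
class Archive<T> {
  private readonly entries: StampedUpdate<T>[] = [];

  append(entry: StampedUpdate<T>): void {
    this.entries.push(entry);
  }

  by(user: string): StampedUpdate<T>[] {
    return this.entries.filter(e => e.stamp.user === user);
  }
}

const archive = new Archive<{ name: string }>();
archive.append(stampUpdate({ name: 'Fred' }, 'alice', new Date(0)));
archive.append(stampUpdate({ name: 'Frederick' }, 'bot-1', new Date(1000)));
const byAlice = archive.by('alice');
```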

🚧 The clone journal is currently an internal feature with no API access. Continuous updates are available via the follow API.

Furthermore, m-ld has been designed from the outset to be able to natively track app/user activity in a cryptographically-verifiable way. Details will be included in the forthcoming security specification.

Storage & Network

A m-ld engine may use storage to automatically persist data between and during app sessions (depending on its documented transaction guarantees). Since this will frequently be local to the device, the storage could be vulnerable to attack on a side channel, such as direct access through the local operating system.

Similarly, the engine uses the network to communicate updates. This happens automatically in principle. The network has an attack surface comprising any network-layer components used between app instances, including:

  • the local device's operating system network drivers, and
  • any third-party message brokers or realtime providers.

In principle, it is the app's responsibility to ensure that the storage and network are secure. In practice, this means that an engine always requires the storage and network handles from the app, usually prior to initialisation (in some platform-specific format). The app is then able to prevent unauthorised access in the same way it would for any other use of a storage or network resource.

Typical app controls will include encryption of data at rest and on the wire. This has the advantage that it prevents unauthorised access and tampering by any device without credentials.

🚧 The app's ownership of storage and network handles could be combined with its authentication mechanism to control access per user, for example so that a local user is not authorised to see some data belonging to another user. However, this approach requires the app to have knowledge of the engine's storage data format, and the m-ld protocol's data format. These may not be easy to manipulate. It also requires the app instance to have privileges above that of the local user. On some devices this may not be possible.

We are working on an extension to the m-ld protocol that will support automatic application of selective data encryption. Details will be included in the forthcoming security specification.

Malware

In common with other decentralised technologies, in principle m-ld has no central data gatekeeper with a controlled implementation.

As noted above, when using m-ld, an app is responsible for ensuring that the network used by the engine is secure. This extends to ensuring that it is only available to authorised users.

However in this model it is still possible for a legitimate authorised user to be deceived into entering their credentials into a counterfeit app, and by extension, a counterfeit clone engine. This malware app could then have privileged access to domain data, both for read and write.

It is therefore critical that the app is protected from malware at the level of the compute platform.

🚧 This vulnerability can arise in distributed computing of any kind (except in a trusted compute platform, and that has problems of its own). But once it has arisen, identifying and suppressing malware requires coordination among the peers of a decentralised system.

We are working on an extension to the m-ld protocol that will support early identification and suppression of malware and suspicious activity. Details will be included in the forthcoming security specification.

Trade-Offs

While adhering to the principles above and the m-ld specification, engines may offer differing quality of service, balancing non-functional considerations. These might also be affected by configuration options.

In all cases, engine documentation will provide the necessary details.

Query Patterns

An engine may support a subset of the API query syntax, perhaps because of limitations of the platform storage, but also driven by optimisation for the queries of typical apps on that platform.

Threading

All operations on m-ld data are inherently asynchronous and therefore assumed to be concurrent with other operations. Each engine maps this model onto the threading model of the platform, balancing the soonest return of control to the app against the complexity of handling out-of-band errors.

Performance

Engines may balance performance of some operations against others, as well as against other considerations. For example, a clone whose storage is entirely in-memory will offer the fastest transaction performance, but must re-cache all of its data from the domain on start-up. Or there could be an option to periodically flush the memory data to disk – and so on.

Scalability

Every clone logically provides access to all the data in the domain. In most engine implementations, this means that all the data must be stored locally – as this will also provide the fastest access and best data safety.

🚧 Clones which do not store all the data locally are the subject of active research. A proposal document for this feature will shortly be available in this portal. Please feed back any specific concerns you have.

Platforms

m-ld engines are available or planned for the following platforms.

  • JavaScript: For Node.js, modern browsers and other JavaScript engines. TypeScript is supported and recommended.

  • Java (🚧 coming soon): For Java environments, typically servers.

  • Docker (🚧 coming soon): For microservice, serverless and server environments.

  • .NET (please like if required): For mobile, native client and server environments using .NET.

  • Python (please like if required): For scripting, data science and general computing environments.

  • What do you need?: Please create a ticket for the platform you need.

Messaging

m-ld engines generally require the reliable publication of messages to the domain. Logically this is part of the 'network' infrastructure and abstracted in the m-ld specification.

An app provides a network messaging service to the clone, via an adapter. This allows the app to choose an appropriate messaging service for its requirements and architecture, and also to secure access to the service prior to passing it to the clone.
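The two capabilities a clone needs from its messaging service, publishing to all online clones and addressing one clone directly, can be sketched as an interface with a toy in-memory implementation. `MessagingAdapter`, `InMemoryNetwork` and `Handler` are hypothetical names invented for this illustration; the real adapter contract is defined by the m-ld specification and each engine's documentation.

```typescript
type Handler = (fromCloneId: string, message: string) => void;

// Hypothetical contract: what a clone asks of the messaging layer.
interface MessagingAdapter {
  publish(message: string): void;                  // to all other online clones
  send(toCloneId: string, message: string): void;  // to one clone directly
}

// A toy, single-process "network", standing in for a broker or realtime service.
class InMemoryNetwork {
  private readonly online = new Map<string, Handler>();

  connect(cloneId: string, onMessage: Handler): MessagingAdapter {
    this.online.set(cloneId, onMessage);
    return {
      publish: (message) => {
        this.online.forEach((handler, id) => {
          if (id !== cloneId) handler(cloneId, message);
        });
      },
      send: (toCloneId, message) => {
        const handler = this.online.get(toCloneId);
        if (handler === undefined) throw new Error(`Clone ${toCloneId} is not online`);
        handler(cloneId, message);
      },
    };
  }

  disconnect(cloneId: string): void {
    this.online.delete(cloneId);
  }
}

const network = new InMemoryNetwork();
const received: string[] = [];
const record = (self: string): Handler => (from, message) => {
  received.push(`${self}<-${from}:${message}`);
};

const adapterA = network.connect('a', record('a'));
const adapterB = network.connect('b', record('b'));
network.connect('c', record('c'));

adapterA.publish('update-1'); // reaches b and c, but not a itself
adapterB.send('a', 'hello');  // reaches a only
```

Because the app constructs the adapter and hands it to the clone, it is also the natural place to apply credentials or encryption before any message leaves the device.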

Message-layer adapters are currently specified for:

  • MQTT: a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. Convenient to use for local development or if the deployment environment has an MQTT broker available.
  • Ably: provides infrastructure and APIs to power realtime experiences at scale. It is a managed service, and includes pay-as-you-go developer pricing. It is also convenient to use for global deployments without the need to self-manage a broker.

Ably is similar to other cloud message-publishing services such as AWS SNS and Azure Service Bus, as well as other popular technologies like RabbitMQ and Apache Kafka. All of these would be suitable services for m-ld messaging.

m-ld can also work with a fully peer-to-peer messaging system, to realise complete architecture decentralisation in next-generation internet apps.

🚧 Please let us know if you would like to use any of these options in your system architecture. We would be delighted to work with you to make best use of your infrastructure commitments.

Check the clone engine documentation for its supported message layer and configuration details.