🚧 This documentation is for the developer preview of m-ld.
m-ld is a decentralised live data sharing component with a JSON-based API.
m-ld is used by including it in a software application (an 'app'). The app gains the power to share live data with other copies of itself, without any need to store or coordinate the data centrally in a database, or otherwise cope with the hard problem of keeping distributed data up to date. It is a bit like Google Docs, but for any structured data specific to the app: designs, bookings, observations, plans or anything else; and without the need to entrust the data to a third party.
The set of data being shared is called a 'domain' (note the definitions of terms in the side-bar), for example one design, booking or plan. The data in the domain is copied or 'replicated' to wherever the app is, and after that, all the copies are kept synchronised with each other by m-ld.
Each physical copy is called a 'clone'. In the picture, the "browser", "microservice" and Australia are just possible environments for a clone – and there could be any number of clones, from a handful to hundreds. In reality, most of the time, most of the environments will be of the same type, like on a mobile device, but there is no fundamental need for this. A clone can be deployed with its host app on any platform that has a network connection and for which a clone 'engine' exists.
All clones of the data can accept reads and writes without waiting for other clones to agree (consensus- and lock-free). Atomic read and write transactions are done locally using a JSON API. Communication between clones is via a pluggable messaging layer, with implementations provided.
The data may at any moment differ slightly between clones, but in the absence of any new writes, and with a live connection, all clones will 'converge' on an identical set of data, as fast as the network can carry the latest changes.
The 'network' is the medium through which messages travel from one clone to another. Clones treat the messaging network as a resource: the core of m-ld needs to be able to publish messages to all online clones, and sometimes to address one other clone directly. This can be done using a number of possible technologies, and it's possible to add new ones by creating an adapter. At the moment, m-ld has adapters for MQTT and Ably. See below for more details.
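As a rough illustration of the two capabilities described above (publish to all online clones, and address one clone directly), here is a minimal in-memory sketch. The class `InMemoryNetwork` and its method names are hypothetical and are not part of any m-ld adapter; a real deployment would use a messaging service instead.

```python
from typing import Callable, Dict

Handler = Callable[[str, dict], None]  # (sender id, message) -> None


class InMemoryNetwork:
    """Hypothetical stand-in for a messaging layer such as a message broker."""

    def __init__(self) -> None:
        self._online: Dict[str, Handler] = {}

    def connect(self, clone_id: str, handler: Handler) -> None:
        self._online[clone_id] = handler

    def disconnect(self, clone_id: str) -> None:
        self._online.pop(clone_id, None)

    def publish(self, sender: str, message: dict) -> None:
        # Deliver to every online clone except the sender.
        for clone_id, handler in list(self._online.items()):
            if clone_id != sender:
                handler(sender, message)

    def send(self, sender: str, recipient: str, message: dict) -> None:
        # Address one other clone directly; fails if it is offline.
        if recipient not in self._online:
            raise LookupError(f"clone {recipient!r} is not online")
        self._online[recipient](sender, message)


received = {"a": [], "b": [], "c": []}
net = InMemoryNetwork()
for cid in received:
    # Bind cid as a default argument so each clone fills its own inbox.
    net.connect(cid, lambda sender, msg, cid=cid: received[cid].append((sender, msg)))

net.publish("a", {"insert": "x"})
net.send("b", "c", {"request": "snapshot"})
```

Here clones `b` and `c` both receive the published message, and only `c` receives the direct one.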
To guarantee the data is safe from accidental loss, at least one clone in a domain should use reliable storage, or else enough clones must exist so that statistically, there is always someone with the data. Apps can choose to use as many domains as they need, to partition all the data that they want to share.
To keep the data safe from attackers, the app needs to ensure that the network and local storage (if used) can only be accessed by legitimate users. It does this by securing these resources before giving them to the local m-ld clone for its use. More details on this and other aspects of security can be found below.
m-ld can be used in many different system architectures, including fully decentralised systems having no back-end servers, local-first apps supporting offline use, and enterprise and cloud architectures having back-end databases or other centralised storage. For a summary of the motivations behind m-ld, read our Manifesto for Data.
🚧 Coming soon: example deployment diagrams for a selection of architectures. Let us know what your app looks like.
View the implementation-independent technical specification of m-ld on the specification portal.
Here are just some of the uses for m-ld. If you have a use-case, please do let us know.
App data rarely exists in isolation on a single device, but in-house collaboration solutions are expensive and take years to get right. m-ld can be a solution to data sharing in collaborative apps. It can also be incorporated into architectures with existing database components, to support live data editing features.
Robotics and AI are enabling labour-saving opportunities in industry. Fully centralised control of such devices can be impractical as decisions must be made based on data with a high degree of liveness. m-ld can be used to support a 'shared-consciousness' model of derived knowledge, in which autonomous units are able to keep each other instantly apprised of changes in their environment and to their common objectives.
An approach to managing the deluge of data from traditional and IoT devices is to localise data processing to the 'edge', a network of compute nodes that are outside of dedicated data centres on the cloud. m-ld's decentralised approach is ideal for continuous maintenance of data state on unreliable or temporary compute nodes, while guaranteeing data consistency.
Consumer concerns about personal data include increasing awareness of data ownership. m-ld helps consumers to keep control over their data. This is because apps that use m-ld can provide useful features without having to always store the data in a single, central location controlled only by the app provider. Instead, they can allow the user to choose where data is stored; and can easily move it from place to place on their request.
m-ld clones are low-cost replicas that can be created and destroyed with impunity. They are "cattle, not pets". This is unlike the traditional database approach of carefully looking after individual database instances.
Decentralisation in m-ld is technically enabled by the low cost of clone replication. The natural deployment pattern is to keep the clone engine close to the app, usually in-process, and so there is no need for a central database. In many deployments it may be desirable to have some clones on servers, to ensure data safety – but these are more like backups.
A domain with sufficient clones is resilient to infrastructure failures. In this respect and others, clones are similar to microservices.
The permanent loss of an individual clone is typically inconsequential to the data safety of the domain. Updates are published and received by clones on a continuous basis, interruptible only by network unavailability.
In the absence of necessary centralisation, an app is at liberty (and has a responsibility) to decide an authority model that works best for the domain. m-ld is generally agnostic to who owns data and decides data correctness. However, it does collaborate with the app to secure the data against unauthorised access.
Decentralisation is realised by a foundational protocol for data sharing which is:
- convergent: all clones will eventually have the same data.
- automatic: the app never needs to request or react to synchronisation with the domain (it does need to react to individual data changes, but this is irrespective of where they arise).
- efficient: a new clone or a re-starting clone is able to quickly rev-up to equivalence with other live clones. Furthermore, it can accept updates before this process completes.
More detail on data sharing is found in the concurrency section.
In principle, a m-ld domain contains structured data comprised of linked 'subjects' having properties of interest.
This model is fundamentally a graph of data, and is represented in the app programming interface (API) as JSON.
m-ld is schema-less in the sense that schema is not a first-class citizen (as it is in a relational model, for example). However, any schema or ontology can be embedded in the data, and enforced by the app.
These principles are realised by the internal adoption of the Resource Description Framework (RDF) as a data representation. Because of the consistent JSON syntax and the use of some sane defaults, an app does not typically need to be aware of the RDF; but it helps with interoperability, and ensures that future developments (for example, to supported data types) do not break compatibility.
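To make the relationship between the JSON representation and the underlying graph concrete, the following sketch flattens a nested JSON subject into (subject, property, value) triples, in the spirit of RDF. It assumes a JSON-LD-style `@id` key to identify subjects; the function `to_triples` and the sample data are illustrative only and do not reproduce an engine's actual behaviour.

```python
from typing import Iterator, Tuple

Triple = Tuple[str, str, object]


def to_triples(subject: dict) -> Iterator[Triple]:
    """Flatten a JSON subject (and any nested subjects) into triples."""
    sid = subject["@id"]
    for prop, value in subject.items():
        if prop == "@id":
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # A nested subject becomes a link, then is flattened itself.
                yield (sid, prop, {"@id": v["@id"]})
                yield from to_triples(v)
            else:
                yield (sid, prop, v)


# Illustrative data: a booking subject linked to a guest subject.
booking = {
    "@id": "booking1",
    "date": "2024-05-01",
    "guest": {"@id": "fred", "name": "Fred"},
}
triples = list(to_triples(booking))
```

The nested `guest` object becomes both a link from `booking1` to `fred` and a separate triple describing `fred`, which is what makes the data a graph and not a tree.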
These choices achieve an optimum of flexibility and familiarity for a broad selection of use-cases, while supporting the convergence guarantees.
In principle, a m-ld clone contains realtime domain data. Publication and receipt of updates is continuous and automatic. A clone may be offline, temporarily or for an extended period, but this is not logically distinct from extraordinary network latency.
Another way to look at this is to say that the domain is ephemeral and only exists as some hypothetical convergent state of all the clones. A clone can only ever be "up-to-date" with respect to another clone.
This means that an app does not in principle need to behave differently based on the currency of the domain data presented – for example, once initialised, a clone can always accept data writes.
In practice, the clone API does support an indication of status, including online/offline, to allow for special behaviours such as user notifications.
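The behaviour described above can be sketched as follows: local writes always succeed, updates are queued while offline, and the online/offline status is available only for optional behaviours such as notifying the user. The `Clone` class here is a hypothetical toy, not a clone API; in particular, `published` merely stands in for the network.

```python
class Clone:
    """Hypothetical clone that accepts writes whether or not it is online."""

    def __init__(self) -> None:
        self.online = False
        self.data: set = set()
        self.outbox: list = []     # updates not yet published
        self.published: list = []  # stands in for the network

    def write(self, fact: str) -> None:
        # The local write always succeeds immediately.
        self.data.add(fact)
        self.outbox.append(fact)
        self._flush()

    def set_online(self, online: bool) -> None:
        self.online = online
        self._flush()

    def _flush(self) -> None:
        # Publish anything pending, but only while connected.
        if self.online:
            self.published.extend(self.outbox)
            self.outbox.clear()


clone = Clone()
clone.write("door is open")  # accepted while offline
status_before = "online" if clone.online else "offline"
clone.set_online(True)       # the pending update is now published
clone.write("door is closed")
```

The app code that calls `write` is the same in both states; only the status indication differs.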
Any clone operation may be occurring concurrently with operations on other clones. Transaction operations combine to realise the final convergent state of every clone. This is unlike many existing data management technologies, which require serialisable transactions and so, frequently, various locking strategies to prevent concurrent edits.
A helpful way to think about this is to consider any gathering of human beings with an information goal, like a conference. At any moment, each participant could be receiving information or formulating information, or frequently, both. There could be multiple channels along which information flows. But ideally, everyone will eventually converge on the same set of information. m-ld applies this model to the information stored in clones – with the enhancement that the eventual convergence is guaranteed.
In this model, it is necessary to make a distinction between data consistency and integrity.
"Consistency" is meant slightly more broadly than the 'C' in the CAP theorem conceived by Eric Brewer in 2000.
m-ld guarantees eventual consistency. In the absence of updates and network partitions, all clones will report the same data (they will converge).
Another viewpoint is to imagine a hypothetical single data store for the domain. If no new updates are made, eventually all clones' responses to any query would be indistinguishable from the responses from this store.
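The convergence property can be demonstrated with a deliberately simple data type. The sketch below uses a grow-only set of facts, which is not m-ld's actual algorithm (m-ld also supports deletions, for example), and replays the same updates in every possible delivery order, including duplicated deliveries. All names and data are illustrative.

```python
import itertools

# Three updates, each an inserted (subject, property, value) fact.
updates = [
    ("fred", "name", "Fred"),
    ("fred", "height", 5),
    ("wilma", "name", "Wilma"),
]


def replay(order) -> frozenset:
    """Apply updates to a fresh clone in the given delivery order."""
    facts = set()
    for update in order:
        facts.add(update)
        facts.add(update)  # a duplicated delivery is harmless
    return frozenset(facts)


# Every delivery order leads to the same final state.
final_states = {replay(order) for order in itertools.permutations(updates)}
converged = len(final_states) == 1
```

Because adding to a set is commutative and idempotent, the final state matches that of the hypothetical single data store, regardless of message ordering.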
"Integrity" is meant as adherence of the domain data to a set of semantic rules like:
- This property has one and only one value
- This property is of a specific data type
- This property refers to some other entity which exists
In a programming language, you might find these rules expressed in the type system. In a relational database, the rules are the schema and constraints.
In m-ld, integrity is a collaboration between the domain, engine, app and user. m-ld supports constraints (see below), but first it is important to consider how integrity of this kind applies to a collaborative system and to your use-case.
Like most apps, those using m-ld will validate inputs. The validation rules are in the app's code, and are the app's responsibility. Most of the time, these rules will capture constraint violations before they reach the data domain.
However, sometimes a rule will be violated not by the immediate operation, but by an operation done on a different clone, which 'disagrees' in some way according to the natural semantics of the domain. This is a "conflict".
The likelihood of conflicts depends greatly on the domain. For example, an application that records some observed facts about the world may very rarely give rise to conflicts.
Unlike some client-server systems, a m-ld app is constantly notified of updates to the domain data. This gives the app the opportunity to handle conflicts when they arise. Strategies that could be applied by the app fall into the following categories:
- Temporarily Ignore (recommended). In (surprisingly) many cases, human users will naturally correct conflicting values without significant impedance to their workflow. This depends on the user interface presenting the conflict; again, this may be easier than it sounds, for example by concatenating conflicting strings with a line break in the user interface. Ignored rule violations that are not immediately noticed by the user can be captured later, for example in the next user workflow step, or even by a housekeeping process which applies a procedural fix.
- Procedure. A conflict may manifest as a nonsensical or ambiguous state for which it is possible to automatically make a correction or decide an outcome. The correction can be implemented in the app code. However, since each app instance may at any time 'see' a different state, these corrections can potentially compete with each other, so care must be taken not to create a cascade of competing updates.
- Consensus. A conflict resolution may require out-of-band collaboration between instances of the app. Human to human interaction could be considered in this category, but other cases may require an automatic consensus protocol. Such a protocol is out of the scope of m-ld, but its result can be applied to the domain via any clone using a normal transaction.
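The first two strategies above might look like the following in app code. This is a sketch under the assumption that a conflict shows up as a property holding more than one value; the function names and sample values are hypothetical.

```python
def display_value(values) -> str:
    """'Temporarily Ignore': show every conflicting value, one per line."""
    return "\n".join(sorted(str(v) for v in values))


def procedural_fix(values):
    """'Procedure': pick a winner using a rule every app instance shares."""
    # sorted() gives the same order everywhere, so instances that see the
    # same conflict propose the same correction, not competing ones.
    return sorted(values)[0]


# Two users concurrently set the title of the same subject.
conflicting_titles = {"Flintstones party", "Rubble party"}
shown = display_value(conflicting_titles)
fixed = procedural_fix(conflicting_titles)
```

A deterministic rule reduces, but does not remove, the risk of competing corrections, since instances may still observe different states at different times.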
A m-ld domain can be configured with constraints, which are rules that apply to both local and remote transactions. Unlike app-based integrity rules, these do not require that the app recognise and handle rule violations – the app is able to rely on the data it perceives always being compliant with the rule. For example:
- The value of a property always has a specific type, like a number or a string
- A property always has exactly one value
Constraints can also be used to describe and enforce arbitrarily complex rules on transactions, for example:
- Some data must only be changed by someone authorised (see security), or by consensus
- Certain inputs are normalised to other structures before being committed to the data
- Additional data is inferred from input data
Constraints are an extension point, allowing apps to choose a pre-existing implementation or to create their own. The implementation of constraints may require handling of concurrency edge-cases to ensure that convergence is still guaranteed.
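A minimal sketch of a 'single-valued' constraint follows, assuming a hypothetical interface with one method for local transactions and one for remote updates. The names `check` and `apply`, and the choice of `max` as the resolution rule, are assumptions made for illustration; consult the engine documentation for the real interface.

```python
class SingleValued:
    """Hypothetical constraint: a property holds at most one value."""

    def __init__(self, prop: str) -> None:
        self.prop = prop

    def check(self, subject: dict) -> None:
        # In this sketch, a local transaction that breaks the rule is rejected.
        if len(subject.get(self.prop, [])) > 1:
            raise ValueError(f"{self.prop} must have exactly one value")

    def apply(self, subject: dict) -> None:
        # A remote update has already been accepted elsewhere, so the state
        # is repaired with a rule that gives the same answer on every clone.
        values = subject.get(self.prop, [])
        if len(values) > 1:
            subject[self.prop] = [max(values)]


rule = SingleValued("name")

local_rejected = False
try:
    rule.check({"@id": "fred", "name": ["Fred", "Flintstone"]})
except ValueError:
    local_rejected = True

# State after merging a concurrent remote write to the same property.
merged = {"@id": "fred", "name": ["Fred", "Freddie"]}
rule.apply(merged)
```

Using a deterministic rule in `apply` is one way to handle the concurrency edge-case mentioned above: every clone that sees both values arrives at the same single value.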
For details of the available constraints and how to configure them, consult the engine documentation for the platform you are using.
Some state changes in a distributed system require the agreement of system participants. This is associated with the potential for a user, who is somehow unaware of the agreement (for example, by being offline), to have their changes declared void – that is, entirely revoked. For example:
- Changes concurrent with a data schema change – the changed state may not be compatible with the new schema
- Changes concurrent with access control changes – the concurrent change might not have been allowed
An agreement is a coordinated change of state, as distinct from an inherently uncoordinated conflict-free change. An agreement is binding on all participants, but its coordination may involve any subset of them. Examples of coordination schemes include:
- consensus algorithms like Paxos or Raft
- decentralised consensus mechanisms like proof-of-stake
- unilateral agreement by one participant who has "authority"
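The effect of an agreement on concurrent changes can be sketched very roughly as below. This toy assumes that each change records how many agreements its author had seen, and that messages are delivered in causal order, so a change that arrives after an agreement without knowledge of it must be concurrent with it. It is not how m-ld tracks causality; all names and data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    fact: str
    agreements_seen: int  # agreements the author knew of when writing


@dataclass
class Clone:
    agreements: int = 0
    facts: list = field(default_factory=list)
    voided: list = field(default_factory=list)

    def receive_agreement(self) -> None:
        self.agreements += 1

    def receive_change(self, change: Change) -> None:
        if change.agreements_seen < self.agreements:
            # Made without knowledge of the agreement: declared void.
            self.voided.append(change)
        else:
            self.facts.append(change.fact)


clone = Clone()
clone.receive_change(Change("alice", "price is 10", agreements_seen=0))
clone.receive_agreement()  # e.g. an access control change by an authority
clone.receive_change(Change("bob", "price is 99", agreements_seen=0))  # made offline
clone.receive_change(Change("alice", "price is 12", agreements_seen=1))
```

Bob's change, made offline without knowledge of the agreement, is voided, while Alice's later change is accepted.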
m-ld is designed for Decentralised Extensibility. This is a property of systems that permit and support any interested party to develop an extension, and new extensions can be added without permission from a central authority.
You can choose which extensions to use in an app; some may be bundled in a platform engine package, others you can write yourself.
Some extensions must be pre-selected by the app in order to connect a new clone to a domain of information. Other extensions can be declared in the data and loaded dynamically by the engine. This allows apps to adapt their behaviour at runtime.
m-ld defines a number of extension points:
- Messaging is usually pre-selected by the app.
- Constraints (see above) define integrity rules on the domain's data.
- Transport Security allows an app to protect m-ld network traffic by encrypting and/or digitally signing it.
- Agreement Conditions assert necessary preconditions for an agreement.
The m-ld 'protocol' is specified without reference to a platform. This protocol comprises the scheme of messages and signals by which m-ld clones talk to each other, how apps talk to their clones, and the logical graph representation.
This allows a clone engine to be implemented on any compute platform, as required by use-cases. See the available Platforms.
The m-ld protocol is designed to ensure that an app using a clone engine can be secured against threats to the shared data. This is necessarily a collaboration with the app implementation, since the app must be allowed privileged access to the data in order to function.
The threats that must be controlled in collaboration with a clone engine are a subset of the threats to the app. These include threats to Confidentiality, Integrity and Availability.
The attack surface of an engine generally comprises:
- The local device storage being used by the engine
- The network between the engine and other clones
- The clone API presented to the app
An app using m-ld is free to decide its own security model. Because the m-ld engine is embedded in the app, it does not have any special privilege because of its deployment. In particular, a m-ld domain does not inherently have any privileged, trusted central authority. However, an app may choose to deploy m-ld clones to its own trusted service or data tier.
It is the app's responsibility to authenticate its users by any chosen method, such as device-native login, or using a third-party single sign-on system, in order to gate access to its functions which access the m-ld engine.
Data is transmitted between clones using a choice of messaging provider. Since this data is at risk from network attacks, the messaging system itself should be authenticated, either with the user credentials or some token obtained with them.
An example app login behaviour:
- Redirect the user to login via an identity provider
- Retrieve a signed token from the identity provider
- Connect to the messaging system using the token
- Initialise the m-ld clone with the messaging system connection
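The four steps above can be sketched end to end. Everything here is a hypothetical stand-in: the identity provider is a local function, the token is signed with an HMAC over a shared secret (real providers typically use asymmetric signatures), and `MessagingConnection` and `Clone` are placeholders, not m-ld classes.

```python
import hashlib
import hmac

IDP_SECRET = b"demo-secret"  # hypothetical key shared by provider and broker


def identity_provider_login(user: str) -> str:
    """Steps 1 and 2: the identity provider returns a signed token."""
    signature = hmac.new(IDP_SECRET, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}.{signature}"


class MessagingConnection:
    """Step 3: the messaging system only accepts a valid token."""

    def __init__(self, token: str) -> None:
        user, _, signature = token.rpartition(".")
        expected = hmac.new(IDP_SECRET, user.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(signature, expected):
            raise PermissionError("invalid token")
        self.user = user


class Clone:
    """Step 4: the clone is given an already-secured connection."""

    def __init__(self, domain: str, messaging: MessagingConnection) -> None:
        self.domain = domain
        self.messaging = messaging


token = identity_provider_login("alice")
clone = Clone("bookings", MessagingConnection(token))
```

The point of the ordering is that the clone never sees an unauthenticated connection: the app secures the messaging resource first, then hands it over.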
In many apps, authenticated users will have read/write access to the domain as a whole. The app controls which domains the user can select from and connect a clone to. To prevent unauthorised access to data-in-transit from other domains, it is generally necessary to control access to the channels of the messaging system in use.
Fine-grained write access control within a single domain can be achieved using constraints.
Auditing & Non-Repudiation
Once a user is authorised to the application, it may be important to record their activity, as well as that of any other system actor such as a bot, in a tamper-proof way, for later auditing. In common with other systems, it is usually most efficient to use a dedicated system component for this. In an app using m-ld, a clone of the data can be located with the audit logging component.
It is possible to use the Transport Security extension point to provide assurance of user identity to the audit logging system, by means of digital signatures.
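One common way to make an activity record tamper-evident is a hash chain, in which each entry includes the hash of the previous one. The sketch below shows the idea for an audit logging component; it is an assumption about how such a component might work, not a feature of m-ld. It does not by itself prove who performed an action, which is where digital signatures come in.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, actor: str, action: str) -> None:
    """Append an entry whose hash covers the previous entry's hash."""
    previous = log[-1]["hash"] if log else ""
    body = {"actor": actor, "action": action, "previous": previous}
    log.append({**body, "hash": _digest(body)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited entry breaks it."""
    previous = ""
    for entry in log:
        body = {k: entry[k] for k in ("actor", "action", "previous")}
        if entry["previous"] != previous or entry["hash"] != _digest(body):
            return False
        previous = entry["hash"]
    return True


audit_log: list = []
append_entry(audit_log, "alice", "insert booking1")
append_entry(audit_log, "bot-7", "delete booking1")
intact = verify(audit_log)

audit_log[0]["action"] = "insert booking2"  # simulated tampering
tampering_detected = not verify(audit_log)
```

For this to be useful in practice, the latest hash would need to be held somewhere an attacker cannot rewrite.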
Storage & Network
A m-ld engine may use storage to automatically persist data between and during app sessions (depending on its documented transaction guarantees). Since this will frequently be local to the device, the storage could be vulnerable to attack on a side channel, such as direct access through the local operating system.
Similarly, the engine uses the network to communicate updates. This happens automatically in principle. The network has an attack surface comprising:
- Any network layer components used between app instances, including:
  - the local device's operating system network drivers, and
  - any third-party message brokers or realtime providers.
In principle, it is the app's responsibility to ensure that the storage and network are secure. In practice, this means that an engine always requires the storage and network handles from the app, usually prior to initialisation (in some platform-specific format). The app is then able to prevent unauthorised access in the same way it would for any other use of a storage or network resource.
Typical app controls will include encryption of data at rest and on the wire. This has the advantage that it prevents unauthorised access and tampering by any device without credentials.
In common with other decentralised technologies, in principle m-ld has no central data gatekeeper with a controlled implementation.
As noted above, when using m-ld, an app is responsible for ensuring that the network used by the engine is secure. This extends to ensuring that it is only available to authorised users.
However, in this model it is still possible for a legitimate authorised user to be deceived into entering their credentials into a counterfeit app, and by extension, a counterfeit clone engine. This malware app could then have privileged access to domain data, both for read and write.
It is therefore critical that the app is protected from malware at the level of the compute platform.
While adhering to the principles above and the m-ld specification, engines may offer differing quality of service, balancing non-functional considerations. These might also be affected by configuration options.
In all cases, engine documentation will provide the necessary details.
An engine may support a subset of the API query syntax, perhaps because of limitations of the platform storage, but also driven by optimisation for the queries of typical apps on that platform.
All operations on m-ld data are inherently asynchronous and therefore assumed to be concurrent with other operations. Each engine maps this model onto the threading model of the platform, balancing the soonest return of control to the app against the complexity of handling out-of-band errors.
Engines may balance performance of some operations against others, as well as against other considerations. For example, a clone whose storage is entirely in-memory will offer the fastest transaction performance, but must re-cache all of its data from the domain on start-up. Or there could be an option to periodically flush the memory data to disk – and so on.
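The trade-off above can be illustrated with a toy store that keeps data in memory and optionally flushes it to disk every few writes. `MemoryStore` and its `flush_every` option are hypothetical and do not correspond to any engine's configuration.

```python
import json
import os
import tempfile


class MemoryStore:
    """Hypothetical in-memory store with an optional periodic flush to disk."""

    def __init__(self, path=None, flush_every=0):
        self.data = {}
        self.path = path
        self.flush_every = flush_every
        self._writes = 0
        # On start-up, recover whatever was last flushed, if anything.
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value  # fast: memory only
        self._writes += 1
        if self.path and self.flush_every and self._writes % self.flush_every == 0:
            with open(self.path, "w") as f:
                json.dump(self.data, f)


path = os.path.join(tempfile.mkdtemp(), "clone.json")
store = MemoryStore(path, flush_every=2)
store.put("a", 1)
store.put("b", 2)  # the second write triggers a flush
store.put("c", 3)  # not yet flushed

restarted = MemoryStore(path, flush_every=2)
```

After a restart, only the flushed entries are available locally; anything written since the last flush would have to be recovered from other clones in the domain.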
Every clone logically provides access to all the data in the domain. In most engine implementations, this means that all the data must be stored locally – as this will also provide the fastest access and best data safety.
🚧 Clones which do not store all the data locally are the subject of active research. A proposal document for this feature will shortly be available in this portal. Please feed back any specific concerns you have.
m-ld engines are available or planned for the following platforms.
Docker (🚧 coming soon): For microservice, serverless and server environments.
Python (please like if required): For scripting, data science and general computing environments.
Rust (please like if required): For cross-platform live data - endpoint devices and servers.
Swift (please like if required): For peer-to-peer live data among iOS, iPadOS and macOS devices.
Java (please like if required): For Java environments, typically servers.
.NET (please like if required): For mobile, native client and server environments using .NET.
What do you need?: Talk to us about the platform you need.
m-ld engines generally require the reliable publication of messages to the domain. Logically this is part of the 'network' infrastructure and abstracted in the m-ld specification.
An app provides a network messaging service to the clone, via an adapter. This allows the app to choose an appropriate messaging service for its requirements and architecture, and also to secure access to the service prior to passing it to the clone.
Message-layer adapters are currently specified for:
- MQTT: a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. Convenient to use for local development or if the deployment environment has an MQTT broker available.
- Ably: provides infrastructure and APIs to power realtime experiences at scale. It is a managed service, and includes pay-as-you-go developer pricing. It is also convenient to use for global deployments without the need to self-manage a broker.
Ably is similar to other cloud message-publishing services such as AWS SNS and Azure Service Bus, as well as other popular technologies like RabbitMQ and Apache Kafka. All of these would be suitable services for m-ld messaging.
m-ld can also work with a fully peer-to-peer messaging system, to realise complete architecture decentralisation in next-generation internet apps.
🚧 Please let us know if you would like to use any of these options in your system architecture. We would be delighted to work with you to make best use of your infrastructure commitments.
Check the clone engine documentation for its supported message layer and configuration details.