Distributed Database

DolphinDB has 2 deployment modes: standalone mode and cluster mode.

In the standalone mode, the database is deployed on one physical server.

A DolphinDB cluster has 4 types of nodes: controller, agent, data node, and compute node. The standalone mode has only 1 data node, with no controller, agent, or compute nodes.

  • Controller. A cluster can have one or more controllers. The controllers are the central nervous system of a DolphinDB cluster: they collect heartbeats from agents and data nodes, monitor the status of each node, and manage the metadata and transactions of the distributed file system.

  • Agent. An agent executes the commands issued by a controller to start or stop the local data nodes. Each physical server in a cluster runs one and only one agent.

  • Data node. Data nodes store data and execute queries as well as more complex computations. A physical server can be configured with multiple data nodes.

  • Compute node. Compute nodes are used for queries and computation, including historical data queries, distributed joins, batch processing, stream processing, and machine learning model training. A physical server can be configured with multiple compute nodes. Since data is not stored on compute nodes, you can use loadTable to load data from a data node to a compute node for computational work. You can also create databases and partitioned tables on a compute node and write data to the partitioned tables through a write interface; however, writing on a compute node incurs more network overhead than writing on a data node, because the compute node needs to forward the data to data nodes for storage. A minimal script sketch follows this list.
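
The following DolphinDB script is a minimal sketch of working on a compute node, assuming a pre-existing partitioned table with the hypothetical path "dfs://demoDB", table name "pt", and columns date, sym, and price:

    // Load the partitioned table and query it; the data stays on the data nodes
    // and only the query result is returned to the compute node.
    pt = loadTable("dfs://demoDB", "pt")
    select avg(price) from pt where date = 2024.01.02 group by sym

    // Writing through the compute node: the rows are forwarded to the data nodes
    // for storage, which adds network overhead compared with writing on a data node.
    newData = table(take(2024.01.02, 3) as date, `A`B`C as sym, 10.5 11.2 9.8 as price)
    pt.append!(newData)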

Each agent or data node sends a heartbeat to the controller every second through UDP broadcasting, informing the controller that it is alive and reporting other runtime statistics. If the controller does not receive heartbeats from a node for 3 consecutive seconds, the node is marked as dead.
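
Conceptually (this is an illustrative sketch, not actual DolphinDB source code), the controller's liveness rule boils down to a simple timeout check:

    // Hypothetical sketch of the dead-node rule described above.
    // lastHeartbeat is the timestamp of the last heartbeat received from a node.
    def isNodeAlive(lastHeartbeat) {
        // heartbeats arrive every second; a node is marked dead after
        // 3 consecutive seconds (3000 ms) without one
        return (now() - lastHeartbeat) < 3000
    }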

The standalone mode and the cluster mode have the same data model and both support transactions. Only the cluster mode, however, supports high availability.

Peer-to-Peer Architecture

Most mainstream distributed databases adopt a master-slave architecture, in which the master node is responsible both for synchronizing metadata and state and for dual-server hot backup to ensure high availability. The master can easily become a system bottleneck and makes scaling out harder. DolphinDB instead adopts a peer-to-peer architecture built on globally visible metadata services, which makes resource load balancing easier to achieve. In DolphinDB, any data node can be the entry point for queries and writes, so there is no single bottleneck in the system.
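
As a sketch, the same distributed query can be submitted to any data node and returns the same result; the database path and table name below are hypothetical:

    // Run this on any data node in the cluster: the node plans the distributed
    // query, pulls the relevant partitions, and assembles the result.
    pt = loadTable("dfs://demoDB", "pt")
    select count(*) from pt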

High Availability

DolphinDB provides high availability for data nodes, controller nodes, and API clients. It ensures that the database continues to work even when one or more nodes become unavailable due to a network interruption or a node crash.

High availability of data nodes

Multiple copies of the same data chunk can be stored on different data nodes. Even if one or more data nodes are unavailable, the database continues to work as long as at least one copy is still available in the cluster. Multiple replicas not only enable high availability but also provide load balancing: if data can be loaded from multiple data nodes, the system automatically chooses a node with a light workload.

DolphinDB supports storing replicas on different physical servers, with strong consistency across all replicas. If the data on one machine is corrupted, the database can still be accessed through the replicas on other machines.
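
A minimal configuration sketch for the controller's configuration file (the values are illustrative, and dfsReplicaReliabilityLevel is named here as an assumption about the configuration interface):

    dfsReplicationFactor=2
    dfsReplicaReliabilityLevel=1

With this setup, every chunk would keep 2 copies, and the second parameter would require the copies to be placed on different physical servers.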

High availability of controller nodes

Metadata is stored on the controller nodes. We can deploy multiple controller nodes in a cluster to ensure that the metadata service is uninterrupted if one or more controller nodes become unavailable. All controller nodes in a cluster form a Raft group: one controller node is the leader and the remaining controller nodes are followers. Metadata is kept strongly consistent between the leader and the followers. Data nodes interact only with the leader. If the leader becomes unavailable, a new leader is immediately elected to provide the metadata service. The Raft group can tolerate the failure of fewer than half of the controller nodes. For example, a cluster with 3 controller nodes can tolerate 1 unavailable controller node, and a cluster with 5 controller nodes can tolerate 2 unavailable controller nodes. To enable high availability for controller nodes, at least 3 controller nodes are required, and the configuration parameter 'dfsReplicationFactor' must be set to a value greater than 1.
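
A minimal sketch of the relevant controller configuration, assuming that the parameter dfsHAMode is what places the controllers into a Raft group (the values are illustrative):

    dfsReplicationFactor=2
    dfsHAMode=Raft

The cluster would also need at least 3 controller nodes declared in its node configuration for the Raft group to elect a leader.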

High availability of API clients

DolphinDB APIs also support high availability. When the data node that an API client is connected to becomes unavailable, the client first attempts to reconnect to that node. If the reconnection fails, it automatically connects to another available data node.

For instructions on deploying a multi-machine cluster, please refer to DolphinDB Cluster Deployment on Multiple Servers.