Features and Roadmap
High-performance Database
Columnar in-memory engine for extremely high throughput and low latency
Columnar hybrid engine (in-memory and disk-based) delivers fast performance for data warehouses with vast amounts of data
Flexible partition schemes: value, range, list, hash, and composite partitions
Support millions of partitions per table
In-database analytics: complex computations can be executed within the database, significantly reducing data transfer time
Native support for processing time series data with up to nanosecond precision
Standard SQL with enhancements such as panel data processing, bi-temporal joins (asof join, window join), window functions, pivoting, and composite columns
Table co-location for fast joins
Support data compression
Support dynamically increasing table columns
Highly expressive. Support imperative programming, functional programming, vector programming, SQL programming, and RPC (remote procedure call) programming.
Easy to learn. The syntax is very similar to SQL and Python.
About 600 built-in functions for various data types (number, temporal, string), data structures (vector, matrix, set, dictionary, table), and system calls (file, database, distributed computing).
Extended functionalities with user defined functions and plugins
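To illustrate the asof-join idea mentioned above, here is a minimal concept sketch in plain Python (not the database's own syntax; the table names and columns are hypothetical). Each left-side row is matched with the most recent right-side row at or before its timestamp, a common pattern for pairing trades with quotes:

```python
import bisect

def asof_join(trades, quotes):
    """For each (timestamp, price) trade, attach the value of the most
    recent quote at or before that timestamp. `quotes` must be sorted
    by timestamp. Illustrative sketch only."""
    quote_times = [t for t, _ in quotes]
    joined = []
    for ts, price in trades:
        # Index of the last quote with timestamp <= ts, or -1 if none exists
        i = bisect.bisect_right(quote_times, ts) - 1
        joined.append((ts, price, quotes[i][1] if i >= 0 else None))
    return joined

trades = [(3, 100.5), (7, 101.0)]
quotes = [(1, 100.0), (5, 100.8), (9, 101.2)]
print(asof_join(trades, quotes))
# → [(3, 100.5, 100.0), (7, 101.0, 100.8)]
```

An in-database engine can evaluate this kind of join directly over partitioned, sorted columns, avoiding the sort-and-scan work shown here.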
Distributed Computing
High speed distributed computing through in-memory engine, data localization, fine-grained data partitioning, and parallel computing.
Offer various built-in computing models such as pipeline, map-reduce, and iterative computing.
Provide snapshot isolation for computations on distributed dynamic data
Boost system throughput by sharing data copies in memory among multiple jobs
Efficient programming for distributed computing: a script written on one node can be executed across the entire cluster instantly, with no compilation or deployment required
Automatic data replica management for load balancing and fault tolerance with an embedded distributed file system
Convenient horizontal scaling on both storage capacity and computing capacity
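The map-reduce computing model listed above can be sketched in a few lines of Python. This is a single-process illustration with made-up data, not the product's API: in a real cluster each partition would live on a different node, the map step would run where the data lives, and only the small partial results would travel over the network:

```python
from functools import reduce

# Hypothetical partitions of a distributed table (one list per node)
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def map_step(partition):
    # Runs next to the data: produce a partial (sum, count) per partition
    return (sum(partition), len(partition))

def reduce_step(a, b):
    # Combine partial results into a global (sum, count)
    return (a[0] + b[0], a[1] + b[1])

total, count = reduce(reduce_step, map(map_step, partitions))
print(total / count)  # → 5.0, the global average, without moving raw rows
```

The key design point is that the reduce step only ever sees tiny aggregates, which is why fine-grained partitioning and data localization translate into high throughput.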
Real-time Data Streaming
Adopt publish/subscribe framework. Support chained subscription.
First-class support for stream-table duality. Publishing a message is equivalent to inserting a row into a table. Can use SQL queries on local or distributed streaming data.
Deliver messages with sub-millisecond latency
Update the historical data warehouse with live data at sub-second delay.
Replay historical messages from an arbitrary offset.
Provide configurable building blocks (e.g. partition, worker, queue) for traffic control and performance tuning
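Stream-table duality and chained subscription, as described above, can be sketched with a minimal in-process Python class (the class and field names are hypothetical, not the product's API). Appending a row to a table *is* publishing a message, and a subscriber may itself publish into another stream table, forming a chain:

```python
class StreamTable:
    """Minimal sketch of stream-table duality: every row insert is
    published to all subscribers of the table."""
    def __init__(self):
        self.rows = []
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def append(self, row):
        # Inserting a row is equivalent to publishing a message
        self.rows.append(row)
        for handler in self.subscribers:
            handler(row)

source = StreamTable()
derived = StreamTable()

# Chained subscription: a handler on `source` filters rows and
# republishes them into `derived`, which has its own subscribers.
source.subscribe(lambda r: derived.append(r) if r["price"] > 100 else None)
derived.subscribe(lambda r: print("alert:", r))

source.append({"sym": "A", "price": 99.5})
source.append({"sym": "B", "price": 101.2})  # prints alert: {'sym': 'B', ...}
```

Because every stream is also a table (`source.rows`, `derived.rows`), the same data can be queried with SQL after the fact, which is the essence of the duality.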
System Management and APIs/plugins
Embedded web interface for cluster management, performance monitoring and data access.
System monitoring via built-in functions, web interface, or Prometheus.
Portable IDE for data analysis.
Programming APIs for C++, C#, Java, Python, R, JavaScript and Excel.
User access control on tables and functions
Run user-defined functions as scheduled tasks
We are working on the following features:
Use just-in-time compilation to improve the performance of iterative computing.
Offer more built-in machine learning packages.
Provide support for other distributed file systems.