RethinkDB: Install Open Source Database for Realtime Apps on Kubernetes
Written by 8grams
Published on November 20, 2023
Introduction
RethinkDB is a database system designed primarily for applications that need real-time updates. It was initially created by a company named RethinkDB and later became an open-source project. Unlike traditional databases, which applications must poll to detect changes, RethinkDB can push live updates to applications as soon as changes occur. This makes it highly effective for apps where changes must be reflected immediately, such as chat applications, live feeds, or collaborative tools.
The database is structured to store data in JSON documents, offering flexibility for various data models. This design is particularly beneficial for developers working on modern web applications that deal with complex, unstructured data.
After the original company behind RethinkDB shut down, the open-source community took over its maintenance and further development. Because the code was already open source by then, a broader developer community has been able to keep contributing to its growth and adaptability.
RethinkDB stands out for its real-time capabilities, making it a go-to choice for applications where live data synchronization is a key requirement. Its journey from a proprietary product to a community-driven open-source project is a testament to its utility and the strong support from its user and developer community.
Origin
RethinkDB originated as a company-led project, developed by the Y Combinator-backed company also named RethinkDB. The project began around 2009, spearheaded by Slava Akhmechet and Michael Glukhovsky. The development focused on addressing the limitations of traditional relational databases, particularly in handling real-time data processing and updates.
The vision for RethinkDB was to create a database that was not only scalable and efficient but also provided a more developer-friendly approach to dealing with real-time data. This was particularly relevant as web applications were becoming increasingly dynamic and interactive, requiring databases to provide instant updates and real-time feeds.
In 2012, RethinkDB took a significant turn when it transitioned from a proprietary product to an open-source project. This change opened the door for a wider community of developers to contribute to its development and enhancement. The move to open-source was driven by the belief that community involvement would accelerate innovation and adoption of the database.
Despite its innovative features and growing popularity, the company behind RethinkDB faced financial challenges. In October 2016, RethinkDB announced that it was shutting down. However, the database itself did not meet the same fate. The community around RethinkDB, passionate about its potential and capabilities, rallied to keep the project alive. In February 2017, the Cloud Native Computing Foundation purchased the rights to the source code and contributed it to the Linux Foundation under the Apache License 2.0, and development has continued as a community-driven project since then.
The origin of RethinkDB is a story of innovation, community resilience, and the transition from a company-led project to a community-driven open-source database. Its journey reflects the evolving nature of technology development, where community involvement can play a pivotal role in sustaining and evolving software projects.
Key Features
RethinkDB's key features make it unique among database management systems, especially for applications requiring real-time updates and scalable architectures. Here's a deeper look into its significant features:
Real-Time Push Architecture
RethinkDB's most distinctive feature is its real-time functionality. It can push updated query results to applications instantly as data changes, which is ideal for applications where timely data updates are critical, like live feeds, gaming, or collaborative tools.
Document-Based Data Model
It uses a JSON document model, which means you can store data in a format that's both human-readable and easily manipulated by web applications. This flexible data model is great for unstructured data and can support varied and complex data structures.
Scalability
Designed with scalability in mind, RethinkDB can handle growing data needs and increased user loads. It can be deployed across multiple machines, and its architecture allows for efficient scaling, making it suitable for both startups and large enterprises.
Distributed Database
It's inherently distributed, meaning it can run on multiple servers, providing high availability and failover protection. This is essential for critical applications where downtime can have significant impacts.
Powerful Query Language (ReQL)
RethinkDB introduces ReQL, a rich and intuitive query language. ReQL is embedded in programming languages like JavaScript, Python, and Ruby, allowing developers to write queries in a way that feels natural and is easy to integrate into existing codebases.
Strong Consistency
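It offers strong consistency options: every shard is assigned a single primary replica, and reads and writes for a document are routed to that primary by default, so clients do not see conflicting versions of the same document. This is particularly important for applications where data integrity and consistency are crucial.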
Geospatial Queries
RethinkDB supports geospatial data types and queries. This feature is beneficial for applications dealing with location-based data, like mapping services or location tracking.
Easy Administration
The database comes with an intuitive web interface for easy administration. This includes tasks like cluster management, performance monitoring, and troubleshooting.
Community and Ecosystem
Since becoming open-source, it has fostered a vibrant community and ecosystem, leading to a wealth of resources, libraries, and tools developed around it.
RethinkDB is designed for modern web applications that demand real-time updates, flexible data models, and scalable, distributed architectures. Its rich set of features, including the real-time push architecture, JSON document storage, and robust query language, make it a powerful choice for a wide range of applications.
Real-Time Push Architecture
RethinkDB's real-time push architecture is one of its most defining and innovative features. This architecture fundamentally changes how applications receive updates from the database, making it exceptionally useful for applications that require immediate data updates.
Concept of Real-Time Push in RethinkDB
Traditional Polling vs. Push Architecture
In traditional databases, applications typically poll the database at regular intervals to check for updates, a process that can be inefficient and slow. RethinkDB reverses this model with a push architecture. Instead of constantly polling for changes, applications can subscribe to changes in the database. When data changes, the database actively pushes these changes to the application.
Change Feeds
This functionality is implemented through 'change feeds'. A change feed is a continuous query that automatically updates the result set in real-time as the underlying data changes. Developers can set up change feeds on tables, specific documents, or even filtered query results. Whenever there's a change (like insert, update, or delete operations), the change feed sends an update to the application.
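To make the shape of these updates concrete, here is a minimal sketch. With default options, each notification delivered by `changes()` is an object with `old_val` and `new_val` fields: an insert has a null `old_val`, a delete has a null `new_val`, and an update carries both. The helper name `classifyChange` and the sample data are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: classify a change feed notification by checking
// which of its two fields is null (default changes() options assumed).
function classifyChange(change) {
  if (change.old_val === null && change.new_val !== null) return 'insert';
  if (change.old_val !== null && change.new_val === null) return 'delete';
  return 'update';
}

// Sample notifications shaped like the ones a change feed emits.
const samples = [
  { old_val: null, new_val: { id: 1, name: 'John Doe' } },
  { old_val: { id: 1, name: 'John Doe' }, new_val: { id: 1, name: 'Jane Doe' } },
  { old_val: { id: 1, name: 'Jane Doe' }, new_val: null },
];

const kinds = samples.map(classifyChange);
console.log(kinds);
```

An application would typically call a helper like this inside the callback that receives each change, then update its UI or cache accordingly.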
Real-Time Data Synchronization
This feature allows for real-time synchronization between the database and applications. For instance, if a record is updated in the database, all clients subscribed to the change feed for that record receive the update instantly. This is particularly useful for applications like collaborative tools, real-time analytics dashboards, or online games where the current state must be reflected immediately across all clients.
Efficiency and Scalability
By pushing updates only when changes occur, RethinkDB reduces the amount of unnecessary data transfer and processing, making it more efficient than traditional polling mechanisms. The architecture is designed to be scalable, handling numerous simultaneous change feeds without significant degradation in performance.
Ease of Use
Setting up and using change feeds is straightforward, often requiring just a few lines of code. This simplicity makes it accessible to developers without requiring extensive experience in database management.
Technical Implementation
Underlying Technology
RethinkDB’s real-time push architecture is deeply integrated into its core. It uses efficient algorithms to monitor changes and propagate them through change feeds. The system is optimized to handle high throughput and low latency, ensuring that updates are pushed to clients as quickly as possible.
Client-Side Integration
On the client side, integrating change feeds typically involves using RethinkDB's query language, ReQL, within the application code. Most popular programming languages supported by RethinkDB have libraries or drivers that provide a seamless way to work with change feeds.
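A minimal sketch of such an integration in JavaScript, subscribing to a filtered query rather than a whole table. The `users` table and `age` field are assumptions carried over from the examples later in this article, and `watchUsersOver30` is a hypothetical name. The driver module `r` and an open connection `conn` are assumed to be supplied by the caller.

```javascript
// Hypothetical helper: subscribe to changes on users older than 30 and
// hand every notification to the `onChange` callback.
function watchUsersOver30(r, conn, onChange) {
  return r.table('users')
    .filter(r.row('age').gt(30))   // only changes matching the filter are pushed
    .changes()                     // turn the query into a change feed
    .run(conn)                     // without a callback, run() returns a promise
    .then((cursor) => {
      cursor.each((err, change) => {
        if (err) throw err;
        onChange(change);
      });
      // Return the cursor so the caller can stop the feed with cursor.close().
      return cursor;
    });
}
```

Returning the cursor is a design choice in this sketch: a change feed never ends on its own, so the caller needs a handle to close it when the subscription is no longer needed.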
RethinkDB's real-time push architecture represents a significant shift from traditional database interaction models. It provides an efficient, scalable, and easy-to-use mechanism for applications to receive live updates, making it a powerful tool for developers building interactive, real-time applications.
Document-Based Data Model
RethinkDB's document-based data model is a core aspect of its design, catering to the needs of modern applications that often deal with complex and varied data structures.
Concept of Document-Based Data Model
JSON Documents
In RethinkDB, data is stored in the form of JSON (JavaScript Object Notation) documents. JSON is a lightweight data-interchange format that's easy for humans to read and write, and easy for machines to parse and generate. Each document is essentially a collection of key-value pairs, where the values can be various data types, including numbers, strings, booleans, arrays, or even nested documents.
Schema-less Nature
Unlike traditional relational databases that require a predefined schema, RethinkDB's document model is schema-less. This means that each document in the same table can have a different structure. The flexibility of a schema-less model is particularly useful for dealing with unstructured or semi-structured data and allows for rapid development and iteration.
Handling Complex Data Structures
The JSON format enables the storage of complex and nested data structures. This is beneficial for applications that deal with multifaceted data models or require the representation of hierarchical relationships.
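As a small illustration of the points above, the sketch below defines two documents with different shapes, one flat and one with a nested object and an array, that could live in the same table. The `users` table name follows the examples later in this article. The second document's fields and the `insertUsers` helper are hypothetical, and `r` and `conn` are assumed to be the driver module and an open connection supplied by the caller.

```javascript
// Two documents with different structures destined for the same table.
const users = [
  // A flat document.
  { name: 'John Doe', age: 28, email: 'johndoe@example.com' },
  // A document with a nested object and an array, and no age or email.
  {
    name: 'Jane Roe',
    address: { city: 'Jakarta', country: 'ID' },
    roles: ['admin', 'editor'],
  },
];

// Hypothetical helper: insert() accepts an array, so both documents are
// written in one call. `r` and `conn` come from the caller.
function insertUsers(r, conn) {
  return r.table('users').insert(users).run(conn);
}

console.log(JSON.stringify(users, null, 2));
```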
Advantages of Document-Based Model
Flexibility
The schema-less nature provides immense flexibility in data modeling. Developers can easily modify the structure of the data without the need for extensive database refactoring.
Intuitive Data Representation
For developers, particularly those working with JavaScript or other web technologies, JSON is a familiar format, making it more intuitive to work with compared to traditional table-based structures.
Efficient Data Retrieval
Documents encapsulate related data in a single entity. This often results in more efficient data retrieval, as related information is stored together and can be retrieved in a single query.
Scalability
Document databases like RethinkDB can scale out horizontally, which means they can distribute data across multiple servers or nodes. This scalability is well-suited for large-scale applications and big data needs.
Considerations
Data Redundancy
Flexibility has a cost: because related data is often embedded in several documents rather than normalized into separate tables, the same information can end up stored more than once. Keeping those copies consistent requires careful design and sometimes additional logic in the application layer.
Query Complexity
While RethinkDB offers powerful querying capabilities, queries can become complex, especially when dealing with deeply nested documents or performing operations that are traditionally relational, like joins.
Indexing and Performance
Efficient indexing is crucial for performance, particularly for large datasets. RethinkDB allows for the creation of secondary indexes to optimize query performance.
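A minimal sketch of the indexing point, assuming the `users` table from the examples later in this article and an `email` field worth indexing. The function names `ensureEmailIndex` and `findByEmail` are hypothetical. `indexCreate`, `indexWait`, and `getAll` are ReQL commands, and `r` and `conn` are assumed to be the driver module and an open connection supplied by the caller.

```javascript
// Create a secondary index on `email` and wait until it is usable.
// Note: indexCreate fails if the index already exists, so run this once.
async function ensureEmailIndex(r, conn) {
  await r.table('users').indexCreate('email').run(conn);
  await r.table('users').indexWait('email').run(conn);
}

// Look documents up through the index instead of scanning the table.
async function findByEmail(r, conn, email) {
  const cursor = await r.table('users')
    .getAll(email, { index: 'email' })
    .run(conn);
  return cursor.toArray();
}
```

By contrast, a `filter` on `email` would read every document in the table, which is why an index matters as the dataset grows.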
RethinkDB's document-based data model offers a modern, flexible approach to data storage and retrieval. It aligns well with the needs of web applications and services dealing with diverse and complex data sets, offering a more natural and efficient way of handling data compared to traditional relational databases.
ReQL
ReQL is a core component of RethinkDB that stands out for its deep integration with the host programming language. Unlike traditional SQL, which often feels like a separate entity within the code, ReQL queries are written in the same language as the rest of the application, be it JavaScript, Python, or Ruby. This integration offers a seamless experience for developers, allowing them to leverage the familiar syntax and constructs of their chosen language while interacting with the database.
The essence of ReQL lies in its functional and chainable nature. Queries in ReQL are composed by chaining together methods, each performing a specific operation on the data. This approach resembles building a sentence, where each word has a purpose, and together they form a meaningful statement. For instance, a query to retrieve all users over the age of 30 might look like a simple, readable line of code that progressively filters, sorts, and selects data.
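The query described above could be sketched as follows. The `users` table and its `age`, `name`, and `email` fields are assumptions carried over from the examples later in this article, and `buildOver30Query` is a hypothetical name. The function only builds the query; nothing is sent to the server until `.run(conn)` is called on the result.

```javascript
// `r` is the RethinkDB driver module, passed in by the caller.
function buildOver30Query(r) {
  return r.table('users')          // start from the whole table
    .filter(r.row('age').gt(30))   // keep users older than 30
    .orderBy('name')               // sort the matches by name
    .pluck('name', 'email');       // select only these two fields
}
```

Because each method returns a new query object, the steps can be reordered, extended, or stored in a variable and reused before the query is finally run.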
What sets ReQL apart is its adaptability to both synchronous and asynchronous programming environments. In a Node.js application, for instance, queries can be executed asynchronously, returning promises. This feature is particularly advantageous in web development, where non-blocking operations are crucial for performance.
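As a rough sketch of the promise style, the function below uses `async`/`await`. In the official JavaScript driver, `run` and `toArray` return promises when no callback is passed. The function name `findUsersOver30` is hypothetical, and `r` and `conn` are assumed to be the driver module and an open connection supplied by the caller.

```javascript
// Hypothetical helper: fetch all users older than 30 without callbacks.
async function findUsersOver30(r, conn) {
  // Without a callback, run() returns a promise that resolves to a cursor.
  const cursor = await r.table('users')
    .filter(r.row('age').gt(30))
    .run(conn);
  // toArray() likewise returns a promise when called without a callback.
  return cursor.toArray();
}
```

Errors surface as rejected promises in this style, so a caller would wrap the call in `try`/`catch` instead of checking an `err` argument.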
Another notable aspect of ReQL is its capability to handle real-time data. It provides the tools to create real-time feeds, enabling applications to react to database changes instantly. This feature is a cornerstone for applications where timely data updates are crucial, such as in collaborative tools or live dashboards.
When it comes to performance, understanding the intricacies of ReQL's execution is key. Knowing how to efficiently use indexes, for example, can significantly enhance query performance, especially with large datasets. The language's design considers distributed environments, allowing queries to run across multiple nodes in a RethinkDB cluster, thus leveraging the full power of distributed computing.
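Examples
These examples are written in JavaScript, a language commonly used with RethinkDB. They assume that r is the RethinkDB driver module (for example, loaded with require('rethinkdb')) and that conn is an open connection obtained from r.connect(). Let's explore a few scenarios to illustrate the versatility and power of ReQL.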
Inserting Data
// Insert one document; the result reports how many documents were inserted.
r.table('users').insert({
  name: 'John Doe',
  age: 28,
  email: 'johndoe@example.com'
}).run(conn, (err, result) => {
  if (err) throw err;
  console.log(result);
});
Retrieving Data
// Fetch every user older than 30 and collect the cursor into an array.
r.table('users').filter(r.row('age').gt(30)).run(conn, (err, cursor) => {
  if (err) throw err;
  cursor.toArray((err, results) => {
    if (err) throw err;
    console.log(results);
  });
});
Updating Data
// Update a single document, looked up by its primary key.
r.table('users').get('userId123').update({
  email: 'newemail@example.com'
}).run(conn, (err, result) => {
  if (err) throw err;
  console.log(result);
});
Deleting Data
// Delete a single document, looked up by its primary key.
r.table('users').get('userId123').delete().run(conn, (err, result) => {
  if (err) throw err;
  console.log(result);
});
Real-Time Feeds
// Subscribe to every change on the table; each row has old_val and new_val.
r.table('users').changes().run(conn, (err, cursor) => {
  if (err) throw err;
  cursor.each((err, row) => {
    if (err) throw err;
    console.log(row);
  });
});
Install RethinkDB on Kubernetes
RethinkDB can be installed on Kubernetes using the Helm chart template provided by 8grams, available at: https://github.com/8grams/microk8s-helm-chart. Make sure Helm is already installed on your machine before you start.
Download Helm Chart Template
~$ git clone git@github.com:8grams/microk8s-helm-chart.git charts
Create a file values.yaml to override the default Helm values
Install it
~$ helm install rethinkdb ./charts/general -n rethinkdb -f values.yaml --create-namespace
Check Installation
~$ kubectl -n rethinkdb get deployment
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
rethinkdb-general   1/1     1            1           10m
And also check Ingress
~$ kubectl -n rethinkdb get ingress
NAME                HOSTS
rethinkdb-general   rethinkdb.example.com
Looks good! You now have a distributed, real-time database installed on your own Kubernetes cluster. You can access the RethinkDB web UI at https://rethinkdb.example.com