The search for a cloud-native database

Cedrick Lunven (@clunven) and Jeff Carpenter (@jscarp) of K8ssandra discuss the search for a cloud-native database.

The concept of “cloud-native” has come to stand for a collection of best practices for application logic and infrastructure, including databases. However, many of the databases supporting our applications have been around for decades, before the cloud or cloud-native was a thing. The data gravity associated with these legacy solutions has limited our ability to move applications and workloads. As we move to the cloud, how do we evolve our data storage approach? Do we need a cloud-native database? What would it even mean for a database to be cloud-native? Let’s take a look at these questions.

What is Cloud-Native?

It’s helpful to start by defining terms. In unpacking “cloud-native”, let’s start with the word “native”. For individuals, the word may evoke thoughts of your first language, or your country of origin – things that feel natural to you. Or in nature itself, we might consider the native habitats of wildlife, and how each species is adapted to its environment. We can use this as a basis to understand the meaning of cloud-native.

Here’s how the Cloud Native Computing Foundation (CNCF) defines the term:

“Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”

This is a rich definition, but it can be a challenge to use this to define what a cloud-native database is, as evidenced by the Database section of the CNCF Landscape Map:

Database section of the CNCF Landscape Map

Databases are just a small portion of a crowded cloud computing landscape.

Look closely, and you’ll notice a wide range of offerings: both traditional relational databases and NoSQL databases, supporting a variety of different data models including key/value, document, and graph. You’ll also find technologies that layer clustering, querying or schema management capabilities on top of existing databases. And this doesn’t even consider related categories in the CNCF landscape such as Streaming and Messaging for data movement, or Cloud Native Storage for persistence.

Which of these databases are cloud-native? Only those that are designed for the cloud, or should we include those that can be adapted to work in the cloud? Bill Wilder provides an interesting perspective in his 2012 book, “Cloud Architecture Patterns”, defining “cloud-native” as:

“Any application that was architected to take full advantage of cloud platforms”

By this definition, cloud-native databases are those that have been architected to take full advantage of underlying cloud infrastructure. Obvious? Maybe. Contentious? Probably…

Why should I care if my database is cloud-native?

Or, to ask it a different way, what are the advantages of a cloud-native database? Consider the two main factors driving the popularity of the cloud: cost and time-to-market.

  • Cost – the ability to pay-as-you-go has been vital in increasing cloud adoption. (But that doesn’t mean that cloud is cheap or that cost management is always straightforward.)
  • Time-to-market – the ability to quickly spin up infrastructure to prototype, develop, test, and deliver new applications and features. (But that doesn’t mean that cloud development and operations are easy.)

These goals apply to your database selection, just as they do to any other part of your stack.

What are the characteristics of a cloud-native database?

Now we can revisit the CNCF definition and extract characteristics of a cloud-native database that will help achieve our cost and time-to-market goals:

  • Scalability – the system must be able to add capacity dynamically to absorb additional workload
  • Elasticity – it must also be able to scale back down, so that you only pay for the resources you need
  • Resiliency – the system must survive failures without losing your data
  • Observability – the system must expose its activity and health, so that problems can be detected and failovers handled
  • Automation – implementing operations tasks as repeatable logic to reduce the possibility of error. This characteristic is the most difficult to achieve, but is essential to achieve a high delivery tempo at scale

Cloud-native databases are designed to embody these characteristics, which distinguish them from “cloud-ready” databases, that is, those that can be deployed to the cloud with some adaptation.

What’s a good example of a cloud-native database?

Let’s test this definition of a cloud-native database by applying it to Apache Cassandra™ as an example. While the term “cloud-native” was not yet widespread when Cassandra was developed, it bears many of the same architectural influences, since it was inspired by public cloud infrastructure as described in Amazon’s Dynamo paper and Google’s Bigtable paper. Because of this lineage, Cassandra embodies the principles outlined above:

  • Cassandra demonstrates horizontal scalability through adding nodes, and can be scaled down elastically to free resources outside of peak load periods
  • By default, Cassandra is an AP system, that is, it prioritizes availability and partition tolerance over consistency, as described in the CAP theorem. Cassandra’s built-in replication, shared-nothing architecture and self-healing features help guarantee resiliency
  • Cassandra nodes expose logging, metrics, and query tracing, which enable observability
  • Automation is the most challenging aspect for Cassandra, as is typical for databases

While automating the initial deployment of a Cassandra cluster is a relatively simple task, other tasks such as scaling up and down or upgrading can be time-consuming and difficult to automate. After all, even single-node database operations can be challenging, as many a DBA can testify. Fortunately, the K8ssandra project provides best practices for deploying Cassandra on Kubernetes, including major strides forward in automating “day 2” operations.
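To make the replication and resiliency point in the list above a little more concrete, here is a minimal Python sketch of the arithmetic behind quorum-based reads and writes. The function names are hypothetical and this is not Cassandra code; it only assumes the usual definition of a quorum as a majority of the replicas for a piece of data.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a majority (quorum) operation."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """Replicas that can be unavailable while the operation can still succeed."""
    return replication_factor - required_acks


def reads_see_latest_write(replication_factor: int, read_acks: int, write_acks: int) -> bool:
    """True when the read set and the write set must share at least one replica."""
    return read_acks + write_acks > replication_factor


rf = 3  # illustrative replication factor
q = quorum(rf)
print(q, tolerated_failures(rf, q), reads_see_latest_write(rf, q, q), reads_see_latest_write(rf, 1, 1))
```

With three replicas, a quorum is two, so one replica can be lost without interrupting quorum operations. Reading and writing with a single acknowledgement keeps the system available through more failures, at the price of possibly reading stale data, which is the trade-off the AP description refers to.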

Does a cloud-native database have to run on Kubernetes?

Speaking of Kubernetes… When we talk about databases in the cloud, we’re really talking about stateful workloads requiring some kind of storage. But in the cloud world, stateful is painful. Data gravity is a real challenge – data may be hard to move due to regulations and laws, and moving it can get quite expensive. This puts a premium on keeping applications close to their data.

The challenges only increase when we begin deploying containerized applications using Kubernetes, since it was not originally designed for stateful workloads. There’s an emerging push toward deploying databases to run on Kubernetes as well, in order to maximize development and operational efficiencies by running the entire stack on a single platform. What additional requirements does Kubernetes put on a cloud-native database?


First, the database must run in containers. This may sound obvious, but some work is required. Storage must be externalized, the memory and other computing resources must be tuned appropriately, and the application logs and metrics must be made available to infrastructure for monitoring and log aggregation.


Next, we need to map the database’s storage needs onto Kubernetes constructs. At a minimum, each database node will make a persistent volume claim that Kubernetes can use to allocate a storage volume with appropriate capacity and I/O characteristics. Databases are typically deployed using Kubernetes StatefulSets, which help manage the mapping of storage volumes to pods and maintain a consistent, predictable identity for each pod.
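As a rough illustration of the storage mapping described above, the following Python sketch builds a StatefulSet manifest with a volume claim template and prints it as JSON, which Kubernetes accepts alongside YAML. The database name, image, mount path, and storage class are placeholders invented for this example, not values from any real deployment.

```python
import json


def statefulset_manifest(name: str, image: str, replicas: int,
                         storage: str, storage_class: str) -> dict:
    """Build a minimal StatefulSet in which each pod gets its own volume claim."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service name used to give each pod a stable network identity.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Mount the per-pod volume; the path is a placeholder.
                        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/data"}],
                    }],
                },
            },
            # One persistent volume claim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


# All argument values below are hypothetical.
manifest = statefulset_manifest("demo-db", "example.com/demo-db:1.0", 3, "10Gi", "fast-ssd")
print(json.dumps(manifest, indent=2))
```

The storage class is where the “appropriate capacity and I/O characteristics” are expressed in practice: the claim names a class, and the cluster decides which kind of volume backs it.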

Automated Operations

Finally, we need tooling to manage and automate database operations, including installation and maintenance. This is typically implemented via the Kubernetes operator pattern. Operators are basically control loops that observe the state of Kubernetes resources and take actions to help achieve a desired state. In this way they are similar to Kubernetes built-in controllers, but with the key difference that they understand domain-specific state and thus help Kubernetes make better decisions.

For example, the K8ssandra project uses cass-operator, which defines a Kubernetes custom resource definition (CRD) called “CassandraDatacenter” to describe the desired state of each top-level failure domain of a Cassandra cluster. This provides a level of abstraction higher than dealing with StatefulSets or individual pods.
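The control-loop idea behind operators can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not how cass-operator is implemented: the `Cluster` class, the `reconcile` function, and the rule of changing one node per step are all hypothetical stand-ins for real observed state and real domain knowledge.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for the observed state of a database cluster."""
    nodes: list = field(default_factory=list)
    next_id: int = 0


def reconcile(desired_size: int, cluster: Cluster) -> str:
    """Compare desired and observed state, then take at most one corrective action."""
    if len(cluster.nodes) < desired_size:
        name = f"node-{cluster.next_id}"
        cluster.next_id += 1
        cluster.nodes.append(name)
        return f"added {name}"
    if len(cluster.nodes) > desired_size:
        return f"removed {cluster.nodes.pop()}"
    return "in sync"


def run_until_stable(desired_size: int, cluster: Cluster, max_steps: int = 20) -> list:
    """Call reconcile repeatedly until observed state matches desired state."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(desired_size, cluster)
        actions.append(action)
        if action == "in sync":
            break
    return actions


cluster = Cluster()
print(run_until_stable(3, cluster))  # initial scale-out to three nodes
cluster.nodes.remove("node-1")       # simulate losing a node
print(run_until_stable(3, cluster))  # the loop replaces the missing node
```

The desired state is only a number here; in a real operator it would come from a custom resource, and each action would be a call to the Kubernetes API or to the database itself.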

Kubernetes database operators typically help to answer questions like:

  • What happens during failovers? (pods, disks, networks)
  • What happens when you scale out? (pod rescheduling)
  • How are backups performed?
  • How do we effectively detect and prevent failure?
  • How is software upgraded? (rolling restarts)

Conclusion and what’s next

A cloud-native database is one that is designed with cloud-native principles in mind, including scalability, elasticity, resiliency, observability, and automation. As we’ve seen with Cassandra, automation is often the final milestone to be achieved, but running databases in Kubernetes can actually help us progress toward this goal of automation.

What’s next in the maturation of cloud-native databases? We’d love to hear your input as we continue to invent the future of this technology together.

This blog post originally appeared on K8ssandra and is based on Cedrick’s presentation “Databases in the Cloud-Native Era” from BluePrint London, March 11, 2021 (registration required).


7 DevOps books to read in 2021

If you are looking to learn more about Ansible, Azure, Docker, Terraform, Kubernetes, and their roles in DevOps, then this blog post is for you. We continue our series of must-read books with 7 DevOps books to read in 2021, as recommended by our friends at Packt.

Azure DevOps Explained

Get started with Azure DevOps and develop your DevOps practices

What reviews say:

“In my opinion, it is definitely one of the greatest books I ever read for DevOps.
Although I am Azure DevOps certified, I really enjoy reading this book and it gives me an extra overview of what I have learned.
It is well structured and the fact that is simple to read and follow along makes it more attractive.”

Terraform Cookbook

Efficiently define, launch, and manage Infrastructure as Code across various cloud platforms

What reviews say:

“I had the chance to read this book and I was really pleased by its content.
Noting that this is not the first book or terraform material that I read, I would say that this book contains valuable structured information with also access to code used in various chapters.
It is certainly an asset for those starting their journey with terraform.”

Practical Ansible 2

Automate infrastructure, manage configuration, and deploy applications with Ansible 2.9

What reviews say:

“This book is probably perfect for someone with reasonable experience. It was what I needed as a second book to get a good look at the ecosystem and a second opinion of how to use it.”

Kubernetes – A Complete DevOps Cookbook

Build and manage your applications, orchestrate containers, and deploy cloud-native services

What reviews say:

“Great coverage of common Kubernetes and DevOps tools. I’ve learned about some of the tools I haven’t used before like Jenkins X, GitLab, Fossa, Trivy, Litmus Chaos etc.
Although some of the long YAML files are provided in the GitHub repository I got the digital version, makes it easier to copy paste.”

Kubernetes and Docker – An Enterprise Guide

Effectively containerize applications, integrate enterprise systems, and scale applications in your enterprise

What reviews say:

“If you have worked on Kubernetes at all, you have experienced the frustration of trying to go beyond a cluster that has a single config file and a simple layer 7 load-balancer using NGINX. This book does truly target not only the enterprise user, but any person that wants to learn topics that make Kubernetes a complete offering.

I have been looking into the external-dns project on my list for a few months, but I never got around to doing much – Much to my surprise, when I was reading the topics covered in the book, it mentioned Services and external-dns. Chapter 6, alone, to me is one reason to buy the book since it explained and showed me how to install Metallb with external-dns in easy to understand terms and hands-on configuration.”

Learning DevOps

The complete guide to accelerate collaboration with Jenkins, Kubernetes, Terraform and Azure DevOps

“I would suggest reading through each section before you work along with the steps. There’s lots of references to other resources that are not necessarily part of the topics being discussed.”

Docker for Developers

Develop and run your application with Docker containers using DevOps tools for continuous delivery

“When reading articles, tutorials and even books, that is very common that at the end of the reading you struggle about how to translate that to a real production situation. Believe me, this book is different. You get to the end with a sense that you are very likely to know what are the next steps to apply what you learned to your existent or new projects. And this means a lot. The book has some great balance from history, concepts, example and practice.”

What books have helped you deepen your knowledge of DevOps? Do share in the comments. Looking for more books to read? We have also shared recommended Backend and Frontend books.


Eight must-read books for developers in 2021

What are the top books on your reading list this season? Whether you’re learning a new skill or adding depth to your existing knowledge in a particular development area, it’s always a good idea to get a few more recommendations to your list. We’ve teamed up with Packt to help you discover eight must-read books that you need to add to your collection in 2021.

All Packt eBooks and Videos are $5! A key part of Packt’s mission is to help developers put software to work in new ways, and this year’s $5 campaign is meant to unlock new opportunities for them.

Cloud and Admin

Azure DevOps Explained

Implement real-world DevOps and cloud deployment scenarios using Azure Repos, Azure Pipelines, and other Azure DevOps tools.

What reviews say:

“The book is very carefully walking the reader through everything you need to know to become an Azure DevOps expert. I use DevOps all the time to build and manage Business Central AL development and found the book very useful.”

Kubernetes and Docker – An Enterprise Guide

Apply Kubernetes beyond the basics of Kubernetes clusters by implementing IAM using OIDC and Active Directory, Layer 4 load balancing using MetalLB, advanced service integration, security, auditing, and CI/CD.

What reviews say:

“This book covers most of the topics when an enterprise would like to adopt Kubernetes. What’s more, you hardly can find coverage on these topics in the market!”

Coding and tools

Learning C# by Developing Games with Unity 2020

Get to grips with coding in C# and build simple 3D games with Unity from the ground up with this updated fifth edition of the bestselling guide.

What reviews say:

“If you’re serious about learning to build games in Unity your progress will be advanced rapidly if you first have a solid foundation of understanding of C#. This book explains the necessary information to start understanding and using C# to develop games in Unity. After reading this you’ll have enough context to begin tearing down other people’s code and repurposing it to build your own functionalities for your game.”

iOS 14 Programming for Beginners

Learn iOS app development and work with the latest Apple development tools. Explore the latest features of Xcode 12 and the Swift 5.3 programming language in this updated fifth edition.

What reviews say:

“The author does a good job to capture an effective, quick, and breezy reading/learning/code-along experience. The explanations are concise and easy to follow, although I would imagine a complete newbie to programming entirely might ask a lot of questions in the earlier chapters.”


Learn Amazon SageMaker

Quickly build and deploy machine learning models without managing infrastructure, and improve productivity using Amazon SageMaker’s capabilities such as Amazon SageMaker Studio, Autopilot, Experiments, Debugger, and Model Monitor.

What reviews say:

“This is a comprehensive book for a data scientist looking to use the AWS ecosystem for machine learning with a focus on Sagemaker. I like the way it is organized which is practical and matches a typical life-cycle of a project.”

Data Engineering with Python 

Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects.

What reviews say:

“Data Engineering With Python provides a solid overview of pipelining and database connections for those tasked with processing both batch and stream data flows. Not only for the data miners, this book will be useful as well in a CI/CD environment using Kafka and Spark. It’s very readable and contains lots of practical, illustrative examples.”


40 Algorithms Every Programmer Should Know: Hone your problem

Learn algorithms for solving classic computer science problems with this concise guide covering everything from fundamental algorithms, such as sorting and searching, to modern algorithms used in machine learning and cryptography.

What reviews say:

“Who the book is aimed at: if you self-identify as a data scientist, serious algorithms specialist, or even the quant type, then you won’t be disappointed! If you’re just starting in the field, the author has done the hard work of selecting some of the commonly used techniques & algorithms in the field today.”

Learn Quantum Computing with Python and IBM Quantum Experience

A step-by-step guide to learning the implementation and associated methodologies in quantum computing with the help of the IBM Quantum Experience, Qiskit, and Python that will have you up and running and productive in no time.

What reviews say:

“I really like this book. It takes a step-by-step approach to introduce the reader to the IBM Q Experience, to the basics underlying quantum computing, and to the reality of the noise involved in the current machines. This introduction is technical and shows the user how to use the IBM system either directly through the GUI on their website or by running Python code on one’s own machine.”

Have you read any of these already? Leave your impressions in the comments and don’t forget to share the list with other developers in your circle!
