Categories
Analysis

The search for a cloud-native database

Cedrick Lunven (@clunven) and Jeff Carpenter (@jscarp) of K8ssandra discuss the search fora cloud-native database.

The concept of “cloud-native” has come to stand for a collection of best practices for application logic and infrastructure, including databases. However, many of the databases supporting our applications have been around for decades, before the cloud or cloud-native was a thing. The data gravity associated with these legacy solutions has limited our ability to move applications and workloads. As we move to the cloud, how do we evolve our data storage approach? Do we need a cloud-native database? What would it even mean for a database to be cloud-native? Let’s take a look at these questions.

What is Cloud-Native?

It’s helpful to start by defining terms. In unpacking “cloud-native”, let’s start with the word “native”. For individuals, the word may evoke thoughts of your first language, or your country or origin – things that feel natural to you. Or in nature itself, we might consider the native habitats inhabited by wildlife, and how each species is adapted to its environment. We can use this as a basis to understand the meaning of cloud-native.

Here’s how the Cloud Native Computing Foundation (CNCF) defines the term:

“Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds: Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”

This is a rich definition, but it can be a challenge to use this to define what a cloud-native database is, as evidenced by the Database section of the CNCF Landscape Map:

Database section of the CNCF Landscape Map

Databases are just a small portion of a crowded cloud computing landscape.

Look closely, and you’ll notice a wide range of offerings: both traditional relational databases and NoSQL databases, supporting a variety of different data models including key/value, document, and graph. You’ll also find technologies that layer clustering, querying or schema management capabilities on top of existing databases. And this doesn’t even consider related categories in the CNCF landscape such as Streaming and Messaging for data movement, or Cloud Native Storage for persistence.

Which of these databases are cloud-native? Only those that are designed for the cloud, should we include those that can be adapted to work in the cloud? Bill Wilder provides an interesting perspective in his 2012 book, “Cloud Architecture Patterns”, defining “cloud-native” as:

Any application that was architected to take full advantage of cloud platforms”

By this definition, cloud-native databases are those that have been architected to take full advantage of underlying cloud infrastructure. Obvious? Maybe. Contentious? Probably…

Why should I care if my database is cloud-native?

Or to ask a different way, what are the advantages of a cloud-native database? Consider the two main factors driving the popularity of the cloud: cost and time-to-market.

  • Cost – the ability to pay-as-you-go has been vital in increasing cloud adoption. (But that doesn’t mean that cloud is cheap or that cost management is always straightforward.)
  • Time-to-market – the ability to quickly spin up infrastructure to prototype, develop, test, and deliver new applications and features. (But that doesn’t mean that cloud development and operations are easy.)

These goals apply to your database selection, just as they do to any other part of your stack.

What are the characteristics of a cloud-native database?

Now we can revisit the CNCF definition and extract characteristics of a cloud-native database that will help achieve our cost and time-to-market goals:

  • Scalability – the system must be able to add capacity dynamically to absorb additional workload
  • Elasticity – it must also be able to scale back down, so that you only pay for the resources you need
  • Resiliency – the system must survive failures without losing your data
  • Observability – tracking your activity, but also health checking and handling failovers
  • Automation – implementing operations tasks as repeatable logic to reduce the possibility of error. This characteristic is the most difficult to achieve, but is essential to achieve a high delivery tempo at scale

Cloud-native databases are designed to embody these characteristics, which distinguish them from “cloud-ready” databases, that is, those that can be deployed to the cloud with some adaptation.

What’s a good example of a cloud-native database?

Let’s test this definition of a cloud-native database by applying it to Apache Cassandra™ as an example. While the term “cloud-native” was not yet widespread when Cassandra was developed, it bears many of the same architectural influences, since it was inspired by public cloud infrastructure such as Amazon’s Dynamo Paper and Google’s BigTable. Because of this lineage, Cassandra embodies the principles outlined above:

  • Cassandra demonstrates horizontal scalability through adding nodes, and can be scaled down elastically to free resources outside of peak load periods
  • By default, Cassandra is an AP system, that is, it prioritizes availability and partition tolerance over consistency, as described in the CAP theorem. Cassandra’s built in replication, shared-nothing architecture and self-healing features help guarantee resiliency.
  • Cassandra nodes expose logging, metrics, and query tracing, which enable observability
  • Automation is the most challenging aspect for Cassandra, as typical for databases.

While automating the initial deployment of a Cassandra cluster is a relatively simple task, other tasks such as scaling up and down or upgrading can be time-consuming and difficult to automate. After all, even single-node database operations can be challenging, as many a DBA can testify. Fortunately, the K8ssandra project provides best practices for deploying Cassandra on Kubernetes, including major strides forward in automating “day 2” operations.

Does a cloud-native database have to run on Kubernetes?

Speaking of Kubernetes… When we talk about databases in the cloud, we’re really talking about stateful workloads requiring some kind of storage. But in the cloud world, stateful is painful. Data gravity is a real challenge – data may be hard to move due to regulations and laws, and the cost can get quite expensive. This results in a premium on keeping applications close to their data.

The challenges only increase when we begin deploying containerized applications using Kubernetes, since it was not originally designed for stateful workloads. There’s an emerging push toward deploying databases to run on Kubernetes as well, in order to maximize development and operational efficiencies by running the entire stack on a single platform. What additional requirements does Kubernetes put on a cloud-native database?

Containerization

First, the database must run in containers. This may sound obvious, but some work is required. Storage must be externalized, the memory and other computing resources must be tuned appropriately, and the application logs and metrics must be made available to infrastructure for monitoring and log aggregation.

Storage

Next, we need to map the database’s storage needs onto Kubernetes constructs. At a minimum, each database node will make a persistent volume claim that Kubernetes can use to allocate a storage volume with appropriate capacity and I/O characteristics. Databases are typically deployed using Kubernetes Stateful Sets, which help manage the mapping of storage volumes to pods and maintain consistent, predictable, identity.

Automated Operations

Finally, we need tooling to manage and automate database operations, including installation and maintenance. This is typically implemented via the Kubernetes operator pattern. Operators are basically control loops that observe the state of Kubernetes resources and take actions to help achieve a desired state. In this way they are similar to Kubernetes built-in controllers, but with the key difference that they understand domain-specific state and thus help Kubernetes make better decisions.

For example, the K8ssandra project uses cass-operator, which defines a Kubernetes custom resource (CRD) called “CassandraDatacenter” to describe the desired state of each top-level failure domain of a Cassandra cluster. This provides a level of abstraction higher than dealing with Stateful Sets or individual pods.

Kubernetes database operators typically help to answer questions like:

  • What happens during failovers? (pods, disks, networks)
  • What happens when you scale out? (pod rescheduling)
  • How are backups performed?
  • How do we effectively detect and prevent failure?
  • How is software upgraded? (rolling restarts)

Conclusion and what’s next

A cloud-native database is one that is designed with cloud-native principles in mind, including scalability, elasticity, resiliency, observability, and automation. As we’ve seen with Cassandra, automation is often the final milestone to be achieved, but running databases in Kubernetes can actually help us progress toward this goal of automation.

What’s next in the maturation of cloud-native databases? We’d love to hear your input as we continue to invent the future of this technology together.

This blog post originally appeared on K8ssandra and is based on Cedrick’s presentation “Databases in the Cloud-Native Era” from BluePrint London, March 11, 2021 (registration required).

Categories
Tools

What have developers been reading

While novel readers were busy paging through murder mysteries and historical fiction this past spring, developer interests were data and analytics, Jakarta, cloud-native articles, Kubernetes and open source.

That’s what we discovered when we took a look at quarter-over-quarter pageview trends in DZone.com. A little background: 29 million unique readers visit each year. The research is based on article tags assigned by DZone editors and used to help readers search once they’re on our site. They aren’t keywords.

Consuming All Things Data in the Data Category

Our first pass researching tags occurred in the first quarter, when interest in data analytics and tools articles skyrocketed. Same situation for Q2 vs Q1. Our findings did show that readership shifted away from the data scientist tagline toward specific tools and data strategies that anyone can implement.

Q2TopicTags_2

Of the fastest-growing data analysis topics, the data analysis tools tag grew over 30X, and related topics like ingesting data (the collection of data into/out of the database for immediate use) and augmented analytics (machine learning-powered data analysis) grew about 10X more popular with readers.

Terms like ingesting data and augmented analytics speak to the need for more than just a dashboard approach to consuming data. Tim Spann, a Big Data thought leader and field engineer at Cloudera, thinks a consolidation in analytics is underway.

“I think there’s going to be consolidation. And a lot of startups are going to try to integrate a couple of these things together. They’re going to try to add more features and differentiate themselves. You’re going to see more of the data analytics tools try to do ingest and vice versa, (so) they’ll be a more interactive platform,” he explains.

“You’re getting more data, you need to be able to ingest, you need to be able to analyze it, you need to be able to build apps out of them — it’s not just enough to have a static report, or even a dashboard that people look at, people actually have uses for this data.”

From Java EE to Jakarta

After Oracle announced in 2017 that they were handing over Java EE to the Eclipse Foundation, a few changes began to take place. One was that Enterprise Java would now be called Jakarta EE.

In the second quarter of this year, the Eclipse Foundation announced that all Jakarta specifications with “javax.*” must be changed to “jakarta.*” This had the potential to significantly impact, and potentially harm, existing Enterprise Java applications.

It’s no surprise that developers were on DZone searching for the best ways to comply with these changes. We saw a lot of growth in all related topic tags, including Enterprise Java, which grew 10 times more popular, and Jakarta, which grew over 35 times more popular in Q2.

Apps and Cloud-Native Development

Cloud-native development is drastically changing the way we build applications. The term cloud-native refers to a style of container-based development that creates applications from scratch, or refactors older applications, to be fully optimized for the cloud. This is very different from older application development philosophies that retroactively adjusted apps to be cloud-enabled using methods like lift-and-shift or re-platforming.

According to this article on what it means to be cloud-native, the applications contain three major traits. They are container-centric, dynamically managed (i.e., containers are managed and organized by Kubernetes or other similar platforms), and microservices-oriented.

From Q1 to Q2, many aspects of cloud development showed steady growth. However, topics specific to cloud-native development grew exponentially, with the term itself showing over 300% growth last quarter.

Related tags that also showed an increase included:

  • Cloud-based microservices (500%)
  • Cloud-native deployment (140%)
  • Scaling microservices (160%)

Q2TopicTags_4

“The cloud-native ecosystem will see explosive growth as the growing adoption of Kubernetes will translate to a growing need to make it manageable for the enterprise. There are both huge gaps in tooling and many unrealized opportunities in making fleets of microservices more manageable — and we expect to see projects sprout up to handle both the gaps and the opportunities,” said Gwen Shapira, Data Architect at Confluent, in this interview with DZone.

Is Kubernetes the King of Development?

Kubernetes is the largest cloud-native platform designed to manage, scale, and deploy containers. As cloud-native development continues to grow, so does interest in Kubernetes.

“Kubernetes is a game changer. It’s slowly taking over the way the Internet works as far as application development and deployment,” explains Bob Reselman, an industry analyst and technical educator.

“It’s all changing so fast. Every five years, the stack is changing. Because of this, developers are finding themselves in a constant state of adaptivity and looking for the next best tool and ways to manage, scale, and deploy applications.”

Not surprisingly, we found topic tags related to Kubernetes and Kubernetes deployment showing tremendous growth from Q1 to Q2, including:

  • k8s — another name for Kubernetes (7X more popular).
  • Kubeadm — a fast-path command for creating a Kubernetes cluster (151%).
  • Kubelet — a command that runs individual Kubernetes nodes (118%).
  • Kubernetes services — rules and abstractions for Kubernetes pods (85%).
  • GKE — Google Kubernetes Engine (66% growth).

Open Source Topics Remain Popular Developer Interests

Nearly all of the above-mentioned topic tags contain one common theme: Open source.

The open-source topics that saw the most growth include:

  • Open APIs (11X).
  • Open source big data tools (200%).
  • Open source communities (110%).

Additionally, we saw growth in a wide range of open-source tools and platforms — some mentioned above, like Kubernetes, Apache, Jakarta, RSocket, and many more.

As the stack continues to change and evolve, developers will seek out open-source software first. Without question. Kubernetes, data tools like Apache Spark and Kafka— all open source, all dominating the ecosystem and rank high in developer interests.

Q2TopicTags_6

“I believe enterprises will increasingly turn to managed platforms delivering 100% open-source technologies in 2019, as they increasingly seek to avoid the vendor and technological lock-in that remain too common with proprietary open source offerings,” explained Ben Slater, CPO at Instaclustr, in an interview with DZone late last year.

“Given the fact that commercialized open-source technologies can leave enterprises at the mercy of price increases (and make it impossible to run solutions on their own or implement useful modifications), fully open-source technologies offer a compelling alternative.”

“Open-source solutions are empowered by engaged communities that help ensure rapid improvements and bug resolution, better security, full transparency and reliability, and a faster time to market at a lower cost.”

About the author:

Lindsay is a Content Coordinator at Devada. She works closely with contributors to DZone, a website for software developers and IT professionals to learn and share their knowledge. Editing and reviewing submissions to the site, she specializes in content related to Java, IoT, and software security.