Kubernetes Overview, News and Trends | The New Stack
https://thenewstack.io/kubernetes/

A Microservices Outcome: Testing Boomed
https://thenewstack.io/a-microservices-outcome-testing-boomed/ | Fri, 15 Sep 2023


A microservices outcome of the past five to ten years: testing boomed.

It boomed as more people just needed to test microservices. Microservices and the rise of Kubernetes reflected the shift from large application architectures to approaches that broke services into little pieces, said Bruno Lopes of Kubeshop.

Kubeshop is an incubator for Kubernetes companies, Lopes said. It has created six different projects in the Kubernetes ecosystem. Lopes is the product leader of the company’s Kubernetes-native testing framework, Testkube.

The ability to test more easily makes testing accessible to everybody, and a better developer experience makes people more comfortable with it. Automation also improves product quality, especially as people gain more time to differentiate their product rather than perform manual tasks.

Teams use Kubernetes; they develop applications there but then don’t test the applications where they live, Lopes said. They have the old ways of testing, but they also want to push out new features. Developers move fast — often faster than the organization can change its methodologies. Modern testing methods get adopted, but it takes time for the organization to adapt.

Lopes offered two rules. First, no one should ship anything to production that has not been tested. Second, a company should establish an environment resembling production where it can run all of its tests and deploy applications. The environments are never 100% identical, but they can come close to production.

“And make it very fast,” Lopes said. “You shouldn’t make your development team wait for manual QA to make sure everything is all right before deploying. It should deploy as fast as you can. You should deploy without waiting for manual tests.”

Take the SRE team, for example. They need to respond fast to issues. They want fast debugging. The more time they spend looking for the problem, the more downtime their customers experience.

Sometimes, especially in critical systems, the applications cannot be exposed to the internet, Lopes said. That means it becomes essential to run the tests inside Kubernetes itself. That is something that will take companies time to understand, but it should accelerate as the developer experience improves.
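
As a minimal sketch of what running tests inside the cluster can look like (without assuming Testkube’s own resource model), a plain Kubernetes Job can execute a test suite next to the services it targets; the image, command, namespace and service names below are placeholders.

```yaml
# Hypothetical example: run an API test suite inside the cluster as a Job.
# Image, command and namespace are placeholders, not Testkube resources.
apiVersion: batch/v1
kind: Job
metadata:
  name: api-smoke-tests
  namespace: staging
spec:
  backoffLimit: 0            # fail fast; do not retry a failing test run
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tests
          image: registry.example.com/acme/api-tests:1.4.2   # placeholder image
          command: ["npm", "run", "test:integration"]
          env:
            - name: TARGET_URL
              # the service under test is reachable by its in-cluster DNS name
              value: http://orders.staging.svc.cluster.local:8080
```

Because the Job runs in the same cluster and namespace as the workloads it exercises, it can reach services that are never exposed to the internet.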

Edge AI: How to Make the Magic Happen with Kubernetes
https://thenewstack.io/edge-ai-how-to-make-the-magic-happen-with-kubernetes/ | Thu, 14 Sep 2023


Picture this: You walk into a Macy’s store, taking a break from the virtual world to indulge in the tactile experience of brick-and-mortar shopping.

As you enter, a tall and broad-shouldered android named Friday greets you warmly.

“Welcome back, Lucy. How are you doing today? Did you like the pink sweater you bought last week?”

Taken aback by the personal touch, you reply, “I’m doing well, thank you. I’m actually looking for a blouse to go with that sweater.”

Friday engages you in conversation, subtly inquiring about your preferences, until finally, the smart mirror built into its torso lights up. It scans your body and overlays selected clothing options on your avatar. Impressed?

Now, with a nod of your head, one of Friday’s minions retrieves your chosen attire and some matching accessories, and leads you to the next available dressing room.

Sci-fi has been dreaming (not always positively) about this kind of thing for decades — can you believe that Minority Report came out more than 20 years ago?

But at last the future is here. This is the evolving reality in retail, healthcare, agriculture, and a wide range of other sectors, thanks to the rapid maturation of artificial intelligence (AI). From self-driving cars and other autonomous vehicles, to fruit-picking robots and AI medical imaging diagnoses, AI is now truly everywhere.

AI: The Magic Behind the Curtain

As enchanting as the customer experience in our Macy’s scenario seems, the technological orchestration behind it is no less impressive.

When we say “AI” we are probably talking about the seamless integration of so many different technologies:

  • Text-to-speech (TTS), to convert Friday’s script and product names into spoken audio.
  • Speech-to-text (STT), to recognize your responses and store them.
  • Object detection, to recognize apparel and customers.
  • Natural language processing, to extract meaning from your spoken responses.
  • Image generation to create sample outfits from prompts.

And of course behind it all, an up-to-date, vectorized database of available store merchandise and customer records.

When I said that AI has been maturing rapidly, I wasn’t kidding. These complex capabilities are now well within reach of every business, including yours, thanks to a blooming landscape of open source technologies and simple SaaS products.

I recently built a demo mimicking this interactive personal shopping experience. It’s not quite Friday, and it’s not in an android body — but it is in your browser for you to try out right here.

By the way: as someone who doesn’t code as much as I used to, I was still able to implement this entire stack in less than a day, and most of the time was wrestling with CSS to make rounded corners! The actual TTS, STT, and LLM portions were relatively easy, using the amazing OpenAI APIs.

Of course, in the real world, I probably wouldn’t want to send my corporate data into the bowels of OpenAI, which is why I strongly urge you to check out our open source LocalAI project, which enables you to run the whole lot on-prem.
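
As a rough sketch of what self-hosting an OpenAI-compatible model server could look like on Kubernetes, the manifest below deploys LocalAI behind a Service. The image tag, default port, model volume and resource sizes are assumptions to verify against the LocalAI documentation rather than official recommendations.

```yaml
# Hypothetical on-prem LocalAI deployment; tags, paths and resources
# are placeholders to adapt to your cluster and models.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: localai
  namespace: ai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: localai
  template:
    metadata:
      labels:
        app: localai
    spec:
      containers:
        - name: localai
          image: localai/localai:latest        # placeholder tag
          ports:
            - containerPort: 8080              # assumed default HTTP port
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
          volumeMounts:
            - name: models
              mountPath: /models               # assumed model directory
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: localai-models          # pre-provisioned PVC with model files
---
apiVersion: v1
kind: Service
metadata:
  name: localai
  namespace: ai
spec:
  selector:
    app: localai
  ports:
    - port: 80
      targetPort: 8080
```

Applications inside the cluster can then call the same OpenAI-style endpoints without any data leaving your infrastructure.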

AI’s True Home Is on the Edge

Today, training deep learning models happens at huge expense in clouds and data centers, because it’s so computationally intensive. Many of the core processing tasks and services we talk about above run on cloud services, where they can easily scale and be managed.

But for actual business-critical inferencing work in the real world, retail assistants and other AI workloads probably shouldn’t live in the cloud. They need to live at the edge. In fact, we believe the natural home for most AI workloads will be running at the edge of the network.

Why? Distributed computing puts processing power and compute resources closer to the user and the business process. It sidesteps several key challenges:

  • Connectivity: AI workloads can involve a huge amount of data. Yet, particularly in rural or industrial use cases, internet connections at edge locations are often intermittent or slow, presenting a major bottleneck. And 5G rollouts won’t fix that any time soon. If you process the data on the device, you don’t need to connect to the internet.
  • Latency: Even if you have connectivity to the DC or cloud, latency becomes a factor. A few hundred milliseconds might not sound like much, but in a real-time interaction, it’s an eternity. Running AI workloads at the edge enables real-time experiences with almost instantaneous response times.
  • Cloud costs: Hyperscalers charge to ingest your data, move it between availability zones, and extract it again. Across millions of AI interactions, these costs add up.
  • Data privacy: Many AI workloads will gather sensitive, regulated data. Do you really want your body measurements and shopping history floating around in the cloud? With edge computing, your personal sensitive data is processed locally right there on the edge server and that’s where it can stay, if compliance demands it.

But Edge Introduces Its Own Challenges…

Anyone who has tried deploying edge computing infrastructure at scale, whether on Kubernetes or another platform, will tell you that it’s hard work.

You’ll be confronted with challenges around deploying and onboarding hardware, on an ongoing basis. How can you get your Friday android booted and active when there’s no IT expert in the store to ninja the CLI?

You have to address security, when devices may be vulnerable to physical tampering. At minimum you need to consider encryption and a means of verifying device trust, at boot and beyond.

And you need to manage hardware and software stacks at scale, from monitoring to patching. This itself is a major challenge when considering the hardware limitations of small form-factor edge devices, and the intermittent connectivity between the device and management infrastructure back at HQ. You need an architecture with zero-risk updates and the ability to run effectively in an air-gap environment.

…and AI Amplifies the Challenges of the Edge

Adding AI into edge environments introduces further layers of complexity. It’s not just infrastructure you need to manage now, but also:

Models: Your data scientists have an overwhelming number of ready-made datasets and models to choose from on popular repositories like Hugging Face. How can you help them quickly try out and deploy these models — and then keep them up to date on a daily or weekly basis?

AI engines: Engines such as Seldon, BentoML and KServe need constant maintenance, updates, and tuning for optimal performance. Updating these across many locations becomes tedious and error-prone.

AI models process incoming data in real time, turning raw inputs like voice commands or sensor readings into actionable insights or personalized interactions. AI engines such as Seldon, BentoML and KServe run those AI models. Think of it like this: AI models are workloads, and the AI engine is the runtime in which these models are executed.

Solving the Challenges of Edge for Your AI Workloads

This is the problem space we’ve been attacking as we build Palette EdgeAI, announced today.

Palette EdgeAI helps you deploy and manage the complete stack, from the edge OS and Kubernetes infrastructure to the models powering your innovative AI apps.

Without diving too deep into the feature list, EdgeAI enables you to:

  • Deploy and manage your edge AI stacks to edge locations at scale, from easy hardware onboarding options to repeatable ‘blueprints’ that include your chosen AI engine.
  • Update your edge models frequently without risk of downtime, with easy repo integration and safe, version-controlled updates and rollbacks.
  • Secure critical intellectual property and sensitive data, with a complete security architecture from silicon to app, including immutability, secure boot, SBOM scans and air-gap mode.

The integration of AI and edge computing is not just an intriguing possibility; it’s a necessity for the next leap in technology and user experience. As we stand on this exciting frontier, one thing is clear: the future of shopping, healthcare, business and many other aspects of life will be smarter, faster, and more personalized than ever.

There are edge AI use cases coming for your industry, too. The growth forecasts are stratospheric: The global Edge AI software market is set to grow from $590 million in 2020 to $1.83 billion by 2026, according to MarketsandMarkets Research.

Ready to be a part of this future? Of course, you are. But the benefits of AI will only be in your grasp if you can tackle the challenges of the edge. That’s where we can help. So why not take a look at Palette EdgeAI and let us know what you think?

WebAssembly Reaches a Cloud Native Milestone
https://thenewstack.io/webassembly-reaches-a-cloud-native-milestone/ | Mon, 11 Sep 2023


The CNCF WebAssembly Landscape Report published last week offered an overview of the status of WebAssembly (Wasm) as a technology and its current adoption. As WebAssembly’s growth and adoption continue, the report provides a good summary of the WebAssembly players, tools and usage, how the technology works, and how it overlaps with cloud native environments.

The report also marks an unofficial turning point for WebAssembly, as measured by adoption alone: the Wasm landscape has rapidly expanded beyond its original use in the web browser to now span 11 categories and 120 projects or products, worth an estimated $59.4 billion.

It will be a long road before WebAssembly sees its full potential. But in principle, Wasm is designed as a way to deploy code written in any language into a secure sandbox anywhere, on any device with a supported CPU instruction set, through a single module. The technology is not there yet, of course, but a number of developments were discussed and demonstrated at WasmCon 2023 last week — which represents an additional milestone as the first stand-alone Linux Foundation Wasm event outside the umbrella of KubeCon + CloudNativeCon.

In many ways, the Wasm landscape is similar to the early days of Kubernetes’ then-burgeoning development and adoption a few years ago. While discussing the report and WebAssembly’s status in the cloud native landscape during a WasmCon keynote, CNCF CTO Chris Aniszczyk said he sees Wasm at a stage similar to the early cloud native and container days.

“Remember back in the day there was a lot of innovation happening in container and cloud native space: there were multiple runtimes, multiple specs, everyone kind of fighting for mindshare,” Aniszczyk said. “I feel like something similar is happening in the Wasm state and that’s kind of where we currently are. … A lot of the adoption and innovation are happening among the early adopters and will naturally progress.”

While Aniszczyk insisted that Wasm is still in its early stages of development and “a lot of the early stuff is still brewing,” he noted how the CNCF has been an early adopter of the technology. “A lot of our projects have used WebAssembly.”

Indeed, Wasm is expected to play a large role as an ultralight way to deploy sandboxed applications to endpoints in cloud native environments. Wasm also has its niche uses beyond the container sphere. “WebAssembly complements and piggybacks on the existing Kubernetes ecosystem, opening up many new opportunities,” Daniel Lopez Ridruejo, founder and former CEO of Bitnami (now part of VMware), told The New Stack. “WebAssembly can run on microcontrollers and IoT devices in a way that Kubernetes never could, as there are many devices where you cannot even use a container. So, the momentum is building with many different industry players coming together to build a platform for it.”
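
One concrete way Wasm piggybacks on Kubernetes is through a RuntimeClass that routes pods to a containerd Wasm shim instead of a regular container runtime. The sketch below is hypothetical: the handler name and image are assumptions that depend on which shim (for example, a Spin or WasmEdge shim) is installed on your nodes.

```yaml
# Hypothetical setup: schedule a Wasm workload via a RuntimeClass.
# The handler name and image are placeholders that depend on the
# containerd Wasm shim installed on the nodes.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasm
handler: spin                      # assumed shim handler name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wasm-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: wasm-hello
  template:
    metadata:
      labels:
        app: wasm-hello
    spec:
      runtimeClassName: wasm       # route this pod to the Wasm runtime
      containers:
        - name: hello
          image: registry.example.com/demo/wasm-hello:0.1.0   # OCI image wrapping a Wasm module
```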

Among application frameworks alone, the CNCF covers Spin, WasmCloud (CNCF sandbox), SpiderLightning, WasmEdge plug-ins, Dapr SDK for WasmEdge, Homestar, Ambient, WASIX, Extism, Timecraft, vscode-wasm, and WasmEx.

The CNCF’s coverage now extends to many more areas, including runtimes, plug-ins and other uses with and for AI, edge devices, web and mobile deployments, and a number of other applications.

The State of Wasm 2023 report was also released at WasmCon. The survey of 255 WebAssembly users was conducted by SlashData in collaboration with the CNCF. Key findings included:

  • While Wasm is still primarily used to develop web applications (58%), its use is expanding beyond this original use case into new areas like data visualization (35%), Internet of Things (32%), artificial intelligence (30%), backend services (excluding serverless) (27%), and edge computing (25%).
  • The most significant benefits attracting developers to Wasm are faster loading times (23%), opportunities to explore new use cases and technologies (22%), and sharing code between projects (20%).
  • The top challenges faced by Wasm users were difficulties with debugging and troubleshooting (19%), as well as performance differences between runtimes and a lack of consistent developer experience across them (both at 15%). At the same time, 17% of respondents did not face any challenges.

Kubernetes Building Blocks: Nodes, Pods, Clusters
https://thenewstack.io/kubernetes-building-blocks-nodes-pods-clusters/ | Mon, 11 Sep 2023


Understanding the distinctions between nodes, pods, and clusters is crucial for effectively working with Kubernetes. It enables efficient utilization of Kubernetes capabilities and empowers organizations to leverage its benefits for managing containerized applications.

By comprehending the roles and relationships of these components, developers and operators can make informed decisions when designing, deploying, and managing applications on Kubernetes.

These three components are fundamental to the architecture of Kubernetes and play different roles in managing containerized applications.

We will delve into the specific characteristics and purposes of each component:

  • Kubernetes Nodes. This section will explain what nodes are and their significance within the Kubernetes ecosystem. It will cover their definition, their role as the underlying infrastructure for running containers, and the key hardware and OS requirements.
  • Kubernetes Pods. This section will focus on pods, which are the basic units of deployment in Kubernetes. It will clarify the significance of pods, emphasizing their encapsulation of one or more containers and the benefits of grouping containers within a pod. We will also cover topics such as pod lifecycle management, scaling, and communication and networking within a pod.
  • Kubernetes Clusters. The cluster is the highest-level component in Kubernetes. This section will highlight the definition and importance of clusters, explaining their composition, which includes nodes, the control plane, and etcd. It will discuss the role of clusters in achieving high availability and fault tolerance, as well as their ability to scale and distribute workloads effectively through load balancing.

Understanding Core Components in Kubernetes Architecture

Kubernetes architecture consists of several core components that work together to enable the deployment, scaling, and management of containerized applications. Understanding these core components is crucial for effectively working with Kubernetes. Here are the key components in Kubernetes architecture:

Control Plane

  • API Server. Serves as the central management point and exposes the Kubernetes API. All interactions with the cluster are made through the API server.
  • Scheduler. Responsible for assigning pods to nodes based on resource requirements, constraints, and policies.
  • Controller Manager. Manages various controllers that handle tasks such as node and pod lifecycle, replication, and monitoring.
  • etcd. A distributed key-value store that holds the cluster’s configuration and state information, ensuring consistency and high availability.

Nodes

  • Node (also known as a worker or minion). A physical or virtual machine that runs containers and forms the underlying infrastructure of the cluster.
  • Kubelet. The primary agent running on each node, responsible for communication between the control plane and the node. It manages containers, ensures they are running as expected, and reports their status to the control plane.
  • Container Runtime. The software responsible for running containers on each node, such as Docker, containerd, or CRI-O.

Pods

  • Pod. The smallest deployable unit in Kubernetes. It represents one or more containers that are scheduled and run together on the same node. Containers within a pod share the same network namespace and can communicate with each other using localhost.
  • Shared Resources. Pods share certain resources, such as IP address and storage volumes, making it easier for containers within a pod to interact and share data.

Networking

  • Service. An abstraction that defines a logical set of pods and a policy for accessing them. Services provide a stable network endpoint for connecting to the pods, even as they are dynamically created or terminated (a minimal manifest follows this list).
  • Ingress. Manages incoming network traffic and routes it to services within the cluster based on specified rules. It acts as a reverse proxy and load balancer for external access to services.
  • CNI (Container Networking Interface). A specification that defines how networking is configured for containers. Various CNI plugins are available to implement networking solutions in Kubernetes.
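
To make the Service abstraction concrete, here is a minimal sketch; the label selector and port numbers are placeholders for whatever your pods actually expose.

```yaml
# Minimal Service sketch: gives the pods labeled app=web a stable
# virtual IP and DNS name, regardless of individual pod churn.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # matches the pods this Service fronts
  ports:
    - port: 80          # port clients connect to
      targetPort: 8080  # port the containers listen on
```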

Volumes

  • Volume. An abstraction that provides a way to store data in a pod. Volumes can be mounted by one or more containers within a pod, allowing data to persist even when containers are terminated or rescheduled (a persistent storage example follows).
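
As a sketch of durable pod storage, the example below requests a PersistentVolumeClaim and mounts it into a pod; the storage size, class and image are placeholders that depend on your cluster’s provisioner.

```yaml
# Hypothetical persistent storage: the claim outlives pod restarts and
# rescheduling, so data written to /var/lib/data does too.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi              # placeholder size
  # storageClassName: standard   # adjust for your provisioner
---
apiVersion: v1
kind: Pod
metadata:
  name: data-writer
spec:
  containers:
    - name: app
      image: registry.example.com/acme/app:1.0   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc
```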

Understanding the roles and interactions of these core components is essential for effectively deploying, managing, and scaling applications on Kubernetes. It provides a foundation for harnessing the full power and capabilities of the Kubernetes platform.

Kubernetes Nodes

Nodes possess specific characteristics that determine their suitability for hosting containers within a Kubernetes cluster. These characteristics include:

  • Computing Resources. Nodes are required to have sufficient CPU and memory resources to accommodate the containers running on them. The available resources on a node contribute to the overall capacity of the cluster.
  • Storage. Nodes need storage capabilities for persisting data and managing volumes used by containers. This can include local disk storage, network-attached storage (NAS), or cloud-based storage solutions.
  • Networking. Nodes should be equipped with network connectivity to allow communication between containers running on different nodes, as well as with external networks and services. Networking capabilities enable containers within the cluster to interact and facilitate seamless service discovery and communication.
  • Compatible Operating Systems. Kubernetes supports multiple operating systems, including Linux, Windows, and others. Nodes must have a compatible operating system to ensure compatibility with the container runtime and other Kubernetes components.

Node’s Role in Hosting and Executing Pods

Nodes provide the execution environment for pods within a Kubernetes cluster. Pods are scheduled onto nodes based on resource requirements, constraints, and other factors determined by the cluster’s scheduler. When a pod is scheduled to a node, the node allocates the necessary resources to accommodate the pod’s containers.
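
The scheduler’s placement decision is driven largely by the resource requests a pod declares, as in this minimal sketch; the image name and numbers are placeholders.

```yaml
# The scheduler only places this pod on a node with at least the
# requested CPU and memory still unallocated; limits cap actual usage.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: registry.example.com/acme/api:2.3   # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: "1"
          memory: 512Mi
```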

Nodes manage the lifecycle of pods hosted on them, ensuring that containers within the pods are running as expected. The Kubernetes control plane communicates with the nodes’ kubelets, which are agents running on each node, to monitor the health and status of pods and containers.

If a node fails or becomes unavailable, the control plane reschedules the affected pods onto other available nodes, ensuring high availability and fault tolerance.

In essence, nodes serve as the foundation that supports the execution and operation of pods in Kubernetes.

Kubernetes Pods

The significance of pods lies in their role as the atomic unit for scheduling and scaling in Kubernetes. Instead of scheduling individual containers, Kubernetes schedules and manages pods. Pods provide a higher level of abstraction, enabling easier management, scaling, and coordination of containers within the cluster.

Encapsulation of one or more Containers within a Pod

A pod encapsulates one or more containers and provides a shared execution environment for them. Containers within a pod are co-located and share the same network and storage namespaces. They can communicate with each other using localhost, making it simple for containers within a pod to interact and coordinate their activities.

The encapsulation of containers within a pod allows them to share resources, such as CPU and memory, and simplifies the management and deployment of related containers. Containers within a pod can also mount shared volumes, enabling them to access and share persistent data.
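
As a minimal sketch of this pattern, the pod below runs two containers that share a volume and the same network namespace; the images and paths are placeholders, and an emptyDir is used here for simplicity (a PersistentVolumeClaim could be substituted for durable data).

```yaml
# Two co-located containers share a scratch volume: the app writes logs,
# a sidecar ships them. Both also share the pod's network namespace.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-log-shipper
spec:
  volumes:
    - name: shared-logs
      emptyDir: {}               # lives as long as the pod does
  containers:
    - name: web
      image: registry.example.com/acme/web:1.0      # placeholder image
      volumeMounts:
        - name: shared-logs
          mountPath: /var/log/app
    - name: log-shipper
      image: registry.example.com/acme/shipper:1.0  # placeholder image
      volumeMounts:
        - name: shared-logs
          mountPath: /logs
          readOnly: true
```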

Lifecycle Management and Scaling of Pods

Pods have their own lifecycle within the Kubernetes cluster. The Kubernetes control plane is responsible for managing the creation, termination, and updates of pods based on the desired state defined in the deployment configurations.

Pods can be created, deleted, or updated using declarative configuration files. Kubernetes ensures that the desired number of replicas of a pod is maintained based on the specified configurations. If scaling is required, Kubernetes can horizontally scale the pods by replicating them across multiple nodes.
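
In practice, that desired state is usually expressed with a Deployment rather than individual pods. A minimal sketch, with a placeholder image and replica count:

```yaml
# Declarative desired state: Kubernetes keeps three replicas of this pod
# template running and replaces any that fail or are evicted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/acme/web:1.0   # placeholder image
          ports:
            - containerPort: 8080
```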

Communication and Networking within a Pod

Containers within a pod share the same network namespace, allowing them to communicate with each other using localhost. They can use standard inter-process communication mechanisms, such as TCP/IP or Unix sockets, to exchange data.

Each pod is assigned a unique IP address within the cluster, known as the pod IP address. Containers within the pod can communicate with each other using this shared IP address. Additionally, containers within a pod share the same port space, meaning they can communicate over common ports without conflict.

This communication and networking model within a pod enables containers to collaborate and work together as a cohesive unit, making it easier to build and manage complex, multi-container applications within the Kubernetes ecosystem.

Kubernetes Cluster

In Kubernetes, a cluster refers to a group of nodes that work together as a single unit to run containerized applications. It is a fundamental concept in Kubernetes architecture, providing the foundation for managing and orchestrating applications at scale.

The importance of a cluster in Kubernetes lies in its ability to provide high availability, fault tolerance, and load balancing for applications. By distributing workloads across multiple nodes, a cluster ensures that applications remain accessible and responsive even if individual nodes or components fail. Clusters enable organizations to build resilient and scalable environments for running containerized applications, accommodating varying levels of demand and traffic.

Kubernetes Cluster Key Components

  1. Nodes. Nodes form the worker machines within the cluster. They host and execute pods, which encapsulate containers. Nodes provide the necessary computing resources, storage, and networking capabilities for running containers. They are the primary infrastructure on which the cluster operates.
  2. Control Plane. The control plane manages the cluster as a whole, making scheduling decisions and continually reconciling the cluster’s actual state with the desired state.
  3. Etcd. A distributed and consistent key-value store that serves as the cluster’s database. It stores critical cluster information, such as configuration, state, and metadata. etcd is highly reliable and resilient, ensuring that the cluster can maintain consistency and recover from failures.

High Availability and Fault Tolerance Considerations in Cluster Design

When designing a Kubernetes cluster, ensuring high availability and fault tolerance is crucial. Some considerations include:

  • Replicating Control Plane Components

To ensure the availability of the control plane, key components such as the API server, scheduler, and controller manager are often replicated across multiple nodes. Replication provides redundancy and fault tolerance, allowing the cluster to continue operating even if some control plane components become unavailable.

  • Distributing Pods Across Multiple Nodes

Kubernetes schedules and distributes pods across multiple nodes to avoid a single point of failure. By spreading pods across different nodes, the cluster can tolerate node failures without disrupting the availability of the applications.

  • Scaling Nodes and Pods

Kubernetes enables scaling of nodes and pods to handle increased workloads. Nodes can be added or removed dynamically to accommodate resource demands. Pods can also be scaled horizontally by replicating them across multiple nodes, allowing applications to handle higher traffic and workloads.

  • Load Balancing Traffic

Kubernetes provides built-in load-balancing mechanisms to distribute traffic across nodes in a cluster. Load balancers can be configured to evenly distribute incoming requests to multiple instances of an application, ensuring optimal utilization of resources and improved application performance.
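
As a minimal sketch, a Service of type LoadBalancer (on clouds that support it) provisions an external load balancer and spreads traffic across all matching pods; the name, labels and ports are placeholders.

```yaml
# On a cloud provider, this provisions an external load balancer that
# distributes incoming traffic across every ready pod labeled app=web.
apiVersion: v1
kind: Service
metadata:
  name: web-public
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```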

These scaling and load-balancing capabilities help Kubernetes clusters handle varying workloads efficiently and ensure that applications remain available and responsive as demand fluctuates.

Can ChatGPT Save Collective Kubernetes Troubleshooting?
https://thenewstack.io/can-chatgpt-save-collective-kubernetes-troubleshooting/ | Fri, 08 Sep 2023


Decades ago, sysadmins started flooding the internet with questions about the technical problems they faced daily. They had long, vibrant and valuable discussions about how to investigate and troubleshoot their way to understanding the root cause of the problem; then they detailed the solution that ultimately worked for them.

This flood has never stopped, only changed the direction of its flow. Today, these same discussions still happen on Stack Overflow, Reddit and postmortems on corporate engineering blogs. Each one is a valuable contribution to the global anthology of IT system troubleshooting.

Kubernetes has profoundly altered the flow as well. The microservices architecture is far more complex than the virtual machines (VMs) and monolithic applications that have troubled sysadmins and IT folks for decades. Local reproductions of K8s-scale bugs are often impossible to set up. Observability data gets fragmented across multiple platforms, if captured at all, due to Kubernetes’ lack of data persistence. Mapping the interconnectedness of dozens or hundreds of services, resources and dependencies is an effort in futility.

Now your intuition, driven by experience, isn’t necessarily enough. You need to know how to debug the cluster for clues as to your next step.

This complexity means that public troubleshooting discussions are more important now than ever, but now we’re starting to see this valuable flood not get redirected, but dammed up entirely. You’ve seen this in Google. Any search for a Kubernetes-related issue brings you a half-dozen paid ads and at least a page of SEO-driven articles that lack technical depth. Stack Overflow is losing its dominance as the go-to Q&A resource for technical folks, and Reddit’s last few years have been mired in controversy.

Now, every DevOps platform for Kubernetes is building one last levee: Centralize your troubleshooting knowledge within their platform, and replace it with AI and machine learning (ML) until the entire stack becomes a black box to even your most experienced cloud native engineers. When this happens, you lose the skills for individually probing, troubleshooting and fixing your system. This trend turns what used to be a flood of crowdsourced troubleshooting know-how into a mere trickle compared to what was available in the past.

When we become dependent on platforms, the collective wisdom of troubleshooting techniques disappears.

The Flood Path of Troubleshooting Wisdom

In the beginning, sysadmins relied on genuine books for technical documentation and holistic best practices to implement in their organizations. As the internet proliferated in the ‘80s and ‘90s, these folks generally adopted Usenet to chat with peers and ask technical questions about their work in newsgroups like comp.lang.*, which operated like stripped-down versions of the forums we know today.

The general availability of the World Wide Web quickly and almost completely diverted the flood of troubleshooting wisdom. Instead of newsgroups, engineers and administrators flocked to thousands of forums, including Experts Exchange, which went live in 1996. After amassing a repository of questions and answers, the team behind Experts Exchange put all answers behind a $250-a-year paywall, which isolated countless valuable discussions from public consumption and ultimately led to the site’s sinking relevance.

Stack Overflow came next, opening up these discussions to the public again and gamifying discussions through reputation points, which could be earned by providing insights and solutions. Other users then vote for and validate the “best” solution, which helps follow-on searchers find an answer quickly. The gamification, self-moderation and community around Stack Overflow made it the singular channel where the flood of troubleshooting know-how flowed.

But, like all the other eras, nothing good can last forever. Folks have been predicting the “decline of Stack Overflow” for nearly 10 years, citing that it “hates new users” due to its combative nature and structure of administration by whoever has the most reputation points. While Stack Overflow has certainly declined in relevance and popularity, with Reddit’s development/engineering-focused subreddits filling the void, it remains the largest repository of publicly accessible troubleshooting knowledge.

Particularly so for Kubernetes and the cloud native community, which is still experiencing major growing pains. And that’s an invaluable resource, because if you think Kubernetes is complex now …

The Kubernetes Complexity Problem

In a fantastic article about the downfall of “intuitive debugging,” software delivery consultant Pete Hodgson argues that the modern architectures for building and delivering software, like Kubernetes and microservices, are far more complex than ever. “The days of naming servers after Greek gods and sshing into a box to run tail and top are long gone for most of us,” he writes, but “this shift has come at a cost … traditional approaches to understanding and troubleshooting production environments simply will not work in this new world.”

Cynefin model. Source: Wikipedia

Hodgson uses the Cynefin model to illustrate how software architecture used to be complicated, in that given enough experience, one could understand the cause-and-effect relationship between troubleshooting and resolution.

He argues that distributed microservice architectures are instead complex, in that even experienced folks only have a “limited intuition” as to the root cause and how to troubleshoot it. Instead of driving straight toward results, they must spend more time asking and answering questions with observability data to eventually hypothesize what might be going wrong.

If we agree with Hodgson’s premise — that Kubernetes is inherently complex and requires much more time spent analyzing the issue before responding — then it seems imperative that engineers working with Kubernetes learn which questions are most important to ask, and then answer them with observability data, to make the optimal next move.

That’s exactly the type of wisdom disappearing into this coming generation of AI-driven troubleshooting platforms.

Two Paths for AI in Kubernetes Troubleshooting

For years, companies like OpenAI have been scraping and training their models based on public data published on Stack Overflow, Reddit and others, which means these AI models have access to lots of systems and applications knowledge including Kubernetes. Others recognize that an organization’s observability data is a valuable resource for training AI/ML models for analyzing new scenarios.

They’re both asking the same question: How can we leverage this existing data about Kubernetes to simplify the process of searching for the best solution to an incident or outage? The products they’re building take very different paths.

First: Augment the Operator’s Analysis Efforts

These tools automate and streamline access to that existing flood of troubleshooting knowledge published publicly online. They don’t replace the human intuition and creativity that’s required to do proper troubleshooting or root-cause analysis (RCA), but rather thoughtfully automate how an operator finds relevant information.

For example, if a developer new to Kubernetes struggles with deploying their application because they see a CrashLoopBackOff status when running kubectl get pods, they can query an AI-powered tool to provide recommendations, like running kubectl describe $POD or kubectl logs $POD. Those steps might in turn lead the developer to investigate the relevant deployment with kubectl describe $DEPLOYMENT.

At Botkube, we found ourselves invested in this concept of using AI, trained on the flood of troubleshooting wisdom, to automate this back-and-forth querying process. Users should be able to ask questions directly in Slack, like “How do I troubleshoot this nonfunctional service?” and receive a response penned by ChatGPT. During a companywide hackathon, we followed through, building a new plugin for our collaborative troubleshooting platform designed around this concept.

With Doctor, you can tap into the flood of troubleshooting know-how, with Botkube as the bridge between your Kubernetes cluster and your messaging/collaboration platform without trawling through Stack Overflow or Google search ads, which is particularly useful for newer Kubernetes developers and operators.

The plugin also takes automation a step further by generating a Slack message with a Get Help button for any error or anomaly, which then queries ChatGPT for actionable solutions and next steps. You can even pipe the results from the Doctor plugin into other actions or integrations to streamline how you actively use the existing breadth of Kubernetes troubleshooting knowledge to debug more intuitively and sense the problem faster.

Second: Remove the Operator from Troubleshooting

These tools don’t care about the flood of public knowledge. If they can train generalist AI/ML models based on real observability data, then fine-tune based on your particular architecture, they can seek to cut out the human operator in RCA and remediation entirely.

Causely is one such startup, and they’re not shying away from their vision of using AI to “eliminate human troubleshooting.” The platform hooks up to your existing observability data and processes them to fine-tune causality models, which theoretically take you straight to remediation steps — no probing or kubectl-ing required.

I’d be lying if I said a Kubernetes genie doesn’t sound tempting on occasion, but I’m not worried about a tool like Causely taking away operations jobs. I’m worried about what happens to our valuable flood of troubleshooting knowledge in a Causely-led future.

The Gap Between These Paths: The Data

I’m not priming a rant about how “AI will replace all DevOps jobs.” We’ve all read too many of these doomsday scenarios for every niche and industry. I’m far more interested in the gap between these two paths: What data is used for training and answering questions or presenting results?

The first path generally uses existing public data. Despite concerns around AI companies crawling these sites for training data — looking at you, Reddit and Twitter — the openness of this data still provides an incentive loop to keep developers and engineers contributing to the continued flood of knowledge on Reddit, Stack Overflow and beyond.

The cloud native community is also generally amenable to an open source-esque sharing of technical knowledge and the idea that a rising tide (of Kubernetes troubleshooting tips) lifts all boats (of stressed-out Kubernetes engineers).

The second path looks bleaker. With the rise of AI-driven DevOps platforms, more troubleshooting knowledge gets locked inside these dashboards and the proprietary AI models that power them. We all agree that Kubernetes infrastructure will continue to get more complex, not less, which means that over time, we’ll understand even less about what’s happening between our nodes, pods and containers.

When we stop helping each other analyze a problem and sense a solution, we become dependent on platforms. That feels like a losing path for everyone but the platforms.

How Can We Not Lose (or Lose Less)?

The best thing we can do is continue to publish amazing content online about our troubleshooting endeavors in Kubernetes and beyond, like “A Visual Guide on Troubleshooting Kubernetes Deployments”; create apps that educate through gamification, like SadServers; take our favorite first steps when troubleshooting a system, like “Why I Usually Run ‘w’ First When Troubleshooting Unknown Machines”; and conduct postmortems that detail the stressful story of probing, sensing and responding to potentially disastrous situations, like the July 2023 Tarsnap outage.

We can go beyond technical solutions, too, like talking about how we can manage and support our peers through stressful troubleshooting scenarios, or building organizationwide agreement on what observability is.

Despite their current headwinds, Stack Overflow and Reddit will continue to be reliable outlets for discussing troubleshooting and seeking answers. If they end up in the same breath as Usenet and Experts Exchange, they’ll likely be replaced by other publicly available alternatives.

Regardless of when and how that happens, I hope you’ll join us at Botkube, and the new Doctor plugin, to build new channels for collaboratively troubleshooting complex issues in Kubernetes.

It doesn’t matter if AI-powered DevOps platforms continue to train new models based on scraped public data about Kubernetes. As long as we don’t willingly and wholesale deposit our curiosity, adventure and knack for problem-solving into these black boxes, there will always be a new path to keep the invaluable flood of troubleshooting know-how flowing.

Achieve Cloud Native without Kubernetes
https://thenewstack.io/achieve-cloud-native-without-kubernetes/ | Thu, 07 Sep 2023


This is the second of a two-part series. Read part one here.

At its core, cloud native is about leveraging the benefits of the cloud computing model to its fullest. This means building and running applications that take advantage of cloud-based infrastructures. The foundational principles that consistently rise to the forefront are:

  • Scalability — Dynamically adjust resources based on demand.
  • Resiliency — Design systems with failure in mind to ensure high availability.
  • Flexibility — Decouple services and make them interoperable.
  • Portability — Ensure applications can run on any cloud provider or even on premises.

In Part 1 we highlighted the learning curve and situations where directly using Kubernetes might not be the best fit. This part zeros in on constructing scalable cloud native applications using managed services.

Managed Services: Your Elevator to the Cloud

Reaching the cloud might feel like constructing a ladder piece by piece using tools like Kubernetes. But what if we could simply press a button and ride smoothly upward? That’s where managed services come into play, acting as our elevator to the cloud. While it might not be obvious without deep diving into specific offerings, managed services often use Kubernetes behind the scenes to build scalable platforms for your applications.

There’s a clear connection between control and complexity when it comes to infrastructure (and software in general). We can begin to tear down the complexity by delegating some of the control to managed services from cloud providers like AWS, Azure or Google Cloud.

Managed services empower developers to concentrate on applications, relegating the concerns of infrastructure, scaling and server management to the capable hands of the cloud provider. The essence of this approach is crystallized in its core advantages: eliminating server management and letting the cloud provider handle dynamic scaling.

Think of managed services as an extension of your IT department, bearing the responsibility of ensuring infrastructure health, stability and scalability.

Choosing Your Provider

When designing a cloud native application, the primary focus should be on architectural principles, patterns and practices that enable flexibility, resilience and scalability. Instead of immediately selecting a specific cloud provider, it’s much more valuable for teams to start development without the blocker of this decision-making.

Luckily, the competitive nature of the cloud has driven cloud providers toward feature parity. Basically, they have established foundational building blocks that take great inspiration from each other and ultimately offer the same or extremely similar functionality and value to end users.

This paves the way for abstraction layers and frameworks like Nitric, which can be used to take advantage of these similarities to deliver cloud development patterns for application developers with greater flexibility. The true value here is the ability to make decisions about technologies like cloud providers on the timeline of the engineering team, not upfront as a blocker to starting development.

Resources that Scale by Default

The resource choices for apps set the trajectory for their growth; they shape the foundation upon which an application is built, influencing its scalability, security, flexibility and overall efficiency. Let’s categorize and examine some of the essential components that contribute to crafting a robust and secure application.

Execution, Processing and Interaction

  • Handlers: Serve as entry points for executing code or processing events. They define the logic and actions performed when specific events or triggers occur.
  • API gateway: Acts as a single entry point for managing and routing requests to various services. It provides features like rate limiting, authentication, logging and caching, offering a unified interface to your backend services or microservices.
  • Schedules: Enable tasks or actions to be executed at predetermined times or intervals. Essential for automating repetitive or time-based workloads such as data backups or batch processing.

Communication and Event Management

  • Events: Central to event-driven architectures, these represent occurrences or changes that can initiate actions or workflows. They facilitate asynchronous communication between systems or components.
  • Queues: Offer reliable message-based communication between components, enhancing fault tolerance, scalability and decoupled, asynchronous communication.

Data Management and Storage

  • Collections: Data structures, such as arrays, lists or sets that store and organize related data elements. They underpin many application scenarios by facilitating efficient data storage, retrieval and manipulation.
  • Buckets: Containers in object storage systems like Amazon S3 or Google Cloud Storage. They provide scalable and reliable storage for diverse unstructured data types, from media files to documents.

Security and Confidentiality

  • Secrets: Concerned with securely storing sensitive data like API keys or passwords. Using centralized secret management systems ensures these critical pieces of information are protected and accessible only to those who need them.

Automating Deployments

Traditional cloud providers have offered services for CI/CD but often fall short of delivering a truly seamless experience. Services like AWS CodePipeline or Azure DevOps require intricate setup and maintenance.

Why is this a problem?

  1. Time-consuming: Setting up and managing these pipelines takes away valuable developer time that could be better spent on feature development.
  2. Complexity: Each cloud provider’s CI/CD solution might have its quirks and learning curves, making it harder for teams to switch or maintain multicloud strategies.
  3. Error-prone: Manual steps or misconfigurations can lead to deployment failures or worse, downtime.

You might notice a few similarities here with some of the challenges of adopting K8s, albeit at a smaller scale. However, there are options that simplify the deployment process significantly, such as using an automated deployment engine.

Example: Simplified Process

This is the approach Nitric takes to streamline the deployment process:

  1. The developer pushes code to the repository.
  2. Nitric’s engine detects the change, builds the necessary infrastructure specification and determines the minimal permissions, policies and resources required.
  3. The entire infrastructure needed for the app is automatically provisioned, without the developer explicitly defining it and without the need for a standalone Infrastructure as Code (IaC) project.

Basically, the deployment engine intelligently deduces and sets up the required infrastructure for the application, ensuring roles and policies are configured for maximum security with minimal privileges.

This streamlined process relieves application and operations teams from activities like:

  • The need to containerize images.
  • Crafting, troubleshooting and sustaining IaC tools like Terraform.
  • Managing discrepancies between application needs and existing infrastructure.
  • Initiating temporary servers for prototypes or test phases.

Summary

Using managed services streamlines the complexities associated with infrastructure, allowing organizations to zero in on their primary goals: application development and business expansion. Managed services, serving as an integrated arm of the IT department, facilitate a smoother and more confident transition to the cloud than working directly with K8s. They’re a great choice for cloud native development to reinforce digital growth and stability.

With tools like Nitric streamlining the deployment processes and offering flexibility across different cloud providers, the move toward a cloud native environment without Kubernetes seems not only feasible but also compelling. If you’re on a journey to build a cloud native application or a platform for multiple applications, we’d love to hear from you.

Read Part 1 of this series: “Kubernetes Isn’t Always the Right Choice.”

Streamline Platform Engineering with Kubernetes
https://thenewstack.io/streamline-platform-engineering-with-kubernetes/ | Wed, 06 Sep 2023


Platform engineering plays a pivotal role in the modern landscape of application development and deployment. As software applications have evolved to become more complex and distributed, the need for a robust and scalable infrastructure has become paramount. This is where platform engineering steps in, acting as the backbone that supports the entire software development lifecycle. Let’s delve deeper into the essential role of platform engineering in creating and maintaining the infrastructure for applications.

Understanding Platform Engineering

At its core, platform engineering involves creating an environment that empowers developers to focus on building applications without the burden of managing underlying infrastructure intricacies. Platform engineers architect, build, and maintain the infrastructure and tools necessary to ensure that applications run smoothly and efficiently, regardless of the complexities they might encompass.

In the dynamic world of application development, platform engineers face multifaceted challenges. One of the most prominent challenges is managing diverse applications and services that vary in requirements, technologies, and operational demands. As applications span across cloud environments, on-premises setups, and hybrid configurations, platform engineers are tasked with creating a unified, consistent, and reliable infrastructure.

Managing this diverse landscape efficiently is crucial to ensuring applications’ reliability and availability. In the absence of streamlined management, inefficiencies arise, leading to resource wastage, operational bottlenecks, and decreased agility. This is where Kubernetes comes into the spotlight as a transformative solution for platform engineering.

Enter Kubernetes: A Powerful Solution

Kubernetes, a container orchestration platform, has emerged as a game-changer in the field of platform engineering. With its ability to automate deployment, scaling, and management of containerized applications, Kubernetes addresses the very challenges that platform engineers grapple with. By providing a unified platform to manage applications regardless of their underlying infrastructure, Kubernetes aligns seamlessly with the goals of platform engineering.

Kubernetes takes the burden off platform engineers by allowing them to define application deployment, scaling, and management processes in a declarative manner. This eliminates manual interventions and streamlines repetitive tasks, enabling platform engineers to focus on higher-level strategies and optimizations.

Furthermore, Kubernetes promotes collaboration between different teams, including developers and operations, by providing a common language for application deployment and infrastructure management. This fosters a DevOps culture, where the lines between development and operations blur, and teams work collaboratively to achieve shared goals.

From here, we will delve deeper into the specifics of Kubernetes orchestration and how it revolutionizes platform engineering practices. From managing multi-tenancy to automating infrastructure, from ensuring security to optimizing scalability, Kubernetes offers a comprehensive toolkit that addresses the intricate needs of platform engineers. Join us on this journey as we explore how Kubernetes empowers platform engineering to streamline deployment and management, ultimately leading to more efficient and reliable software ecosystems.

Challenges of Managing Diverse Applications: A Platform Engineer’s Dilemma

The role of a platform engineer is akin to being the architect of a bustling metropolis, responsible for designing and maintaining the infrastructure that supports a myriad of applications and services. However, in today’s technology landscape, this task has become increasingly intricate and challenging. Platform engineers grapple with a range of difficulties as they strive to manage diverse applications and services across complex and dynamic environments.

In the ever-expanding digital realm, applications exhibit a stunning diversity in terms of their technologies, frameworks, and dependencies. From microservices to monoliths, from stateless to stateful, each application type presents its own set of demands. Platform engineers are tasked with creating an environment that caters to this diversity seamlessly, ensuring that every application can function optimally without interfering with others.

Modern applications are no longer confined to a single server or data center. They span across hybrid cloud setups, utilize various cloud providers, and often incorporate on-premises resources. This heterogeneity of infrastructure introduces challenges in terms of resource allocation, data consistency, and maintaining a coherent operational strategy. Platform engineers must find ways to harmonize these diverse elements into a unified and efficient ecosystem.

Applications’ resource requirements are seldom static. They surge and recede based on user demand, seasonal patterns, or promotional campaigns. Platform engineers must design an infrastructure that can dynamically scale resources up or down to match these fluctuations. This requires not only technical acumen but also predictive analytics to foresee resource needs accurately.

In today’s always-on digital landscape, downtime is not an option. Platform engineers are tasked with ensuring high availability and fault tolerance for applications, which often involves setting up redundant systems, implementing failover strategies, and orchestrating seamless transitions in case of failures. This becomes even more complex when applications are spread across multiple regions or cloud providers.

Applications and services need continuous updates to stay secure, leverage new features, and remain compatible with evolving technologies. However, updating applications without causing downtime or compatibility issues is a challenge. Platform engineers need to orchestrate updates carefully, often requiring extensive testing and planning to ensure a smooth transition.

In an era of heightened cybersecurity threats and stringent data regulations, platform engineers must prioritize security and compliance. They need to implement robust security measures, control access to sensitive data, and ensure that applications adhere to industry-specific regulations. Balancing security with usability and performance is a constant tightrope walk.

In an environment with diverse applications and services, achieving standardization can be elusive. Different development teams might have varying deployment practices, configurations, and toolsets. Platform engineers need to strike a balance between accommodating these unique requirements and establishing standardized processes that ensure consistency and manageability.

Kubernetes: A Paradigm Shift in Platform Engineering

As platform engineers grapple with the intricate landscape of managing diverse applications and services across complex environments, a beacon of transformation has emerged: Kubernetes. This open source container orchestration platform has swiftly risen to prominence as a powerful solution that directly addresses the challenges faced by platform engineers.

The diversity of applications, each with its own unique requirements and dependencies, can create an operational labyrinth for platform engineers. Kubernetes steps in as a unifying force, providing a standardized platform for deploying, managing, and scaling applications, irrespective of their underlying intricacies. By encapsulating applications in containers, Kubernetes abstracts away the specifics, enabling platform engineers to treat every application consistently.

Kubernetes doesn’t shy away from the complexities of modern infrastructure. Whether applications span hybrid cloud setups, multiple cloud providers, or on-premises data centers, Kubernetes offers a common language for orchestrating across these diverse terrains. It promotes the notion of “write once, deploy anywhere,” allowing platform engineers to leverage the same configuration across various environments seamlessly.

The challenge of resource allocation and scaling based on fluctuating user demands finds an elegant solution in Kubernetes. With its automated scaling mechanisms, such as Horizontal Pod Autoscaling, platform engineers are empowered to design systems that can dynamically expand or contract resources based on real-time metrics. This elasticity ensures optimal performance without the need for manual intervention.
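
As a concrete sketch of what that elasticity looks like in a manifest (the deployment name and thresholds below are placeholders, not recommendations), a HorizontalPodAutoscaler targeting CPU utilization might be defined like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend                 # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # add replicas once average CPU crosses 70%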

Kubernetes embodies the principles of high availability and fault tolerance, critical aspects of platform engineering. By automating load balancing, health checks, and failover mechanisms, Kubernetes creates an environment where applications can gracefully navigate failures and disruptions. Platform engineers can architect systems that maintain continuous service even in the face of unforeseen challenges.

The daunting task of updating applications while minimizing downtime and compatibility hiccups finds a streamlined approach in Kubernetes. With features like rolling updates and canary deployments, platform engineers can orchestrate updates that are seamless, incremental, and reversible. This not only enhances the reliability of the deployment process but also boosts the confidence of developers and operations teams.
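
For instance, a Deployment's rollout behavior is tuned declaratively. The surge and unavailability values in this sketch are illustrative, and the application name and image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-frontend
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # create at most one extra pod during the rollout
      maxUnavailable: 0      # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: app
          image: example/web-frontend:1.2.0   # placeholder image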

Security and Compliance at the Core

Security is paramount in platform engineering, and Kubernetes doesn’t fall short in this domain. By enforcing Role-Based Access Control (RBAC), Network Policies, and Secrets Management, Kubernetes empowers platform engineers to establish robust security practices. Compliance requirements are also met through controlled access and encapsulation of sensitive data.
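
As one small illustration (the namespace, group and role names below are placeholders), a namespace-scoped Role and RoleBinding might grant a team read-only access to the pods in its own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a                  # hypothetical team namespace
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-pod-readers
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers          # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io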

Kubernetes bridges the gap between accommodating unique application requirements and establishing standard practices. It provides a foundation for creating reusable components through Helm charts and Operators, promoting a cohesive approach while allowing for flexibility. This journey towards standardization enhances manageability, reduces human error, and boosts collaboration across teams.

In the realm of platform engineering, the concept of multitenancy stands as a critical pillar. As organizations host multiple teams or projects within a shared infrastructure, the challenge lies in ensuring resource isolation, security, and efficient management. Kubernetes, with its robust feature set, provides an effective solution to tackle the intricacies of multitenancy.

Understanding Multitenancy

Multitenancy refers to the practice of hosting multiple isolated instances, or “tenants,” within a single infrastructure. These tenants can be teams, departments, or projects, each requiring their own isolated environment to prevent interference and maintain security.

Kubernetes introduces the concept of Namespaces to address the requirements of multitenancy. A Namespace is a logical partition within a cluster that allows for resource isolation, naming uniqueness, and access control. Platform engineers can leverage Namespaces to create segregated environments for different teams or projects, ensuring that resources are isolated and managed independently.
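
A Namespace itself is a small object. A sketch like the following, with a placeholder name and label, is often all it takes to carve out a team's slice of the cluster:

apiVersion: v1
kind: Namespace
metadata:
  name: team-a                 # hypothetical team or project name
  labels:
    team: team-a               # labels like this can drive quotas, policies and cost reporting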

Here are some advantages of Namespaces:

  • Resource Isolation: Namespaces provide an isolated space where resources such as pods, services, and configurations are contained. This isolation prevents conflicts and resource contention between different teams or projects.
  • Security and Access Control: Namespaces allow platform engineers to set Role-Based Access Control (RBAC) rules specific to each Namespace. This ensures that team members can only access and manipulate resources within their designated Namespace.
  • Naming Scope: Namespaces ensure naming uniqueness across different teams or projects. Resources within a Namespace are identified by their names, and Namespaces provide a clear context for these names, avoiding naming clashes.
  • Logical Partitioning: Platform engineers can logically partition applications within the same cluster, even if they belong to different teams or projects. This makes it easier to manage a diverse application landscape within a shared infrastructure.

Challenges of Resource Allocation and Isolation

While Kubernetes Namespaces offer a solid foundation for multitenancy, challenges related to resource allocation and isolation persist:

  • Resource Allocation: In a multitenant environment, resource allocation becomes a balancing act. Platform engineers need to ensure that each Namespace receives adequate resources while preventing resource hogging that could impact other Namespaces.
  • Resource Quotas: Kubernetes enables setting resource quotas at the Namespace level, which can be complex to fine-tune. Striking the right balance between restricting resource usage and allowing flexibility is crucial (see the sample quota after this list).
  • Isolation Assurance: Ensuring complete isolation between Namespaces requires careful consideration. Leaked resources or network communication between Namespaces can compromise the intended isolation.
  • Managing Complexity: As the number of Namespaces grows, managing and maintaining configurations, RBAC rules, and resource allocations can become challenging. Platform engineers need efficient tools and strategies to manage this complexity effectively.
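
To make the quota point above concrete, here is a minimal sketch of a ResourceQuota; the namespace and the numbers are placeholders that would need tuning for a real team:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a            # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"          # total CPU the namespace may request
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"                 # cap on the number of pods in the namespace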

In the realm of platform engineering, the pursuit of efficiency and reliability hinges on automation. Kubernetes, with its robust set of features, stands as a beacon for platform engineers seeking to automate deployment and scaling processes. Let’s explore how Kubernetes streamlines these processes and empowers platform engineers to elevate their infrastructure management.

Kubernetes Controllers: The Automation Engine

Kubernetes controllers play a pivotal role in orchestrating automated tasks that range from scaling applications to ensuring self-healing.

  • Scaling: Horizontal Pod Autoscaling (HPA) is a prime example. HPA automatically adjusts the number of pod replicas based on observed CPU or custom metrics. This ensures that applications can seamlessly handle traffic fluctuations without manual intervention.
  • Self-Healing: Liveness and readiness probes are key components that contribute to application self-healing. Liveness probes detect application failures and trigger pod restarts, while readiness probes ensure that only healthy pods receive traffic (see the probe sketch after this list).
  • Updating: Kubernetes controllers, such as Deployments, automate application updates by maintaining a desired number of replicas while transitioning to a new version. This prevents service disruptions during updates and rollbacks, ensuring seamless transitions.
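
To illustrate the self-healing item above, here is a sketch of liveness and readiness probes on a container; the endpoints, port and timings are assumptions about an application that exposes simple health routes:

apiVersion: v1
kind: Pod
metadata:
  name: web-frontend
spec:
  containers:
    - name: app
      image: example/web-frontend:1.2.0   # placeholder image
      livenessProbe:
        httpGet:
          path: /healthz                  # assumed health endpoint
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10                 # restart the container if this starts failing
      readinessProbe:
        httpGet:
          path: /ready                    # assumed readiness endpoint
          port: 8080
        periodSeconds: 5                  # only route traffic while this succeeds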

Kustomize: Customized Automation

Kustomize is a tool that allows platform engineers to customize Kubernetes manifests without the need for complex templating. It provides a declarative approach to configuration management, enabling engineers to define variations for different environments, teams, or applications.
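
As a sketch of the pattern (the file names and prefix are placeholders), a shared base kustomization.yaml can be reused by an environment-specific overlay:

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml

# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-              # distinguish production resources
commonLabels:
  environment: production

Running kubectl apply -k against the overlay directory renders the base plus the production-specific customizations.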

Some benefits of Kustomize include:

  • Reusability: Kustomize promotes reusability by enabling the creation of base configurations that can be extended or modified as needed.
  • Environment-Specific Customization: Platform engineers can customize configurations for different environments (development, staging, production) or teams without duplicating the entire configuration.
  • Efficiency: Kustomize reduces duplication and minimizes manual editing, which reduces the risk of inconsistencies and errors.

Policy Enforcement and Governance: Navigating the Path to Stability

In the dynamic landscape of platform engineering, enforcing policies and governance emerges as a linchpin for ensuring stability, security, and compliance. Kubernetes, with its robust feature set, offers tools like RBAC (Role-Based Access Control) and network policies to establish control and enforce governance.

Policy enforcement ensures that the platform adheres to predefined rules and standards. This includes access control, security policies, resource quotas, and compliance requirements. By enforcing these policies, platform engineers maintain a secure and reliable environment for applications.
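
A default-deny NetworkPolicy is a common example of such a rule. This sketch (the namespace name is a placeholder) blocks all ingress traffic to pods in a namespace until more specific policies allow it:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a            # hypothetical namespace
spec:
  podSelector: {}              # select every pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules are listed, so all inbound traffic is denied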

In a dynamic Kubernetes environment, maintaining security and compliance can be challenging. As applications evolve, keeping track of changing policies and ensuring consistent enforcement across clusters and namespaces becomes complex. The ephemeral nature of Kubernetes resources adds another layer of complexity to achieving persistent security and compliance.

DevOps Culture and Collaboration: Bridging the Divide

In the pursuit of efficient and collaborative platform engineering, fostering a DevOps culture is paramount.

DevOps culture bridges the gap between development, operations, and platform engineering teams. It encourages seamless communication, shared goals, and a collective sense of responsibility for the entire application lifecycle.

Kubernetes acts as a catalyst for collaboration by providing a common language for application deployment and infrastructure management. It encourages cross-functional communication and allows teams to work collaboratively on shared configurations.

Kubernetes’ declarative nature and shared tooling break down silos that often arise in traditional workflows. Developers, operators, and platform engineers can collectively define, manage, and evolve applications without being constrained by rigid boundaries.

The post Streamline Platform Engineering with Kubernetes appeared first on The New Stack.

]]>
7 Steps to Highly Effective Kubernetes Policies https://thenewstack.io/7-steps-to-highly-effective-kubernetes-policies/ Wed, 06 Sep 2023 14:37:06 +0000 https://thenewstack.io/?p=22717476

You just started a new job where, for the first time, you have some responsibility for operating and managing a

The post 7 Steps to Highly Effective Kubernetes Policies appeared first on The New Stack.

]]>

You just started a new job where, for the first time, you have some responsibility for operating and managing a Kubernetes infrastructure. You’re excited about toeing your way even deeper into cloud native, but also terribly worried.

Yes, you’re concerned about the best way to write secure applications that follow best practices for naming and resource usage control, but what about everything else that’s already deployed to production? You spin up a new tool to peek into what’s happening and find 90 CVEs and YAML misconfiguration issues of high or critical importance. You close the tab and tell yourself you’ll deal with all of that … later.

Will you?

Maybe the most ambitious and fearless of you will, but the problem is that while the cloud native community likes to talk about security, standardization and “shift left” a lot, none of these conversations deaden the feeling of being overwhelmed by security, resource, syntax and tooling issues. No development paradigm or tool seems to have discovered the right way to present developers and operators with the “sweet spot” of making misconfigurations visible without also overwhelming them.

Like all the to-do lists we might face, whether it’s work or household chores, our minds can only effectively deal with so many issues at a time. Too many issues and we get lost in context switching and prioritizing half-baked Band-Aids over lasting improvements. We need better ways to limit scope (aka triage), set milestones and finally make security work manageable.

It’s time to ignore the number of issues and focus on interactively shaping, then enforcing, the way your organization uses established policies to make an impact — no overwhelming feeling required.

The Cloudy History of Cloud Native Policy

From Kubernetes’ first days, YAML configurations have been the building blocks of a functioning cluster and happily running applications. As the essential bridge between a developer’s application code and an Ops engineer’s work to keep the cluster humming, they’re not only challenging to get right, but also the cause of most deployment/service-level issues in Kubernetes. To add in a little extra spiciness, no one — not developers and not Ops engineers — wants to be solely responsible for them.

Policy entered the cloud native space as a way to automate the way YAML configurations are written and approved for production. If no one person or team wants the responsibility of manually checking every configuration according to an internal style guide, then policies can slowly shape how teams tackle common misconfigurations around security, resource usage and cloud native best practices. Not to mention any rules or idioms unique to their application.

The challenge with policies in Kubernetes is that it’s agnostic to how, when and why you enforce them. You can write rules in multiple ways, enforce them at different points in the software development life cycle (SDLC) and use them for wildly different reasons.

There is no better example of this confusion than pod security policy (PSP), which entered the Kubernetes ecosystem in 2016 with v1.3. PSP was designed to control how a pod can operate and reject any noncompliant configurations. For example, it allowed a K8s administrator to prevent developers from running privileged pods everywhere, essentially decoupling low-level Linux security decisions away from the development life cycle.

PSP never left the beta phase, for a few good reasons. These policies were only applied when a person or process requested the creation of a pod, which meant there was no way to retrofit PSPs or enable them by default. The Kubernetes team admits PSP made it too easy to accidentally grant too-broad permissions, among other difficulties.

The PSP era of Kubernetes security was so fraught that it inspired a new rule for release cycle management: No Kubernetes project can stay in beta for more than two release cycles, either becoming stable or marked for deprecation and removal.

On the other hand, PSP moved the security-in-Kubernetes space in one positive direction: By separating the creation and instantiation of Kubernetes security policy, PSP opened up a new ecosystem for external admission controllers and policy enforcement tools, like Kyverno, Gatekeeper and, of course, Monokle.

These are the tools we’ve used to shed the PSP shackles from our clusters and replace them with… the Pod Security Standard (PSS). We’ll come back to that big difference in a minute.

A Phase-Based Approach to Kubernetes Policy

With this established decoupling between policy creation and instantiation, you can now apply a consistent policy language across your clusters, environments and teams, regardless of which tools you choose. You can also switch the tools you use for creation and instantiation at will and get reliable results in your clusters.

Creation typically happens in an integrated development environment (IDE), which means you can stick with your current favorite to express rules using rule-specific languages like Open Policy Agent (OPA), a declarative syntax like Kyverno, or a programming language like Go or TypeScript.

Instantiation and enforcement can happen in different parts of the software development life cycle. As we saw in our previous 101-level post on Kubernetes YAML policies, you can apply validation at one or more points in the configuration life cycle:

  1. Pre-commit directly in a developer’s command line interface (CLI) or IDE,
  2. Pre-deployment via your CI/CD pipeline,
  3. Post-deployment via an admission controller like Kyverno or Gatekeeper, or
  4. In-cluster for checking whether the deployed state still meets your policy standards.

The later policy instantiation, validation and enforcement happen in your SDLC, the more likely it is that a dangerous misconfiguration slips its way into the production environment, and the more work will be needed to identify and fix the original source of any misconfigurations found. You can instantiate and enforce policies at several stages, but earlier is always better — something Monokle excels at, with robust pre-commit and pre-deployment validation support.

With the scenario in place — those dreaded 90 issues — and an understanding of the Kubernetes policy landscape, you can start to whittle away at the misconfigurations before you.

Step 1: Implement the Pod Security Standard

Let’s start with the PSS mentioned earlier. Kubernetes now describes three encompassing policies that you can quickly implement and enforce across your cluster. The “Privileged” policy is entirely unrestricted and should be reserved only for system and infrastructure workloads managed by administrators.

You should start by instantiating the “Baseline” policy, which allows for the minimally specified pod that most developers new to Kubernetes begin with:

apiVersion: v1
kind: Pod
metadata:
  name: default
spec:
  containers:
    - name: my-container
      image: my-image


The advantage of starting with the Baseline is that you prevent known privilege escalations without needing to modify all your existing Dockerfiles and Kubernetes configurations. There will be some exceptions, which I’ll talk about in a moment.

Creating and instantiating this policy level is relatively straightforward — for example, on the namespace level:

apiVersion: v1
kind: Namespace
metadata:
  name: my-baseline-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest


You will inevitably have some special services that require more access than Baseline allows, like a Promtail agent for collecting logs and observability. In these cases, where you need certain beneficial features, those namespaces will need to operate under the Privileged policy. You’ll need to keep up with security improvements from that vendor to limit your risk.
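
In that case, the exception can be made explicit at the namespace level. Here is a sketch, assuming a dedicated namespace (the name is a placeholder) for the logging agent:

apiVersion: v1
kind: Namespace
metadata:
  name: logging                # hypothetical namespace for the privileged agent
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: latest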

By enforcing the Baseline level of the Pod Security Standard for most configurations and allowing Privileged for a select few, then fixing any misconfigurations that violate these policies, you’ve checked off your next policy milestone.

Step 2: Fix Labels and Annotations

Labels are meant to identify resources for grouping or filtering, while annotations are for important but nonidentifying context. If your head is still spinning from that, here’s a handy definition from Richard Li at Ambassador Labs: “Labels are for Kubernetes, while annotations are for humans.”

Labels should only be used for their intended purpose, and even then, be careful with where and how you apply them. In the past, attackers have used labels to probe deeply into the architecture of a Kubernetes cluster, including which nodes are running individual pods, without leaving behind logs of the queries they ran.

The same idea applies to your annotations: While they’re meant for humans, they are often used to obtain credentials that, in turn, give them access to even more secrets. If you use annotations to describe the person who should be contacted in case of an issue, know that you’re creating additional soft targets for social engineering attacks.

Step 3: Migrate to the Restricted PSS

While Baseline is permissive but safe-ish, the “Restricted” Pod Security Standard employs current best practices for hardening a pod. As Red Hat’s Mo Khan once described it, the Restricted standard ensures “the worst you can do is destroy yourself,” not your cluster.

With the Restricted standard, developers must write applications that run in read-only mode, have enabled only the Linux features necessary for the Pod to run, cannot escalate privileges at any time and so on.
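
In configuration terms, that translates into an explicit security context on the pod and its containers. This sketch shows the kind of settings the Restricted level expects; the names and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: restricted-example
spec:
  securityContext:
    runAsNonRoot: true                  # the image must not run as root
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: my-container
      image: my-image                   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true    # run in read-only mode, as described above
        capabilities:
          drop: ["ALL"]                 # keep only the Linux capabilities the pod actually needs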

I recommend starting with the Baseline and migrating to Restricted later, as separate milestones, because the latter almost always requires active changes to existing Dockerfiles and Kubernetes configurations. As soon as you instantiate and enforce the Restricted policy, your configurations will need to adhere to these policies or they’ll be rejected by your validator or admission controller.

Step 3a: Suppress, Not Ignore, the Inevitable False Positives

As you work through the Baseline and Restricted milestones, you’re approaching a more mature (and complicated) level of policy management. To ensure everyone stays on the same page regarding the current policy milestone, you should start to deal with the false positives or configurations you must explicitly allow despite the Restricted PSS.

When choosing between ignoring a rule or suppressing it, always favor suppression. That requires an auditable action, with logs or a configuration change, to codify an exception to the established policy framework. You can add suppressions in source, directly into your K8s configurations or externally, where a developer requests their operations peer to reconfigure their validator or admission controller to allow a “misconfiguration” to pass through.

In Monokle, you add in-source suppressions directly in your configuration as an annotation, with what the Static Analysis Results Interchange Format (SARIF) specification calls a justification:

metadata:
  annotations:
    monokle.io/suppress.pss.host-path-volumes: Agent requires access to back up cluster volumes

Step 4: Layer in Common Hardening Guidelines

At this point, you’ve moved beyond established Kubernetes frameworks for security, which means you need to take a bit more initiative on building and working toward your own milestones.

The National Security Agency (NSA) and Cybersecurity and Infrastructure Security Agency (CISA) have a popular Kubernetes Hardening Guide, which details not only pod-level improvements, such as effectively using immutable container file systems, but also network separation, audit logging and threat detection.

Step 5: Time to Plug and Play

After implementing some or all of the established hardening guidelines, every new policy is about choices, trust and trade-offs. Spend some time on Google or Stack Overflow and you’ll find plenty of recommendations for plug-and-play policies into your enforcement mechanism.

You can benefit from crowdsourced policies, many of which come from those with more unique experience, but remember that while rules might be well-intentioned, you don’t understand the recommender’s priorities or operating context. They know how to implement certain “high-hanging fruit” policies because they have to, not because they’re widely valuable.

One ongoing debate is whether to, and how strictly to, limit the resource needs of a container. Same goes for request limits. Not configuring limits can introduce security risks, but if you severely constrain your pods, they might not function properly.
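
For reference, the settings under debate sit on each container; the numbers in this sketch are purely illustrative and should come from observed usage rather than guesswork:

apiVersion: v1
kind: Pod
metadata:
  name: sized-example
spec:
  containers:
    - name: my-container
      image: my-image               # placeholder image
      resources:
        requests:
          cpu: 100m                 # what the scheduler reserves for the container
          memory: 128Mi
        limits:
          cpu: 500m                 # hard ceilings; set too tight, the app gets throttled or OOM-killed
          memory: 256Mi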

Step 6: Add Custom Rules for the Unforeseen Peculiarities

Now you’re at the far end of Kubernetes policy, well beyond the 20% of misconfigurations and vulnerabilities that create 80% of the negative impact on production. But even now, having implemented all the best practices and collective cloud native knowledge, you’re not immune to misconfigurations that unexpectedly spark an incident or outage — the wonderful unknown unknowns of security and stability.

A good rule of thumb is if a peculiar (mis)configuration causes issues in production twice, it’s time to codify it as a custom rule to be enforced during development or by the admission controller. It’s just too important to be latently documented internally with the hope that developers read it, pay attention to it and catch it in each other’s pull-request reviews.

Once codified into your existing policy, custom rules become guardrails you enforce as close to development as possible. If you can reach developers with validation before they even commit their work, which Monokle Cloud does seamlessly with custom plugins and a development server you run locally, then you can save your entire organization a lot of rework and time spent twiddling thumbs waiting for the CI/CD pipeline to inevitably fail, when developers could be building new features or fixing bugs.

Wrapping Up

If you implement all the frameworks and milestones covered above and make all the requisite changes to your Dockerfiles and Kubernetes configurations to meet these new policies, you’ll probably find your list of 90 major vulnerabilities has dropped to a far more manageable number.

You’re seeing the value of our step-by-step approach to shaping and enforcing Kubernetes policies. The more you can interact with the impact of new policies and rules, the way Monokle does uniquely at the pre-commit stage, the easier it’ll be to make incremental steps without overwhelming yourself or others.

You might even find yourself proudly claiming that your Kubernetes environment is entirely misconfiguration-free. That’s a win, no doubt, but it’s not a guarantee — there will always be new Kubernetes versions, new applications and new best practices to roll into what you’ve already done. It’s also not the best way to talk about your accomplishments with your leadership or executive team.

The advantage of leveraging the frameworks and hardening guidelines is that you have a better common ground to talk about your impact on certification, compliance and long-term security goals.

What sounds more compelling to a non-expert:

  • You reduced your number of CVEs from 90 to X,
  • Or that you fully complied with the NSA’s Kubernetes hardening guidelines?

The sooner we worry less about numbers and more about common milestones, enforced as early in the application life cycle as possible (ideally pre-commit!), the sooner we can find the sustainable sweet spot for each of our unique forays into cloud native policy.

The post 7 Steps to Highly Effective Kubernetes Policies appeared first on The New Stack.

]]>
VMware Expands Tanzu into a Full Platform Engineering Environment https://thenewstack.io/vmware-expands-tanzu-into-a-full-platform-engineering-environment/ Tue, 22 Aug 2023 20:03:21 +0000 https://thenewstack.io/?p=22716418

Enterprise software provider VMware is expanding its Kubernetes platform to include an internal development portal and a runtime for applications to

The post VMware Expands Tanzu into a Full Platform Engineering Environment appeared first on The New Stack.

]]>

Enterprise software provider VMware is expanding its Kubernetes platform to include an internal development portal and a runtime for applications to span multiple clouds.

The company unveiled these enhancements Tuesday at its annual user conference, VMware Explore, being held this week in Las Vegas.

“We have expanded the definition of Tanzu Application Platform to include a portal and a secure supply chain, and also a runtime that automatically scales and a common set of services,” said Purnima Padmanabhan, VMware senior vice president and general manager for the company’s modern apps and management business group in a press briefing for the event.

Built on the Tanzu for Kubernetes Operations, the Tanzu Application Platform (TAP) is VMware’s enterprise offering for managing cloud native applications, using a modular architecture, open interfaces and a common data control plane.

TAP now includes an internal developer portal (IDP), based on the popular Spotify-built Backstage, that provides a place for developers to share application templates and (in Beta) DIY Backstage plug-ins. It also includes an admin console to configure and operate Tanzu Application Platform capabilities and applications.

Also new in Beta is the VMware Tanzu Application Engine, a multicloud Kubernetes-based runtime environment for the platform engineering team to procure runtime capabilities with specific requirements — such as high availability, secure connectivity, and scalability — across different Kubernetes clusters and clouds. The engine itself can enforce the requirements.

FinOps Comes to Tanzu

Tanzu is also borrowing some capabilities from the VMware Aria portfolio of cloud management software, and folding it into a new offering called the Tanzu Intelligence Services. Features include:

  • VMware Tanzu with Intelligent Assist: A chatbot to streamline operational workflow (tech preview).
  • VMware Tanzu CloudHealth: Machine learning-based forecasting to improve budget planning, and dynamic Kubernetes rightsizing (both in beta).
  • VMware Tanzu Insights: An observability platform for troubleshooting and resolving issues across distributed Kubernetes, Amazon Web Services (AWS), and Microsoft Azure environments. (Initial availability planned for Q3 FY24).
  • VMware Tanzu Guardrails: Multicloud governance and enforcement through policy-based automation.
  • VMware Tanzu Transformer: Migration assessment and workflows for moving apps to VMware Clouds and public clouds.

Exploring VMware Explore

The Tanzu announcements were only a handful of the new technologies unveiled at the show. The company also discussed a partnership with Nvidia to equip enterprises for large-scale generative AI; an expansion of VMware Cloud, including new virtual private cloud capabilities built on the NSX+ interface to provide full isolation of networking, security and services for multiple tenants on a shared VMware Cloud; the VMware Edge Cloud Orchestrator, which will provide unified management of multiple edge computing nodes; and AI integrations to the Anywhere Workspace platform to streamline employee productivity and reduce exposure to software vulnerabilities.

VMware has conducted research that indicates that 70% of CIOs are now developing new cloud native applications.

The post VMware Expands Tanzu into a Full Platform Engineering Environment appeared first on The New Stack.

]]>
Kubernetes Isn’t Always the Right Choice https://thenewstack.io/kubernetes-isnt-always-the-right-choice/ Mon, 21 Aug 2023 14:02:42 +0000 https://thenewstack.io/?p=22716040

These days, you can encapsulate virtually any application in a container for execution. Containers solve a lot of problems, but

The post Kubernetes Isn’t Always the Right Choice appeared first on The New Stack.

]]>

These days, you can encapsulate virtually any application in a container for execution. Containers solve a lot of problems, but they introduce a new challenge of orchestration. Because of the growing need for container orchestration from a huge number of teams working to build cloud native applications, Kubernetes has gained significant popularity as a powerful tool to solve that challenge.

Building in a well-managed Kubernetes environment offers numerous benefits such as autoscaling, self-healing, service discovery and load balancing. However, embracing the world of Kubernetes often implies more than just adopting container orchestration technology. Teams need to strategically consider, “Is Kubernetes the right choice for my solution?” And they must do so by evaluating several components of this broader question.

Is My Team Composition a Fit for Kubernetes?

There’s no shortage of articles praising the capabilities of Kubernetes (K8s), and that’s not what we aim to dispute. K8s is the right choice in many cases. That said, direct interaction with and maintenance of K8s isn’t appropriate for all teams and projects.

  1. Small startups with cloud native applications: These teams will find direct management of Kubernetes to be a complex, time-consuming distraction from their goal of releasing and scaling a product. Given their size, the teams will not have the bandwidth to manage Kubernetes clusters while also developing their application.
  2. Enterprise teams with a variety of application types: For larger teams with specialist skills, Kubernetes is an excellent choice. However, fully managed container runtimes or Kubernetes-as-a-service offerings should still be considered. These services allow limited DevOps resources to focus on team productivity, developer self-service, cost management and other critical items.
  3. Midsize companies with a DevOps culture: While these teams are more prepared for a move to Kubernetes, it’s a major project that will disrupt existing workflows. Again, managed offerings unlock many benefits of Kubernetes without significant investment.
  4. Software consultancies: While these teams are adaptable, relying on Kubernetes can limit their ability to serve clients with different needs, as it pushes the consultancy toward recommending it even when it’s not the best fit.

How Complex Is My Project? Is K8s Overkill?

Rather than determining whether K8s meets some of your requirements, consider identifying specific characteristics and requirements that do not align well with capabilities of Kubernetes or introduce unnecessary complexity.

  1. Minimal scalability needs: If the project has consistently low traffic or predictable and steady resource demands without significant scaling requirements, Kubernetes will introduce unnecessary overhead. In these cases, managed container runtimes or virtual private server (VPS) solutions typically represent better value.
  2. Simple monolithic applications: If the project is a monolithic application with limited dependencies and doesn’t require independently scalable services or extremely high instance counts, Kubernetes is too complex for its needs.
  3. Static or limited infrastructure: If the project has small or static infrastructure without much variation in resource usage, then simpler deployment options such as managed services or VPS will suffice.
  4. Limited DevOps resources: Kubernetes requires expertise in container orchestration, which is not feasible for projects with limited DevOps resources or if the team is not willing to invest in learning Kubernetes. The benefits of containers can still be achieved without this additional investment.
  5. Prototyping and short-term projects: For projects with short development life cycles or limited production durations, the Kubernetes overhead cannot be justified.
  6. Project cost constraints: If the project has stringent budget constraints, the additional cost of setting up and maintaining a Kubernetes cluster will not be feasible. This is particularly true when considering the cost of the highly skilled team members required to do this work.
  7. Infrastructure requirements: Kubernetes can be resource-intensive, requiring robust infrastructure to run effectively. If your projects are small or medium-sized with modest resource requirements, using managed services or serverless is far more appropriate.

The complexity of your requirements alone won’t determine whether Kubernetes is perfect or excessive for your team; however, it can help you lean one way or the other. If you’re using Kubernetes directly, it won’t inherently elevate your product. Instead, its strength lies in crafting a resilient platform on which your product may thrive.


The consequence is that the more you commit to building your own platform layers beneath your product, the further your development effort shifts away from the product that is the foundation of your business.

This unearths the real question: Are we building a platform or are we trying to expedite our time to market with more immediate return on investment for our core business objectives?

Do We Have the Necessary Skill Set?

Kubernetes is often recognized for its challenging learning journey. What contributes to this complexity? To offer clarity, I’ve curated a list of topics based on specific criteria that help gauge the effort needed to improve one’s skills.

Complexity levels:

  • Basic: Fundamental, easier concepts
  • Intermediate: Concepts needing some pre-existing knowledge
  • Advanced: Complex concepts requiring extensive knowledge

Note: These complexity levels will vary based on individual background and prior experience.

Learning areas:

  • Containerization (Basic): Understanding of containers and tools like Docker.
  • Kubernetes architecture (Intermediate): Knowledge about pods, services, deployments, ReplicaSets, nodes and clusters.
  • Kubernetes API and objects (Intermediate): Understanding the declarative approach of Kubernetes, using APIs and YAML.
  • Networking (Advanced): Understanding of inter-pod communication, services, ingress, network policies and service mesh.
  • Storage (Advanced): Knowledge about volumes, persistent volumes (PV), persistent volume claims (PVC) and storage classes.
  • Security (Advanced): Understanding of Kubernetes security, including RBAC, security contexts, network policies and pod security policies.
  • Observability (Intermediate): Familiarity with monitoring, logging and tracing tools like Prometheus, Grafana, Fluentd and Jaeger.
  • CI/CD in Kubernetes (Intermediate): Integration of Kubernetes with CI/CD tools such as Jenkins and GitLab, and use of Helm charts for deployment.
  • Kubernetes best practices (Intermediate to Advanced): Familiarity with best practices and common pitfalls in the use of Kubernetes.

For teams that lack the necessary expertise or the time to learn, the overall development and deployment process can become overwhelming and slow, which will not be healthy for projects with tight timelines or small teams.

What Are the Cost Implications?

While Kubernetes itself is open source and free, running it is not. You’ll need to account for the expenses associated with the infrastructure, including the cost of servers, storage and networking as well as hidden costs.

The first hidden cost lies in its management and maintenance — the time and resources spent on training your team, troubleshooting, maintaining the system, maintaining internal workflows and self-service infrastructure.

For various reasons, the salaries of the highly skilled employees required for this work are overlooked by many when calculating the cost of a full-blown Kubernetes environment. Be wary of the many flawed comparisons between fully managed or serverless offerings against self-managed Kubernetes. They often fail to account for the cost of staff and the opportunity costs associated with lost time to Kubernetes.

The second hidden cost is tied to the Kubernetes ecosystem. Embracing the world of Kubernetes often implies more than just adopting a container orchestration platform. It’s like setting foot on a vast continent, rich in features and a whole universe of ancillary tools, services and products offered by various vendors, which ultimately introduce other costs.

Conclusion

A good tool is not about its hype or popularity but how well it solves your problems and fits into your ecosystem. In the landscape of cloud native applications, Kubernetes has understandably taken an oversized share of the conversation. However, I encourage teams to consider the trade-offs of different approaches made viable by solutions like OpenShift, Docker Swarm or serverless and managed services orchestrated by frameworks like Nitric.

In a follow-up post, I’ll explore an approach to creating cloud native apps without direct reliance on Kubernetes. I’ll dig into the process of building and deploying robust, scalable and resilient cloud native applications using infrastructure provisioned through managed services such as AWS Lambda, Google Cloud Run and Azure Container Apps.

This approach to developing applications for the cloud was the inspiration for Nitric, the cloud framework we are building that focuses on improving the experience for both developers and operations.

Nitric is an open source multilanguage framework for cloud native development designed to simplify the process of creating, deploying and managing applications in the cloud. It provides a consistent developer experience across multiple cloud platforms while abstracting and automating the complexities involved in configuring the underlying infrastructure.

For teams and projects that find direct interaction and management of Kubernetes unsuitable, whether due to budget constraints, limited resources or skill set, Nitric provides an avenue to harness the same advantages. Dive deeper into Nitric’s approach and share your feedback with us on GitHub.

The post Kubernetes Isn’t Always the Right Choice appeared first on The New Stack.

]]>
Kubernetes 1.28 Accommodates the Service Mesh, Sudden Outages https://thenewstack.io/kubernetes-1-28-accommodates-the-service-mesh-sudden-outages/ Fri, 18 Aug 2023 17:08:26 +0000 https://thenewstack.io/?p=22715797

With its latest release, version 1.28, Kubernetes has formally recognized the service mesh as a first-class citizen in a cluster.

The post Kubernetes 1.28 Accommodates the Service Mesh, Sudden Outages appeared first on The New Stack.

]]>

Planternetes logo

With its latest release, version 1.28, Kubernetes has formally recognized the service mesh as a first-class citizen in a cluster.

K8s v1.28, nicknamed “Planternetes,” is the first release where the API recognizes a service mesh as a special type of init container (the containers needed to initialize a pod).

“Folks have been using the sidecar pattern for a long time,” said Grace Nguyen, who led the v1.28 release. The API will support actions such as updating secrets and logging. You want logging to continue even after the node has been shut down, or before it is spun up, Nguyen said.

To support the service mesh, the API gets an additional field to designate a service mesh and has the policy that the containerized service mesh can remain operational for the lifetime of the pod, unlike a regular init container.
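
Concretely, the new behavior is expressed as a restartPolicy field on an init container, which is alpha in v1.28 behind the SidecarContainers feature gate; the container names and images in this sketch are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  initContainers:
    - name: mesh-proxy
      image: example/mesh-proxy:latest   # placeholder sidecar image
      restartPolicy: Always              # new field: keeps this init container running for the pod's lifetime
  containers:
    - name: app
      image: example/app:latest          # placeholder application image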

The sidecar pattern has been around since Kubernetes itself. A sidecar acts as a networking agent for a Kubernetes application, handling all the traffic in and out, as well as performing checks, monitoring, etc.

Ideally, the service mesh container should be running before the app itself, ensuring all inbound/outbound connections are supported. The service mesh container should also stick around after the app containers are terminated, to manage any remaining traffic. In practice, making this happen has been tricky for service mesh providers like Linkerd and Istio, some of whom have created brittle platform-specific workarounds.

Shut It Down

“Planternetes” is the second release of 2023. It consists of 46 enhancements and a fresh new logo (see above). Twenty of these enhancements are in the early alpha stage, 14 are in beta, and 12 are ready for production usage (“stable”).

Non-graceful shutdown is one such stable feature. A non-graceful shutdown is one in which the kubelet’s Node Shutdown Manager may not detect that a node has become inoperable, due to some underlying hardware failure or the OS freezing up. Now, there is a mechanism to move pods to another node when the original node fails. The StatefulSet provides K8s with the info needed to airlift the pods to a healthier environment.

Those convinced that the culprit of any networking problem is always DNS will be happy to know that Kubernetes configuration for DNS has been expanded. Previously, K8s could only search across six domains, with a maximum of 256 characters in total. Now, the search paths for kubelets have been increased to 32, with a maximum of 2,048 characters.

Enterprises that don’t want to update K8s as quickly as new versions are released will get some breathing room with this release, Nguyen said. Users can now skip up to three new releases of the control plane, instead of two. This means nodes would only have to be updated once a year instead of twice, and still be current with upstream support.

Release Complexities

Nguyen has been on the release team for over two years now, and would not characterize this release of Kubernetes as a major one — most major releases have a lot of deprecation of obsolete features. But there was still a lot of work that went into this release, simply because of the size Kubernetes has grown to.

“I think that we are getting to the point where there are so many features coming in and so many code pull requests that it is hard to keep track of both at the same time,” Nguyen said. A large pull request may come in for a particular feature, but that feature may not be ready for release, and so that code may have to get rolled back.

The post Kubernetes 1.28 Accommodates the Service Mesh, Sudden Outages appeared first on The New Stack.

]]>
Two Ways Incident Responders Can Make Sense of Kubernetes https://thenewstack.io/two-ways-incident-responders-can-make-sense-of-kubernetes/ Thu, 17 Aug 2023 14:28:05 +0000 https://thenewstack.io/?p=22715876

Managing and monitoring Kubernetes (K8s) ecosystems can be a complex and challenging task without the right processes in place. Yet

The post Two Ways Incident Responders Can Make Sense of Kubernetes appeared first on The New Stack.

]]>

Managing and monitoring Kubernetes (K8s) ecosystems can be a complex and challenging task without the right processes in place. Yet it’s something teams will increasingly be asked to do. Gartner predicts that by 2028, 95% of global organizations will be running containerized applications in production, significantly up from fewer than 50% in 2023.

The flexibility offered by K8s provides a powerful platform for organizations to meet the needs of many types of applications. Flexibility, however, breeds complexity, and K8s environments are not immune from increasing complexity as environments encompass more of an organization’s workload.

As Kubernetes expands in popularity, it is important that teams put in place tools and processes to help them manage K8s environments. This will involve ensuring service ownership is in place, as well as process automation that spares first responders from having to untangle and understand complex K8s applications and environments.

Here are two ways first responders can better manage K8s.

Excuse Me Sir, Is This Your K8s Application?

First, it’s important that organizations adopt a service ownership approach. Given the complexity of K8s environments, outages and slowdowns will unfortunately become inevitable.

But incident response teams cannot be expected to understand complex K8s applications or services. They need subject-matter experts to help them navigate this complexity when an incident strikes.

Service ownership ensures those closest to K8s applications and environments are responsible for them throughout the life cycle. It embeds a “you code it, you own it” mindset that empowers developers and engineers to take responsibility for applications and services in production, rather than incident responders who are less knowledgeable about specific applications or services.

The benefits are clear. It creates a far better experience for customers because it puts developers much closer to them and lets developers see the impact of their work. Owning your own code also puts in place an automatic quality control loop, where it is always clear who is responsible for what. Additionally, clear ownership means fewer people need to be drafted in to troubleshoot, and because the service owner acts as first responder, it can significantly reduce mean time to resolution (MTTR).

But there is a cultural shift to navigate here. To drive ownership, you’ll also need organizational buy-in supported by senior managers and a robust change management program. Ultimately, service ownership is the perfect remedy for incident responders expected to address issues with K8s environments.

One-Stop Shop for Diagnostics 

The second piece of effective K8s management is ensuring process automation is in place so that responders need only hit a button to run diagnostics for any K8s-related issues that may be affecting performance.

When dealing with an incident, the last thing responders need is to try and untangle complexity to understand what issues or problems may be affecting K8s applications or services. Many responders have a working knowledge of their organization’s IT environment but lack the specific technical expertise to really understand every single issue that could be affecting K8s uptime.

This typically means they need to call in engineers to help to run diagnostics, identify root cause and remediate incidents.

But this takes time. When an incident strikes, 85% of the duration is spent in diagnosis, involving at least four engineers. These engineers are then manually repeating the same diagnostic steps for multiple incidents, such as running health checks or monitoring CPU and memory caches. Every escalation to senior engineers represents lost focus on innovation, which averages 25% of their time.

With process automation, diagnostics and remediation can be triggered automatically. Incident responders get access to a push-button library of defined diagnostic and remediation actions. This means they can trigger repetitive tasks, such as server restarts or clearing memory caches, and eliminate the need to have engineers do this for them. By enabling incident responders with automation, they can handle more of the incidents that occur in K8s environments and only involve K8s engineers and experts when absolutely necessary. This results in shorter resolution times and fewer disruptive escalations.

Demystifying the Complex

When it comes to managing complex K8s applications and environments, it’s important that teams have the right processes in place to enable quick remediation. Service ownership and process automation are critical to enabling responders to effectively manage incidents and reduce the time spent on manual tasks and escalation.

Providing one-button, low-code/no-code solutions gives teams the tools to run diagnostics, apply fixes and escalate to subject-matter experts as needed. But to enable this, teams need a platform that allows them to manage the full incident life cycle. This operations cloud needs to be able to help teams identify signals in the noise and find critical issues. It can then be used to mobilize the right people at the right time to solve problems, by augmenting teams with automated processes that enhance responders’ ability to triage, diagnose and resolve incidents.

Adopting this essential infrastructure for critical work, especially for teams managing complex K8s applications and environments, can help responders take action on urgent incidents, resolve them faster, reduce IT support costs and eliminate interruptions. This means your organization can resolve unplanned, unstructured, time-sensitive and high-impact issues quickly — and minimize the impact on revenue and reputation.

The post Two Ways Incident Responders Can Make Sense of Kubernetes appeared first on The New Stack.

]]>
Kubernetes for Developers with a Distributed App Runtime https://thenewstack.io/kubernetes-for-developers-with-a-distributed-app-runtime/ Tue, 15 Aug 2023 20:37:25 +0000 https://thenewstack.io/?p=22715675

Dapr is a distributed application runtime that does the busy work by codifying platform requirements through APIs accessed over HTTP

The post Kubernetes for Developers with a Distributed App Runtime appeared first on The New Stack.

]]>

Dapr is a distributed application runtime that does the busy work by codifying platform requirements through APIs accessed over HTTP or gRPC. It makes Kubernetes accessible for developers by, in many respects, creating a separation of concerns. Dapr is an open source project under the Cloud Native Computing Foundation. The Dapr project was originally created at Microsoft. Diagrid was founded by members of that founding team to create new offerings based on Dapr as well as continue to lead the development of Dapr with other contributing organizations and the community.

“We thought, ‘Hey, how can we do this better?'” said Mark Fussell, founder and CEO of Diagrid and Dapr’s co-creator, during a demo with The New Stack. Diagrid provides a service, Diagrid Conductor, that manages Dapr on Kubernetes.

“It helps you operate and manage that on top of Kubernetes,” Fussell said about Conductor. “So think of it as a managed version of Dapr on Kubernetes.”

Communication, state management, managing secrets — even things like workflows get codified with APIs in Dapr. Dapr users can use any language of their choice.

“I mean, Kubernetes has got nothing to do with developers,” Fussell said. “And yet they’re told to build on top of it all. So we’ve seen by using these APIs, developers are anywhere between 20 and 40% more productive building their applications.”

Dapr runs on a sidecar model. A developer writes their code, and Dapr appears as a sidecar when the application runs. It does the heavy lifting for the developer to reach their goals.

In the demo, Fussell walked through the API calls using a shopping cart application that makes a localhost call to the Dapr sidecar running on port 3500. An invoke method gets called for an order method, and Dapr then handles the discovery, the secure call with retries and the other requirements.

Using a sidecar model comes down to a separation of concerns. Libraries that would otherwise be bundled into the application can be updated separately. Also, with this separation, upgrading to a new version of Dapr does not require updating the application code.

“Often, when you pull all these runtimes together into one application, you don’t know whether it’s the runtime or your application that’s having issues,” Fussell said. “So the separation of concerns, particularly in a distributed application environment, means that you can update things independently, resolve issues independently.”

And the latency is almost minuscule, he said. It takes milliseconds for service invocation.

“So you get great performance as well as separation of concerns. So yeah, so that’s what Dapr is,” Fussell said.

Fussell said that the concept of components reflects a core aspect of the APIs. It’s how the infrastructure gets integrated into the application. State gets put into state stores. A shopping cart application’s state store, for example, can be swapped between different cloud services, or from a local development machine to the cloud, thereby providing code portability.
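
Components are defined declaratively in YAML. This is a sketch of what a Redis-backed state store component might look like (the names and connection details are placeholders); swapping clouds means swapping this file, not the application code:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore               # the name the application addresses, for example via /v1.0/state/statestore
spec:
  type: state.redis              # change this, and the metadata, to point at a managed cloud store
  version: v1
  metadata:
    - name: redisHost
      value: localhost:6379      # placeholder connection details
    - name: redisPassword
      value: ""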

“It gives us dynamic behavior about how you tie infrastructure to your application without having to pull in a specific SDK or throw away your code and having to learn these things,” Fussell said.
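A minimal sketch of what such a component looks like, assuming a Redis state store reachable inside the cluster (the names, namespace and connection details here are illustrative, not from the demo):

# Illustrative Dapr state store component; swapping spec.type (and its metadata)
# is what lets the same application code move between backing services.
cat <<EOF | kubectl apply -f -
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
  namespace: default
spec:
  type: state.redis
  version: v1
  metadata:
  - name: redisHost
    value: redis-master.default.svc.cluster.local:6379
  - name: redisPassword
    value: ""
EOF

Pointing the application at a different store then becomes a component change rather than a code change.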

The Dapr technology reflects what we heard several years ago when technologists said that Kubernetes would eventually disappear. Dapr makes Kubernetes accessible to developers. They still know Kubernetes sits underneath, but they do not need to understand its inner workings the way a team did in 2016, when Kubernetes offered little to developers directly. Dapr allows developers to work on applications across cloud services, a dynamic now seen more often across the cloud native landscape.

The post Kubernetes for Developers with a Distributed App Runtime appeared first on The New Stack.

]]>
Managing Kubernetes Clusters for Platform Engineers https://thenewstack.io/managing-kubernetes-clusters-for-platform-engineers/ Thu, 10 Aug 2023 18:31:17 +0000 https://thenewstack.io/?p=22715381

Managing Kubernetes clusters gets tricky for platform engineers, but reusability and scalability have become more feasible. has the notion of

The post Managing Kubernetes Clusters for Platform Engineers appeared first on The New Stack.

]]>

Managing Kubernetes clusters gets tricky for platform engineers, but reusability and scalability have become more feasible.

Spectro Cloud has the notion of a cluster profile in a UI that embodies repeatability so a user can apply a configuration in any cloud, anywhere, but still keep the same profile, whether for the cluster, the application, or a virtual cluster.

Spectro Cloud simplifies the process through its Palette service, which allows platform engineers to decouple the cluster from its configuration, said Nic Vermandé, Senior DevRel Manager at Spectro Cloud. Palette provides a platform for managing Kubernetes environments in the cloud or a data center.

Virtual clusters distinguish the Palette service. In the demo, Vermandé explained how to provision the Kubernetes cluster from the existing cluster in the Palette environment. In the workflow, platform engineers create a new cluster with Palette in the cloud or on-premise. Out of that, developers receive virtual clusters to deploy the application in the cloud or on-premise.

The actual cluster in Palette uses Loft Vcluster, which guarantees isolation. Vcluster is an open source project that allows engineers to spin up lightweight, virtual Kubernetes clusters that run inside the namespaces of an underlying Kubernetes cluster. In doing so, Vcluster provides a more resource-efficient manner of spinning up multiple clusters and provides a form of soft multitenancy without handing out administrator credentials to more users than needed, writes Mike Melanson in The New Stack.
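Palette drives this for you, but the underlying open source workflow is worth seeing. A minimal sketch with the vcluster CLI, assuming it is installed and your kubeconfig points at the host cluster (names are illustrative):

# Create a virtual cluster inside a namespace of the host cluster, then connect to it;
# subsequent kubectl commands target the virtual cluster rather than the host.
vcluster create dev-vc --namespace team-a
vcluster connect dev-vc --namespace team-a
kubectl get namespaces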

By using virtual clusters, Palette offers stronger isolation than softer constructs such as Kubernetes namespaces, Vermandé said.

The demo shows how Palette allows developers to deploy applications to a sandbox environment. Building on the same concept as the cluster profile, developers get an application profile, this time defining the app declaratively, and then deploy it to their sandbox.

The demo shows what fundamentally defines Spectro Cloud's approach to serving the platform engineer: an environment that decouples cluster targets from their configuration.

For example, users may set up a profile for a cloud service such as Google Cloud, Vermandé said. A user may add layers to the cluster and then assign the particulars, such as the region for deployment, with features for day-to-day management.

A user may see all the various workloads and deployment pods. They may schedule scans of the cluster and include pen testing and other capabilities.

The concept behind Palette is to manage day-to-day operations, from deploying the cluster to managing the cluster to creating isolation and deploying applications. It uses the same methodology, the same language, and open source-based software that already exists, Vermandé said.

Palette reflects how cloud native technology vendors are raising the bar on the experience they offer enterprise users, managing open source in a way that serves the customer. For instance, Spectro Cloud engineers contribute to upstream open source projects.

“I would say the key thing here is two things: reusability, and then scalability,” Vermandé said.

The post Managing Kubernetes Clusters for Platform Engineers appeared first on The New Stack.

]]>
Unleashing the Power of Kubernetes Application Mobility https://thenewstack.io/unleashing-the-power-of-kubernetes-application-mobility/ Thu, 10 Aug 2023 12:00:29 +0000 https://thenewstack.io/?p=22715233

In a previous article for The New Stack, I discussed the challenges and benefits of cloud native application portability. Portable

The post Unleashing the Power of Kubernetes Application Mobility appeared first on The New Stack.

]]>

In a previous article for The New Stack, I discussed the challenges and benefits of cloud native application portability.

Portable applications are good for hot backups, multicloud load balancing, deploying applications to new environments and switching from one cloud to another for business reasons.

However, such portability is difficult, because Kubernetes applications consist of ephemeral microservices, configurations and data. Kubernetes also handles state information in an abstracted way, since microservices are generally stateless.

It is therefore important to understand the value of Kubernetes application mobility. At first glance, application mobility appears to be synonymous with application portability, especially in the Kubernetes context.

If we look more closely, however, there is an important distinction, a distinction that clarifies how organizations can extract the most value out of this important feature.

Application Migration, Portability and Mobility: A Primer

Application migration, portability and mobility are similar but distinct concepts. Here are the differences.

  • Application migration means moving either source code or application binaries from one environment to another, for example, from a virtual machine instance to one or more containers.
  • Cloud native application portability centers on moving microservices-based workloads running on different instances of Kubernetes.
  • Cloud native application mobility, the focus of this article, means ensuring that the consuming applications that interact with microservices work seamlessly regardless of the locations of the underlying software, even as workloads move from one environment to another.

Application portability supports application mobility but is neither necessary nor sufficient for it.

There are many benefits of application mobility, including cloud-service provider choice, revenue analyses and risk profile management. For Kubernetes in particular, application mobility is a valuable data management tool for near real-time analyses and performance evaluation.

As customer use drives the demands for an application, application owners can optimize the mix of cloud environments for each application and risk management system.

The impact of application mobility is its strategic value to short- and long-term planning and operational efforts necessary to protect a Kubernetes application portfolio across its life cycle.

Four Cloud Native Application Mobility Scenarios

For Kubernetes data management platform vendor Kasten by Veeam, application mobility serves four important use cases: cross-cloud portability, cluster upgrade testing, multicloud balancing and data management via spinning off a copy of the data.

Cross-cloud portability and cluster upgrade testing are clear examples of application portability supporting application mobility, where mobility provides seamless behavior for consuming applications while those applications are ported to other clouds or to upgraded clusters, respectively.

In Kubernetes, containerized applications are independent from the underlying infrastructure. This independence allows for transfer across a variety of platforms, including on-premises, public, private and hybrid cloud infrastructures.

The key metric for Kubernetes application portability is the mean time to restore (MTTR) — how fast an organization can restore applications from one cluster to another.

Cluster upgrade testing is crucial for business owners who want to manage Kubernetes changes by predictably migrating applications to an upgraded cluster. The ability to catch and address upgrade-related issues as part of a normal operating process is imperative.

The key metric for cluster upgrade testing is the ability to catch important changes before they become a problem at scale so that the organization can address the problems, either by restoring individual components or the entire application.

Multicloud load balancing is an example of application mobility that doesn’t call upon portability, as an API gateway directs traffic and handles load balancing across individual cloud instances. In fact, API gateways enable load balancing across public and private clouds and enable organizations to manage applications according to the business policies in place.

The key metrics for multicloud load balancing center on managing cost, risk and performance in real time as the load balancing takes place.

Finally, data management leverages portability to support application mobility. An organization might use a copy of production data to measure application performance, data usage or other parameters.

Such activities depend on seamless behavior across live and copied data, behavior that leverages application mobility to spin data to an offline copy for both data analysis and data protection once an application or service has gone into production.

Key metrics for data management include measures of live application and service data performance, data usage and other characteristics of the current application data set.

The Intellyx Take

The distinction between Kubernetes application portability and mobility is subtle, but important.

Portability is, in essence, one layer of abstraction below mobility, as it focuses on the physical movement of application components or workloads.

Application mobility, in contrast, focuses on making the consumption of application resources location-independent, allowing for the free movement of those consumers as well as the underlying resources.

Given that Kubernetes is infrastructure software, such consumers are themselves applications that may or may not directly affect the user experience. Furthermore, the workloads running on that infrastructure are themselves abstractions of a collection of ephemeral and persistent elements.

Workloads may move, or they may run in many places at once, or they may run in one place and then another, depending on the particular use case. When consuming applications are none the wiser, the organization can say that they have achieved application mobility.

The post Unleashing the Power of Kubernetes Application Mobility appeared first on The New Stack.

]]>
Aqua Security Uncovers Major Kubernetes Attacks https://thenewstack.io/aqua-security-uncovers-major-kubernetes-attacks/ Wed, 09 Aug 2023 14:42:45 +0000 https://thenewstack.io/?p=22715296

Aqua Security, a leading cloud native security figure, has unveiled alarming findings after a three-month investigation by its research team,

The post Aqua Security Uncovers Major Kubernetes Attacks appeared first on The New Stack.

]]>

Aqua Security, a leading cloud native security figure, has unveiled alarming findings after a three-month investigation by its research team, Aqua Nautilus. The study revealed that Kubernetes clusters of over 350 entities, including Fortune 500 companies, open source projects, and individuals, were left exposed and vulnerable.

An insanely high 60% of examined clusters had been compromised, with malware and backdoors actively deployed. The vulnerabilities stemmed from two primary misconfigurations, highlighting the dangers of both known and overlooked misconfigurations.

Wrong Hands

Assaf Morag, Aqua Nautilus’s lead threat intelligence analyst, stressed the gravity of the situation, stating, “Access to a company’s Kubernetes cluster in the wrong hands could spell the end. Everything from proprietary code, customer data, financial records, to encryption keys is at risk.”

For example, Aqua found that the Kubernetes cluster was often part of the organization’s Software Development Life Cycle (SDLC). Therefore, the Kubernetes cluster also had access to Source Code Management (SCM), Continuous Integration/Continuous Deployment (CI/CD), registries, and the Cloud Service Provider. In short, everything and the kitchen sink.

Since Kubernetes has become the default platform for managing containers, numerous businesses use it to manage containerized applications efficiently. Now, if only they knew how to secure it! Morag added, “Despite the availability of Kubernetes security tools such as Aqua’s Software Supply Chain Security suite, misconfigurations remain rampant across all organization sizes. The potential damage from these vulnerabilities is immense.”

You think?

Indifference?

So, what did the violated groups have to say about all these problems? The initial response from the affected cluster owners was indifference. Many dismissed their hacked clusters as mere “testing environments.” Denial is not just a river in Egypt.

They should be concerned. Three different cryptocurrency mining operators are primarily using the breached Kubernetes clusters. These are TeamTNT’s Silentbob campaign; the role-based access control (RBAC) Buster campaign; and yet another Dero campaign. Maybe your business can afford to waste compute on mining cryptocurrency. Mine can’t.

So, how are these crypto miners breaking in?

Number one with a bullet is that unauthenticated requests to the cluster are often enabled by default. That means anyone can send requests to, and get responses from, a Kubernetes cluster. Most cloud providers’ default configuration for the API server is to make it Internet accessible to anyone.

OK, that’s bad, but it’s not a showstopper. Still, since anyone can get answers from the API server, that means they can list all the secrets stored in the distributed key-value store etcd. And if you include secrets within environment variables, such as links to other environments or credentials for Docker Hub, your cloud service provider, GitHub, GitLab, Bitbucket and so on, an attacker can harvest those too. If you store secrets like this, you should block anonymous users from any access to your cluster.

Generally speaking, the anonymous user has no other permissions. But far too many administrators give privileges to the anonymous user. Don’t ask me why. This is just asking for trouble.

Aqua reports it has seen “cases in which practitioners bind the anonymous user role with other roles, often with admin roles.” From there, it’s a very short step indeed to attackers gaining unauthorized access to the Kubernetes cluster. As Aqua puts it, “you’re only one YAML away from disaster.”
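If you want to check where your own clusters stand, here is a hedged starting point, assuming you have cluster-admin access (and, for the API server flag, control over the control plane):

# List what the anonymous user is allowed to do; ideally this returns next to nothing.
kubectl auth can-i --list --as=system:anonymous

# On self-managed control planes, anonymous requests can be disabled outright
# with the kube-apiserver flag --anonymous-auth=false.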

The other common misconfiguration is how the “kubectl proxy” command is set up. When you run “kubectl proxy”, you are forwarding authorized and authenticated requests to the API server.

So, for example, when you run the same command with the flags “--address=0.0.0.0 --accept-hosts='.*'”, your workstation’s proxy will listen for and forward authorized and authenticated requests to the API server from any host that has HTTP access to the workstation. Mind you, those requests carry the same privileges as the user who ran the “kubectl proxy” command. Whoops.
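The difference between the safe default and the risky invocation is easy to see side by side (a sketch, best tried only against a test cluster):

# Default: binds to 127.0.0.1 and only serves local requests.
kubectl proxy

# Risky: serves authenticated requests to anyone who can reach the workstation,
# with the privileges of the current kubeconfig user.
kubectl proxy --address=0.0.0.0 --accept-hosts='.*'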

So what can you do? Lock your systems down. Specifically, Aqua Nautilus recommends using native Kubernetes features like RBAC and admission control policies to enhance security. Regular audits and open source tools like Aqua Trivy, Aqua Tracee, and Kube-Hunter can help with real-time threat detection and prevention.

Really, this is all pretty simple, straightforward security stuff. Unfortunately, all too many of us are still ignoring security 101. Aqua reminds us that we can’t afford to ignore it.

The post Aqua Security Uncovers Major Kubernetes Attacks appeared first on The New Stack.

]]>
Install Cloud Foundry Korifi on Google Kubernetes Engine https://thenewstack.io/install-cloud-foundry-korifi-on-google-kubernetes-engine/ Tue, 01 Aug 2023 17:00:47 +0000 https://thenewstack.io/?p=22713213

Managed Kubernetes clusters are very popular among software developers. They look to managed providers for a variety of reasons, the

The post Install Cloud Foundry Korifi on Google Kubernetes Engine appeared first on The New Stack.

]]>

Managed Kubernetes clusters are very popular among software developers.

They look to managed providers for a variety of reasons, the chief being simplified management, improved reliability, and controlling costs.

Of the many providers available, Google Cloud is rather popular among software developers and is a good choice as an infrastructure provider.

Google Kubernetes Engine (GKE) is the managed Kubernetes service provided by Google Cloud, but developers need more than GKE alone to manage their apps.

Cloud Foundry is a powerful Platform-as-a-Service (PaaS) that can help software developers build, deploy, and manage applications more easily and securely. It installs on any cloud-based infrastructure and transforms it into a multitenant, self-service, and consumable resource. It is built with the goal of helping developers focus on building applications — and not managing infrastructure.

Cloud Foundry was originally built for use on VM-based compute. For over a decade, Cloud Foundry has been successfully implemented for planet-scale workloads. Now, this same abstraction is being made available for Kubernetes-based workloads, too. Cloud Foundry Korifi is an implementation of the Cloud Foundry API built on top of Kubernetes-native custom resources.

Korifi is designed to install on any infrastructure provider. Managed Kubernetes offerings simplify this greatly, and Korifi has additionally been tested on kind, k3s, and other Kubernetes flavors for local development. Apps written in any language or framework can be deployed using Korifi.

This tutorial will show you how to install the Cloud Foundry Korifi on Google Kubernetes Engine.

Prerequisites

Please install the following tools to start: the gcloud CLI, kubectl, Helm and the cf CLI. Each is used at some point in this tutorial.

Installation Steps

The first step is to create a Kubernetes cluster. When using Google Kubernetes Engine, we found that creating a cluster using “Autopilot” does not work because it conflicts with some of the networking configuration required for Korifi. Please choose the “Standard” mode to configure the cluster. You can create the cluster from the command line with a command along the following lines:
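The command below is a sketch; the cluster name, region, node count and machine type are placeholders rather than the values used in the original walkthrough.

# Placeholder values throughout; adjust to your project, region and sizing needs.
gcloud container clusters create korifi-demo \
  --region us-central1 \
  --num-nodes 1 \
  --machine-type e2-standard-4 \
  --release-channel regular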

This gcloud command uses the Google Cloud SDK to create a Google Kubernetes Engine (GKE) cluster with a specific configuration. Next, we will install the following dependencies: cert-manager, kpack, and Contour.

Cert-Manager is an open source certificate management solution designed specifically for Kubernetes clusters. It can be installed with a single kubectl apply command, with the latest release referenced in the path to the YAML definition.

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml


Kpack is an open source project that integrates with Kubernetes to provide a container-native build process. It consumes Cloud Native Buildpacks to export OCI-compatible containers. Kpack can be installed by using the kubectl apply command, passing the YAML containing the declaration of the latest release.

kubectl apply -f https://github.com/pivotal/kpack/releases/download/v0.11.0/release-0.11.0.yaml


Contour is an open source ingress controller for Kubernetes that is built on top of the Envoy proxy. An ingress controller is a Kubernetes resource that manages the inbound network traffic to services within a cluster. It acts as a gateway and provides external access to the services running inside the cluster. Contour specifically focuses on providing advanced features and capabilities for managing ingress in Kubernetes.

kubectl apply -f https://projectcontour.io/quickstart/contour.yaml


Once Contour is installed on the Kubernetes cluster, it will provision an external-facing IP address, which is what allows us to access the cluster. Query the Contour service to ascertain the IP address that will be mapped for ingress into the cluster. The following command will help with that:

kubectl get service envoy -n projectcontour -ojsonpath='{.status.loadBalancer.ingress[0]}'


The output from this command will be an IP address, e.g. {“ip”: “34.31.52.175”}, which will be used at various places as the base domain, suffixed with nip.io.

The installation requires a container registry to function. For this installation, we will be using Google Artifact Registry. In order to access this container registry, a secret will have to be created and configured. The command for creating the registry credentials is as follows:

kubectl --namespace "cf" create secret docker-registry image-registry-credentials
--docker-username="<docker-hub-username>"
--docker-password "<docker-hub-access-token>"


For this installation, the following values will have to be used:

kubectl --namespace "cf" create secret docker-registry image-registry-credentials --docker-server="us-central1-docker.pkg.dev" --docker-username="_json_key" --docker-password "$(awk -v RS= '{$1=$1}1' ~/Downloads/summit-labs-8ff7123608fe.json)"


Once the secret has been created, use the following Helm chart to install Korifi on the GKE cluster.

helm install korifi https://github.com/cloudfoundry/korifi/releases/download/v0.7.1/korifi-0.7.1.tgz --namespace="korifi" \
--set=global.generateIngressCertificates=true --set=global.rootNamespace="cf" \
--set=global.containerRegistrySecret="image-registry-credentials" --set=adminUserName="riyengar@cloudfoundry.org" \
--set=api.apiServer.url="api.34.31.52.175.nip.io" --set=global.defaultAppDomainName="apps.34.31.52.175.nip.io" \
--set=global.containerRepositoryPrefix="us-central1-docker.pkg.dev/summit-labs/korifi/korifi-" \
--set=kpackImageBuilder.builderRepository="us-central1-docker.pkg.dev/summit-labs/korifi/kpack-builder" --wait


Note: We use nip.io as the suffix for the externally available IP address that can reach the cluster. Nip.io is a wildcard DNS provider.

Once installation is completed, use the cf cli to set the API endpoint and log in to the cluster.

cf api https://api.34.31.52.175.nip.io --skip-ssl-validation

cf login


The following commands can be used to test the installation.

cf create-org active
cf target -o active
cf create-space -o active ant
cf target -o active -s ant
cf push mighty-monkey -p ~/sandbox/korifi/tests/smoke/assets/test-node-app/

Where to Begin?

Cloud Foundry Korifi is available as a fully open source project. If you’re looking for a way to:

  • simplify the deployment experience for your application developers;
  • attain operational excellence when using Kubernetes clusters.

Then Korifi is a great tool to have in your arsenal. You can go through some basic Korifi tutorials on the official Cloud Foundry page, in addition to this one, to learn the best way to get started.

The post Install Cloud Foundry Korifi on Google Kubernetes Engine appeared first on The New Stack.

]]>
The Future of the Enterprise Cloud Is Multi-Architecture Infrastructure https://thenewstack.io/the-future-of-the-enterprise-cloud-is-multi-architecture-infrastructure/ Fri, 28 Jul 2023 17:00:16 +0000 https://thenewstack.io/?p=22714087

After years of steady declines, the costs of running a cloud data center are now soaring, due to various factors

The post The Future of the Enterprise Cloud Is Multi-Architecture Infrastructure appeared first on The New Stack.

]]>

After years of steady declines, the costs of running a cloud data center are now soaring, due to various factors such as aging infrastructure, rising energy costs and supply chain issues.

A survey by Uptime Institute showed that enterprise data-center owners are most concerned about rising energy costs, followed by IT hardware costs. In a recent example, Google announced increases in some of its cloud storage and network egress prices, some of which had previously been free to users.

In short, “cloud-flation” is a new reality, and developers are paying a price by having to do more with less. Organizations need to continuously understand and measure the impact of their compute to balance performance, efficiency, and design flexibility in line with budget and business goals.

What are some steps developers can take to reduce costs?

There are a few pieces of low-hanging fruit, including:

  • Optimize the allocation of cloud resources by analyzing usage patterns and adjusting the size of instances, storage, and databases to match the workload’s requirements.
  • Implement auto-scaling mechanisms that dynamically adjust the number of instances based on demand. This ensures resources are provisioned as needed, preventing over-provisioning during low-traffic periods and reducing costs.
  • Evaluate your data storage and database needs and choose the most cost-effective options. Use tiered storage options to move infrequently accessed data to lower-cost storage tiers.

Many companies have adopted cost monitoring and reporting, creating alerts to notify their teams of sudden spikes or anomalies. One significant movement in this direction is around FinOps, a concatenation of “finance” and “DevOps.” This emerging cloud financial management discipline enables organizations to help teams manage their cloud costs. Another example is Amazon Web Services, with its cost-optimization guide pointing to Arm-based Graviton as a way to improve price performance.

Embrace the New Paradigm  

Perhaps the most important piece of fruit to be harvested, however, is to think about computing differently. And this is easier said than done.

Data centers and the cloud grew up on a single, monolithic approach to computing — one size fits all. That worked in the days when workloads were relatively few and very straightforward. But as cloud adoption exploded, so too have the number and types of workloads that users require. A one-size-fits-all environment just isn’t flexible enough for users to be able to run the types of workloads they want in the most effective and cost-efficient manner.

Today, technologies have emerged to overthrow the old paradigm and give developers and cloud providers what they need: flexibility and choice. One of its manifestations is multi-architecture, the ability of a cloud platform or service to support more than one processor architecture and offer developers the flexibility to choose.

The Liberty of Flexibility and Choice

Flexibility to run workloads on the architecture of your choice is important for organizations for two reasons: better price performance and — a reason that is far downstream from data centers but nevertheless important — laptops and mobile devices.

The price-performance motivation often arises when organizations realize that running workloads, like web servers or databases, on Arm could be more cost-effective, either for themselves or in response to customer demand. Remember those statistics I shared earlier? They’re a big motivation in this context. And this is why the flexibility to choose the right compute for the right workload is critical.

Arm has a legacy of delivering cost-effective, power-efficient computing solutions for mobile technologies for over 30 years. During those years, other sectors, including infrastructure, have embraced these benefits. Today every major public cloud provider runs Arm in some form for various workloads. For example, 48 of the top 50 AWS customers run on Arm Neoverse-based AWS Graviton. The cost benefits of deploying an Arm-based server compared to a traditional one are significant.

The second motivation relates to laptops, which are increasingly being run by power-efficient Arm processors. Developers using these machines began wanting to develop on Arm all the way from their laptop into the cloud. They’re embracing Arm64 for their production and development environments because it makes it easier to troubleshoot and reproduce bugs locally earlier in the development process. And with Arm-based processors now available in every major cloud,  Arm-native developers need a multi-arch aware toolchain to safely deploy their code.

Doing the Work

We see three main steps to adopting a multi-arch infrastructure: inform, optimize and operate:

  • Inform involves taking an inventory of your entire software stack, including finding the operating systems, images, libraries, frameworks, deployment and testing tools, monitoring solutions, security measures and other components you rely on. Make a comprehensive list and check each item for Arm support. Additionally, identify the most resource-intensive components in terms of compute, as these will be your hotspots for optimization.
  • Optimize allows you to provision a test Arm environment easily. You can spin it up on a public cloud and proceed to make necessary upgrades, changes and conditional statements to ensure compatibility with different architectures. It is crucial to determine the key metrics you care about and conduct performance testing accordingly. Simultaneously, consider upgrading your CI/CD processes to accommodate more than one architecture. Keep in mind that this stage may require many iterations and that you can start migrating workloads before completing all infrastructure upgrades.
  • Operate within your chosen environment. For instance, in Kubernetes, decide how you will build your cluster. Consider whether to prioritize migrating control nodes or worker nodes first, or opting for a mixture of both. This decision will depend on your software stack, availability and initial workload choices. Modify your cluster creation scripts accordingly.

With the infrastructure in place, you can proceed to deploy your workloads.
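As one concrete piece of the optimize step's CI/CD work, container images need to be published for both architectures. A minimal sketch follows; the image name and deployment are placeholders, and the build assumes a buildx builder with multiplatform support.

# Build and push one image tag containing both amd64 and arm64 variants.
docker buildx build --platform linux/amd64,linux/arm64 -t registry.example.com/myapp:1.0 --push .

# During migration, pin a workload to Arm nodes using the standard architecture label.
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"arm64"}}}}}'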

Conclusion

A few years ago, migrating to Arm meant hunting for specific versions of software and libraries that support Arm. This support was limited and uneven. But over time, the ecosystem matured quickly, making the migration easier and ensuring software on Arm “just works.” Some notable vendors, such as Redis, MongoDB and Nginx, have come out with Arm support in recent years. This improved ecosystem support contributes to the ease of migration.

The high-performance, energy-efficient benefits of transitioning away from running only on legacy architectures are such that big companies like Airbnb are undergoing this migration, even though they know it’s a multi-year journey for them. Airbnb knows they have a complex infrastructure that assumes everything is x86, so they need to adjust and test all their systems to ensure compatibility with both x86 and Arm architectures. In Airbnb’s case, the long-term benefits are worth the up-front costs and time investment.

Melanie Cebula, a software engineer at Airbnb, put it this way at a 2022 KubeCon-CloudNativeCon presentation: “So Arm64 is in the cloud. It’s in all the clouds, and it’s cheaper, and that’s why we’re doing the work.”

Further solidifying these points, Forrester’s Total Economic Impact study, including interviews and survey data, revealed a cloud infrastructure cost savings of up to 80%, with 30% to 60% lower upfront infrastructure costs.

The key to accelerating and unlocking even more innovation in the world today is for workloads to run on the best hardware for the user’s price-performance needs. Arm technology continues to push the boundaries of the performance-efficiency balance without the developers having to worry about whether their software is compatible.

The post The Future of the Enterprise Cloud Is Multi-Architecture Infrastructure appeared first on The New Stack.

]]>
The Kubernetes Inner Loop with Cloud Foundry Korifi https://thenewstack.io/the-kubernetes-inner-loop-with-cloud-foundry-korifi/ Wed, 26 Jul 2023 10:00:16 +0000 https://thenewstack.io/?p=22713215

Certain developer workflows can be tedious. One example is working locally with containers. It brings to mind an old XKCD

The post The Kubernetes Inner Loop with Cloud Foundry Korifi appeared first on The New Stack.

]]>

Certain developer workflows can be tedious. One example is working locally with containers. It brings to mind an old XKCD comic.

When working locally, the process of building and deploying containers can hinder the development experience and have a negative impact on team productivity. The industry refers to local workflows for developers as the “Inner Loop.” Cloud native development teams can greatly benefit from a reliable inner development loop framework. These frameworks facilitate the iterative coding process by automating repetitive tasks such as code building, containerization, and deployment to the target cluster.

Key expectations for an inner dev loop framework include:

  • Automation of repetitive steps, such as code building, container creation, and deployment to the desired cluster;
  • Seamless integration with both remote and local clusters, while providing support for local tunnel debugging in hybrid setups;
  • Customizable workflows to enhance team productivity, allowing for the configuration of tailored processes based on team requirements.

Cloud native applications introduce additional responsibilities for developers, including handling external dependencies, containerization, and configuring orchestration tools such as Kubernetes YAML. These tasks increase the time involved in deployments and also introduce toil — ultimately hindering productivity.

In this tutorial, you will learn how to simplify inner-loop development workflows for your software development teams by making use of a Cloud Foundry abstraction over kind clusters. This abstraction, named Korifi, is fully open source and is expected to work for all languages and frameworks. Using Cloud Foundry Korifi, developers can push their source code to a cluster, and the PaaS will return a URL/endpoint that the developer can then use to access the application or API.

Using ‘kind’ for Local Kubernetes

In the context of Kubernetes, a kind cluster refers to a lightweight, self-contained, and portable Kubernetes cluster that runs entirely within a Docker container. It is primarily used for local development and testing purposes. The main characteristics of kind clusters that make them suitable for local development are the following.

  • Kind clusters consume fewer system resources compared to running a full-scale Kubernetes cluster.
  • Kind clusters eliminate the need for complex cluster provisioning and configuration, making it easier to bootstrap a cluster.
  • By matching the desired specifications, including the version of Kubernetes, network settings, and installed components — kind clusters provide a way to replicate production-like Kubernetes environments locally.

Installing Korifi on kind Clusters

First, please install the following tools before commencing the tutorial: kind (which requires a working Docker installation), kubectl, Helm and the cf CLI. They’re all required at various stages of the process.

Set the following environment variables.

export ROOT_NAMESPACE="cf"

export KORIFI_NAMESPACE="korifi-system"

export ADMIN_USERNAME="kubernetes-admin"

export BASE_DOMAIN="apps-127-0-0-1.nip.io"


Note: nip.io is a wildcard DNS for any IP address. It is powered by PowerDNS with a simple, custom PipeBackend written in Python. In this particular case, apps-127-0-0-1.nip.io will resolve to 127.0.0.1, which will direct requests to the localhost.

Use the following configuration to create the kind cluster. The extraPortMappings field maps additional ports between the container and the host machine. Here, it specifies that container ports 80 and 443 should be mapped to the same host ports 80 and 443 respectively using TCP.
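The configuration file itself is not reproduced here; a minimal sketch matching that description (the cluster name is a placeholder) might look like this:

# Write a kind configuration that maps host ports 80 and 443 into the node container,
# then create the cluster from it.
cat > kind-config.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
EOF
kind create cluster --name korifi --config kind-config.yaml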

Create the root namespaces that will be used in the cluster; the namespace definitions also include labels for pod security policy enforcement.
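The manifest is not shown here either; a sketch that creates the two namespaces from the environment variables above, with restricted pod security labels, might look like this:

# Create the root and Korifi namespaces with a restricted pod security label.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: $ROOT_NAMESPACE
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: $KORIFI_NAMESPACE
  labels:
    pod-security.kubernetes.io/enforce: restricted
EOF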

Install the following dependencies: cert-manager, kpack, and Contour.

Cert manager is installed with a single kubectl apply command, with the latest release referenced in the path to the yaml definition.

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml


Cert manager is an open source certificate management solution designed specifically for Kubernetes clusters.

Kpack is installed with a single kubectl apply command, with the latest release referenced in the path to the yaml definition.

kubectl apply -f https://github.com/pivotal/kpack/releases/download/v0.11.0/release-0.11.0.yaml


Kpack is an open source project that integrates with Kubernetes to provide a container-native build process. It consumes Cloud Native Buildpacks to export OCI-compatible containers.

Contour is an open source Ingress controller for Kubernetes that is built on top of the Envoy proxy. An Ingress controller is a Kubernetes resource that manages the inbound network traffic to services within a cluster. It acts as a gateway and provides external access to the services running inside the cluster.

kubectl apply -f https://projectcontour.io/quickstart/contour.yaml


Contour specifically focuses on providing advanced features and capabilities for managing ingress in Kubernetes.

The installation requires a container registry to function. When using Korifi on a local kind cluster, the use of Docker Hub is recommended. In order to access this container registry, a secret will have to be created and configured.
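The secret creation command is not reproduced in the original; a sketch using Docker Hub credentials (substitute your own username and access token) looks like this:

# Registry credentials Korifi uses to push app images; values are placeholders.
kubectl --namespace "$ROOT_NAMESPACE" create secret docker-registry image-registry-credentials \
  --docker-server="https://index.docker.io/v1/" \
  --docker-username="<docker-hub-username>" \
  --docker-password="<docker-hub-access-token>"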

Use the following Helm chart to install Korifi on the kind cluster.
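The chart invocation is not shown in the original; a sketch that mirrors the GKE install earlier in this feed, adapted for the local environment variables and Docker Hub (the value names and repository prefixes are assumptions based on that command), might look like this:

helm install korifi https://github.com/cloudfoundry/korifi/releases/download/v0.7.1/korifi-0.7.1.tgz \
  --namespace="$KORIFI_NAMESPACE" \
  --set=global.generateIngressCertificates=true \
  --set=global.rootNamespace="$ROOT_NAMESPACE" \
  --set=global.containerRegistrySecret="image-registry-credentials" \
  --set=adminUserName="$ADMIN_USERNAME" \
  --set=api.apiServer.url="api.localhost" \
  --set=global.defaultAppDomainName="$BASE_DOMAIN" \
  --set=global.containerRepositoryPrefix="index.docker.io/<docker-hub-username>/" \
  --set=kpackImageBuilder.builderRepository="index.docker.io/<docker-hub-username>/kpack-builder" \
  --wait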

Once installed, push an app using the following steps. First, authenticate with the Cloud Foundry API.

cf api https://api.localhost --skip-ssl-validation
cf login


Next, create a cf org and a cf space:

cf create-org acme-corp
cf target -o acme-corp
cf create-space -o acme-corp bu-rning
cf target -o acme-corp -s bu-rning


And finally, deploy the application:

cf push beautiful-bird -p ~/sandbox/korifi/tests/smoke/assets/test-node-app/


The single cf push command is used to deploy an application to the kind cluster that has Korifi installed on it.

An Alternate Way to Install

The community has contributed a script that will help install Korifi on a kind cluster. The use of this script will help speed things up considerably. This method is recommended if you’re trying Korifi for the first time.

git clone https://github.com/cloudfoundry/korifi.git
cd korifi
./scripts/deploy-on-kind.sh demo-korifi


When the installation completes, apps can be pushed using the same steps as above.

Why Pursue Efficiency in Local Development?

Local development is the first workflow that a developer works with. It is paramount that this step be accurate and efficient in order to keep developers productive. While efficiency can vary depending on the specific development context, it’s essential to experiment with different approaches and tools to find what works best for your team and project.

An optimized build and deployment process is at the center of a good developer experience, and this is true for local environments too. Streamlined build and deployment pipelines go a long way toward minimizing time spent, which ensures faster iterations.

When using Kubernetes, Cloud Foundry Korifi is one way to make it faster and more efficient for software developers and operators to manage app lifecycles. We encourage you to give it a try and work with the community to make it better.

The post The Kubernetes Inner Loop with Cloud Foundry Korifi appeared first on The New Stack.

]]>
Y’all Against My Lingo? Why Everyone Hates on YAML https://thenewstack.io/yall-against-my-lingo-why-everyone-hates-on-yaml/ Fri, 21 Jul 2023 10:00:34 +0000 https://thenewstack.io/?p=22713802

We’ve always needed software configuration files, especially when tuning code to work for our purposes. Way back when, when I

The post Y’all Against My Lingo? Why Everyone Hates on YAML appeared first on The New Stack.

]]>

We’ve always needed software configuration files, especially when tuning code to work for our purposes. Way back when, when I used to run the tech side of a UK national ISP, much of my day used to be building and managing all kinds of config files, across everything from PCs to routers, with configurations stored in source control systems.

Some were easy to use, others obtuse and complex, where a simple missed character could take out email service for thousands of users. Remember the old sysadmin joke, that the only way to get a working sendmail configuration was to have a cat walk across a keyboard?

That was thirty years ago. Surely the industry should have moved on by now.

Apparently not.

We still seem to have all the same problems with writing and managing configuration files, whether they’re designed to be human-readable or left for the machines. In fact, things seem to have become worse, with complex node-based configurations written in what is almost, but not quite, a set of key value pairs.  And what should we blame this on?

YAML.

Originally named “Yet Another Markup Language,” its name has morphed with its role into the official “YAML: Ain’t Markup Language,” though most of us haven’t read the memo. That name change was meant to reflect its role as a data serialization language, not as an HTML-like markup language. Instead, it’s been designed to be a JSON-like language that uses markup to embody data structures and their content. Or at least that’s what it’s intended to be.

In practice, however, it’s clearly not. Perhaps the biggest problem with YAML is that it’s everywhere. You must have seen the meme based on an old cartoon: a man is with a fortune teller who is looking into a crystal ball. She looks at him in horror, saying “YAML, I see so much YAML.”

If I Had a Hammer

When ubiquity is a joke, it’s time to worry, as it means that the world has a hammer, and everything is a nail and that people have noticed what’s happening and have realized that they don’t like it.

That ubiquity has pushed YAML right across the spectrum of applications and services, targeting everything from Kubernetes to consumer applications. YAML might work well for managing enterprise services, where we can write our own custom configurators that output ready-to-use files, but is it suitable for mom-and-pop and their nascent smart home? Friends rant about configuring the popular Home Assistant IoT hub, trying to remember the syntax for each type of device, and managing the various config files they need — often with basic text editors on Raspberry Pi single-board computers.

Why Is YAML so Complex?

One of the biggest issues with YAML is formatting. It’s strict, and as a result, illogical at first glance, much like making lists in markdown. White space shouldn’t need to matter in today’s world. I learned (and used) FORTRAN many, many years ago and even then, counting spaces and tabs was tedious and problematic. YAML’s block formatting is at best rigid, especially when you have to remember to terminate a line with a space. Things get more complex when nodes in a YAML document have complex mappings, and content quickly becomes hard to read.

Without a linter or syntax highlighting it’s very easy to make a mistake. And not everyone wants to run something like Visual Studio Code over ssh for a basic task like adding a new lightbulb to a smart home. What works for Kubernetes and for platform engineering teams doesn’t scale down to a hobbyist trying to get around the limitations of Siri or Alexa. There are underlying issues here, as YAML is designed to be loosely typed and works best with languages like JavaScript. If you’re parsing YAML configurations in C++ or C# you need to be sure that you’re handling types correctly.

Part of the problem is perhaps that we’re still using many of the same tools I was using to write FORTRAN back in the early 1990s. Basic editors like vi and vim remain the main editors on devices like the Pi, and there’s not really any way to use them to force language-specific formatting rules. They’re simple, quick editors, and that makes it possible to make trivial YAML errors that are hard to debug — as each edit cycle means restarting processes and waiting to see whether programs crash or whether your configuration does what you intended.

Things get more complicated with string data in YAML. You don’t have to put quotes around strings, and it’s possible for a string to get confused with an intrinsic value. This has become known as the Norway Problem: when using two-character country codes, Norway’s code is evaluated as a Boolean and set to false by many parsers if it is not explicitly quoted.
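You can see the Norway Problem for yourself, assuming Python 3 with the PyYAML library installed (PyYAML follows the YAML 1.1 boolean rules):

# Unquoted, NO resolves to a boolean; quoted, it stays a string.
python3 -c 'import yaml; print(yaml.safe_load("country: NO"))'
python3 -c 'import yaml; print(yaml.safe_load("country: \"NO\""))'
# Prints {'country': False} and then {'country': 'NO'}.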

YAML is intentionally complex. Its specifications take years to write and fill entire books, with a lack of test suites to help parser designers. That leaves many parsers lagging the specification, so you’re left using trial and error to determine what YAML documents and formats work for you. And that leaves a worrying gap between authoring tools and parsers: what if your syntax highlighting parses a different version of the YAML specification from the parser in the application you just installed?

There’s a bigger question here: should data serialization languages be used for configuration files? You could argue that they make it easier for applications to read data, parsing the basic primitives used by both YAML and JSON, but what’s good for machines is often bad for people. YAML constructs are often complex, and there’s the risk that different libraries may parse them differently — whether reading or generating YAML for you.

Getting parsers to agree is critical. It’s the heart of specifications like XML and SGML, where parser design goes hand in hand with the underlying language grammar. YAML’s deliberately informal approach makes it impossible to have a standard parser; you only have to look at the errors that show up on the YAML test matrix — which itself is based on only one version of the specification, with a significant number of processors (many of which are still in use in old and unmaintained code) still using older versions of the YAML specification.

The result is a specification and a tool that’s drowning under the weight of its own complexity. Even JSON is simpler to use and parse — and its specification is only a few pages long. That’s not to say there aren’t good points to YAML. Describing configurations as node trees makes sense, even if it is hard to visualize.

In the decade or so it took to go from YAML 1.1 to 1.2 the industry has moved on. We’re thinking in terms of platform engineering, and while configuration as code is more important than ever, there are new entrants with newer ways of thinking that make more sense when managing complex distributed systems.

So, what do I prefer? I’ve become a convert to procedural approaches to code-based configuration. Languages like Azure’s Bicep and platforms like Pulumi’s make much more sense to me. I can use familiar constructs to build the infrastructures I need and, at the same time, put in place and manage the platforms my code needs. Sure, they might compile my infrastructure code down to declarative descriptions in JSON and YAML, but I don’t need to worry about the outputs. I even get to use flow control and loops.

There’s an added benefit to using tools that are based on familiar languages like JavaScript and Python. Not only are they easy to learn, but they can use off-the-shelf tools to simplify the development process and to build infrastructure and platform configuration into your existing toolchain and software development lifecycle. Not only that, but you’re able to use familiar linters and testing tools to ensure code is syntactically correct and that it’s bug-free.

As Aaron Kao, vice president of marketing at Pulumi, notes, it’s also a world where automation is increasingly important. We’re building multicloud services, where engineering teams need to control much more than before, and where we need to be able to let teams build what they want, how they want, while keeping them within limits by providing guardrails and guidelines, something that’s nearly impossible to do in YAML. We need to be able to manage tens, hundreds, even thousands of configurations, and controlling that amount of YAML is near impossible.

Of course, that all leads to another argument that general-purpose data description languages like YAML are simply too high-level to deliver the results we want. Instead, we should be thinking about working with more focused domain-specific languages, able to encompass the idiosyncrasies and specific requirements of the platforms they target.

Perhaps we’ll be better off in a world where there’s not one hammer and one nail, but a toolbox that contains the right tool for the job you want to do. Maybe it’s one where we get down to the nitty-gritty of specialized tools, or maybe it’s one where we simply write code and let it compile to manage the command lines, the APIs, and, yes, the configuration files we need.

But I shouldn’t leave things on a low note. After all, there’s one good thing to say about YAML: it’s still better than working with sendmail.conf.

The post Y’all Against My Lingo? Why Everyone Hates on YAML appeared first on The New Stack.

]]>
How Data Sovereignty and Data Privacy Affect Your Kubernetes Adoption https://thenewstack.io/how-data-sovereignty-and-data-privacy-affect-your-kubernetes-adoption/ Tue, 18 Jul 2023 16:37:33 +0000 https://thenewstack.io/?p=22713474

In May, Meta was fined a record $1.3 billion for violating data sovereignty regulations by transferring user data from the

The post How Data Sovereignty and Data Privacy Affect Your Kubernetes Adoption appeared first on The New Stack.

]]>

In May, Meta was fined a record $1.3 billion for violating data sovereignty regulations by transferring user data from the European Union to the United States. While all companies might be affected by data privacy and data sovereignty rules, if you work in a regulated industry or government role, you have to take extreme care to ensure that data is managed and protected within sovereign boundaries.

Application modernization and the shift to Kubernetes and cloud native technologies create additional complications as you work to address data sovereignty requirements and simultaneously adapt to the rapidly evolving cloud native ecosystem.

Many organizations turn to public cloud services to help jumpstart cloud native efforts, but public clouds might not satisfy data sovereignty requirements, since they are unable to guarantee that data will remain within a country or region and be stored on infrastructure operated by sovereign citizens.

By ensuring regulatory compliance for a particular jurisdiction, sovereign clouds can satisfy data sovereignty and other regulatory requirements. If your organization operates in multiple jurisdictions, you can consider using sovereign clouds to address IT needs in those regions, especially regions where your operations aren’t large enough to justify the expense of a data center.

However, you need to make sure that any sovereign cloud you use also offers full support for the Kubernetes ecosystem, including management tools, so your modernization efforts aren’t hampered.

Regulated Industries, Sovereign Clouds and Kubernetes

According to this year’s State of Kubernetes survey by VMware, Kubernetes offers significant operational and business benefits for companies that adopt it, so there are good reasons for regulated industries to modernize.

In another data point from the survey, Kubernetes stakeholders said they are experiencing both direct and indirect business benefits that are hard to ignore.

As your applications are modernized, you need Kubernetes platforms and tools everywhere you operate. Choosing the right management tools is essential. For maximum benefits and minimum friction, you need the same tools everywhere, and those tools have to help ensure you don’t violate data privacy or data sovereignty regulations. Three attributes are especially important for regulated industries:

  • Ability to standardize security policies
  • Data protection with fine-grained control
  • Automation for Kubernetes operations at scale

These capabilities can help you strengthen security, avoid data management mistakes that could violate sovereignty and avoid misconfigurations due to user errors.

A SaaS-based hub for multicloud, multicluster Kubernetes management is often a great way to simplify Kubernetes operations and deliver consistency and automation. However, if you’re in an industry concerned about data privacy and data sovereignty, SaaS might not be an option. In that case, you need tools that can address the unique concerns of your industry and that can operate everywhere you do.

Public Sector

Governments are tasked with storing a wide range of critical data — from the tax records of citizens to health information to national security secrets — and ensuring that all data is maintained securely within national borders and protected from both cybercriminals and international espionage. Software modernization is critical to these efforts. Although governments often face unique constraints, they need access to the same cloud technologies and cloud native methods as the private sector.

According to this year’s State of Kubernetes survey, government entities faced greater Kubernetes management challenges than any other industry, particularly meeting security and compliance requirements (81% of government respondents see this as a challenge, versus 52% overall) and integrating with current infrastructure (45% vs. 41%).

This is almost certainly due to the critical importance of security and the age and complexity of existing infrastructure, combined with inadequate internal experience and expertise (65% vs. 57%). Governments are much more likely to operate on-premises or in a single public cloud, likely a sovereign cloud.

Financial Services

Global banks and financial services companies face significantly different challenges due to the need to operate in many jurisdictions, as well as substantial increases in regulation aimed at the industry. A large financial services company might have to comply with sovereignty laws in dozens of countries, and the regulatory environment is more onerous than in most other industries.

According to a 2021 report, 62 countries had imposed a total of 144 restrictions, double the number of restrictions that were in place just five years earlier. New regulations govern both personal data and finance data, including banking, credit reporting, financial, payment, tax, insurance and accounting.

The State of Kubernetes survey found that financial services companies face greater than average challenges (although less than the public sector) when it comes to integrating with current infrastructure and meeting security and compliance requirements. Inadequate internal experience and expertise are also a concern.

Financial services companies prefer commercial, third-party Kubernetes management tools versus open source tools and are more likely to operate in multiple clouds (86% vs. 76% of all respondents). Presumably, this includes sovereign clouds, since “enable data sovereignty” was given as a reason for multicloud operations by 36%, more than any industry outside of government.

Healthcare

While few healthcare organizations face the kind of multi-jurisdictional sovereignty complexities of global financial services, mandates for protecting patient data make data sovereignty and data security just as critical. As a result of the Covid-19 pandemic, new applications and functionality are being added at a faster rate than in the past.

For example, hospitals might want to deploy Kubernetes in multiple clinics in order to run new containerized software for booking appointments and scheduling vaccinations. Additional challenges in healthcare result from the need to connect new and old systems as well as cope with the unusually high rate of mergers and acquisitions in the healthcare industry.

As the walls of the traditional data center evaporate, the risk of data loss and the challenges of data privacy and sovereignty increase. Healthcare institutions must evolve legacy software and processes to meet new digital delivery demands while staying compliant with patient and data privacy regulations like HIPAA in the United States, GDPR in Europe, etc.

Listen to the Unexplored Territory podcast and register for our upcoming webinar on Aug. 1 to learn how VMware Tanzu Mission Control is being used to address the unique challenges these industries face.

The post How Data Sovereignty and Data Privacy Affect Your Kubernetes Adoption appeared first on The New Stack.

]]>
The Future of VMs on Kubernetes: Building on KubeVirt https://thenewstack.io/the-future-of-vms-on-kubernetes-building-on-kubevirt/ Tue, 18 Jul 2023 11:00:38 +0000 https://thenewstack.io/?p=22713379

Remember when virtualization was the hot new thing? Twenty years ago, I was racking and deploying physical servers at a

The post The Future of VMs on Kubernetes: Building on KubeVirt appeared first on The New Stack.

]]>

Remember when virtualization was the hot new thing? Twenty years ago, I was racking and deploying physical servers at a small hosting company when I had my first experience of virtualization.

Watching vMotion live-migrate workloads between physical hosts was an “aha” moment for me, and I knew virtualization would change the entire ecosystem.

Perhaps then it’s not a surprise that I became an architect at VMware for many years.

I had a similar “aha” moment a decade later with containers and Docker, seeing the breakthrough it represented for my dev colleagues. And in the years after it was clear that Kubernetes presented a natural extension of this paradigm shift.

I’m sure many of you reading this will have been through similar awakenings.

Despite 20 years of innovation, reality has a way of bringing us back down to earth. Out in the enterprise, the fact is we have not completely transitioned to cloud native applications or cloud native infrastructure.

Millions of VMs are Here to Stay

While containerized apps are gaining popularity, there are still millions of VM-based applications out there across the enterprise. A new technology wave doesn’t always wipe out its predecessor.

It may be decades before every enterprise workload is refactored into containerized microservices. Some never will be: for example, if their code is too complex or too old.

So we have a very real question: How do we make virtualization and containers coexist within the enterprise?

We have a few options:

  • Keep it all separate: Most enterprises today run separate virtual machine and container infrastructures. Of course, this is inefficient and risky: It requires different hardware, multiple teams, sets of policies, access controls, network and storage configurations, and much more.
  • Bring containers into the virtualized infrastructure: You can run containers within VMs in the virtualization infrastructure. But this is a deeply nested environment, which means inefficiency and complexity. It’s not easy to scale, and you often need proprietary solutions to make it happen.
  • Bring VMs into the Kubernetes infrastructure: This is a more sustainable approach if Kubernetes is the future of your infrastructure. Kubernetes has demonstrated its ability to provide a highly reliable, scalable and extensible platform through the Kubernetes API, with the advantages of declarative management.

And indeed there is a solution to make this third option possible: KubeVirt.

KubeVirt: Making VMs a First-Class Citizen in Kubernetes Clusters

KubeVirt is a Cloud Native Computing Foundation (CNCF) incubating project that, coincidentally, just hit version 1.0 last week.

Leveraging the fact that the kernel-based virtual machine (KVM) hypervisor is itself a Linux process that can be containerized, KubeVirt enables KVM-based virtual machine workloads to be managed as pods in Kubernetes.

This means that you can bring your VMs into a modern Kubernetes-based cloud native environment rather than doing an immediate refactoring of your applications.

KubeVirt under the Hood

KubeVirt brings K8s-style APIs and manifests to drive both the provisioning and management of virtual machines using simple resources, and provides standard VM operations (VM life cycle, power operations, clone, snapshot, etc.).
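
As a rough illustration of what those manifests look like, here is a minimal VirtualMachine definition in the style of the upstream examples (the name, image and sizing are illustrative, not taken from this article):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  running: false   # start it later via virtctl or by patching spec.running
  template:
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 128Mi
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo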

Users requiring virtualization services interact with the virtualization API, which in turn talks to the Kubernetes cluster to schedule the requested virtual machine instances (VMIs).

Scheduling, networking and storage are all delegated to Kubernetes, while KubeVirt provides the virtualization functionality.

KubeVirt delivers three things to provide virtual machine management capabilities:

  • A new custom resource definition (CRD) added to the Kubernetes API
  • Additional controllers for cluster-wide logic associated with these new types
  • Additional daemons for node-specific logic associated with new types

Because virtual machines run as pods in Kubernetes, they benefit from:

  • The same declarative model as Kubernetes offers its resources.
  • The same Kubernetes network plugins to enable communication between VMs and other pods or services in the cluster.
  • Storage options, including persistent volumes, to provide data persistence for VMs.
  • Kubernetes’s built-in features for high availability and scheduling: VMs can be scheduled across multiple nodes for workload distribution, affinity and anti-affinity rules, etc.
  • Integration with the Kubernetes ecosystem: KubeVirt seamlessly integrates with other Kubernetes ecosystem tools and features, such as Kubernetes role-based access control (RBAC) for access control, monitoring and logging solutions, and service mesh technologies.

KubeVirt in the Wild: What’s the Catch?

KubeVirt sounds amazing, doesn’t it? You can treat your VMs like just another container.

Well, that’s the end goal: getting there is another matter.

Installing KubeVirt: Manual Configuration

KubeVirt is open source, so you can download and install it today.

But the manual installation process can be time-consuming, and you may face challenges with integrating and ensuring compatibility with all the necessary components.

To start, you need a running Kubernetes cluster, on which you:

  • Install the KubeVirt operator (which manages the KubeVirt resources)
  • Deploy the KubeVirt custom resource definitions (CRDs)
  • Deploy the KubeVirt components (pods, services and configurations)

You need to do this for each cluster. While a basic installation allows you to create simple virtual machines, advanced features such as live migration, cloning or snapshots require you to deploy and configure additional components (snapshot controller, Containerized Data Importer, etc).
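
Concretely, a basic installation along the lines of the steps above looks roughly like this; the release version is a placeholder, so check the KubeVirt docs for the current URLs:

export VERSION=v1.0.0   # placeholder; pick the release you want
kubectl create -f https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-operator.yaml
kubectl create -f https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-cr.yaml
kubectl -n kubevirt wait kv kubevirt --for condition=Available   # wait until the deployment settles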

The Challenge of Bare Metal

We mentioned the inefficiency of "nested" infrastructures above. Although it's technically possible to run KubeVirt nested on top of other VMs or public cloud instances, it requires software emulation, which has a performance impact on your workloads.

Instead, it makes a lot of sense to run KubeVirt on bare metal Kubernetes — and that, traditionally, has not been easy. Standing up a bare metal server, deploying the operating system and managing it, deploying Kubernetes on top — the process can be convoluted, especially at scale.

Operations: Challenging UX

When it comes to Day 2 operations, KubeVirt leaves the user with a lot of manual heavy lifting. Let’s look at a couple of examples:

First, KubeVirt doesn’t come with a UI by default: it’s all command line interface (CLI) or API. This may be perfectly fine for cluster admins that are used to operating Kubernetes and containers, but it may be a challenging gap for virtualization admins that are used to operating from a graphical user interface (GUI).

Even an operation as simple as starting or stopping a virtual machine requires patching the VM manifest or using the virtctl command line.
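
For instance, with the standard tooling those operations look roughly like this (the VM name is illustrative):

# Start and stop a VM with virtctl
virtctl start webapp01
virtctl stop webapp01

# Or patch the VirtualMachine object directly
kubectl patch vm webapp01 --type merge -p '{"spec":{"running":true}}'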

Another example is live migration: To live migrate a VM to a different node, you have to create a VirtualMachineInstanceMigration resource that tells KubeVirt what to do.

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: live-migrate-webapp01
  namespace: default
spec:
  vmiName: webapp01


If you’re running at scale, performing many such operations each day across multiple clusters, the effort can be considerable. Building out scripting or automation can solve that, but itself increases the learning curve and adds to the setup cost.

Introducing Palette Virtual Machine Orchestrator

We saw an opportunity to take all the goodness that KubeVirt offers, address all these issues, and create a truly enterprise-grade solution for running VMs on Kubernetes.

And today we’ve announced just that: Meet Virtual Machine Orchestrator (VMO), new in version 4.0 of our Palette Kubernetes management platform.

VMO is a free capability that leverages KubeVirt and makes it easy to manage virtual machines (VMs) and Kubernetes containers together, from a single unified platform.

Here are the highlights.

Simplified Setup

If you’re not familiar with Palette, one of the things that makes it unique is the concept of Cluster Profiles, preconfigured and repeatable blueprints that document every layer of the cluster stack, from the underlying OS to the apps on top, which you can deploy to a cluster with a few clicks.

We’ve built an add-on pack for VMO that contains all the KubeVirt components we talked about earlier, and much much more, including:

  • Snapshot Controller to provide snapshot capabilities to the VMs and referenced volumes
  • Containerized Data Importer (CDI) to enable persistent volume claims (PVCs) to be used as disks for VMs (as DataVolumes)
  • Multus to provide virtual local area network (VLAN) network access to virtual machines
  • Out-of-the-box Grafana dashboards to provide monitoring for your VMs

Palette can not only build a cluster for you, but deploy the VM management capability preconfigured into that cluster thanks to the Cluster Profile. The result is much less manual configuration effort.

What's more, Palette's decentralized multicluster architecture makes it easy to deliver the VMO capability to multiple clusters instead of having to enable it manually per cluster.

Streamlined Bare Metal Experience

We talked about the importance of running KubeVirt on bare metal, and how hard it is to provision and manage bare metal servers for Kubernetes.

Well, Palette was built to simplify the way you deploy Kubernetes clusters in all kinds of environments, and bare metal is no exception.

There are many ways of orchestrating bare-metal servers, but one of the most popular ones is Canonical MAAS, which allows you to manage the provisioning and the life cycle of physical machines like a private cloud.

We're big fans of MAAS, and we've included Canonical MAAS and our MAAS Provider for Cluster API in our VMO pack to automate the deployment of the OS and Kubernetes on bare metal hardware. It makes deploying a new Kubernetes bare metal cluster as easy as deploying in the cloud.

Of course, you can use your own bare metal provider if you don’t want to use MAAS.

Powerful Management Features and Intuitive Interface

Once everything is up and running, Palette’s always-on declarative management keeps the entire state of your cluster as designed, with automated reconciliation loops to eliminate configuration drift. This covers your VM workloads too.

While DIY KubeVirt leaves you on your own when it comes to some of the more powerful features you’ve come to expect in the world of virtualization, Palette provides a long list of capabilities out of the box.

These include VM live migration, dynamic resource rebalancing and maintenance mode for repairing or replacing host machines, and the ability to declare a new VLAN from the UI. You also get out-of-the-box monitoring of clusters, nodes and virtual machines using Prometheus and Grafana.

And while with DIY KubeVirt the platform operator (that's you) must select, install and configure one of the open source solutions to get a UI, Palette includes a graphical interface out of the box.

For the Future of Your VMs

As you can tell, we’re pretty excited about the launch of Palette 4.0 and the Virtual Machine Orchestrator feature.

We’ve built on the open source foundations of KubeVirt, and delivered a simpler and more powerful experience for enterprises.

The result? Organizations that have committed to Kubernetes on their application modernization journey, and have already invested in Kubernetes skills and tools, will benefit from a single platform to manage both containers and VMs.

And that’s not just as a temporary stepping stone for the applications that will be refactored, but also for hybrid deployments (applications that share VMs and containers) and for workloads that will always be hosted in VMs. Even after nearly 25 years of virtualization, VMs are certainly not dead yet.

To find out more about Palette’s VMO feature, check out our website or our docs site. We’d love to get your feedback.

The post The Future of VMs on Kubernetes: Building on KubeVirt appeared first on The New Stack.

]]>
SCARLETEEL Fine-Tunes AWS and Kubernetes Attack Tactics https://thenewstack.io/scarleteel-fine-tunes-aws-and-kubernetes-attack-tactics/ Thu, 13 Jul 2023 14:08:55 +0000 https://thenewstack.io/?p=22713125

With SCARLETEEL, attackers can exploit a vulnerable Kubernetes container and pivot to going after the underlying cloud service account. Back

The post SCARLETEEL Fine-Tunes AWS and Kubernetes Attack Tactics appeared first on The New Stack.

]]>

With SCARLETEEL, attackers can exploit a vulnerable Kubernetes container and pivot to going after the underlying cloud service account.

Back in February, the Sysdig Threat Research Team discovered a sophisticated cloud attack in the wild, SCARLETEEL, which exploited containerized workloads and leveraged them into AWS privilege escalation attacks. That was bad. It's gotten worse: Sysdig has now found it targeting more advanced platforms, such as AWS Fargate.

Reiterating previous strategies, the group’s recent activities involved compromising AWS accounts by exploiting weak compute services, establishing persistence, and deploying cryptominers to secure financial gain. If unchecked, the group was projected to mine approximately $4,000 per day.

But, wait, there’s more! SCARLETEEL is also in the business of intellectual property theft.

During the recent attack, the group discovered and exploited a loophole in an AWS policy, allowing them to escalate privileges to AdministratorAccess, thereby gaining total control over the targeted account. They have also expanded their focus to Kubernetes, intending to scale up their attacks.

The recent attack brought some new features to the fore. These included:

  • Scripts capable of detecting Fargate-hosted containers and collecting credentials.
  • Escalation to Admin status in the victim’s AWS account to start EC2 instances running miners.
  • Improved tools and techniques to enhance their attack capabilities and evasion techniques.
  • Exploitation attempts of IMDSv2 to retrieve tokens and AWS credentials.
  • Multiple changes in C2 domains, leveraging public services for data transmission.
  • Use of AWS CLI and pacu on exploited containers to increase AWS exploitation.
  • Use the Kubernetes Penetration Testing tool peirates to exploit Kubernetes further.

SCARLETEEL has also shown a particular fondness for AWS credential theft by exploiting JupyterLab notebook containers deployed in a Kubernetes cluster. This approach involved leveraging several versions of credential-stealing scripts, employing varying techniques and exfiltration endpoints. These scripts hunt for AWS credentials by contacting instance metadata (both IMDSv1 and IMDSv2), in the filesystem, and within Docker containers on the target machine, regardless of their running status.

Interestingly, the exfiltration function employed uses shell built-ins to transmit the Base64 encoded stolen credentials to the C2 IP Address, a stealthier approach that evades tools that typically monitor curl and wget.

By manipulating the "--endpoint-url" option, the group also redirects API requests away from the default AWS service endpoints, preventing these requests from appearing in the victim's CloudTrail. Given the opportunity, it will also download and run Mirai Botnet Pandora, a Distributed Denial of Service (DDoS) malware program.

After collecting the AWS keys, SCARLETEEL automated reconnaissance in the victim’s AWS environment. A misstep in the victim’s user naming convention allowed the attackers to bypass a policy that would have otherwise prevented access key creation for admin users.

Once admin access was secured, SCARLETEEL focused on persistence, creating new users and access keys for all users in the account. With admin access, the group then deployed 42 instances of c5.metal/r5a.4xlarge for cryptomining.

Although the noisy launch of excessive instances led to the attacker’s discovery, the assault did not stop there. The attacker turned to other new or compromised accounts, attempting to steal secrets or update SSH keys to create new instances. In the event, the lack of privileges thwarted further progression.

Still, this is a disturbing attack. "The combination of automation and manual review of the collected data makes this attacker a more dangerous threat," pointed out the report's author, Alessandro Brucato, a Sysdig threat research engineer. "It isn't just nuisance malware, like a crypto miner is often thought of, as they are looking at as much of the target environment as they can."

The SCARLETEEL operation's continued activity underscores the need for multiple defensive layers, including runtime threat detection and response, vulnerability management, cloud security posture management (CSPM), and cloud infrastructure entitlement management (CIEM). The absence of these layers could expose organizations to significant financial risks and data theft. To deal with attackers like SCARLETEEL, it's all hands and tools on deck.

The post SCARLETEEL Fine-Tunes AWS and Kubernetes Attack Tactics appeared first on The New Stack.

]]>
DevOps Has Won, Long Live the Platform Engineer https://thenewstack.io/devops-has-won-long-live-the-platform-engineer/ Tue, 11 Jul 2023 18:03:42 +0000 https://thenewstack.io/?p=22712949

In the world of software development, the concept of DevOps has been so successful that even talking about it as a

The post DevOps Has Won, Long Live the Platform Engineer appeared first on The New Stack.

]]>

In the world of software development, the concept of DevOps has been so successful that even talking about it as a practice sounds old-fashioned. But while it may be time to declare the old idea of DevOps dead, that speaks less to its demise than to its success.

A decade ago, DevOps was a cultural phenomenon, with developers and operations coming together and forming a joint alliance to break through silos. Fast forward to today and we’ve seen DevOps further formalized with the emergence of platform engineering. Under the platform-engineering umbrella, DevOps now has a budget, a team and a set of self-service tools so developers can manage operations more directly.

The platform engineering team provides benefits that can make Kubernetes a self-service tool, enhancing efficiency and speed of development for hundreds of users. It’s another sign of the maturity and ubiquity of Kubernetes. Gartner says that within the next three years, four out of five software engineering organizations will leverage platform teams to provide reusable services and tools for application delivery.

Platform Engineering Is the New Middleware

As developers have multiplied from the hundreds to the thousands and apps have proliferated, the space once held by middleware — an app server that was ticket-based, but always on-call — is now occupied by platform engineering with a self-service model for developers.

Why this matters: During the awkward adolescent phase of DevOps, there was a lot of experimentation and deployment of new technologies, but the technologies had not coalesced. Now, modern apps have settled in, using containers and storage, networking and security run on Kubernetes in a cloud native manner.

Developers don’t use ticketing anymore. They expect elastic infrastructure that they use and deploy using the platform maintained and run by the platform engineer. This shift in maturity improves responsiveness. Developers can make changes to the app they are working on quickly and drive an app to production very, very fast. With the developer in charge, time for both development and deployment has gone down dramatically.

Portworx enabled T-Mobile to reduce application deployment time to hours, down from six months. Like T-Mobile, enterprises have thousands of developers that require this “self-service” or on-demand access to storage and data services, which platform engineering teams strive to deliver at scale.

As a replacement for IT, the platform engineering group is anchored in two sets of technologies — cloud native technologies and modern databases and data services like Postgres, Redis, Cassandra, Kafka, even streaming services like Spark, all being offered as a service by the platform team to developers.

Key services offered by platform engineers that otherwise would require more and more Kubernetes expertise by users include Kubernetes distribution itself, whether it’s OpenShift or Google Kubernetes Engine (GKE) or Elastic Kubernetes Service (EKS) or Rancher. Security is another important service, with platforms like Prisma Cloud or Sysdig as examples.

Another is data on Kubernetes — to manage storage resources, backup, disaster recovery, and databases and data services underneath the auspices of Kubernetes. At Portworx, we see the efficiencies firsthand, with several of our customers employing a handful of platform engineers to serve hundreds and hundreds of users.

Making Kubernetes Invisible: Focusing on the ‘What’ vs. the ‘How’

When a technology becomes ubiquitous, it starts to become more invisible. Think about semiconductors, for example. They are everywhere. They’ve advanced from micrometers to nanometers, from five nanometers down to three. We use them in our remote controls, phones and cars, but the chips are invisible and as end users, we just don’t think about them.

It’s the same with Kubernetes. In the enterprise, Kubernetes is becoming embedded in more and more things, and the self-service paradigm makes it invisible to the users. Until now in DevOps, every developer needed to know Kubernetes. Now a developer needs to use it, but only the platform engineer needs to really know it.

Platform engineering delivers a beautiful gift to developers who no longer have to strain under the burden of seeing and understanding Kubernetes at a granular level as part of their daily jobs. As Kubernetes continues to flourish, it helps narrow a persistent skills gap and contributes meaningfully to a company’s ability to innovate and maintain a competitive edge.

The post DevOps Has Won, Long Live the Platform Engineer appeared first on The New Stack.

]]>
Running ScyllaDB NoSQL on Kubernetes with Spot Instances https://thenewstack.io/running-scylladb-nosql-on-kubernetes-with-spot-instances/ Mon, 10 Jul 2023 18:50:56 +0000 https://thenewstack.io/?p=22712834

Serving more than 1 million operations per second with an average latency of a few milliseconds — while reading/writing real-time

The post Running ScyllaDB NoSQL on Kubernetes with Spot Instances appeared first on The New Stack.

]]>

Serving more than 1 million operations per second with an average latency of a few milliseconds — while reading/writing real-time user-level data that can grow to billions of rows — is not a trivial task. It requires serious infrastructure that typically has a premium price tag and requires a team of experts to operate.

What if I tell you that all you need is a Kubernetes cluster and an open source database to achieve zero downtime failovers, single-digit millisecond-level response times, both vertical and horizontal scaling, data sharding per CPU core, fully distributed read/write ops and much more? In this article, I’ll share how my team at Visually.io used ScyllaDB to replace MongoDB as our main production real-time database.

ScyllaDB is an open source NoSQL database that's API-compatible with Apache Cassandra (and also DynamoDB). It has all the advantages of a masterless, ring-architecture database while avoiding all the issues Cassandra is notorious for, including Java virtual machine (JVM) issues like stop-the-world garbage collection, a large memory footprint, slow startup, just-in-time warmup and complex configuration.

ScyllaDB comes with a production-ready Helm chart, Kubernetes operator, and a plug-and-play configuration. It’s open source and it works flawlessly on spot (volatile) instances that cost 1/4 of the regular cloud compute price.

Why ScyllaDB vs. MongoDB?

All that sounds amazing, but what’s wrong with MongoDB? It’s open source and supports data sharding. But MongoDB’s architecture is quite different. It has a single point of failure: If the coordinator goes down, the database starts a failover, and it’s unavailable during that time. In addition, achieving high availability requires that every MongoDB shard runs as a replica set (more nodes). The ring architecture shared by both Cassandra and ScyllaDB is superior in this sense. Moreover, ScyllaDB’s driver is shard-aware and knows to reach the precise node/CPU that’s responsible for the queried row, which allows true distribution.

But why are high availability and zero downtime failovers so important? If you plan to run on spot instances (1/4 of the compute price), you will experience frequent (daily) failovers because Kubernetes will constantly kill and re-create nodes, which will cause all pods/processes running on them to die, including your database.

Getting up and Running on Kubernetes

First, you’ll want to run ScyllaDB locally and play. Use its drivers and run some CQL (Cassandra Query Language) as described in the docs. I used the gocql driver. Remember that ScyllaDB drivers are shard-aware, and you need to connect to the shard-aware ScyllaDB port 19042 (not the default Cassandra port on 9042).
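
To make that concrete, here is a minimal Go sketch using the gocql API (the address is illustrative, error handling is trimmed to the essentials, and shard-awareness assumes ScyllaDB's gocql fork or a compatible driver):

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/gocql/gocql"
)

func main() {
    // Point this at your own cluster or a port-forwarded service.
    cluster := gocql.NewCluster("127.0.0.1")
    cluster.Port = 19042 // shard-aware ScyllaDB port, not the classic 9042
    cluster.Timeout = 5 * time.Second
    cluster.Consistency = gocql.Quorum

    session, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    // A trivial CQL query to confirm connectivity.
    var release string
    if err := session.Query(`SELECT release_version FROM system.local`).Scan(&release); err != nil {
        log.Fatal(err)
    }
    fmt.Println("connected, ScyllaDB reports release:", release)
}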

The ScyllaDB Kubernetes Operator repo contains three Helm charts:

  • scylla: The database itself. It contains the ScyllaCluster CRD (Kubernetes custom resource definition), a YAML that configures the ScyllaDB cluster, its size, resources, file system and so on.
  • scylla operator: Installs a Kubernetes controller that takes this YAML and creates from it a StatefulSet, services and other Kubernetes entities.
  • scylla manager: Basically a singleton service that automates tasks. It is connected to all ScyllaDB nodes and can run clusterwide tasks such as a repair or a cloud storage backup.

I used Argo CD to install and configure the charts mentioned above. It enables GitOps mechanics and rollbacks, and provides visibility into what is happening in Kubernetes. (Argo CD is outside the scope of this article, but basically, instead of running a Helm install command, I will be clicking a few UI buttons and pushing a few YAMLs into a git repo).
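
(For reference, the equivalent manual install would look roughly like the following. The chart repository URL is a placeholder, so use the one from the ScyllaDB docs, and the release and namespace names are illustrative.)

helm repo add scylla <scylla-chart-repo-url>
helm repo update
helm install scylla-operator scylla/scylla-operator -n scylla-operator --create-namespace
helm install scylla-manager scylla/scylla-manager -n scylla-manager --create-namespace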

Configuring the cluster

The configuration of the operator chart is pretty straightforward. The only things you need to define are a Kubernetes nodeSelector and, if you need them, taint tolerations. Define which Kubernetes nodes the operator can run on, and then it's plug and play.

Now, we’ll move on to ScyllaDB Manager. Let’s look at the Chart.yaml:

The dependencies directive declares that scylla-manager imports the scylla chart, so when you install it, you install both of them. The manager configuration (values.yaml) has a section for ScyllaDB, where all the action takes place.
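
The original listings are not reproduced here, so what follows is a hedged sketch. The dependencies block of the manager's Chart.yaml looks roughly like this (version and repository are placeholders):

apiVersion: v2
name: scylla-manager
dependencies:
  - name: scylla
    version: "x.y.z"                        # placeholder
    repository: "<scylla-chart-repo-url>"   # placeholder

And the scylla section of the manager's values.yaml carries the cluster layout; the field names below follow the ScyllaCluster CRD and may differ slightly in the chart:

scylla:
  racks:
    - name: rack-1
      members: 3
      storage:
        capacity: 100Gi
        storageClassName: xfs-class   # the point discussed next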

The key point regarding the above configuration is the xfs storageClassName, which is advised by ScyllaDB and provides better performance. The chart does not contain the storage class definition, but you can add it yourself:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: xfs-class
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  csi.storage.k8s.io/fstype: xfs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true


Notice the allowVolumeExpansion flag. It will allow you to later increase the PersistentVolumeClaim (PVC) disk size seamlessly while the database is running. After Argo CD installed both charts, here is the result:

ScyllaDB Operator

ScyllaDB Operator is up and running. A thing to note here is that the operator itself is highly available and has two replicas of its own. It will now create the ScyllaDB cluster based on its CRD.

ScyllaDB cluster

In our case, the operator created a cluster of three nodes. Every pod is running the database itself, ScyllaDB Manager and operator clients. This helps replace “the team of experts” and automates administration and operation tasks.

Monitoring

No production database can exist without proper monitoring and alerting in place. ScyllaDB Operator achieves this with the Prometheus service monitor configuration.

scylla:
  # ...
  serviceMonitor:
    promRelease: staging-prometheus-operator
    create: true


This flag causes the operator to create two service monitors.

ServiceMonitor

This will cause Prometheus to scrape the database metrics periodically, store them in a time series database and allow running promQL queries to define Grafana dashboards and alerts.

Dashboards

Grafana dashboards.

Grafana JSON dashboards can be found here. Here’s how to add them to the Helm charts that ScyllaDB provides.

To do that, we need to create Kubernetes ConfigMaps and label them as Grafana dashboards. Fortunately, Helm can help us with that.

{{- range $path, $_ :=  .Files.Glob  "dashboards/scylla/*.json" }}
{{- $filename := trimSuffix (ext $path) (base $path) }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: scylla-dashboard-{{ $filename }}
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
    app.kubernetes.io/managed-by: {{ $.Release.Name }}
    app.kubernetes.io/instance: {{ $.Release.Name }}
data:
  {{ base $path }}: |-
{{ $.Files.Get $path | indent 4 }}
---
{{- end }}


The above snippet will result in five config maps being added to Kubernetes and labeled with grafana_dashboard: "1" (which will cause Grafana to mount them).

ScyllaDB overview dashboard – view 1

ScyllaDB overview dashboard – view 2

There are many graphs with nuanced metrics exported, which allows fine-grained monitoring for everything the database experiences. The following graph is very important; it describes all the failovers in the past 24 hours.

13 failovers in 24 hours

Every time Kubernetes kills a random spot instance, it then schedules a new ScyllaDB pod, which rejoins the cluster, without any downtime, in a couple of minutes.

We have been running ScyllaDB for almost a year now, and it works like clockwork. A useful tip here is to overprovision the node pool by one node at all times. This will, most likely, ensure that there is at least one available node that can be scheduled with the new database pod. It increases the price a bit, but it’s still much more cost-efficient than using regular nodes.
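
One common way to implement that tip is a low-priority "placeholder" deployment that reserves roughly a node's worth of resources and is evicted as soon as a real pod (such as a rescheduled ScyllaDB instance) needs the room. A sketch, with resource numbers that are purely illustrative and should match your node size:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10            # lower than the default 0, so any normal pod can preempt it
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation
spec:
  replicas: 1
  selector:
    matchLabels:
      app: capacity-reservation
  template:
    metadata:
      labels:
        app: capacity-reservation
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "7"        # illustrative: roughly one node's worth
              memory: 28Gi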

Failovers / RAM / CPU / latency

The above image shows that every time a ScyllaDB instance is killed, a short CPU spike occurs, latency increases by a couple of milliseconds, and RAM usage drops since all the cache ScyllaDB built up in memory disappears. This is a clear disadvantage of using spot instances. However, in our use case it's worth trading short, very small latency spikes for a large compute price discount.

Conclusion

In conclusion, ScyllaDB proves to be an exceptional open source database that lives up to its promises. The fact that ScyllaDB is freely available as open source is truly remarkable. As a software developer, I have no affiliation with ScyllaDB, but I am grateful for the technology it provides. This article serves as a heartfelt thank you to the ScyllaDB community for its dedication to open source and for empowering developers like myself with such remarkable technology.

The post Running ScyllaDB NoSQL on Kubernetes with Spot Instances appeared first on The New Stack.

]]>
How to Secure Kubernetes with KubeLinter https://thenewstack.io/how-to-secure-kubernetes-with-kubelinter/ Mon, 10 Jul 2023 13:57:08 +0000 https://thenewstack.io/?p=22712753

KubeLinter is an open source tool that analyzes Kubernetes YAML files and Helm charts to ensure they adhere to best

The post How to Secure Kubernetes with KubeLinter appeared first on The New Stack.

]]>

KubeLinter is an open source tool that analyzes Kubernetes YAML files and Helm charts to ensure they adhere to best practices, focusing on production readiness and security. It performs checks on various aspects of the configuration to identify potential security misconfigurations and DevOps best practices.

By running KubeLinter, you can obtain valuable information about your Kubernetes configuration files and Helm charts. It helps teams detect and address security issues early in the development process. Some examples of the checks performed by KubeLinter include running containers as non-root users, enforcing least privilege, and properly handling sensitive information by storing it only in secrets.

KubeLinter is licensed under the Apache License 2.0, allowing you to use, modify, and distribute it according to the terms of the license.

Why KubeLinter?

KubeLinter comes with sensible default checks, but it is also configurable. You have the flexibility to enable or disable specific checks according to your organization’s policies. Additionally, you can create your own custom checks to enforce specific requirements.

When a lint check fails, KubeLinter provides recommendations on how to resolve the identified issues. It also returns a non-zero exit code to indicate the presence of potential problems.

Installation, Setup and Getting Started

To get started with KubeLinter, you can refer to the official documentation. The documentation provides detailed information on installing, using and configuring KubeLinter.

Here are a few installation options for KubeLinter.

Using Go

Install KubeLinter using Go by running the following command:

go install golang.stackrox.io/kube-linter/cmd/kube-linter@latest

Using Homebrew (macOS) or LinuxBrew (Linux)

Install KubeLinter using Homebrew or LinuxBrew by running the following command:

brew install kube-linter

Building from Source

If you prefer to build KubeLinter from source, follow these steps:

  • Clone the KubeLinter repository:

git clone git@github.com:stackrox/kube-linter.git

  • Compile the source code to create the kube-linter binary files:

make build

  • Verify the installation by checking the version:

.gobin/kube-linter version


KubeLinter provides different layers of testing, including go unit tests, end-to-end integration tests and end-to-end integration tests using bats-core. You can run these tests to ensure the correctness and reliability of KubeLinter.

How to Use KubeLinter

To use KubeLinter, you can start by running it against your local YAML files. Simply specify the path to the YAML file you want to test, and KubeLinter will perform the linting checks. For example:

kube-linter lint /path/to/your/yaml.yaml


The output of KubeLinter will show any detected issues along with recommended remediation steps. It will also provide a summary of the lint errors found.

You have the option to run it locally or integrate it into your CI systems. Here are the instructions for running KubeLinter locally:

After installing KubeLinter, you can use the lint command and provide the path to your Kubernetes YAML file or directory containing YAML files.

For a single YAML file:

kube-linter lint /path/to/yaml-file.yaml


For a directory containing YAML files:

kube-linter lint /path/to/directory/containing/yaml-files/


To use KubeLinter for local YAML linting, follow these steps:

  1. Locate the YAML file that you want to test for security and production readiness best practices.
  2. Run the following command, replacing /path/to/your/yaml.yaml with the actual path to your YAML file. Here’s the format:

kube-linter lint /path/to/your/yaml.yaml


Here’s an example using a sample pod specification file named pod.yaml that has production readiness and security issues:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  volumes:
    - name: sec-ctx-vol
      emptyDir: {}
  containers:
    - name: sec-ctx-demo
      image: busybox
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
      command: [ "sh", "-c", "sleep 1h" ]
      volumeMounts:
        - name: sec-ctx-vol
          mountPath: /data/demo
      securityContext:
        allowPrivilegeEscalation: false


Save the YAML content above to a file named pod.yaml. Then, you can lint this file by running the following command:

kube-linter lint pod.yaml


KubeLinter will run its default checks and report recommendations based on the linting results. In the example above, the output will show three lint errors:

pod.yaml: (object: <no namespace>/security-context-demo /v1, Kind=Pod)
container "sec-ctx-demo" does not have a read-only root file system (check: 
no-read-only-root-fs, remediation: Set readOnlyRootFilesystem to true in your 
container's securityContext.)

pod.yaml: (object: <no namespace>/security-context-demo /v1, Kind=Pod) 
container "sec-ctx-demo" has cpu limit 0 (check: unset-cpu-requirements, 
remediation: Set your container's CPU requests and limits depending on its 
requirements. See 
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits for more details.)

pod.yaml: (object: <no namespace>/security-context-demo /v1, Kind=Pod) 
container "sec-ctx-demo" has memory limit 0 (check: unset-memory-requirements, 
remediation: Set your container's memory requests and limits depending on its requirements. 
See 
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits for more details.)

Error: found 3 lint errors
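
One way to address these findings is to give the container a read-only root file system and explicit CPU and memory requests and limits. A hedged sketch of the revised containers section (the limit values are illustrative):

containers:
  - name: sec-ctx-demo
    image: busybox
    command: [ "sh", "-c", "sleep 1h" ]
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"   # illustrative
        cpu: "500m"       # illustrative
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true

Re-running kube-linter lint pod.yaml on the updated file should no longer report these three checks.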


To run KubeLinter locally for Helm charts, you need to provide the path to a directory containing the Chart.yaml file. Here's the command to run KubeLinter for Helm charts:

kube-linter lint /path/to/directory/containing/chart.yaml-file/


You can also use the --format option to specify the output format. For example, use --format=json for JSON format or --format=sarif for the SARIF spec.
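
For instance, using the demo file from earlier:

kube-linter lint pod.yaml --format=json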

If you’re using the pre-commit framework for managing git pre-commit hooks, you can integrate KubeLinter as a pre-commit hook. Add the following configuration to your .pre-commit-config.yaml file:

- repo: https://github.com/stackrox/kube-linter
  rev: 0.6.0 # kube-linter version
  hooks:
    - id: kube-linter


This configuration sets up the kube-linter hook, which clones, builds, and installs KubeLinter locally using go get.
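
With the hook in place, a typical workflow looks like this (assuming the pre-commit tool itself is already installed):

pre-commit install                       # register the git hook once per clone
pre-commit run kube-linter --all-files   # or let it run automatically on each commit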

KubeLinter provides additional commands and options for different operations. Here’s the general syntax for running KubeLinter commands.

kube-linter [resource] [command] [options]

  • resource: specifies the resources on which you want to perform operations, such as checks or templates
  • command: specifies the operation you want to perform, such as lint or checks list
  • options: specifies additional options for each command. For example, you can use the -c or --config option to specify a configuration file.

To view the complete list of available resources, commands, and options, you can use the --help or -h option.

To find all resources:

kube-linter --help


To find available commands for a specific resource, such as checks:

kube-linter checks --help


To find available options for a specific command, such as lint:

kube-linter lint --help


To configure the checks that KubeLinter runs or to create your own custom checks, you can use a YAML configuration file. When running the lint command, you can provide the --config option followed by the path to your configuration file.

If a configuration file is not explicitly provided, KubeLinter will look for a configuration file in the current working directory with the following filenames in order of preference:

.kube-linter.yaml


.kube-linter.yml


If none of these files are found, KubeLinter will use the default configuration.

Here’s an example of how to run the lint command with a specific configuration file:

kube-linter lint pod.yaml --config kubelinter-config.yaml

The configuration file has two main sections:

  1. customChecks for configuring custom checks.
  2. checks for configuring default checks.

To view a list of all built-in checks, you can refer to the KubeLinter checks documentation.

Here are some configuration options you can use in the configuration file.

Disable all default checks. You can disable all built-in checks by setting doNotAutoAddDefaults to true in the checks section.

checks:
 doNotAutoAddDefaults: true


Run all default checks. You can run all built-in checks by setting addAllBuiltIn to true in the checks section.

checks:
 addAllBuiltIn: true
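
Exclude specific checks. The configuration also lets you skip individual checks by name; a hedged sketch, with check names taken from the lint output earlier:

checks:
  exclude:
    - "unset-cpu-requirements"
    - "unset-memory-requirements"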


Run custom checks. You can create custom checks based on existing templates. Each template description in the documentation includes details about the parameters (params) you can use with that template. Here's an example:

customChecks:
  - name: required-annotation-responsible
    template: required-annotation
    params:
      key: company.io/responsible


These are some of the configuration options available in KubeLinter. You can refer to the KubeLinter documentation for more details on configuration and customization.

Conclusion

KubeLinter is an alpha release, which means it is still in the early stages of development. As a result, there may be breaking changes in the future regarding command usage, flags and configuration file formats.

However, you are encouraged to use KubeLinter to test your environment YAML files, identify issues — and contribute to its development.

The post How to Secure Kubernetes with KubeLinter appeared first on The New Stack.

]]>