Cloud Native Ecosystem News and Resources | The New Stack

Oracle Touts New AppDev Tools, Distributed Cloud Support

LAS VEGAS — Oracle’s AppDev environment for cloud apps and infrastructure isn’t talked about all that often, but it received high-profile attention this week at the company’s CloudWorld conference here at the Venetian Convention and Expo Center.

Oracle vice-president of products and strategy Leo Leung told The New Stack that the company’s new application development capabilities — including a new one called Alloy — will enable developers to build and deploy applications on Oracle Cloud Infrastructure (OCI) faster, more effectively and with fewer snags than previous iterations of the platform.

“By having these different (OCI) strategies, there are now more opportunities to use the cloud,” Leung said.

Despite a decade of steady growth in enterprises moving their applications to the cloud, the vast majority of Oracle’s customers are still building apps for servers in private data centers, Leung said. “IT people are naturally risk-averse. If something works, in general, they don’t want to change it, especially if it’s critical to the operations of a business,” he said. “Once they get it (their IT) to a certain place, I wouldn’t change it either.”

Oracle has to give enterprises like that “enough of a benefit with low enough risk, in most cases” for it to fit into their plans, Leung said. “The more leading-edge, startup-type companies have a different profile,” he said.

Oracle and Generative AI

Oracle is building generative AI capabilities for application development to take advantage of large language models with a high level of security and privacy, Chairman, CTO and co-founder Larry Ellison said Tuesday in a keynote. However, the company made no specific AI announcements this week.

Ellison said that AI is “the most important technology of the 21st century that is going to change everything.” He also said that while AI is still in its early stages of development, it is already having a major impact on the world.

Ellison also discussed the potential for AI to be used to create new jobs and industries. “AI is going to automate many tasks that are currently done by humans, but it will also create new jobs that we can’t even imagine today,” he said.

Java 21 and Cloud Native Development

Here are the key new additions to the OCI dev environment.

  • Java, with an estimated 60 million developers using it frequently, has remained a mainstay of development since its introduction in 1995, at the dawn of the commercial internet. Oracle, which acquired Java’s originator, Sun Microsystems, in 2010, introduced at the conference a new set of capabilities for Java developers, including Java Development Kit 21, GraalOS, OCI Functions powered by GraalOS and Graal Cloud Native 4.0.
  • GraalOS is an alternative Java runtime technology that uses GraalVM Native Image to run deployed applications as native machine executables. It enables low-latency, fast-start execution, reduces memory requirements, and improves efficiency by allowing applications to be suspended and resumed.
  • OCI Functions is powered by GraalOS and is a fully managed, multitenant, highly scalable, on-demand, Functions as a Service platform. It addresses the issue of slow cold starts by running functions as native executables, providing sub-second startup times. It uses GraalVM Native Image to reduce memory usage by up to 50 percent. It also uses out-of-the-box integrations to improve developer productivity.
  • Graal Cloud Native 4.0 is a curated set of open source Micronaut framework modules to help developers take full advantage of cloud services without dependency on proprietary platform APIs. It includes features such as native support for GraalVM Native Image, Kubernetes integration and distributed tracing.

In addition to these new capabilities for Java developers, Oracle is also introducing new features for cloud native deployments, Kubernetes operations and enhanced security.

  • Oracle Cloud Guard Container Governance enables developers to solidify security for Oracle Container Engine for Kubernetes (OKE) via predefined policies aligned to Kubernetes Security Posture Management. It simplifies the configuration of containerized workloads deployed on OKE.
  • Oracle’s own version of Serverless Kubernetes enables customers to ensure reliable operations at scale without the complexities of managing, scaling, upgrading, and troubleshooting the underlying Kubernetes node infrastructure.

Oracle recently announced an expanded partnership with Microsoft to make available these dev tools and services via Azure Cloud, giving users another important option. “We think this is a good thing, just because there are so many workloads and mutual customers,” Leung said.

Distributed Cloud News

The latest additions to OCI’s distributed cloud lineup include Oracle Database@Azure and MySQL HeatWave Lakehouse on AWS. As a result, enterprises get more flexibility to deploy cloud services anywhere while addressing a variety of data privacy, data sovereignty and low-latency requirements. They also enable access to more than 100 services designed to run any workload, Leung said.

Oracle also announced the GA of Oracle Alloy, which enables enterprises to build their own private clouds.

“Alloy is a much more niche type of product,” Leung said. “There are only so many companies that want to be cloud providers in their particular spaces, but we think it’s going to be very powerful.”

Now available to be ordered globally, Oracle Alloy is a complete cloud infrastructure platform that enables partners to become cloud providers and offer a full range of cloud services to expand their businesses. Partners control the commercial and customer experience of Oracle Alloy and can customize and extend it to address their specific market needs. Partners can operate Oracle Alloy independently in their own data centers and fully control its operations to better fulfill customer and regulatory requirements.

Oracle revealed that next-generation Ampere A2 Compute Instances, based on the latest AmpereOne processor, will become generally available later this year. Oracle and Ampere also said that all Oracle Fusion Cloud Applications and OCI services — numbering in the hundreds — are now running on Ampere processors.

Next-Gen Observability: Monitoring and Analytics in Platform Engineering

As applications become more complex, dynamic, and interconnected, the need for robust and resilient platforms to support them has become a foundational requirement. Platform engineering is the art of crafting these robust foundations, encompassing everything from orchestrating microservices to managing infrastructure at scale.

In this context, the concept of Next-Generation Observability emerges as a crucial enabler for platform engineering excellence. Observability transcends the traditional boundaries of monitoring and analytics, providing a comprehensive and insightful view into the inner workings of complex software ecosystems. It goes beyond mere visibility, empowering platform engineers with the knowledge and tools to navigate the intricacies of distributed systems, respond swiftly to incidents, and proactively optimize performance.

Challenges Specific to Platform Engineering

Platform engineering presents unique challenges that demand innovative solutions. As platforms evolve, they inherently become more intricate, incorporating a multitude of interconnected services, microservices, containers, and more. This complexity introduces a host of potential pitfalls:

  • Distributed Nature: Services are distributed across various nodes and locations, making it challenging to comprehend their interactions and dependencies.
  • Scaling Demands: As platform usage scales, ensuring seamless scalability across all components becomes a priority, requiring dynamic resource allocation and load balancing.
  • Resilience Mandate: Platform outages or degraded performance can have cascading effects on the applications that rely on them, making platform resilience paramount.

The Role of Next-Gen Observability

Next-Gen observability steps in as a transformative force to address these challenges head-on. It equips platform engineers with tools to see beyond the surface, enabling them to peer into the intricacies of service interactions, trace data flows, and understand the performance characteristics of the entire platform. By aggregating data from metrics, logs, and distributed traces, observability provides a holistic perspective that transcends the limitations of siloed monitoring tools.

This article explores the marriage of Next-Gen Observability and platform engineering. It delves into the intricacies of how observability reshapes platform management by providing real-time insights, proactive detection of anomalies, and informed decision-making for optimizing resource utilization. By combining the power of observability with the art of platform engineering, organizations can architect resilient and high-performing platforms that form the bedrock of modern applications.

Understanding Platform Engineering

Platform engineering plays a pivotal role in shaping the foundation upon which applications are built and delivered. At its core, platform engineering encompasses the design, development, and management of the infrastructure, services, and tools that support the entire application ecosystem.

Platform engineering is the discipline that crafts the technical underpinnings required for applications to thrive. It involves creating a cohesive ecosystem of services, libraries, and frameworks that abstract away complexities, allowing application developers to focus on building differentiated features rather than grappling with infrastructure intricacies.

A defining characteristic of platforms is their intricate web of interconnected services and components. These components range from microservices to databases, load balancers, caching systems, and more. These elements collaborate seamlessly to provide the functionalities required by the applications that rely on the platform.

The management of platform environments is marked by inherent complexities. Orchestrating diverse services, ensuring seamless communication, managing the scale-out and scale-in of resources, and maintaining consistent performance levels present a multifaceted challenge. Platform engineers must tackle these complexities while also considering factors like security, scalability, and maintainability.

Platform outages have repercussions that stretch well beyond the platform itself, affecting the entire application ecosystem that depends on it. Disruptions translate into downtime, data loss and frustrated customers. The damage goes beyond the immediate financial losses; it can leave a lasting stain on a company’s reputation, eroding trust and confidence.

Users now expect consistently reliable experiences. Even a small lapse in platform performance can hurt satisfaction, and that can ripple outward into user attrition and missed opportunities for business growth. Safeguarding high-quality user experiences therefore depends on the robustness of the platform itself.

This is where observability becomes a cornerstone of modern platform engineering. It gives platform engineers tools that go beyond surface-level visibility and into the inner workings of the platform.

That insight lets them navigate complex interactions, diagnose issues promptly and apply remedies in real time. By exposing the platform’s inner workings, observability helps engineers identify and address problems quickly, mitigating the impact of disruptions and strengthening the platform’s resilience.

Core Concepts of Next-Gen Observability for Platform Engineering

Amid the intricacies of platform engineering, where many services work in concert to deliver a broad range of functionality, understanding the interplay within a distributed platform is a formidable challenge.

At the heart of that challenge is a web of interconnected services, each with specific tasks and responsibilities, often spread across many nodes, containers and even geographic locations. Tracing the journey of a single request as it traverses this network is correspondingly difficult.

Distributed tracing addresses this problem by illuminating the flow of requests across services. By capturing these journeys, it reveals service dependencies, latency bottlenecks and communication patterns. Platform engineers gain a holistic view of the path each request takes, which lets them pinpoint issues with precision and optimize with agility.

The benefits extend beyond individual services to the platform as a whole. Engineers can use trace data to uncover systemic concerns that span multiple services: bottlenecks, latency fluctuations and failures affecting the entire platform are brought to light promptly. The outcomes are better performance, less downtime and a marked improvement in user experience.
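To make the idea concrete, here is a minimal sketch of code-level distributed tracing using the OpenTelemetry Python SDK. The service and span names are illustrative, and a real deployment would export spans to a tracing backend rather than to the console.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Configure a tracer provider that exports finished spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # illustrative service name

def fetch_inventory(item_id: str) -> int:
    # Child span: models a call to a downstream inventory service.
    with tracer.start_as_current_span("inventory.lookup") as span:
        span.set_attribute("item.id", item_id)
        return 42  # stubbed response

def handle_checkout(item_id: str) -> None:
    # Parent span: one request flowing through the platform.
    with tracer.start_as_current_span("checkout.handle") as span:
        span.set_attribute("item.id", item_id)
        stock = fetch_inventory(item_id)
        span.set_attribute("inventory.count", stock)

if __name__ == "__main__":
    handle_checkout("sku-123")
```

Each request produces a parent span with child spans for its downstream calls, which is exactly the dependency and latency data a tracing backend stitches into end-to-end views.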

Metrics and monitoring sit at the core of observability, offering a broad view of the platform’s health and efficiency.

Metrics are the quantifiable signals of that health. From CPU and memory utilization to response times and error rates, they reveal how the platform is actually behaving.

Monitoring is the ongoing practice of watching those metrics for deviations from the norm: sudden surges in resource consumption, unexpected error rates or departures from established performance patterns. Monitoring is more than alerting, though. By continuously surveying metrics, it also anticipates the need to scale. As platform utilization ebbs and flows, proactive monitoring ensures resources are allocated dynamically and are ready to absorb surging demand.

Scalability is an intrinsic property of modern platforms. As users, requests and service load fluctuate, the platform must expand and contract gracefully. Observability gives platform engineers a real-time pulse on these transitions, so scaling can be proactive rather than reactive.
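As a small illustration of metrics and proactive monitoring, the sketch below exposes two platform metrics with the Prometheus Python client and checks a scaling threshold inline. The metric names and the threshold value are assumptions; in practice, Prometheus would scrape the endpoint and alerting or autoscaling rules would evaluate the data.

```python
import random
import time

from prometheus_client import Gauge, Histogram, start_http_server

# Illustrative metric names; a real platform would emit many more.
cpu_utilization = Gauge("platform_cpu_utilization_ratio", "CPU utilization (0-1)")
request_latency = Histogram("platform_request_latency_seconds", "Request latency")

CPU_SCALE_OUT_THRESHOLD = 0.80  # assumed threshold for proactive scaling

def observe_once() -> None:
    # Stand-in for real readings gathered from the platform.
    cpu = random.uniform(0.2, 0.95)
    cpu_utilization.set(cpu)
    request_latency.observe(random.uniform(0.01, 0.5))
    if cpu > CPU_SCALE_OUT_THRESHOLD:
        # In practice this would be an alert rule or an autoscaler signal.
        print(f"cpu at {cpu:.0%}: consider scaling out")

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at :8000/metrics for scraping
    while True:
        observe_once()
        time.sleep(5)
```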

Logs are the textual record of platform events.

They document occurrences, errors and actions across the platform, creating a chronological trail of what each component did. That context lets platform engineers reconstruct the sequence of events that led to an anomaly or incident.

In multiservice environments, however, aggregating and analyzing logs is daunting. Logs are scattered across many nodes and instances, and stitching them into a coherent narrative is made harder by the sheer volume generated.

Log aggregation tools such as the ELK Stack (Elasticsearch, Logstash and Kibana) address this challenge by centrally collecting, indexing and visualizing logs. They make it straightforward to search, filter and analyze log data, so engineers can quickly trace the origins of an incident, troubleshoot effectively and resolve it faster.
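One practice that makes centralized log analysis much easier is emitting structured, JSON-formatted log lines that shippers such as Logstash or Filebeat can forward to Elasticsearch without custom parsing. The sketch below uses only the Python standard library; the field names and the service name are assumptions.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, datefmt="%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Extra context attached via the `extra=` argument, if present.
            "service": getattr(record, "service", None),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders-service")  # illustrative service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "payment authorized",
    extra={"service": "orders-service", "request_id": "req-4711"},
)
```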

Implementing Next-Gen Observability in Platform Engineering

Instrumenting code across the breadth of services within a platform is the gateway to achieving granular observability.

Here are some factors to consider:

  • Granular Observability Data: Instrumentation involves embedding code with monitoring capabilities to gather insights into service behavior. This allows engineers to track performance metrics, capture traces, and log events at the code level. Granular observability data provides a fine-grained view of each service’s interactions, facilitating comprehensive understanding.
  • Best Practices for Instrumentation: Effective instrumentation requires a thoughtful approach. Platform engineers need to carefully select the metrics, traces, and logs to capture without introducing excessive overhead. Best practices include aligning instrumentation with key business and operational metrics, considering sampling strategies to manage data volume, and ensuring compatibility with observability tooling.
  • Code-Level Observability for Bottleneck Identification: Code-level observability plays a pivotal role in identifying bottlenecks that affect platform performance. Engineers can trace request flows, pinpoint latency spikes, and analyze service interactions. By understanding how services collaborate and identifying resource-intensive components, engineers can optimize the platform for enhanced efficiency.

Proactive Monitoring and Incident Response

Proactive monitoring enables platform engineers to preemptively identify potential issues before they escalate into major incidents.

The proactive monitoring approach involves setting up alerts and triggers that detect anomalies based on predefined thresholds. By continuously monitoring metrics, engineers can identify deviations from expected behavior early on. This empowers them to take corrective actions before users are affected.
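A toy sketch of that approach: watch a metric stream and flag values that deviate sharply from a rolling baseline. The window size and deviation factor are assumptions, and a production setup would lean on an alerting system rather than hand-rolled checks.

```python
from collections import deque
from statistics import mean

class BaselineAlert:
    """Flag values that deviate sharply from a rolling baseline."""

    def __init__(self, window: int = 20, factor: float = 2.0):
        self.window = deque(maxlen=window)
        self.factor = factor

    def check(self, value: float) -> bool:
        alert = False
        if len(self.window) == self.window.maxlen:
            baseline = mean(self.window)
            # Alert when the new value is far above the recent baseline.
            alert = value > baseline * self.factor
        self.window.append(value)
        return alert

detector = BaselineAlert()
for latency_ms in [40, 42, 38, 41, 39] * 4 + [180]:
    if detector.check(latency_ms):
        print(f"anomaly: latency {latency_ms}ms is well above recent baseline")
```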

Observability data seamlessly integrates into incident response workflows. When an incident occurs, engineers can access real-time observability insights to quickly diagnose the root cause. This reduces mean time to resolution (MTTR) by providing immediate context and actionable data for effective incident mitigation.

Observability provides real-time insights into the behavior of the entire platform during incidents. Engineers can analyze traces, metrics, and logs to trace the propagation of issues across services. This facilitates accurate root cause analysis and swift remediation.

Scaling Observability with Platform Growth

Scaling observability alongside the platform’s growth introduces challenges related to data volume, resource allocation, and tooling capabilities. The sheer amount of observability data generated by numerous services can overwhelm traditional approaches.

To manage the influx of data, observability pipelines come into play. These pipelines facilitate the collection, aggregation, and processing of observability data. By strategically designing pipelines, engineers can manage data flow, filter out noise, and ensure that relevant insights are available for analysis.

Observability is not static; it evolves alongside the platform’s expansion. Engineers need to continually assess and adjust their observability strategies as the platform’s architecture, services, and user base evolve. This ensures that observability remains effective in uncovering insights that aid in decision-making and optimization.

Achieving Platform Engineering Excellence Through Observability

At its core, observability provides real-time insight into platform resource utilization. Metrics such as CPU usage, memory consumption and network latency show engineers which resources are underused and which are overloaded. Those insights let them allocate resources judiciously, treading the fine line between scaling and conserving, balancing and distributing.

Observability data also makes it possible to diagnose performance constraints and areas of inefficiency. Traces and metrics point out latency spikes, excessive resource consumption and the service dependencies behind slowdowns. Armed with these findings, engineers can fine-tune individual components of the platform in pursuit of optimal, efficient performance.

Real-world case studies make the impact concrete. Insights gleaned through observability translate into reduced response times, streamlined operations and smoother experiences. They show observability data feeding directly into engineering decisions and producing measurable performance gains.

Ensuring Business Continuity and User Satisfaction

In the interplay of business operations and user satisfaction, observability acts as a safety net, safeguarding business continuity and improving user contentment.

Platform outages can unsettle business operations and erode user trust. Observability enables swift incident identification and resolution: engineers use real-time insights to pinpoint the root causes underlying issues, recover quickly and limit the impact, minimizing the blow of downtime.

Its reach extends to user experience as well, where platform health correlates directly with user satisfaction. A sluggish response, an unexpected error or an outright outage can fracture user experiences, spurring disenchantment and even churn. Observability data gives engineers a window into how users interact with the platform, letting them align platform behavior with user sentiment and take proactive measures that foster positive experiences.

Case studies again illustrate the point: observability-driven optimizations feed directly into user satisfaction.

From smoothing checkout processes in e-commerce to fine-tuning video streaming experiences, these examples testify to observability’s role in crafting user-centric platforms and in keeping business continuity and user satisfaction in step.

Conclusion

Observability isn’t a mere tool; it’s a mindset that reshapes how we understand, manage, and optimize platforms. The world of software engineering is evolving, and those who embrace the power of Next-Gen Observability will be better equipped to build robust, scalable, and user-centric platforms that define the future.

As you continue your journey in platform engineering, remember that the path to excellence is paved with insights, data, and observability. Embrace this paradigm shift and propel your platform engineering endeavors to new heights by integrating observability into the very DNA of your strategies. Your platforms will not only weather the storms of complexity but will also emerge stronger, more resilient, and ready to redefine the boundaries of what’s possible.

Is Policy as Code the Cure for Multicloud Config Chaos?

Hosting software across public cloud and private cloud is, at present, inherently less manageable and less secure than simpler hosting paradigms — full stop. The 2022 edition of IBM’s Cost of a Data Breach report states that 15% of all breaches are still attributed to cloud misconfiguration.

The same year, a white paper by Osterman Research and Ermetic placed detecting general cloud misconfigurations, such as unencrypted resources and missing multifactor authentication, at the top of the list of concerns for organizations with a high cloud maturity level.

The move to multicloud has created silos between environments and every layer of IT. Instead of working together to create better infrastructure for better software and better service, IT Ops teams spend valuable time mitigating the liabilities of cross-deployed infrastructure and misconfigured services — all with different tools and no visibility into the policies they’re expected to enforce.

But there is a better way to manage the cloud and ensure that policy enforcement is in place: Policy as Code. Policy as Code (sometimes called PaC) is a development approach that expresses infrastructure and application behavior policies as code, rather than hardcoding them into individual systems or enforcing them through manual processes.

That means those policies can be used and reused to automatically enforce consistent configurations across the estate — like security, compliance, baselines and more. Policy as Code can enforce configurations throughout the entire software development life cycle, rather than relying on manual checks and processes.
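To ground the idea, here is a minimal sketch of a policy expressed as code: a Python function that checks storage-bucket configurations against a simple encryption, access and tagging policy. Real-world PaC usually relies on a purpose-built engine such as Open Policy Agent, and the configuration fields shown here are assumptions for illustration.

```python
from typing import List

# A policy, expressed as code rather than in a document or a ticket.
def check_bucket_policy(bucket: dict) -> List[str]:
    """Return a list of violations for a single bucket configuration."""
    violations = []
    if not bucket.get("encryption_enabled", False):
        violations.append("bucket must have at-rest encryption enabled")
    if bucket.get("public_access", False):
        violations.append("bucket must not allow public access")
    if not bucket.get("tags", {}).get("owner"):
        violations.append("bucket must be tagged with an owner")
    return violations

# Example configuration as it might come from an IaC plan or a cloud API.
buckets = [
    {"name": "app-logs", "encryption_enabled": True, "public_access": False,
     "tags": {"owner": "platform-team"}},
    {"name": "tmp-exports", "encryption_enabled": False, "public_access": True,
     "tags": {}},
]

for b in buckets:
    for violation in check_bucket_policy(b):
        print(f"{b['name']}: {violation}")
```

Because the policy is ordinary code, it can run in a pipeline, gate deployments and be reused unchanged across public and private cloud environments.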

Despite its obvious benefits for DevOps, PaC still isn’t a common practice in the industry — and it’s rarely used as a tool for tackling tangled messes like cloud misconfiguration. Let’s break down how PaC can help bridge today’s cloud config gaps.

The Power of Policy as Code in Multicloud Configuration

  • With PaC, one size actually can fit all. Policy as Code is used to unite public cloud with private cloud for simpler management and faster scaling of software, resources and services offered by each.
  • Policy as Code can standardize a governable process across multiple layers of IT, from central IT and infrastructure all the way up to app developers. It does that by making your policies visible, auditable and shareable.
  • Policy as Code supports business expectations by aligning configurations, from baseline through deployment, with strategic business objectives.
  • Policy as Code lets developers do what they do best: code. Adding configuration to developers’ plates is asking them to work outside their sweet spot. That can ruin the developer experience, causing burnout and turnover, which makes all your other problems worse.
  • Policy as Code ensures greater security by shifting compliance responsibility away from burdened individuals and onto repeatable code that’s automatically enforced.

Simple, Right? Then Why Aren’t Organizations Implementing Policy as Code?

When organizations started migrating services to public clouds, most failed to consider the long-term implications of such a move. The past few years have revealed the lasting effect of cloud migration on the standardized processes they’d spent so long building on the ground:

  • The pandemic drove an insatiable desire for availability of services and resources, which overrode caution.
  • The abstraction of cost attracted bottom-liners and business leaders.
  • Service-level agreements for cloud availability were supposed to make in-house security guarantees obsolete.
  • The cloud gave organizations across industries the chance to “remain competitive” as early adopters saw a rush of benefits.

Developers, too, helped drive some of the fervor for cloud. Developers at the app layer needed the flexibility of the cloud (the freedom to choose tools and workflows at will). Only later did organizations realize that their detachment from corporate policy was leading to misconfigurations across hybrid deployments, complicating an already messy paradigm.

Cloud repatriation is only exacerbating those problems, even when done by degrees. Today, some organizations are scaling back their cloud deployments or diversifying them by returning mainframe hosting to the mix. But that mix still lacks the standardization needed to effectively manage it all. Far from a solution, cloud repatriation is, in fact, an aggravating factor for the issues associated with cross-deployed infrastructure.

As long as organizations have one foot in the data center and one foot in the cloud — and as long as they obscure their cloud configuration approach with disparate toolsets — cloud misconfiguration will keep holding back the potential of their hybrid cloud Ops. A lack of standardization will keep leading to business problems like security gaps, unauthorized access, rampant drift, resource inefficiencies, noncompliance and data loss.

How to Start Building a Policy as Code Practice

The best way to create PaC for your infrastructure is through reverse engineering. Start by defining your ideal state, identify the potential risks and gaps you’ll uncover on your way there, and develop a framework to mitigate those risks.

Here are a few recommendations to start building a PaC approach that can enforce desired state for better infrastructure and better DevOps, wherever you’re deployed:

Don’t spend a bunch of resources on new tools. PaC isn’t about reinventing the wheel — it’s about leveraging the tools and processes you’ve got (like Infrastructure as Code) to enforce a repeatable state across all of your infrastructure. Strong automation and configuration management are at the core of PaC, so use the tools you already have to establish a PaC approach.

Define the desired state of your infrastructure across data center, multicloud and hybrid. Identify potential areas of risk that can result from configuration drift, like compliance errors, and chart a course back to your desired infrastructure configurations through state enforcement. With desired state enforcement through PaC, you can preempt and prevent misconfigurations even in cross-deployed infrastructure.
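A sketch of what desired state enforcement can look like in code: compare the declared state of a resource with what is actually deployed and surface the drift. The resource fields are assumptions, and the remediation step is left as a comment.

```python
def diff_state(desired: dict, actual: dict) -> dict:
    """Return the settings whose actual values drift from the desired state."""
    return {
        key: {"desired": value, "actual": actual.get(key)}
        for key, value in desired.items()
        if actual.get(key) != value
    }

desired_vm = {"instance_type": "m5.large", "encrypted_disk": True, "open_ports": [443]}
actual_vm = {"instance_type": "m5.large", "encrypted_disk": False, "open_ports": [443, 22]}

drift = diff_state(desired_vm, actual_vm)
for setting, values in drift.items():
    print(f"drift in {setting}: expected {values['desired']}, found {values['actual']}")
    # A real enforcement step would call the configuration management or IaC
    # tool here to converge the resource back to its desired state.
```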

Align your infrastructure with the business goals it supports. When creating PaC, guardrails are crucial to targeting your efforts where they’re needed most. Start with your infrastructure management journeys: Consider who in your organization needs infrastructure resources, what their main use cases are for that infrastructure, and where and how they consume infrastructure. Map those needs to Day 0 (provisioning), Day 1 (configuration management) and Day 2 (state enforcement and compliance) for a strong PaC framework that supports your whole DevOps cycle.

Test your PaC. Challenge both your infrastructure configuration management and state enforcement to ensure it’s doing what you want it to do from the perspectives of both your business goals and risk assessment.
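Continuing the illustrative bucket-policy sketch from earlier (assumed here to live in a module named bucket_policy), a pytest-style test can exercise the policy against compliant and non-compliant configurations. It is a sketch of the practice, not a complete test suite.

```python
# test_bucket_policy.py - run with `pytest`
from bucket_policy import check_bucket_policy  # the sketch shown earlier

def test_compliant_bucket_has_no_violations():
    bucket = {"encryption_enabled": True, "public_access": False,
              "tags": {"owner": "platform-team"}}
    assert check_bucket_policy(bucket) == []

def test_unencrypted_public_bucket_is_flagged():
    bucket = {"encryption_enabled": False, "public_access": True, "tags": {}}
    violations = check_bucket_policy(bucket)
    assert "bucket must have at-rest encryption enabled" in violations
    assert "bucket must not allow public access" in violations
```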

Your Cloud Infra Management Won’t Get Simpler on Its Own

Developers can’t be expected to handle policy enforcement in their own tools. When they can rely on configuration files written as code, they can work quickly and confidently in line with company standards, using tools they already know, rather than toying with functional code to make it compliant at their layer.

With PaC, your team can support the needs of developers in the cloud and changing expectations of compliance to help you realize the reasons you moved to the cloud in the first place.

Unlock Data’s Full Potential with a Mature Analytics Strategy

Over the past decade, businesses have harnessed the power of “big data” to unlock new possibilities and enhance their analytical capabilities. Today, those businesses must accelerate those capabilities by moving beyond experimentation with analytics toward mature investments and capabilities, or risk losing a competitive edge.

A mature data analytics strategy is critical to deriving the most value from data, but many organizations struggle to get it right. Despite the exponential growth in data collection, about 73% of enterprise data remains unused for analytics, according to Forrester. This means that only about a quarter of the data generated is effectively leveraged to gain valuable insights. Embracing modern technology, such as containerized storage capabilities, can help leaders get a strong handle on their data and derive actionable insights from it to truly drive business growth.

Legacy Analytics Architectures Are Obstructing Innovation

Today’s software applications need to handle millions of users across the globe on demand while running on multiple platforms and environments. They also need to provide high availability to enable businesses to innovate and respond to changing market conditions. Legacy platforms were designed prior to ubiquitous fast storage and network fabric, presenting more challenges than solutions for organizations looking to get ahead of the competition.

When I spoke to IT leaders who use legacy deployment models, the number-one complaint I heard was that it takes too much effort to support data at the indexer layer, which reduces operational efficiency. Hours, days and even weeks can be spent on software updates, patches and scaling hardware to support growth. This, in turn, hampers optimization, as teams operating at scale struggle to meet the needs of their growing organizations.

Additionally, legacy architectures require multiple copies of data, which significantly increases compute and storage requirements. When you add storage in a distributed architecture, you add compute regardless of organizational needs, affecting overall utilization and the ability to control costs.

Lastly, with varying performance capabilities across different storage tiers, there is a risk of slower query response times or inconsistent search results. This can hinder the speed and accuracy of data analysis. A mature analytics strategy faces these challenges head-on to provide operational efficiency, accelerated innovation and reduced cost of doing business.

The Case for Containerizing Modern Analytics Loads

Managing modern data involves more than relying on cloud architecture capabilities alone. Containerization can seamlessly integrate into cloud infrastructure to support modern analytics workloads. Imagine the convenience of running an application in a virtual environment without the hefty resource requirements of a hypervisor. By encapsulating software into virtual self-contained units, that’s exactly what a container can do.

Containerized applications provide greater performance and can run reliably from one computing environment to another. More application instances allow for greater performance overall, and the portability of the storage method enables centralized image management, rapid deployment and elasticity for organizations to scale storage capacity based on demand.

Interestingly, containerized applications can help with CPU utilization as well. In testing, we found that containerized applications enabled up to 60% utilization, compared to only 17% from a bare metal application model. Pair containerization with a high-performance storage solution, and organizations can achieve more flexibility and quicker response as data volumes increase.

Kubernetes’ Role in Unlocking Agile Data Management

Container orchestration platforms like Kubernetes provide robust tools for managing and orchestrating containerized applications at scale. With Kubernetes, platform and DevOps teams can easily deploy and run thousands of applications in a containerized or VM format, on any infrastructure, and can operate with much lower operational costs.

But to fully derive the benefits of a powerful application platform like Kubernetes, users need an equally powerful data platform to complete the solution. The Portworx Data Platform offers advancements such as automated and declarative storage provisioning, volume management, high availability and data replication, data protection and backup, business continuity and disaster recovery, security and robust cost optimization and management. These capabilities enable organizations to efficiently manage and control their data storage across distributed cloud environments, ensuring data availability and agility.

When using Kubernetes for containerized storage, there are considerations to keep in mind to ensure an organization’s mature analytics strategy is optimized and agile. First, using Kubernetes operators can further enhance storage capabilities by automating and simplifying complex tasks.

It’s also crucial to set up high availability at both the data service layer and the storage layer because relying on a single instance in a Kubernetes environment can be risky. Lastly, understanding whether an organization’s data service can be scaled up or scaled out will allow IT teams to choose the best solution to add more capacity or compute power as needed.

Organizations with mature analytics investments are achieving bigger impacts on business outcomes across the board, from customer experience and strategy to product innovation. Through modern data management like container applications and Kubernetes, organizations can make greater use of their data for innovation and growth and, more to the point, increase sales.

Achieve Cloud Native without Kubernetes

This is the second of a two-part series. Read part one here.

At its core, cloud native is about leveraging the benefits of the cloud computing model to its fullest. This means building and running applications that take advantage of cloud-based infrastructures. The foundational principles that consistently rise to the forefront are:

  • Scalability — Dynamically adjust resources based on demand.
  • Resiliency — Design systems with failure in mind to ensure high availability.
  • Flexibility — Decouple services and make them interoperable.
  • Portability — Ensure applications can run on any cloud provider or even on premises.

In Part 1 we highlighted the learning curve and situations where directly using Kubernetes might not be the best fit. This part zeros in on constructing scalable cloud native applications using managed services.

Managed Services: Your Elevator to the Cloud

Reaching the cloud might feel like constructing a ladder piece by piece using tools like Kubernetes. But what if we could simply press a button and ride smoothly upward? That’s where managed services come into play, acting as our elevator to the cloud. While it might not be obvious without deep diving into specific offerings, managed services often use Kubernetes behind the scenes to build scalable platforms for your applications.

There’s a clear connection between control and complexity when it comes to infrastructure (and software in general). We can begin to tear down the complexity by delegating some of the control to managed services from cloud providers like AWS, Azure or Google Cloud.

Managed services empower developers to concentrate on applications, relegating the concerns of infrastructure, scaling and server management to the capable hands of the cloud provider. The essence of this approach is crystallized in its core advantages: eliminating server management and letting the cloud provider handle dynamic scaling.

Think of managed services as an extension of your IT department, bearing the responsibility of ensuring infrastructure health, stability and scalability.

Choosing Your Provider

When designing a cloud native application, the primary focus should be on architectural principles, patterns and practices that enable flexibility, resilience and scalability. Instead of immediately selecting a specific cloud provider, it’s much more valuable for teams to start development without the blocker of this decision-making.

Luckily, the competitive nature of the cloud has driven cloud providers toward feature parity. They have established foundational building blocks that borrow heavily from one another and ultimately offer the same, or extremely similar, functionality and value to end users.

This paves the way for abstraction layers and frameworks like Nitric, which can be used to take advantage of these similarities to deliver cloud development patterns for application developers with greater flexibility. The true value here is the ability to make decisions about technologies like cloud providers on the timeline of the engineering team, not upfront as a blocker to starting development.

Resources that Scale by Default

The resource choices for apps set the trajectory for their growth; they shape the foundation upon which an application is built, influencing its scalability, security, flexibility and overall efficiency. Let’s categorize and examine some of the essential components that contribute to crafting a robust and secure application.

Execution, Processing and Interaction

  • Handlers: Serve as entry points for executing code or processing events. They define the logic and actions performed when specific events or triggers occur.
  • API gateway: Acts as a single entry point for managing and routing requests to various services. It provides features like rate limiting, authentication, logging and caching, offering a unified interface to your backend services or microservices.
  • Schedules: Enable tasks or actions to be executed at predetermined times or intervals. Essential for automating repetitive or time-based workloads such as data backups or batch processing.

Communication and Event Management

  • Events: Central to event-driven architectures, these represent occurrences or changes that can initiate actions or workflows. They facilitate asynchronous communication between systems or components.
  • Queues: Offer reliable message-based communication between components, enhancing fault tolerance, scalability and decoupled, asynchronous communication.

Data Management and Storage

  • Collections: Data structures, such as arrays, lists or sets that store and organize related data elements. They underpin many application scenarios by facilitating efficient data storage, retrieval and manipulation.
  • Buckets: Containers in object storage systems like Amazon S3 or Google Cloud Storage. They provide scalable and reliable storage for diverse unstructured data types, from media files to documents.

Security and Confidentiality

  • Secrets: Concerned with securely storing sensitive data like API keys or passwords. Using centralized secret management systems ensures these critical pieces of information are protected and accessible only to those who need them.
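To show how a couple of these building blocks meet in application code, here is a brief sketch of handlers behind an API using FastAPI. In a managed-services setup, the provider’s API gateway and serverless runtime would sit in front of handlers like these, and the secret would be injected from a managed secret store; the route names and environment variable are assumptions.

```python
import os
from fastapi import FastAPI

app = FastAPI()

# In a managed setup, a secret manager injects this value at deploy time;
# the variable name here is an assumption for illustration.
PAYMENTS_API_KEY = os.environ.get("PAYMENTS_API_KEY", "")

@app.get("/orders/{order_id}")
async def get_order(order_id: str) -> dict:
    # Handler: the entry point the API gateway routes requests to.
    return {"order_id": order_id, "status": "processing"}

@app.post("/orders")
async def create_order(order: dict) -> dict:
    # A second handler; a queue or event topic could be notified here.
    return {"accepted": True, "order": order}
```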

Automating Deployments

Traditional cloud providers have offered services for CI/CD but often fall short of delivering a truly seamless experience. Services like AWS CodePipeline or Azure DevOps require intricate setup and maintenance.

Why is this a problem?

  1. Time-consuming: Setting up and managing these pipelines takes away valuable developer time that could be better spent on feature development.
  2. Complexity: Each cloud provider’s CI/CD solution might have its quirks and learning curves, making it harder for teams to switch or maintain multicloud strategies.
  3. Error-prone: Manual steps or misconfigurations can lead to deployment failures or worse, downtime.

You might notice a few similarities here with some of the challenges of adopting K8s, albeit at a smaller scale. However, there are options that simplify the deployment process significantly, such as using an automated deployment engine.

Example: Simplified Process

This is the approach Nitric takes to streamline the deployment process:

  1. The developer pushes code to the repository.
  2. Nitric’s engine detects the change, builds the necessary infrastructure specification and determines the minimal permissions, policies and resources required.
  3. The entire infrastructure needed for the app is automatically provisioned, without the developer explicitly defining it and without the need for a standalone Infrastructure as Code (IaC) project.

Basically, the deployment engine intelligently deduces and sets up the required infrastructure for the application, ensuring roles and policies are configured for maximum security with minimal privileges.

This streamlined process relieves application and operations teams from activities like:

  • The need to containerize images.
  • Crafting, troubleshooting and sustaining IaC tools like Terraform.
  • Managing discrepancies between application needs and existing infrastructure.
  • Initiating temporary servers for prototypes or test phases.

Summary

Using managed services streamlines the complexities associated with infrastructure, allowing organizations to zero in on their primary goals: application development and business expansion. Managed services, serving as an integrated arm of the IT department, facilitate a smoother and more confident transition to the cloud than working directly with K8s. They’re a great choice for cloud native development to reinforce digital growth and stability.

With tools like Nitric streamlining the deployment processes and offering flexibility across different cloud providers, the move toward a cloud native environment without Kubernetes seems not only feasible but also compelling. If you’re on a journey to build a cloud native application or a platform for multiple applications, we’d love to hear from you.

Read Part 1 of this series: “Kubernetes Isn’t Always the Right Choice.”

Change Data Capture for Real-Time Access to Backend Databases

In a recent post on The New Stack, I discussed the emergence and significance of real-time databases. These databases are designed to support real-time analytics as part of event-driven architectures. They prioritize high write throughput, low query latency (even for complex analytical queries involving filtered aggregates and joins) and high levels of concurrent requests.

This highly-specialized class of database, which includes open source variants such as ClickHouse, Apache Pinot and Apache Druid, is often the first choice when you’re building a real-time data pipeline from scratch. But more often than not, real-time analytics is pursued as an add-on to an existing application or service, where a more traditional, relational database like PostgreSQL, SQL Server or MySQL has already been collecting data for years.

In the post I linked above, I also briefly touched on how these online transactional processing (OLTP) databases aren’t optimized for analytics at scale. When it comes to analytics, they simply cannot deliver the same query performance at the necessary levels of concurrency. If you want to understand why in more detail, read this.

But the Internet Is Built on These Databases!

Row-based databases may not work for real-time analytics, but we can’t get around the fact that they are tightly integrated with backend data systems around the world and across the internet. They’re everywhere, and they host critical data sets that are integral to and provide context for many of the real-time systems and use cases we want to build. They store facts and dimensions about customers, products, locations and more that we want to use to enrich streaming data and build more powerful user experiences.

So, what are we to do? How do you bring this row-oriented, relational data into the high-speed world of real-time analytics? And how do you do it without overwhelming your relational database server?

Here’s How Not to Do It

Right now, the prevailing pattern to get data out of a relational database and into an analytical system is using a batch extract, transform, load (ETL) process scheduled with an orchestrator to pull data from the database, transform it as needed and dump it into a data warehouse so the analysts can query it for the dashboards and reports. Or, if you’re feeling fancy, you go for an extract, load, transform (ELT) approach and let the analytics engineers build 500 dbt models on the Postgres table you’ve replicated in Snowflake.

This may as well be an anti-pattern in real-time analytics. It doesn’t work. Data warehouses make terrible application backends, especially when you’re dealing with real-time data.

Batch ETL processes read from the source system on a schedule, which not only introduces latency but also puts strain on your relational database server.

ETL/ELT is simply not designed for serving high volumes of concurrent data requests in real-time. By nature, it introduces untenable latency between data updates and their availability to downstream consumers. With these batch approaches, latencies of more than an hour are common, with five-minute latencies about as fast as can be expected.

And finally, ETLs put your application or service at risk. If you’re querying a source system (often inefficiently) on a schedule, that puts a strain on your database server, which puts a strain on your application and degrades your user experience. Sure, you can create a read replica, but now you’re doubling your storage costs, and you’re still stuck with the same latency and concurrency constraints.

Change Data Capture (CDC) to the Real-Time Rescue

Hope is not lost, however, thanks to real-time change data capture (CDC). CDC is a method of tracking changes made to a database such as inserts, updates and deletes, and sending those changes to a downstream system in real time.

Change data capture works by monitoring a transaction log of the database. CDC tools read the transaction log and extract the changes that have been made. These changes are then sent to the downstream system.

Change data capture tools read from the database log file and propagate change events to a message queue for downstream consumers.

The transaction log, such as PostgreSQL’s Write Ahead Log (WAL) or MySQL’s “bin log,” chronologically records database changes and related data. This log-based CDC minimizes the additional load on the source system, making it superior to other methods executing queries directly on source tables.

CDC tools monitor these logs for new entries and append them to a topic on an event-streaming platform like Apache Kafka or some other message queue, where they can be consumed and processed by downstream systems such as data warehouses, data lakes or real-time data platforms.
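As a rough sketch of the consuming side, the snippet below reads Debezium-style change events from a Kafka topic with the kafka-python client and routes inserts, updates and deletes. The topic name, broker address and event fields follow Debezium’s common envelope, but treat them as assumptions for your own setup.

```python
import json
from kafka import KafkaConsumer

# Topic and broker are assumptions; Debezium names topics <server>.<schema>.<table>.
consumer = KafkaConsumer(
    "shop.public.orders",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw) if raw else None,
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    if event is None:
        continue  # tombstone record emitted after a delete
    payload = event.get("payload", event)  # envelope may or may not include schema
    op = payload.get("op")
    if op in ("c", "r"):          # insert or snapshot read
        print("upsert", payload["after"])
    elif op == "u":               # update
        print("update", payload["before"], "->", payload["after"])
    elif op == "d":               # delete
        print("delete", payload["before"])
    # In a real pipeline these branches would write to the analytics database.
```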

Real-Time Analytics with Change Data Capture Data

If your service or product uses a microservices architecture, it’s highly likely that you have several (perhaps dozens!) of relational databases that are continually being updated with new information about your customers, your products and even how your internal systems are running. Wouldn’t it be nice to be able to run analytics on that data in real time so you can implement features like real-time recommendation engines or real-time visualizations in your products or internal tools like anomaly detection, systems automation or operational intelligence dashboards?

For example, let’s say you run an e-commerce business. Your website runs over a relational database that keeps track of customers, products and transactions. Every customer action, such as viewing products, adding to a cart and making a purchase, triggers a change in a database.

Using change data capture, you can keep these data sources in sync with real-time analytics systems to provide the up-to-the-second details needed for managing inventory, logistics and positive customer experiences.

Now, when you want to place a personalized offer in front of a shopper during checkout to improve conversion rates and increase average order value, you can rely on your real-time data pipelines, fed by the most up-to-date change data, to do so.

How Do You Build a Real-Time CDC Pipeline?

OK, that all sounds great. But how do you build a CDC event pipeline? How do you stream changes from your relational database into a system that can run real-time analytics and then expose them back as APIs that you can incorporate into the products you’re building?

Let’s start with the components you’ll need:

  • Source data system: This is the database that contains the data being tracked by CDC. It could be Postgres, MongoDB, MySQL or any other such database. Note that the database server’s configuration may need to be updated to support CDC.
  • CDC connector: This is an agent that monitors the data source and captures changes to the data. It connects to a database server, monitors transaction logs and publishes events to a message queue. These components are built to navigate database schema and support tracking specific tables. The most common tool here is Debezium, an open source change data capture framework on which many data stack companies have built change data tooling.
  • Event streaming platform: This is the transport mechanism for your change data. Change data streams get packaged as messages, which are placed onto topics, where they can be read and consumed by many downstream consumers. Apache Kafka is the go-to open source tool here, with Confluent and Redpanda, among others, providing some flexibility and performance extensions on Kafka APIs.
  • Real-time database or platform: For batch analytics workflows like business intelligence and machine learning, this is usually a data warehouse or data lake. But we’re here for real-time analytics, so in this case, we’d go with a real-time database like those I mentioned above or a real-time data platform like Tinybird. This system subscribes to change data topics on the event streaming platform and writes them to a database optimized for low-latency, high-concurrency analytics queries.
  • Real-time API layer: If your goal, like many others, is to build user-facing features on top of change data streams, then you’ll need an API layer to expose your queries and scale them to support your new service or feature. This is where real-time data platforms like Tinybird provide advantages over managed databases, as they offer API creation out of the box. Otherwise, you can turn to tried-and-tested ORMs (object-relational mappings) and build the API layer yourself.

An example real-time CDC pipeline for PostgreSQL. Note that unless your destination includes an API layer, you’ll have to build one to support user-facing features.

Put all these components together, and you’ve got a real-time analytics pipeline built on fresh data from your source data systems. From there, what you build is limited only by your imagination (and your SQL skills).

Change Data Capture: Making Your Relational Databases Real Time

Change data capture (CDC) bridges the gap between traditional backend databases and modern real-time streaming data architectures. By capturing and instantly propagating data changes, CDC gives you the power to create new event streams and enrich others with up-to-the-second information from existing applications and services.

So what are you waiting for? It's time to tap into that 20-year-old Postgres instance and mine it for all it's worth. Get out there, research the right CDC solution for your database, and start building. If you're working with Postgres, MongoDB or MySQL, well-established CDC tooling exists for each to get you started.

The post Change Data Capture for Real-Time Access to Backend Databases appeared first on The New Stack.

]]>
Common Cloud Misconfigurations That Lead to Data Breaches https://thenewstack.io/common-cloud-misconfigurations-that-lead-to-data-breaches/ Fri, 01 Sep 2023 13:31:51 +0000 https://thenewstack.io/?p=22717145

The cloud has become the new battleground for adversary activity: CrowdStrike observed a 95% increase in cloud exploitation from 2021

The post Common Cloud Misconfigurations That Lead to Data Breaches appeared first on The New Stack.

]]>

The cloud has become the new battleground for adversary activity: CrowdStrike observed a 95% increase in cloud exploitation from 2021 to 2022, and a 288% jump in cases involving threat actors directly targeting the cloud. Defending your cloud environment requires understanding how threat actors operate — how they’re breaking in and moving laterally, which resources they target and how they evade detection.

Cloud misconfigurations — the gaps, errors or vulnerabilities that occur when security settings are poorly chosen or neglected entirely — provide adversaries with an easy path to infiltrate the cloud. Multicloud environments are complex, and it can be difficult to tell when excessive account permissions are granted, improper public access is configured or other mistakes are made. It can also be difficult to tell when an adversary takes advantage of them.

Misconfigured settings in the cloud clear the path for adversaries to move quickly.

A breach in the cloud can expose a massive volume of sensitive information, including personal data, financial records, intellectual property and trade secrets. The speed at which an adversary can move undetected through cloud environments to find and exfiltrate this data is a primary concern. Malicious actors speed up the search for valuable data by using the native tools within the cloud environment; in an on-premises environment, by contrast, they must deploy their own tools, which makes it harder for them to avoid detection. Proper cloud security is required to prevent breaches with far-ranging consequences.

So, what are the most common misconfigurations we see exploited by threat actors and how are adversaries exploiting them to get to your data?

  • Ineffective network controls: Gaps and blind spots in network access controls leave many doors open for adversaries to walk right through.
  • Unrestricted outbound access: When you have unrestricted outbound access to the internet, bad actors can take advantage of your lack of outbound restrictions and workload protection to exfiltrate data from your cloud platforms. Your cloud instances should be restricted to specific IP addresses and services to prevent adversaries from accessing and exfiltrating your data.
  • Improper public access configured: Exposing a storage bucket or a critical network service like SSH (Secure Shell Protocol), SMB (Server Message Block) or RDP (Remote Desktop Protocol) to the internet, or even a web service that was not intended to be public, can rapidly result in a cloud compromise of the server and exfiltration or deletion of sensitive data.
  • Public snapshots and images: Accidentally making a volume snapshot or machine image (template) public is rare. When it does happen, it allows opportunistic adversaries to collect sensitive data from that public image. In some cases, that data may contain passwords, keys and certificates, or API credentials leading to a larger compromise of a cloud platform.
  • Open databases, caches and storage buckets: Developers occasionally make a database or object cache public without sufficient authentication/authorization controls, exposing the entirety of the database or cache to opportunistic adversaries for data theft, destruction or tampering.
  • Neglected cloud infrastructure: You would be amazed at just how many times a cloud platform gets spun up to support a short-term need, only to be left running at the end of the exercise and neglected once the team has moved on. Neglected cloud infrastructure is not maintained by the development or security operations teams, leaving bad actors free to gain access in search of sensitive data that may have been left behind.
  • Inadequate network segmentation: Modern cloud network concepts such as network security groups make old, cumbersome practices like ACLs (access control lists) a thing of the past. But insufficient security group management practices can create an environment where adversaries move freely from host to host and service to service, based on the implicit architectural assumption that "inside the network is safe" and that "frontend firewalls are all that is needed." By using security group features to permit communication only between host groups that need to talk to each other, and to block unnecessary outbound traffic, cloud defenders can block the majority of breaches involving cloud-based endpoints (see the network policy sketch after this list).
  • Monitoring and alerting gaps: Centralized visibility into the logs and alerts from all services makes it easier to search for anomalies.
  • Disabled logging: Effective data logging of cloud security events is imperative for the detection of malicious threat actor behavior. In many cases, however, logging has been disabled by default on a cloud platform or gets disabled to reduce the overhead of maintaining logs. If logging is disabled, there is no record of events and therefore no means of detecting potentially malicious events or actions. Logging should be enabled and managed as a best practice.
  • Missing alerts: Most cloud providers and all cloud security posture management providers provide alerts for important misconfigurations, and most detect anomalous or likely malicious activities. Unfortunately, defenders often don't have these alerts on their radar, either due to too much low-relevance information (alert fatigue) or a simple lack of connection between those alert sources and the places they look for alerts, such as SIEM (security information and event management) tools.
  • Ineffective identity architecture: User accounts that are not rooted in a single identity provider that enforces limited session times and multifactor authentication (MFA), and that can flag or block sign-in attempts for irregular or high-risk activity, are a core contributor to cloud data breaches because the risk of stolen credential use is so high.
  • Exposed access keys: Access keys are used to interact with the cloud-service plane as a security principal. Exposed keys can be rapidly misused by unauthorized parties to steal or delete data; threat actors may also demand a ransom in exchange for a promise to not sell or leak it. While these keys can be kept confidential, albeit with some difficulty, it is better to expire them or use automatically rotated short-lived access keys in combination with restrictions on where (from what networks and IP addresses) they can be used.
  • Excessive account permissions: Most accounts (roles, services) have a limited set of normal operations and a slightly larger set of occasional operations. When they are provisioned with far greater privileges than needed and these privileges are misused by a threat actor, the “blast radius” is unnecessarily large. Excessive permissions enable lateral movement, persistence and privilege escalation, which can lead to more severe impacts of data exfiltration, destruction and code tampering.
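
For workloads running on Kubernetes, one way to close the outbound-access and segmentation gaps described above is a NetworkPolicy that permits only the traffic a service actually needs. The sketch below is illustrative; the namespace, labels and ports are placeholders, and equivalent controls exist as security groups or firewall rules on each cloud provider:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-segmentation            # hypothetical policy name
  namespace: payments                    # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: payments-api                  # the workload this policy protects
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: checkout-frontend     # only the checkout frontend may call the payments API
      ports:
        - protocol: TCP
          port: 8443
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: payments-db           # outbound traffic is limited to the payments database
      ports:
        - protocol: TCP
          port: 5432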

Just about everyone has a cloud presence at this point. A lot of organizations make the move for cost savings and flexibility without considering the security challenges that come with this new infrastructure. Cloud security isn't something that security teams will understand without the requisite training. Maintaining best practices in cloud security posture management will help you avoid the common misconfigurations that lead to a cloud security breach.

The post Common Cloud Misconfigurations That Lead to Data Breaches appeared first on The New Stack.

]]>
How to Give Developers Cloud Security Tools They’ll Love https://thenewstack.io/how-to-give-developers-cloud-security-tools-theyll-love/ Thu, 31 Aug 2023 15:10:43 +0000 https://thenewstack.io/?p=22717100

There are few better ways to make developers resent cybersecurity than to impose security tools on them that get in

The post How to Give Developers Cloud Security Tools They’ll Love appeared first on The New Stack.

]]>

There are few better ways to make developers resent cybersecurity than to impose security tools on them that get in the way of development operations.

After all, although many developers recognize the importance of securing applications and the environments that host them, their main priority as software engineers is to build software, not to secure it. If you burden them with security tools that hamper their ability to write code efficiently, you're likely to get resistance to the solutions — and rampant security risks, because your developers may not take the tools seriously or use them to maximum effect.

Fortunately, that doesn’t have to be the case. There are ways to square the need for rigorous security tools with developers’ desire for efficiency and flexibility in their own work. Here are some tips to help you choose the right security tools and features to ensure that security solutions effectively mitigate risks without burdening developers.

What to Look for in Modern Cloud Security Tools

There are many types of security tools out there, each designed to protect a specific type of environment, a certain stage of the software delivery life cycle or against a certain type of risk. You might use “shift left” security tools to detect security risks early in the software delivery pipeline, for example, while relying on cloud security posture management (CSPM) and cloud identity and entitlement management (CIEM) solutions to detect and manage risks within the cloud environments that host applications.

You could leverage all of these features via an integrated cloud native application protection platform (CNAPP) solution, or you could implement them individually, using separate tools for each one.

However, regardless of the type of security tools you need to deploy or types of risks you’re trying to manage, your solutions should provide a few key benefits to ensure they don’t get in the way of developer productivity.

Context-Aware Security

Context-aware security is the use of contextual information to assess whether a risk exists in the first place, and if so, the potential severity of that risk. It’s different from a more-generic, blunter approach to security wherein all potential risks are treated the same, regardless of context.

The key benefit of context-aware security for developers is that it’s a way of balancing security requirements with usability and productivity. Based on the context of each situation, your security tools can evaluate how rigorously to deploy protections that may slow down development operations.

For example, imagine that you’ve configured multifactor authentication (MFA) by default for the source code management (SCM) system that your developers use. In general, requiring MFA to access source code is a best practice from a security perspective because it reduces the risk of unauthorized users being able to inject malicious code or dependencies into your repositories. However, having to enter multiple login factors every time developers want to push code to the SCM or view its status can slow down operations.

To provide a healthy balance between risk and productivity in this case, you could deploy a context-aware security platform that requires MFA by default when accessing the SCM, but only requires one login factor when a developer connects from the same IP address and within the same time window as previous connections. Based on contextual information, lighter security protections can be applied in some circumstances so that developers can work faster.
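
As an illustration only, such a rule might be expressed along the following lines; the YAML keys here are invented for readability and are not any vendor's actual policy schema:

# Hypothetical context-aware access policy; invented schema for illustration.
policy: scm-access
default:
  factors: [password, totp]       # full MFA by default for the source code management system
exceptions:
  - name: trusted-session
    conditions:
      source_ip: previously-seen  # same IP address the developer has authenticated from before
      time_window: usual-hours    # within the window the developer normally works in
    factors: [password]           # a single factor is enough in this lower-risk context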

Security Integrations

The more security tools you require developers to integrate with their own tooling, the harder their lives will be. Not only will the initial setup take a long time, but they’ll also be stuck having to update integrations every time they update their own tools.

To mitigate this challenge, look for security platforms that offer a wide selection of out-of-the-box integrations. Native integrations mean that developers can connect security tooling to their own tools quickly and easily, and that updates can happen automatically. It’s another way to ensure that development operations are secure, but without hampering developer efficiency or experience.

Comprehensive Protection

The more security features and protections you can deploy through a single platform, the fewer security tools and processes your developers will have to contend with to secure their own tools and resources.

This is the main reason why choosing a consolidated, all-in-one cloud security platform leads to a better developer experience. It not only simplifies tool deployment, but also gives developers a one-stop solution for reporting, managing and remediating risks. Instead of toggling through different tools to manage different types of security challenges, they can do it all from a single location, and then get back to their main job — development.

Getting Developers on Board with Security

At their worst, security tools are the bane of developers' existence. They get in the way, slow developers down and end up being treated as a burden to bear.

Well-designed, well-implemented security tools do the opposite. By using strategies such as context-aware security, broad integrations and comprehensive, all-in-one cloud security platforms, organizations can deploy the protections they need to keep IT resources secure while simultaneously keeping developers happy and productive.

Interested in strengthening your cloud security posture? The Orca Cloud Security Platform offers complete visibility and prioritized alerts for potential threats across your entire cloud estate. Sign up for a free cloud risk assessment or request a demo today to learn more.

The post How to Give Developers Cloud Security Tools They’ll Love appeared first on The New Stack.

]]>
Setting up Multicluster Service Mesh with Rafay CLI https://thenewstack.io/setting-up-multicluster-service-mesh-with-rafay-cli/ Wed, 30 Aug 2023 17:12:08 +0000 https://thenewstack.io/?p=22716971

This is the second of a two-part series. Read Part 1.  Over the past several months, our team has been

The post Setting up Multicluster Service Mesh with Rafay CLI appeared first on The New Stack.

]]>

This is the second of a two-part series. Read Part 1

Over the past several months, our team has been working on scaling Rafay’s SaaS controller. As a crucial part of this, we embarked on setting up multicluster Istio environments. During this process, we encountered and successfully tackled the challenges previously mentioned. These challenges encompassed managing the complexity of the configuration, ensuring consistent settings across clusters, establishing secure network connectivity and handling service discovery, monitoring and troubleshooting complexities.

To overcome these challenges, we adopted Infrastructure as Code (IaC) approaches for configuration management and developed a command line interface (CLI) automation tool to ensure consistent and streamlined multicluster Istio deployments. The CLI follows the “multi-primary on different networks” model described in the Istio documentation. The topology we use in our multicluster Istio deployments looks like the image below.

The CLI uses a straightforward configuration. Below is an example of the configuration format:

$ cat examples/mesh.yaml
apiVersion: ristioctl.k8smgmt.io/v3
kind: Certificate
metadata:
  name: ristioctl-certs	
spec:
  validityHours: 2190
  password: false
  sanSuffix: istio.io # Subject Alternative Name Suffix
  meshID: uswestmesh
---
apiVersion: ristioctl.k8smgmt.io/v3
kind: Cluster
metadata:
  name: cluster1
spec:
  kubeconfigFile: "kubeconfig-istio-demo.yaml"
  context: cluster1
  meshID: uswestmesh
  version: "1.18.0"
  installHelloWorld: true #deploy sample HelloWorld application
---
apiVersion: ristioctl.k8smgmt.io/v3
kind: Cluster
metadata:
  name: cluster2
spec:
  kubeconfigFile: "kubeconfig-istio-demo.yaml"
  context: cluster2
  meshID: uswestmesh
  version: "1.18.0"
  installHelloWorld: true   #deploy sample HelloWorld application


(Note: The example above is a generic representation.)

In this configuration, the CLI is set up to work with two Kubernetes clusters: cluster1 and cluster2. Each cluster is defined with its respective details, including the Kubernetes kubeconfig file, the context and the version of Istio to be installed. The CLI uses this configuration to establish connectivity between services across the clusters and create the multicluster service mesh.

Explanation of the configuration:

Certificate: The CLI establishes trust between all clusters in the mesh using this configuration. It will generate and deploy distinct certificates for each cluster. All cluster certificates are issued by the same root certificate authority (CA). Internally, the CLI uses the step-ca tool.

Explanation:

  • apiVersion: The version of the API being used; in this case, ristioctl.k8smgmt.io/v3.
  • kind: The type of resource, which is Certificate in this case.
  • metadata: Metadata associated with the resource, such as the resource name.
  • spec: This section contains the specifications or settings for the resource.
    • validityHours: Specifies the validity period of the certificate in hours.
    • password: Indicates whether a password is required.
    • sanSuffix: Subject Alternative Name (SAN) Suffix for the certificate.
    • meshID: Identifier for the multicluster service mesh.

Cluster: These are cluster resources used to define individual Kubernetes clusters that will be part of the multicluster service mesh. Each cluster resource represents a different Kubernetes cluster.

Explanation:

  • kubeconfigFile: Specifies the path to the kubeconfig file for the respective cluster, which contains authentication details and cluster information.
  • context: The Kubernetes context associated with the cluster, which defines a named set of access parameters.
  • meshID: Identifies the multicluster service mesh that these clusters will be connected to.
  • version: Specifies the version of Istio to be deployed in the clusters.
  • installHelloWorld: Indicates whether to deploy a sample HelloWorld application in each cluster.

Overall, this configuration describes the necessary settings to set up a multicluster service mesh using the ristioctl CLI tool. It includes the specification for a certificate and Kubernetes clusters that will be part of the service mesh. The ristioctl CLI tool will use this configuration to deploy Istio and other required configurations to create a unified and scalable mesh over these clusters. The steps below outline the tasks the CLI tool handles internally to set up a multicluster service mesh. Let’s further explain each step:

  • Configure trust across all clusters in the mesh: The CLI tool establishes trust between the Kubernetes clusters participating in the multicluster service mesh. This trust allows secure communication and authentication between services in different clusters. This involves generating and distributing certificates and keys for mutual TLS (Transport Layer Security) authentication.
  • Deploy Istio into the clusters: The CLI deploys Istio into each Kubernetes cluster within the mesh.
  • Deploy east-west gateway into the clusters: The east-west gateway is an Istio component responsible for handling traffic within the service mesh, specifically the traffic flowing between services in different clusters (east-west traffic). The CLI deploys the east-west gateway into each cluster to enable cross-cluster communication.
  • Expose services in the clusters: The CLI ensures that services running within each cluster are appropriately exposed and accessible to the other clusters in the multicluster service mesh, as shown in the gateway sketch after this list.
  • Provision cross-cluster service discovery using Rafay ZTKA-based secure channel: Rafay ZTKA (Zero Trust Kubectl Access) is a secure channel technology that enables cross-cluster Kube API server communication.
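
The service-exposure step typically relies on a dedicated gateway listening on Istio's cross-network port. The resource below is adapted from the Istio multi-primary documentation; the CLI presumably applies an equivalent object internally, so treat this as a sketch of the end result rather than the tool's exact output:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway        # bind to the dedicated east-west gateway deployment
  servers:
    - port:
        number: 15443             # Istio's conventional port for cross-network traffic
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH    # route by SNI without terminating TLS at the gateway
      hosts:
        - "*.local"               # expose all mesh services to the other network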

By automating these steps, the CLI simplifies setting up a multicluster service mesh, reducing the operational complexity for users and ensuring a unified and scalable mesh over clusters in different environments. This approach enhances connectivity, security and observability, allowing organizations to adopt a multicloud or hybrid cloud strategy easily.

To use it:

ristioctl apply -f examples/mesh.yaml

The CLI is open source. You can find more details at https://github.com/RafaySystems/rafay-istio-multicluster/blob/main/README.md.

We use Rafay Zero Trust Kubectl Access (ZTKA) to prevent exposing the Kubernetes Cluster Kube API server to a different network for improved security. To implement this, you need to incorporate Rafay’s ZTKA kubeconfig in the configuration. The resulting topology will resemble the following:

Conclusion

Multicluster service connectivity is crucial for various organizational needs. While Istio provides multicluster connectivity, configuring it can be complex and cumbersome. Therefore, we have developed a tool to simplify the configuration process. Ensuring secure network connectivity between clusters is paramount to safeguarding data in the multicluster environment. With our tool, organizations can streamline the setup of multicluster service mesh and establish a secure and scalable infrastructure to support their distributed applications effectively.

The post Setting up Multicluster Service Mesh with Rafay CLI appeared first on The New Stack.

]]>
The API Gateway and the Future of Cloud Native Applications https://thenewstack.io/the-api-gateway-and-the-future-of-cloud-native-applications/ Wed, 30 Aug 2023 17:00:06 +0000 https://thenewstack.io/?p=22716085

The rapid growth of cloud native applications that are smaller, more distributed and designed for highly dynamic environments has turned

The post The API Gateway and the Future of Cloud Native Applications appeared first on The New Stack.

]]>

The rapid growth of cloud native applications that are smaller, more distributed and designed for highly dynamic environments has turned API gateways into indispensable intermediaries for driving digital initiatives.

At the same time, the emergence of the Kubernetes Gateway API, with support from the Envoy Gateway project, is driving a shift towards standardization and interoperability. Ultimately, this will enable organizations to leverage the benefits of a standard API gateway even as API gateway vendors are motivated to invest in continued innovation.

This article explores how API gateways are adapting to meet the demands of modern cloud native applications, including the concept of the API gateway as part of the cloud operating system, the roles of the emerging Kubernetes Gateway API and Envoy Gateway projects, and potential future advancements.

The Role of the API Gateway in Cloud Native Applications

A modern cloud native application is designed and developed specifically to take advantage of cloud infrastructure and services. It embraces the principles of scalability, resilience, and flexibility, enabling organizations to build robust and agile software systems. A typical cloud native application comprises various components, including a front-end, Backend For Frontend (BFF) services, back-end services and a database. Let’s review each of these components and how the API gateway plays a crucial role in orchestrating and managing communication between them.

Figure 1: Cloud native application architecture.

 

Cloud Native Application Components

Front-end: The front-end is the user-facing part responsible for rendering the user interface and interacting with end-users. It consists of web or mobile applications that communicate with back-end services via APIs exposed by the API gateway. The front-end components can be built using frameworks like React, Angular, or Vue.js, and are designed to be lightweight, highly responsive, and scalable to handle user requests efficiently.

Backend For Frontend (BFF) services: A vital part of cloud native applications, BFF services are specialized components that act as intermediaries, aggregating and transforming data from various back-end services to fulfill specific front-end requirements. They encapsulate business logic, handle authentication and authorization, and optimize data delivery to enhance performance and the user experience. The API gateway enables the routing of requests from the front-end to the appropriate BFF service based on the requested functionality.

Back-end services: Cloud native applications typically comprise multiple back-end services, each responsible for a specific set of functions or microservices. These services are designed to be loosely coupled, independently deployable, and scalable. Back-end services communicate with each other through APIs, and the API gateway plays a critical role in routing and load-balancing requests across these services. They are often built using technologies, such as Node.js, Java, Python, Ballerina or Go, and leverage containers for deployment and scalability.

Database: Either a traditional relational database or a NoSQL database can be used to store and manage the application’s data, depending on the specific requirements of the application.

The Role of the API Gateway

As we have seen, the API gateway in a cloud native application acts as a central entry point for external requests and provides a unified interface for accessing its various components. It handles tasks, such as request routing, authentication, authorization, rate limiting, and API analytics. By offloading these concerns from individual components, the API gateway simplifies development, improves performance, and enhances the overall security posture of the application.

Organizations with many services typically group them into business domains. One way to do this is to use domain-driven design (DDD) principles. When services within a domain communicate with services in other domains, these calls typically go through an API gateway. In such situations, an organization will typically have two API gateways: one for internal APIs and the other for external APIs.

 

Figure 2: API gateways in a cloud native architecture

Kubernetes: The Operating System of the Cloud

Advancements in technology — from mainframes and minicomputers to personal computers, mobile devices and cloud native computing — and the changing demands of modern applications have driven software architecture to evolve significantly over the years. One major aspect of this evolution is that, just as Linux became the operating system for enterprise applications two decades ago, Kubernetes has now emerged as the operating system for cloud native applications.

Kubernetes abstracts the underlying infrastructure and provides a unified API for deploying and managing containerized applications across clusters of machines. It also provides core features like automatic scaling, load balancing, service discovery, and rolling deployments, making it easier to build and operate resilient, scalable applications in a cloud native fashion. Additionally, Kubernetes integrates with other cloud native components, such as service meshes and, of course, API gateways, to create a comprehensive application delivery infrastructure.

By leveraging Kubernetes, organizations can achieve cloud portability, deploying applications across various cloud providers and on-premises environments with ease. With its extensible architecture and vibrant ecosystem of tools and extensions, Kubernetes is now the preferred choice for managing complex cloud native architectures.

The API Gateway Is Becoming Part of the Cloud Operating System

As Kubernetes becomes the operating system of the cloud, the API gateway is becoming a part of that operating system. With the rise of cloud native applications, the API gateway has evolved to become an essential component of the infrastructure required for building and managing these applications. It is no longer just a standalone service but rather an integral part of the cloud operating system. This shift is driven by the need for a standardized and centralized approach to manage APIs — just as there is for services and jobs within the cloud native ecosystem.

An Abstraction for APIs

Cloud native applications comprise numerous microservices, each offering specific functions. These microservices communicate with each other through APIs. However, as the number of microservices grows, managing and securing APIs becomes increasingly complex. This is where the API gateway steps in, providing a unified entry point for external requests, implementing security policies, and handling traffic management.

Similar to how services and jobs are abstracted in the cloud operating system, the API gateway offers a higher-level abstraction for APIs. It enables developers to focus on building and deploying microservices while abstracting away the complexities of API management, such as request routing, authentication, and rate limiting. This abstraction simplifies development, improves security, and enhances scalability.

Kubernetes Gateway API

Recognizing the importance of API management in the cloud native landscape, the Kubernetes community has introduced the Kubernetes Gateway API. The Gateway API aims to make APIs a first-class citizen on Kubernetes, providing a standardized way to define and manage API gateways within the Kubernetes ecosystem.

The Gateway API specification offers a consistent interface for configuring and operating API gateways, regardless of the underlying implementation. It allows for declarative configuration, enabling easy management and versioning of API gateway configurations as part of Kubernetes manifests. The Gateway API specification brings the benefits of portability and interoperability, allowing organizations to adopt different API Gateway implementations without modifying their applications.
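
As a rough sketch of what this looks like in practice, a platform team might define a shared Gateway while an application team attaches its own HTTPRoute to it. The names, namespaces and gateway class below are placeholders, and the API version should match the Gateway API release installed in your cluster:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway
  namespace: infra                         # owned by the platform team
spec:
  gatewayClassName: example-gateway-class  # placeholder; provided by your gateway implementation
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All                        # let application teams attach routes from their own namespaces
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: orders-route
  namespace: orders                        # owned by the application team
spec:
  parentRefs:
    - name: public-gateway
      namespace: infra
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /orders
      backendRefs:
        - name: orders-service             # the backing Kubernetes Service
          port: 8080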

The Envoy Gateway Project

The Envoy Gateway project is a popular implementation of the Kubernetes Gateway API. Envoy is an open-source, high-performance edge and service proxy designed for cloud native environments. It offers advanced features, such as load balancing, service discovery, rate limiting and observability. By integrating Envoy as a gateway implementation, organizations can leverage the features and capabilities provided by Envoy while adhering to the Gateway API specification.

Envoy acts as a powerful and flexible data plane proxy, managing traffic between external clients and microservices running on Kubernetes. Its extensible architecture allows for integration with various authentication mechanisms, service meshes, and other cloud native components.

The Envoy Gateway project, along with the Kubernetes Gateway API, enables organizations to build a robust and standardized infrastructure for managing APIs within the Kubernetes ecosystem. It provides a seamless and scalable approach to handling API traffic, ensuring the reliable delivery of requests and enforcing security policies.
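
Selecting Envoy Gateway as the implementation is itself expressed through the Gateway API. The GatewayClass below is a sketch; verify the controllerName value against the documentation for the Envoy Gateway version you install:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: envoy-gateway-class
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller  # Envoy Gateway's controller identifier

Any Gateway that references this class is then reconciled by Envoy Gateway, while the Gateway and HTTPRoute definitions themselves stay implementation-neutral.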

Farewell, API Gateway. Long Live, Gateway API!

With the emergence of the Gateway API and its standardized approach to configuring and managing API gateways, the actual API gateway runtime is becoming less relevant. Traditionally, organizations would choose and configure a specific API gateway implementation to handle their API traffic. However, the Gateway API abstracts away the specific implementation details, allowing organizations to adopt different API gateway runtimes interchangeably.

The focus is shifting from the runtime to the Gateway API specification, which provides a consistent way to define API gateway configurations. Organizations can choose an API gateway runtime that aligns with their specific requirements and seamlessly switch between different implementations without impacting the application code or configuration.

This decoupling of the runtime from the Gateway API specification enables organizations to leverage the benefits of a standardized API gateway configuration while maintaining flexibility and choice in selecting the most suitable runtime for their needs. It also encourages competition and innovation among API gateway vendors, as they can differentiate their products by providing unique features and optimizations while adhering to the Gateway API specification.

The Shift in How APIs Are Configured and Managed

The Gateway API brings a significant shift in the way APIs are configured and managed within Kubernetes. Previously, configuring API gateways required manual configuration and interaction with specific API gateway runtime configurations. However, with the Gateway API, developers and operations teams can define API gateway configurations directly in Kubernetes manifests, leveraging Kubernetes’ declarative nature.

This shift empowers developers and operations teams to have more control over the API configuration process. They can define routing rules, security policies and transformations for APIs using familiar Kubernetes resource definitions. Moreover, the Gateway API enables them to manage API Gateway configurations as part of their infrastructure-as-code practices, promoting consistency, versioning and scalability.

By abstracting away the low-level details of API gateway runtime configurations, the Gateway API simplifies the API management process for developers and DevOps teams. It reduces the complexity of managing API gateway configurations separately and allows them to focus on their core responsibilities of building and deploying applications. This streamlined approach enhances collaboration and agility within development and operations teams, ultimately leading to faster and more reliable delivery of APIs. Some vendors will offer extensions to the API, supporting unique and differentiated features organizations will want. However, the core API will remain generic.

Conclusion

As cloud native applications continue to shape the future of software architecture, the API gateway plays a pivotal role in enabling secure and scalable communication between microservices. At the same time, the evolving landscape of cloud computing is pushing the API gateway to become an essential part of the cloud operating system. With the emergence of the Gateway API, we are witnessing a shift towards standardization and interoperability, enabling organizations to build cloud native applications that are portable, flexible and robust. By embracing these advancements, we can look forward to a future where the Gateway API becomes a vital pillar of the cloud native ecosystem.

 

 

 

The post The API Gateway and the Future of Cloud Native Applications appeared first on The New Stack.

]]>
Top 4 Factors for Cloud Native Observability Tool Selection  https://thenewstack.io/top-4-factors-for-cloud-native-observability-tool-selection/ Tue, 29 Aug 2023 10:00:43 +0000 https://thenewstack.io/?p=22716741

This is the fourth of a four-part series. Read Part 1 , Part 2 and Part 3. Cloud native adoption

The post Top 4 Factors for Cloud Native Observability Tool Selection  appeared first on The New Stack.

]]>

This is the fourth of a four-part series. Read Part 1, Part 2 and Part 3.

Cloud native adoption isn't something that can be done with a lift-and-shift migration. There's much to learn and consider before taking the leap to ensure the cloud native environment can help with business and technical needs. For those who are early in their modernization journeys, this can mean learning the various cloud native terms, weighing the benefits and pitfalls, and understanding how cloud native observability is essential to success.

To help, we’ve created a four-part primer around “getting started with cloud native.” These articles are designed to educate and help outline the what and why of cloud native architecture.

Our most recent article covered why traditional application performance monitoring (APM) tools can’t keep up with modern observability needs. This one covers the features and business requirements to consider when selecting cloud native observability tools. 

The Need for Cloud Native Observability

Today's developers are driven by two general issues pervasive throughout organizations of any size in any industry. First, they must be able to rapidly create and frequently update applications to meet ever-changing business opportunities. Second, they must cater to stakeholders and a user base that expects (and demands) apps to be highly available and responsive, and to incorporate the newest technologies as they emerge.

Monolithic approaches cannot meet these objectives, but cloud native architecture can. However, enterprises going from monolithic infrastructures to cloud native environments will fail without a modern approach to observability. But while the challenges cloud native adoption brings are real, they are not insurmountable.

Arming teams with modern observability that is purpose-built for cloud native environments will allow them to quickly detect and remediate issues across the environment. Your applications will work as expected. Customers will be happy. Revenue will be protected.

What to Look for in a Cloud Native Observability Solution

A suitable cloud native observability solution will:

Control Data … and Costs

Your cloud native observability solution should help you understand how much data you have and where it’s coming from, as well as make it simpler to quickly find the data you need to solve issues and achieve business results.

Traditional APM and infrastructure monitoring tools lack the ability to efficiently manage the exponential growth of observability data and the technical and organizational complexities of cloud native environments. APM and infrastructure monitoring tools require you to collect, store and pay for all your data regardless of the value you get from it.

With a central control plane, you optimize costs as data grows, without any surprise budget overruns. You get persistent system reliability and improve your user experience. You get platform-generated recommendations that optimize performance. And instead of wasting valuable engineering resources on troubleshooting, you reduce noise and redirect your engineers to solve problems faster. A central control plane allows you to refine, transform and manage your observability data based on need, context and utility. That way, you can analyze and understand the value of your observability data faster, including what's useful and what's waste.

Avoid Vendor Lock-In with Open Source Compatibility

Proprietary formats not only make it difficult for engineers to learn how to use systems, but they add customization complexity. Modern cloud native observability solutions natively integrate with open source standards such as Prometheus and OpenTelemetry, which eliminates switching costs. In times of economic uncertainty and when tech talent is scarce, you’ll want to invest in a solution that is open to possibilities.
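
In practice, open standards compatibility often means the platform can ingest data straight from an OpenTelemetry Collector rather than through a proprietary agent. The configuration below is a generic sketch; the scrape target and export endpoint are placeholders for your own services and your vendor's ingest URL:

receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}
  prometheus:
    config:
      scrape_configs:
        - job_name: app-metrics                        # placeholder scrape job
          static_configs:
            - targets: ["app:9090"]                    # placeholder application endpoint
processors:
  batch: {}
exporters:
  otlphttp:
    endpoint: https://observability.example.com:4318   # placeholder vendor ingest endpoint
service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]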

Ensure Availability and Reliability

When your observability platform is down — even short, intermittent outages or performance degradations — your team is flying blind with no visibility into your services. The good news is that you don’t have to live with unpredictable and unreliable observability.

Your modernization plan should include working with an observability vendor that offers a 99.9% uptime service-level agreement (SLA), which can be confirmed by finding out what the actual delivered uptime has been for the past 12 months. Also, dig in a little to understand how vendors define and monitor SLAs and at what point they notify customers of problems. A best-in-class solution will proactively monitor its own systems for downtime, count any period greater than a few minutes of system inaccessibility as downtime and notify customers immediately.

Predict and Resolve Customer-Facing Problems Faster

A cloud native observability solution can improve engineering metrics such as mean time to remediate (MTTR) and mean time to detect (MTTD) as well as time to deploy. But that’s not all. It can also provide real-time insights that help improve business key performance indicators (KPIs) such as payment failures, orders submitted/processed or application latency that can hurt the customer experience.

Promote Strong Developer Productivity from the Jump

Today’s engineering on-call shifts are stressful because people can’t find the right data, run queries quickly or remediate issues fast — something enterprises should try to avoid when transitioning to a modern environment.

Most APM tools were introduced more than a decade ago when most engineering teams were organized in a top-down hierarchical fashion. In a DevOps world, developers own responsibility for the operations of their applications. The best way to support a modern environment that’s been organized with small, interdependent engineering teams is with an observability solution that supports workflows aligned with how your distributed, interdependent engineering teams are operating.

Your Observability Vendor Should Be a Partner in Your Cloud Native Success

Technical expertise isn't a nice-to-have; it's a must-have for successful businesses. Vendor support experts help teams meet service-level agreements. Therefore, your observability vendor should offer customer support experts who are always available to help you navigate your cloud native journey — at no additional charge.

Read our full series on getting started with cloud native:

  1. 5 Things to Know Before Adopting Cloud Native
  2. Pros and Cons of Cloud Native to Consider Before Adoption
  3. 3 Ways Traditional APM Systems Hinder Modern Observability 
  4. Top 4 Factors for Cloud Native Observability Tool Selection

The post Top 4 Factors for Cloud Native Observability Tool Selection  appeared first on The New Stack.

]]>
Cloud Portability: How Platform Engineering Pushes Past Toil https://thenewstack.io/cloud-portability-how-platform-engineering-pushes-past-toil/ Mon, 28 Aug 2023 15:36:35 +0000 https://thenewstack.io/?p=22716696

This is the fourth part in a series. Read Part 1, Part 2 and Part 3. In our previous discussions

The post Cloud Portability: How Platform Engineering Pushes Past Toil appeared first on The New Stack.

]]>

This is the fourth part in a series. Read Part 1, Part 2 and Part 3.

In our previous discussions on platform engineering, we delved into the intricacies of transitioning to the field, the underlying motivations and its prospective trajectory. We discussed how adopting platform engineering allows tech organizations to adapt more readily to changes in business direction.

One such extreme change is cloud portability, which is no longer uncommon in today's business but has a significant effect on developer experience. Most companies start their journey with a single cloud provider, embracing the cloud native functionalities that these services offer. They build expertise, write automations and leverage as much of the cloud as they can.

But what happens when a business becomes too intertwined with one provider's services? We've seen it firsthand: vendor lock-in. This is a significant concern for businesses that need the flexibility to switch or interoperate between providers due to various factors, from customer preferences and regional market conditions to data sovereignty laws and pricing.

This article delves into case studies, outlines challenges and offers an approach to cloud portability that is practical and minimizes toil.

The Hurdles of Cloud Portability

As businesses scale, moving across cloud providers — or cloud portability/interoperability — becomes tempting but also fraught with hurdles. Let's shed light on what goes on inside the tech discussion rooms once such a decision is made.

  1. Obscured documentation: First, it must be determined what is cloud portable and what is not. This becomes a large exercise because, in our experience, most companies keep the architecture of the system in documents that have already become obsolete. Traditional automation on code and infrastructure pipelines also falls flat because environments are rarely recreated, so the source of truth becomes questionable.
  2. Skill gap: Next, the platform engineers and even developers who have spent years in building expertise on the primary cloud now have to acquire expertise on the new cloud, understanding the parity as well as finer differences. The time and effort spent becoming acquainted with new tools and conventions can detract from the team’s core operational focus, resulting in potential setbacks. Furthermore, there is a high chance that this skill gap will lead to suboptimal cloud environments when the migration happens.
  3. Automation rewrites: Simply put “the nuts and bolts have to match the machinery.” Given parity and disparity between cloud features, automation originally tailored for one cloud environment needs to be overhauled to be compatible with the new one.
  4. Development interruptions: Migrations usually run for long periods, and development teams move ahead with enhancing existing workloads, which means automation teams have to constantly catch up. To ensure a smooth migration, ongoing development might be halted, causing potential project delays.
  5. Cross-cloud environment drifts: Over the course of an application's life, environment drifts occur. During the transition period, when some of these environments are on different clouds, the chance of drift is even higher. These drifts can manifest as inconsistencies between the origin and destination environments, causing confusion during migration.
  6. Retraining overhead: Developers need to train on a new set of tools and best practices. This can temporarily dent the team’s productivity and elongate the adaptation phase.

Overcoming Hurdles: Dynamic Cloud Interoperability

Many of the above challenges have led us to design a key principle at Facets, which is a platform for platform engineers.

Documentation of architecture should not be produced after the fact; it should be the source of truth that drives automation. This can be built in layers, starting with developers describing how they view their architecture devoid of cloud details, natively separating architectural intents from cloud implementations.

This is where our concept of “Dynamic Cloud Interoperability” (DCI) comes into play. DCI is our answer to the traditional narrative around cloud agnosticism. It involves developing an abstraction layer that allows businesses to employ the same infrastructure setup across different cloud providers like Amazon Web Services (AWS), Azure and Google Cloud Platform (GCP) without altering their applications. This means a service like AWS RDS can transition smoothly into a CloudSQL in GCP or a flexible server in Azure with zero hassle.

Here’s how DCI helps you address the aforementioned challenges:

  1. Obscured documentation: Ensure the architecture is documented in a cloud-agnostic manner as a prerequisite to cloud delivery, not as an afterthought. This not only clarifies the structure but also streamlines migration.
  2. Skill gap: Overlay the destination cloud best practices on the automations, which reduces the need to build expertise from scratch and provides a Day 1 optimized environment.
  3. Automation overhauls: Employ generative automations that auto-adapt, eliminating manual management and rewrites during cloud transitions.
  4. Development delays: Implement continuous delivery that functions uniformly across different cloud environments. Your development doesn’t have to halt for migrations.
  5. Drifts: DCI ensures a drift-free continuous delivery system to maintain consistency and avoid incremental errors over time.
  6. Developer learning curve: With DCI, we adopt a single-pane-of-glass approach. This unified interface makes transitions smoother for developers, obviating the need for extensive retraining.

DCI in Action: The GGX Story

One of the most vivid illustrations of this balance in action is our work with GGX, an NFT marketplace. It provides a platform for trading digital player cards. GGX initially used AWS’s cloud native functionalities but needed to migrate to GCP. Challenges included:

  1. Limited GCP knowledge: GGX’s team, adept with AWS, had little experience with GCP, risking a halt in development to learn the new platform.
  2. Migration hurdles: GGX’s automations, customized for AWS, required modifications for GCP compatibility, a process rife with potential errors.
  3. Infrastructure drift: It was essential that the actual infrastructure configuration remained aligned with its intended design during migration.

Facets intervened, offering solutions:

  1. DCI-aided migration: Dynamic Cloud Interoperability bridged AWS and GCP, eliminating the need to overhaul GGX’s automations.
  2. Developer landing zone: The developer landing zone of Facets ensures that the developers are least exposed to the change and are trained over a period of time without affecting migration timelines.
  3. Infrastructure integrity: GGX ensured a consistent infrastructure state throughout the migration because of the inherent cross-cloud orchestration guarantees.

GGX transitioned in 15 days instead of the projected 2 to 3 months, all while continuing their regular operations.

Crafting an Optimal Cloud Strategy

From our experience, an optimal cloud strategy involves using the best tools a cloud offers while staying flexible enough to use other cloud options when needed. To achieve this, businesses can use standardized cloud services, add protective layers, manage policies in one place and use automated deployment tools. This creates a cloud strategy that’s strong but can adapt when needed.

The move to the cloud offers businesses many potent tools. The key is striking a balance: Use the best of what a cloud provider presents while maintaining the agility to shift if required. A strategic, forward-thinking approach to cloud services lets companies build a strategy that’s both strong and adaptable.

The post Cloud Portability: How Platform Engineering Pushes Past Toil appeared first on The New Stack.

]]>
What’s New with the APISIX Gateway https://thenewstack.io/whats-new-with-the-apisix-gateway/ Sun, 27 Aug 2023 10:00:37 +0000 https://thenewstack.io/?p=22716604

Since it graduated from the Apache Software Foundation Incubator in July 2020, APISIX has become one of the most active

The post What’s New with the APISIX Gateway appeared first on The New Stack.

]]>

Since it graduated from the Apache Software Foundation Incubator in July 2020, APISIX has become one of the most active open source API projects on GitHub. Soon after, its founders launched API7.ai, a company focused on the enterprise concerns related to the open source project.

API7 has attracted talent from six countries, many of them ASF committers and authorities in cloud infrastructure and security. The company added Kubernetes and Docker capabilities and now supports a range of languages beyond Lua, including Go, Java, Python, Node.js and Wasm, according to CEO Ming Wen.

“From monolithic to microservices architectures and from bare metal to the cloud, the challenges we face grow into how to achieve rapid elastic autoscaling, efficient cluster management and convenient customization,” he said in an email interview.

“Now APIs are growing explosively. There are tens of or even hundreds of thousands of instances. [Users] need fast and elastic autoscaling and low latency to release products quickly and provide users with a good product experience.”

Wasm and More

Wen and Yuansheng Wang created APISIX in April 2019 at China's Zhiliu Technology and donated it to the Apache Software Foundation that October. When creating the company, they took a developer-first approach.

“The sustainable development of open source projects requires the investment of engineers, and the support of financial resources is needed to ensure the investment. Commercialization and open source can form a good ecosystem and a virtuous circle for mutual growth, especially since the basic software requires long-term investment and multiple resources to do well,” Wen said.

Apache APISIX is a dynamic, real-time and high-performance API gateway. It provides traffic management features such as load balancing, circuit breaking, authentication, observability and more.

APISIX consists of a data plane to dynamically control request traffic; a control plane to store and synchronize gateway configuration data; and a newly added AI plane to orchestrate plugins and perform real-time analysis and processing of request traffic.

It's built on the OpenResty NGINX distribution, which includes the LuaJIT just-in-time compiler for Lua scripts. It stores and manages routing-related and plugin-related configurations in etcd rather than in a relational database, which improves availability and is more aligned with cloud native architecture, according to Wen. For route matching, APISIX uses a radix tree, a compressed prefix tree that merges chains of single-child nodes, which enables fast lookups and keeps route matching performant.
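
To make the route-matching idea concrete, here is a toy longest-prefix router in Python. It uses an ordinary (uncompressed) prefix tree for brevity; a radix tree applies the same idea but collapses chains of single-child nodes. This is a sketch of the concept, not APISIX's actual Lua/C implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.route = None  # route config stored at the end of a registered prefix


class Router:
    """Toy longest-prefix router. A radix tree is the same idea with
    single-child chains collapsed into one edge."""

    def __init__(self):
        self.root = TrieNode()

    def add_route(self, prefix: str, route: str) -> None:
        node = self.root
        for ch in prefix:
            node = node.children.setdefault(ch, TrieNode())
        node.route = route

    def match(self, path: str):
        node, best = self.root, None
        for ch in path:
            node = node.children.get(ch)
            if node is None:
                break
            if node.route is not None:
                best = node.route  # remember the longest matching prefix so far
        return best


router = Router()
router.add_route("/api/", "generic API upstream")
router.add_route("/api/orders/", "orders service")

print(router.match("/api/orders/42"))  # -> orders service
print(router.match("/api/users/7"))    # -> generic API upstream
```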

It offers plugins for features such as rate limiting, identity authentication, request rewriting, URI redirection, OpenTracing and serverless. The number of plugins has grown from 20 at graduation to more than 100.
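
As a rough illustration of how a plugin is attached to a route, the sketch below calls the APISIX Admin API from Python with the requests library to enable the limit-count rate-limiting plugin on a route. The admin address, API key and upstream are placeholders for a local test setup, and exact plugin fields can vary by APISIX version.

```python
import requests

ADMIN_API = "http://127.0.0.1:9180/apisix/admin"   # admin listen address (version-dependent)
API_KEY = "edd1c9f034335f136f87ad84b625c8f1"        # placeholder admin key from the example config

route = {
    "uri": "/hello",
    "plugins": {
        # limit-count is one of APISIX's rate-limiting plugins:
        # allow 10 requests per client IP per 60-second window.
        "limit-count": {
            "count": 10,
            "time_window": 60,
            "key": "remote_addr",
            "rejected_code": 429,
        }
    },
    "upstream": {
        "type": "roundrobin",
        "nodes": {"httpbin.org:80": 1},
    },
}

resp = requests.put(
    f"{ADMIN_API}/routes/1",
    json=route,
    headers={"X-API-KEY": API_KEY},
)
print(resp.status_code, resp.json())
```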

It was originally written primarily in Lua, a programming language similar to Python, though since Wasm support was embedded into APISIX, users can also create custom plugins in Go, Python and other languages.

It has added integrations with Prometheus and Datadog for monitoring, the Cypress testing framework and support for HTTP/3 and QUIC to provide more reliable connections and reduce latency.

Growing Support

The separation of the control plane and data plane was among the changes in APISIX 3.0, released last October, meant to address several security-related vulnerabilities found in the project over the past two years. A security patch for JSON Web Token (JWT) handling was released in July.

APISIX now offers three modes of deployment: traditional, where both planes are deployed together; decoupled, where they’re deployed independently; and standalone, where only the data plane is deployed and configurations are loaded from a local YAML file.

The project has added full support for Arm64; a new gRPC client to allow developers to call third-party gRPC services directly; a transport layer protocol extension framework called xRPC that allows developers to customize specific application protocols; and support for the OpenAPI 3.0 specification.

Version 3.0 also added an AI plane that optimizes the data plane configuration using data such as users' settings on routes and plugins, as well as log metrics.

The 3.4.0 release in June added a new plugin to forward logs to Grafana Loki and allows mTLS connections at the route level.

APISIX has been gaining contributors more rapidly than rival open source API gateways and only slightly more slowly than other top Apache projects, according to git-contributor.com.

Its users include Zoom, Lotus Cars and the Australian payments company Airwallex; Chinese companies Lenovo, WPS, vivo and OPPO; and scientific research institutions such as NASA and the European Factory Platform.

API7 uses APISIX at its core, adding enterprise features, such as role-based access control, traffic labeling, support for the SOAP protocol and more. API7 also has achieved SOC 2 Type 1 security certification.

Wen maintains that as part of API7’s API management offerings, AI in its API gateway runtime can help developers improve performance by about 10%. The company has also added an AI-based API portal where developers can use plain language to query data from multiple tools.

Wen positions API7 Enterprise as an all-in-one solution for helping enterprises solve the problems of multicloud and hybrid cloud access and other cross-cloud difficulties. While it competes with full lifecycle management platforms like MuleSoft and 3scale, Wen doesn't consider them direct rivals. Open source options like Kong, Envoy and Spring Cloud are closer competitors, he said.

Spring Cloud uses a Java technology stack, while Envoy specializes in tackling issues of service mesh, east-west traffic and zero trust security. Kong and APISIX use some of the same libraries and boast similar architectural advantages, though he argues that APISIX provides better performance.

Yang Li, a committer to the APISIX project and technical platform lead at Airwallex, discusses the six criteria his company used in selecting the API gateway, problems solved as well as challenges, in this post.

The post What’s New with the APISIX Gateway appeared first on The New Stack.

]]>
Good-Bye Kris Nóva https://thenewstack.io/good-bye-kris-nova/ Wed, 23 Aug 2023 12:57:36 +0000 https://thenewstack.io/?p=22716453

When anyone middle-aged or younger dies, it's a cliche that they died much too young. Sometimes, it's really true, though.

The post Good-Bye Kris Nóva appeared first on The New Stack.

]]>

When anyone middle-aged or younger dies, it's a cliche that they died much too young. Sometimes, though, it's really true: Someone dies who was a true, innovative leader, one who was changing the world for the better. Such a person was Kris Nóva.

I can't claim to have known Nóva well, but she impressed me. Most people who'd met her would agree. Her job title when she died in a climbing accident was GitHub principal engineer. But she was far more than that.

Not even 40, Nóva had co-founded The Nivenly Foundation, a member-controlled and democratically governed open source foundation whose goal is to bring sustainability, autonomy and control to open source projects and communities. Specifically, it governs the popular tech-focused Mastodon site Hachyderm Decentralized Social Media and the Aurae Runtime Project, a Kubernetes node workload management program.

Kris Nóva and Alex Williams

Many people claim to be “thought leaders.” Only a handful really are. Nóva was one. Her Kubernetes clusterf*ck talks were famous for revealing what’s what with Kubernetes and security. She also co-authored Cloud Native Infrastructure, a must-read for anyone considering running cloud native architectures.

Nóva also authored Hacking Capitalism, a book modeling the tech industry as a system. This book is interesting for anyone who wants to know how tech works.  It’s specifically for marginalized technologists who need tools to navigate the tech business. You should read this if you’re a programmer or engineer constantly flustered by tech’s management, social, and business sides. It will give you the insight you need on how investors, top leadership, and entrepreneurs view our ruthless, but predictable, industry.

She wasn’t just a speaker and writer, though. She was also an open source developer who contributed significantly to Linux, Kubernetes, distributed runtime environments, Falco, and the Go programming language. Altogether, she had created 388 GitHub repositories. In a word, she was “impressive.”

As Josh Berkus, Red Hat's Kubernetes Manager, said on Mastodon, "We lost one of the leading lights of tech this week. Relentlessly driven, astonishingly brilliant, and one of the bravest people I ever met, Kris Nóva was both an inspiration and a friend to dozens, if not hundreds, of people (including me). While it is fitting that she should have left us doing what she always did — taking risks — we are all poorer for having lost her."

Indeed, we are.

The post Good-Bye Kris Nóva appeared first on The New Stack.

]]>
3 Ways Traditional APM Systems Hinder Modern Observability  https://thenewstack.io/3-ways-traditional-apm-systems-hinder-modern-observability/ Tue, 22 Aug 2023 16:39:47 +0000 https://thenewstack.io/?p=22716321

This is the third of a four-part series. Read Part 1 and Part 2.  Cloud native adoption isn’t something that

The post 3 Ways Traditional APM Systems Hinder Modern Observability  appeared first on The New Stack.

]]>

This is the third of a four-part series. Read Part 1 and Part 2

Cloud native adoption isn't something that can be done with a lift-and-shift migration. There's much to learn and consider before taking the leap to ensure the cloud native environment can help with business and technical needs. For those who are early in their modernization journeys, this can mean learning the various cloud native terms, benefits and pitfalls, and how cloud native observability is essential to success.

To help, we’ve created a four-part primer around “getting started with cloud native.” These articles are designed to educate and help outline the what and why of cloud native architecture.

The previous article discussed the benefits and drawbacks of cloud native architecture. This article explains why traditional application performance monitoring tools aren’t suited for modern observability needs. 

Cloud Native Requires New Tools

As cloud native approaches are more widely adopted, new challenges emerge. Organizations find it harder to understand the interdependencies between the various elements that make up an application or service. And their staff can spend enormous amounts of time trying to get to the root cause of an issue and fix problems.

What makes cloud native environments so different and more challenging to manage? Enterprises monitoring early cloud native workloads only need access to simple performance and availability data. In this scenario, the siloed nature of these platforms isn’t an obstacle to keeping applications or infrastructure running and healthy. So, traditional application performance monitoring (APM) and infrastructure monitoring tools do the job.

But as organizations begin their cloud native initiatives and use DevOps principles to speed application development, they need more. APM and infrastructure monitoring tools simply cannot provide the scalability, reliability and shared data insights needed to rapidly deliver cloud native applications at scale.

Legacy Tool Shortcomings

Here are some key ways legacy monitoring tools fail to meet cloud native challenges. These shortcomings will cause acute pain as your cloud native environment grows, and they should be factored into your modernization plan:

  • Inability to navigate microservices. Legacy tools are unable to navigate and highlight all the interdependencies of a microservices environment, making it nearly impossible to detect and remediate issues in a timely manner.
  • Lack of control. APM and infrastructure monitoring solutions lack data controls and visibility into observability data usage across teams and individuals. Simple code changes or new deployments can result in surprise overages.
  • Vendor lock-in. Proprietary solutions make it nearly impossible to switch tools, leaving you powerless when prices go up.

And though these may seem like engineering-centric challenges, they end up having a big impact on overall business health:

  • Costs increase. Because the pricing models for these tools are aligned to data ingestion, users or hosts, and there are no mechanisms to control data growth, it’s easy for costs to spiral out of control.
  • Teams end up flying blind. Rapidly rising costs force teams to restrict custom metrics and cardinality tags, limiting their ability to visualize system behavior and leaving them without important data.
  • Developer productivity plummets. Engineers are spending long nights and weekends troubleshooting. Burnout sets in. The skills gap worsens.
  • There is downtime and data loss. Service-level agreements (SLAs) and service-level objectives (SLOs) aren’t being met. Small changes lead to data loss.

What’s Needed?

These shortcomings have consequences due to the way modern businesses operate. Customer experience and application responsiveness are critical differentiators. Anything that affects either of these things can drive away customers, infuriate internal workers or alienate partners. Today, rather than waiting for problems — including performance degradation, disruption and downtime — to happen, businesses need to be ahead of the issues. They need to anticipate problems in the making and take corrective actions before they affect the application or the user.

It is obvious that cloud native architectures offer many benefits, but organizations also potentially have many challenges to overcome. Traditional application, infrastructure and security monitoring tools offer some help, but what they truly need is an observability solution designed for cloud native environments.

In the next and final installment, we’ll cover four main considerations you should have during the cloud native observability software selection process.

Read our full series on getting started with cloud native:

  1. 5 Things to Consider Before Adopting Cloud Native
  2. Pros and Cons of Cloud Native to Consider Before Adoption
  3. 3 Ways Traditional APM Systems Hinder Modern Observability
  4. Top 4 Considerations for Cloud Native Observability Tool Selection

The post 3 Ways Traditional APM Systems Hinder Modern Observability  appeared first on The New Stack.

]]>
Kubernetes 1.28 Accommodates the Service Mesh, Sudden Outages https://thenewstack.io/kubernetes-1-28-accommodates-the-service-mesh-sudden-outages/ Fri, 18 Aug 2023 17:08:26 +0000 https://thenewstack.io/?p=22715797

With its latest release, version 1.28, Kubernetes has formally recognized the service mesh as a first-class citizen in a cluster.

The post Kubernetes 1.28 Accommodates the Service Mesh, Sudden Outages appeared first on The New Stack.

]]>

Planternetes logo

With its latest release, version 1.28, Kubernetes has formally recognized the service mesh as a first-class citizen in a cluster.

K8s v1.28, nicknamed "Planternetes," is the first release in which the API recognizes a service mesh as a special type of init container, the containers needed to initialize a pod.

“Folks have been using the sidecar pattern for a long time,” said Grace Nguyen, who led the v1.28 release. The API will support actions such as updating secrets and logging. You want logging to continue even after the node has been shut down, or before it is spun up, Nguyen said.

To support the service mesh, the API gains an additional field to designate a sidecar, with the policy that the containerized service mesh can remain running for the lifetime of the pod, unlike a regular init container.
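
Concretely, Kubernetes 1.28 introduces this as an alpha feature (the SidecarContainers feature gate): an init container whose restartPolicy is set to Always keeps running for the life of the pod instead of exiting after initialization. The sketch below expresses such a pod spec as a Python dict that could be serialized and applied with kubectl or any Kubernetes client; the image names are placeholders.

```python
import json

# Sketch of a pod spec using the 1.28 sidecar mechanism: an init container
# with restartPolicy "Always" (the mesh proxy) starts before the app
# container and keeps running for the life of the pod.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app-with-mesh-sidecar"},
    "spec": {
        "initContainers": [
            {
                "name": "mesh-proxy",
                "image": "example.com/mesh-proxy:latest",  # placeholder image
                "restartPolicy": "Always",  # marks this init container as a sidecar
            }
        ],
        "containers": [
            {"name": "app", "image": "example.com/app:latest"},  # placeholder image
        ],
    },
}

print(json.dumps(pod, indent=2))  # could then be applied with kubectl or a Kubernetes client
```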

The sidecar pattern has been around since Kubernetes itself. A sidecar acts as a networking agent for a Kubernetes application, handling all the traffic in and out, as well as performing checks, monitoring, etc.

Ideally, the service mesh container should be running before the app itself, ensuring all inbound and outbound connections are supported. The service mesh container should also remain around after the app containers are terminated, to manage any remaining traffic. In practice, making this happen has been tricky for service mesh providers like Linkerd and Istio, some of whom have created brittle, platform-specific workarounds.

Shut It Down

"Planternetes" is the second release of 2023. It consists of 46 enhancements and a fresh new logo (see above). Twenty of these enhancements are in the early alpha stage, 14 are in beta, and 12 are ready for production use ("stable").

Non-graceful shutdown handling is one such stable feature. A non-graceful shutdown is one the kubelet's Node Shutdown Manager may not detect, due to an underlying hardware failure or the OS freezing up, leaving the pods on that node inoperable. Now there is a mechanism to move pods to another node when the original node fails: the StatefulSet provides K8s with the info needed to airlift the pod to a healthier environment.

Those convinced that the culprit of any networking problem is always DNS will be happy to know that Kubernetes' DNS configuration has been expanded. Previously, K8s could only search across six domains, with a search list of up to 256 characters. Now the search paths for kubelets have been increased to 32, with a maximum of 2,048 characters.

Enterprises that don’t want to update K8s as quickly as new versions are released will get some breathing room with this release, Nguyen said. Users can now skip up to three new releases of the control plane, instead of two. This means nodes would only have to be updated once a year instead of twice, and still be current with upstream support.

Release Complexities

Nguyen has been on the release team for over two years and would not characterize this release of Kubernetes as a major one — most major releases deprecate a lot of obsolete features. But a great deal of work still went into this release, simply because of how large Kubernetes has grown.

“I think that we are getting to the point where there are so many features coming in and so many code pull requests that it is hard to keep track of both at the same time,” Nguyen said. A large pull request may come in for a particular feature, but that feature may not be ready for release, and so that code may have to get rolled back.

The post Kubernetes 1.28 Accommodates the Service Mesh, Sudden Outages appeared first on The New Stack.

]]>
How SaaS Companies Can Monetize Generative AI https://thenewstack.io/how-saas-companies-can-monetize-generative-ai/ Fri, 18 Aug 2023 17:00:17 +0000 https://thenewstack.io/?p=22715396

You’ve already been part of a conversation at your company, either as a contributor or observer, on how your customers

The post How SaaS Companies Can Monetize Generative AI appeared first on The New Stack.

]]>

You've already been part of a conversation at your company, either as a contributor or an observer, about how your customers can get increased value from your products once they are infused with generative AI, LLMs or custom AI/ML models.

Universally, product roadmaps are being upended to incorporate AI. As you hash out your approach and draw up the enhanced roadmap, I want to share some words of advice from the good ol’ California Gold Rush: Don’t show up to the gold rush without a shovel!

Similarly, don’t overlook the monetization aspect of your SaaS and AI. Factor it in at the outset and integrate the right plumbing at the start — not as an afterthought or post-launch.

What’s Changing? SaaS Is Shifting to Metered Pricing

Two years ago, I wrote about the inevitable shift to metered pricing for SaaS. The catalyst that would propel the shift was unknown at the time, but the foundational thesis held that the shift itself was inevitable. No one could have predicted in 2021 that a particular form of AI would be that catalyst.

The first thing to realize is that this is not merely a "pricing" change; it is a monetization model change. A pricing change is a change in what you charge, for example, going from $79 per user/month to $99 per user/month. A monetization model change is a fundamental shift in how you charge, which inevitably also changes what you charge. It's a business model change.

Traditionally, SaaS pricing has been a relatively lightweight exercise, often decoupled from the product and product teams. With a per-user or per-seat model, as long as the price point was set sufficiently (and in some cases arbitrarily) high above a threshold that covered underlying costs with the desired margin, that was all that was needed. It was essentially a one-size-fits-all approach requiring almost no usage instrumentation or product usage tracking and reporting.

SaaS and AI Turn This on Its Head

Your technology stack will increasingly include third-party, value-add AI/ML components, further infused with additional custom models layered on top. You are going to operate in a multivendor ecosystem at the business tier, not just the infrastructure tier. These new value-added, business-tier AI/ML components will in turn come with usage-based pricing and charge models. See ChatGPT's pricing.

Each user of your SaaS application will stretch and use these metered components in different ways, thereby propelling you to also charge on a metered basis to align with underlying costs and revenue.

Deploy a Proven and Scalable Approach

While on the surface it may seem daunting, believe me, this is a welcome change. Lean into it.

Not only will it enable you to provide your customers with flexible and friendly consumption-based pricing, but it will also drive a level of operational efficiency and discipline that will further contribute to your bottom line.

Start with decoupled metering, and then layer a usage-based pricing plan on top. For example, Stripe leverages GPT-4 from OpenAI to enrich the customer-facing experience in its documentation. Instacart has also integrated with ChatGPT to create an Ask Instacart service. The app will allow users to research food-related queries in conversational language, such as healthy meal formulations, recipe ideas based on given ingredients and shopping lists generated from the ingredients of a particular recipe.

Beyond integrating with ChatGPT and other services, traditional software companies are developing their own GenAI technologies as well. For example, Adobe has rolled out Adobe Firefly to offer its own text- and image-generation capabilities to creatives.

As these capabilities become natively integrated and expected by customers, it will be imperative to track usage and develop a flexible, transparent pricing model that scales to all levels of consumption.

Usage-Based Pricing Is a Natural Fit for Generative AI Companies

Generative AI and Usage-Based Pricing: A Complementary Pair

ChatGPT parses the text prompt to generate an output based on its "understanding" of that prompt. Prompts and outputs vary in length, and prompt/output size is directly related to resource consumption: a larger prompt requires greater resources to process, and vice versa. Additionally, the usage profile can be expected to vary significantly from customer to customer. One customer may use the tool only sparingly, while another could be generating new text multiple times daily for weeks on end, and the pricing model must account for this variability.

On top of this, services like ChatGPT are themselves priced according to a usage-based model. This means that any tools leveraging ChatGPT or other models via API will be billed based on usage; since the backend costs of providing the service are inherently variable, the customer-facing billing should be usage-based as well.

To deliver the most fair and transparent pricing, and to enable frictionless adoption and user growth, these companies should look to usage-based pricing with a product-led go-to-market motion. Having both elastic frontend usage and elastic backend costs positions generative AI products as ideal fits for a usage-based and product-led approach.

How to Get Started

Meter frontend usage and backend resource consumption

Rather than building these models from scratch, many companies elect to leverage OpenAI’s APIs to call GPT-4 (or other models), and serve the response back to customers. To obtain complete visibility into usage costs and margins, each API call to and from OpenAI tech should be metered to understand the size of the input and the corresponding backend costs, as well as the output, processing time and other relevant performance metrics.

By metering both the customer-facing output and the corresponding backend actions, companies can create a real-time view into business KPIs like margin and costs, as well as technical KPIs like service performance and overall traffic. After creating the meters, deploy them to the solution or application where events are originating to begin tracking real-time usage.
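
As a sketch of what such metering could look like in practice, the snippet below wraps a chat completion call (using the OpenAI Python SDK) and records token counts and latency as a usage event. The record_usage_event helper and the customer ID are hypothetical stand-ins for whatever metering service and account model you use.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def record_usage_event(customer_id: str, event: dict) -> None:
    """Hypothetical hook: forward the event to your metering/billing service."""
    print(f"meter[{customer_id}]: {event}")


def metered_completion(customer_id: str, prompt: str, model: str = "gpt-4") -> str:
    start = time.monotonic()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed_ms = (time.monotonic() - start) * 1000

    # Meter both the customer-facing output and the backend cost drivers.
    record_usage_event(customer_id, {
        "model": model,
        "prompt_tokens": resp.usage.prompt_tokens,
        "completion_tokens": resp.usage.completion_tokens,
        "latency_ms": round(elapsed_ms, 1),
    })
    return resp.choices[0].message.content


print(metered_completion("acct-123", "Suggest a name for a usage-based billing product."))
```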

Track usage, margins and account health for all customers

Once the metering infrastructure is deployed, begin visualizing usage and costs in real time as usage occurs and customers leverage the generative services. Identify power users and lagging accounts and empower customer-facing teams with contextual data to provide value at every touchpoint.

Since generative AI services like ChatGPT use a token-based billing model, obtain granular token-level consumption information for each customer using your service. This helps to inform customer-level margins and usage for AI services in your products, and it is valuable intel going into sales and renewal conversations. Without a highly accurate and available real-time metering service, this level of fidelity into customer-level consumption, costs and margins would not be possible.

Launch and iterate with flexible usage-based pricing

After deploying meters to track the usage and performance of the generative AI solution, the next step is to monetize this usage with usage-based pricing. Identify the value metrics that customers should be charged for. For text generation this could be the word count or the total processing time to serve the response; for image generation it could be the size of the input prompt, the resolution of the image generated or the number of images generated. Commonly, the final pricing will be built from some combination of multiple factors like those described.
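
To illustrate, here is a minimal pricing sketch that combines several such value metrics into a single charge; the metric names and per-unit rates are invented for the example.

```python
# Hypothetical per-unit rates: combine several value metrics into one charge.
RATES = {
    "words_generated": 0.0004,    # $ per word of generated text
    "processing_seconds": 0.002,  # $ per second of processing time
    "images_generated": 0.02,     # $ per generated image
}


def price_usage(usage: dict) -> float:
    """Sum rate * quantity over every metered value metric in the usage record."""
    return round(sum(RATES[metric] * qty for metric, qty in usage.items()), 4)


# One billing period for one customer.
usage_record = {"words_generated": 12_500, "processing_seconds": 340, "images_generated": 18}
print(price_usage(usage_record))  # -> 6.04
```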

After creating the pricing plan and assigning it to customers, real-time usage will be tracked and billed. The on-demand invoice is kept up to date, so at any time both the vendor and customers can view current usage charges.

Integrate with your existing tools for next-generation customer success

The final step once metering is deployed and the billing service is configured is to integrate with third-party tools inside your organization to make usage and billing data visible and actionable. Integrate with CRM tooling to augment customer records with live usage data or help streamline support ticket resolution.

With real-time usage data being collected, integrate this system with finance and accounting tools for usage-based revenue recognition, invoice tracking and other tasks.

Amberflo for Generative AI

Amberflo provides an end-to-end platform for customers to easily and accurately meter usage and operate a usage-based business. Track and bill for any scale of consumption, from new models in beta testing up to production-grade models with thousands of daily users. Amberflo is flexible and infrastructure-agnostic to track any resource with any aggregation logic.

Build and experiment with usage-based pricing models, prepaid credits, hybrid pricing or long-term commitments to find the best model and motion to suit any unique business and customer base. Leverage real-time analytics, reporting and dashboards to stay current on usage and revenue, and create actionable alerts to receive notifications when key thresholds or limits are met.

The post How SaaS Companies Can Monetize Generative AI appeared first on The New Stack.

]]>
The Architect’s Guide to Thinking about Hybrid/Multicloud https://thenewstack.io/the-architects-guide-to-thinking-about-hybrid-multicloud/ Fri, 18 Aug 2023 15:22:18 +0000 https://thenewstack.io/?p=22716054

Recently, a journalist asked us to help frame the challenges and complexities of the hybrid cloud for technology leaders. While

The post The Architect’s Guide to Thinking about Hybrid/Multicloud appeared first on The New Stack.

]]>

Recently, a journalist asked us to help frame the challenges and complexities of the hybrid cloud for technology leaders. While we suspect many technologists have given this a fair amount of thought, we also know from first-hand discussions with customers and community members that this is still an area of significant inquiry. We wanted to summarize that thinking into something practical, expanding where appropriate and becoming prescriptive where it was necessary.

We'll start by saying that the concepts of the hybrid cloud and the multicloud are difficult to unbundle. If you have a single on-premises private cloud and a single public cloud provider, doesn't that qualify you as a multicloud? Not that anyone really has just two. The team at Flexera does research every year on the subject and found that 87% of enterprises consider themselves multicloud, with 3.4 public clouds and 3.9 private clouds on average, although these numbers are actually down a touch from last year's report.

There is a legitimate question in there: Can you have too many clouds?

The answer is yes. If you don’t design things correctly, you can find yourself in the dreaded “n-body problem” state. This is a physics term that was co-opted for software development. In the context of multiple public clouds, the “n-body problem” refers to the complexity of managing, integrating and coordinating these clouds. In an n-cloud environment, each cloud service (Amazon Web Services (AWS) , Azure, Google Cloud, etc.) can be seen as a “body.” Each of these bodies has its own attributes like APIs, services, pricing models, data management tools, security protocols, etc. The n-body problem in this scenario would be to effectively manage and coordinate these diverse and often-complex elements across multiple clouds. A few examples include interoperability, security and compliance, performance management, governance and access control and data management.

As you add more clouds (or bodies) into the system, the problem becomes exponentially more complex because the differences between the clouds aren’t just linear and cannot be extrapolated from pairwise interactions.
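A rough way to see the scaling: even counting only pairwise integrations (API mappings, replication paths, IAM translations) between clouds, the number of relationships grows as n choose 2, and the real coordination problem compounds beyond pairs.

```python
from math import comb

# Pairwise integrations (API mappings, data replication paths, IAM mappings, ...)
# needed between n clouds: n choose 2.
for n in range(2, 7):
    print(f"{n} clouds -> {comb(n, 2)} pairwise integrations")
# 2 clouds -> 1, 3 -> 3, 4 -> 6, 5 -> 10, 6 -> 15
```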

Overcoming the n-body problem in a multicloud environment requires thoughtful architecture, particularly around the data storage layers. Choosing cloud native and, more importantly, cloud-portable technologies can unlock the power of multicloud without significant costs.

On the other hand, can there be too few clouds? If too few equals one, then probably. More than one, and you are thinking about the problem in the right way. It turns out that too few clouds or multiple clouds with a single purpose (computer vision or straight backup for example) deliver the same outcome — lock-in and increased business risk.

Lock-in reduces optionality, increases cost and minimizes the firm’s control over its technology stack, choice within that cloud notwithstanding (AWS, for example, has over 200 services and more than 20 database services). Too few clouds can also create business risk. AWS and other clouds go down several times a year. Those outages can bring a business to a standstill.

Enterprises need to build a resilient, interchangeable cloud architecture. This means application portability and data replication such that when a cloud goes down, the application can fail over to the other cloud seamlessly. Again, you will find dozens of databases on every public and private cloud — in fact some of them aren’t even available outside of the public cloud (see Databricks). That is not where the problem exists in the “too few clouds” challenge.

The data layer is more difficult. You won’t find many storage options running on AWS, GCP, Azure, IBM, Alibaba, Tencent and the private cloud. That is the domain of true cloud native storage players — those that are object stores, software-defined and Kubernetes native. In an ideal world AWS, GCP and Azure would all have support for the same APIs (S3), but they don’t. Applications that depend on data running on one of these clouds will need to be redesigned to run on another. This is the lock-in problem.

The key takeaway is to be flexible in your cloud deployment models. Even the most famous “mono-cloud” players like Capital One have significant on-premises deployments — on MinIO in fact. There is no large enterprise that can “lock onto” one cloud. The sheer rigidity of that would keep enterprises from buying companies that are on other clouds. That is the equivalent of cutting off one’s nose to spite one’s face.

Enterprises must be built for optionality in the cloud operating model. It is the key to success. The cloud operating model is about RESTful APIs, monitoring and observability, CI/CD, Kubernetes, containerization, open source and Infrastructure as Code. These requirements are not at odds with flexibility. On the contrary, adhering to these principles provides flexibility.

So What Is the Magic Number?

Well, it is not 42. It could, however, be three. Provided the enterprise has made wise architectural decisions (cloud native, portable, embracing Kubernetes), the answer will be between three and five clouds with provisions made for regulatory requirements that dictate more.

Again, assuming the correct architecture, that range should provide optionality, which will provide leverage on cost. It will provide resilience in the case of an outage, it will provide richness in terms of catalog depth for services required, and it should keep the n-body problem manageable.

What about Manageability?

While most people will tell you complexity is the hardest thing to manage in a multicloud environment, the truth is that consistency is the primary challenge. Having software that can run across clouds (public, private, edge) provides the consistency to manage complexity. Take object storage. If you have a single object store that can run on AWS, GCP, Azure, IBM, Equinix or your private cloud, your architecture becomes materially simpler. Consistent storage and its features (replication, encryption, etc.) enable the enterprise to focus on the application layer.

Consistency creates optionality, and optionality creates leverage. Reducing complexity can’t come at some unsustainable cost. By selecting software that runs across clouds (public, private, edge) you reduce complexity and you increase optionality. If it’s cheaper to run that workload on GCP, move it there. If it’s cheaper to run that database on AWS, run it there. If it’s cheaper to store your data on premises and use external tables, do that.

Choose software that provides a consistent experience for the application and the developer and you will achieve optionality and gain leverage over cost. Make sure that experience is based on open standards (S3 APIs, open table formats).
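
As a small illustration of what standardizing on the S3 API buys you, the same client code can talk to AWS S3 or to an S3-compatible store such as MinIO simply by switching the endpoint. The endpoint URL and credentials below are placeholders, and boto3 is used only as one common S3 client.

```python
import boto3


def make_s3_client(endpoint_url=None):
    """Same code path for AWS S3 (endpoint_url=None) or any S3-compatible
    store such as MinIO; only the endpoint and credentials change."""
    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # e.g. "https://minio.example.internal:9000"
        aws_access_key_id="PLACEHOLDER_KEY",
        aws_secret_access_key="PLACEHOLDER_SECRET",
    )


aws_s3 = make_s3_client()                                            # talks to AWS S3
private_s3 = make_s3_client("https://minio.example.internal:9000")   # talks to an on-prem MinIO

for client in (aws_s3, private_s3):
    print([b["Name"] for b in client.list_buckets()["Buckets"]])
```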

Bespoke cloud integrations turn out to be a terrible idea. As noted, each additional native integration is an order of magnitude more complex. Think of it this way: If you invest in dedicated teams, you are investing $5 million to $10 million per platform, per year in engineers. That doesn't account for the cost of the tribal knowledge required for each cloud. In the end, it results in buggy, unmaintainable application code. Software-defined, Kubernetes-centric software can solve these problems. Make that investment, not the one in bespoke cloud integrations.

What We Fail to See …

As IT leaders, we often deal with what is in front of us: shadow IT or new M&A integrations. Because of this, we often fail to create the framework or first principles associated with the overall strategy. Nowhere is this more evident than in the cloud operating model. Enterprises need to embrace first principles for containerization, orchestration, RESTful APIs like S3, automation and the like. Those first principles create the foundation for consistency.

IT leaders who attempt to dictate Cloud A over Cloud B because they get a bunch of upfront credits or benefits to other applications in the portfolio are suckers in the game of lock in that companies like Oracle pioneered, but the big three have mastered.

Going multicloud should not be an excuse for a ballooning IT budget and an inability to hit milestones. It should be the vehicle to manage costs and accelerate the roadmap. Using cloud operating model first principles and adhering to that framework provides the context to analyze almost any situation.

Cloud First or Architecture First?

A client asked us the other day if we recommend dispersing clouds among multiple services. It took us a moment to understand the question because it was the wrong one. The question should have been, “Should I deploy multiple services across multiple clouds?”

The answer to that is yes.

Snowflake runs on multiple clouds. Hashicorp Vault runs on multiple clouds. MongoDB runs on multiple clouds. Spark, Presto, Flink, Arrow and Drill run on multiple clouds. MinIO runs on multiple clouds.

Pick an architecture stack that is cloud native, and you will likely get one that is cloud-portable. This is the way to think about hybrid/multicloud.

The post The Architect’s Guide to Thinking about Hybrid/Multicloud appeared first on The New Stack.

]]>
Cloud Workload Security vs. Cloud Security Posture Management https://thenewstack.io/cloud-workload-security-vs-cloud-security-posture-management/ Thu, 17 Aug 2023 17:49:31 +0000 https://thenewstack.io/?p=22715957

Cloud security is hard. Seriously. In a world where everything from our phones to our refrigerators can be connected to

The post Cloud Workload Security vs. Cloud Security Posture Management appeared first on The New Stack.

]]>

Cloud security is hard.

Seriously. In a world where everything from our phones to our refrigerators can be connected to the internet, securing any cloud native organization can be an overwhelming task simply due to all of the moving parts. It’s not just employee devices you have to worry about anymore, but every identity, configuration and nuance in your (probably many) cloud providers. Not to mention all of the workloads that you run within these providers. Cloud-based servers run software, and software has vulnerabilities, all of which need to be tracked, patched and managed.

It’s a lot to deal with.

While all of this work might seem like the job of one tool, generally speaking, it actually requires two: one that knows about your cloud-based workloads and one that knows about your cloud providers. These two types of tools are generally classified as cloud workload protection platforms and cloud security posture management, respectively. But what exactly are they and how might you use one (or probably both) to increase your security posture?

What Is a Cloud Workload Protection Platform (CWPP)?

Cloud workload protection platforms, or CWPPs, are a category of cybersecurity tools that focus on securing cloud-based workloads across virtual machines, containers and serverless functions. Often installed as an agent within the underlying endpoints, cloud workload protection platforms are largely focused on what is running on the cloud rather than the cloud provider itself.

Because of the devastating impact that unpatched software has had on global cybersecurity, using a CWPP tool is a great way to ensure that your organization is plugging the holes that so often present themselves in both modern and legacy software applications. In addition to securing compute resources, cloud workload protection platforms might also secure the underlying data being used within a workload, which might look like identifying personally identifiable information, payment credentials or even encryption keys where they don’t belong.

What Is Cloud Security Posture Management (CSPM)?

Whereas cloud workload protection platforms focus on what’s running on the cloud, cloud security posture management (CSPM) focuses on the cloud provider itself. Just as operating systems can become vulnerable due to misconfigurations or poor access controls, the same thing can happen to your cloud provider.

Proper configuration and identity management are essential for securing your cloud infrastructure, and mistakes in either of these areas can lead to unauthorized access, data breaches and other security incidents, which will ultimately increase the attack surface of your cloud environment.

With CSPM tools, organizations can ensure that their cloud infrastructure is configured according to best practices, reducing the risk of cybersecurity incidents and maintaining compliance with regulatory requirements. Ultimately, proper configuration management and identity management are critical components of a comprehensive cloud security strategy, and investing in CSPM tools can help organizations stay ahead of evolving threats and protect their critical data and assets.

Uniting CWPP and CSPM for Comprehensive Protection

When it comes to security, there’s no sense locking your doors if you’re not going to use your security system too. All it takes for a threat actor to breach your security is one vulnerability, which is why you must protect every facet of your organization. CWPP is excellent for securing cloud workloads, but without the added protection from CSPM, all that security is performative at best.

While these two toolsets are valuable for establishing healthy security hygiene, it’s important to note that implementing them in isolation from one another can result in duplicate or irrelevant alerts. A robust approach to CWPP and CSPM requires that the selected tools can communicate with one another, allowing for more context-based alerting and an end-to-end understanding of the true security posture of an organization.

Choosing the Right Cloud Security Platform

Selecting the right security tools for your organization is critical, but it’s equally important to ensure that they integrate well with each other. Disparate cloud security platforms can leave gaps in your security posture due to their inability to communicate with each other. This can lead to missed threats, unaddressed vulnerabilities and incomplete security coverage.

To avoid these issues, it’s crucial to select cloud security tools that can integrate and work together seamlessly. This can be achieved through the use of integrated security platforms or by carefully selecting tools that are designed to work together. Integration ensures that security platforms can communicate and share threat intelligence, reducing the risk of missed threats and increasing overall security coverage.

Ultimately, a well-rounded cybersecurity program is essential for protecting your organization from ever-evolving threats. It requires more than just a handful of tools; it requires thoughtful consideration of the threat landscape and well-researched implementation of the selected tools.

Looking for a CWPP or CSPM solution? Orca Security offers a comprehensive and integrated cloud security platform that combines CWPP and CSPM capabilities, providing complete coverage of your cloud security needs. Take a free cloud risk assessment, or request a demo today to see how Orca Security can help secure your cloud environment.

The post Cloud Workload Security vs. Cloud Security Posture Management appeared first on The New Stack.

]]>
Pros and Cons of Cloud Native to Consider Before Adoption https://thenewstack.io/pros-and-cons-of-cloud-native-to-consider-before-adoption/ Tue, 15 Aug 2023 13:26:48 +0000 https://thenewstack.io/?p=22715750

This is the second of a four-part series. Read Part 1. Cloud native adoption isn’t something that can be done

The post Pros and Cons of Cloud Native to Consider Before Adoption appeared first on The New Stack.

]]>

This is the second of a four-part series. Read Part 1.

Cloud native adoption isn't something that can be done with a lift-and-shift migration. There's much to learn and consider before taking the leap to ensure the cloud native environment can help with business and technical needs. For those who are early in their modernization journeys, this can mean learning the various cloud native terms, benefits and pitfalls, and how cloud native observability is essential to success.

To help, we’ve created a four-part primer around “getting started with cloud native.” These articles are designed to educate and help outline the what and why of cloud native architecture.

The previous article included a definition of cloud native, its connection to DevOps methodology and architectural elements. This article will cover the pros and cons of cloud native adoption and implementation. 

Innovation Brings Complexity

A cloud native architecture speeds up application development since a large application can be broken into parts, and every part can be developed in parallel. That brings many benefits. But the complexity of cloud native apps makes it hard to see the relationship between various elements. That makes it harder to maintain performance, security and accuracy or diagnose problems in these areas when they arise.

So, let’s look at both the benefits and challenges of using a cloud native architecture.

Empowering the Modern Business 

Applications built using a cloud native architecture offer faster time to market, more scalable and efficient development, and improved reliability. Let's look at the advantages in greater detail.

Faster Time to Market

A cloud native approach to developing applications speeds development times. The component nature of cloud native apps allows development to be distributed to multiple teams. And the work of these teams can be done independently. Each service owner can work on their component of the app simultaneously. One group is not dependent on another group finishing its part of the app before they can start on their own.

Additionally, cloud native apps allow components to be reused. So rather than creating a new frontend for every new app or a new “buy” capability, existing ones can be used on a new app. Reusing various elements greatly reduces the total amount of code that must be created for each new application.

Change one thing in the code for a monolithic structure, and it affects everything across the board. Microservices are independently deployed and don’t affect other services.

Efficiency

As noted, a cloud native approach lets smaller development teams work in parallel on a larger application. The idea is that a smaller team spends less time managing timetables, in meetings and keeping people up to date, and more time doing what needs to be done.

In such a work environment, these small teams access common company resources. That allows each team to benefit from cultural knowledge acquired over time throughout the organization. And naturally, the teams can work together, benefiting from each other’s best practices.

Scalability and Agility

In a cloud native environment, an organization can readily scale different functional areas of an application as needed. Specifically, running elements of a cloud native application on public clouds builds the capability to dynamically adjust compute, storage and other resources to match usage.

Adjustments can be to accommodate long-term trends or short-term changes. For instance, a retailer having a seasonal sale can increase the capacity of its shopping cart and search services to accommodate the surge in orders. Similarly, a financial institution seeing an increase in fraudulent activity may scale up machine learning fraud detection services.

If you run everything through one monolithic application, it’s hard to manage the massive scale of services and respond to changing market conditions as an application grows.

Reliability and Resiliency

Because cloud native systems are based on loosely coupled, interchangeable components, they are less vulnerable to a larger set of failures compared to the classical monolithic application. If one microservice fails, it rarely causes an application-wide outage, although it could degrade performance or functionality. Similarly, containers are designed to be ephemeral, and the failure of one node will have little to no impact on the operations of the cluster. In short, in cloud native environments, the “blast radius” is much smaller when a component fails. When something fails, a smaller set of services or functions may be affected, rather than the entire application.

Cloud Native Also Comes with Challenges

Competitive benefits notwithstanding, cloud native adoption comes with its own set of challenges. None are insurmountable thanks to modern tooling, but understanding what you’re getting into with microservices and containers will set you up for success on your cloud native journey.

Complexity Can Impede Engineer Productivity 

The inherent design of microservices leads to significant complexity. Imagine a microservices architecture featuring thousands of interdependent services — it becomes much more difficult and time-consuming to isolate issues. Even visualizing these services and their connections is challenging, let alone wrapping your head around it. When microservices are so independent of each other, it’s not always easy to manage compatibility and other effects of different versions and workloads.

The infrastructure layer is not any simpler. Kubernetes is notoriously challenging to operate, in part because the ephemeral nature of containers means some may only live for a few seconds or minutes. There are many moving parts in a container orchestration system that all must be configured and maintained correctly.

All told, cloud native complexity places a new burden on engineers who are responsible for performance and reliability.

Unprecedented Observability Data Volume

With cloud native agility comes an explosion of observability data (metrics, logs, traces, events) that can slow down teams while they’re trying to solve customer-facing problems. Cloud native environments, especially as you start scaling them, emit massive amounts of observability data — somewhere between 10 and 100 times more than traditional VM-based environments. Each container emits the same volume of telemetry data as a VM, and scaling containers into the thousands and collecting more and more complex data (higher data cardinality) results in data volume becoming unmanageable.

The post Pros and Cons of Cloud Native to Consider Before Adoption appeared first on The New Stack.

]]>
5 Things to Know Before Adopting Cloud Native https://thenewstack.io/5-things-to-know-before-adopting-cloud-native/ Tue, 08 Aug 2023 15:13:30 +0000 https://thenewstack.io/?p=22715104

This is the first of a four-part series. Cloud native adoption isn’t something that can be done with a lift-and-shift

The post 5 Things to Know Before Adopting Cloud Native appeared first on The New Stack.

]]>

This is the first of a four-part series.

Cloud native adoption isn't something that can be done with a lift-and-shift migration. There's much to learn and consider before taking the leap to ensure the cloud native environment can help with business and technical needs. For those who are early in their modernization journeys, this can mean learning the various cloud native terms, benefits and pitfalls, and how cloud native observability is essential to success.

To help, we’ve created a four-part primer around “getting started with cloud native.” These articles are designed to educate and help outline the what and why of cloud native architecture.

This first article covers the basic elements of cloud native, its differences from legacy architectures and its connection to the DevOps methodology. 

A Look at Cloud Native and Its Necessity for Business Today

A reliable cloud native environment is essential for the survival of enterprises today. Moving to a modern microservices and container-based architecture promises speed, efficiency, availability and the ability to innovate faster — key advantages enterprises need to compete in a world where a new generation of born-in-the-cloud companies are luring away customers hungry for new features, fast transactions and always-on service.

Add in economic uncertainty and the competitive stakes for enterprises soar: A simple search delay on an online retailer’s site could lose a loyal customer and coveted revenue to a more innovative and reliable competitor.

With encroaching competition from nimble organizations, an uncertain global economy and savvy, demanding customers, it’s more important than ever to transition to a modern, cloud native technology stack and best practices that can deliver:

  • A highly available and more reliable service. Cloud native best practices enable you to build a more resilient product and service.
  • More flexibility and interoperability. Cloud native environments are not only more portable, but they also provide the ability to scale up and down dynamically and on demand.
  • Speed and more efficiency. Engineers can iterate faster to handle increased customer expectations.

But buyer beware: Cloud native is challenging. The benefits of adopting cloud native technologies are impossible to ignore, and Gartner predicts that 90% of companies will be cloud native by 2027. But there are also challenges that come with the shift from a traditional to a modern environment: If the transition to cloud native lacks proper planning and tools, enterprises risk unprecedented data volumes, increased costs, downtime, reduced engineering productivity and, yes, customer dissatisfaction.

What Is Cloud Native?

The challenge most organizations face is how to have the flexibility to rapidly develop and deploy new applications to meet fast-changing business requirements. Increasingly, cloud native is the architecture of choice to build and deploy new applications. A cloud native approach offers benefits to both the business and developers.

In contrast to monolithic application development, cloud native applications or services are loosely coupled with explicitly described dependencies. As a result:

  • Applications and processes run in software containers as isolated units.
  • Independent services and resources are managed by central orchestration processes to improve resource usage and reduce maintenance costs.
  • Businesses get a highly dynamic system that is composed of independent processes that work together to provide business value.

Fundamentally, a cloud native architecture makes use of microservices and containers that leverage public or private cloud platforms as the preferred deployment infrastructure.

  • Microservices provide the loosely coupled application architecture, which enables deployment in highly distributed patterns. Additionally, microservices support a growing ecosystem of solutions that can complement or extend a cloud platform.
  • Containers are important because developing, deploying and maintaining applications requires a lot of ongoing work. Containers offer a way for processes and applications to be bundled and run. They are portable and easy to scale. They can be used throughout an application’s life cycle, from development to test to production. They also allow large applications to be broken into smaller components and presented to other applications as microservices.
  • Kubernetes (also called K8s) is the most popular open source platform used to orchestrate containers. Once engineers configure their desired infrastructure state, Kubernetes then uses automation to sync said state to its platform. Organizations can run Kubernetes with containers on bare metal, virtual machines, public cloud, private cloud and hybrid cloud.

The Cloud Native and DevOps Connection

Cloud native is the intersection of two kinds of changes. One is a software and technical architecture around microservices and containers, and the other is an organizational change known as DevOps. DevOps is a practice that breaks down the silos between development teams and central IT operations teams where the engineers who write the software are also responsible for operating it. This is critical in a cloud native era, as distributed systems are so complex the operations must be run by the teams who built them.

With cloud native and DevOps, small teams work on discrete projects, which can easily be rolled up into the composite app. They can work faster without all of the hassles of operating as part of a larger team. Amazon Executive Chairman Jeff Bezos felt that this small team approach was such a benefit he popularized the concept of the two-pizza team, which is the number of people that can be fed by two pizzas. As the theory goes, the smaller the team, the better the collaboration between members. And such collaboration is critical because software releases are done at a much faster pace than ever before.

Together, cloud native and DevOps allow organizations to rapidly create and frequently update applications to meet ever-changing business opportunities. They help cater to stakeholders and a user base that expects (and demands) apps to be high availability, responsive and incorporate the newest technologies as they emerge.

The Monolithic Architecture Had Its Time and Place

We just discussed how a microservices architecture is a structured manner for deploying a collection of distributed yet interdependent services in an organization. They are game-changing compared to some past application development methodologies, allowing development teams to work independently and at a cloud native scale.

In comparison, with a monolithic architecture, all elements of an application are tightly integrated. A simple change to one, say, the need to support a new frontend, requires making that change and then recompiling the entire application. There are typically three advantages to this architecture:

  • Simple to develop: Many development tools support monolithic application creation.
  • Simple to deploy: Deploy a single file or directory to your runtime.
  • Simple to scale: Scaling the application is easily done by running multiple copies behind some sort of load balancer.

The Monolithic Model

The monolithic model is more traditional and certainly has some pros, but it will slow down enterprises needing to scale and compete in a world where the name of the game is fast, reliable, innovative application development. Here are some of the main issues organizations have when using a monolithic model:

  • Scalability – Individual components aren’t easily scalable.
  • Flexibility – A monolith is constrained by the technologies already used in the system and is often not portable to new environments (across clouds).
  • Reliability – Module errors can affect an application’s availability.
  • Deployment – The entire monolith needs to be redeployed when there are changes.
  • Development speed – Development is more complex and slower when a large, monolithic application is involved.

A Final Word about Cloud Native

If the last few years have taught us anything, it’s that speed and agility are the foundation of success for digitally transformed organizations. Organizations that can meet the rapidly evolving demands of their lines of business, customers and internal users will be able to successfully navigate tough times.

Using a cloud native architecture helps ensure new applications can be created quickly and existing applications can be promptly updated to incorporate new technologies or as requirements change over time.

In the next installment, we’ll be discussing the benefits of cloud native architecture and how it can empower modern business.

The post 5 Things to Know Before Adopting Cloud Native appeared first on The New Stack.

]]>
Where Does WebAssembly Fit in the Cloud Native World? https://thenewstack.io/where-does-webassembly-fit-in-the-cloud-native-world/ Thu, 03 Aug 2023 16:50:31 +0000 https://thenewstack.io/?p=22713792

This past January, Matt Butcher, co-founder and CEO of wrote an article about the future of WebAssembly for The New

The post Where Does WebAssembly Fit in the Cloud Native World? appeared first on The New Stack.

]]>

This past January, Matt Butcher, co-founder and CEO of Fermyon Technologies, wrote an article about the future of WebAssembly for The New Stack in which he made a bold statement: “2023 will be the year that the component model begins redefining how we write software.”

In this episode of The New Stack Makers podcast, Butcher acknowledged that that’s a “grandiose claim.” But, as he told Makers host Heather Joslyn, the component model is likely to help WebAssembly more quickly integrate into the cloud native landscape.

An advantage of WebAssembly, or Wasm —  a binary instruction format for a stack-based virtual machine, designed to execute binary code on the web — is that it allows developers to write code in their preferred language and run it anywhere.

“When you think about the way that programming languages have evolved over time, every five to seven years, we see a new superstar programming language,” Butcher said. “And we’ve watched this pattern repeat: the language drops, and then it takes a couple of years, as everybody has to build up the same set of libraries.”

The component model, he said, is positioned to help eliminate this problem by providing “a common way for WebAssembly libraries to say, these are the things I need. These are my imports. And these are the things that I provide — these are my exports. And then we can compile that WebAssembly module, and it can express its needs and we can take another WebAssembly module and we can start joining them up.”

The Bytecode Alliance is in the midst of defining standards for the component model. The model holds enormous promise, Butcher said. Now, he said, if a new language shows up,  “If it can compile the WebAssembly and use the component model, it can pull in libraries from other things. It reduces the barriers there. It means that the maintenance of existing libraries begins to shrink.”

“And that,” he added, “really is a big game changer.”

This conversation was sponsored by Fermyon Technologies.

No ‘Kubernetes Killer’

Notably, Butcher said, WebAssembly could help deliver — finally — on the promise of serverless.

Serverless was supposed to offer two key benefits, he said. One, that “you’re only running your software when you’re handling requests.” And the second, to free developers from the need to run a server and allow them to dive right into programming core business logic.

The problem, he added, is that serverless was built on what he called  “yesterday’s technology,” first virtual machines and then containers, which were built for long-running processes and aren’t cloud-agnostic.

“A virtual machine may take several minutes to start up; a container takes a couple dozen seconds to start up. And if you’re really trying to handle requests that are coming in, and, you know, process the request and return a response as fast as possible, you’re stuck with a kind of design Catch-22. Your core platform can’t do it that fast.”

By contrast, WebAssembly has a rapid startup time and solves other problems for developers, Butcher said. When he and his team began querying developers about their experiences with the cloud native ecosystem, they heard enthusiasm from devs about serverless.

Developers, Butcher said, told them, “If I could just find a platform that didn’t have this slow startup time and had a better developer experience, was cheaper to operate, was cross-platform and cross-architecture, that would make me so happy.

“So it’s kind of like having people define a product for you and say, ‘Here’s my wish list of things. Can you build me one of these?’ That’s why I think serverless is in this position right now, where we’re gonna see a big resurgence of it.”

While Butcher acknowledged that he once believed WebAssembly might be a “Kubernetes killer,” he now thinks the two are apples and oranges, not really comparable, and that they can, in fact, be compatible.

“The fact that the Kubernetes ecosystem is so engaged in making sure that WebAssembly is supported alongside containers is a good indication that on the orchestrator layer, people are paying attention,” he said. “We’re making wise choices, and we’re making sure that we’re not orphaning an entire technology merely because something new and shiny came along.”

Check out the full episode for more on new developments in WebAssembly and how Wasm is poised to play a central role in the cloud native ecosystem.

The post Where Does WebAssembly Fit in the Cloud Native World? appeared first on The New Stack.

]]>
3 GitOps Myths Busted  https://thenewstack.io/3-gitops-myths-busted/ Wed, 02 Aug 2023 10:00:44 +0000 https://thenewstack.io/?p=22714586

It is highly probable that organizations will not be able to achieve an effortless shift to deployment and management of

The post 3 GitOps Myths Busted  appeared first on The New Stack.

]]>

It is highly probable that organizations will not be able to achieve an effortless shift to deployment and management of applications and Kubernetes environments anytime soon. Kubernetes remains a complex and challenging platform to manage and implement, despite its widespread use and adoption in cloud native deployments.

Any reduction in that complexity comes not primarily from Kubernetes itself but from the tools, processes and culture surrounding it. The intricate structure of Kubernetes, built around nodes and clusters, is what contributes to the complexity.

This is where GitOps comes into play as the de facto process for deploying applications to cloud native environments.

According to GitLab, GitOps is “an operational framework that takes DevOps best practices used for application development such as version control, collaboration, compliance, and CI/CD, and applies them to infrastructure automation.” It is built on the open source git version control software.

A more precise and consensus-led description of GitOps has been released by OpenGitOps — a GitOps working group under the CNCF App Delivery SIG. It consists of a set of open source standards, best practices and community-focused education to help organizations adopt a structured, standardized approach to implementing GitOps. It describes the GitOps principles as follows (a minimal example using the Flux CLI appears after the list):

  1. Declarative: A system managed by GitOps must have its desired state expressed declaratively.
  2. Versioned and Immutable: Desired state is stored in a way that enforces immutability, versioning and retains a complete version history.
  3. Pulled Automatically: Software agents automatically pull the desired state declarations from the source.
  4. Continuously Reconciled: Software agents continuously observe actual system state and attempt to apply the desired state.
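
In practice, those four principles map onto commands like the following. This is only a sketch using the Flux CLI, and the repository URL and path are placeholders: a Git source is registered as the versioned store of desired state, and a Kustomization tells the Flux agents to pull it and continuously reconcile the cluster against it.

# Register a Git repository as the versioned source of truth for desired state (placeholder URL).
flux create source git my-app \
  --url=https://github.com/example-org/my-app-config \
  --branch=main \
  --interval=1m

# Ask the Flux agents to pull that source and continuously reconcile the cluster against it.
flux create kustomization my-app \
  --source=GitRepository/my-app \
  --path="./deploy" \
  --prune=true \
  --interval=5m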

However, even with GitOps, challenges persist in managing Kubernetes deployments and associated tasks, emphasizing the need for improvements.

In this article, we will explore some of the issues that remain with GitOps and are currently being addressed — and more specifically, associated myths.

The goal is to make cloud native deployments and management less complex over time. While we may not have an easy button for Kubernetes deployments in the near future, GitOps will continue to play a crucial role in simplifying the process.

“The aim is to enhance the accessibility and manageability of cloud native deployments, reduce the complexities associated with Kubernetes and enable smoother operations,” said Alexis Richardson, founder and CEO of Weaveworks, who first coined the term GitOps. “This will happen through continuous efforts and advancements in GitOps thanks largely to the open source community and projects such as Flux.”

Myth: Be wary of tools that expressly enable GitOps because GitOps can be done with nothing more than standard continuous delivery tools that support Git-based automation.

While this assumption has been propagated in the past, it is very easy to disprove. Counterarguments include the fact that the Cloud Native Computing Foundation recognizes the graduated open source projects Flux and Argo CD, the two leading GitOps platforms, as GitOps enablers. Both are now considered critical components of cloud native infrastructure, as they join the graduated ranks of Kubernetes itself, Prometheus and Envoy.

Both also offer features that continually improve GitOps processes and security. To wit, the Flux version 2 GA keeps Kubernetes clusters synchronized with configuration sources beyond Git repositories, including OCI repositories as first-class sources, according to Flux’s documentation. This means anyone can now use OCI repos like ECR or Docker Hub as a scalable and secure cache for signed artifacts. This places Flux firmly in the camp of tools addressing the software supply chain.
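
For example, with the Flux CLI an OCI artifact can be registered as a source in much the same way as a Git repository. This is a sketch only; the registry path below is a placeholder.

# Register an OCI repository (for example, on Docker Hub or ECR) as a Flux source of desired state.
flux create source oci my-app-manifests \
  --url=oci://registry.example.com/my-org/my-app-manifests \
  --tag=latest \
  --interval=10m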

Also in version 2, Flux introduces multitenancy support and the ability to sync an arbitrary number of Git repositories, addressing long-standing user requests. The tool is also purpose-built to leverage Kubernetes’ API extension system and to integrate seamlessly with core components like Prometheus within the Kubernetes ecosystem.

Flux v2 is developed with the GitOps Toolkit, which consists of composable APIs and specialized tools tailored for building Continuous Delivery solutions on Kubernetes.

“Flux is everywhere now and pulling far ahead of the alternatives. The integration with OCI, cosign, policy and tools like Terraform mean that every serious platform has to build on Flux, while others, such as GitLab and Azure, have already done so,” Richardson said. “With Weaveworks open source dashboard, supported catalog and management extensions customers have a go-to for paid support.”

In a nutshell, successful GitOps adoption is heavily contingent on GitOps enablers such as the leading tools Flux and Argo CD, while changes to culture and processes are also essential. Indeed, the transition toward GitOps is all about trusting the pull-based paradigm of continuously applying gradual changes to the overall application stack, Torsten Volk, an analyst for Enterprise Management Associates (EMA), said.

“This requires a shift toward a declarative paradigm where new code defines what it needs to run optimally and a universal controller automatically ensures that all of these requirements are met. The cultural shift toward enabling developers to write declarative instructions for deployment, operation, and upgrade of their own code is the key challenge when it comes to adopting GitOps,” he said.

Myth: Picking either Flux or Argo CD would lead to building a GitOps silo.

When it comes to these tools, each team might have distinct inclinations. Developers may lean towards Flux, while operations teams could prefer specific features of Argo CD. This diversity in preferences is entirely normal within an organization.

Now, the question arises: Can there be a harmonious integration between these tools for the organization?

The answer is yes, there can indeed be a marriage between Flux, Argo CD, and other tools within an organization. By carefully assessing the specific needs and workflows of each team, it is possible to create a seamless collaboration among these tools, ensuring a more efficient and effective overall process.

Both Flux and Argo CD take advantage of the history available in Git to make it possible to easily audit the change history or revert back to previously working versions before a breaking change was applied. However, Flux’s and Argo CD’s workflows and extensions are different.

Open source Flamingo, the Flux subsystem for Argo introduced shortly before KubeCon + CloudNativeCon last year, integrates Flux into Argo CD to deliver what Weaveworks, the company behind Flamingo, calls a “seamless” GitOps experience with Kubernetes clusters.

“Argo CD is a user-friendly toolkit for easily defining sophisticated deployment pipelines while Flux is fully focused on its simple, controller-based and fully automated approach of deploying and managing application stacks based on standard Kubernetes APIs,” Volk said. “Adding Flux controllers to the nice Argo CD UI could be a nice step forward in making Argo CD more scalable while simplifying the usability of Flux.”

On a practical level, Unnati Mishra, a member of the technical staff at VMware, described how running Flux and Argo CD simultaneously “could be difficult,” but said it is possible “to integrate them to some extent for seamless integration.”

“If an organization is currently using one of the tools and wants to switch to the other, they can do so gradually by onboarding new projects or teams to the selected tool while continuing to use the current tool for ongoing projects until those projects are prepared for migration,” Mishra said. “They can even be given their own isolated Kubernetes clusters if there are multiple teams or projects inside the organization that each need a different set of tools. Each team will be able to select the tool that best suits their needs without harming the others in this way.”

Myth: The false promises about scaling with GitOps.

It is highly likely that as your organization embarks on its cloud native journey, there will come a point where scaling to multiple clusters becomes necessary. For instance, developers may need to work on and test applications before making pull requests, without having direct access to the production code of applications running in production on Kubernetes.

Moreover, in certain scenarios, a team might manage multiple clusters and distribute workloads among them to ensure sufficient fault tolerance and availability. For example, when running a machine learning training workload, the team might increase the number of replicas or cluster replicas to meet specific demands.

Additionally, different clusters may be deployed across various physical locations in cloud environments, whether on Amazon Web Services, Azure, GCP and others, requiring separate tools and processes to align with geographic mandates, legal restrictions, compliance requirements, and data access policies.

For the above needs, this writer has heard it claimed that a Kubernetes deployment platform covering CI/CD and cluster management is all that is needed. Sure, GitOps would be nice to have, but it is not essential; or, if it is adopted, only its culture and Git repository-centric philosophy are needed, without specific GitOps tools such as Flux or Argo CD. The counterarguments are numerous, including the issue of drift that plagues multicluster environments: GitOps and GitOps tools retain a golden, immutable standard of the application code, so that any change made to a cluster, however small, is immediately flagged.

In the field, over 80% of organizations are running multiple, sometimes dozens of, Kubernetes clusters in production, Volk notes. These clusters are often managed differently by the tools available from AWS, Azure, GCP, VMware, Red Hat and others, as mentioned above. “To stay on top of these differences there needs to be a unified layer for consistently applying changes across all clusters. The controller-based approach that interfaces with standard Kubernetes APIs can make this happen, but at the same time needs custom integrations with the opinionated parts of the different cloud environments through GitOps is essential,” Volk said. “For example, providing an Amazon RDS database instance requires very different code from deploying that same database to Azure Database Services.”

Indeed, scaling Kubernetes without GitOps is a bad idea, says Selvi Kadirvel, a platform architect and engineer at Elotl. GitOps is considered an essential part of the solution to the tremendous increase in the scale and number of Kubernetes clusters within organizations, Kadirvel said. Another critical trend for handling scale, for example, is the expansion of the GitOps paradigm from Kubernetes application deployments to the infrastructure layer, for both on-premises and cloud resources. The drift detection and reconciliation GitOps offers “will serve us well when applied uniformly to lower layers of the software stack,” Kadirvel said.

At the same time, just as the Kubernetes scheduler abstracts decisions about where your pods should run given the multitude of node types available within a single cluster, the emerging field of multicluster orchestrators will enable platform teams to manage the large number of Kubernetes clusters that companies now have to maintain, Kadirvel said. These are distinct from cluster lifecycle management platforms such as OpenShift, Mirantis, Nutanix Kubernetes Engine and Rancher, she added.

The post 3 GitOps Myths Busted  appeared first on The New Stack.

]]>
RISC-V Finds Its Foothold in a Rapidly Evolving Processor Ecosystem https://thenewstack.io/risc-v-finds-its-foothold-in-a-rapidly-evolving-processor-ecosystem/ Fri, 28 Jul 2023 10:00:39 +0000 https://thenewstack.io/?p=22714185

Developers have grown up hearing ARM or x86 being the guts of PCs and servers, but an alternative architecture called

The post RISC-V Finds Its Foothold in a Rapidly Evolving Processor Ecosystem appeared first on The New Stack.

]]>

Developers have grown up hearing ARM or x86 being the guts of PCs and servers, but an alternative architecture called RISC-V is emerging.

In the next few years, some companies will inevitably ship PCs and servers running on RISC-V processors. Those systems will likely run on Linux as Microsoft is not known to be developing a Windows OS for the architecture.

But there are big problems with the software ecosystem — the developer support is pitiful. RISC-V International, which is developing the chip architecture, talks more about hardware, with software a distant second in priorities.

Initial Support

Since its emergence close to a decade ago, RISC-V has quickly gained the support of major chip makers, including Apple, which has put RISC-V-based controllers in its Apple Silicon. About 10 billion chip cores based on RISC-V have shipped. Most recently, Meta announced an AI inferencing chip built on the RISC-V architecture.

The chip architecture is often called a hardware equivalent of Linux. It is a free chip technology built on a contributor culture and the ethos of open source, in which a community works together to develop and improve the product.

RISC-V is a free-to-license architecture, which means anyone can fork a version of the architecture into their own chip.

Chips with RISC-V can be assembled like Lego blocks: companies take the base architecture and top it off with proprietary hardware blocks that may include accelerators for AI, graphics or security.

“What was once an experiment, a prototype, is quickly moving into production,” said Calista Redmond, CEO of RISC-V International, during a keynote at last month’s RISC-V Summit in Barcelona.

The structure of RISC-V makes it suitable for cloud native environments handling diverse applications and complex computing requirements.

The minimal base instructions are designed to quickly offload applications such as AI and analytics to accelerators like GPUs or specialized math processors, which excel at such tasks.

Chips from Intel and AMD are reaching their physical limits, and the flexibility of RISC-V provides a structure to move computing into the future.

For example, RISC-V provides a pathway for new hardware architectures such as sparse computing, which is being researched by the Intelligence Advanced Research Projects Activity, in which processing units are closer to the data in storage or memory.

The Barcelona Supercomputing Centre proposed the concept of merging CPU and memory in a RISC-V chip, which will reduce the memory bottleneck posed by machine-learning applications.


“What we want from it — it is actually to do memory-intensive operations close to memory, like memcpy,” said Umair Riaz, a researcher at BSC, referring to the C function for copying memory blocks. Riaz also referenced the spinlock function, and said that having the CPU execute those operations in memory will be more efficient and faster.

“Executing functions locally you will eventually get performance and less [network] traffic because you are doing much more closer to memory,” Riaz said.

Writing applications for such complicated RISC-V chips may be a heavy lift even for the bravest programmers who want to code directly to the hardware. But Intel wants to provide the tooling needed for coders to start testing applications in simulated RISC-V environments.

OneAPI

Intel’s Codeplay software unit recently announced the OneAPI Construction Kit, which includes tools for developers to test code in a simulated RISC-V environment on x86 PCs.

The Construction Kit’s signature feature is support for SYCL — which allows coders to write and compile applications regardless of the hardware architecture — and Intel is taking the first steps to bring RISC-V support to the parallel-programming framework.

The kit includes support for Intel’s DPC++/C++ Compiler, which allows C++ code to be recompiled for use across multiple hardware architectures.

Developers can also test RISC-V code on Raspberry Pi-like developer boards or systems from companies such as Milk-V and StarFive. Both companies offer high-performance 64-bit RISC-V systems with support for Linux.

Support for Linux tools on RISC-V is tepid. Only a handful of packages are fully supported, including the Ubuntu OS, the GNU toolchain, Open vSwitch, Apache NuttX and Mozilla’s SpiderMonkey.

Many packages for RISC-V will work reasonably well but are still not fully supported. For example, the RISC-V developer community in China reported that more than 80% of the packages in open source Fedora are now supported on RISC-V.

Some key packages, such as PyTorch, GCC, TensorFlow and OpenJDK, will work but are not yet fully supported. Support for open source applications like LibreOffice and Firefox is being built up. Google is accelerating its support of AOSP (the Android Open Source Project) on RISC-V, which will be a big part of the next architecture specification.

RISC-V server chip makers Esperanto Technologies and Ventana Micro Systems have announced server chips for cloud computing, but have not talked much about software support or programming models. Esperanto has ported Meta’s Open Pre-Trained Transformer model to its RISC-V server.

RISC-V International, which is developing the architectural spec, is trying to solve that problem with the establishment of the RISC-V Software Ecosystem, also called RISE, to create the underlying software tools and middleware for RISC-V systems. The initial backers include companies such as Google, Intel, Nvidia, Qualcomm, Samsung, and Ventana.

Mark Himelstein, chief technology officer of RISC-V International, talked at the summit about RISC-V taking a page from Linux’s cultural roots, with contributors working toward shared interests.

“That contributor culture means upstreaming on RISC-V and other communities where open source and open standards play a part,” Himelstein said, adding “that does not mean you are working on the pieces of the puzzle that are rapidly commoditizing.”

RISC-V also lacks the structure for hardware and software co-design that makes it easier for coders to target x86 and Arm systems. RISC-V first develops a hardware spec, and Linux compatibility comes later. That is very different from Intel, which upstreams Linux drivers for a chip before it is released, ensuring the hardware is compatible with the latest build of the OS.

China, Tho

RISC-V’s software efforts also lack a force of nature like Linus Torvalds that can drive a project forward by sheer will. RISC-V also is not mainstream enough to attract an army of developers.

But it is a different scene in China, which is adopting RISC-V on a massive scale to create homegrown chips and reduce its reliance on Western technology. Developers in China are rolling up their sleeves and contributing code to stand up RISC-V-compatible Linux operating systems.

Their motivation is simple: an engineering focus, not politics, is driving China’s RISC-V initiative, and there is plenty of incentive for developers to build OS support, especially with the latest Western chip technology out of reach due to export restrictions.

Chinese companies are developing some of the most sophisticated RISC-V chips, and the community is adding support for more packages daily. Many of the core contributors to Fedora, Debian, Gentoo and Arch Linux, GNU toolchain, and Clang are in China.

The RISC-V community in China is also leading a grassroots effort to bring support for ROCm — which is AMD’s parallel-programming framework — to RISC-V processors. AMD did not respond to requests for comment on whether it was involved in porting ROCm to RISC-V.

The post RISC-V Finds Its Foothold in a Rapidly Evolving Processor Ecosystem appeared first on The New Stack.

]]>
API-First Development: Architecting Applications with Intention https://thenewstack.io/api-first-development-architecting-applications-with-intention/ Wed, 26 Jul 2023 19:04:40 +0000 https://thenewstack.io/?p=22714141

Move fast! Break things! As developers on an agile team, we repeat these words constantly, always at least right up

The post API-First Development: Architecting Applications with Intention appeared first on The New Stack.

]]>

Move fast! Break things! As developers on an agile team, we repeat these words constantly, always at least right up until the moment when something actually breaks. Then it turns out, despite all our oft-touted slogans, what everyone really meant was: move fast without breaking things. Duh, why didn’t we think of that sooner?

I want to talk about API-first development because I believe an API-first approach will lead to a faster and more scalable approach to developing software applications without breaking as many things along the way.

What Does ‘API-First Development’ Mean?

API-first development prioritizes the design and development of the API as the foundation for your entire architecture. This means taking extra care to treat your API as a product in its own right, even if it’s only going to be consumed internally by your own developers. This might require a bit more planning and collaboration between stakeholders, but there are a lot of good reasons to invest a little bit of extra time upfront.

Why API-First Development?

More traditionally, tech companies often started with a particular user experience in mind when setting out to develop a product. The API was then developed in a more or less reactive way to transfer all the necessary data required to power that experience. While this approach gets you out the door fast, it isn’t very long before you probably need to go back inside and rethink things. Without an API-first approach, you feel like you’re moving really fast, but it’s possible that you’re just running from the front door to your driveway and back again without even starting the car.

API-first development flips this paradigm by treating the API as the foundation for the entire software system. Let’s face it, you are probably going to want to power more than one developer, maybe even several different teams, all possibly even working on multiple applications, and maybe there will even be an unknown number of third-party developers. Under these fast-paced and highly distributed conditions, your API cannot be an afterthought.

As a software system matures, its real value emerges from its integration into a more complex whole. Features and products are like big delicious juicy fruit hanging off a branch. Your API is the entire freakin’ tree!

How Do I Get Started?

So you’re feeling ready to dive into API-first development? Spectacular! Trust me, the little bit of extra work upfront will pay off big time down the road, and you’ll start seeing the impact almost immediately.

The first step is to design and document your API. Don’t worry, this isn’t as complicated as it might sound. Just create an API spec file. This file will serve as both blueprint and documentation for your API. There are several varieties in common use today (OpenAPI, Swagger, RAML and API Blueprint, for example). We don’t need to stress out about which one to choose right now. The important thing is that all of these specifications provide standardized and machine-readable representations of your API. This standardization pays dividends later by helping you collaborate better, work more efficiently, beef up security and seamlessly integrate with external systems.
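
To make that concrete, here is a minimal sketch of what a spec file might look like for a hypothetical endpoint, using OpenAPI 3.0; the names, paths and fields are purely illustrative.

# Write a minimal, illustrative OpenAPI 3.0 spec to a file (names and paths are hypothetical).
cat <<'EOF' > openapi.yaml
openapi: 3.0.3
info:
  title: Example Users API
  version: 1.0.0
paths:
  /users/{id}:
    get:
      summary: Fetch a single user by ID
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: The requested user
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                  name:
                    type: string
EOF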

Here’s the best part: Writing these specs is actually pretty easy. If you can handle writing the actual server code for your API, then you will pick up any of these specifications in no time at all. It’s a natural extension of what you’re already doing.

This may all seem a bit time-consuming at first, but trust me, that small investment up front will save you heaps of time down the road, especially once you start leveraging the power of code generators (Liblab for example). These tools can do all sorts of cool stuff like generating SDK and server code for your API. Imagine making a change in one place, and boom! It’s instantly updated across all your server code as well as the SDKs used by internal and third-party developers.

Wouldn’t that save you time? That’s the transformative power of API-first development.

Conclusion

An API-first approach might not be the perfect fit for every use case. If you’re working on a small app or a prototype with limited integration needs, going all out with extensive documentation and code generation might not be your cup of tea. Likewise, if you’re dealing with a legacy application with its own well-established API, convincing management to dedicate the time and resources to document that API thoroughly might not be feasible. In most other cases, however, adopting a more proactive API-first approach to development can unlock some serious benefits.

In today’s interconnected digital landscape, it’s high time that we start treating APIs as first-class citizens in our architectures. A top-notch API design sets the stage for innovation by enabling developers to leverage existing functionality in new and unexpected ways. On top of this, APIs fuel collaboration by boosting interoperability between systems. Given all these undeniable advantages, the only thing holding most developers back is a lack of knowledge. So buckle up then, and let’s write a spec!

The post API-First Development: Architecting Applications with Intention appeared first on The New Stack.

]]>
The Kubernetes Inner Loop with Cloud Foundry Korifi https://thenewstack.io/the-kubernetes-inner-loop-with-cloud-foundry-korifi/ Wed, 26 Jul 2023 10:00:16 +0000 https://thenewstack.io/?p=22713215

Certain developer workflows can be tedious. One example is working locally with containers. It brings to mind an old XKCD

The post The Kubernetes Inner Loop with Cloud Foundry Korifi appeared first on The New Stack.

]]>

Certain developer workflows can be tedious. One example is working locally with containers. It brings to mind an old XKCD comic.

When working locally, the process of building and deploying containers can hinder the development experience and have a negative impact on team productivity. The industry refers to local workflows for developers as the “Inner Loop.” Cloud native development teams can greatly benefit from a reliable inner development loop framework. These frameworks facilitate the iterative coding process by automating repetitive tasks such as code building, containerization, and deployment to the target cluster.

Key expectations for an inner dev loop framework include:

  • Automation of repetitive steps, such as code building, container creation, and deployment to the desired cluster;
  • Seamless integration with both remote and local clusters, while providing support for local tunnel debugging in hybrid setups;
  • Customizable workflows to enhance team productivity, allowing for the configuration of tailored processes based on team requirements.

Cloud native applications introduce additional responsibilities for developers, including handling external dependencies, containerization, and configuring orchestration tools such as Kubernetes YAML. These tasks increase the time involved in deployments and also introduce toil — ultimately hindering productivity.

In this tutorial, you will learn how to simplify inner-loop development workflows for your software development teams by using a Cloud Foundry abstraction over kind clusters. This abstraction, named Korifi, is fully open source and is expected to work for all languages and frameworks. With Cloud Foundry Korifi, developers push their source code to a cluster and the PaaS returns a URL/endpoint that the developer can then use to access the application or API.

Using ‘kind’ for Local Kubernetes

In the context of Kubernetes, a kind cluster refers to a lightweight, self-contained, and portable Kubernetes cluster that runs entirely within a Docker container. It is primarily used for local development and testing purposes. The main characteristics of kind clusters that make them suitable for local development are the following.

  • Kind clusters consume fewer system resources compared to running a full-scale Kubernetes cluster.
  • Kind clusters eliminate the need for complex cluster provisioning and configuration, making it easier to bootstrap a cluster.
  • By matching the desired specifications, including the version of Kubernetes, network settings, and installed components — kind clusters provide a way to replicate production-like Kubernetes environments locally.

Installing Korifi on kind Clusters

First, please install the tools used throughout this tutorial: Docker, kind, kubectl, Helm and the cf CLI. They’re all required at various stages of the process.

Set the following environment variables.

export ROOT_NAMESPACE="cf"

export KORIFI_NAMESPACE="korifi-system"

export ADMIN_USERNAME="kubernetes-admin"

export BASE_DOMAIN="apps-127-0-0-1.nip.io"


Note: nip.io is a wildcard DNS for any IP address. It is powered by PowerDNS with a simple, custom PipeBackend written in Python. In this particular case, apps-127-0-0-1.nip.io will resolve to 127.0.0.1, which will direct requests to the localhost.

Use the following configuration to create the kind cluster. The extraPortMappings field maps additional ports between the container and the host machine. Here, it specifies that container ports 80 and 443 should be mapped to the same host ports 80 and 443 respectively using TCP.
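
A configuration along these lines should do it. This is a minimal sketch based only on the port mappings described above (the cluster name is arbitrary); the exact file in the Korifi installation docs may differ.

cat <<'EOF' > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 80    # HTTP traffic into the cluster
    hostPort: 80
    protocol: TCP
  - containerPort: 443   # HTTPS traffic into the cluster
    hostPort: 443
    protocol: TCP
EOF

kind create cluster --name korifi --config kind-config.yaml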

Create the root namespaces that will be used in the cluster. The namespace definitions also include labels for pod security enforcement, as sketched below.
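
A sketch of that step, assuming the standard pod security admission labels (adjust the level to whatever your policy requires):

# Create the root and Korifi namespaces, with illustrative pod security admission labels.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $ROOT_NAMESPACE
  labels:
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/enforce: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: $KORIFI_NAMESPACE
  labels:
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/enforce: restricted
EOF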

Install the following dependencies: cert-manager, kpack, and Contour.

Cert manager is installed with a single kubectl apply command, with the latest release referenced in the path to the yaml definition.

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml


Cert manager is an open source certificate management solution designed specifically for Kubernetes clusters.

Kpack is installed with a single kubectl apply command, with the latest release referenced in the path to the yaml definition.

kubectl apply -f https://github.com/pivotal/kpack/releases/download/v0.11.0/release-0.11.0.yaml


Kpack is an open source project that integrates with Kubernetes to provide a container-native build process. It consumes Cloud Native Buildpacks to export OCI-compatible containers.

Contour is an open source Ingress controller for Kubernetes that is built on top of the Envoy proxy. An Ingress controller is a Kubernetes resource that manages the inbound network traffic to services within a cluster. It acts as a gateway and provides external access to the services running inside the cluster.

kubectl apply -f https://projectcontour.io/quickstart/contour.yaml


Contour specifically focuses on providing advanced features and capabilities for managing ingress in Kubernetes.

The installation requires a container registry to function. When using Korifi on a local kind cluster, the use of Docker Hub is recommended. In order to access this container registry, a secret will have to be created and configured.
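
A sketch of that step is below. The secret name is illustrative (use whatever name your Korifi configuration expects), and a Docker Hub access token is preferable to your account password.

# Store Docker Hub credentials in the root namespace so Korifi and kpack can push app images.
kubectl --namespace "$ROOT_NAMESPACE" create secret docker-registry image-registry-credentials \
  --docker-server="https://index.docker.io/v1/" \
  --docker-username="<your-dockerhub-username>" \
  --docker-password="<a-dockerhub-access-token>"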

Use the following Helm chart to install Korifi on the kind cluster.
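
The install itself looks roughly like the following. The chart URL, version and value names here are assumptions based on how Korifi packages its releases; check the release notes of the version you install for the exact values to set.

# Install Korifi from its packaged Helm chart (version and value names are illustrative).
helm install korifi https://github.com/cloudfoundry/korifi/releases/download/v0.9.0/korifi-0.9.0.tgz \
  --namespace "$KORIFI_NAMESPACE" \
  --set=rootNamespace="$ROOT_NAMESPACE" \
  --set=adminUserName="$ADMIN_USERNAME" \
  --set=defaultAppDomainName="$BASE_DOMAIN" \
  --set=generateIngressCertificates=true \
  --set=containerRepositoryPrefix="index.docker.io/<your-dockerhub-username>/" \
  --wait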

Once installed, push an app using the following steps. First, authenticate with the Cloud Foundry API.

cf api https://api.localhost --skip-ssl-validation
cf login


Next, create a cf org and a cf space:

cf create-org acme-corp
cf target -o acme-corp
cf create-space -o acme-corp bu-rning
cf target -o acme-corp -s bu-rning


And finally, deploy the application:

cf push beautiful-bird -p ~/sandbox/korifi/tests/smoke/assets/test-node-app/


The single cf push command is used to deploy an application to the kind cluster that has Korifi installed on it.

An Alternate Way to Install

The community has contributed a script that will help install Korifi on a kind cluster. The use of this script will help speed things up considerably. This method is recommended if you’re trying Korifi for the first time.

git clone https://github.com/cloudfoundry/korifi.git
cd korifi
./scripts/deploy-on-kind.sh demo-korifi


When the installation completes, apps can be pushed using the same steps as above.

Why Pursue Efficiency in Local Development?

Local development is the first workflow a developer works with. It is paramount that this step be accurate and efficient in order to keep developers productive. While efficiency can vary depending on the specific development context, it’s essential to experiment with different approaches and tools to find what works best for your team and project.

An optimized build and deployment process is at the center of a good developer experience, and this is true for local environments too. Streamlined build and deployment pipelines go a long way toward minimizing time spent, which ensures faster iterations.

When using Kubernetes, Cloud Foundry Korifi is one way to make it faster and more efficient for software developers and operators to manage app lifecycles. We encourage you to give it a try and work with the community to make it better.

The post The Kubernetes Inner Loop with Cloud Foundry Korifi appeared first on The New Stack.

]]>