LLMs and Data Privacy: Navigating the New Frontiers of AI

Large Language Models (LLMs) like ChatGPT are revolutionizing how we interact online, offering unmatched efficiency and personalization. But as these AI-driven tools become more prevalent, they bring significant concerns about data privacy to the forefront. With models like OpenAI’s ChatGPT becoming staples in our digital interactions, the need for robust confidentiality measures is more pressing than ever.

I have been thinking about security for generative AI lately. Not because I have tons of private data but because my clients do. I also need to be mindful of taking their data and manipulating it or analyzing it in SaaS-based LLMs, as doing so could breach privacy. Numerous cautionary tales exist already of professionals doing this either knowingly or unknowingly. Among my many goals in life, being a cautionary tale isn’t one of them.

Current AI Data Privacy Landscape

Despite the potential of LLMs, there’s growing apprehension about their approach to data privacy. For instance, OpenAI’s ChatGPT, while powerful, refines its capabilities using user data and sometimes shares this with third parties. Platforms like Anthropic’s Claude and Google’s Bard have retention policies that might not align with users’ data privacy expectations. These practices highlight an industry-wide need for a more user-centric approach to data handling.

The digital transformation wave has seen generative AI tools emerge as game-changers. Some industry pundits even compare their transformative impact to landmark innovations like the internet, and their impact may well prove just as great, if not greater. As the adoption of LLM applications and tools skyrockets, there’s a glaring gap: preserving the privacy of the data these models process, both the training data that goes in and any data the model outputs. This presents a unique challenge. While LLMs require vast amounts of data to function optimally, they must also navigate a complex web of data privacy regulations.

Legal Implications and LLMs

The proliferation of LLMs hasn’t escaped the eyes of regulatory bodies. Frameworks like the EU AI Act, General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have set stringent data sharing and retention standards. These regulations aim to protect user data, but they also pose challenges for LLM developers and providers, emphasizing the need for innovative solutions that prioritize user privacy.

Top LLM Data Privacy Threats

In August, the Open Web Application Security Project (OWASP) released its Top 10 for LLM Applications 2023, a comprehensive guide to the most critical security risks to LLM applications. One such concern is training data poisoning, which happens when changes to data or training processes introduce vulnerabilities, biases or even backdoors. These modifications can endanger the security and ethical standards of the model. To tackle this, verifying the provenance of the training data supply chain is vital.

Using sandboxing can help prevent unintended data access, and it’s crucial to vet specific training datasets rigorously. Another challenge is supply chain vulnerabilities. The core foundation of LLMs, encompassing training data, ML models and deployment platforms, can be at risk due to weaknesses in the supply chain. Addressing this requires a comprehensive evaluation of data sources and suppliers. Relying on trusted plugins and regularly engaging in adversarial testing ensures the system remains updated with the latest security measures.

Sensitive information disclosure is another challenge. LLMs might unintentionally disclose confidential data, leading to privacy concerns. To mitigate this risk, it’s essential to use data sanitization techniques. Implementing strict input validation processes and hacker-driven adversarial testing can help identify potential vulnerabilities.

Enhancing LLMs with plugins can be beneficial but also introduce security concerns due to insecure plugin design. These plugins can become potential gateways for security threats. To ensure these plugins remain secure, it’s essential to have strict input guidelines and robust authentication methods. Continuously testing these plugins for security vulnerabilities is also crucial.

Lastly, the excessive agency in LLMs can be problematic. Giving too much autonomy to these models can lead to unpredictable and potentially harmful outputs. It’s essential to set clear boundaries on the tools and permissions granted to these models to prevent such outcomes. Functions and plugins should be clearly defined, and human oversight should always be in place, especially for significant actions.

Three Approaches to LLM Security

There isn’t a one-size-fits-all approach to LLM security. It’s a balancing act between how you want to interact with both internal and external sources of information and the users of those models. For example, you may want both a customer-facing chatbot and an internal one that collates private institutional knowledge.

Data Contagion Within Large Language Models (LLMs)

Data contagion in Large Language Models (LLMs) is the accidental dissemination of confidential information via a model’s inputs. Given the intricate nature of LLMs and their expansive training datasets, ensuring that these models do not inadvertently disclose proprietary or sensitive data is imperative.

In the contemporary digital landscape, characterized by frequent data breaches and heightened privacy concerns, mitigating data contagion is essential. An LLM that inadvertently discloses sensitive data poses substantial risks, both in terms of reputational implications for entities and potential legal ramifications.

One approach to address this challenge encompasses refining the training datasets to exclude sensitive information, ensuring periodic model updates to rectify potential vulnerabilities and adopting advanced methodologies capable of detecting and mitigating risks associated with data leakage.

Sandboxing Techniques for LLMs

Sandboxing is another strategy to keep data safe when working with AI models. Sandboxing entails the creation of a controlled computational environment in which a system or application operates, ensuring that its actions and outputs remain isolated and never leave that environment.

For LLMs, the application of sandboxing is particularly salient. By establishing a sandboxed environment, entities can regulate access to the model’s outputs, ensuring interactions are limited to authorized users or systems. This strategy enhances security by preventing unauthorized access and potential model misuse.

With more than 300,000 models available on Hugging Face, including exceptionally powerful large language models, it is within reason for enterprises with the means to do so to deploy their own EnterpriseGPT that remains private.

Effective sandboxing necessitates the implementation of stringent access controls, continuous monitoring of interactions with the LLM and establishing defined operational parameters to ensure the model’s actions remain within prescribed limits.

Data Obfuscation Before LLM Input

The technique of “obfuscation” has emerged as a prominent strategy in data security. Obfuscation pertains to modifying original data to render it unintelligible to unauthorized users while retaining its utility for computational processes. In the context of LLMs, this implies altering data to remain functional for the model but become inscrutable for potential malicious entities. Given the omnipresent nature of digital threats, obfuscating data before inputting it into an LLM is a protective measure. In the event of unauthorized access, the obfuscated data, devoid of its original context, offers minimal value to potential intruders.

Several methodologies are available for obfuscation, such as data masking, tokenization and encryption. It is vital to choose a technique that aligns with the operational requirements of the LLM and the inherent nature of the data being processed. Selecting the right approach ensures optimal protection while preserving the integrity of the information.
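
As a rough illustration of the first two options, independent of any particular product, the sketch below masks one identifier irreversibly and tokenizes another with a reversible mapping so the original values can be restored in the model’s output. The regex patterns and placeholder format are illustrative assumptions, not a complete PII detector.

```javascript
// Minimal sketch of masking vs. tokenization before sending text to an LLM.
// Patterns and placeholder formats are illustrative, not exhaustive PII detection.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const SSN_RE = /\b\d{3}-\d{2}-\d{4}\b/g;

// Masking: irreversible, useful when the model never needs the real value.
function maskPrompt(text) {
  return text.replace(SSN_RE, "***-**-****");
}

// Tokenization: reversible via a mapping kept outside the LLM.
function tokenizePrompt(text) {
  const mapping = new Map();
  let counter = 0;
  const tokenized = text.replace(EMAIL_RE, (match) => {
    const placeholder = `<EMAIL_${counter++}>`;
    mapping.set(placeholder, match);
    return placeholder;
  });
  return { tokenized, mapping };
}

// Restore placeholders in the model's response before showing it to the user.
function restore(response, mapping) {
  let restored = response;
  for (const [placeholder, original] of mapping) {
    restored = restored.split(placeholder).join(original);
  }
  return restored;
}

// Example usage:
const { tokenized, mapping } = tokenizePrompt(
  maskPrompt("Contact jane@example.com about SSN 123-45-6789.")
);
// tokenized -> "Contact <EMAIL_0> about SSN ***-**-****."
// ...send `tokenized` to the LLM, then call restore(llmResponse, mapping).
```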

In conclusion, as LLMs continue to evolve and find applications across diverse sectors, ensuring their security and the integrity of the data they process remains paramount. Proactive measures, grounded in rigorous academic and technical research, are essential to navigate the challenges posed by this dynamic domain.

OpaquePrompts: Open Source Obfuscation for LLMs

In response to these challenges, OpaquePrompts has recently been released on GitHub by Opaque Systems. It preserves the privacy of user data by sanitizing it, ensuring that personal or sensitive details are removed before interfacing with the LLM. By harnessing advanced technologies such as confidential computing and trusted execution environments (TEEs), OpaquePrompts guarantees that only the application developer can access the full scope of the prompt’s data. The OpaquePrompts suite of tools is available on GitHub for those interested in delving deeper.

OpaquePrompts is engineered for scenarios demanding insights from user-provided contexts. Its workflow is comprehensive:

  • User Input Processing: LLM applications create a prompt, amalgamating retrieved-context, memory and user queries, which is then relayed to OpaquePrompts.
  • Identification of Sensitive Data: Within a secure TEE, OpaquePrompts utilizes advanced NLP techniques to detect and flag sensitive tokens in a prompt.
  • Prompt Sanitization: All identified sensitive tokens are encrypted, ensuring the sanitized prompt can be safely relayed to the LLM.
  • Interaction with LLM: The sanitized prompt is processed by the LLM, which then returns a similarly sanitized response.
  • Restoring Original Data: OpaquePrompts restores the original data in the response, ensuring users receive accurate and relevant information.

The Future: Merging Confidentiality with LLMs

In the rapidly evolving landscape of Large Language Models (LLMs), the intersection of technological prowess and data privacy has emerged as a focal point of discussion. As LLMs, such as ChatGPT, become integral to our digital interactions, the imperative to safeguard user data has never been more pronounced. While these models offer unparalleled efficiency and personalization, they also present challenges in terms of data security and regulatory compliance.

OpaquePrompts is one of many coming solutions that exemplify how data privacy at the prompt layer can be a game-changer. Instead of venturing into the daunting task of self-hosting a foundation model, focusing on prompt-layer privacy provides data confidentiality from the get-go without requiring the expertise and costs associated with in-house model serving. This simplifies LLM integration and reinforces user trust, underscoring the commitment to data protection.

It is evident that as we embrace the boundless potential of LLMs, a concerted effort is required to ensure that data privacy is not compromised. The future of LLMs hinges on this delicate balance, where technological advancement and data protection coalesce to foster trust, transparency and transformative experiences for all users.

Controlling the Machines: Feature Flagging Meets AI

Have you ever stopped to consider how many movie plotlines would have been solved with a feature flag? Well, you probably haven’t — but since I spend most of my time working on different scenarios in which teams use feature flags to drive feature releases, it crosses my mind a lot. There are more than six Terminator movies, and if Cyberdyne had just feature-flagged Skynet, they could’ve killswitched the whole problem away! We could make the same analogies to The Matrix or any of a dozen other movies.

Cinema references aside, there are real translations of how these controlled release scenarios apply in the technology space. Artificial intelligence is ushering in a time of great innovation in software. What started with OpenAI and GPT-3 quickly accelerated to what seems like new models being released every week.

We’ve watched GPT-3 move to 3.5 and then to GPT-4. We’re seeing GPT-4’s 32K model emerge for larger content consumption and interaction. We’ve watched the emergence of Llama from Meta, Claude from Anthropic and Bard from Google — and that’s just the text-based LLMs. New generative models are springing up for image creation, enhancement, document review and many other functions.

Furthermore, within each of these AI model domains, additional versions are being released as new capabilities are unlocked and trained in new ways. I can’t help but see the parallel to software development in the realm of AI models as well. These LLMs have their own software lifecycle as they are enhanced and shipped to users.

Each vendor has its own beta programs that enable segments of users to access new models. Product management and engineering teams are evaluating the efficacy of these models versus their predecessors and determining whether they are ready for production. New models are released in the same way you’d release a new piece of software, and along with that, there have been rollbacks of models that had already been released.

LLMs as a Feature

Looking at the concept through that lens, it becomes easy to see the connection between AI models and the practice of feature flagging and feature management. We at LaunchDarkly talk a lot about controlling the experience of users, enabling things like beta programs or even robust context-based targeting with regard to features that are being released. The same concepts translate directly to the way users consume any AI model.

What if you wanted to enable basic GPT-3.5 access for the majority of your users, but your power users were entitled to leverage GPT-4, and your most advanced users could access the GPT-4-32K model, which supports a significantly longer context window at a higher cost? Concepts like this are table stakes for feature flagging. Even Sam Altman at OpenAI talks about a killswitch concept that lives within GPT-4. Essentially, we’ve come full circle to The Terminator reference: he is advocating for a means to disable the model if things ever get too scary.

Take the following JavaScript code sample as an example, from a NextJS 13.4-based application that simulates the ability to opt-in and opt-out of API models:
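
What follows is a minimal sketch of that pattern rather than the application’s exact code: the flag key, model names and token limits are illustrative assumptions, using the LaunchDarkly Node server SDK alongside the OpenAI client.

```javascript
// Sketch: choose an OpenAI model via a LaunchDarkly feature flag.
// The flag key "ai-model-variant" and the token limits are illustrative assumptions.
import OpenAI from "openai";
import * as ld from "@launchdarkly/node-server-sdk";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ldClient = ld.init(process.env.LAUNCHDARKLY_SDK_KEY);

export async function askModel(userContext, prompt) {
  await ldClient.waitForInitialization();

  // Evaluate the flag for this user; fall back to gpt-3.5-turbo if the flag is unavailable.
  const model = await ldClient.variation("ai-model-variant", userContext, "gpt-3.5-turbo");

  // Pick a token budget appropriate to whichever model was served.
  const maxTokens = model === "gpt-4-32k" ? 8000 : model === "gpt-4" ? 2000 : 1000;

  // Feed the selected model into the OpenAI API call.
  const completion = await openai.chat.completions.create({
    model,
    max_tokens: maxTokens,
    messages: [{ role: "user", content: prompt }],
  });

  return completion.choices[0].message.content;
}
```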

In this example, we’re getting the model from a LaunchDarkly feature flag, deciding what sort of token length to leverage based on the model selected and feeding that model into our OpenAI API call. This is a specific example leveraging the OpenAI API, but the same concept would translate to using something like Vercel’s AI package, which allows a more seamless transition between different types of AI models.

Within the application itself, once you log in, you’re presented with the option to opt in to a new model as needed, as well as opt out to return to the default model.

Measuring the Model

As these models mature, we’ll want more ways to measure how effective they are across different vendors and model types. We’ll have to consider questions such as the following (a brief instrumentation sketch follows the list):

  • How long does a model take to return a valid response?
  • How often is a model returning correct information versus a hallucination?
  • How can we visualize this performance with data and use it to understand which model is right to use, and when?
  • What about when we want to serve the new model to 50% of our users to evaluate against?
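
One lightweight way to start on the first and third questions is to time each call and report the latency as a custom metric against whichever flag variation served the model. The event key and flag key below are assumptions, reusing the client objects from the earlier sketch.

```javascript
// Sketch: time a model call and report latency as a LaunchDarkly custom metric,
// so results can be compared per variation (for example, during a 50% rollout).
// The flag key "ai-model-variant" and event key "model-response-time" are assumptions.
async function timedAskModel(ldClient, openai, userContext, prompt) {
  const model = await ldClient.variation("ai-model-variant", userContext, "gpt-3.5-turbo");

  const start = Date.now();
  const completion = await openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  const elapsedMs = Date.now() - start;

  // Record the latency; experimentation tooling can aggregate it per model variation.
  ldClient.track("model-response-time", userContext, { model }, elapsedMs);

  return completion.choices[0].message.content;
}
```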

Software is in a constant state of evolution; this is something we’ve become accustomed to in our space, but it’s so interesting how much of it still relies on the same core principles. The software delivery lifecycle is still a real thing. Code is still shipped to a destination to run on, and released to users to consume. AI is no different in this case.

As we see the LLM space become more commoditized, with multiple vendors offering unique experiences, the tie-ins into concepts like CI/CD, feature flagging and software releases are only going to grow in frequency. The way organizations integrate AI into their product and ultimately switch models to gain better efficiency is going to become a practice the software delivery space will need to adopt.

At LaunchDarkly Galaxy 23, our user conference, I’ll be walking through a hands-on example of these concepts, using LaunchDarkly to control AI availability in an application and showing live what this looks like in a product. With any luck, we’ll build a solid foundation for establishing a bit more control over the machines and protecting ourselves from the ultimate buggy code: the machines taking control. At minimum, I’ll show you how to write in a killswitch. =)

Lifelong Machine Learning: Machines Teaching Other Machines

For humans, learning happens over a lifetime as we gain, share and further develop skills that we’ve picked up along the way, and continuously adapt them to new situations. In contrast, we don’t think of machines as learning in quite such a collaborative way, and over the long term. However, new research into a subset of machine learning called lifelong learning (LL) suggests that machines are indeed capable of this human-like learning: they can acquire and accumulate knowledge over time, and build upon it in order to adapt their skills to new scenarios.

Now, a team of researchers from the University of Southern California, led by Laurent Itti and graduate student Yunhao Ge, has developed a tool that allows artificially intelligent agents to engage in this type of continuous and collective learning. In a recently published paper titled Lightweight Learner for Shared Knowledge Lifelong Learning, the researchers describe how their Shared Knowledge Lifelong Learning (SKILL) tool helped AI agents each initially learn one of 102 different image recognition tasks, before sharing their know-how with other agents over a decentralized communication network. This collective transmission of knowledge leads to all agents eventually mastering all 102 tasks — while still maintaining knowledge of their initially assigned task.

“It’s like each robot is teaching a class on its specialty, and all the other robots are attentive students,” explained Ge in a statement. “They’re sharing knowledge through a digital network that connects them all, sort of like their own private internet. In essence, any profession requiring vast, diverse knowledge or dealing with complex systems could significantly benefit from [AI using] this SKILL technology.”

Avoiding ‘Catastrophic Forgetting’

Lifelong learning is a relatively new field in machine learning, where AI agents learn continually as they come across new tasks. The goal of LL is for agents to acquire knowledge of novel tasks without forgetting how to perform previous ones. This approach differs from typical “train-then-deploy” machine learning, where agents cannot learn progressively without “catastrophic interference” (also called catastrophic forgetting): the AI abruptly and drastically forgets previously learned information upon learning new information.

According to the team, their work represents a potentially new direction in the field of lifelong machine learning, as current work in LL involves getting a single AI agent to learn tasks one step at a time in a sequential way.

In contrast, SKILL involves a multitude of AI agents all learning at the same time in parallel, significantly accelerating the learning process. The team’s findings demonstrate that when SKILL is used, the time required to learn all 102 tasks is reduced by a factor of 101.5 — which could be a huge advantage when self-supervised AI learning is deployed in the real world.

“Most current LL research assumes a single agent that sequentially learns from its own actions and surroundings, which, by design, is not parallelizable over time and/or physical locations,” explained the team.

“In the real world, tasks may happen in different places. [..] SKILL promises the following benefits: speed-up of learning through parallelization; ability to simultaneously learn from distinct locations; resilience to failures as no central server is used; possible synergies among agents, whereby what is learned by one agent may facilitate future learning by other agents.”

‘Common Neural Backbone’

To create SKILL, the researchers took inspiration from neuroscience, in particular zeroing in on the theory of the “grandmother cell” or gnostic neuron — a hypothetical neuron that represents a complex but specific concept or object. This neuron is activated when the person senses or perceives that specific entity.

For the researchers, this theory of the grandmother cell was translated into their approach of designing lightweight lifelong learning (LLL) agents with a common, generic and pre-trained neural “backbone”, capable of tackling image-based tasks. As the team points out, this method enables “distributed, decentralized learning as agents can learn their own tasks independently”. Because it is also done in a parallel fashion, this technique also makes accelerated and scalable lifelong learning possible.

Diagram showing the design of the SKILL algorithm.

“Agents use a common frozen backbone and only a compact task-dependent ‘head’ module is trained per agent and task, and then shared among agents,” clarified the team. “This makes the cost of both training and sharing very low. Head modules simply consist of a classification layer that operates on top of the frozen backbone, and a set of beneficial biases that provide lightweight task-specific re-tuning of the backbone, to address potentially large domain gaps between the task-agnostic backbone and the data distribution of each new task.”

The researchers say that SKILL is similar to crowdsourcing, where a group of people share their skills and knowledge to find a common solution to a problem. They believe that machines could use a similar approach to become “comprehensive assistants” to aid human professionals in fields like medicine. In conjunction with other emerging fields of research like social intelligence for AI, other experts point out that lifelong machine learning could be crucial in developing artificial general intelligence (AGI).

Read more in the team’s paper.

Trust but Verify: To Get AI Right, Its Adoption Requires Guardrails

Companies across all industries are at a pivotal moment in AI adoption. The policies we put into place, the strategies we create and the ways we shift our workflows to incorporate AI will help shape the future of business.

To responsibly adopt AI, organizations must look for ways to align it with their goals, while also considering what updates to security and privacy policies may be required. When implemented strategically, AI has the potential to augment functions across organizations, from software development to marketing, finance and beyond.

While many organizations rush to incorporate AI into their workflows, the companies that will experience the most success are those that take a measured, strategic approach to AI adoption. Let’s walk through some of the ways that organizations can set themselves up for success.

Taking a Privacy-First Approach

The use of AI requires guardrails to be in place for it to be implemented responsibly and sustainably — both for organizations and their customers.

A recent survey by GitLab shows that nearly half (48%) of respondents reported concern that code generated using AI may not be subject to the same copyright protection as human-generated code, and 42% of respondents worry that code generated using AI may introduce security vulnerabilities.

Without carefully considering how AI tools store and protect proprietary corporate, customer and partner data, organizations may make themselves vulnerable to security risks, fines, customer attrition and reputational damage. This is especially important for organizations in highly regulated environments, such as the public sector, financial services or health care, which must adhere to strict external regulatory and compliance obligations.

To ensure that intellectual property is contained and protected, organizations must create strict policies outlining the approved usage of AI-generated code. When incorporating third-party platforms for AI, organizations should conduct a thorough due diligence assessment ensuring that their data, both the model prompt and output, will not be used for AI/ML model training and fine tuning, which may inadvertently expose their intellectual property to other organizations.

While the companies behind many popular AI tools available today are less than transparent about the source of their model-training data, transparency will be foundational to the longevity of AI. When models, training data, and acceptable use policies are opaque and closed to inspection, it makes it more challenging for organizations to safely and responsibly use those models.

Starting Small

To safely and strategically benefit from the efficiencies of AI, organizations can avoid pitfalls, including data leakage and security vulnerabilities, by first identifying where risk is the lowest in their organization. This can allow them to build best practices in a low-risk area first before allowing additional teams to adopt AI, ensuring it scales safely.

Organizational leaders can start by facilitating conversations between their technical teams, legal teams and AI-service providers. Setting a baseline of shared goals can be critical to deciding where to focus and how to minimize risk with AI. From there, organizations can begin setting guardrails and policies for AI implementation, such as employee use, data sanitization, in-product disclosures and moderation capabilities. Organizations must also be willing to participate in well-tested vulnerability detection and remediation programs.

Finding the Right Partners

Organizations can look to partners who can help them securely adopt AI and ensure they are building on security and privacy best practices. This will enable them to adopt AI successfully without sacrificing adherence to compliance standards, or risking relationships with their customers and stakeholders.

Concerns from organizations around AI and data privacy typically fall into one of three categories: what data sets are being used to train AI/ML models, how proprietary data will be used and whether proprietary data, including model output, will be retained. The more transparent a partner or vendor is, the more informed an organization can be when assessing the business relationship.

Developing Proactive Contingency Plans

Finally, leaders can create security policies and contingency plans surrounding the use of AI and review how AI services handle proprietary and customer data, including the storage of prompts sent to, and outputs received from, their AI models.

Without these guardrails in place, the resulting consequences can seriously affect the future adoption of AI in organizations.  Although AI has the potential to transform companies, it comes with real risks — and technologists and business leaders alike are responsible for managing those risks responsibly.

The ways in which we adopt AI technologies today will affect the role that AI plays moving forward. By thoughtfully and strategically identifying priority areas to incorporate AI, organizations can reap the benefits of AI without creating vulnerabilities, risking adherence to compliance standards, or risking relationships with customers, partners, investors, and other stakeholders.

Intel Looks to Muscle Its Way to AI Dominance

Back in the day, Intel used its size, engineering muscle, and sizeable financial resources to run roughshod over smaller competitors and seize dominant shares of the client and server processor markets.

However, a few years of missed deadlines and questionable strategic decisions, shifts in computing, and increased competition from a resurgent AMD, Nvidia, and an array of AI-focused startups took the sheen off Intel’s armor, making the one-time giant look vulnerable.

Returning to the company in early 2021 as CEO, Pat Gelsinger promised to put Intel back on the right track, creating an ambitious product roadmap, ramping its manufacturing capabilities, and putting a greater emphasis on developers with a software-first approach, personified by his hiring of Greg Lavender, the former VMware executive who took over as senior vice president, CTO, and general manager of the company’s Software and Advanced Technology Group.

The chip-making giant — along with much of the rest of the IT industry — soon turned much of its effort to the AI and machine learning field, a fast-growing space that only accelerated with the rapid adoption of generative AI and large language models after the release late last year of OpenAI’s ChatGPT chatbot.

All that was on display at Intel’s two-day Innovation 2023 developer conference in San Jose, California, last week, with Gelsinger, Lavender, and other executives positioning the company as the only player with the silicon and open ecosystem chops to address the needs of AI developers.

“This developer community is the catalyst for driving the transformation and deep technical innovations that we’re doing to create fundamental change across industries with Intel hardware and software,” Lavender said in his day-two keynote. “There’s no other field seeing such deep, rapid innovation than the field of artificial intelligence right now, especially around generative AI and large-language models.”

He added that “the key to leveling the playing field across all AI developers is a strategy that is built on open systems and open ecosystems.”

Central to much of this is Intel’s Developer Cloud, which was first introduced at last year’s show and is now generally available. Through the Developer Cloud, AI programmers will get access to an array of Intel’s chips and applications, including early access to upcoming technologies.

Infrastructure Lays the AI Groundwork

In his own keynote the day before, Gelsinger gave a detailed rundown about those systems and the Intel chips that will power many of them and will form the computing foundation for AI developers.

“AI is representing a generational shift in how computing is used and giving rise to the ‘Siliconomy,’” the CEO said. “But inside of that, a simple rule: developers rule. You run the global economy. … [AI development] requires a range of different capabilities, next-generation CPUs, NPUs [neural processing units], GPUs, chiplets, new interconnects, and specialized accelerators, and our commitment to you is to give you the coolest hardware and software ASAP. And we will do that.”

He said Intel’s promise of five process nodes in four years is still on track, proving that Intel can once again deliver quality products on time. It’s an achievement that can’t be taken lightly, according to Patrick Moorhead, chief analyst at Moor Insights & Strategy, writing on X (née Twitter): “This is the most important metric I track on the degree of Intel’s future success.”

“Design and software [are] vital, of course, but without five nodes in four years … the company [will] never be successful,” Moorhead continued. “It’s the first question I ask Pat Gelsinger about, every time we meet. Why do I say that? Without IFS [Intel Foundry Services], Intel won’t be cost or tech-competitive through investment scale, and a successful IFS needs five in four.”

For its Xeon data center chips, Gelsinger said the “Emerald Rapids” processor — a follow-on to the current fourth-generation “Sapphire Rapids” processors — will be released Dec. 14, but it was the following generation of Xeons that generated buzz. That generation will be the first to feature the vendor’s split between P-core (performance) and E-core (efficient) designs to address different workloads.

“Granite Rapids” will be the P-core chip, coming out next year and offering as much as three times the performance for AI workloads over Sapphire Rapids. A little earlier in 2024, Intel will roll out “Sierra Forest,” the E-core chip that the company said would come with 144 cores. However, Gelsinger said Intel was able to create a version of the processor with another 144-core chiplet, bringing the total number of cores to 288.

“Clearwater Forest,” another E-core Xeon, will arrive in 2025.

The Coming AI PCs

Intel also will use its upcoming Core Ultra “Meteor Lake” client chips for PCs that will be able to run AI inferencing workloads on the device. The Core Ultra, which will launch Dec. 14, is a chiplet design with a CPU and GPU, and will also include an integrated NPU, a power-efficient AI accelerator.

Developers and users will be able to work with AI workloads locally, ensuring greater data security and privacy, a growing concern with ChatGPT and similar tools.

The CEO also spoke about the 2023.1 release of its OpenVINO toolkit distribution for AI inferencing for developers on client and edge platforms. It includes pre-trained models that can integrate with generative AI models like Meta’s Llama 2. There also is Project Strata, which will result in an edge-native software platform that will launch next year to enable infrastructure to scale for the intelligent edge and hybrid AI.

Lavender stressed the need for open ecosystems to ensure widespread adoption of AI and said Intel’s open strategy will draw developers away from competitors like Nvidia. He noted that 4,000 of Intel’s Gaudi 2 GPUs and its fifth-generation Xeons will be used in an upcoming massive AI supercomputer from Stability AI.

OneAPI Leads the Way

He also noted the rapid adoption of Intel’s OneAPI open programming model, seeing an 85% uptake since 2021. In addition, OneAPI — which touches CPUs, GPUs, FPGAs, and accelerators — will be the basis of the Linux Foundation’s Unified Acceleration (UXL) Foundation to create an open standard for accelerator programming. The group’s founders include Intel, Arm, Fujitsu, Google Cloud, and Qualcomm.

Intel is contributing its OneAPI specification to the UXL Foundation.

In addition, the chip maker is working with Red Hat, Canonical, and SUSE to develop enterprise software distributions optimized for Intel technologies, while CodePlay, a software company Intel bought last year, is rolling out multiplatform plug-ins in OneAPI for GPUs from Nvidia, AMD, and Intel. Using the OneAPI plug-in for Nvidia will enable developers to run the Khronos Group’s SYCL programming model on that vendor’s GPUs, Lavender said.

“This is a major milestone for the OneAPI ecosystem and the developer community, creating a viable migration path from [Nvidia’s] CUDA and other proprietary programming models to SYCL and OneAPI, enabling AI programming everywhere,” he said.

“The CUDA ecosystem is being disrupted due to the generative AI revolution and the importance of higher-level programming abstractions using frameworks such as OpenAI’s Triton, Google’s Jax, Modular AI’s Mojo, and Julia for scientific computing. More is coming. The rapid rate of innovation of AI technologies is creating new disruptions to the status quo, freeing the developer from proprietary lock-in. This is important to the future of everyone and getting AI adopted everywhere.”

Using Real-Time Data to Unify Generative and Predictive AI

In the age of data-driven decision-making, the role of artificial intelligence (AI) has never been more pivotal. From predicting stock market trends to generating personalized content for users, AI models are at the forefront of innovation. However, the efficacy of these models is deeply tied to the quality and timeliness of the data they consume.

The Challenge of Stale Data and the Impact on Predictive Outcomes and Illusion of Accuracy

The adage “garbage in, garbage out” holds true in the realm of AI. When models are trained or fed with incomplete, biased or outdated information, the predictive outcomes suffer. For instance, in financial markets where conditions change in milliseconds, relying on stale data can result in missed opportunities or even financial losses. Outdated data can give the illusion of accuracy. Models may show high confidence in their predictions, but these predictions are based on a reality that no longer exists.

The implications of stale data are far-reaching:

  • Business decisions: In sectors like finance, health care and retail, decisions based on outdated information can lead to significant financial losses or missed opportunities.
  • Safety concerns: In critical applications like autonomous driving or medical diagnostics, stale data can be a matter of life and death.
  • Consumer experience: For customer-centric services like recommendation engines or personalized marketing, outdated predictions can lead to a decline in user engagement and satisfaction.

The Enigma of Hallucinations in Foundational Models

Foundational models are incredibly powerful but are not immune to generating content that is either nonsensical or factually incorrect — a phenomenon known as “hallucinations.” These hallucinations occur because the model is drawing from a static dataset that may not have the most current or contextually relevant information.

Reducing Hallucinations and Improving Accuracy and Relevance with Real-Time Data

Integrating real-time data into the AI pipeline can significantly reduce the occurrence of hallucinations. When the model has access to the most current data, it can generate predictions or content that is contextually relevant.

Real-time data ensures that the model’s predictions are aligned with the freshest data. This is crucial for businesses if they want to leverage the complete power of AI to drive decision-making and move to the high-value predictive use cases that AI can unlock.

The Role of Databases for Real-Time AI

The foundation for creating hyper-contextualized and personalized experiences for generative AI-enriched applications is in the organizations’ system of records and truth. Real-time data is an integral component of this real-time AI application stack, and it is imperative to have operational databases tightly integrated into the AI pipeline. This ensures a seamless flow of real-time data into the models, enabling them to adapt to changing conditions instantaneously.

In order to build these experiences, developers need a highly performant, multimodal database platform that can efficiently store, manage and query unstructured data. They need a long-term memory layer for LLMs that augments context with conversational history and real-time data, and that can store and search data in the LLM-native format: high-dimensional mathematical vectors. The key to giving foundational models long-term memory is a highly available database capable of storing and querying unstructured data. Such databases can hold vast amounts of information and make it readily available to the model, thereby acting as the model’s “memory.”
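
As a generic illustration of that “memory” idea, not tied to any particular database, an application can store text alongside its embedding vector and retrieve the closest entries by cosine similarity before building a prompt. The embedding model name and the in-memory array are assumptions; a production system would push both storage and search down into the database’s vector index.

```javascript
// Sketch: a toy long-term memory layer: store embeddings, retrieve by cosine similarity.
// In practice the vectors and the search would live in the database's vector index;
// the embedding model name is an assumption.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const memory = []; // { text, vector }

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function embed(text) {
  const res = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });
  return res.data[0].embedding;
}

export async function remember(text) {
  memory.push({ text, vector: await embed(text) });
}

export async function recall(query, k = 3) {
  const qv = await embed(query);
  return memory
    .map((m) => ({ text: m.text, score: cosine(qv, m.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k); // top-k most relevant snippets to prepend to the prompt
}
```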

A multimodal database platform is well suited to be that data platform for real-time AI applications. It can seamlessly combine operational, transactional, analytical and semantic stores with integrations across open source LLM platforms and cloud providers, accelerating developers’ journey to build the next generation of applications.

The integration of real-time data into generative and predictive AI models is not just a technical upgrade; it’s a paradigm shift. As we move toward an increasingly dynamic world, the ability of AI to adapt and provide accurate, timely insights will be the cornerstone of effective decision-making. By addressing the challenges of stale data and hallucinations, we can unlock the true potential of AI, making it an invaluable asset in our data-driven future.

Couchbase introduced generative AI capabilities into its Database as a Service Couchbase Capella to significantly enhance developer productivity and accelerate time to market for modern applications. For more information about Capella iQ, and to sign up for a private preview, please visit here or try Couchbase for yourself today with our free trial here.

Dev News: Svelte 5, AI Bot for Android Studio, and GitHub Tools

Rich Harris offered a preview of Svelte 5 in a recent blog post and video. What’s new? Harris introduced a new way to handle reactivity in Svelte called Runes.

Reactivity is a programming concept in which data updates based on its dependencies, as software engineer Tom Smykowski demonstrated in this blog post.

Some developers on Twitter have compared it to React’s hooks. Smykowski observed that each framework handles reactivity a little bit differently and compared Runes to Angular’s Signals and React’s use of an “explicit list of dependencies to handle fine-grained reactive updates.”
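
In the preview, reactive state is declared with runes such as $state and $derived directly in component code. A minimal sketch, based on the preview announcement, so details may change before release:

```svelte
<script>
  // Svelte 5 runes (preview syntax): $state declares reactive state,
  // $derived recomputes whenever its dependencies change.
  let count = $state(0);
  let double = $derived(count * 2);
</script>

<button on:click={() => count += 1}>
  Clicked {count} times (doubled: {double})
</button>
```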

A release date for Svelte 5 has not been set, Harris added.

Google’s Release of Studio Bot to Android Studio

Google released its AI-powered coding assistant, Studio Bot, in the Android Studio canary build and made it available in more than 170 countries. Studio Bot understands natural language, though so far it is designed to be used only in English.

“You can enter your questions in Studio Bot’s chat window ranging from very simple and open-ended ones to specific problems that you need help with,” the press release explained.

It remembers the context so that you can ask follow-up questions, e.g., “Can you give me the code for this in Kotlin” or “Can you show me how to do it in Compose.” Developers don’t need to send in source code to use Studio Bot.

“By default, Studio Bot’s responses are purely based on conversation history, and you control whether you want to share additional context or code for customized responses,” Google stated.

That said, Studio Bot is still a work in progress, so Google recommends validating its response before using it in a production app.

GitHub Launches Innovation Graph, Adds Atlassian Migration Support

GitHub on Thursday launched its GitHub Innovation Graph, an open data and insights platform on the global and local impact of developers.

The Innovation Graph includes longitudinal metrics on software development for economies around the world. The website and repository provide quarterly data dating back to 2020 on git pushes, developers, organizations, repositories, languages, licenses, topics, and economy collaborators. The platform offers a number of data visualizations, and the repository outlines the methodology. Data for each metric is available to download.

“In research commissioned by GitHub, consultancy Tattle found that researchers in the international development, public policy, and economics fields were interested in using GitHub data but faced many barriers in obtaining and using that data,” the company said in a news release. “We intend for the Innovation Graph to lower those barriers. Researchers in other fields will also benefit from convenient, aggregated data that may have previously required third-party data providers if it was available at all.”

Graph created by GitHub Innovation Graph

GitHub also announced this week that it is adding migration support to two tools: GitHub Enterprise Importer now supports customers using Bitbucket Server and Bitbucket Data Center, and GitHub Actions Importer can now help developers pivot off Atlassian’s CI/CD products.

GitHub Actions Importer eliminates the manual process of CI migrations and automates the evaluation and testing of the CI migration of nearly a quarter million pipelines, the company said in a statement. GitHub Actions Importer allows developers to move from any of Atlassian’s CI/CD products — Bitbucket, Bamboo Server, and Bamboo Data Center — to GitHub Actions. After Feb. 15, 2024, Atlassian will no longer offer technical support, security updates or vulnerability fixes for their Server products like Bitbucket Server and Bamboo Server, according to GitHub.

DockerCon 2023 Runs Oct. 3-5

DockerCon is back with both live and virtual options this year Oct. 3-5. The live conference is at the MagicBox in Los Angeles and runs Wednesday and Thursday. Tuesday is a workshop day, which is an additional add-on. The virtual ticket includes the live keynotes and select educational sessions.

Topics to be covered during the conference include:

  • Web Application
  • Web Development
  • Building and Deploying Applications
  • Secure Software Delivery
  • Innovation and Agility
  • Open Source
  • Emerging Trends

Vercel Launches Serverless Storage System

On Monday, frontend cloud development platform Vercel launched a public beta of Vercel Blob, its serverless storage system.

Blob stands for binary large object; blobs are typically images, audio or other multimedia objects. Sometimes binary executable code is stored as a blob as well. Vercel Blob allows Vercel Pro users to store and retrieve any file with an intuitive, promise-based API.

Designed for JavaScript and TypeScript frameworks, Vercel Blob saw 50,000 blob stores created during its four-month private beta. Users with a Vercel account can have multiple blob stores in a project, and each blob store can be accessed by multiple Vercel projects. Vercel Blob URLs are publicly accessible, created with an unguessable random ID, and immutable.
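
Using it looks roughly like the sketch below; the file path and contents are placeholders, and a Blob read-write token is assumed to be configured in the project.

```javascript
// Sketch: upload a file with the @vercel/blob put() helper and read back its public URL.
// Assumes BLOB_READ_WRITE_TOKEN is configured for the Vercel project.
import { put } from "@vercel/blob";

export async function uploadAvatar(fileBuffer) {
  // "avatars/user-123.png" is a placeholder path; access is currently public only.
  const blob = await put("avatars/user-123.png", fileBuffer, { access: "public" });
  return blob.url; // publicly accessible, unguessable URL
}
```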

There are plans to support making a Blob private in an upcoming release.

Free Software Development Course, Coding Labs

LinkedIn Learning is collaborating with CoderPad to offer 33 new software development courses and interactive coding exercises for free through Dec. 18. Coders can learn about six languages — Python, JavaScript, Go, SQL, Java and C++. There are six new programming essentials courses, which cover the basics of a language; 18 new coding labs, or practice environments to hone programming skills in these languages; and nine new advanced courses focused primarily on advanced techniques in the six languages, plus one course on building a generative language model from scratch.

Gradle Changes Name

Developer build tool Gradle Enterprise will now be called Develocity. The reason for this name change is that Gradle, Inc., found the original name created a misconception that Gradle Enterprise was only for the Gradle Build Tool when it actually supports both the Gradle Build Tool and the Apache Maven build system.

The company also recently announced that Develocity supports the Bazel build system, which is an open source project hosted by Google. The company also released beta-level support for sbt, the open source build system popular with the Scala language developer community. The roadmap for Develocity includes plans to support additional software development ecosystems.

Open Source Can Deflate the ‘Threat’ of AI

BILBAO, SPAIN — AI should not simply be restricted, controlled and locked down; instead, developers working with the generative language models underpinning this revolution should rely on open source to ultimately allow for a positive outcome that we can only dream about today.

Of course, there are many naysayers for this assumption, and the examples are many, ranging from politicians with different agendas to frightened public members and other parties, some of whom could have good or bad intentions.

As Jim Zemlin, the Linux Foundation‘s executive director, referenced in his Open Source Summit Europe keynote, Elon Musk was one of over a thousand signers of a recent open letter expressing fear of the revolution getting out of control, in which Musk and others proposed a six-month moratorium on AI development beyond what OpenAI had released with ChatGPT.

This is not to downplay the fact that AI models are already often biased and fail to take diversity into account, which poses very real risks and potentially tragic outcomes today and tomorrow. Still, ill-founded reactions to fears of what could go wrong are numerous.

Zemlin offered a number of substantive reasons and historical examples involving cryptography for why attempting to lock down LLMs could prove a costly mistake.

“Recently, we’ve heard from different people around the world, largely folks that already have a lot of capital, a lot of GPUs, and good foundation models that we need to take a six-month pause until we’ve figured it out. We’re even hearing calls from folks who are saying, hey, this large language models technology and advanced AI technology is so powerful that in 20 years in the hands of individual actors, people could do terrible things, such as create violent weapons, massive cyberattacks and so forth,” Zemlin said.

“And what I’m telling you today is that kind of fear and that kind of concern that the availability of open source large language models would create some terrible outcome simply isn’t true. That open source always creates sunshine, and that fear as a counterbalance around the code, because it’s not just bad things people do with large language models it is good things too, like discovering advanced drugs, helping manufacturing to become more efficient, using large language models to create more environmentally friendly building construction. Like for every action, there can be a reaction, and we’re already seeing open source immediately start to tackle some of these things people are concerned about when it comes to AI.”

Tech Works: When Should Engineers Use Generative AI?

Your developers are already playing around with generative AI. You can’t stop them completely and you probably don’t want to, lest they fall behind the curve. After all, you want your developers focusing on meaningful work, and Large Language Model (LLM)-trained code-completion tools like Amazon Web Services’ CodeWhisperer and GitHub’s Copilot have great potential to increase developer productivity.

But, if you don’t have a generative AI policy in place, you’re putting your organization at risk, potentially harming your code quality and reliability.

ChatGPT’s code is inaccurate more often than not, according to an August study by Purdue University researchers. Yet more than 80% of Fortune 500 companies have accounts on it. You could also be putting your reputation on the line. Just look at Samsung, where an engineer recently leaked sensitive internal source code into ChatGPT by accident, sparking a blanket ban on generative AI assistants. That’s probably a reasonable short-term response, but it lacks long-term vision.

In order to take advantage of this productivity potential, without the PR pitfalls, you have to have a clearly communicated generative AI policy for engineering teams at your organization.

For this edition of Tech Works, I talked to engineering leadership who adopted GenAI early to help you decide how and when to encourage your software engineers to use generative AI, and when to deter them from leveraging chatbots and risking your organization’s privacy and security.

Consumer vs. Enterprise Generative AI Tools

There are many generative AI tools out there — CodeWhisperer, Google’s Bard, Meta AI’s LLaMA, Copilot, and OpenAI’s ChatGPT. But thus far, it’s the latter two that have gotten the buzz within engineering teams. Deciding which GenAI tool to use comes down to how you’re using it.

“People are just dropping stuff in ChatGPT and hoping to get the right answer. It’s a research tool for OpenAI you’re using for free. You’re just giving them free research,” Zac Rosenbauer, CTO and co-founder of a developer documentation platform company Joggr, told The New Stack. (By default, ChatGPT saves your chat history and uses the conversations to further train its models.)

Rosenbauer then showed me a series of slides to explain how an LLM works, which comes off more as guessing the probability of the next word, Mad Libs-style, than going for the most accurate response. “That’s why you get really stupid answers,” he said. “Because it’s going to try to just answer the question no matter what.”

Public LLMs are trained to give an answer, even if they don’t know the right one, as shown by the Purdue study that found 52% of code written by ChatGPT is simply wrong, even while it looks convincing. You need to explicitly tell a chatbot to only tell you if it knows the right answer.
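
As a rough illustration of that kind of instruction, here is a hypothetical system prompt (the wording and the example question are made up, not taken from any particular product) that tells an assistant to admit uncertainty instead of guessing:

```python
# Hypothetical prompt wording; adjust to your own tooling and policy.
system_prompt = (
    "You are a coding assistant. If you are not certain an answer is correct, "
    "reply 'I don't know' instead of guessing, and never invent APIs, flags or functions."
)
user_prompt = "Does Python's standard library include a built-in LRU cache decorator?"

# These two strings would be sent as the system and user messages of a chat request;
# the system message is what nudges the model away from confident-sounding guesses.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
print(messages)
```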

Add to this, the very valid concern that employees from any department are copy-pasting personally identifiable information or private company information into a public tool like ChatGPT, which is effectively training it on your private data.

It’s probably too soon for any teams to have gained a competitive edge from the brand-new ChatGPT Enterprise, but it does seem that, due to both quality and privacy concerns, you want to steer your engineers away from regular ChatGPT for a lot of their work.

“The first thing we say to any company we deal with is to make sure you’re using the right GenAI,” said James Gornall, cloud architect lead at CTS, which is focused on enabling Google customer business cases for data analytics, including for Vertex AI, the generative AI offering within an enterprise’s Google Cloud perimeter. “There’s enterprise tooling and there’s consumer tooling.”

“Every company now has GenAI usage and people are probably using things that you don’t know they’re using.”

— James Gornall, CTS

ChatGPT may be the most popular, but it’s also very consumer-focused. Always remind your team: just because a tool is free doesn’t mean there isn’t a cost to using it. That means never putting private information into a consumer-facing tool.

“No business should be doing anything in Bard or ChatGPT as a strategy,” Gornall told The New Stack. Free, consumer-facing tools are usually harmless at the individual level, but, “the second you start to ask it anything around your business, strategy approach or content creation” — including code — “then you want to get that in something that’s a lot more ring-fenced and a lot more secure.”

More often than not, generative AI benefits come from domain specificity. You want an internal developer chatbot to train on your internal strategies and processes, not the whole world.

“Every company is now kind of a GenAI company. Whether you like it or not, people are probably going to start typing in the questions to these tools because they’re so easy to get a hold of,” Gornall said.

“You don’t even need a corporate account or anything. You can register for ChatGPT and start copying and pasting stuff in, saying ‘Review this contract for me’ or, in Samsung’s case, ‘Review this code,’ and, invariably, that could go very badly, very, very quickly.”

You not only increase privacy and security by staying within your organizational perimeters, you increase your speed to value.

GenAI “can save a lot of time; for example, generating documents or generating comments — things that developers generally hate doing. But other times, we will try using this and it’ll actually take us twice as long because now we’re having to double-check everything that it wrote.”

— Ivan Lee, Datasaur

Don’t use a consumer-facing GenAI tool for anything that is very proprietary, or central to how your business operates, advised Karol Danutama, vice president of engineering at Datasaur. But, if there is something that is much more standardized where you could imagine 100 other companies would need a function just like this, then he has advised his team to feel more comfortable using LLMs to suggest code.

Don’t forget to factor in ethical choices. A company-level AI strategy must cover explainability, repeatability and transparency, Gornall said. And it needs to do so in a way that’s understood by all stakeholders, even your customers.

Context Is Key to Developer Flow State

You will always gain more accuracy and speed to value if you are training an existing LLM within the context of your business, on things like internal strategies and documentation. A context-driven chatbot — like the enterprise-focused Kubiya — needs to speak to the human content creator, and hopefully speed up or erase the more mundane parts of developers’ work. Early engineering use cases for generative AI include:

  • Creating code snippets.
  • Generating documentation and code samples.
  • Creating functions.
  • Importing libraries.
  • Creating classes.
  • Generating a wireframe.
  • Running quality and security scans.
  • Summarizing code.

It has the potential to “really get rid of a lot of the overhead of the 200 characters you have to type before you start on a line of code that means anything to you,” Gornall said. You still have to manually review it for relevance and accuracy within your context. “But you can build something real by taking some guidance from it and getting some ideas of talking points.”

For coding, he said, these ideas may or may not be production-ready, but generative AI helps you talk out how you might solve a problem. So long as you’re using an internal version of GenAI, you can feed in your coding standards, coding styles, policy documents and guideline templates into the chatbot. It will add that content to its own continuous improvement from external training, but keep your prompts and responses locked up.

“You can scan your entire codebase in a ridiculously quick amount of time to say, ‘Find me anything that doesn’t conform to this,’ or ‘Find me anything that’s using this kind of thing that we want to deprecate,’” Gornall said.

Don’t close off your dataset, he advised. You need to continue to train on third-party data too, lest you create an “echo chamber within your model where, because you’re just feeding it your wrong answers, it is going to give you wrong answers.” With the right balance of the two, you can maintain control and mitigate risk.

Generative AI for Documentation

One of the most in-demand productivity enablers is documentation. Internal documentation is key to self-service, but is usually out of date — if it even exists at all — and difficult to find or search.

Add to that, documentation is typically decoupled from the software development workflow, triggering even more context switching and interrupted flow state to go to Notion, Confluence or an external wiki to look something up.

“If you know about developers, if it’s not in their [integrated development environment], if it’s not in their workflow, they will ignore it,” Rosenbauer said.

This makes docs ripe for internal generative AI.

“We felt that developer productivity recently had suffered because of how much they were asked to do,” Rosenbauer said. “The cognitive load of the developer is so much higher than it was, in my opinion, 10 or 15 years ago, even with a lot more tooling available.”

“Generative AI is not helping the core current role of an engineer, but it’s getting rid of a lot of the noise. It’s getting rid of a lot of the stuff that can take time but not deliver value.”

—James Gornall, CTS

He reflected on why he and Seth Rosenbauer, his brother and Joggr co-founder, quit their jobs as engineering team leads just over a year ago.

For example, Zac Rosenbauer noted, “DevOps, though well-intended, was very painful for a lot of non-DevOps software engineers. Because the ‘shift left’ methodology is important — I think of it as an empowering thing — but it also forces people to do work they weren’t doing before.”

So the Rosenbauers spent the following six months exploring what had triggered that dive in developer productivity and increase in cognitive load. What they realized is that the inadequacy or non-existence of internal documentation is a huge culprit.

As a result, they created Joggr, a generative AI tool — one that “regenerates content,” Zac Rosenbauer said. One of the company’s main focuses is automatically regenerating code snippets to maintain documentation, descriptions, portions of text, links to code and more. About a third of Joggr’s customers currently work in platform engineering, and the company expects that share to grow.

Will GenAI Take Jobs Away?

“The question we get asked quite a lot is: Is it taking our jobs? I don’t think so. I think it’s changing people’s jobs and people will do well to learn how to work with these things and get the most out of them, but I think it is still very early days,” Gornall said.

“Generative AI is not helping the core current role of an engineer, but it’s getting rid of a lot of the noise. It’s getting rid of a lot of the stuff that can take time but not deliver value.”

It is unlikely that the rate of development and adoption of generative AI will slow down, so your organization needed a GenAI policy yesterday. And it must include a plan to train engineers about it.

Just as his search-engine-native generation learned with the help of Google and Stack Overflow, Ivan Lee, CEO and founder of Datasaur, believes the next generation of computer science grads will learn by asking ChatGPT or Copilot. Everyone on a team will have to level up their GenAI knowledge. Don’t forget, identifying flaws in other people’s code is a key part of any engineering job — now you just have to apply that skill to machine-written code, too.

Lee added, “We need to be very careful about knowing how to spot check, understanding the strengths of this technology and the weaknesses.”

The post Tech Works: When Should Engineers Use Generative AI? appeared first on The New Stack.

]]>
Don’t Listen to a Vendor about AI, Do the DevOps Redo https://thenewstack.io/dont-listen-to-a-vendor-about-ai-do-the-devops-redo/ Thu, 21 Sep 2023 15:42:08 +0000 https://thenewstack.io/?p=22717897

Don’t listen to a vendor about AI, says John Willis, a well-known technologist and author in the latest episode of

The post Don’t Listen to a Vendor about AI, Do the DevOps Redo appeared first on The New Stack.

]]>

Don’t listen to a vendor about AI, says John Willis, a well-known technologist and author in the latest episode of The New Stack Makers.

“They’re going to tell you to buy the one size fits all,” Willis said. “It’s like going back 30 to 40 years ago and saying, ‘Oh, don’t learn how to code Java, you’re not going to need it — here, buy this product.’”

Willis said that DevOps provides an example of how human capital solves problems, not products. The C-level crowd needs to learn how to manage the AI beast and then decide what to buy and not buy. They need a DevOps redo.

One of the pioneers of the DevOps movement, Willis said now is a time for a “DevOps redo.” It’s time to experiment and collaborate as companies did at the beginning of the DevOps movement.

“If you look at the patterns of DevOps, like the ones who invested early, some of the phenomenal banks that came out unbelievably successful by using a DevOps methodology,” Willis said. “They invested very early in the human capital. They said let’s get everybody on the same page, let’s run internal DevOps days.”

Just don’t let it sort of happen on its own and start buying products, Willis said. The minute you start buying products is the minute you enter a minefield of startups that will be gone soon enough or will get bought up by large companies.

Instead, people will need to learn how to manage their data using techniques such as retrieval augmentation, which provides ways to supplement a large language model with, for example, a vector database.
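
As a minimal sketch of the idea, the snippet below retrieves a question's most relevant documents from a small in-memory index before the prompt would ever reach an LLM. The embedding model name and the toy corpus are assumptions for illustration; a production setup would use a real vector database and governed data pipelines rather than a NumPy array.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy corpus standing in for curated, access-controlled company documents
documents = [
    "Refunds are processed within five business days.",
    "The athlete's contract includes a no-trade clause.",
    "Quarterly revenue figures live in the internal finance wiki.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list:
    """Return the k documents most similar to the question (cosine similarity)."""
    query = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ query
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
# The retrieved context is prepended to the prompt before calling the LLM,
# so the model answers from curated company data instead of guessing.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```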

It’s a cleansing process, Willis said. Organizations will need cleansing to create robust data pipelines that keep the LLMs from hallucinating or giving up code or data that a company would never want to let an LLM provide to someone. We’re talking about the danger of giving away code that makes a bank billions in revenues or the contract for a superstar athlete.

For a company of any scale, coding gets fun again when LLMs are used at scale, done right, with some form of retrieval augmentation.

Getting it right means adding some governance to the retrieval augmentation model. “You know, some structuring: can you do content moderation? Are you red-teaming the data? So these are the things I think will get really interesting that you’re not going to hear vendors tell you about necessarily; vendors are going to say, ‘We’ll just pop your product in our vector database.’”

The post Don’t Listen to a Vendor about AI, Do the DevOps Redo appeared first on The New Stack.

]]>
Oracle Introduces New App Analytics Platform, Enhances Analytics Cloud https://thenewstack.io/oracle-introduces-new-app-analytics-platform-enhances-analytics-cloud/ Thu, 21 Sep 2023 13:47:03 +0000 https://thenewstack.io/?p=22718386

At its Oracle CloudWorld conference in Las Vegas this week, Oracle is introducing a range of new analytics capabilities. In addition

The post Oracle Introduces New App Analytics Platform, Enhances Analytics Cloud appeared first on The New Stack.

]]>

At its Oracle CloudWorld conference in Las Vegas this week, Oracle is introducing a range of new analytics capabilities. In addition to its core Oracle Database, MySQL and MySQL HeatWave businesses, Oracle focuses on analytics and applications. As such, the new analytics capabilities it is announcing accrue to both its Oracle Analytics Cloud (OAC) platform as well as the value-added functionality for Oracle applications that run atop that platform.

A Full Data Intelligence Platform

It’s with respect to the latter that Oracle is announcing the new Fusion Data Intelligence Platform. This new service is an evolution of the Fusion Analytics platform that preceded it, but in addition to Fusion Analytics’ semantic models that are defined and materialized in Oracle Analytics Cloud, the new service includes 360-degree data models, analytic artifacts, AI and BI models and pre-built intelligent apps.

Those pre-built apps bring in data models, ML models and analytics, designed to be accessible to people who don’t currently use self-service BI, and prefer to stay a level of abstraction above it. Oracle demoed a “Supply Chain Command Center” application as an example. It was a full-blown browser-based application with BI and AI capabilities already implemented and built in.

External Data too, All in the Lakehouse

Like Fusion Analytics, Fusion Data Intelligence Platform is not an island. For example, it will allow the addition of external data and will link to the likes of Salesforce, LinkedIn, and other external services with business-relevant data. On the Oracle applications side, Fusion Data Intelligence Platform will tie into Oracle Netsuite, Oracle Health and Oracle Industries applications. Fusion Data Intelligence Platform also integrates with, and includes an instance of, OAC, which Fusion Analytics did as well.

All data will land in a single Oracle Cloud Infrastructure (OCI) data lakehouse with a semantic model, ML models, etc. and OAC tie-ins. Though the lakehouse will feature a single model, it will be broken into multiple “subject areas” for specific target audiences.

OAC Gets AI

It’s not only at the Fusion Data Intelligence Platform level where Oracle has added AI capabilities. After all, the Fusion Data Intelligence Platform is a layer above OAC, and Oracle has added AI capabilities there as well.

OAC now has an Analytics Assistant, offering a chatbot interface on your data, with links to public data via ChatGPT. In partnership with Synthesia, the Assistant features avatars that can act as “news readers” to deliver data stories verbally to business decision-makers.

AI-Powered Document Understanding can scan JPEG and PDF files — and extract values and context. One example mentioned by Oracle, for applying this in practice, was the reading of individual receipt images to ensure their totals match the data in expense reports.

Narratives, Teams Integration, and the Business User Strategy

Contextual Insights implements natural language generation to provide narratives of users’ data. It’s similar in concept to Power BI’s Smart Narratives and Data Stories/narratives in Tableau. OAC now also integrates with Microsoft Teams, letting users bring OAC dashboards, visualizations, and insights into Teams channel chats. The functionality provided is similar to the previously introduced integration of OAC with Slack.

The range of capabilities added to Oracle’s Analytics platform should greatly benefit Oracle Applications customers. While customers might think of Power BI or Tableau when the subject of analytics comes up, Oracle is making it unnecessary to bring in third-party platforms when it comes to AI- and BI-driven insights on its applications’ data. Its goal is to go beyond self-service analytics and instead just surface analytics capabilities in business users’ tools. Clearly, Oracle is delivering in that area.

The post Oracle Introduces New App Analytics Platform, Enhances Analytics Cloud appeared first on The New Stack.

]]>
Generative AI: A New Tool in the Developer Toolbox https://thenewstack.io/generative-ai-a-new-tool-in-the-developer-toolbox/ Tue, 19 Sep 2023 17:48:58 +0000 https://thenewstack.io/?p=22718492

Developers craft software that both delights consumers and delivers innovative applications for enterprise users. This craft requires more than just

The post Generative AI: A New Tool in the Developer Toolbox appeared first on The New Stack.

]]>

Developers craft software that both delights consumers and delivers innovative applications for enterprise users. This craft requires more than just churning out heaps of code; it embodies a process of observing, noticing, interviewing, brainstorming, reading, writing and rewriting specifications; designing, prototyping and coding to the specifications; reviewing, refactoring and verifying the software; and a virtuous cycle of deploying, debugging and improving. At every stage of this cycle, developers consume and generate two things: code and text. Code is text, after all.

The productivity of developers is limited by real-world realities: challenging timelines, unclear requirements, legacy codebases and more. To overcome these obstacles and still meet deadlines, developers have long relied on adding new tools to their toolbox, such as code generation tools: compilers, UI generators, ORM mappers, API generators and more. Developers have embraced these tools without reservation, progressively evolving them to offer more intelligent functionality. Modern compilers do more than just translate; they rewrite and optimize the code automatically. SQL, developed fifty years ago as a declarative language with a set of composable English templates, continues to evolve and improve the data access experience and developer productivity. Developers have access to an endless array of tools to expand their toolbox.

The Emergence of GenAI

GenAI is a new, powerful tool for the developer toolbox. GenAI, short for generative AI, is a subset of AI capable of taking prompts and then autonomously creating many forms of content — text, code, images, videos, music and more — that imitate and often mirror the quality of human craftsmanship. Prompts are instructions in the form of expository writing. Better prompts produce better text and code. The seismic surge surrounding GenAI, supported by technologies such as ChatGPT and Copilot, positions 2023 to be heralded as the “Year of GenAI”. GenAI’s text generation capability is expected to revolutionize every aspect of developer experience and productivity.

Impact on Developers

Someone recently noted, “In 2023, natural language has emerged as the fastest programming language.” While the previous generation of tools focused on incremental improvements to productivity for writing code and improving code quality, GenAI tools promise to revolutionize these and every other aspect of developer work. ChatGPT can summarize a long requirement specification, give you the delta between two versions or help you come up with a checklist for a specific task. For coding, the impact is dramatic. Since these models have been trained on the entire internet, with billions of parameters and trillions of tokens, they’ve seen a lot of code. With a good prompt, you can get them to write a big piece of code, design APIs and refactor code. And in just one sentence, you can ask ChatGPT to rewrite everything in a brand-new language. All these possibilities were simply science fiction just a few years ago. GenAI makes mundane tasks disappear, hard tasks easier and difficult tasks possible. Developers are relying more on ChatGPT to explain new concepts and clarify confusing ideas. Apparently, this trend has reduced traffic to Stack Overflow, a popular Q&A site for developers, by anywhere from 16% to 50%, depending on the measure. Developers choose the winning tool.

But there’s a catch. More than one, in fact. The GenAI tools of the current generation, although promising, are unaware of your goals and objectives. These tools, developed through training on a vast array of samples, operate by predicting the succeeding token, one at a time, rooted firmly in the patterns they have previously encountered. Their answers are guided and constrained by the prompt. To harness their potential effectively, it becomes imperative to craft detailed, expository-style prompts. This nudges the technology to produce output that is closer to the intended goal, albeit with a style and creativity bounded by its training data. These models excel at replicating styles they have been exposed to but fall short at inventing unprecedented ones. Multiple companies and groups are busy training LLMs for specific tasks to improve their content generation. I recommend heeding the advice of Satya Nadella, Microsoft’s CEO, who suggests it is prudent to treat the content generated by GenAI as a draft, requiring thorough review to ensure its clarity and accuracy. The onus falls on the developer to delineate between routine tasks and those demanding creativity — a discernment that remains beyond GenAI’s reach. At least, for now.

Despite this, with justifiable evidence, GenAI promises improved developer experience and productivity. OpenAI’s ChatGPT raced to 100 million users in record time. Your favorite IDEs have plugins to exploit it. Microsoft has promised to use GenAI in all its products, including its revitalized search offering, bing.com. Google has answered with its own suite of services and products; Facebook and others have released multiple models to help developers progress.

It’s a great time to be a developer. The revolution has begun promptly. At Couchbase, we’ve introduced generative AI capabilities into our Database as a Service Couchbase Capella to significantly enhance developer productivity and accelerate time to market for modern applications. The new capability called Capella iQ enables developers to write SQL++ and application-level code more quickly by delivering recommended sample code.

For more information about Capella iQ and to sign up for a private preview, please visit here, or try Couchbase for yourself today with our free trial here.

The post Generative AI: A New Tool in the Developer Toolbox appeared first on The New Stack.

]]>
Whose IP Is It Anyway? AI Code Analysis Can Help https://thenewstack.io/whose-ip-is-it-anyway-ai-code-analysis-can-help/ Tue, 19 Sep 2023 13:31:24 +0000 https://thenewstack.io/?p=22718438

With generative AI tools like OpenAI, ChatGPT and GitHub Copilot flooding the software development space, developers are quickly adopting these

The post Whose IP Is It Anyway? AI Code Analysis Can Help appeared first on The New Stack.

]]>

With generative AI tools like OpenAI’s ChatGPT and GitHub Copilot flooding the software development space, developers are quickly adopting these technologies to help automate everyday development tasks. A recent Stack Overflow survey found an overwhelming 70% of its 89,000 respondents are either currently employing AI tools in their development process or are planning to do so in 2023.

In response to the growing AI landscape, new AI tools that can perform code analysis are coming on the market. These tools let developers submit code blocks or snippets generated by AI and receive feedback about whether the code matches an open source project and, if so, which license that project is associated with. With this information, teams can have confidence that they are not building and shipping applications that contain someone else’s protected intellectual property.

Synopsys Senior Sales Engineer Frank Tomasello recently hosted a webinar, “Black Duck Snippet Matching and Generative AI Models,” to discuss the rise of AI and how our snippet analysis technology helps protect teams and IP in this uncertain frontier. We touch upon the key webinar takeaways below.

The Risks of AI-Assisted Programming

The good: Fewer resource constraints. The bad: Inherited code with unknown restrictions. The ugly: License conflicts with potential legal implications.

Citing the Stack Overflow survey noted above, Tomasello underscored in the webinar that we are well on our way to adopting an industry-wide shift toward AI-assisted programming. While beneficial from a resource and timing constraint perspective, lazy or insecure use of AI can mean a whole world of trouble.

AI tools like Copilot and ChatGPT function based on learning algorithms that use vast repositories of public and open source code. These models then use the context provided by their users to suggest lines of code to incorporate into proprietary projects. At face value, this is tremendously helpful in speeding up development and minimizing resource limitations. However, given that open source was used to train these tools, it is essential to recognize the possibility that a significant portion of this public code is either copyrighted or subject to more restrictive licensing conditions.

The worst-case scenario is already playing out; earlier this year, GitHub and OpenAI faced groundbreaking class-action lawsuits that claim violations of copyright laws for allowing Copilot and ChatGPT to generate sections of code without providing the necessary credit or attribution to original authors. The fallout from these and inevitable future lawsuits remains to be seen, but the litigation is something that no organization wants to face.

The danger here is therefore not the use of generative AI tools, but the failure to complement their use with tools capable of identifying license conflicts and their potential risk.

The Challenge of Securing AI-Generated Code

We’ve seen over and over the outcomes for failing to adhere to license requirements, long before AI: think Cisco Systems v. the Free Software Foundation in 2008 and Artifex Software v. Hancom in 2017. But the risk remains the same; as AI-assisted software development advances, it’s becoming ever more crucial for companies to remain vigilant about potential copyright violations and maintain strict compliance with the terms of open source licenses.

Business leaders are concerned with implementing AI guardrails and protections, but they often lack a tactical or sustainable approach. Today, most organizations either ignore security needs entirely or take an unsustainably manual approach. The manual approach involves considerable resourcing to maintain — more people, more money and more time to complete. With uncertain economic conditions and limited capacity, organizations are struggling to dedicate the necessary effort for this task. In addition, the complexity of license regulations necessitates a level of expertise and training that organizations likely lack.

Further compounding the issue is the element of human error. It would be unrealistic to expect developers to painstakingly investigate every single license that is mapped to every single open source component and successfully identify all associated licenses, especially given the massive scale of open source usage in modern applications.

What’s required is an automated solution that goes above and beyond common open source discovery methods to help teams simplify and accelerate the compliance aspect of open source usage.

How Synopsys Can Help

While most SCA tools parse files generated by package managers to resolve open source dependencies, that’s not sufficient to identify the IP obligations associated with AI-generated code. This code is usually provided in blocks or snippets that will not be recognized by package managers or included in files like package.json or pom.xml. That’s why you need a tool that goes several steps further in identifying open source dependencies, including conducting snippet analysis.

Synopsys’ Black Duck team offers a snippet analysis tool that does exactly what its name suggests; it analyzes source code and can match snippets as small as a handful of lines to the open source projects where they originated. Black Duck can provide customers with the license associated with that project and advise on associated risk and obligations. This is all powered by a KnowledgeBase™ of more than 6 million open source projects and over 2,750 unique open source licenses.
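
Black Duck's matching technology is proprietary, but the toy sketch below illustrates the general idea behind snippet matching: normalize code, fingerprint short windows of lines, and look those fingerprints up in an index of known open source code. The sample snippet, license label and window size are all made up for illustration.

```python
import hashlib

def fingerprints(source: str, window: int = 4) -> set:
    """Hash every `window`-line span of whitespace-normalized code."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    spans = range(max(len(lines) - window + 1, 1))
    return {hashlib.sha1("\n".join(lines[i:i + window]).encode()).hexdigest() for i in spans}

# Toy index: fingerprints of a known open source snippet mapped to its license
known_snippet = """
def greet(name):
    message = "Hello, " + name
    print(message)
    return message
"""
oss_index = {fp: "MIT (hypothetical-greeting-lib)" for fp in fingerprints(known_snippet)}

def check_generated_code(code: str) -> set:
    """Return the licenses of any indexed open source snippets the code matches."""
    return {oss_index[fp] for fp in fingerprints(code) if fp in oss_index}

# AI-generated code that happens to reproduce the known snippet gets flagged
print(check_generated_code(known_snippet))
```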

Synopsys is now offering a preview of this AI code analysis tool to the public at no cost. This will enable developers to leverage productivity-boosting AI tools without worrying about violating license terms that other SCA tools might overlook.

The post Whose IP Is It Anyway? AI Code Analysis Can Help appeared first on The New Stack.

]]>
AI for Developers: How Can Programmers Use Artificial Intelligence? https://thenewstack.io/ai-for-developers-how-can-programmers-use-artificial-intelligence/ Mon, 18 Sep 2023 18:17:05 +0000 https://thenewstack.io/?p=22718420

 No, AI isn’t going to steal your job. Artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) are

The post AI for Developers: How Can Programmers Use Artificial Intelligence? appeared first on The New Stack.

]]>

 No, AI isn’t going to steal your job.

Artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) are changing the landscape of… just about everything we do. And therein lies the problem: Is AI going to impact your job for the worse? If the headlines are any indication of things to come, you — my software developer friend — might be feeling nervous. But let’s hit the pause button, because this conversation isn’t exactly black and white. Under the right circumstances, AI for developers can be a good — nay, a great — thing.

In this article, we’ll cover:

  • Why machine learning algorithms will not fully take over the software development lifecycle.
  • Why and how AI code assistants are your friend.
  • How you can use AI-powered tools to your advantage.
  • A few examples of AI code writers and other tools you might be interested in.

Let’s go!

Wait, Will AI Replace Developers?

It’s one of the biggest concerns stopping some of us from moving forward into this next chapter. Over the years, there have already been stories and fears about robots taking our jobs. There are robots in Las Vegas mixing and serving drinks. In Texas, we saw the first-ever McDonald’s where you’re served by robots. At Amazon Go stores, you can shop without scanning anything or even talking to another person.

What does this mean for those of us in tech? Won’t artificial intelligence make developers obsolete? I mean… just look at Google’s autocomplete.

[Image: Autocompletion for AI searches in Google]

Yikes. But wait.

First, let’s look at what’s likely the inevitable truth: AI probably isn’t going away. Natural language processing and large language models are only going to get savvier. A recent study from PricewaterhouseCoopers (PwC) shows that the AI market could add $15.7 trillion to the global economy by 2030, with up to a 26% boost in GDP for local economies from AI.

And make no mistake about it: Your employer is very likely looking into generative AI, if they haven’t already asked you to get to work on it! In fact, 64% of businesses think AI will increase their productivity, and 60% expect AI to boost sales. They’re also hoping it’ll help them avoid more mistakes (48%), save money (59%) and streamline processes (42%).

Just look at the Google trend for the search term “AI tools”:

All of this is to say that if you’re not already using AI as a developer, you likely will be soon — or, you should be.

That said, know that artificial intelligence is not a replacement. It’s a supplement. It completes or enhances the whole. Indeed, AI has its limits. (More on this in a minute!)

For this reason, AI simply can’t replace developers. Rather, AI allows developers to focus on what they do best — build and ship — rather than get caught up in repetitive tasks.

The Benefits of AI for Developers

“I’m fine. I don’t need AI,” we can hear you saying. Hold that thought — let’s talk about this. The tides are turning, and you might want to go with them and not struggle against them. Here’s why.

1. Artificial Intelligence Is the Master of Automation

Developers are responsible for a lot of work that ends up being painfully repetitive and monotonous, like testing and debugging. Writing code in the first place can also be extremely tedious.

Depending on the source, developers might be spending nearly half of their development time debugging. Take half of your yearly income, multiply it by how many developers there are on your team alone, and you’ll start to get an idea of the time and money your company is spending just so that you can address bugs.

Now, imagine AI doing a good chunk of that work for you. Yes, we’re talking about automated code reviews, unit tests, code generation, and the automatic implementation of other repetitive tasks that can end up being a huge time-suck.

[Image: Using AI for coding on a laptop screen]

This technology can potentially do a lot of the heavy lifting when it comes to code completion. Picture being freed up to work on projects that you normally wouldn’t have had time to accomplish. Like Simon Willison — the creator of Datasette and co-creator of Django — said, this technology allows you to be more ambitious.

That’s the power of using AI tools for software development.

2. AI Can Reduce the Likelihood of Human Error

There are some things that humans are better at than technology. And, undoubtedly, under other circumstances, the reverse is also true.

If you write code snippets purely by hand, it is prone to errors. If you audit existing code by hand, it is prone to errors. Many things that happen during software development are prone to errors when they’re done manually.

No, AI for developers isn’t completely bulletproof. However, a trustworthy AI tool can help you avoid things like faulty code writing and code errors, ultimately helping you to enhance code quality.

While you won’t rely on AI tools for the entire coding process, leaning into AI coding assistant tools more often will reduce the likelihood of human errors and make your life a whole lot easier. AI-powered code is the present and the future.

3. AI and ML Allow for More Robust Data Analysis

It’s not only about using AI tools to write code, suggest code, and help with other potentially tedious tasks. You can also use AI tools to interpret, dissect, and audit the code that you already have. This can help you make more informed, data-driven decisions.

As an example, let’s take a look at Ambee, a climate tech start-up that is quickly growing. From the get-go, MongoDB Atlas has been at the center of Ambee’s database architecture, helping to support their AI and ML models.

With MongoDB Atlas, Ambee is able to run AI models that deliver data as a service to their own clients and provide intelligent recommendations. Without MongoDB, it would be exceedingly difficult for Ambee to house all of its data in one location and operationalize it for different use cases.

For example, Ambee uses AI to predict forest fires and their outcomes across the United States and Canada. This also means sending critical warnings to organizations so that they can protect people and property.

Source: https://www.mongodb.com/blog/post/ambees-ai-environmental-data-revolution-powered-atlas

For all of the reasons explored above, the conversations around AI and ML are far too complex to be as simple as, “Will this take my job?” Rather, we need to expand our horizons and think about the limitless potential thanks to this life-changing technology. Think of all the amazing ways that AI will help you create even better work than you already are.

How to Leverage Artificial Intelligence and Machine Learning as a Developer

It’s one thing to talk about how beneficial AI tools can be in software development, but it’s even better to actually experience it. Let’s talk about how you can get started using AI to improve code quality, code generation, and the software development process as a whole.

1. Use AI Tools to Support Your Efforts — Not Replace Them

We’ve said this already but it bears repeating. AI tools are a supplement, not a replacement. You can’t (at least, not yet) completely remove yourself from the development process.

There are still a myriad of things you can do that AI cannot. Period.

2. Know the Limits of the AI Tool You’re Using

Because AI tools can’t do everything, you have to be aware of where the technology’s role ends and yours begins.

For instance, while you can absolutely use AI tools for debugging, you should still have human beings doing thorough testing and QA before updates to your software are made available to the public. Otherwise, you might end up with a mess on your hands. (Keep reading for some rather concerning examples.)

3. Ensure Your Manager/Employer Is on Board and Clear on Expectations, Boundaries, and Security Protocol

Some brands are all about AI tools and want to dive in immediately, if not yesterday. Others are understandably a little more hesitant.

What is your employer comfortable with? What’s off-limits? Do they want you to use an AI tool to generate code but prefer you stick to testing and debugging manually?

Beyond the boundaries, what are the expectations and goals? While it’s fun to experiment with artificial intelligence, you should still do so strategically.

Finally, how are you ensuring that you’re using AI in a safe and secure manner? Is there specific data and information you need to avoid putting into AI tools?

4. Take Responsibility for the End Results!

AI is not 100% bulletproof, and you’ve probably already seen the headlines: “People Are Creating Records of Fake Historical Events Using AI“; “Lawyer Used ChatGPT In Court — And Cited Fake Cases. A Judge Is Considering Sanctions“; “AI facial recognition led to 8-month pregnant woman’s wrongful carjacking arrest in front of kids: lawsuit.”

This is what happens when people take artificial intelligence too far and don’t use any guardrails.

Your own coding abilities and skill set as a developer are still absolutely vital to this entire process. As much as software developers might love to completely lean on an AI code assistant for the journey, the technology just isn’t to that point. If something goes wrong with your code documentation, you certainly can’t tell your employer or your audience, “Sorry about that! Our code assistant slipped up.”

So, radical accountability is still a must. AI can assist developers in creating more secure code and also save time. But at the end of the day, it comes down to the brains — not the technology — behind the masterpiece.

5. Test Small and Scale Big

“Let’s just use AI for everything!” you’re saying. Hold that thought.

You have to crawl before you walk and walk before you run. Code assistants are enabling developers to build high-quality code faster and with more accuracy. But that doesn’t mean that software developers should go all-in from the word “go.”

[Image: Software developer coding on their laptop using AI (Source: Pexels)]

What might it look like to test small? Well, maybe you start by using an AI-powered tool to write individual code snippets. Or maybe you utilize an AI-powered assistant to make code suggestions.

If this goes well, maybe you progress to using an AI-powered tool to manage entire functions, complete code, and automate repetitive tasks that you’ve always done manually.

Popular AI Tools That Programmers Are Using

Okay, so you’re ready to take the next step — fantastic! What might that even look like? There are plenty of tools, platforms, and software that developers are enjoying.

GitHub Copilot is a widely adopted AI developer tool. The creators have trained it on billions of lines of code in various programming languages. Also worth noting is that GitHub Copilot can integrate with one of the most popular code editors: Visual Studio Code!

[Image: GitHub Copilot statistics (Source: https://github.com/features/copilot)]

Protect yourself from security vulnerabilities with something like Amazon CodeGuru Security. It uses ML and automated reasoning to find issues in your code, offer recommendations for how to fix them, and track their statuses over time. As part of its key features, it will also scale up and down with your workload.

[Image: The inner workings of Amazon CodeGuru Security (Source: https://aws.amazon.com/codeguru/)]

Sourcegraph is a code AI platform to help you build software. Its code graph powers Cody — a powerful AI coding assistant for writing, fixing, and maintaining code — and Code Search, helping devs explore their entire codebase and make large-scale migrations and security fixes. Write functional code and get suggestions based on the code context.

Finally, add Amazon CodeWhisperer to your list! It provides provenance for generated code to help developers decide if there are any issues with software licenses.

AI Is Your Friend

Across any number of programming languages, whether you’re dealing with tiny code snippets or entire applications, whether you’re new to the world of software development or you’re a seasoned veteran, artificial intelligence, machine learning, and natural language processing will be one of your greatest allies.

Use AI-powered code for the power of good, and your next code-based project will be a win.

The post AI for Developers: How Can Programmers Use Artificial Intelligence? appeared first on The New Stack.

]]>
Three Big Bets on the Future of AI https://thenewstack.io/three-big-bets-on-the-future-of-ai/ Mon, 18 Sep 2023 16:13:42 +0000 https://thenewstack.io/?p=22718406

In April 2023, Goldman Sachs released a report estimating that advancements in generative AI have the potential to drive a

The post Three Big Bets on the Future of AI appeared first on The New Stack.

]]>

In April 2023, Goldman Sachs released a report estimating that advancements in generative AI have the potential to drive a 7% (or approximately $7 trillion) “increase in global GDP and lift productivity growth by 1.5 percentage points over a 10-year period.” This prospect so clearly highlights why it is important to get the future of generative AI right, especially as it relates to data — the key piece that is arguably the heart of this technology.

So, what does the future of generative AI look like? A big part of it will be split-second curation, consolidation across multiple data sources and types, and providing context to LLMs. To thrive and function at their best, LLMs will need fresh, curated data and context for applications — all of which needs to happen in milliseconds, in snaps of real time. Let’s dive a bit deeper into the three tenets I am betting on as the future of AI:

1. An Ensemble of LLMs

LLMs, or large language models, are “deep learning algorithms that can recognize, summarize, translate, predict, and generate content using very large datasets.” LLMs are the backbone of generative AI. As this technology continues to evolve, there is not going to be one universal LLM that dominates the market. Instead, organizations will leverage an ensemble of LLMs to power use cases, something we already see emerging today. For example, GPT-4 is rumored to be not just one massive model but a collection of 10+ different models, each with 100 billion parameters, all stitched together. Consequently, enterprises will end up leveraging a combination of LLMs and foundational models. I believe enterprises will hedge their bets and costs by utilizing multiple foundational models that accomplish specific tasks better than the others. This includes both open source LLMs like Llama 2 and the models available through Hugging Face, and proprietary LLMs from providers like OpenAI, Anthropic and Cohere.

2. AI Data Planes Emerge

For businesses, I believe there will be an AI data plane that sits between their ensemble of LLMs and their corporate data. Incorporating an AI data plane provides additional context and clean data to an ensemble of LLMs for instant responses based on data within the enterprise firewalls. These data planes will have to have the ability to ingest, store and process vector embeddings — along with other data types and structures, including hybrid search. This includes managing data access, security and governance, as well as a thin layer of intelligence that helps prototype and build applications rapidly and easily.

3. Real-time AI Will Increasingly Become the Norm

As AI proliferates and we start to interface with more audio- and video-enabled AI, businesses will demand access to fresh data in real time (milliseconds) to provide the right context for foundational models. LLMs and other multi-structured foundational models will need to respond to requests in real time and, in turn, will need their data planes to have real-time capabilities to process and analyze data in diverse formats. To execute on real-time AI, enterprises need to continuously vectorize data streams as they are ingested and utilize those for AI applications. Consequently, organizations will increasingly move toward a zero ETL philosophy to minimize data movement, complexities and latencies to power their AI apps.

Conclusion

The world of AI and generative AI is fast evolving. Newer applications, foundational and business models and supporting technologies are quickly emerging. SingleStore has been at the forefront of this gen AI revolution, with its built-in vector and multi-model capabilities to power fast real-time AI applications. Understanding the small pieces that make up the larger puzzle is key to getting the generative AI revolution right — and creating a future where this technology can be used to elevate human lives.

At our upcoming conference, SingleStore Now, we’ll be demonstrating hands-on sessions on how developers can build and scale compelling enterprise-ready generative AI applications. The event will feature customers, partners, industry leaders and practitioners like Harrison Chase, co-founder and CEO of LangChain. To learn more and to register for SingleStore Now, visit singlestore.com/now.

The post Three Big Bets on the Future of AI appeared first on The New Stack.

]]>
How to Get the Right Vector Embeddings https://thenewstack.io/how-to-get-the-right-vector-embeddings/ Mon, 18 Sep 2023 13:09:38 +0000 https://thenewstack.io/?p=22718285

Vector embeddings are critical when working with semantic similarity. However, a vector is simply a series of numbers; a vector

The post How to Get the Right Vector Embeddings appeared first on The New Stack.

]]>

Vector embeddings are critical when working with semantic similarity. However, a vector is simply a series of numbers; a vector embedding is a series of numbers representing input data. Using vector embeddings, we can structure unstructured data or work with any type of data by converting it into a series of numbers. This approach allows us to perform mathematical operations on the input data, rather than relying on qualitative comparisons.

Vector embeddings are influential for many tasks, particularly for semantic search. However, it is crucial to obtain the appropriate vector embeddings before using them. For instance, if you use an image model to vectorize text, or vice versa, you will probably get poor results.

In this post, we will learn what vector embeddings mean, how to generate the right vector embeddings for your applications using different models and how to make the best use of vector embeddings with vector databases like Milvus and Zilliz Cloud.

How Are Vector Embeddings Created?

Now that we understand the importance of vector embeddings, let’s learn how they work. A vector embedding is the internal representation of input data in a deep learning model, also known as an embedding model or a deep neural network. So, how do we extract this information?

We obtain vectors by removing the last layer and taking the output from the second-to-last layer. The last layer of a neural network usually outputs the model’s prediction, so we take the output of the second-to-last layer. The vector embedding is the data fed to a neural network’s predictive layer.

The dimensionality of a vector embedding is equivalent to the size of the second-to-last layer in the model and, thus, interchangeable with the vector’s size or length. Common vector dimensionalities include 384 (generated by Sentence Transformers Mini-LM), 768 (by Sentence Transformers MPNet), 1,536 (by OpenAI) and 2,048 (by ResNet-50).

What Does a Vector Embedding Mean?

Someone once asked me about the meaning of each dimension in a vector embedding. The short answer is nothing. A single dimension in a vector embedding does not mean anything, as it is too abstract to determine its meaning. However, when we take all dimensions together, they provide the semantic meaning of the input data.

The dimensions of the vector are high-level, abstract representations of different attributes. The represented attributes depend on the training data and the model itself. Text and image models generate different embeddings because they’re trained for fundamentally different data types. Even different text models generate different embeddings. Sometimes they differ in size; other times, they differ in the attributes they represent. For instance, a model trained on legal data will learn different things than one trained on health-care data. I explored this topic in my post comparing vector embeddings.

Generate the Right Vector Embeddings

How do you obtain the proper vector embeddings? It all starts with identifying the type of data you wish to embed. This section covers embedding five different types of data: images, text, audio, videos and multimodal data. All models we introduce here are open source and come from Hugging Face or PyTorch.

Image Embeddings

Image recognition took off in 2012 after AlexNet hit the scene. Since then, the field of computer vision has witnessed numerous advancements. The latest notable image recognition model is ResNet-50, a 50-layer deep residual network based on the earlier ResNet-34 architecture.

Residual neural networks (ResNet) solve the vanishing gradient problem in deep convolutional neural networks using shortcut connections. These connections allow the output from earlier layers to go to later layers directly without passing through all the intermediate layers, thus avoiding the vanishing gradient problem. This design makes ResNet less complex than VGGNet (Visual Geometry Group), a previously top-performing convolutional neural network.
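
To make the shortcut idea concrete, here is a minimal residual block sketched in PyTorch. It is not the bottleneck block ResNet-50 actually uses; it only shows how the input is added back to the block's output so information and gradients can flow around the convolutions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: two convolutions plus a shortcut connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the shortcut: add the input back to the output

block = ResidualBlock(channels=64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```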

I recommend two ResNet-50 implementations as examples: ResNet 50 on Hugging Face and ResNet 50 on PyTorch Hub. While the networks are the same, the process of obtaining embeddings differs.

The code sample below demonstrates how to use PyTorch to obtain vector embeddings. First, we load the model from PyTorch Hub. Next, we remove the last layer and call .eval() to instruct the model to behave like it’s running for inference. Then, the embed function generates the vector embedding.
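
A sketch of those steps, assuming a recent torchvision release, might look like the following; the image path is a placeholder.

```python
import torch
from torchvision import transforms
from PIL import Image

# Load ResNet-50 from PyTorch Hub and drop the final classification layer,
# keeping everything up to the 2,048-dimensional pooled features.
model = torch.hub.load("pytorch/vision", "resnet50", weights="DEFAULT")
embed = torch.nn.Sequential(*list(model.children())[:-1])
embed.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg")  # placeholder path
with torch.no_grad():
    vector = embed(preprocess(image).unsqueeze(0)).squeeze()
print(vector.shape)  # torch.Size([2048])
```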

Hugging Face uses a slightly different setup. The code below demonstrates how to obtain a vector embedding from Hugging Face. First, we need a feature extractor and model from the transformers library. We will use the feature extractor to get inputs for the model and use the model to obtain outputs and extract the last hidden state.
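
Along those lines, a sketch using the transformers library might look like this; the microsoft/resnet-50 checkpoint and the pooling step are assumptions for illustration.

```python
import torch
from transformers import AutoFeatureExtractor, ResNetModel
from PIL import Image

extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")  # assumed checkpoint
model = ResNetModel.from_pretrained("microsoft/resnet-50")
model.eval()

image = Image.open("example.jpg")  # placeholder path
inputs = extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Average the last hidden state over its spatial dimensions to get one 2,048-dim vector
embedding = outputs.last_hidden_state.mean(dim=[2, 3]).squeeze()
print(embedding.shape)  # torch.Size([2048])
```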

Text Embeddings

Engineers and researchers have been experimenting with natural language and AI since the invention of AI. Some of the earliest experiments include:

  • ELIZA, the first AI therapist chatbot.
  • John Searle’s Chinese Room, a thought experiment that examines whether the ability to translate between Chinese and English requires an understanding of the language.
  • Rule-based translations between English and Russian.

AI’s operation on natural language has evolved significantly from its early rule-based approaches. Starting with basic neural networks, we added recurrence relations through RNNs to keep track of steps in time. From there, we used transformers to solve the sequence transduction problem.

Transformers consist of an encoder, which encodes an input into a matrix representing the state; an attention matrix; and a decoder. The decoder decodes the state and attention matrix to predict the correct next token to finish the output sequence. GPT-3, the most popular language model to date, consists strictly of decoders, which encode the input and predict the right next token(s).

Here are two models from the sentence-transformers library by Hugging Face that you can use in addition to OpenAI’s embeddings: the MiniLM and MPNet models mentioned earlier.

You can access embeddings from both models in the same way.
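
For example, assuming the widely used all-MiniLM-L6-v2 and all-mpnet-base-v2 checkpoints (the exact checkpoints are an assumption here), both models are called the same way:

```python
from sentence_transformers import SentenceTransformer

sentences = ["Vector embeddings turn text into numbers."]

minilm = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
mpnet = SentenceTransformer("all-mpnet-base-v2")  # 768-dimensional embeddings

print(minilm.encode(sentences).shape)  # (1, 384)
print(mpnet.encode(sentences).shape)   # (1, 768)
```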

Multimodal Embeddings

Multimodal models are less well-developed than image or text models. They often relate images to text.

The most useful open source example is CLIP ViT, a model that maps images and text into a shared embedding space. You can access CLIP ViT’s embeddings in the same way as you would an image model’s, as shown in the code below.
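
A sketch of that, assuming the openai/clip-vit-base-patch32 checkpoint from the transformers library, could look like this:

```python
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")  # assumed checkpoint
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder path
image_inputs = processor(images=image, return_tensors="pt")
text_inputs = processor(text=["a photo of a cat"], return_tensors="pt", padding=True)

with torch.no_grad():
    image_embedding = model.get_image_features(**image_inputs)  # shape: (1, 512)
    text_embedding = model.get_text_features(**text_inputs)     # shape: (1, 512)
print(image_embedding.shape, text_embedding.shape)
```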

Audio Embeddings

AI for audio has received less attention than AI for text or images. The most common use case for audio is speech-to-text for industries such as call centers, medical technology and accessibility. One popular open source model for speech-to-text is Whisper from OpenAI. The code below shows how to obtain vector embeddings from the speech-to-text model.
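
One way to do that with the transformers library, assuming the openai/whisper-base checkpoint and using a silent placeholder clip in place of real audio, is to mean-pool the encoder's output:

```python
import numpy as np
import torch
from transformers import WhisperProcessor, WhisperModel

processor = WhisperProcessor.from_pretrained("openai/whisper-base")  # assumed checkpoint
model = WhisperModel.from_pretrained("openai/whisper-base")
model.eval()

audio = np.zeros(16000, dtype=np.float32)  # one second of silent 16 kHz audio as a stand-in
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    encoder_outputs = model.encoder(inputs.input_features)
# Mean-pool the encoder's last hidden state into a single embedding for the clip
embedding = encoder_outputs.last_hidden_state.mean(dim=1).squeeze()
print(embedding.shape)  # torch.Size([512]) for whisper-base
```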

Video Embeddings

Video embeddings are more complex than audio or image embeddings. A multimodal approach is necessary when working with videos, as they include synchronized audio and images. One popular video model is the multimodal perceiver from DeepMind. This notebook tutorial shows how to use the model to classify a video.

To get the embeddings of the input, use outputs[1][-1].squeeze() from the code shown in the notebook instead of deleting the outputs. I highlight this code snippet in the autoencode function.

Storing, Indexing and Searching Vector Embeddings with Vector Databases

Now that we understand what vector embeddings are and how to generate them using various powerful embedding models, the next question is how to store and take advantage of them. Vector databases are the answer.

Vector databases like Milvus and Zilliz Cloud are purposely built for storing, indexing and searching across massive datasets of unstructured data through vector embeddings. They are also one of the most critical infrastructures for various AI stacks.

Vector databases usually use Approximate Nearest Neighbor (ANN) algorithms to calculate the spatial distance between the query vector and the vectors stored in the database. The closer two vectors are, the more relevant they are. The algorithm then finds the top k nearest neighbors and delivers them to the user.
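
For intuition, the brute-force version of that search is just a similarity computation over every stored vector; ANN indexes exist precisely so the database does not have to do this exhaustive scan at scale. Here is a small NumPy sketch with random stand-in vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
stored = rng.normal(size=(1000, 384))  # 1,000 stored document vectors
query = rng.normal(size=384)           # the query vector

# Cosine similarity is the dot product of L2-normalized vectors; closer means more relevant
stored = stored / np.linalg.norm(stored, axis=1, keepdims=True)
query = query / np.linalg.norm(query)
scores = stored @ query

k = 5
top_k = np.argsort(scores)[::-1][:k]   # indices of the k nearest neighbors
print(top_k, scores[top_k])
```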

Vector databases are popular in use cases such as LLM retrieval augmented generation (RAG), question and answer systems, recommender systems, semantic searches, and image, video and audio similarity searches.

To learn more about vector embeddings, unstructured data and vector databases, consider starting with the Vector Database 101 series.

Summary

Vectors are a powerful tool for working with unstructured data. Using vectors, we can mathematically compare different pieces of unstructured data based on semantic similarity. Choosing the right vector-embedding model is critical for building a vector search engine for any application.

In this post, we learned that vector embeddings are the internal representation of input data in a neural network. As a result, they depend highly on the network architecture and the data used to train the model. Different data types, such as images, text and audio, require specific models. Fortunately, many pretrained open source models are available for use. In this post, we covered models for the five most common types of data: images, text, multimodal, audio and video. In addition, if you want to make the best use of vector embeddings, vector databases are the most popular tool.

The post How to Get the Right Vector Embeddings appeared first on The New Stack.

]]>
Dev News: A ‘Nue’ Frontend Dev Tool; Panda and Bun Updates https://thenewstack.io/dev-news-a-nue-frontend-dev-tool-panda-and-bun-updates/ Sat, 16 Sep 2023 11:00:22 +0000 https://thenewstack.io/?p=22718261

A new minimalistic frontend development toolset called Nue.js launched Wednesday. It’s an alternative to React, Vue, Next.js, Vite, Svelte and

The post Dev News: A ‘Nue’ Frontend Dev Tool; Panda and Bun Updates appeared first on The New Stack.

]]>

A new minimalistic frontend development toolset called Nue.js launched Wednesday. It’s an alternative to React, Vue, Next.js, Vite, Svelte and Astro, said frontend developer and Nue.js creator Tero Piirainen when introducing it on Hacker News. It’s designed for websites and reactive user interfaces, he further explained in the Nue.js FAQ. The toolset has been open sourced under the MIT license.

“Nue ecosystem is a work-in-progress and today I’m releasing the tiny, but powerful core: Nue JS,” he wrote on Hacker News. “It’s an extremely small (2.3kb minzipped) JavaScript library for building user interfaces.”

Nue comes from the German word neue, which translates to “new” in English. It allows developers with knowledge of HTML, CSS and JavaScript to build server-side components and reactive interfaces. It’s like React or Vue, but without hooks, effects, props, or other abstractions, he added.

Nue vs React

React vs Nue (according to Nue)

The Nue.js website boasts that it can build user interfaces with 10x less code, presumably when compared with competitors (but that wasn’t specified). It’s designed to be part of an ecosystem, with plans to include:

  • Nue CSS for cascaded styling to replace CSS-in-JS, Tailwind and SASS;
  • Nue MVC, for building single-page apps;
  • Nue UI for creating reusable components for rapid UI development;
  • Nuemark, a markdown flavor for rich and interactive content; and
  • Nuekit for building websites and web apps with less code

Piirainen, who hails from Helsinki, has more than 25 years of experience building open source projects, technology products, and startups. Previous projects Piirainen has coded include Riot.js, Flowplayer, and jQuery Tools. He is currently the sole developer on Nue.js, but is seeking contributors.

Pandas Updated

Pandas, the popular Python library, released version 2.1.0 this week. Pandas is a data analysis and manipulation library built on top of NumPy, a library for scientific computing. This update includes a number of enhancements:

  • Avoid NumPy object type for strings by default;
  • DataFrame reductions preserve extension dtypes;
  • Copy-on-Write improvements;
  • A New DataFrame.map() method and support for ExtensionArrays; and
  • New implementation of DataFrame.stack()

Pandas also plans to make PyArrow a required dependency with pandas 3.0. Among the listed benefits are the ability to:

  • Infer strings. PyArrow backs strings by default, “enabling a significant reduction of the memory footprint and huge performance improvements,” the post stated.
  • Infer more complex dtypes with PyArrow by default, such as decimal, lists, bytes, structured data and more.
  • Improve interoperability with other libraries that depend on Apache Arrow.

The group is looking for feedback on the decision.

Node.js Release 20.6.0

Node.js released Node.js 20.6.0 last week, with the big change being that it now offers built-in .env file support for configuring environment variables. The change also allows developers to define NODE_OPTIONS directly in the .env file, eliminating the need to include it in the package.json, the release note stated.

There’s also a new API register on node:module to specify a file that exports module customization hooks, passes data to the hooks, and establishes communication channels with them.

“The ‘define the file with the hooks’ part was previously handled by a flag --experimental-loader, but when the hooks moved into a dedicated thread in 20.0.0 there was a need to provide a way to communicate between the main (application) thread and the hooks thread,” the release note stated. “This can now be done by calling register from the main thread and passing data, including MessageChannel instances.”

The JavaScript runtime is used to develop web applications, real-time applications, and command-line tools.

Bun Update Addresses Bugs

Bun 1.0 was released last week. This week, creator Jarred Sumner posted that Vercel has added Bun install support and Replit added Bun support. Ruby on Rails also added Bun support, and Laravel Sail now installs Bun by default. There’s also a TypeScript web framework that runs on Bun called Elysia.

All is not perfect in Bun world, however, and the bug reports are starting to roll in, with 1,027 bugs reported on the new runtime. To be fair, a good portion of those go back to Bun’s early days, but around 400 bugs have been filed since its 1.0 release. Bun v1.0.1, posted Tuesday, addresses some of these problems.

Free Prompt Engineering Course for Web Developers

Developer education platform Scrimba is offering a free prompt engineering course for web developers. Before taking the course, it’s recommended that developers have a basic understanding of HTML, CSS, JavaScript and React. It’s taught by Treasure Porth, a software engineer who has taught code since 2015. The three-hour course focuses on creating prompts, AI-assisted coding, and using AI large language models for job searches.

The post Dev News: A ‘Nue’ Frontend Dev Tool; Panda and Bun Updates appeared first on The New Stack.

]]>
3 Ways to Stop LLM Hallucinations https://thenewstack.io/3-ways-to-stop-llm-hallucinations/ Fri, 15 Sep 2023 17:15:23 +0000 https://thenewstack.io/?p=22718188

Large language models have become extremely powerful today; they can help provide answers to some of our hardest questions. But

The post 3 Ways to Stop LLM Hallucinations appeared first on The New Stack.

]]>

Large language models have become extremely powerful today; they can help provide answers to some of our hardest questions. But they can also lead us astray: They tend to hallucinate, which means that they give answers that seem right but aren’t.

LLMs hallucinate when they encounter queries that aren’t part of their training data set — or when their training data set contains erroneous information (this can happen when LLMs are trained on internet data, which, as we all know, can’t always be trusted). LLMs also don’t have memory. Finally, “fine tuning” is often regarded as a way to reduce hallucinations by retraining a model on new data — but it has its drawbacks.

Here, we’ll look at three methods to stop LLMs from hallucinating: retrieval-augmented generation (RAG), reasoning and iterative querying.

Retrieval-Augmented Generation

With RAG, a query comes into the knowledge base (which, in this case, is a vector database) as a semantic vector — a string of numbers.

The model then retrieves similar documents from the database using vector search, looking for documents whose vectors are close to the vector of the query.

Once the relevant documents have been retrieved, the query, along with these documents, is used by the LLM to summarize a response for the user. This way, the model doesn’t have to rely solely on its internal knowledge but can access whatever data you provide it at the right time. In a sense, it provides the LLM with “long-term memory” that it doesn’t possess on its own. The model can provide more accurate and contextually appropriate responses by including proprietary data stored in the vector database.
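
As a rough sketch of that flow, the snippet below uses the OpenAI Python client as it looked at the time of writing, plus a hypothetical vector_db.search helper standing in for whatever vector database you use.

```python
# A minimal RAG sketch. `vector_db` is a stand-in for your vector database
# client, and `search` is assumed to return the text of the closest documents.
import openai

def answer_with_rag(question: str, vector_db, k: int = 3) -> str:
    # 1. Embed the query and retrieve the k most similar documents.
    query_vector = openai.Embedding.create(
        model="text-embedding-ada-002", input=question
    )["data"][0]["embedding"]
    documents = vector_db.search(query_vector, limit=k)  # hypothetical helper

    # 2. Ask the LLM to answer using only the retrieved context.
    context = "\n\n".join(documents)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["choices"][0]["message"]["content"]
```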

An alternate RAG approach incorporates fact-checking. The LLM is prompted for an answer, which is then fact-checked and reviewed against data in the vector database. An answer to the query is produced from the vector database, and then the LLM uses that answer as a prompt to discern whether it’s related to a fact.

Reasoning

LLMs are good at a lot of things. They can predict the next word in a sentence, thanks to advances in “transformers,” which transform how machines understand human language by paying varying degrees of attention to different parts of the input data. LLMs are also good at boiling down a lot of information into a concise answer, and finding and extracting something you’re looking for from a large amount of text. Surprisingly, LLMs can also plan — they can gather data and plan a trip for you.

And maybe even more surprisingly, LLMs can use reasoning to produce an answer, in an almost human-like fashion. Because people can reason, they don’t need tons of data to make a prediction or decision. Reasoning also helps LLMs to avoid hallucinations. An example of this is “chain-of-thought prompting.”

This method helps models to break multistep problems into intermediate steps. With chain-of-thought prompting, LLMs can solve complex reasoning problems that standard prompt methods can’t (for an in-depth look, check out the blog post Language Models Perform Reasoning via Chain of Thought from Google).

If you give an LLM a complicated math problem, it might get it wrong. But if you provide the LLM with the problem as well as the method of solving it, it can produce an accurate answer — and share the reason behind the answer. A vector database is a key part of this method, as it provides examples of questions similar to this and populates the prompt with the example.
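
As a loose sketch of that idea, the prompt construction might look like this; the find_similar_solved_problem helper is hypothetical, standing in for a vector search over stored worked examples.

```python
# A sketch of chain-of-thought prompting: the prompt carries a worked example,
# retrieved from a vector database via a hypothetical helper, so the model
# spells out intermediate steps before giving its final answer.
def build_cot_prompt(question: str, vector_db) -> str:
    example = vector_db.find_similar_solved_problem(question)  # hypothetical
    return (
        "Solve the problem step by step, then state the final answer.\n\n"
        f"Example question: {example['question']}\n"
        f"Example reasoning: {example['reasoning']}\n"
        f"Example answer: {example['answer']}\n\n"
        f"Question: {question}\n"
        "Reasoning:"
    )
```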

Even better, once you have the question and answer, you can store it in the vector database to further improve the accuracy and usefulness of your generative AI applications.

 

There are a host of other reasoning advancements you can learn about, including tree of thought, least to most, self-consistency and instruction tuning.

Iterative Querying

The third method to help reduce LLM hallucinations is iterative querying. In this case, an AI agent mediates calls that move back and forth between an LLM and a vector database. This can happen multiple times iteratively in order to arrive at the best answer. An example of this is forward-looking active retrieval augmented generation, also known as FLARE.

You take a question and query your knowledge base for similar questions, which returns a series of them. Then you query the vector database with all of those questions, summarize the answer, and check whether the answer looks good and reasonable. If it doesn’t, repeat the steps until it does.
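
A rough sketch of that loop follows; the helpers for vector search, the LLM call and the answer check are hypothetical stand-ins for whatever components you use.

```python
# A rough sketch of the iterative querying loop described above. The helpers
# `find_similar_questions`, `retrieve_for`, `ask_llm` and `looks_reasonable`
# are hypothetical stand-ins for your vector search, LLM call and answer check.
def iterative_answer(question, vector_db, ask_llm, looks_reasonable, max_rounds=3):
    queries = [question]
    answer = None
    for _ in range(max_rounds):
        # Expand the query set with similar questions from the knowledge base.
        queries += vector_db.find_similar_questions(queries[-1])
        # Summarize an answer from everything retrieved for those queries.
        context = vector_db.retrieve_for(queries)
        answer = ask_llm(question, context)
        if looks_reasonable(question, answer):
            return answer
    return answer  # best effort after max_rounds
```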

Other advanced iterative querying methods include AutoGPT, Microsoft Jarvis and Solo Performance Prompting.

There are many tools that can help you with agent orchestration. LangChain is a great example that helps you orchestrate calls between an LLM and a vector database. It essentially automates the majority of management tasks and interactions with LLMs and provides support for memory, vector-based similarity search, advanced prompt-templating abstraction and a wealth of other features. It also helps and supports advanced prompting techniques like chain-of-thought and FLARE.
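
As one hedged example, a minimal retrieval QA chain over a small in-memory FAISS store might look like the sketch below; LangChain’s API evolves quickly, so this reflects the library roughly as it looked at the time of writing.

```python
# A minimal LangChain retrieval QA sketch, using the library's API roughly as
# it looked at the time of writing; details may differ in newer releases.
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import FAISS

texts = [
    "Visiting hours end at 8 p.m.",
    "Residents on Medicare are reviewed quarterly.",
]
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",  # stuff the retrieved documents into a single prompt
    retriever=vectorstore.as_retriever(),
)
print(qa.run("When do visiting hours end?"))
```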

Another such tool is CassIO, which was developed by DataStax as an abstraction on top of our Astra DB vector database, with the idea of making data and memory first-class citizens in generative AI. CassIO is a Python library that makes the integration of Cassandra with generative artificial intelligence and other machine learning workloads seamless by abstracting the process of accessing the database, including its vector search capabilities, and offering a set of ready-to-use tools that minimize the need for additional code.

Putting It All Together: SkyPoint AI

SkyPoint AI is a SaaS provider specializing in data, analytics and AI services for the senior care and living industry. The company leverages generative AI to enable natural and intuitive interactions between seniors, caregivers and software systems. By simplifying complex applications and streamlining the user experience, SkyPoint AI empowers seniors and caregivers to access information and insights effortlessly, which helps enhance care.

The company pulls from a wide variety of data that is both structured and unstructured to provide AI-generated answers to prompts like “How many residents are currently on Medicare?” said SkyPoint chief executive Tisson Mathew. This helps care providers make informed decisions quickly, based on accurate data, he said.

Getting to that point, however, was a process, Mathew said. His team started by taking a standard LLM and fine-tuning it with SkyPoint data. “It came up with disastrous results — random words, even,” he said. Understanding and creating prompts was something SkyPoint could handle, but it needed an AI technology stack to handle generating accurate answers at scale.

SkyPoint ended up building a system that ingested structured data from operators and providers, including electronic health-care record and payroll data, for example. This is stored in a columnar database; RAG is used to query it. Unstructured data, such as policies and procedures and state regulations, is stored in a vector database: DataStax Astra DB.

Mathew posed a question as an example: What if a resident becomes abusive? Astra DB provides an answer assembled from state regulations, the user’s context and a variety of different documents and vector embeddings, delivered in natural language that’s easy for a senior-care facility worker to understand.

“These are specific answers that have to be right,” Mathew said. “This is information an organization relies on to make informed decisions for their community and for their business.”

Conclusion

SkyPoint AI illustrates the importance of mitigating the risk of AI hallucinations; the consequences could be potentially dire without the methods and tools available to ensure accurate answers.

With RAG, reasoning and iterative querying approaches such as FLARE, generative AI — particularly when fueled by proprietary data — is becoming an increasingly powerful tool to help enterprises serve their customers efficiently and effectively.

Learn more about how DataStax helps you build real-time, generative AI applications.

The post 3 Ways to Stop LLM Hallucinations appeared first on The New Stack.

]]>
Salesforce Spreads NLP-Enabled Einstein AI over Most of Its Apps https://thenewstack.io/salesforce-spreads-nlp-enabled-einstein-ai-over-most-of-its-apps/ Fri, 15 Sep 2023 16:19:12 +0000 https://thenewstack.io/?p=22718304

SAN FRANCISCO — Salesforce’s Dreamforce conference, the largest tech show in Northern California in terms of attendees each year, this

The post Salesforce Spreads NLP-Enabled Einstein AI over Most of Its Apps appeared first on The New Stack.

]]>

SAN FRANCISCO — Salesforce’s Dreamforce conference, the largest tech show in Northern California in terms of attendees each year, this week was billed as the “largest AI show in the world.” How a designation like that is determined is unclear, but most products showcased and seminars staged did, in fact, touch on AI in one aspect or another.

Interesting sidebar: The term “AI” was questioned by a guest speaker, Williams-Sonoma CEO Laura Alber, who claims that “artificial” is a misnomer, because the intelligence in AI apps is real and based on data.

“I’ve been thinking about this a lot and I actually was pretty inspired by Parker (Salesforce co-founder Parker Harris) and his resetting of what artificial intelligence really means,” Alber said. “And the truth is that when it comes to customers, there can be nothing artificial about the experience. So as we use AI, we use it to inspire and improve the experience. We need to make sure that it’s completely authentic.

“So I wonder about a name change today. I wonder if it’s just called ‘intelligence,’ or as Parker said, ‘data intelligence.’ And I think we should stop thinking about the definition as artificial when it comes to the customer experience.”

So there you have it. We may be calling this phenomenal software running bots and “copilots” — as Salesforce describes its intelligence agents — something completely different in the near future if “AI” is generally found to be too limiting a term.

Salesforce’s Einstein AI had plenty of news in its orbit this week at the Dreamforce event that ended Thursday and was expected to have attracted 40,000-plus humans to downtown San Francisco — not to mention upward of 150,000 more online.

Einstein Copilot

This is Salesforce’s newly minted AI-powered application that enables line-of-business users to talk to their CRMs — literally. Users prompt the assistant through its NLP-enabled interface, and it then uses AI to generate trusted and accurate recommendations and guidance based on the customer’s database of information in the CRM.

Einstein Copilot works alongside employees to accomplish tasks of all types. “It’s very good at drafting emails,” Salesforce CEO/co-founder Marc Benioff said. “But it also does a lot more than that.” Copilot also can create PowerPoint presentations, marketing materials and pitch letters. It also can be queried on things such as:

  • “Give me tips for reducing average customer service call time”
  • “Calculate the best discounts to move old merchandise fast”
  • “Help me prepare for this meeting with all the right details”
  • “Create personalized product promotions”

With a tireless automated assistant like this, who needs a regular assistant? (Yes, that is a legitimate question now being asked all over the enterprise world.)

Copilot Builder

Copilot Builder empowers developers and line-of-business employees to customize their Copilots with prompts, unique skills and models. Users can build specific Copilots for individual customers and tasks. The platform enables admins, developers and IT teams to build a secure conversational AI assistant that answers questions, reasons and takes action to accomplish specific tasks for employees, Benioff said. Copilots are pre-trained on proprietary and secure CRM data to deliver relevant and trustworthy content, tailored to a brand’s unique identity and needs, Benioff said.

Einstein Trust Layer

This new security attribute provides configurability for managing sensitive data in addition to a compliance-ready audit trail to keep companies in control of their data. CFOs and CISOs alike will like those features.

The new Einstein Trust Layer is integrated into every Einstein Copilot by default. It is compliance-ready to allow companies to retain control and gain visibility of how their data is being used with their chosen AI models.

PII is protected by default, and the Trust Layer is fine-tuned to score and record every AI response for employees to understand whether a response is safe to use.

Defining ‘Decision Science’

“Decision Science” is a term that was brought up in presentations at Dreamforce, mostly in connection with the company’s new Data Cloud — which is a kind of home plate for all the company’s applications. Salesforce exec Janani Narayanan described this new sector of IT as focusing on using data and AI to drive decisions across an organization to maximize sales pipelines and ACV (annual contract value).

“Whether we are making a strategic decision with analytics or an automated decision with marketing automation, we strive to use data and AI responsibly to grow our business and improve the customer experience,” Narayanan told The New Stack.

Narayanan, whose full title is Salesforce Senior Director of Product Management in Digital Intelligence Automation, said that “before Data Cloud, Salesforce’s suite of products were built to deal with mostly transactional data. The addition of Data Cloud allowed us to complement our existing transactional database with the ability to take in massive volumes of data from across CRM, web, mobile and APIs.”

Last Dreamforce in San Francisco?

Dreamforce has been staged at Moscone Center since its inception in 2003 — except in 2020, the first year of COVID-19 (2021 and 2022 were hybrid in-person/online events). This year’s event may turn out to be the last one to be held in San Francisco’s largest convention facility; the jury is currently out on that topic. Salesforce will make a decision in the next few weeks as to where the huge conference will be held in the future.

This is because Benioff has been unhappy with his native city’s approach to finding additional housing for its average of 4,400 homeless people sleeping on the streets each day. San Francisco Mayor London Breed was in the first row for the opening keynote and was introduced by Benioff to the audience.

OpenAI Founder on ChatGPT Success

OpenAI CEO/co-founder Sam Altman had some stage time on Day 1. When asked by Benioff in a 1:1 interview at a packed Yerba Buena Theater what was “the biggest surprise” he saw out of all the hoopla around ChatGPT, Altman replied simply: “That it’s all working.”

During his 37-minute session, Altman said “When you start off on a scientific endeavor, you hope it will work; you let yourself dream it will work. We knew we had a lot to figure out, and figuring out any new science is always hard. We had conviction and a path laid out by our chief scientist, but how do you have conviction and then actually get it to work? There was pretty much a consensus in the outside world that it wasn’t going to work, but we secured the effort and made it work. That probably was the biggest surprise.”

The post Salesforce Spreads NLP-Enabled Einstein AI over Most of Its Apps appeared first on The New Stack.

]]>
Edge AI: How to Make the Magic Happen with Kubernetes https://thenewstack.io/edge-ai-how-to-make-the-magic-happen-with-kubernetes/ Thu, 14 Sep 2023 11:00:13 +0000 https://thenewstack.io/?p=22718140

Picture this: You walk into a Macy’s store, taking a break from the virtual world to indulge in the tactile

The post Edge AI: How to Make the Magic Happen with Kubernetes appeared first on The New Stack.

]]>

Picture this: You walk into a Macy’s store, taking a break from the virtual world to indulge in the tactile experience of brick-and-mortar shopping.

As you enter, a tall and broad-shouldered Android named Friday greets you warmly.

“Welcome back, Lucy. How are you doing today? Did you like the pink sweater you bought last week?”

Taken aback by the personal touch, you reply, “I’m doing well, thank you. I’m actually looking for a blouse to go with that sweater.”

Friday engages you in conversation, subtly inquiring about your preferences, until finally, the smart mirror built into its torso lights up. It scans your body and overlays selected clothing options on your avatar. Impressed?

Now, with a nod of your head, one of Friday’s minions retrieves your chosen attire and some matching accessories, and leads you to the next available dressing room.

Sci-fi has been dreaming (not always positively) about this kind of thing for decades — can you believe that Minority Report came out more than 20 years ago?

But at last the future is here. This is the evolving reality in retail, healthcare, agriculture, and a wide range of other sectors, thanks to rapid maturation of artificial intelligence (AI). From self-driving cars and other autonomous vehicles, to fruit-picking robots and AI medical imaging diagnoses, AI is now truly everywhere.

AI: The Magic Behind the Curtain

As enchanting as the customer experience in our Macy’s scenario seems, the technological orchestration behind it is no less impressive.

When we say “AI” we are probably talking about the seamless integration of so many different technologies:

  • Text-to-speech (TTS), to convert Friday’s script and product names into spoken audio.
  • Speech-to-text (STT), to recognize your responses and store them.
  • Object detection, to recognize apparel and customers.
  • Natural language processing, to extract meaning from your spoken responses.
  • Image generation to create sample outfits from prompts.

And of course behind it all, an up-to-date, vectorized database of available store merchandise and customer records.

When I said that AI has been maturing rapidly, I wasn’t kidding. These complex capabilities are now really in reach of every business, including yours, thanks to a blooming landscape of open source technologies and simple SaaS products.

I recently built a demo mimicking this interactive personal shopping experience. It’s not quite Friday, and it’s not in an android body — but it is in your browser for you to try out right here.

By the way: as someone who doesn’t code as much as I used to, I was still able to implement this entire stack in less than a day, and most of the time was wrestling with CSS to make rounded corners! The actual TTS, STT, and LLM portions were relatively easy, using the amazing OpenAI APIs.

Of course, in the real world, I probably wouldn’t want to send my corporate data into the bowels of OpenAI, which is why I strongly urge you to check out our open source LocalAI project, which enables you to run the whole lot on-prem.

AI’s True Home Is on the Edge

Today, training and deep learning of models happen at huge expense in clouds and data centers because they are so computationally intensive. Much of the core processing and many of the services we talk about above run on cloud services, where they can easily scale and be managed.

But for actual business-critical inferencing work in the real world, retail assistants and other AI workloads probably shouldn’t live in the cloud. They need to live at the edge. In fact, we believe the natural home for most AI workloads will be running at the edge of the network.

Why? Distributed computing puts processing power and compute resources closer to the user and the business process. It sidesteps several key challenges:

  • Connectivity: AI workloads can involve a huge amount of data. Yet, particularly in rural or industrial use cases, internet connections at edge locations are often intermittent or slow, presenting a major bottleneck. And 5G rollouts won’t fix that any time soon. If you process the data on the device, you don’t need to connect to the internet.
  • Latency: Even if you have connectivity to the DC or cloud, latency becomes a factor. A few hundred milliseconds might not sound like much, but in a real-time interaction, it’s an eternity. Running AI workloads at the edge enables real-time experiences with almost instantaneous response times.
  • Cloud costs: Hyperscalers charge to ingest your data, move it between availability zones, and extract it again. Across millions of AI interactions, these costs add up.
  • Data privacy: Many AI workloads will gather sensitive, regulated data. Do you really want your body measurements and shopping history floating around in the cloud? With edge computing, your personal sensitive data is processed locally right there on the edge server and that’s where it can stay, if compliance demands it.

But Edge Introduces Its Own Challenges…

Anyone who has tried deploying edge computing infrastructure at scale, whether on Kubernetes or another platform, will tell you that it’s hard work.

You’ll be confronted with challenges around deploying and onboarding hardware, on an ongoing basis. How can you get your Friday android booted and active when there’s no IT expert in the store to ninja the CLI?

You have to address security, when devices may be vulnerable to physical tampering. At minimum you need to consider encryption and a means of verifying device trust, at boot and beyond.

And you need to manage hardware and software stacks at scale, from monitoring to patching. This itself is a major challenge when considering the hardware limitations of small form-factor edge devices, and the intermittent connectivity between the device and management infrastructure back at HQ. You need an architecture with zero-risk updates and the ability to run effectively in an air-gap environment.

…and AI Amplifies the Challenges of the Edge

Adding AI into edge environments introduces further layers of complexity. It’s not just infrastructure you need to manage now, but also:

Models: Your data scientists have an overwhelming number of ready-made datasets and models to choose from on popular repositories like Hugging Face. How can you help them quickly try out and deploy these models — and then keep them up to date on a daily or weekly basis?

AI engines: Engines such as Seldon, BentoML and Kserve need constant maintenance, updates, and tuning for optimal performance. Updating these across many locations becomes tedious and error-prone.

AI models process incoming data in real time, turning raw inputs like voice commands or sensor readings into actionable insights or personalized interactions. AI engines such as Seldon, BentoML and Kserve run those AI models. Think of it like this: AI models are workloads, and the AI engine is the runtime in which these models are executed.

Solving the Challenges of Edge for Your AI Workloads

This is the problem space we’ve been attacking as we build Palette EdgeAI, announced today.

Palette EdgeAI helps you deploy and manage the complete stack, from the edge OS and Kubernetes infrastructure to the models powering your innovative AI apps.

Without diving too deep into the feature list, EdgeAI enables you to:

  • Deploy and manage your edge AI stacks to edge locations at scale, from easy hardware onboarding options to repeatable ‘blueprints’ that include your chosen AI engine.
  • Update your edge models frequently without risk of downtime, with easy repo integration and safe, version-controlled updates and rollbacks.
  • Secure critical intellectual property and sensitive data, with a complete security architecture from silicon to app, including immutability, secure boot, SBOM scans and air-gap mode.

The integration of AI and edge computing is not just an intriguing possibility; it’s a necessity for the next leap in technology and user experience. As we stand on this exciting frontier, one thing is clear: the future of shopping, healthcare, business and many other aspects of life will be smarter, faster, and more personalized than ever.

There are edge AI use cases coming for your industry, too. The growth forecasts are stratospheric: The global Edge AI software market is set to grow from $590 million in 2020 to $1.83 billion by 2026, according to MarketsandMarkets Research.

Ready to be a part of this future? Of course, you are. But the benefits of AI will only be in your grasp if you can tackle the challenges of the edge. That’s where we can help. So why not take a look at Palette EdgeAI and let us know what you think?

The post Edge AI: How to Make the Magic Happen with Kubernetes appeared first on The New Stack.

]]>
Free GPUs and AI Chips Are Available to Run AI https://thenewstack.io/free-gpus-and-ai-chips-are-available-to-run-ai/ Fri, 08 Sep 2023 13:00:40 +0000 https://thenewstack.io/?p=22717350

Free is great, especially for developers looking to run AI models on GPUs just hanging out in data centers waiting

The post Free GPUs and AI Chips Are Available to Run AI appeared first on The New Stack.

]]>

Free is great, especially for developers looking to run AI models on GPUs just hanging out in data centers waiting to be exploited at zero expense.

The free GPUs available in the cloud today are aging GPUs on their last legs, from Google and other cloud providers. The cloud providers have faster GPUs but are trying to prevent rot on the older GPUs by donating time on these chips to AI enthusiasts and researchers with the technical chops to run Python scripts.

Users can fire up a Jupyter Notebook, load the models, pull down code from GitHub repositories, power up the runtime, and let GPUs do the heavy lifting to produce the output.

Unfortunately, running tweaked AI models is not as easy as just firing up your laptop and double-clicking an icon. It may get there at some point, but until then, it still needs command-line tech savviness.

It is different from universal chatbot tools provided by OpenAI or Google, which include a user interface to make AI accessible to the masses.

Friendly user interfaces do not exist for open source models like Llama 2, which was recently released by Meta, though there are exceptions like Hugging Face’s HuggingChat, which runs on Llama 2.

Llama 2 is like AI raw material that developers can take and customize to their own requirements, and in most cases that will require a GPU available on cloud services, or graphics cards on a local PC.

Google Cloud is one of the few places on the Internet where you can find free GPUs and TPUs. The Google Colab website, which is primarily for researchers, has a free tier on its Jupyter Notebook where developers can choose one GPU — the T4 — on which to run inferencing.

The T4 is one of the earliest Nvidia chips optimized for artificial intelligence computing, but it is slow. Google previously provided the V100, which is an upgrade over the T4, under the free tier. But the V100 is not free anymore and is now offered under the paid tier, which starts at $9.99 for 100 compute units a month. Colab also offers paid access to Nvidia’s A100 GPU, which is faster and was used to train OpenAI’s GPT-3.5 and 4.0 models, as well as Google’s PaLM and PaLM 2.

Google Colab’s free tier involves putting up a script in the Colab notebook, which pulls the model and code from GitHub and other websites and is tuned to run on a GPU. Users can select the Nvidia T4 GPU in the notebook settings and run the script. The task is placed in a queue until a physical GPU becomes available in the Google Cloud.
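
Before kicking off a long-running script, it’s worth a quick sanity check that Colab has actually attached the GPU; a minimal sketch of that check looks like this:

```python
# Run in a Colab cell to confirm a GPU is attached before starting inference.
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # expect a Tesla T4 on the free tier
else:
    print("No GPU attached - check Runtime > Change runtime type.")
```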

The Colab creates a virtual desktop on Google to run the inferencing. Users need to make sure plenty of space is available on their Google Drive to store temporary code as the model executes. If the model is large, users will need to buy extra Google Drive storage.

Google Colab’s free tier utilizes older hardware when available. It also provides an option to run inferencing on CPUs, which will be much slower, or Google’s own TPUs. Google’s TPUs can be powerful, but the code needs to be specifically tuned to exploit TPU acceleration. TPUs are ASICs (application-specific integrated circuits) with fixed functionality.

By comparison, GPUs and CPUs can take on generic code, but the GPU provides faster results. Nonetheless, the Nvidia T4 GPU will take a long time to run, so feel free to go out, and get a meal or a pint, especially if AI compute requests are large in scope.

Google offers many AI models in its Vertex AI offering, where developers do not need to worry about the hardware. Users can build their own or run models already available in Google Cloud, or run their own prompts and get responses. Vertex is a one-stop-shop for AI that automatically assigns the hardware in the infrastructure, and users do not have to worry about runtimes or coding specifically to hardware.

Nvidia’s stranglehold on AI has left many AI chip companies in the dust, and developers are emerging as winners in this battle. Graphcore has opened its AI chips for developers to try models that include the most recent Llama-2 model with 7 billion parameters.

Developers can fire up Jupyter notebooks, load up the model from Hugging Face and execute it on Graphcore’s AI chips. However, developers need to have the technical knowledge to run code. Graphcore is providing access to its chips in the cloud for free to prove the real-world functionality of the latest large-language models. The chips also run other open source models, including Dolly 2.0 from Databricks.

Paperspace hosts Graphcore’s AI chips and provides access to free GPUs. The free GPU option is only the Quadro M4000, a workstation graphics chip that is eight years old and was not designed for AI. But beggars can’t be choosers — it is a GPU, it is free, and it is better than a CPU.

Cerebras, another AI chipmaker, is offering free access to its AI chips in its data centers. The company’s WSE-2, which is the size of a wafer and the largest chip in the world, is exclusively for training and is expensive to make, so accessing the chip involves jumping through a few hoops.

Cerebras has programs for developers, graduate students and faculty to access its chips, and those with interest can get in touch with the company.

“We are constantly putting models in the open-source community for them to use. Currently, the top performing 3B parameter model BTLM-3B, with more than 1 million downloads on HuggingFace, was developed by Cerebras,” a company spokeswoman said. Cerebras’ servers are programmed via PyTorch, the open source machine learning framework.

The cheapest form of AI remains running code locally on desktops with powerful GPUs or CPUs. It involves using tools like Oobabooga, an automated tool that can create ChatGPT-like chatbots on local PCs.

Users can load the Oobabooga tool to download and load up chatbots based on existing open models such as Llama 2 or MPT. The process can be complex for non-tech users as it involves installing Python, Nvidia’s developer tools such as CuDNN, and downloading tuned models to run locally.

The tool can run chatbots on CPUs, but it is best to have a gaming PC or laptop with Nvidia’s recent RTX 3090 or 4090 GPUs, which cost thousands of dollars. This doesn’t add up to running AI models for free, but it’s a great alternative if you already have a capable graphics card in your PC.

A tool called AUTOMATIC1111 is a similar tool to load and run text-to-image models locally on PCs.

The post Free GPUs and AI Chips Are Available to Run AI appeared first on The New Stack.

]]>
Candle: A New Machine Learning Framework for Rust https://thenewstack.io/candle-a-new-machine-learning-framework-for-rust/ Thu, 07 Sep 2023 14:06:03 +0000 https://thenewstack.io/?p=22717573

Artificial intelligence (AI) company Hugging Face recently released a new minimalistic, machine learning (ML) framework for Rust called Candle. It’s

The post Candle: A New Machine Learning Framework for Rust appeared first on The New Stack.

]]>

Artificial intelligence (AI) company Hugging Face recently released a new minimalistic, machine learning (ML) framework for Rust called Candle. It’s already attracted 7.8 thousand stars and 283 forks on GitHub.

Hugging Face has also rolled out a new coder tool called SafeCoder, which leverages StarCoder to allow organizations to create their own on-premise equivalent of GitHub Copilot. Earlier this year, the open source company released a JavaScript library that allows frontend and web developers to add machine learning capabilities to webpages and apps.

Hugging Face is investing in developer tools that will extend the reach of its 300,000 open source machine learning models, explained Jeff Boudier, head of product and growth at the startup.

“The big picture is that we’re developing our ecosystem for developers and seeing a lot of traction doing it,” Boudier told The New Stack on the heels of a $235 million fund raise that included support from Google, Amazon, Nvidia, Salesforce, AMD, Intel, IBM and Qualcomm. “Now with the support of all these great platforms and players, we can make sure that we have support for the community, whichever platform they use to run their machine learning models.”

Candle, the Rust ML Framework

ML workloads typically are written in Python and rely on full frameworks like PyTorch. These frameworks tend to be “very large, which makes creating instances on a cluster slow,” Hugging Face explained in Candle’s FAQ.

Candle is designed to support serverless inference, which is a way to run machine learning (ML) models without having to manage any infrastructure. Candle does this by allowing the deployment of lightweight binaries, the FAQ explained. Binaries are the executable files that contain the necessary files and resources for the application to run on a target environment.

Candle also allows developers to remove Python from production workloads. “Python overhead can seriously hurt performance, and the GIL is a notorious source of headaches,” the FAQ explained, referring to the Python GIL, or Global Interpreter Lock. The GIL offers benefits, but prevents CPython from achieving full multicore performance, according to cloud storage vendor Backblaze, which explained it in this blog post.

There are three Candle app demos for developers to check out.

SafeCoder: A Co-Pilot for Enterprises

One of the reasons why enterprises aren’t rushing to Copilot is that their code can go toward training the model, which means data out the door. Not surprisingly, organizations aren’t in a rush to embrace that.

SafeCoder will allow that code information to stay on-premise while still informing the model, Boudier explained.

Customers can build their own Code LLMs, fine-tuned on their proprietary codebase, using open models and libraries, without sharing their code with Hugging Face or any other third party, he said.

“With SafeCoder, Hugging Face delivers a containerized, hardware-accelerated Code LLM inference solution, to be deployed by the customer directly within the Customer secure infrastructure, without code inputs and completions leaving their secure IT environment,” wrote Boudier and Hugging Face tech lead Philipp Schmid in an Aug. 22 blog announcing the tool.

It’s based on StarCoder, an open source LLM alternative that can be used to build chatbots or AI coding assistants. StarCoder is trained on 80 different programming languages, he said, including Rust.

“StarCoder is one of the best open models to do code suggestion,” Boudier said. “StarCoder is an open, pre-trained model that has been trained on over a trillion tokens of commercially permissible open source project data. That’s a training data set that you can go look on the Hugging Face hub, you can see if any of your code is within the data set, so it’s really built with consent and compliance from the get-go.”

VMware is an early adopter of SafeCoder, he added.

“I can have the solution that’s uniquely tailored to my company and deployed in our infrastructure so that it runs within the secure environment,” Boudier said. “That’s the promise of a SafeCoder.”

The post Candle: A New Machine Learning Framework for Rust appeared first on The New Stack.

]]>
VMware’s Dev-Centered Approach to Pre-Trained Models and Generative AI https://thenewstack.io/vmwares-dev-centered-approach-to-pre-trained-models-and-generative-ai/ Wed, 06 Sep 2023 19:15:55 +0000 https://thenewstack.io/?p=22717533

Who knows anything about pre-trained models? Who can explain how to use generative AI? Developers, platform teams, infrastructure teams, security

The post VMware’s Dev-Centered Approach to Pre-Trained Models and Generative AI appeared first on The New Stack.

]]>

Who knows anything about pre-trained models? Who can explain how to use generative AI?

Developers, platform teams, infrastructure teams, security teams — no one is that advanced. Everyone knows just a smidgeon about using pre-trained models or generative AI. Few know that a pre-trained model consists of a large data set that gets trained with machine learning. (Disclosure: I referred to ChatGPT for that definition).

ChatGPT also will tell you that these larger data sets often get paired with natural language processing to find patterns in data and predict text, such as code. And generative AI? It’s far broader by definition. That’s what the Claude AI from Anthropic will tell you. It’s constructed using neural network techniques like adversarial training, diffusion modeling, knowledge integration and so on. The focus is on modeling high-dimensional distributions of data and efficiently sampling from them.

It’s that early in the cycle where I feel compelled to define terms at the start of a story. And really, how people are getting started with these new thought-provoking technologies is about all I could think about when hearing the AI-filled keynotes at VMware Explore late last month. So, I looked in the conference’s session catalog and found a Spring developer workshop about generative AI.

My curiosity: What do these Java developers know at all about something that is so new? Spring is the popular Java framework.

In front of a full conference room, a VMware engineer started with the absolute basics. Before ChatGPT, the developer trained the model, the VMware engineer said. The “P” in GPT stands for pre-trained. For level setting, GPT stands for “generative pre-trained transformers.”

Pre-Trained Models

He conveyed that the pre-trained model makes all the difference. Generative AI makes development something even a non-programmer can do. And it can help a veteran developer code in ways that lower the risk. Code generation becomes a far simpler task. It makes software development a universal capability.

“And so this kind of transforms AI into being more of a general developer tool than sort of a very specialized area,” the instructor said. “So consequently, it’s going to be ubiquitous.”

But it’s not just about the pre-trained model and generative AI.

“It’s about the software ecosystem around what you’re doing, right?” he said. “How do you get data in and out? How do you make this accessible over the web, and do enterprise integration patterns? Integrating different components and data is all super relevant to creating an effective solution. And of course, you know, the large ecosystem that Spring has in projects, meaning that we can quickly pull together very compelling solutions in the AI space by bringing these components together.”

He went on to explain concepts about generative AI. What’s a model? What are the benefits of ChatGPT? What are its limitations? He explained prompts and the rise of prompt engineering. He discussed how prompts and Spring intersect. He explained how tokens work. He listed the Java Client APIs from Azure OpenAI, OpenAI, and Google Bard.

Then, he introduced SpringAI, now in the Spring Experimental GitHub organization, inspired by the LangChain/LlamaIndex open source projects.

Developers Need the Basics

The introduction to generative AI speaks to what developers and operations teams need now: They want the basics. They need to learn the terminology, prompting tips, the role of queries, etc.

But how did VMware go about adopting pre-trained models? How has the company adapted to integrating pre-trained models? And what have they worked on to make pre-trained models part of its Tanzu application stack?

For example, VMware developed a hub, a graph database, said Purnima Padmanabhan, senior vice president and general manager at VMware. It builds on Aria, a graph-based data store with a GraphQL API originally built for managing IT resources. The graph data stores provide a visual overview of applications and environments.

The Path

The VMware team normalized its data. Normalization plays a crucial role when data gets used in a generative AI environment. It prevents issues such as bias and allows private data to be anonymized. Consistency issues in data formats get resolved, and deduplication means better efficiencies.

Once data is normalized, the team modeled the topology and did the near real-time discovery of environments. They used traditional AI and machine learning techniques to look at big data and synthesize information out of it.

But when conversational AI exploded last December, Padmanabhan said they realized that the data normalization would pay off.

With normalization done, VMware applied a conversational interface to Tanzu Intelligent Services, which VMware launched at Explore, its annual user conference held last month in Las Vegas. It uses conversational AI to ask questions about the application, what’s problematic, what node is causing problems, etc.

“What kind of problem is it?” Padmanabhan said, characterizing what they could deduce. “Is it cost, or performance or security? And if it’s a cost problem, what do I need to do? How do I need to right size it? It pulls data not only from the database that we have and the queries that we have, but also from documentation from other sources.”

Challenges

And herein lies the problem with generative AI adoption. Most companies need a data infrastructure in place to even start using generative AI. They hire machine learning technologists, often before prepping their enterprise environments.

Chris Albon, director of machine learning at the Wikimedia Foundation, wrote on X, the company formerly known as Twitter:

“A mistake I see WAY too many times is hiring some expensive ML expert before having the infrastructure there to support them. Then they spin their wheels being their own data engineer, data scientist, MLOps engineer, etc. etc. until they quit because they aren’t training models.”

“When people ask me, ‘How do I start with AI?’ you first need to know what is the problem you’re solving,” Padmanabhan said. “What is it that you want to simplify? What is the data that you’re going to look at?”

And that simplification of a problem takes time, Padmanabhan said. “It’s almost boring work that has to be done.”

Step one means identifying the problem, Padmanabhan said. Two, identify the correct data set. Third, find the model applicable to the data set. If a pre-trained model needs fine-tuning, that’s much better than building a new model. Most of the time, a user does not need to build the model. They may fine-tune a pre-trained model.

Padmanabhan said that fine-tuning a model becomes a way to consider data privacy. A model gets trained through the formulation of a query.

“If I can train the model on how to formulate a query, I don’t need to give any data,” Padmanabhan said. “I just want to say, ‘formulate this query.'”

VMware developed accelerators that connect with pre-trained models, Padmanabhan said. She said to think of it as a catalog of applications, a curated catalog, that integrates with pre-trained models “so it’s easy for developers to create their AI-enabled applications through the same process through the same platform that they do their other applications.”

The problem statement becomes, “What am I actually asking the model to do? And how can I give you the minimum data set possible?”

Prompt engineering gets done up front, but the query formulation determines how much data needs to go into the fine-tuning of the model. VMware managed that process for its overall Kubernetes-based Tanzu application platform and its application services, which are part of its Cloud Foundry business.

“Okay, now that I got all these pieces, how do I do this at scale?” Padmanabhan said. “Because now I know, I have to have consistent APIs, I have to have a common workflow for ML workflow and ML flow engines.”

Cloud Foundry works with a concept called tiles, which provides a systematic approach. Applications get packaged as tiles that allow developers to integrate third-party software. Padmanabhan said VMware developed models based on the tiles of popular applications to make them available.

VMware uses a template process for its app accelerator, Padmanabhan said. For example, a developer can tell the accelerator what programming language to use. The idea: use an API to connect the accelerator to a service such as SpringAI. The accelerator may then have a way to connect to models through common APIs — the solution: a basic accelerator that allows for adjustments and fine-tuning.

BigCode

In its considerations for software engineers, VMware chose to work with open source alternatives, which touches on why VMware chose Hugging Face as a partner. Hugging Face provides a community for AI/ML projects with a deep focus on open source.

Together, Hugging Face and VMware announced SafeCoder, a coding assistant based upon StarCoder, which Hugging Face developed with ServiceNow through an open source project called BigCode, which focuses on responsible training of large language models for coding applications.

SafeCoder, designed for the enterprise, is built with security and privacy as a first priority. It works on VMware infrastructure, whether on-premises, in the cloud or hybrid. It also works with “open source projects like Ray and Kubeflow to deploy AI services adjacent to their private datasets.”

The VMware team talked more about responsibility than any tech conference I’ve attended. But it makes sense considering how proprietary models pose legal and compliance issues.

“For VMware, our source code is our business,” said Chris Wolf, vice president of VMware AI Labs. “And it’s very important for us to make sure that we’re maintaining privacy and control of that data because that’s our business.”

VMware tuned SafeCoder against its private source code. They looked at the code from their top-performing software engineers and their code commits, which they used as the dataset in the model. It gave them a quality code base for a new model, resulting in automation they could do in their style.

In a pilot, 80 VMware software engineers used SafeCoder, and more than 90 percent want to continue using it, Wolf said. Taking an open source route means more control over the direction of software development as they are not beholden to proprietary technology.

The post VMware’s Dev-Centered Approach to Pre-Trained Models and Generative AI appeared first on The New Stack.

]]>
Is It too Early to Leverage AI for WebAssembly? https://thenewstack.io/is-it-too-early-to-leverage-ai-for-webassembly/ Wed, 06 Sep 2023 17:28:24 +0000 https://thenewstack.io/?p=22717258

AI and its application to IT, software development, and operations are just beginning to take hold, portending profound implications and

The post Is It too Early to Leverage AI for WebAssembly? appeared first on The New Stack.

]]>

AI and its application to IT, software development, and operations are just beginning to take hold, portending profound implications and disruptions for how humans’ roles will evolve, especially in the near and long term.

On a smaller scale, WebAssembly represents a technology that is generating significant hype while demonstrating its viability. However, widespread commercial adoption has yet to be realized, mainly due to a lack of standardization for the final endpoint. Meanwhile, at least one vendor, Fermyon, believes that applying AI to WebAssembly is not premature at this stage.

So, how can AI potentially help Wasm’s development and adoption, or is it too early to tell? As Angel M De Miguel Meana, a staff engineer at VMware’s Office of the CTO, noted, the AI ecosystem has evolved drastically during the last year, since the introduction of ChatGPT brought AI to the forefront of software development. Meanwhile, “WebAssembly provides a solid base to run inference not only on the server, but in many different environments like browsers and IoT devices,” De Miguel Meana said. “By moving these workloads to end-user devices, it removes the latency and avoids sending data to a centralized server, while being able to work on the type of heterogeneous devices often found at the edge… Since the Wasm ecosystem is still emerging, integrating AI in early stages will help to push new and existing AI related standards. It is a symbiotic relationship.”

Perfect Pairing

“We started Fermyon with the goal of building a next-wave serverless platform. AI is very clearly part of this next wave. In our industry, we frequently see revolutionary technologies grow up together: Java and the web, cloud and microservices, Docker and Kubernetes,” Matt Butcher, co-founder and CEO of Fermyon Technologies, told The New Stack. “WebAssembly and AI are such a perfect pairing. I see them growing up (and growing old) together.”

“Baking” AI models, such as LLMs [large language models] or transformers, into the WebAssembly runtime, is the logical next step to accelerate the adoption of WebAssembly, Torsten Volk, an analyst for Enterprise Management Associates (EMA), told The New Stack. Similar to calling, e.g. a database service via API, compiled WebAssembly apps (binaries) could then send their API request to the WebAssembly runtime that in turn would relay this call to the AI model and pipe the model-response back to the originator, Volk said.

“These API requests will become very powerful once we have a common component model (CCM) that provides developers with one standardized API that they can use to access databases, AI models, GPUs, messaging, authentication, etc. The CCM would then let developers write the same code to talk to an AI model (e.g. GPT or Llama) on any kind of server in the data center, cloud or even at edge locations, as long as this server has sufficient hardware resources available,” Volk said. “This all boils down to the key question of when industry players will agree on a CCM. In the meantime, WebAssembly clouds such as Fermyon can leverage WebAssembly to make AI models portable and scalable within their own cloud infrastructure where they do not need a CCM and pass on some of the savings to the customer.”

Solving the Problem

Fermyon argues the economics already make the case. As Butcher noted, developers tasked with building and running enterprise AI apps on LLMs like Llama 2 face a 100x compute expense for access to GPUs at $32 per instance-hour and up. Alternatively, they can use on-demand services, but then they experience abysmal startup times. Either way, it becomes impractical to deliver enterprise AI apps affordably.

Fermyon Serverless AI has solved this problem by offering sub-second cold start times, over 100x faster than other on-demand AI infrastructure services, Butcher said. This “breakthrough” is made possible by the serverless WebAssembly technology powering Fermyon Cloud, which is architected for sub-millisecond cold starts and high-volume time-slicing of compute instances, an approach that has proven to improve compute density by a factor of 30x, he said. Extending this runtime profile to GPUs makes Fermyon Cloud the fastest AI inferencing infrastructure service, Butcher said.

Such an inference service is “very interesting” as the typical WebAssembly app consists of only a few megabytes, while AI models are a lot larger than that, Volk said. This means they would not be able to start up quite as fast as traditional WebAssembly apps. “I assume that Fermyon has figured out how to use time slicing for providing GPU access to WebAssembly apps so that all of these apps can get the GPU resources they need by reserving a few of these time slices via their WebAssembly runtime,” Volk said. “This would mean that a very large number of apps could share a small number of expensive GPUs to serve their users on-demand. This is a little bit like a time-share, but without being forced to come to the lunchtime presentation.”

Getting started using Spin.

So, how would the user interact with Serverless AI? With Fermyon’s Serverless AI, there are no REST APIs or external services; it is built directly into Fermyon’s Spin locally and into Fermyon Cloud, Butcher explained. “Anywhere in your code, you can simply pass a prompt into Serverless AI and get back a response. In this first beta, we’re including LLaMa2’s chat model and the recently announced Code Llama code-generating model,” Butcher said. “So, whether you’re summarizing text, implementing your own chatbot, or writing a backend code generator, Serverless AI has you covered. Our goal is to make AI so easy that developers can right away begin leveraging it to build a new and jaw-dropping class of serverless apps.”

Big Implications

Because workloads run as WebAssembly, Fermyon Serverless AI can assign a “fraction of a GPU” to a user application “just in time” to execute an AI operation, Fermyon CTO and co-founder Radu Matei wrote in a blog post. “When the operation is complete, we assign that fraction of the GPU to another application from the queue,” Matei wrote. “And because the startup time in Fermyon Cloud is milliseconds, that’s how fast we can switch between user applications that are assigned to a GPU. If all GPU fractions are busy crunching data, we queue the incoming application until the next one is available.”

This has two big implications, Matei wrote. First, users don’t have to wait for a virtual machine or container to start and for a GPU to be attached to it. Also, “we can achieve significantly higher resource utilization and efficiency for our infrastructure,” Matei wrote.
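Matei’s description amounts to a small scheduling loop: GPU fractions form a fixed pool, an application borrows one just long enough to run a single AI operation, and arrivals that find the pool empty wait in a queue until a fraction is released. The following is a minimal, self-contained Python sketch of that idea; the slice count and function names are invented for illustration, not Fermyon’s actual scheduler.

```python
# Toy "fractional GPU, just in time" scheduler: a fixed pool of GPU slices,
# apps borrow a slice only for the duration of one AI operation, and excess
# arrivals wait in a queue. Illustrative only, not Fermyon's implementation.
from collections import deque

GPU_SLICES = 2                      # pretend one GPU is shared as two slices
free_slices = deque(range(GPU_SLICES))
waiting = deque()                   # apps that arrived while all slices were busy


def start(app: str) -> None:
    if free_slices:
        slice_id = free_slices.popleft()
        print(f"{app}: assigned GPU slice {slice_id} just in time")
        # ... run the inference here, then call finish(app, slice_id) ...
    else:
        print(f"{app}: all slices busy, queued")
        waiting.append(app)


def finish(app: str, slice_id: int) -> None:
    # When the operation completes, the slice goes straight back to the pool
    # and the next queued app (if any) starts within milliseconds.
    print(f"{app}: done, releasing slice {slice_id}")
    free_slices.append(slice_id)
    if waiting:
        start(waiting.popleft())


if __name__ == "__main__":
    for name in ["app-a", "app-b", "app-c"]:
        start(name)                 # app-c has to queue
    finish("app-a", 0)              # freeing a slice immediately starts app-c
```

The two implications Matei lists fall straight out of this loop: no application waits for a VM or container plus GPU attachment, and the same small set of GPUs stays busy serving many applications.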

Specific features of Serverless AI that Fermyon communicated include:

  • This is a developer tool and hosted service for enterprises building serverless applications that include AI inferencing using open source LLMs.
  • Thanks to our core WebAssembly technology, our cold startup times are 100x faster than competing offerings, cutting down from minutes to under a second. This allows us to execute hundreds of applications in the same amount of time (and with the same hardware) that today’s services use to run one.
  • We provide a local development experience for building and running AI apps with Spin and then deploying them into Fermyon Cloud for high performance at a fraction of the cost of other solutions.
  • Fermyon Cloud uses AI-grade GPUs to process each request. Because of our fast startups and efficient time-sharing, we can share a single GPU across hundreds of apps.
  • We’re launching the free tier private beta.

Big Hopes

However, there is certainly a way to go before Wasm and AI concurrently reach their potential. At WasmCon 2023, Michael Yuan, CEO and co-founder of Second State, the company behind the WasmEdge runtime project for Wasm, discussed some of the work in progress. He covered the topic with De Miguel Meana during their talk “Getting Started with AI and WebAssembly.”

“There’s a lot of ecosystem work that needs to be done in this space [of AI and Wasm]. For instance, having inferences alone is not sufficient,” Yuan said. “The million-dollar question right now is, when you have an image and a piece of text, how do you convert that into a series of numbers, and then after the inference, how do you convert those numbers back into a usable format?” 

Preprocessing and post-processing are among Python’s greatest strengths today, thanks to the availability of numerous libraries for these tasks, Yuan said. Incorporating these preprocessing and post-processing functions into Rust functions would be beneficial, but it requires more effort from the community to support additional modules. “There is a lot of potential for growth in this ecosystem,” Yuan said.
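Yuan’s “million-dollar question” is essentially pre- and post-processing: turning text or images into arrays of numbers before inference and turning the model’s output numbers back into something usable afterward. The snippet below shows how little code this takes in Python with an existing library (Hugging Face’s `transformers`; the `gpt2` tokenizer is an arbitrary public choice for illustration), which is the kind of ergonomics Yuan argues the Rust and Wasm ecosystems still need to build out.

```python
# Pre/post-processing around inference: text -> token IDs -> (model) -> text.
# Requires `pip install transformers`; "gpt2" is just a convenient public
# tokenizer chosen for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Preprocessing: the human-readable input becomes a series of numbers.
token_ids = tokenizer.encode("Is it too early to leverage AI for WebAssembly?")
print(token_ids)  # a list of integer token IDs

# ... a model would consume these IDs and emit new IDs here ...

# Post-processing: the model's numbers become usable text again.
print(tokenizer.decode(token_ids))
```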


The post Is It too Early to Leverage AI for WebAssembly? appeared first on The New Stack.

]]>
What Can You Expect from a Developer Conference These Days? https://thenewstack.io/what-can-you-expect-from-a-developer-conference-these-days/ Wed, 06 Sep 2023 14:16:39 +0000 https://thenewstack.io/?p=22717375

What can you expect from a developer conference these days? Two topics in particular: the developer experience and AI. Developers

The post What Can You Expect from a Developer Conference These Days? appeared first on The New Stack.

]]>

What can you expect from a developer conference these days? Two topics in particular: the developer experience and AI.

Developers spend much of their time not coding, said Ivan Burazin, Chief Development Experience Officer at Infobip, in a recent discussion on The New Stack Makers ahead of the Shift Conference in Zadar, Croatia. Burazin started the conference and later sold it to Infobip, a cloud communications company.

When thinking about the developer experience, Burazin cited how developers waste about 50% to 70% of their productive time not coding. Productive time here means what is left after vacation, meetings and other obligations are subtracted.

But that core time keeps getting eaten away by non-coding work. A developer has to wait for an environment to spin up; tests take away from core time, as do builds. Setting up a development environment takes 2.7 hours a week, tests take over three hours a week, and builds take almost four hours a week. Add it up and roughly 10 hours, about a quarter of a standard working week, melts away before any feature code gets written.

Developer experience thus becomes a root concern, one that divides into an internal and an external realm. Externally, what matters is the experience of the developers who are a company’s customers. Internally, it is a matter of velocity, meaning the amount of code a developer deploys.

“But at the same time, the experience [for] developers has to be better or more enjoyable because, in a sense, they will actually be able to produce more, faster,” Burazin said.

This all comes back to the overall developer experience, something Burazin pays close attention to with Shift, coming up Sept. 18-19.

At Shift, the conference has talks on six stages, Burazin said. One stage will focus on the developer experience from an internal and external perspective.

The developer experience topic is relatively new, but even newer is AI, which will be the focus of another stage at Shift.

But what should be covered in a discussion about AI if there are few real experts to move the conversation forward?

Burazin said it’s more about how people can use AI to build a product, service or company. Every company, he suggested, will become an AI company in the future.

“How can you build something utilizing AI and that’s how we look at setting up themes on that stage,” Burazin said.

The post What Can You Expect from a Developer Conference These Days? appeared first on The New Stack.

]]>
D-Wave Suggests Quantum Annealing Could Help AI https://thenewstack.io/d-wave-suggests-quantum-annealing-could-help-ai/ Mon, 04 Sep 2023 10:00:24 +0000 https://thenewstack.io/?p=22716834

The effect of quantum computing on Artificial Intelligence could be as understated as it is profound. Some say quantum computing

The post D-Wave Suggests Quantum Annealing Could Help AI appeared first on The New Stack.

]]>

The effect of quantum computing on Artificial Intelligence could be as understated as it is profound.

Some say quantum computing will be necessary to achieve artificial general intelligence. Certain expressions of the paradigm, such as quantum annealing, are inherently probabilistic and well suited to machine learning. The most pervasive quantum annealing use cases center on optimization and constraint problems, which have traditionally involved non-statistical AI approaches like rules, symbols, and reasoning.

When one considers that there are now cloud options for accessing this form of quantum computing, replete with resources for making it enterprise-applicable across any number of deployments, without buying expensive hardware, one thing becomes unmistakably clear.

“With quantum computing, a lot of times we’re talking about what will it be able to do in the future,” observed Mark Johnson, D-Wave SVP of Quantum Technologies and Systems Products. “But no, you can do things with it today.”

Granted, not all those things involve data science intricacies. Supply chain management and logistics are just as easily handled by quantum annealing technologies. But when these applications are considered in tandem with some of the more progressive approaches to AI enabled by quantum annealing, their value to organizations across verticals becomes apparent.

Understanding Quantum Annealing

Quantum annealing is the variety of quantum computing in which the machine solves a specific problem, even an NP-hard one, by settling into its lowest energy state. Thus, whether users are trying to select features for a machine learning model or to find the optimum route for a fleet of grocery store delivery drivers, quantum annealing approaches deliver these solutions when the lowest energy state is reached. “Annealing quantum computing is a heuristic probabilistic solver,” Johnson remarked. “So, you might end up with the very best answer possible or, if you don’t, you will end up with a very good answer.”
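Concretely, the "lowest energy state" is the minimum of an Ising objective (the textbook formulation, not specific to any one D-Wave product): the machine searches for spin values $s_i \in \{-1, +1\}$ that minimize

$$E(\mathbf{s}) = \sum_i h_i\, s_i + \sum_{i<j} J_{ij}\, s_i\, s_j,$$

where the biases $h_i$ and couplings $J_{ij}$ encode the problem. Finding the exact global minimum is NP-hard in general, which is why a heuristic, probabilistic solver that lands in or near the lowest energy state is still valuable.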

Quantum annealing’s merit lies in its ability to supply these answers at an enormous scale — such as that required for a defense agency’s need to analyze all possible threats and responses for a specific location at a given time. It excels in cases in which “you need to consider many, many possibilities and it’s hard to wade through them,” Johnson mentioned. Classical computational models consider each possibility one at a time for such a combinatorial optimization problem.

Quantum annealing considers those possibilities simultaneously.

Statistical AI

The data science implications of this computational approach are almost limitless. One developer resource D-Wave has made available via the cloud is a plug-in for Ocean, its SDK and suite of open source Python tools, that integrates with scikit-learn to improve feature selection. It supports “recognizing in a large pattern of data, can I pick out features that correlate with certain things and being able to navigate that,” Johnson remarked. “I understand it ends up mapping into an optimization problem.” The statistical aspects of quantum annealing are suitable for other facets of advanced machine learning, too.
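As a hedged illustration of the workflow Johnson describes, the sketch below drops quantum-annealing-backed feature selection into an ordinary scikit-learn pipeline. It assumes D-Wave’s scikit-learn plug-in (`dwave-scikit-learn-plugin`) is installed and a Leap API token is configured; the `SelectFromQuadraticModel` class and `num_features` parameter are taken from that plug-in’s documentation as I understand it, so treat the exact names as an assumption.

```python
# Sketch: quantum-backed feature selection inside a normal scikit-learn flow.
# Assumes `pip install dwave-scikit-learn-plugin scikit-learn` and a configured
# D-Wave Leap API token; class and parameter names are an assumption.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from dwave.plugins.sklearn import SelectFromQuadraticModel

# A toy dataset with many redundant or uninformative columns.
X, y = make_classification(n_samples=500, n_features=60, n_informative=10, random_state=0)

# The selector maps "pick k features that correlate with the target but not
# with each other" onto an optimization problem sent to D-Wave's solvers.
selector = SelectFromQuadraticModel(num_features=10)
X_reduced = selector.fit_transform(X, y)

model = LogisticRegression(max_iter=1000).fit(X_reduced, y)
print(X_reduced.shape, model.score(X_reduced, y))
```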

According to Johnson, because of its “probabilistic nature, one of the interesting things that quantum annealing does is not just picking the best answer or a good answer, but coming up with a distribution, a diversity of answers, and understanding the collection of answers and a little about how they relate to each other.” This quality of quantum annealing is useful for numerous dimensions of machine learning including backpropagation, which is used to adjust a neural network’s parameters while going from the output to the input. It can also reinforce what Johnson termed “Boltzmann sampling,” which involves randomly sampling combinatorial structures.
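In the idealized picture, that "distribution, a diversity of answers" is a Boltzmann distribution: a configuration $\mathbf{s}$ is returned with probability

$$P(\mathbf{s}) = \frac{e^{-E(\mathbf{s})/T}}{\sum_{\mathbf{s}'} e^{-E(\mathbf{s}')/T}},$$

so lower-energy (better) answers are the most likely but not the only ones sampled, and the effective temperature $T$ governs how much diversity shows up. Real hardware only approximates this distribution, so the formula should be read as the idealized model rather than a hardware specification.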

Cloud, Hybrid Framework

There are considerable advantages to making quantum annealing available through the cloud. The same cloud architecture is just as viable for accessing what Johnson called the “gate model” type of quantum computing, which is primed for factoring numbers and relevant to “RSA encryption schema,” Johnson confirmed. Organizations can avail themselves of quantum annealing in D-Wave’s cloud platform. Moreover, they can also utilize hybrid quantum and classical computing infrastructure, which is becoming ever more relevant in modern quantum computing conversations. “You would just basically be using both of them together for the part of the problem that’s most efficient,” Johnson explained.

In addition to the ready availability of each of these computational models, D-Wave’s cloud platform furnishes documentation for a range of example use cases for common business problems across industries. There’s also an “integrated developer environment you can pull up that already has in it Ocean, our open source suite of tools, which help the developer interface with the quantum computer,” Johnson added. Examples include the ability to write code in Python. When organizations find documentation in the cloud about a previous use case that’s similar to theirs, “You can pull up sample code that will… use the quantum computer to solve that problem in your integrated developer environment,” Johnson noted.
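For context on what the Ocean sample code Johnson mentions tends to look like, here is a minimal, hedged sketch: a toy QUBO handed to a Leap hybrid solver. It assumes the `dwave-ocean-sdk` is installed and a Leap API token is configured; the problem itself is invented for illustration and is not one of D-Wave’s published examples.

```python
# Toy end-to-end Ocean workflow: build a small QUBO and hand it to a hybrid
# quantum-classical solver in D-Wave's cloud. Assumes `pip install
# dwave-ocean-sdk` and a configured Leap API token.
import dimod
from dwave.system import LeapHybridSampler

# A tiny QUBO: reward picking x0 or x1 individually, penalize picking both.
qubo = {("x0", "x0"): -1.0, ("x1", "x1"): -1.0, ("x0", "x1"): 2.0}
bqm = dimod.BinaryQuadraticModel.from_qubo(qubo)

sampler = LeapHybridSampler()        # routes the problem to Leap's hybrid solvers
sampleset = sampler.sample(bqm)

# The lowest-energy sample found is the proposed answer.
print(sampleset.first.sample, sampleset.first.energy)
```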

Quantum Supremacy

That sample code provides an excellent starting point for developers building applications that apply quantum computing, or hybrid quantum and classical methods, to an array of business problems in financial services, manufacturing, life sciences and more. It’s just one of the many benefits of accessing quantum computing through the cloud. The appeal of quantum annealing, of course, lies in its ability to shorten the time required to solve combinatorial optimization problems.

As the ready examples of quantum solutions across these verticals (the vast majority of which involve quantum annealing) show, such issues “are, the harder we look, ubiquitous throughout business,” Johnson said. The data science utility of quantum annealing for feature selection, Boltzmann sampling and backpropagation is equally horizontal, and it may prove influential to the adoption rate of this computational approach.

The post D-Wave Suggests Quantum Annealing Could Help AI appeared first on The New Stack.

]]>