Lambda Unveils ‘Inference-as-a-Service’ API, Asserts Lowest Prices in AI Sector

December 12, 2024

Meet Lambda Labs, a San Francisco-based company that has been at the forefront of the AI industry for the past 12 years. Known for providing on-demand graphics processing units (GPUs) to machine learning researchers and AI model builders, Lambda Labs is a trusted name in the field.

Today, they’re taking a giant leap forward with the introduction of the Lambda Inference API. This new service, touted as the most affordable of its kind, allows businesses to deploy AI models and applications without the hassle of procuring or maintaining compute.

This launch enhances Lambda’s existing services, which include providing GPU clusters for training and fine-tuning machine learning models.

“We offer a fully verticalized platform, which allows us to pass on significant cost savings to our users. Unlike other providers like OpenAI, we don’t have rate limits that hinder scaling, and you can get started without having to speak to a salesperson,” said Robert Brooks, Lambda’s Vice President of Revenue, during a video call interview with VentureBeat.

Brooks further explained that developers can easily get started by visiting Lambda’s new Inference API webpage, generating an API key, and getting up and running in less than 5 minutes.

With support for cutting-edge models like Meta’s Llama 3.1, Nous’s Hermes-3, and Alibaba’s Qwen 2.5, Lambda’s Inference API is a game-changer for the machine learning community. Here’s a glimpse of the full list of supported models:

  • deepseek-coder-v2-lite-instruct
  • dracarys2-72b-instruct
  • hermes3-405b
  • hermes3-405b-fp8-128k
  • hermes3-70b
  • hermes3-8b
  • lfm-40b
  • llama3.1-405b-instruct-fp8
  • llama3.1-70b-instruct-fp8
  • llama3.1-8b-instruct
  • llama3.2-3b-instruct
  • llama3.1-nemotron-70b-instruct
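As a rough illustration of the quick-start flow Brooks describes (generate an API key, then call the service), the sketch below assembles a chat-completion request for one of the listed models. The endpoint URL, header names, and request schema here are assumptions modeled on common OpenAI-compatible inference APIs, not details confirmed from Lambda's documentation.

```python
import json
import os

# Assumption: an OpenAI-compatible chat-completions endpoint.
# This URL is illustrative only, not taken from Lambda's docs.
API_URL = "https://api.example-inference.com/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> tuple[dict, str]:
    """Assemble headers and a JSON body for a chat-completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # key generated on the web dashboard
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,  # e.g. one of the model names listed above
        "messages": [{"role": "user", "content": prompt}],
    })
    return headers, body

headers, body = build_request(
    "llama3.1-8b-instruct",
    "Summarize this article in one sentence.",
    os.environ.get("API_KEY", "demo-key"),
)
# Sending the request would then be a single POST, e.g.:
#   requests.post(API_URL, headers=headers, data=body)
```

Nothing here is Lambda-specific beyond the model name; the point is that an API-key-plus-HTTP workflow needs no infrastructure on the caller's side.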

The pricing is as attractive as the service itself, starting at just $0.02 per million tokens for smaller models like Llama-3.2-3B-Instruct and going up to $0.90 per million tokens for larger, state-of-the-art models such as Llama 3.1-405B-Instruct.
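At per-million-token rates, usage costs are straightforward to estimate. A minimal sketch using the two prices quoted above (the token volumes are made-up examples):

```python
# Per-million-token rates quoted in the article (USD).
PRICE_PER_MILLION = {
    "llama3.2-3b-instruct": 0.02,
    "llama3.1-405b-instruct-fp8": 0.90,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Cost in USD for a given token count at the model's per-million rate."""
    return PRICE_PER_MILLION[model] / 1_000_000 * tokens

# A billion tokens costs $20 on the small model versus $900 on the large one.
small = estimate_cost("llama3.2-3b-instruct", 1_000_000_000)
large = estimate_cost("llama3.1-405b-instruct-fp8", 1_000_000_000)
```

The 45x spread between the smallest and largest models is why model selection, not just provider selection, dominates inference budgets at scale.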

As Stephen Balaban, Lambda co-founder and CEO, recently stated, “Stop wasting money and start using Lambda for LLM Inference.” He shared a graph showing Lambda’s cost-effectiveness in serving AI models through inference compared to other competitors in the market.

What sets Lambda apart is its pay-as-you-go model, which ensures customers only pay for the tokens they use, eliminating the need for subscriptions or rate-limited plans.

Closing the AI loop

For over a decade, Lambda has been a pillar of support for AI advancements with its GPU-based infrastructure.

From providing hardware solutions to training and fine-tuning capabilities, Lambda has earned a reputation as a reliable partner for enterprises, research institutions, and startups.

“Lambda has been deploying GPUs for over a decade to our user base. We have tens of thousands of Nvidia GPUs from various life cycles, allowing us to maximize the utility of these AI chips for the wider ML community at reduced costs,” Brooks explained. “With the launch of Lambda Inference, we’re closing the loop on the full-stack AI development lifecycle. The new API formalizes what many engineers had already been doing on Lambda’s platform—using it for inference—but now with a dedicated service that simplifies deployment.”

One of Lambda’s distinguishing features is its deep reservoir of GPU resources. Brooks noted, “Lambda has deployed tens of thousands of GPUs over the past decade, allowing us to offer cost-effective solutions and maximum utility for both older and newer AI chips.”

This GPU advantage enables the platform to support scaling to trillions of tokens monthly, providing flexibility for developers and enterprises alike.

Open and flexible

By offering unrestricted access to high-performance inference, Lambda is positioning itself as a flexible alternative to cloud giants.

“We aim to provide the machine learning community with unrestricted access to inference APIs without rate limits. You can plug and play, read the docs, and scale rapidly to trillions of tokens,” Brooks added.

The API supports a range of open-source and proprietary models, including popular instruction-tuned Llama models.

The company also plans to expand to multimodal applications, including video and image generation, in the near future.

“Initially, we’re focused on text-based LLMs, but soon we’ll expand to multimodal and video-text models,” Brooks said.

Serving devs and enterprises with privacy and security

The Lambda Inference API is designed to cater to a wide range of users, from startups to large enterprises in media, entertainment, and software development.

These industries are increasingly adopting AI to power applications like text summarization, code generation, and generative content creation.

“We don’t retain or share user data on our platform. We act as a conduit for serving data to end users, ensuring privacy,” Brooks emphasized, reinforcing Lambda’s commitment to security and user control.

As AI adoption continues to rise, Lambda’s new service is poised to attract attention from businesses seeking cost-effective solutions for deploying and maintaining AI models. By eliminating common barriers such as rate limits and high operating costs, Lambda hopes to empower more organizations to harness the potential of AI.

The Lambda Inference API is available now, with detailed pricing and documentation accessible through Lambda’s website.

Elijah Williams

A Harvard graduate with a Ph.D. in Film Studies, Elijah is a fervent aficionado of classic sci-fi movies. At Hypernova, he writes reviews and in-depth analyses of films that have shaped geek culture.
