Hugging Face offers serverless inference from third parties at no extra cost
AI developers can now access selected serverless inference providers directly on Hugging Face. This should not incur any additional costs for them.
![Serverless inference on Hugging Face](https://heise.cloudimg.io/width/610/q85.png-lossy-85.webp-lossy-85.foil1/_www-heise-de_/imgs/18/4/7/8/9/9/3/9/shutterstock_1024337068-fa0048d39ae827ee.jpeg)
(Image: whiteMocca/Shutterstock.com)
- Sven Festag
The AI development platform Hugging Face has integrated access to serverless inference providers into its service. The integration is intended to let developers run their AI models on the infrastructure of various service providers without managing any hardware. Initially, Hugging Face offers serverless inference from SambaNova, Replicate, Together AI and Fal. Access via the development platform should not cost more than going directly to the respective providers.
Serverless inference at cost price
Developers can generate tokens for the respective providers via the web interface. Requests made with these tokens are then routed through the Hugging Face infrastructure, and Hugging Face passes API access through at the price it pays the respective service provider itself. In the future, however, the company plans to conclude revenue-sharing agreements with the inference providers. The free plan includes a limited number of requests; the Pro subscription, at nine US dollars per month, includes two dollars of credit that can be redeemed across all providers.
Alternatively, as before, developers can use existing API keys from the inference service providers with the AI platform; in that case, billing takes place directly with the respective provider. Tokens and API keys can be used via the client SDKs in Python and JavaScript, and direct HTTP requests are also possible, for example against OpenAI-compatible interfaces. Hugging Face provides corresponding code examples on its blog.
Furthermore, dedicated hardware for running AI models can also be rented via Hugging Face. With serverless inference, by contrast, AI developers can run and scale their models without managing the hardware themselves; the providers adapt the computing power to demand. Alongside its day-to-day business, Hugging Face is currently working on Open-R1, an open-source reproduction of DeepSeek's R1 model.
(sfe)