Inference

Tensormesh raises $4.5M to squeeze more inference out of AI server loads

With the AI infrastructure push reaching staggering proportions, there’s more pressure than ever on AI companies to squeeze as much inference as possible out of the GPUs they have. And for researchers with expertise in a particular technique, it’s a great time to raise funding. That’s part of the driving force behind Tensormesh, launching out of stealth this […]

DeepSeek releases ‘sparse attention’ model that cuts API costs in half

Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to dramatically lower inference costs in long-context operations. DeepSeek announced the model in a post on Hugging Face and posted a linked academic paper on GitHub. The most important feature of the new model is called DeepSeek Sparse Attention, […]
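
DeepSeek’s exact mechanism is described in its paper; as rough intuition, sparse attention lets each query attend to a small, selected subset of keys instead of every token in the context, so the expensive softmax-and-mix step scales with the selection budget rather than the full context length. Below is a minimal, generic top-k sketch of the idea; it is not DeepSeek’s implementation, and the function name and `k_top` budget are illustrative assumptions.

```python
# Illustrative sketch of generic top-k sparse attention -- NOT DeepSeek's
# actual DSA mechanism. Shapes and the k_top budget are assumptions.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=64):
    """For each query, attend only to its k_top highest-scoring keys.

    q: (n_queries, d); k, v: (n_keys, d). The softmax and value mix
    touch only k_top keys per query instead of the full context.
    """
    d = q.shape[-1]
    scores = q @ k.T / d**0.5                  # (n_queries, n_keys)
    k_top = min(k_top, k.shape[0])
    top_vals, top_idx = scores.topk(k_top, dim=-1)
    weights = F.softmax(top_vals, dim=-1)      # softmax over selected keys only
    return torch.einsum("qk,qkd->qd", weights, v[top_idx])
```

Note that this naive version still scores every query-key pair before selecting; production systems also cut the scoring cost, typically with a cheap auxiliary index over the keys.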

Clarifai’s new reasoning engine makes AI models faster and less expensive

On Thursday, the AI platform Clarifai announced a new reasoning engine that it claims will make running AI models twice as fast and 40% less expensive. Designed to be adaptable to a variety of models and cloud hosts, the system employs a range of optimizations to get more inference power out of the same hardware.

Hugging Face makes it easier for devs to run AI models on third-party clouds

AI dev platform Hugging Face has partnered with third-party cloud vendors including SambaNova to launch Inference Providers, a feature designed to make it easier for devs on Hugging Face to run AI models using the infrastructure of their choice. Other partners involved with the new effort include Fal, Replicate, and Together AI. Hugging Face says […]
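
On the developer side, routing a request through a chosen provider looks roughly like the sketch below, assuming a recent `huggingface_hub` release with Inference Providers support; the model ID and token are placeholders.

```python
# Minimal sketch of an Inference Providers call via the huggingface_hub
# client. Assumes a recent huggingface_hub release with provider support;
# the model ID and token are placeholders.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="sambanova",  # or e.g. "replicate", "together", "fal-ai"
    token="hf_xxx",        # placeholder Hugging Face access token
)

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model ID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

The point of the abstraction is that swapping the `provider` argument reroutes the same call to different infrastructure without changing the rest of the code.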
