





Character.AI, a full-stack AI company, has detailed a series of advancements in AI inference technology that make large language models (LLMs) significantly more efficient and cost-effective to serve, according to a recent post on the company's blog.

Breakthroughs in Inference Technology

Character.AI, which aims to build toward Artificial General Intelligence (AGI), has focused on optimizing the inference process—the method through which LLMs generate responses. The company has developed new techniques around the Transformer architecture's attention key-value (KV) cache, which stores intermediate attention results so they can be reused rather than recomputed as each new token is generated. These advancements have also significantly improved caching between conversation turns (inter-turn caching).
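To make the KV-cache idea concrete, here is a minimal, self-contained sketch of a single-head attention decode loop that appends each new token's key and value to a cache instead of recomputing them for every past token on every step. This is an illustrative toy (identity projections, plain Python lists), not Character.AI's proprietary implementation, which the blog post does not disclose in code.

```python
import math

def attention(q, keys, values):
    """Attend from query q over all cached (key, value) pairs."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the cached values.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

def decode(hidden_states):
    """Produce one attention output per token, reusing cached keys/values."""
    k_cache, v_cache, outputs = [], [], []
    for x in hidden_states:
        # Toy projections (identity) for brevity; real models use learned matrices.
        q, k, v = x, x, x
        k_cache.append(k)  # append only the new key/value --
        v_cache.append(v)  # past entries are reused, not recomputed
        outputs.append(attention(q, k_cache, v_cache))
    return outputs

outs = decode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Without the cache, each decode step would reproject keys and values for every previous token, making per-step cost grow with sequence length; the cache trades memory for that recomputation, which is why shrinking the cache (as Character.AI describes) directly cuts serving cost.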

Character.AI claims to serve approximately 20,000 queries per second, about 20% of the request volume handled by Google Search, at a cost of less than one cent per hour of conversation. This efficiency is achieved through its proprietary innovations, making it much cheaper to scale LLMs globally.
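A quick back-of-envelope calculation shows what sustaining that rate implies for daily volume (the daily figure is derived arithmetic, not a number quoted in the article):

```python
# Derived from the article's quoted rate of ~20,000 queries per second.
queries_per_second = 20_000
seconds_per_day = 24 * 60 * 60          # 86,400

queries_per_day = queries_per_second * seconds_per_day
print(queries_per_day)                  # 1,728,000,000 -- ~1.7 billion queries/day
```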

Cost-Efficiency Achievements

Since its launch in 2022, Character.AI has reduced its serving costs by a factor of at least 33. The company's current cost to serve traffic is 13.5 times lower than it would be using the most efficient leading commercial APIs. This cost-efficiency is crucial for the scalability of consumer LLMs.

If an AI company were to serve 100 million daily active users, each using the service for an hour per day, the serving costs would amount to $365 million per year at the current rate of $0.01 per hour. In contrast, a competitor using leading commercial APIs would incur costs of at least $4.75 billion annually. These figures underscore the significant business advantages provided by Character.AI’s inference improvements.
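The figures above check out with simple arithmetic. The sketch below reproduces them; note that $4.75 billion corresponds to a multiplier of about 13, consistent with the article's "at least" framing (the multiplier constant here is inferred from the quoted totals, not stated directly in the article):

```python
# Back-of-envelope check of the serving-cost figures quoted above.
users = 100_000_000          # hypothetical daily active users from the article
hours_per_user_per_day = 1
cost_per_hour = 0.01         # dollars; Character.AI's quoted rate
days_per_year = 365

annual_cost = users * hours_per_user_per_day * days_per_year * cost_per_hour
# -> $365 million per year

api_multiplier = 13          # inferred: $4.75B / $365M is roughly 13x
competitor_cost = annual_cost * api_multiplier
# -> about $4.75 billion per year
```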

Future Implications

The improvements in inference efficiency not only make it feasible to scale LLMs to a global audience but also pave the way for creating a profitable business-to-consumer (B2C) AI enterprise. Character.AI continues to iterate on these innovations, aiming to make their advanced technology accessible to consumers worldwide.

For more detailed information, see Character.AI's full technical blog post.



