Cloudflare Empowers Customers With AI Tools To Deploy And Run Models

Cloudflare, a leading cloud services provider, is entering the AI market with a new suite of products and apps. The aim is to help customers build, deploy, and run AI models at the network edge while keeping costs down. The new offerings are Workers AI, Vectorize, and AI Gateway, each covering a different part of the AI deployment workflow.

Key Takeaway

Cloudflare’s new suite of AI tools aims to simplify the deployment and management of AI models. With Workers AI, Vectorize, and AI Gateway, customers get low-latency AI inference, storage for vector embeddings close to their users, and better visibility into usage and costs. Cloudflare is positioning its focus on cost savings and a simpler user experience as what sets it apart from more complex AI management solutions currently on the market.

Workers AI: Enabling Low-Latency AI Inference

Cloudflare’s Workers AI provides customers with access to nearby GPUs hosted by Cloudflare partners. This pay-as-you-go offering runs AI inference on GPUs physically close to users, resulting in a low-latency, AI-powered end-user experience. Built on the Microsoft-backed ONNX (Open Neural Network Exchange) technology, Workers AI allows AI models to run in the optimal location based on factors such as bandwidth, latency, connectivity, and processing constraints.

Users of Workers AI have access to a catalog of models, including large language models (LLMs), automatic speech recognition models, image classifiers, and sentiment analysis models. Notably, data used for inference stays within the server region and is not used to train current or future AI models.

Vectorize: Efficient Storage of Vector Embeddings

Vectorize caters to customers who need a database to store vector embeddings. Vector embeddings are numerical representations of data, such as text or images, that machine learning algorithms use to compare items by similarity. By leveraging Cloudflare’s global network, Vectorize lets database queries run closer to users, reducing latency and inference time. Customers can generate embeddings using models from Workers AI or use embeddings generated by third-party vendors.
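To make the idea of querying stored embeddings concrete, here is a minimal sketch of a similarity search in plain Python. It does not use the Vectorize API. The `cosine_similarity` and `top_k` helpers, the in-memory `index`, and the three-dimensional vectors are all hypothetical and chosen only for illustration. Real embeddings typically have hundreds of dimensions, and a production vector database would use an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical toy embeddings (assumption: 3 dimensions for readability).
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

# The query vector sits close to "cat" and "kitten" and far from "car".
nearest = top_k([0.85, 0.15, 0.05], index, k=2)
print(nearest)
```

The query returns the two animal entries and leaves out "car", which is the behavior a vector database relies on: items with similar meaning end up near each other in the embedding space.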

AI Gateway: Enhanced Visibility and Cost Management

The third component of Cloudflare’s AI suite, AI Gateway, offers observability features for tracking AI traffic. It provides metrics such as the number of model inference requests, the duration of requests, the number of users utilizing a model, and the overall cost of running an AI app. AI Gateway also includes caching and rate limiting to help reduce costs. Caching stores responses from LLMs so that repeated requests can be served without generating a new response. Rate limiting caps how many requests an app accepts in a given period, which provides control over how the app scales and protects against malicious actors and heavy traffic.
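The sketch below shows, in generic Python, how caching and rate limiting can sit in front of a model call. It is not AI Gateway's implementation or API. The `ResponseCache`, `FixedWindowRateLimiter`, and `handle_request` names are hypothetical, the fixed-window strategy is just one of several common rate limiting schemes, and `fake_model` stands in for a real LLM call.

```python
import hashlib
import time


class ResponseCache:
    """Stores model responses keyed by a hash of the prompt, with a time-to-live."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # prompt hash -> (expiry time, response)

    def _key(self, prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt):
        entry = self.store.get(self._key(prompt))
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            return None  # entry is stale, treat it as a miss
        return response

    def put(self, prompt, response):
        self.store[self._key(prompt)] = (self.clock() + self.ttl, response)


class FixedWindowRateLimiter:
    """Allows at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.windows = {}  # client id -> (window start, request count)

    def allow(self, client_id):
        now = self.clock()
        start, count = self.windows.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the old window expired, start a new one
        if count >= self.limit:
            self.windows[client_id] = (start, count)
            return False
        self.windows[client_id] = (start, count + 1)
        return True


def handle_request(client_id, prompt, cache, limiter, call_model):
    """Apply the rate limit first, then the cache, then fall back to the model."""
    if not limiter.allow(client_id):
        return ("rate_limited", None)
    cached = cache.get(prompt)
    if cached is not None:
        return ("cache_hit", cached)
    response = call_model(prompt)
    cache.put(prompt, response)
    return ("cache_miss", response)


calls = []  # records every prompt that actually reaches the model


def fake_model(prompt):
    """Stand-in for an expensive LLM call (assumption for this sketch)."""
    calls.append(prompt)
    return prompt.upper()


cache = ResponseCache(ttl_seconds=60.0)
limiter = FixedWindowRateLimiter(limit=2, window_seconds=60.0)
results = [
    handle_request("user-1", "hello", cache, limiter, fake_model)
    for _ in range(3)
]
print(results)
print(len(calls))
```

With a limit of two requests per window, the first request reaches the model, the second is served from the cache, and the third is rejected, so the model is invoked only once for three incoming requests.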

By addressing the complexities and cost challenges associated with AI implementation, Cloudflare aims to make AI more accessible to developers and companies. With its existing infrastructure and global network, Cloudflare is poised to deliver improved performance and cost efficiency to its customers.