Anyscale Introduces Multi-Tenant Serve Applications with Containerized Runtime Environments

Anyscale, an AI application platform, has announced multi-tenant Ray Serve applications that use runtime environments as containers. According to Anyscale, the feature is meant to improve resource management and operational efficiency.

Advancements in Multi-Application Support

In a conversation, Sam Chan, Technical Program Manager at Anyscale, and Cindy Zhang discussed the advancements and challenges of running multiple Serve applications with different dependencies on one cluster. Multi-application support allows different applications to run on the same cluster, each with its own runtime environment packaged as a container. This approach helps manage resources more effectively, reduces operational complexity, and lets each application be upgraded independently.

Zhang highlighted the previous limitations, where users had to bundle all model dependencies into one large Docker image, leading to bloated images and mixed dependencies. This was particularly challenging for customers with multiple research teams working on separate models. The new feature allows each team to deploy their code in its own container, offering cleaner isolation and easier maintenance.

The Role of Runtime Environments as Containers

The new feature, “runtime environments as containers,” permits specifying a different Docker image for each application. When Ray needs to start a replica for an app, it will initiate a container from that app’s image and run the worker process inside. This ensures clean isolation between applications and enhances the efficiency of resource sharing.
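To make the per-application image idea concrete, here is a minimal sketch that builds a multi-application Serve config as a plain Python dictionary. The application names, import paths, and image URIs are hypothetical, and the `runtime_env` layout (`container` with an `image` key) is an assumption about the experimental feature that may differ between Ray versions, so check Anyscale's guide before relying on it.

```python
import json

# Hypothetical application names, import paths, and image URIs.
APPS = {
    "team_a": ("team_a.app:model", "registry.example.com/team-a/serve:latest"),
    "team_b": ("team_b.app:model", "registry.example.com/team-b/serve:latest"),
}


def build_serve_config(apps):
    """Build a multi-application config with one container image per app."""
    return {
        "applications": [
            {
                "name": name,
                "route_prefix": f"/{name}",
                "import_path": import_path,
                # Assumed field layout for the experimental container
                # runtime environment; verify against your Ray version.
                "runtime_env": {"container": {"image": image}},
            }
            for name, (import_path, image) in apps.items()
        ]
    }


config = build_serve_config(APPS)
print(json.dumps(config, indent=2))
```

Each entry carries its own image, so one team can rebuild and redeploy its container without touching the other team's dependencies.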

Zhang explained that this feature unlocks Ray’s multi-tenancy capabilities, allowing multiple applications to share resources more efficiently on the same cluster. For instance, eight applications can be squeezed onto a single large VM with eight GPUs, each Ray Serve application configured to use one GPU. This granular utilization of GPU capacity minimizes underutilized resources and simplifies operational management by maintaining a single Ray cluster.
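The eight-GPU example can be checked with simple arithmetic. The sketch below assumes one replica per application and uses Ray's `ray_actor_options` with `num_gpus` as the per-replica GPU request; the application names and the helper functions are hypothetical, and the check is only an illustration of the packing, not Ray's scheduler.

```python
NODE_GPUS = 8  # one large VM with eight GPUs, as in the example above


def deployment_options(num_gpus_per_replica=1):
    """Per-deployment options requesting a fixed number of GPUs per replica."""
    return {"ray_actor_options": {"num_gpus": num_gpus_per_replica}}


def fits_on_node(app_options, node_gpus=NODE_GPUS):
    """Return (fits, requested), assuming one replica per application."""
    requested = sum(
        opts["ray_actor_options"]["num_gpus"] for opts in app_options.values()
    )
    return requested <= node_gpus, requested


# Eight hypothetical applications, each asking for one GPU.
apps = {f"app_{i}": deployment_options(1) for i in range(8)}
ok, requested = fits_on_node(apps)
print(f"requested {requested} of {NODE_GPUS} GPUs, fits: {ok}")
```

Adding a ninth one-GPU application would exceed the node's capacity, which is the point at which a second machine or smaller per-application requests would be needed.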

Technical Implementation and Challenges

Under the hood, Ray integrates with Podman. When a new Ray worker needs to start, Ray calls out to Podman to pull the relevant image and spin up a container, then runs the Ray worker code inside that container.
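The pull-then-run sequence can be sketched as two Podman command lines. This is not Ray's actual invocation: the image URI and the worker entrypoint are hypothetical, and the flags shown (`--rm`, `--network=host`) are ordinary Podman options chosen for illustration. The code only builds and prints the commands; it does not execute them.

```python
import shlex


def podman_pull_cmd(image):
    """Command that fetches the application's image."""
    return ["podman", "pull", image]


def podman_run_cmd(image, worker_cmd):
    """Command that starts a container and runs the worker process inside it."""
    return ["podman", "run", "--rm", "--network=host", image, *worker_cmd]


# Hypothetical image and worker entrypoint, for illustration only.
image = "registry.example.com/team-a/serve:latest"
worker_cmd = ["python", "-m", "my_worker"]

pull = podman_pull_cmd(image)
run = podman_run_cmd(image, worker_cmd)

# Step 1: pull the image (slow the first time, cached afterwards).
print(shlex.join(pull))
# Step 2: start the container with the worker running inside it.
print(shlex.join(run))
```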

However, the feature is still experimental. Zhang cautioned that there might be startup delays the first time an image needs to be pulled, and the feature hasn’t been tested at a large scale. Additionally, other runtime environment fields, such as Python environments or working directories, are not currently supported with container runtime environments.
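The restriction on mixing fields can be expressed as a small pre-flight check. The sketch below is a hypothetical helper, not part of Ray: it assumes the container setting lives under a `container` key and rejects any other runtime environment field alongside it, which matches the limitation described above.

```python
def validate_runtime_env(runtime_env):
    """Reject runtime environments that mix a container with other fields."""
    if "container" in runtime_env:
        extra = sorted(key for key in runtime_env if key != "container")
        if extra:
            raise ValueError(
                "container runtime environments cannot be combined with: "
                + ", ".join(extra)
            )
    return runtime_env


# Accepted: a container on its own (hypothetical image URI).
validate_runtime_env({"container": {"image": "registry.example.com/app:latest"}})

# Rejected: a container combined with pip packages and a working directory.
try:
    validate_runtime_env(
        {
            "container": {"image": "registry.example.com/app:latest"},
            "pip": ["requests"],
            "working_dir": ".",
        }
    )
except ValueError as err:
    message = str(err)
    print(message)
```

Running a check like this before deployment surfaces the conflict early instead of at replica startup.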

Future Plans

Looking ahead, Anyscale plans to refine the user experience around combining containers with other runtime environment fields, such as specific environment variables for each application. They are actively gathering user feedback to determine which fields to include and are planning more scalability testing.

For those interested in exploring this new feature, Anyscale provides a detailed guide to get started with multiple Ray Serve applications and runtime environments as containers.
