Deployment creation and update information

This page describes the configuration types and properties available for creating and updating Bento Deployments.
Configuration types

| Type | Description |
|---|---|
| Basic | Provides basic configurations for the Deployment, such as the Cluster, the Endpoint Access Type, and resources for API Server and Runner Pods. It is convenient for quickly spinning up a Deployment. |
| Advanced | Provides additional configurations for the Deployment, such as autoscaling behaviors, traffic control, environment variables, and update strategies. |
| JSON | Defines the Deployment in a JSON file, which contains the same fields as the Advanced configuration. You can download a Deployment's information in JSON by clicking Download as JSON, then use the file to create or update a Deployment from your local machine. |
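Since the downloaded file is plain JSON, you can inspect and edit it locally before using it to create or update a Deployment. A minimal sketch of that round trip follows; the file name and the keys shown are placeholders for illustration, not the actual Deployment schema:

```python
import json

# An illustrative Deployment description. The keys here are
# placeholders for demonstration, not the real JSON schema.
config = {
    "name": "my-deployment",        # hypothetical field
    "description": "Demo service",  # hypothetical field
}

# Write it out, as "Download as JSON" would give it to you...
with open("deployment.json", "w") as f:
    json.dump(config, f, indent=2)

# ...then load and tweak it locally before re-submitting it.
with open("deployment.json") as f:
    loaded = json.load(f)
loaded["description"] = "Updated locally"
```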
Deployment properties
| Property | Description |
|---|---|
| Cluster | The cluster where your Bento Deployment is created. |
| Kubernetes Namespace | The Kubernetes namespace where your Deployment is created. |
| Deployment Name | The name of your Deployment. |
| Description | An introduction to your Deployment, providing additional information about it. |
| Endpoint Access Type | Controls who can access your Deployment's endpoint. You can choose between different access levels depending on your needs, enhancing both security and ease of use. |
| Bento Repository | Bento repositories act as a centralized hub for managing packaged machine learning models, offering tools for versioning, sharing, retrieval, and deployment. Each Bento repository corresponds to a set of Bentos containing different versions of a specific machine learning model. All Bento repositories are displayed on the Bentos page. |
| Bento | A deployable artifact containing all the application information, such as model files, code, and dependencies. After selecting a Bento repository, you need to specify a Bento version to deploy. |
| Autoscaling | Dynamically adjusts the number of API Server and Runner Pods within the specified minimum and maximum limits. The Min and Max values define the boundaries for scaling, allowing the autoscaler to reduce or increase the number of Pods as needed; scaling to zero Pods is supported. With Advanced configurations, you can define thresholds for specific metrics (for example, CPU utilization) that the autoscaler uses to decide when to adjust the number of Pods. Setting a value for one of these metrics instructs the autoscaler to keep the average for that metric at or below the threshold; for example, if you set the CPU value to 80, the autoscaler targets an average CPU utilization of 80%. You can also restrict the scaling-up and scaling-down behaviors the autoscaler is allowed to use. |
| Resources per replica | Separately allocates resources for API Servers and Runners using one of two strategies. |
| Traffic control* | Controls the traffic of BentoML API Servers and Runners in two ways. |
| Environment variables* | Key-value pairs injected into the Pod where your application runs, allowing you to configure your Bento application for the current environment without hard-coding specific values in your scripts or codebase. You can use them to set up database connections, define paths to dependencies, or provide any other configuration your application needs. |
| Deployment strategy* | Determines how traffic is migrated from the old version of your Bento application to the new one. |
| BentoML Configuration* | Adds additional BentoML configurations to customize the behavior of your Deployment. For more information, see Configuration. |
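The target-average semantics described for Autoscaling can be illustrated with the standard Kubernetes Horizontal Pod Autoscaler rule: scale the replica count so the average metric meets the target, clamped to the Min/Max bounds. This is a sketch of that general rule, not the Deployment autoscaler's literal implementation:

```python
import math

def desired_replicas(current: int, current_avg: float, target_avg: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Kubernetes-style HPA rule: pick the replica count that brings the
    average metric down to the target, clamped to the Min/Max bounds."""
    desired = math.ceil(current * current_avg / target_avg)
    return max(min_replicas, min(max_replicas, desired))

# With a CPU target of 80% and 4 Pods averaging 100% CPU utilization,
# the autoscaler would scale up to 5 Pods (4 * 100 / 80, rounded up).
desired_replicas(4, current_avg=100, target_avg=80, min_replicas=1, max_replicas=10)  # → 5
```

With Min set to 0, the same rule scales the Deployment to zero Pods when the load drops to nothing.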
Note
Properties marked with an asterisk (*) are only available for Advanced and JSON configurations.
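Environment variables injected into the Pod are read by your application like any other process environment. A minimal sketch of the two common patterns, falling back to a default or failing fast at startup; the variable names are examples, not ones set for you automatically:

```python
import os

# DATABASE_URL is an example variable name, not one injected for you.
# Fall back to a local default when the variable is absent.
db_url = os.environ.get("DATABASE_URL", "sqlite:///local-dev.db")

def require_env(name: str) -> str:
    """Fail fast at startup if a required variable was not injected."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} must be set for this Deployment")
    return value
```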