Performance Guide
This guide helps advanced BentoML users understand the costs and performance overhead of their model serving workloads. It also describes BentoML’s architecture and explains how users can fine-tune its performance.
Todo
Performance Guide Todo items:
- basic load testing with Locust
- load testing tips:
  - production mode vs development mode
  - enable/disable logging
  - always run the Locust client on a separate machine
- performance best practices:
  - bentoml serve options: --api-worker, --backlog, --timeout
  - configure runner resources
  - configure adaptive batching (max_latency, max_batch_size)
  - embedded runner
- existing benchmark results and comparisons
- advanced topics:
  - alternative load testing with Grafana k6
  - set up tracing and dashboard
  - set up tracing for Yatai and distributed Runner
  - instrument tracing for user service and runner code
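The adaptive batching item in the list above names two knobs, max_latency and max_batch_size. Until that section is written, the following minimal sketch may help show how such a pair typically trades throughput against latency. It uses a plain size-or-deadline rule. The function name plan_batches and the sample arrival times are hypothetical. This is not BentoML's actual batching algorithm, which, as the name suggests, adapts at runtime.

```python
from typing import List, Sequence


def plan_batches(
    arrivals: Sequence[float],
    max_batch_size: int,
    max_latency: float,
) -> List[List[int]]:
    """Group request indices into batches with a size-or-deadline rule.

    Illustrative only (an assumption, not BentoML's dispatcher).
    `arrivals` holds request arrival times in seconds, sorted ascending.
    A batch is closed when it already holds `max_batch_size` requests, or
    when the next request arrives more than `max_latency` seconds after
    the first request of the current batch.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    batches: List[List[int]] = []
    current: List[int] = []
    window_start = 0.0

    for idx, arrived_at in enumerate(arrivals):
        batch_full = len(current) >= max_batch_size
        deadline_passed = bool(current) and (arrived_at - window_start > max_latency)
        if batch_full or deadline_passed:
            batches.append(current)
            current = []
        if not current:
            # The first request of a batch starts the latency window.
            window_start = arrived_at
        current.append(idx)

    if current:
        batches.append(current)
    return batches


# Four requests in a burst, then two more after a pause.
sample_arrivals = [0.000, 0.001, 0.002, 0.003, 0.050, 0.051]
print(plan_batches(sample_arrivals, max_batch_size=3, max_latency=0.010))
# prints [[0, 1, 2], [3], [4, 5]]
```

In this toy model, raising max_batch_size lets the burst be served in fewer, larger batches, while lowering max_latency caps how long an early request waits for others to join it.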
Help us improve the project!
Found an issue or a TODO item? You’re always welcome to make contributions to the project and its documentation. Check out the BentoML development guide and documentation guide to get started.