RAY
Distributed AI compute engine for scaling ML workloads
Apache-2.0
ABOUT
Scaling machine learning workloads from a single machine to a cluster typically requires rewriting code around complex distributed systems tooling such as MPI, Kubernetes, or Spark. Ray eliminates this barrier by providing a simple, universal API for distributed computing that works the same on a laptop and a 1000-node cluster. Its ecosystem of libraries (Ray Train, Ray Tune, Ray RLlib, Ray Serve) lets teams scale training, tuning, and serving without changing their PyTorch, TensorFlow, or XGBoost code, dramatically reducing the engineering effort needed for production ML infrastructure.
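To make the "same code from laptop to cluster" claim concrete, here is a minimal sketch using Ray's core task API. The function and values are illustrative placeholders, but ray.init, @ray.remote, and ray.get are Ray's stable core primitives:

import ray

ray.init()  # connects to a running cluster if one exists, else starts a local one

@ray.remote
def square(x):
    # An ordinary Python function, executed as a distributed task
    return x * x

# .remote() schedules the task and returns a future immediately;
# ray.get() blocks until the results are available.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]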
INSTALL
pip install ray
INTEGRATION GUIDE
1. Scaling PyTorch or TensorFlow training across multiple GPUs or nodes with minimal code changes (see the Ray Train sketch below)
2. Running large-scale hyperparameter searches with Ray Tune on a distributed cluster (see the Ray Tune sketch below)
3. Deploying and serving ML models in production with Ray Serve alongside existing applications (see the Ray Serve sketch below)
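For item 1, a minimal sketch of multi-GPU PyTorch training with Ray Train. The model and training loop are toy placeholders, and the TorchTrainer/ScalingConfig names follow the Ray 2.x API, which may differ in other versions:

import torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model

def train_func():
    # prepare_model wraps the model for distributed data parallelism across workers
    model = prepare_model(torch.nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(10):  # toy training loop on random data
        loss = model(torch.randn(32, 10)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # 4 GPU workers
)
trainer.fit()

The key point is that train_func is ordinary PyTorch; only the prepare_model call and the ScalingConfig change when moving from one GPU to many.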
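For item 2, a hedged sketch of a distributed hyperparameter search with Ray Tune. The objective function and search space are illustrative; the Tuner API follows Ray 2.x:

from ray import tune

def objective(config):
    # Placeholder objective; returning a dict reports it as the trial's final result
    return {"score": (config["lr"] - 0.01) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(num_samples=20, metric="score", mode="min"),
)
results = tuner.fit()
print(results.get_best_result().config)  # best hyperparameters found

On a cluster, the 20 trials are scheduled across available nodes automatically; the search code itself does not change.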
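And for item 3, a sketch of serving a model with Ray Serve. The deployment class and its "model" are stand-ins; @serve.deployment, bind, and serve.run follow the Ray 2.x API:

from ray import serve
from starlette.requests import Request

@serve.deployment
class Predictor:
    def __init__(self):
        self.coef = 2.0  # stand-in for loading a real model

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        return {"prediction": self.coef * payload["x"]}

serve.run(Predictor.bind())  # serves HTTP at http://127.0.0.1:8000/ by default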
TAGS
python, distributed-computing, ml-engineering, cluster, open-source