Unifying Large Scale Data Preprocessing and ML Pipelines with Ray Datasets | PyData Global 2021

Speakers: Alex Wu, Clark Zinzow

Summary: ML tasks such as distributed training and batch inference stretch the abstractions of modern data processing systems, forcing tradeoffs in performance or learning efficiency. In this talk we introduce Ray Datasets, a universal compatibility layer built on Arrow and Python that allows data processing to be combined with ML pipelines without such tradeoffs.

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyD
