
PyCon 2019 | Machine Learning Model And Dataset Versioning Practices


Speaker: Dmitry Petrov

Python is a prevalent programming language in the machine learning (ML) community. Many Python engineers and data scientists notice the absence of engineering practices such as versioning large datasets and ML models, as well as a lack of reproducibility. This gap is particularly acute for engineers who have just moved into the ML space.

We will discuss current practices for organizing ML projects with a traditional open-source toolset such as Git and Git-LFS, as well as the limitations of that toolset. These limitations explain the motivation for developing new, ML-specific version control systems.

Data Version Control, or [DVC.ORG][1], is an [open-source][2] command-line tool written in Python. We will show how to version datasets of dozens of gigabytes as well as ML models, how to use your favorite cloud storage (S3, GCS, or a bare-metal SSH server) as a backend for data files, and how to apply engineering best practices to your ML projects.

[1]: http://dvc.org

[2]: https://github.com/iterative/dvc
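The idea shared by data versioning tools such as DVC and Git-LFS is to keep large files out of Git and commit only a small pointer that records a hash of the file's contents. Below is a minimal sketch of that idea in Python, assuming a local directory stands in for the S3, GCS, or SSH backend. The `.ptr` pointer format and the functions `file_md5`, `add_to_cache`, and `checkout` are hypothetical illustrations. They are not DVC's actual implementation, commands, or file format.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte dataset never sits fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a small pointer file.

    The pointer file (a hypothetical JSON format) is what you would commit to Git;
    the large file itself lives only in the cache, which could be synced to remote storage.
    """
    md5 = file_md5(data_file)
    # Shard the cache by the first two hex characters of the hash.
    target = cache_dir / md5[:2] / md5[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"path": data_file.name, "md5": md5}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the version of the data file that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    source = cache_dir / meta["md5"][:2] / meta["md5"][2:]
    dest = pointer.with_name(meta["path"])
    shutil.copy2(source, dest)  # overwrites the working copy with the cached version
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_text("id,label\n1,cat\n")
        pointer = add_to_cache(data, root / "cache")
        # This small JSON record is what would go into Git instead of data.csv.
        print(pointer.read_text())
```

Because the cache is keyed by content hash, saving a new version of a dataset never overwrites an older one, and checking out an old pointer (for example, from an earlier Git commit) restores the matching data. In a real tool, pushing and pulling that cache to cloud storage is what makes the data shareable across machines.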

Slides can be found at: https://speakerdeck.com/pycon2019 and https://github.com/PyCon/2019-slides
