Join us for an evening of tech talks about data science and MLflow from Mesosphere, Databricks and RStudio.

Agenda

  • 6:00 - 6:30 pm Mingling & Refreshments
  • 6:30 - 6:40 pm Opening remarks
  • 6:40 - 7:10 pm The Quest for Data Science Principles
  • 7:10 - 7:40 pm Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning & Integrating Custom Libraries
  • 7:40 - 8:10 pm Introduction to MLflow with R
  • 8:10 - 8:30 pm AMA w/ Matei Zaharia, Ben Hindman & Andy Konwinski

Talk 1: The Quest for Data Science Principles

With the rapid and recent rise of data science, more and more organizations are in the process of building some kind of data science platform. Still, because this is a young and rapidly evolving field, (almost) everyone is new to it and is trying to find best practices on their own.

Over the last half-century we have developed and refined the discipline of software engineering in order to accelerate the development and deployment of applications. This has involved, for example, a general shift towards DevOps practices that align developer and business objectives and dramatically reduce time-to-delivery.

The time has come to apply the principles of DevOps to data science and to bring the lessons of software engineering (and its systematic and repeatable methodology) to the discipline of data science. This rapidly emerging field is sometimes referred to as DataOps, and it encompasses the development of AI models and the overall platform surrounding them.

In this talk, we explore how we can establish Data Science Principles such as:

  • Reproducible Model Training
  • Zero Downtime Deployments
  • Live Model Testing and Monitoring

Bios:

Tobi Knaup, Co-Founder and CTO at Mesosphere: Tobi was one of the first engineers and a tech lead at Airbnb, where he wrote large parts of the infrastructure, scaled the site to millions of users and built a world-class engineering team. He is currently CTO and co-founder of Mesosphere.

Jörg Schad, Technical Lead Community Projects at Mesosphere: Jörg is the technical lead for community projects at Mesosphere in San Francisco. In his previous life, he implemented distributed and in-memory databases and conducted research in the Hadoop and cloud area during his Ph.D.

Talk 2: Advanced MLflow: Multi-Step Workflows, Hyperparameter Tuning and Integrating Custom Libraries

Because MLflow is an API-first platform, there are many patterns for using it in complex workflows and integrating it with existing tools. In this talk, we’ll demo a few best practices for using MLflow in a more complex workflow. These include:

  • Running multi-step workflows on MLflow, such as data preparation steps followed by training, and organizing your projects so you can automatically reuse past work.
  • Tuning hyperparameters on MLflow with open source hyperparameter tuning packages.
  • Saving a model in MLflow (e.g., from a new machine learning library) and deploying it to existing deployment tools.

Bios:

Matei Zaharia is an assistant professor of computer science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009 and also worked on other open source datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop.

Plus, members of the MLflow team

Talk 3: R and MLflow Together

This talk will present R as a programming language suited to data analysis and modeling problems, MLflow as an open source project that helps organizations manage their machine learning lifecycle, and the intersection of the two: support for R in MLflow. It will be highly interactive and touch on some of the technical implementation choices made while making R available in MLflow. It will also demonstrate using MLflow tracking, projects, and models directly from R, as well as reusing R models in MLflow to interoperate with other programming languages and technologies.

Bio: Javier Luraschi