Container technology has changed the way data science gets done. The original container use case for data science focused on what I call “environment management”. Configuring software environments is a constant chore, especially in the open source software space, the space in which most data scientists work. It often requires …
Data science is an exciting field, but it can be intimidating to get started, especially for those new to coding. Even for experienced developers and data scientists, the process of developing a model could involve stringing together many steps from many packages, in ways that might not be as elegant …
Jan. 11, 2021, 11:45 a.m.
This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from Bin Yu, W. James Murdoch, Chandan Singh, Karl Kumbier, and Reza Abbasi-Asl’s recent paper, “Definitions, methods, and applications in interpretable machine learning”. Introduction Model interpretability continues to …
Nov. 14, 2019, 10:10 a.m.
This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available. Introduction As data science work is experimental and probabilistic in nature, data scientists …
This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. The excerpt and complementary Domino project cover hyperparameter evaluation, including GridSearch and RandomizedSearch, as well as building an automated ML workflow. Introduction Data scientists, machine learning (ML) researchers, …
Paco Nathan’s latest article covers program synthesis, AutoPandas, model-driven data queries, and more. Introduction Welcome back to our monthly burst of themespotting and conference summaries. BTW, videos for Rev2 are up: https://rev.dominodatalab.com/rev-2019/ On deck this time ’round the Moon: program synthesis. In other words, using metadata about data science work …
Pete Skomoroch presented “Product Management for AI” at Rev. This post provides a distilled summary, video, and full transcript. Session Summary Pete Skomoroch’s “Product Management for AI” session at Rev provided a “crash course” on what product managers and leaders need to know about shipping machine learning (ML) projects and …
Our last release, Domino 3.3, saw the addition of two major capabilities: Datasets and Experiment Manager. “Datasets”, a high-performance, revisioned data store, offers data scientists the flexibility they need to make use of large data resources when developing models. And “Experiment Manager” acts as a data scientist’s “modern lab notebook” …
Paco Nathan’s latest article features several emerging threads adjacent to model interpretability. Introduction Welcome back to our monthly burst of themes and conferences. Several technology conferences all occurred within four fun-filled weeks: Strata SF, Google Next, CMU Summit on US-China Innovation, AI NY, and Strata UK, plus some other events. …
This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s “The Ingredients of a Reproducible Machine Learning Model” talk at a recent WiMLDS meetup. Mawer is a Principal Data Scientist at Lineage Logistics as well as an Adjunct Lecturer at Northwestern University. Special thanks to Mawer …
This Domino Data Science Field Note provides very distilled insights and excerpts from Been Kim’s recent MLConf 2018 talk and research about Testing with Concept Activation Vectors (TCAV), an interpretability method that allows researchers to understand and quantitatively measure the high-level concepts their neural network models are using for prediction, …
This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be This Way?” with Amy Heineike, Paco Nathan, and Pete Warden at Domino HQ. Topics discussed include the current state of collaboration around building and deploying models, tension …
This blog post includes candid insights about addressing tension points that arise when people collaborate on developing and deploying models. Domino’s Head of Content sat down with Don Miner and Marshall Presser to discuss the state of collaboration between data science and data engineering. The blog post provides distilled insights, …
Last week, Paco Nathan referenced Julia Angwin’s recent Strata keynote that covered algorithmic bias. This Domino Data Science Field Note dives a bit deeper into some of the publicly available research regarding algorithmic accountability and forgiveness, specifically around a proprietary black box model used to predict the risk of recidivism, …
TLDR: In this Domino Data Science Field Note, we briefly discuss an algorithm and framework for generating explanations, LIME (Local Interpretable Model-Agnostic Explanations), that may help data scientists, machine learning researchers, and engineers decide whether to trust the predictions of any classifier in any model, including seemingly “black box” models. …
In this guest blog post, Derrick Higgins, of American Family Insurance, covers item response theory (IRT) and how data scientists can apply it within a project. As a complement to the guest blog post, there is also a demo within Domino. Introduction I lead a data science team at American …
Sept. 17, 2018, 2:25 p.m.
Introduction: New Monthly Series! Welcome to a new monthly series! I’ll summarize highlights from recent industry conferences, new open source projects, interesting research, great examples, amazing people, etc. – all pointed at how to level up your organization’s data science practices. Key Theme: Machine Learning Models Themes. Amidst the flurry …
This Domino Data Science Field Note covers a proposed definition of machine learning interpretability, why interpretability matters, and the arguments for considering a rigorous evaluation of interpretability. Insights are drawn from Finale Doshi-Velez’s talk, “A Roadmap for the Rigorous Science of Interpretability” as well as the paper, “Towards a Rigorous …