Tag: Model Development

Model Interpretability: The Conversation Continues
This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from Bin Yu, W. James Murdoch, Chandan Singh, Karl Kumbier, and Reza Abbasi-Asl’s recent paper, “Definitions, methods, and applications in interpretable machine learning”. Introduction Model interpretability continues to …
Understanding Causal Inference
This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available. Introduction As data science work is experimental and probabilistic in nature, data scientists …
Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines
This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. The excerpt and complementary Domino project evaluate hyperparameter tuning methods, including GridSearch and RandomizedSearch, as well as building an automated ML workflow. Introduction Data scientists, machine learning (ML) researchers, …
Themes and Conferences per Pacoid, Episode 11
Paco Nathan’s latest article covers program synthesis, AutoPandas, model-driven data queries, and more. Introduction Welcome back to our monthly burst of themespotting and conference summaries. BTW, videos for Rev2 are up: https://rev.dominodatalab.com/rev-2019/ On deck this time ’round the Moon: program synthesis. In other words, using metadata about data science work …
Product Management for AI
Pete Skomoroch presented “Product Management for AI” at Rev. This post provides a distilled summary, video, and full transcript. Session Summary Pete Skomoroch’s “Product Management for AI” session at Rev provided a “crash course” on what product managers and leaders need to know about shipping machine learning (ML) projects and …
Announcing Domino 3.4: Furthering Collaboration with Activity Feed
Our last release, Domino 3.3, saw the addition of two major capabilities: Datasets and Experiment Manager. “Datasets”, a high-performance, revisioned data store, offers data scientists the flexibility they need to make use of large data resources when developing models. And “Experiment Manager” acts as a data scientist’s “modern lab notebook” …
Themes and Conferences per Pacoid, Episode 9
Paco Nathan’s latest article features several emerging threads adjacent to model interpretability. Introduction Welcome back to our monthly burst of themes and conferences. Several technology conferences all occurred within four fun-filled weeks: Strata SF, Google Next, CMU Summit on US-China Innovation, AI NY, and Strata UK, plus some other events. …
Addressing Irreproducibility in the Wild
This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s “The Ingredients of a Reproducible Machine Learning Model” talk at a recent WiMLDS meetup. Mawer is a Principal Data Scientist at Lineage Logistics as well as an Adjunct Lecturer at Northwestern University. Special thanks to Mawer …
Model Interpretability with TCAV (Testing with Concept Activation Vectors)
This Domino Data Science Field Note provides very distilled insights and excerpts from Been Kim’s recent MLConf 2018 talk and research about Testing with Concept Activation Vectors (TCAV), an interpretability method that allows researchers to understand and quantitatively measure the high-level concepts their neural network models are using for prediction, …
Data Science vs Engineering: Tension Points
This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be This Way?” with Amy Heineike, Paco Nathan, and Pete Warden at Domino HQ. Topics discussed include the current state of collaboration around building and deploying models, tension …
Collaboration Between Data Science and Data Engineering: True or False?
This blog post includes candid insights about addressing tension points that arise when people collaborate on developing and deploying models. Domino’s Head of Content sat down with Don Miner and Marshall Presser to discuss the state of collaboration between data science and data engineering. The blog post provides distilled insights, …
Justified Algorithmic Forgiveness?
Last week, Paco Nathan referenced Julia Angwin’s recent Strata keynote that covered algorithmic bias. This Domino Data Science Field Note dives a bit deeper into some of the publicly available research regarding algorithmic accountability and forgiveness, specifically around a proprietary black box model used to predict the risk of recidivism, …
Trust in LIME: Yes, No, Maybe So?
TLDR: In this Domino Data Science Field Note, we briefly discuss an algorithm and framework for generating explanations, LIME (Local Interpretable Model-Agnostic Explanations), that may help data scientists, machine learning researchers, and engineers decide whether to trust the predictions of any classifier, including seemingly “black box” models. …
Item Response Theory in R for Survey Analysis
In this guest blog post, Derrick Higgins, of American Family Insurance, covers item response theory (IRT) and how data scientists can apply it within a project. As a complement to the guest blog post, there is also a demo within Domino. Introduction I lead a data science team at American …
Themes and Conferences per Pacoid, Episode 1
Introduction: New Monthly Series! Welcome to a new monthly series! I’ll summarize highlights from recent industry conferences, new open source projects, interesting research, great examples, amazing people, etc. – all pointed at how to level up your organization’s data science practices. Key Theme: Machine Learning Models. Amidst the flurry …
Make Machine Learning Interpretability More Rigorous
This Domino Data Science Field Note covers a proposed definition of machine learning interpretability, why interpretability matters, and the arguments for considering a rigorous evaluation of interpretability. Insights are drawn from Finale Doshi-Velez’s talk, “A Roadmap for the Rigorous Science of Interpretability” as well as the paper, “Towards a Rigorous …
Feature Engineering: A Framework and Techniques
This Domino Field Note provides highlights and excerpted slides from Amanda Casari’s “Feature Engineering for Machine Learning” talk at QCon São Paulo. Casari is the Principal Product Manager + Data Scientist at Concur Labs. Casari is also the co-author of the book, Feature Engineering for Machine Learning: Principles and Techniques …
Classify all the Things (with Multiple Labels)
Derrick Higgins of American Family Insurance presented a talk, “Classify all the Things (with multiple labels): The most common type of modeling task no one talks about” at Rev. Higgins covers multilabel classification, a few methods used for multiclass prediction, and existing toolkits. This blog post provides highlights, the video, …