All the data you need.

Tag: Practical Techniques

Bringing ML to Agriculture: Transforming a Millennia-old Industry
Guest post by Jeff Melching, Distinguished Engineer / Chief Architect Data & Analytics At The Climate Corporation, we aim to help farmers better understand their operations and make better decisions to increase their crop yields in a sustainable way. We’ve developed a model-driven software platform, called Climate FieldView™, that captures, …
The curse of Dimensionality
Guest Post by Bill Shannon, Founder and Managing Partner of BioRankings Danger of Big Data Big data is the rage. This could be lots of rows (samples) and few columns (variables) like credit card transaction data, or lots of columns (variables) and few rows (samples) like genomic sequencing in life …
Providing fine-grained, trusted access to enterprise datasets with Okera and Domino
Domino and Okera – Provide data scientists access to trusted datasets within reproducible and instantly provisioned computational environments. In the last few years, we’ve seen the acceleration of two trends — the increasing amounts of data stored and utilized by organizations, and the subsequent need for data scientists to help …
Why models fail to deliver value and what you can do about it.
Building models requires a lot of time and effort. Data scientists can spend weeks just trying to find, capture and transform data into decent features for models, not to mention many cycles of training, tuning, and tweaking models so they’re performant. Yet despite all this hard work, few models ever …
The importance of structure, coding style, and refactoring in notebooks
Notebooks are increasingly crucial in the data scientist’s toolbox. Although considered relatively new, their history traces back to systems like Mathematica and MATLAB. This form of interactive workflow was introduced to assist data scientists in documenting their work, facilitating reproducibility, and prompting collaboration with their team members. Recently there has …
Data Drift Detection for Image Classifiers
This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in production. Run the example in a complementary Domino project. Introduction: preventing silent model degradation in production In the real word, data is recorded by different …
Model Interpretability: The Conversation Continues
This Domino Data Science Field Note covers a proposed definition of interpretability and distilled overview of the PDR framework. Insights are drawn from Bin Yu, W. James Murdoch, Chandan Singh, Karl Kumber, and Reza Abbasi-Asi’s recent paper, “Definitions, methods, and applications in interpretable machine learning”. Introduction Model interpretability continues to …
On Being Model-driven: Metrics and Monitoring
This article covers a couple of key Machine Learning (ML) vital signs to consider when tracking ML models in production to ensure model reliability, consistency and performance in the future. Many thanks to Don Miner for collaborating with Domino on this article. For additional vital signs and insight beyond what …
Clustering in R
This article covers clustering including K-means and hierarchical clustering. A complementary Domino project is available. Introduction Clustering is a machine learning technique that enables researchers and data scientists to partition and segment data. Segmenting data into appropriate groups is a core task when conducting exploratory analysis. As Domino seeks to …
Understanding Causal Inference
This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available. Introduction As data science work is experimental and probabilistic in nature, data scientists …
Time Series with R
This article delves into methods for analyzing multivariate and univariate time series data. A complementary Domino project is available. Introduction Conducting exploratory analysis and extracting meaningful insights from data are core components of research and data science work. Time series data is commonly encountered. We see it when working with …
Exploring US Real Estate Values with Python
This post covers data exploration using machine learning and interactive plotting. If interested in running the examples, there is a complementary Domino project available. Introduction Models are at the heart of data science. Data exploration is vital to model development and is particularly important at the start of any data …
Natural Language in Python using spaCy: An Introduction
This article provides a brief introduction to natural language using spaCy and related libraries in Python. The complementary Domino project is also available. Introduction This article and paired Domino project provide a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries. …
HyperOpt: Bayesian Hyperparameter Optimization
This article covers how to perform hyperparameter optimization using a sequential model-based optimization (SMBO) technique implemented in the HyperOpt Python package. There is a complementary Domino project available. Introduction Feature engineering and hyperparameter optimization are two important model building steps. Over the years, I have debated with many colleagues as …
Deep Reinforcement Learning
This article provides an excerpt “Deep Reinforcement Learning” from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The article includes an overview of reinforcement learning theory with focus on the deep Q-learning. It also covers using Keras to construct a deep Q-learning network that learns within a simulated …
Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines
This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. The excerpt and complementary Domino project evaluates hyperparameters including GridSearch and RandomizedSearch as well as building an automated ML workflow. Introduction Data scientists, machine learning (ML) researchers, …
Deep Learning Illustrated: Building Natural Language Processing Models
Many thanks to Addison-Wesley Professional for providing the permissions to excerpt “Natural Language Processing” from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The excerpt covers how to create word vectors and utilize them as an input into a deep learning model. A complementary Domino project is available. …
A Practitioner’s Guide to Deep Learning with Ludwig
Joshua Poduska provides a distilled overview of Ludwig including when to use Ludwig’s command-line syntax and when to use its Python API. Introduction New tools are constantly being added to the deep learning ecosystem. It can be fun and informative to look for trends in the type of tools being …