All the data you need.

Tag: Code

Evaluating Ray: Distributed Python for Massive Scalability
Dean Wampler provides a distilled overview of Ray, an open source system for scaling Python systems from single machines to large clusters. If you are interested in additional insights, register for the upcoming Ray Summit. Introduction This post is for people making technology decisions, by which I mean data science …
Data Drift Detection for Image Classifiers
This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in production. Run the example in a complementary Domino project. Introduction: preventing silent model degradation in production In the real word, data is recorded by different …
Techniques for Collecting, Prepping, and Plotting Data: Predicting Social Media-Influence in the NBA
This article provides insight on the mindset, approach, and tools to consider when solving a real-world ML problem. It covers questions to consider as well as collecting, prepping and plotting data. A complementary Domino project is available. Introduction Collecting and prepping data are core research tasks. While the most ideal …
Clustering in R
This article covers clustering including K-means and hierarchical clustering. A complementary Domino project is available. Introduction Clustering is a machine learning technique that enables researchers and data scientists to partition and segment data. Segmenting data into appropriate groups is a core task when conducting exploratory analysis. As Domino seeks to …
Time Series with R
This article delves into methods for analyzing multivariate and univariate time series data. A complementary Domino project is available. Introduction Conducting exploratory analysis and extracting meaningful insights from data are core components of research and data science work. Time series data is commonly encountered. We see it when working with …
Exploring US Real Estate Values with Python
This post covers data exploration using machine learning and interactive plotting. If interested in running the examples, there is a complementary Domino project available. Introduction Models are at the heart of data science. Data exploration is vital to model development and is particularly important at the start of any data …
Natural Language in Python using spaCy: An Introduction
This article provides a brief introduction to natural language using spaCy and related libraries in Python. The complementary Domino project is also available. Introduction This article and paired Domino project provide a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries. …
HyperOpt: Bayesian Hyperparameter Optimization
This article covers how to perform hyperparameter optimization using a sequential model-based optimization (SMBO) technique implemented in the HyperOpt Python package. There is a complementary Domino project available. Introduction Feature engineering and hyperparameter optimization are two important model building steps. Over the years, I have debated with many colleagues as …
Deep Reinforcement Learning
This article provides an excerpt “Deep Reinforcement Learning” from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The article includes an overview of reinforcement learning theory with focus on the deep Q-learning. It also covers using Keras to construct a deep Q-learning network that learns within a simulated …
Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines
This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. The excerpt and complementary Domino project evaluates hyperparameters including GridSearch and RandomizedSearch as well as building an automated ML workflow. Introduction Data scientists, machine learning (ML) researchers, …
Deep Learning Illustrated: Building Natural Language Processing Models
Many thanks to Addison-Wesley Professional for providing the permissions to excerpt “Natural Language Processing” from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The excerpt covers how to create word vectors and utilize them as an input into a deep learning model. A complementary Domino project is available. …
Manual Feature Engineering
Many thanks to AWP Pearson for the permission to excerpt “Manual Feature Engineering: Manipulating Data for Fun and Profit” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. There is also a complementary Domino project available. Introduction Many data scientists deliver value to their organizations by …
A Practitioner’s Guide to Deep Learning with Ludwig
Joshua Poduska provides a distilled overview of Ludwig including when to use Ludwig’s command-line syntax and when to use its Python API. Introduction New tools are constantly being added to the deep learning ecosystem. It can be fun and informative to look for trends in the type of tools being …
Themes and Conferences per Pacoid, Episode 11
Paco Nathan‘s latest article covers program synthesis, AutoPandas, model-driven data queries, and more. Introduction Welcome back to our monthly burst of themespotting and conference summaries. BTW, videos for Rev2 are up: https://rev.dominodatalab.com/rev-2019/ On deck this time ’round the Moon: program synthesis. In other words, using metadata about data science work …
Addressing Irreproducibility in the Wild
This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s “The Ingredients of a Reproducible Machine Learning Model” talk at a recent WiMLDS meetup. Mawer is a Principal Data Scientist at Lineage Logistics as well as an Adjunct Lecturer at Northwestern University. Special thanks to Mawer …
Can Data Science Help Us Make Sense of the Mueller Report?
This blog post provides insights on how to apply Natural Language Processing (NLP) techniques. A complementary Domino project is available. The Mueller Report The Mueller Report, officially known as the Report on the Investigation into Russian Interference in the 2016 Presidential Election, was recently released and gives the public more …
Code submission should be encouraged but not compulsory
ICML, ICLR, and NeurIPS are all considering or experimenting with code and data submission as a part of the reviewer or publication process with the hypothesis that it aids reproducibility of results. Reproducibility has been a rising concern with discussions in paper, workshop, and invited talk. The fundamental driver is …
Code submission should be encouraged but not compulsory
ICML, ICLR, and NeurIPS are all considering or experimenting with code and data submission as a part of the reviewer or publication process with the hypothesis that it aids reproducibility of results. Reproducibility has been a rising concern with discussions in paper, workshop, and invited talk. The fundamental driver is …