
Tag: Model Management

Modeling 101: How It Works and Why It’s Important
Models are the central output of data science, and they have tremendous power to transform companies, industries, and society. At the center of every machine learning or artificial intelligence application is the ML/AI model that is built with data, algorithms and code. Even though models look like software and involve …
8 Modeling Tools to Build Complex Algorithms
For a model-driven enterprise, having access to the appropriate tools can mean the difference between operating at a loss with a string of late projects lingering ahead of you and exceeding productivity and profitability forecasts. This is no exaggeration by any means. With the right tools, your data science teams …
What Is Model Risk Management and How Is It Supported by Enterprise MLOps?
Model Risk Management is about reducing the bad consequences of decisions that rely on incorrect or misused model outputs. An enterprise typically starts by adopting a framework to formalize its processes and procedures, a task that becomes increasingly difficult as its data science program grows. Systematically enabling model development and production deployment at scale entails …
The Role of Containers in MLOps and Model Production
Container technology has changed the way data science gets done. The original container use case for data science focused on what I call “environment management”. Configuring software environments is a constant chore, especially in the open-source space where most data scientists work. It often requires …
PyCaret 2.2: Efficient Pipelines for Model Development
Data science is an exciting field, but it can be intimidating to get started, especially for those new to coding. Even for experienced developers and data scientists, the process of developing a model can involve stringing together many steps from many packages, in ways that might not be as elegant …
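As a rough sketch of how a low-code pipeline compresses those steps, the snippet below uses PyCaret 2.x's classification module on one of its bundled demo datasets; the dataset, target column, and session settings are illustrative rather than taken from the article.

    # Minimal PyCaret 2.x classification sketch (illustrative, not the article's exact workflow)
    from pycaret.datasets import get_data
    from pycaret.classification import setup, compare_models, finalize_model, predict_model

    data = get_data("juice")                                           # small bundled demo dataset
    setup(data=data, target="Purchase", session_id=123, silent=True)   # builds the preprocessing pipeline
    best = compare_models()                                            # trains and ranks candidate models
    final = finalize_model(best)                                       # refits the winner on all of the data
    preds = predict_model(final, data=data)                            # scores a dataset with the final model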
MLflow Model Registry on Databricks Simplifies MLOps With CI/CD Features
MLflow helps organizations manage the ML lifecycle through the ability to track experiment metrics, parameters, and artifacts, as well as deploy models to batch or real-time serving systems. The MLflow Model Registry provides a central repository to manage the model deployment lifecycle, acting as the hub between experimentation and deployment. …
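For readers new to the Registry, the basic workflow is to register a logged model and then promote a version through stages. The sketch below assumes a tracking server with a database-backed registry; the model, dataset, registry name, and stage are placeholders, not part of the announcement.

    # Sketch: log a model, register it, and promote a version (names are placeholders)
    import mlflow
    import mlflow.sklearn
    from mlflow.tracking import MlflowClient
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    with mlflow.start_run() as run:
        mlflow.sklearn.log_model(LogisticRegression(max_iter=200).fit(X, y), "model")

    # Register the logged model as a new version in the central Model Registry
    result = mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-classifier")

    # Promote that version out of "None"/"Staging"
    client = MlflowClient()
    client.transition_model_version_stage(
        name="demo-classifier",
        version=result.version,
        stage="Production",
    )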
Domino Paves the Way for the Future of Enterprise Data Science with Latest Release
Today, we announced the latest release of Domino’s data science platform, which represents a big step forward for enterprise data science teams. We’re introducing groundbreaking new features – including On-demand Spark clusters, enhanced project management, and the ability to export models – that give enterprises unprecedented power to scale their …
Manage and Scale Machine Learning Models for IoT Devices
A common data science internet of things (IoT) use case involves training machine learning models on real-time data coming from an army of IoT sensors. Some use cases demand that each connected device have its own individual model, since many simple per-device models often outperform a single complex model. …
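A hedged sketch of that per-device pattern: group the sensor data by device ID and fit one lightweight model per group instead of a single global model. The file name, column names, and estimator below are assumptions for illustration only.

    # Sketch: one lightweight model per IoT device (file and column names are illustrative)
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    readings = pd.read_csv("sensor_readings.csv")  # assumed columns: device_id, temp, humidity, target
    models = {}

    for device_id, group in readings.groupby("device_id"):
        X = group[["temp", "humidity"]]
        y = group["target"]
        models[device_id] = LinearRegression().fit(X, y)  # trained on this device's data only

    # At scoring time, look up the device's own model:
    # y_hat = models[some_device_id].predict(new_X)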
Databricks Extends MLflow Model Registry with Enterprise Features
We are excited to announce new enterprise-grade features for the MLflow Model Registry on Databricks. The Model Registry is now enabled by default for all customers using Databricks’ Unified Analytics Platform. In this blog, we want to highlight the benefits of the Model Registry as a centralized hub for …
Evaluating Generative Adversarial Networks (GANs)
This article provides concise insights into GANs to help data scientists and researchers assess whether to investigate GANs further. If you are interested in a tutorial as well as hands-on code examples within a Domino project, then consider attending the upcoming webinar, “Generative Adversarial Networks: A Distilled Tutorial”. Introduction With …
Data Drift Detection for Image Classifiers
This article covers how to detect data drift for models that ingest image data as their input, in order to prevent their silent degradation in production. Run the example in a complementary Domino project. Introduction: preventing silent model degradation in production In the real world, data is recorded by different …
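The article's specific technique isn't reproduced here, but one common generic approach is to compare the distribution of a summary statistic of incoming images against a reference window, for example with a two-sample Kolmogorov–Smirnov test; the statistic and threshold below are illustrative assumptions.

    # Generic drift-detection sketch (not the article's exact method): compare mean pixel
    # intensity between a reference image batch and a live image batch.
    import numpy as np
    from scipy.stats import ks_2samp

    def mean_intensity(images: np.ndarray) -> np.ndarray:
        """images has shape (n, H, W) or (n, H, W, C); returns one summary value per image."""
        return images.reshape(len(images), -1).mean(axis=1)

    def has_drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
        _, p_value = ks_2samp(mean_intensity(reference), mean_intensity(live))
        return p_value < alpha  # a small p-value suggests the two windows differ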
Model Interpretability: The Conversation Continues
This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from Bin Yu, W. James Murdoch, Chandan Singh, Karl Kumbier, and Reza Abbasi-Asl’s recent paper, “Definitions, methods, and applications in interpretable machine learning”. Introduction Model interpretability continues to …
On Being Model-driven: Metrics and Monitoring
This article covers a couple of key Machine Learning (ML) vital signs to consider when tracking ML models in production to ensure model reliability, consistency and performance in the future. Many thanks to Don Miner for collaborating with Domino on this article. For additional vital signs and insight beyond what …
Understanding Causal Inference
This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available. Introduction As data science work is experimental and probabilistic in nature, data scientists …
Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines
This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. The excerpt and complementary Domino project evaluate hyperparameter tuning methods, including GridSearch and RandomizedSearch, as well as building an automated ML workflow. Introduction Data scientists, machine learning (ML) researchers, …
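As a quick illustration of the two search strategies named in the excerpt, here is a hedged scikit-learn sketch that tunes a small pipeline; the estimator, dataset, and parameter grid are placeholders rather than Fenner's examples.

    # Sketch: grid search vs. randomized search over a scikit-learn pipeline (illustrative values)
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
    params = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}

    grid = GridSearchCV(pipe, params, cv=5).fit(X, y)                                    # exhaustive search
    rand = RandomizedSearchCV(pipe, params, n_iter=5, cv=5, random_state=0).fit(X, y)    # sampled search

    print(grid.best_params_, rand.best_params_)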
Data Ethics: Contesting Truth and Rearranging Power
This Domino Data Science Field Note covers Chris Wiggins’s recent data ethics seminar at Berkeley. The article focuses on 1) proposed frameworks for defining and designing for ethics and for understanding the forces that encourage industry to operationalize ethics, as well as 2) proposed ethical principles for data scientists to …
Announcing the MLflow 1.1 Release
We’re excited to announce today the release of MLflow 1.1. In this release, we’ve focused on fleshing out the tracking component of MLflow and improving visualization components in the UI. Some of the major features include: automatic logging from TensorFlow and Keras, parallel coordinate plots in the tracking UI, Pandas …
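For context, enabling the autologging mentioned above is a one-line call before training; the tiny Keras model and synthetic data below are placeholders that only show where the call sits.

    # Sketch: MLflow automatic logging for a Keras model (model and data are placeholders)
    import numpy as np
    import tensorflow as tf
    import mlflow.keras

    mlflow.keras.autolog()  # logs metrics, params, and the model artifact for each fit() call

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")

    x_train = np.random.rand(32, 4).astype("float32")  # synthetic stand-in data
    y_train = np.random.rand(32, 1).astype("float32")
    model.fit(x_train, y_train, epochs=2)  # the run appears in the MLflow tracking UI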
Seeking Reproducibility within Social Science: Search and Discovery
Julia Lane, NYU professor, economist, and cofounder of the Coleridge Initiative, presented “Where’s the Data: A New Approach to Social Science Search & Discovery” at Rev. Lane described the approach the Coleridge Initiative is taking to address the science reproducibility challenge. The approach is to provide remote access for …