All the data you need.

Tag: Code

What is Flutter: check out the advantages of Google’s framework
Can you imagine being able to develop natively compiled applications for multiple platforms such as mobile and desktop, as well as web applications using just a single codebase? This framework […] The post What is Flutter: check out the advantages of Google’s framework appeared first on Datafloq.
Visualizing GitHub repos
Most people are familiar with the file-and-folder view. Sort alphabetically, date, or file… Tags: Amelia Wattenberger, code, GitHub
Explaining black-box models using attribute importance, PDPs, and LIME
The Skater framework is used to illustrate methods for explaining deep learning models, showing how different Skater methods can provide insight into the inner workings of a black-box model: a simple credit scoring neural network. The post Explaining black-box models using attribute importance, PDPs, and LIME appeared first on …
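As a rough illustration of the partial dependence idea, the Python sketch below computes a one-feature PDP from scratch. It does not use Skater's API. `partial_dependence`, `toy_model`, and the three-row dataset are hypothetical stand-ins for the credit scoring network and its data. For each grid value, the chosen feature is overwritten in every row, and the model's predictions are averaged.

```python
from statistics import mean
from typing import Callable, Sequence

Row = list[float]


def partial_dependence(
    predict: Callable[[Row], float],
    data: Sequence[Row],
    feature: int,
    grid: Sequence[float],
) -> list[float]:
    """For each grid value, force `feature` to that value in every row and average the predictions."""
    curve = []
    for value in grid:
        preds = []
        for row in data:
            modified = list(row)       # copy so the original data is untouched
            modified[feature] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


# Hypothetical "black box": a toy linear score standing in for a real model.
def toy_model(row: Row) -> float:
    income, debt = row
    return 0.5 * income - 2.0 * debt


data = [[10.0, 1.0], [20.0, 2.0], [30.0, 3.0]]
print(partial_dependence(toy_model, data, feature=1, grid=[0.0, 1.0, 2.0]))
```

Because the toy model is linear in `debt`, the resulting curve falls by 2.0 for each unit of `debt`. With a real black-box model, the curve's shape is exactly what a PDP is meant to reveal.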
Building a Named Entity Recognition model using a BiLSTM-CRF network
What is the Named Entity Recognition problem, and how can a BiLSTM-CRF model be fitted? Learn how by using a freely available annotated corpus and Keras. The model achieves relatively high accuracy, and all data and code are freely available in the article. The post Building a Named Entity Recognition …
Accelerating model velocity through Snowflake Java UDF integration
Integrating Domino and Snowflake, and using in-database machine learning and data processing techniques via user-defined functions (UDFs). The post Accelerating model velocity through Snowflake Java UDF integration appeared first on Data Science Blog by Domino.
Fitting Support Vector Machines via Quadratic Programming
A deep dive into Support Vector Machines: deriving a linear SVM classifier, explaining its advantages, and showing the fitting process. The post Fitting Support Vector Machines via Quadratic Programming appeared first on Data Science Blog by Domino.
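For reference, the standard hard-margin linear SVM can be written as a quadratic program. The formulation below assumes labels $y_i \in \{-1, +1\}$ and $n$ training points $\mathbf{x}_i$. The article may instead use the soft-margin variant with slack variables, which adds an upper bound $C$ on each $\alpha_i$ in the dual. The primal problem is shown first, followed by its dual, which is the form typically handed to a generic QP solver. The last line recovers $\mathbf{w}$ and $b$ from the dual solution.

```latex
\begin{align*}
&\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
  \quad \text{s.t.} \quad y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, n \\[4pt]
&\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, \mathbf{x}_i^\top \mathbf{x}_j
  \quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \\[4pt]
&\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i, \qquad
  b = y_k - \mathbf{w}^\top \mathbf{x}_k \ \text{ for any } k \text{ with } \alpha_k > 0
\end{align*}
```

Only the support vectors (points with $\alpha_i > 0$) contribute to $\mathbf{w}$. This sparsity is one of the advantages usually cited for SVMs.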
ML internals: Synthetic Minority Oversampling (SMOTE) Technique
In this article, we discuss why fitting models on imbalanced datasets is problematic and how class imbalance is typically addressed. We present the inner workings of the SMOTE algorithm and show a simple “from scratch” implementation of SMOTE. We use an artificially constructed imbalanced dataset (based on Iris) to generate …
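The core SMOTE step takes a minority sample, picks one of its k nearest minority-class neighbours, and places a synthetic point at a random position on the line segment between them. The Python sketch below shows that step. It is not the article's implementation. It picks base samples at random (a common simplification of the original per-sample loop) and uses a small made-up 2-D minority set instead of Iris. `smote` and its parameters are hypothetical names.

```python
import random
from typing import Sequence

Point = tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance (enough for ranking neighbours)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> list[Point]:
    """Create n_new synthetic minority samples by interpolating toward random nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic: list[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of `base`, excluding itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: _dist2(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2)
print(new_points[:3])
```

Every synthetic point lies between two real minority samples, so this toy example never leaves the unit square. In practice, the synthetic points are appended to the training set before fitting.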
✚ Making a Quick, Custom Prevalence Map – The Process 139
This week I'm describing my process behind a quick map. You can download the code at the end of this issue. Tags: code, R
Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving
In this article, we’ll discuss the challenges organizations face around fraud detection and how machine learning can be used to spot anomalies that the human eye might not catch. We’ll use a gradient boosting technique via XGBoost to create a model and I’ll walk you through steps you can …
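Threshold moving means replacing the default 0.5 cut-off on predicted fraud probabilities with a threshold chosen on validation data. The Python sketch below assumes the scores already come from a trained classifier, such as XGBoost's predicted probabilities. Here the scores and labels are made-up numbers, and F1 is an assumed selection criterion, not necessarily the article's choice. `f1_at` and `best_threshold` are hypothetical helpers.

```python
from typing import Sequence


def f1_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """F1 score when every score >= threshold is flagged as fraud (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> tuple[float, float]:
    """Try each observed score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: f1_at(t, scores, labels))
    return best, f1_at(best, scores, labels)


# Made-up validation scores (e.g. predicted probabilities) and true labels.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1]

print("F1 at 0.5:", f1_at(0.5, scores, labels))
print("best threshold and F1:", best_threshold(scores, labels))
```

On these numbers, the default 0.5 cut-off misses one fraud case. A lower threshold catches it at the cost of one false positive and still gives a higher F1. With heavily imbalanced fraud data, this trade-off is usually worth tuning explicitly.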
On-Demand Spark clusters with GPU acceleration
Apache Spark has become the de facto standard for processing large amounts of stationary and streaming data in a distributed fashion. The addition of the MLlib library, consisting of common learning algorithms and utilities, opened up Spark for a wide range of machine learning tasks and paved the way for running …
How to supercharge data exploration with Pandas Profiling
Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether consisting of summary statistics or descriptive charts. Pandas Profiling, an open-source tool that works on pandas DataFrames, can simplify and accelerate such tasks. This blog explores the challenges associated with doing …
Snowflake and Domino: Better Together
Arming data science teams with the access and capabilities needed to establish a two-way flow of information is one critical challenge many organizations face when it comes to unlocking value from their modeling efforts. Part of this challenge is that many organizations seek to align their data science workflows …
PyCaret 2.2: Efficient Pipelines for Model Development
Data science is an exciting field, but it can be intimidating to get started, especially for those new to coding. Even for experienced developers and data scientists, the process of developing a model could involve stringing together many steps from many packages, in ways that might not be as elegant …
Performing Non-Compartmental Analysis with Julia and Pumas AI
When analysing pharmacokinetic data to determine the degree of exposure of a drug and associated pharmacokinetic parameters (e.g., clearance, elimination half-life, maximum observed concentration $C_{\max}$, and time at which the maximum concentration was observed $T_{\max}$), Non-Compartmental Analysis (NCA) is usually the preferred approach [1]. At its core, NCA is based on applying …
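To make $C_{\max}$, $T_{\max}$, and exposure concrete, the Python sketch below computes them from a single concentration-time profile. The area under the curve up to the last sample is estimated with the linear trapezoidal rule, one common NCA choice. The article works in Julia with Pumas, so this is not Pumas code. `nca_summary` and the sample profile are hypothetical. The sketch also skips extrapolation to infinity and half-life estimation.

```python
from typing import Sequence


def nca_summary(times: Sequence[float], conc: Sequence[float]) -> dict[str, float]:
    """Cmax, Tmax and AUC(0-tlast) for one subject, using the linear trapezoidal rule."""
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need matching time and concentration series with >= 2 points")
    # Index of the first occurrence of the highest observed concentration.
    i_max = max(range(len(conc)), key=lambda i: conc[i])
    # Sum of trapezoids between consecutive sampling times.
    auc = sum(
        (times[i + 1] - times[i]) * (conc[i] + conc[i + 1]) / 2
        for i in range(len(times) - 1)
    )
    return {"cmax": conc[i_max], "tmax": times[i_max], "auc_0_tlast": auc}


# Made-up profile: sampling times in hours, concentrations in mg/L.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
conc = [0.0, 4.0, 6.0, 3.0, 1.0]
print(nca_summary(times, conc))
```

Here the trapezoids contribute 2 + 5 + 9 + 8, so AUC(0-8 h) is 24 mg·h/L, with $C_{\max}$ = 6 mg/L at $T_{\max}$ = 2 h.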
Density-Based Clustering
Original content by Manojit Nandi, updated by Josh Poduska. Cluster analysis is an important problem in data analysis. Data scientists use clustering to identify malfunctioning servers, group genes with similar expression patterns, and support many other applications. There are many families of data clustering algorithms, and you may be …
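DBSCAN is the classic density-based clustering algorithm, and the Python sketch below uses it as a representative example. The article may also cover other methods, such as HDBSCAN. A point with at least `min_pts` neighbours within radius `eps` (counting itself) is a core point. Clusters grow outward from core points, and points that are not reachable from any core point are labelled as noise. The function name, the 2-D points, and the `eps`/`min_pts` values are illustrative assumptions. The neighbour search is brute force, O(n²).

```python
from typing import Optional, Sequence

Point = tuple[float, float]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> list[int]:
    """Return a cluster id per point; NOISE (-1) marks outliers."""

    def neighbours(i: int) -> list[int]:
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels: list[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:     # not a core point; may later become a border point
            labels[i] = NOISE
            continue
        labels[i] = cluster          # start a new cluster from this core point
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:   # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_seeds)
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
          (10.0, 10.0), (10.0, 10.5), (10.5, 10.0),
          (50.0, 50.0)]
print(dbscan(points, eps=1.0, min_pts=3))
```

Unlike k-means, DBSCAN does not need the number of clusters up front and can leave isolated points unassigned. This makes it a natural fit for tasks like spotting malfunctioning servers.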
Bringing ML to Agriculture: Transforming a Millennia-old Industry
Guest post by Jeff Melching, Distinguished Engineer / Chief Architect, Data & Analytics. At The Climate Corporation, we aim to help farmers better understand their operations and make better decisions to increase their crop yields in a sustainable way. We’ve developed a model-driven software platform, called Climate FieldView™, that captures, …
The Curse of Dimensionality
Guest post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is all the rage. This could mean lots of rows (samples) and few columns (variables), like credit card transaction data, or lots of columns (variables) and few rows (samples), like genomic sequencing in life …
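One well-known symptom of the curse of dimensionality is distance concentration: as the number of columns grows, the nearest and farthest points from a query end up almost equally far away. The Python sketch below illustrates this with uniformly random points. It is not taken from the article. The function name, point counts, and seed are arbitrary assumptions. It reports the relative contrast (max distance − min distance) / min distance.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(max - min) / min distance from a random query to n random points in the unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

The contrast shrinks sharply as the dimension grows. Distance-based notions such as "nearest neighbour" therefore carry less information when there are many more variables than the data can support.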
Providing fine-grained, trusted access to enterprise datasets with Okera and Domino
Domino and Okera: provide data scientists with access to trusted datasets within reproducible and instantly provisioned computational environments. In the last few years, we’ve seen the acceleration of two trends: the increasing amounts of data stored and utilized by organizations, and the subsequent need for data scientists to help …