All the data you need.
AI is the Most Disruptive Marketing Trend Since the Printing Press
The market for big data and AI is surging. One recent study projects that the global market for these technologies will be worth $229 billion within the next five years. Industries that implement AI stand to gain many benefits: healthcare, finance, communications, retail, and even art companies are making use …
Gradient Boosting with Scikit-Learn, XGBoost, LightGBM, and CatBoost
Gradient boosting is a powerful ensemble machine learning algorithm. It’s popular for structured predictive modeling problems, such as classification and regression on tabular data, and is often the main algorithm, or one of the main algorithms, used in winning solutions to machine learning competitions like those on Kaggle. There are …
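To make the idea concrete, here is a minimal pure-Python sketch of gradient boosting for regression with squared-error loss. Each round fits a one-split decision stump to the current residuals (for squared error, the negative gradient is proportional to the residual) and adds a shrunken copy of it to the ensemble. The function names, the learning rate of 0.1, and the use of stumps on a single feature are illustrative assumptions. This is not the API of scikit-learn, XGBoost, LightGBM, or CatBoost, which grow deeper trees and add regularization and many other refinements.

```python
from typing import List, Tuple

# Illustrative sketch only; none of these names come from a real library.
# A stump is (threshold, value_if_x_le_threshold, value_if_x_gt_threshold).
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Pick the single split on a 1-D feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must put at least one point on each side
        l_mean = sum(left) / len(left)
        r_mean = sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    if best is None:
        # All x values are identical: predict the mean residual everywhere.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    _, t, l_mean, r_mean = best
    return (t, l_mean, r_mean)


def predict_stump(stump: Stump, x: float) -> float:
    t, left_value, right_value = stump
    return left_value if x <= t else right_value


def gradient_boost(xs: List[float], ys: List[float], n_rounds: int = 100,
                   learning_rate: float = 0.1):
    """Fit F(x) = base + learning_rate * sum(stumps) under squared-error loss."""
    base = sum(ys) / len(ys)  # best constant model under squared error
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is (proportional to) y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x: float) -> float:
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(round(predict(model, 1), 3), round(predict(model, 6), 3))  # prints: 1.0 5.0
```

The small learning rate means each stump corrects only a tenth of the remaining error, which is the shrinkage that makes boosting robust; the libraries named in the title expose the same idea through a learning-rate parameter.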
foreach 1.5.0 now available on CRAN
This post announces that version 1.5.0 of the foreach package is now on CRAN. Foreach is an idiom that allows for iterating over elements in a collection without the use of an explicit loop counter. The foreach package is now more than 10 years old and is used …
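The idiom itself is language-agnostic. As a rough Python analogue (not the R package's API), the hypothetical `foreach` helper below applies a body to every element of a collection with no loop counter and, loosely like the R package's `.combine` argument, can optionally fold the per-element results into a single value.

```python
import math
import operator
from functools import reduce
from typing import Callable, Iterable, List, Optional, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


def foreach(items: Iterable[T], body: Callable[[T], R],
            combine: Optional[Callable[[R, R], R]] = None) -> Union[List[R], R]:
    """Hypothetical Python helper loosely modeled on R's foreach; not from any library.

    Evaluates body(item) for each item without an index variable and returns
    the list of results, or folds them with `combine`. Folding an empty
    collection raises TypeError, as functools.reduce does without an initial value.
    """
    results = [body(item) for item in items]
    if combine is None:
        return results
    return reduce(combine, results)


# Explicit-counter style, for contrast:
values = [1, 4, 9]
roots = []
for i in range(len(values)):
    roots.append(math.sqrt(values[i]))

print(foreach(values, math.sqrt))                         # prints: [1.0, 2.0, 3.0]
print(foreach([1, 2, 3], lambda x: x * x, operator.add))  # prints: 14
```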
Data Science with Azure Databricks at Clifford Chance
Guest blog by Mirko Bernardoni (Fiume Ltd) and Lulu Wan (Clifford Chance). With headquarters in London, Clifford Chance is a member of the “Magic Circle” of law firms and is one of the ten largest law firms in the world, as measured by both number of lawyers and revenue. As a …
How Technology Can Help Improve Your Rental Property Business
There is no shortage of things to do when you are a landlord, ranging from staying up to date with leases to dealing with tenant repair requests. When you own more than one property, the workload multiplies and important tasks can be forgotten. Luckily, there are a …
How Computer Graphics and Big Data Gave Birth to Today’s Artificial Intelligence (AI)
The explosion of breakthroughs, investments, and entrepreneurial activity around artificial intelligence over the last decade has been driven exclusively by deep learning, a sophisticated statistical analysis technique for finding hidden patterns in large quantities of data. A term coined in …
The Unpredictable Curve of COVID-19
Editor’s note: Amanda Makulec is joining as an advisor to the Coronavirus Data Resource Hub. As someone who holds a Master of Public Health and serves as Operations Director for the Data Visualization Society, she’s an expert in the responsible use of data visualization for public health. She will be helping the Tableau …
Video Highlights: Python Machine Learning Tutorials
With COVID-19 keeping everyone indoors, this is the perfect opportunity to brush up on your data science skills. Data science is a booming field that plays a huge role in society. In this regular feature column, as an alternative to just reading a book, I will provide some great video …
10 Minutes from pandas to Koalas on Apache Spark
This is a guest community post from Haejoon Lee, a software engineer at Mobigen in South Korea and a Koalas contributor. pandas is a great tool to analyze small datasets on a single machine. When the need for bigger datasets arises, users often choose PySpark. However, converting code from …
Identifying Leakage in Computer Vision on Medical Images
Computer vision has been proposed as a tool in the fight against COVID-19. In this article, we share our preliminary research on two different datasets. Confirming earlier results, we found flaws in existing approaches caused by leakage when COVID image datasets are built from heterogeneous sources. We explain here …
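One common form of leakage in medical imaging is letting images from the same patient (or the same source dataset) land in both the training and test sets, so a model can score well by recognizing the patient or source rather than the disease. The sketch below is a generic illustration, not the article's method: it splits by a group identifier and checks for overlap. The `group_split` and `shared_groups` names and the patient-ID grouping are assumptions.

```python
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical helpers for illustration; not taken from the article.


def group_split(group_ids: Sequence[str], test_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each group (e.g. a patient) is entirely in train or test."""
    groups = sorted(set(group_ids))
    rng = random.Random(seed)
    rng.shuffle(groups)
    # At least one test group; with only one group, train would be empty.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [i for i, g in enumerate(group_ids) if g not in test_groups]
    test = [i for i, g in enumerate(group_ids) if g in test_groups]
    return train, test


def shared_groups(group_ids: Sequence[str], train: Sequence[int],
                  test: Sequence[int]) -> Set[str]:
    """Return the groups that appear on both sides of a split (should be empty)."""
    return {group_ids[i] for i in train} & {group_ids[i] for i in test}


# One entry per image: the (made-up) patient each image came from.
patients = ["p1", "p1", "p2", "p2", "p3", "p4", "p4", "p5"]

# A naive split by image index leaks p1 and p2 across the boundary.
print(sorted(shared_groups(patients, [0, 2, 4], [1, 3])))  # prints: ['p1', 'p2']

train_idx, test_idx = group_split(patients, test_fraction=0.4)
print(shared_groups(patients, train_idx, test_idx))  # prints: set()
```

In the heterogeneous-source setting the article describes, the same check applies with the source dataset as the group, although a split by source alone may not remove every shortcut a model can learn.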
Evolution of Census questions
On the surface, the decennial census seems straightforward. Count everyone in the country…
Tags: Alec Barrett, census, questions, The Pudding
3 Ways to Scrape Data from a Table
A lot of the data on web pages is presented in table format. However, storing that data on a local computer for later access can be quite difficult. The problem would be…
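As one possible approach (not necessarily one of the three the article covers), Python's standard-library `html.parser` can pull the cell text out of an HTML table and `csv` can save it for later use. The class and function names are illustrative assumptions, and the parser assumes well-formed markup with explicit closing `</td>`, `</th>`, and `</tr>` tags and no nested tables. Fetching the page itself (for example with `urllib.request`) is left out.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Illustrative parser: collect the text of each <th>/<td> cell, grouped by <tr> row.

    Assumes explicit closing tags and no nested tables.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html: str) -> str:
    """Hypothetical helper: return the table rows as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ann</td><td>34</td></tr>
</table>
"""
print(table_to_csv(html), end="")
```

To keep the result, write the CSV text to a file opened with `newline=""`, as the `csv` module documentation recommends.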
Reasoning under uncertainty
Reasoning under uncertainty sounds intriguing. Brings up images of logic, philosophy, and artificial intelligence. Statistics sounds boring. Brings up images of tedious, opaque calculations followed by looking up some number in a table. But statistics is all about reasoning under uncertainty. Many people get through required courses in statistics without ever …
Natality & Mortality Rates
The Centers for Disease Control and Prevention (CDC) annually publishes material on births and infant deaths in the United States. It gathers a tremendous amount of data about each child, including categories such as parental education, age, health status, and tobacco use. …
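Both kinds of rate are conventionally expressed per 1,000: the crude birth rate counts live births per 1,000 population, and the infant mortality rate counts deaths before age one per 1,000 live births. Here is a minimal sketch of the arithmetic, with hypothetical function names and illustrative round numbers rather than CDC figures.

```python
# Hypothetical helpers for illustration; the inputs below are not CDC data.


def rate_per_1000(events: float, base: float) -> float:
    """Events per 1,000 units of the base population."""
    if base <= 0:
        raise ValueError("base must be positive")
    return 1000 * events / base


def crude_birth_rate(live_births: float, population: float) -> float:
    # Live births per 1,000 people in the population.
    return rate_per_1000(live_births, population)


def infant_mortality_rate(infant_deaths: float, live_births: float) -> float:
    # Deaths of infants under one year old per 1,000 live births.
    return rate_per_1000(infant_deaths, live_births)


# Illustrative numbers only.
print(crude_birth_rate(12_000, 1_000_000))  # prints: 12.0
print(infant_mortality_rate(60, 10_000))    # prints: 6.0
```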
Flattening the curve: How well is your county doing coronavirus social distancing?
The coronavirus is spreading quickly through the United States, and many elected officials across the country are responding with policies designed to slow the spread of the virus, a goal otherwise known as “flattening the curve.” One of the key methods they’re encouraging is social distancing: the practice of people limiting their movements …
Named Entity Recognition: Concept, Guide and Tools
What Is Named Entity Recognition? Named entity recognition (NER), also called entity identification or... The post Named Entity Recognition: Concept, Guide and Tools appeared first on MonkeyLearn Blog.
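To illustrate the task itself, here is the simplest possible baseline: a dictionary (gazetteer) lookup that tags the longest matching phrase at each position. This is a deliberately naive stand-in; production NER tools typically rely on trained statistical or neural models that generalize to names absent from any list. The function name, the whitespace tokenization, and the labels are assumptions.

```python
from typing import Dict, List, Tuple


def gazetteer_ner(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Hypothetical baseline: tag entities by longest exact match against a dictionary.

    Tokens are split on whitespace; real NER systems use learned models instead.
    """
    tokens = text.split()
    max_len = max((len(name.split()) for name in gazetteer), default=0)
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            # Strip surrounding punctuation so "London." matches "London".
            candidate = " ".join(tokens[i:i + n]).strip(".,;:!?")
            if candidate in gazetteer:
                entities.append((candidate, gazetteer[candidate]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


gazetteer = {"Clifford Chance": "ORG", "London": "LOC", "Lulu Wan": "PERSON"}
print(gazetteer_ner("Lulu Wan works at Clifford Chance in London.", gazetteer))
# prints: [('Lulu Wan', 'PERSON'), ('Clifford Chance', 'ORG'), ('London', 'LOC')]
```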
Analytics Extensions connections, Connection Dialog v2 for Connector SDK, and Hyper API monthly updates
As part of the Tableau Developer Program, we host monthly Sprint Demos, where members of the engineering team demo what they have been working on for our developer community. It is a chance for you to meet the team, be informed and inspired by upcoming features, ask questions, and give …
Simulating an epidemic
3Blue1Brown goes into more of the math of SIR models, which drive…
Tags: 3Blue1Brown, coronavirus, epidemic, simulation
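The SIR model splits a population of size N into susceptible (S), infected (I), and recovered (R) groups, with dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI. The sketch below integrates these equations with a simple forward-Euler step. The parameter values, the step size, and the function name are illustrative assumptions, not taken from the 3Blue1Brown video.

```python
from typing import List, Tuple


def simulate_sir(s_init: float, i_init: float, r_init: float, beta: float,
                 gamma: float, days: float,
                 dt: float = 0.1) -> List[Tuple[float, float, float, float]]:
    """Illustrative forward-Euler integration of the SIR model.

    beta is the transmission rate and gamma the recovery rate (both per day).
    Returns a list of (t, S, I, R) tuples, one per time step.
    """
    n = s_init + i_init + r_init  # total population, constant in this model
    s, i, r = float(s_init), float(i_init), float(r_init)
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Hypothetical outbreak: 1 infected person in 1,000, with beta / gamma = 3.
run = simulate_sir(999, 1, 0, beta=0.3, gamma=0.1, days=160)
peak_t, _, peak_i, _ = max(run, key=lambda row: row[2])
print(f"infections peak near day {peak_t:.0f} with about {peak_i:.0f} infected")
```

Lowering `beta`, which is what social distancing aims to do, produces a lower and later peak in this model: the "flattening the curve" idea in concrete form.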