All the data you need.

Tag: Python

A Comparative Analysis on Lab vs. Mined Diamonds
Introduction: If I were to hand you two diamonds, one created in a lab and one mined from the earth, would you be able to distinguish one from the other? Would the type of diamond have any impact on your engagement? Lab diamonds are man-made, created by mimicking the extreme heat …
Prescription drug discount coupons: an analysis for small local pharmacies
Prescription drug costs in the United States have gone from $121 billion in 2000 to almost $360 billion in 2020. The average American spends $1,200 per year on prescription drugs, the highest in the world. The high cost is not the result of a higher-than-average consumption …
Introduction to Data Science with Python
Learn data science with Python. Explore the Pandas and Matplotlib libraries for analysing and plotting data.
Enterprise-class NLP with spaCy v3
Introducing the latest features in spaCy 3.0, including transformer pipelines that bring its NLP capabilities up to a state-of-the-art standard.
Circular Statistics in Python: An Intuitive Intro
In this contributed article, Amit Babayoff, a data scientist at Deeyook, discusses circular statistics, looking at some of its basic principles and tools and at why conventional linear methods don’t work well on circular data. She also explores how a simple filter for handling noise can be constructed …
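As a rough illustration of why linear methods struggle with circular data, the sketch below compares an arithmetic mean with a circular mean for two compass headings. It is not taken from the article: the function name `circular_mean_deg` and the sample angles are assumptions made for this example.

```python
import math

def circular_mean_deg(angles_deg):
    """Average angles by summing their unit vectors (hypothetical helper)."""
    # Project each angle onto the unit circle and add up the components.
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # The direction of the summed vector is the circular mean, mapped to [0, 360).
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0

headings = [350.0, 10.0]  # two directions just either side of north

# The arithmetic mean points due south, which is clearly the wrong summary.
linear_mean = sum(headings) / len(headings)

# The circular mean lands at north (0 degrees, or equivalently 360), up to rounding.
circ_mean = circular_mean_deg(headings)
```

Averaging unit vectors instead of raw angle values is what removes the artificial jump between 359° and 0°.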
Building a Plotly Dash on the Ames Housing Dataset
The Ames Housing dataset, basis of an ongoing Kaggle competition and assigned to bootcamp students globally, is a modern classic. It presents 81 features of houses -- mostly single family suburban dwellings -- that were sold in Ames, Iowa in the period 2006-2010, which encompasses the housing crisis. The goal …
Python triple quote strings and regular expressions
There are several ways to quote strings in Python. Triple quotes let strings span multiple lines. Line breaks in your source file become line break characters in your string. A triple-quoted string in Python acts something like “here doc” in other languages. However, Python’s indentation rules complicate matters because the …
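A minimal sketch of the behaviour described above, using only the standard library. The excerpt is cut off, so the use of `textwrap.dedent` for indented strings and of `re.VERBOSE` for a multi-line pattern are assumptions about where such a discussion typically leads, not a summary of the article.

```python
import re
import textwrap

# Line breaks typed in the source become newline characters in the string.
text = """first line
second line"""

def indented_block():
    # Inside an indented function body, the leading spaces on each line
    # are kept as part of the triple-quoted string.
    return """
        alpha
        beta
    """

# textwrap.dedent strips the common leading whitespace; strip() drops
# the blank lines at both ends.
cleaned = textwrap.dedent(indented_block()).strip()

# With re.VERBOSE, whitespace and comments inside the pattern are ignored,
# so a triple-quoted regular expression can be laid out over several lines.
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)

match = date_pattern.fullmatch("2021-03")
```

Without `re.VERBOSE`, the newlines and spaces in the triple-quoted pattern would be treated as literal characters to match.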
Data Science Hobby Project: Web Scraping Book Ratings With BeautifulSoup
Tamas Ujhelyi was one of the first participants in my 6-week data science course (the Junior Data Scientist’s First Month). After finishing the course, he started a cool... The post Data Science Hobby Project: Web Scraping Book Ratings With BeautifulSoup appeared first on Data36.
Information Extraction from Text Using Python
We will implement information extraction from scratch in Python using the popular spaCy library.
How to supercharge data exploration with Pandas Profiling
Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether consisting of summary statistics or descriptive charts. Pandas Profiling, an open-source tool built on Pandas DataFrames, can simplify and accelerate such tasks. This blog explores the challenges associated with doing …
How to get started in AI development
Artificial intelligence is becoming crucial for businesses that want to compete and survive. In fact, if you’re not using AI in 2021, there’s a good chance you’ll lose out to your competitors. Of course, that makes AI development a highly sought-after career, as more companies need to hire skilled …
Crawler with Selenium
Introduction: The crawler comprises several components that make the unstructured data accessible for cleaning. As the data we are looking to scrape here is financial in nature, we draw on several webpages to assemble the data and give it structure. First, the structure is given with two feats, …
PyCaret 2.2: Efficient Pipelines for Model Development
Data science is an exciting field, but it can be intimidating to get started, especially for those new to coding. Even for experienced developers and data scientists, the process of developing a model could involve stringing together many steps from many packages, in ways that might not be as elegant …
Analysis and Predictions of Zillow Rental Index
Objective: The two main objectives of this project are to determine the factors that influence the Zillow Rental Index (ZRI) and to use them to produce annual forecasts of the ZRI at the zip code level. The Zillow Observed Rent Index (ZORI) was used as a benchmark …
Natively Query Your Delta Lake With Scala, Java, and Python
Today, we’re happy to announce that you can natively query your Delta Lake with Scala and Java (via the Delta Standalone Reader) and Python (via the Delta Rust API). Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, …
A Machine Learning Approach to Predicting Loan Defaults
Lending Club was a peer-to-peer lending platform that connected investors to borrowers. It was an alternative to the traditional bank lending system. One of the appeals of Lending Club was that the loans it facilitated carried lower interest rates than bank loans because Lending Club only took a flat …
Python Autocomplete Improvements for Databricks Notebooks
At Databricks, we strive to provide a world-class development experience for data scientists and engineers, and new features are constantly getting added to our notebooks to improve our users’ productivity. We are especially excited about the latest of these features, a new autocomplete experience for Python notebooks (powered by the …
Exploring Avocado Data
Purpose and Goal: Inspired by the popularity of avocado toast among millennials, and having noticed skyrocketing avocado prices in produce sections recently, I wanted to find out which cities in the U.S. offer the most reasonable prices for avocados and to understand the market and its trends better, to hopefully benefit suffering …