Tag: Python

Data Science Hobby Project: Web Scraping Book Ratings With BeautifulSoup
Tamas Ujhelyi was one of the first participants in my 6-week data science course (the Junior Data Scientist’s First Month). After finishing the course, he started a cool... The post Data Science Hobby Project: Web Scraping Book Ratings With BeautifulSoup appeared first on Data36.
Information Extraction from Text Using Python
We will implement information extraction from scratch in Python using the popular spaCy library.
How to supercharge data exploration with Pandas Profiling
Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether these consist of summary statistics or descriptive charts. Pandas Profiling, an open-source tool built on pandas DataFrames, can simplify and accelerate such tasks. This blog explores the challenges associated with doing …
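At its core, a dataset profile of the kind described above is a set of per-column summary statistics. The sketch below is not Pandas Profiling's API. It is a minimal, standard-library-only illustration that assumes the data arrives as a list of dicts; the `profile` function and the sample rows are hypothetical. It reports count, missing values and distinct values for every column, plus mean, standard deviation, min and max for purely numeric columns.

```python
import statistics
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return a per-column summary: a tiny, hypothetical stand-in for a dataset profile."""
    columns = sorted({key for row in rows for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary: dict[str, Any] = {
            "count": len(present),                  # non-missing values
            "missing": len(values) - len(present),  # None or absent keys
            "distinct": len(set(present)),
        }
        # Only columns whose present values are all numbers (bools excluded) get descriptive stats
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            summary["mean"] = statistics.fmean(numeric)
            summary["stdev"] = statistics.stdev(numeric) if len(numeric) > 1 else 0.0
            summary["min"] = min(numeric)
            summary["max"] = max(numeric)
        report[col] = summary
    return report


if __name__ == "__main__":
    sample = [
        {"city": "Austin", "price": 1.20},
        {"city": "Boston", "price": 1.60},
        {"city": "Austin", "price": None},
    ]
    for column, stats in profile(sample).items():
        print(column, stats)
```

A real profiling tool adds type inference, histograms, correlations and charts on top of this; the sketch only shows the summary-statistics layer.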
How to get started in AI development
Artificial intelligence is becoming crucial for businesses that want to compete and survive. In fact, if you’re not using AI in 2021, there’s a good chance you will lose out to your competitors. Of course, that makes AI development a highly sought-after career, as more companies need to hire skilled …
Crawler with Selenium
Introduction: The crawler consists of several components that make the unstructured data accessible for cleaning. Because the data we want to scrape here is financial in nature, we draw on several webpages to assemble it and give it structure. First, the structure is given with two feats, …
PyCaret 2.2: Efficient Pipelines for Model Development
Data science is an exciting field, but it can be intimidating to get started, especially for those new to coding. Even for experienced developers and data scientists, the process of developing a model could involve stringing together many steps from many packages, in ways that might not be as elegant …
Analysis and Predictions of Zillow Rental Index
Objective: The two main objectives of this project are to determine the factors that influence the Zillow Rental Index (ZRI) and to use them to produce annual forecasts of the ZRI at the zip code level. The Zillow Observed Rent Index (ZORI) was used as a benchmark …
Natively Query Your Delta Lake With Scala, Java, and Python
Today, we’re happy to announce that you can natively query your Delta Lake with Scala and Java (via the Delta Standalone Reader) and Python (via the Delta Rust API). Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, …
A Machine Learning Approach to Predicting Loan Defaults
Lending Club was a peer-to-peer lending platform that connected investors to borrowers, offering an alternative to the traditional bank lending system. One of Lending Club's appeals was that the loans it facilitated carried lower interest rates than bank loans because Lending Club only took a flat …
Python Autocomplete Improvements for Databricks Notebooks
At Databricks, we strive to provide a world-class development experience for data scientists and engineers, and new features are constantly getting added to our notebooks to improve our users’ productivity. We are especially excited about the latest of these features, a new autocomplete experience for Python notebooks (powered by the …
Exploring Avocado Data
Purpose and Goal: Inspired by the popularity of avocado toast among millennials, and by recently skyrocketing avocado prices in produce sections, I wanted to find out which U.S. cities offer the most reasonable avocado prices and to better understand the market and its trends, hopefully to benefit suffering …
How Retina Uses Databricks Container Services to Improve Efficiency and Reduce Costs
This is a guest community post authored by Brad Ito, CTO of Retina.ai, with contributions from Databricks Customer Success Engineer Vini Jaiswal. Retina is the customer intelligence partner that empowers businesses to maximize customer-level profitability. We help our clients boost revenue with the most accurate lifetime value metrics. Our forward-looking, proprietary …
How to build a data extraction pipeline with Apache Airflow
Data extraction pipelines can be hard to build and manage, so it's a good idea to use a tool that can help with these tasks. Apache Airflow (https://airflow.apache.org/) is a popular...
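Airflow expresses a pipeline as a directed acyclic graph (DAG) of tasks and runs each task only after its upstream tasks have finished. The sketch below is not Airflow code and uses none of its API. It is a minimal, standard-library illustration of that dependency-ordering idea, built on `graphlib.TopologicalSorter` (Python 3.9+). The `extract`, `transform` and `load` tasks and the `DEPENDENCIES` mapping are hypothetical placeholders.

```python
from graphlib import TopologicalSorter
from typing import Callable


# Hypothetical task functions; in a real pipeline these would call APIs, databases, etc.
def extract() -> None:
    print("extract: pull raw records")


def transform() -> None:
    print("transform: clean and reshape")


def load() -> None:
    print("load: write to the warehouse")


TASKS: dict[str, Callable[[], None]] = {"extract": extract, "transform": transform, "load": load}

# Each key maps to the set of tasks it depends on (its upstream tasks)
DEPENDENCIES: dict[str, set[str]] = {"transform": {"extract"}, "load": {"transform"}}


def run_pipeline() -> list[str]:
    """Run every task once, in an order that respects DEPENDENCIES; return that order."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name]()
    return order


if __name__ == "__main__":
    run_pipeline()
```

A scheduler like Airflow adds what this sketch leaves out: retries, schedules, parallel execution of independent tasks, and monitoring.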
Ames, Iowa Real Estate Analysis
Ames has an interesting housing market because the city has been expanding in recent years. Newer neighborhoods tend to be on the outskirts of the city, but there have also been a good number of renovations. As you can see above, all the neighborhoods surround Iowa …
Effect of Home Renovation on Price in Ames, Iowa
Background: Ames, Iowa, home to Iowa State University ("the Cyclones"), boasts a population of over 67,000 that grows every year. The city consistently ranks fairly high in CNNMoney's Best Places to Live. Those seeking a home in their price range in that city can look through the distribution …
Density-Based Clustering
Original content by Manojit Nandi, updated by Josh Poduska. Cluster analysis is an important problem in data analysis. Data scientists use clustering to identify malfunctioning servers, group genes with similar expression patterns, and much more. There are many families of data clustering algorithms, and you may be …
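The excerpt does not say which density-based algorithm the article covers, so the sketch below uses DBSCAN as a representative example. DBSCAN treats a point as a core point when at least `min_pts` points (itself included) lie within distance `eps` of it. Clusters grow outward from core points, points reachable from a core point but not core themselves become border points, and everything else is labeled noise. This is a minimal, from-scratch, O(n²) illustration on 2-D tuples, not a production implementation; the function names and sample points are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]
NOISE = -1
UNVISITED = -2


def region_query(points: Sequence[Point], idx: int, eps: float) -> list[int]:
    """Indices of all points within distance eps of points[idx], including idx itself."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded further
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
           (10.0, 10.0), (10.0, 10.4), (10.4, 10.0),
           (50.0, 50.0)]
    print(dbscan(pts, eps=1.0, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, this approach needs no preset number of clusters and leaves isolated points unassigned, which is why density-based methods suit tasks such as spotting anomalous servers.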
AutoScraper and Flask: Create an API From Any Website in Less Than 5 Minutes And with Fewer Than 20 Lines of Python
In this tutorial, we are going to create our own e-commerce search API with support for both eBay and Etsy without using any external APIs. With the power of AutoScraper...
How to Upload your Dataset to a Server (Using the Command Line or Jupyter)
In this article, I’ll answer a question that frequently comes up in my online courses, which is: How do you upload a dataset (e.g. csv, txt or tsv... The post How to Upload your Dataset to a Server (Using the Command Line or Jupyter) appeared first on Data36.