Tag: Spark

How to Manage Python Dependencies in PySpark
Controlling the environment of an application is often challenging in a distributed computing environment – it is difficult to ensure all nodes have the desired environment to execute, it may be tricky to know where the user’s code is actually running, and so on. Apache Spark™ provides several standard ways …
Natively Query Your Delta Lake With Scala, Java, and Python
Today, we’re happy to announce that you can natively query your Delta Lake with Scala and Java (via the Delta Standalone Reader) and Python (via the Delta Rust API). Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, …
A Step-by-step Guide for Debugging Memory Leaks in Spark Applications
This is a guest-authored post by Shivansh Srivastava, software engineer, Disney Streaming Services. Just a bit of context: we at Disney Streaming Services use Apache Spark across the business and Spark Structured Streaming to develop our pipelines. These applications run on the Databricks Runtime (DBR) environment, which is quite user-friendly. …
Databricks Is Named a Visionary in the 2020 Gartner Magic Quadrant for Cloud Database Management Systems (DBMS)
Last week, Gartner published the Magic Quadrant (MQ) for Cloud Database Management Systems, where Databricks was recognized as a Visionary in the market.[1] This was the first time Databricks was included in a database-related Gartner Magic Quadrant. We believe this is due in large part to our investment in Delta …
Databricks and Coursera Launch Data Science Specialization for Data Analysts
Earlier this year, Databricks made a massive investment in training by providing free self-paced courses to all of our customers. Databricks furthers this investment by partnering with Coursera to provide Massive Open Online Course (MOOC) training to the larger data community. Together we launched a new three-course specialization, Data Science …
Key Sessions for AWS Customers at Data + AI Summit Europe 2020
Databricks and Summit Gold Sponsor AWS present on a wide variety of topics at this year's premier data and AI event. Amazon Web Services (AWS) is sponsoring Data + AI Summit Europe 2020, and our work with AWS continues to make Databricks better integrated with other AWS services, making it …
How to Train XGBoost With Spark
XGBoost is currently one of the most popular machine learning libraries, and distributed training is increasingly required to accommodate rapidly growing datasets. To use distributed training on a Spark cluster, the XGBoost4J-Spark package can be used in Scala pipelines, but it presents issues with Python pipelines. …
Data Teams Unite! Countdown to Data + AI Summit Europe
Data + AI Summit 2020 Europe takes place virtually in just a few days, from 17-19 November, and it's free to attend! Formerly known as Spark + AI Summit, Data + AI Summit will bring together thousands of data teams to learn from practitioners, leaders, innovators and the original creators …
Announcing the Launch of SQL Analytics
Today, we announced the new SQL Analytics service to provide Databricks customers with a first-class experience for performing BI and SQL workloads directly on the data lake. This launch brings to life a new experience within Databricks that data analysts and data engineers are going to love. The service provides …
Improving the Spark Exclusion Mechanism in Databricks
Ed. Note: This article contains references to the term blacklist, a term that the Spark community is actively working to remove from Spark. The feature name will be changed in the upcoming Spark 3.1 release to be more inclusive, and we look forward to this new release. Why exclusion? The …
Why Cloud Centric Data Lake is the future of EDW
In this first of two blogs, we want to talk about WHY an organization might want to look at a...
Reputation Risk: Improving Business Competency and Nurturing Happy Customers by Building a Risk Analysis Engine
Why does reputation risk matter? When it comes to the term "risk management," Financial Service Institutions (FSIs) have seen guidance and...
Faster SQL: Adaptive Query Execution in Databricks
Earlier this year, Databricks published a blog post on the new Adaptive Query Execution framework in Spark 3.0 and Databricks...
Announcing Single-Node Clusters on Databricks
Databricks is used by data teams to solve the world's toughest problems. This can involve running large-scale data processing jobs...
Integrating large-scale Genomic Variation and Annotation Data with Glow
Genomic annotations augment variant data by providing context for each change in the genome. For example, annotations help answer questions...
Analyzing Algorand Blockchain Data with Databricks Delta
Algorand is a public, decentralized blockchain system that uses a proof-of-stake consensus protocol. It is fast and energy-efficient,...
Flipp Presents Their Lakehouse Architecture with Delta Lake at Tableau Conference
Databricks at the Tableau Conference 2020: The Tableau Conference 2020 begins tomorrow, with our session Databricks:...
Measuring Advertising Effectiveness with Sales Forecasting and Attributing
Notebooks for this solution accelerator: Campaign Effectiveness — ETL; Campaign Effectiveness — Machine Learning. How...