PingCAP, a provider of advanced distributed SQL databases, announced the introduction of its new GitHub Data Explorer tool. The tool is built to help developers and open-source contributors gain deeper insights into their GitHub activity, streamline workflows, and increase productivity.
DataStax, the real-time AI company, announced it has acquired Kaskada, a machine learning (ML) company that first solved managing, storing and accessing time-based data to train behavioral ML models and deliver the instant, actionable insights that fuel artificial intelligence (AI). Both DataStax and Kaskada have a track record of contributing …
At last week’s Data and AI Summit, we highlighted a new project called Spark Connect in the opening keynote. This blog post walks… The post Introducing Spark Connect – The Power of Apache Spark, Everywhere appeared first on Databricks.
Making an open data marketplace: Stepping into this brave new digital world, we are certain that data will be a central product for… The post Designing a Java Connector for Delta Sharing Recipient appeared first on Databricks.
Today we are thrilled to announce a full lineup of open source connectors for Go, Node.js, Python, as well as a new CLI… The post Connect From Anywhere to Databricks SQL appeared first on Databricks.
Data + AI Summit is the global event for the data community, where practitioners, leaders and visionaries come together to engage in thought-provoking… The post Can’t-miss Sessions Featuring MLflow appeared first on Databricks.
Streaming is one of the most important data processing techniques for ingestion and analysis. It provides users and developers with low latency and… The post How to Monitor Streaming Queries in PySpark appeared first on Databricks.
Stepping into this brave new digital world we are certain that data will be a central product for many organizations. The way to… The post Arcuate – Machine Learning Model Exchange With Delta Sharing and MLflow appeared first on Databricks.
There’s a controversy going on all over the internet regarding the Twitter algorithm. SpaceX and Tesla CEO Elon Musk has ambitions toward making it open-source. What would be the advantages and disadvantages from a user perspective? Social media platforms are being heavily criticized for their content ranking algorithms. We discussed
Data + AI Summit, the world’s largest data and AI conference, returns June 27-30 2022, and we’re thrilled to say that this year,… The post What to Expect At Data + AI Summit: Open Source, Technical Keynotes and More! appeared first on Databricks.
This blog article has been cross-posted from the Delta.io blog. We are excited for the release of Delta Sharing 0.4.0 for the open-source… The post Extending Delta Sharing to Google Cloud Storage appeared first on Databricks.
March 16, 2022, 10:41 p.m.
This blog relates to an ongoing investigation. We will update it with any significant updates, including detection rules to help people investigate potential exposure due to CVE-2021-44228 both within their own usage on Databricks and elsewhere. Should our investigation conclude that customers may have been impacted, we will individually notify …
With hundreds of developers and millions of lines of code, Databricks is one of the largest Scala shops around. This post will be a broad tour of Scala at Databricks, from its inception to usage, style, tooling and challenges. We will cover topics ranging from cloud infrastructure and bespoke language …
It’s been an exciting last few years with the Delta Lake project. The release of Delta Lake 1.0, as announced by Michael Armbrust at the Data + AI Summit in May 2021, represents a great milestone for the open source community and we’re just getting started! To better streamline community involvement and …
This is a guest authored post by Stephanie Mak, Senior Data Engineer, formerly at Intelematics. This blog post offers my experience of contributing to the open source community with Bricklayer, which I’d started during my time at Intelematics. Bricklayer is a utility for data engineers whose job is to farm …
We are excited to announce the availability of Apache Spark™ 3.2 on Databricks as part of Databricks Runtime 10.0. We want to thank the Apache Spark community for their valuable contributions to the Spark 3.2 release. The number of monthly Maven downloads of Spark has rapidly increased to 20 million. …
Apache Spark™ Structured Streaming allows users to do aggregations on windows over event-time. Before Apache Spark™ 3.2, Spark supported tumbling windows and sliding windows. In the upcoming Apache Spark 3.2, we add “session windows” as a new supported type of window, which works for both streaming and batch queries. What is …
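To make the difference between the window types in the teaser above concrete, here is a minimal plain-Python sketch. It is not Spark code and not the post's implementation: the function names, the sample timestamps, and the rule that an event arriving exactly at the end of the gap starts a new session are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def tumbling_windows(timestamps: List[int], size: int) -> Dict[Tuple[int, int], List[int]]:
    """Group event times into fixed, non-overlapping windows of `size` seconds."""
    windows: Dict[Tuple[int, int], List[int]] = {}
    for ts in sorted(timestamps):
        start = (ts // size) * size  # window boundaries are fixed multiples of `size`
        windows.setdefault((start, start + size), []).append(ts)
    return windows


def session_windows(timestamps: List[int], gap: int) -> List[List[int]]:
    """Group event times into sessions; a session ends after `gap` seconds of silence.

    Assumption of this sketch: an event exactly `gap` seconds after the
    previous one opens a new session.
    """
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][-1] + gap:
            sessions[-1].append(ts)  # still inside the current session
        else:
            sessions.append([ts])  # gap exceeded: start a new session
    return sessions


events = [1, 4, 8, 11, 25, 27]  # hypothetical event times in seconds
tumbling = tumbling_windows(events, size=10)
sessions = session_windows(events, gap=5)
print(tumbling)
print(sessions)
```

With these sample events, the tumbling windows split at the fixed 10-second boundaries, while the session windows follow the activity itself: the event at second 11 stays in the first session because it arrives within 5 seconds of the event at second 8.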
This is a collaborative post by Ordnance Survey, Microsoft and Databricks. We thank Charis Doidge, Senior Data Engineer, and Steve Kingston, Senior Data Scientist, Ordnance Survey, and Linda Sheard, Cloud Solution Architect for Advanced Analytics and AI at Microsoft, for their contributions. This blog presents a collaboration between Ordnance Survey …