All the data you need.

Tag: Open Source

Quantum Computing with @hellodavidryan: TDI 24
“Within ten years there will be everything from small form factor QPUs doing 10 to 30 qubits as a part of roaming networks of devices, through to crazy large quantum devices doing thousands of qubits” — David (@hellodavidryan) on Threads. Today we have @hellodavidryan. How did you end up working …
Threads Dev Interview 14: @ben.codes
“we are stronger if we all share some of our works together than if we all kept everything to ourself.” — ben.codes (@bendotcodes) on Threads. Today we have @bendotcodes. We are going to start out talking about open source software because that is a topic not yet covered in these …
PingCAP Empowers Open Source Community with New GitHub Data Explorer Tool
PingCAP, a provider of advanced distributed SQL databases, announced its new GitHub Data Explorer tool. The tool is built to help developers and open-source contributors gain deeper insight into their GitHub activity, streamline workflows, and increase productivity.
DataStax Acquires Machine Learning Company Kaskada to Unlock Real-Time AI
DataStax, the real-time AI company, announced it has acquired Kaskada, a machine learning (ML) company that first solved managing, storing and accessing time-based data to train behavioral ML models and deliver the instant, actionable insights that fuel artificial intelligence (AI). Both DataStax and Kaskada have a track record of contributing …
Introducing Spark Connect – The Power of Apache Spark, Everywhere
At last week’s Data and AI Summit, we highlighted a new project called Spark Connect in the opening keynote. This blog post walks… The post Introducing Spark Connect – The Power of Apache Spark, Everywhere appeared first on Databricks.
Designing a Java Connector for Delta Sharing Recipient
Making an open data marketplace Stepping into this brave new digital world we are certain that data will be a central product for… The post Designing a Java Connector for Delta Sharing Recipient appeared first on Databricks.
Connect From Anywhere to Databricks SQL
Today we are thrilled to announce a full lineup of open source connectors for Go, Node.js, Python, as well as a new CLI… The post Connect From Anywhere to Databricks SQL appeared first on Databricks.
Can’t-miss Sessions Featuring MLflow
Data + AI Summit is the global event for the data community, where practitioners, leaders and visionaries come together to engage in thought-provoking… The post Can’t-miss Sessions Featuring MLflow appeared first on Databricks.
How to Monitor Streaming Queries in PySpark
Streaming is one of the most important data processing techniques for ingestion and analysis. It provides users and developers with low latency and… The post How to Monitor Streaming Queries in PySpark appeared first on Databricks.
Arcuate – Machine Learning Model Exchange With Delta Sharing and MLflow
Stepping into this brave new digital world we are certain that data will be a central product for many organizations. The way to… The post Arcuate – Machine Learning Model Exchange With Delta Sharing and MLflow appeared first on Databricks.
Open-source Twitter algorithm: What could go wrong?
There’s a controversy going on all over the internet regarding the Twitter algorithm. SpaceX and Tesla CEO Elon Musk wants to make it open source. What would be the advantages and disadvantages from a user perspective? Social media platforms are being heavily criticized for their content ranking algorithms. We discussed
What to Expect At Data + AI Summit: Open Source, Technical Keynotes and More!
Data + AI Summit, the world’s largest data and AI conference, returns June 27-30, 2022, and we’re thrilled to say that this year,… The post What to Expect At Data + AI Summit: Open Source, Technical Keynotes and More! appeared first on Databricks.
Extending Delta Sharing to Google Cloud Storage
This blog article has been cross-posted from the Delta.io blog. We are excited for the release of Delta Sharing 0.4.0 for the open-source… The post Extending Delta Sharing to Google Cloud Storage appeared first on Databricks.
Log4j2 Vulnerability (CVE-2021-44228) Research and Assessment
This blog relates to an ongoing investigation. We will update it with any significant updates, including detection rules to help people investigate potential exposure due to CVE-2021-44228 both within their own usage on Databricks and elsewhere. Should our investigation conclude that customers may have been impacted, we will individually notify …
Scala at Scale at Databricks
With hundreds of developers and millions of lines of code, Databricks is one of the largest Scala shops around. This post will be a broad tour of Scala at Databricks, from its inception to usage, style, tooling and challenges. We will cover topics ranging from cloud infrastructure and bespoke language …
The Foundation of Your Lakehouse Starts With Delta Lake
It’s been an exciting few years for the Delta Lake project. The release of Delta Lake 1.0, announced by Michael Armbrust at the Data + AI Summit in May 2021, represents a great milestone for the open source community and we’re just getting started! To better streamline community involvement and …
Turning 2 Trillion Data Points of Traffic Intelligence into Critical Business Insights
This is a guest authored post by Stephanie Mak, Senior Data Engineer, formerly at Intelematics. This blog post offers my experience of contributing to the open source community with Bricklayer, which I’d started during my time at Intelematics. Bricklayer is a utility for data engineers whose job is to farm …
Introducing Apache Spark™ 3.2
We are excited to announce the availability of Apache Spark™ 3.2 on Databricks as part of Databricks Runtime 10.0. We want to thank the Apache Spark community for their valuable contributions to the Spark 3.2 release. The number of monthly Maven downloads of Spark has rapidly increased to 20 million. …