All the data you need.

Tag: Open Source

Large Scale ETL and Lakehouse Implementation at Asurion
This is a guest post from Tomasz Magdanski, Director of Engineering, Asurion. With its insurance and installation, repair, replacement and 24/7 support services, Asurion helps people protect, connect and enjoy the latest tech – to make life a little easier. Every day our team of 10,000 experts helps nearly 300 …
Oracle to PostgreSQL? 6 Reasons to Make Your Open Source Migration
In this sponsored post, Kirk Roybal, PostgreSQL Database Reliability Engineer at Instaclustr, outlines how Postgres offers some especially enticing advantages for enterprises looking to trim (if not downright slash) costs without impacting database performance. Here are a half-dozen reasons enterprises should consider the fully open source version of Postgres as a …
How Incremental ETL Makes Life Simpler With Data Lakes
Incremental ETL (Extract, Transform and Load) in a conventional data warehouse has become commonplace with CDC (change data capture) sources, but scale, cost, accounting for state and the lack of machine learning access make it less than ideal. In contrast, incremental ETL in a data lake hasn’t been possible due …
Introducing Delta Sharing: an Open Protocol for Secure Data Sharing
Data sharing has become critical in the modern economy as enterprises look to securely exchange data with their customers, suppliers and partners. For example, a retailer may want to publish sales data to its suppliers in real time, or a supplier may want to share real-time inventory. But so far, …
Ray for Data Science: Distributed Python tasks at scale
In this article, Dr Dean Wampler provides an overview of Ray, including why we need it. The article covers practical techniques and some walk-through code to help users get started.
New Open-Source Tools Use Machine Learning to Streamline Content Writing Process
Machine learning has always been the great hope for automating a variety of tasks. However, writing was often seen as something that could never be automated. Significant progress has been made over the years, and tools like OpenAI are making it easier than ever before to streamline the essay writing …
How To Manage OpenShift Secrets With Akeyless Vault
Developed by Red Hat, OpenShift is an enterprise-grade hybrid cloud Kubernetes platform. It is essentially a commercial version of the open source container orchestration system designed to automate the deployment, management, and scaling of containerized applications. OpenShift can be described as a hybrid K8s application platform that operates as a platform-as-a-service …
What it means to be customer obsessed
One of our values at Databricks is to be customer obsessed. We deeply care about the impact and success of our customers, and are proud to be recognized by Gartner for focusing on this. A key part of that is how we strategize on making the world better through the …
Introducing Glow: an open-source toolkit for large-scale genomic analysis
The key to solving some of today’s most challenging medical problems lies in the analysis of genomics data. Understanding the impact of the minor changes in an individual’s genome on their overall health is fundamentally a data driven challenge that requires integration across hundreds of thousands of individuals. By analyzing …
Delta Lake Now Hosted by the Linux Foundation to Become the Open Standard for Data Lakes
At today’s Spark + AI Summit Europe in Amsterdam, we announced that Delta Lake is becoming a Linux Foundation project. Together with the community, the project aims to establish an open standard for managing large amounts of data in data lakes. The Apache 2.0 software license remains unchanged. Delta Lake …
Announcing the MLflow 1.1 Release
We’re excited to announce today the release of MLflow 1.1. In this release, we’ve focused on fleshing out the tracking component of MLflow and improving visualization components in the UI. Some of the major features include: automatic logging from TensorFlow and Keras; parallel coordinate plots in the tracking UI; Pandas …
What’s new with MLflow? On-Demand Webinar and FAQs now available!
On June 6th, our team hosted a live webinar—Managing the Complete Machine Learning Lifecycle: What’s new with MLflow—with Clemens Mewald, Director of Product Management at Databricks. Machine learning development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple …
Attending the Instructional Design Summit at the 2019 Open edX Conference
The unofficial first day of the 2019 Open edX Conference included one of the best seminars/workshops for content creators: The Instructional Design Summit.
Video: New Survey Looks at What’s Driving Companies to the Cloud
In this video from KubeCon 2018 in Seattle, Abby Kearns from the Cloud Foundry Foundation looks at the results of a recent survey on key factors driving the enterprise to the Cloud. “We’re seeing a virtuous cycle, as comfortability with one technology results in lightning-speed adoption of more advanced technologies. …
Themes and Conferences per Pacoid, Episode 3
Paco Nathan’s column covers themes that include open source, “intelligence is a team sport”, and “implications of massive latent hardware”. Welcome to our monthly series about data science! Themes to consider here: “Open Source wins; Learning is not enough”, “Intelligence is a team sport”, “Implications of massive latent hardware” …
Combining the Benefits of Commercial & Open Analytics
A new e-book explores how organizations in many industries are using open source analytics and SAS, getting the most from both, and what role SAS plays throughout the analytics life cycle.
A Certification for R Package Quality
There are more than 12,000 packages for R available on CRAN, and many others available on GitHub and elsewhere. But how can you be sure that a given R package follows best development practices for high-quality, secure software? Based on a recent survey of R users related to challenges in …
On the Importance of Community-Led Open Source
Wes McKinney, Director of Ursa Labs and creator of the pandas project, presented the keynote, “Advancing Data Science Through Open Source” at Rev. McKinney’s keynote covered open source’s symbiotic relationship with data science and the importance of community-led open source. This blog post includes distilled highlights, the full video, and transcript …