Tag: Reproducibility

Seeking Reproducibility within Social Science: Search and Discovery
Julia Lane, NYU professor, economist, and cofounder of the Coleridge Initiative, presented “Where’s the Data: A New Approach to Social Science Search & Discovery” at Rev. Lane described the approach the Coleridge Initiative is taking to address the reproducibility challenge in science. The approach is to provide remote access for …
MNIST Expanded: 50,000 New Samples Added
This post provides a distilled overview of the rediscovery of 50,000 samples within the MNIST dataset. MNIST: The Potential Danger of Overfitting. Recently, Chhavi Yadav (NYU) and Leon Bottou (Facebook AI Research and NYU) described in their paper, “Cold Case: The Lost MNIST Digits”, how they reconstructed the MNIST (Modified …
Addressing Irreproducibility in the Wild
This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s “The Ingredients of a Reproducible Machine Learning Model” talk at a recent WiMLDS meetup. Mawer is a Principal Data Scientist at Lineage Logistics as well as an Adjunct Lecturer at Northwestern University. Special thanks to Mawer …
Learn from the Reproducibility Crisis in Science
Key highlights from Clare Gollnick’s talk, “The limits of inference: what data scientists can learn from the reproducibility crisis in science”, are covered in this Domino Data Science Field Note. The full video is available for viewing here. Introduction. Within Clare Gollnick’s Strata San Jose talk, “The limits of inference: …
Data Scientist? Programmer? Are They Mutually Exclusive?
This Domino Data Science Field Note blog post provides highlights of Hadley Wickham’s ACM Chicago talk, “You Can’t Do Data Science in a GUI”. In his talk, Wickham argues that, unlike a GUI, code provides reproducibility, data provenance, and the ability to track changes so that data scientists have …
The Machine Learning Reproducibility Crisis
Pete Warden is the Technical Lead on the TensorFlow Mobile and Embedded team at Google, working on deep learning. He was formerly the CTO of Jetpac, which was acquired by Google. He is also an Apple alumnus and blogs at petewarden.com. This post candidly discusses some of the real-world reproducibility challenges …
Becoming a Data Scientist Podcast Episode 07: Enda Ridge
Data scientist, author, and data science team manager Enda Ridge talks to us about data governance, data provenance, reproducible analysis, work pipelines and products, and people, among other topics covered in his book “Guerrilla Analytics: A Practical Approach to Working with Data” (The Savvy Manager's Guide). Podcast Audio …