Docker is a platform based on container virtualization. To try Docker, you can either install it from the Docker website or use the Play with Docker website. Play with Docker provides access to a test virtual machine with Docker installed. The machine is destroyed after a few hours. You can use it to follow the workshops presented here.

Introduction to Docker

A very good workshop by Jérôme Petazzoni at PyCon 2016, showing the basic Docker functionalities.

YouTube Video

Advanced Docker Workshop

An advanced workshop by Jérôme Petazzoni at PyCon 2017.

Lecture on Container Virtualization

This is a single lecture that goes into more detail on container virtualization.

Lecture Video

Singularity

Singularity containers are containers specialized for HPC and scientific use cases. They are similar to Docker containers, but better optimized for these use cases.

Singularity: Scientific containers for mobility of compute

An article explaining what Singularity containers are and their basic usage.

Article Link

Singularity: Containers for Science, Reproducibility, and HPC

A YouTube presentation of Singularity containers.

YouTube Video

Machine Learning

The following two (free) books are considered to be the best books for Machine Learning, even though “Machine Learning” is not in their titles.

General Material

A Quick Guide to Software Licensing for the Scientist-Programmer
Choose A Licence
- Licenses
- Detailed Overview
DataTau: A website with general news regarding Data Science.
Good Enough Practices in Scientific Computing
Best Practices in Scientific Computing
Opening Science
How to have a bad career in Research / Academia
You and Your Research

Books

Some general (non-technical) book recommendations:

Who Moved My Cheese, by Dr Spencer Johnson: A book that teaches you how to deal with constant change.
The Dip, by Seth Godin: Thoughts about when to quit and when to stay.
So Good They Can’t Ignore You, by Cal Newport: On why it’s more important to focus on building your skills, and why “following your passion” is not always good advice.
Deep Work, by Cal Newport: By the same author, recommendations on how to work with more focus and why that is important.
Thinking, Fast and Slow: A lot of information on how we think and behave.
How to Read a Book: Self-explanatory title.
Incerto, by Nassim Nicholas Taleb: A philosophical and practical essay on uncertainty. It is not a “Data Science” book, but it might be the most critical book that you read regarding Data Science.

Revised Agenda

Due to the issues with the Jupyter Notebooks (and the lack of time), we’ll focus more on the other parts of the workshop that are not so heavily dependent on Jupyter.

16:00 – 16:15

Discussion on the Forensic Exercise (Organization Exercise 01 html)

16:15 – 16:45

Project Organization

16:45 – 17:15

Organization Exercise 02 html

17:15 – 18:15

Publication & Sharing Exercise html

Workshop Agenda

Etherpad Link

S21 Introduction, 11:30 – 12:30

This is the introductory session on the concept of Reproducible Research.

The slides of the introduction are available here
To follow along during the Jupyter section, you need to follow the instructions in Workshop Preparation.
Introduction to Jupyter
Working with Notebooks
Now, for the remaining time, open the notebook 01_navigation.ipynb (from your Jupyter browser) and read / evaluate all of its cells.

S22 Organization & Data Exploration 14:00 – 15:30

In this part we’ll talk about how to better organize a project. Your first (practical) task is to follow the instructions from the Organization Exercise 01.

The slides of the organization lecture can be found here
Organization Exercise 01 html / ipynb
Organization Exercise 02 html / ipynb
Data Exploration html / ipynb

S23 Automation, 16:00 – 17:30

Here, we’ll automate the analysis from the previous step by implementing functions.

Automation Exercise html / ipynb
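
As a rough illustration of what automating an analysis by implementing functions can look like, the sketch below wraps one summary step in a function and applies it to several datasets. The function names (summarize, run_analysis) and the example values are hypothetical and are not taken from the workshop exercises.

```python
from statistics import mean, median


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def run_analysis(datasets):
    """Apply the same summary step to every named dataset."""
    return {name: summarize(values) for name, values in datasets.items()}


if __name__ == "__main__":
    # Hypothetical example data; in the workshop this would be the data
    # explored in the previous session.
    datasets = {
        "group_a": [1.0, 2.0, 3.0, 4.0],
        "group_b": [10.0, 20.0, 30.0],
    }
    for name, stats in run_analysis(datasets).items():
        print(name, stats)
```

Once a step lives in a function, rerunning the analysis on new data means calling the function again rather than copying and editing notebook cells.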

S24 Publication & Sharing, 17:30 – 18:30

Here we’ll see how to export a notebook, share it, and publish our results.

Publication & Sharing Exercise html / ipynb
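
To give an idea of what exporting a notebook involves, the sketch below reads a notebook file (which is plain JSON) and writes its code cells to a Python script using only the standard library. This is a simplified illustration, not the method used in the exercise; in practice the jupyter nbconvert command is the usual tool for exporting to HTML and other formats. The function names and file paths are hypothetical.

```python
import json


def notebook_to_script(notebook):
    """Join the source of all code cells of a parsed notebook into one script."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


def export_notebook(ipynb_path, script_path):
    """Read a .ipynb file and write its code cells to a .py file."""
    with open(ipynb_path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    with open(script_path, "w", encoding="utf-8") as handle:
        handle.write(notebook_to_script(notebook))
```

For example, export_notebook("analysis.ipynb", "analysis.py") would write the code cells of a (hypothetical) analysis.ipynb to a script that can be shared or run without Jupyter.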

References

The majority of the material is based on the Reproducible Science Curriculum, which was in turn based on Data Carpentry material for reproducible research.

Disclaimer

The opinions expressed in this article are the committer’s own and do not necessarily reflect the views of GWDG.

ssgoe2017

Further Reading

Reproducible Science

Reproducible Science Curriculum

iDigBio, Reproducible Science Workshop

Reproducible Research: Walking the Walk

Reproducible Research – YouTube playlist

Programming

Scientific Python

Git and GitHub

Git for Scientists: A Tutorial

Atlassian Git Tutorial

Udacity Git Course

ProGit

Linux and Linux Command Line

Introduction to Linux, edX course

Containers

Docker

Introduction to Docker

Advanced Docker Workshop

Lecture on Container Virtualization

Singularity

Singularity: Scientific containers for mobility of compute

Singularity: Containers for Science, Reproducibility, and HPC

Machine Learning

General Material

Books

Revised Agenda

16:00 – 16:15

16:15 – 16:45

16:45 – 17:15

17:15 – 18:15

Workshop Agenda

S21 Introduction, 11:30 – 12:30

S22 Organization & Data Exploration 14:00 – 15:30

S23 Automation, 16:00 – 17:30

S24 Publication & Sharing, 17:30 – 18:30

References

Disclaimer