Data Science Crash Course 2/10: Anaconda and Jupyter Notebooks
Welcome to the second instalment of Data Science Crash Course. This post is about setting up an environment for Data Science experiments. I’m assuming you have no programming knowledge, so I’ll keep everything smooth and easy. Let’s begin.
Anaconda is a free, open-source distribution of both Python and R programming languages for data science and machine learning applications. It aims to simplify package management and deployment.
OK, that might sound complicated, but the truth is, it’s all about giving you an environment where you can code. You need an interpreter to run Python code (think of it as the program that actually ‘runs’ what you type in a text editor), and Anaconda gives you that plus much more. It comes with most of the packages you might want to use, as well as Jupyter Notebooks.
But let’s start with installation. Let’s go to www.anaconda.com and download the most recent version:
Now that you have it installed, it’s time to look at how it all works.
If you open Anaconda now you’ll see this screen:
Let’s now go to Jupyter Notebooks. We can forget about the rest for now. Open Jupyter Notebook and you’ll see the following screen:
Click New in the top-right corner and choose Python 3. You’ll see an empty notebook
and… you’re good to go! The box at the top is a cell: you can write Python code in it and then execute it by clicking Run.
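As a first thing to try, here is a minimal sketch of what you might type into that cell. The variable names and the numbers are made up purely for illustration:

```python
# A first cell: store a few numbers and compute their average.
temperatures = [21.5, 19.0, 23.5]  # made-up sample values

# sum() adds the values up, len() counts how many there are.
average = sum(temperatures) / len(temperatures)

# print() shows the result right below the cell.
print("Average temperature:", average)
```

After you click Run, the output appears directly under the cell, and you can edit the numbers and run it again as often as you like.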
Jupyter Notebooks are an extremely handy way of writing code and keeping track of different experiments. They’re not well suited to bigger projects, but for running quick experiments with data or training neural networks, they’re perfect.
You usually start by importing packages/libraries with an ‘import …’ statement. Packages are ready-to-use pieces of code which save you time because you don’t have to write everything yourself. There are packages for everything:
- importing data (json)
- processing data (pandas, NumPy)
- text analysis (NLTK)
- neural networks (Keras, PyTorch, TensorFlow)
- visualization (Dash)
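To make the ‘import’ idea concrete, here is a small sketch using json from the list above, since it ships with Python itself. The JSON text and the names raw, record and best are invented for this example:

```python
import json  # part of Python's standard library, nothing extra to install

# A small piece of JSON text, made up for illustration.
raw = '{"name": "Ada", "scores": [90, 85, 77]}'

# json.loads turns the text into a regular Python dictionary.
record = json.loads(raw)

# Once it is a dictionary, ordinary Python works on it.
best = max(record["scores"])
print(record["name"], "best score:", best)
```

The other packages work the same way: you import them at the top of the notebook and then call the functions they provide.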
So before you write everything yourself, it’s often worth checking whether a relevant package is already available. Of course, sometimes you’ll want to do it yourself anyway, just to learn.
On a final note, JupyterLab is also a great tool. It’s basically a way to manage multiple Jupyter Notebooks at one time, so it’s very useful if you want to organize your code.
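If you’re not sure whether a package is already installed in your environment, one possible way to check from inside a notebook is sketched below. The helper name is_installed is hypothetical (it is not part of Anaconda or Jupyter), and the package names in the list are just examples:

```python
from importlib.util import find_spec

def is_installed(package_name):
    """Return True if Python can find a top-level package with this import name."""
    return find_spec(package_name) is not None

# Check a few example packages by their import names.
for name in ["json", "pandas", "numpy", "nltk"]:
    status = "available" if is_installed(name) else "not installed"
    print(name, "->", status)
```

What this prints depends on your own installation, so the results will differ from machine to machine.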
Now you have everything you need to write your first Data Science experiment in Python.
If you’d like to watch a video version of this post, see this video: