How to set up a Machine Learning environment and upload SEM-Data
It takes only a few steps to set up the directories and install the libraries we need. Afterwards you'll be ready to start with Machine Learning.
1. Create a Directory on your computer
Create a folder to store the SEM data you'd like to work with. Later on you'll access this folder from your ML notebook.
[Screenshot: the new folder on Windows]
[Screenshot: the new folder on a Mac]
Hint: If you're going to build a classifier, it makes sense to store different categories in different directories so you can load them separately.
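If you prefer to create these folders from Python rather than in your file explorer, here is a minimal sketch using the standard-library pathlib module. The folder name sem-data and the category names are hypothetical placeholders; replace them with your own.

```python
from pathlib import Path

# Hypothetical folder and category names; replace them with your own
BASE_DIR = Path.home() / "sem-data"
CATEGORIES = ["brand", "generic", "competitor"]


def create_data_dirs(base_dir, categories):
    """Create base_dir plus one subfolder per category and return the subfolder paths."""
    paths = []
    for category in categories:
        folder = Path(base_dir) / category
        # parents=True also creates base_dir; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
        paths.append(folder)
    return paths


# Example call (creates folders in your home directory):
# create_data_dirs(BASE_DIR, CATEGORIES)
```

Because of exist_ok=True, running the function again later does not fail if some folders already exist.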
2. Download and Install Python
Visit www.python.org and download the latest version of Python. As of October 2019, this is version 3.8.
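You can use earlier versions, of course. But be aware that scikit-learn has a minimum Python version: old releases ran on Python 2.6 and later, but scikit-learn 0.21 and newer require Python 3.5 or higher.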
3. Open Terminal/Python shell
Now that Python is on our computer, we need to open the terminal, where we'll type in our code and commands. As you might guess, the way to open the terminal differs between Mac and Windows.
On Mac: Press 'Cmd + Space' to open Spotlight, then search for 'Terminal'.
On Windows: In your Start menu, search for 'cmd' and open 'Command Prompt'.
[Screenshot: the Terminal window]
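To check that Python was installed correctly and in the required version, type the following command in your shell: python --version (on a Mac, use python3 --version, because python may still point to the preinstalled Python 2).
The output shows the Python version installed on your computer.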
You can also open a Python shell by clicking on IDLE (short for 'Integrated Development and Learning Environment', but also the last name of one of the Monty Python members), which comes with the Python package you downloaded in step 2.
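The version check can also be done from inside Python, for example in IDLE. This minimal sketch uses the standard sys module; the minimum version (3, 5) reflects scikit-learn 0.21 and later and is an assumption you should adjust to the release you actually install.

```python
import sys

# Assumption: scikit-learn 0.21 and later need Python 3.5 or newer
MIN_VERSION = (3, 5)


def python_is_recent_enough(min_version=MIN_VERSION):
    """Return True if the running interpreter is at least min_version (major, minor)."""
    return tuple(sys.version_info[:2]) >= tuple(min_version)


print("Python version:", sys.version.split()[0])
print("Recent enough for scikit-learn:", python_is_recent_enough())
```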
4. Install pip package manager
Pip stands for 'pip installs packages'. As the name suggests, it's the standard package manager for Python, used to install additional libraries such as NumPy or pandas.
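We need pip because we have to install several additional packages for our purpose. Python 3.4 and later already include pip, so it is probably on your computer already.
If it is missing, install it by typing the following command (python3 on a Mac): python -m ensurepip --upgrade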
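To find out whether pip is already there, a small standard-library check can ask Python's import system whether a module named pip can be found. This is only a sketch; typing python -m pip --version in the terminal gives the same answer plus the version number.

```python
import importlib.util


def module_available(name):
    """Return True if Python's import system can find a top-level module called name."""
    return importlib.util.find_spec(name) is not None


print("pip available:", module_available("pip"))
```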
5. Install scikit-learn, NumPy, pandas etc.
Now we can use pip to install the libraries and tools we need!
Type in the following commands, one after another, to install the packages (Mac users may need to type pip3 instead of pip):
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install scikit-learn
pip install jupyter
What you have access to now:
- NumPy, pandas, matplotlib and seaborn
- scikit-learn
- Jupyter Notebook
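To confirm that all packages from this step were installed, you can look up their versions with importlib.metadata, which is part of the standard library from Python 3.8 on. A minimal sketch; note that it uses the pip names (scikit-learn), not the import names (sklearn).

```python
from importlib.metadata import PackageNotFoundError, version

# pip (distribution) names of the packages installed in step 5
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn", "jupyter"]


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```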
6. Open the Jupyter Notebook
We’re almost there!
Even though typing code in the shell feels very cool and tech-like, let's move away from the shell to a more user-friendly environment called Jupyter Notebook.
To launch Jupyter Notebook, just type jupyter notebook as a command into your shell.
Your browser should now open a localhost page automatically (by default at http://localhost:8888).
You should see your directories, including the one you created earlier. You can also switch between three tabs: Files, Running and Clusters.
Create a new notebook, where you'll type in your Python commands, by clicking on 'New' on the right-hand side. Jupyter notebooks are open-source browser applications that let you run live code in cells and visualize the results. For many people, including myself, writing code in Jupyter feels more pleasant than writing it in the terminal.
Important: don't close your terminal, even though you're now working in Jupyter. The notebook server runs in that terminal session, so if you close it, your running notebooks will stop working.
7. Import a CSV file with pandas
Now we're ready to load our SEM data into our environment.
To load your data into the notebook, the CSV file should be UTF-8 encoded, which is what pandas expects by default. Not every system exports CSV files in UTF-8. A quick and easy fix is to copy and paste your CSV data into a Google Spreadsheet and export it from there as a new, UTF-8 encoded CSV file into your directory.
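If you'd rather not route the file through Google Sheets, Python can re-encode it locally, provided you know which encoding the file currently uses. The sketch below assumes a Windows-1252 (cp1252) export, which is only an example; pandas can also read such a file directly with pd.read_csv('filename.csv', encoding='cp1252').

```python
def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a text/CSV file from source_encoding to UTF-8.

    source_encoding is an assumption: use whatever encoding
    your file was actually saved in.
    """
    # newline="" disables newline translation, so line endings stay as they were
    with open(src, encoding=source_encoding, newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding="utf-8", newline="") as outfile:
        outfile.write(text)
```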
In the Jupyter file browser, click on the file name to check that the data and all columns are there.
[Screenshot: the CSV data listed in Jupyter's file viewer]
Next, go back to your notebook tab and type in the commands below to import your CSV file with the pandas library. One of the strengths of pandas (the name comes from 'panel data') is its handy CSV-reading function, which is exactly what we need for our SEM purposes.
import pandas as pd
df = pd.read_csv('filename.csv')  # replace 'filename.csv' with the name of your SEM export
After importing your SEM CSV file, call Python's print function on it (print(df)) and an excerpt of your data appears below the cell.
To make it a bit prettier, type df.head() in the next cell. Because it is the last expression in the cell, Jupyter renders the first five rows as a table with a grid and a bold header.
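To try the read and display steps before your own export is ready, here is a minimal, self-contained sketch. The CSV content and its numbers are made up for illustration, and io.StringIO merely stands in for a file on disk so the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for your SEM export
CSV_TEXT = """month,clicks,conversions,cost
2019-07,1200,35,540.50
2019-08,1350,41,610.00
2019-09,1100,30,495.25
"""

# In your notebook you would use pd.read_csv('filename.csv') instead
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df)        # plain-text excerpt below the cell
print(df.shape)  # (rows, columns)
df.head()        # in Jupyter, the last expression in a cell renders as a table
```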
Last but not least, check that all relevant values are there. In our simple example, the number of month rows should equal the number of clicks, conversions and cost values.
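This check can be automated with DataFrame.count(), which counts the non-missing values in each column. A minimal sketch, assuming the hypothetical column names month, clicks, conversions and cost: if every column's count equals the number of rows, nothing is missing.

```python
import pandas as pd


def check_complete(df, columns):
    """Return (is_complete, counts): counts maps each column to its number of non-missing values."""
    counts = df[columns].count()  # non-NaN values per column
    is_complete = bool((counts == len(df)).all())
    return is_complete, counts.to_dict()


# Hypothetical example: the September cost value is missing
df = pd.DataFrame({
    "month": ["2019-07", "2019-08", "2019-09"],
    "clicks": [1200, 1350, 1100],
    "conversions": [35, 41, 30],
    "cost": [540.50, 610.00, None],
})

complete, counts = check_complete(df, ["month", "clicks", "conversions", "cost"])
print(complete)  # False, because one cost value is missing
print(counts)
```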
Congrats, you're now ready to start with Machine Learning!
8. Recommended Resources
- Download Python: https://www.python.org/downloads/
- Install scikit-learn on Mac: https://scikit-learn.org/0.16/install.html#mac-osx
- Install scikit-learn on Windows: https://scikit-learn.org/0.16/install.html#windows
- Pip installation: https://pip.pypa.io/en/stable/installing/
- Pandas read-functions: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html