How to set up a Machine Learning environment and upload SEM-Data

Published by Patrick Mebus on

After we’ve collected and cleaned our data, the next step is to build a Machine Learning environment for our purpose and to push the SEM-data into our library.

It only takes a few steps to set up the directories and libraries we need. Afterwards you’ll be ready to start with Machine Learning.

1. Create a Directory on your computer

Create a folder to store the SEM-data you’d like to work with. Later on you’ll access this folder from your ML-notebook.

On Windows it can look like this

C:/machine_learning/sem_data         

If you’re a Mac user it’ll look like this

Desktop/machine_learning/sem_data    

Hint: If you’re going to build a classifier, it makes sense to store different categories in different directories so you can load them separately.
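
Once Python is installed (step 2), the same folder layout can also be created from a script. The following is only a minimal sketch: the function name create_data_dirs and the category names used in the note below are made up for illustration, so adjust them to your own data.

```python
from pathlib import Path


def create_data_dirs(base_dir, categories=()):
    """Create base_dir plus one sub-folder per category; return all paths."""
    base = Path(base_dir)
    # parents=True creates missing parent folders, exist_ok=True makes reruns safe
    base.mkdir(parents=True, exist_ok=True)
    created = [base]
    for name in categories:
        sub = base / name
        sub.mkdir(exist_ok=True)
        created.append(sub)
    return created
```

For example, create_data_dirs(Path.home() / "machine_learning" / "sem_data", ("brand", "generic")) would create the base folder in your home directory with two hypothetical category folders inside it.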

2. Download and Install Python

Visit www.python.org and download the latest version of Python. As of October 2019, this is version 3.8.

You can of course use an earlier version. But be aware that scikit-learn needs a reasonably recent Python: releases from 0.21 onward require Python 3.5 or higher.

3. Open Terminal/Python shell

Now that we have Python on our computer, we need to open the terminal, where we’ll type in our code and commands. As you might guess, there are different ways to open the terminal on Mac and on Windows.

On Mac: Press ‘Cmd + Space’ to open Spotlight, then search for ‘Terminal’.

On Windows: Search for ‘cmd’ in your Start menu and open ‘Command Prompt’.

This is what the terminal looks like

To check that Python was installed correctly and in the required version, type the following command in your shell

python -V

The output will be the Python-version installed on your computer.
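
The same check can be done from inside Python. This is a small sketch; the helper name meets_minimum and the default minimum of 3.5 are assumptions for illustration, so set the minimum to whatever the libraries you plan to install actually require.

```python
import sys


def meets_minimum(version_info, minimum=(3, 5)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Prints the version number of the running interpreter, like `python -V` does
print(sys.version.split()[0])
print("OK" if meets_minimum(sys.version_info) else "Please upgrade Python")
```

Comparing tuples rather than strings avoids the trap that "3.10" sorts before "3.5" as text.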

You can also start a Python shell by opening IDLE, which comes with the Python package you downloaded in step 2. IDLE is short for ‘Integrated Development and Learning Environment’, and it is also the last name of one of the Monty Python members.

4. Install pip package manager

Pip stands for ‘pip installs packages’. As the name suggests, it’s the standard package manager for Python and is used to install additional libraries like numpy or pandas.

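Pip is necessary because we have to install additional packages for our purpose.

Recent Python installers from python.org already include pip, so you may be able to skip this step. If pip is missing, download get-pip.py from https://bootstrap.pypa.io/get-pip.py, change to the folder where you saved it, and run the following command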

python get-pip.py

5. Install scikit learn, numpy, pandas etc.

Now we can use pip to install the libraries we need!

Type in the following commands to install the packages

pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install scikit-learn
pip install jupyter

What you can access now:

  • scikit learn
  • numpy
  • pandas
  • matplotlib
  • seaborn
  • jupyter notebook
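
To confirm that the installation worked, you can ask Python whether each package can be found. The snippet below is a sketch, not an official check: missing_packages is a made-up helper name, and the list uses the import names, which differ from the pip names in two cases (scikit-learn is imported as sklearn, and the Jupyter Notebook package is assumed here to be importable as notebook).

```python
import importlib.util


def missing_packages(module_names):
    """Return the names from module_names that Python cannot find for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names, not pip names: scikit-learn is imported as "sklearn"
required = ["sklearn", "numpy", "pandas", "matplotlib", "seaborn", "notebook"]
print(missing_packages(required) or "All packages found")
```

If the output lists a package, rerun the matching pip install command from above.
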
6. Open the Jupyter Notebook

We’re almost there!
Even though typing code in the shell feels very cool and tech-like, let’s move away from the shell to a more user-friendly environment called Jupyter Notebook.

To start Jupyter Notebook, type the following command into your shell

jupyter notebook

Your browser should now open a page served from localhost

You should see your directories, including the one you created earlier. You can also access three tabs (Files, Running, Clusters).



Create a new notebook by clicking on ‘New’ on the right-hand side. This is where you’ll type in your Python commands. Notebooks are open-source browser applications that let you run live code and visualize the results within cells. For many people, including myself, it feels more pleasant to write code in Jupyter than in the terminal.

Important: Don’t close your terminal, even though you’re now working in Jupyter. The notebook relies on the shell connection. If you close your shell, your running notebook operations will crash.

 

7. Import csv-file with pandas

 Now we’re ready to push our SEM-Data to our environment.

To upload your data to the notebook, UTF-8 encoding is required, which not every system uses by default when exporting files. A quick and easy fix is to copy and paste your CSV data into a Google Spreadsheet and export it from there as a new, UTF-8 encoded CSV file to your directory.
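
If you prefer to skip the spreadsheet round trip, the file can also be re-encoded with a few lines of Python. This is only a sketch under assumptions: reencode_to_utf8 is a made-up helper, and cp1252 is a guess at the source encoding (common for exports on Windows), so replace it with whatever encoding your export tool actually uses.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst
```

Writing to a new file instead of overwriting the original keeps the untouched export available in case the guessed encoding turns out to be wrong.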

Within Jupyter Notebook, click on the filename in the directory listing to check that the data and all columns are there

You should see a list like this

In the next step, go back to your notebook tab and type in the commands below to import your CSV file with the pandas library. As mentioned earlier, one of the strengths of pandas (the name derives from ‘panel data’) is its useful CSV-reading function, which is exactly what we need for our SEM purpose.

import pandas as pd
df = pd.read_csv('filename.csv')  # load the CSV file into a DataFrame
print(df)  # show the loaded data

By importing your SEM CSV file and calling Python’s print function, you get an excerpt of your data as output below the cell.

To make it a bit prettier, type df.head() in the next cell. The notebook then shows the first rows as a table with a grid and a bold header.

Last but not least, check that all relevant values are there. In our simple example, the number of month rows should equal the number of click, conversion, and cost values.
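
This completeness check can be done with pandas as well. The sketch below assumes pandas is installed and uses a tiny made-up sample in place of your file; the column names follow the simple example (month, clicks, conversions, cost), and the values are invented, with one conversion value left empty on purpose.

```python
import io

import pandas as pd

# Made-up sample data standing in for an exported SEM report
sample_csv = io.StringIO(
    "month,clicks,conversions,cost\n"
    "2019-01,120,4,56.10\n"
    "2019-02,98,,47.80\n"
)
df = pd.read_csv(sample_csv)

# count() returns the number of non-missing values per column
counts = df.count()
print(counts)

# Columns with fewer values than there are rows contain gaps
incomplete = counts[counts < len(df)].index.tolist()
print("Columns with missing values:", incomplete)
```

With your own data, replace the sample with pd.read_csv and your filename. An empty list means every column has one value per row.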

Congrats, you’re now ready to start with Machine Learning!

8. Recommended Resources 

Patrick Mebus

I’m a Digital Marketer with a deep passion for Search Engines, Automation and AI. I’m here to make Machine Learning more accessible for Search Engine Marketers.
