SEM Machine Learning Glossary
To get started with Machine Learning and apply its techniques to Search Engine Marketing (SEM), it helps to know a few basic terms you will come across quite often. All of these terms are used throughout these pages and in Machine Learning in general.
Term  Explanation 
Algorithm  A defined sequence of (mathematical) steps that solves a specific problem
Artificial Intelligence  The field of building intelligent systems that are able to mimic human behaviour
Attributes  The characteristics recorded for each observation in your dataset, stored in its columns
Classification  Process of assigning data points to predefined classes or groups. Can be used for audience or keyword classification, among others
Click Through Rate  The number of clicks on an ad divided by the number of its impressions, usually expressed as a percentage
Clustering  Technique of grouping data into categories based on similarities between observations. K-Means is a widely used algorithm for this problem
Confusion matrix  Table that describes the performance of a classification model by comparing actual and predicted classes (true and false positives and negatives)
Conversions  The defined and measured actions users take on your website (e.g. purchases or sign-ups), based on your KPIs
Correlation  Statistical metric that measures the strength and direction of the relationship between two variables (e.g. Pearson’s r, which ranges from -1 to 1)
CSV  Comma-separated values (CSV) files are a plain-text file format for storing data in tables
Data cleaning  Process of removing or correcting wrong data, handling missing values and detecting outliers to prepare your dataset (e.g. a CSV file) for Machine Learning
Data collection  Process of gathering relevant data from all kinds of sources such as website tracking, CRM systems or third-party providers
Data science  The wonderful discipline of gaining knowledge and insights from data 
Dataset  Collection of data points organized in rows and columns, where the columns hold the variables or features
Decision Tree  Classification algorithm within Supervised Learning that predicts new values based on learned decision rules and represents them in tree form (it can also be used for regression)
Deep learning  Subset of Machine Learning that uses multi-layered neural networks, loosely inspired by the human brain, which learn a task by processing huge datasets repeatedly
Descriptive statistics  Figures such as the mean, median and standard deviation that summarize and describe the main characteristics of a dataset
Feature  An individual input variable (x) provided to a Machine Learning model, e.g. in Supervised Learning
Feature selection  Technique to select the most valuable features for your purpose from your columns and get rid of the irrelevant ones 
Histogram  Chart that displays the distribution of numerical data by grouping values into ranges (bins)
Impressions  Number of times your search ad was shown to the user 
Input  The feature values (x) you provide to your model in Supervised Learning
K-Means  Clustering algorithm within Unsupervised Learning that groups data into k clusters by assigning each data point to the cluster with the nearest centre (mean). Often used for market or audience segmentation.
K-Nearest Neighbours  KNN is a classification algorithm within Supervised Learning that assigns a new data point the majority class of its k closest labeled data points
Kaggle  Online community of data scientists, well known for its Machine Learning competitions
Label  The output value (y) provided to a Machine Learning model in Supervised Learning
Linear Regression  Regression algorithm within Supervised Learning that uses labeled data to predict continuous values by fitting a linear relationship between inputs and output
Logistic Regression  Classification algorithm within Supervised Learning that uses one or more variables to predict the probability of a (usually binary) outcome via the logistic function
Machine Learning  Subset of Artificial Intelligence that learns from large amounts of data to predict new values and find patterns and similarities without being explicitly programmed
Matplotlib  Python library for data visualization through charts, scatter plots and histograms
Mean  Statistical measure of the numerical average: the sum of all values divided by their count
Median  Statistical measure describing the middle value of a sorted group of numbers (the average of the two middle values if the count is even)
Model  The result of training an algorithm on data: a set of learned parameters and computations that turns new input data into predictions
Model Accuracy  Metric that measures how often a model is correct, i.e. the share of correct predictions on test data
Naive Bayes  Classification algorithm within Supervised Learning that applies Bayes’ theorem, assuming the variables are independent of each other, to predict an outcome
NumPy  Python library for numerical computing with vectors, matrices, multidimensional arrays and mathematical functions
Output  The labeled information (y) you provide to your machine in Supervised Learning. It is the result we already know, which is paired with (x) so the model can learn the relationship between the two and measure its prediction errors
Overfitting  Describes a model that fits its training data “too well”, including its noise, and therefore performs poorly on new, unseen data
Pandas  Python library for data analysis with functions for reading CSV files (data import) and cleaning data
Probability  Branch of mathematics, closely related to statistics, that quantifies the chance of certain events occurring. Commonly used for click prediction in ad tech
Python  The most popular programming language in Machine Learning and Data Science
PyTorch  Open-source Machine Learning framework for Python, based on imperative programming and widely used for Deep Learning (its predecessor, Torch, was written in Lua)
R  Commonly used programming language in Machine Learning for statistical purposes 
Random Forest  Classification algorithm within Supervised Learning that combines the predictions of many Decision Trees (e.g. by majority vote) to predict new values
Reinforcement Learning  Type of Machine Learning where the model follows a trial-and-error approach and learns from exploration and mistakes (rewards and penalties). It is one of the approaches used for automated real-time bidding.
scikit-learn  Machine Learning library written in Python that provides many algorithms for supervised and unsupervised learning
Segmentation  Clustering data points into different groups based on similarities and patterns
Standard Deviation  Numerical value in statistics that describes how widely the numbers of a group spread around their mean (the square root of the variance)
Statistical Fit  Measure of how well a model’s approximation matches the observed target values (e.g. R²)
Statistics  Mathematical discipline dealing with the analysis and interpretation of numerical data
Stochastic Gradient Descent  Iterative optimization algorithm that adjusts a model’s parameters to minimize its error (loss), updating them after each single example or small batch instead of the whole dataset. Commonly used as the learning algorithm for very large datasets
Supervised Learning  In Supervised Learning you train an algorithm with labeled data, i.e. inputs (x) paired with known outputs (y)
Support Vector Machines  Classification algorithm within Supervised Learning that separates data points into predefined categories by finding the boundary with the widest margin between them
TensorFlow  Machine Learning framework for building Deep Learning models and neural networks
Tracking  The process of collecting data about the interactions users have with your website
Training  The process of feeding input data to your model so that it learns its parameters and improves its accuracy
Underfitting  Describes a model that is too simple to capture the patterns in its training data and therefore performs poorly on both training and new data
Unsupervised Learning  Unsupervised Learning works with unlabeled data. Its algorithms cluster and group datasets based on similarities, patterns and associations.
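The Click Through Rate entry is a simple ratio. A minimal Python sketch follows; the function name `click_through_rate` and the numbers are hypothetical, and returning 0 for zero impressions is an assumption rather than a standard rule.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return the CTR in percent: clicks divided by impressions, times 100."""
    if impressions == 0:
        # Assumption: without impressions there is no measurable CTR, so report 0
        return 0.0
    return clicks / impressions * 100


# Hypothetical ad: 50 clicks on 2,000 impressions
ctr = click_through_rate(50, 2000)
print(f"CTR: {ctr:.2f}%")  # CTR: 2.50%
```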
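For the Correlation entry, NumPy’s `np.corrcoef` computes Pearson’s correlation coefficient. The spend and click figures below are invented for illustration; a value close to 1 means that clicks rise almost linearly with spend.

```python
import numpy as np

# Hypothetical daily data: ad spend (EUR) and clicks
spend = np.array([100, 150, 200, 250, 300])
clicks = np.array([40, 55, 80, 95, 120])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is Pearson's r between the two series
r = np.corrcoef(spend, clicks)[0, 1]
print(f"Pearson r = {r:.3f}")  # Pearson r = 0.996
```

Keep in mind that a high correlation alone does not prove that spend causes clicks.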
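The Data cleaning and Pandas entries can be illustrated together. The sketch below reads a small, hypothetical keyword CSV with `pd.read_csv` (through `io.StringIO` instead of a real file), removes duplicates, fills missing clicks and drops impossible rows. Treating a missing click count as 0 is an assumption; depending on your data, dropping the row or imputing the median may be more appropriate.

```python
import io

import pandas as pd

# Hypothetical keyword export; with a real file you would call pd.read_csv("path/to/file.csv")
raw_csv = """keyword,clicks,impressions
shoes,120,4000
shoes,120,4000
running shoes,,2500
boots,35,-10
sneakers,80,3200
"""

df = (
    pd.read_csv(io.StringIO(raw_csv))
    .drop_duplicates()                                     # remove verbatim duplicate rows
    .assign(clicks=lambda d: d["clicks"].fillna(0))        # assumption: missing clicks mean 0 clicks
    .query("impressions > 0")                              # drop rows with impossible impressions
    .assign(ctr=lambda d: d["clicks"] / d["impressions"])  # derived feature for later modelling
)

print(df)
```

Chaining the steps with `assign` and `query` returns a new DataFrame at each step, so the raw import stays untouched.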
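The Descriptive statistics, Mean, Median and Standard Deviation entries can be compared with Python’s built-in `statistics` module. The daily click counts are hypothetical and include one outlier (40), which pulls the mean upwards while the median stays put.

```python
import statistics

# Hypothetical daily clicks of one campaign over a week, with one outlier (40)
daily_clicks = [12, 15, 11, 40, 14, 13, 15]

mean = statistics.mean(daily_clicks)      # sum of values / number of values
median = statistics.median(daily_clicks)  # middle value after sorting
stdev = statistics.stdev(daily_clicks)    # sample standard deviation (divides by n - 1)

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}")  # mean=17.14, median=14, stdev=10.19
```

If your numbers cover the whole population rather than a sample, `statistics.pstdev` divides by n instead of n - 1.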
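For the Confusion matrix and Model Accuracy entries, scikit-learn’s `confusion_matrix` and `accuracy_score` can be applied to a handful of hypothetical click predictions (1 = click, 0 = no click).

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical actual outcomes and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# For labels [0, 1], rows are the actual classes and columns the predicted classes:
# [[true negatives,  false positives],
#  [false negatives, true positives]]
tn, fp, fn, tp = cm.ravel()

# Accuracy = (tp + tn) / number of predictions
acc = accuracy_score(y_true, y_pred)
print(cm)
print(f"accuracy = {acc:.2f}")  # accuracy = 0.75
```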
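A Decision Tree can be trained with scikit-learn’s `DecisionTreeClassifier`; `export_text` prints the learned rules as a text tree. The keyword features (quality score, CTR) and conversion labels are hypothetical, and `max_depth=2` is an arbitrary choice to keep the tree small.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical keyword data: [quality score, CTR in %] -> converted (1) or not (0)
X = [[8, 5.0], [7, 4.2], [3, 1.1], [2, 0.8], [9, 3.9], [4, 0.5]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned if/else rules as a text tree
print(export_text(tree, feature_names=["quality_score", "ctr"]))

# Predict two new, hypothetical keywords
preds = tree.predict([[6, 4.5], [3, 0.9]])
print(preds)  # [1 0]
```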
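Feature selection can be sketched with scikit-learn’s `SelectKBest`, which keeps the k columns with the highest score; here the ANOVA F-test `f_classif` serves as the scoring function. The data is synthetic: two columns depend on the label and the third is pure noise, so a good selector should drop the third.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Synthetic labels: 50 non-converters (0) and 50 converters (1)
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([
    y * 2.0 + rng.normal(0, 0.5, 100),  # strongly related to the label
    y * 1.0 + rng.normal(0, 0.5, 100),  # related to the label
    rng.normal(0, 1.0, 100),            # random noise, unrelated to the label
])

selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)        # F-score per column
print(selector.get_support())  # boolean mask of the kept columns
```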
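K-Means segmentation can be run with scikit-learn’s `KMeans` on a hypothetical audience described by sessions per month and average order value. In real data the features would usually be scaled first (e.g. with `StandardScaler`), because K-Means relies on distances; this toy example skips that step.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical audience data: [sessions per month, average order value in EUR]
X = np.array([
    [2, 20], [3, 25], [2, 22],         # occasional, low-value visitors
    [15, 120], [14, 110], [16, 130],   # frequent, high-value customers
])

# n_init=10 runs the algorithm with 10 different centroid seeds and keeps the best result
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster id per visitor
print(kmeans.cluster_centers_)  # mean position of each cluster
```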
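The K-Nearest Neighbours entry corresponds to scikit-learn’s `KNeighborsClassifier`: each new data point receives the majority label of its k closest training points. The user data below is hypothetical and `n_neighbors=3` is an arbitrary choice.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical users: [pages viewed, minutes on site] -> converted (1) or not (0)
X = [[1, 0.5], [2, 1.0], [1, 1.5], [8, 6.0], [9, 7.5], [7, 5.0]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Each new user gets the majority label of its 3 closest labeled neighbours
preds = knn.predict([[2, 0.8], [8, 5.5]])
print(preds)  # [0 1]
```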
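For the Linear Regression entry, scikit-learn’s `LinearRegression` fits a straight line (slope and intercept) through labeled data. The budget and conversion numbers are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: daily budget in EUR (input x) -> conversions (output y)
budget = np.array([[50], [100], [150], [200], [250]])
conversions = np.array([5, 11, 14, 21, 24])

model = LinearRegression()
model.fit(budget, conversions)

# Learned line: conversions = slope * budget + intercept
print(f"slope={model.coef_[0]:.3f}, intercept={model.intercept_:.2f}")  # slope=0.096, intercept=0.60

# Predict conversions for a new, untested budget of 300 EUR
prediction = model.predict(np.array([[300]]))[0]
print(f"predicted conversions: {prediction:.1f}")  # predicted conversions: 29.4
```

Extrapolating far beyond the budgets seen in training assumes that the linear relationship still holds there, which is often not the case.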
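For the Logistic Regression entry, the sketch below fits scikit-learn’s `LogisticRegression` on hypothetical data (number of previous site visits vs. whether the user clicked the ad) and checks that `predict_proba` matches the logistic (sigmoid) function applied to the learned weight and intercept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: previous site visits -> clicked the ad (1) or not (0)
visits = np.array([[0], [1], [2], [3], [4], [5], [6], [7]])
clicked = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(visits, clicked)

# Probability of a click for users with 0 and with 7 previous visits
proba = model.predict_proba(np.array([[0], [7]]))[:, 1]

# The same probabilities computed by hand: sigmoid(w * x + b)
w, b = model.coef_[0][0], model.intercept_[0]
manual = 1 / (1 + np.exp(-(w * np.array([0, 7]) + b)))

print(proba, manual)
```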
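The Stochastic Gradient Descent entry can be made concrete with a hand-written NumPy loop that fits a line y = w·x + b, updating both parameters after every single example. The data is synthetic (generated from y = 3x + 2 plus noise), and the learning rate of 0.1 and the 50 epochs are arbitrary choices for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data generated from y = 3x + 2 plus a little noise
x = rng.uniform(0, 1, 200)
y = 3 * x + 2 + rng.normal(0, 0.1, 200)

w, b = 0.0, 0.0       # model parameters to learn
learning_rate = 0.1

for epoch in range(50):
    for i in rng.permutation(len(x)):   # visit the samples in random order
        error = (w * x[i] + b) - y[i]    # prediction error for one sample
        # Gradient of the per-sample loss 0.5 * error**2 with respect to w and b
        w -= learning_rate * error * x[i]
        b -= learning_rate * error

print(f"w = {w:.2f}, b = {b:.2f}")  # should end up close to 3 and 2
```

Because each update uses only one sample, the cost per step does not grow with the dataset size, which is why the method suits very large datasets.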