SEM-Machine-Learning Glossary
To get started with Machine Learning and apply its techniques to Search Engine Marketing, it helps to know a few basic terms you’ll come across quite often. All of these terms are commonly used on these pages and in Machine Learning in general.
Term | Explanation |
Algorithm | Defined sequence of (mathematical) steps that solves a certain problem |
Artificial Intelligence | The process of building intelligent systems that are able to mirror human behaviour |
Attributes | The properties recorded for each observation in your dataset, stored in its columns |
Classification | Routine of categorizing values into different classes and groups of data. Can be used for audience or keyword classification, among others |
Click Through Rate | The number of clicks on your ad divided by the number of Impressions |
Clustering | Technique of grouping data into different categories based on similarities between observations. k-Means is a widely used algorithm for this problem |
Confusion matrix | Tool to visualize and describe the results and performance of a classification model in Machine Learning |
Conversions | The defined and measured actions a user takes on your website, based on your KPIs |
Correlation | Statistical metric that describes how strongly two or more variables vary together |
csv | Comma-separated values files are a file format for storing data in tables |
Data cleaning | Process of removing or replacing wrong data, fixing missing data and detecting outliers to prepare your csv file for Machine Learning |
Data collection | Procedure of gathering relevant data from all kinds of sources such as website tracking, CRM or 3rd-party providers |
Data science | The wonderful discipline of gaining knowledge and insights from data |
Dataset | Collection of datapoints built from rows and columns, including variables and features |
Decision Tree | Classification algorithm within Supervised Learning that predicts new values based on learned rules and decisions and represents the output in tree form |
Deep learning | Subfield of Machine Learning that uses multi-layered neural networks, loosely inspired by the human brain, to learn a certain task from huge datasets |
Descriptive statistics | Figures that are used to describe and summarize the data in your columns and rows |
Feature | The data-input provided to a Machine Learning Model in Supervised Learning |
Feature selection | Technique to select the most valuable features for your purpose from your columns and get rid of the irrelevant ones |
Histogram | Visualization tool that displays how often values fall into consecutive ranges (bins) |
Impressions | Number of times your search ad was shown to the user |
Input | The feature information (x) you provide to your model in supervised learning |
K-means | Clustering algorithm within Unsupervised Learning that calculates the distance between datapoints and cluster centers to group data into clusters. Often used for market or audience segmentation. |
K-Nearest Neighbours | KNN is a classification algorithm within Supervised Learning which assigns a class to a new datapoint based on the labeled datapoints closest to it |
Kaggle | Online community of data scientists, well known for its Machine Learning competitions |
Label | The data-output provided to a Machine Learning Model in Supervised Learning |
Linear Regression | Regression algorithm within Supervised Learning which fits a linear relationship to labeled data to predict new numerical values |
Logistic Regression | Classification algorithm within Supervised Learning which uses the dependency on one or more variables to predict the probability of an outcome |
Machine Learning | Subset of Artificial Intelligence that works with huge amounts of data to, for example, predict new values or find patterns and similarities without being explicitly programmed to do so |
Matplotlib | Python library for data visualization through charts, scatterplots and histograms |
Mean | Statistical measure describing the numerical average |
Median | Statistical measure describing the value in the middle of a sorted group of numbers |
Model | Representation of the computational operations that process provided data through an algorithm |
Model Accuracy | The metric that defines the rate of correctness and quality of a model based on a test with provided data |
Naive Bayes | Classification algorithm within Supervised Learning that assumes the input variables are independent of each other to predict an outcome |
NumPy | Python library for numerical computing with vectors, matrices and arrays, including mathematical functions to operate on them |
Output | The labeled information (y) you provide to your model in supervised learning. It’s the result we already know and pair with the input (x) during training so that the model’s errors can be measured |
Overfitting | Describes a model that fits its training data “too well” and therefore performs poorly on new, unseen data |
Pandas | Python library for data analysis with a csv-reading function for data import and functions for data cleaning |
Probability | Field of mathematics, closely related to statistics, that quantifies the chance of occurrence of certain events. Commonly used for click prediction in ad tech |
Python | One of the most popular and widely used programming languages in Machine Learning and Data Science |
PyTorch | Machine Learning framework for Python, derived from the Lua-based Torch library, which follows an imperative programming style |
R | Commonly used programming language in Machine Learning for statistical purposes |
Random Forest | Classification algorithm within Supervised Learning that uses a group of Decision Trees to predict new values |
Reinforcement Learning | Type of Machine Learning where the model follows a trial-and-error approach and learns from exploration and mistakes. It is one of the approaches behind Real Time Bidding in SEM. |
scikit-learn | Machine Learning library written in Python that provides several algorithms for supervised and unsupervised learning |
Segmentation | Clustering datapoints into different groups based on similarities and patterns |
Standard Deviation | Numerical value in statistics that describes how much the numbers of a group differ from the mean |
Statistical Fit | Numerical value that describes how closely your approximation matches your target |
Statistics | Mathematical discipline dealing with the analysis and interpretation of numerical data |
Stochastic Gradient Descent | Optimization algorithm that iteratively adjusts a model’s parameters, using small random samples of the data, to reduce its error. Commonly used as a learning algorithm on very large datasets |
Supervised Learning | In Supervised Learning you train an algorithm by providing labeled input data yourself |
Support Vector Machines | Classification algorithm within Supervised Learning that splits a dataset into pre-defined categories by finding the boundary that separates them best |
TensorFlow | Machine Learning framework with capabilities to build Deep Learning models and Neural Networks |
Tracking | The process of collecting data from the interactions users have with your website |
Training | The process of providing input-data to your model to improve accuracy |
Underfitting | Describes a model that is too simple to capture the patterns in its data and therefore performs poorly even on the training data |
Unsupervised Learning | Unsupervised Learning works without labeled data. Algorithms in USL cluster and group datasets based on similarities, patterns and associations. |
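
To make a few of the terms above more concrete, here is a minimal Python sketch that computes the Click Through Rate per day and then summarizes it with the mean, median and standard deviation. It only uses Python's built-in `statistics` module. The function name `click_through_rate` and all click and impression figures are made up for illustration and do not come from a real account.

```python
from statistics import mean, median, pstdev


def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks divided by impressions; returns 0.0 if the ad was never shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical daily figures for a single search ad (illustrative only).
daily_clicks = [12, 15, 9, 30, 14]
daily_impressions = [400, 500, 300, 600, 350]

# One CTR value per day.
daily_ctr = [
    click_through_rate(clicks, impressions)
    for clicks, impressions in zip(daily_clicks, daily_impressions)
]

# Descriptive statistics over the daily CTR values.
ctr_mean = mean(daily_ctr)      # numerical average
ctr_median = median(daily_ctr)  # value in the middle of the sorted values
ctr_std = pstdev(daily_ctr)     # how much the values differ from the mean

print(f"mean CTR:   {ctr_mean:.2%}")
print(f"median CTR: {ctr_median:.2%}")
print(f"std dev:    {ctr_std:.2%}")
```

In this made-up data, one day with an unusually high CTR pulls the mean above the median, which is a simple way to spot a possible outlier before data cleaning.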