How to build a Keyword-Classifier with Machine Learning
In this article we will train a model to automatically classify search keywords as lower- or upper-funnel keywords based on historical performance. What we want to know is: is a term used when a user is ready to buy, or when they are still at the beginning of the customer journey?
Setup & approach
The approach is supervised: we specify the respective KPI thresholds ourselves and provide the model with as many positive and negative data examples as possible from which it can learn.
In our example we use a binary Logistic Regression Classifier as our algorithm, which we implement in the programming language Python. Using the sigmoid function, it will provide us with easily interpretable probability values between 0 and 1 as the result for each of the keywords, and thus assign each keyword to either the lower funnel (1) or the upper funnel (0).
With this result we can decide how much we are willing to pay for this keyword and if we want to use a bid modifier. We save ourselves time and the endless scrolling through thousands of keyword performance rows. Our model will take over.
The reporting structure is taken from Google Analytics and the keywords for the example were taken from the Google Ads Keyword Planner. However, the performance values of the dataset were created manually for this example and do not originate from an existing campaign.
Let's get started
1. Data Preprocessing
At the beginning we load a customized keyword report from Google Analytics. The most important column is “Transactions/Conversions”, but efficiency metrics such as CTR, Return on Ad Spend (ROAS), Bounce Rate and Time on Site also matter. Anything that is tracked and relevant as a metric can be included here.
Within the downloaded Excel file we enter additional columns for threshold values, which we use to define our lower funnel. For example CTR >3% or Converting Yes/No.
One possible way to draw the boundary is to take the average value of the respective column and use it as the threshold. However, this is entirely individual and varies from dataset to dataset.
With an if/then formula we can fill the red-marked learning columns: if the value is above the threshold we get a 1 (positive), if it is below we get a 0 (negative).
As an additional column we now create “Lower Funnel”. In our example a keyword is assigned to the lower funnel if it fulfills all 4 conditions.
This is what we want to teach our model.
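The spreadsheet steps above can also be sketched in Python with pandas. The keyword rows, column names and threshold values below are purely illustrative, not from a real Google Analytics export:

```python
import pandas as pd

# Hypothetical keyword report; values are made up for this sketch.
data = pd.DataFrame({
    "Keyword": ["buy running shoes", "what are running shoes",
                "running shoes sale", "history of sneakers"],
    "CTR": [0.05, 0.01, 0.04, 0.02],
    "ROAS": [3.2, 0.5, 2.8, 0.1],
    "BounceRate": [0.30, 0.80, 0.35, 0.90],
    "Transactions": [12, 0, 7, 0],
})

# Binarize each metric against its threshold (1 = condition met).
# The thresholds here are example values; the article suggests the
# column average as one possible starting point.
data["CTR_ok"] = (data["CTR"] > 0.03).astype(int)
data["ROAS_ok"] = (data["ROAS"] > 1.0).astype(int)
data["Bounce_ok"] = (data["BounceRate"] < 0.5).astype(int)
data["Conv_ok"] = (data["Transactions"] > 0).astype(int)

# A keyword is lower funnel only if all 4 conditions hold.
data["Lower Funnel"] = data[
    ["CTR_ok", "ROAS_ok", "Bounce_ok", "Conv_ok"]
].min(axis=1)

print(data[["Keyword", "Lower Funnel"]])
```

Taking the row-wise minimum of the four binary columns is equivalent to the spreadsheet's "all conditions fulfilled" rule.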
2. Data import
Now we import our data set and the libraries we need for our calculations into our Machine Learning Notebook. In our example we use Google’s Colaboratory (Colab), which can be quickly set up and easily linked to Google spreadsheets with just a few clicks.
With the Python command “data.shape” we can display the dimensions of the data set. In this case it is 94 rows and 13 columns (in “real life” the dimensions should be much larger; the more data the better).
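In Colab the report would typically be pulled from the linked Google Sheet or an uploaded file; the snippet below simulates that import with a small inline CSV so it stays self-contained, then inspects the dimensions with “data.shape”:

```python
import io
import pandas as pd

# Stand-in for the spreadsheet import in Colab; the CSV content
# is hypothetical example data.
csv = """Keyword,CTR,ROAS,Lower Funnel
buy running shoes,0.05,3.2,1
what are running shoes,0.01,0.5,0
"""
data = pd.read_csv(io.StringIO(csv))

# data.shape returns (rows, columns) as a tuple.
print(data.shape)  # (2, 4)
```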
3. Train/Test Split
Now we have to define the dependencies of the variables that go into the model. In our example, the threshold variables form our feature matrix on the X-axis. We define “Lower Funnel” as the label on the Y-axis.
In the next step we divide our data set into two parts: a training data set, with which we train our model, and a test data set, which we use to validate the results. We perform the split by importing “train_test_split” from the scikit-learn library.
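A minimal sketch of the split, using random placeholder data in place of the real report (94 rows, 4 threshold features, as in the article's example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix X (binary threshold columns) and
# label vector y ("Lower Funnel"); values are random stand-ins.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(94, 4))
y = rng.integers(0, 2, size=94)

# 70/30 split; random_state makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(x_train.shape, x_test.shape)  # (65, 4) (29, 4)
```

With 94 rows, a 30% test share leaves 65 rows for training, matching the numbers mentioned later in the evaluation step.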
4. Model training
As an algorithm we now import the Logistic Regression from scikit-learn and create our Python model object.
That’s the great thing about Python: you don’t need to write every algorithm from scratch, you can simply import it. (An overview of import commands for other algorithms can be found here.)
To train our model afterwards, we use the fit method “model.fit(x_train, y_train)”.
In y_train we also provide the model with the correct answers, from which it learns the patterns it will later apply to unseen data.
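Import, model creation and training together look roughly like this. The tiny training set is a hypothetical stand-in, with the label derived from the same "all conditions met" rule we built in the spreadsheet:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny stand-in for the training set: 4 binary threshold features.
x_train = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
])
# Label = 1 only when all four conditions are met (spreadsheet rule).
y_train = x_train.min(axis=1)

# Create the model object, then fit it on features and labels.
model = LogisticRegression()
model.fit(x_train, y_train)

print(model.coef_.shape)  # one learned weight per feature: (1, 4)
```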
5. Model prediction
What we want to know in the next step is: how will our model perform on unknown data?
With “model.predict(x_test)” we get as output the funnel association values of the keywords from the test set.
With “model.predict_proba(x_test)” we can also display the calculated probability.
Also for single keywords in our dataset we can display the assignment and probability.
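Prediction for the whole test set and for a single keyword can be sketched as follows, again on hypothetical stand-in data (an all-ones row imitating a strong lower-funnel keyword, an all-zeros row imitating an upper-funnel one):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit on the same toy "all conditions met" rule (hypothetical data).
x_train = np.array([
    [1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 1, 1],
    [0, 0, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1],
])
y_train = x_train.min(axis=1)
model = LogisticRegression().fit(x_train, y_train)

x_test = np.array([
    [1, 1, 1, 1],  # meets every threshold
    [0, 0, 0, 0],  # meets none
])

# Hard class labels: 1 = lower funnel, 0 = upper funnel.
print(model.predict(x_test))

# Probabilities per keyword: columns are [P(upper), P(lower)].
print(model.predict_proba(x_test))

# Single keyword: index into the test set, read P(lower funnel).
print(model.predict_proba(x_test[[0]])[0, 1])
```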
6. Model evaluation
A useful and easy-to-interpret value for evaluating the performance of the model is the score method. “model.score(x_test, y_test)” gives us the mean accuracy as a value between 0 and 1, where 0 means very low and 1 means very high accuracy.
At this point it gets a bit tricky again, because a high accuracy value does not always mean that we have built a super smart and perfect model. In our example, we only provided the model with 65 rows and 4 columns (70% of the total data) of training material it could learn from.
So it passes the test based on the existing data, but its model of the world is very limited in both depth and width. This is known as the low-bias/high-variance problem, which leads to overfitting: the model performs well on the data it has seen but generalizes poorly to new data. It can be mitigated with more training data and more columns/metrics on the X-axis, which expand our hypothesis space.
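The evaluation step can be sketched end to end on synthetic data. Because the label below is a deterministic function of the features, the score will be high; real keyword reports are noisier, which is exactly why a high score should be read with the caution described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: label = 1 when at least 3 of 4 conditions hold.
# This rule is invented for the sketch, not from the article's data.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(94, 4))
y = (X.sum(axis=1) >= 3).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

model = LogisticRegression().fit(x_train, y_train)

# Mean accuracy on the held-out test set, between 0 and 1.
accuracy = model.score(x_test, y_test)
print(round(accuracy, 2))
```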
7. Scope Integration
By linking Colab and Google Drive, the data spreadsheets can be accessed at any time. Through an ad script, reports such as the keyword report can be refreshed on a schedule, so that the algorithm regularly receives fresh data to learn from.
In this way, the mean/threshold value and the classification are automatically adjusted.
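A hedged sketch of that refresh step: each time fresh report data arrives, the mean-based thresholds are recomputed and the labels re-derived. The inline CSV and the `refresh_labels` helper are illustrative stand-ins for the Drive-linked spreadsheet and whatever scheduling mechanism is used:

```python
import io
import pandas as pd

# Stand-in for the Drive-linked keyword report (hypothetical values).
csv = """Keyword,CTR,ROAS
buy running shoes,0.05,3.2
what are running shoes,0.01,0.5
running shoes sale,0.04,2.8
history of sneakers,0.02,0.1
"""

def refresh_labels(report: pd.DataFrame) -> pd.DataFrame:
    """Re-derive thresholds from the current column means and re-label."""
    out = report.copy()
    for col in ["CTR", "ROAS"]:
        # Threshold = column mean, recomputed on every refresh.
        out[f"{col}_ok"] = (out[col] > out[col].mean()).astype(int)
    out["Lower Funnel"] = out[["CTR_ok", "ROAS_ok"]].min(axis=1)
    return out

data = refresh_labels(pd.read_csv(io.StringIO(csv)))
print(data["Lower Funnel"].tolist())  # [1, 0, 1, 0]
```

Re-training the classifier on the freshly labeled data then completes the loop.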
Our classifier does this job for us and is integrated into the workflow.
Was this article helpful for your work? Leave a comment 🙂