
Reinforcement learning process: How to use reinforcement learning profitably for your online shop (Part 1)

Published February 1, 2018
Eric Mende
Reading time: 7 min.

Reinforcement learning is an artificial intelligence method in which a so-called agent learns to interact with its environment as effectively as possible. In recent years, a number of high-profile applications have made the method popular. Programs trained with it can beat humans at board games such as chess and Go, as well as at simple Atari video games. Reinforcement learning also helps robots play soccer and has been used to fly helicopters through daring aerobatic maneuvers. In this three-part blog series, we show you how reinforcement learning can be used to personalize recommendations in online shops.

The picture shows a dog playing fetch in a field.

Here's what you can expect to find in this blog article:

An example from behavioral psychology
Reinforcement learning in e-commerce
Reinforcement learning in use for the recommendation engine
Personalization of recommendations
Our conclusion on the use of reinforcement learning in e-commerce

An example from behavioral psychology

The name reinforcement learning was borrowed from behavioral psychology. Reinforcement learning, a subfield of machine learning and thus of artificial intelligence (AI), works much like instrumental conditioning, the process by which, for example, a dog learns to fetch a ball.

In this example, our dog "Benno" is the agent. The environment is the world he finds himself in; the trainer and the ball are particularly important parts of it. Benno perceives the environment through his senses: he smells, hears, and sees what is happening around him. His brain creates an internal representation of this environment, and he can respond to this representation with various actions. When he sees the ball flying away, he can decide, for example, whether to watch it, run after it, bark, sniff the ground, or lift his leg. If he runs after the ball and brings it back to the trainer, the trainer can reward him.


The behavior is reinforced when Benno's brain associates bringing the ball back with receiving a reward: next time, Benno will be more motivated to perform this action again. Chasing a squirrel, by contrast, is not worthwhile for Benno, because he receives no reward for it. Once he has experienced several times that he is rewarded only for bringing back the ball and not for chasing the squirrel, his brain can link the situation "ball thrown" and the action "run after it and bring it back" to the reward. He has learned to choose the best response for him in the given situation.
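Benno's learning process can be sketched as a simple action-value learner (a multi-armed bandit with epsilon-greedy exploration). This is an illustration we chose, not a model taken from behavioral psychology: the action names and the reward of 1 for fetching are assumptions. Benno mostly picks the action with the highest estimated reward, occasionally tries something random, and updates his estimate after every treat (or lack of one).

```python
import random

# Hypothetical actions Benno can choose from when he sees the ball thrown.
ACTIONS = ["watch the ball", "fetch the ball", "bark", "sniff the ground", "chase a squirrel"]


def trainer_reward(action: str) -> float:
    """The trainer gives a treat (reward 1.0) only when the ball comes back."""
    return 1.0 if action == "fetch the ball" else 0.0


def train_benno(episodes: int = 500, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # estimated reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.choice(ACTIONS)
        else:
            # Exploit: pick the best-rated action, breaking ties at random.
            best = max(value.values())
            action = rng.choice([a for a in ACTIONS if value[a] == best])
        reward = trainer_reward(action)
        count[action] += 1
        # Incremental average: nudge the estimate towards the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value


if __name__ == "__main__":
    learned = train_benno()
    print(max(learned, key=learned.get))  # the action Benno now prefers
```

After training, fetching is rated 1.0 and every other action, including chasing the squirrel, stays at 0.0, so the greedy choice is to fetch.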

Reinforcement learning in e-commerce

Reinforcement learning works in a very similar way and is used, among other things, to personalize online shops. Unfortunately, the agent here is not as fluffy and does not bark. But like Benno, it must perceive its environment and, based on this perception, decide on an action that in turn influences the environment.

Reinforcement learning for online shops

The environment we are interested in consists of online shops and the customers who interact with them. A customer's behavior in an online shop can be recorded on the server side. Just as Benno smells the ball, sees it fly away, and hears it hit the ground, the server records when the customer opens a new page (including the exact time), what they searched for, and whether they clicked on a product recommendation. The longer the customer browses the shop, the longer the log, i.e., the record of their behavior, becomes. Like most other machine learning methods, however, the agent requires an input vector of fixed length every time it is to act. This vector is the agent's internal representation of the environment.
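The step from a growing log to a fixed-length vector can be sketched as follows. The `Event` fields and the five features are hypothetical choices for illustration only; the second part of this series covers how input vectors are actually built. The point is that a log of any length is summarized into the same number of values.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One server-side log entry (hypothetical schema)."""
    timestamp: float                 # seconds since the session started
    page_type: str                   # e.g. "category", "product", "search"
    clicked_recommendation: bool = False


def session_to_vector(events: list[Event]) -> list[float]:
    """Summarize a log of any length as a vector of exactly five numbers.

    The features are illustrative assumptions, not a production feature set.
    """
    n = len(events)
    if n == 0:
        return [0.0] * 5
    duration = events[-1].timestamp - events[0].timestamp
    # Average gap between consecutive page views (0 if only one page so far).
    avg_gap = duration / (n - 1) if n > 1 else 0.0
    category_share = sum(e.page_type == "category" for e in events) / n
    searches = sum(e.page_type == "search" for e in events)
    rec_clicks = sum(e.clicked_recommendation for e in events)
    return [float(n), category_share, avg_gap, float(searches), float(rec_clicks)]


if __name__ == "__main__":
    log = [Event(0, "category"), Event(12, "category"), Event(20, "product", True)]
    print(session_to_vector(log))
```

For the three-event log above, the vector holds 3 page views, a category share of 2/3, 10 seconds between views, no searches, and one recommendation click. Appending more events changes the values but never the length.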

Internal representation in the form of a vector

Reinforcement learning in use for the recommendation engine

Just as Benno can perform various actions such as running or sniffing, the agent can also choose between various actions, and these actions affect the environment. Our agent influences the recommendations shown whenever a customer opens a new page in the online shop. For example, it can decide that only products of a certain brand should be displayed, or only products that cost at most €20. It can also decide to do both at the same time, just as Benno could fetch and bark at the same time.
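One way to model this is to treat every on/off combination of filters as a separate action, so "brand filter and price limit together" is simply one more element of the action space. The sketch below assumes two filters, a hypothetical brand name `BrandA`, and the €20 limit from the text; the product data is made up.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Product:
    name: str
    brand: str
    price: float  # in euros


BRAND = "BrandA"   # hypothetical brand the agent may filter on
MAX_PRICE = 20.0   # the €20 limit from the example above

# Each action is a pair (use_brand_filter, use_price_filter). Enumerating all
# on/off combinations yields four actions, including "both at once".
ACTIONS = list(product([False, True], repeat=2))


def apply_action(action: tuple[bool, bool], catalog: list[Product]) -> list[Product]:
    """Return the products that may be recommended under the chosen action."""
    use_brand, use_price = action
    return [
        p for p in catalog
        if (not use_brand or p.brand == BRAND)
        and (not use_price or p.price <= MAX_PRICE)
    ]


if __name__ == "__main__":
    catalog = [
        Product("T-shirt", "BrandA", 15.0),
        Product("Jacket", "BrandA", 80.0),
        Product("Cap", "BrandB", 12.0),
    ]
    for action in ACTIONS:
        print(action, [p.name for p in apply_action(action, catalog)])
```

Enumerating combinations keeps the agent's decision a single choice from a finite list, which is convenient for the learning methods sketched further on.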

The agent's decisions influence the product recommendations and personalized elements that the customer sees and can therefore also influence their behavior:

In a positive scenario, the customer is shown something that interests them and becomes more likely to buy at all, or to buy more. If the customer actually makes a purchase, the agent receives a digital treat: it is informed of the amount the customer has spent. This reward reinforces the agent's behavior, so if it receives a similar input vector later, it is more likely to behave the same way again.
Otherwise, the customer hesitates to buy or leaves the shop, and the agent comes away empty-handed. The behavior is not reinforced, and if the agent later receives a similar input vector, it will be less likely to choose the same action.
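The idea that a reward makes an action more likely, and a disappointing result makes it less likely, can be sketched in the style of the gradient bandit algorithm from Sutton and Barto's reinforcement learning textbook. This is a swapped-in textbook technique, not necessarily the algorithm our agent uses (that is the subject of part three). The sketch assumes one `PreferenceAgent` per kind of situation, i.e., per group of similar input vectors, and uses the amount spent in euros as the reward.

```python
import math
import random


def softmax(prefs: list[float]) -> list[float]:
    """Turn preferences into action probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class PreferenceAgent:
    """Gradient-bandit-style learner for one kind of situation.

    Rewards above the running average make the chosen action more likely;
    rewards below it make the action less likely.
    """

    def __init__(self, n_actions: int, step_size: float = 0.01):
        self.prefs = [0.0] * n_actions  # one preference per action
        self.step_size = step_size
        self.avg_reward = 0.0           # running average used as a baseline
        self.n = 0

    def probabilities(self) -> list[float]:
        return softmax(self.prefs)

    def choose(self, rng: random.Random) -> int:
        """Sample an action index according to the current probabilities."""
        return rng.choices(range(len(self.prefs)), weights=self.probabilities())[0]

    def update(self, action: int, reward: float) -> None:
        probs = self.probabilities()
        advantage = reward - self.avg_reward
        for a in range(len(self.prefs)):
            if a == action:
                self.prefs[a] += self.step_size * advantage * (1 - probs[a])
            else:
                self.prefs[a] -= self.step_size * advantage * probs[a]
        # Update the baseline only after it has been used.
        self.n += 1
        self.avg_reward += (reward - self.avg_reward) / self.n
```

For example, after `update(2, 35.0)` (a €35 purchase following action 2), action 2's probability rises above its initial 0.25; a subsequent `update(1, 0.0)` lowers action 1's probability, because zero is below the running average.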

This procedure is repeated for many customers. Each individual online shopper thus becomes the agent's trainer. Over time, the agent learns which product recommendations are best for which customer behavior.

Personalization of recommendations

What makes the agent special is that it can respond to the different situations customers are in. Customers with similar behavior generate similar vectors. Some customers, for example, are looking for something specific and know what they want. They tend to view fewer category overview pages but spend more time on average on each page they visit. Customers who want to browse and be inspired tend to do the opposite: they view more category pages and spend less time on each.
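To make "similar behavior generates similar vectors" concrete, the sketch below reduces the vector to two of the features mentioned above and assigns a customer to the nearest of two hypothetical prototypes. The prototype values are invented, and the article does not state that the agent forms explicit groups; a learning agent can also generalize across similar vectors without them.

```python
import math

# Hypothetical prototypes: (share of category pages, average minutes per page)
PROTOTYPES = {
    "goal-oriented": (0.1, 2.0),  # few overview pages, long stays per page
    "browser": (0.6, 0.5),        # many overview pages, short stays per page
}


def nearest_group(vector: tuple[float, float]) -> str:
    """Assign a customer vector to the closest prototype (Euclidean distance)."""
    return min(PROTOTYPES, key=lambda group: math.dist(vector, PROTOTYPES[group]))


if __name__ == "__main__":
    print(nearest_group((0.15, 1.8)))  # goal-oriented
    print(nearest_group((0.7, 0.4)))   # browser
```

Because the two features use different units, they would normally be scaled to comparable ranges before distances are compared; otherwise the feature with the larger numbers dominates.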


The agent learns to distinguish between these groups and which action is most appropriate for each group. This can increase sales compared to rigid strategies that perform the same actions for every customer based on preset rules.
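The advantage over a rigid rule can be illustrated with a small simulation that repeats the procedure for many customers. Everything in it is synthetic: two customer groups, four filter actions, invented purchase probabilities, and a fixed basket value of €50. The agent keeps an epsilon-greedy revenue estimate per group, while the rigid strategy always applies the brand filter. This demonstrates the mechanism only; the numbers say nothing about real shops.

```python
import random

ACTIONS = ["no filter", "brand filter", "price filter", "brand + price"]
GROUPS = ["goal-oriented", "browser"]

# Invented purchase probabilities per (group, action), for illustration only.
BUY_PROB = {
    ("goal-oriented", "no filter"): 0.05,
    ("goal-oriented", "brand filter"): 0.20,
    ("goal-oriented", "price filter"): 0.04,
    ("goal-oriented", "brand + price"): 0.06,
    ("browser", "no filter"): 0.05,
    ("browser", "brand filter"): 0.04,
    ("browser", "price filter"): 0.20,
    ("browser", "brand + price"): 0.06,
}
BASKET_VALUE = 50.0  # euros per purchase, kept constant for simplicity


def simulate(customers: int = 40_000, epsilon: float = 0.1, seed: int = 1):
    """Compare a per-group epsilon-greedy agent with a fixed rule."""
    rng = random.Random(seed)
    value = {(g, a): 0.0 for g in GROUPS for a in ACTIONS}  # estimated revenue
    count = {key: 0 for key in value}
    agent_revenue = fixed_revenue = 0.0
    for _ in range(customers):
        group = rng.choice(GROUPS)  # in practice: derived from the input vector
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: value[(group, a)])  # exploit
        reward = BASKET_VALUE if rng.random() < BUY_PROB[(group, action)] else 0.0
        key = (group, action)
        count[key] += 1
        value[key] += (reward - value[key]) / count[key]
        agent_revenue += reward
        # Rigid strategy: always apply the brand filter, whoever the customer is.
        if rng.random() < BUY_PROB[(group, "brand filter")]:
            fixed_revenue += BASKET_VALUE
    best = {g: max(ACTIONS, key=lambda a: value[(g, a)]) for g in GROUPS}
    return agent_revenue, fixed_revenue, best
```

With these made-up probabilities, the agent ends up preferring the brand filter for goal-oriented customers and the price filter for browsers, and it earns more than the fixed rule, which suits only one of the two groups.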

Our conclusion on the use of reinforcement learning in e-commerce

With the right training, dogs can learn to fetch, and agents can learn to generate relevant product recommendations in online shops. In reinforcement learning, the agent is trained on the varied behavior of shop users and can thus provide increasingly tailored recommendations for each customer.

In the second part of this blog series, we describe how we use real-time analytics to create input vectors from customer behavior, and what needs to be considered when tracking so that the agent can make good predictions.
In the third part, we take a closer look at the self-learning algorithms our agent uses to determine which actions are best for which input vector.


Eric Mende

Data scientist

At the time of publication, Eric was working as a data scientist at Epoq, where he was responsible for machine learning. He optimized our algorithms on a daily basis to ensure they delivered the best results for our customers.


5.04% increase in sales per session thanks to new personalization strategy at Outletcity Metzingen


How to semantically enrich search results using the thesaurus in Control Desk, and how AI can help you do this

Guide to Analyzing and Optimizing Your Smart Search in Control Desk – Now AI-Powered

Live chat in e-commerce vs. AI-supported consulting: This is how customer service works today

Guided Selling Software 2.0: The new era of digital purchasing advice

The most important metrics in email marketing: How to measure success

Customer lifetime value: What is your customer worth?

How you can optimize the product detail page as a conversion lever in 2025

Filter search results in a targeted and flexible way: What you need to consider when creating your faceted search