The Reinforcement Learning Process: How To Use Reinforcement Learning to Increase the Profitability of Your Online Shop

01. Feb 2018 04:00 | Artificial Intelligence

Reinforcement learning is a type of artificial intelligence in which a so-called agent learns to interact with its environment in the best way possible. In recent years, several high-profile applications have made the approach popular. Programs have been developed that can beat humans at games like chess and Go, and even at simple Atari games. Similar programs help robots play football or perform daring acrobatic helicopter flights. In this three-part blog series, we're going to show you how the reinforcement learning process can be used to personalise online store recommendations.


In this blog article, we will cover the following topics:

An Example from Behavioural Psychology
Reinforcement Learning in E-Commerce
Using Reinforcement Learning in a Recommendation Engine
Personalising Recommendations
Our Conclusion on Using Reinforcement Learning in E-Commerce


An Example from Behavioural Psychology

The name reinforcement learning is borrowed from behavioural psychology. Reinforcement learning, an area of machine learning, is similar to operant conditioning, which is how a dog learns to fetch a ball.

In this case, our dog Buster is the agent. The environment is the world in which he is located. Buster’s trainer and the ball are particularly important. Buster experiences the environment through his senses. He smells, hears, and sees what is happening around him. Thus, his brain creates an internal representation of this environment. He can respond to this representation with various actions. When he sees the ball flying away, he can decide, for example, whether to watch it, run after it, bark, sniff the ground, or ignore it and take a leak. If he chases after the ball and brings it back to his trainer, the trainer can then give him a reward.

This behaviour is reinforced if Buster’s brain establishes a connection between bringing the ball back and the reward. Next time, Buster will be more motivated to perform this action again. Chasing after a squirrel, by contrast, isn’t worthwhile because he won’t get a reward for it. If Buster is repeatedly rewarded for bringing the ball back rather than chasing the squirrel, his brain links the internal representation of “ball thrown”, the action of chasing after and bringing back the ball, and the reward. As a result, he learns to choose the response that benefits him most in the given situation.

Reinforcement Learning in E-Commerce

Reinforcement learning works in a very similar way and is used, amongst other things, to personalise online shops. Unfortunately, in this case the agent isn’t as fluffy, and it definitely doesn’t bark. But like Buster, the agent needs to experience its environment and use this experience to choose an action that will, in turn, influence the environment.

Fig. 1: Reinforcement Learning in an online shop

The environment we’re interested in consists of online shops and the customers who interact with them. Customer behaviour in an online shop can be recorded by the server. Just like Buster smells the ball, watches it fly away and hears it hit the ground, the server can record when a customer loads a new page (including the exact time), what they have searched for, and whether they clicked on a recommended product. As customers spend more time browsing the online shop, more of their behaviour is recorded, and the data log grows. However, like other machine learning methods, the agent requires an input vector of the same fixed length every time it needs to act, no matter how long the data log has become. The log is therefore condensed into such a vector, which serves as the agent’s internal representation of the environment.
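The step from a growing data log to a fixed-length vector can be sketched as follows. This is only an illustration: the `Event` type, the three event kinds and the four chosen features are assumptions made for this example, not the features used in any particular recommendation engine.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since the session started
    kind: str         # assumed kinds: "page_view", "search", "recommendation_click"


def session_to_vector(events: list[Event]) -> list[float]:
    """Summarise a session log of any length as exactly four numbers."""
    page_views = [e for e in events if e.kind == "page_view"]
    searches = sum(1 for e in events if e.kind == "search")
    clicks = sum(1 for e in events if e.kind == "recommendation_click")
    # Average time between consecutive page loads (0.0 if fewer than two pages)
    gaps = [b.timestamp - a.timestamp for a, b in zip(page_views, page_views[1:])]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return [float(len(page_views)), float(searches), float(clicks), avg_gap]


short_session = [Event(0.0, "page_view")]
long_session = [
    Event(0.0, "page_view"),
    Event(5.0, "search"),
    Event(10.0, "page_view"),
    Event(40.0, "page_view"),
    Event(45.0, "recommendation_click"),
]
print(session_to_vector(short_session))
print(session_to_vector(long_session))
```

Both sessions produce a vector of length four, even though the second log contains five times as many events.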

Fig. 2: Internal representation in the form of a vector

Using Reinforcement Learning in a Recommendation Engine

Just as Buster can choose between various actions, such as running and sniffing, the agent can also perform various actions that have an impact on its environment. Our agent can influence the recommendations on a newly loaded page of an online shop. For example, it can decide that only products from a particular brand, or only products costing no more than $20, should be displayed. And just as Buster could decide to fetch the ball while barking, the agent can also choose to do both at the same time.
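One way to picture such an action space is as every combination of a brand filter and a price cap. The sketch below uses a made-up catalogue and made-up brand names. It only illustrates how two filters can be combined into a single action and is not a description of a real engine.

```python
from itertools import product

# Hypothetical catalogue entries: (name, brand, price in dollars)
CATALOGUE = [
    ("Trail shoe", "Acme", 89.0),
    ("Water bottle", "Acme", 12.0),
    ("Socks", "Bolt", 9.0),
    ("Rain jacket", "Bolt", 120.0),
]

BRAND_FILTERS = [None, "Acme", "Bolt"]  # None means "any brand"
PRICE_CAPS = [None, 20.0]               # None means "no price limit"

# Each action is one (brand, price cap) pair, so both filters can apply at once
ACTIONS = list(product(BRAND_FILTERS, PRICE_CAPS))


def apply_action(action, catalogue=CATALOGUE):
    """Return the names of the products the action allows to be recommended."""
    brand, cap = action
    return [
        name
        for name, item_brand, price in catalogue
        if (brand is None or item_brand == brand)
        and (cap is None or price <= cap)
    ]


print(len(ACTIONS))
print(apply_action(("Acme", 20.0)))
```

The action `("Acme", 20.0)` is the equivalent of fetching the ball while barking: it applies the brand restriction and the price limit at the same time.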

The agent’s decisions influence the product recommendations and the personalised elements that the customer sees. In doing so, the agent can also influence the customer’s behaviour:

  • The best outcome: The customer is shown products that may interest them, so they are more likely to buy something or to buy more. If the customer does buy something, the agent receives a digital treat: it is told the amount that the customer has spent. This reward reinforces the agent’s behaviour, so if the agent receives a similar input vector in the future, it is more likely to behave in the same way.
  • The worst-case scenario: The customer is reluctant to buy anything or leaves the shop. In this case, the agent goes away empty-handed and its behaviour is not reinforced. If the agent receives a similar input vector in the future, it is less likely to perform the same action.

This procedure is repeated for lots of customers. Each individual online shopper becomes a trainer for the agent. Over time, the agent learns which product recommendations work best for which kind of customer behaviour.
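The reward loop can be sketched with a deliberately simple learner. The example below assumes an epsilon-greedy strategy that keeps a running average of the money spent after each action. To stay short, it ignores the input vector entirely. The class name, the action names and the amounts are hypothetical, and a production agent may well use a different learning rule.

```python
import random


class RecommendationAgent:
    """Keeps an average reward per action and mostly picks the best one."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon                        # share of random exploration
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in self.actions}   # average reward so far
        self.count = {a: 0 for a in self.actions}

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)      # explore: try something else
        return max(self.actions, key=self.value.get)  # exploit: best known action

    def learn(self, action, reward):
        # Incremental mean: move the estimate towards the observed reward
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


agent = RecommendationAgent(["brand_only", "under_20"], epsilon=0.0)
agent.learn("brand_only", 30.0)  # customer spent $30: behaviour is reinforced
agent.learn("brand_only", 0.0)   # customer left without buying: no treat
agent.learn("under_20", 10.0)
print(agent.value)
print(agent.choose())
```

A purchase raises the average for the chosen action, while a customer who leaves empty-handed pulls it down. Every shopper therefore nudges the agent's future choices a little.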


Personalising Recommendations

What’s special about the agent is that it can respond to the various situations that customers find themselves in. Customers with similar behaviour create similar vectors. For example, some customers look for something in particular and know what they want. These customers tend to look at fewer category overview pages but spend longer, on average, on each page they visit. For customers who want to browse and find inspiration, the opposite is true.

The agent learns not only to distinguish between such groups, but also which action is most appropriate for each group. Compared with rigid strategies that perform the same action for every customer based on pre-set rules, this adaptive approach can increase sales.
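To make the idea of group-specific actions concrete, here is a minimal sketch. It assumes a hand-written rule that splits customers into two groups, with thresholds invented for this example, and it keeps one average reward per group and action. A real agent would learn the grouping from the vectors rather than rely on fixed thresholds.

```python
from collections import defaultdict


def customer_group(category_page_share: float, avg_seconds_per_page: float) -> str:
    """Hypothetical rule: few category pages and long page visits mean 'targeted'."""
    if category_page_share < 0.3 and avg_seconds_per_page > 30:
        return "targeted"
    return "browsing"


class GroupAwareAgent:
    def __init__(self, actions):
        self.actions = list(actions)
        self.value = defaultdict(float)  # (group, action) -> average reward
        self.count = defaultdict(int)

    def learn(self, group, action, reward):
        key = (group, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

    def best_action(self, group):
        return max(self.actions, key=lambda a: self.value[(group, a)])


agent = GroupAwareAgent(["similar_items", "bestsellers"])
agent.learn("targeted", "similar_items", 50.0)
agent.learn("targeted", "bestsellers", 0.0)
agent.learn("browsing", "similar_items", 5.0)
agent.learn("browsing", "bestsellers", 25.0)

print(agent.best_action(customer_group(0.1, 45.0)))
print(agent.best_action(customer_group(0.6, 10.0)))
```

With these made-up rewards, the agent ends up preferring a different action for each group, which a single pre-set rule for all customers could not do.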


Our Conclusion on Using Reinforcement Learning in E-Commerce

With the right training, dogs can learn to fetch, and online shops can learn to generate relevant product recommendations. Reinforcement learning trains the agent on the behaviour of many different shop users, so that the agent can provide improved, customer-specific recommendations.



Eric Mende

Data Scientist

Eric is a Data Scientist at epoq and is responsible for Machine Learning. He optimises our algorithms every day so that they achieve the best results for our customers.
