The Reinforcement Learning Process: How To Use Reinforcement Learning to Increase the Profitability of Your Online Shop

February 1, 2018

Eric Mende

Artificial Intelligence

Reinforcement learning is a type of artificial intelligence in which a so-called agent learns to interact with its environment in the best way possible. In recent years, a number of high-profile applications have made the approach popular. Programs have been developed that can beat humans at games like chess and Go, and even at simple Atari games. Similar programs help robots play football successfully or steer helicopters through daring acrobatic flights. In this three-part blog series, we’re going to show you how reinforcement learning can be used to personalize online shop recommendations.

A dog with a ball in his mouth representing reinforcement learning.

An Example from Behavioral Psychology

The name reinforcement learning is borrowed from behavioral psychology. As an area of machine learning, and therefore of artificial intelligence, it resembles operant conditioning, which is the process by which a dog learns to fetch a ball.

In this case, our dog Buster is the agent. The environment is the world in which he is located. Buster’s trainer and the ball are particularly important. Buster experiences the environment through his senses. He smells, hears, and sees what is happening around him. His brain creates an internal representation of this environment. He can respond to this representation with various actions. When he sees the ball flying away, he can decide, for example, whether to watch it, run after it, bark, sniff the ground, or ignore it and take a leak. If he chases after the ball and brings it back to his trainer, the trainer can then give him a reward.

This behavior is reinforced when Buster’s brain establishes a connection between bringing the ball back and the reward. Next time, Buster will be more motivated to perform this action again. Chasing after a squirrel, on the other hand, isn’t worthwhile because he won’t get a reward for it. If Buster experiences several times that he is rewarded for bringing the ball back, but not for chasing the squirrel, his brain creates a link between the internal representation of “ball thrown”, the action of chasing after the ball and bringing it back, and the reward. In this way, he learns to choose the response that benefits him most in a given situation.

Reinforcement Learning in E-Commerce

Reinforcement learning works in a very similar way and is used, among other things, to personalize online shops. Unfortunately, the agent in this case isn’t as fluffy, and it definitely doesn’t bark. But like Buster, the agent needs to perceive its environment and use this experience to choose an action that will, in turn, influence the environment.

 

Graphic depicting the Reinforcement Learning process in an online shop
Fig. 1. Reinforcement Learning in an online shop

 

The environment we’re interested in consists of the online shop and the customers who interact with it. Customer behavior in an online shop can be recorded by the server. Just like Buster smells the ball, watches it fly away and hears it hit the ground, the server can record when a customer loads a new page (including the exact time), what they have searched for, and whether they clicked on a recommended product. As customers spend more time browsing the online shop, more of their behavior is recorded and the data log grows. However, every time the agent needs to act, it requires an input vector of a fixed length, as is the case with other machine learning methods. The growing log therefore has to be condensed into a vector that always has the same length. This vector is the internal representation of the environment.

 

Fig. 2. Internal representation in the form of a vector
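
To make the idea of a fixed-length representation more concrete, here is a minimal Python sketch. The event fields, the page types, and the four features are assumptions chosen for illustration. The article does not say which features are actually used, so treat this as one possible encoding rather than the real one.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One logged event. The field names are hypothetical."""
    timestamp: float              # seconds since the session started
    page_type: str                # e.g. "category", "product", "search"
    clicked_recommendation: bool  # did the customer click a recommended product?


def session_to_vector(events):
    """Condense a log of any length into a vector of exactly four numbers."""
    n = len(events)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]

    # Shares of page views, so the values do not grow with the log size.
    category_share = sum(e.page_type == "category" for e in events) / n
    search_share = sum(e.page_type == "search" for e in events) / n
    click_share = sum(e.clicked_recommendation for e in events) / n

    # Average time between consecutive page loads, in seconds.
    if n > 1:
        avg_seconds_per_page = (events[-1].timestamp - events[0].timestamp) / (n - 1)
    else:
        avg_seconds_per_page = 0.0

    return [category_share, search_share, click_share, avg_seconds_per_page]
```

Whether a session contains one page view or several hundred, the function always returns four values, which is what allows the agent to consume it as input.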

Using Reinforcement Learning in a Recommendation Engine

Just as Buster can choose between various actions, like running and sniffing, the agent can also perform various actions. These actions have an impact on the environment. Our agent can influence the e-commerce recommendations on a newly loaded page of an online shop. For example, it can decide that only products from a particular brand or products with a maximum price of $20 should be displayed. The agent can also choose to do both at the same time, just like Buster could decide to fetch the ball while barking.
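
One way to picture the agent's set of actions is as every combination of a brand filter and a price cap. The sketch below is a simplified illustration: the brand name, the product dictionaries, and the function name `apply_action` are hypothetical and not taken from a real recommendation engine.

```python
from itertools import product

# None means "do not apply this filter". The concrete values are assumptions.
BRAND_FILTERS = [None, "BrandA"]
PRICE_CAPS = [None, 20.0]

# Each action is a (brand, max_price) pair, so both filters can be combined.
ACTIONS = list(product(BRAND_FILTERS, PRICE_CAPS))


def apply_action(catalog, action):
    """Return the products that may be recommended under the chosen action."""
    brand, max_price = action
    return [
        item
        for item in catalog
        if (brand is None or item["brand"] == brand)
        and (max_price is None or item["price"] <= max_price)
    ]
```

With two options per filter, the agent has four actions to choose from: no filter, price cap only, brand only, or both at once.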

The agent’s decisions influence the product recommendations and the personalized elements that the customer sees, and so can also influence the customer’s behavior:

  • The best outcome: The customer is shown products that may interest them, so they are more likely to buy something or to buy more. If the customer does buy something, the agent receives a digital treat, i.e., the agent is told the amount that the customer has spent. This reward reinforces the agent’s behavior. This means that if the agent receives a similar input vector in the future, it is more likely to behave in the same way.
  • The worst-case scenario: The customer is reluctant to buy anything or leaves the shop, and the agent goes away empty-handed. The agent’s behavior is not reinforced. If the agent receives a similar input vector in the future, it is less likely to perform the same action.

This procedure is repeated for a large number of customers. Each individual online shopper becomes a trainer for the agent. Over time, the agent learns which product recommendations work best for which kind of customer behavior.
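
The reward loop described above can be sketched in a few lines of Python. The article does not name the algorithm behind the agent, so the example swaps in a deliberately simple technique: an epsilon-greedy agent that keeps one linear estimate of the expected reward per action. It also treats every recommendation as a single-step decision. The class name, the parameters, and the learning rule are all assumptions made for illustration.

```python
import random


class LinearBanditAgent:
    """Epsilon-greedy agent with one linear reward estimate per action (illustrative)."""

    def __init__(self, n_features, n_actions, epsilon=0.1, learning_rate=0.1, seed=0):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.epsilon = epsilon              # probability of trying a random action
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def predict(self, vector, action):
        """Estimated reward for taking `action` when the input is `vector`."""
        return sum(w * x for w, x in zip(self.weights[action], vector))

    def choose_action(self, vector):
        """Mostly pick the action with the highest estimate, sometimes explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        estimates = [self.predict(vector, a) for a in range(len(self.weights))]
        return estimates.index(max(estimates))

    def learn(self, vector, action, reward):
        """Move the estimate for the chosen action toward the observed reward."""
        error = reward - self.predict(vector, action)
        weights = self.weights[action]
        for i, x in enumerate(vector):
            weights[i] += self.learning_rate * error * x
```

In use, the shop would call `choose_action` with the current input vector when a page loads, and call `learn` with the amount spent (or 0 if the customer leaves without buying). Because the estimates depend on the vector, similar vectors lead to similar choices, which mirrors the reinforcement described in the two scenarios above.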

 

Personalizing Recommendations

What’s special about the agent is that it can respond to the various situations that customers find themselves in. Customers with similar behavior create similar vectors. For example, some customers look for something in particular and know what they want. These customers tend to look at category overview pages less often, but spend more time on average on every page they visit. For customers who want to browse and find inspiration, the opposite is true.

The agent learns not only to distinguish between such groups, but also which action is most appropriate for each group. Compared with rigid strategies that apply the same pre-set rules to every customer, this adaptive approach can increase sales.

 

Our Conclusion on Using Reinforcement Learning in E-Commerce

With the right training, dogs can learn to fetch, and online shops can learn to generate relevant product recommendations. Reinforcement learning trains the agent on the behavior of many different shop users, so that the agent can provide improved, customer-specific recommendations.

 

Eric Mende

Data Scientist

Eric works as a data scientist at epoq and is responsible for machine learning. Every day, he optimizes our algorithms so that they deliver the best results for our customers.