
Charles University in Prague
Faculty of Mathematics and Physics

BACHELOR THESIS

Jiří Helmich

Doporučování zboží ve webových obchodech
Recommender system for a web shop

Department of Theoretical Computer Science and Mathematical Logic

Supervisor: Mgr. Ondřej Sýkora
Study programme: Computer Science

2010


I wish to thank my supervisor Mgr. O. Sýkora for his advice and patience.

My thanks also belong to my friends A. Varkočková, Bc. M. Tuláček, Bc. M. Mráz and M. Weissová, who helped me with the correction of the English text. Last but not least, my thanks belong to Sunnysoft, s.r.o. and its e-Commerce Manager Ing. David Šilhán for consultations and the purchase data provided.

Prohlašuji, že jsem svou bakalářskou práci napsal samostatně a výhradně s použitím citovaných pramenů. Souhlasím se zapůjčováním práce a jejím zveřejňováním.

I declare that I wrote my bachelor thesis independently and exclusively with the use of the cited sources. I agree with lending this thesis.

In Prague, 6th August 2010
Jiří Helmich


Contents

1 Introduction 6

1.1 Recommender systems . . . 6

1.2 The problem of e-commerce recommendations . . . 7

1.3 Outline of the thesis . . . 7

2 Methods 9

2.1 Top-N . . . 9

2.2 Market basket analysis . . . 9

2.3 Reinforcement learning . . . 11

3 State of the art 16

3.1 Taxonomy of recommender systems . . . 16

3.2 Different techniques of recommender systems . . . 18

3.3 Tuning the performance . . . 20

3.4 Reinforcement learning on the web . . . 23

3.5 Social networks based recommendations . . . 24

4 Our method 26

4.1 The state space . . . 29

4.2 The reward . . . 29

5 Implementation 31

5.1 The application . . . 31

5.2 Workflow . . . 32

5.3 Technologies . . . 32

6 Experiments 34

6.1 Source data . . . 34

6.2 The state space . . . 35


6.3 Settings . . . 38

6.4 Results . . . 39

7 Discussion 46

7.1 Future work . . . 46

8 Conclusion 48

Bibliography 49

A CD contents 52

A.1 Files and directories . . . 52


Název práce: Doporučování zboží ve webových obchodech
Autor: Jiří Helmich

Katedra (ústav): Katedra teoretické informatiky a matematické logiky
Vedoucí bakalářské práce: Mgr. Ondřej Sýkora

e-mail vedoucího: ondrasej@centrum.cz

Abstrakt: Doporučování zboží při nákupu je základní funkcí většiny webových obchodů. V současnosti jsou systémy pro doporučování zboží postaveny na analýze nákupního košíku a sledování vazeb mezi nabízenými produkty.

Použitelnou alternativou však může být systém založený na sledování akcí zákazníků. Student zanalyzuje používané přístupy k doporučování zboží založené na algoritmu zpětnovazebného učení a na základě výzkumu implementuje prototyp systému pro doporučování zboží ve webovém obchodě.

Klíčová slova: systém pro doporučování, zpětnovazebné učení, umělá inteligence, nákupní košík

Title: Recommender system for a web shop
Author: Jiří Helmich

Department: Department of Theoretical Computer Science and Mathematical Logic

Supervisor: Mgr. Ondřej Sýkora

Supervisor’s e-mail address: ondrasej@centrum.cz

Abstract: Electronic shopping recommendation systems are an integral part of most on-line stores. Nowadays, these systems are usually driven by Market Basket Analysis algorithms and by finding relations within the range of goods offered by the shop. However, for a recommendation system an action-based approach might be a viable alternative. The student will review existing approaches to recommendation based on reinforcement learning algorithms, and implement a prototype of a recommender system for a web store based on the results of the research.

Keywords: recommender, reinforcement learning, artificial intelligence, market basket


Chapter 1

Introduction

In this thesis, we will study the problem of recommending goods on an e-commerce website. Most e-commerce websites have a recommender system, but these systems are based on different algorithms, ranging from the simplest Top-N up to algorithms with the ambition to rival Google's PageRank algorithm.

We will further research reinforcement learning and recommendation algorithms based on this technique. We will also implement a prototype of an RL based recommender system built on real purchase data gathered by a Czech retailer of mobile hardware.

1.1 Recommender systems

A recommender system is a system which recommends the goods that should be offered to the customer based on several criteria. The customer usually comes to a shop with the intent to buy one stock item, such as a camera for an upcoming vacation. This is his main goal for the purchase and, if nobody influences him, he will be satisfied and leave the shop with just the single item. However, after several days he will realise that he needs a case, memory cards, polarisation filters, etc. Naturally, he will go to a shop to buy the wanted items, but it does not have to be the same shop where he bought the camera. The first shop could perceive this as a loss.

This is a task for a salesman: to convince the customer to buy such items at the same time as the camera. If the salesman is successful, it is a win-win situation. The customer has what he needs and does not need to go shopping again. For the shop, the advantage is clear: the profit increases.

This is even more true when the profit made by selling the accessories is higher than in the case of the main product.

On an e-commerce site, the situation is nearly the same. The customer needs an item and looks for the best offer. The main difference is that he can review several shops at a time and compare the offers. The second difference is that there is no salesman who can influence the customer's choice.

Shops solve this problem by advertisement, by listing related goods on the product page, and so on. The most sophisticated technique is to recommend other products according to the products that are already in the market basket. This simulates the natural process of shopping, where customers come with an idea of the items they want to buy and the salesman recommends additional items. Therefore, e-commerce sites use recommender systems.

1.2 The problem of e-commerce recommendations

Since every site wants the consumer to buy something, or to buy something more than he or she already has in his or her market basket, it uses a recommender system to offer relevant or interesting products to the customer. The output of the recommender system is quite simple, but the algorithm behind it can be quite complicated. Several types of algorithms are used for goods recommendations. The most frequently used are based on the Top-N and Market Basket Analysis (MBA) techniques. We will talk about these techniques further in Chapter 2.

These techniques are quite simple but successful, so we will try to learn why that is so. The most frequently used algorithm is the so-called Top-N, which just sorts the goods based on the number of sold pieces of each product. After that, the top N items are recommended to the consumer. Despite its simplicity, the algorithm is very successful.

1.3 Outline of the thesis

The Top-N and MBA techniques are quite simple, but very successful. With a minimum of effort, they are able to recommend goods attractive to a consumer. In Chapters 2 and 3, we will try to learn something about them and to find out what types of more sophisticated systems have been implemented until now. We will also focus on studying a more complicated technique, reinforcement learning.

In Chapters 4 and 5, we will describe how to implement a prototype of a reinforcement learning based algorithm. With a real purchase data set, we will focus on problems resulting from the architecture of the technique, mainly on the problem of reducing the data set dimension and on building an environment for the reinforcement learning agent.

Finally, in Chapters 6 and 7, we will inform the reader about how the implemented system behaves in the environment built on the gathered data.

We will be able to describe the speed of learning of the implemented algorithm and gather some statistics. We will not be able to determine whether the system is successful, because it would need to be used on the e-commerce site to be able to learn, gather additional data and fine-tune the constants used in the algorithm. We will also talk about possible future improvements which could affect the way the system behaves. With additional data and further data preprocessing, we might improve the performance of the implemented system.


Chapter 2

Methods

2.1 Top-N

The Top-N algorithm is the most popular one. It is also considered one of the most successful (according to [11]). It simply recommends products based on the number of clicks on the e-commerce site over a time period.

The more popular a product is, the higher the probability that it will be recommended to the consumer.

Several modifications of the algorithm have been developed to make it more sophisticated. One of them is described by Deshpande in [5].

He presents problems connected with an item-based variant of the algorithm, which uses similarities between products to produce better recommendations.

We can also use any available information about users to build a user-based variant of the algorithm.
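To make the basic idea concrete, the following minimal Java sketch (not part of the implemented system; the purchase-log representation and names are hypothetical) counts how many times each product appears in a purchase log and returns the N most frequent ones:

import java.util.*;
import java.util.stream.*;

// Minimal sketch of a Top-N recommender: count how many times each product
// was bought over a period and recommend the N best-selling ones.
public class TopNRecommender {

    // returns product IDs ordered by descending sales count, limited to n
    public static List<String> topN(List<String> purchasedProductIds, int n) {
        Map<String, Long> counts = purchasedProductIds.stream()
                .collect(Collectors.groupingBy(p -> p, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = List.of("camera", "case", "camera", "memory-card", "camera", "case");
        System.out.println(topN(log, 2)); // [camera, case]
    }
}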

2.2 Market basket analysis

Market Basket Analysis (MBA) is a technique based on presumptions (we used [17] as a source for this section). Let us have a simple example of the technique. When a user already has some items in the market basket, we can guess what he is more likely to buy. For example, consider a user of a PC hardware e-shop who has a 15-inch notebook in his shopping cart. Based on MBA, we can guess that the user is more likely to add a case for a 15-inch notebook into his market basket than another notebook or a case for a 10-inch notebook.


The technique is naturally more sophisticated than guessing. The concept of the method is generating simple association rules based on data gathered during previous shopping sessions. An association rule could have the following form:

IF {15-inch notebook} THEN {15-inch notebook case}

So the technique is simple: based on a set of items in the market basket (an itemset), we recommend another itemset. The system can generate more sophisticated, less general rules based on concrete items. Each of the rules has a probability that it will be chosen by the user. A rule has two parameters which determine its quality. The first one is support, which determines the probability that a user will add the itemset used as the presumption of the rule to the basket. The second one, confidence, denotes the probability that the user will follow the rule.
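As an illustration of the two parameters (using the definitions above), the following hedged Java sketch computes the support and confidence of a single rule over a list of market baskets; the types and example data are assumptions, not part of the implemented system:

import java.util.*;

// Sketch: support and confidence of a rule IF antecedent THEN consequent,
// using the definitions from the text (support = fraction of baskets containing
// the antecedent, confidence = fraction of those baskets that also contain the consequent).
public class RuleQuality {

    public static double support(List<Set<String>> baskets, Set<String> antecedent) {
        long withAntecedent = baskets.stream().filter(b -> b.containsAll(antecedent)).count();
        return (double) withAntecedent / baskets.size();
    }

    public static double confidence(List<Set<String>> baskets, Set<String> antecedent, Set<String> consequent) {
        long withAntecedent = baskets.stream().filter(b -> b.containsAll(antecedent)).count();
        long followed = baskets.stream()
                .filter(b -> b.containsAll(antecedent) && b.containsAll(consequent)).count();
        return withAntecedent == 0 ? 0.0 : (double) followed / withAntecedent;
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = List.of(
                Set.of("15-inch notebook", "15-inch notebook case"),
                Set.of("15-inch notebook"),
                Set.of("10-inch notebook", "10-inch notebook case"));
        Set<String> a = Set.of("15-inch notebook");
        Set<String> b = Set.of("15-inch notebook case");
        System.out.printf("support=%.2f confidence=%.2f%n", support(baskets, a), confidence(baskets, a, b));
    }
}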

This is a straightforward approach which is not very difficult to implement. That is why it is very popular. But it also has several disadvantages:

• a large amount of data has to be processed,

• a high number of rules is generated.

So when implementing an MBA-based algorithm, we need to reduce the input data and bring the number of generated rules down. The most frequent way to do that is to set the parameters (confidence and support) strictly.

When the required support is higher, the algorithm will use as presumptions only itemsets which have turned up in the gathered data a certain number of times. When the required confidence is higher, only those rules that have been followed a certain number of times are used.

That can work well, but we can also get rid of rules that could have become successful. The same problem arises when the rules are customised based on the experience of a salesperson, because it may turn out that consumer behaviour is very hard to predict.

MBA based algorithms can also be used for further analysis. A discipline called Differential MBA is used to compare parameters of the same association rules across a spectrum of e-commerce sites. When a rule has a high confidence on all of the compared sites but is not confident enough in a single shop, it tells us that we need to reconsider something (e.g. the price of the product, the advertisement). It could also mean that the site has users with different behaviour.


Differential MBA can also be used with data from only one e-commerce site, comparing different time periods, different groups of users, etc.

2.3 Reinforcement learning

There are three terms tightly connected to the reinforcement learning (RL) problem:

policy

A policy determines how the agent reacts under the current conditions. In general, it is a function which chooses the action taken by the agent based on the input values.

reward

A reward is a number which tells the agent how good an action is.

value

The value of a state is a number which tells the agent the total reward that can still be obtained from the states which follow.

The reinforcement learning technique is based on maximising a numerical reward. The learner is not told which action it is supposed to take; it has to learn this by itself. It has to discover which actions lead to the highest reward.

In comparison with the MBA technique, reinforcement learning works with a so-called delayed reward. When using MBA, you have several rules reflecting the current itemset in the market basket. The only scale you have is the confidence parameter, which tells you the probability of choosing the rule. When using reinforcement learning, the algorithm tries to take into account all of the following steps which could be made and to maximise the reward gained by the whole shopping session, not only by the next step.

In reinforcement learning terms, a shopping session corresponds to an episode, and adding an item into the market basket corresponds to taking an action or making a step. So we need to maximise the reward gained during an episode, not only during a step, which is not the same.

Reinforcement learning also utilises a so-called trial-and-error search approach. When starting the learning process, all of the states of the market basket have their values initialised arbitrarily. This means that every state is initially as good as any other state. The values change during the learning process, which creates the resulting differences between the states.

When maximising the delayed reward, we need the states to influence each other. The algorithm has to be designed to change the value of a state based on the values of the reachable states. That is how we are able to determine which path is the best to choose in the current state.

An advanced technique, eligibility traces, can be used to speed up this influence between states. It distributes the reward gained by taking an action back to the state the action was taken from.

There is another difference between MBA based algorithms and reinforcement learning algorithms. While the MBA algorithm is deterministic and its policy is straightforward (recommending based on the confidence of rules), the RL algorithm can be non-deterministic. It is able to make an exception in its behaviour and not recommend a rule with a high value. Because of the trial-and-error approach, we need the algorithm to explore. From time to time, we need it to make an unexpected decision in order to explore new paths and let the algorithm decide whether the path could be successful.

By exploring the environment, the algorithm estimates the values of states. The value of a state is the main scale when comparing states. The algorithm uses the reward as a base element of the value, but the value also contains a lot of experience. When it chooses between two states, it can exploit its experience and take the action with the highest value (the so-called greedy action), or it can explore by taking a non-greedy action. A special variant of the algorithm, so-called ε-greedy, chooses non-greedy actions with a probability determined by ε. The algorithm generates a random number; if it is lower than ε, a non-greedy action is taken, a greedy one otherwise.
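A minimal Java sketch of the ε-greedy choice just described, with probability ε exploring a random action and otherwise exploiting the action with the highest value; the array-of-values representation is an illustrative assumption:

import java.util.Random;

// Sketch of ε-greedy action selection: explore a random action with
// probability epsilon, otherwise exploit the action with the highest value.
public class EpsilonGreedy {
    private static final Random RANDOM = new Random();

    public static int selectAction(double[] actionValues, double epsilon) {
        if (RANDOM.nextDouble() < epsilon) {
            // explore: pick a random (non-greedy) action
            return RANDOM.nextInt(actionValues.length);
        }
        // exploit: pick the greedy action (highest value)
        int best = 0;
        for (int i = 1; i < actionValues.length; i++) {
            if (actionValues[i] > actionValues[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.7, 0.3};
        System.out.println(selectAction(values, 0.05)); // usually 1, occasionally random
    }
}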

It is important to realise that the ε-greedy approach is not the only way the algorithm is forced to change a decision made in a certain state. From episode to episode, the values of states change, which also leads to exploration.

This is a chance to get a completely new recommendation, which is important mainly in the case of a product newly introduced to the market.

An MBA-based algorithm will not integrate a newly introduced product until someone buys it. Considering the confidence and support of the rule, it can take weeks until the product appears in an itemset. That is why many recommender systems use at least two types of rules: the first one is generated based on purchase data, the second one is integrated into the system manually, by the staff of the e-commerce site. Rules based on the Top-N algorithm can also be involved.


Algorithm 2.1 Tabular version of Watkins's Q(λ) algorithm

1: Initialise Q(s, a) arbitrarily and e(s, a) = 0, for all s, a
2: repeat {for each episode}
3:   Take action a, observe r, s′
4:   Choose a′ from s′ using the policy derived from Q {e.g. ε-greedy}
5:   a* ← argmax_b Q(s′, b) {if a′ ties for the max, then a* ← a′}
6:   δ ← r + γ·Q(s′, a*) − Q(s, a)
7:   e(s, a) ← e(s, a) + 1
8:   for all s, a do
9:     Q(s, a) ← Q(s, a) + α·δ·e(s, a)
10:    if a′ = a* then
11:      e(s, a) ← γ·λ·e(s, a)
12:    else
13:      e(s, a) ← 0
14:    end if
15:  end for
16:  s ← s′
17:  a ← a′
18: until s is terminal



When speaking about newly introduced products, we should not forget to think about discontinued products. Every product has a limited lifetime on the market, so we need it not to be recommended when it is no longer available. If we consider an MBA based algorithm, it can recommend the discontinued product for a long time, because the support of the rule could remain high (depending on how the support is computed), as well as the confidence. Because of the high confidence, the rule will be recommended despite the fact that it no longer makes any sense. We would need to remove the product manually or restrict the length of the time period of the purchasing history. But this can change the results in the case of permanent products.

When implementing an RL algorithm, we could implement a flexible policy which would penalise a state like this very quickly. If we gather data about the given recommendations, we will find out very quickly that the recommendation is no longer followed by users and bring down the value of the states representing itemsets containing the product. We can naturally also do that automatically while regenerating the recommendations (also for an MBA based variant) by not including the product.

But what is more important, it can also react to the preferences of consumers when a product becomes more popular than another one. Let us have two rules, A and B. Both of them have the same support, because they have the same itemset as a presumption. But rule B has a lower confidence (e.g. a mobile phone with a buggy OS) than rule A. When the vendor of the buggy OS publishes a hotfix, the mobile phone will become popular, even more so than the one from itemset A. An MBA based system will still recommend itemset A because of its deterministic approach, until the rules are regenerated, even though rule B should already be more confident. An RL based system with a sophisticated policy could detect the change in consumer behaviour and penalise rule A. It simply continues to learn. And we should not forget about the non-deterministic variant of the algorithm.

A good approach is to take advantage of the possibility to initialise the values of the states arbitrarily. If we set the value of each state to the maximal value, the algorithm will automatically choose those states which have not been tried yet. It will quickly explore all the states in the state space and penalise those which are not popular. This technique was described by R. Sutton in [25] as Optimistic Initial Values. Nevertheless, we need to construct the algorithm as a non-deterministic one to make it able to try the state again in the future and check whether its popularity has changed. But finding an optimal value of the ε parameter is a challenging problem. We need to find a balance between exploration and exploitation to get good results. It is even harder when using Optimistic Initial Values, because the initial exploration is done automatically without the need for the ε-greedy policy.


Chapter 3

State of the art

There are many approaches to dealing with the recommendation problem.

Let us introduce some of them, which helped us get more familiar with the problem and made us think about how to solve it using the techniques available nowadays.

3.1 Taxonomy of recommender systems

Schafer et al. [23] try to set up a taxonomy for recommender systems based on six examples of big e-commerce sites (Amazon [12], CDNOW [13] (purchased by Amazon in 2002), eBay [6], Levis [2], Moviefinder [24] and Reel [19]). According to the study, these sites use one or more recommender systems. At a higher level, the authors present three base types of recommender systems:

Browsers into buyers

Systems that are used to convince a random visitor of the site to buy something.

Cross-sell

Systems suggesting additional products (e.g. products similar to the product currently viewed by the customer, or accessories for a product added into the market basket).

Loyalty

More sophisticated systems which try to offer a customised recommendation based on information gathered from the customer (demographics, previous shopping sessions, user behaviour on the site, etc.).

But if we look closer, we can categorise recommender systems using another two criteria:

Persistency

If the system uses data gathered during previous sessions, the generated recommendations are persistent. If the system gives the customer recommendations based on the current session, the recommendations are ephemeral.

Automation

If the system needs to interact with the user and retrieve data from him, we think of it as a manual recommender system. Many of the systems can be called partially automatic because the degree of user interaction is low or the interaction is not perceived as such. For example, recommenders based on previous purchases are partially automatic, while recommenders based on ratings performed by the customer or even verbose comments are described in the text as manual.

As an example of a fully automatic system, we can mention systems that use information gathered from the user's social network account.

But we can also classify the systems by the technique used to generate a recommendation:

Non-Personalised Recommendations

The system recommends products based on the data gathered from the rest of the customers. Each customer gets the same recommendation.

Those systems can be automatic because they do not need any explicit interaction with the customer. They are also ephemeral, because it does not matter what actions the user has made in the past. We can mention an average user rating as an example of this technique.

Attribute-Based Recommendations

Those systems are often manual, because they need the user to specify the product he needs to find by entering the product properties.

This technique can be ephemeral (based on currently offered products) as well as persistent (newsletters composed of the attribute-specified products).

(18)

Item-to-Item Correlation

This technique uses the user's selection of a small group of products (e.g. placing them into the market basket) to find more products the customer could be interested in. The best known example of this technique is suggesting accessories or alternative products.

People-to-People Correlation

This technique is also called collaborative filtering, because it uses the similarity between customers to generate the recommendations. Systems based on this kind of technique are very often automatic. They learn from the behaviour of previous users. But user feedback (e.g. rating) can also be used, which would make this technique partially automatic. It is also persistent because it uses data gathered during previous sessions.

User inputs

Starting from purchase data and continuing with ratings, comments and editor's choices, many sites use a system based on user input. This is what Schafer et al. in the article call manual, because it is tightly connected to user interaction.

One of the most interesting ideas of their paper is that the most frequently used technique, generating recommendations based on purchase data, is often not implemented very correctly. Many of those systems take the act of buying a product as an expression of satisfaction.

Schafer et al. mention the fact that a customer might not be satisfied with the product he bought. But the system needs to get further feedback from the customer to discover that.

It is also true that many of those systems are used to increase the comfort of the customer, and any increase of the site's revenue is rather a side effect, despite the fact that data gathered from recommender systems can be used to adjust the price of the product to make more profit.

3.2 Different techniques of recommender systems

Sarwar et al. conducted in [22] research focused on several different techniques used to implement recommender systems. There are three main goals of the paper:

(19)

• analysis of effectiveness of recommender systems,

• performance comparison of a couple of different recommendation algorithms,

• new approach with effectiveness advantages.

The authors ran a couple of tests and used the standard F1 metric [26] to complete the tasks. They performed a comparison of three algorithms: two personalised and one non-personalised. The personalised algorithms were based on the nearest-neighbour collaborative filtering technique and on a dimensionality reduction technique (based on Principal Component Analysis [3] and Latent Semantic Indexing [4]). The non-personalised one was based on a data mining technique using association rules. The recommenders were run on data gathered from two sources: a large e-commerce site (Fingerhut Inc. [15]) and a movie recommendation site (MovieLens [20]). The authors tried to find an algorithm which is scalable, because of the large amounts of gathered data, but which also gives quality recommendations; an algorithm that keeps a balance between the time spent on searching for neighbours and the recommendation quality.

The association rules algorithm was based on computing support and confidence for each rule. At first, the algorithm finds all rules with support greater than zero; in the next step, it orders them by the computed confidence. Then the top N items are selected.

For the collaborative filtering algorithm, a set of users denoted as neighbours is prepared. It contains users that have bought or viewed the same products as the target user in the past.

The authors divided the collaborative filtering recommendation problem into three main parts:

• representation of input data,

• neighbourhood formation,

• recommendation generation.

A typical example of input data for collaborative filtering based recommender systems is a collection of historical purchasing transactions. It creates a relation between U users and P products. A common way of representing the data is a P×U matrix M, where M_{i,j} = 1 means that the j-th customer has purchased the i-th product, and 0 otherwise.
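For illustration, a minimal Java sketch of building this binary matrix from a list of (product, user) purchase pairs; the transaction representation is a simplifying assumption:

import java.util.Arrays;

// Sketch: building the binary P x U matrix M from purchase transactions,
// where M[i][j] = 1 means user j has purchased product i.
public class InteractionMatrix {

    public static int[][] build(int products, int users, int[][] transactions) {
        int[][] m = new int[products][users];
        for (int[] t : transactions) {
            int productIndex = t[0];
            int userIndex = t[1];
            m[productIndex][userIndex] = 1;
        }
        return m;
    }

    public static void main(String[] args) {
        // (product, user) purchase pairs
        int[][] transactions = {{0, 0}, {2, 0}, {1, 1}};
        int[][] m = build(3, 2, transactions);
        System.out.println(Arrays.deepToString(m)); // [[1, 0], [0, 1], [1, 0]]
    }
}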

There are several problems which make this step difficult:

(20)

Sparsity

When the number of products offered by the site is too large, only very few users have bought a significant number of the products relevant to the recommendations. The target user then may not correlate with enough other users, which makes the recommender system unable to make a recommendation.

Scalability

With a large amount of products and customers, the amount of data required for the recommendation is even larger.

Synonymy

One product can be named a little bit differently by its vendor, so the system should detect the same product under different names.

3.3 Tuning the performance

Further research on recommender system performance was done by Huang et al. They wrote a study [11] which compares a spectrum of collaborative filtering algorithms for e-commerce recommendation. It uses the taxonomy set up by Schafer et al. in [23]. Moreover, the authors mention three types of source data which are frequently used:

• product attributes,

• consumer attributes,

• previous interactions between consumers and products (buying, rating, catalogue browsing).

Those types of data are the most frequent, but not the only ones. Various recommender systems use various types of input data. Based on those input data types, we can develop a clear model that discovers rules (e.g. user X often buys books with tag Y).

The method examined in the study is collaborative filtering. It uses the data based on consumer-product interaction and ignores the attributes of both entities. This technique leads to a simple recommendation of the most popular products, which is, according to the study, considered the most successful method. But there are several problems tightly connected to this technique. The most important are:

(21)

Binary character of data

It just says whether the consumer bought the product, but it does not contain the information whether the consumer was thinking about buying the product.

Sparsity problem

A lack of feedback makes it hard to work with consumer similarity and other patterns for prediction.

The authors compare six different algorithms, including a new algorithm they developed based on ideas from link analysis.

User-based algorithm

Prediction of future customer steps based on aggregation of observed transactions of all the customers.

Item-based algorithm

The same as the user-based algorithm, just with the data driven by products.

Dimensionality reduction algorithm

The data matrix is first condensed into a less sparse one using singular value decomposition, then the user-based algorithm is used.

Generative model algorithm

A probabilistic method using the interaction matrix, latent class variables and expectation maximisation; the result is a potential score for a product in relation to a customer.

Spreading activation algorithm

Based on transitive associations, eliminating the sparsity problem; a graph representation of the data set is used, where each node represents a consumer or a product and has an activation level which is iteratively increased; the result is the number of paths between two nodes and the length of those paths, which the authors call connectedness.

Link analysis algorithm

This algorithm was developed by the authors of the article based on link analysis research, aiming to achieve the same success as PageRank or HITS did. The data set is represented by a bipartite graph with two types of nodes (product, consumer), where a link between two nodes means that the product has the potential to be a part of the consumer's interest. It is a rather more complicated algorithm than the others, and it respects more than only the binary relationship (e.g. penalisation of frequent consumers and products), so it would be interesting to see its results.

There were several disciplines for comparing the algorithms:

• Precision,

• Recall,

• F Measure,

• Rank Score,

• Area under ROC (Receiver Operation Characteristics) curve.

Based on the literature, the authors had the following expectations:

• better algorithm performance with unreduced data set,

• algorithms alleviating the sparsity problem have better performance than Top-N-type algorithms,

• link analysis algorithms have better performance than collaborative filtering algorithms.

Reported observations:

• all algorithms achieved better performance with the unreduced data set,

• the link analysis algorithm achieved the best performance across all configurations except one specific data set,

• different algorithms perform differently on different data sets,

• the Top-N algorithm performs rather well for many cases, but it gives us different recommendations than the others,

• the dimensionality reduction algorithm is the slowest one, the others are nearly comparable.

(23)

3.4 Reinforcement learning on the web

Golovin et al. describe in [8] the architecture of a recommendation system based on reinforcement learning. The system is based on rules, as are many other systems, but it also uses more than one algorithm at a time; moreover, a reinforcement learning algorithm is applied to the results to personalise them based on the user behaviour. The authors wanted to build a flexible and generic recommendation system which is able to react to changes in user behaviour.

It is interesting that the system uses two-level recommendations. It uses several algorithms to build a rule database. After that, a learning module works with the rules stored in the database and generates a recommendation. It also gathers feedback from the user. The system has a specific recommendation algorithm for each logical section of an e-shop. That is also because of the different attribute sets for each product type.

The architecture of the system does not limit the number of recommender algorithms used, which is the most interesting part of the paper. Each algorithm produces a set of rules and saves it into the recommendation database. Each rule has its weight from the interval from 0 to the maximal weight of the rule, which is a value based on a recommender-specific heuristic.

According to the paper, the weight should not affect the following process of reinforcement learning. The system uses the following recommenders:

Content similarity

A score which determines the similarity between two products is computed.

Top-N

A simple algorithm where the products are recommended based on the amount of clicks over a specified period.

Top-N for the category

The same as Top-N, but applied separately to each e-shop category.

Sequence patterns

Products which one user interacts with during one session.

Item-to-item collaborative filtering

Items frequently appearing in the basket together are recommended to each other.

(24)

If multiple algorithms produce the same rule, the second recommendation process considers only the one with the highest recommender-specific weight.

At first, the system determines the context of the current session. Based on that, all rules matching the context are retrieved from the database. If the number of selected rules is not large enough, the context is iteratively extended and the whole process is repeated.

The learning module has to modify the weights of the rules based on the feedback from the user (whether the user previously followed the rule).

The goal is to react flexibly to changes in the user's behaviour and interests.

Sets of recommendations are shown to a user. After that, more than one approach is available. One can simply count the number of clicks for each recommendation, but the authors of the paper call this approach too greedy.

They came up with a special formula which helps to compute the weight of a recommendation:

$$Q(r) = \left(1 - \frac{1}{T}\right) \cdot Q(r) + \frac{Feedback(r)}{T}$$

where T stands for the size of the sampling window and Feedback(r) is positive if the recommendation r is clicked, negative otherwise.
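As a reading of this update, the weight is moved by a 1/T fraction toward the observed feedback, so older observations decay with the sampling window. A minimal, hedged Java sketch (the constant values and feedback encoding are illustrative assumptions, not taken from the paper):

// Sketch of the rule-weight update Q(r) = (1 - 1/T) * Q(r) + Feedback(r) / T
// as described by Golovin et al.; T and the feedback values are illustrative.
public class RuleWeightUpdate {

    public static double update(double q, double feedback, double t) {
        return (1.0 - 1.0 / t) * q + feedback / t;
    }

    public static void main(String[] args) {
        double q = 0.5;          // current weight of the rule
        double t = 20.0;         // size of the sampling window
        q = update(q, +1.0, t);  // recommendation was clicked
        q = update(q, -1.0, t);  // recommendation was ignored
        System.out.println(q);
    }
}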

3.5 Social networks based recommendations

A social network is a great source of personal information. The best known example of a social network is Facebook [14]. The network has over 500 million users. Facebook is well known for privacy issues, because it frequently changes the privacy settings and makes more and more information publicly visible. Despite this fact, people share more and more information on Facebook and other social networks. If the user allows it, an application is able to retrieve much useful information from the user's profile, using its API. Speaking about Facebook, you can get access to the user's interests, favourite books, music, movies, TV shows, brands, country, etc. You can also find out which user groups the user has joined and which fan pages he or she likes. Moreover, you can find out whom the user interacts with, how often, etc. You can read the locations which the user has visited in the past from their photo albums.

For a personalised recommender, a social network connection is one of the most important things, because nobody else is able to gather more personal data about the user without forcing the user to fill in personal data again and again on each website he registers on; moreover, he or she probably does not see any benefit in doing that. But there is a big chance that the user will allow the shop to read his personal data from a social network, because it requires only a single click. This approach can also be used for collaborative filtering based on the interactions between users on the social network. A recommendation system is able to group users based on the gathered data and customise recommendations for a user based on the actions of other users from the group (e.g. a friend, a fan of the same movie, etc.). The point is that the system does not need data connected to its assortment; it can improve its experience just with the connections between the users.

As an example, we can mention a recently announced cooperation of two giants, Facebook and Amazon (announced by Mashable.com in [18]). The partnership between those companies makes users able to link their Amazon accounts to their Facebook accounts. After doing that, Amazon is able to scan the user's interests and activity. Amazon will also suggest to the user what to give to his friend for his upcoming birthday and let him share the order on Facebook. This can convince a random user who is not even browsing the web of the electronic shop to buy something: a personalised recommendation has been given to him.

Some media are also speculating that Facebook will introduce a new feature: sharing the current location. Based on the concept of other social networks (Foursquare [7] or Gowalla [16]), this could bring a completely new level of personalised recommendations. If an application could guess the user's future location based on experience, it could offer him a pair of new skis before he leaves for his winter vacation in the Alps.


Chapter 4

Our method

We will implement a prototype of a recommender system working only with purchase data. Based on the research (especially Sutton and Barto [25]), we have chosen to implement one of the variants of Monte Carlo methods. Monte Carlo is a name for a group of methods which estimate complete returns.

Monte Carlo methods are quite different from other RL methods. For example, they do not need to know the whole environment like Dynamic Programming methods do. The outstanding feature of the method is that it is based only on experience with the environment, so it is able to learn from previously gathered data.

There are two basic types of Monte Carlo methods: on-policy and off-policy. In the case of on-policy Monte Carlo, the algorithm always follows the policy which is being estimated. That means that an action which is chosen based on the policy is also taken. The off-policy Monte Carlo method estimates one policy while following another one. This is exactly the approach that we need to use when simulating the episodes based on purchase data: we will have an estimation policy which estimates the actions with the highest value, but the actions actually taken are based on the real purchase data (which is theoretically also a policy).

When using the eligibility traces technique to speed up the learning process, the algorithm will be based on Watkins's Q(λ) algorithm 2.1 (Q-learning, Watkins 1989 [25]).

It has four constants, α (step-size parameter), γ (discount-rate parameter), λ (decay-rate parameter for eligibility traces) and ε (ε-greedy policy parameter), which have to be fine-tuned based on the quality of recommendations produced by the recommender system. This will need a lot of feedback from running a production version of the system. The following pseudocode shows how the agent simulates episodes over the gathered purchase data:


agentRun(stateSpace)
{
    // get episodes which should be simulated based on
    // purchase data (buying products)
    episodes = readDataFromFile(ACTION_BUY);

    // initialize the agent
    agentStart(stateSpace);

    // loop over all the episodes
    foreach (episode in episodes)
    {
        // simulate the episode
        simulateEpisode(episode, stateSpace);
    }

    // run some finalizer (cleanup, etc.)
    agentEnd();
}

episodeStart()
{
    // state initialization (every shopping session begins
    // with an empty market basket)
    currentState = ∅;
    previousState = ∅;
}

simulateEpisode(episode, stateSpace)
{
    // initialize the episode
    episodeStart();

    // run over all the products involved
    foreach (product in episode)
    {
        // perform a single step
        agentStep(product, stateSpace);
    }

    // end the episode
    episodeEnd();
}


agentStep(product, stateSpace)
{
    // simulate taking an action (adding a product into the basket)
    nextState = currentState.add(product);

    // if there is any available action
    if (nextState.hasActions())
    {
        // ε-greedy policy
        if (getRandomNumber() < EPSILON)
        {
            // explore - observe a random action
            estimatedState = nextState.takeAction(ACTION_RANDOM);
        }
        else
        {
            // estimate - observe the action with the highest value
            estimatedState = nextState.takeAction(ACTION_HIGHEST_VALUE);
        }

        // get the temporal difference based on the Q-learning technique
        delta = currentState.getReward(previousState)
              + GAMMA * (estimatedState.value - currentState.value);

        // eligibility traces
        currentState.traces++;

        // estimate the policy
        foreach (state in stateSpace)
        {
            // apply the temporal difference,
            // increase the value of every state
            state.value += ALPHA * delta * state.traces;

            // eligibility traces
            if (state == estimatedState)
            {
                state.traces *= GAMMA * LAMBDA;
            }
            else
            {
                state.traces = 0;
            }
        }
    }

    // prepare for the upcoming loop
    previousState = currentState;
    currentState = nextState;
}

episodeEnd()
{
    // at the end of the episode, observe the gained reward
    currentState.value = currentState.getReward(previousState);
}

agentStart(stateSpace)
{
    // initialize the state space
    foreach (state in stateSpace)
    {
        // optimistic initial value
        state.value = MAXVAL;

        // no eligibility traces are involved
        state.traces = 0;
    }
}


agentRun(prepareStateSpace());


4.1 The state space

While implementing the chosen Monte Carlo algorithm, we have to consider that the algorithm is based on looping over all of the states of the state space. Since we need the algorithm to visit each state many times to make the computed policy more accurate, the state space should not have a large dimension. Otherwise, the algorithm would have to take an enormous number of steps to generate a useful policy and provide interesting data.

The question is what to consider as a state and how to generate the state space. We need the application to recommend goods, so there is a straightforward approach: to treat an itemset as a state, as MBA does.

When considering the basics of the reinforcement learning technique, we would need to generate all possible itemsets and build up the state space from them.

4.2 The reward

We have to determine how the reward for taking an action will be computed.

The reviewed recommender systems were based on the product's popularity.

They were looking for interactions between a product and the users, and more interactions with more users made the product more confident and more likely to be recommended. This is a very good point of view for the consumer, but not so good for the shop, because the system can recommend popular products which make the shop less profit than other ones.

That is why the implemented variant will focus on the profit made by the e-shop. But that alone would be very straightforward and there would not be any place for the main feature of the reinforcement learning technique, the learning itself. That is why the reward will also respect the popularity of previous recommendations.

The learning module will have to log all given recommendations and find out if (and how many times) each recommendation was or was not followed by the user. This will determine a ratio which will be applied, with a certain constant, to the reward. To compute the reward r we use the following formula:

$$r = \rho \cdot P, \qquad \rho = \frac{F}{R},$$

where R is the number of recommendations of the action, F is the number of followed recommendations and P is the profit.

Why is the ratio ρ applied like this? When the recommendation is favoured by consumers, ρ converges to 1. That means that with ρ computed like this, the reward of a successful product is bounded by its profit; we just penalise the value of products which are not successful. That should be a good feature when the popularity of products changes.
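A minimal sketch of this reward in Java, assuming the counters R (times the action was recommended) and F (times the recommendation was followed) are tracked per action and the profit P is known; the names and the guard for R = 0 are illustrative assumptions:

// Sketch of the reward r = ρ * P with ρ = F / R, where R is the number of
// recommendations of the action, F the number of followed recommendations
// and P the profit; the handling of R = 0 is an illustrative choice.
public class Reward {

    public static double reward(long followed, long recommended, double profit) {
        if (recommended == 0) {
            // never recommended yet: do not penalise, use the full profit
            return profit;
        }
        double rho = (double) followed / recommended;
        return rho * profit;
    }

    public static void main(String[] args) {
        System.out.println(reward(8, 10, 250.0)); // 200.0
    }
}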

But the states penalised like this would not be recommended anymore by a purely greedy algorithm. The algorithm will therefore have an ε parameter (a so-called ε-greedy policy is generated) which makes it non-deterministic. It is then able to choose a random recommendation to try whether there has been a change in the recommendation's popularity. ε is one of the parameters which have to be tuned.


Chapter 5

Implementation

5.1 The application

The system is implemented as a console application. There is no need to implement any sophisticated GUI, because we expect that the application will run on a *nix based server, accessed over the SSH protocol. User documentation can be found on the enclosed DVD.

The application stores the source purchase data in a database. It is able to import data into the database from a directory specified by the user of the application. It scans the given directory recursively and looks for all *.csv files in its subdirectories. The data are imported into the database with specified relations, using foreign keys, so that we are able to perform more complex queries to get the required data.

The system is divided into several packages and classes. Detailed information is available in the programmer's documentation, which can be found on the enclosed DVD. Here is a brief description of the most important classes:

Recommender

The Recommender class handles retrieving the data from the database.

It prepares the state space and runs the agent. All of the import methods are also implemented in this class.

StateSpace

This class encapsulates some well known data structures to offer more comfortable work with the state space.

State

The state space consists of states, which are represented by this class. It handles adding products and carries all the information needed to run the reinforcement learning algorithm.

Agent

This class is an implementation of the reinforcement learning algorithm. All the RL logic is coded in its methods.

5.2 Workflow

The recommender system uses the purchase data logged by the e-commerce site to generate the state space and the episodes. The data are also used to get familiar with all the products offered by the e-commerce site. It also uses data containing the popularity of the recommendations, which are stored in the database. A consumer interacts with the recommender system. The interaction generates feedback shared with the e-commerce site. The schema of the workflow can be seen in Figure 5.1.

Figure 5.1: Workflow of the recommender system

5.3 Technologies

While implementing a prototype of a recommender system, we have used the following technologies:

Java

An object-oriented, platform-independent, multithreaded programming environment was used to prepare a system which is able to run on most of the commonly used platforms [21].

PostgreSQL

The PostgreSQL object-relational database management system was used to handle the purchase data and provide a more comfortable way of working with them [9].

PostgreSQL JDBC Driver

The PostgreSQL JDBC driver allows Java programs to connect to a PostgreSQL database using standard, database independent Java code.

It is a pure Java (Type IV) implementation [10].
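For illustration only, a minimal way of opening a connection from Java to PostgreSQL through the JDBC driver and reading purchase records might look as follows; the connection URL, credentials and the table and column names are placeholders, not the actual schema used by the prototype:

import java.sql.*;

// Sketch: connecting to PostgreSQL via JDBC and reading purchase records.
// URL, credentials and the table/column names are placeholders.
public class PurchaseDataReader {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/recommender";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT session_id, product_id, action_type FROM purchase_log");
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                System.out.printf("%s %s %s%n",
                        rs.getString("session_id"),
                        rs.getString("product_id"),
                        rs.getString("action_type"));
            }
        }
    }
}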


Chapter 6

Experiments

Based on the method described in Chapter 4, we have implemented a prototype of a recommender system. Let us see what we have learnt while implementing it.

6.1 Source data

We did not have to generate artificial data, because a retailer of mobile hardware provided us with data gathered on their own e-commerce site. The gathered data contain information about shopping sessions, which are identified by session IDs. The session IDs were used by the company's web server to identify requests from the same browser during a single session. That is how we are able to group actions made by a single user during one shopping session.

A small number of records contain a persistent user ID. In the case of those records, the recommender could be able to personalise recommendations or use a collaborative filtering technique. Because of the majority of anonymous data, and based on the taxonomy set up by Schafer, Konstan and Riedl [23], we will implement a system which is not persistent; it will be an ephemeral one. The only available information about a product is its relation to a category, so we are not able to implement a system which uses product attributes.

The data contains three types of actions made by the consumer:

• product page view (ACTION VIEW),

• adding a product into the market basket (ACTION ADD),


• buying a product (ACTION BUY).

Based on that, we have a complete history of actions made by users on the website. It is a lot of user input, which also determines another classification of the implemented system.
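A minimal sketch of how such a log can be grouped into shopping sessions (later replayed as episodes); the record layout is a hypothetical simplification of the provided CSV data:

import java.util.*;

// Sketch: grouping logged actions by session ID so that one shopping
// session can later be replayed as one episode. The record layout
// (sessionId, productId, action) is a simplifying assumption.
public class SessionGrouper {

    public record LogRecord(String sessionId, String productId, String action) {}

    public static Map<String, List<LogRecord>> groupBySession(List<LogRecord> log) {
        Map<String, List<LogRecord>> sessions = new LinkedHashMap<>();
        for (LogRecord r : log) {
            sessions.computeIfAbsent(r.sessionId(), id -> new ArrayList<>()).add(r);
        }
        return sessions;
    }

    public static void main(String[] args) {
        List<LogRecord> log = List.of(
                new LogRecord("s1", "p42", "ACTION_VIEW"),
                new LogRecord("s1", "p42", "ACTION_ADD"),
                new LogRecord("s2", "p7", "ACTION_BUY"));
        System.out.println(groupBySession(log).keySet()); // [s1, s2]
    }
}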

The implemented system can be categorised as a cross-sell system, but it will be a little more sophisticated than a system which is simply based on association rules. The produced recommendations will be non-personalised because of the character of the provided data. The data lead to an input which is not sparse; every line of the source files carries useful information which can be used to improve the results of the recommender. But if we consider a matrix representing product-session relations, it will be very sparse.

6.2 The state space

The first idea is to generate all of the possible itemsets as the state space. The data provided contain about 3500 products, which is not a big number, but consider the number of all possible non-empty combinations, which is

$$\sum_{i=1}^{3500} \binom{3500}{i} = 2^{3500} - 1.$$

That gives a very large state space which takes a lot of time just to generate. Working with a state space of this dimension would take a really long time. Based on the research and our experience from implementing simpler tasks (e.g. Mountain Car), it would be very hard for the algorithm to learn and work with such a large state space.

So we need to reduce the state space. There is more than one way to do that. Let us figure out which one will be used.

Based on the classic data mining technique, the Apriori algorithm (described in [1]), we were able to determine the largest number of products sold together. When running the algorithm with the support set to 2, to eliminate really rare combinations of products sold together, we found out that in the shop which provided us the data, a customer buys at most 5 products together at a time. That gives us

$$\sum_{i=1}^{5} \binom{3500}{i} = 4\,370\,579\,245\,304\,826$$

states which is a great improvement, but it is still a very large number.


Figure 6.1: Number of products in the categories (product counts per category ID)

We consulted the problem with the company which provided us the data. We were told that the products from a single order are practically always each from a different category. So they gave us additional data: the tree of categories and the relations between categories and products. Based on the consultation, we decided to keep only the main categories (which are categories with no parent) and to connect the products linked to their descendants directly to them. It was an attempt to divide the data into the smallest reasonable number of categories with a large number of products. After doing that, the state space should be generated in the same way as before, only with one additional constraint: each state contains at most one product from each category. This gives 18 groups of products; the product counts for each of them can be seen in Table 6.1. This cuts down the number of states, but not enough to get an appropriate state space.

The problem simply requires a completely different point of view. The Monte Carlo algorithm should learn by itself what is popular and bring down the reward earned by adding the non-popular combinations of products into the market basket. And that represents a lot of states, because if we look into the gathered data, we can see that only a small number of combinations is ever used. They will be recommended several times and after that rejected for future recommendation. But the algorithm will still have to loop over them.


When thinking about the data, we realised that it carries the needed information: it contains the combinations of products sold together in a single shopping session. But when strictly following the data, the reinforcement learning technique loses one of its features. It would not be able to explore any new possible path in the state space, which would negatively influence the behaviour of the algorithm.

Now, the goal has slightly changed. We need to find a balance between reducing the state space and limiting the reinforcement learning possibilities.

It can be solved easily by manual injection of new combinations into the recommender input data. This is always a way for the shop to extend the state space and let the algorithm decide whether the inserted state could be successful. There is also usually a massive marketing campaign when a new product is introduced, so there is a chance that the new product will show up in the input data.

It is also not a big problem to integrate a new product into the state space so that it will be recommended to the customer; the state space grows linearly with the number of products. The problem is to include states which represent combinations of the new product with some of the previously used products. That would not be a trivial task. We would need to process all of the existing states and, based on the relation of the products to the categories, substitute the products in the combination. In the basic version of the algorithm, we will try to learn how the algorithm behaves when using combinations generated by previous shopping sessions.

When generating the state space based on the gathered data (the steps leading to the final combination are also included as separate states), we got:

• 41371 states when using ACTION VIEW,

• 3742 states when using ACTION ADD,

• 1232 states when using ACTION BUY.

Data gathered during one month were used to learn how the state space grows depending on the chosen action type. If we do not include the steps leading to the final combination as separate states, we get about half the number of states for each case. Why is the number of states for the first case (ACTION VIEW) so different from the other cases? It also contains data generated by indexing robots, i.e. many irrelevant combinations.

The action type ACTION ADD seems to generate a state space which is not very large but reasonable, and it provides more combinations of products which the recommender can try to make popular. We also use the type ACTION BUY, because it represents the final itemset added to the basket (a consumer is able to remove products from the basket, which is not logged).

When simulating episodes, the algorithm will use the ACTION BUY actions as the source of the episode data.

The following pseudocode shows how we generate the state space:

prepareStateSpace()
{
    // beginning with an empty state space
    stateSpace = ∅;

    // get shopping sessions based on adding products into the market basket
    // and buying products
    sessions = readDataFromFile(ACTION_ADD | ACTION_BUY);

    // loop over all sessions
    foreach (session in sessions)
    {
        // add states based on the products involved in the session
        stateSpace.addStates(session.getProducts());
    }
}

addStates(products)
{
    // begin with an empty state
    state = ∅;

    // loop over all given products
    foreach (product in products)
    {
        // if the product is not involved yet
        if (!state.contains(product))
        {
            // integrate the product into the state
            state.addProduct(product);

            // if the state space does not contain the generated state,
            // we will add it
            if (!this.contains(state))
            {
                this.addState(state);
            }
        }
    }
}

6.3 Settings

There are several parameters of the algorithm. We have to choose their values:

α

The step-size parameter α was set to 0.2. Sutton and Barto in [25] often use the value 0.1, but they experiment with episodes consisting of many steps. That is why we decided to use a higher value.


Figure 6.2: Size of the state space depending on the number of shopping sessions involved (number of states generated vs. number of shopping sessions)

γ

The discount-rate parameter was set to 0.9, because we do not need to penalise episodes with a lot of steps. But we are fine with short episodes too.

λ

The decay-rate parameter for eligibility traces was set to 0.3.

ε

Because of the optimistic initial values, the ε parameter was set to a low value, 0.05.

We did not know the profits made by selling each product. That is why we used randomised profits to run the experiment.

6.4 Results

Let us see how the state space size grows with the number of involved shopping sessions in Figure 6.2. We can see that the consumers of the e-commerce site use many different itemsets while purchasing. About 25000 shopping sessions give us more than 12000 different states. As you can see, the growth
