How to Do Market Basket Analysis in R

The arules package in R implements the Apriori algorithm, which can be used to analyze customers' shopping baskets. It requires two parameters to be set: support and confidence. We will see how market basket analysis can be used to propose recommendations in two areas: store layout and marketing, and catalogue arrangement.
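Here is a minimal sketch of that workflow. It assumes the arules package is installed and uses the Groceries dataset that ships with it. The thresholds `supp = 0.001` and `conf = 0.8` are illustrative choices, not recommendations.

```r
# Assumes arules is installed: install.packages("arules")
library(arules)

# Groceries is bundled with arules: one month of grocery receipts
data("Groceries")

# Minimum support and minimum confidence are the two thresholds to set
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

summary(rules)

# Show the five rules with the highest lift
inspect(head(sort(rules, by = "lift"), 5))
```

Lowering `supp` or `conf` returns more rules. Raising them returns fewer, stronger ones.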

What are the key concepts of market basket analysis?

Basically, there are four key concepts in market basket analysis. Rule: a rule expresses the incidence across transactions of one set of items as a condition of another set of items, i.e. X => Y. Support: the support for a set of items is the proportion of all transactions that contain the set. The remaining two concepts, confidence and lift, are covered below.
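To make the support definition concrete, here is a small base-R sketch on a hypothetical five-basket dataset. The helper name `support()` is illustrative and is not part of arules.

```r
# Hypothetical toy baskets: one character vector per transaction
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "eggs"),
  c("milk", "eggs"),
  c("bread", "butter", "eggs")
)

# Support of an itemset: the share of transactions containing all of its items
support <- function(itemset, transactions) {
  contains <- vapply(transactions,
                     function(t) all(itemset %in% t),
                     logical(1))
  mean(contains)
}

support(c("bread", "butter"), transactions)  # in 3 of 5 baskets: 0.6
support("milk", transactions)                # in 2 of 5 baskets: 0.4
```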

How do you pick rules in market basket analysis?

In market basket analysis, we pick rules with a lift of more than one because the presence of one product increases the probability of the other product(s) appearing in the same transaction. Rules with higher confidence are ones where the probability of an item appearing on the right-hand side (RHS) is high given the presence of the items on the left-hand side (LHS).
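In arules, you could filter on lift and then rank by confidence as in the following sketch. It assumes arules is installed and uses its bundled Groceries data. The thresholds are illustrative.

```r
library(arules)
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Keep only rules where the LHS raises the chance of the RHS
positive <- subset(rules, subset = lift > 1)

# Rank the remaining rules by confidence, highest first
positive <- sort(positive, by = "confidence")
inspect(head(positive, 5))
```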

What is the difference between support and confidence in market basket analysis?

A confidence value of 1 for the rule {Product N} => {Product D} indicates that anyone who buys Product N is 100% likely to buy Product D. The support value of 0.067 indicates that 6.7% of the transactions in the data involve Product N purchases. Hence support indicates how often the rule applies (how good a choice of rule it is), while confidence indicates how reliable the rule is.

What is association rule mining in market basket analysis?

In technology and data terms, this entire process is known as ‘Market Basket Analysis’. It works on the idea that if a customer buys one item, they are likely to buy (or not buy) another related item or group of items. To implement this, association rule mining is used. Let’s learn what association rule mining is.

What is Apriori algorithm?

Apriori is by far the most widely used and best-known association rule algorithm. It is considered accurate and outperforms the AIS and SETM algorithms. It finds frequent itemsets in transactions and identifies association rules between those items. One of its limitations is the cost of frequent itemset generation: it can produce a very large number of candidate itemsets and must scan the transactions repeatedly to count them.
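The level-wise idea behind Apriori is to grow itemsets one item at a time and prune any candidate that has an infrequent subset. Below is a base-R sketch of that idea. It is a simplified teaching version on hypothetical data, not the arules implementation, and it only finds frequent itemsets, not rules.

```r
# Illustrative level-wise Apriori for frequent itemsets (base R only).
# Hypothetical toy baskets; not how arules implements apriori().
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "eggs"),
  c("milk", "eggs"),
  c("bread", "butter", "eggs")
)

support <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

apriori_sketch <- function(min_supp) {
  items <- sort(unique(unlist(transactions)))
  # Level 1: frequent single items
  level <- Filter(function(s) support(s) >= min_supp, as.list(items))
  frequent <- level
  k <- 2
  while (length(level) > 1) {
    # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets
    candidates <- list()
    for (i in seq_len(length(level) - 1)) {
      for (j in (i + 1):length(level)) {
        cand <- sort(union(level[[i]], level[[j]]))
        if (length(cand) == k) candidates[[length(candidates) + 1]] <- cand
      }
    }
    candidates <- unique(candidates)
    # Prune step: every (k-1)-subset of a candidate must itself be frequent
    keys <- vapply(level, paste, character(1), collapse = ",")
    candidates <- Filter(function(cand) {
      all(vapply(seq_along(cand),
                 function(d) paste(cand[-d], collapse = ",") %in% keys,
                 logical(1)))
    }, candidates)
    # Count step: keep candidates that meet the minimum support
    level <- Filter(function(s) support(s) >= min_supp, candidates)
    frequent <- c(frequent, level)
    k <- k + 1
  }
  frequent
}

freq <- apriori_sketch(min_supp = 0.4)
vapply(freq, paste, character(1), collapse = ", ")
```

The prune step is what keeps Apriori tractable. Here {bread, butter, eggs} is never counted because its subset {butter, eggs} is already infrequent. Each level still needs a full pass over the transactions.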

How are association rules used in a basket?

Association rules are widely used to analyze basket or transaction data to discover strong rules based on the interestingness and frequency of occurrences.

What is support in a transaction?

The support of a rule {A} => {B} is the number of transactions that include both {A} and {B}, as a percentage of the total number of transactions.

What is market basket analysis?

Market Basket Analysis is a technique used to discover associations between items. In simplest terms, it allows retailers to identify relationships between items that people tend to buy together.

Can you give an eCommerce example?

If you are still not able to understand it completely, here is an example from Amazon, the world’s largest eCommerce chain.

How is market basket analysis done in R?

In R, the ‘arules’ package is used to represent, manipulate, and analyze transaction data and patterns. It uses frequent itemsets and association rules to perform MBA on the data.

What is the go-along technique in eCommerce?

This is further used in eCommerce up-selling. When a retailer sees an increase in the sales of one item, they can promote sales of related items by offering a discount so that people buy them together. This analysis is popular for cross-selling and up-selling products, and retailers use it in their marketing campaigns to boost sales and cross-sell products to customers.

How is Market Basket Analysis used?

As a first step, market basket analysis can be used to decide where goods are placed and promoted within a shop. For example, if buyers of Barbie dolls are observed to buy more sweets, high-margin candy can be placed near the Barbie doll display.

What is an association rule?

Association rules are widely used to analyze retail basket or transaction data. The objective is to discover strong rules in that data using measures of interestingness such as support, confidence, and lift.

What is a grocery dataset?

A grocery dataset contains exactly this: a collection of receipts, one receipt per line, listing the items purchased. Each line is called a transaction, and each column in a line represents an item.

What is confidence?

Confidence: the probability that a rule is correct for a new transaction containing the items on the left-hand side. Confidence tells us what percentage of transactions with item A also contain item B (e.g., how many transactions that have bread also have butter).
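Using the bread-and-butter example, here is a base-R sketch that computes confidence directly. The baskets are hypothetical, and `count()` and `confidence()` are illustrative helpers, not arules functions.

```r
# Hypothetical toy baskets
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "eggs"),
  c("milk", "eggs"),
  c("bread", "butter", "eggs")
)

# Number of transactions that contain every item in `itemset`
count <- function(itemset) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# conf(lhs => rhs) = supp(lhs and rhs together) / supp(lhs)
confidence <- function(lhs, rhs) count(c(lhs, rhs)) / count(lhs)

confidence("bread", "butter")  # 3 of the 4 bread baskets have butter: 0.75
confidence("butter", "bread")  # every butter basket has bread: 1
```

Confidence is not symmetric: bread => butter and butter => bread share the same support but have different confidence.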

What does a lift of 1 mean?

A lift(A => B) of 1 means that there is no correlation between A and B within the set of transactions.

What is lift?

Lift: the ratio of the confidence of the rule to its expected confidence, i.e. the confidence the rule would have if the antecedent and consequent were independent. More generally, the ratio of the number of respondents obtained with a model to the number obtained without it is known as lift.
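Written with supp for support and conf for confidence, the expected confidence of A => B is simply supp(B). Under independence, supp(A ∪ B) = supp(A) supp(B), so the confidence reduces to supp(B). Lift can therefore be written two equivalent ways:

```latex
\mathrm{lift}(A \Rightarrow B)
  = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)}
  = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)\,\mathrm{supp}(B)},
\qquad
\mathrm{conf}(A \Rightarrow B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}
```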

What are the three important measures?

The three important measures are support, confidence, and lift.

Introduction to Market Basket Analysis

What’s in your basket? In this first chapter, you’ll learn how market basket analysis (MBA) can be used to look into baskets and dig into itemsets to better understand customers and predict their needs.

Metrics & Techniques in Market Basket Analysis

In this chapter, you’ll convert transactional datasets to a basket format, ready for analysis using the Apriori algorithm. You’ll then be introduced to the three main metrics for market basket analysis: support, confidence, and lift, before getting hands-on with the Apriori algorithm to extract rules from a transactional dataset.
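Transaction data often arrives in long format, with one row per (transaction, item) pair. Here is a sketch of converting it to an arules transactions object, assuming arules is installed. The `purchases` data frame is hypothetical.

```r
library(arules)

# Hypothetical long-format data: one row per (receipt, item) pair
purchases <- data.frame(
  receipt = c(1, 1, 1, 2, 2, 3, 3),
  item    = c("bread", "butter", "milk", "bread", "butter", "milk", "eggs"),
  stringsAsFactors = FALSE
)

# Group items by receipt, then coerce the list into a transactions object
baskets <- split(purchases$item, purchases$receipt)
trans <- as(baskets, "transactions")

inspect(trans)
```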

Visualization in Market Basket Analysis

Let’s get visual. In this chapter, you’ll visually inspect the set of rules you have previously extracted. Visualizations in market basket analysis are vital given that often you are dealing with large sets of extracted rules. You’ll use the arulesViz package to create barplots, scatterplots, and graphs to visualize your sets of inferred rules.

Case Study: Market basket with Movies

We’re going to the movies. In this final chapter, you’ll apply everything you’ve learned as you work with a movie dataset. Using market basket analysis you’ll turn this dataset into a movie recommendation system, using information from movie transactions to understand and predict what your audience might want to watch next.

What are the default settings of apriori()?

By default, apriori() returns rules with a minimum confidence of 80% (conf = 0.8) and a maximum of 10 items per rule (maxlen = 10).
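These defaults belong to the parameter list of arules' apriori(), which also defaults to a minimum support of 0.1 and a minimum rule length of 1. The sketch below assumes arules and its Groceries data. It checks that spelling the defaults out gives the same result as omitting them. On sparse data such as Groceries, these defaults may return few or no rules.

```r
library(arules)
data("Groceries")

# No parameter list: apriori() falls back to its defaults
default_rules <- apriori(Groceries, control = list(verbose = FALSE))

# The same defaults written out explicitly
explicit_rules <- apriori(Groceries,
                          parameter = list(supp = 0.1, conf = 0.8,
                                           minlen = 1, maxlen = 10),
                          control = list(verbose = FALSE))

length(default_rules) == length(explicit_rules)
```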

What is affinity analysis?

Basically, it is the study of “what goes with what”: for example, customers who bought item X also bought item Y, or which symptoms go with which diagnosis. In most cases, companies are not interested in why Y is bought with X; they just want to identify the patterns. This is also called association rule mining or affinity analysis.

Does Amazon Prime suggest movies?

Many examples are available. For instance, if you log in to Amazon Prime, it will suggest movies you may find interesting based on your viewing history.

Introduction

Hundreds or even thousands of transactions occur every day in a supermarket, and a customer may buy multiple products in each transaction. For example, a transaction may look like this in the database: { Transaction1: Product1, Product3, Product4, Product8, Product9 }. In a larger dataset of transactions, purchase patterns, i.e.

Example

The following is an example of how to do Market Basket Analysis using R with a large data set from a Belgian supermarket chain ( http://fimi.uantwerpen.be/data/retail.dat ).

Applications

Market Basket Analysis can help supermarkets in various ways. First, supermarkets can use the output to optimize the physical layout of their stores. For example, according to the analysis, supermarkets should place products 3402, 3535, and 3537 together. Second, creative marketing strategies can be designed accordingly.

What does a confidence value of 1 mean?

A confidence value of 1 indicates that anyone who buys Products D and G is 100% likely to buy Product E. The support value of 0.2 indicates that 20% of the transactions in the data involve both Product D and Product G. Hence support indicates how often the rule applies, while confidence indicates how reliable the rule is.

What does it mean when a lift is greater than 1?

A lift greater than 1 indicates that the presence of A has increased the probability that product B will occur in the same transaction. A lift smaller than 1 indicates that the presence of A has decreased the probability that product B will occur in the same transaction.

What does lift mean in statistics?

Lift indicates the strength of an association rule over the random co-occurrence of Item A and Item B, given their individual supports. Lift provides information about the change in the probability of Item A in the presence of Item B.

What does "support" mean in a transaction?

The support of a product or set of products indicates the popularity of that product or set of products in the transaction set. The higher the support, the more popular the product or product bundle.

Why should marketing team target customers who buy bread and eggs with offers on butter?

The marketing team should target customers who buy bread and eggs with offers on butter, to encourage them to spend more on their shopping basket. This kind of analysis is also known as "Affinity Analysis" or "Association Rule Mining".

What is a group of items?

Items are the objects that we are identifying associations between. For an online retailer, each item is a product in the shop. A group of items is an item set (set of products).

What purchases did product X influence?

Now that we know what products influenced the purchase of sugar, let us answer the second question.
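The second question asks what buying sugar led to, so the appearance restriction is flipped. Sugar is the only item allowed on the left-hand side, and everything else defaults to the right-hand side. This sketch assumes arules and its Groceries data. The `minlen = 2` setting and the thresholds are illustrative choices, not part of the original walkthrough.

```r
library(arules)
data("Groceries")

# Rules of the form {sugar} => {X}: only sugar may appear on the LHS,
# all other items default to the RHS. minlen = 2 drops rules with an empty LHS.
sugar_lhs <- apriori(Groceries,
                     parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                     appearance = list(lhs = "sugar", default = "rhs"),
                     control = list(verbose = FALSE))

inspect(sort(sugar_lhs, by = "confidence"))
```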

How to view products that influenced the purchase of sugar?

To view the products that influenced the purchase of sugar, we continue to use the apriori() function but add one more argument, appearance, which restricts where items may appear. Since we want the right-hand side of the rules to contain only one value, sugar, we set the rhs argument to sugar. The left-hand side of the rules should include all the products that influenced the purchase of sugar, i.e. it excludes sugar. We use the default argument and supply it the value lhs, i.e. all items except sugar can appear on the left-hand side of the rule by default.
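Here is a sketch of the call just described, assuming arules and its bundled Groceries data, which contains a `sugar` item. The thresholds are illustrative. The `minlen = 2` setting is an addition that drops rules with an empty left-hand side.

```r
library(arules)
data("Groceries")

# Rules of the form {X} => {sugar}: sugar is the only RHS item,
# every other item may appear on the LHS (default = "lhs")
sugar_rhs <- apriori(Groceries,
                     parameter = list(supp = 0.001, conf = 0.1, minlen = 2),
                     appearance = list(rhs = "sugar", default = "lhs"),
                     control = list(verbose = FALSE))

inspect(head(sort(sugar_rhs, by = "confidence"), 5))
```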

What is lift ratio?

The lift ratio measures how efficient the rule is at finding consequents, compared to a random selection of transactions. Generally, a lift ratio greater than one suggests some applicability of the rule. To compute the lift for a rule, divide the support of the itemset by the product of the supports of the antecedent and the consequent. Now, let us understand how to interpret lift.
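Here is the lift computation described above, sketched in base R on hypothetical baskets. Dividing counts instead of proportions gives the same ratio, since supp(X) = count(X) / n. The `count()` and `lift()` helpers are illustrative.

```r
# Hypothetical toy baskets
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "eggs"),
  c("milk", "eggs"),
  c("bread", "butter", "eggs")
)
n <- length(transactions)

count <- function(itemset) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# supp(A u B) / (supp(A) * supp(B)), rearranged to use raw counts
lift <- function(lhs, rhs) {
  (count(c(lhs, rhs)) * n) / (count(lhs) * count(rhs))
}

lift("bread", "butter")  # 3 * 5 / (4 * 3) = 1.25: bread makes butter more likely
lift("bread", "eggs")    # 2 * 5 / (4 * 3), about 0.83: bread makes eggs less likely
```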

What is an itemset?

An itemset is a collection of items purchased by a customer. In our example, {mobile phone, screen guard} is a frequent itemset: the two items appear together in 3 out of 5 transactions.

What does the Y axis represent?

In the item frequency plot, the Y axis represents the relative frequency of the items plotted.
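Such a plot can be drawn with arules' `itemFrequencyPlot()`. This sketch assumes arules and its Groceries data. The `topN = 20` setting is an arbitrary choice.

```r
library(arules)
data("Groceries")

# Bar chart of the 20 most frequent items; type = "relative" puts
# relative frequency (item support) on the Y axis
itemFrequencyPlot(Groceries, topN = 20, type = "relative")

# The same numbers without the plot
freq <- itemFrequency(Groceries, type = "relative")
head(sort(freq, decreasing = TRUE), 5)
```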

What does a lift of 1 mean in the mobile phone example?

Lift = 1 implies no relationship between mobile phone and screen guard (i.e., mobile phone and screen guard occur together only by chance).

What is confidence in a rule?

Confidence is the probability that the consequent will co-occur with the antecedent. It expresses the operational efficiency of the rule. In our example, it is the probability that a customer will purchase a screen guard given that they have already bought the mobile phone.

What is the grocery data set?

That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item. You can download the Groceries data set to take a look at it, but this is not a necessary step.
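A file in this one-receipt-per-line layout can be loaded with arules' `read.transactions()`. This sketch first writes a small hypothetical file so that it runs without downloading anything.

```r
library(arules)

# Hypothetical receipts: one receipt per line, items separated by commas
receipts <- c("bread,butter,milk",
              "coffee,sugar,milk",
              "bread")
path <- tempfile(fileext = ".csv")
writeLines(receipts, path)

# format = "basket": each line becomes one transaction; sep splits the items
basket_trans <- read.transactions(path, format = "basket", sep = ",")

summary(basket_trans)
inspect(basket_trans)
```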

What is a receipt in a basket?

Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer’s basket – and therefore ‘Market Basket Analysis’.

What are redundant rules?

Sometimes rules repeat. Redundancy indicates that one item might be a given. As an analyst, you can elect to drop that item from the dataset, or alternatively remove the redundant rules that were generated.
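arules provides `is.redundant()` for the second option. In its default form, a rule is flagged when a more general rule with the same right-hand side and a subset of its left-hand side has the same or higher confidence. This sketch assumes arules and its Groceries data, with illustrative thresholds.

```r
library(arules)
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

# TRUE for rules that a more general rule (same RHS, smaller LHS)
# already matches or beats on confidence
redundant <- is.redundant(rules)
pruned <- rules[!redundant]

c(all = length(rules), non_redundant = length(pruned))
```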

What is support?

Support: the fraction of transactions in our dataset in which our itemset occurs.

What happens if a customer buys coffee and sugar?

If a customer buys coffee and sugar, then they are also likely to buy milk.

What is the last step?

The last step is visualization. Let's say you wanted to map out the rules in a graph. We can do that with another library called “arulesViz”.
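This sketch assumes both arules and arulesViz are installed. It plots only the ten highest-lift rules so that the graph stays readable. The cutoff of ten is arbitrary.

```r
library(arules)
library(arulesViz)  # assumes arulesViz is installed
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

# Keep the ten rules with the highest lift to avoid an unreadable hairball
top10 <- head(sort(rules, by = "lift"), 10)

# Items and rules become nodes; arrows run from LHS items to a rule
# and from the rule to its RHS item
plot(top10, method = "graph")
```

Depending on the arulesViz version, the graph may be returned as a ggplot object. In a non-interactive script, wrapping the call in `print()` makes sure it is drawn.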

Is Rule 4 too long?

Rule 4 is perhaps excessively long. Let's say you wanted more concise rules. That is also easy to do by adding a “maxlen” parameter to your apriori() call.
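With arules, that could look like the sketch below, which assumes the bundled Groceries data and illustrative thresholds. The `maxlen = 3` setting caps each rule at three items in total, counting the left- and right-hand sides together.

```r
library(arules)
data("Groceries")

short_rules <- apriori(Groceries,
                       parameter = list(supp = 0.001, conf = 0.8, maxlen = 3),
                       control = list(verbose = FALSE))

inspect(head(sort(short_rules, by = "confidence"), 5))
```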
