BPR Paper Notes
Bayesian Personalized Ranking (BPR) (section 3.1) is a per-user ranking approach that optimizes a smooth approximation of the area under the ROC curve (AUC) (section 2.3) for each user. BPR has seen successful applications in item prediction from implicit positive-only feedback  –even though the framework is not limited to positive-only data – and tag prediction .
The Bayesian Personalized Ranking (BPR) framework  consists of an optimization criterion and a gradient-based learning algorithm for personalized item recommendations. BPR is based on the idea of reducing ranking to pairwise classification .
Because the size of the training data DS is quadratic in the number of items, the BPR learning algorithm (Figure 4) samples from DS instead of going over the complete set of item pairs.
Note that maximizing BPR-Opt is similar to maximizing the AUC (eq. 12), by approximating the non-differentiable δ(x) by the differentiable sigmoid function σ(x). See  for a more detailed explanation.
Collaborative filtering has traditionally been done w/ using a KNN technique that involves building a simularity matrix and distance measures like Pearson correlation. More modern techniques learn the matrix directly via matrix factorization.
In  he describes early methods to learn the user-item matrix using SVD matrix factoriztion that were prone to over fitting b/c of lack of regualarization.s
 B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Incremental singular value decomposition algorithms for highly scalable recommender systems. In Proceedings of the 5th International Conference in Computers and Information Technology, 2002.
Regarding training, all past work on implict recomendations work to optimize if an item is selected by a user or not. BPRs work is to optmize of ranking of items.
In past implicit systems handled postive feedback by giving it a positive label and negative feedback a negitive label. the latter is an issue b/c w/ impliciti data the remainder after positive interactions are not necessarly negative.
Then that leads to the argument that a classifier w/ enough expressiveness will learn to always predict negative. The only way these systems can make inferences on new imput is to prevent overfitting by incurring hevy (regulariztion) penalities on parameter weights.
Instead of replacing missing values w/ negative ones, BPR uses item-pairs for a given user as training data and works to optimize the correct ranking of the pairs. This is as opposed to optimzing the error of a single item approximation.
BPRs training data is composed of user, pos, neg triples where user is a given user and pos is an random item the user interacted w/ and neg is a random item the user did not interact with. It then works to maximize the difference between the score of pos and neg meaning it ranked the pos item over the neg item w/ the largest possible margin.
It is important to note that even though we use thesame models as in other work, we optimize themagainst another criterion. This will lead to a betterranking because our criterion is optimal for the rank-ing task. Our criterion does not try to regress a singlepredictor xˆul to a single number but instead tries toclassify the di↵erence of two predictions xˆui xˆuj
For evaluation we take 1 pos item for each user and compare it against every negative item and give it a 1 if it has a larger rank. Therefore an average of 1 is a perfect AUC score.
the AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example, i.e. P(score(x+)>score(x−))