
Is there any literature on learning from a large number of short sequences, where each sequence corresponds to something like a user history but there are no shared items as in collaborative filtering? That is, each sequence is a small (0-10) set of labeled feature vectors, and the task is to predict the (n+1)st label given the user and new features. Any temporal component is likely to be small, so I'm not looking for time series or state space methods. Ideally I'd also like to avoid an expensive E step for computing per-user latent variables, though that may be inevitable.

asked Jul 01 '10 at 11:33


cityhall

Do I understand correctly: if the users have no items in common, then they all have different feature vectors. But if every feature occurs at most once, on what basis can there be any generalization?

(Jul 01 '10 at 16:10) Joseph Turian ♦♦

One Answer:

You can try to learn a "maximum entropy Markov chain" for this (i.e., model the sequence of feature vectors as each feature arising from a separate Markov chain, with transition probabilities learned by logistic regression). It shouldn't be very hard to train (i.e., you get one training example for each vector in the training set), and nothing stops you from using higher-order features.
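
A rough sketch of what this could look like, assuming scikit-learn and numpy; the helper build_examples and the naive encoding of the previous label as a single extra numeric feature are just illustrative choices:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each user history is a list of (feature_vector, label) pairs.
    # One training example per step: current features plus the previous label,
    # so the logistic regression learns p(label_t | features_t, label_{t-1}).
    def build_examples(histories, start_label=0):
        X, y = [], []
        for history in histories:
            prev = start_label  # dummy "start" label before the first observation
            for features, label in history:
                X.append(np.concatenate([features, [prev]]))
                y.append(label)
                prev = label
        return np.array(X), np.array(y)

    # histories: list of user histories, each a list of (np.ndarray, int) pairs
    # X, y = build_examples(histories)
    # clf = LogisticRegression().fit(X, y)
    # To predict a user's next label, append their last observed label to the
    # new feature vector and call clf.predict on it.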

If you want to model the users, you can combine this transition model with some algorithm for domain adaptation, treating each user as a different domain. I suspect fitting a global model and using it as a "prior" (i.e., regularizing by the norm of the difference between the user-specific classifier and the global classifier) should work, or maybe some Bayesian model that clusters users and uses a single model with per-user variation. For example, you could sample each vector for each user from a Gaussian whose mean is drawn from a Dirichlet process (so that users share features), with the variance estimated globally.
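
A minimal numpy sketch of the "global model as a prior" idea; binary 0/1 labels are assumed, and fit_toward_prior, the penalty weight lam, and the plain gradient-descent settings are illustrative rather than a specific library routine:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_toward_prior(X, y, w_global, lam=1.0, lr=0.1, n_iter=200):
        # Logistic regression whose weights are shrunk toward w_global,
        # i.e. penalized by lam * ||w - w_global||^2, fit by gradient descent.
        w = w_global.copy()
        for _ in range(n_iter):
            p = sigmoid(X @ w)
            grad = X.T @ (p - y) / len(y) + 2.0 * lam * (w - w_global)
            w -= lr * grad
        return w

    # w_global: weights of a classifier fit on all users pooled together.
    # X, y: the handful of examples for one user; with so few examples the
    # user-specific weights stay close to the global ones.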

answered Jul 01 '10 at 12:37


Alexandre Passos ♦

edited Jul 01 '10 at 16:28

I'd considered this and temporal RBMs, but I think they'll end up modeling the transitions instead of the users, and placing too much emphasis on the last observation. Higher-order features are a possibility, but that seems inelegant.

(Jul 01 '10 at 16:07) cityhall

I edited to add some information as to how you might incorporate the per-user info in this model.

(Jul 01 '10 at 16:29) Alexandre Passos ♦
