|
I'm building a Getting Things Done app. My principal complaint with most GTD / TODO apps is that they show me too much. When I'm at work, I don't want to see tasks I'm planning to do on the weekend. When I'm at home, I don't want to see work tasks. When I'm out running errands, I just want to see my errand tasks, and so on. I've been thinking about this problem for a while, and with some help from Bret Victor's article "Magic Ink" I've designed a system I think will work quite well. At its heart will be a system that learns which projects/tasks/tags I want to see based on the context I'm in. So when I'm at work, the system should know to show me only work items. When I'm at home, work items should be hidden.
What I'm trying to figure out right now is which machine learning algorithm would be "best" for my system. Here is how the system will work: at each page load, and every 10 minutes or so while the application is open, a context object will be created on the client from the current time, day of week, location, and anything else I can think up. Based on that context, a relevancy score will be calculated for every project/task/tag. The most relevant projects/tasks/tags will be shown on the home page of the app. Irrelevant items will be hidden completely.
The system will learn from what the user does after opening the app. If she clicks on a project on the front page, that will strengthen the connection between the current context and that project, as well as any tags on that project. If she has to change one of the global filters (say at 9pm she turns on the work tag and turns off all other tags because she has a late meeting with her team in India), the system should quickly learn that at around 9pm, items tagged with "work" are really relevant. Over time it should also learn subtler things, such as that this meeting happens only on Mondays and Thursdays, but not in the third week of the month, and never when she's not at home (e.g. she's traveling, so the meeting is canceled).
I've read a fair bit about machine learning and have some practical experience, but I'd like some expert guidance on which algorithm would be appropriate for this use case. My best guess right now is some variant of a neural network, but I might be completely off-base. I'd prefer a simpler algorithm that I can implement myself, as I'd like to learn more about ML. I'd also prefer an algorithm that isn't a black box I can't tweak. |
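To make the context object described in the question concrete, here is a minimal Python sketch of one way it could be turned into features for a learner. Everything in it is an assumption for illustration: the function name `context_features`, the `LOCATIONS` labels, and the choice of features (cyclic time of day, day of week, week of month, location) are hypothetical and not part of the original design.

```python
import math
from datetime import datetime

# Hypothetical location labels; a real app would derive these from the client.
LOCATIONS = ["home", "work", "errands"]

def context_features(when: datetime, location: str) -> dict:
    """Turn a context snapshot into a sparse feature dictionary."""
    features = {"bias": 1.0}
    # Encode time of day on a circle so 23:59 and 00:01 end up close together.
    hour = when.hour + when.minute / 60.0
    angle = 2.0 * math.pi * hour / 24.0
    features["hour_sin"] = math.sin(angle)
    features["hour_cos"] = math.cos(angle)
    # One indicator per weekday (Monday is 0 in datetime.weekday()).
    features[f"dow={when.weekday()}"] = 1.0
    # Week of the month, so a rule like "not in the third week" is representable.
    features[f"week_of_month={(when.day - 1) // 7 + 1}"] = 1.0
    # One indicator per known location; unknown locations share one bucket.
    loc = location if location in LOCATIONS else "other"
    features[f"loc={loc}"] = 1.0
    return features

# Example: a Monday evening at home.
monday_evening = context_features(datetime(2011, 12, 5, 21, 0), "home")
```

Discrete indicators like these suit both tree-based learners and linear models, and the feature names stay readable, which helps if the model should not be a black box.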
|
If I understand you correctly, quite a few of your predictors will be discrete (as opposed to continuous) variables, including a bunch of irrelevant ones. In that case I would recommend tree-based methods, since they deal with such variables naturally (and have a few other nice properties). I would start with a simple tree-based learner (for example, see the R package rpart). If you need more accuracy (and are okay with losing interpretability), I would also try ensemble methods built on top of decision trees, such as Random Forest and Gradient Boosting Machine (R packages randomForest and gbm, respectively). Before trying decision trees, though, I would check whether a simple logistic regression (trained separately for each project/task/tag) suffices. If it does, I would not bother with decision trees.
(Dec 09 '11 at 14:57)
Yevgeny
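As a rough illustration of the "one logistic regression per project/task/tag" baseline suggested above, here is a minimal pure-Python sketch trained online by stochastic gradient descent. The class name `TagRelevance`, the learning rate, and the toy feature names are assumptions made for the example; they are not from the thread, and a real system would need regularization and better features.

```python
import math
from collections import defaultdict

class TagRelevance:
    """One online logistic regression per tag, trained by stochastic gradient descent."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate  # assumed value, tune as needed
        # weights[tag][feature] -> float; unseen weights default to 0.0
        self.weights = defaultdict(lambda: defaultdict(float))

    def score(self, tag, features):
        """Estimated probability that `tag` is relevant in this context."""
        w = self.weights[tag]
        z = sum(w[name] * value for name, value in features.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, tag, features, relevant):
        """One gradient step; relevant=True for a click, False for an ignored item."""
        error = (1.0 if relevant else 0.0) - self.score(tag, features)
        w = self.weights[tag]
        for name, value in features.items():
            w[name] += self.learning_rate * error * value

# Toy usage with hypothetical features: "work" matters at home only in the evening.
model = TagRelevance()
evening_home = {"bias": 1.0, "loc=home": 1.0, "evening": 1.0}
daytime_home = {"bias": 1.0, "loc=home": 1.0}
for _ in range(200):
    model.update("work", evening_home, True)
    model.update("work", daytime_home, False)
```

Because each tag has its own small weight table, the learned weights can be inspected and adjusted by hand, which fits the stated wish for something that is not a black box. A purely linear model cannot express interactions such as "Mondays but not in the third week" unless those combinations are added as explicit features, which is one reason to move on to trees if this baseline falls short.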
|