So yesterday Google opened up a classification routine as the Google Prediction API. As a rule, I think this is a good step for data analysis and the adoption of machine learning by a broader community. But what do you think? Also, if anyone has insight into the algorithms used, chime in.
Is there a list of the tasks the API can perform? Looks like collaborative filtering and classification. Is there something else?
I think it's a nice toy, but the fact that there is no insight into the algorithms used isn't good at all. Also, I find it hard to believe that there are no parameters to adjust.
The API's been around for a year or so actually. It was accessible with a developer account and also required access to Google Storage. I tried the now-deprecated v1.1, and at the time one could do multi-class classification and regression. If I recall correctly, the people behind it would not answer questions about which methods were used. I did some benchmarking (Boston Housing and Iris data sets) against "standard" classifiers, and the API performed about the same as they did.
The classification output consisted only of a label (no score/confidence), but I think that's been added since. Training data are submitted as .csv files to Google Storage, with a maximum of 100 MB per set; features could be numeric or text. Training is initiated via an API call that references the set on Google Storage.
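For anyone who wants to try this, here is a minimal sketch of preparing such a training file locally. The column layout (label first, then the features, no header row) is my assumption rather than something stated above, the helper names `build_training_csv` and `fits_size_limit` are made up for illustration, and the two rows are invented examples. The upload to Google Storage and the training call are left out.

```python
import csv
import io

# The 100 MB per-set limit mentioned above (assumed here to mean 100 * 1024 * 1024 bytes).
MAX_BYTES = 100 * 1024 * 1024


def build_training_csv(rows):
    """Serialize (label, features) pairs to CSV text, label in the first column (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for label, features in rows:
        writer.writerow([label, *features])
    return buf.getvalue()


def fits_size_limit(csv_text, limit=MAX_BYTES):
    """Check the UTF-8 encoded size of the file against the per-set limit."""
    return len(csv_text.encode("utf-8")) <= limit


# Invented rows mixing numeric and text features, purely for illustration.
rows = [
    ("setosa", [5.1, 3.5, "short petals"]),
    ("virginica", [6.3, 3.3, "long petals"]),
]
csv_text = build_training_csv(rows)
print(csv_text, end="")
print("within limit:", fits_size_limit(csv_text))
```

Checking the size before uploading avoids finding out about the limit only after the transfer has finished.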
The biggest limitation - probably intentional - was (is?) that only one sample can be classified per API call. Classifying 15,000 datapoints, even with persistent connections and asynchronous calls, still took about 10 minutes. Then again, the API was never meant to be used for batch classification.
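To put rough numbers on that, and to show the general shape of the workaround, here is a small sketch. `classify_one` is a hypothetical stand-in for a single API call (a real client would send an HTTP request there), and the worker count is an arbitrary choice, not what I actually used.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_one(sample):
    """Hypothetical stand-in for one API call that classifies a single sample."""
    return "positive" if sum(sample) > 0 else "negative"


def classify_all(samples, max_workers=8):
    """Issue one call per sample concurrently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_one, samples))


# Back-of-the-envelope throughput from the figures above.
n_samples = 15000
total_seconds = 10 * 60
calls_per_second = n_samples / total_seconds      # 25 calls per second
ms_per_sample = 1000 * total_seconds / n_samples  # 40 ms per sample

labels = classify_all([[1.0, 2.0], [-3.0, 0.5], [0.2, 0.1]])
print(calls_per_second, ms_per_sample, labels)
```

So even with concurrency the effective rate was about 25 samples per second, which is why the one-sample-per-call design dominates the total time for anything batch-sized.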
All in all, an ambitious undertaking. I wonder if the paid service will take off.