We have extracted features from search engine query log data, and the feature file (in Vowpal Wabbit's input format) amounts to 90.5 GB. The huge size is due to necessary redundancy in our feature construction. Vowpal Wabbit claims to handle terabytes of data in a matter of hours, and in addition VW uses a hash function that is supposed to take almost no RAM. However, when we run logistic regression with VW on our data, it uses up all of the RAM within a few minutes and then stalls. This is the command we use:

vw -d train_output --power_t 1 --cache_file train.cache -f data.model --compressed --loss_function logistic --adaptive --invariant --l2 0.8e-8 --invert_hash train.model

Here train_output is the input file we want to train VW on, and train.model is the model we expect to obtain after training.

Any help is welcome!

asked Mar 30 '14 at 08:08 by Satarupa Guha

One Answer:

I've found the --invert_hash option to be extremely costly; try running without that option. You can also try turning on the --l2 regularization option to reduce the number of coefficients in the model.
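
For illustration, a minimal sketch of the question's command with only --invert_hash removed (assuming the same file names as above; the binary model is still saved via -f data.model):

vw -d train_output --power_t 1 --cache_file train.cache -f data.model --compressed --loss_function logistic --adaptive --invariant --l2 0.8e-8

If a human-readable dump is still needed, --readable_model train.model should be much cheaper than --invert_hash, since it writes weights keyed by hashed feature index rather than reconstructing the original feature names.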

(See also the discussion on stackoverflow)

answered Mar 31 '14 at 12:02 by Zach Mayer
edited Mar 31 '14 at 12:03
