We have extracted features from search engine query log data and the feature file (as per input format of Vowpal Wabbit) amounts to 90.5 GB. The reason for this huge size is necessary redundancy in our feature construction. Vowpal Wabbit claims to be able to handle TBs of data in a matter of few hours. In addition to that, VW uses a hash function which takes almost no RAM. But When we run logistic regression using VW on our data, within a few minutes, it uses up all of RAM and then stalls. This is the command we use- vw -d train_output --power_t 1 --cache_file train.cache -f data.model --compressed --loss_function logistic --adaptive --invariant --l2 0.8e-8 --invert_hash train.model train_output is the input file we want to train VW on, and train.model is the expected model obtained after training Any help is welcome! |
I've found the