Hi,
I am trying to tune the parameters of a linear SVM (LIBLINEAR) through the scikits.learn package. I get a different best value of C on different runs of my program, which changes the accuracy of my classifier from run to run. Here's what I do:
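# Imports assumed for the scikits.learn release current at the time of posting
import numpy as np
from pprint import pprint
from scikits.learn import svm
from scikits.learn.cross_val import StratifiedKFold
from scikits.learn.grid_search import GridSearchCV
from scikits.learn.metrics import f1_score, classification_report
X = sparse_feats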
Y = target_labels
train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()
C_start, C_end, C_step = -3, 15, 2
# Generate grid search values for C
C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)
print C_val
grid_clf = svm.sparse.LinearSVC()
print grid_clf
linear_SVC_params = {'C': C_val}
grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 10, iid = False, score_func = f1_score)
grid_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))
y_true, y_pred = Y[test], grid_search.predict(X[test])
print "Classification report for the best estimator: "
print grid_search.best_estimator
print "Tuned for f1_score with optimal value: %0.3f" % f1_score(y_true, y_pred)
print classification_report(y_true, y_pred)
print "Grid scores:"
pprint(grid_search.grid_scores_)
print "Best score: %0.3f" % grid_search.best_score
best_parameters = grid_search.best_estimator._get_params()
I am not sure why I get different values of C on different runs of my program. I initially thought it might be due to the randomness of the StratifiedKFold function. However, I manually sliced the numpy arrays into halves and tried again, and the variations still persist. The range of C values is taken from the libSVM guide.
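One way to narrow this down is to pin every source of randomness and check whether two runs then agree on C. The sketch below is not the code from the question. It assumes a recent scikit-learn (the `sklearn.model_selection` module and `scoring=` in place of `score_func=`), uses synthetic data from `make_classification` as a stand-in for `sparse_feats`, and the helper name `best_C` is made up for illustration. It fixes the seed of the fold split and the `random_state` of `LinearSVC`, whose underlying solver can use a random number generator while fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real features and labels (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Same grid as in the question: 2^-3, 2^-1, ..., 2^15.
C_values = 2.0 ** np.arange(-3, 15 + 2, 2)


def best_C(seed):
    """Run the grid search with the fold split and the solver tied to `seed`."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    clf = LinearSVC(random_state=seed, max_iter=2000)
    search = GridSearchCV(clf, {"C": C_values}, scoring="f1", cv=cv)
    search.fit(X, y)
    return search.best_params_["C"], search.best_score_


# Two runs with identical seeds should select the same C.
first = best_C(seed=0)
second = best_C(seed=0)
print(first, second)
```

If the two runs agree with fixed seeds but disagree when the seed changes, the variation most likely comes from the fold assignment or the solver rather than from the data itself.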
asked May 18 '11 at 02:25 by Dexter
Are you getting different values for the best score too? Can you post some output?
Joseph, here's the variation in C and accuracy for the same set of features:
(C, accuracy): (32, 65.07), (0.5, 63.02), (128, 61.24), (8.0, 63.01), (2.0, 59.76)
A few quick points:
- During the grid search, C is optimized for the f1_score.
- The accuracy mentioned above is the accuracy of the classifier after the grid search is performed, i.e. I perform the grid search, retrieve the C parameter dynamically, and pass it to my text classification task, where I perform stratified 10-fold cross-validation.
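To put a number on the variation, the short sketch below summarizes the five (C, accuracy) pairs reported above. The values are copied from this comment, and the variable names are chosen for illustration only.

```python
import math
from statistics import mean

# (C, accuracy) pairs as reported above, one pair per run.
runs = [(32, 65.07), (0.5, 63.02), (128, 61.24), (8.0, 63.01), (2.0, 59.76)]

accuracies = [acc for _, acc in runs]
spread = max(accuracies) - min(accuracies)

# Express each selected C as a power of two to see where it sits in the grid.
exponents = sorted(math.log2(c) for c, _ in runs)

print("mean accuracy: %.2f" % mean(accuracies))
print("spread (max - min): %.2f" % spread)
print("log2 of selected C values:", exponents)
```

All five selected values lie inside the grid from the question, whose exponents run from -3 to 15 in steps of 2.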