Hi,

I am trying to fine-tune the parameters of a linear SVM (LIBLINEAR, via the scikits-learn package). The best value of C found by the grid search differs from run to run, which changes the accuracy of my classifier. Here's what I do:

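    # Imports assumed for the 2011-era scikits.learn API used below
    import numpy as np
    from pprint import pprint
    from scikits.learn import svm
    from scikits.learn.cross_val import StratifiedKFold
    from scikits.learn.grid_search import GridSearchCV
    from scikits.learn.metrics import classification_report, f1_score

    X = sparse_feats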
    Y = target_labels

    train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()
    C_start, C_end, C_step = -3, 15, 2

    # Generate grid search values for C
    C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)
    print C_val
    grid_clf = svm.sparse.LinearSVC()        
    print grid_clf

    linear_SVC_params = {'C': C_val}

    grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 10, iid = False, score_func = f1_score)
    grid_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10)) 
    y_true, y_pred = Y[test], grid_search.predict(X[test])

    print "Classification report for the best estimator: "
    print grid_search.best_estimator

    print "Tuned for  with optimal value: %0.3f" % f1_score(y_true, y_pred)
    print classification_report(y_true, y_pred)

    print "Grid scores:"
    pprint(grid_search.grid_scores_)

    print "Best score: %0.3f" % grid_search.best_score

    best_parameters = grid_search.best_estimator._get_params()

I am not sure why I get different values of C on different runs of my program. I initially thought it might be due to randomness in the StratifiedKFold function; however, I also manually sliced the NumPy arrays into halves, and the variation still persists. The range of C values is taken from the LIBSVM guide.
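For reference, the grid generated above can be reproduced in plain Python. This is only a sketch: `c_grid` is a hypothetical helper, not part of scikits-learn, and it assumes the same start, end, and step as the snippet.

```python
def c_grid(start=-3, end=15, step=2):
    """Return 2**e for e = start, start + step, ... up to end.

    Mirrors 2. ** np.arange(start, end + step, step) from the snippet;
    `end` is included when (end - start) is a multiple of `step`.
    """
    return [2.0 ** e for e in range(start, end + step, step)]


grid = c_grid()
print(len(grid))
print(grid)
```

With the defaults this gives ten candidate values for C, from 2^-3 = 0.125 up to 2^15 = 32768.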

asked May 18 '11 at 02:25

Dexter

edited May 18 '11 at 02:25

Are you getting different values for the best score too? Can you post some output?

(May 18 '11 at 15:19) Joseph Turian ♦♦

Joseph, here is how C and the accuracy vary across runs for the same set of features:

C, Accuracy : { 32, 65.07 / 0.5, 63.02 / 128, 61.24 / 8.0, 63.01 / 2.0, 59.76}

A few quick points:

- C is optimized for the f1_score during the grid search.
- The accuracy mentioned above is the accuracy of the classifier after the grid search is performed, i.e. I run the grid search, retrieve the best C dynamically, and pass it to my text classification task, where I perform stratified 10-fold cross-validation.

(May 18 '11 at 23:25) Dexter
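To put a number on the variation reported in the comment above, the five (C, accuracy) pairs can be summarised in a few lines of Python. This is a sketch over the quoted figures only; the variable names are illustrative, and the accuracies are assumed to be percentages.

```python
from statistics import mean

# (C, accuracy) pairs exactly as quoted in the comment above
runs = [(32, 65.07), (0.5, 63.02), (128, 61.24), (8.0, 63.01), (2.0, 59.76)]

accuracies = [acc for _, acc in runs]
spread = max(accuracies) - min(accuracies)  # best run minus worst run
average = mean(accuracies)

print("spread: %.2f" % spread)
print("mean:   %.2f" % average)
```

On these five runs the accuracy spans roughly 5.3 points around a mean of about 62.4.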