I'm not really familiar with Weka; I learned to use it by watching some tutorials, so I am not 100% sure that my approach is correct.

I have collected the Reuters-21578 dataset and use the documents prescribed by the ModApte split. To keep things simple, I load the training and test instances into Weka together (first all training instances, followed by all test instances), perform preprocessing, and during classification I specify a 75% percentage split.
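To make the intended split concrete, here is a minimal Python sketch of what I am trying to achieve. It is not Weka code, and the function name `split_in_order` and the toy document names are made up for illustration. It assumes the split keeps the file order (no shuffling) and that the chosen percentage lines up with the number of training instances; I have not verified that Weka's percentage split behaves this way.

```python
def split_in_order(instances, train_fraction=0.75):
    """Split a list into (train, test) by position, without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    # Everything before the cut is training data, everything after is test data.
    cut = int(len(instances) * train_fraction)
    return instances[:cut], instances[cut:]


# Toy stand-in for the real file: 6 "training" documents followed by 2 "test" documents.
docs = [f"train_{i}" for i in range(6)] + [f"test_{i}" for i in range(2)]
train, test = split_in_order(docs, train_fraction=0.75)
```

With these toy numbers the cut falls exactly between the two groups. If the instances were shuffled before splitting, or if the percentage did not match the real training/test sizes, the two groups would be mixed.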

I was wondering if someone could have a look at my .arff file and the output of this classification and tell me whether I did something wrong. Both can be found here: https://dl.dropboxusercontent.com/u/42974675/dataset.zip

For clarification, I represent each topic in the Reuters dataset in binary form {0,1} and train a separate classifier for each topic. The zip file linked above contains the .arff file and the output file for the topic acq.
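As a rough illustration of the binary representation, the following Python sketch builds one {0,1} label vector per topic from each document's set of topics. The helper `binary_labels` and the four toy documents are hypothetical and only show the idea; the real labels come from my .arff file.

```python
def binary_labels(doc_topics, topic):
    """Return 1 for each document tagged with `topic`, otherwise 0."""
    return [1 if topic in topics else 0 for topics in doc_topics]


# Toy data: each entry is the set of topics assigned to one document.
doc_topics = [{"acq"}, {"earn"}, {"acq", "crude"}, set()]

# One label vector per topic; a separate classifier is trained on each.
acq_labels = binary_labels(doc_topics, "acq")
earn_labels = binary_labels(doc_topics, "earn")
```

Each topic therefore becomes its own two-class problem, and a document with several topics is a positive example for each of them.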

asked Mar 13 '14 at 08:06

TheGreatEye

edited Mar 13 '14 at 08:11

Could someone please just tell me whether this is correct or not? I'm really stuck with my work if I'm not certain that I can go ahead with this approach. I find it very odd to get more than 99% correct classifications.

(Mar 13 '14 at 08:25) TheGreatEye
