|
I'm not very familiar with Weka; I learned to use it from tutorials, so I am not 100% sure my approach is correct. I have collected the Reuters-21578 dataset and use the documents prescribed by the ModApte split. To keep things simple, I load the training and test instances into Weka together (all training instances first, followed by all test instances), perform the preprocessing, and specify a 75% percentage split during classification. Could someone have a look at my .arff file and the output of this classification and tell me whether I did something wrong? Both can be found here: https://dl.dropboxusercontent.com/u/42974675/dataset.zip For clarification, I represent each topic in the Reuters dataset in binary form {0,1} and train a separate classifier for each topic. The zip file at the link contains the .arff file and the output file for the topic acq. |
Could someone please just tell me whether this is correct or not? I'm stuck with my work until I know whether I can go ahead with this approach. I find it very odd to get more than 99% correct classifications.
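
To make the setup concrete, here is a minimal sketch in plain Python (not Weka code) of what the 75% split is meant to do, plus a simple sanity check for a suspiciously high accuracy. The function names and the toy data are made up for illustration. The sketch assumes the split is taken in file order with no shuffling; whether Weka's percentage split behaves that way for a given configuration is an assumption here, not something the sketch shows.

```python
from collections import Counter


def split_preserving_order(instances, train_fraction=0.75):
    """Split a list into a leading training part and a trailing test part.

    No shuffling is done, so the result only matches a predefined split
    (such as ModApte) if the training instances come first in the list
    and make up exactly `train_fraction` of it.
    """
    cut = round(len(instances) * train_fraction)
    return instances[:cut], instances[cut:]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Toy data, made up for illustration: (document id, binary label for one topic).
# The first 6 rows stand in for training documents, the last 2 for test documents.
toy = [("d1", 0), ("d2", 1), ("d3", 0), ("d4", 0),
       ("d5", 1), ("d6", 0), ("d7", 0), ("d8", 1)]

train, test = split_preserving_order(toy, 0.75)
print([doc for doc, _ in train])  # ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']
print([doc for doc, _ in test])   # ['d7', 'd8']
print(majority_baseline_accuracy([label for _, label in toy]))  # 0.625
```

Two things follow from this sketch. The cut only lands on the train/test boundary if the training documents are exactly 75% of the file, and a reported accuracy is only meaningful when compared against the majority-class baseline for that topic.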