I want to apply the following algorithm for my feature selection:

  1. Shuffle the training data and take a 10% sample.
  2. Run C4.5 on data from step 1.
  3. Select a set of attributes that appear only in the first 3 levels of the simplified decision tree as relevant features.
  4. Repeat steps 1-3 five times.
  5. Form a union of all the attributes from the 5 rounds.
  6. Run Naïve Bayesian classifier on the training and test data using only the final features selected in step 5

I am pretty new to Weka, so I would like to know how I can select the set of attributes that appear in the first 3 levels. What classes should I use?

asked Mar 09 '11 at 13:22

Alex Hernandez

3 Answers:

I've tried similar things with Weka's J48 classifier (its implementation of C4.5). Unfortunately, this is quite a pain. The only way I found was via the command line, using the -g option, which outputs a graph representation of the classifier, e.g.:

java -cp /usr/share/java/weka.jar weka.classifiers.trees.J48 -g -t training.arff -o -d model.dat

digraph J48Tree {
N0 [label="married" ]
N0->N1 [label="= YES"]
N1 [label="YES (6.0/2.0)" shape=box style=filled ]
N0->N2 [label="= NO"]
N2 [label="NO (12.0/4.0)" shape=box style=filled ]
}

You could capture this text and parse it to get the node names.
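
One way to do that parsing, as a minimal sketch in plain Java. It assumes the -g output has the shape shown above (root node named N0, one node or edge per line, leaves marked with shape=box) and that attribute names contain no double quotes. The names J48GraphLevels and attributesInTopLevels are made up for illustration and are not part of Weka.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class J48GraphLevels {
    // Node line, e.g.  N0 [label="married" ]   (leaves also carry shape=box)
    private static final Pattern NODE =
            Pattern.compile("^(N\\d+) \\[label=\"(.*?)\"(.*)\\]$");
    // Edge line, e.g.  N0->N1 [label="= YES"]
    private static final Pattern EDGE =
            Pattern.compile("^(N\\d+)->(N\\d+) .*$");

    /** Collects the split attributes found in the first maxLevels levels (root = level 1). */
    static Set<String> attributesInTopLevels(String dot, int maxLevels) {
        Map<String, String> splitLabel = new HashMap<>();      // internal node id -> attribute name
        Map<String, List<String>> children = new HashMap<>();  // node id -> child ids
        for (String raw : dot.split("\\R")) {
            String line = raw.trim();
            Matcher edge = EDGE.matcher(line);
            Matcher node = NODE.matcher(line);
            if (edge.matches()) {
                children.computeIfAbsent(edge.group(1), k -> new ArrayList<>())
                        .add(edge.group(2));
            } else if (node.matches() && !node.group(3).contains("shape=box")) {
                // Only internal nodes test an attribute; leaves hold class labels.
                splitLabel.put(node.group(1), node.group(2));
            }
        }

        // Walk the tree level by level, starting at the assumed root id N0.
        Set<String> selected = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add("N0");
        for (int level = 1; level <= maxLevels && !frontier.isEmpty(); level++) {
            Deque<String> next = new ArrayDeque<>();
            for (String id : frontier) {
                String label = splitLabel.get(id);
                if (label != null) {
                    selected.add(label);
                }
                next.addAll(children.getOrDefault(id, Collections.emptyList()));
            }
            frontier = next;
        }
        return selected;
    }

    public static void main(String[] args) {
        String dot = String.join("\n",
                "digraph J48Tree {",
                "N0 [label=\"married\" ]",
                "N0->N1 [label=\"= YES\"]",
                "N1 [label=\"YES (6.0/2.0)\" shape=box style=filled ]",
                "N0->N2 [label=\"= NO\"]",
                "N2 [label=\"NO (12.0/4.0)\" shape=box style=filled ]",
                "}");
        System.out.println(attributesInTopLevels(dot, 3)); // prints [married]
    }
}
```

Running this once per 10% sample and adding each returned set into a single Set would give the union from step 5 of the question.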

answered Mar 09 '11 at 21:17

Cerin

edited Mar 09 '11 at 21:17

I would suggest you post such Weka-specific questions to the Weka mailing list (preferably after looking briefly at the documentation, which you can find at the Weka home page).

answered Mar 09 '11 at 15:04

Oscar Täckström

This algorithm works pretty well for feature selection, but unfortunately there is no direct way of performing it in Weka. If you want to do it indirectly, you can follow these steps:

  1. Load your dataset.
  2. Go to the classifier selection and choose J48 (Weka's C4.5) from the list of tree-based classifiers.
  3. Run the classification.
  4. The tree model will be generated and shown in the output panel on the right.
  5. From the generated tree, note the attributes that appear in the first three levels.
  6. Load the same dataset again.
  7. Use the Remove filter, found under the unsupervised attribute filters, to remove the attributes that are not in the list you made in step 5.
  8. Classify the data with whichever other classifier you want to use.
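
For step 7, the Remove filter takes a 1-based index range (-R) and can invert it (-V) so that only the listed attributes are kept. Below is a small sketch of building that range from the selected names. The helper name keepRange, the example attribute names, and the choice to always keep the class attribute are my own assumptions for illustration, not something Weka provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class KeepIndices {
    /** Returns a 1-based, comma-separated index list of the attributes to keep. */
    static String keepRange(List<String> attributeNames,
                            Set<String> selected,
                            String classAttribute) {
        List<String> indices = new ArrayList<>();
        for (int i = 0; i < attributeNames.size(); i++) {
            String name = attributeNames.get(i);
            // Keep the selected features plus the class attribute itself.
            if (selected.contains(name) || name.equals(classAttribute)) {
                indices.add(String.valueOf(i + 1)); // Weka range strings count from 1
            }
        }
        return String.join(",", indices);
    }

    public static void main(String[] args) {
        // Hypothetical dataset header, in file order.
        List<String> attributes = Arrays.asList("age", "married", "income", "class");
        // Hypothetical union of the attributes found in the top tree levels.
        Set<String> selected = new LinkedHashSet<>(Arrays.asList("married", "income"));
        // -V inverts the range, so the listed attributes are kept, not removed.
        System.out.println("-V -R " + keepRange(attributes, selected, "class"));
        // prints -V -R 2,3,4
    }
}
```

The same range has to be applied to both the training and the test file so that the two keep identical attributes.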

I hope this will be helpful to you.

Regards, Ankit.

answered Jul 28 '11 at 01:10

ankit

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.