Hi all,

So I have this CSV file which has 290 examples and 95 attributes. A snippet from the file is shown below (first five examples). As you can see, the last attribute is the class.

Examples 3 and 4 belong to class '1', whereas examples 1, 2, and 5 belong to class '0'. I want to build a model based on this input file. Can somebody suggest which software and algorithm to use, so that I can at least start working on this?

    Label,Betweenness,Betweenness_estimate,Closeness_all,Closeness_in,Closeness_out,Degree_all,Degree_in,Degree_out,Degree_total,Eigen,k_nbor_all,N-size1,N-size2,pep_length,dis_length,percent_dis,Strand,Protein abundance,Expression level,Ax1,Ax2,Ax3,Ax4,G3s,GC3s,Nc,Fec value,multyfunctionality,Complex_number,UUU,UCU,UAU,UGU,UUC,UCC,UAC,UGC,UUA,UCA,UAA,UGA,UUG,UCG,UAG,UGG,CUU,CCU,CAU,CGU,CUC,CCC,CAC,CGC,CUA,CCA,CAA,CGA,CUG,CCG,CAG,CGG,AUU,ACU,AAU,AGU,AUC,ACC,AAC,AGC,AUA,ACA,AAA,AGA,AUG,ACG,AAG,AGG,GUU,GCU,GAU,GGU,GUC,GCC,GAC,GGC,GUA,GCA,GAA,GGA,GUG,GCG,GAG,GGG,Class
        YAL002W,3565.714949,3565.714949,0.284801334,0.284801334,0.284801334,8,8,8,8,0.056580007,3,9,311,1274,89,6.985871272,1,736.4646183,0.2,-0.0259,-0.0291,0.0047,0.0054,0.1568,0.3481,50.9286,0.981960784,4,1,1.42,1.86,1.32,0.92,0.58,1.02,0.68,1.08,1.49,1.4,3,0,1.57,0.37,0,1,1.12,1.36,0.87,0.29,0.41,0.68,1.13,0.29,0.71,1.53,1.47,0,0.71,0.43,0.53,0.29,1.29,1.49,1.33,0.98,0.68,0.6,0.67,0.37,1.03,1.01,1.42,2.05,1,0.9,0.58,3.07,1.61,1.28,1.1,1.51,0.49,1.04,0.9,0.89,1.22,1.44,1.5,0.8,0.68,0.24,0.5,0.8,0
        YAL007C,16185.60588,16185.60588,0.294472535,0.294472535,0.294472535,25,25,25,25,0.028053586,3,26,354,215,38,17.6744186,-1,26271.84743,2.2,-0.0737,0.0269,0.0026,0.0265,0.1579,0.3732,49.4595,0.944444444,4,1,1.09,1.89,0.89,1,0.91,1.26,1.11,1,1.36,0.95,3,0,2.45,0.32,0,1,0.27,1,2,1,0.27,1,0,0,1.09,2,2,0,0.55,0,0,0,1.91,2,1,1.26,0.68,1,1,0.32,0.41,0.75,1.14,4,1,0.25,0.86,1,1.75,1.5,1.5,2.5,0.75,1.25,0.5,0,0.25,0.75,1.23,1,1.25,0.5,0.77,0.5,0
        YAL032C,13208.07058,13208.07058,0.296671491,0.296671491,0.296671491,22,22,22,22,0.043109604,3,23,494,379,310,81.79419525,-1,1674.508609,0.1,-0.0375,0.0266,-0.0092,-0.0013,0.2195,0.4417,58.7057,0.831578947,1,4,1.11,0.97,0.6,2,0.89,1.16,1.4,0,0.41,1.55,0,0,2.07,0.58,3,1,0.83,0.8,1.25,0.5,0.62,0.8,0.75,0.25,1.03,1.8,1.16,0.5,1.03,0.6,0.84,0.5,1.2,0.71,0.89,0.58,0.8,1.18,1.11,1.16,1,1.18,0.88,2.75,1,0.94,1.12,1.5,1.67,1.03,1.27,1,0.67,0.91,0.73,1,0.67,1.37,1.66,1,1,0.69,0.34,1,1
        YAL033W,497.7513203,497.7513203,0.227010985,0.227010985,0.227010985,2,2,2,2,0.000997598,4,3,16,173,35,20.23121387,1,2229.450427,1.2,0.0335,0.0371,0.0486,0.0055,0.2455,0.497,65.1633,1.074712644,1,1,1.33,1.5,1.33,1,0.67,0.3,0.67,1,0.35,0.9,0,0,2.12,2.1,3,0,1.06,2,1,0.6,1.06,1,1,0.6,0,1,1,1.8,1.41,0,1,0,0.75,0,0.83,0.9,1.2,1.6,1.17,0.3,1.05,0.8,1.14,1.8,1,1.6,0.86,1.2,0.89,0.67,0.92,0,0.89,0,1.08,1.14,0.89,2.67,1,1.14,1.33,0.67,1,1.71,1
        YAL010C,1413.265724,1413.265724,0.269198445,0.269198445,0.269198445,4,4,4,4,0.018303278,4,5,127,493,136,27.5862069,-1,768.1027686,0.3,-0.0145,-0.0218,0.0273,0.0091,0.1255,0.3305,50.675,1.145748988,5,0,1.21,1.36,1.2,0.8,0.79,1.09,0.8,1.2,2.85,0.73,0,3,0.71,0.82,0,1,0.71,0.67,1.09,0.3,0.1,1.11,0.91,0.6,1.02,1.78,1.56,0.9,0.61,0.44,0.44,0,0.62,0.91,1.37,1.36,0.5,0.91,0.63,0.64,1.88,1.55,1.31,3,1,0.64,0.69,1.2,2.13,0.47,1.52,1.41,0.53,0.94,0.48,1.18,1.33,1.88,1.62,0.47,0,0.71,0.38,0.94,0

asked Feb 04 '14 at 00:55

Rahul SIngh

edited Feb 04 '14 at 23:46


One Answer:

If there are 95 attributes then I am guessing you are only showing a few of them in your snippet above?

I am not sure if you have enough examples to build a classifier with high enough accuracy; it all depends on how strongly your features (attributes) correlate with the class.

A good starting point would be to build a logistic regression classifier; the following videos may help: https://class.coursera.org/ml-003/lecture/preview (the week 3 lectures are the relevant ones, but you need to understand the prior concepts).

Are you familiar with Octave, Matlab or Python? If so, that will make it easier; however, I would say it will be a little tough if you are not familiar with ML at all.
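If you do go the Python route, a minimal sketch of such a logistic regression baseline might look like the following. This assumes pandas and scikit-learn are installed, and that the data has been saved as all_data.csv (a hypothetical filename) with the header shown in the question and Class as the last column:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Hypothetical filename; adjust to wherever the CSV actually lives.
    df = pd.read_csv("all_data.csv")

    # "Label" is just the gene identifier, not a feature; "Class" is the target.
    X = df.drop(columns=["Label", "Class"])
    y = df["Class"]

    # Scale the features (they are on very different ranges) and fit a plain
    # logistic regression, reporting 5-fold cross-validated accuracy.
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print("Mean CV accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

With only a couple of hundred examples, cross-validation gives a more honest accuracy estimate than a single train/test split.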

answered Feb 05 '14 at 00:43

Farzan Maghami

Thanks Farzan. So I just opened my dataset in WEKA, and it shows I have 219 instances (i.e. examples) and 95 attributes. What should the ratio of instances to attributes be for a good classification? Here is the link to the complete data: http://tinyurl.com/alldata-meta

Please see if you can help me get some insight.

(Feb 05 '14 at 01:27) Rahul SIngh

The ratio really depends on how strongly the features correlate with the class; if they are really strong, then it should be fine, but if not, you may need more training data.

What language are you planning on doing this in?

(Feb 05 '14 at 06:06) Farzan Maghami

Farzan, I guess I'll use Octave. Secondly, how do I know if the correlation is strong enough? How do I check it? Could you specify a method/tool?

(Feb 05 '14 at 07:18) Rahul SIngh

I am not sure what the best method is, but a quick and obvious one would be to carry out linear regression, see how well it performs, and then try to fine-tune it.
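For example, a quick way to eyeball how each attribute relates to the class in Python (just a sketch, assuming pandas and the same hypothetical all_data.csv file as above):

    import pandas as pd

    df = pd.read_csv("all_data.csv")  # hypothetical filename, as above

    # Correlation of every numeric attribute with the 0/1 class column
    # (this is the point-biserial correlation, since Class is binary).
    numeric = df.drop(columns=["Label"])
    corr_with_class = numeric.corr()["Class"].drop("Class")

    # Attributes most strongly (positively or negatively) related to the class.
    print(corr_with_class.abs().sort_values(ascending=False).head(15))

If even the strongest of these correlations is small, that is a hint that a simple linear model will struggle and that you may need more data or better features.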

(Feb 07 '14 at 08:55) Farzan Maghami

Thanks Farzan. As I told you, the data has 95 attributes and only 280 examples (http://tinyurl.com/alldata-meta), so I was wondering if there is any method of reducing the dimensions/attributes of this data and keeping only the most important ones, i.e. those with high information content with respect to classification, prior to doing any analysis?

(Feb 08 '14 at 17:29) Rahul SIngh
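One common way to do the kind of attribute reduction asked about above is univariate feature selection, for example scikit-learn's SelectKBest with a mutual-information score. A minimal sketch, assuming the same hypothetical all_data.csv layout as earlier:

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    df = pd.read_csv("all_data.csv")  # hypothetical filename, as above
    X = df.drop(columns=["Label", "Class"])
    y = df["Class"]

    # Keep the 20 attributes with the highest mutual information with the class.
    selector = SelectKBest(mutual_info_classif, k=20)
    X_reduced = selector.fit_transform(X, y)
    print(list(X.columns[selector.get_support()]))

Principal component analysis (PCA) is another common option, but it builds combinations of the original attributes rather than selecting a subset of them, which makes the result harder to interpret. WEKA (mentioned above) also has built-in attribute selection.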