I am working on a new bioinformatics project and I have some doubts about which machine-learning algorithm I should choose.
I have a matrix of ~100 rows and 13 columns. The first 6 columns refer to a particular concept (firstClass) and the following 6 to another one (secondClass); each of these 12 entries is a real value. The last column is a boolean flag stating whether the correlation between the firstClass part and the secondClass part of that row is correct or not.
It's a bioinformatics problem: each row is a chromosome region, each column is a cell type, and each entry is the level of sensitivity of that chromosome region to the DNase I enzyme for that cell type.
It's something like this:
firstClass[1...6] | secondClass[1...6] | label
----------------------------------------------------------------------------------------------------------
0.69 0.00 0.00 1.26 0.25 1.05 | 0.35 2.29 0.92 2.05 5.75 1.96 | YES
1.53 0.20 0.53 1.08 0.74 0.85 | 1.53 0.24 0.51 1.04 2.74 3.85 | NO
0.63 1.00 0.77 1.12 0.06 0.84 | 3.73 2.93 1.38 3.33 4.35 2.31 | NO
4.73 1.23 1.36 5.33 5.35 1.31 | 4.83 1.73 9.36 2.33 4.35 1.21 | YES
... ... ... ... ... ... ... ... ... ... ... ...
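To make the layout concrete, here is a minimal sketch (plain Python, standard library only) that reads rows in the format above into 12-value feature vectors and 0/1 labels. The function name `parse_rows` and the `SAMPLE` string are hypothetical; the sample reuses only the four rows shown, and the code assumes each row has exactly 6 + 6 values and a `YES`/`NO` label, separated by `|`.

```python
from typing import List, Tuple

# Hypothetical sample: only the four rows shown in the question.
SAMPLE = """\
0.69 0.00 0.00 1.26 0.25 1.05 | 0.35 2.29 0.92 2.05 5.75 1.96 | YES
1.53 0.20 0.53 1.08 0.74 0.85 | 1.53 0.24 0.51 1.04 2.74 3.85 | NO
0.63 1.00 0.77 1.12 0.06 0.84 | 3.73 2.93 1.38 3.33 4.35 2.31 | NO
4.73 1.23 1.36 5.33 5.35 1.31 | 4.83 1.73 9.36 2.33 4.35 1.21 | YES
"""


def parse_rows(text: str) -> Tuple[List[List[float]], List[int]]:
    """Return (features, labels): 12 floats per row, and 1 for YES / 0 for NO."""
    features: List[List[float]] = []
    labels: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Raises ValueError if a row does not have exactly three '|' parts.
        first, second, label = (part.strip() for part in line.split("|"))
        first_vals = [float(v) for v in first.split()]
        second_vals = [float(v) for v in second.split()]
        if len(first_vals) != 6 or len(second_vals) != 6:
            raise ValueError(f"expected 6 + 6 values, got: {line!r}")
        if label not in ("YES", "NO"):
            raise ValueError(f"unexpected label: {label!r}")
        # Concatenate firstClass and secondClass into one 12-dimensional vector.
        features.append(first_vals + second_vals)
        labels.append(1 if label == "YES" else 0)
    return features, labels


if __name__ == "__main__":
    X, y = parse_rows(SAMPLE)
    print(len(X), len(X[0]), y)
```

Running it prints the number of rows, the number of features per row and the label vector.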
I know that some of these rows represent correct correlations between the two vector types (the ones with a YES flag at the end), and others don't (the NO ones).
My goal is to find out which statistical pattern the correct rows share, i.e. why some rows are YES and others are NO.
I think this problem is well suited to an artificial neural network. But which neural-network algorithm should I choose?
A simple feed-forward neural network? An autoencoder? A Restricted Boltzmann Machine? Something else? How should I choose?
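Because every row carries a YES/NO label, the goal amounts to binary classification on a 12-dimensional input. As a rough illustration of the "simple feed-forward neural network" option, here is a minimal sketch of a one-hidden-layer network in plain Python, trained with full-batch gradient descent on cross-entropy loss. Everything in it (the names `TinyMLP` and `standardize`, 8 tanh hidden units, the learning rate, the epoch count and the seed) is a hypothetical choice rather than a recommendation, and it is fitted only to the four example rows from the question.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X: List[List[float]]) -> List[List[float]]:
    """Rescale each column to zero mean and unit variance (zero-variance columns are only centered)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


class TinyMLP:
    """Hypothetical 1-hidden-layer net: tanh hidden units, sigmoid output, cross-entropy loss."""

    def __init__(self, n_in: int, n_hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hk for w, hk in zip(self.w2, h)) + self.b2)
        return h, p

    def predict_proba(self, X: List[List[float]]) -> List[float]:
        return [self._forward(x)[1] for x in X]

    def loss(self, X: List[List[float]], y: List[int]) -> float:
        eps = 1e-12  # guards against log(0)
        total = 0.0
        for p, t in zip(self.predict_proba(X), y):
            total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        return total / len(X)

    def fit(self, X: List[List[float]], y: List[int], lr: float = 0.1, epochs: int = 1000) -> "TinyMLP":
        n, d = len(X), len(X[0])
        for _ in range(epochs):
            gW1 = [[0.0] * d for _ in self.W1]
            gb1 = [0.0] * len(self.b1)
            gw2 = [0.0] * len(self.w2)
            gb2 = 0.0
            for x, t in zip(X, y):
                h, p = self._forward(x)
                d_out = p - t  # dLoss/dz for sigmoid output + cross-entropy
                gb2 += d_out
                for k, hk in enumerate(h):
                    gw2[k] += d_out * hk
                    d_hid = d_out * self.w2[k] * (1.0 - hk * hk)  # tanh'(a) = 1 - tanh(a)^2
                    gb1[k] += d_hid
                    for j, xj in enumerate(x):
                        gW1[k][j] += d_hid * xj
            # Average the gradients over the batch and take one descent step.
            self.b2 -= lr * gb2 / n
            for k in range(len(self.w2)):
                self.w2[k] -= lr * gw2[k] / n
                self.b1[k] -= lr * gb1[k] / n
                for j in range(d):
                    self.W1[k][j] -= lr * gW1[k][j] / n
        return self


# The four example rows (firstClass + secondClass concatenated) and their labels.
ROWS = [
    [0.69, 0.00, 0.00, 1.26, 0.25, 1.05, 0.35, 2.29, 0.92, 2.05, 5.75, 1.96],
    [1.53, 0.20, 0.53, 1.08, 0.74, 0.85, 1.53, 0.24, 0.51, 1.04, 2.74, 3.85],
    [0.63, 1.00, 0.77, 1.12, 0.06, 0.84, 3.73, 2.93, 1.38, 3.33, 4.35, 2.31],
    [4.73, 1.23, 1.36, 5.33, 5.35, 1.31, 4.83, 1.73, 9.36, 2.33, 4.35, 1.21],
]
LABELS = [1, 0, 0, 1]  # YES = 1, NO = 0

if __name__ == "__main__":
    X = standardize(ROWS)
    model = TinyMLP(n_in=12, n_hidden=8, seed=0)
    print("loss before:", round(model.loss(X, LABELS), 4))
    model.fit(X, LABELS, lr=0.1, epochs=2000)
    print("loss after: ", round(model.loss(X, LABELS), 4))
    print("P(YES):", [round(p, 3) for p in model.predict_proba(X)])
```

With only ~100 rows, a low training loss says little on its own; candidate models (including simpler baselines) are usually compared on held-out rows, for example with k-fold cross-validation, computing the standardization statistics on the training folds only.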