|
Hi Experts, I have some sample data posted below. Each row is an observation of 28 features. I was trying to train a model from this data set and then use that model to predict anomalies in testing data set. However, I had no luck in finding an appropriate algorithm suitable for this data set. So if you have anything that might be helpful, please let me know. Thanks a lot. P.S. happy thanksgiving. 16,160,40,8,8,8,0,16,16,16,32,32,160,16,16,16,4357360,4357360,0,0,10403840,0,10403840,0,0,0,0,0 112,1120,280,56,56,59,6,12,12,12,24,24,120,12,12,12,4357360,4979840,4357360,0,10403840,10403840,10403840,0,0,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4357360,4979840,4357360,0,10403840,10403840,10403840,0,0,160,512,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4357360,4979840,4357360,0,10403840,10403840,10403840,0,0,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4357360,4979840,4357360,0,10403840,10403840,10403840,0,0,160,512,0 48,480,120,24,24,26,4,8,8,8,16,16,80,8,8,8,4357360,4979840,4357360,0,10403840,10403840,10403840,0,0,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4357360,4979840,4357360,0,10403840,10403840,10403840,0,0,160,512,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,0,4357360,4357360,0,0,10403840,10403840,0,0,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4357360,0,4357360,10403840,0,10403840,10403840,128,0,0,512 48,480,120,24,24,26,4,12,12,12,24,24,120,12,12,12,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 112,1120,280,56,56,59,6,12,12,12,24,24,120,12,12,12,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 
16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,0,4357360,4979840,4357360,0,10403840,10403840,10403840,128,160,0,512 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4357360,0,4357360,10403840,0,10403840,10403840,128,0,0,512 48,480,120,24,24,26,4,8,8,8,16,16,80,8,8,8,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 64,640,160,32,32,35,6,20,20,20,40,40,200,20,20,20,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,0,4357360,4979840,4357360,0,10403840,10403840,10403840,128,160,0,512 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4357360,0,4357360,10403840,0,10403840,10403840,128,0,0,512 48,480,120,24,24,26,4,8,8,8,16,16,80,8,8,8,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 96,960,240,48,48,50,4,8,8,8,16,16,80,8,8,8,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 
32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,0,4357360,4979840,4357360,0,10403840,10403840,10403840,128,160,0,512 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4357360,0,4357360,10403840,0,10403840,10403840,128,0,0,512 48,480,120,24,24,26,4,12,12,12,24,24,120,12,12,12,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 96,960,240,48,48,50,4,8,8,8,16,16,80,8,8,8,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,0,4357360,4979840,4357360,0,10403840,10403840,10403840,128,160,0,512 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4357360,0,4357360,10403840,0,10403840,10403840,128,0,0,512 80,800,200,40,40,42,4,16,16,16,32,32,160,16,16,16,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 80,800,200,40,40,42,4,8,8,8,16,16,80,8,8,8,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,0,4357360,4979840,4357360,0,10403840,10403840,10403840,128,160,0,512 
16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4357360,0,4357360,10403840,0,10403840,10403840,128,0,0,512 96,960,240,48,48,51,6,16,16,16,32,32,160,16,16,16,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 48,480,120,24,24,26,4,8,8,8,16,16,80,8,8,8,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4979840,4979840,4979840,4979840,10403840,10403840,10403840,10403840,128,160,0,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,0,4357360,4979840,4357360,0,10403840,10403840,10403840,128,160,0,512 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4357360,0,0,4357360,10403840,0,0,10403840,128,0,0,512 96,960,240,48,48,51,6,12,12,12,24,24,120,12,12,12,4357360,0,4357360,4979840,10403840,10403840,0,10403840,128,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4357360,0,4357360,4979840,10403840,10403840,0,10403840,128,160,512,0 32,320,80,16,16,17,2,4,4,4,8,8,40,4,4,4,4357360,0,4357360,4979840,10403840,10403840,0,10403840,128,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4357360,0,4357360,4979840,10403840,10403840,0,10403840,128,160,512,0 48,480,120,24,24,26,4,8,8,8,16,16,80,8,8,8,4357360,0,4357360,4979840,10403840,10403840,0,10403840,128,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,4357360,0,4357360,4979840,10403840,10403840,0,10403840,128,160,512,0 16,160,40,8,8,8,0,0,0,0,0,0,0,0,0,0,0,0,4357360,4357360,0,10403840,0,10403840,128,160,512,512 |
|
I don't know how many unique patterns there are, but it looks like between 5 and 10. How about you simply estimate the multinomial distribution of these patterns in the training set and then compare it with the distribution in the test set? That of course assumes that all test set patterns already appear in the training set. But if your training set consists of only 5 patterns, then I don't think you can really generalize to new patterns unless there is a really obvious rule behind them. So to get a grip on the data, I would first find out how many unique samples there are, whether you expect the same samples in the test set, and whether there is any obvious pattern (like them lying on a straight line or something like this). Once you have the unique patterns in hand, I would do a PCA or similar to visualize the data and look for patterns. Hope that made some sense and helps you. Cheers |
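To make the suggestion above a bit more concrete, here is a rough Python sketch, not a definitive implementation. It assumes each observation is a list of 28 numbers like the rows in the question. The function names (`pattern_distribution`, `compare_distributions`, `pca_2d`) are made up for illustration. Total variation distance is just one possible way to compare the two distributions, and standardizing the features before PCA is my own choice, made because the columns have very different scales.

```python
from collections import Counter

import numpy as np


def pattern_distribution(rows):
    """Relative frequency of each distinct row (the empirical multinomial)."""
    counts = Counter(tuple(r) for r in rows)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def compare_distributions(train_rows, test_rows):
    """Return (patterns only seen in test, total variation distance)."""
    train = pattern_distribution(train_rows)
    test = pattern_distribution(test_rows)
    # Patterns that never occurred in training are anomaly candidates.
    unseen = sorted(p for p in test if p not in train)
    # Total variation distance: 0 means identical, 1 means disjoint.
    patterns = set(train) | set(test)
    tv = 0.5 * sum(abs(train.get(p, 0.0) - test.get(p, 0.0)) for p in patterns)
    return unseen, tv


def pca_2d(rows):
    """Project the unique rows onto their first two principal components."""
    X = np.unique(np.asarray(rows, dtype=float), axis=0)
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    # Scale each feature; constant columns are left as zeros (assumption).
    X = X / np.where(std > 0, std, 1.0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T
```

You would call `compare_distributions(train_rows, test_rows)` once both sets are parsed into lists of rows, and scatter-plot the two columns returned by `pca_2d(train_rows)` to look for structure. Note that `pca_2d` needs at least two unique rows.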