|
So we have pairs of data for learning Why I need such NN (simplified a bit): I have table of data pairs (
and I have another table of data pairs on one machine:
lots of such lines on many I have one
And when a new slave would calculate its data I would be capable to append it to my NN. My main point is to get info (name or something) of NN type that supports Map/Reduce is grate concept for sorting lots of data at once. What to do if you have small parts of data and you need to reduce it all the time? Simple example - choosing a service for request. Imagine we have 10 services. Each provides services host with sets of request headers and post/get arguments. Each service declares it has 30 unique keys - 10 per set. service A: name id ... Now imagine we have a distributed services host. We have 200 machines with 10 services on each. Each service has 30 unique keys in there sets. but now to find to which service to map the incoming request we make our services post unique values that map to that sets. We can have up to or more than 10 000 such values sets on each machine per each service. service A machine 1 name = Sam id = 13245 ... service A machine 1 name = Ben id = 33232 ... ... service A machine 100 name = Ron id = 777888 ... So we get 200 * 10 * 30 * 30 * 10 000 == 18 000 000 000 and we get 500 requests per second on our gateway each containing 45 items 15 of which are just noise. And our task is to find a service for request (at least a machine it is running on). On all machines all over cluster for same services we have same rules. We can first select to which service came our request via rules filter 10 * 30. and we will have 200 * 30 * 10 000 == 60 000 000. So... 60 mil is definitely a problem... I hope to get on idea of mapping 30 * 10 000 onto some artificial neural network alike Perceptron that outputs 1 if 30 words (some hashes from words) from the request are correct or if less than Perceptron should return 0. And I’ll send each such Perceptron for each service from each machine to gateway. So I would have a map Perceptron <-> machine for each service. Can any one tall me if my Perceptron idea is at least “sain”? Or normal people do it some other way? Or if there are better ANNs for such purposes?
This question is marked "community wiki".
|
Are url-a, url-b, url-c class labels? If so does each machine have examples from each class? It seems like you might be able to use ensembles, but I may be failing to understand your exact problem.
Type of data is not so important string/int mainly. Each learning data set has unique data in it with garantee of no duplication. Main thing for me is to be capable to create NN from small parts and be capable to update it with new parts.
It looks like you're trying to do multi-class classification with mutually exclusive classes, but again I'm just guessing as it is not entirely clear from your question.
I would recommend editing your question and adding pertinent details regarding the problem you're trying to solve.
For example, are you doing regression or classification? If the later, are the classes mutually exclusive? Are the subproblems you're solving different instances (different data) of the same problem, or completely different problems? Are you interested in other methods which may be potentially better suited for solving this problem? If not, why have you decided neural networks are the best choice?
In my experience adding such details will greatly improve your chances of getting a quality answer.