I have a real-time signal processor that evaluates a real-valued input vector and produces a single real scalar output. It is implemented as a multi-layer, partially connected neural network using the FANN library on a standard x86 multicore CPU. Because of the real-time constraint, it's critical that the network can be evaluated on new input with as little response time as possible. The current trained network takes longer than desirable. Some approaches I've taken or considered to speed it up:

1) Using oracle learning to reduce the size of the network. (Performance analysis indicates that both the number of connections and the number of neurons need to be reduced to hit the target runtime.) Oracle learning generally led to good MSE at the targeted network size (90%+ correlation with the full-size network), but it tends to produce very weak estimates in the tails of the network output. Due to the problem domain these extreme points are highly relevant and need to be represented accurately. (Sketch 1 below shows the kind of tail-weighted training set I've been building.)

2) Using linear (or quadratic) sensitivities to estimate the next network value from the previous one. Most updates to the input vector are very similar to the previous vector, so the approach is to calculate linear sensitivities at the previous full network evaluation and use them for a very fast response without waiting for the whole network to evaluate. As above, this produces quite good MSE (95%+ correlation with the full evaluation), but it is weak at the same kind of extreme points, which also tend to be the points where the input vector moves away from the previous vector. (Sketch 2 below is the fast path I mean.)

3) Similar to the previous approach, but calculating individual per-neuron higher-order sensitivities: fully evaluate only the neurons and connections for which the linear approximation may be far off, and linearly approximate the rest of the network. The hope is that rapid changes could still be evaluated accurately on the subset of the network that needs it. I'm not sure what the most efficient implementation is here, and I haven't seen any literature on it. (Sketch 3 below is one way I picture it.)

4) Evaluating the full network in real time but using multi-threading across cores to cut the response time. This might be possible, but it's difficult: with multiple hidden layers and connections between most of the neurons, a single network evaluation isn't easily parallelizable, which suggests a lot of complex lock-free concurrent programming for possibly little to no reward. I've seen literature on multi-threaded training and batch evaluation, but I don't know of any work on the most efficient concurrent approach to a single evaluation. (Sketch 4 below is the simplest layer-parallel variant I've considered.)

Conclusion: Obviously I don't expect a silver bullet here, and I expect the payoff of each approach to depend heavily on the problem domain and the nature of the data. But I'd greatly appreciate any feedback on my ideas, tips from people who've dealt with similar issues, or knowledge of other angles that I might be unaware of. Thanks.
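Sketch 1 (approach 1): the tail-weighted distillation set. The idea is to label a sample of inputs with the full ("oracle") network and duplicate the rows whose outputs fall in the tails, so the reduced network's MSE can't ignore them. The FANN calls are the standard float-build API as I understand it (fann_run, fann_create_train from FANN 2.2); build_distill_set, TAIL_COPIES, and the tail thresholds are just illustrative names.

```c
#include "floatfann.h"
#include <stdlib.h>

#define TAIL_COPIES 8  /* placeholder oversampling factor */

/* Label n inputs with the full network, duplicating tail rows. */
struct fann_train_data *build_distill_set(struct fann *teacher,
                                          fann_type **inputs,
                                          unsigned n, unsigned n_in,
                                          fann_type tail_lo,
                                          fann_type tail_hi)
{
    unsigned i, j, c, total = 0, row = 0;
    fann_type *labels = malloc(n * sizeof(fann_type));

    /* First pass: label each input and count rows after duplication. */
    for (i = 0; i < n; i++) {
        labels[i] = fann_run(teacher, inputs[i])[0];
        total += (labels[i] < tail_lo || labels[i] > tail_hi)
                     ? TAIL_COPIES : 1;
    }

    /* Second pass: fill the training set, copying each tail row. */
    struct fann_train_data *d = fann_create_train(total, n_in, 1);
    for (i = 0; i < n; i++) {
        unsigned copies = (labels[i] < tail_lo || labels[i] > tail_hi)
                              ? TAIL_COPIES : 1;
        for (c = 0; c < copies; c++, row++) {
            for (j = 0; j < n_in; j++)
                d->input[row][j] = inputs[i][j];
            d->output[row][0] = labels[i];
        }
    }
    free(labels);
    return d;
}
```

The reduced network would then be trained on this set with fann_train_on_data as usual. Per-sample error weights would be cleaner than row duplication, but I don't believe FANN exposes those directly.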
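Sketch 2 (approach 2): the linear fast path. After each full evaluation I cache the input, the output, and dy/dx at that point (one backward pass through the network gives the input gradient); between full evaluations the output is estimated in O(n). All names here are illustrative, not FANN API.

```c
/* Cached state from the last full network evaluation. */
typedef struct {
    const float *x0;   /* anchor input  */
    const float *grad; /* dy/dx_i at x0 */
    float y0;          /* output at x0  */
    unsigned n;        /* input length  */
} lin_model;

/* First-order Taylor estimate: y(x) ~= y0 + grad . (x - x0).
 * Costs O(n) instead of a full forward pass. */
static float lin_estimate(const lin_model *m, const float *x,
                          float max_step, int *trust)
{
    float y = m->y0, dist = 0.0f;
    for (unsigned i = 0; i < m->n; i++) {
        float dx = x[i] - m->x0[i];
        y += m->grad[i] * dx;
        dist += dx * dx;
    }
    /* Flag when x has moved far from the anchor so the caller can
     * fall back to a full evaluation -- exactly the extreme points
     * where this approximation is weak. */
    *trust = (dist <= max_step * max_step);
    return y;
}
```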
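Sketch 3 (approach 3): incremental ("delta") propagation with selective exact re-evaluation. Each layer keeps its cached pre-activations and outputs; a step propagates input deltas, applies a cheap linear update where the pre-activation delta is small, and recomputes the activation exactly only where it isn't. FANN doesn't expose its internals this way, so this assumes the weights were exported to flat arrays; the layer is dense and sigmoid-activated for simplicity, and eps is a tuning knob.

```c
#include <math.h>

static float sigmoid(float s) { return 1.0f / (1.0f + expf(-s)); }

/* One layer step: n_out neurons, n_in inputs, row-major weights W. */
void layer_delta_step(const float *W,
                      float *sum,         /* cached pre-activations */
                      float *out,         /* cached outputs         */
                      const float *d_in,  /* input deltas           */
                      float *d_out,       /* output deltas, written */
                      unsigned n_out, unsigned n_in, float eps)
{
    for (unsigned j = 0; j < n_out; j++) {
        /* Accumulate the pre-activation delta; unchanged inputs
         * (zero delta) are skipped, which is where the savings come
         * from when few inputs move between evaluations. */
        float ds = 0.0f;
        for (unsigned i = 0; i < n_in; i++)
            if (d_in[i] != 0.0f)
                ds += W[j * n_in + i] * d_in[i];
        sum[j] += ds;

        if (fabsf(ds) <= eps) {
            /* Small delta: first-order update using the sigmoid's
             * derivative at the cached output, sigma' = out*(1-out). */
            d_out[j] = out[j] * (1.0f - out[j]) * ds;
        } else {
            /* Large delta: evaluate this neuron exactly, which also
             * flushes any drift accumulated by the linear updates. */
            d_out[j] = sigmoid(sum[j]) - out[j];
        }
        out[j] += d_out[j];
    }
}
```

The fabsf(ds) trigger is crude; a per-neuron curvature bound (which is what I meant by higher-order sensitivities) would decide more accurately when the exact branch is actually needed.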
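Sketch 4 (approach 4): the simplest parallel decomposition I can see. Within one layer the neurons are independent, so a single evaluation can be parallelized layer by layer with a barrier between layers, and no lock-free machinery is needed. Again this assumes the weights are in flat arrays outside FANN; whether it wins depends on layer width, since an OpenMP fork/join costs on the order of microseconds and narrow layers will run slower than the serial pass.

```c
#include <math.h>
/* compile with -fopenmp */

/* Evaluate one dense sigmoid layer with the neurons split across
 * cores; the implicit barrier at the end of the parallel for is the
 * only synchronization needed before the next layer starts. */
void eval_layer_parallel(const float *W, const float *bias,
                         const float *in, float *out,
                         int n_out, int n_in)
{
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < n_out; j++) {
        float s = bias[j];
        for (int i = 0; i < n_in; i++)
            s += W[j * n_in + i] * in[i];
        out[j] = 1.0f / (1.0f + expf(-s));
    }
}
```

The full evaluation is then a plain serial loop over layers calling this, so the threads only meet at layer boundaries rather than contending on individual neurons.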