I'm using a convolutional neural network to classify videos, and I'm experimenting with max pooling over video frames. I have already tried max pooling in a 3D neighborhood. Now I have implemented a pooling function that takes each pair of consecutive frames and keeps the one with the higher mean value. However, the Theano graph compiles very slowly, and the execution time isn't great either.
Is there a way to make this faster in Theano without writing CUDA C?
from numpy import prod
import theano.tensor as T
from theano.ifelse import ifelse

def pool_frames(input, input_shape):
    # reshape input (the images are on the last 2 dimensions)
    new_shape = (prod(input_shape[:-2]),) + input_shape[-2:]
    X = input.reshape(new_shape)
    # get max frame indices: one ifelse node per pair of frames
    frames_idx = []
    for i in range(0, new_shape[0], 2):  # pooling with param 2
        mean1, mean2 = X[i].mean(), X[i+1].mean()
        # keep frame i+1 only if its mean is strictly higher
        max_frame = ifelse(T.lt(mean1, mean2), i+1, i)
        frames_idx.append(max_frame)
    # get max frames
    indices = T.stack(frames_idx)
    output = X[indices]
    # reshape: the frame dimension is halved (integer division)
    out_shape = input_shape[:-3] + (input_shape[-3] // 2,) + input_shape[-2:]
    return output.reshape(out_shape)
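
For comparison, here is a loop-free sketch of the same selection rule, written in plain NumPy rather than Theano so that it can be run and checked on its own. The function name `pool_frames_np` is made up for this illustration, and the sketch assumes the frame dimension is third from last and has even length. The operations it relies on (reshape, mean over axes, argmax, integer-array indexing) should have counterparts on Theano tensors, but that translation is an untested assumption, not something verified here.

```python
import numpy as np

def pool_frames_np(x):
    """From each consecutive pair of frames, keep the one with the higher mean.

    x has shape (..., n_frames, height, width), with n_frames even (assumed).
    """
    lead = x.shape[:-3]
    n_frames, h, w = x.shape[-3:]
    # Group consecutive frames into pairs: shape (n_pairs, 2, h, w)
    pairs = x.reshape(-1, 2, h, w)
    # Mean value of every frame: shape (n_pairs, 2)
    means = pairs.mean(axis=(2, 3))
    # argmax returns the first index on ties, so the earlier frame wins a tie,
    # the same choice the loop version makes with its strict less-than test
    winner = means.argmax(axis=1)
    # Pick the winning frame of each pair with integer-array indexing
    pooled = pairs[np.arange(pairs.shape[0]), winner]
    return pooled.reshape(lead + (n_frames // 2, h, w))

# Small usage example: one video with 4 frames of size 2x2
x = np.zeros((1, 4, 2, 2))
x[0, 1] = 1.0   # frame 1 beats frame 0
x[0, 2] = 5.0   # frame 2 beats frame 3
x[0, 3] = 2.0
out = pool_frames_np(x)
```

The point of the sketch is that the whole selection becomes a fixed, small number of array operations, whereas the loop version adds one `ifelse` node to the graph for every pair of frames, which may be part of why compilation is slow.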
asked
Mar 26 '14 at 05:57
jolix