I'm using a convolutional neural network to classify videos, and I'm experimenting with max pooling over video frames. I have already tried max pooling over a 3D neighborhood. Now I have implemented a pooling function that, for each pair of consecutive frames, keeps the frame with the higher mean value. But my code compiles really slowly, and the execution time isn't great either.

Is there a way to make this faster with Theano, without coding CUDA C?

from numpy import prod
import theano.tensor as T
from theano.ifelse import ifelse

def pool_frames(input, input_shape):

    # reshape input (the images are on the last 2 dimensions)
    new_shape = (prod(input_shape[:-2]),)+input_shape[-2:]
    X = input.reshape(new_shape)

    # get max frame indices
    frames_idx = []
    for i in range(0, new_shape[0], 2): # pooling with param 2 (step over frame pairs)
        mean1, mean2 = X[i].mean(), X[i+1].mean()
        max_frame = ifelse(T.lt(mean1,mean2),i+1,i)
        frames_idx.append(max_frame)

    # get max frames
    indices = T.stack(frames_idx)
    output = X[indices]

    # reshape
    out_shape = input_shape[:-3]+(input_shape[-3]//2,)+input_shape[-2:]  # integer division keeps the shape an int
    return output.reshape(out_shape)
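
To make the intended behaviour concrete, here is a minimal loop-free reference sketch in plain NumPy (not Theano). It assumes the last three axes are (frames, height, width) and that the number of frames is even; the name `pool_frames_reference` is made up for illustration. The same steps (reshape, per-frame mean, elementwise select) would presumably map onto Theano's `reshape`, `mean` and `T.switch`, but I have not verified or benchmarked that.

```python
import numpy as np

def pool_frames_reference(x):
    """From each consecutive pair of frames, keep the one with the higher mean.

    x: array whose last three axes are (frames, height, width).
    Assumption: the number of frames is even.
    """
    lead = x.shape[:-3]
    f, h, w = x.shape[-3:]
    # Group frames into pairs: (..., f // 2, 2, h, w)
    pairs = x.reshape(lead + (f // 2, 2, h, w))
    # Mean of every frame: (..., f // 2, 2)
    means = pairs.mean(axis=(-2, -1))
    # True where the second frame of a pair has the strictly higher mean
    # (ties keep the first frame, as in the ifelse version above)
    take_second = means[..., 1] > means[..., 0]
    # Broadcast the per-pair choice over height and width
    return np.where(take_second[..., None, None],
                    pairs[..., 1, :, :],
                    pairs[..., 0, :, :])
```

The point of the sketch is that one comparison over all pairs replaces the per-pair `ifelse` nodes, so the graph size would no longer grow with the number of frames.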

asked Mar 26 '14 at 05:57

jolix

edited Mar 26 '14 at 06:03


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.