Hi All,

I'm running some experiments where I have a large number of vectors to regress on (but unfortunately not all at once, or CUDA would be the key), and have used scikit's SGD to get myself a weight vector in CSR format. My sparse vectors are also in CSR format, and I'm interested in the most efficient way to infer using these two CSR vectors. I've profiled a bunch of different approaches, and am currently using scipy's binopt csr_elmul_csr() on the X and W vectors, and then summing the result, but this is still slow as it returns a new sparse array, and the creation of that array is taking up too much time (according to my profiling). I'm considering writing a small element-wise multiplier and adder in cython, but wanted to know if there was already a 'standard' approach to this. In the case there isn't, a bit of advice as to the most efficient way to element-wise multiply a priori unsorted CSR matrices in cython would be nice.

Sorry for the semi-double question, looking forward to any help.

Thanks,

Gabe

EDIT: I asked this question in a state of mind haze, after discussion on my side I'm gunna post a new, more concise question.

asked Jul 20 '11 at 12:40

tehf0x's gravatar image

tehf0x
1114

edited Jul 21 '11 at 10:32


3 Answers:

Sorry to answer myself, but the fastest solution was to rewrite the simple XW.T code in cython (remember, these are sparse vectors not matrices). Here y is used to represent the W weights. Here the sparse vector is passed as 2 ndarrays, one of indices, the other of corresponding values. This is a CSRMatrix's native format if you're dealing with an nx1 matrix (aka vector), so you can easily pass CSRMatrices to this code. In a program where XW.T is getting done ~10k times, this went from taking 33s with a standard numpy XW.T to 0.76 seconds, so this is really* faster, and the long runs are more on the order of 1e6 to 1e7 calls, so you can imagine the time saved :-P

import numpy as np
cimport numpy as np
import cython

@cython.boundscheck(False)
def t_dot( np.ndarray[int, ndim=1] x_indices, 
           np.ndarray[np.float64_t, ndim=1] x_data,
           np.ndarray[np.float64_t, ndim=1] y):   
    """
    Dot product of the transpose.
    """
    #Using this type is important, this gives us exactly the same values as it
    #will give exact same values as numpy's equivalent X*W.T
    cdef np.float64_t sum
    cdef int idx, ptr
    sum = 0.0
    for idx, ptr in enumerate(x_indices):
        sum += x_data[idx] * y[ptr]    
    return sum
This answer is marked "community wiki".

answered Aug 30 '11 at 10:22

tehf0x's gravatar image

tehf0x
1114

edited Aug 30 '11 at 10:27

Could you push you code + data onto a http://gist.github.com ? That would make it easier to understand the details and reproduce your issues.

Maybe you should also ask on the scipy users mailing list (with the gist at hand).

answered Jul 20 '11 at 13:00

ogrisel's gravatar image

ogrisel
398464480

I think it might be difficult to get efficient calculations without turning it into a non-sparse representation first. However, you can always do sections of the matrices at a time without transforming the entire matrix at once (similar to how distributed computing does matrix math).

answered Jul 22 '11 at 16:02

nop's gravatar image

nop
1062310

Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.