|
Hi All, I'm running some experiments where I have a large number of vectors to regress on (but unfortunately not all at once, or CUDA would be the key), and have used scikit's SGD to get myself a weight vector in CSR format. My sparse vectors are also in CSR format, and I'm interested in the most efficient way to infer using these two CSR vectors. I've profiled a bunch of different approaches, and am currently using scipy's binopt csr_elmul_csr() on the X and W vectors, and then summing the result, but this is still slow as it returns a new sparse array, and the creation of that array is taking up too much time (according to my profiling). I'm considering writing a small element-wise multiplier and adder in cython, but wanted to know if there was already a 'standard' approach to this. In the case there isn't, a bit of advice as to the most efficient way to element-wise multiply a priori unsorted CSR matrices in cython would be nice. Sorry for the semi-double question, looking forward to any help. Thanks, Gabe EDIT: I asked this question in a state of mind haze, after discussion on my side I'm gunna post a new, more concise question. |
|
Sorry to answer myself, but the fastest solution was to rewrite the simple XW.T code in cython (remember, these are sparse vectors not matrices). Here y is used to represent the W weights. Here the sparse vector is passed as 2 ndarrays, one of indices, the other of corresponding values. This is a CSRMatrix's native format if you're dealing with an nx1 matrix (aka vector), so you can easily pass CSRMatrices to this code. In a program where XW.T is getting done ~10k times, this went from taking 33s with a standard numpy XW.T to 0.76 seconds, so this is really* faster, and the long runs are more on the order of 1e6 to 1e7 calls, so you can imagine the time saved :-P
This answer is marked "community wiki".
|
|
Could you push you code + data onto a http://gist.github.com ? That would make it easier to understand the details and reproduce your issues. Maybe you should also ask on the scipy users mailing list (with the gist at hand). |
|
I think it might be difficult to get efficient calculations without turning it into a non-sparse representation first. However, you can always do sections of the matrices at a time without transforming the entire matrix at once (similar to how distributed computing does matrix math). |