
I'm mainly interested in MCMC or variational methods. I currently hand-code my samplers using python, numpy, and scipy, which have lots of built-in niceties (lots of density functions and samplers out of the box, fast numerical functions, fast gradient optimizers for when I need one, fast enough vector/matrix operations, sparse matrix operations, the possibility to generate C code, etc.). However, numpy+scipy's support for sparse vectors (as opposed to matrices) is not good, and unless you fit into some very constrained scenarios, you're forced to abandon most of the things that make it run fast enough. Also, if you can't really code your algorithm as vector-matrix operations (if you need a conditional here and there), it starts to get painful to make your model do inference fast enough. Theano is not really an option either if you have complex code and/or a lot of sparsity (since it uses numpy's vectors).

So my question is: in which languages do you usually implement graphical models? Which libraries do you use for sampling from all the base distributions (dirichlets, gammas, exponentials, betas, etc), and for computing all those annoying numerical functions (incomplete beta, incomplete gamma, lngamma, exp-digamma, etc)?
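
For concreteness, here is a rough sketch of the kind of calls I mean, using numpy.random and scipy.special (the numbers are just placeholders):

    import numpy as np
    from scipy import special

    # Sampling from the base distributions
    theta = np.random.dirichlet(np.ones(10))   # Dirichlet
    lam = np.random.gamma(2.0, 1.0)            # Gamma(shape, scale)
    p = np.random.beta(1.0, 1.0)               # Beta
    x = np.random.exponential(1.0)             # Exponential

    # The "annoying" numerical functions
    lg = special.gammaln(5.0)                  # log-gamma
    dg = special.psi(5.0)                      # digamma
    ig = special.gammainc(2.0, 3.0)            # regularized lower incomplete gamma
    ib = special.betainc(2.0, 3.0, 0.5)        # regularized incomplete beta
    exp_dg = np.exp(special.psi(5.0))          # exp-digamma, as in variational updates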

I'm not really interested in pre-packaged frameworks such as MALLET, Infer.NET, or HBC, since it's not as easy as it could be (in my limited experience; I could be wrong) to futz with sampling strategies and model parameters the way I usually do when working. So, what are the main suggestions (apart from matlab, which is also not very fast for these cases, and for which I don't have a license)?

asked Jul 29 '10 at 17:52


Alexandre Passos ♦

Here is a list of graphical models packages in R: http://cran.r-project.org/web/views/gR.html.

(Jul 30 '10 at 14:13) Frank

3 Answers:

If not packages like HBC, BUGS, or VIBES, I usually prefer matlab. It can be slow, but there are lots of specialized toolboxes which provide efficient versions of many standard functions (such as the lightspeed toolbox by Tom Minka). Often you can write mex routines for your matlab code, which can in turn make use of various efficient C libraries. Also, a number of the inference packages already available in the community are matlab-based, so that's another reason I prefer to stick with matlab (or octave, if licensing is an issue).

BTW, you can also take a look at this page which lists many popular packages people use for inference in graphical models and Bayes nets, and some review articles comparing choices of different packages.

answered Jul 29 '10 at 20:50


spinxl39

What specifically didn't you like about the numpy/scipy sparse matrix support? In my case, we abandoned it around February when we discovered you can't do much with the sparse matrices; in particular, there is no sparse Cholesky decomposition, which is strange because it uses UMFPACK as one of its dependencies if you build it from source.

Recently, however, I discovered PySparse, which is exactly what I needed: sparse matrix and solvers for sparse linear systems.
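
For what it's worth, scipy.sparse does ship direct solvers for sparse linear systems in scipy.sparse.linalg (LU-based, rather than a true sparse Cholesky). Here is a rough sketch with a small made-up system:

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import spsolve, splu

    # A small made-up symmetric positive definite system in CSC format.
    A = sparse.csc_matrix(np.array([[4.0, 1.0, 0.0],
                                    [1.0, 3.0, 0.0],
                                    [0.0, 0.0, 2.0]]))
    b = np.array([1.0, 2.0, 3.0])

    x = spsolve(A, b)    # direct solve (uses a sparse LU factorization)
    lu = splu(A)         # keep the factorization around to solve repeatedly
    x2 = lu.solve(b)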

answered Jul 30 '10 at 14:24


Vicente Malave

You can have sparse matrices, but not sparse vectors; so implementing things like parameter sharing gets a lot slower than it should be.
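
A toy sketch of the workaround I mean: the closest thing is a 1-by-N sparse matrix, and every per-element update has to go through the matrix indexing machinery:

    import numpy as np
    from scipy import sparse

    n = 1000000
    v = sparse.lil_matrix((1, n))   # "sparse vector" faked as a 1 x n matrix
    v[0, 42] = 1.5                  # per-element updates go through 2-d indexing
    v[0, 1337] = -0.3

    w = sparse.csr_matrix(v)        # convert to CSR for arithmetic
    dense = np.random.randn(n)
    dot = w.dot(dense)              # a shape-(1,) array rather than a plain scalar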

(Jul 30 '10 at 17:30) Alexandre Passos ♦

Theano has some support for sparse operations, but only on the CPU, and it is not as complete as the support for dense representations.

I would advise you to have a look at cython to speed up computations on scipy.sparse matrices, using the COO representation, which offers direct access to the data array of non-zero values and the row/col arrays of matching indices.

When working with cython and numpy, be sure to follow all the tricks from this scipy 2009 tutorial on cython and the matching slides.
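
To make the COO suggestion concrete, here is a small pure-python sketch of the arrays you would loop over (the matrix is just a made-up example; in cython you would type the loop indices and compile it):

    import numpy as np
    from scipy import sparse

    # Made-up example matrix in COO format.
    data = np.array([1.0, 2.0, 3.0])
    row = np.array([0, 1, 2])
    col = np.array([2, 0, 1])
    A = sparse.coo_matrix((data, (row, col)), shape=(3, 3))

    # A.data holds the non-zero values, A.row / A.col the matching indices.
    total = 0.0
    for value, i, j in zip(A.data, A.row, A.col):
        total += value   # this is the loop you would type and compile in cython

    print(total, A.sum())  # the two should agree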

answered Jul 30 '10 at 14:26


ogrisel

Ah, cool, I didn't know about this tutorial. I already use theano for some things, but for the sort of graphical models I end up doing it feels clunky, since you can't really have loops and conditionals. My main problem with cython is that it does not let me stay in the interpreter, and that's how I run most experiments (to avoid parsing/loading things every time I change the code, for example).

(Jul 30 '10 at 17:31) Alexandre Passos ♦

I like cython a lot too. But the way to use it is to write as little in it as possible, and then use interpreted python for everything else. For example, in your case I'd probably code up the specific operations on sparse vectors and the tight loops in a cython class (which will hopefully be reused in the future...) and put all the rest of the code/experiments/etc. in python that uses this one cython class.

(Jul 31 '10 at 00:48) yoavg
