
Most of us know about MLOSS, SVMlight, LibSVM, and Weka. However, a lot of research groups release their algorithms only on their own websites, so those implementations don't get much publicity.
Can you point out a few such implementations?

This question is marked "community wiki".

asked Jul 04 '10 at 02:59

DirectedGraph

edited Jul 04 '10 at 02:59


10 Answers:

Very fast large scale online regression engine: vowpal wabbit.
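vowpal wabbit is driven from the command line and reads a plain-text example format: each line is a label, a `|`, and then feature:value pairs. A rough sketch of generating that format from Python (the `to_vw` helper is hypothetical, not part of vw itself):

```python
def to_vw(label, features):
    """Render one example in vowpal wabbit's plain-text input format:
    '<label> | <name>:<value> ...' (default namespace, no tag)."""
    feats = " ".join(f"{k}:{v:g}" for k, v in sorted(features.items()))
    return f"{label} | {feats}"

line = to_vw(1, {"height": 1.5, "length": 2.0})
```

You would write lines like this to a file and point `vw` at it.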

answered Jul 04 '10 at 04:25

Vaclav Petricek

Theano is a CPU and GPU compiler for mathematical expressions in Python. It combines the convenience of NumPy with the speed of optimized native machine language. For gradient-based machine learning algorithms (like training an MLP or convolutional net), Theano is from 1.6x to 7.5x faster than competitive alternatives (including those in C/C++, NumPy, SciPy, and Matlab) when compiled for the CPU and between 6.5x and 44x faster when compiled for the GPU. You can read more about it here.

This tutorial for Theano walks through building the following learning algorithms, and provides full source code:

  • Logistic Regression
  • Multilayer perceptron
  • Deep Convolutional Network
  • Auto Encoders, Denoising Autoencoders
  • Stacked Denoising Auto-Encoders (a state-of-the-art deep architecture)
  • Restricted Boltzmann Machines (RBMs)
  • Deep Belief Networks (DBNs)

answered Jul 04 '10 at 17:39

Joseph Turian ♦♦

edited Aug 02 '10 at 16:48

I think Theano's claim of being faster than C/C++ is fallacious, because all it does is convert Python code to optimized C/C++ code, which is then compiled for either the GPU or the CPU. One can write the same C/C++ code by hand and compile it with gcc using optimization flags (there are so many of them, and you can do incredible optimizations with them) and get the same or better performance. AFAIK, in their tests they didn't benchmark against that same optimized C/C++ code; instead they tested against Torch 5's code. They should have said faster than Torch 5.

(Jul 05 '10 at 18:29) cglr

@cglr: Look, let's say I claim my C/C++ code is faster than assembly. That's a fair claim if the experts who wrote the assembly benchmarks are doing worse than my compiled C/C++ code. You wouldn't get all semantic and say: "Well, all they're doing is transforming it into assembly that is better optimized than the assembly written by experts."

You are also incorrect that "all" it does is translate to optimized C/C++. The speed comes not just from the translation to C/C++, but also because it optimizes the mathematical expression before translating it.

That's why Theano's convolutional nets beat C/C++ code from Yann LeCun's group (eblearn), even though Yann LeCun invented convolutional nets and has 15 years of experience implementing them. It's because it's a huge pain-in-the-ass to manually optimize the mathematical formula corresponding to a convolutional net SGD update. It's a huge mathematical formula. So Theano wins because it actually optimizes this mathematical expression, and does so automatically.

(Jul 05 '10 at 18:37) Joseph Turian ♦♦

It can be "faster than C/C++" because it can, before generating the code, do some graph-based optimizations that might be unsafe for the compiler to do (it knows about aliasing in arrays, for example).

(Jul 05 '10 at 18:39) Alexandre Passos ♦

@Joseph, thanks for your explanation. I just realized that I skipped the part about the graph-based computational optimizations and concluded in my mind that it was just another code generator claiming to be faster than everything, like the ones I've been seeing recently.

About your example, I know I'm a bit weird, but I would read it that way. (And I think a good scientific thesis should be free of analogies and metaphors, and should also be sound, objective, and complete.) But you're probably right that most people would take the intended meaning.

(Jul 05 '10 at 19:44) cglr
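The point about optimizing the expression graph before generating code can be illustrated in miniature with plain Python (this is just the idea, not Theano's API): rewrites applied to the symbolic expression mean some operations never get executed at all.

```python
# Tiny symbolic expression "graph" (nested tuples) with a few algebraic
# rewrites, sketching why optimizing before code generation can beat a
# naive line-by-line translation.
def simplify(node):
    if isinstance(node, tuple):
        op, a, b = node
        a, b = simplify(a), simplify(b)
        if op == "*" and (a == 0 or b == 0):
            return 0                        # x*0 -> 0
        if op == "+" and a == 0:
            return b                        # 0+x -> x
        if op == "+" and b == 0:
            return a                        # x+0 -> x
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return a * b if op == "*" else a + b   # constant folding
        return (op, a, b)
    return node

expr = ("+", ("*", "x", 0), ("+", "x", 0))  # x*0 + (x+0)
reduced = simplify(expr)                    # the whole expression collapses to "x"
```

A naive translator would emit a multiply and two adds; the optimized graph emits nothing.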

HBC is a compiler for directed graphical models that automatically generates gibbs samplers. It supports collapsing, uncollapsing, maximizing, annealing, and sampling, and a few nonparametric bayesian models as well. It generates C code, so the final sampler is actually pretty fast.
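For a sense of what a generated Gibbs sampler does, here is a hand-written one for a toy model (a bivariate normal with correlation rho), in pure Python; HBC itself emits C from a model description, but the alternating conditional draws are the same idea.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling for (x, y) ~ bivariate normal, unit variances,
    correlation rho: each conditional is N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = (1 - rho * rho) ** 0.5
    x = y = 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)   # draw x | y
        y = rng.gauss(rho * x, sd)   # draw y | x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(0.8, 20000)
mx = sum(x for x, _ in samples) / len(samples)
my = sum(y for _, y in samples) / len(samples)
```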

answered Jul 04 '10 at 09:28

Alexandre Passos ♦

Andrew McCallum's Mallet is probably the easiest way to analyze a corpus of text documents using a wide range of NLP/ML techniques.
John Carroll's morpha is a lemmatizer, which is like an intelligent stemmer; that's pretty useful in text-related ML.
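The stemmer/lemmatizer distinction can be shown with a toy contrast (purely illustrative, nothing to do with morpha's actual rules): a stemmer just strips suffixes, while a lemmatizer maps each word to its dictionary form and so can handle irregular inflections.

```python
# Tiny illustrative exception table; a real lemmatizer like morpha
# uses a large lexicon plus morphological rules.
IRREGULAR = {"ran": "run", "geese": "goose", "better": "good"}

def toy_stem(word):
    """Naive suffix-stripping stemmer."""
    for suf in ("ing", "ed", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def toy_lemmatize(word):
    """Check the irregular table first, then fall back to stemming."""
    return IRREGULAR.get(word, toy_stem(word))

# The stemmer cannot recover "run" from "ran"; the lemmatizer can.
stem_result = toy_stem("ran")        # "ran"
lemma_result = toy_lemmatize("ran")  # "run"
```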

answered Jul 04 '10 at 23:59

Aditya Mukherji

edited Jul 05 '10 at 00:06

Morpha works great; too bad it's only usable from C/C++. I wasn't able to get it compiled for use in Java, since it's written in flex.

(Aug 05 '10 at 18:58) Daniel Duckworth

shogun | A Large Scale Machine Learning Toolbox http://www.shogun-toolbox.org/

"The machine learning toolbox's focus is on large scale kernel methods and especially on Support Vector Machines (SVM) [1]. It provides a generic SVM object interfacing to several different SVM implementations, among them the state of the art OCAS [21], Liblinear [20], LibSVM [2], SVMlight [3], SVMlin [4] and GPDT [5]. Each of the SVMs can be combined with a variety of kernels.

SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave and Python and is proudly released as Machine Learning Open Source Software."
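To sketch the objective those SVM objects solve, here is a minimal linear SVM trained by hinge-loss subgradient descent (Pegasos-style) in pure Python. This is illustrative only: it is not SHOGUN's API, and it omits the bias term for simplicity.

```python
def train_linear_svm(data, lam=0.01, epochs=100):
    """Pegasos-style SGD on  lam/2 ||w||^2 + avg max(0, 1 - y * w.x)."""
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for (x1, x2), y in data:              # labels y in {-1, +1}
            t += 1
            eta = 1.0 / (lam * t)             # standard Pegasos step size
            margin = y * (w[0] * x1 + w[1] * x2)
            w = [(1 - eta * lam) * wi for wi in w]   # regularization shrink
            if margin < 1:                    # hinge is active: push toward y*x
                w[0] += eta * y * x1
                w[1] += eta * y * x2
    return w

data = [((2, 2), 1), ((3, 1), 1), ((-2, -1), -1), ((-1, -3), -1)]
w = train_linear_svm(data)
preds = [1 if w[0] * x1 + w[1] * x2 > 0 else -1 for (x1, x2), _ in data]
```

A kernelized solver like those SHOGUN wraps replaces the dot product with a kernel evaluation.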

answered Jul 13 '10 at 03:03

Georgiana Ifrim

megam is a fast multiclass logistic-regression / perceptron / multitron / passive-aggressive optimizer.

lasvm is a very fast (approximate) SVM solver.
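For the logistic-regression side, here is a bare-bones binary version trained with SGD in pure Python, sketching the likelihood megam maximizes (megam itself handles multiclass and uses stronger optimizers; this is a sketch, not its interface).

```python
import math

def train_logreg(data, lr=0.1, epochs=100):
    """SGD on the log-likelihood of binary logistic regression, y in {0, 1}."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x))
            p = 1 / (1 + math.exp(-score))            # predicted P(y=1 | x)
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi             # gradient ascent step
    return w

data = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -2.0), 0), ((-2.0, -0.5), 0)]
w = train_logreg(data)
probs = [1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
         for x, _ in data]
```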

answered Aug 03 '10 at 01:17

yoavg

libDAI is a pretty good C++ toolkit that supports various inference methods in factor graphs.

From the web site: libDAI is a free/open source C++ library that provides implementations of various (approximate) inference methods for discrete graphical models. libDAI supports arbitrary factor graphs with discrete variables; this includes discrete Markov Random Fields and Bayesian Networks.
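The core operation libDAI implements can be shown on a toy case: sum-product message passing on a 3-variable chain x0 - x1 - x2, checked against brute-force enumeration. On a tree like this BP is exact; libDAI's loopy BP applies the same message updates to graphs with cycles (illustrative pure Python, not libDAI's C++ interface).

```python
import itertools

phi = [[1.0, 2.0], [1.5, 0.5], [1.0, 1.0]]   # unary potentials for x0, x1, x2
psi = [[1.0, 0.3], [0.3, 1.0]]               # shared pairwise potential

# Sum-product messages into x1 from each neighbor, then the belief at x1.
m_from_0 = [sum(phi[0][a] * psi[a][b] for a in range(2)) for b in range(2)]
m_from_2 = [sum(phi[2][c] * psi[b][c] for c in range(2)) for b in range(2)]
bel = [phi[1][b] * m_from_0[b] * m_from_2[b] for b in range(2)]
z = sum(bel)
bp_marginal = [v / z for v in bel]

# Brute force: sum the unnormalized joint over all 2^3 assignments.
weights = [0.0, 0.0]
for a, b, c in itertools.product(range(2), repeat=3):
    weights[b] += phi[0][a] * phi[1][b] * phi[2][c] * psi[a][b] * psi[b][c]
zz = sum(weights)
exact = [v / zz for v in weights]
```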

answered Aug 02 '10 at 21:06

Frank

That's also apparently the only framework out there that implements Loop Corrected Belief Propagation, which, in Joris Mooij's comparisons (in his thesis), beat all other major approximate inference methods on accuracy.

(Aug 03 '10 at 02:30) Yaroslav Bulatov

MATLABArsenal has a pretty nice collection of various classification algorithms provided as matlab wrappers. It also interfaces with NETLAB, WEKA, libSVM, SVMLight: http://www.informedia.cs.cmu.edu/yanrong/MATLABArsenal/MATLABArsenal.htm

answered Jul 13 '10 at 03:41

spinxl39


For a Matlab-based ML tool: SPIDER http://www.kyb.tuebingen.mpg.de/bs/people/spider/ "intended to be a complete object orientated environment for machine learning in Matlab"

answered Jul 12 '10 at 05:23

Georgiana Ifrim


FastInf (v1.0, as presented at ICML 2010)

The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods.
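One of the methods listed, mean field, fits in a few lines for a two-variable binary model; here is a toy fixed-point iteration checked against exact enumeration (illustrative pure Python, not FastInf's C++ interface). For p(x1, x2) proportional to exp(a1*x1 + a2*x2 + w*x1*x2) with x_i in {0, 1}, mean field iterates q_i = sigmoid(a_i + w * q_j).

```python
import math, itertools

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

a1, a2, w = 0.5, -0.3, 0.2        # weak coupling, so mean field is accurate

# Mean-field fixed-point iteration for the two marginals q1, q2.
q1 = q2 = 0.5
for _ in range(100):
    q1 = sigmoid(a1 + w * q2)
    q2 = sigmoid(a2 + w * q1)

# Exact marginal p(x1 = 1) by enumerating all four configurations.
num = den = 0.0
for x1, x2 in itertools.product((0, 1), repeat=2):
    p = math.exp(a1 * x1 + a2 * x2 + w * x1 * x2)
    den += p
    if x1:
        num += p
exact = num / den
```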

answered Aug 05 '10 at 17:40

Frank


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.