Does anyone know of examples of wrapping an existing piece of supervised learning software to output models in PMML format? Of particular interest are learners that just take in labeled vectors of numbers as training data and put out models that are pretty much just coefficient vectors (liblinear, SVMlight, BXRtrain, BOW, etc.). That is, they don't have any smarts about data types, ranges of legal values of features, etc.: something else is assumed to deal with that and present the learner with appropriate numeric vectors.

For such software, all the interesting data dictionary stuff would need to be supplied alongside the input data if it's going to show up in the PMML model that's output. There's nothing conceptually difficult about this: what I'm curious to see is whether any conventions, design patterns, etc. have grown up in the PMML community for doing this.
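To make the idea concrete, here is a rough sketch of the kind of wrapper I have in mind, not an established convention. It assumes the learner has already produced a coefficient vector and an intercept, and that the field metadata arrives as a separate list of dicts. The function name `linear_model_to_pmml`, the `fields` layout (`name`, optional `low`/`high`), and the target name `score` are all made up for illustration. It writes a PMML 4.0 `RegressionModel` that scores the raw linear decision value; a classifier would need per-class `RegressionTable` elements on top of this.

```python
import xml.etree.ElementTree as ET


def linear_model_to_pmml(fields, coefficients, intercept, target="score"):
    """Wrap a coefficient vector in a minimal PMML RegressionModel.

    `fields` is a hypothetical side-channel data dictionary: one dict per
    feature with a "name" and optional "low"/"high" bounds. The learner
    itself never sees this metadata; it only exists so the PMML can carry it.
    """
    if len(fields) != len(coefficients):
        raise ValueError("need exactly one coefficient per field")

    pmml = ET.Element(
        "PMML", {"version": "4.0", "xmlns": "http://www.dmg.org/PMML-4_0"}
    )
    ET.SubElement(pmml, "Header", {"description": "wrapped linear model"})

    # Data dictionary: every feature plus the predicted field.
    dictionary = ET.SubElement(
        pmml, "DataDictionary", {"numberOfFields": str(len(fields) + 1)}
    )
    for field in fields:
        data_field = ET.SubElement(
            dictionary,
            "DataField",
            {"name": field["name"], "optype": "continuous", "dataType": "double"},
        )
        # Only emit a legal-range Interval when the caller supplied bounds.
        if "low" in field or "high" in field:
            interval = {"closure": "closedClosed"}
            if "low" in field:
                interval["leftMargin"] = repr(field["low"])
            if "high" in field:
                interval["rightMargin"] = repr(field["high"])
            ET.SubElement(data_field, "Interval", interval)
    ET.SubElement(
        dictionary,
        "DataField",
        {"name": target, "optype": "continuous", "dataType": "double"},
    )

    model = ET.SubElement(
        pmml, "RegressionModel", {"functionName": "regression"}
    )
    schema = ET.SubElement(model, "MiningSchema")
    for field in fields:
        ET.SubElement(schema, "MiningField", {"name": field["name"]})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "predicted"})

    # The model itself: an intercept plus one NumericPredictor per feature.
    table = ET.SubElement(model, "RegressionTable", {"intercept": repr(intercept)})
    for field, weight in zip(fields, coefficients):
        ET.SubElement(
            table,
            "NumericPredictor",
            {"name": field["name"], "coefficient": repr(weight)},
        )
    return pmml


if __name__ == "__main__":
    example_fields = [{"name": "x1", "low": 0.0, "high": 1.0}, {"name": "x2"}]
    root = linear_model_to_pmml(example_fields, [0.5, -2.0], 0.25)
    print(ET.tostring(root, encoding="unicode"))
```

The point of the design is that the learner stays untouched: the wrapper only joins the learner's numeric output with metadata it was handed separately.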

The PMML website lists software that either consumes or produces models in PMML, but this is mostly commercial closed-source software. The programs with open-source versions listed there (RapidMiner and WEKA, as far as I can spot) are rather complex data mining suites. What I'd like to see is an example of a minimalist wrapping of a simple, one-trick-pony kind of learner.

asked Aug 09 '11 at 22:46

Dave Lewis


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.