Hi, I am one of the developers of scikit-learn. The project has recently gained steam and is moving fast thanks to new contributors. The design goals are:
wide coverage of cutting-edge algorithms with a simple-to-use, unified API
a permissive license for embedding (simplified BSD) and few dependencies (numpy + scipy)
optimized yet maintainable implementations, using Cython when useful
scalable algorithms with dense and sparse representations of the features (useful, for instance, for text classification with tens of thousands of samples and ~100,000 features)
well tested: >= 500 tests that run in under 15s, with a coverage of ~85% and improving (see our buildbot)
well documented too, with worked examples (could still be very much improved though)
readable source code respecting PEP8 conventions
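To make the "unified API" and "dense and sparse representations" points above concrete, here is a minimal sketch. It assumes a recent release where the package is imported as `sklearn` and where `LogisticRegression` and `LinearSVC` accept both numpy arrays and scipy sparse matrices; the toy data is made up for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Tiny made-up dataset: 6 samples, 3 features, 2 well separated classes.
X_dense = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 1.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# The same data stored in a sparse (CSR) representation.
X_sparse = sparse.csr_matrix(X_dense)

# Every estimator exposes the same fit / predict methods,
# whatever the model and whatever the feature representation.
predictions = {}
for model in (LogisticRegression(), LinearSVC()):
    for name, X in (("dense", X_dense), ("sparse", X_sparse)):
        model.fit(X, y)
        predictions[(type(model).__name__, name)] = model.predict(X)

for key, predicted in predictions.items():
    print(key, predicted)
```

Swapping one model for another only changes the constructor call; the rest of the code stays the same.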
We don't plan to have a complete dataflow programming model as MDP does (but the two projects are collaborating a bit to make it easy to use scikit-learn algorithms in MDP nodes). It should also be possible to wrap scikit-learn models into Orange components if your users want the rich user interface of that framework (but AFAIK nobody has tried so far).
Current limitations:
the current API requires loading the training data in memory, but this will evolve to handle streaming / large scale datasets (by integrating the work done by Peter in bolt)
tooling to perform cross validation & performance evaluation across all algorithms that respect the API (duck typing)
no command line interface: right now the user has to know basic Python / numpy (the IPython shell is the most popular UI among scikit-learn devs). A generic CLI might appear in the coming months though. We will probably never offer more than a CLI in terms of interface.
very focused on supervised learning and linear models right now. More unsupervised approaches are planned or under development though.
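The duck-typing idea mentioned in the list above (cross validation that works for any algorithm respecting the API) can be sketched in plain Python, without scikit-learn at all. Everything here is hypothetical and for illustration only: `MajorityClassifier`, `NearestNeighborClassifier` and `k_fold_accuracy` are not scikit-learn names, and the fold assignment is deliberately simplistic.

```python
class MajorityClassifier:
    """Hypothetical toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighborClassifier:
    """Hypothetical toy estimator: 1-nearest-neighbor with squared Euclidean distance."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: dist(x, self.X_[i]))]
            for x in X
        ]


def k_fold_accuracy(make_estimator, X, y, k=3):
    """Mean accuracy over k folds for any object that exposes fit and predict."""
    n = len(X)
    scores = []
    for fold in range(k):
        # Sample i goes to the test set of fold (i % k).
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        estimator = make_estimator()
        estimator.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predicted = estimator.predict([X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predicted, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Made-up data: two small clusters, one per class.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]
y = [0, 0, 0, 1, 1, 1]

nn_score = k_fold_accuracy(NearestNeighborClassifier, X, y)
majority_score = k_fold_accuracy(MajorityClassifier, X, y)
print(nn_score, majority_score)
```

The helper never checks the type of the estimator; it only relies on `fit` and `predict` being present, which is what lets one evaluation routine serve every model.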
We also plan to provide standard feature extractors for text classification / clustering (work under way, mostly done), image classification (some basic examples), face recognition (planned) and maybe audio / speech for segmentation / classification / fingerprinting (prospective, nothing done yet). The goal is for the user to have worked examples with sane default parameters to build upon, and not just machine learning building blocks that require knowing the inner workings to apply to a concrete use case.
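As an illustration of what such a worked example with default parameters might look like for text classification, here is a small sketch. It assumes the names found in recent releases (`TfidfVectorizer`, `make_pipeline`, `LinearSVC`, imported from `sklearn`), which may differ from what was available when this was written; the documents and labels are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training documents for two topics.
docs = [
    "the team won the football match",
    "a late goal decided the match",
    "the striker scored twice in the final",
    "the compiler reported a type error",
    "fix the syntax error in the python script",
    "the function returns a list of strings",
]
labels = ["sports", "sports", "sports",
          "programming", "programming", "programming"]

# Feature extraction (sparse TF-IDF vectors) and a linear model,
# both left at their default parameters.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)

predicted = model.predict([
    "the striker scored a goal in the match",
    "the compiler reported a syntax error",
])
print(list(predicted))
```

The user only handles raw strings and labels; the sparse feature matrix is built and consumed inside the pipeline.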