I know this sounds like a dumb question, since there are many datasets available out there. Personally, to understand an algorithm and the theory behind it, I need to watch videos, read through lecture notes and textbooks, and finally implement the algorithm once myself. All of those except the last one are easy to find. I am looking for a website that provides both sample datasets and other people's implementations, which can serve as a guideline for checking whether I have implemented an algorithm correctly. I know this is a bad metaphor, but I am going to mention it anyway: it is like doing past papers to prepare for an exam; without looking at the solutions, you can never be sure you got the correct result. It would be efficient to have a website that contains both the past papers (datasets) and the solutions (other people's implementations), so that every time I implement an algorithm I can easily check and cross-validate my results against others'. I know there is no definitive answer, especially when it comes to implementing ML algorithms, and the obvious way to master ML is to get my hands dirty and play around with different datasets. However, I am looking for an intermediate step that can help me and others learn and master ML algorithms efficiently before we start playing around with all the different datasets out there. Update: I am interested primarily in R, then Python.
You could give scikit-learn a try. It is implemented in Python, is open source, and gives you exactly what you ask for (good documentation, examples, datasets, etc.). Python is also a fine language for beginners. http://scikit-learn.org/stable/
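As a minimal sketch of the "check your answer against the solutions" workflow from the question, assuming ordinary least squares as the algorithm you are learning and the diabetes dataset bundled with scikit-learn as the sample data (both are my choices for illustration, not something the question specifies): you write the algorithm yourself (here a hypothetical helper, `my_least_squares`, that solves the normal equations with NumPy) and compare its output with scikit-learn's `LinearRegression` fitted on the same data.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression


def my_least_squares(X, y):
    """Hypothetical hand-written OLS: returns (intercept, coefficients)."""
    # Prepend a column of ones so the intercept is estimated along with the weights.
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    # Solve the normal equations (X1^T X1) w = X1^T y.
    w = np.linalg.solve(X1.T @ X1, X1.T @ y)
    return w[0], w[1:]


# Sample data: the diabetes regression dataset that ships with scikit-learn.
X, y = load_diabetes(return_X_y=True)

# "Your" implementation.
my_intercept, my_coef = my_least_squares(X, y)

# Reference implementation used as the "solution" to check against.
ref = LinearRegression().fit(X, y)

# Compare up to floating-point tolerance rather than exact equality.
print("intercept matches:", np.isclose(my_intercept, ref.intercept_))
print("coefficients match:", np.allclose(my_coef, ref.coef_))
```

Solving the normal equations directly is the textbook formulation, but it can be numerically unstable on badly conditioned data, which is one reason library implementations use more robust least-squares solvers. For algorithms with randomness or iterative fitting, you would compare results up to a looser tolerance, or compare evaluation metrics rather than exact parameters.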
I would have thought you would be better off finding a textbook that does this. Also, you should specify which language you are interested in; for MATLAB there is certainly plenty. scikit-learn is useful as a black box, but not for reading the code (it is perhaps too sophisticated for a learner to go through).
You should take a look at The Elements of Statistical Learning. The book was written by several Stanford professors, is available for free online, and has several accompanying datasets and R packages, which include example code. It is updated every few years to keep it current, and was most recently updated in January. A blogger for Revolution Analytics (developers of an important R distribution) called it "The go-to bible for this data scientist [himself] and many others".