Last quarter I took a NN/ML course outside of my department (I am currently in an MS Applied Statistics program -- though I may end up in that department for PhD CS studies after completing my current program) because I am interested in the topic overall and I had previous experience with genetic algorithms.
In the course we used the Mitchell text. I also used the opportunity to grab a few other texts to dive into some topics in greater depth and get exposure to other topics entirely: the Hastie, Tibshirani, and Friedman text; Bishop's NNPR text; Koller's Probabilistic Graphical Models text; and Kolaczyk's Statistical Analysis of Network Data text.
Now, based on the TOC of PRML, I feel like these other books cover much if not all of the material in that other Bishop text. Would you agree that that is a fair assessment? I know, however, that PRML is a very popular text among those in computer science, and I've seen it mentioned a few times on this site. So, for those of you familiar with its content: do you feel I'd still be missing out on something without the other Bishop text, or can I get by without it?
Many of these texts have a lot of overlapping content, so it is interesting to hear essentially the same thing stated several different (and sometimes not so different) ways. But I would also be interested in recommendations of more advanced texts, since essentially all of these textbooks come with the standard "targeted at advanced undergraduates and beginning graduate students" label.
Don't get me wrong: I understand that the cutting edge will be found in journals and conference proceedings, and I have been reading through many of them (COLT, AAAI, IEEE, etc.), but there must be at least a couple of textbooks more advanced than the ones already mentioned. In our class the Mitchell text was used to minimize the mathematical exposure, and PRML was suggested for heavier math content. I am, however, interested in texts with even more advanced mathematics still. Potential topics of interest in a more advanced text: more on ensemble methods, asymptotic properties, and stability. Other advanced topics are welcome as well, of course. I know I can find out a lot about the latter two in actual math and stats texts, but I was hoping to see more about them in the context of machine learning.
I have a paper (PDF) I wrote as my final for that class. It is an overview and could use a lot of added detail (it more than satisfied the requirements of the final, though), but since I was trying to present the basics to a wider audience, I avoided the formulations and left those details to the references. Any thoughts on, or references for, the questions I put forward at the end would be very much appreciated. Even general feedback on the paper would be nice.
(Sorry for it not being concise, but I figured one lengthy post in this case might make more sense than breaking it up into numerous more focused posts -- though I suppose I'm about to find out whether you all agree with that or not ;) )
This question is marked "community wiki".
asked Apr 26 '11 at 04:52
I'll try to answer as concisely and usefully as possible.
First, you'll need the basics of statistics, which I'm guessing you are getting from your masters. There are a number of books and papers that are useful for this, and most of the statistics you'll ever need is already well developed. Most modern models were described in the '70s and '80s. Blei actually said that if something was not figured out in the '80s, it is a really difficult problem.
Given the list of books you presented, I do recommend the Bishop PRML book; look at it as the updated version of Mitchell's. I do like Mitchell's book, but the problem is that it is somewhat old and lacks a lot of the current and widely used algorithms: Mitchell does not go into variational inference or SVMs, for example.
With that said, machine learning is a wide field, and you could delve into a single theme for your entire PhD, and perhaps your entire research life. It is hard to find someone who does research on more than a couple of topics.
You need a base book that lets you survey most of the algorithms you can choose from, and after that you'll need other, more specialized books. For example, if you are into nonparametric models, you might start by learning Gibbs sampling, distributions, and mixture models from Bishop, and then you'll have to read Ghosh's "Bayesian Nonparametrics" book.
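To make the Gibbs-sampling starting point concrete, here is a minimal sketch of the kind of exercise Bishop's mixture-model chapters set up: Gibbs sampling for a two-component 1-D Gaussian mixture with known variance, alternating between sampling component assignments and component means. All names and parameter choices here are illustrative, not taken from any particular book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from two well-separated Gaussian components.
data = np.concatenate([rng.normal(-3.0, 1.0, 150),
                       rng.normal(3.0, 1.0, 150)])

K = 2            # number of components
sigma2 = 1.0     # known observation variance
prior_var = 10.0  # N(0, prior_var) prior on each component mean
mu = rng.normal(0.0, 1.0, K)  # initialise component means

for _ in range(200):
    # 1. Sample assignments z_i given the current means
    #    (equal mixing weights, so only the likelihood matters).
    logp = -0.5 * (data[:, None] - mu[None, :]) ** 2 / sigma2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=row) for row in p])

    # 2. Sample each mean from its Gaussian full conditional:
    #    precision = n_k / sigma2 + 1 / prior_var.
    for k in range(K):
        x_k = data[z == k]
        post_var = 1.0 / (len(x_k) / sigma2 + 1.0 / prior_var)
        post_mean = post_var * x_k.sum() / sigma2
        mu[k] = rng.normal(post_mean, np.sqrt(post_var))

print(sorted(mu))  # the two sampled means should land near -3 and 3
```

Extending this to unknown variances, unknown mixing weights, and eventually a Dirichlet-process prior on the number of components is the path that leads naturally into the Bayesian nonparametrics literature.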
Pick a topic first and then start thinking about the books; otherwise you might spend a lot of time reading a book that is not closely related to your research. (That isn't a bad thing at all, but if you are in a PhD, the last thing you want to do is read something that won't be as useful as it should be.)
Some conferences to look into:
If you wish to learn the latest on GAs, try GECCO, the top conference on the topic.
Hope this helps.
answered Apr 26 '11 at 07:08
Leon's answer is pretty good; I'll just add a bit. The problem I have is not collecting reading material but actually getting the reading done. You can find many very good textbooks online these days; there is a post dedicated to them here on Metaoptimize. They will be more than sufficient to fill the gaps in the foundational topics between the books you have already collected. I would advise spending your time learning from the materials you already have, and then, as Leon suggested, finding more specialised texts to pursue your interests in greater depth.
answered Apr 26 '11 at 08:55