|
I need to quickly get familiar with the basics of statistical analysis and data mining. To do so, I think it's reasonable to learn some fundamental terminology and techniques, so that I can ask sensible questions and make reasonable assumptions. What are some good ways to get started? Fundamental reading, etc.?
This question is marked "community wiki".
|
|
Statistical analysis and data mining are fields both wide and deep, but if you are just trying to get familiar with the core concepts in a hurry, I would suggest reading through StatSoft's Electronic Statistics Textbook and filling in the gaps with the relevant Wikipedia entries (this goes for all of these readings). For more depth, you might try Introductory Statistics (8th Edition) by Neil A. Weiss; it contains all the basics for understanding statistical concepts, though it does not have much on data mining. For a more comprehensive introductory/applied approach to analysis and data mining, you might want to pick up the Handbook of Statistical Analysis and Data Mining Applications. I have serious reservations about the introductory section's quality, but it does cover a lot of ground in an easily comprehensible fashion.
This answer is marked "community wiki".
|
|
It would help to know whether you have a specific question you are trying to answer or a project you are working on. Then you can tailor your reading to that area. I knew a great deal about regression before realizing that what I really needed for my project was time series analysis. The StatSoft textbook is a good overview, but as you read, try to think about how the material applies to the problem you will be using the techniques on; otherwise you won't get much out of it. Having an application in mind (even if it is incomplete) will help you retain what you study.
|
|
If you can program, Programming Collective Intelligence is one of my favorite books because it provides a fun introduction to many of the techniques in the data mining field. For basic statistics, Khan Academy might be a good resource to check out. I found The Lady Tasting Tea to be an enjoyable introduction to the history of statistics. Fooled by Randomness by Taleb is one of my favorite books of all time. John Allen Paulos has a number of entertaining books about statistics (Innumeracy and A Mathematician Reads the Newspaper, among others). What do you need to learn this stuff for? Sorry if these suggestions are way below your level. I'm just a beginner/hack (I don't do anything with SAS, R, etc. yet), and the main tools I use for "statistical analysis" are SQL, PivotTables, and Python. But I hope to get good in this field over the next 5 years or so.
|
|
This may not be the easiest/quickest way, but it's what I did: work your way through MacKay's book (free online!) and refer often to Wikipedia. Once you understand the basics of Bayes' Theorem and how it is applied, Data Analysis: A Bayesian Tutorial seems to be pretty good.
|
Thanks for the answers, there are some great suggestions here.
I apologize for being too broad. Specifically I'm tasked with writing recommendation algorithms for two applications my team is developing. They're slightly different, in ways I can't fully quantify without a better beginner's skillset.
My initial attempts have been successful, but naive.
I'll reopen this question, since people seem to like it. I also suggest you post a separate, more specific question about your application area (recommender systems). You might be able to make a lot of progress by focusing on learning this one application area, and then expanding your knowledge more broadly.