
People often throw around the terms IR, ML, and data mining, but I have noticed a lot of overlap between them.

From people with experience in these fields, what exactly draws the line between these?

Bonus: explain more about the types of problems each of these tries to solve (how they are different or similar).

asked Aug 05 '10 at 14:21

Boris Yeltz

edited Aug 05 '10 at 14:22


2 Answers:

ML (machine learning) is a field of computer science that studies processes and algorithms that can learn, for many definitions of learning. Machine learning concerns itself with classification, regression, clustering, sequence labeling, the more general problem of structured learning, information extraction (i.e., programs that learn to extract specific types of information), etc.

Data mining is looking at large collections of data to extract relevant information. It can use machine learning methods, but a lot of data mining can be just collecting simple statistics such as counts and averages. A way to use machine learning for data mining is, for example, manually scanning through credit card statements to find fraudulent ones, feeding them to a machine learning classifier, and having the classifier detect other fraudulent transactions.
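To make the fraud example concrete, here is a minimal sketch of that workflow: a few hand-labeled transactions train a classifier, which then labels unseen ones. Everything in it is an assumption for illustration: the two features, the made-up transactions, and the choice of a nearest-centroid classifier (picked only because it fits in a few lines, not because it is what a real fraud system would use).

```python
# Toy illustration only: the transactions, features, and labels are made up.
from statistics import mean


def centroid(rows):
    """Average each feature column of a list of equal-length tuples."""
    return tuple(mean(col) for col in zip(*rows))


def train(labeled):
    """Group hand-labeled examples by label and return {label: centroid}."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: dist2(model[label]))


# Hypothetical features: (amount in dollars, transactions in the past hour).
labeled = [
    ((12.0, 1), "ok"),
    ((30.0, 2), "ok"),
    ((25.0, 1), "ok"),
    ((900.0, 9), "fraud"),
    ((1200.0, 7), "fraud"),
]
model = train(labeled)

# The classifier now labels transactions nobody has looked at by hand.
unlabeled = [(18.0, 1), (1000.0, 8)]
predictions = [predict(model, t) for t in unlabeled]
print(predictions)
```

Note that the features are left unscaled here, so the dollar amount dominates the distance; a real pipeline would normalize features and use far more labeled data.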

IR (information retrieval) is search. It assumes a user makes a query, looking for something, and you want to return the best results to them as fast as you can. You can use data mining and machine learning to add more information to the index from which you retrieve the information. You can also use machine learning to learn to retrieve more useful information from the index (the learning-to-rank problem).
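As a rough sketch of the "query in, ranked results out" loop described above, the following builds a tiny inverted index and ranks documents with a simple TF-IDF-style score. The documents, the whitespace tokenizer, and the exact scoring formula are assumptions chosen for brevity; a learned ranker would replace the hand-written score.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Inverted index: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Return doc ids ordered by summed tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; break ties by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical three-document collection.
docs = {
    "d1": "machine learning studies algorithms that learn",
    "d2": "data mining extracts information from large data collections",
    "d3": "information retrieval is search over an index",
}
index = build_index(docs)
results = search(index, len(docs), "information retrieval search")
print(results)
```

The index is built once up front so that each query only touches the postings of its own terms, which is what makes retrieval fast.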

Of course, when you move to more advanced problems, the intersection between these areas gets bigger.

answered Aug 05 '10 at 14:34

Alexandre Passos ♦

Information retrieval (IR) is usually associated with search and retrieval of text data in various forms (text documents, pages on the web, etc.). Often, other forms of data are also available apart from the text in the documents (e.g., hyperlink information, metadata, etc.) that can facilitate such tasks.

Data mining is a bit more general than IR, as it also deals with other types of data from domains such as biology (e.g., gene-expression data), finance (e.g., stock prices), business (e.g., customer profiling), spatial data (e.g., remote sensing, visualization), etc.

IMO, both IR and data mining are somewhat applied disciplines: they make use of standard data analysis algorithms typically developed in the machine learning and pattern recognition communities, though they often involve customizing those algorithms for specific needs (and for the available domain information).

Machine learning, on the other hand, is basically about developing models to solve both basic and advanced data analysis problems such as classification, clustering, ranking, etc. Also, a great deal of research in ML is on scaling up existing algorithms to real-world datasets (e.g., new optimization techniques for a particular algorithm, new inference methods, etc.). Besides, a major thrust in ML is analyzing the various properties of these algorithms, such as generalization, consistency, and various properties of estimators.

answered Aug 05 '10 at 14:56

spinxl39

edited Aug 05 '10 at 15:00

So what I gather from this comment is that data mining is basically the application of ML concepts, while ML is typically more about the study and research of the algorithms themselves?

(Aug 05 '10 at 15:08) Boris Yeltz

To an extent, yes. But that is not to say that data miners don't develop new algorithms. :) You will often see papers in data mining conferences (KDD, SDM, ICDM, etc.) that propose customized algorithms for specialized domains. But as I said, traditionally, machine learning, apart from developing new algorithms, also studies the properties of existing algorithms (e.g., generalization), and a good deal of effort is devoted to issues such as scalability.

Note that ML and DM are sometimes used and perceived as synonyms. For example, in the business and corporate world, the name data mining is more commonly used.

(Aug 05 '10 at 15:20) spinxl39

So how segregated are the two camps then? It would seem to me that in the DM camp, there are lots of people who study the theory behind the algorithms to develop improvements, as you stated. On the other hand, the ML camp certainly has people who want to apply their knowledge to real-world problems.

This is where the distinction between the two feels almost arbitrary to me. Or perhaps the ML camp traditionally solves a different subset of problems than the DM camp, or maybe they diverge on approaches to solving the same problems.

(Aug 05 '10 at 15:28) Boris Yeltz

IMO, the segregation isn't that distinct. In fact, I see a good deal of synergy between them. For example, a number of ML algorithms are designed in response to the needs of real-world data mining applications, which usually require high scalability. In fact, many researchers straddle the boundary, and work and publish in both areas. Many people also consider data mining to be "practical machine learning".

(Aug 05 '10 at 15:35) spinxl39

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.