Hi,

I have a database with 32344 courses from Swedish universities. When a user searches for a course, the query is matched only against the name of each course, so the user may miss interesting courses. For example, a user searching for "machine learning" might also be interested in "Artificial intelligence", but that course would not show up as things are now.

I am thinking of using cluster analysis to group similar courses by comparing course code, course name, and description. Not all courses have all attributes, and I guess the title is more important than the description, so the attributes need to be weighted somehow. The description needs to be processed with something like tf-idf.
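To make the weighting idea concrete, here is a minimal sketch in plain Python of a per-field tf-idf similarity. The field weights (0.7 for the name, 0.3 for the description), the toy courses, and the helper names are all assumptions for illustration, and the course code is left out for brevity. A missing attribute is skipped and the remaining weights are renormalised, which is only one possible way to handle incomplete records.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; \w also matches Swedish letters such as å, ä, ö.
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse tf-idf dict per document; missing text gives {}."""
    tokenized = [tokenize(d or "") for d in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf so that no term gets a zero or negative weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors, or None if either is empty."""
    if not a or not b:
        return None
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical weights: the name counts more than the description.
FIELD_WEIGHTS = {"name": 0.7, "description": 0.3}

def course_similarity(i, j, field_vectors, weights=FIELD_WEIGHTS):
    """Weighted mean of per-field cosine similarities, skipping missing fields."""
    total, weight_sum = 0.0, 0.0
    for field, w in weights.items():
        sim = cosine(field_vectors[field][i], field_vectors[field][j])
        if sim is None:
            continue  # attribute missing for one of the two courses
        total += w * sim
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# Toy data, invented for this example.
courses = [
    {"name": "Machine Learning",
     "description": "Supervised learning, neural networks and statistical models"},
    {"name": "Artificial Intelligence",
     "description": "Search, planning, neural networks and machine learning"},
    {"name": "Medieval History", "description": None},
]

field_vectors = {
    field: tfidf_vectors([c.get(field) for c in courses])
    for field in FIELD_WEIGHTS
}

sim_ml_ai = course_similarity(0, 1, field_vectors)
sim_ml_history = course_similarity(0, 2, field_vectors)
```

With this toy data the two names share no words, so any similarity between "Machine Learning" and "Artificial Intelligence" comes from the descriptions alone. A distance for clustering could then be taken as one minus the similarity.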

Since I don't know how many clusters there are going to be, I don't think K-Means is a good idea.

Instead I'm thinking of a hierarchical clustering algorithm, but I don't know which one.

Does this make sense? Is clustering a common approach for this kind of problem? Which algorithms should I look into?

asked Jan 13 '14 at 23:02

Nicklas Nilsson

edited Jan 13 '14 at 23:05


One Answer:

Yes. You could try a hierarchical clustering algorithm, such as agglomerative clustering. It cannot determine the number of clusters K exactly either, but you can find an approximate value, for example by choosing the level at which to cut the resulting hierarchy. Someone has also designed an error function for K-means that you could try. Thanks.
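As a rough illustration of agglomerative clustering without a fixed K, here is a minimal sketch in plain Python. It uses average linkage and stops merging once the closest pair of clusters is farther apart than a threshold. The linkage choice, the threshold of 0.5, and the toy distance matrix are all assumptions for the example. The naive loop is cubic in the number of items, so it is only meant for small samples, not for all 32344 courses.

```python
def agglomerative(dist, threshold):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters and stops when the closest
    pair is farther apart than `threshold`, so K is not fixed in advance.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Toy distances between four courses, invented for this example:
# courses 0 and 1 are close, and so are courses 2 and 3.
dist = [
    [0.00, 0.20, 0.90, 0.80],
    [0.20, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.10],
    [0.80, 0.90, 0.10, 0.00],
]

clusters = agglomerative(dist, threshold=0.5)
```

Raising the threshold gives fewer, larger clusters and lowering it gives more, smaller ones, so the threshold takes over the role that K plays in K-means.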

answered Jan 15 '14 at 02:25

Baolin Peng

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.