Classifying texts into categories

I have a question about data mining. Suppose I have many texts and want to classify them into different categories, but I don't want to predefine the categories. Is there any algorithm that creates the categories automatically? E.g. if I have a text about computers, it should create the category "technology".

Answers


This problem belongs to the broad family of cluster analysis, as it is known in multivariate statistical analysis. Numerous clustering methods have been proposed in the literature. LDA, mentioned in the other answer, is a related option: strictly speaking it is a topic model rather than a clustering method, but it also finds groups without predefined labels. The basic distinction is between discriminant analysis, where the categories are defined in advance and a function or boundary separating them is formulated mathematically, and cluster analysis, where no a priori categories are given and the groups depend entirely on the hidden structure of the data, based on similarity.

The categories are not predetermined. Based on statistical distances between the observations you get different clusters, i.e. groups of data points that lie close to each other relative to the rest of the data. The k-means algorithm is also a viable option (you still have to choose the number of clusters k, even though you do not name them), but you should compare it with other algorithms using performance measures for the clustering method.
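To make the idea concrete, here is a minimal sketch of clustering texts with k-means, using only the Python standard library. It is an illustration under assumptions, not a recommended implementation: the helper names (`tfidf_vectors`, `kmeans`, `top_terms`), the toy documents, the TF-IDF weighting, the cosine-similarity (spherical) variant of k-means and the deterministic farthest-first initialisation are all my choices, not something taken from the question. Note that the algorithm only groups the texts; the highest-weighted terms of each centroid are at best a rough hint for a category name such as "technology".

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each text into a unit-length TF-IDF vector stored as {word: weight}."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many texts does each word occur?
    df = Counter(word for tokens in tokenised for word in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {w: c * math.log((1 + n) / (1 + df[w])) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors

def cosine(a, b):
    """Dot product of two sparse vectors (equals cosine similarity for unit vectors)."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def kmeans(vectors, k, iterations=20):
    """Spherical k-means: assign each text to the most similar centroid, then re-average."""
    # Farthest-first initialisation (an assumption, chosen to avoid randomness):
    # start from the first text, then add the text least similar to the chosen centroids.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if not members:
                continue  # keep the old centroid if a cluster became empty
            total = Counter()
            for m in members:
                for w, x in m.items():
                    total[w] += x
            norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
            centroids[j] = {w: x / norm for w, x in total.items()}
    return labels, centroids

def top_terms(centroid, n=3):
    """Highest-weighted words of a centroid, usable as a rough cluster description."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]

# Toy data, made up for this example.
docs = [
    "the computer runs software on a fast processor",
    "new laptop computer with faster processor and software",
    "the team won the football match with a late goal",
    "football players scored a goal in the match",
]
vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
for j, centroid in enumerate(centroids):
    members = [i for i, label in enumerate(labels) if label == j]
    print("cluster", j, "texts", members, "top terms", top_terms(centroid))
```

On real data you would normally remove stop words, try several values of k, and compare the result against other clustering methods, as suggested above.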
Latent Dirichlet Allocation (LDA) would do that.
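As a rough illustration of what LDA does, here is a small sketch of LDA fitted with collapsed Gibbs sampling in plain Python. Everything specific in it is an assumption for the example: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the number of iterations, the whitespace tokenisation and the toy texts. It returns, for every text, a probability distribution over topics, plus word counts per topic; the topics are unnamed, so a human (or a further heuristic) still has to read the top words and decide that a topic means "technology".

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch, not tuned)."""
    rng = random.Random(seed)
    tokenised = [doc.lower().split() for doc in docs]
    vocab_size = len({w for tokens in tokenised for w in tokens})

    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per text
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # total words per topic
    assignments = []

    # Start by assigning every word occurrence to a random topic.
    for d, tokens in enumerate(tokenised):
        z_d = []
        for w in tokens:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_d)

    # Repeatedly resample the topic of each word given all other assignments.
    for _ in range(iterations):
        for d, tokens in enumerate(tokenised):
            for i, w in enumerate(tokens):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic distribution for each text.
    theta = [
        [(c + alpha) / (len(tokens) + num_topics * alpha) for c in counts]
        for counts, tokens in zip(doc_topic, tokenised)
    ]
    return theta, topic_word

# Toy data, made up for this example.
docs = [
    "computer software processor laptop computer",
    "laptop processor software keyboard",
    "football match goal team football",
    "team goal players match",
]
theta, topic_word = lda_gibbs(docs, num_topics=2)
for t, counts in enumerate(topic_word):
    print("topic", t, "top words", [w for w, _ in counts.most_common(3)])
for d, row in enumerate(theta):
    print("text", d, "topic distribution", [round(p, 2) for p in row])
```

Unlike k-means, which puts each text in exactly one cluster, LDA gives each text a mixture of topics, so a text about both computers and football can belong partly to each.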