MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. (by mimno)
    Nevertheless, you might take a look at the practice of "topic modeling" and get ready for a whole lot of abstruse statistics. One place to start might be Ted Underwood's "Topic Modeling Made Just Simple Enough." If you just want to play with some pre-written software that does this kind of thing, you might want to look at MALLET.


