20 Newsgroups Data - Text Classification
Text classification for 20 Newsgroups data

Text classification is the task of assigning predefined labels to text. Text classifiers can be used to organize, structure, and categorize almost any kind of text, from documents and files to content across the web.

Data set: The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, divided into 20 different newsgroups. The data set is organized as 20 files, one per group. The 20 Newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.

The groups are:
• comp.graphics
• comp.os.ms-windows.misc
• comp.sys.ibm.pc.hardware
• comp.sys.mac.hardware
• comp.windows.x
• rec.autos
• ...
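To make "assigning predefined labels to text" concrete, here is a minimal sketch of one common technique, a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with the Python standard library only. It is an illustration, not the method used in this document. The four training sentences are hypothetical stand-ins for newsgroup posts, labeled with two of the group names listed above, and the names `tokenize` and `NaiveBayesTextClassifier` are illustrative.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)        # number of documents per label
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label) over the document's tokens
        score = math.log(self.doc_counts[label] / self.total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # skip words never seen during training
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, doc):
        # Return the label with the highest posterior log-score
        tokens = tokenize(doc)
        return max(self.doc_counts, key=lambda label: self._log_score(tokens, label))


# Hypothetical toy training data labeled with two of the 20 group names
train_docs = [
    "rendering graphics with opengl shaders",
    "image formats and polygon rendering in graphics software",
    "my car engine needs a new transmission",
    "best tires and brakes for this sedan car",
]
train_labels = ["comp.graphics", "comp.graphics", "rec.autos", "rec.autos"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("opengl rendering of polygon graphics"))  # prints comp.graphics
print(clf.predict("replacing the brakes on my car"))        # prints rec.autos
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps a word that is unseen under one label from forcing that label's probability to zero. To run a similar experiment on the real documents, scikit-learn provides the loader `sklearn.datasets.fetch_20newsgroups` (with `subset` and `categories` parameters), which downloads the data on first use; the sketch above avoids that dependency.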