Over the last few weeks I have spent some time experimenting with the tm package; some learnings and examples of how to use it can be found here. It is a great package for text mining and for transforming free text into features that can be used in further data analysis.
Background Information
Text mining is the process of extracting useful insights and information from text and transforming it, using NLP (Natural Language Processing) and analytical methods, into data that can be used for further analysis.
There are many R packages that can be used for Natural Language Processing, but one of them is the cornerstone of NLP in R: the tm package.
'In recent years, we have elaborated a framework to be used in packages dealing with the processing of written material: the package tm. Extension packages in this area are highly recommended to interface with tm's basic routines...'
(from CRAN website)
The tm package provides a comprehensive text mining framework for R. More information about it can be found in the “Text Mining Infrastructure in R” publication (Journal of Statistical Software) and the “Introduction to the tm Package” vignette.
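To make the idea of "transforming free text into features" a bit more concrete, here is a minimal sketch of a typical tm workflow. The three example sentences are made up for illustration, and the cleaning steps are just one reasonable choice, not a recommended recipe; it assumes the tm package is installed.

```r
# Minimal sketch: three made-up documents turned into a document-term matrix.
library(tm)

# Hypothetical example texts (not taken from any real data set).
docs <- c(
  "Text mining turns free text into data.",
  "The tm package is a framework for text mining in R.",
  "Features from text feed further analysis."
)

# Build an in-memory corpus from the character vector.
corpus <- VCorpus(VectorSource(docs))

# Basic clean-up: lower-case, drop punctuation and English stop words,
# then collapse the extra whitespace left behind.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# One row per document, one column per term, cells hold term counts.
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)

# A plain matrix is often easier to hand over to other modelling code.
features <- as.matrix(dtm)
```

The resulting matrix has one row per document, so each text is now described by numeric term counts that other analysis functions can work with.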