Hi guys, I have recently started doing text mining. From reading the code, I understand the overall picture of what it is trying to do with the text.

The problem is with some specific parts of the code: I don't know why the syntax is written the way it is, or what the parameters represent. Can you suggest references or books on the R language where I can look up what a function is used for and what its parameters mean?

Below are several questions that came up while doing text mining. I would appreciate it if you could also help answer them :)

1)

cand <- c("Romney", "Obama")
tdm <- list(name = cand, tdm = s.tdm)     # s.tdm is the TermDocumentMatrix of a text
tdm.dm <- t(data.matrix(tdm[["tdm"]]))

My question is: why do we need the double brackets "[[ ]]" in the third line when turning the TermDocumentMatrix into a matrix?
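To make the question concrete, here is a minimal base-R sketch of how the two bracket forms behave on a list. It is only an illustration under one assumption: a plain matrix stands in for s.tdm so that it runs without the tm package, and the term and document names are made up.

```r
# Stand-in for s.tdm: a plain matrix instead of a real TermDocumentMatrix
# (assumption, so the example does not need the tm package).
s.tdm <- matrix(1:4, nrow = 2,
                dimnames = list(c("tax", "jobs"), c("doc1", "doc2")))
cand <- c("Romney", "Obama")
tdm <- list(name = cand, tdm = s.tdm)

single <- tdm["tdm"]     # single bracket: a list of length 1 still wrapping the element
double <- tdm[["tdm"]]   # double bracket: the element itself

is.list(single)     # TRUE
is.matrix(double)   # TRUE

# data.matrix() and t() need the matrix-like object itself, not a list around it.
tdm.dm <- t(data.matrix(double))
rownames(tdm.dm)    # documents are now the rows
```

As far as I can tell, tdm$tdm would extract the same element as tdm[["tdm"]].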

2)

filepath<-"C:/e"
cor.score <- if (length(grep("http|html", filepath))) { Corpus(URISource(filepath)) } else { generateSpeechDocCorpus(filepath) }

This statement checks whether filepath is a URL. I understand that "grep" is used to check whether filepath contains the string "http" or "html", but why do we need "length" wrapped around grep? I am confused. Also, for the last call in the code:

generateSpeechDocCorpus(filepath),

I can also use

Corpus(DirSource(directory=filepath,encoding="ANSI"))

to achieve the same purpose. So what is the difference between generateSpeechDocCorpus and Corpus?
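Going back to the length(grep(...)) check in question 2, here is a small base-R sketch of what grep returns with and without a match. The URL is a hypothetical example, not a path from my code. It suggests that grep returns the positions of the matching elements, so when nothing matches the result is an empty vector, which if() cannot use directly.

```r
url_path   <- "http://example.com/speech.html"  # hypothetical URL for illustration
local_path <- "C:/e"

grep("http|html", url_path)     # 1: index of the matching element
grep("http|html", local_path)   # integer(0): no element matched

# if() needs a condition of length one; integer(0) would raise
# "argument is of length zero", so length() turns the result into 0 or 1.
length(grep("http|html", url_path))    # 1, treated as TRUE by if()
length(grep("http|html", local_path))  # 0, treated as FALSE by if()

# grepl() returns TRUE/FALSE directly and may be a simpler alternative.
grepl("http|html", url_path)    # TRUE
grepl("http|html", local_path)  # FALSE
```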

asked Jun 09 '14 at 02:08

NeoHuang