r - Learn e1071 Naive Bayes with custom training data
I am quite new to R and to sentiment analysis. I want to use RTextTools and the e1071 Naive Bayes classifier to analyze Twitter tweets. I followed the tutorial at https://datascienceplus.com/sentiment-analysis-with-machine-learning-in-r/ and it worked fine. I then wanted to modify the tutorial code to train the Naive Bayes classifier on many more pre-labeled tweets, but I cannot get it to work.
Function gettrainingtweets

    library(dplyr)  # provides select()

    gettrainingtweets <- function(){
      # pre-labeled tweets; the file has no header row, so names are set below
      tweets.training <- read.csv('trainingandtestdata/training.1600000.processed.noemoticon.csv',
                                  header = FALSE)
      colnames(tweets.training) <- c('sentiment', 'id', 'date', 'query', 'user', 'text')  # add column names
      tweets.training <- select(tweets.training, 1, 6)  # only sentiment and text are needed
      tweets.training <- tweets.training[, c(2, 1)]     # switch columns to match the tutorial
      return(tweets.training)
    }
Train the model

    library(RTextTools)  # create_matrix(), recall_accuracy()
    library(e1071)       # naiveBayes()

    tweets.training <- gettrainingtweets()             # load training data
    tweets.training.test <- head(tweets.training, 15)  # small subset for testing

    # want to apply the training data to the classifier
    matrix = RTextTools::create_matrix(tweets.training.test[,1], language="english",
                                       removeStopwords=FALSE, removeNumbers=TRUE,
                                       stemWords=FALSE)
    mat = as.matrix(matrix)
    classifier = naiveBayes(mat[1:10,], as.factor(tweets.training.test[1:10,2]))
    predicted = predict(classifier, mat[1:15, 2]);  # error here
    predicted
    print(table(tweets.training.test[11:15, 2], predicted))
    recall_accuracy(tweets.training.test[11:15, 2], predicted)
View(tweets.training.test)

    ----------------------------
    |   | text  | sentiment |
    ----------------------------
    | 1 | tweet | 0         |
    | 2 | tweet | 0         |
    | 3 | tweet | 0         |
    | 4 | tweet | 0         |
    | 5 | tweet | 0         |
    | 6 | tweet | 0         |
    | 7 | tweet | 0         |
    | 8 | tweet | 0         |
    ----------------------------
    ...
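Every row in this preview carries the label 0. If the corpus file is sorted by sentiment (an assumption based only on this preview), then head() returns tweets from a single class, and a classifier trained on them has only one label to learn. The sketch below uses a made-up data frame named toy, not the real corpus, to show the effect and one way to draw a subset that contains both labels. The label values 0 and 4 are assumed here to stand for negative and positive.

```r
# Toy stand-in for the real corpus (hypothetical data): 20 rows labeled 0
# followed by 20 rows labeled 4, i.e. sorted by sentiment.
toy <- data.frame(
  text = paste("tweet", 1:40),
  sentiment = rep(c(0, 4), each = 20),
  stringsAsFactors = FALSE
)

# head() on sorted data returns rows from the first class only.
first_rows <- head(toy, 15)
print(table(first_rows$sentiment))

# Taking the first 8 rows of each class keeps both labels in the subset.
per_class <- lapply(split(toy, toy$sentiment), head, 8)
balanced <- do.call(rbind, per_class)
print(table(balanced$sentiment))
```

Checking table() on the training labels before calling naiveBayes is a cheap way to confirm that more than one class is present. Shuffling the rows with sample() before splitting would be another option.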
Error

    Error in apply(log(sapply(seq_along(attribs), function(v) { : dim(X) must have positive length

I do not know why this error occurs. It would be nice if someone had a suggestion as to why this does not work.
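One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the predict call passes mat[1:15, 2], which selects a single column of the matrix, while the classifier was trained on all columns. In base R, selecting one column of a matrix returns a plain vector without dimensions. The sketch below uses a small made-up matrix (the column names "good" and "bad" are hypothetical) to show the difference between that selection and a row-only selection that stays a matrix.

```r
# Small numeric matrix standing in for the document-term matrix
# (hypothetical values and column names).
mat <- matrix(1:30, nrow = 15, ncol = 2,
              dimnames = list(NULL, c("good", "bad")))

# Selecting one column drops the matrix structure: the result is a vector.
as_vector <- mat[1:15, 2]
print(is.null(dim(as_vector)))

# Selecting rows only keeps a matrix with the original column names;
# drop = FALSE guards against dropping dimensions in edge cases.
held_out <- mat[11:15, , drop = FALSE]
print(dim(held_out))
print(colnames(held_out))
```

Under this reading, the held-out rows would be passed as mat[11:15, , drop = FALSE], which also matches the rows 11:15 used later in table() and recall_accuracy().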
The raw .csv of the corpus can be found here, for example: https://github.com/asheeshgarg/stock/blob/master/stocksentiment/polaritydata/tweetcorpus/training.1600000.processed.noemoticon.csv
Thanks in advance :)