R - Train e1071 Naive Bayes with custom training data


I am quite new to R and sentiment analysis. I want to use RTextTools and the e1071 Naive Bayes classifier to analyze Twitter tweets. I followed the tutorial at https://datascienceplus.com/sentiment-analysis-with-machine-learning-in-r/ and it worked fine. I then wanted to modify the tutorial code to train the Naive Bayes classifier on many more pre-labeled tweets, but I cannot get it to work.

The function gettrainingtweets:

    gettrainingtweets <- function(){
      tweets.training <- read.csv('trainingandtestdata/training.1600000.processed.noemoticon.csv') # pre-labeled tweets
      colnames(tweets.training) <- c('sentiment', 'id', 'date', 'query', 'user', 'text') # name the columns
      tweets.training <- select(tweets.training, 1, 6) # only sentiment and text are needed
      tweets.training <- tweets.training[, c(2, 1)] # swap the columns to match the tutorial
      return(tweets.training)
    }

Training the model:

    tweets.training <- gettrainingtweets() # load training data
    tweets.training.test <- head(tweets.training, 15) # small subset for testing

    # I want to apply the training data to the classifier
    matrix = RTextTools::create_matrix(tweets.training.test[,1], language="english",
                                       removeStopwords=FALSE, removeNumbers=TRUE,
                                       stemWords=FALSE)
    mat = as.matrix(matrix)
    classifier = naiveBayes(mat[1:10,], as.factor(tweets.training.test[1:10,2]))

    predicted = predict(classifier, mat[1:15, 2]); # error here
    predicted

    print(table(tweets.training.test[11:15, 2], predicted))
    recall_accuracy(tweets.training.test[11:15, 2], predicted)

View(tweets.training.test):

    -------------------------
    |   | text  | sentiment |
    -------------------------
    | 1 | tweet | 0         |
    | 2 | tweet | 0         |
    | 3 | tweet | 0         |
    | 4 | tweet | 0         |
    | 5 | tweet | 0         |
    | 6 | tweet | 0         |
    | 7 | tweet | 0         |
    | 8 | tweet | 0         |
    -------------------------
    ...

Error:

    Error in apply(log(sapply(seq_along(attribs), function(v) { :
      dim(X) must have positive length

I do not know why this error occurs. It would be nice if someone had a suggestion as to why it doesn't work.

The raw .csv of the corpus can be found here, for example: https://github.com/asheeshgarg/stock/blob/master/stocksentiment/polaritydata/tweetcorpus/training.1600000.processed.noemoticon.csv

Thanks in advance :)
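
A note on the message itself, as a rough sketch rather than a confirmed diagnosis: "dim(X) must have positive length" is what base R's apply() reports when it is handed a plain vector instead of a matrix, and the call quoted in the error wraps sapply() inside apply(). Two things in the code above could plausibly produce a vector at that point: mat[1:15, 2] selects a single column and drops the matrix dimensions, and the first rows of the training data all carry sentiment 0, so the classifier may only ever see one class. The example below uses only base R and made-up stand-ins (toy_mat, toy_labels and the per_class_* values are hypothetical, not the tweet data) to show both effects. Whether either one is the actual cause inside e1071 is an assumption that has not been verified here.

```r
# Toy stand-ins (hypothetical, not the tweet data)
toy_mat <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("good", "bad")))

# 1. Selecting a single column silently drops the matrix structure.
one_col <- toy_mat[1:3, 2]
is.null(dim(one_col))                  # TRUE: a plain vector, no dimensions

# drop = FALSE keeps the result a matrix.
kept <- toy_mat[1:3, 2, drop = FALSE]
dim(kept)                              # 3 1

# 2. sapply() only simplifies to a matrix when each result has length > 1.
per_class_two <- sapply(1:2, function(v) rep(1, 2))  # two classes: 2 x 2 matrix
per_class_one <- sapply(1:2, function(v) rep(1, 1))  # one class: plain vector

# apply() needs dimensions, so the plain vector triggers the same message.
msg <- tryCatch(apply(log(per_class_one), 1, sum),
                error = function(e) conditionMessage(e))
print(msg)

# A quick check worth running on the real labels: how many classes are present?
toy_labels <- as.factor(c(0, 0, 0))
nlevels(toy_labels)
```

If the real labels show only one level, sampling the training rows so that both sentiments are present (instead of taking the first 15 rows with head()) and passing whole rows such as mat[11:15, , drop = FALSE] to predict() would be the first things to try.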

