statistics - SPSS: Correlate a Variable Set with a Variable

I am doing a correlation study. I have a multiple-response variable that can have more than one answer per case. The multiple response (q06_*) is a question about the kinds of transport used; a respondent could have chosen more than one option. How can I compute bivariate correlations between this variable set and another variable (a score)?...

statistics - Could all heuristic approaches, such as UPGMA, provide different results in repeated analyses?

I would like to know whether all heuristic approaches, and specifically UPGMA or affinity propagation, may give different results in repeated analyses if the groups are not well defined. That is, since heuristic approaches are practical methods that cannot guarantee an optimum, is it possible that each repeated analysis yields a different solution when there is no clear optimum? I would like to confirm whether this can happen for all heuristic approaches. Thanks in advance...

statistics - Calculating confidence while doing classification

I am using a Naive Bayes algorithm to classify movie reviews as positive or negative, and I have been able to do so with 81% accuracy. However, I am also trying to assign a 'confidence level' to each classification. I would like to tell the user something like "we think that the review is positive with 80% confidence". Can someone help me understand how to calculate a confidence level for our classification result?...
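One common reading of "confidence" here is the posterior class probability that Naive Bayes already computes internally. The sketch below assumes the classifier exposes a per-class log joint score, log P(class) + sum of log P(word | class); the function name and the example scores are hypothetical and not taken from the question. Naive Bayes posteriors are often poorly calibrated, so the number is better treated as a rough score than as a strict probability.

```python
import math

def posterior_from_log_scores(log_scores):
    """Turn per-class log joint scores into normalized posterior probabilities."""
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}

# Hypothetical log joint scores for one review (the gap is ln 4).
log_scores = {"positive": -42.0, "negative": -42.0 - math.log(4)}
posterior = posterior_from_log_scores(log_scores)
predicted = max(posterior, key=posterior.get)
print(f"we think the review is {predicted} with {posterior[predicted]:.0%} confidence")
```

Working in log space and normalizing at the end avoids the underflow that multiplying many small word probabilities would otherwise cause.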

statistics - Pointwise mutual information on text

I was wondering how one would calculate the pointwise mutual information (PMI) for text classification. To be more exact, I want to classify tweets into categories. I have a dataset of annotated tweets, and for each category I have a dictionary of words that belong to that category. Given this information, how can I calculate the PMI for each category per tweet, in order to classify a tweet into one of these categories?...
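For reference, the PMI of a word w and a category c is log2( P(w, c) / (P(w) P(c)) ). The sketch below is one possible way to use it, under stated assumptions: probabilities are estimated from tweet-level counts in the annotated data, and a tweet is scored per category by summing the PMI of its distinct words. The toy tweets, the function names, and the summing rule are illustrative assumptions, not something given in the question.

```python
import math
from collections import Counter

def pmi_table(labeled_tweets):
    """Estimate PMI(word, category) from (tokens, category) pairs using tweet-level counts."""
    n = len(labeled_tweets)
    word_counts, cat_counts, joint_counts = Counter(), Counter(), Counter()
    for tokens, category in labeled_tweets:
        cat_counts[category] += 1
        for w in set(tokens):  # count each word at most once per tweet
            word_counts[w] += 1
            joint_counts[(w, category)] += 1
    return {
        (w, c): math.log2((joint / n) / ((word_counts[w] / n) * (cat_counts[c] / n)))
        for (w, c), joint in joint_counts.items()
    }

def score_tweet(tokens, category, pmi):
    """Sum the PMI of the tweet's distinct words; unseen (word, category) pairs add 0."""
    return sum(pmi.get((w, category), 0.0) for w in set(tokens))

# Toy annotated tweets (hypothetical data).
tweets = [
    (["goal", "match"], "sports"),
    (["match", "win"], "sports"),
    (["vote", "win"], "politics"),
    (["vote", "law"], "politics"),
]
pmi = pmi_table(tweets)
new_tweet = ["goal", "win"]
scores = {c: score_tweet(new_tweet, c, pmi) for c in ("sports", "politics")}
best = max(scores, key=scores.get)
```

Treating unseen pairs as 0 is a simplification; strictly, a pair that never co-occurs has a PMI of negative infinity, so some smoothing is usually needed in practice.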

statistics - How to evaluate and explain the trained model in this machine learning?

I am new to machine learning. I ran a test but do not know how to explain and evaluate it. Case 1: I first randomly divide the data (data A, about 8000 words) into 10 groups (a1..a10). Within each group, I use 90% of the data to build an n-gram model. This n-gram model is then tested on the other 10% of the data in the same group. The resulting accuracy is below 10%. The other 9 groups are handled the same way (a model is built for each and tested on the remaining 10% of that group's data). All results are about 10% accuracy. (Is this 10-fold cross-validation?) Case 2: I ...
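For comparison with Case 1, standard 10-fold cross-validation trains on nine of the ten groups together and tests on the one held-out group, rotating the held-out group each round. Case 1 instead splits 90/10 inside each group, so each model sees only about 720 words rather than about 7200. A minimal sketch of the standard index split follows; the function name is hypothetical and the data is assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) pairs for standard k-fold cross-validation."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        test = folds[i]
        # Train on every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(8000, 10))
for train, test in splits:
    print(len(train), len(test))
```

The reported accuracy is then the average over the ten held-out folds.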

statistics - How to obtain multinomial probabilities in WinBUGS with multiple regression

In WinBUGS, I am specifying a model with a multinomial likelihood function, and I need to make sure that the multinomial probabilities are all between 0 and 1 and sum to 1. Here is the part of the code specifying the likelihood: e[k,i,1:9] ~ dmulti(P[k,i,1:9],n[i,k]) Here, the array P[] specifies the probabilities for the multinomial distribution. These probabilities are to be estimated from my data (the matrix e[]) using multiple linear regressions on a series of fixed and random effects. For instance, here is the multiple linear regression used...
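One widely used way to satisfy both constraints is a multinomial logit (softmax) link: each category gets an unconstrained linear predictor, and the probabilities are the exponentiated predictors divided by their sum. The sketch below illustrates the arithmetic in Python rather than in BUGS code; the nine predictor values are hypothetical, and whether this link suits the regression in the question is an assumption, since the excerpt is cut off.

```python
import math

def softmax(linear_predictors):
    """Map unconstrained linear predictors to probabilities in (0, 1) that sum to 1."""
    m = max(linear_predictors)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(x - m) for x in linear_predictors]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear predictors for the 9 categories of one (k, i) cell.
eta = [0.0, 0.5, -1.2, 2.0, 0.3, -0.7, 1.1, 0.0, -2.5]
p = softmax(eta)
print([round(x, 3) for x in p], sum(p))
```

Because adding a constant to every predictor leaves the probabilities unchanged, one category's predictor is usually fixed at 0 as a reference so the model stays identifiable.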

statistics - Mediation analysis with categorical outcome

I want to run a mediation analysis to see the effect of exposure to a pollutant (continuous) on type of cancer (categorical, with 4 levels) via a blood biomarker as the mediator (continuous). So the mediation diagram would be something like this: E -> B -> C. For the mediator I run the linear regression <- lm(blood_biomarker~exposure+age+sex, data=demographics) but when it comes to the outcome variable, I read in the docs that the only appropriate analysis is a multinomial regression analysis such as: ou...

statistics - How can I weight features for better clustering with a very small data set?

I'm working on a program that takes in several (<50) high-dimensional points in feature space (1000+ dimensions) and performs hierarchical clustering on them by recursively applying standard k-clustering. My problem is that in any one k-clustering pass, different parts of the high-dimensional representation are redundant. I know this problem falls under the umbrella of feature extraction, selection, or weighting. In general, what does one take into account when selecting a particular feature extraction/selection/weighting algorithm? And ...

statistics - Untrained sentiment analysis, need help with capturing sentiment variation statistically

The question may be vague, but I will try to word it as well as possible. I came up with a crude algorithm to compute whether a sentence (part of a review snippet) is positive, negative, or neutral (let's call this the EQ of the sentence). So for 5 sentences I have sentence ratings on a [-100, 100] scale, while the review has to be rated on a [0, 5] scale: (0, 39.88) (1, 73.07) (2, 69.65) (3, 51.43) (4, 76.74). The choice I am struggling with is which method to use to compute the overall rating for the review snippet. I researched a litt...
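As a point of reference only, the simplest baseline is to average the sentence scores and rescale the mean linearly from [-100, 100] to [0, 5]. The sketch below applies that to the five scores listed above; the function name is hypothetical, and nothing in the excerpt says this is the method the asker settled on.

```python
def rescale_mean(scores, src=(-100.0, 100.0), dst=(0.0, 5.0)):
    """Average the sentence scores and map the mean linearly from src range to dst range."""
    mean = sum(scores) / len(scores)
    fraction = (mean - src[0]) / (src[1] - src[0])
    return dst[0] + fraction * (dst[1] - dst[0])

# The (index, score) pairs from the question.
sentence_scores = [(0, 39.88), (1, 73.07), (2, 69.65), (3, 51.43), (4, 76.74)]
rating = rescale_mean([score for _, score in sentence_scores])
print(round(rating, 2))
```

A plain mean ignores how much the scores vary between sentences, which is the aspect the question title asks about, so it serves only as a starting point for comparison.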

statistics - Example for non-iid data

I've read some papers on non-iid data. Based on Wikipedia, I know what iid (independent and identically distributed) data is, but I am still confused about non-iid data. I did some research but could not find a clear definition or example of it. Can someone help me with this?...
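One concrete illustration, offered as a sketch: a time series in which each value depends on the previous one violates the independence part of iid. The code below compares independent Gaussian draws with a first-order autoregressive series, x[t] = phi * x[t-1] + noise. The value phi = 0.9, the sample size, and the seed are arbitrary choices made for the example.

```python
import random

def lag1_autocorrelation(xs):
    """Sample correlation between consecutive values of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    return cov / var

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# iid: every draw is independent and comes from the same N(0, 1) distribution.
iid_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]

# non-iid: each value depends on its predecessor, so the draws are not independent.
phi = 0.9
ar1_sample = [0.0]
for _ in range(n - 1):
    ar1_sample.append(phi * ar1_sample[-1] + rng.gauss(0.0, 1.0))

print("iid lag-1 autocorrelation:", lag1_autocorrelation(iid_sample))
print("AR(1) lag-1 autocorrelation:", lag1_autocorrelation(ar1_sample))
```

Data can also fail the "identically distributed" part instead, for example when measurements pooled from different groups follow different distributions.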