probability - Measuring distance between vectors

I have a set of 300,000 or so vectors which I would like to compare in some way, and given one vector I want to be able to find the closest vector. I have thought of several methods: (1) simple Euclidean distance; (2) cosine similarity; (3) use a kernel (for instance Gaussian) to calculate the Gram matrix; (4) treat the vector as a discrete probability distribution (which makes sense to do) and calculate some divergence measure. I do not really understand when it is useful to do one rather than the other. My data has a lot of zero-elements. With that in mind, is there so...
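
To make the first two options concrete, here is a minimal sketch of Euclidean distance and cosine similarity on sparse vectors. It assumes each vector is stored as a dict mapping index to nonzero value; that representation, the helper names, and the toy data are illustrative and not taken from the question. It shows that the two measures can pick different "closest" vectors, because cosine similarity ignores vector length.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two sparse vectors (dict: index -> value)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query, candidates, by="euclidean"):
    """Index of the closest candidate under the chosen measure."""
    indices = range(len(candidates))
    if by == "euclidean":
        return min(indices, key=lambda i: euclidean_distance(query, candidates[i]))
    return max(indices, key=lambda i: cosine_similarity(query, candidates[i]))


# Toy data (hypothetical): candidate 0 is the query scaled by 2,
# candidate 1 is the query plus one small extra component.
query = {0: 1.0, 3: 2.0}
candidates = [{0: 2.0, 3: 4.0}, {0: 1.0, 3: 2.0, 5: 0.5}]

print(nearest(query, candidates, by="euclidean"))
print(nearest(query, candidates, by="cosine"))
```

With 300,000 vectors a brute-force scan like this is linear in the collection size per query, so it is only meant to illustrate the measures, not to suggest an indexing strategy.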

probability - Another Variance and Expectation of different portfolios

A and B are two competing companies. An investor decides whether to buy (a) 100 shares of A, or (b) 100 shares of B, or (c) 50 shares of A and 50 shares of B. A profit made on 1 share of A is a random variable X with the distribution P(X = 2) = P(X = -2) = 0.5. A profit made on 1 share of B is a random variable Y with the distribution P(Y = 4) = 0.2, P(Y = -1) = 0.8. If X and Y are independent, compute the expected value and variance of the total profit for strategies (a), (b), and (c). --- For E(X) for both A and B I get: EA(X) = (2)(.5) + (-...
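
The computation the question asks for can be checked with a short sketch. It assumes the usual reading of the problem, namely that every share of the same company earns the same per-share profit, so the totals are 100X, 100Y, and 50X + 50Y; the helper names are illustrative. It uses E(aX + bY) = aE(X) + bE(Y) and, for independent X and Y, Var(aX + bY) = a²Var(X) + b²Var(Y).

```python
from fractions import Fraction as F

# Per-share profit distributions from the problem statement: value -> probability.
X = {2: F(1, 2), -2: F(1, 2)}
Y = {4: F(1, 5), -1: F(4, 5)}


def mean(dist):
    """Expected value of a discrete distribution."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Variance of a discrete distribution."""
    m = mean(dist)
    return sum(prob * (value - m) ** 2 for value, prob in dist.items())


def portfolio(shares_a, shares_b):
    """Mean and variance of shares_a * X + shares_b * Y, assuming X and Y independent."""
    expected = shares_a * mean(X) + shares_b * mean(Y)
    var = shares_a ** 2 * variance(X) + shares_b ** 2 * variance(Y)
    return expected, var


for label, (a, b) in {"(a)": (100, 0), "(b)": (0, 100), "(c)": (50, 50)}.items():
    expected, var = portfolio(a, b)
    print(label, "E =", expected, "Var =", var)
```

Under these assumptions all three strategies have expected profit 0, while the mixed strategy (c) has half the variance of (a) or (b).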

logistic regression - Getting uncalibrated probability outputs with Vowpal Wabbit, ad-conversion prediction

I'm trying to use Vowpal Wabbit to predict the conversion rate for ad display and I'm getting non-intuitive probability outputs, which are centered at around 36% when the global frequency of the positive class is less than 1%. The positive/negative imbalance I have in my dataset is 1/100 (I already undersampled the negative class), so I use a weight of 100 on the positive examples. Negative examples have label -1, and positive ones 1. I used shuf to shuffle positive and negative examples for online learning to work properly. Sample lines in the vw fi...
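
One possible explanation, assumed here rather than confirmed by the truncated excerpt, is that a weight of 100 on the positive examples multiplies the learned odds by roughly 100, so the model reports probabilities for a balanced problem. The sketch below undoes such a weight by dividing the odds by it. The function name is hypothetical, and this is plain arithmetic, not a Vowpal Wabbit API.

```python
def undo_positive_weight(p_weighted, weight):
    """Rescale a probability learned with positives up-weighted by `weight`.

    Up-weighting positives by `weight` multiplies the odds p / (1 - p) by
    `weight`; dividing the odds by `weight` maps the prediction back to the
    unweighted scale.
    """
    if not 0.0 <= p_weighted <= 1.0:
        raise ValueError("p_weighted must lie in [0, 1]")
    if weight <= 0:
        raise ValueError("weight must be positive")
    return p_weighted / (p_weighted + weight * (1.0 - p_weighted))


# A prediction near 36% under a positive weight of 100 corresponds to
# roughly 0.56% on the unweighted scale.
print(undo_positive_weight(0.36, 100))
```

This only corrects for the importance weight; any separate undersampling of the negative class would need its own correction of the same form.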

svm - Is there a way to convert discriminant values in SVMLight multi-class classification into probability scores

I am using the SVM Light multi-class classifier to train a classifier with four classes. In the classification stage the classifier outputs the predicted label and the scores for the 4 classes. As the SVM Light website says, these scores are "the discriminant values for each of the k classes". I want to show the probability value of each class to the users. So I was wondering if there is some mathematical trick or some other way by which I can "convert" these values into probability values or at least into a normalised score in betwee...
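
One common way to get normalised scores from arbitrary real-valued class scores is the softmax function, sketched below. This is a generic transformation and not something SVM Light provides; the outputs lie in [0, 1] and sum to 1, but they are not calibrated probabilities unless the scores are additionally fitted against held-out data. The example scores are made up.

```python
import math


def softmax(scores):
    """Map real-valued class scores to non-negative values that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical discriminant values for four classes.
discriminants = [1.3, -0.4, 0.2, -2.1]
normalised = softmax(discriminants)
print(normalised)
```

The ranking of the classes is preserved, so the predicted label is unchanged by the transformation.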

probability - Define hidden markov model for word

I'm attempting to define a hidden Markov model and predict whether a given sequence of words is correct using the Viterbi algorithm. In order to aid understanding I've attempted to define the model parameters. The letters in the corpus are abbd. From this I've defined:
states: a, b, b, d
trans_p (transition probabilities): There are a: 1/4, b: 2/4, d: 1/4
emit_p (emission probabilities): count(a->b) / count(a) = 1/1 = 1, count(b->b) / count(b) = 1/2 = 1/2, count(b->d) / count(b) = 1/2 = 1/2
Is abo...

probability - Predict.proba in Naive Bayes

I have a question about how to convert the result of predict_proba in Naive Bayes into a percentage. I've already tried a few things but failed. I want the result to look like 50%, 100%. This is a sample of my code:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
import'ggplot')class bay...
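
Class probabilities are values between 0 and 1, so turning them into percentages is a formatting step. The sketch below uses a plain nested list as a stand-in for the per-class probability rows a classifier returns; the numbers and the helper name are made up for illustration.

```python
def to_percent(probabilities, decimals=0):
    """Format each probability in a list of rows as a percentage string."""
    return [[f"{p:.{decimals}%}" for p in row] for row in probabilities]


# Hypothetical per-class probabilities for three samples and two classes.
example = [[0.5, 0.5], [1.0, 0.0], [0.127, 0.873]]

for row in to_percent(example):
    print(row)
```

The `%` format type multiplies by 100 and appends the percent sign, and `decimals` controls how many digits follow the decimal point.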

How to extract meaningful noun phrases based on probability using OpenNlp's chunking parser

I am a newbie to natural language processing. I need to extract meaningful nouns and noun phrases based on their probability (e.g. 75% and above) to make an auto-suggest dictionary. I have been reading online posts and articles for a couple of days, but only found pieces of information. I am thinking of using the en-parser-chunking.bin model. Could someone recommend good resources/examples that cover a use case similar to the above? Where I stand now:
Model = en-parser-chunking.bin
String line = "Tutorialspoint is the largest tutorial library.";
Tree object (outpu...

probability - Design of Bayesian networks: Understanding the difference between "States" and "Nodes"

I'm designing a small Bayesian network using the program "Hugin Lite". The problem is that I have difficulty understanding the difference between "Nodes" (visual circles) and "States" (which are the "fields" of a node). I will write an example where it is clear, and another which I can't understand. The example I understand: There are two women (W1 and W2) and one man (M). M has a child with W1. The child's name is C1. Then M has a child with W2. The child's name is C2. The resulting network is: The four possible STATES of every Node (W1, W2, M, C1, C2) are: AA: the ...

Bayesian Probability

I have the probability P(A|B=T,C=F,D=F,G=T). Is this the same as computing P(A|B=T)*P(A|C=F)*P(A|D=F)*P(A|G=T)? That is, does P(A|B=T,C=F,D=F,G=T) = P(A|B=T)*P(A|C=F)*P(A|D=F)*P(A|G=T) hold, where A is the child of B, C, D, and G? Thanks!...
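
A small counterexample shows that such a factorisation does not hold in general. The sketch below uses a simplified, made-up network with only two parents: B and C are independent fair coins and the child A is true exactly when both parents are true. It compares P(A=T|B=T,C=T) with P(A=T|B=T)*P(A=T|C=T) by enumerating the joint distribution.

```python
from fractions import Fraction
from itertools import product

# Joint distribution over (a, b, c): B and C are independent fair coins and
# A = B AND C. This toy model is an assumption made for illustration.
joint = {}
for b, c in product([False, True], repeat=2):
    a = b and c
    joint[(a, b, c)] = Fraction(1, 4)


def prob(event):
    """Probability that event(a, b, c) holds."""
    return sum(p for outcome, p in joint.items() if event(*outcome))


def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda a, b, c: event(a, b, c) and given(a, b, c)) / prob(given)


p_full = cond(lambda a, b, c: a, lambda a, b, c: b and c)
p_given_b = cond(lambda a, b, c: a, lambda a, b, c: b)
p_given_c = cond(lambda a, b, c: a, lambda a, b, c: c)

print("P(A|B,C)          =", p_full)
print("P(A|B) * P(A|C)   =", p_given_b * p_given_c)
```

In this toy model the conditional on both parents is 1 while the product of the single-parent conditionals is 1/4, so the two expressions are not interchangeable.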

probability - Markov entropy when probabilities are uneven

I've been thinking about information entropy in terms of the Markov equation: H = -SUM(p(i)lg(p(i))), where lg is the base 2 logarithm. This assumes that all selections i have equal probability. But what if the probability in the given set of choices is unequal? For example, let's say that StackExchange has 20 sites, and that the probability of a user visiting any StackExchange site except StackOverflow is p(i). But the probability of a user visiting StackOverflow is 5 times p(i). Would the Markov equation not apply in this case? Or is there an ad...
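
The formula H = -SUM(p(i)lg(p(i))) is usually called Shannon entropy, and it applies to any probability distribution, not only uniform ones. The sketch below evaluates it for the example in the question, assuming 19 sites with probability p each and one site with probability 5p, so that 24p = 1; the function name is illustrative.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


# 20 equally likely sites.
uniform = [1 / 20] * 20

# 19 sites with probability p and one site with probability 5p, where 24p = 1.
p = 1 / 24
skewed = [p] * 19 + [5 * p]

print("uniform:", shannon_entropy(uniform))  # equals log2(20)
print("skewed: ", shannon_entropy(skewed))   # about 4.10 bits
```

The uneven distribution has lower entropy than the uniform one, which is the general pattern: for a fixed number of outcomes, entropy is largest when all outcomes are equally likely.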

probability - number of possible outcomes of an event

I need to calculate the possible number of outcomes with detail screens. The details are: we have 1 textbox in which one has to enter any number from 0 to 7. There are 13 categories of outcomes, but the average of all outcomes should be equal to the number entered in the textbox. For example: textbox: __enter a number from 1 to 7__(if 3)______.
categories 1: 1, 2, 3, 4, 5, 6, 7
categories 2: 1, 2, 3, 4, 5, 6, 7
categories 3: 1, 2, 3, 4, 5, 6, 7
categories 4: 1, 2, 3, 4, 5, 6, 7
categories 5: 1, 2, 3, 4, 5, 6, 7
categories 6: 1, 2, 3, 4, 5, 6, 7
cat...
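
Under one reading of the problem, the count can be computed with a small dynamic program. The sketch below assumes that each of the 13 categories takes an integer value from 1 to 7, that the order of categories matters, and that the average must equal the entered integer exactly, which means the values must sum to 13 times that number. These assumptions and the function names are not confirmed by the truncated excerpt.

```python
def sum_counts(categories=13, low=1, high=7):
    """Map each achievable total to the number of ordered assignments producing it."""
    ways = {0: 1}  # one way to reach total 0 with no categories
    for _ in range(categories):
        next_ways = {}
        for total, count in ways.items():
            for value in range(low, high + 1):
                next_ways[total + value] = next_ways.get(total + value, 0) + count
        ways = next_ways
    return ways


def count_outcomes(target_average, categories=13, low=1, high=7):
    """Number of assignments whose average equals the integer target_average."""
    target_sum = target_average * categories
    return sum_counts(categories, low, high).get(target_sum, 0)


for entered in range(0, 8):
    print(entered, count_outcomes(entered))
```

An entered value of 0 yields no outcomes under these assumptions, because every category contributes at least 1.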