Intuition for higher moments in circular statistics

In circular statistics, the expectation value of a random variable $Z$ with values on the circle $S$ is defined as $$m_1(Z)=\int_S z P^Z(\theta)\textrm{d}\theta$$ (see Wikipedia). This is a very natural definition, as is the definition of the variance $$\mathrm{Var}(Z)=1-|m_1(Z)|.$$ So we didn't need a second moment in order to define the variance! Nonetheless, we define the higher moments $$m_n(Z)=\int_S z^n P^Z(\theta)\textrm{d}\theta.$$ I admit that this looks rather natural as well at first sight, and very similar to the definition in linear statis...
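
To make the definitions above concrete, here is a minimal sketch that estimates $m_n$ and the circular variance from a sample of angles, replacing the integral with a sample mean of $e^{in\theta}$. The function names and the two toy samples are my own illustration, not part of the question.

```python
import cmath
import math

def circular_moment(angles, n=1):
    """Empirical n-th trigonometric moment: the mean of exp(i*n*theta)."""
    return sum(cmath.exp(1j * n * t) for t in angles) / len(angles)

def circular_variance(angles):
    """Circular variance 1 - |m_1|, following the definition in the question."""
    return 1.0 - abs(circular_moment(angles, 1))

# Hypothetical sample 1: all mass at a single angle, so there is no spread.
concentrated = [0.3] * 8
# Hypothetical sample 2: two antipodal points with equal weight.
antipodal = [0.0, math.pi] * 4

print(circular_variance(concentrated))    # close to 0
print(circular_variance(antipodal))       # close to 1
print(abs(circular_moment(antipodal, 2)))  # close to 1
```

In the antipodal sample the first moment cancels, so the variance is maximal, while $|m_2|$ stays near 1. That suggests one reading of the second moment: it picks up two-fold (axial) structure that $m_1$ cannot see.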

mathematical statistics - intuition for moments about the mean of a distribution?

Can someone provide an intuition on why the higher moments of a probability distribution $p(x)$, like the third and fourth moments, correspond to skewness and kurtosis, respectively? Specifically, why does the deviation about the mean raised to the 3rd or 4th power end up translating into a measure of skewness and kurtosis? Is there a way to relate this to the third or fourth derivatives of the function? Consider this definition of kurtosis: $\mathrm{Kurtosis}(X) = E[(x - \mu_{X})^4] / \sigma^4$. Again, not clear why raising $(x-\mu)^4$ gives "peakedness" or wh...
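
A small numerical sketch may help with the intuition. It computes the standardized $k$-th central moment $E[(x-\mu)^k]/\sigma^k$ for two made-up samples; the helper name and the data are assumptions for illustration only.

```python
def standardized_moment(xs, k):
    """k-th central moment divided by sigma**k (population formulas, no bias correction)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return sum((x - mu) ** k for x in xs) / n / var ** (k / 2)

symmetric = [-2, -1, 0, 1, 2]     # deviations cancel in pairs under an odd power
right_skewed = [0, 0, 0, 1, 9]    # one large positive deviation dominates the cube

print(standardized_moment(symmetric, 3))     # skewness of the symmetric sample
print(standardized_moment(right_skewed, 3))  # positive skewness
print(standardized_moment(symmetric, 4))     # kurtosis as defined in the question
```

An odd power keeps the sign of each deviation, so the sum only vanishes when the two sides balance. An even power discards the sign and weights large deviations much more heavily than small ones, which is why the fourth moment responds mostly to the tails.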

mathematical statistics - What is the heaviest tail possible for a continuous normalizable distribution?

The heaviest tailed smooth normalizable continuous distributions that I am familiar with are those with fat power-law tails $\frac{1}{x^{1+\alpha}}$, e.g. a Pareto with $\alpha\rightarrow 0^+$ or a Student's t with $\nu\rightarrow 0^+$, but are there distributions with heavier tails? I am curious about what is the worst case possible for a distribution that decreases monotonically away from a peak positive value towards a minimum of 0. I think that the heaviest possible normalizable heavy tails are indeed those asymptotic to $\frac{k}{x}$ as $x...

mathematical statistics - Asking help with Taylor approximation of expectation of ratio

I am trying to understand how I should approach the problem of a Taylor approximation to the expectation of the ratio of two random variables. In my particular problem I am concerned with the following ratio estimated using a sample of size $n$: $$\hat{\gamma_i}=\frac{x_i\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}=\frac{x_i\bar{y}}{\bar{x}}$$ We may assume for simplicity $E(x_i)=\mu_x$ and $E(y_i)=\mu_y$, but we may have $E(x_iy_i) \ne E(x_i)E(y_i)$. I am trying to find $E(\hat{\gamma})$. How should I approach this problem?...

mathematical statistics - How to rigorously define the likelihood?

The likelihood could be defined in several ways, for instance: the function $L$ from $\Theta\times{\cal X}$ which maps $(\theta,x)$ to $L(\theta \mid x)$, i.e. $L:\Theta\times{\cal X} \rightarrow \mathbb{R}$; the random function $L(\cdot \mid X)$; we could also consider that the likelihood is only the "observed" likelihood $L(\cdot \mid x^{\text{obs}})$; in practice the likelihood brings information on $\theta$ only up to a multiplicative constant, hence we could consider the likelihood as an equivalence class of functions rather than a function. Anot...

mathematical statistics - Null distribution of subspaces similarity, or what is the distribution of $\mathrm{tr}(AA'BB')$?

What is the distribution of $\mathrm{tr}(AA'BB')$ where $A$ and $B$ are two random matrices of size $d \times k$ with orthonormal columns? Maybe the expected value is easier to compute? A fallback solution would be to use a simulation. What would be the most effective scheme? Typical values for $d$ would be around 2000, while $k$ ranges from ~10 to a few hundred. Below is a more detailed account of my problem and its context, how I ended up asking this question and what I tried. Context: I want to check if the principal components computed from a s...
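
The simulation fallback mentioned above can be sketched as follows. This is a rough illustration under an assumption the excerpt does not state: $A$ and $B$ are independent and each spans a uniformly random $k$-dimensional subspace, obtained here by Gram-Schmidt on Gaussian vectors. Under that assumption $E[AA'] = (k/d)I$, so the expected value is $k^2/d$. The function names and the small sizes are mine, chosen so the pure-Python version runs quickly.

```python
import random

def random_orthonormal_columns(d, k, rng):
    """Return k orthonormal vectors in R^d via Gram-Schmidt on Gaussian vectors."""
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(2):  # second pass improves numerical orthogonality
            for u in basis:
                dot = sum(a * b for a, b in zip(u, v))
                v = [b - dot * a for a, b in zip(u, v)]
        norm = sum(b * b for b in v) ** 0.5
        basis.append([b / norm for b in v])
    return basis

def subspace_similarity(A, B):
    """tr(AA'BB') computed as the squared Frobenius norm of A'B."""
    return sum(sum(a * b for a, b in zip(u, w)) ** 2 for u in A for w in B)

def simulate(d, k, trials, seed=0):
    """Draw independent pairs (A, B) and return the simulated similarities."""
    rng = random.Random(seed)
    return [
        subspace_similarity(
            random_orthonormal_columns(d, k, rng),
            random_orthonormal_columns(d, k, rng),
        )
        for _ in range(trials)
    ]

values = simulate(d=20, k=3, trials=400)
mean_value = sum(values) / len(values)
print(mean_value, 3 ** 2 / 20)  # simulated mean vs. k^2 / d
```

For the sizes in the question ($d \approx 2000$) a vectorized QR decomposition, for example `numpy.linalg.qr` applied to a Gaussian matrix, would likely be far faster than this loop. Using $\|A'B\|_F^2$ also avoids ever forming the $d \times d$ projectors.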

references - Path to mathematical statistics without analysis background: ideal textbook for self study

I'm fairly mathematically inclined (I had 6 semesters of Math in my undergrad), though I'm a bit out of practice and slow with, say, partial differential equations and path integrals; my concepts come back with a bit of practice. I have not had a course on mathematical proofs (mathematical thinking) or one on analysis. I also understand graduate level probability: I have studied it formally and refreshed my knowledge lately. I also have had a couple of graduate level courses on statistics and statistical learning. I want to, out of personal interest, s...

mathematical statistics - Can CCA model any linear transformation?

I have recently been looking into canonical correlation analysis (CCA) as a way to map between different spaces. As I understand it, CCA maps data from both distinct spaces to a common (possibly lower dimensional) space where they can be compared. It works in a similar way to PCA, choosing the direction from each input space which maximises the correlation between datasets, subject to the chosen directions being uncorrelated. Now, the descriptions I've seen suggest that CCA can learn any linear transformation. However, I can't see how it's poss...

mathematical statistics - How should I evaluate the expectation of the ratio of two random variables?

Let $A$ and $B$ be random variables and $f(A,B)=\frac{A}{B}$. How should I approximate $E(f(A,B))$? I think a Taylor expansion may be in order, but I am not sure how to fire it off in this function. My question comes from a practical problem in survey statistics. It may be discussed in textbooks, but I would not know where. Let a sample of size $n$ be taken from an (infinite) population. Not every sample unit may reply to the survey. Let $S$ indicate response ($S=1$) or non-response ($S=0$). The mean estimator $\hat{\mu}=\frac{1}{\sum{S_i}}\sum{...
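
One common way to carry out the Taylor expansion asked about here is to expand $A/B$ to second order around $(\mu_A, \mu_B)$ and take expectations, which gives $E(A/B) \approx \frac{\mu_A}{\mu_B} - \frac{\mathrm{Cov}(A,B)}{\mu_B^2} + \frac{\mu_A \mathrm{Var}(B)}{\mu_B^3}$. The sketch below compares that approximation with a plain Monte Carlo average. The toy distributions are my own assumption, picked so that $B$ stays well away from zero; they are not the survey setting of the question.

```python
import random

def ratio_mean_taylor(a, b):
    """Second-order Taylor approximation of E[A/B] from sample moments."""
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov_ab = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return mu_a / mu_b - cov_ab / mu_b ** 2 + mu_a * var_b / mu_b ** 3

rng = random.Random(1)
# Hypothetical data: B is uniform on [4, 6]; A is correlated with B.
b = [rng.uniform(4.0, 6.0) for _ in range(100_000)]
a = [2.0 + 0.5 * y + rng.uniform(-1.0, 1.0) for y in b]

taylor = ratio_mean_taylor(a, b)
monte_carlo = sum(x / y for x, y in zip(a, b)) / len(a)
print(taylor, monte_carlo)
```

The approximation is only trustworthy when $B$ has little mass near zero. If $B$ can be close to 0, the expectation of the ratio may not even exist, and the expansion says nothing useful.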

mathematical statistics - Topologies for which the ensemble of probability distributions is complete

I have been struggling quite a bit with reconciling my intuitive understanding of probability distributions with the weird properties that almost all topologies on probability distributions possess. For example, consider a mixture random variable $X_n$: pick a Gaussian centered at 0 with variance 1, and with probability $\frac{1}{n}$, add $n$ to the result. A sequence of such random variables would converge (weakly and in total variation) to a Gaussian centered at 0 with variance 1, but the mean of the $X_n$ is always $1$ and the variances conve...
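
The moments of the mixture example can be worked out in a few lines. The decomposition below is my own notation and assumes that the shift is independent of the Gaussian draw: write $X_n = G + nB_n$ with $G \sim N(0,1)$ and $B_n \sim \mathrm{Bernoulli}(1/n)$.

```latex
\begin{align*}
E[X_n] &= E[G] + n\,E[B_n] = 0 + n \cdot \tfrac{1}{n} = 1, \\
\mathrm{Var}(X_n) &= \mathrm{Var}(G) + n^2\,\mathrm{Var}(B_n)
  = 1 + n^2 \cdot \tfrac{1}{n}\left(1 - \tfrac{1}{n}\right)
  = 1 + (n - 1) = n.
\end{align*}
```

So under this reading the mean stays at 1 and the variance equals $n$, even though the limiting Gaussian has mean 0 and variance 1.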

mathematical statistics - Can KL-Divergence ever be greater than 1?

I've been working on building some test statistics based on the KL-Divergence, \begin{equation}D_{KL}(p \| q) = \sum_i p(i) \log\left(\frac{p(i)}{q(i)}\right),\end{equation} and I ended up with a value of $1.9$ for my distributions. Note that the distributions have support of $140$K levels, so I don't think plotting out the whole distributions would be reasonable here. What I'm wondering is, is it possible to have a KL-Divergence of greater than 1? A lot of the interpretations I've seen of KL-Divergence are based on an upper bound of 1. If it can ...
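
A two-outcome example is enough to check whether the formula above can exceed 1. This is a minimal sketch that assumes natural logarithms (the question does not state the base) and that $q(i) > 0$ wherever $p(i) > 0$; the distributions are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats; terms with p(i) = 0 contribute 0 by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical distributions that put their mass on opposite outcomes.
p = [0.99, 0.01]
q = [0.01, 0.99]

value = kl_divergence(p, q)
print(value)  # equals 0.98 * ln(99), well above 1
```

Making $q$ smaller where $p$ is large pushes the value up without limit, so the KL divergence has no finite upper bound. Bounds of 1 that appear in some discussions usually refer to a different quantity, such as the Jensen-Shannon divergence measured with base-2 logarithms.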

How to show this curious combination of Exponential order statistics has a Chi-squared distribution?

Let $X_1, \ldots, X_n$ be i.i.d. exponentially distributed random variables with density $$\eqalign{\theta^{-1} e^{-x/\theta}, &x \ge 0 \\ 0, &x \lt 0} $$ and let $Y_i = X_{(i)}$ denote the order statistics such that $Y_1 \leq \cdots \leq Y_n$. How to show that $$ 2\frac{\left(\sum_{i=1}^{r}Y_i\right) + (n-r)Y_r}{\theta}$$ has a chi-square distribution with $2r$ degrees of freedom? I wrote the joint density of $(Y_1,Y_2,...,Y_r)$ but nothing became apparent....
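
Before attempting a proof, the claim can be sanity-checked by simulation: a chi-square variable with $2r$ degrees of freedom has mean $2r$ and variance $4r$. The sketch below is only a numerical check, not a derivation, and the values of $n$, $r$ and $\theta$ are arbitrary choices of mine.

```python
import random

def censored_statistic(n, r, theta, rng):
    """2 * (sum of the r smallest values + (n - r) * Y_r) / theta for one sample."""
    # random.expovariate takes the rate, so 1/theta gives mean theta.
    y = sorted(rng.expovariate(1.0 / theta) for _ in range(n))
    return 2.0 * (sum(y[:r]) + (n - r) * y[r - 1]) / theta

rng = random.Random(42)
n, r, theta = 10, 4, 3.0  # hypothetical values for the check
draws = [censored_statistic(n, r, theta, rng) for _ in range(50_000)]

mean = sum(draws) / len(draws)
var = sum((t - mean) ** 2 for t in draws) / len(draws)
print(mean, 2 * r)  # sample mean vs. 2r
print(var, 4 * r)   # sample variance vs. 4r
```

One possible route to a proof is through the normalized spacings $(n-i+1)(Y_i - Y_{i-1})$ with $Y_0 = 0$: their sum over $i \le r$ telescopes to $\sum_{i=1}^{r} Y_i + (n-r)Y_r$, and for exponential samples these spacings are known to be i.i.d. exponential with mean $\theta$.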

mathematical statistics - Random Variable

Three components are randomly sampled, one at a time, from a large lot. As each component is selected, it is tested. If it passes the test, a success (S) occurs; if it fails the test, a failure (F) occurs. Assume that 80% of the components in the lot will succeed in passing the test. Let X represent the number of successes among the three sampled components. What are the possible values for X? And their probabilities?...
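
Because the lot is large, it seems reasonable to treat the three tests as independent with the same success probability 0.8, in which case X is binomial with $n = 3$ and $p = 0.8$. That modelling step is an assumption on my part; under it the probabilities can be tabulated as in this sketch.

```python
from math import comb

def binomial_pmf(n, p):
    """Map each possible count k to P(X = k) for X ~ Binomial(n, p)."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

pmf = binomial_pmf(3, 0.8)
for k, prob in pmf.items():
    print(k, round(prob, 3))
```

The possible values are 0, 1, 2 and 3, with probabilities of about 0.008, 0.096, 0.384 and 0.512, which sum to 1.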