retrieve topic-word array & document-topic array from lda gensim

Situation: I have a NumPy term-document matrix, for example [[0,1,0,0....],....[......0,0,0,0]]. I plugged this matrix into gensim's LdaModel, and it works fine: lda = LdaModel(corpus, num_topics=10), where corpus is the term-document matrix mentioned above. For research purposes I need two intermediate matrices (the topic-word array and the document-topic array): 1) the per-document topic probability matrix (p_d_t) 2) the per-topic word probability matrix (p_w_t). Question: How do I get those arrays from the gensim LdaMod...

LDA topic modeling - Training and testing

I have read about LDA and I understand the mathematics of how the topics are generated when one inputs a collection of documents. References say that LDA is an algorithm which, given a collection of documents and nothing more (no supervision needed), can uncover the “topics” expressed by documents in that collection. Thus, by using the LDA algorithm and a Gibbs sampler (or variational Bayes), I can input a set of documents and get the topics as output. Each topic is a set of terms with assigned probabilities. What I don't understand is, if the above...

lda - Latent Dirichlet Allocation Solution Example

I am trying to learn about Latent Dirichlet Allocation (LDA). I have basic knowledge of machine learning and probability theory, and based on this blog post http://goo.gl/ccPvE I was able to develop the intuition behind LDA. However, I still haven't got a complete understanding of the various calculations that go into it. Can someone show me the calculations using a very small corpus (let's say 3-5 sentences and 2-3 topics)....

lda - how to predict topics for a batch of documents with mallet

I am using MALLET from a Scala project. After training the topic model and getting the inferencer file, I tried to assign topics to new texts. The problem is that I get different results with different calling methods. Here are the things I tried: creating a new InstanceList, ingesting just one document, and getting the topic results from the InstanceList:
somecontentList.map(text=>getTopics(text, model))
def getTopics(text:String, inferencer: TopicInferencer):Array[Double]={
val testing = new InstanceList(pipe)
testing.addThruPipe(new Instance(text, n...

How to infer the topic distribution of a new document with LDA/pLSA?

I have a question about topic models like pLSA/LDA: how do we infer the topic distribution of a new document once we have the word distribution for each topic? I have tried "fold-in" Gibbs sampling with LDA, but when the unseen document is very short this method doesn't work, because of the randomness in assigning a topic to each word in the document. For example, consider a model with two topics and a token w with p(w|z1) = 0.09 and p(w|z2) = 0.01. Then for a document which contains only the one word w, its p(z|...
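
For the two-topic numbers above, the exact posterior over topics for a single observed word can be computed directly with Bayes' rule instead of being sampled. The sketch below is only an illustration: it assumes a uniform prior over topics unless one is supplied, and the function name `topic_posterior` is hypothetical.

```python
def topic_posterior(word_likelihoods, topic_prior=None):
    """Return p(z | w) for a single observed word via Bayes' rule.

    word_likelihoods[k] is p(w | z_k); topic_prior[k] is p(z_k).
    A uniform prior is assumed when topic_prior is None.
    """
    k = len(word_likelihoods)
    if topic_prior is None:
        topic_prior = [1.0 / k] * k
    joint = [lik * pri for lik, pri in zip(word_likelihoods, topic_prior)]
    total = sum(joint)
    return [j / total for j in joint]


# p(w|z1) = 0.09 and p(w|z2) = 0.01, as in the example above.
posterior = topic_posterior([0.09, 0.01])
print(posterior)  # approximately [0.9, 0.1]
```

A single Gibbs draw for this word picks one topic outright, which is why short documents look noisy; averaging many draws, or using the closed-form posterior as here, recovers the smoother estimate.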

lda - raise_FirstSetError in SpaCy topic modeling

I want to create an LDA topic model and am using spaCy to do so, following a tutorial. The error I receive when I try to use spaCy is one I cannot find on Google, so I'm hoping someone here knows what it's about. I'm running this code on Anaconda:
import numpy as np
import pandas as pd
import re, nltk, spacy, gensim
# Sklearn
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from pprint import pprint
# Plott...

topic modeling - Infer LDA models

I'm new to LDA and topic modeling and I would like to understand the inference mechanism. I would like to apply LDA to activity recognition. Say that I have defined 10 topics, each a probability distribution over events, for example TOPIC_1 = event1 (0.5), event2 (0.4), event3 (0.0), event4 (0.0) and event5 (0.1). I would like to understand which topics are active across a person's day. One day of a person is composed of a sequence of events sampled every minute. What I'm doing to see which topic is active is: 1) select 1 hour window in the dai...

lda - MALLET Ranking of Words in a topic

I am relatively new to MALLET and need to know:
- are the words in each topic that MALLET produces rank-ordered in some way?
- if so, what is the ordering (i.e., is the first word in a topic list the one with the highest distribution across the corpus)?
Thanks!...

Lda using mallet

I ran the file SimpleLDA.java and got: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0 at cc.mallet.topics.SimpleLDA.main(SimpleLDA.java:560)...

topic modeling - Why do we need the hyperparameters beta and alpha in LDA?

I'm trying to understand the technical part of Latent Dirichlet Allocation (LDA), but I have a few questions on my mind. First: Why do we need to add alpha and gamma every time we sample the equation below? What if we delete the alpha and gamma from the equation? Would it still be possible to get the result? Second: In LDA, we randomly assign a topic to every word in the document. Then, we try to optimize the topic by observing the data. Where is the part which is related to posterior inference in the equation above?...
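
The equation referred to in this question is not reproduced in the excerpt. As a rough illustration of what the hyperparameters do, the sketch below assumes the commonly used collapsed Gibbs sampling weight with symmetric priors, (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where the second prior (called gamma in the question body and beta in its title) smooths the topic-word counts. The function name and all counts are hypothetical.

```python
def topic_weights(doc_topic_counts, topic_word_counts, topic_totals,
                  vocab_size, alpha, beta):
    """Unnormalized sampling weight of each topic for one token.

    Assumed form: (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
    All counts must already exclude the token being resampled.
    """
    weights = []
    for n_dk, n_kw, n_k in zip(doc_topic_counts, topic_word_counts, topic_totals):
        weights.append((n_dk + alpha) * (n_kw + beta) / (n_k + vocab_size * beta))
    return weights


# Hypothetical counts: topic 0 is used in this document but has never emitted
# this word; topic 1 has emitted the word but is unused in this document.
counts = dict(doc_topic_counts=[3, 0], topic_word_counts=[0, 5],
              topic_totals=[10, 10], vocab_size=20)

smoothed = topic_weights(alpha=0.1, beta=0.01, **counts)
unsmoothed = topic_weights(alpha=0.0, beta=0.0, **counts)

print(smoothed)    # both weights are strictly positive
print(unsmoothed)  # [0.0, 0.0]: no topic can be sampled at all
```

Under this assumed form, dropping the two priors lets any zero count veto a topic permanently, so the sampler can get stuck or end up with nothing to normalize. The priors keep every topic reachable.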

lda - Topic models evaluation in Gensim

I've been experimenting with LDA topic modelling using Gensim. I couldn't find any topic model evaluation facility in Gensim that could report the perplexity of a topic model on held-out evaluation texts and thus facilitate subsequent fine-tuning of LDA parameters (e.g. the number of topics). It would be greatly appreciated if anyone could shed some light on how I can perform topic model evaluation in Gensim. This question has also been posted on metaoptimize....

lda - topic proportions in my corpus?

Thanks for reading and taking the time to think about and respond to this. I am using Gensim's wrapper for Mallet (ldamallet.py), and it works like a charm. I need to get the topic proportions for my corpus (over all my documents) and I do not know how to do that. model.alpha is not it, as it is not normalized to 1. Plus, alpha contains my Dirichlet parameters, and not the topic proportions. Am I correct? Any help is much appreciated....
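
One common reading of "topic proportions over the whole corpus" is the average of the per-document topic distributions, optionally weighted by document length. The sketch below assumes those per-document distributions have already been obtained from the model by whatever means; the function name and the numbers are hypothetical.

```python
def corpus_topic_proportions(doc_topic, doc_lengths=None):
    """Average per-document topic distributions into corpus-level proportions.

    doc_topic[d][k] is the share of topic k in document d (each row sums to 1).
    doc_lengths[d] is the token count of document d; when omitted, every
    document gets equal weight.
    """
    num_docs = len(doc_topic)
    num_topics = len(doc_topic[0])
    if doc_lengths is None:
        doc_lengths = [1.0] * num_docs
    total = float(sum(doc_lengths))
    proportions = [0.0] * num_topics
    for row, length in zip(doc_topic, doc_lengths):
        for k, p in enumerate(row):
            proportions[k] += p * length / total
    return proportions


# Hypothetical per-document topic distributions and document lengths.
doc_topic = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
doc_lengths = [100, 50, 50]

weighted = corpus_topic_proportions(doc_topic, doc_lengths)
unweighted = corpus_topic_proportions(doc_topic)
print(weighted)    # approximately [0.55, 0.45]
print(unweighted)  # approximately [0.467, 0.533]
```

The length-weighted version approximates the share of tokens assigned to each topic, while the unweighted version treats every document as equally important. Which one is appropriate depends on the research question.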