search - Elasticsearch ngrams: why is the shorter token matched instead of the longer one?

I have an index with the following mapping and analyzer:

    settings: {
      analysis: {
        char_filter: {
          custom_cleaner: {
            # remove - and * (we don't want them here)
            type: "mapping",
            mappings: ["-=>", "*=>"]
          }
        },
        analyzer: {
          custom_ngram: {
            tokenizer: "standard",
            filter: ["lowercase", "custom_ngram_filter"],
            char_filter: ["custom_cleaner"]
          }
        },
        filter: {
          custom_ngram_filter: {
            type: "nGram",
            min_gram: 3, ...
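A minimal sketch of what an nGram token filter with min_gram: 3 emits (an approximation of Elasticsearch's behavior, not its implementation): for every start offset it produces each substring with a length in the configured range, so shorter grams like "for" are just as much tokens as the full word.

```python
def ngram_tokens(text, min_gram=3, max_gram=5):
    """Approximate an nGram token filter: for every start offset,
    emit each substring of length min_gram..max_gram."""
    text = text.lower()
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end <= len(text):
                tokens.append(text[start:end])
    return tokens
```

Because the field value "ford" is indexed as "for", "ford", and "ord", a query analyzed the same way can hit any of those grams, which is how a shorter token ends up matching (and possibly scoring) instead of the longer one.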

search - nGram partial matching & limiting nGram results in multiple field query

Background: I've implemented a partial search on a name field by indexing the tokenized name (name field) as well as a trigram-analyzed name (ngram field). I've boosted the name field so that exact token matches bubble up to the top of the results.

Problem: I am trying to implement a query that limits the nGram matches to ones that match at least some threshold (say 80%) of the query string. I understand that minimum_should_match seems to be what I am looking for, but my problem is forming the query to actually produce those results.

My exact token...
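The 80% threshold described here can be reasoned about as the fraction of the query's trigrams that also occur in the candidate name, which is roughly what minimum_should_match: "80%" enforces over the analyzed query tokens. A standalone sketch with hypothetical helper names:

```python
def trigrams(s):
    """Character trigrams of a lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_match_ratio(query, name):
    """Fraction of the query's trigrams found in the candidate name."""
    q = trigrams(query)
    if not q:
        return 0.0
    return len(q & trigrams(name)) / len(q)

def passes_threshold(query, name, threshold=0.8):
    return trigram_match_ratio(query, name) >= threshold
```

For example, "fordddd" against "ford" shares only 2 of its 4 distinct trigrams (ratio 0.5), so it fails an 80% threshold even though some grams overlap.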

nlp - social media search engine question

I came across this site called Social Mention and am curious about how applications like this work; hopefully somebody can offer some glimpses/suggestions on this.

1. Upon looking at the search results, I realize that they grab results from Facebook, Twitter, Google... I suppose this is done on the fly, probably through some REST API exposed by the sites mentioned?
2. If what I mention in point 1 is true, does that mean sentiment analysis on the returned documents/links is done on the fly too? Wouldn't that be too computationally intensive?

I am ...

Create an EdgeNGram analyzer supporting both sides in Azure Search

When defining a custom analyzer for Azure Search there is an option of defining a token filter from this list. I am trying to support search of both prefix and infix. For example: if a field contains the name 123 456, I want the searchable terms to contain:

    1, 12, 123, 23, 3, 4, 45, 456, 56, 6

When using the EdgeNGramTokenFilterV2, which seems to do the trick, there is an option of defining a "side" property, but only "front" and "back" are supported, not both. The "front" (default) value generates this list:

    1, 12, 123, 4, 45, 456

and "back" generates:

    123, 23, 3, 456, 56, 6

I tried using tw...
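The union the question asks for can be sketched in plain Python: generate the "front" and "back" edge n-grams per token separately and merge them, which is the effect of running two edge-n-gram filters (one per side) over the same token. Function names are illustrative, not part of any Azure Search API.

```python
def edge_ngrams(token, min_gram=1, max_gram=3, side="front"):
    """Edge n-grams of one token, taken from the front or the back."""
    token = token.lower()
    return [token[:n] if side == "front" else token[-n:]
            for n in range(min_gram, min(max_gram, len(token)) + 1)]

def both_sides(token, min_gram=1, max_gram=3):
    """Union of front and back edge n-grams, deduplicated."""
    return sorted(set(edge_ngrams(token, min_gram, max_gram, "front"))
                  | set(edge_ngrams(token, min_gram, max_gram, "back")))
```

For the tokens "123" and "456" this reproduces exactly the combined term list from the question.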

search - ElasticSearch Highlighting fails on match queries against index with Ngram Analyzer

I have created an index with an ngram analyzer set on all fields in the index and a custom _all. After indexing a few documents, I am trying to query against the index to have a suggestion-like feature. The output of the query does return results, but they are not highlighted.

Analyzer settings:

    "analysis": {
      "analyzer": {
        "my_edgegram_analyzer": {
          "filter": ["lowercase"],
          "tokenizer": "my_edge_tokenizer"
        }
      },
      "tokenizer": {
        "my_edge_tokenizer": {
          "token_chars": ["letter", "digit", "punctuation...
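Highlighting depends on the highlighter mapping matched terms back to offsets in the original text, which is exactly what gets fragile when the indexed tokens are n-grams rather than whole words. As a toy illustration of the idea (not the Elasticsearch implementation), here is a sketch that wraps matched spans of the source string in highlight tags:

```python
import re

def highlight(text, fragment, pre="<em>", post="</em>"):
    """Wrap case-insensitive occurrences of `fragment` in highlight tags,
    preserving the original casing of the matched span."""
    pattern = re.compile(re.escape(fragment), re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)
```

In real Elasticsearch the equivalent offset information comes from term vectors or reanalysis of the stored field, which is why analyzer configuration affects whether highlights appear at all.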

search - How to use ngrams matching with Solr

I am learning Solr and want to use ngrams. For example: if a document contains "new york car driver", that document should not be returned for the following queries:

    /select?q=york
    /select?q=new
    /select?q=new car

but it should be returned for the following queries:

    /select?q=new york
    /select?q=car
    /select?q=driver
    /select?q=car driver

(It should consider "New York" as a single word for better results.) There are word sequences that need to be considered as a single word, e.g. "New York", "Tom Cruise", etc. These words are predefined; all other words should be treated as normal ...
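The behavior being asked for amounts to a tokenizer that greedily keeps a predefined list of multi-word phrases together as single tokens while splitting everything else on whitespace. A minimal sketch, independent of Solr (all names hypothetical):

```python
def tokenize_with_phrases(text, phrases):
    """Greedy tokenizer: longest predefined phrase wins, everything
    else falls back to single lowercased words."""
    words = text.lower().split()
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    tokens, i = [], 0
    while i < len(words):
        # try the longest possible phrase starting at position i
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in phrase_set:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens
```

With this tokenization, "new york car driver" yields the tokens ["new york", "car", "driver"], so a query for "york" alone finds no token to match.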

elasticsearch - Elastic Search Java API Multi match query prefix query on tokens

I want to search my index with NativeSearchQueryBuilder from the Elasticsearch Java API, with the following setup.

Index details:
- Filter type: EdgeNgram
- Whitespace tokenizer

I am looking for autocomplete functionality, so I want to apply the search keyword to multiple fields, but it should be applied as a prefix to improve performance. I also want the results to be returned as soon as they reach my specified page limit, instead of the search continuing through the index even after it has found enough results. E...
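The two requirements, prefix matching across several fields and stopping once enough hits are collected, can be sketched independently of the Java API (the helper and its parameters are illustrative only, not an Elasticsearch call):

```python
def prefix_search(docs, fields, term, limit):
    """Collect documents where any listed field starts with `term`;
    stop scanning as soon as `limit` hits are found (the
    early-termination idea behind a per-shard result cap)."""
    term = term.lower()
    hits = []
    for doc in docs:
        if any(str(doc.get(f, "")).lower().startswith(term) for f in fields):
            hits.append(doc)
            if len(hits) >= limit:
                break
    return hits
```

The break is the important part: once the page limit is reached, no further documents are examined, which is the performance property the question is after.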

search - Elasticsearch Failed to find analyzer but it creates index without any error

Here is my index settings JSON. When I test

    http://localhost:9200/myIndex/_analyze?text="testing the analyzer"&analyzer=nGram_analyzer

I am getting the following exception:

    {
      "error": {
        "root_cause": [
          {
            "type": "remote_transport_exception",
            "reason": "[Infectia][127.0.0.1:9300][indices:admin/analyze[s]]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to find analyzer [nGram_analyzer]"
      },
      "status": 400
    }

Index settings:

    {
      "myIndex": {
        "mappings": {
          "practices": { ...
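A common cause of this error is that the analyzer is referenced in the mappings but never actually defined under settings.analysis.analyzer (for instance because the settings block was nested in the wrong place), and index creation does not always validate that. A small sketch of a sanity check one could run over the index body before creating it (the function is hypothetical, not part of any client library):

```python
def undefined_analyzers(index_body):
    """Return analyzer names referenced in the mappings that are not
    defined under settings.analysis.analyzer."""
    defined = set(index_body.get("settings", {})
                            .get("analysis", {})
                            .get("analyzer", {}))
    referenced = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ("analyzer", "search_analyzer") and isinstance(value, str):
                    referenced.add(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(index_body.get("mappings", {}))
    return referenced - defined
```

If this returns a non-empty set, the _analyze call for those names will fail exactly as in the question, even though the index itself was created without error.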

search - ElasticSearch Issue With Matching Results

I have an issue where, if 'ford' is in the database and I search for 'fordddddddd', it returns a match. I have ngrams for partial matching of queries like 'fo', 'for', 'ford', but 'fordddddd' should not match. What could be the issue? Below are my settings, mappings, and query.

Settings:

    settings: {
      number_of_shards: 1,
      analysis: {
        filter: {
          ngram_filter: {
            type: 'edge_ngram',
            min_gram: 2,
            max_gram: 15
          }
        },
        analyzer: {
          ngram_analyzer: {
            type: 'custom', ...
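A plausible explanation, sketched below: if the same edge_ngram analyzer is applied at search time, the query 'fordddddd' is itself chopped into prefixes ('fo', 'for', 'ford', ...), and any gram shared with the indexed term produces a hit. The helpers are an approximation of that behavior, not Elasticsearch code:

```python
def edge_grams(token, min_gram=2, max_gram=15):
    """Front edge n-grams of a token, as an edge_ngram filter would emit."""
    token = token.lower()
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}

def shares_gram(query, indexed_term):
    # With the edge_ngram analyzer also active at search time, any
    # shared prefix gram is enough to match -- hence the false positive.
    return bool(edge_grams(query) & edge_grams(indexed_term))
```

The usual fix is to set a plain search_analyzer (without the n-gram filter) so the query is matched as the whole term 'fordddddd', which is not among the grams indexed for 'ford'.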

search - Higher score for first word in ElasticSearch

Right now my search gives me unwanted results when I search, say, for "egg". I get the following:

    _score: 2.7645843
    _source:
      django_id: "18003"
      text: "Bagels, egg"
      content_auto: "Bagels, egg"
      django_ct: "web.fooddes"
      allergies: []
      outdated: false
      id: "web.fooddes.18003"
    _explanation:
      value: 2.7645843
      description: "weight(_all:egg in 516) [PerFieldSimilarity], result of:"
      details:
      - value: 2.7645843
        description: "fieldWeight in 516, product of:"
        details:
        - value: 1.4142135
          description: "tf(freq=2.0), with freq of:"
          details: ...
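The ranking being asked for, scoring a match higher when the term is the first word of the field, can be sketched as a simple position-based boost. This is a hypothetical scoring scheme for illustration, not how Elasticsearch computes the _score above (there, one would typically use a separate first-token field or a span_first query):

```python
def position_boost(field_value, term, base_score=1.0):
    """Score a term match higher the earlier it appears in the field:
    position 0 doubles the base score, later positions add less."""
    tokens = [t.strip(",").lower() for t in field_value.split()]
    term = term.lower()
    if term not in tokens:
        return 0.0
    pos = tokens.index(term)
    return base_score * (1.0 + 1.0 / (1 + pos))
```

Under this scheme "Egg bagels" outranks "Bagels, egg" for the query "egg", which is the ordering the question wants.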

Twitter like search for users using Elasticsearch and python

I am trying to build a Twitter-like search for users using Elasticsearch and Python, i.e. a search across first_name, last_name, and username. I have decided to go with ngrams. This is how the analyzer is configured:

    settings = {
      "analysis": {
        "analyzer": {
          "ngram_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "asciifolding", "mynGram"]
          }
        },
        "f...
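The matching the analyzer enables can be sketched without a cluster: build character n-grams per field, score each user by the best overlap between the query's grams and any of the three fields. A toy model of the retrieval idea (hypothetical helper names, not the elasticsearch-py API):

```python
def char_ngrams(text, min_gram=2, max_gram=3):
    """Character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(text) - n + 1)}

def score_user(user, query, fields=("first_name", "last_name", "username")):
    """Best per-field overlap between query grams and field grams."""
    q = char_ngrams(query)
    if not q:
        return 0.0
    return max(len(q & char_ngrams(str(user.get(f, "")))) / len(q)
               for f in fields)
```

Taking the max over fields mirrors a multi_match query in best_fields mode: a user whose username fully contains the query's grams scores 1.0 regardless of the other fields.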