Better search result order

It’s about the priority and purpose of the space in search criteria. In natural language it serves as a word adjacency operator (as can be seen in this sentence), whereas starting with Alta Vista and continuing through Yahoo!, Ask Jeeves, and now Google or DuckDuckGo, it has been downgraded to an implicit OR operator. For those of us who work(ed) in text retrieval and on good search systems, the natural-language version is the preferable choice.


[EDITED. I was totally wrong. My tests were coincidences.]

Edited/deleted to make the thread clearer for anyone reading it in the future.

Same — I don’t see any special treatment for adjacency.

That request was a little buried — I overlooked it myself (the sentence primarily asks for it as default). It might be worth making a separate request for it, as this one has gotten a bit muddled.


I’m embarrassed. You’re right. I just had a coincidence that the patterns I tested were the most recent files, and I sort by modified time. Sorry folks!

EDITED MY POST: I think I might be wrong. After more testing, I think I get what you were originally saying, @Calion. Apologies for not getting it right away. I was too focused on the syntax.

There is this old thread about sorting by relevancy: “Sort search results by relevance (and what relevance is)”

But now I see that it IS nearly impossible to search for a note that contains 3 separate words, and only if it contains those 3 words. Quotes only work if the phrase is contiguous. That isn’t about relevancy. It’s difficult just to filter things out.


I’ve been testing Omnisearch to see if it helps, but for example, if I search “rare” and other terms, I get a lot of results for “are”. But the description says it has smarter weighting. So I’m going to spend some time experimenting with the plugin’s settings. It has a lot of options. There may be other search plugins too.


It baffled me before Google why this wasn’t the default, and it baffles me now that it is no longer the default. I understand some fuzziness for misspellings, different word orders, and even the possibility of not having every word in the results, but I completely fail to understand the thinking behind using space as an implicit OR. Does anyone like the results that gives?

@Calion Good idea! So, to be clear, are you suggesting that the results follow this order?

  1. Notes that contain exact matches (if they exist)
  2. Notes that contain each and every word in the search phrase (using the implicit AND)
  3. Notes that contain some or even just one of the words in the search phrase (using the implicit OR)

Additionally, I was wondering if it might be helpful to add another set between 1 and 2. These could be notes that contain all search words (using the implicit AND) that also contain partial exact phrase matches.
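
To make the proposed ordering concrete, here is a minimal sketch of the tiers above (1, the in-between "1.5", 2, and 3). It is only an illustration of the idea, not how Obsidian's search works: the names `tier` and `rank`, the word-level tokenization, and the choice to count any two adjacent query words as a "partial exact phrase match" are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def contains_run(words, run):
    """True if `run` appears as a contiguous sub-list of `words`."""
    n = len(run)
    return any(words[i:i + n] == run for i in range(len(words) - n + 1))

def tier(query, note_text):
    """Return 1, 1.5, 2, or 3 for a matching note, or None for no match."""
    q = tokenize(query)
    words = tokenize(note_text)
    present = [w for w in q if w in words]
    if not present:
        return None                      # none of the words occur
    if contains_run(words, q):
        return 1                         # exact phrase
    if len(present) == len(q):
        # Assumption: "partial phrase" means any two adjacent query words
        # also appear next to each other in the note.
        partial = any(contains_run(words, q[i:i + 2]) for i in range(len(q) - 1))
        return 1.5 if partial else 2     # all words present (implicit AND)
    return 3                             # only some words present (implicit OR)

def rank(query, notes):
    """Order note names by tier (ties broken by name); drop non-matches."""
    scored = [(tier(query, text), name) for name, text in notes.items()]
    return [name for _, name in sorted(s for s in scored if s[0] is not None)]

notes = {
    "book": "She said anything can be everything if you let it.",
    "partial": "Anything can happen, and everything will be fine.",
    "scattered": "Be careful: everything you can imagine, anything goes.",
    "some": "Everything is on sale.",
    "none": "Unrelated note.",
}
print(rank("anything can be everything", notes))
# ['book', 'partial', 'scattered', 'some']
```

In this sketch the tier is the only sort key; a real implementation would presumably still apply the user's chosen sort (modified time, file name, and so on) within each tier.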

Of course, once we start imagining the perfect search algorithm, many rabbit holes open up. But in software such as Obsidian, which doesn’t impose the need for strict organization via folders, the search should be our best friend. I would find the partial phrase match, in conjunction with the suggested order above, especially useful at times when I am querying some phrase I remember writing but cannot recall exactly.

Come to think of it, maybe another set between 2 and 3 could use the implicit OR, but weighted to favor notes with more of the search words and those with the largest partial exact phrase matches.

Anyways, I really appreciate the request and wish it good luck!

Nothing to be embarrassed about. So much going on in Obsidian and so many parameters and variables. Thanks for clarifying. :pray:

That sounds good (including your 1.5). Except if option 3 does not already exist, I am not requesting to add it.


Option 3 does not already exist. Option 2 alone is the current behavior.


So this is my current understanding, and apologies if I’m repeating what you all were saying the whole time: search results are determined only by whether the search expression is true for a note, and are then sorted by your sort settings. (Using the block:(), line:(), or section:() operators can help narrow results down by proximity, but that still doesn’t change the sorting.)

For example, if I search “anything can be everything” (no quotes), I have a book with that exact phrase. My results are all notes that contain ALL those words, which is logically correct. But they do not get sorted by whether they are contiguous or not, or how close in proximity they are. They only get sorted by the sorting option in the search. So that book does not show at the top of my results.

So correct me if I’m wrong: The actual intended feature request here seems to be adding sorting for relevancy. And that discussion probably belongs in that other feature request.

I’m still not sure about Omnisearch. It has some relevancy sorting, but I can still get results that don’t weight by order/proximity inside content. It seems more weighted by title and header.


I think this is right and combining the FRs makes sense, but I’d like to hear if the requesters agree!


As someone who used to work in text retrieval, I disagree, because as I understand it this is a request to treat space as a word adjacency operator, not as an implicit OR. The other suggestions (that word-adjacency hits are sorted before AND-in-the-same-document hits, which come before hits where either but not both operands are found, which come before hits where only one term is found) are not also a request for relevance ranking. Unless one knows and understands the algorithm being used, relevance ranking is rank.
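
For anyone following along, the three readings of a space being discussed in this thread can be sketched as three separate predicates. This is a hedged illustration only: the function names are made up for the example, matching is on whole lowercase words, and none of it is taken from Obsidian or any search engine.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def matches_or(query, text):
    """Space as implicit OR: at least one query word occurs."""
    words = set(tokenize(text))
    return any(w in words for w in tokenize(query))

def matches_and(query, text):
    """Space as implicit AND: every query word occurs somewhere."""
    words = set(tokenize(text))
    return all(w in words for w in tokenize(query))

def matches_adjacent(query, text):
    """Space as adjacency: the query words occur consecutively, in order."""
    q, words = tokenize(query), tokenize(text)
    n = len(q)
    return any(words[i:i + n] == q for i in range(len(words) - n + 1))

query = "search systems"
for text in ("good search systems", "systems that search well", "search only"):
    print(text, matches_or(query, text), matches_and(query, text),
          matches_adjacent(query, text))
```

Each reading is strictly narrower than the one before it: every adjacency hit is also an AND hit, and every AND hit is also an OR hit.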


Again, space is an implicit AND, not an implicit OR.


Maybe, but it should be explicit word adjacency by 1.

I think either way works. I am nothing like an expert in this particular field; I just want my searches to bring up the results I want at the top. I think my basic preference is, as @glimfeather said, “to treat space as a word adjacency operator not as an implicit [AND].”

But that’s just a knee-jerk, uninformed bias. I’d have to see the results of this approach vs. changing the sort order of results to have a meaningful opinion. Could anyone lay out a bit more clearly what the different results might look like?

Of course, I assume changing the sort order is a vastly simpler solution to implement than recoding the search function.

OK, I’ve changed the post title to hopefully reduce misunderstanding if anyone new comes in (for posterity: was “Please use default AND in search”).


Specificity is better, though I, too, have had trouble figuring out exactly what the problem is here.

Another attempt: results should rank matches higher when they contain the search terms in closer proximity. Does that catch it?
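
One possible way to pin down "closer proximity" is to score each note by the shortest run of words that contains every search term, and sort the smallest spans first. The sketch below assumes that definition; `min_window` and `rank_by_proximity` are hypothetical names, word order is ignored, and this is not a description of what Obsidian or Omnisearch does.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def min_window(query, text):
    """Length in words of the shortest span of `text` that contains every
    query word, or None if some query word is missing."""
    needed = set(tokenize(query))
    words = tokenize(text)
    counts = Counter()   # query words currently inside the window
    best = None
    left = 0
    for right, word in enumerate(words):
        if word in needed:
            counts[word] += 1
        # While the window covers all query words, record it and shrink it.
        while needed and len(counts) == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = words[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    del counts[dropped]
            left += 1
    return best

def rank_by_proximity(query, notes):
    """Note names that contain all query words, tightest span first."""
    scored = {name: min_window(query, text) for name, text in notes.items()}
    hits = [name for name in scored if scored[name] is not None]
    return sorted(hits, key=lambda name: (scored[name], name))

notes = {
    "far": "anything at all can sometimes be nearly everything",
    "near": "she wrote that anything can be everything",
    "miss": "anything can be",
}
print(rank_by_proximity("anything can be everything", notes))
# ['near', 'far']
```

Because only the span length is measured, a note with the words in a different order but right next to each other scores the same as the exact phrase; an adjacency-first ordering would need an extra check for that.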


Yeah, I went with “vague but accurate” as a stopgap. The original title was based on a misunderstanding.

I really think a new feature request should be posted once the discussion settles (or maybe before). This one, you have to read too many comments to understand.

That’s probably closer, tho I’m not sure the request has fully solidified.
