Message search: substring matching #1437

@jhwheeler

Problem

Stream Chat's message search seems to lack support for continuous substring/text matching. The current operators either match individual tokenized words ($q) or require exact full string matches ($eq). Unless I'm missing something -- which I hope I am!

Current Behavior

When searching for the phrase "a b", using text: { $q: "a b" }, we get these results:

  • "a b" (good match)
  • "a b c" (good match)
  • "a" (unwanted match)
  • "a message" (unwanted match)
  • "in a row" (unwanted match)

Even wrapping in quotes doesn't help:

text: { $q: '"a b"' }

Desired Behavior

We need to match messages where the search string appears as a continuous substring. For example, searching for "a b" should match:

"a b"
"a b c"
"hello a b there"

But NOT match:

"a thing b"
"this is a message with b"

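The desired behavior above can be pinned down as a small predicate. This is only a sketch of the matching rule being requested, not anything Stream Chat provides: the name `containsPhrase` is hypothetical, and case-insensitive comparison is an assumption the issue does not state.

```typescript
// Hypothetical helper: true when `searchTerm` appears in `messageText`
// as one continuous run of characters. Case-insensitivity is an assumption.
function containsPhrase(messageText: string, searchTerm: string): boolean {
  const needle = searchTerm.trim().toLowerCase();
  if (needle.length === 0) return false; // an empty search matches nothing
  return messageText.toLowerCase().includes(needle);
}

const samples: string[] = [
  "a b",
  "a b c",
  "hello a b there",
  "a thing b",
  "this is a message with b",
];

// Keeps the first three samples and rejects the last two.
const matches = samples.filter((text) => containsPhrase(text, "a b"));
console.log(matches);
```

The ask in this issue is for the server to apply a rule like this before paginating, rather than the client applying it afterwards.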
Why Client-Side Filtering Isn't an Ideal Solution

While results could be filtered client-side, this breaks pagination:

  1. User searches for "a b"
  2. Server returns first page of 100 results, many without the exact phrase
  3. After client-side filtering, might only have 20 actual matches
  4. When user scrolls, we need multiple page requests to get enough true matches
  5. Results in excessive API calls and inaccurate result counts
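The steps above can be simulated to show the cost. This is a sketch under stated assumptions: `makeFakeSearch` is a hypothetical in-memory stand-in for a paginated search endpoint (it is not the Stream Chat client), it is synchronous for brevity where a real call would be async, and the 20% hit rate simply mirrors the "20 of 100" example.

```typescript
type Message = { id: number; text: string };
type Page = { results: Message[]; next: number | undefined };

// Hypothetical stand-in for a paginated search endpoint; counts "requests".
function makeFakeSearch(all: Message[], pageSize: number) {
  let calls = 0;
  const fetchPage = (offset: number): Page => {
    calls += 1;
    const end = offset + pageSize;
    return { results: all.slice(offset, end), next: end < all.length ? end : undefined };
  };
  return { fetchPage, callCount: () => calls };
}

// Keep requesting pages until `wanted` true matches are collected
// or the server has no more results.
function collectMatches(
  fetchPage: (offset: number) => Page,
  searchTerm: string,
  wanted: number,
): Message[] {
  const needle = searchTerm.toLowerCase();
  const collected: Message[] = [];
  let offset: number | undefined = 0;
  while (offset !== undefined && collected.length < wanted) {
    const page: Page = fetchPage(offset);
    collected.push(...page.results.filter((m) => m.text.toLowerCase().includes(needle)));
    offset = page.next;
  }
  return collected.slice(0, wanted);
}

// 500 loosely matching results; every fifth one contains the phrase.
const corpus: Message[] = Array.from({ length: 500 }, (_, id) => ({
  id,
  text: id % 5 === 0 ? "hello a b there" : "a message",
}));
const search = makeFakeSearch(corpus, 100);
const found = collectMatches(search.fetchPage, "a b", 100);
console.log(found.length, search.callCount()); // 100 5
```

In this simulation, filling one page of 100 true matches takes five requests instead of one, and the client still cannot report a total match count without fetching everything.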

Current Implementation

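import type { DefaultGenerics, MessageFilters, SearchMessageSort } from 'stream-chat';

// Wrap the term in double quotes in an attempt to force phrase matching.
const messageFilter: MessageFilters = {
  text: { $q: `"${searchTerm}"` }
};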

const sort: SearchMessageSort<DefaultGenerics> = [{ created_at: -1 }];

const results = await chatClient.search(
  channelFilter,
  messageFilter,
  {
    sort,
    limit: PAGE_SIZE
  }
);

(I added the quotes around the searchTerm in order to try to force the substring matching, but to no avail.)

Proposal

Hopefully, there is a way to do this already, and I simply haven't been able to figure it out. If so, please disabuse me of my ignorance and let me know how to do this with Stream Chat's existing capabilities.

If not, then I propose we add support for MongoDB's $regex operator for message search. This would allow for flexible pattern matching including continuous substrings:

text: { $regex: "a b" }

This is a well-established pattern that would solve the continuous text matching problem while leveraging existing MongoDB functionality. It would also provide additional flexibility for other search patterns when needed.
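If something like this were supported, user input would need to be escaped so that characters such as `.` or `(` are matched literally. The sketch below is hypothetical: `$regex` on `text` is the operator proposed here, not one message search is known to accept, and `buildProposedFilter` is an illustrative name.

```typescript
// Escape regular-expression metacharacters so the input is matched literally.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// HYPOTHETICAL filter shape: `$regex` is only the proposal in this issue.
function buildProposedFilter(searchTerm: string): { text: { $regex: string } } {
  return { text: { $regex: escapeRegExp(searchTerm) } };
}

const proposedFilter = buildProposedFilter("a.b (c)");
console.log(proposedFilter.text.$regex);

// Local check of the escaped pattern using JavaScript's own RegExp engine.
const pattern = new RegExp(proposedFilter.text.$regex);
console.log(pattern.test("see a.b (c) here")); // true
console.log(pattern.test("see axb (c) here")); // false
```

Escaping on the client keeps a plain phrase search from being interpreted as a pattern; whether a server-side engine would use the same regex syntax is an open assumption.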

Thank you!
