Skip to content

Conversation

codluca
Copy link
Member

@codluca codluca commented Aug 28, 2025

Description

Add Top-N (ORDER BY + LIMIT) pushdown support for the MongoDB connector.

Additional context and related issues

MongoDB considers null values to be less than any other value. We can handle sort items with SortOrder.ASC_NULLS_LAST or SortOrder.DESC_NULLS_FIRST with a MongoDB aggregation pipeline, where we add computed fields to sort correctly.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(X ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Implement feature. ({issue}`26366`)

@cla-bot cla-bot bot added the cla-signed label Aug 28, 2025
@github-actions github-actions bot added the mongodb MongoDB connector label Aug 28, 2025
@codluca codluca requested a review from ebyhr August 28, 2025 07:53
@ebyhr ebyhr requested a review from chenjian2664 August 28, 2025 07:57
@codluca codluca marked this pull request as draft August 29, 2025 10:08
Add Top-N (ORDER BY + LIMIT) pushdown support for the MongoDB connector.
MongoDB considers null values to be less than any other value.
We can handle sort items with SortOrder.ASC_NULLS_LAST or SortOrder.DESC_NULLS_FIRST
with a MongoDB aggregation pipeline, where we add computed fields to sort correctly.
We add sort fields to the projection, so that they are considered by MongoDB.
We add Top-N pushdown support only for simple nested queries (that work on the same collection).
@codluca codluca force-pushed the 26366 branch 2 times, most recently from 8973aee to 0331d72 Compare August 29, 2025 13:27
@codluca codluca marked this pull request as ready for review August 29, 2025 15:57
@codluca
Copy link
Member Author

codluca commented Aug 29, 2025

The pull request is ready for review. Thanks!

OptionalInt.of(toIntExact(limit))),
true,
false));
}

@Override
public Optional<TopNApplicationResult<ConnectorTableHandle>> applyTopN(ConnectorSession session, ConnectorTableHandle table, long topNCount, List<SortItem> sortItems, Map<String, ColumnHandle> assignments)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put each variable on a new line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Fixed.

sortDocument.append(columnName, direction);
}
List<MongoTableSort> tableSortList = handle.sort().orElse(new ArrayList<>());
MongoTableSort tableSort = new MongoTableSort(sortDocument, sortNullFieldsDocument == null ? Optional.empty() : Optional.of(sortNullFieldsDocument), toIntExact(topNCount));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use Optional.ofNullable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if (sortItem.getSortOrder() == SortOrder.ASC_NULLS_LAST || sortItem.getSortOrder() == SortOrder.DESC_NULLS_FIRST) {
String sortColumnName = "_sortNulls_" + columnName;
Document condition = new Document();
condition.append("$cond", List.of(new Document("$eq", new ArrayList<>(Arrays.asList("$" + columnName, null))), 1, 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
condition.append("$cond", List.of(new Document("$eq", new ArrayList<>(Arrays.asList("$" + columnName, null))), 1, 0));
condition.append("$cond", ImmutableList.of(new Document("$eq", Arrays.asList("$" + columnName, null)), 1, 0));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Document sortNullFieldsDocument = null;
for (SortItem sortItem : sortItems) {
String columnName = sortItem.getName();
int direction = (sortItem.getSortOrder() == SortOrder.ASC_NULLS_FIRST || sortItem.getSortOrder() == SortOrder.ASC_NULLS_LAST) ? 1 : -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we give 1 and -1 a meaningful name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

.toList();
builder.append(", orderBy = ").append(sortEntries);
if (sortNullFields.isPresent()) {
List<String> nullFields = sortNullFields.get().keySet().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ImmutableList.copyOf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

for (String sortField : tableSort.sort().keySet()) {
// Sorting on the field does not work unless we add it to the projection
if (!projection.containsKey(sortField)) {
projection.append(sortField, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, it's not obvious for the meaning of 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

public Optional<TopNApplicationResult<ConnectorTableHandle>> applyTopN(
ConnectorSession session,
ConnectorTableHandle table,
long topNCount,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the topNCount is more than range of integer ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check similar to applyLimit method

        // MongoDB doesn't support topN number greater than integer max
        if (topNCount > Integer.MAX_VALUE) {
            return Optional.empty();
        }```

MongoColumnHandle columnHandle = (MongoColumnHandle) columnHandleEntry.getValue();
if (!columnHandleEntry.getKey().equals(columnHandle.baseName())) {
// We don't support complex nested queries
return Optional.empty();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about for dbRef column type ?

Copy link
Member Author

@codluca codluca Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, applyTopN is called only for one table, no JOIN. The dbRef from MongoDB would be a reference to another collection, what we achieve with a JOIN. If it's something else, please clarify.
I wrote a simple test, and the execution doesn't enter the applyTopN method.

     * Test topN dbRef.
     */
    @Test
    public void testJoinTopNDbRef() {
        Session session = Session.builder(getSession())
                .setSystemProperty(IGNORE_STATS_CALCULATOR_FAILURES, "false")
                .build();

        assertQuery(
                session,
                "SELECT n.name, c.name " +
                        "FROM nation n " +
                        "JOIN customer c ON c.nationkey = n.nationkey " +
                        "ORDER BY n.nationkey, c.name LIMIT 2");
    }```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed mongodb MongoDB connector
Development

Successfully merging this pull request may close these issues.

3 participants