Skip to content

Join operation with na_matches="na" returns invalid query for Redshift #1521

@krystian8207

Description

@krystian8207

Defining joins for Redshift (e.g. using left_join) with na_matches="na" argument returns query that includes IS NOT DISTINCT FROM statement which is not supported by the driver.

db <- DBI::dbConnect(
  RPostgres::Redshift(),
  dbname   = Sys.getenv("REDSHIFT_DBNAME"),
  host     = Sys.getenv("REDSHIFT_HOST"),
  port     = Sys.getenv("REDSHIFT_PORT"),
  user     = Sys.getenv("REDSHIFT_USERNAME"),
  password = Sys.getenv("REDSHIFT_PASSWORD")
)
my_tbl <- dplyr::tbl(db, "mytbl")
join_query <- left_join(my_tbl, my_tbl, na_matches = "na")

join_query |> dplyr::show_query()
#> <SQL>
#>   SELECT "mytbl_LHS".*
#>   FROM "mytbl" AS "mytbl_LHS"
#> LEFT JOIN "mytbl" AS "mytbl_RHS"
#> ON (
#>   "mytbl_LHS"."id" IS NOT DISTINCT FROM "mytbl_RHS"."id" 
#> )

Trying to collect the data responds with the following error:

dplyr::collect(join_query)
#> Error in `dplyr::collect()`:
#> ! Failed to collect lazy table.
#> Caused by error:
#> ! Failed to prepare query: ERROR:  syntax error at or near "DISTINCT"
#> LINE 5:     "mytbl_LHS"."id" IS NOT DISTINCT FROM "mytbl...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions