-
When I run SQL against a parquet file whose column names contain both upper- and lowercase letters, I have to wrap those names in double quotation marks.
from datafusion import SessionContext
ctx = SessionContext()
ctx.register_parquet('a', 'a.parquet')
df = ctx.sql('select * from a')
df
DataFrame()
+------+-----+
| User | Age |
+------+-----+
| Tom | 12 |
| Jack | 30 |
+------+-----+
df = ctx.sql('select "Age" from a')
df
DataFrame()
+-----+
| Age |
+-----+
| 12 |
| 30 |
+-----+
If I don't use double quotation marks, it reports the following error.
>>> df = ctx.sql('select Age from a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: DataFusion error: SchemaError(FieldNotFound { field: Column { relation: None, name: "age" }, valid_fields: [Column { relation: Some(Bare { table: "a" }), name: "User" }, Column { relation: Some(Bare { table: "a" }), name: "Age" }] }, Some(""))
Sometimes it's difficult to write SQL when there are many such field names.
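When many mixed-case columns are involved, one workaround is to generate the quoted column list instead of typing it by hand. The following is a minimal sketch, not part of the datafusion API: `quote_ident` and `select_columns` are hypothetical helper names, and the doubling of embedded quotes assumes standard SQL identifier quoting.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_columns(table: str, columns: Iterable[str]) -> str:
    """Build a SELECT statement whose identifiers keep their exact case."""
    column_list = ", ".join(quote_ident(c) for c in columns)
    return f"select {column_list} from {quote_ident(table)}"


# Hypothetical usage with the table and columns from the question.
query = select_columns("a", ["User", "Age"])
print(query)  # select "User", "Age" from "a"
```

The resulting string can then be passed to ctx.sql(query) in place of a hand-written statement.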
Replies: 4 comments 4 replies
-
I think you can run the SQL statement: set datafusion.sql_parser.enable_ident_normalization = false;
-
Thanks, could you tell me how to set it in the Python CLI?
-
It seems to have no effect after setting it:
>>> import datafusion
>>> sc=datafusion.SessionConfig(config_options=None)
>>> sc.set("datafusion.sql_parser.enable_ident_normalization" , "false")
<datafusion.SessionConfig object at 0x3000c15ec0>
>>> from datafusion import SessionContext
>>> ctx = SessionContext()
>>> df = ctx.sql('create table x as select 1 "A"')
>>> df2 = ctx.sql('select * from x')
>>> df2
DataFrame()
+---+
| A |
+---+
| 1 |
+---+
>>> df2 = ctx.sql('select a from x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: DataFusion error: SchemaError(FieldNotFound { field: Column { relation: None, name: "a" }, valid_fields: [Column { relation: Some(Bare { table: "x" }), name: "A" }] }, Some(""))
-
I passed sc to SessionContext, but it still doesn't work:
>>> sc.set("datafusion.sql_parser.enable_ident_normalization" , "false")
<datafusion.SessionConfig object at 0x300078bf10>
>>> ctx = SessionContext(sc)
>>> df = ctx.sql('create table x as select 1 "A"')
>>> df2 = ctx.sql('select a from x')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: DataFusion error: SchemaError(FieldNotFound { field: Column { relation: None, name: "a" }, valid_fields: [Column { relation: Some(Bare { table: "x" }), name: "A" }] }, Some(""))
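A possible explanation for the two transcripts above, offered as an assumption rather than a confirmed diagnosis: sc.set(...) prints a SessionConfig object, which suggests it returns a config instead of (or in addition to) modifying sc, and that return value is discarded in both attempts. Separately, turning normalization off only stops the parser from lowercasing unquoted names, so select a would still not match a column named A, while select A should. The sketch below keeps the returned config and passes it to SessionContext; `make_case_preserving_context` and `demo` are hypothetical names, and the behaviour may differ between datafusion versions.

```python
def make_case_preserving_context():
    """Return a SessionContext that leaves unquoted identifiers as written."""
    # Imported lazily so the helper can be defined without datafusion installed.
    from datafusion import SessionConfig, SessionContext

    # Keep the object returned by set(); do not rely on in-place mutation.
    config = SessionConfig().set(
        "datafusion.sql_parser.enable_ident_normalization", "false"
    )
    return SessionContext(config)


def demo():
    """Recreate the table from the thread and select the column unquoted."""
    ctx = make_case_preserving_context()
    ctx.sql('create table x as select 1 "A"')
    # Unquoted A is expected to resolve to column "A" once normalization is off.
    # Unquoted a would still fail, because no case folding happens at all.
    return ctx.sql("select A from x")
```

Calling demo() in the Python CLI should then display the single column A without any double quotation marks in the query.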
I think you can set datafusion.sql_parser.enable_ident_normalization to false (docs here: https://arrow.apache.org/datafusion/user-guide/configs.html).