Parse String to UUID #1006 #1287

Merged
merged 3 commits into from
Jul 3, 2025
@@ -49,6 +49,7 @@ import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.Temporal
import java.time.temporal.TemporalQuery
import java.util.Locale
import java.util.UUID
import kotlin.properties.Delegates
import kotlin.reflect.KClass
import kotlin.reflect.KType
@@ -62,6 +63,8 @@ import java.time.LocalDate as JavaLocalDate
import java.time.LocalDateTime as JavaLocalDateTime
import java.time.LocalTime as JavaLocalTime



private val logger = KotlinLogging.logger { }

internal interface StringParser<T> {
@@ -491,6 +494,15 @@ internal object Parsers : GlobalParserOptions {
posixParserToDoubleWithOptions,
// Boolean
stringParser<Boolean> { it.toBooleanOrNull() },
// UUID
stringParser<UUID> { str ->
try {
UUID.fromString(str)
} catch (e: IllegalArgumentException) {
Collaborator
While this approach works, parse() can call this piece of code very often: if you parse a column with 100 Strings, it gets called 100 times. Exceptions are meant for exceptional cases, not for default flows, as they are quite heavy to throw (a stack trace needs to be built every time).
For this reason, we've tried to avoid throwing exceptions for the other types we can parse, like dates etc. I think for UUIDs we can do the same thing :)

Namely, we could check the string against a Regex first, return null when the regex doesn't match, and call UUID.fromString() only when it does. You can keep the try {} catch() around it for safety, of course. This will save a lot of stack traces from being created :)

I found this regex: "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}" in several places online; I think it's safe to use, but if you're unsure, please add some more tests.
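
A minimal sketch of the regex-first approach described in this comment, using only the Kotlin and Java standard libraries. The names `uuidRegex` and `toUuidOrNull` are hypothetical, chosen for illustration; this is not necessarily the code that was eventually merged.

```kotlin
import java.util.UUID

// Canonical 8-4-4-4-12 hexadecimal layout, copied from the regex quoted in the comment.
val uuidRegex =
    Regex("[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")

// Hypothetical helper (the name is an assumption, not part of the PR).
fun toUuidOrNull(str: String): UUID? {
    // Cheap rejection path: strings that do not look like a UUID return null
    // without any exception being thrown or stack trace being built.
    if (!uuidRegex.matches(str)) return null
    return try {
        UUID.fromString(str)
    } catch (e: IllegalArgumentException) {
        // Kept purely as a safety net, as suggested in the comment.
        null
    }
}
```

With this shape, the lambda passed to `stringParser<UUID>` could simply delegate to such a helper. One consequence of the regex gate is that it is stricter than `UUID.fromString` alone: inputs that are not in the full 36-character form are rejected before `UUID.fromString` is ever called.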

null
}
},

// BigInteger
stringParser<BigInteger> { it.toBigIntegerOrNull() },
// BigDecimal
23 changes: 23 additions & 0 deletions core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/parse.kt
@@ -2,6 +2,7 @@ package org.jetbrains.kotlinx.dataframe.api

import io.kotest.matchers.should
import io.kotest.matchers.shouldBe
import io.kotest.matchers.shouldNotBe
import kotlinx.datetime.DateTimeUnit
import kotlinx.datetime.Instant
import kotlinx.datetime.LocalDate
@@ -18,6 +19,7 @@ import org.jetbrains.kotlinx.dataframe.impl.catchSilent
import org.jetbrains.kotlinx.dataframe.type
import org.junit.Test
import java.util.Locale
import java.util.UUID
import kotlin.random.Random
import kotlin.reflect.typeOf
import kotlin.time.Duration
@@ -481,6 +483,27 @@ class ParseTests {
df.parse()
}

@Test
fun `parse valid UUID`() {
val uuidString = "550e8400-e29b-41d4-a716-446655440000"
val column by columnOf(uuidString)
val parsed = column.parse()

parsed.type() shouldBe typeOf<UUID>()
(parsed[0] as UUID).toString() shouldBe uuidString
}

@Test
fun `parse invalid UUID`() {
val invalidUUID = "this is not a UUID"
val column = columnOf(invalidUUID)
val parsed = column.tryParse() // use tryParse because the string is not a valid UUID

parsed.type() shouldNotBe typeOf<UUID>()
parsed.type() shouldBe typeOf<String>()
}
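
The review asks for more tests if the regex is in doubt. Below is a minimal, standard-library-only sketch of edge cases that might be worth covering. `canonicalUuid`, `looksLikeUuid`, and `edgeCases` are hypothetical names used for illustration and do not appear in the PR.

```kotlin
// Regex quoted in the review comment: canonical 8-4-4-4-12 hexadecimal groups.
val canonicalUuid =
    Regex("[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")

// Hypothetical helper: true only when the whole string has the canonical layout.
fun looksLikeUuid(str: String): Boolean = canonicalUuid.matches(str)

// Inputs paired with whether the regex pre-check is expected to accept them.
val edgeCases: Map<String, Boolean> = mapOf(
    "550e8400-e29b-41d4-a716-446655440000" to true, // lowercase hex
    "550E8400-E29B-41D4-A716-446655440000" to true, // uppercase hex
    "550e8400e29b41d4a716446655440000" to false, // hyphens missing
    "{550e8400-e29b-41d4-a716-446655440000}" to false, // wrapped in braces
    " 550e8400-e29b-41d4-a716-446655440000" to false, // leading whitespace
    "550e8400-e29b-41d4-a716-44665544000g" to false, // non-hex character
    "" to false, // empty string
)
```

In the test class, each entry would presumably become a `parse()` or `tryParse()` assertion in the style of the two tests above.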


/**
* Asserts that all elements of the iterable are equal to each other
*/