[SPARK-48596][SQL] Perf improvement for calculating hex string for long
### What changes were proposed in this pull request?

This pull request optimizes the `Hex.hex(num: Long)` method by skipping leading zeros when building the output, which eliminates the need to copy the array afterward to strip them.

### Why are the changes needed?

- Unit tests added
- Ran a benchmark locally (30-50% speedup):

```
Hex Long Tests:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Legacy                                             1062           1094          16          9.4         106.2       1.0X
New                                                 739            807          26         13.5          73.9       1.4X
```

```scala
import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
import org.apache.spark.sql.catalyst.expressions.Hex
import org.apache.spark.unsafe.types.UTF8String

object HexBenchmark extends BenchmarkBase {
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val N = 10_000_000
    runBenchmark("Hex") {
      val benchmark = new Benchmark("Hex Long Tests", N, 10, output = output)
      // x - y ranges from -11 to N - 1, so zero and negative inputs are included
      val range = 1 to 12
      benchmark.addCase("Legacy") { _ =>
        (1 to N).foreach(x => range.foreach(y => hexLegacy(x - y)))
      }
      benchmark.addCase("New") { _ =>
        (1 to N).foreach(x => range.foreach(y => Hex.hex(x - y)))
      }
      benchmark.run()
    }
  }

  def hexLegacy(num: Long): UTF8String = {
    // Extract the hex digits of num into value[] from right to left
    val value = new Array[Byte](16)
    var numBuf = num
    var len = 0
    do {
      len += 1
      // Hex.hexDigits needs to be visible here
      value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt)
      numBuf >>>= 4
    } while (numBuf != 0)
    // Copy out only the significant digits, dropping the unused leading slots
    UTF8String.fromBytes(java.util.Arrays.copyOfRange(value, value.length - len, value.length))
  }
}
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46952 from yaooqinn/SPARK-48596.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
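The optimization described under "What changes were proposed" can be illustrated with a minimal, dependency-free sketch. This is not the code merged in this commit: it assumes the digit count is computed up front with `java.lang.Long.numberOfLeadingZeros`, so the output array is allocated at its exact size and filled right to left, with no trailing `Arrays.copyOfRange`. `HexSketch` is a hypothetical name, and returning a `String` instead of `UTF8String` is a simplification to avoid the Spark dependency.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for Hex.hex(num: Long); returns a String instead of UTF8String.
object HexSketch {
  // Uppercase digits, matching the output of Spark's SQL hex() function
  private val hexDigits: Array[Byte] =
    "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII)

  def hex(num: Long): String = {
    // Count significant bits, then round up to whole hex digits (4 bits each)
    val significantBits = java.lang.Long.SIZE - java.lang.Long.numberOfLeadingZeros(num)
    // Zero has no significant bits but still prints as a single "0"
    val len = math.max(1, (significantBits + 3) / 4)
    // Exactly sized, so there are no leading zeros to trim afterward
    val value = new Array[Byte](len)
    var numBuf = num
    var i = len - 1
    while (i >= 0) {
      value(i) = hexDigits((numBuf & 0xF).toInt)
      // Unsigned shift, so negative inputs print as their two's complement bits
      numBuf >>>= 4
      i -= 1
    }
    new String(value, StandardCharsets.US_ASCII)
  }
}
```

For example, `HexSketch.hex(255L)` yields `"FF"`, `HexSketch.hex(0L)` yields `"0"`, and `HexSketch.hex(-1L)` yields sixteen `F`s. Sizing the array from the leading-zero count trades one intrinsic-friendly bit count for the second allocation and copy that the legacy version needs.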