[MINOR] Clarify that xxhash64 seed is 42
### What changes were proposed in this pull request?

State that the hash seed used for xxhash64 is 42 in docs.

### Why are the changes needed?

It's somewhat non-standard not to seed with 0. Users need to know this seed to reproduce the hash value.
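
To illustrate why the seed matters for reproducing a value, here is a minimal pure-Python sketch of the public XXH64 algorithm with an explicit `seed` parameter. It is not Spark's implementation. The helper names (`xxh64`, `to_signed_long`) are hypothetical, and it is an assumption that a single string column is hashed over its UTF-8 bytes; how Spark encodes other column types or combines several columns is not covered by this commit.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _round(acc: int, lane: int) -> int:
    """Mix one 8-byte lane into an accumulator."""
    acc = (acc + lane * P2) & MASK64
    return (_rotl(acc, 31) * P1) & MASK64


def _merge(h: int, v: int) -> int:
    """Fold one of the four accumulators into the running hash."""
    h ^= _round(0, v)
    return (h * P1 + P4) & MASK64


def xxh64(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 64-bit XXH64 hash of data for the given seed."""
    n, i = len(data), 0
    if n >= 32:
        # Four accumulators, each initialised from the seed.
        v1 = (seed + P1 + P2) & MASK64
        v2 = (seed + P2) & MASK64
        v3 = seed & MASK64
        v4 = (seed - P1) & MASK64
        while i + 32 <= n:
            l1, l2, l3, l4 = struct.unpack_from("<4Q", data, i)
            v1, v2 = _round(v1, l1), _round(v2, l2)
            v3, v4 = _round(v3, l3), _round(v4, l4)
            i += 32
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK64
        for v in (v1, v2, v3, v4):
            h = _merge(h, v)
    else:
        # Short inputs start directly from the seed.
        h = (seed + P5) & MASK64
    h = (h + n) & MASK64
    while i + 8 <= n:  # remaining 8-byte lanes
        (lane,) = struct.unpack_from("<Q", data, i)
        h ^= _round(0, lane)
        h = (_rotl(h, 27) * P1 + P4) & MASK64
        i += 8
    if i + 4 <= n:  # one remaining 4-byte word
        (word,) = struct.unpack_from("<I", data, i)
        h ^= (word * P1) & MASK64
        h = (_rotl(h, 23) * P2 + P3) & MASK64
        i += 4
    while i < n:  # remaining single bytes
        h ^= (data[i] * P5) & MASK64
        h = (_rotl(h, 11) * P1) & MASK64
        i += 1
    # Final avalanche.
    h ^= h >> 33
    h = (h * P2) & MASK64
    h ^= h >> 29
    h = (h * P3) & MASK64
    h ^= h >> 32
    return h


def to_signed_long(h: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed long (hypothetical helper)."""
    return h - (1 << 64) if h >= (1 << 63) else h


if __name__ == "__main__":
    payload = "Spark".encode("utf-8")  # assumption: UTF-8 bytes of one string
    print("seed 0 :", to_signed_long(xxh64(payload, seed=0)))
    print("seed 42:", to_signed_long(xxh64(payload, seed=42)))
```

The two printed values differ, so a tool that defaults to seed 0 will not match a value computed with seed 42 unless the seed is passed explicitly.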

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes apache#38010 from srowen/Hash42.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
srowen authored and zhengruifeng committed Sep 27, 2022
1 parent f2ba6b5 commit 311a855
Showing 4 changed files with 5 additions and 4 deletions.
2 changes: 1 addition & 1 deletion R/pkg/R/functions.R
@@ -974,7 +974,7 @@ setMethod("hash",
 #' @details
 #' \code{xxhash64}: Calculates the hash code of given columns using the 64-bit
 #' variant of the xxHash algorithm, and returns the result as a long
-#' column.
+#' column. The hash computation uses an initial seed of 42.
 #'
 #' @rdname column_misc_functions
 #' @aliases xxhash64 xxhash64,Column-method
2 changes: 1 addition & 1 deletion python/pyspark/sql/functions.py
@@ -5092,7 +5092,7 @@ def hash(*cols: "ColumnOrName") -> Column:

 def xxhash64(*cols: "ColumnOrName") -> Column:
 """Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm,
-and returns the result as a long column.
+and returns the result as a long column. The hash computation uses an initial seed of 42.
 .. versionadded:: 3.0.0
@@ -643,7 +643,8 @@ object Murmur3HashFunction extends InterpretedHashFunction {
 * A xxHash64 64-bit hash expression.
 */
 @ExpressionDescription(
-usage = "_FUNC_(expr1, expr2, ...) - Returns a 64-bit hash value of the arguments.",
+usage = "_FUNC_(expr1, expr2, ...) - Returns a 64-bit hash value of the arguments. " +
+"Hash seed is 42.",
 examples = """
 Examples:
 > SELECT _FUNC_('Spark', array(123), 2);
@@ -2569,7 +2569,7 @@ object functions {
 /**
 * Calculates the hash code of given columns using the 64-bit
 * variant of the xxHash algorithm, and returns the result as a long
-* column.
+* column. The hash computation uses an initial seed of 42.
 *
 * @group misc_funcs
 * @since 3.0.0
