Strings

Note: Functions taking Tensor arguments can also take anything accepted by tf.convert_to_tensor.

[TOC]

Hashing

String hashing ops take a string input tensor and map each element to an integer.


tf.string_to_hash_bucket_fast(input, num_buckets, name=None) {#string_to_hash_bucket_fast}

Converts each string in the input Tensor to its hash modulo the number of buckets.

The hash function is deterministic on the content of the string within the process and will never change. However, it is not suitable for cryptography.

Args:
  • input: A Tensor of type string. The strings to assign a hash bucket.
  • num_buckets: An int that is >= 1. The number of buckets.
  • name: A name for the operation (optional).
Returns:

A Tensor of type int64, with the same shape as input.
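
The bucketing step described above ("hash, then take the remainder modulo num_buckets") can be sketched in plain Python. This is only an illustration of the idea: it uses SHA-256 from `hashlib` as a stand-in hash, which is an assumption and not the hash function TensorFlow uses, so the bucket ids will not match the op's output. The helper name `to_hash_bucket` is hypothetical.

```python
import hashlib


def to_hash_bucket(strings, num_buckets):
    """Map each string to a bucket id in [0, num_buckets).

    Illustration only: SHA-256 is a stand-in, not TensorFlow's hash.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    buckets = []
    for s in strings:
        digest = hashlib.sha256(s.encode("utf-8")).digest()
        # Interpret the first 8 bytes as an unsigned 64-bit integer.
        h = int.from_bytes(digest[:8], byteorder="little")
        buckets.append(h % num_buckets)
    return buckets


ids = to_hash_bucket(["apple", "banana", "apple"], num_buckets=10)
```

Equal strings always land in the same bucket, and different strings may collide; a larger num_buckets makes collisions less likely at the cost of a larger id range.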


tf.string_to_hash_bucket(string_tensor, num_buckets, name=None) {#string_to_hash_bucket}

Converts each string in the input Tensor to its hash modulo the number of buckets.

The hash function is deterministic on the content of the string within the process.

Note that the hash function may change from time to time.

Args:
  • string_tensor: A Tensor of type string.
  • num_buckets: An int that is >= 1. The number of buckets.
  • name: A name for the operation (optional).
Returns:

A Tensor of type int64, with the same shape as string_tensor.
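
As a rough analogy for "deterministic within the process", the sketch below uses Python's built-in `hash()`, which for strings is stable inside one interpreter process but, by default, differs between processes. This is an analogy chosen for illustration, not the hash TensorFlow uses, and the helper name `hash_bucket_nested` is hypothetical. It also shows the element-wise behaviour: the result has the same nested shape as the input.

```python
def hash_bucket_nested(values, num_buckets):
    """Apply bucketing element-wise, preserving the nested-list shape."""
    if isinstance(values, str):
        # hash() may be negative; Python's % with a positive modulus
        # always yields a value in [0, num_buckets).
        return hash(values) % num_buckets
    return [hash_bucket_nested(v, num_buckets) for v in values]


batch = [["cat", "dog"], ["dog", "bird"]]
first = hash_bucket_nested(batch, num_buckets=100)
second = hash_bucket_nested(batch, num_buckets=100)
```

Within one run, `first` and `second` are identical. Because the underlying hash is only guaranteed within the process, bucket ids produced this way are best not stored and compared across runs.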

Joining

String joining ops concatenate elements of input string tensors to produce a new string tensor.


tf.reduce_join(inputs, reduction_indices, keep_dims=None, separator=None, name=None) {#reduce_join}

Joins a string Tensor across the given dimensions.

Computes the string join across dimensions in the given string Tensor of shape [d_0, d_1, ..., d_n-1]. Returns a new Tensor created by joining the input strings with the given separator (default: empty string). Negative indices are counted backwards from the end, with -1 being equivalent to n - 1. Passing an empty reduction_indices joins all strings in linear index order and outputs a scalar string.

For example:

# tensor `a` is [["a", "b"], ["c", "d"]]
tf.reduce_join(a, 0) ==> ["ac", "bd"]
tf.reduce_join(a, 1) ==> ["ab", "cd"]
tf.reduce_join(a, -2) = tf.reduce_join(a, 0) ==> ["ac", "bd"]
tf.reduce_join(a, -1) = tf.reduce_join(a, 1) ==> ["ab", "cd"]
tf.reduce_join(a, 0, keep_dims=True) ==> [["ac", "bd"]]
tf.reduce_join(a, 1, keep_dims=True) ==> [["ab"], ["cd"]]
tf.reduce_join(a, 0, separator=".") ==> ["a.c", "b.d"]
tf.reduce_join(a, [0, 1]) ==> ["acbd"]
tf.reduce_join(a, [1, 0]) ==> ["abcd"]
tf.reduce_join(a, []) ==> ["abcd"]
Args:
  • inputs: A Tensor of type string. The input to be joined. All reduced indices must have non-zero size.
  • reduction_indices: A Tensor of type int32. The dimensions to reduce over. Dimensions are reduced in the order specified. If reduction_indices has higher rank than 1, it is flattened. Omitting reduction_indices is equivalent to passing [n-1, n-2, ..., 0]. Negative indices from -n to -1 are supported.
  • keep_dims: An optional bool. Defaults to False. If True, retain reduced dimensions with length 1.
  • separator: An optional string. Defaults to "". The separator to use when joining.
  • name: A name for the operation (optional).
Returns:

A Tensor of type string. Has shape equal to that of the input with reduced dimensions removed or set to 1 depending on keep_dims.
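
The reduction semantics can be sketched on nested Python lists. This is a minimal pure-Python model written for illustration, not TensorFlow's implementation: the helper names (`_rank`, `_join_axis`, `_zip_join`, `_squeeze`) are hypothetical, a full reduction returns a plain Python string standing in for a scalar string tensor, and an empty reduction_indices is treated as [n-1, ..., 0], following the description above.

```python
def _rank(x):
    """Number of nested list levels (0 for a plain string)."""
    r = 0
    while isinstance(x, list):
        x = x[0]
        r += 1
    return r


def _zip_join(subs, sep):
    """Element-wise join of equally shaped nested lists."""
    if isinstance(subs[0], str):
        return sep.join(subs)
    return [_zip_join(list(group), sep) for group in zip(*subs)]


def _join_axis(x, axis, sep):
    """Join along `axis`, keeping it as a length-1 dimension."""
    if axis == 0:
        return [_zip_join(x, sep)]
    return [_join_axis(sub, axis - 1, sep) for sub in x]


def _squeeze(x, axes, depth=0):
    """Drop the length-1 dimensions whose positions are in `axes`."""
    if not isinstance(x, list):
        return x
    if depth in axes:
        return _squeeze(x[0], axes, depth + 1)
    return [_squeeze(sub, axes, depth + 1) for sub in x]


def reduce_join(inputs, reduction_indices, keep_dims=False, separator=""):
    n = _rank(inputs)
    if isinstance(reduction_indices, int):
        reduction_indices = [reduction_indices]
    # Negative indices count backwards from the end.
    axes = [i + n if i < 0 else i for i in reduction_indices]
    if not axes:
        axes = list(range(n - 1, -1, -1))
    out = inputs
    # Reduce one dimension at a time, in the order specified.
    for axis in axes:
        out = _join_axis(out, axis, separator)
    return out if keep_dims else _squeeze(out, set(axes))


a = [["a", "b"], ["c", "d"]]
print(reduce_join(a, 0))                  # ['ac', 'bd']
print(reduce_join(a, 1, keep_dims=True))  # [['ab'], ['cd']]
print(reduce_join(a, [0, 1]))             # acbd
print(reduce_join(a, [1, 0]))             # abcd
```

Reducing one dimension at a time makes the effect of index order visible: [0, 1] joins down the columns first and yields "acbd", while [1, 0] joins along the rows first and yields "abcd".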