Add skipping index benchmark test #291

@dai-chen dai-chen commented Mar 20, 2024

Description

This PR introduces a benchmark test focused on skipping index data structures. It sits between end-to-end benchmarking and low-level microbenchmarking, and aims to provide insight into the read and write performance of the different skipping data structures when OpenSearch is used as the index store.

The reasons for adding this test instead of E2E benchmarking or microbenchmarking are:

  • End-to-End Benchmarking:

    • This type of benchmarking requires splitting test data into files and persisting them on disk. While feasible within the current test framework, this process is exceptionally slow for local testing. Future plans might include another benchmark test class that runs by submitting to a Spark cluster.
    • Additionally, it requires the SQL plugin and the UI in OpenSearch Dashboards, as well as the EMR-S Spark environment.
  • Microbenchmarking:

    • The Flint BloomFilter implementation resides in flint-core, so it is straightforward to cover with a microbenchmark.
    • However, the other skipping data structures are implemented within Spark aggregate functions, which makes it hard to isolate and test those algorithms independently.

TODO

  1. Check whether any cache or other factors in Spark and OpenSearch may impact the test results
  2. Include more metrics such as resource consumption, OS index size, etc.
  3. Analyze the test results and update the user manual with recommendations

Test Cases

Please find details in the Javadoc on FlintSkippingIndexBenchmark. The test is based on the Spark benchmark test framework and is triggered manually.

Test Results

https://github.com/dai-chen/opensearch-spark/blob/add-skipping-index-benchmark-rebased/docs/benchmark-skipping-index.txt

Test Data (Generated)

The schema and size of the skipping indexes written in the test:

curl "localhost:64541/_cat/indices?v"
health status index                                                                                 uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   flint_benchmark_bloom_filter_cardinality_64_adaptive_false_fpp_0.03_num_items_1000000 krhbKOALQB2Ezxn-9PRCZQ   1   1          1            0    902.3kb        902.3kb
yellow open   flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_10  qyYpKbY7R0C5plNn16QIXQ   1   1          1            0      4.8kb          4.8kb
yellow open   flint_benchmark_value_set_cardinality_64_max_size_100                                 wdhxAEv4QNSep0YXn3d2Uw   1   1          2            0      7.2kb          7.2kb
yellow open   flint_benchmark_value_set_cardinality_64_max_size_2147483647                          FcbPKJa2TLSKVmjSIOWlMA   1   1          2            0      7.2kb          7.2kb
yellow open   flint_benchmark_min_max_cardinality_64_default                                        WU8cLd6SSSKuppOSRvAREQ   1   1          2            0      6.7kb          6.7kb
yellow open   flint_benchmark_partition_cardinality_1_default                                       55R97W4zR3Kzt55BpgTzAA   1   1          1            0      3.1kb          3.1kb
yellow open   flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_5   e4sLSBtcROCJ4yIsd406gw   1   1          1            0      4.8kb          4.8kb
yellow open   flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_15  f9aqpip1TdeIjhT3imJQww   1   1          1            0      4.8kb          4.8kb
yellow open   flint_benchmark_bloom_filter_cardinality_64_adaptive_false_fpp_0.03_num_items_64      ATGUO7OnRi-f570mvWOhEQ   1   1          1            0      3.2kb          3.2kb
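
As a rough sanity check on the two non-adaptive Bloom filter indexes listed above, the sketch below applies the standard Bloom filter sizing formulas to the `fpp_0.03` and `num_items` values embedded in the index names. The function names and the assumption that the bit set is stored as whole 64-bit words are illustrative and not taken from Flint. `store.size` also includes index overhead, so the numbers are only expected to land in the same range (about 890 KiB of bit set for `num_items_1000000`), not to match exactly.

```python
import math


def optimal_num_bits(expected_items: int, fpp: float) -> int:
    """Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2, rounded up."""
    return math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(expected_items: int, num_bits: int) -> int:
    """Standard hash count: k = (m / n) * ln 2, and at least 1."""
    return max(1, round(num_bits / expected_items * math.log(2)))


def bitset_bytes(num_bits: int, word_bits: int = 64) -> int:
    """Bytes used if the bit set is stored as whole 64-bit words (an assumption)."""
    return math.ceil(num_bits / word_bits) * (word_bits // 8)


# num_items values taken from the two non-adaptive index names, fpp = 0.03.
for n in (64, 1_000_000):
    m = optimal_num_bits(n, 0.03)
    k = optimal_num_hashes(n, m)
    print(f"n={n}: bits={m}, hashes={k}, bitset={bitset_bytes(m) / 1024:.1f} KiB")
```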

curl "localhost:64541/flint_benchmark_*/_mapping?pretty"
{
  "flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_15" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "binary",
          "doc_values" : true
        }
      }
    }
  },
  "flint_benchmark_bloom_filter_cardinality_64_adaptive_false_fpp_0.03_num_items_1000000" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "binary",
          "doc_values" : true
        }
      }
    }
  },
  "flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_5" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "binary",
          "doc_values" : true
        }
      }
    }
  },
  "flint_benchmark_value_set_cardinality_64_max_size_2147483647" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "long"
        }
      }
    }
  },
  "flint_benchmark_bloom_filter_cardinality_64_adaptive_false_fpp_0.03_num_items_64" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "binary",
          "doc_values" : true
        }
      }
    }
  },
  "flint_benchmark_value_set_cardinality_64_max_size_100" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "long"
        }
      }
    }
  },
  "flint_benchmark_partition_cardinality_1_default" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "long"
        }
      }
    }
  },
  "flint_benchmark_min_max_cardinality_64_default" : {
    "mappings" : {
      "properties" : {
        "MinMax_value_0" : {
          "type" : "long"
        },
        "MinMax_value_1" : {
          "type" : "long"
        }
      }
    }
  },
  "flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_10" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "binary",
          "doc_values" : true
        }
      }
    }
  }
}

Skipping index data written:

curl "localhost:64541/flint_benchmark_*/_search?pretty"
{
  "took" : 105,
  "timed_out" : false,
  "_shards" : {
    "total" : 9,
    "successful" : 9,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 12,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "flint_benchmark_bloom_filter_cardinality_64_adaptive_false_fpp_0.03_num_items_1000000",
        "_id" : "zItIYo4BM5f06XztvDKX",
        "_score" : 1.0,
        "_source" : {
          "value" : "AAAAA..."
        }
      },
      {
        "_index" : "flint_benchmark_bloom_filter_cardinality_64_adaptive_false_fpp_0.03_num_items_64",
        "_id" : "zYtJYo4BM5f06Xzt4TL2",
        "_score" : 1.0,
        "_source" : {
          "value" : "AAAAAQAAAAUAAAAIXfxLWNX6W3GBF/TxtjFPE7pdpOfrUCWgDKuuovFceYEu3SPLCdFRGprA9cOk07kApVD2oIHCron49EOAANoFZw=="
        }
      },
      {
        "_index" : "flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_10",
        "_id" : "zotJYo4BM5f06Xzt-zKD",
        "_score" : 1.0,
        "_source" : {
          "value" : "AAAAAQAAAAUAAAB1AAAAAAAAIAAAAAAAAAAAAAAAAAAAABAAAAAAQAAAAAAIAAAAARAAAAAAAAAEQQEAAADEBAAAAAASAgAAAEAMAAAgAAAAAAAAAQBAAAABgAACAAIAAAAAAAAAAAAAAAAAAAEAAAAAABAAAAAgAIAAAACABIBAAAAAEEACAAAAAAAAUCEAIIAAAAAAEAAIAAQBAAAEAAAAAAACAABAADAAAAAoAABAAAAAAJAAAAAAAAAAAAACAAAAAAAAAAAAAAACAAAAAAAAAAAAAACAgBgAAEEAAIAAAigAAAACEAAAAAAAAAIAAAAQAAABAAAAAAAAAAAAACAAAACACAAAAAAAAAAAIAABAAAAAAAABAAUAAgIABAAAAAAAQAAAkBAAAAAABAgBIDAAACgQAgAAAABAQAAAAAQAAACAACAAAAAAAAIQAAAAAAAAAAAAQAAAAAEAAAFwAAICgAAECQABAAEAKAEAAAAAAADAAAAAAECABAAGAAjgEAAAAACAAAAQAQAAAQgQAAgAAAAAAAAAIAAABAAAAIAAAAAAABAAIAAAgAAAACAAAAAAAAQAAKAAAAAAAAAAAAAQBAAAAAIIAAAYAAAAAAAABACAAAAAAAAAAAAUAAAAAAgQQABAABAAAABCACAAAAAAEIAAEAIAAAAgCAABAAAAAgAAACAAAAAEAAIgACAgAAAAAAAAAABAAAAAAAAAAACCAAABABAAABACAAAQAAAAAAAAIAAABAABQAAACAAAgAAAAAAAAAAmCAAoAAAAAAAAAAIEEAAAAAAAABAABAAQACAAAAAACAAAAAAIAAAAgAAIAAAICABAgAAAAABAIAAAABAAAAAAAAAAAwQAA4AQAAAAgAgAAAAAAAAgACAAAAAIIAQAAAAAAAAAQAAAAAAAAAQAAAABAAJACQQAIAQAAAAAAAAAAAAAAAAAAAgAAAAgAgACAAAAAIBAEAAQAAAAAAgAAAAABAAQAAAAAAAAEAAAACAAAAAAAAAAAAAAAAAAAEAEAEAAQkAAAAAAAAAIAAAAAAAAAAAAAgAAAAAAAAAAAAAAAACAAAAAAAAIgAAAAEEAAAAAQAAAAQgAAAAAAAAAASQAAABAAAAAAAAAAAAAAAgABAAAQABAAAAIAIAAAIIAAAAQAAAAAAAQAgAAAAIAIAAgABAAAQBAAAACAAAAAACIQAAAAAAAAAAAAAAAEQAAAAAEIAAAAQAAAAQEACAIAAAAEAAAAAAIEABAAEA"
        }
      },
      {
        "_index" : "flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_15",
        "_id" : "0ItKYo4BM5f06XztJTJf",
        "_score" : 1.0,
        "_source" : {
          "value" : "AAAAAQAAAAUAAAB1AAAAAAAAIAAAAAAAAAAAAAAAAAAAABAAAAAAQAAAAAAIAAAAARAAAAAAAAAEQQEAAADEBAAAAAASAgAAAEAMAAAgAAAAAAAAAQBAAAABgAACAAIAAAAAAAAAAAAAAAAAAAEAAAAAABAAAAAgAIAAAACABIBAAAAAEEACAAAAAAAAUCEAIIAAAAAAEAAIAAQBAAAEAAAAAAACAABAADAAAAAoAABAAAAAAJAAAAAAAAAAAAACAAAAAAAAAAAAAAACAAAAAAAAAAAAAACAgBgAAEEAAIAAAigAAAACEAAAAAAAAAIAAAAQAAABAAAAAAAAAAAAACAAAACACAAAAAAAAAAAIAABAAAAAAAABAAUAAgIABAAAAAAAQAAAkBAAAAAABAgBIDAAACgQAgAAAABAQAAAAAQAAACAACAAAAAAAAIQAAAAAAAAAAAAQAAAAAEAAAFwAAICgAAECQABAAEAKAEAAAAAAADAAAAAAECABAAGAAjgEAAAAACAAAAQAQAAAQgQAAgAAAAAAAAAIAAABAAAAIAAAAAAABAAIAAAgAAAACAAAAAAAAQAAKAAAAAAAAAAAAAQBAAAAAIIAAAYAAAAAAAABACAAAAAAAAAAAAUAAAAAAgQQABAABAAAABCACAAAAAAEIAAEAIAAAAgCAABAAAAAgAAACAAAAAEAAIgACAgAAAAAAAAAABAAAAAAAAAAACCAAABABAAABACAAAQAAAAAAAAIAAABAABQAAACAAAgAAAAAAAAAAmCAAoAAAAAAAAAAIEEAAAAAAAABAABAAQACAAAAAACAAAAAAIAAAAgAAIAAAICABAgAAAAABAIAAAABAAAAAAAAAAAwQAA4AQAAAAgAgAAAAAAAAgACAAAAAIIAQAAAAAAAAAQAAAAAAAAAQAAAABAAJACQQAIAQAAAAAAAAAAAAAAAAAAAgAAAAgAgACAAAAAIBAEAAQAAAAAAgAAAAABAAQAAAAAAAAEAAAACAAAAAAAAAAAAAAAAAAAEAEAEAAQkAAAAAAAAAIAAAAAAAAAAAAAgAAAAAAAAAAAAAAAACAAAAAAAAIgAAAAEEAAAAAQAAAAQgAAAAAAAAAASQAAABAAAAAAAAAAAAAAAgABAAAQABAAAAIAIAAAIIAAAAQAAAAAAAQAgAAAAIAIAAgABAAAQBAAAACAAAAAACIQAAAAAAAAAAAAAAAEQAAAAAEIAAAAQAAAAQEACAIAAAAEAAAAAAIEABAAEA"
        }
      },
      {
        "_index" : "flint_benchmark_bloom_filter_cardinality_64_adaptive_true_fpp_0.03_num_candidates_5",
        "_id" : "z4tKYo4BM5f06XztFDKb",
        "_score" : 1.0,
        "_source" : {
          "value" : "AAAAAQAAAAUAAAB1AAAAAAAAIAAAAAAAAAAAAAAAAAAAABAAAAAAQAAAAAAIAAAAARAAAAAAAAAEQQEAAADEBAAAAAASAgAAAEAMAAAgAAAAAAAAAQBAAAABgAACAAIAAAAAAAAAAAAAAAAAAAEAAAAAABAAAAAgAIAAAACABIBAAAAAEEACAAAAAAAAUCEAIIAAAAAAEAAIAAQBAAAEAAAAAAACAABAADAAAAAoAABAAAAAAJAAAAAAAAAAAAACAAAAAAAAAAAAAAACAAAAAAAAAAAAAACAgBgAAEEAAIAAAigAAAACEAAAAAAAAAIAAAAQAAABAAAAAAAAAAAAACAAAACACAAAAAAAAAAAIAABAAAAAAAABAAUAAgIABAAAAAAAQAAAkBAAAAAABAgBIDAAACgQAgAAAABAQAAAAAQAAACAACAAAAAAAAIQAAAAAAAAAAAAQAAAAAEAAAFwAAICgAAECQABAAEAKAEAAAAAAADAAAAAAECABAAGAAjgEAAAAACAAAAQAQAAAQgQAAgAAAAAAAAAIAAABAAAAIAAAAAAABAAIAAAgAAAACAAAAAAAAQAAKAAAAAAAAAAAAAQBAAAAAIIAAAYAAAAAAAABACAAAAAAAAAAAAUAAAAAAgQQABAABAAAABCACAAAAAAEIAAEAIAAAAgCAABAAAAAgAAACAAAAAEAAIgACAgAAAAAAAAAABAAAAAAAAAAACCAAABABAAABACAAAQAAAAAAAAIAAABAABQAAACAAAgAAAAAAAAAAmCAAoAAAAAAAAAAIEEAAAAAAAABAABAAQACAAAAAACAAAAAAIAAAAgAAIAAAICABAgAAAAABAIAAAABAAAAAAAAAAAwQAA4AQAAAAgAgAAAAAAAAgACAAAAAIIAQAAAAAAAAAQAAAAAAAAAQAAAABAAJACQQAIAQAAAAAAAAAAAAAAAAAAAgAAAAgAgACAAAAAIBAEAAQAAAAAAgAAAAABAAQAAAAAAAAEAAAACAAAAAAAAAAAAAAAAAAAEAEAEAAQkAAAAAAAAAIAAAAAAAAAAAAAgAAAAAAAAAAAAAAAACAAAAAAAAIgAAAAEEAAAAAQAAAAQgAAAAAAAAAASQAAABAAAAAAAAAAAAAAAgABAAAQABAAAAIAIAAAIIAAAAQAAAAAAAQAgAAAAIAIAAgABAAAQBAAAACAAAAAACIQAAAAAAAAAAAAAAAEQAAAAAEIAAAAQAAAAQEACAIAAAAEAAAAAAIEABAAEA"
        }
      },
      {
        "_index" : "flint_benchmark_min_max_cardinality_64_default",
        "_id" : "xotIYo4BM5f06XztZzKc",
        "_score" : 1.0,
        "_source" : {
          "MinMax_value_0" : 1,
          "MinMax_value_1" : 64
        }
      },
      {
        "_index" : "flint_benchmark_min_max_cardinality_64_default",
        "_id" : "x4tIYo4BM5f06XztbDII",
        "_score" : 1.0,
        "_source" : {
          "MinMax_value_0" : 1,
          "MinMax_value_1" : 64
        }
      },
      {
        "_index" : "flint_benchmark_partition_cardinality_1_default",
        "_id" : "xYtIYo4BM5f06XztYjIx",
        "_score" : 1.0,
        "_source" : {
          "value" : 1
        }
      },
      {
        "_index" : "flint_benchmark_value_set_cardinality_64_max_size_100",
        "_id" : "yItIYo4BM5f06XztcTK3",
        "_score" : 1.0,
        "_source" : {
          "value" : [
            60,
            52,
            31,
            2,
            54,
            25,
            4,
            48,
            27,
            19,
            50,
            42,
            21,
            44,
            23,
            15,
            46,
            38,
            17,
            61,
            40,
            11,
            63,
            34,
            13,
            5,
            57,
            36,
            28,
            7,
            59,
            51,
            30,
            9,
            1,
            53,
            32,
            24,
            3,
            55,
            47,
            26,
            49,
            20,
            43,
            22,
            14,
            45,
            37,
            16,
            39,
            18,
            10,
            62,
            41,
            33,
            12,
            64,
            56,
            35,
            6,
            58,
            29,
            8
          ]
        }
      }
      ...
    ]
  }
}
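
The binary `value` fields in the hits above are base64 strings. The sketch below decodes the shortest one (from the `num_items_64` index) and reads its first twelve bytes as three big-endian 32-bit integers. Interpreting them as version, number of hash functions, and number of 64-bit words is an assumption based on the common Spark-style Bloom filter serialization layout; this PR does not state the format. Under that reading the header is version 1, 5 hash functions, and 8 words, and the payload length is consistent with 8 words of 8 bytes following the header.

```python
import base64
import struct

# Value copied verbatim from the num_items_64 hit in the search response.
VALUE = (
    "AAAAAQAAAAUAAAAIXfxLWNX6W3GBF/TxtjFPE7pdpOfrUCWgDKuuovFceYEu3SPL"
    "CdFRGprA9cOk07kApVD2oIHCron49EOAANoFZw=="
)


def decode_header(b64_value: str) -> dict:
    """Decode the payload and read three big-endian ints from its start.

    The field names are an assumed interpretation, not a documented format.
    """
    raw = base64.b64decode(b64_value)
    version, num_hashes, num_words = struct.unpack_from(">iii", raw, 0)
    return {
        "total_bytes": len(raw),
        "version": version,
        "num_hash_functions": num_hashes,
        "num_words": num_words,
    }


print(decode_header(VALUE))
```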

Issues Resolved

opensearch-project/sql#1399

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen added performance Make it fast! 0.3 labels Mar 20, 2024
@dai-chen dai-chen self-assigned this Mar 20, 2024
@dai-chen dai-chen force-pushed the add-skipping-index-benchmark-rebased branch from 679569a to 61c4dc4 Compare March 20, 2024 22:41
@dai-chen dai-chen marked this pull request as ready for review March 21, 2024 21:08
Comment on lines +52 to +92
OpenJDK 64-Bit Server VM 11.0.20+0 on Mac OS X 14.3.1
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Skipping Index Read 1000000 Rows with Cardinality 64:      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)    Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------------
Partition Read                                                         54            65          9        0.0     54473389.0      1.0X
MinMax Read                                                            57            65          8        0.0     56855820.0      1.0X
ValueSet Read (Default Size 100)                                       50            61         11        0.0     49529808.0      1.1X
ValueSet Read (Unlimited Size)                                         43            54          8        0.0     43301469.0      1.3X
BloomFilter Read (1M NDV)                                            2648          2733         60        0.0   2647662965.0      0.0X
BloomFilter Read (Optimal NDV)                                       2450          2484         24        0.0   2450135369.0      0.0X
Adaptive BloomFilter Read (Default 10 Candidates)                    2441          2458         18        0.0   2441226280.0      0.0X
Adaptive BloomFilter Read (5 Candidates)                             2451          2476         26        0.0   2450510244.0      0.0X
Adaptive BloomFilter Read (15 Candidates)                            2397          2461         44        0.0   2397133383.0      0.0X

OpenJDK 64-Bit Server VM 11.0.20+0 on Mac OS X 14.3.1
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Skipping Index Read 1000000 Rows with Cardinality 2048:    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)    Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------------
Partition Read                                                         31            35          5        0.0     31101827.0      1.0X
MinMax Read                                                            33            40          6        0.0     33385163.0      0.9X
ValueSet Read (Default Size 100)                                       30            37          6        0.0     30479810.0      1.0X
ValueSet Read (Unlimited Size)                                         31            37          6        0.0     31004587.0      1.0X
BloomFilter Read (1M NDV)                                            2477          2537         51        0.0   2477281890.0      0.0X
BloomFilter Read (Optimal NDV)                                       2408          2461         45        0.0   2408002056.0      0.0X
Adaptive BloomFilter Read (Default 10 Candidates)                    2367          2413         43        0.0   2366950203.0      0.0X
Adaptive BloomFilter Read (5 Candidates)                             2399          2429         26        0.0   2399147197.0      0.0X
Adaptive BloomFilter Read (15 Candidates)                            2382          2421         34        0.0   2381512783.0      0.0X

OpenJDK 64-Bit Server VM 11.0.20+0 on Mac OS X 14.3.1
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Skipping Index Read 1000000 Rows with Cardinality 65536:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)    Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------------
Partition Read                                                         26            30          5        0.0     25781731.0      1.0X
MinMax Read                                                            30            34          7        0.0     29514335.0      0.9X
ValueSet Read (Default Size 100)                                       27            34          6        0.0     27338628.0      0.9X
ValueSet Read (Unlimited Size)                                         39            45          6        0.0     39315292.0      0.7X
BloomFilter Read (1M NDV)                                            2374          2433         55        0.0   2373982609.0      0.0X
BloomFilter Read (Optimal NDV)                                       2354          2415         60        0.0   2354204521.0      0.0X
Adaptive BloomFilter Read (Default 10 Candidates)                    2322          2407         51        0.0   2321669934.0      0.0X
Adaptive BloomFilter Read (5 Candidates)                             2413          2465         44        0.0   2413487418.0      0.0X
Adaptive BloomFilter Read (15 Candidates)                            2351          2401         36        0.0   2351322414.0      0.0X
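
For readers of the quoted results, the sketch below shows how the Relative column appears to relate to the other columns: the figures are consistent with dividing the first row's time by each row's time and rounding to one decimal place. That reading is inferred from the numbers in the cardinality 64 table and is an assumption, not something stated in this PR. It also makes the gap explicit, since a Relative value of 0.0X hides a slowdown of roughly 48 to 49 times.

```python
# Per Row(ns) values copied from the cardinality 64 read table in this thread.
PER_ROW_NS = {
    "Partition Read": 54473389.0,
    "ValueSet Read (Default Size 100)": 49529808.0,
    "ValueSet Read (Unlimited Size)": 43301469.0,
    "BloomFilter Read (1M NDV)": 2647662965.0,
}


def relative(baseline_ns: float, case_ns: float) -> str:
    """Speed relative to the baseline row; below 1.0X means slower."""
    return f"{baseline_ns / case_ns:.1f}X"


def slowdown(baseline_ns: float, case_ns: float) -> float:
    """How many times longer a case takes than the baseline row."""
    return case_ns / baseline_ns


baseline = PER_ROW_NS["Partition Read"]
for name, ns in PER_ROW_NS.items():
    print(f"{name}: {relative(baseline, ns)} ({slowdown(baseline, ns):.1f}x baseline time)")
```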
Is this the intended result? Can we include some explanations for why the BF read takes so long?

@dai-chen dai-chen Mar 21, 2024

I think it's because a BF read is translated to Painless script filtering, which deserializes the BF from bytes and then does the membership check for each document. Reading the other data structures is simply translated to an OpenSearch match query.
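
To illustrate the kind of per-document work described in the comment above, here is a minimal sketch of a script-style filter: every stored value is deserialized into a Bloom filter before the membership check runs. `TinyBloomFilter`, its byte layout, the hashing scheme, and `script_style_filter` are all hypothetical stand-ins written for this illustration. They are not Flint's BloomFilter, its wire format, or the actual Painless script.

```python
import hashlib
import struct
from typing import Iterable, Iterator, List, Optional


class TinyBloomFilter:
    """Illustrative Bloom filter; not Flint's implementation or format."""

    def __init__(self, num_bits: int, num_hashes: int, bits: Optional[bytearray] = None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits if bits is not None else bytearray((num_bits + 7) // 8)

    def _positions(self, item: int) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit hash halves.
        digest = hashlib.blake2b(struct.pack(">q", item), digest_size=16).digest()
        h1, h2 = struct.unpack(">QQ", digest)
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def put(self, item: int) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: int) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        return struct.pack(">II", self.num_bits, self.num_hashes) + bytes(self.bits)

    @classmethod
    def from_bytes(cls, payload: bytes) -> "TinyBloomFilter":
        num_bits, num_hashes = struct.unpack_from(">II", payload, 0)
        return cls(num_bits, num_hashes, bytearray(payload[8:]))


def build_blob(values: Iterable[int]) -> bytes:
    """Serialize a filter holding the given values (one blob per source file)."""
    bf = TinyBloomFilter(num_bits=512, num_hashes=5)
    for v in values:
        bf.put(v)
    return bf.to_bytes()


def script_style_filter(blobs: List[bytes], probe: int) -> List[int]:
    """Deserialize every stored blob, then test membership, as a script filter would."""
    return [i for i, blob in enumerate(blobs) if TinyBloomFilter.from_bytes(blob).might_contain(probe)]


blobs = [build_blob(range(1, 65)), build_blob(range(1000, 1064))]
print(script_style_filter(blobs, 10))
```

The deserialization cost grows with the filter size and is paid for every candidate document, whereas a match query on a numeric field can use the index directly. That difference is one plausible reading of the gap in the read results, and it would need profiling to confirm.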

Skipping Index Write 1000000 Rows with Cardinality 65536:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)    Per Row(ns)  Relative
--------------------------------------------------------------------------------------------------------------------------------------
Partition Write                                                      1304          1304          0        0.8         1304.1      1.0X
MinMax Write                                                         1287          1287          0        0.8         1286.8      1.0X
Why is the BF latency lower than MinMax?

@noCharger

@dai-chen do we want to move this to 0.4?

@dai-chen dai-chen added 0.4 and removed 0.3 labels Apr 2, 2024
@dai-chen

dai-chen commented Apr 2, 2024

@dai-chen do we want to move this to 0.4?

Moved to 0.4. Will try to address the comments this week. Thanks!

@dai-chen

Will reopen and address the comments.

@dai-chen dai-chen closed this Apr 18, 2024
@dai-chen dai-chen added 0.5 and removed 0.4 labels May 14, 2024