Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve cryptography tests. #3979

Merged
merged 32 commits into from
Aug 20, 2024
Merged

Conversation

teo-tsirpanis
Copy link
Member

@teo-tsirpanis teo-tsirpanis commented Mar 20, 2023

SC-51030

Improved test coverage for the hash algorithms by using official test vectors.

Prerequisite for #3973.


TYPE: NO_HISTORY
DESC: Improve cryptography tests.

@@ -32,11 +32,15 @@

#ifdef _WIN32

#include "tiledb/sm/crypto/crypto_win32.h"
#include <windows.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NOMINMAX macro definition needs to accompany the inclusion of windows.h.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

#include "tiledb/sm/crypto/crypto_win32.h"
#include <windows.h>

#include <bcrypt.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need <windows.h>, or is this header alone sufficient?

Also, if there's an inclusion order dependency between external headers, that should be documented.

tiledb/sm/crypto/crypto_win32.h Outdated Show resolved Hide resolved
tiledb/sm/crypto/crypto_win32.h Outdated Show resolved Hide resolved
tiledb/sm/crypto/test/unit_crypto.cc Outdated Show resolved Hide resolved
// hex. The function is generic over the hash and the input type.
template <class Hash>
static void test_hash(
const uint8_t* input, uint64_t length, const std::string& expected_hash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer span of separate character pointer and length.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean std::span? It in C++ 20. Until then I used std::string, to unify tests for SHA256 where the input is in hex and MD5 where it isn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a replacement for span in our code base. It's available through common.h.

expected_hash);
}

struct MD5Hash {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a traits class and should be named accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the implementation a bit. Instead of trait types the test function directly takes the hash function and its digest size as template arguments.

tiledb/sm/crypto/test/unit_crypto.cc Outdated Show resolved Hide resolved
tiledb/sm/crypto/test/unit_crypto.cc Outdated Show resolved Hide resolved
};

TEST_CASE("Crypto: Test MD5", "[crypto][md5]") {
auto test_md5 = [](const std::string input, const std::string expected_hash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function definition is more or less repeated below. It would be appropriate to define a function template and to instantiate it in a using statement. using test_md5 = validate_argument_vs_return<MD5Hash>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My refactorings have reduced the code duplication.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would be better to put the citations for where the test vectors come from up in the file-level documentation at the top. @section Test Plan can contain them. It should explain that the test plan is to verify the test vectors in the reference materials.

The test plan should also have argument validation, for example against nullptr. I'm not seeing those tests.

tiledb/sm/crypto/test/unit_crypto.cc Outdated Show resolved Hide resolved
@teo-tsirpanis
Copy link
Member Author

(force-pushed to update the branch)

@teo-tsirpanis
Copy link
Member Author

The test plan should also have argument validation, for example against nullptr. I'm not seeing those tests.

The hash and RNG implementations don't check for null. I don't think anything more than an assert(ptr != nullptr) in the code is warranted, these APIs are internal.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New code needs span. We have incorporated it into the global namespace in common.h. The implementation is a replacement for C++20 std::span. It's in our code base as part of an effort to write out occurrences of Buffer. Using string instead of span is hack that we don't need.

Switching to span and adding citations for the AES-GCM look like the only further changes we'll need.

FYI: We also want to replace Buffer with span in all the crypto functions. I think there's a story for that already.

}

Status Crypto::get_random_bytes(unsigned char* output, unsigned num_bytes) {
return PlatformCrypto::get_random_bytes(output, num_bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new function should use span.

tiledb/sm/crypto/test/unit_crypto.cc Outdated Show resolved Hide resolved
*
* SHA-256 test vectors were taken from SHA256ShortMsg.rsp, from "SHA Test
* Vectors for Hashing Bit-Oriented Messages"
* (https://csrc.nist.gov/Projects/cryptographic-algorithm-validation-program/Secure-Hashing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly what's needed. Citations allow auditing the code against published standards.

0);
auto test_md5 =
test_expected_hash_value<Crypto::md5, Crypto::MD5_DIGEST_BYTES>;
static const std::vector<std::pair<std::string, std::string>> test_cases{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string_view can be initialized with string literals and converts readily to span.

@teo-tsirpanis teo-tsirpanis mentioned this pull request May 5, 2023
2 tasks
@teo-tsirpanis teo-tsirpanis force-pushed the crypto-tests branch 2 times, most recently from 243fbb3 to 10ab36d Compare July 12, 2024 15:42
@teo-tsirpanis
Copy link
Member Author

CI is green, this is ready for review.

@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review July 15, 2024 16:50
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major: Why do we need get_random_bytes exposed. Doesn't seem like it has any upside.

Comment on lines 40 to 41
#this_target_link_libraries(OpenSSL::Crypto)
target_link_libraries(tiledb_crypto PRIVATE OpenSSL::Crypto)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not going to use this_target_link_libraries, you need documentation explaining why.

Nit: indentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed indentation. I don't know why we didn't use this_target_link_libraries, let me investigate…

(void)sizeof(tiledb::sm::Crypto);
return 0;
}
#define CATCH_CONFIG_MAIN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's only one test file, and there are unlikely to ever be more than that one, this definition could just go in the test source file, which would allow this file to be eliminated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this file is not needed at all and the entry point is provided by the Catch2::Catch2WithMain target. I removed it.

@@ -158,6 +158,14 @@ class Crypto {
*/
static Status sha256(
const void* input, uint64_t input_read_size, Buffer* output);

/**
* Generates a number of cryptographically random bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single line of documentation for something acting as a PRNG is inadequate.

  • How's the IV initialized?
  • Where's the state held between calls?
  • Is the state global?
  • Is it thread safe?

As long as these functions were not exposed to the outside, these issues didn't matter as much. If you're going to run tests against them, it matter. And the one test that does anything would pass with poor RNGs such as CRC shift or linear congruential.

Better yet. Simplify this PR and eliminate this function and its do-almost-nothing tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also. We have a PRNG in the code that has all this documented. We don't need another, and we certainly do not want another that's poorly documented.

@@ -129,16 +143,6 @@ class Win32CNG {
uint64_t input_read_size,
Buffer* output,
LPCWSTR hash_algorithm);

private:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it private. Or better yet, since it's static, take it out of the class and define it only in the .cc file.

Comment on lines 65 to 67
CHECK(Crypto::get_random_bytes(buf1).ok());
CHECK(Crypto::get_random_bytes(buf2).ok());
CHECK(buf1 != buf2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't do anything useful. What's it here for?

* Test that the given input (optionally in hex) has the expected hash value in
* hex. The function is generic over the hash and the input size.
*/
template <Status Hash(const void*, uint64_t, Buffer*), int Digest_Bytes>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sizes should be size_t unless they need to be compatible with existing code that (incorrectly) uses any other type for in-memory sizes. Here's that's Digest_Bytes.

Nit: template arguments should be lower case.

Copy link
Member Author

@teo-tsirpanis teo-tsirpanis Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions in the Crypto class use uint64_t, but changing the test code here to size_t works. Updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the change to size_t due to macOS CI failures. It will go away soon either way when we use spans instead of pointer-size pairs.

This file is no longer necessary with Catch2 v3, which provides an entry point through the `Catch2::Catch2WithMain` target.
@teo-tsirpanis
Copy link
Member Author

Feedback addressed. I had added the get_random_bytes test as minimal coverage, but now removed it.

Switching it to `size_t` caused failures in macOS CI.
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One remaining question about OpenSSL::Crypto in the object library specification.

#this_target_link_libraries(OpenSSL::Crypto)
target_link_libraries(tiledb_crypto PRIVATE OpenSSL::Crypto)
#this_target_link_libraries(OpenSSL::Crypto)
target_link_libraries(tiledb_crypto PRIVATE OpenSSL::Crypto)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the resolution on this? It hasn't been addressed from last time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that this_target_link_libraries links with PUBLIC visibility, which is undesirable in this case, because we use OpenSSL as an internal implementation detail. Changing it right now does not break anything, but it will have undesirable effects once the tiledb_crypto object library is incorporated to the main build (and the OpenSSL::Crypto target would be a requirement even for shared libraries).

I am going to remove the commented line, and add a comment on why we can't use this_target_link_libraries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

tiledb/sm/crypto/crypto_openssl.h Show resolved Hide resolved
@teo-tsirpanis
Copy link
Member Author

@eric-hughes-tiledb do you have any additional comments?

@teo-tsirpanis
Copy link
Member Author

@eric-hughes-tiledb, @KiterLuc is there anything left to do here? The last review comment was on a line that already existed before this PR.

Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@KiterLuc KiterLuc merged commit e51e6cd into TileDB-Inc:dev Aug 20, 2024
60 checks passed
@teo-tsirpanis teo-tsirpanis deleted the crypto-tests branch August 20, 2024 16:25
KiterLuc pushed a commit that referenced this pull request Aug 22, 2024
Fixes #5249.

Because the `<span>` header in MSVC does not include `<array>`, a
simultaneous merging of #5233 which uses `<span>` with #3979 which used
`std::array` in the Windows crypto code caused compile errors.

---
TYPE: NO_HISTORY
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants