Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eliminating Warnings #64

Merged
merged 16 commits into from
Dec 13, 2024
Merged

eliminating Warnings #64

merged 16 commits into from
Dec 13, 2024

Conversation

oschonrock
Copy link
Contributor

turning on -Wconversion and -Wsign-conversion

putting these 2 options into Makefile (but not CMakeLists.txt) and producing a silent compile

see individual commits for details

There were some minor changes in functions parameter and return types, which could do with checking.

All tests pass.

binaryfusefilter.h only

issues addressed

1. changed some return value and parameter types of (static) functions
-- PLEASE CHECK THIS IN REVIEW

2. sprinkled 'U' into bitwise operations to silence warnings

3. casting to avoid "standard integer promotion rules" which resulted
in signedness warnings

4. explicitly reducing results to the target type rather than letting
it happen implicitly

tests still passing
binaryfusefilter.h only

issues addressed

1. changed some return value and parameter types of (static) functions
-- PLEASE CHECK THIS IN REVIEW

2. sprinkled 'U' into bitwise operations to silence warnings

3. casting to avoid "standard integer promotion rules" which resulted
in signedness warnings

4. explicitly reducing results to the target type rather than letting
it happen implicitly

5. when and `if` statements ends in break or return, then a following
`else if` can be just a new `if`

tests still passing
mostly casting size_t down to uint32_t - maybe some internal struct
types should have been size_t?

also some integer promotion casts
mostly casting blocklengt to uint32_t to fit into keyindex.index

should keyindex.index be a size_t?
very repetitive casting of mainly sizes and doubles.
with -Wconversion and -Wsign-conversion

so putting these in the Makefile, so during "private" development with
the Makefile new warnings will be noticed straight away

but not in CMakeLists.txt, because as this is a header-only INTERFACE
library, it would force these warning levels on the users.
turned up these additional 'U' tweaks
@oschonrock
Copy link
Contributor Author

this resolves #62

@thomasmueller
Copy link
Contributor

The changes make the code harder to read, unfortunately. I wonder if it would be possible to change the type of the variables in at least some of the places, so that less casting is needed?

@oschonrock
Copy link
Contributor Author

oschonrock commented Dec 12, 2024

Yes, I agree.

The casts mostly indicate, where there are problems with the types. So I do think that having them there, as a first step, is helpful. They act as a flag: "Problem here". The casts are happening anyway, just in an "uncontrolled and invisible way". (which is why it is considered best practice to turn these warnings on)

I have already changed some types (as per comments above), of function params and return results. More can be done for sure. But in some cases it touches the core structs I believe, so I did not want to just go ahead and change that. If you want me to go ahead and make more wholesale changes I can easily do that.

Specifically:

  1. The single most common problem is with size_t vs uint32_t... For reasons I don't understand some quantities, which I think are related to the number of keys, use uint32_t. IMO, these should not be bounded by 4 billion, they should all just be size_t, unless there is some deeper algorithmic reason, which I do not understand. Same goes for everything related to malloc, or are we trying to support 32bit systems in some way? The sizeof(structs) should not be a factor IMO, the bulk of the data is in the Fingerprints anyway. -- This change alone would get rid of more than half the casts.

  2. The 'U's are unobtrusive IMO, so they can stay.

  3. If more unsigned int's were used some more could eliminated. In this kind of code, almost nothing should be signed? And using int creates a lot of casting problems.

  4. the "integer promotion casts"... that's just a "feature" of C unfortunately, but I think it is still important to be aware that these promotions are happening. In some cases these will also reduce when we do the other things above. It also helps to just not use uint16_t and uint8_t for all the calculations, the compiler will just "promote them anyway". Only reduce to uint8|16_t when we are ready for storage. That will replace a whole class of casts with just one at the end, and will also be clearer.

  5. The "Maths casts"... specifically in unit.c .. are very repetitive. I actually thought that some helper functions might be able to abstract away some of the repetitive nature of this code.. I know it's unit tests, but having sth very similar to the below in every test does not add anything IMO (note: I have not deeply investigated, how easily this can be pulled into a common "test_and_print_report" function):

  size_t random_matches = 0;
  size_t trials = 10000000;
  for (size_t i = 0; i < trials; i++) {
    uint64_t random_key = ((uint64_t)rand() << 32U) + (uint64_t)rand();
    if (xor8_contain(random_key, &filter)) {
      if (random_key >= size) {
        random_matches++;
      }
    }
  }
  double fpp = (double)random_matches * 1.0 / (double)trials;
  printf(" fpp %3.5f (estimated) \n", fpp);
  double bpe = (double)xor16_size_in_bytes(&filter) * 8.0 / (double)size;
  printf(" bits per entry %3.2f\n", bpe);
  printf(" bits per entry %3.2f (theoretical lower bound)\n", - log(fpp)/log(2));
  printf(" efficiency ratio %3.3f \n", bpe /(- log(fpp)/log(2)));

If it is abstracted away, we can cast once or maybe get rid of casts altogether.

Overall I would say that this process, in my experience, while a little messy initially, if you haven't had warning turned on since beginning, will almost always result in clearer and more robust code, when all the types get sorted out.

@oschonrock
Copy link
Contributor Author

oschonrock commented Dec 13, 2024

Alternative:

If you decide that you don't want to go on this journey to improve the code/types by enabling -Wconversion and -Wsign-conversion, then one alternative is to offer a "non-header-only" version.

Right now, because the library is header only, if a downstream user wants to follow best practices and enable those warnings, then they will be hit with a wall of warnings from xor_singleheader. This makes it very difficult to keep on top of the resulting warnings in their own code.

If they can use a "separately compiled" version of the library with a minimal interface header file, then the compile options for the library shared objects can be independently controlled in CMakeLists.txt.

My personal sense/opinion is that the trend of "header_only" is coming to an end, because the community is improving its processes of integrating external dependencies. cmake, conan, vcpkg, etc are having an impact. Personally if I see that a project offers a simple add_sudirectory() way of including it via cmake, I will be happier to use that than a "just include this single header", because it's never quite as easy as that.. compile times, warnings, paths... etc...

all sections were indentical except for the call to *contain()
and *size_in_bytes

some void* and function pointer juggling allowed to make this generic

report code reduced by 2/3rds
for all but the special "failure rate test"

the large function dispatch table is a litle annoying, but can be
removed as well...TBC

tests all pass
@lemire lemire changed the title elimintaing Warnings eliminating Warnings Dec 13, 2024
@lemire
Copy link
Member

lemire commented Dec 13, 2024

@oschonrock

I'll merge this.

I agree with @thomasmueller but that's an internal issue that can be improved later.

@lemire lemire merged commit 890b31d into FastFilter:master Dec 13, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants