Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keeping bloat under control #280

Open
psiha opened this issue Sep 2, 2024 · 1 comment
Open

Keeping bloat under control #280

psiha opened this issue Sep 2, 2024 · 1 comment

Comments

@psiha
Copy link

psiha commented Sep 2, 2024

Please reconsider the possible overuse of forceinlining. For example for unordered_flat_set insert() is fully inlined at each callsite (even with rather complex non-default hashers) except for the unchecked_emplace_with_rehash() part.
Considering that unordered containers are complex objects that themselves can never reside in registers I'm failing to see how this forceinlining helps anything.
The only thing I see is the common case when the objects contained are trivial objects that can be passed and returned in registers (as is my use case) - then the generic Args&&... signature would unnecessarily cause them to be passed through memory (ignoring the lucky case you get SRA optimization to kick in) - for this I'd rather some kind of modernized boost::call_traits-like mechanism be used.

On a related note - consider skipping double key construction from ...args - rather move the key (constructed with key_from at the beginning of emplace_impl) and peel of ...args used for key construction before forwarding them to unchecked_emplace*.

Also could you not extract the construction from unchecked_emplace_at() and unchecked_emplace_with_rehash() - make them only allocate the space/positon and simply do an inplace construction at one place at the end of emplace_impl()? (that way you wouldn't have to forward Args&& around at all)

@joaquintides
Copy link
Member

joaquintides commented Sep 3, 2024

Please reconsider the possible overuse of forceinlining. For example for unordered_flat_set insert() is fully inlined at each callsite (even with rather complex non-default hashers) except for the unchecked_emplace_with_rehash() part.
Considering that unordered containers are complex objects that themselves can never reside in registers I'm failing to see how this forceinlining helps anything.

The fact that the containers are complex objects much larger than a register does not have anything to do with inlining their member functions (such as insert). The container itself (i.e. *this) will be passed/used by reference, it's the implementation code of insert that gets inlined inside the caller's code. We forced inlining because it actually improved performance for some compilers that didn't inline automatically in some cases.

The only thing I see is the common case when the objects contained are trivial objects that can be passed and returned in registers (as is my use case) - then the generic Args&&... signature would unnecessarily cause them to be passed through memory (ignoring the lucky case you get SRA optimization to kick in) - for this I'd rather some kind of modernized boost::call_traits-like mechanism be used.

Usually, the compiler is smart enough to decide whether the arguments of an inlined function are treated as references or passed around as values, regardless of what the function signature says --this is one of the benefits of inlining. Consider this example: when no optimizations are applied, foo and bar are indeed different (the former accepts its arg by reference), but with -O3 all differences disappear.

On a related note - consider skipping double key construction from ...args - rather move the key (constructed with key_from at the beginning of emplace_impl) and peel of ...args used for key construction before forwarding them to unchecked_emplace*.

The key is not constructed twice from args ever, although an examination of emplace_impl in isolation may suggest otherwise. The critical part is in the implementation of emplace, the entry function that later calls emplace_impl:

template<typename... Args>
BOOST_FORCEINLINE std::pair<iterator,bool> emplace(Args&&... args)
{
alloc_cted_insert_type<type_policy,Allocator,Args...> x(
this->al(),std::forward<Args>(args)...);
return emplace_impl(type_policy::move(x.value()));
}
/* Optimization for value_type and init_type, to avoid constructing twice */
template <typename T>
BOOST_FORCEINLINE typename std::enable_if<
detail::is_similar_to_any<T, value_type, init_type>::value,
std::pair<iterator, bool> >::type
emplace(T&& x)
{
return emplace_impl(std::forward<T>(x));
}

In the first case, an object x is constructed from args that holds the key, and this key is moved to its final destination upon successful insertion. If emplace is passed a key directly (second case), then we omit any kind of construction and pass the key along.

Also could you not extract the construction from unchecked_emplace_at() and unchecked_emplace_with_rehash() - make them only allocate the space/positon and simply do an inplace construction at one place at the end of emplace_impl()? (that way you wouldn't have to forward Args&& around at all)

This is in principle doable, but:

  • It'd probably get us nothing in terms of performance or binary size.
  • It'd complicate things extraordinarily if construction fails (i.e. it throws an exception), as we'd need to undo all the insertion work; by contrast, unchecked_emplace_at and unchecked_emplace_with_rehash call nosize_unchecked_emplace_at, which only makes permanent changes to the data structure after construction is successful:

template<typename... Args>
locator nosize_unchecked_emplace_at(
const arrays_type& arrays_,std::size_t pos0,std::size_t hash,
Args&&... args)
{
for(prober pb(pos0);;pb.next(arrays_.groups_size_mask)){
auto pos=pb.get();
auto pg=arrays_.groups()+pos;
auto mask=pg->match_available();
if(BOOST_LIKELY(mask!=0)){
auto n=unchecked_countr_zero(mask);
auto p=arrays_.elements()+pos*N+n;
construct_element(p,std::forward<Args>(args)...);
pg->set(n,hash);
BOOST_UNORDERED_ADD_STATS(cstats.insertion,(pb.length()));
return {pg,n,p};
}
else pg->mark_overflow(hash);
}
}
};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants