Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Segmentation fault when properties exceed ~100kb #223

Open
emanueledomingo opened this issue Aug 27, 2024 · 6 comments
Open

[Bug] Segmentation fault when properties exceed ~100kb #223

emanueledomingo opened this issue Aug 27, 2024 · 6 comments

Comments

@emanueledomingo
Copy link

emanueledomingo commented Aug 27, 2024

Hi everyone, i'm running an event driven app and i use apache pulsar as backbone. I use message's properties to exchange metadata between the services.

I noted that, when the properties exceed ~100kb the client gives Segmentation fault (core dumped) (sometimes corrupted double-linked list instead)

I haven't done a proper test on the size, so take the size as a guideline

I was able to simulate the behaviour in a notebook:

import pulsar
import sys
import json

# need to serialize to json nested object because client accepts properties only
# of type (self: _pulsar.MessageBuilder, arg0: str, arg1: str)
nested_dict = json.dumps({
    "foo": "bar"
})

big_dict = {
    f"key{i}": nested_dict for i in range(3000)
}

print(sys.getsizeof(big_dict))  # 103856

client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('test-topic')

producer.send(content="test-message".encode(), properties=big_dict)  # segfault

Unfortunately i don't have details in the stacktrace:

In [4]: producer.send(content="test-message".encode(), properties=big_dict)
Segmentation fault (core dumped)

Pulsar client version is 3.5.0.

@lhotari lhotari transferred this issue from apache/pulsar-client-python Aug 27, 2024
@lhotari
Copy link
Member

lhotari commented Aug 27, 2024

🙈 I made a mistake in moving this issue. I first thought that it was in the apache/pulsar repository and transferred it to pulsar-client-cpp. I'm sorry about that. Perhaps this is pulsar-client-cpp related in any case.

@lhotari lhotari changed the title Segmentation fault when properties exceed ~100kb [Bug] Segmentation fault when properties exceed ~100kb Aug 27, 2024
@emanueledomingo
Copy link
Author

Ok, thanks! I also think it has to do with the core c++ implementation. But as a python user I wasn't sure.

@BewareMyPower
Copy link
Contributor

Yeah, it's related to the C++ core. And it's a known issue. It might be easy to fix. I will take some time for it when I'm free.

@BewareMyPower
Copy link
Contributor

I just double checked the issue and it looks like a bug with protobuf:

apache/pulsar-client-cpp#1  0x0000ffffb1a58d2c in pulsar::proto::KeyValue::_InternalSerialize(unsigned char*, google::protobuf::io::EpsCopyOutputStream*) const () from /usr/local/lib/python3.10/dist-packages/pulsar_client.libs/libpulsar-8dc83e12.so
apache/pulsar-client-cpp#2  0x0000ffffb1cc50e0 in google::protobuf::internal::WireFormatLite::InternalWriteMessage(int, google::protobuf::MessageLite const&, int, unsigned char*, google::protobuf::io::EpsCopyOutputStream*) ()
   from /usr/local/lib/python3.10/dist-packages/pulsar_client.libs/libpulsar-8dc83e12.so
apache/pulsar-client-cpp#3  0x0000ffffb1a5b718 in pulsar::proto::MessageMetadata::_InternalSerialize(unsigned char*, google::protobuf::io::EpsCopyOutputStream*) const () from /usr/local/lib/python3.10/dist-packages/pulsar_client.libs/libpulsar-8dc83e12.so
apache/pulsar-client-cpp#4  0x0000ffffb1c96478 in google::protobuf::SerializeToArrayImpl(google::protobuf::MessageLite const&, unsigned char*, int) () from /usr/local/lib/python3.10/dist-packages/pulsar_client.libs/libpulsar-8dc83e12.so
apache/pulsar-client-cpp#5  0x0000ffffb1c94e98 in google::protobuf::MessageLite::SerializePartialToArray(void*, int) const () from /usr/local/lib/python3.10/dist-packages/pulsar_client.libs/libpulsar-8dc83e12.so
apache/pulsar-client-cpp#6  0x0000ffffb1c94d50 in google::protobuf::MessageLite::SerializeToArray(void*, int) const () from /usr/local/lib/python3.10/dist-packages/pulsar_client.libs/libpulsar-8dc83e12.so
apache/pulsar-client-cpp#7  0x0000ffffb1920704 in pulsar::Commands::newSend(pulsar::SharedBuffer&, pulsar::proto::BaseCommand&, pulsar::Commands::ChecksumType, pulsar::SendArguments const&) ()
   from /usr/local/lib/python3.10/dist-packages/pulsar_client.libs/libpulsar-8dc83e12.so

I tried the python client on macOS and it does not have this error but encountered the same issue for Ubuntu.

However, I tried the C++ client 3.5.1 on Ubuntu and this error does not happen:

#include <pulsar/Client.h>
using namespace pulsar;

int main() {
    const std::string topic = "test-topic";
    Client client("pulsar://host.docker.internal:6650");
    Producer producer;
    client.createProducer(topic, producer);

    MessageBuilder::StringMap properties;
    for (int i = 0; i < 3000; i++) {
        properties["key" + std::to_string(i)] = "{\"foo\": \"bar\"}";
    }
    auto msg = MessageBuilder().setProperties(properties).setContent("test-message").build();
    producer.send(msg);

    client.close();
}

Increasing the i in loop does not fail with segfault as well. (if the properties are too big, ResultMessageTooLarge will be returned)

It might be an issue from the Python client side. So let me move it to pulsar-client-python first.

@BewareMyPower BewareMyPower transferred this issue from apache/pulsar-client-cpp Aug 29, 2024
@BewareMyPower
Copy link
Contributor

BewareMyPower commented Aug 29, 2024

Oh I realized where the issue is, there is a bug with the C++ client when batching is disabled. However, the Python client somehow disables batching by default.

You can manually enable the batching to get the issue around, while I'm going to push a fix at the C++ client.

I also pushed a PR (#224) to enable batching by default.

BewareMyPower added a commit to BewareMyPower/pulsar-client-cpp that referenced this issue Aug 29, 2024
…ze exceeds 64KB

See apache/pulsar-client-python#223

### Motivation

Currently a shared buffer is used to store serialized message metadata
for each send request. However, its capacity is only 64KB, when the metadata
size exceeds 64KB, buffer overflow could happen.

### Modifications

When the metadata size is too large, allocate a new buffer instead of
using the shared buffer. Add `testLargeProperties` to cover it.
shibd pushed a commit to apache/pulsar-client-cpp that referenced this issue Aug 29, 2024
…ze exceeds 64KB (#443)

See apache/pulsar-client-python#223

### Motivation

Currently a shared buffer is used to store serialized message metadata
for each send request. However, its capacity is only 64KB, when the metadata
size exceeds 64KB, buffer overflow could happen.

### Modifications

When the metadata size is too large, allocate a new buffer instead of
using the shared buffer. Add `testLargeProperties` to cover it.
@emanueledomingo
Copy link
Author

I confirm that by enabling batching, the bug does not appear. Thanks for the workaround suggestion!

shibd pushed a commit to apache/pulsar-client-cpp that referenced this issue Aug 29, 2024
…ze exceeds 64KB (#443)

See apache/pulsar-client-python#223

### Motivation

Currently a shared buffer is used to store serialized message metadata
for each send request. However, its capacity is only 64KB, when the metadata
size exceeds 64KB, buffer overflow could happen.

### Modifications

When the metadata size is too large, allocate a new buffer instead of
using the shared buffer. Add `testLargeProperties` to cover it.

(cherry picked from commit 8f269e8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants