[proposal] Task Resource Isolation #71
Comments
Specifically, how is back pressure achieved? Can you expand on it in detail? |
Persistence of the current progress can rely on the upstream Event Bus. |
How is isolation achieved here? How do we prevent some tasks from taking up too many resources and leaving other tasks unprocessed? |
First, the producer-consumer code segment above shows how to set up blocking queues for the data synchronization tasks of different business lines. I will look at event-driven products such as EventMesh for reference, and compare and analyze how this feature is best introduced. |
We should consider not only the EventTargetPush module but also EventBusListener and EventRuleTransfer when dealing with the back pressure problem. Try to look at it from the consumer's point of view and design the water-line proposal for its producer. |
At present I am still getting familiar with eventbridge. Next week I will carefully go through the details of the project and discuss it again. |
The formula for calculating the current production and consumption speed is as follows:
Based on these two speeds, I will coordinate whether the upstream and downstream ConnectRecord records are refreshed. |
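The formula itself is not included above, so the following is only a minimal sketch of how such a production/consumption speed could be measured over a time window; the class and method names are illustrative assumptions, not part of the proposal.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: estimate production and consumption rates over a window
// and signal when the producer is outrunning the consumer.
public class SpeedMonitor {
    private final AtomicLong produced = new AtomicLong();
    private final AtomicLong consumed = new AtomicLong();
    private long lastProduced, lastConsumed;
    private long lastTs = System.currentTimeMillis();

    public void onProduce(int n) { produced.addAndGet(n); }
    public void onConsume(int n) { consumed.addAndGet(n); }

    /** rate = records handled in the window / window length in seconds */
    public synchronized boolean producerTooFast() {
        long now = System.currentTimeMillis();
        double seconds = Math.max(1, now - lastTs) / 1000.0;
        double produceRate = (produced.get() - lastProduced) / seconds;
        double consumeRate = (consumed.get() - lastConsumed) / seconds;
        lastProduced = produced.get();
        lastConsumed = consumed.get();
        lastTs = now;
        return produceRate > consumeRate;  // slow the upstream refresh when true
    }
}
```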
Do not post too much code in an issue; it is hard to read. For example, for ThreadPoolTaskExecutor, consider extending ThreadPoolExecutor instead of delegating to it. Some getter and setter methods have meaningless comments. There are several ways to achieve back pressure, depending on how the producer delivers records to the consumer: When the producer communicates with the consumer through a queue, you can use a fixed queue size; when the producer cannot push to the queue, it immediately goes into idle mode. There should be a common specification for this everywhere. If the producer communicates with the consumer through some RPC mechanism, you can let the consumer poll from the producer's delivery buffer. This works well in P2P mode, but once you have multiple consumers for one producer, you must fix the delivery buffer size and react when it fills up, which means applying back pressure to the producer's upstream. |
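A minimal sketch of the queue-based approach described in that comment, assuming a fixed-size `ArrayBlockingQueue` between producer and consumer; the class and method names are illustrative only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueBackPressure {
    // Fixed queue size: when it is full, offer() fails and the producer idles.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

    void produce(String record) throws InterruptedException {
        // Try to hand the record to the consumer; if the queue is full,
        // back off instead of buffering without bound.
        while (!queue.offer(record, 100, TimeUnit.MILLISECONDS)) {
            Thread.sleep(50); // idle mode: the upstream pull is effectively paused
        }
    }

    String consume() throws InterruptedException {
        return queue.take();
    }
}
```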
This is only the internal implementation within a single program. If we are talking about back pressure between multiple applications, I believe there are two ways: PUSH mode: the consumer explicitly returns an ACK message, and the producer delivers messages in an ack-per-record manner. To improve throughput, messages can be sent in batches, and the ACK returned is the UUID of the last record in the batch; when the producer receives this ACK, it considers the preceding messages consumed. POLL mode: the consumer actively pulls data from the producer, so the traffic can never exceed the processing limit of the consumer. |
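As a rough illustration of the PUSH-mode batching contract described above (not an existing eventbridge API), the producer could commit progress only up to the acknowledged UUID; every name below is an assumption.

```java
import java.util.List;

// Illustrative PUSH-mode contract: the producer sends a batch, the consumer
// replies with the UUID of the last record it processed, and the producer
// treats everything up to that UUID as consumed.
interface BatchConsumer {
    String pushBatch(List<String> recordUuids);   // returns the last acked UUID
}

class AckPerBatchProducer {
    private final BatchConsumer consumer;

    AckPerBatchProducer(BatchConsumer consumer) { this.consumer = consumer; }

    void deliver(List<String> recordUuids) {
        String acked = consumer.pushBatch(recordUuids);
        commitProgressUpTo(acked);   // records after the acked UUID will be re-sent
    }

    private void commitProgressUpTo(String uuid) { /* persist consumer progress */ }
}
```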
Hi, community. I am a developer who is new to rocketmq-eventbridge, and I have some opinions on resource consumption in event push. Below is the draft I designed. Can you give me some input and additional application scenarios?
document: https://shimo.im/docs/25q5M9QVB8i80XqD/
Motivation
Because a large amount of data flows through the current path from the event source to the event target, we may need to design logic for reading event source data and pushing to the event target with multiple threads, so that we can handle the concurrency required by high-volume consumption and stay observable during back pressure.
Design
The producer side continuously obtains source data and puts it into a blocking queue. The BlockingQueue has a length limit: if the consumer side cannot drain it in time, the producer side blocks and cannot store more data until there is free space. The producer's end flag is that the last acquired piece of data is null, indicating there is no new source data.
The consumer side extracts source data from the BlockingQueue for consumption. Most of the time-consuming logic is in the specific processing details; fetching the data itself does not take long, so the consumer side is also single-threaded. After fetching the source data, it hands the work to a thread pool executor (ThreadPoolTaskExecutor), which controls the execution of the specific tasks. ThreadPoolTaskExecutor also has a task queue; to prevent that queue from growing too long and exhausting memory, there is an upper limit, and the consumer thread only submits tasks to ThreadPoolTaskExecutor while the queue is below that limit. A rough sketch of this producer-consumer loop is shown below.
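This is only a minimal sketch of the loop described above, assuming a `LinkedBlockingQueue` between producer and consumer and a plain `ThreadPoolExecutor` with a bounded work queue; class names, capacities, and helper methods are illustrative assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class EventTargetPusherSketch {
    private final BlockingQueue<String> sourceQueue = new LinkedBlockingQueue<>(1000);
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            4, 8, 60, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(500),              // bounded task queue
            new ThreadPoolExecutor.CallerRunsPolicy());  // degrade instead of OOM

    // Producer: keeps pulling source data; put() blocks when the queue is full.
    // A null result from the source is treated as the end flag.
    void producerLoop() throws InterruptedException {
        String data;
        while ((data = fetchFromEventSource()) != null) {
            sourceQueue.put(data);
        }
    }

    // Consumer: a single thread pulls from the queue and hands work to the pool,
    // only submitting while the pool's task queue still has room.
    void consumerLoop() throws InterruptedException {
        while (true) {
            String data = sourceQueue.take();
            while (executor.getQueue().remainingCapacity() == 0) {
                Thread.sleep(10);  // wait instead of overfilling the task queue
            }
            executor.execute(() -> pushToEventTarget(data));
        }
    }

    private String fetchFromEventSource() { return null; /* placeholder */ }

    private void pushToEventTarget(String data) { /* push logic */ }
}
```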
Then a task manager is needed to manage the production and consumption logic of EventTargetPusher, creating separate blocking queues for different kinds of tasks (the tasks here can be high, medium, and low priority).
TaskControlManager
Datautils
Custom thread pool ThreadPoolTaskExecutor
When thread pool occupancy is relatively high, tasks can be routed to the corresponding blocking queues, and the progress of push tasks can be monitored. A hypothetical sketch of such a manager is shown below.
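The actual TaskControlManager code is not included above, so the following is only a guess at its shape: one bounded queue per priority, so a flood of low-priority tasks cannot exhaust the capacity reserved for high-priority ones. All names and capacities are assumptions.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TaskControlManager {
    public enum Priority { HIGH, MEDIUM, LOW }

    // One bounded queue per priority level for task isolation.
    private final Map<Priority, BlockingQueue<Runnable>> queues = new EnumMap<>(Priority.class);

    public TaskControlManager(int highCap, int mediumCap, int lowCap) {
        queues.put(Priority.HIGH, new ArrayBlockingQueue<>(highCap));
        queues.put(Priority.MEDIUM, new ArrayBlockingQueue<>(mediumCap));
        queues.put(Priority.LOW, new ArrayBlockingQueue<>(lowCap));
    }

    /** Blocks the caller (the producer) when the queue for that priority is full. */
    public void submit(Priority priority, Runnable task) throws InterruptedException {
        queues.get(priority).put(task);
    }

    /** Simple monitoring hook: how many tasks are pending for a given priority. */
    public int pending(Priority priority) {
        return queues.get(priority).size();
    }
}
```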
Advantage:
Disadvantage: the messages are only stored in an in-memory cache, so if there is network jitter or the producer crashes, the last successfully consumed message cannot be saved and progress cannot be restored.
Secondly, when task traffic is too large or latency is high, back pressure is not handled well.
Scenarios where this arises: a large amount of data or high latency.
Back pressure solution in stream processing (taking Flink as an example): at runtime, Flink is mainly composed of two components, operators and streams. Each operator consumes an intermediate stream, performs transformations on it, and produces a new stream. In Flink, these logical streams behave like distributed blocking queues, and the queue capacity is realized through a buffer pool (LocalBufferPool). Each stream being produced and consumed is assigned a buffer pool, which manages a group of buffers (Buffer) that are recycled after being consumed.
The figure in the original issue shows the data transfer between two tasks. A simplified illustration of the buffer-pool idea follows.
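This is not Flink's actual implementation, only a simplified sketch of the recycled-buffer mechanism: a fixed number of buffers limits how much in-flight data a stream can hold, and returning a buffer lets the producer continue.

```java
import java.util.concurrent.Semaphore;

// Simplified illustration of a LocalBufferPool-style mechanism (not Flink's real code).
public class SimpleBufferPool {
    private final Semaphore available;

    public SimpleBufferPool(int numberOfBuffers) {
        this.available = new Semaphore(numberOfBuffers);
    }

    /** Producer side: blocks when all buffers are in flight, i.e. back pressure. */
    public void requestBuffer() throws InterruptedException {
        available.acquire();
    }

    /** Consumer side: recycling a buffer lets the producer continue. */
    public void recycleBuffer() {
        available.release();
    }
}
```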
So can we use this as an idea to transform the first solution?
Now that we are ready to refactor EventTargetPusher into a producer-consumer model, there is bound to be a batching step for the produced and consumed tasks. We only need to set a threshold on the batched record collection and flush to the downstream event target once it is reached; this lets us control the flush timing for each type of task and avoids the blocking caused by flushing record by record. A rough batching sketch follows.
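A minimal sketch of the batch-and-flush idea, assuming records are collected per task type and flushed when a size threshold is hit; the class name, threshold, and downstream call are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative batching sketch: collect records and flush to the downstream
// event target once a size threshold is reached, instead of record by record.
public class BatchedTargetWriter {
    private final int flushThreshold;
    private final List<String> buffer = new ArrayList<>();

    public BatchedTargetWriter(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public synchronized void add(String record) {
        buffer.add(record);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    public synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        pushToEventTarget(new ArrayList<>(buffer));  // one downstream call per batch
        buffer.clear();
    }

    private void pushToEventTarget(List<String> batch) { /* downstream push */ }
}
```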
Let us take the scenario of obtaining the interactive query results of a select statement in the flink-sql-gateway project as an example.
flink-sql-gateway:
https://github.com/ververica/flink-sql-gateway.git
In the following code segment, the startRetrieval function connects to the master process query through SocketStreamIterator and returns the result asynchronously. Here startRetrieval corresponds to the production logic of EventTargetPusher, and the ResultRetrievalThread thread corresponds to the consumption logic. We can keep a changeRecordBuffer in memory, set a maxBufferSize, and automatically push to the downstream event target once the water level is reached. A rough analogue of this pattern is sketched below.
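This is not the actual flink-sql-gateway code, only a rough analogue of the pattern described: a retrieval thread drains an iterator into an in-memory buffer and pushes downstream whenever maxBufferSize is reached. All names are borrowed from the description above for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Rough analogue of the described pattern (not flink-sql-gateway's real code).
public class ResultRetrievalSketch implements Runnable {
    private final Iterator<String> resultIterator;   // plays the role of SocketStreamIterator
    private final int maxBufferSize;
    private final List<String> changeRecordBuffer = new ArrayList<>();

    public ResultRetrievalSketch(Iterator<String> resultIterator, int maxBufferSize) {
        this.resultIterator = resultIterator;
        this.maxBufferSize = maxBufferSize;
    }

    @Override
    public void run() {
        while (resultIterator.hasNext()) {
            changeRecordBuffer.add(resultIterator.next());
            if (changeRecordBuffer.size() >= maxBufferSize) {
                pushDownstream();   // water level reached: flush and free memory
            }
        }
        pushDownstream();           // flush the tail when retrieval finishes
    }

    private void pushDownstream() {
        if (changeRecordBuffer.isEmpty()) {
            return;
        }
        // push the buffered records to the downstream event target, then clear
        changeRecordBuffer.clear();
    }
}
```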
Since the above does not consider the downtime of the producer or consumer machine, perhaps we can persist the task processing progress of the first solution, together with the information of the currently consumed topic, when an exception is thrown. The current consumption topic information is as follows:
When the task starts next time, it can resume from the saved information of the last failed task. A hypothetical checkpoint sketch is shown below.
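This is only a hypothetical sketch of persisting the last failed position so the task can resume from it on restart; the file format, field names, and class name are assumptions, not existing eventbridge code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of saving/restoring the last failed consumption position.
public class ProgressCheckpoint {
    private final Path checkpointFile = Paths.get("eventbridge-task-progress.checkpoint");

    /** Called from the exception handler: save topic, queue, and offset of the failed record. */
    public void saveOnFailure(String topic, int queueId, long offset) {
        try {
            Files.writeString(checkpointFile, topic + "," + queueId + "," + offset);
        } catch (IOException e) {
            // losing the checkpoint only costs some re-consumption on restart
            e.printStackTrace();
        }
    }

    /** Called on startup: resume from the saved position if one exists. */
    public String[] loadLastPosition() throws IOException {
        if (!Files.exists(checkpointFile)) {
            return null;
        }
        return Files.readString(checkpointFile).split(",");
    }
}
```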