Log Serializer Task Metrics Granularity #1569

jkosh44 · 2021-04-27T19:55:56Z

Discussion

The following code is responsible for tracking metrics in the LogSerializerTask

noisepage/src/storage/write_ahead_log/log_serializer_task.cpp

Lines 21 to 67 in 97eb7ec

    
           uint64_t num_bytes = 0, num_records = 0, num_txns = 0; 
        
           // Initialize whether to collect metrics outside of the spin loop so as not to count each loop iteration as a sample 
        
           // (by calling ComponentToRecord this increments the sample count) 
        
           bool logging_metrics_enabled = 
        
               common::thread_context.metrics_store_ != nullptr && 
        
               common::thread_context.metrics_store_->ComponentToRecord(metrics::MetricsComponent::LOGGING); 
        
           do { 
        
             if (logging_metrics_enabled && !common::thread_context.resource_tracker_.IsRunning()) { 
        
               // start the operating unit resource tracker 
        
               common::thread_context.resource_tracker_.Start(); 
        
             } 
        
             // Serializing is now on the "critical txn path" because txns wait to commit until their logs are serialized. Thus, 
        
             // a sleep is not fast enough. We perform exponential back-off, doubling the sleep duration if we don't process any 
        
             // buffers in our call to Process. Calls to Process will process as long as new buffers are available. We only 
        
             // sleep as part of this exponential backoff when there are logs that need to be processed and we wake up when there 
        
             // are new logs to be processed. 
        
             if (empty_) { 
        
               std::unique_lock<std::mutex> guard(flush_queue_latch_); 
        
               sleeping_ = true; 
        
               flush_queue_cv_.wait_for(guard, curr_sleep); 
        
               sleeping_ = false; 
        
             } 
        
             // If Process did not find any new buffers, we perform exponential back-off to reduce our rate of polling for new 
        
             // buffers. We cap the maximum back-off, since in the case of large gaps of no txns, we don't want to unboundedly 
        
             // sleep 
        
             std::tie(num_bytes, num_records, num_txns) = Process(); 
        
             curr_sleep = std::min(num_records > 0 ? serialization_interval_ : curr_sleep * 2, max_sleep); 
        
             if (num_records > 0) { 
        
               if (common::thread_context.resource_tracker_.IsRunning()) { 
        
                 // Stop the resource tracker for this operating unit 
        
                 common::thread_context.resource_tracker_.Stop(); 
        
                 auto &resource_metrics = common::thread_context.resource_tracker_.GetMetrics(); 
        
                 common::thread_context.metrics_store_->RecordSerializerData(num_bytes, num_records, num_txns, 
        
                                                                             serialization_interval_.count(), resource_metrics); 
        
               } 
        
               num_bytes = num_records = num_txns = 0; 
        
               // Update whether to collect metrics only if we did work (starting a new event) so as not to count each loop 
        
               // iteration as a sample (by calling ComponentToRecord this increments the sample count) 
        
               logging_metrics_enabled = 
        
                   common::thread_context.metrics_store_ != nullptr && 
        
                   common::thread_context.metrics_store_->ComponentToRecord(metrics::MetricsComponent::LOGGING); 
        
             } 
        
           } while (run_task_);

Process() Has its own loop, which continuously loops as long as there are more log records to serialize. That means that Process() doesn't return until there's a break in log records. The two consequences of this are

If there's a steady stream of logs then you can end up with a very long interval that contains a lot of log records.
While there's a steady stream of logs then you won't get any metrics until there's a break in the logs.

So if you're running a workload with a lot of small transactions, you may end up with a single interval that contains the metrics of the entire workload.

As an example here are some metrics from running 3000 transactions where each transaction is a single insert.
logs.csv
That last interval lasts for roughly 17 minutes

I'm not actually familiar enough with the metrics and their usages to know if this is an issue or if this is ok.

Solution

If this is an issue, then below are two potential solutions though feel free to suggest others.

There was some brief discussion earlier about removing the while(true) loop from Process(). It seems a bit unnecessary that both Process() and LogSerializerTaskLoop() continuously loop. Though it's possible that there's a good reason for the internal while loop and I haven't looked closely enough.
We can push all of the metrics code into Process() itself, that way each iteration of the loop in Process() is a single collection interval.

The text was updated successfully, but these errors were encountered:

jkosh44 added the question/discussion Further information is requested. label Apr 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log Serializer Task Metrics Granularity #1569

Log Serializer Task Metrics Granularity #1569

jkosh44 commented Apr 27, 2021

Log Serializer Task Metrics Granularity #1569

Log Serializer Task Metrics Granularity #1569

Comments

jkosh44 commented Apr 27, 2021

Discussion

Solution