SNOW-1679281: High CPU Utilization and Event Loop Blockage Due to Synchronous Code Leading to CPU Blockage For large Files in PUT Command #917

Closed
SushmaPydichetti opened this issue Sep 24, 2024 · 6 comments
@SushmaPydichetti

SushmaPydichetti commented Sep 24, 2024

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of NodeJS driver are you using?
    1.9.3

  2. What operating system and processor architecture are you using?
    macOS arm64

  3. What version of NodeJS are you using?
    (node --version and npm --version)
    node: 18.12.1, npm: 8.19.2

  4. What are the component versions in the environment (npm list)?

  5. Server version? E.g. 1.90.1
You may get the server version by running a query:

SELECT CURRENT_VERSION();
  6. What did you do?
    During file uploads via the PUT command, there were repeated instances where the event loop was blocked for over 7 seconds, leading to CPU spikes and unresponsiveness. This was identified as happening during the compression and upload stages, particularly in synchronous code in the file transfer process.
    File details: 2.5 GB CSV file, each row ~6 KB (random data); in the response of the PUT command the compressed file size is ~1.8 GB.
    Query being run: PUT file://${yourFilePath} @~ AUTO_COMPRESS=TRUE;
    To reproduce the issue (a minimal sketch follows):
    Use the Snowflake SDK PUT command to upload a large file (e.g., a 2.5 GB CSV).
    Observe the CPU utilization and event loop blockage during the process, especially in the file compression and transfer phases.
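
A minimal repro sketch, assuming placeholder credentials and file path:

const snowflake = require('snowflake-sdk');

// Placeholder credentials; the real values come from the environment being tested.
const connection = snowflake.createConnection({
  account: '<account_identifier>',
  username: '<user>',
  password: '<password>'
});

connection.connect((err) => {
  if (err) throw err;
  // PUT of a large (~2.5 GB) CSV with driver-side compression enabled.
  connection.execute({
    sqlText: 'PUT file:///path/to/large_file.csv @~ AUTO_COMPRESS=TRUE',
    complete: (err, stmt, rows) => {
      if (err) {
        console.error('PUT failed:', err.message);
        return;
      }
      console.log('PUT finished:', rows);
    }
  });
});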

  7. What did you expect to see?
    We suspect the synchronous code is causing this issue.

    What should have happened and what happened instead?
    Other observations:

  • Synchronous code in file uploads: synchronous methods significantly block the CPU during execution, particularly in the file transfer process. (ref1, ref2, ref3)
  • Long event loop blockages: several occurrences of event loop blockages of 7+ seconds were recorded. For example:
Event loop was blocked for 7372.725958007813ms by Snowflake SDK operation
The blocked-at module was used for this observation (a usage sketch follows the stack trace below):
  '    at AsyncHook.init (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/blocked-at/index.js:31:11)',
  '    at emitInitNative (node:internal/async_hooks:202:43)',
  '    at emitInitScript (node:internal/async_hooks:505:3)',
  '    at promiseInitHook (node:internal/async_hooks:324:3)',
  '    at promiseInitHookWithDestroyTracking (node:internal/async_hooks:328:3)',
  '    at FileUtil.getDigestAndSizeForFile (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/file_transfer_agent/file_util.js:113:11)',
  '    at uploadOneFile (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/file_transfer_agent/file_transfer_agent.js:362:46)',
  '    at async uploadFilesinSequential (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/file_transfer_agent/file_transfer_agent.js:327:22)',
  '    at async upload (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/file_transfer_agent/file_transfer_agent.js:311:7)',
  '    at async FileTransferAgent.execute (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/file_transfer_agent/file_transfer_agent.js:193:9)',
  '    at async FileStatementPreExec.context.onStatementRequestSucc (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/connection/statement.js:805:5)',
  '    at async BaseStatement.context.onStatementRequestComp (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/connection/statement.js:583:7)',
  '    at async callback (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/connection/statement.js:1538:9)',
  '    at async options.callback (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/connection/statement.js:1492:7)',
  '    at async options.callback (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/services/sf.js:1254:9)',
  '    at async Object.callback (/Users/sushmapydichetti/Documents/io/data-warehouse-adaptor/node_modules/snowflake-sdk/lib/services/sf.js:639:13)'
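
For reference, the blocked-at instrumentation was along these lines (the threshold and log message here are illustrative, not the exact code used):

const blockedAt = require('blocked-at');

blockedAt((time, stack) => {
  // Fires whenever the event loop was held longer than the threshold.
  console.log(`Event loop was blocked for ${time}ms by Snowflake SDK operation`, stack);
}, { threshold: 1000 }); // report blocks longer than 1 second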
  8. Can you set logging to DEBUG and collect the logs?

https://community.snowflake.com/s/article/How-to-generate-log-file-on-Snowflake-connectors

E.g., add this to get the logs on standard output:

var snowflake = require('snowflake-sdk');
snowflake.configure({
  logLevel: 'trace'
});
  9. What is your Snowflake account identifier, if any? (Optional)
@SushmaPydichetti SushmaPydichetti added the bug Something isn't working label Sep 24, 2024
@github-actions github-actions bot changed the title High CPU Utilization and Event Loop Blockage Due to Synchronous Code Leading to CPU Blockage For large Files in PUT Command SNOW-1679281: High CPU Utilization and Event Loop Blockage Due to Synchronous Code Leading to CPU Blockage For large Files in PUT Command Sep 24, 2024
@sfc-gh-dszmolka
Collaborator

hi, thank you for raising this concern about the driver with us - we'll take a look

@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Sep 25, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Sep 25, 2024
@sfc-gh-dszmolka
Collaborator

Took a quick look at it. Without information on how exactly the performance and event loop blocking were measured, there appear to be at least two operations in uploadOneFile, visible in the stack trace, that could require blocking I/O:

  • compressFileWithGZIP, to compress the input file (and normalize the gzipped file header)
  • getDigestAndSizeForFile, to calculate the SHA-256 hash of the file

All of the involved functions are async (except maybe the part where we normalize the timestamp in the gzipped file header with fs.writeSync, which, by the way, I would expect to be a quick operation regardless of the file size). At this point I'm not sure how compressing and hashing a file could be made non-blocking, but we will look into possible improvements.
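
For illustration only (this is not the driver's current code), compressing and hashing via Node's stream APIs would process the file chunk by chunk, so the event loop is yielded between chunks and zlib's compression work runs largely off the main thread; a rough sketch:

const fs = require('fs');
const zlib = require('zlib');
const crypto = require('crypto');
const { pipeline } = require('stream/promises');

// Gzip srcPath into dstPath while computing the SHA-256 digest and size of the
// compressed output, without reading the whole file into memory at once.
async function gzipAndDigest(srcPath, dstPath) {
  const hash = crypto.createHash('sha256');
  let compressedSize = 0;

  await pipeline(
    fs.createReadStream(srcPath),
    zlib.createGzip(),
    async function* (gzipped) {
      for await (const chunk of gzipped) {
        hash.update(chunk);
        compressedSize += chunk.length;
        yield chunk;
      }
    },
    fs.createWriteStream(dstPath)
  );

  return { digest: hash.digest('base64'), size: compressedSize };
}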

In the meantime, as mitigation, you could try:

  • pre-gzipping the big CSV outside, in a separate process, and passing the pre-compressed GZIP to the PUT command with AUTO_COMPRESS=FALSE SOURCE_COMPRESSION=GZIP (reference: PUT command); a rough sketch follows this list

By this you could eliminate the need to block the event loop with the CPU-intensive operation of compressing the file inside the Node process.

  • or, depending on the use case, instead of one massive 2.5 GB CSV, create multiple smaller CSVs and upload them (again maybe from a separate thread, or even pre-gzipping the smaller CSVs too)
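
A rough sketch of the first option, assuming an already-connected connection object and placeholder paths (gzip here stands in for any out-of-process compressor):

const { execFile } = require('child_process');

// Compress outside the Node process, then PUT the pre-compressed file so the
// driver skips its own gzip step.
execFile('gzip', ['-k', '/path/to/large_file.csv'], (gzipErr) => {
  if (gzipErr) throw gzipErr;
  connection.execute({
    sqlText: 'PUT file:///path/to/large_file.csv.gz @~ AUTO_COMPRESS=FALSE SOURCE_COMPRESSION=GZIP',
    complete: (err, stmt, rows) => {
      if (err) {
        console.error('PUT failed:', err.message);
        return;
      }
      console.log('Uploaded pre-compressed file:', rows);
    }
  });
});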

Not sure what the ultimate goal is with the files put to the stage; if it is ingesting them with COPY INTO, that can also be run once and made to match multiple files (or run on the whole stage), so it doesn't necessarily need one single file to work on.
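
For example (table name and file pattern are placeholders), a single COPY INTO can ingest every matching file on the stage:

// Hypothetical example: one COPY INTO run picks up all matching files on the user stage.
connection.execute({
  sqlText: "COPY INTO my_table FROM @~ PATTERN = '.*part_.*[.]csv[.]gz' FILE_FORMAT = (TYPE = CSV)",
  complete: (err, stmt, rows) => {
    if (err) {
      console.error('COPY INTO failed:', err.message);
      return;
    }
    console.log('Load results:', rows);
  }
});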

Nevertheless, we'll look at the possible enhancement options here (without any ETA), but at this stage I don't feel this is a bug per se. If you feel I'm missing something, please provide further inputs.

@sfc-gh-dszmolka sfc-gh-dszmolka added enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team and removed bug Something isn't working status-triage Issue is under initial triage labels Sep 26, 2024
@sfc-gh-dszmolka
Collaborator

sfc-gh-dszmolka commented Sep 30, 2024

Edit: now I see #922 was raised by someone else, for the exact same topic. Closing this one in favor of #922 for tracking.

@SushmaPydichetti
Author

> Edit: now I see #922 was raised by someone else, for the exact same topic. Closing this one in favor of #922 for tracking.

#922 seems to be related to memory consumption. Will you be fixing CPU utilization as well in the same issue, given there is significant CPU usage (in getDigestAndSizeForFile and encryption)?
The graph below is for a pre-compressed file PUT query execution:
[image: CPU utilization graph during a PUT of a pre-compressed file]

@sfc-gh-dszmolka
Collaborator

internally, we're tracking it as one enhancement indeed

@SushmaPydichetti
Author

> internally, we're tracking it as one enhancement indeed

Sure, thank you!
