azurerm_storage_blob parallelism not working as expected? #1976

Closed
mchouque opened this issue Sep 25, 2018 · 3 comments

@mchouque

Hello,

I'm running Terraform v0.11.7 with the following plugins: azurerm v1.15.0_x4 and azure v0.1.1_x4.

I'm trying to upload a VHD image to Azure using the following resource.

```hcl
resource "azurerm_storage_blob" "linux-image" {
  name                   = "${var.image_name}"
  resource_group_name    = "${var.resource_group}"
  storage_account_name   = "${var.bucket_name}"
  storage_container_name = "${var.image_family}"
  type                   = "page"
  source                 = "${var.image_path}/${var.image_name}"
  parallelism            = "16"
  attempts               = "2"
}
```

The file is 50 GB, and `terraform apply` sometimes times out after 43 or 44 minutes with an "unexpected EOF" error. When it succeeds, it takes about 33 minutes.

With the Azure CLI, I can tune `--max-connections` to increase parallelism and thus reduce the overall upload time. With `--max-connections 16`, the upload takes less than 11 minutes.
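To put those timings side by side, here is a back-of-the-envelope throughput calculation in Go (the language the provider is written in). It assumes decimal units (1 GB = 1000 MB) and uses 11 minutes for the CLI even though the reported time was "less than 11 minutes", so the CLI figure and the speed-up are lower bounds.

```go
package main

import "fmt"

// throughputMBps returns the average upload rate in megabytes per second,
// treating 1 GB as 1000 MB (decimal units, an approximation).
func throughputMBps(sizeGB, minutes float64) float64 {
	return sizeGB * 1000 / (minutes * 60)
}

func main() {
	const sizeGB = 50.0
	tf := throughputMBps(sizeGB, 33)  // successful Terraform run (~33 minutes)
	cli := throughputMBps(sizeGB, 11) // Azure CLI with --max-connections 16 (< 11 minutes)
	fmt.Printf("terraform: %.1f MB/s\n", tf)
	fmt.Printf("azure cli: >= %.1f MB/s\n", cli)
	fmt.Printf("speed-up:  >= %.1fx\n", cli/tf)
}
```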

Looking at what Terraform does, I never see more than 3 or 4 simultaneous TCP connections at any given time; in fact, it's more often 1 than anything else, far fewer than I would expect. I also saw in the code that `workerCount := parallelism * runtime.NumCPU()`, and given that this VM has 2 cores/4 threads (so `runtime.NumCPU()` reports 4 logical CPUs), I'd expect 16 × 4 = 64 workers and many more parallel TCP connections.

So my question is: am I doing something wrong, or is this a bug? What is `parallelism` supposed to do compared to the Azure CLI's `--max-connections`? As it stands, Terraform is about 3 times slower than the CLI.

Regards,
Mathieu

@metacpp metacpp self-assigned this Oct 3, 2018
@metacpp metacpp removed their assignment Jan 29, 2019
@tombuildsstuff tombuildsstuff self-assigned this Jul 16, 2019
@tombuildsstuff tombuildsstuff modified the milestones: v1.32.0, v1.33.0 Jul 16, 2019
@tombuildsstuff tombuildsstuff modified the milestones: v1.33.0, v1.34.0 Aug 19, 2019
@tombuildsstuff
Contributor

hi @mchouque

Thanks for opening this issue - apologies for the delayed response here!

In recent versions of the Azure Provider we've been working to move off the deprecated Storage SDK that's in the Azure SDK for Go in favour of our replacement SDK, Giovanni - as a part of this we've been working through all of the Storage Resources to migrate them across.

As a part of the v1.34.0 release we've switched the azurerm_storage_blob resource over to using the new Giovanni SDK - which meant that we've been going through this resource (including the parallelism functionality).

Whilst working through this migration we've identified a bug in the old implementation where, rather than chunking the blob up and then uploading the chunks in parallel N times, we instead uploaded /all/ of the chunks in parallel N times - which would explain the behaviour you're seeing here. In order to make this change as compatible as possible for the moment we've opted to leave this behaviour the same in the upcoming v1.34.0 of the Azure Provider - but we plan to fix this in a future release.
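The intended behaviour (chunk the blob once, then upload each chunk once across N workers) starts with splitting the file into non-overlapping ranges. The following is a minimal sketch of that step only, not the Giovanni or provider code. It assumes page blob writes must be 512-byte aligned and that a single Put Page request carries at most 4 MiB; `splitIntoChunks` and `byteRange` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	pageSize     = 512             // assumed: page blob writes are 512-byte aligned
	maxChunkSize = 4 * 1024 * 1024 // assumed: max bytes per Put Page request (4 MiB)
)

// byteRange is an inclusive [Start, End] range, like an HTTP Range header.
type byteRange struct {
	Start, End int64
}

// splitIntoChunks divides a blob of the given size into consecutive,
// non-overlapping ranges of at most chunkSize bytes. Every byte is covered
// exactly once, so giving each range to one worker uploads the file once.
func splitIntoChunks(size, chunkSize int64) ([]byteRange, error) {
	if size%pageSize != 0 {
		return nil, errors.New("page blob size must be a multiple of 512 bytes")
	}
	if chunkSize <= 0 || chunkSize%pageSize != 0 || chunkSize > maxChunkSize {
		return nil, errors.New("chunk size must be a positive multiple of 512 and at most 4 MiB")
	}
	var ranges []byteRange
	for start := int64(0); start < size; start += chunkSize {
		end := start + chunkSize - 1
		if end >= size {
			end = size - 1 // last range may be shorter
		}
		ranges = append(ranges, byteRange{start, end})
	}
	return ranges, nil
}

func main() {
	// 50 GiB (close to the 50 GB image in this issue) split into 4 MiB ranges.
	size := int64(50) * 1024 * 1024 * 1024
	ranges, err := splitIntoChunks(size, maxChunkSize)
	if err != nil {
		panic(err)
	}
	fmt.Println("chunks:", len(ranges)) // 50 GiB / 4 MiB = 12800
}
```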

Since the Giovanni SDK exposes helper methods for uploading both Block Blobs (today) and Page Blobs (in future) from Files, there are two issues tracking this on the upstream repository: "Block Blobs should support parallelism" and "Page Blobs should expose a Helper supporting parallelism". As this issue will ultimately be fixed in the Giovanni repository, I'm going to close this issue in favour of the two upstream issues mentioned above; once that's supported, we'll update the version of the Giovanni SDK used in this repository, which should fix this.

Thanks!

@ghost

ghost commented Sep 18, 2019

This has been released in version 1.34.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

```hcl
provider "azurerm" {
    version = "~> 1.34.0"
}
# ... other configuration ...
```
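The `~> 1.34.0` constraint uses Terraform's pessimistic constraint operator: with three version components only the last may increase, so it accepts 1.34.x releases but not 1.35.0. The sketch below is a simplified model of that rule for plain MAJOR.MINOR.PATCH strings (pre-release tags are not handled); `allowsPessimistic` is a hypothetical helper, not Terraform's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses a plain "MAJOR.MINOR.PATCH" string (no pre-release tags).
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// allowsPessimistic reports whether candidate satisfies "~> base" for a
// three-part base: major and minor must match and patch must be >= base,
// so "~> 1.34.0" means >= 1.34.0 and < 1.35.0.
func allowsPessimistic(base, candidate string) (bool, error) {
	b, err := parseVersion(base)
	if err != nil {
		return false, err
	}
	c, err := parseVersion(candidate)
	if err != nil {
		return false, err
	}
	return c[0] == b[0] && c[1] == b[1] && c[2] >= b[2], nil
}

func main() {
	for _, v := range []string{"1.33.1", "1.34.0", "1.34.2", "1.35.0"} {
		ok, _ := allowsPessimistic("1.34.0", v)
		fmt.Printf("~> 1.34.0 allows %s: %v\n", v, ok)
	}
}
```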

@ghost

ghost commented Oct 14, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Oct 14, 2019