This repository has been archived by the owner on Sep 14, 2021. It is now read-only.

Need better fleet isolation in Azure #46

Open
SolomonShorser-OICR opened this issue Oct 5, 2015 · 5 comments

Comments

@SolomonShorser-OICR
Member

I had two fleets in Azure. I could not provision from the second fleet until the first fleet was completely gone (or at least invisible to the second fleet). I accomplished this by renaming the youxia.managed_tag tag on the remaining VM of the first fleet; only then did the second fleet begin provisioning again.

I think that an additional tag for Azure VMs might be necessary to indicate exactly which fleet that VM belongs to, so that the Deployer won't get confused when it looks to see if there are any existing VMs.

I think this might be relevant:

if (deployment.getName().startsWith(managedTagValue)) {
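As a rough illustration of the extra-tag idea above, here is a minimal sketch that selects VMs by exact equality on a dedicated fleet tag rather than by a name prefix. The tag key `youxia.fleet_id`, the class `FleetTagFilter`, and the map-of-tags representation are all hypothetical and are not part of youxia or the Azure SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FleetTagFilter {
    // Hypothetical tag key; youxia does not define this setting.
    static final String FLEET_TAG = "youxia.fleet_id";

    /** Returns the names of VMs whose fleet tag equals fleetId exactly. */
    static List<String> vmsInFleet(Map<String, Map<String, String>> tagsByVm, String fleetId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : tagsByVm.entrySet()) {
            // Exact match: a VM with a missing or different tag value is skipped.
            if (fleetId.equals(entry.getValue().get(FLEET_TAG))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Builds a one-entry tag map for the example data. */
    static Map<String, String> fleetTag(String fleetId) {
        Map<String, String> tags = new TreeMap<>();
        tags.put(FLEET_TAG, fleetId);
        return tags;
    }

    public static void main(String[] args) {
        // TreeMap keeps iteration order deterministic for the example.
        Map<String, Map<String, String>> tagsByVm = new TreeMap<>();
        tagsByVm.put("vm-a", fleetTag("broadfleet"));
        tagsByVm.put("vm-b", fleetTag("broad-fleet-2"));
        System.out.println(vmsInFleet(tagsByVm, "broadfleet"));
    }
}
```

With an exact comparison like this, a VM left over from one fleet would not be counted by another fleet's Deployer, whatever the two fleet names have in common.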

@denis-yuen
Member

This doesn't make sense. The value of youxia.managed_tag is the tag that indicates which fleet a VM belongs to. Are you sure you didn't start two fleets with the same name?

@denis-yuen
Member

Or did you happen to use a name for one fleet that is a prefix of the second fleet's name? The names in Azure are followed by random UUIDs; I wonder if the names tripped that up.

@SolomonShorser-OICR
Member Author

Sorry, I meant youxia.managed_state; that was the one I changed to IGNORE_youxia.managed_state.

The fleet names were broad-fleet-2 and broadfleet. They both have the prefix broad, but one is not the prefix of the other...
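To sanity-check the prefix question, here is a small sketch that applies the same kind of `startsWith` test to these two fleet names. The class `PrefixCheck`, the `-<suffix>` naming layout, and the short suffixes standing in for the random UUIDs are assumptions for illustration only.

```java
public class PrefixCheck {
    /** Mirrors a startsWith-style ownership test on a deployment name. */
    static boolean belongsTo(String deploymentName, String managedTagValue) {
        return deploymentName.startsWith(managedTagValue);
    }

    public static void main(String[] args) {
        // Assumed layout: fleet name, a dash, then a made-up stand-in for the UUID.
        String first = "broad-fleet-2-3f2a";
        String second = "broadfleet-9c1d";

        System.out.println(belongsTo(first, "broadfleet"));     // false
        System.out.println(belongsTo(second, "broad-fleet-2")); // false
        // A tag value that really is a shared prefix matches both fleets.
        System.out.println(belongsTo(first, "broad"));          // true
        System.out.println(belongsTo(second, "broad"));         // true
    }
}
```

Under these assumptions, neither fleet's deployments would pass the other fleet's prefix test, so a plain prefix collision on the names alone would not explain the behaviour.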

@denis-yuen
Member

The line you've linked to refers to youxia.managed_tag, not youxia.managed_state, and it's the former that determines which fleet a worker belongs to. If the latter is affecting which fleet a node belongs to, then this is plainly a bug.

@SolomonShorser-OICR
Member Author

I think we need to try this again to get a better idea of exactly what happened. It could have been a one-off issue, but I'm pretty sure this is reproducible, if we get a bit of time. Of course, I'm a little wary about trying to reproduce it in the same environment that's running Production workflows...
