Update retry examples to include App Mesh default retry policies. (#332)
Co-authored-by: Alex Barcenas <[email protected]>
AKBarcenas and atbarce authored Sep 2, 2020
1 parent 7131ef7 commit fecf6b8
Showing 8 changed files with 110 additions and 2 deletions.
26 changes: 26 additions & 0 deletions walkthroughs/howto-http-retries/README.md
@@ -51,6 +51,32 @@ This example shows how we can set retry duration and attempts within route confi
1. Curl the endpoint again and this time you should receive a 200. Verify this by checking the logs and confirming that despite sending a single request from the frontend, the envoy sidecar on the blue service attempted to retry the request based on the route spec.
## Default Retry Policy
App Mesh applies a default retry policy when an explicit retry policy is not set on a route. However, this is not currently available to all customers. If default retry policies are not available to you, you will not be able to run this section and can skip ahead to the clean up section. To learn more about the default retry policy, see: https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy.html#default-retry-policy
1. Let's swap back to a route that has no explicit retry policy so that the default retry policy is applied. Update your route configuration to not include retries by running the following command:
```
aws appmesh update-route --mesh-name howto-http-retries --cli-input-json file://blue-route-no-retry.json
```
1. Curl the endpoint again and this time you should receive a 503. This is because our application is currently configured to consecutively send back 503s until 1 second has passed since the initial request. Although the default retry policy is present and we are retrying the request, we are unable to get back a successful response because the application returns faults for a period of time that will likely exhaust all retries. To better observe the default retry policy in action, let's make a change to the application.
1. Open the `serve.py` file found in the `colorapp` folder in an editor. Look for the `FAULT_TIME` variable towards the top of the file. It should currently be set to `1`; change this value to `0.02`. Save the change and close the file.
1. To apply this change, we must update our application image and redeploy our application. You can do this by running the following command:
```
./deploy.sh update-blue-service
```
The effect of this command is not immediate, because it takes some time for the application to be redeployed with our change. To track the status, run the following command and check the `runningCount` and `pendingCount`:
```
aws ecs describe-services --cluster howto-http-retries --services BlueService
```
We want the `runningCount` to be 1 and the `pendingCount` to be 0. This indicates that an ECS task with our change is now running and that the previous task running the old version of the application has been torn down. Once this state is reached, we can move on to making a request.
5. Curl the endpoint again and this time you should receive a 200. Verify this by checking the logs and confirming that despite sending a single request from the frontend, the envoy sidecar on the blue service attempted to retry the request even though the route no longer sets an explicit retry policy. This should look similar to when we set an explicit retry policy on our route, except that we now retry fewer times than with the explicit strategy.
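The wait condition from the redeploy step (a `runningCount` of 1 and a `pendingCount` of 0) can also be checked mechanically. The following is a minimal sketch, assuming the JSON printed by `aws ecs describe-services` has been captured as a string. The helper name `deployment_settled` and the trimmed-down sample payload are illustrative assumptions and are not part of the walkthrough.

```python
import json


def deployment_settled(describe_services_json, service_name="BlueService"):
    """Return True once the named service reports runningCount == 1 and pendingCount == 0.

    `describe_services_json` is assumed to be the raw JSON text printed by
    `aws ecs describe-services`. This helper is hypothetical, not part of the repo.
    """
    payload = json.loads(describe_services_json)
    for service in payload.get("services", []):
        if service.get("serviceName") == service_name:
            return service.get("runningCount") == 1 and service.get("pendingCount") == 0
    # The service was not found in the response at all.
    return False


# Hypothetical sample containing only the fields this check reads.
sample = json.dumps(
    {"services": [{"serviceName": "BlueService", "runningCount": 1, "pendingCount": 0}]}
)
print(deployment_settled(sample))
```

In practice the input would come from redirecting the CLI output to a file or a pipe; the check itself only reads the three fields shown.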
This showcases that the App Mesh default retry policy can help prevent failed requests in some cases. However, there may be cases where you will want to set an explicit retry strategy depending on your application and use case. For our recommendations on retry policies, see: https://docs.aws.amazon.com/app-mesh/latest/userguide/best-practices.html#route-retries
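To see why lowering `FAULT_TIME` changes the outcome, the time-based fault in `serve.py` can be modelled with a simulated clock. This is only a sketch: the retry count (`max_retries=2`) and the spacing between attempts (`retry_interval=0.025` seconds) are hypothetical values chosen for illustration, not the actual parameters of the App Mesh default retry policy, and `simulate_request` is not part of the walkthrough code.

```python
def simulate_request(fault_time, max_retries, retry_interval):
    """Return the status code a caller would see for one logical request.

    Mirrors the check in serve.py: the app answers 200 only when more than
    `fault_time` seconds have passed since the initial request.
    All timing values are simulated; nothing here sleeps or calls the network.
    """
    elapsed = 0.0  # simulated seconds since the initial attempt
    for _attempt in range(max_retries + 1):  # one initial try plus the retries
        if elapsed > fault_time:
            return 200
        elapsed += retry_interval  # hypothetical delay before the next attempt
    return 503  # every attempt landed inside the fault window


# Hypothetical policy: 2 retries, 25 ms apart.
print(simulate_request(fault_time=1, max_retries=2, retry_interval=0.025))     # 503
print(simulate_request(fault_time=0.02, max_retries=2, retry_interval=0.025))  # 200
```

Under these assumed numbers, a one-second fault window outlasts every attempt, while a 20 ms window is over by the first retry.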
## Clean up
Run the following command to remove all resources created from this demo (will take 5-10 minutes):
2 changes: 2 additions & 0 deletions walkthroughs/howto-http-retries/app.yaml
@@ -440,6 +440,7 @@ Resources:
- WebLoadBalancerRule
Properties:
Cluster: !Ref Cluster
ServiceName: FrontService
DeploymentConfiguration:
MaximumPercent: 200
MinimumHealthyPercent: 100
@@ -467,6 +468,7 @@ Resources:
Type: AWS::ECS::Service
Properties:
Cluster: !Ref Cluster
ServiceName: BlueService
DeploymentConfiguration:
MaximumPercent: 200
MinimumHealthyPercent: 100
21 changes: 21 additions & 0 deletions walkthroughs/howto-http-retries/blue-route-no-retry.json
@@ -0,0 +1,21 @@
{
"virtualRouterName": "color-router",
"routeName": "color-route-blue",
"spec": {
"priority": 1,
"httpRoute": {
"match": {
"prefix": "/"
},
"action": {
"weightedTargets": [
{
"virtualNode": "blue-node",
"weight": 1
}
]
}

}
}
}
4 changes: 3 additions & 1 deletion walkthroughs/howto-http-retries/colorapp/serve.py
@@ -7,6 +7,8 @@
except Exception as e:
print(f'[ERROR] {e}')

FAULT_TIME = 1

COLOR = os.environ.get('COLOR', 'no color!')
print(f'COLOR is {COLOR}')

@@ -24,7 +26,7 @@ def do_GET(self):
curr_time = time.time()
time_diff = curr_time - req_time

-if time_diff > 1 :
+if time_diff > FAULT_TIME :
print('success!')
self.send_response(200)
else :
8 changes: 8 additions & 0 deletions walkthroughs/howto-http-retries/deploy.sh
@@ -127,4 +127,12 @@ if [ "$action" == "delete" ]; then
exit 0
fi

if [ "$action" == "update-blue-service" ]; then
echo "updating app image..."
deploy_images
echo "updating blue service..."
aws ecs update-service --force-new-deployment --cluster ${PROJECT_NAME} --service BlueService
exit 0
fi

deploy_stacks
35 changes: 35 additions & 0 deletions walkthroughs/howto-k8s-retry-policy/README.md
@@ -67,3 +67,38 @@ You can use v1beta1 example manifest with [aws-app-mesh-controller-for-k8s](http
4. You should now see more 200 OK responses due to retries.
Now go to https://www.envoyproxy.io/docs/envoy/v1.8.0/api-v1/route_config/route#config-http-conn-man-route-table-route-retry and https://www.envoyproxy.io/learn/automatic-retries for details on how retries work in Envoy.
## Default Retry Policy
App Mesh applies a default retry policy when an explicit retry policy is not set on a route. However, this is not currently available to all customers. If default retry policies are not available to you, you will not be able to run this section and can skip it. To learn more about the default retry policy, see: https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy.html#default-retry-policy
1. Let's swap back to a route that has no explicit retry policy so that the default retry policy is applied. Update your route configuration to not include retries by commenting out or removing the `retryPolicy` that you uncommented earlier in `manifest.yaml.template`, then run `./deploy.sh`:
```
# COMMENT back out or remove below to disable explicit retries
retryPolicy:
maxRetries: 4
perRetryTimeoutMillis: 2000
httpRetryEvents:
- server-error
```
2. Send requests to the front service again in a separate terminal to observe that we are once again getting back 503s for some of the requests:
```
while true; do curl -s -o /dev/null -w "%{http_code}" http://localhost:8080 ; sleep 0.5; echo ; done
```
3. To better see the default retry policy in action, let's lower the fault rate of our application. At the current 50% fault rate, we are likely to exhaust all of our retries for some requests, which results in the 503s that we see being returned. Edit `serve.py` in the `colorapp` folder and reduce the fault rate from 50% to 10% by changing the fault rate variable at the top of the file from 50 to 10.
```
# Change this value to 10
FAULT_RATE = 50
```
4. With this change made, redeploy the application to use the new fault rate by running the following:
```
REDEPLOY=true ./deploy.sh
```
5. Now send requests to the front service again. We should now be getting almost exclusively 200s from our requests, even though 10% of the attempts fail.
```
while true; do curl -s -o /dev/null -w "%{http_code}" http://localhost:8080 ; sleep 0.5; echo ; done
```
This showcases that the App Mesh default retry policy can help prevent failed requests in some cases. However, there may be cases where you will want to set an explicit retry strategy depending on your application and use case. For our recommendations on retry policies, see: https://docs.aws.amazon.com/app-mesh/latest/userguide/best-practices.html#route-retries
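The effect of the fault rate on retried requests can be estimated with a short calculation. Because `serve.py` draws a fresh random number for every request, each attempt fails independently, so a request only surfaces as a 503 when the initial attempt and every retry all fail. The sketch below assumes a hypothetical `max_retries` of 2, which is an illustrative value and not necessarily what the App Mesh default retry policy uses. The function name is also an assumption.

```python
def failure_probability(fault_rate_percent, max_retries):
    """Probability that the initial attempt and all retries fail.

    Assumes each attempt fails independently with the same probability,
    which matches the per-request random draw in serve.py.
    """
    p = fault_rate_percent / 100
    return p ** (max_retries + 1)


# max_retries=2 is a hypothetical value used only for illustration.
for rate in (50, 10):
    print(f"FAULT_RATE={rate}: {failure_probability(rate, max_retries=2):.1%} of requests still fail")
```

With these assumed numbers, a 50% fault rate still leaves 12.5% of requests failing after retries, while a 10% fault rate leaves about 0.1%, which is consistent with seeing almost exclusively 200s.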
4 changes: 3 additions & 1 deletion walkthroughs/howto-k8s-retry-policy/colorapp/serve.py
@@ -7,6 +7,8 @@
except Exception as e:
print(f'[ERROR] {e}')

FAULT_RATE = 50

COLOR = os.environ.get('COLOR', 'no color!')
print(f'COLOR is {COLOR}')

@@ -21,7 +23,7 @@ def do_GET(self):
return
r = random.randint(1, 100)
status_code=200
-if r <= 50:
+if r <= FAULT_RATE:
status_code=503
self.send_response(status_code)
self.end_headers()
12 changes: 12 additions & 0 deletions walkthroughs/howto-k8s-retry-policy/deploy.sh
@@ -104,6 +104,12 @@ EOF
kubectl apply -f ${EXAMPLES_OUT_DIR}/manifest.yaml
}

redeploy_app() {
EXAMPLES_OUT_DIR="${DIR}/_output/"
kubectl delete -f ${EXAMPLES_OUT_DIR}/manifest.yaml
deploy_app
}

main() {
check_appmesh_k8s

@@ -112,6 +118,12 @@ main() {
deploy_images
fi

if [ "$REDEPLOY" = true ]; then
echo "redeploying app..."
redeploy_app
exit 0
fi

deploy_app
}
