Update retry examples to include App Mesh default retry policies. (#332)

Co-authored-by: Alex Barcenas <[email protected]>
aws · Sep 2, 2020 · fecf6b8 · fecf6b8
1 parent 7131ef7
commit fecf6b8

Showing 8 changed files with 110 additions and 2 deletions.
diff --git a/walkthroughs/howto-http-retries/README.md b/walkthroughs/howto-http-retries/README.md
@@ -51,6 +51,32 @@ This example shows how we can set retry duration and attempts within route confi
 
 1. Curl the endpoint again and this time you should receive a 200. Verify this by checking the logs and confirming that despite sending a single request from the frontend, the envoy sidecar on the blue service attempted to retry the request based on the route spec. 
 
+## Default Retry Policy
+App Mesh applies a default retry policy when an explicit retry policy is not set on a route. However, this is not currently available to all customers. If default retry policies are not available to you, you will not be able to run this section and can skip ahead to the clean up section. To learn more about the default retry policy, see https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy.html#default-retry-policy
+
+1. Let's swap back to a route that has no explicit retry policy so that the default retry policy is applied. Update your route configuration to not include retries by running the following command:
+    ```
+    aws appmesh update-route --mesh-name howto-http-retries --cli-input-json file://blue-route-no-retry.json
+    ```       
+
+1. Curl the endpoint again and this time you should receive a 503. This is because our application is currently configured to consecutively send back 503s until 1 second has passed since the initial request. Although the default retry policy is present and the request is being retried, we do not get back a successful response because the application returns faults for a period of time that will likely exhaust all retries. To better observe the default retry policy in action, let's make a change to the application.
+
+1. Open the `serve.py` file found in the `colorapp` folder in an editor. Look for the `FAULT_TIME` variable towards the top of the file. It should currently be set to `1`; change this value to `.02`, then save and close the file.
+
+1. To apply this change we must update our application image and redeploy our application. You can do this by running the following command:
+    ```
+    ./deploy.sh update-blue-service
+    ```
+    The effect of running this command will not be immediate because it will take some time for the application to be redeployed with our change. To track the status, run the following command and take a look at the `runningCount` and `pendingCount`:
+    ```
+    aws ecs describe-services --cluster howto-http-retries --services BlueService
+    ```
+    We want the `runningCount` to be 1 and the `pendingCount` to be 0. This indicates that an ECS task with our change is now running and that the previous task running the old version of the application has been torn down. Once this state has been reached, we can move on to making a request.
+
+5. Curl the endpoint again and this time you should receive a 200. Verify this by checking the logs and confirming that despite sending a single request from the frontend, the envoy sidecar on the blue service attempted to retry the request even though the route spec no longer defines a retry policy. This should look similar to when we set an explicit retry policy on our route, except that we now retry fewer times than with the explicit policy.
+
+This showcases that the App Mesh default retry policy can help prevent failed requests in some cases. However, there may be cases where you will want to set an explicit retry strategy depending on your application and use case. For our recommendations on retry policies, see https://docs.aws.amazon.com/app-mesh/latest/userguide/best-practices.html#route-retries
+
 ## Clean up 
 
 Run the following command to remove all resources created from this demo (will take 5-10 minutes): 

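Reviewer note on the README hunk above: the "runningCount 1, pendingCount 0" check can be scripted instead of read by eye. The following is a minimal sketch, assuming the JSON printed by `aws ecs describe-services` is passed in as a string; the helper name `blue_service_settled` and the sample payload are hypothetical and not part of this commit.

```python
import json


def blue_service_settled(describe_services_output: str) -> bool:
    """Return True when the first listed service reports
    runningCount == 1 and pendingCount == 0 (the state the README waits for)."""
    payload = json.loads(describe_services_output)
    services = payload.get("services", [])
    if not services:
        # Nothing was returned for the requested service name.
        return False
    service = services[0]
    return service.get("runningCount") == 1 and service.get("pendingCount") == 0


# Hypothetical, trimmed-down sample of the CLI output for illustration only.
sample = json.dumps(
    {"services": [{"serviceName": "BlueService", "runningCount": 1, "pendingCount": 0}]}
)
print(blue_service_settled(sample))
```

The check only mirrors the two counters named in the README; it does not inspect the deployment list, so it should be treated as a convenience rather than a full rollout check.
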
diff --git a/walkthroughs/howto-http-retries/app.yaml b/walkthroughs/howto-http-retries/app.yaml
@@ -440,6 +440,7 @@ Resources:
       - WebLoadBalancerRule
     Properties:
       Cluster: !Ref Cluster
+      ServiceName: FrontService
       DeploymentConfiguration:
         MaximumPercent: 200
         MinimumHealthyPercent: 100
@@ -467,6 +468,7 @@ Resources:
     Type: AWS::ECS::Service
     Properties:
       Cluster: !Ref Cluster
+      ServiceName: BlueService
       DeploymentConfiguration:
         MaximumPercent: 200
         MinimumHealthyPercent: 100

diff --git a/walkthroughs/howto-http-retries/blue-route-no-retry.json b/walkthroughs/howto-http-retries/blue-route-no-retry.json
@@ -0,0 +1,21 @@
+{
+  "virtualRouterName": "color-router",
+  "routeName": "color-route-blue",
+  "spec": {
+    "priority": 1,
+    "httpRoute": {
+      "match": {
+        "prefix": "/"
+      },
+      "action": {
+        "weightedTargets": [
+          {
+            "virtualNode": "blue-node",
+            "weight": 1
+          }
+        ]
+      }
+
+    }
+  }
+}
diff --git a/walkthroughs/howto-http-retries/colorapp/serve.py b/walkthroughs/howto-http-retries/colorapp/serve.py
@@ -7,6 +7,8 @@
 except Exception as e:
     print(f'[ERROR] {e}')
 
+FAULT_TIME = 1
+
 COLOR = os.environ.get('COLOR', 'no color!')
 print(f'COLOR is {COLOR}')
 
@@ -24,7 +26,7 @@ def do_GET(self):
         curr_time = time.time()
         time_diff = curr_time - req_time
 
-        if time_diff > 1 :
+        if time_diff > FAULT_TIME :
             print('success!')
             self.send_response(200)
         else :

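Reviewer note on the `FAULT_TIME` change above: the reason lowering `FAULT_TIME` from `1` to `.02` lets a retry succeed can be sketched with simulated time. This is a rough model only. It reuses the `time_diff > FAULT_TIME` comparison from the hunk, but the fixed spacing between attempts (`retry_interval`) and the attempt budget (`max_attempts`) are hypothetical values chosen for illustration, not the actual parameters of the App Mesh default retry policy.

```python
from typing import Optional


def status_for(elapsed: float, fault_time: float) -> int:
    # Same comparison as in serve.py: succeed only once more than
    # fault_time seconds have passed since the initial request.
    return 200 if elapsed > fault_time else 503


def first_success_attempt(
    fault_time: float, retry_interval: float, max_attempts: int
) -> Optional[int]:
    """Return the 1-based attempt that first gets a 200, or None if every
    attempt within the budget gets a 503. Attempts are assumed to be evenly
    spaced by retry_interval seconds (a simplification)."""
    for attempt in range(1, max_attempts + 1):
        elapsed = (attempt - 1) * retry_interval
        if status_for(elapsed, fault_time) == 200:
            return attempt
    return None


# Hypothetical budget: 3 attempts spaced 25 ms apart.
print(first_success_attempt(1.0, 0.025, 3))   # FAULT_TIME = 1
print(first_success_attempt(0.02, 0.025, 3))  # FAULT_TIME = .02
```

Under these assumed numbers the one-second fault window outlasts every attempt, while the 20 ms window has already closed by the second attempt.
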
diff --git a/walkthroughs/howto-http-retries/deploy.sh b/walkthroughs/howto-http-retries/deploy.sh
@@ -127,4 +127,12 @@ if [ "$action" == "delete" ]; then
     exit 0
 fi
 
+if [ "$action" == "update-blue-service" ]; then
+    echo "updating app image..."
+    deploy_images
+    echo "updating blue service..."
+    aws ecs update-service --force-new-deployment --cluster "${PROJECT_NAME}" --service BlueService
+    exit 0
+fi
+
 deploy_stacks
diff --git a/walkthroughs/howto-k8s-retry-policy/README.md b/walkthroughs/howto-k8s-retry-policy/README.md
@@ -67,3 +67,38 @@ You can use v1beta1 example manifest with [aws-app-mesh-controller-for-k8s](http
 4. You should now see more 200 OK responses due to retries.
 
 Now go to https://www.envoyproxy.io/docs/envoy/v1.8.0/api-v1/route_config/route#config-http-conn-man-route-table-route-retry and https://www.envoyproxy.io/learn/automatic-retries for details on how retries work in Envoy.
+
+## Default Retry Policy
+App Mesh applies a default retry policy when an explicit retry policy is not set on a route. However, this is not currently available to all customers. If default retry policies are not available to you, you will not be able to run this section and can skip it. To learn more about the default retry policy, see https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy.html#default-retry-policy
+
+1. Let's swap back to a route that has no explicit retry policy so that the default retry policy is applied. Update your route configuration to not include retries by commenting out or removing the `retryPolicy` that you uncommented earlier in `manifest.yaml.template`, then run `./deploy.sh`:
+   ```
+      # COMMENT back out or remove below to disable explicit retries
+        retryPolicy:
+          maxRetries: 4
+          perRetryTimeoutMillis: 2000
+          httpRetryEvents:
+            - server-error
+   ``` 
+2. Send requests to the front service again in a separate terminal to observe that we are once again getting back 503s for some of the requests:
+    ```
+    while true; do curl -s -o /dev/null -w "%{http_code}" http://localhost:8080 ; sleep 0.5; echo ; done
+    ```
+
+3. To better see the default retry policy in action, let's lower the fault rate of our application. At the current 50% fault rate we are likely to exhaust all of our retries for some requests, resulting in the 503s that we see being returned. Reduce the fault rate from 50% to 10% by changing the `FAULT_RATE` variable at the top of `serve.py` in the `colorapp` folder from 50 to 10.
+    ```
+    # Change this value to 10
+    FAULT_RATE = 50
+    ```
+
+4. With this change, let's redeploy the application to use the new fault rate by running the following:
+    ```
+    REDEPLOY=true ./deploy.sh
+    ```
+
+5. Now let's send requests to the front service again to observe that we should now be getting almost exclusively 200s from our requests, despite 10% of the individual attempts failing.
+    ```
+    while true; do curl -s -o /dev/null -w "%{http_code}" http://localhost:8080 ; sleep 0.5; echo ; done
+    ```
+
+This showcases that the App Mesh default retry policy can help prevent failed requests in some cases. However, there may be cases where you will want to set an explicit retry strategy depending on your application and use case. For our recommendations on retry policies, see https://docs.aws.amazon.com/app-mesh/latest/userguide/best-practices.html#route-retries
diff --git a/walkthroughs/howto-k8s-retry-policy/colorapp/serve.py b/walkthroughs/howto-k8s-retry-policy/colorapp/serve.py
@@ -7,6 +7,8 @@
 except Exception as e:
     print(f'[ERROR] {e}')
 
+FAULT_RATE = 50
+
 COLOR = os.environ.get('COLOR', 'no color!')
 print(f'COLOR is {COLOR}')
 
@@ -21,7 +23,7 @@ def do_GET(self):
             return
         r = random.randint(1, 100)
         status_code=200
-        if r <= 50:
+        if r <= FAULT_RATE:
             status_code=503
         self.send_response(status_code)
         self.end_headers()

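Reviewer note on the `FAULT_RATE` change above: the README's claim that a 10% fault rate yields "almost exclusively 200s" follows from simple arithmetic if each attempt fails independently, which matches the `random.randint(1, 100) <= FAULT_RATE` check in the hunk. The sketch below computes the chance that every attempt fails. The function name and the attempt count of 2 (one initial try plus one retry) are hypothetical; the actual number of retries in the default policy is not stated in this commit.

```python
def all_attempts_fail_probability(fault_rate: int, attempts: int) -> float:
    """Probability that `attempts` independent tries all return 503 when each
    try fails with probability fault_rate / 100, as in serve.py."""
    if not 0 <= fault_rate <= 100:
        raise ValueError("fault_rate must be between 0 and 100")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return (fault_rate / 100) ** attempts


# Hypothetical budget of 2 attempts (initial request plus one retry).
for rate in (50, 10):
    p = all_attempts_fail_probability(rate, 2)
    print(f"FAULT_RATE={rate}: {p:.2%} of requests still fail")
```

With these assumed numbers, a 50% fault rate still fails about a quarter of requests after one retry, while a 10% fault rate fails roughly one in a hundred, which is consistent with the behavior the README describes.
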
diff --git a/walkthroughs/howto-k8s-retry-policy/deploy.sh b/walkthroughs/howto-k8s-retry-policy/deploy.sh
@@ -104,6 +104,12 @@ EOF
     kubectl apply -f ${EXAMPLES_OUT_DIR}/manifest.yaml
 }
 
+redeploy_app() {
+    EXAMPLES_OUT_DIR="${DIR}/_output/"
+    kubectl delete -f ${EXAMPLES_OUT_DIR}/manifest.yaml
+    deploy_app
+}
+
 main() {
     check_appmesh_k8s
 
@@ -112,6 +118,12 @@ main() {
         deploy_images
     fi
 
+    if [ "$REDEPLOY" = true ]; then
+        echo "redeploying app..."    
+        redeploy_app
+        exit 0
+    fi
+
     deploy_app
 }