-
Notifications
You must be signed in to change notification settings - Fork 61
/
AWS Notes 02 - Monitoring.txt
223 lines (198 loc) · 12.4 KB
/
AWS Notes 02 - Monitoring.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
===============================================
===============================================
=
= AWS
= Certified Solutions Architect - Professional
= SAP-C01 Exam Study Notes
=
= *** Note 02: Monitoring ***
=
= Created:
= By: Hicham El Alaoui
= Email: [email protected]
= Date: Mar-2021
=
= Source: https://github.com/helalaoui/AWS-Solutions-Architect-Certification
=
===============================================
===============================================
CloudTrail:
- Is an AWS service that records actions in AWS that were taken by any user, role or service.
- Is enabled at the account level.
- Events types:
- Management events = control plane events.
- Data events = data plane events. Supported on few services only (S3, Lambda).
- Insights = unusual activity in your AWS account.
- Events history:
- You can view, search, and download the past 90 days of activity in your AWS account.
- Events download file formats: CSV or JSON.
- To keep more history, configure a trail that stores events to S3 or CLoudWatch Logs.
- CloudTrail trails:
- A trail is a configuration that enables delivery of events to an Amazon S3 bucket, CloudWatch Logs and CloudWatch Events.
- Enables you to archive, analyze, and respond to changes in your AWS resources.
- A trail can limit the scope of events.
- A trail can be applied to all regions.
- If you use AWS Organizations, the Master Account can apply a trail to the whole organization.
- Events delivery time:
- Logs are delivered within an average of about 15 minutes of an API call. No SLA.
- Events are delivered to the CloudWatch Events bus in near-realtime.
- Delivering log files to S3:
- Log files are compressed JSON files.
- Log files can be encrypted.
- A trail can send an SNS notification upon log file delivery.
- You can enable "log file validation": validates that the log files have not been modified/replaced.
- A trail can be applied to all Regions or a single Region.
- For global services such as IAM, STS, and CloudFront, events are delivered to any trail that includes global services (--include-global-service-events option).
- To create an alarm on some API activity, you should create a trail that sends the events to CloudWatch Logs and create an alarm in CloudWatch Logs.
=================================
CloudWatch:
- (Should be called "CloudWatch Metrics").
- Monitors AWS resources and your applications in real time to collect and track metrics.
- Generates statistics and graphs.
- If you put your own custom metrics into the repository, you can retrieve statistics on these metrics as well.
- The CloudWatch home page automatically displays metrics about every AWS service you use. You can additionally create custom dashboards.
- CloudWatch concepts: Namespaces, Metrics, Dimensions, Statistics, Percentiles, Alarms.
- Namespace:
- a container for CloudWatch metrics. Example: "AWS/EC2".
- Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics.
- Metric:
- a time-ordered set of data points that are published to CloudWatch.
- Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time.
- Metrics exist only in the Region in which they are created.
- Metrics cannot be deleted, but they automatically expire after 15 months.
- Each data point in a metric has a time stamp, and (optionally) a unit of measure.
- Metrics resolution:
- Standard resolution: data having a one-minute granularity. All metrics produced by AWS services are standard resolution by default.
- High resolution: data at a granularity of one second.
- Dimension:
- A dimension is a name/value pair that is part of the identity of a metric.
- You can assign up to 10 dimensions to a metric.
- For example, you can get statistics for a specific EC2 instance by specifying the InstanceId dimension when you search for metrics.
- Statistics:
- Statistics are metric data aggregations over specified periods of time.
- Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the time period you specify.
- Example of statistics functions: min, max, average, sum, percentile.
- Percentile:
- A percentile indicates the relative standing of a value in a dataset.
- For example, the 95th percentile means that 95 percent of the data is lower than this value and 5 percent of the data is higher than this value.
- Some CloudWatch metrics support percentiles as a statistic.
- Alarm:
- Automatically initiate actions on your behalf.
- An alarm watches a single metric over a specified time period, and performs one or more specified actions, based on the value of the metric relative to a threshold over time.
- You can also add alarms to dashboards.
- Action:
- An action can be triggered either when:
- the alarm goes to an "In-Alarm" state, or
- the alarm goes to an "OK" state, or,
- the alarm goes to an "Insufficient data" state.
- Action types: SNS topic, Auto Scaling action, EC2 action or create OpsItems in Systems Manager Ops Center.
- Metrics retention for data points with period of:
- less than 60 sec : 3 hours.
- 1 mn : 15 days.
- 5 mn : 2 months.
- 1 hr : 15 months.
- You can use CloudWatch cross-Region functionality to aggregate statistics from different Regions.
- CloudWatch agent: enables you to collect internal system-level metrics from Amazon EC2 instances and on-prem servers across operating systems.
=================================
CloudWatch Logs:
- Collects and monitors log files from sources like EC2, CloudTrail and Route53.
- Logs can be viewed, searched, filtered and archived.
- By default logs are stored permanently.
- A log event is a record of some activity. Composed of a timestamp and a raw message.
- A log stream is a sequence of log events that share the same source.
- Log groups define groups of log streams that share the same retention, monitoring, and access control settings.
- Metrics from logs:
- You can search and filter the log data coming into CloudWatch Logs and create one or more metric filters.
- Metric filters define the terms and patterns to look for in log data as it is sent to CloudWatch Logs.
- CloudWatch Logs uses these metric filters to turn log data into numerical CloudWatch metrics that you can graph or set an alarm on.
- Subscriptions:
- You can use subscriptions to stream log events from CloudWatch Logs to other services such as Kinesis stream, Kinesis Data Firehose stream, or AWS Lambda for custom processing, analysis, or loading to other systems.
- A subscription filter defines the filter pattern to use for filtering which log events get delivered to your AWS resource.
- Each log group can have up to two subscription filters associated with it.
- Can be used across multiple AWS accounts (requires cross-account access).
- CloudTrail logs can be sent to CloudWatch Logs for monitoring. Uses the CloudWatch Logs APIs to create and send logs to a CloudWatch Log log stream.
CloudWatch Logs Insights:
- Enables you to search and analyze your log data in CloudWatch Logs.
- Includes a purpose-built query language.
- Automatically discovers fields in logs from many AWS services and from 3rd party JSON logs.
=================================
CloudWatch Events:
- Delivers a near real-time stream of system events that describe changes in AWS resources.
- Based on Amazon EventBridge (See "Queues & Buses" notes).
- CloudWatch Event is based on one special event stream included in EventBridge for AWS system events.
- Events can be triggered in CloudWatch Events when:
- A change happens in your AWS environment. For example, EC2 generates an event when the state of an EC2 instance changes from pending to running.
- You make API calls. These events are published by CloudTrail.
- You can generate custom application-level events and publish them to CloudWatch Events.
- You set up scheduled events that are generated on a periodic basis.
- Rules:
- A rule matches incoming events and routes them to targets for processing.
- A single rule can route to multiple targets, all of which are processed in parallel.
- A rule can customize the JSON sent to the target, by passing only certain parts or by overwriting it with a constant.
- Targets:
- A target processes events.
- Targets can include EC2 instances, Lambda functions, Kinesis streams, ECS tasks, Step Functions state machines, SNS topics, SQS queues, and built-in targets.
- A target receives events in JSON format.
- Event Patterns:
- Rules use event patterns to select events and route them to targets.
- A pattern either matches an event or it doesn't.
- Event patterns are represented as JSON objects with a structure that is similar to that of events
=================================
AWS X-Ray:
- A service that collects data about requests that your application serves.
- Provides tools you can use to view, filter, and gain insights into that data to identify issues and opportunities for optimization.
- For any traced request to your application, you can see detailed information not only about the request and response, but also about calls that your application makes to downstream AWS resources, microservices, databases and HTTP web APIs.
=================================
AWS Config:
- Enables you to assess, audit, and evaluate the configurations of your AWS resources.
- You can review the changes history of your configurations.
- You can determine what the resource’s configuration looked like at any point in the past.
- You can track relationships and dependencies between AWS resources prior to making changes.
- Configuration snapshot: a point-in-time capture of all your resources and their configurations. Generated on demand.
- AWS Config can monitor your configurations with the following workflow: Triggers ==> Rules ==> Notification ==> Remediation.
- Triggers:
- Conditions under which AWS Config runs your rules.
- Two trigger types (not exclusive):
- Configuration changes: AWS Config runs evaluations for the rule when certain types of resources are created, changed, or deleted.
- Periodic: AWS Config runs evaluations for the rule at a frequency that you choose.
- Rules:
- evaluation of recorded configurations against desired configurations.
- AWS managed rules: predefined, customizable rules that AWS Config uses to evaluate whether your AWS resources comply with common best practices.
- Example of AWS Config Managed Rule: checks that all EBS volumes are encrypted.
- Custom rules: you need to develop an AWS Lambda function, which contains the logic that evaluates whether your AWS resources comply with the rule.
- Notification:
- Configuration changes that deviate from your rules automatically trigger SNS notifications and Amazon CloudWatch events.
- Remediation:
- You can use remediation actions.
- A Conformance Pack = Rules + Remediation actions.
=================================
Trusted Advisor:
- Inspects your AWS environment, and then makes recommendations on:
- Cost optimization. Examples:
- EC2 Reserved Instances optimization.
- Low utilization Amazon EC2 instances.
- Idle load balancers.
- RDS idle DB instances.
- Fault Tolerance. Examples:
- EBS snapshots.
- EC2 availability zone balance.
- Performance. Examples:
- High utilization of EC2 CPU or EBS IOPS.
- CloudFront cache hit ratio.
- Security. Examples:
- Security groups - Unrestricted access (0.0.0.0/0).
- S3 bucket permissions.
- IAM password policy and access key rotation.
- Service Limits: Checks whether your account approaches or exceeds the limits (quotas) for AWS services and resources.
- Some checks are free and other are paid.
=================================
AWS Personal Health Dashboard (PHD):
- Provides alerts and remediation guidance when AWS is experiencing events that may impact you.
- While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.
- Supported notification channels: API, email, and CloudWatch Events (SQS, SNS, Lambda, Kinesis).
- Supports showing alerts in AWS Management Console navigation bar.
- AWS Abuse alerts can be sent to AWS PHD, Health APIs, and the AWS Health CloudWatch Events channel.
=================================
Automated AWS Account Audits:
- Audit cost, performance, security, fault tolerance.
- Free version and expanded paid version.