Week 07
To set up monitoring, we changed our Docker Compose files to include Prometheus and Grafana. Configuring Prometheus required some changes to our codebase:
- Adding the corresponding package to our project: prometheus-net.AspNetCore (a Prometheus client that lets us expose a /metrics endpoint and write parameterized application metrics).
- Adding the corresponding metric implementations: these can be seen in the ApplicationMetrics.cs file.
- Selecting when to trigger such measurements: their use can be inspected in CatchAllMiddleware.cs.
The Prometheus server is provisioned through the previously mentioned docker-compose.yml file, so it runs inside the same Droplet as the main server. It is configured through the prometheus.yml file.
NOTE: We are provisioning a persistent Grafana Dashboard through a Digital Ocean droplet.
Besides the default metrics provided by the prometheus-net.AspNetCore package, we created a middleware that runs on every incoming request and exposes some extra metrics on the /metrics endpoint, where they can be scraped by Prometheus and visualized in Grafana:
- minitwit_http_request_duration_seconds: by measuring the request duration and attaching endpoint labels, we intend to create a Histogram that presents the average response time by endpoint. This could be used to improve poorly behaving endpoints in the future.
- minitwit_http_requests_total: by measuring the total requests received by the application and attaching endpoint labels, we can see which endpoints are the most important for our application. It can also serve as a way of deriving other metrics, such as total registered users or total messages written.
- minitwit_http_response_status_code_total: by measuring the total of status codes returned in the application's responses, we could introduce monitoring that targets the most critical status codes, such as 401 (Unauthorized), 404 (Not Found), 429 (Too Many Requests), 500 (Internal Server Error), 503 (Service Unavailable), etc.
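The bookkeeping behind these three metrics can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in ApplicationMetrics.cs or CatchAllMiddleware.cs: the class `Metrics`, the function `catch_all`, and the bucket boundaries in `BUCKETS` are hypothetical names and values chosen for the example, and no Prometheus client library is used.

```python
import time
from collections import Counter, defaultdict

# Hypothetical histogram bucket upper bounds, in seconds (assumption for this sketch).
BUCKETS = (0.1, 0.5, 1.0, float("inf"))


class Metrics:
    """In-memory stand-in for the three custom metrics described above."""

    def __init__(self):
        # minitwit_http_requests_total, labelled by endpoint
        self.requests_total = Counter()
        # minitwit_http_response_status_code_total, labelled by status code
        self.status_total = Counter()
        # minitwit_http_request_duration_seconds: sum, count and cumulative buckets
        self.duration_sum = defaultdict(float)
        self.duration_count = Counter()
        self.duration_buckets = defaultdict(lambda: [0] * len(BUCKETS))

    def observe(self, endpoint, status, seconds):
        """Record one finished request."""
        self.requests_total[endpoint] += 1
        self.status_total[status] += 1
        self.duration_sum[endpoint] += seconds
        self.duration_count[endpoint] += 1
        # Histogram buckets are cumulative: a sample counts in every
        # bucket whose upper bound is at least the observed value.
        for i, upper in enumerate(BUCKETS):
            if seconds <= upper:
                self.duration_buckets[endpoint][i] += 1

    def average_duration(self, endpoint):
        """Average response time for an endpoint (sum divided by count)."""
        count = self.duration_count[endpoint]
        return self.duration_sum[endpoint] / count if count else 0.0


def catch_all(metrics, endpoint, handler):
    """Wrap a request handler so every request is measured, even on failure."""
    start = time.perf_counter()
    status = 500  # assumed status if the handler raises
    try:
        status = handler()
        return status
    finally:
        metrics.observe(endpoint, status, time.perf_counter() - start)
```

Calling `catch_all(metrics, "/latest", handler)` for each request keeps all three metrics in step. The "average request duration by endpoint" panel then corresponds to the histogram's sum divided by its count for each endpoint label.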
Ask Ellie before completing on how it was configured.
These metrics are registered by querying the database on some specific indicators, such as:
- Messages registered (application usage)
- Users registered (conversion)
- Follower registrations (users interaction level)
- Rate of HTTP requests received per endpoint: we can identify the most requested endpoints. In combination with the average request duration by endpoint, it can help us identify critical parts of our application to be refactored in the future.
- Total number of requests (last 24 hours): overall visibility of application load.
- Average request duration by endpoint: it helps us monitor the performance of each endpoint and may help identify weaknesses.
- Total count of errors per status code (last 24 hours).
- Top 10 unhandled exception endpoints.
- Top 10 Requested endpoints (API).
- Even though we haven't set up this monitoring ourselves, Digital Ocean provides some out-of-the-box monitoring for its droplets, such as CPU usage, memory usage, disk I/O, disk usage, bandwidth, etc.
By analyzing metrics, specifically "average request duration by endpoint", and incorporating research conducted by Ellie, we successfully reduced latency across our endpoints. This improvement is evident in the screenshot below:
The primary cause of the high latency was the geographical separation between our Database (hosted in the NYC region) and our application server (in the FRA region). Digital Ocean's documentation highlights the potential performance issues caused by hosting database droplets and application servers in different regions. For more details, refer to the Digital Ocean community question on managed MySQL performance.
We use the semantic-release plugin together with the "release" workflow file. semantic-release was originally meant for publishing Node.js/npm packages, but we use it purely to automate the creation and publishing of GitHub releases. Check the following link for more examples: LINK
- feat: initial commit # => v1.0.0 on @latest
- fix: a fix # => v1.0.1 on @latest

- feat: initial commit # => v1.0.0 on @latest
- fix: a fix # => v1.0.1 on @latest
- feat: adding a small feature # => v1.1.0 on @latest

- feat: initial commit # => v1.0.0 on @latest
- feat: drop Node.js 6 support \n\n BREAKING CHANGE: Node.js >= 8 required # => v2.0.0 on @latest
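The commit histories above can be reproduced with a small Python sketch of the bump rules. It is a simplification written for illustration, not semantic-release's actual implementation: the function names `release_type` and `next_version` are hypothetical, and it assumes the default rules (feat gives a minor release, fix and perf give a patch release, a BREAKING CHANGE footer gives a major release, and the first release is 1.0.0).

```python
import re

# Release type per commit type under the assumed default rules.
RELEASE_BY_TYPE = {"feat": "minor", "fix": "patch", "perf": "patch"}
# Ranking used to pick the largest bump among several commits.
ORDER = {None: 0, "patch": 1, "minor": 2, "major": 3}


def release_type(message):
    """Return 'major', 'minor', 'patch', or None for one commit message."""
    if "BREAKING CHANGE:" in message:
        return "major"
    # Matches "type: subject" and "type(scope): subject".
    match = re.match(r"^(\w+)(\([^)]*\))?:", message)
    return RELEASE_BY_TYPE.get(match.group(1)) if match else None


def next_version(current, messages):
    """Compute the next version from the commits made since `current`.

    `current` is a "major.minor.patch" string, or None if nothing
    has been released yet.
    """
    bump = max((release_type(m) for m in messages), key=ORDER.get, default=None)
    if bump is None:
        return current  # no releasable commit, so no new release
    if current is None:
        return "1.0.0"  # the first release always starts here
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, `next_version("1.0.1", ["feat: adding a small feature"])` gives `"1.1.0"`, while a commit such as `docs: update README` leaves the version unchanged.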
By default semantic-release uses the Angular Commit Message Conventions and triggers releases based on the following rules: https://github.com/angular/angular.js/blob/master/DEVELOPERS.md#-git-commit-guidelines
- feat: A new feature
- fix: A bug fix
- docs: Documentation only changes
- style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
- refactor: A code change that neither fixes a bug nor adds a feature
- perf: A code change that improves performance
- test: Adding missing or correcting existing tests
- chore: Changes to the build process or auxiliary tools and libraries such as documentation generation
To make sure that the files we have placed on the server are kept up to date, we added a step to the workflow that securely copies (scp) them to the server every time the workflow runs. Therefore, if we change something in these files (in the remote_files and remote_files_preproduction folders) locally, the change is automatically deployed to the server when the workflow runs.