DevOps Handbook
The following document takes "The DevOps Handbook" by Gene Kim et al. as guiding material to document the actions taken to support what the authors call the Three Ways. The Three Ways are a set of principles that guide the DevOps mindset and aim to improve the efficiency and efficacy of the development process. They result from applying the most trusted principles from physical manufacturing and leadership to the IT value stream.
Important
Even though there might be areas where we have not implemented a specific process yet, they will still be mentioned so future edits to this Handbook can include them.
To organise work and make it visible, we decided to capitalise on GitHub's built-in Projects tab, which provides an adaptable spreadsheet for tracking work and integrates with Issues and Pull Requests. This tool provides comprehensive visibility into the current state of each ticket and an easy-to-understand interface for developers and stakeholders.
A new Issue is created to document a new requirement and is automatically shown on the Project board as a "Todo" item. When a team member is assigned to the issue, it is moved into the "In Progress" state. The development phase takes place, and when the issue is closed, it is automatically moved to the "Done" status on the board. Closing an issue as "not planned" moves the ticket to the "Aborted" status.
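The board rules above can be summarised as a small decision function. This is only a sketch of the rules as described in this Handbook, not the actual GitHub Projects automation; the function name `board_status` and its parameters are hypothetical.

```python
from typing import Optional


def board_status(assigned: bool, closed: bool, close_reason: Optional[str] = None) -> str:
    """Return the Project board status for an issue, following the rules described above.

    Hypothetical helper: the parameter names are illustrative and do not
    correspond to GitHub API fields.
    """
    if closed:
        # Closing as "not planned" aborts the ticket; any other close counts as done.
        return "Aborted" if close_reason == "not planned" else "Done"
    # Open issues are "In Progress" once someone is assigned, otherwise "Todo".
    return "In Progress" if assigned else "Todo"
```

For example, `board_status(assigned=True, closed=False)` gives `"In Progress"`, and closing the same issue as "not planned" gives `"Aborted"`.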
We have not implemented a limit on the WIP carried by each developer at a given time, but a started ticket should ideally be finished before starting a new one. This helps avoid context switching and multitasking, which tend to reduce productivity.
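If a WIP limit were introduced later, a check along the following lines could flag developers who carry too many tickets at once. This is a minimal sketch under assumptions: the function name, the `(issue_number, assignee)` input shape, and the default limit of one ticket are all hypothetical and not part of our current process.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def developers_over_wip_limit(
    in_progress: Iterable[Tuple[int, str]], limit: int = 1
) -> List[str]:
    """Return the developers holding more than `limit` "In Progress" tickets.

    `in_progress` is a hypothetical list of (issue_number, assignee) pairs,
    for example exported from the Project board.
    """
    counts = Counter(assignee for _, assignee in in_progress)
    return sorted(dev for dev, n in counts.items() if n > limit)
```

With the input `[(12, "dev-a"), (15, "dev-a"), (18, "dev-b")]` and the default limit, only `"dev-a"` is reported.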
The team has implicitly adopted a mindset that tries to reduce batch sizes. For example, when migrating the app from Python to ASP.NET, a first controller (AppController) was created and a basic "Index" endpoint was tested before moving on to the next endpoints.
Moreover, the chosen development flow (Git Flow) supports the use of feature branches. By splitting a big task into multiple smaller features, we allow multiple developers to work on smaller and more manageable tasks in parallel, reducing the risk entailed by developing bigger features.
New features are released manually every week through GitHub's Releases feature. This could potentially be automated in the future.
Ideally, a ticket should be started and finished by the same developer(s) who were assigned to it. Reducing or eliminating handoffs helps avoid knowledge loss and unnecessary documentation.
We are currently automating our deployment process. As a countermeasure to this constraint, we have a deployment (main branch) workflow and a preproduction-deployment (develop branch, staging) workflow, both of which use GitHub Actions to trigger a deployment to DigitalOcean.
Both of these workflows will include a job that runs the tests automatically when triggered.
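The intended gating can be modelled in a few lines. This is a sketch of the logic only, not the GitHub Actions configuration itself; the function name `plan_deployment` is hypothetical, and it assumes that branches other than main and develop do not deploy and that a failing test job blocks the deployment.

```python
from typing import Optional

# Branch-to-workflow mapping as described in this Handbook.
WORKFLOW_BY_BRANCH = {
    "main": "deployment",
    "develop": "preproduction-deployment",
}


def plan_deployment(branch: str, tests_passed: bool) -> Optional[str]:
    """Return the workflow that should deploy for `branch`, or None if nothing deploys.

    Assumption: the deployment only proceeds when the test job succeeded.
    """
    workflow = WORKFLOW_BY_BRANCH.get(branch)
    if workflow is None:
        # Feature branches and other refs do not trigger a deployment.
        return None
    return workflow if tests_passed else None
```

Under these assumptions, a push to develop with passing tests maps to the preproduction-deployment workflow, while a push with failing tests deploys nothing.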
We are in the process of introducing an ORM (object-relational mapper) between the application and the database. This will decouple the database stack from the application logic, making it easier to switch to a different database stack in the future if needed.
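The decoupling idea can be illustrated with a small sketch. It is written in Python for brevity rather than in the application's own stack, and it is not an ORM: it only shows application code depending on an interface so that the storage backend can be swapped. All class and function names here are hypothetical.

```python
import sqlite3
from typing import List, Protocol


class UserStore(Protocol):
    """Interface the application logic depends on (hypothetical)."""

    def add(self, name: str) -> None: ...

    def names(self) -> List[str]: ...


class InMemoryUserStore:
    """Backend that keeps users in a Python list."""

    def __init__(self) -> None:
        self._names: List[str] = []

    def add(self, name: str) -> None:
        self._names.append(name)

    def names(self) -> List[str]:
        return list(self._names)


class SqliteUserStore:
    """Backend that keeps users in a SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT NOT NULL)")

    def add(self, name: str) -> None:
        self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._conn.commit()

    def names(self) -> List[str]:
        rows = self._conn.execute("SELECT name FROM users ORDER BY rowid").fetchall()
        return [row[0] for row in rows]


def register_all(store: UserStore, new_names: List[str]) -> List[str]:
    """Application logic: it never mentions SQL or a concrete backend."""
    for name in new_names:
        store.add(name)
    return store.names()
```

`register_all` behaves the same with either backend, which is the property we expect the ORM to give us when changing database stacks.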
Are we doing any work that does not add value to the customer?
Are we developing any feature outside of the course requirements? Are we prioritising hard requirements over extra features?
Do we feel that we need to switch tasks too often?
Do we identify any blockers? Are we communicating them in an open way so others can help us unblock them?
Motion waste can be created when people who need to communicate frequently are not colocated. Handoffs also create motion waste and often require additional communication.
Incorrect, missing, or unclear information, materials, or products create waste, as effort is needed to resolve these issues.
What other parts of the development process do we want to automate to reduce manual work?
Situations where individuals and teams are put in a position where they must perform unreasonable acts, which may become part of their daily work (e.g., nightly 2:00am problems in production).
Are we currently experiencing these situations? If we are, everyone is welcome to reach out through our communication channels so we can reorganise the workload.
Do our current tests ensure that changes to the code base are safe to deploy to production?
The preproduction server is one element that we are implementing so that we have an opportunity to test changes before deploying to production.
What mechanisms do we have in place for monitoring and logging our systems in production? Do we have a mechanism to ensure feedback is incorporated into development practices?
We currently use the status page provided for all groups (http://206.81.24.116/status.html) to visualise problems and errors in our application. In the future we plan to add more tools, such as Grafana for metrics dashboards and dedicated logging tools. DigitalOcean also provides Monitoring and Resource alert tools that we are not currently using.
Digital Ocean's Droplets and Volumes do not auto-resize, but we have created two Resource alerts that will keep the development team well informed in case we need to scale up the available resources.
The alerts mentioned under "See problems as they occur" trigger an email to some of the developers in the team, so they are informed in time to resolve the issues.
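Conceptually, a resource alert is a threshold check on a metric. The sketch below illustrates that idea only; it does not call the DigitalOcean API, and the metric names and threshold values are hypothetical because the real alert settings are not recorded in this Handbook.

```python
from typing import Dict, List

# Hypothetical thresholds in percent; replace with the values of the real alerts.
THRESHOLDS: Dict[str, float] = {"cpu_percent": 80.0, "disk_percent": 90.0}


def breached_alerts(
    metrics: Dict[str, float], thresholds: Dict[str, float] = THRESHOLDS
) -> List[str]:
    """Return the sorted names of metrics whose value exceeds its threshold.

    Metrics missing from `metrics` are treated as 0.0 and never trigger.
    """
    return sorted(
        name for name, limit in thresholds.items() if metrics.get(name, 0.0) > limit
    )
```

A non-empty result corresponds to the situation in which the team would receive an email and should consider scaling up the affected resource.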
Triggering questions:
- What is our approach to fostering a culture that encourages experimentation and learning from failure?
  - This includes celebrating failures as learning opportunities.
- How do we allocate time and resources for learning new technologies or processes?
  - Dedicated time for exploration can be beneficial.
- What mechanisms do we have for sharing knowledge and learnings within the team and organization?
  - Knowledge sharing sessions, wikis, etc.
- How do we encourage and support contributions to open source, public speaking, and other forms of external knowledge sharing?
- What processes do we have for conducting post-mortems on failures or incidents?
  - Focus on blameless post-mortems to understand what happened and why.