Setup Automation Manager logging and error alerts for relevant repositories #327

Closed
8 tasks done
maxachis opened this issue Jun 16, 2024 · 14 comments

@maxachis
Contributor

maxachis commented Jun 16, 2024

To avoid cases where an automated action fails and nobody realizes it for a while, we should set up a system for logging performance of automated actions, and develop a means to send alerts to Discord or other locations when builds fail or something else occurs in an unexpected way.
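As a rough illustration of the idea, here is a minimal sketch of a wrapper that logs a job's outcome and sends an alert when it fails. The names `run_with_alert`, `build_discord_payload`, and `post_to_discord` are hypothetical and not taken from any existing script; the only Discord-specific assumption is that a webhook accepts a JSON body with a `content` field of up to 2000 characters.

```python
import json
import logging
import urllib.request
from typing import Callable

logger = logging.getLogger("automation")


def build_discord_payload(job_name: str, error: BaseException) -> dict:
    """Build the JSON body for a Discord webhook message."""
    text = f"Job `{job_name}` failed: {type(error).__name__}: {error}"
    # Discord message content is capped at 2000 characters, so truncate.
    return {"content": text[:2000]}


def post_to_discord(webhook_url: str, payload: dict) -> None:
    """POST the payload to a webhook URL (the URL is supplied by the caller)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Explicit User-Agent; an arbitrary, hypothetical value.
            "User-Agent": "automation-alerts/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def run_with_alert(
    job_name: str,
    job: Callable[[], None],
    notify: Callable[[dict], None],
) -> bool:
    """Run a job, log the outcome, and notify on failure. Returns True on success."""
    logger.info("starting %s", job_name)
    try:
        job()
    except Exception as error:
        logger.exception("%s failed", job_name)
        notify(build_discord_payload(job_name, error))
        return False
    logger.info("finished %s", job_name)
    return True
```

In use, `notify` could be something like `lambda payload: post_to_discord(url, payload)`, with the webhook URL read from configuration. Passing the notifier in as a parameter keeps the wrapper testable without network access.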

While logging and post-to-Discord functions are scattered across different scripts, a more robust approach would probably be to set up a Jenkins server; there are guidelines for doing so on Digital Ocean. This would give us a centralized user interface with logging and scheduling built in, which would reduce the amount of boilerplate code we have to develop and allow us to review and manage tasks in a more user-friendly way.

We may need to upgrade the AM droplet in order to do this properly -- the article above recommends at least 1 GB of RAM, while the AM droplet currently has only 512 MB.

Requirements

  • initial Jenkins setup in DO
  • migrate scripts in the automation-manager to jenkins
  • migrate remaining scripts with cron jobs
    • exceptions
      • data-sources-app (not the v2 fork) has some tests which are deprecated by v2 anyway
      • github actions demo; either we leave as-is or turn it into a jenkins demo
  • set up a domain name automation.pdap.io
    • rationale: automation is maybe more clear than am; .io is for "real stuff" we use that affects things, .dev is for stuff we're still building; even though automation is for devs, it's real and affects things.
  • wind down original automation manager droplet

Scripts to migrate to Jenkins

@maxachis
Contributor Author

maxachis commented Jun 16, 2024

Note that I have limited experience with Jenkins and hence there would be a learning curve for me in setting this up. For safety's sake, it may be useful to set up a second droplet to perform/test out this functionality and then gradually transition from the original droplet to the new droplet, if we decide to go in this direction.

@maxachis
Contributor Author

maxachis commented Jun 16, 2024

I set up a test version of Jenkins! This is on a new Digital Ocean droplet called AM-Jenkins.

Because I have a little version of @josh-chamberlain that sits on my shoulder and asks me questions about my decisions, allow me to respond to tiny Shoulder Josh and by extension Big Josh:

Why should we use this instead of GitHub Actions?

  1. Jenkins is much more extensible and flexible compared to GH.
  2. Jenkins enables better separation of concerns. Rather than adding clutter to various GitHub repositories, jobs are defined in a separate user interface, where we can control jobs deriving from multiple repositories.
  3. Jenkins improves security. User accounts can be created with distinct permissions granted, and control over job construction can be delegated in an environment separate from the Github environment.
  4. Jenkins has a very friendly user interface. Scheduling, job sequencing, log archiving, and so on are defined clearly in the configurations, and the results are available for easy review.
  5. Jenkins has a large open-source community, with plentiful documentation.
  6. Jenkins has a large library of plugins which can be utilized to customize our experience, much more easily than GH. For example, if we want to send notifications on failed builds to a Discord Webhook, there's a Plugin for that.
  7. Build control is superior. Triggered builds run much faster than in GitHub Actions and can be debugged on a step-by-step basis, making them easier to develop and tweak.
  8. Unlike GitHub Actions, Jenkins builds can persist after running, which reduces the amount of time spent rebuilding jobs.

@josh-chamberlain
Contributor

Nice. I was able to log in and things seem pretty clear.

A few questions:

  1. it appears a user can, within Jenkins, see which repos contain automation by inspecting the logs. That's probably enough re: docs, but I made a stub of a Notion doc. Please feel free to add anything that might be useful.
  2. do you think we should migrate all our recurring github actions now, or wait?
  3. do you think it's still worth prototyping new things in GA, or should we just use Jenkins from the jump?
  4. what else needs to happen before we close this?

@maxachis
Contributor Author

  1. Will do!
  2. I think doing it now for anything that operates off a Cron Job is a good idea. I think the first thing is to migrate the other scripts in the automation manager, and confirm they function properly.
  3. I think using Jenkins from the jump is probably the best option, if only because it's less of a pain to set up jobs in Jenkins compared to GitHub (once you understand how it works, of course).
  4. As of right now, the droplet is accessible by an IP address. I think we should get the domain name registered and set up an SSL certificate so we can access it more professionally and securely. I already created a domain name for the droplet in Digital Ocean and pointed it at the IP address, but that domain name will still need to be registered however we normally do that.
  5. Then, we'll need to wind down the original automation manager droplet and replace it with this.

@josh-chamberlain
Contributor

@maxachis thanks! I tweaked your parent comment a bit.

our domains are configured with squarespace, but digital ocean manages our nameservers. so, the am.pdap.dev you created seems to simply work! Neat. I think we should do automation.pdap.io as mentioned above.

@maxachis
Contributor Author

@josh-chamberlain Done! Ready for the domain to be configured!

@josh-chamberlain
Contributor

josh-chamberlain commented Jun 18, 2024

@maxachis without any intervention from me, it's up at automation.pdap.io! DO makes it pretty easy, I have to say...

@maxachis
Contributor Author

@josh-chamberlain Excellent! I've begun migrating jobs to Jenkins, although I have noted that my migration of the data sources mirror repository has run into some hitches.

To wit: I'm not able to get it to work on Jenkins, I'm not able to get it working by triggering it manually as root in the original Automation Manager droplet, and because of the absence of logs in the original droplet, I'm not even sure when it last worked at all. The output from Jenkins and from running it in the droplet is below:

requests.exceptions.HTTPError: ('403 Client Error: Forbidden for url: https://api.airtable.com/v0/app473MWXVJVaD7Es/Agencies?filterByFormula=DATETIME_DIFF%28NOW%28%29%2Cairtable_agency_last_modified%2C%27hours%27%29+%3C+2', "{'type': 'INVALID_PERMISSIONS_OR_MODEL_NOT_FOUND', 'message': 'Invalid permissions, or the requested model was not found. Check that both your user and your token have the required permissions, and that the model names and/or ids are correct.'}")

So that poses a problem.
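For debugging, it can help to decode the failing URL and see exactly what the job is asking Airtable for. The sketch below uses only the standard library and the URL from the traceback above; the helper name `describe_airtable_request` is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the traceback above.
FAILING_URL = (
    "https://api.airtable.com/v0/app473MWXVJVaD7Es/Agencies"
    "?filterByFormula=DATETIME_DIFF%28NOW%28%29%2Cairtable_agency_last_modified"
    "%2C%27hours%27%29+%3C+2"
)


def describe_airtable_request(url: str) -> dict:
    """Split the request URL into base id, table name, and decoded filter formula."""
    parts = urlsplit(url)
    # The path in this URL has the shape /v0/<base id>/<table name>.
    _, _version, base_id, table = parts.path.split("/")
    # parse_qs percent-decodes values and turns "+" into spaces.
    query = parse_qs(parts.query)
    return {
        "base_id": base_id,
        "table": table,
        "formula": query["filterByFormula"][0],
    }


info = describe_airtable_request(FAILING_URL)
print(info["table"])
print(info["formula"])
```

The decoded formula is `DATETIME_DIFF(NOW(),airtable_agency_last_modified,'hours') < 2`, i.e. the job asks the `Agencies` table for records modified within the last two hours. That suggests the 403 is about the token's access to that base or table rather than a malformed query, though this is an inference from the error text alone.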

@maxachis
Contributor Author

@josh-chamberlain Additionally, for the Automatic Archives action, I'll need an API key I can use to call the PDAP endpoint to perform the archives action.

@josh-chamberlain
Contributor

josh-chamberlain commented Jun 20, 2024

@maxachis do you think we should use the same SECRET_KEY as the client (global org secret) or create a new key for automation?

re: the airtable error, it looks like it's hitting the Airtable API for a list of sources to archive, instead of our own API. We could fix the Airtable permissions by getting the read only API key, or refactor so we hit our internal endpoint instead. I like the second option better, personally

@maxachis
Contributor Author

@josh-chamberlain For safety's sake, I think creating a new key for automation is best.

And agreed, we should refactor to hit our internal endpoint. I smell a new issue! 👃

@maxachis
Contributor Author

@josh-chamberlain

  1. Airtable-To-Postgres-Database-Mirror is now running on Automation, although I'm not sure how to verify that's working exactly as expected.
  2. I've got a PR up for fixing the automatic archives. Once that's fixed, we should be able to get it running on Jenkins.
  3. For the common crawler, we've had this discussion previously, but I think it's best if we set up an endpoint to cache common crawler searches and to retrieve the cached results for given search parameters if they exist, rather than committing them to the repository. If the common crawler runs daily, we'll accrue commits rapidly, which will make things difficult for us if we ever have to roll back commits.
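To make point 3 a bit more concrete, here is a minimal in-memory sketch of caching search results by their search parameters. Everything in it is an assumption for illustration: the `SearchCache` class, the `cached_search` helper, and the example parameter names are hypothetical, and a real version would sit behind an endpoint and persist to a database rather than a dict.

```python
import hashlib
import json
from typing import Any, Callable, Optional


class SearchCache:
    """Stores search results keyed by a hash of the search parameters."""

    def __init__(self) -> None:
        self._store: dict = {}

    @staticmethod
    def key_for(params: dict) -> str:
        # Serialize with sorted keys so the same parameters always give
        # the same key, regardless of the order they were supplied in.
        canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, params: dict) -> Optional[list]:
        return self._store.get(self.key_for(params))

    def put(self, params: dict, results: list) -> None:
        self._store[self.key_for(params)] = list(results)


def cached_search(
    cache: SearchCache,
    params: dict,
    search: Callable[[dict], list],
) -> list:
    """Return cached results for params if present; otherwise search and cache."""
    hit = cache.get(params)
    if hit is not None:
        return hit
    results = search(params)
    cache.put(params, results)
    return results
```

With this shape, a repeated search with the same parameters never triggers a second crawl and nothing needs to be committed to the repository.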

@josh-chamberlain
Contributor

josh-chamberlain commented Jun 25, 2024

@maxachis thank you for all your work on this—it's already much easier to see how automations are working. Like, I don't ever want to have to SSH anywhere.

  1. I'm not really sure either. We mostly just want to be sure that it's running. Typical activity will be a few additions, a modification or two, or nothing at all most days.
  2. Nice, approved! Thought I did that earlier, but I must have got distra
  3. Yes. That said, at this moment, we don't have a reason to run it daily, or close to daily, or automatically. I think it can wait, would you agree?
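On point 1, one possible way to confirm the mirror is behaving as expected is to have each run log a short summary of what changed, which would then show up in the Jenkins build log. The sketch below assumes rows can be loaded before and after a run as dicts keyed by a record id; `summarize_changes` and the sample data are hypothetical, not part of the mirror script.

```python
def summarize_changes(before: dict, after: dict) -> dict:
    """Count records added, modified, and removed between two snapshots.

    Both snapshots map a record id to that record's field values.
    """
    added = [key for key in after if key not in before]
    removed = [key for key in before if key not in after]
    modified = [
        key for key in after
        if key in before and after[key] != before[key]
    ]
    return {
        "added": len(added),
        "modified": len(modified),
        "removed": len(removed),
    }


# Made-up example snapshots.
before = {"rec1": {"name": "Agency A"}, "rec2": {"name": "Agency B"}}
after = {
    "rec1": {"name": "Agency A"},
    "rec2": {"name": "Agency B (renamed)"},
    "rec3": {"name": "Agency C"},
}
summary = summarize_changes(before, after)
print(summary)
```

A run that reports a handful of additions and modifications, or all zeros, would match the typical activity described above; a sudden large count would be worth a look.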

@maxachis
Contributor Author

@josh-chamberlain Automatic Archives is now working as well! Set to run on a daily basis.

In response to 3: Yes, agreed! So perhaps we put that in a separate issue so we can close this one out?
