Merge pull request #4221 from willkg/system-checklist
Add e2e-tests and system checklist to docs
willkg authored Nov 28, 2017
2 parents 168f782 + 91d3f34 commit 218082e
Showing 6 changed files with 266 additions and 5 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -16,7 +16,6 @@ socorro/unittest/database/logs/
socorro/unittest/testlib/logs/
stackwalk/
exploitable/
-docs/_build/
*~
coverage.xml
.coverage
@@ -52,3 +51,7 @@ e2e-tests/.cache/
.cache/
docker-compose.override.yml
my.env
+
+# docs things
+docs/_build/
+docs/tests/e2e_readme.md
7 changes: 7 additions & 0 deletions docker/run_build_docs.sh
@@ -5,4 +5,11 @@
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

cd docs
+
+# Copy the e2e-test README to this directory because otherwise we can't
+# pull it in because there isn't a way to do a file include for Markdown
+# files.
+cp ../e2e-tests/README.md tests/e2e_readme.md
+
+# Build the docs
make html
9 changes: 7 additions & 2 deletions docs/conf.py
@@ -12,7 +12,8 @@
# All configuration values have a default; values that are commented out
# serve to show the default.

-import sys, os
+import os
+import sys
import sphinx_rtd_theme

# If extensions (or modules to document with autodoc) are in another directory,
@@ -39,7 +40,11 @@
# You can specify multiple suffix as a list of strings:
#
# source_suffix = ['.rst', '.md']
-source_suffix = '.rst'
+source_suffix = ['.rst', '.md']
+
+source_parsers = {
+    '.md': 'recommonmark.parser.CommonMarkParser',
+}

# The master toctree document.
master_doc = 'index'
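
The two settings changed in this hunk work as a pair: `source_suffix` lists the file extensions treated as documentation sources, and `source_parsers` maps an extension to a parser other than the default. The sketch below only illustrates that lookup and is not Sphinx's actual implementation. `pick_parser` and `DEFAULT_PARSER` are hypothetical names introduced for the example.

```python
import os

# Values copied from the conf.py hunk above.
source_suffix = ['.rst', '.md']
source_parsers = {
    '.md': 'recommonmark.parser.CommonMarkParser',
}

# Hypothetical label standing in for the built-in reStructuredText parser.
DEFAULT_PARSER = 'restructuredtext (built-in)'


def pick_parser(filename):
    """Return the parser name for filename, or None if it is not a source file."""
    _, ext = os.path.splitext(filename)
    if ext not in source_suffix:
        # Extensions that are not listed are not treated as sources at all.
        return None
    # Extensions without an explicit parser fall back to the default.
    return source_parsers.get(ext, DEFAULT_PARSER)
```

Under these assumptions, `pick_parser('tests/e2e_readme.md')` returns the recommonmark parser path, while `pick_parser('index.rst')` falls back to the default label.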
1 change: 1 addition & 0 deletions docs/index.rst
@@ -45,5 +45,6 @@ Contents
architecture/*
components/*
services/*
+tests/*
deploy
howto
36 changes: 34 additions & 2 deletions docs/requirements.txt
@@ -1,2 +1,34 @@
-Sphinx==1.6.5
-sphinx-rtd-theme==0.2.4
+Sphinx==1.6.5 \
+--hash=sha256:fdf77f4f30d84a314c797d67fe7d1b46665e6c48a25699d7bf0610e05a2221d4 \
+--hash=sha256:c6de5dbdbb7a0d7d2757f4389cc00e8f6eb3c49e1772378967a12cfcf2cfe098
+sphinx_rtd_theme==0.2.4 \
+--hash=sha256:62ee4752716e698bad7de8a18906f42d33664128eea06c46b718fc7fbd1a9f5c \
+--hash=sha256:2df74b8ff6fae6965c527e97cca6c6c944886aae474b490e17f92adfbe843417
+future==0.16.0 \
+--hash=sha256:e39ced1ab767b5936646cedba8bcce582398233d6a627067d4c6a454c90cfedb
+snowballstemmer==1.2.1 \
+--hash=sha256:9f3bcd3c401c3e862ec0ebe6d2c069ebc012ce142cce209c098ccb5b09136e89 \
+--hash=sha256:919f26a68b2c17a7634da993d91339e288964f93c274f1343e3bbbe2096e1128
+alabaster==0.7.10 \
+--hash=sha256:2eef172f44e8d301d25aff8068fddd65f767a3f04b5f15b0f4922f113aa1c732 \
+--hash=sha256:37cdcb9e9954ed60912ebc1ca12a9d12178c26637abdf124e3cde2341c257fe0
+typing==3.6.2 \
+--hash=sha256:349b1f9c109c84b53ac79ac1d822eaa68fc91d63b321bd9392df15098f746f53 \
+--hash=sha256:63a8255fe7c6269916baa440eb9b6a67139b0b97a01af632e7bd2842e1e02f15 \
+--hash=sha256:d514bd84b284dd3e844f0305ac07511f097e325171f6cc4a20878d11ad771849
+imagesize==0.7.1 \
+--hash=sha256:6ebdc9e0ad188f9d1b2cdd9bc59cbe42bf931875e829e7a595e6b3abdc05cdfb \
+--hash=sha256:0ab2c62b87987e3252f89d30b7cedbec12a01af9274af9ffa48108f2c13c6062
+Babel==2.5.1 \
+--hash=sha256:f20b2acd44f587988ff185d8949c3e208b4b3d5d20fcab7d91fe481ffa435528 \
+--hash=sha256:6007daf714d0cd5524bbe436e2d42b3c20e68da66289559341e48d2cd6d25811
+sphinxcontrib-websupport==1.0.1 \
+--hash=sha256:7a85961326aa3a400cd4ad3c816d70ed6f7c740acd7ce5d78cd0a67825072eb9 \
+--hash=sha256:f4932e95869599b89bf4f80fc3989132d83c9faa5bf633e7b5e0c25dffb75da2
+recommonmark==0.4.0 \
+--hash=sha256:cd8bf902e469dae94d00367a8197fb7b81fcabc9cfb79d520e0d22d0fbeaa8b7 \
+--hash=sha256:6e29c723abcf5533842376d87c4589e62923ecb6002a8e059eb608345ddaff9d
+# recommonmark 0.4.0 requires CommonMark <= 0.5.4
+CommonMark==0.5.4 \
+--hash=sha256:34d73ec8085923c023930dfc0bcd1c4286e28a2a82de094bb72fabcc0281cbe5
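
Each pin above lists one or more `--hash=sha256:` digests, and a downloaded archive is acceptable if its SHA-256 matches any digest listed for that pin. The sketch below illustrates the idea on the one-entry-per-line layout used in this file. It is a simplified illustration, not pip's implementation, and `parse_pinned_hashes` and `matches_pin` are hypothetical helper names.

```python
import hashlib


def parse_pinned_hashes(requirements_text):
    """Map each 'name==version' pin to the set of sha256 digests listed for it."""
    pins = {}
    current = None
    for raw_line in requirements_text.splitlines():
        # Drop surrounding whitespace and the trailing line-continuation backslash.
        line = raw_line.strip().rstrip('\\').strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('--hash=sha256:'):
            if current is not None:
                pins[current].add(line[len('--hash=sha256:'):])
        else:
            # Any other non-comment line starts a new pin.
            current = line
            pins[current] = set()
    return pins


def matches_pin(data, allowed_digests):
    """Return True if the sha256 of data (bytes) is one of the allowed digests."""
    return hashlib.sha256(data).hexdigest() in allowed_digests
```

Listing several digests per pin, as the file does for most packages, lets different archives of the same release (for example a wheel and a source tarball) each pass the check.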

213 changes: 213 additions & 0 deletions docs/tests/system_checklist.rst
@@ -0,0 +1,213 @@
======================
Socorro Test Checklist
======================

This is a high-level system-wide checklist for making sure Socorro is working
correctly in a specific environment. It's a helpful template for figuring out
what you need to change if you're pushing out a significant change.

**Note:** This checklist is used infrequently. If you're about to make a
significant change, first go through the checklist to make sure it is correct
and that everything is working as expected, and fix anything that's wrong.
THEN make your change and go through the checklist again.

Lonnen the bear says, "Only you can prevent production fires!"

Last updated: November 15th, 2017


How to use
==========

"Significant change" can mean any number of things, so this is just a template.
You should do the following:

1. Copy and paste the contents of this into a Google Doc, Etherpad, or
whatever system you plan to use to keep track of status and outstanding
issues.

2. Go through what you copied and pasted, remove things that don't make sense,
and add anything else that is important. (Please uplift any interesting
changes to this document via PR.)


Checklist
=========

::

Migrations
==========

Make sure we can run migrations

* Django migrations

Local dev environment:

1. "docker-compose run webapp bash"
2. "cd webapp-django/"
3. "./manage.py showmigrations"

-stage/-prod:

1. See Mana

* Alembic migrations

Local dev environment:

1. "docker-compose run processor bash"
2. "alembic -c docker/config/alembic.ini current"

-stage/-prod:

1. See Mana


Collector (Antenna)
===================

Is the collector handling incoming crashes?

* Check the Datadog Antenna dashboard for the appropriate environment.

localdev: Check the logging in the console
stage: https://app.datadoghq.com/dash/272676/antenna--stage
prod: https://app.datadoghq.com/dash/274773/antenna--prod

* Log into Sentry and check for errors.

* Submit a crash to the collector. Verify raw crash made it to S3.


Processor
=========

Is the processor process running?

* Log into a processor node and watch the processor logs for errors.

Log file: "/var/log/messages"

To check for errors: "grep ERROR /var/log/messages | less"

* Check Datadog "processor.save_raw_and_processed" for the appropriate
environment.

localdev: Check the logging in the console
stage: https://app.datadoghq.com/dash/272676/antenna--stage
prod: https://app.datadoghq.com/dash/274773/antenna--prod

Is the processor saving to ES? Postgres? S3?

* Check Datadog
"processor.es.ESCrashStorageRedactedJsonDump.save_raw_and_processed.avg"

stage: https://app.datadoghq.com/dash/272676/antenna--stage
prod: https://app.datadoghq.com/dash/274773/antenna--prod

* Check Datadog
"processor.s3.BotoS3CrashStorage.save_raw_and_processed" for the
appropriate environment.

stage: https://app.datadoghq.com/dash/272676/antenna--stage
prod: https://app.datadoghq.com/dash/274773/antenna--prod

* Check Datadog
"processor.postgres.PostgreSQLCrashStorage.save_raw_and_processed"

stage: https://app.datadoghq.com/dash/272676/antenna--stage
prod: https://app.datadoghq.com/dash/274773/antenna--prod


Submit a crash or reprocess a crash. Wait a few minutes. Verify the crash was
processed and made it to S3, ES and Postgres.

**FIXME:** We should write a script that uses envconsul to provide vars and takes
a uuid via the command line and then checks all the things to make sure it's
there. This assumes we don't already have one--we might!


Webapp
======

Is the webapp up?

* Use a browser and check the healthcheck (/monitoring/healthcheck)

It should say "ok: true".

Is the webapp throwing errors?

* Check Sentry for errors
* Log into a webapp node and check logs for errors.

Log file: "/var/log/messages"

To check for errors: "grep ERROR /var/log/messages | less"

* Run QA Selenium tests.

localdev: ?
stage: In IRC: "webqatestbot build socorro.stage.saucelabs"
prod: In IRC: "webqatestbot build socorro.prod.saucelabs"

Can we log into the webapp?

* Log in and check the profile page.

Is the product home page working?

* Check the Firefox product home page (/ redirects to /home/product/Firefox)

Is super search working?

* Click "Super Search" and make a search that is not likely to be cached.
For example, filter on a specific date.

Are Top Crashers, the Signature report, and the Report index working?

1. Browse to Top Crashers
2. Click on a crash signature to browse to Signature report
3. Click on a crash id to browse to report index

Can you upload a symbols file?

* Download https://github.com/mozilla/socorro/blob/master/webapp-django/crashstats/symbols/tests/sample.zip
to disk
* Log in with a user with permission to upload symbols.
* Go to the symbol upload section (/symbols/upload/web)
* Try to upload the "sample.zip" file.
* To verify that it worked, go to the public symbols S3 bucket and check
that there is an "xpcshell.sym" file in the root with a recent modification
date.


Crontabber
==========

Is crontabber working?

* Check healthcheck endpoint (/monitoring/crontabber/)

It should say ALLGOOD.

* Check the webapp crontabber-state page (/crontabber-state/)

Is crontabber throwing errors?

* Check Sentry for errors
* Log into an admin node and check logs for errors

Log file: "/var/log/socorro/crontabber"

To check for errors: "grep ERROR /var/log/socorro/crontabber | less"


Stage submitter
===============

Is the stage submitter running and sending crashes?

* Check Datadog dashboard for Antenna on -stage
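
The FIXME in the Processor section asks for a script that takes a crash uuid on the command line and checks that the crash reached S3, ES and Postgres. A minimal sketch of how such a script could be structured follows. It assumes each store is wrapped in a callable that returns a truthy value when the crash is present. The function names and the `checkers` mapping are hypothetical, and the real lookups (and any envconsul-provided configuration) are deliberately left out.

```python
import argparse


def check_crash(crash_id, checkers):
    """Run every checker against crash_id and return {store_name: found}."""
    results = {}
    for name, checker in checkers.items():
        try:
            results[name] = bool(checker(crash_id))
        except Exception:
            # A store that raises (for example, unreachable) counts as a failed check.
            results[name] = False
    return results


def format_report(crash_id, results):
    """Build a plain-text report with one line per store, sorted by name."""
    lines = ['crash id: %s' % crash_id]
    for name in sorted(results):
        lines.append('%s: %s' % (name, 'ok' if results[name] else 'MISSING'))
    return '\n'.join(lines)


def main(argv, checkers):
    """Parse the crash id from argv, print the report, return an exit code."""
    parser = argparse.ArgumentParser(
        description='Verify that a crash reached every store.')
    parser.add_argument('crash_id', help='uuid of the crash to look up')
    args = parser.parse_args(argv)
    results = check_crash(args.crash_id, checkers)
    print(format_report(args.crash_id, results))
    # Exit code 0 only when every store has the crash.
    return 0 if all(results.values()) else 1
```

Passing the checkers in as callables keeps the reporting logic testable with stubs, and a real script would supply one callable per store for the environment being checked.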
