Every CI build owned by the Azure-SDK team also needs to verify that the documentation within the target repo meets a set of standards. Doc-warden
is intended to ease the implementation of these checks in CI builds.
Features:
- Enforces Readme Standards
  - Readmes present
  - Readmes have appropriate contents
- Generates report for included observed packages
This package is tested on Python 2.7 through 3.8.
This package is intended to be run as part of a pipeline within Azure DevOps. As such, Python must be installed prior to attempting to install or use Doc-Warden.
While pip comes pre-installed on most modern Python installs, if pip is an unrecognized command when attempting to install warden, run the following command after your Python installation is complete.
In addition, warden is distributed using setuptools and wheel, so those packages should also be present prior to install.
/:> python -m ensurepip
/:> pip install setuptools wheel
Right now, warden supports two main purposes: readme enforcement (scan, content, presence) and package indexing (index).
<pre-step, clone target repository>
...
/:> pip install setuptools wheel
/:> pip install doc-warden
...
<next task, because PATH doesn't update without another one>
/:> ward scan -d $(Build.SourcesDirectory)
Notes for the example above:
- The assumption is that the .docsettings file is placed at the root of the repository.
To provide a different path (like azure-sdk-for-java does...), use:
/:> ward scan -d $(Build.SourcesDirectory) -c $(Build.SourcesDirectory)/eng/.docsettings.yml
command
Currently supports 4 commands. Values: ['scan', 'presence', 'content', 'index']. Required.
scan
- Run both content and presence enforcement on the targeted directory.
content
- Run only content readme enforcement on the target directory. Ensures content in each matches the regex patterns defined in the .docsettings file.
presence
- Run only presence readme enforcement on the target directory. Ensures readmes exist where they should.
index
- Take inventory of the target folder. Attempts to leverage selected docsettings to discover all packages within the directory, and generate a packages.md index file.
--scan-directory
The target directory warden should be scanning. Required.
--scan-language
warden checks for packages by convention, so it needs to understand what language it is looking at. This must be populated either in the .docsettings file or by parameter. Required.
--config-location
By default, warden looks for the .docsettings file in the root of the repository. However, populating this location will override this behavior and instead pull the file from the location in this parameter. Optional.
--package-output
Override the default location that the generated packages.md file is dropped to during execution of the index command.
--verbose-output
Enable or disable output of an html report. Defaults to false. Optional.
The -d argument should be $(Build.SourcesDirectory). This will point warden at the repo that has been associated with CI.
When should we expect a readme to be present?
Always:
- At the root of the repo
- Associated with a package directory
For .NET, a package directory is indicated by:
- a *.csproj file under the sdk directory
  - Note that this is just a proxy. warden attempts to omit test projects by convention.
For Python, a package directory is indicated by:
- the presence of a setup.py file
For Java, a package directory is indicated by:
- the presence of a pom.xml file
  - The POM <packaging> value within is set to JAR
For JavaScript, a package directory is indicated by:
- the presence of a package.json file
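The marker-file conventions above can be sketched in a few lines of Python. This is a minimal illustration, not warden's actual discovery code: the PACKAGE_MARKERS mapping and the find_package_dirs function are hypothetical names, and the sketch deliberately skips the extra rules (the sdk directory restriction and test-project omission for .NET, the <packaging> check for Java).

```python
import fnmatch
import os

# Hypothetical mapping from the documented language values to the
# marker file that signals a package directory.
PACKAGE_MARKERS = {
    "net": "*.csproj",
    "python": "setup.py",
    "java": "pom.xml",
    "js": "package.json",
}


def find_package_dirs(root, language):
    """Return directories under root (relative to root) that contain
    the marker file for the given language."""
    pattern = PACKAGE_MARKERS[language]
    found = []
    for current, _dirs, files in os.walk(root):
        # fnmatch.filter returns the file names matching the pattern.
        if fnmatch.filter(files, pattern):
            found.append(os.path.relpath(current, root))
    return sorted(found)
```

Calling find_package_dirs(repo_root, "python") on a checkout would list every folder holding a setup.py; each of those folders is then expected to have a readme.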
doc-warden has the ability to check discovered readme files to ensure that a set of configured sections is present. How does it work? doc-warden will ensure that each regex defined in required_readme_sections matches against at least one section header in the readme. If all the patterns match at least one header, the readme will pass content verification.
Other Notes:
- A section header is any markdown or RST that will result in an <h1> or <h2> html tag.
- warden will content verify any readme.rst or readme.md file found outside the omitted_paths in the targeted repo.
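The content check described above can be approximated as follows. This is a sketch under stated assumptions rather than warden's implementation: extract_headers and missing_sections are hypothetical helpers, only markdown headers written with one or two leading # characters are recognized (no RST, no underlined headers), and patterns are applied with re.match, which anchors at the start of the header.

```python
import re


def extract_headers(markdown_text):
    """Collect the titles of level-1 and level-2 '#'-style headers."""
    headers = []
    for line in markdown_text.splitlines():
        # One or two '#', whitespace, the title, optional closing '#'s.
        match = re.match(r"^(#{1,2})\s+(.*?)\s*#*\s*$", line)
        if match:
            headers.append(match.group(2))
    return headers


def missing_sections(markdown_text, required_patterns):
    """Return the patterns that match no header; an empty list means
    the readme passes content verification."""
    headers = extract_headers(markdown_text)
    return [
        pattern
        for pattern in required_patterns
        if not any(re.match(pattern, header) for header in headers)
    ]
```

A readme passes when missing_sections returns an empty list; otherwise the returned patterns name the sections that still need to be written.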
Special cases often need to be configured. It seems logical that there needs to be a central location (per repo) to override conventional settings. To that end, a new .docsettings.yml file will be added to each repo.
<repo-root>
│ README.md
│ .docsettings.yml
│
└───.azure-pipelines
│ │ <build def>
│
└───<other files and folders>
The presence of this file allows each repository to customize how enforcement takes place within their repo.
Example DocSettings File for Java Repo
omitted_paths:
- archive/*
language: java
root_check_enabled: True
required_readme_sections:
- "(Client Library for Azure .*|Microsoft Azure SDK for .*)"
- Getting Started
known_presence_issues:
- ['cognitiveservices/data-plane/language/bingspellcheck', '#2847']
known_content_issues:
- ['sdk/template/azure-sdk-template/README.md','#1368']
The above configuration tells warden...
- The language within the repo is java
- To ensure that a README.md is present at the root of the repository.
- To omit any paths under archive/ from the readme checks.
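To make the effect of omitted_paths concrete, here is a small sketch that filters candidate readme paths against glob-style patterns. How warden interprets these patterns internally is not stated here, so the use of fnmatch-style matching on repo-relative, forward-slash paths is an assumption, and is_omitted is a hypothetical helper.

```python
import fnmatch


def is_omitted(relative_path, omitted_paths):
    """Return True if a repo-relative path matches any omitted pattern.

    Assumes fnmatch-style globs, where '*' also matches '/' so that
    'archive/*' covers everything beneath archive/.
    """
    return any(
        fnmatch.fnmatchcase(relative_path, pattern)
        for pattern in omitted_paths
    )


readmes = ["README.md", "archive/old-sdk/README.md", "sdk/template/README.md"]
# Keep only the readmes that are not excluded by the configuration.
checked = [path for path in readmes if not is_omitted(path, ["archive/*"])]
```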
Possible values for language right now are ['net', 'java', 'js', 'python']. More than one target language is not currently supported.
The required_readme_sections setting instructs warden to verify that there is at least one matching section title for each provided section pattern in any discovered readme. Regex is fully supported.
The two items listed in the example .docsettings file will:
- Match a header matched by a simple regex expression
- Match a header exactly titled "Getting Started"
Note that a regex must be surrounded by quotation marks wherever it would otherwise break YAML parsing of the configuration file.
doc-warden is designed to crash builds if it detects failures. However, the vast majority of the time, these issues cannot be fixed immediately. In the above configuration, there are two paths highlighted as known issues.
The first, known_presence_issues, tells warden that a presence failure detected at the specified paths should be ignored and should not result in a crashed build. Each known issue is described by a tuple that gives both the path where the issue occurs and some sort of justification. Having an exception with an issueId attached is a good justification for not failing the build. In effect, the entry says:
"We're aware of this issue, and it is tracked in the following GitHub issue."
The known_content_issues parameter functions identically to the known_presence_issues check. If a readme is listed as "already known" to have failures, warden will not crash the CI build because of it.
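The known-issue behavior amounts to partitioning detected failures into those that should fail the build and those that are already tracked. A minimal sketch, assuming exact string comparison of paths (warden may normalize paths differently) and using the hypothetical helper name split_failures:

```python
def split_failures(failures, known_issues):
    """Partition failing paths into (blocking, ignored).

    known_issues is a list of [path, justification] pairs, in the same
    shape as known_presence_issues / known_content_issues.
    """
    known_paths = {path for path, _justification in known_issues}
    blocking = [path for path in failures if path not in known_paths]
    ignored = [path for path in failures if path in known_paths]
    return blocking, ignored


known_content_issues = [["sdk/template/azure-sdk-template/README.md", "#1368"]]
failures = [
    "sdk/template/azure-sdk-template/README.md",
    "sdk/storage/README.md",  # hypothetical path, for illustration only
]
blocking, ignored = split_failures(failures, known_content_issues)
# The build should fail only if 'blocking' is non-empty.
```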
Indexing packages is often done as part of nightly (or triggered) automation. With this being the case, sometimes warden may detect a PackageId that users wish to omit from the generated packages.md file. The Azure SDK team leverages the package_indexing_exclusion_list array members to enable just this sort of scenario.
package_indexing_traversal_stops is used during parse of .NET language repos only. This is due to how the discovery logic for readme and changelog is implemented for .NET projects. Specifically, readmes for a .csproj are often a couple directories up from their parent .csproj location!
For .NET, warden will traverse up one directory at a time, looking for the readme and changelog files in each traversed directory. warden will continue to traverse until...
- It discovers a folder with a .sln within it
- It encounters a folder that exactly matches one present in package_indexing_traversal_stops
Note that warden will not even execute an index against a .NET repo unless the traversal stops are set.
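The upward traversal can be sketched as a simple loop. This is an illustration under assumptions, not warden's code: find_readme_upwards is a hypothetical helper, the traversal stops are treated as directory paths compared after os.path.abspath, only the readme (not the changelog) is searched for, and a stopping folder is itself searched before the walk ends.

```python
import os


def find_readme_upwards(start_dir, traversal_stops, readme_name="README.md"):
    """Walk up from start_dir looking for a readme.

    Stops after checking a folder that contains a .sln file or that is
    listed in traversal_stops. Returns the readme path, or None.
    """
    current = os.path.abspath(start_dir)
    stops = {os.path.abspath(stop) for stop in traversal_stops}
    while True:
        candidate = os.path.join(current, readme_name)
        if os.path.isfile(candidate):
            return candidate
        has_sln = any(name.endswith(".sln") for name in os.listdir(current))
        if has_sln or current in stops:
            return None
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding anything.
            return None
        current = parent
```

Without a stop list or a .sln file, such a loop would keep climbing to the filesystem root, which is one way to see why the traversal stops are required for .NET repos.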
SDK for net .docsettings is a great example for both the exclusion list as well as the traversal stops.
If you encounter any bugs or have suggestions, please file an issue here and assign to scbedd.