Documentation [treated] as Code¶
Estimated time to read: 20 minutes
- Originally Written: December, 2023
Documentation is quite often an afterthought. If it exists, it's likely out of date (including this site!), may be spread across multiple locations, and written in different formats by different people.
You may never reach a stage of perfect documentation, but hopefully the examples below give you a starting point for addressing some of these challenges. By the end of the post we want to have:
- Standard data format for documents
- Dynamic content (Optional)
- Automated formatting
- Version control
- Collaboration
- Centrally hosted documentation
Info
The data formats and tools used in this post are examples only. There are many more options available.
But the documentation is not generated from code?¶
First a note about the title in case you're confused. Documentation as code in the context of this post (and often in other blog posts on the same subject) refers to treating documentation like software. It does not necessarily mean using code to generate documentation, although this is possible.
Docs treated as code are stored in version control systems and can be reviewed and collaborated on just like any other piece of code. This approach allows for better integration with the development workflow, fosters collaboration, and helps maintain the consistency and accuracy of the documentation. It helps keep documentation up to date as the codebase evolves, because both are stored together and handled in the same way.
Getting started¶
The outcomes in the list above can be grouped into three main categories: write; format, test and store; and serve.
Let's go through each and see how the outcomes can be achieved.
Write¶
There are many ways you may be writing and storing documentation, for example plain text files, Word documents, PDFs, wikis, or posts on a SharePoint site.
In this example, Markdown is chosen as the standard document format. One of its benefits is that documents remain plain text while formatting can be added through a basic syntax.
You'll see later that Markdown is also supported across many platforms so re-using or re-hosting can be much simpler.
Markdown cheatsheet¶
Markdown files end in the `.md` extension.
There are many features of Markdown but here is some of the more common syntax to get you started.
Markdown text¶
Info
Note that the example below uses \`\`\` (escaped backticks) for the code blocks to preserve the formatting. The backslashes would not otherwise be needed, as shown in the preview screenshots that follow.
Markdown text
# Hashtag creates a header
## You can have up to six levels
- This is a list item
- This is a second list item
    - List items can also be sub-items
*This text will be italic*
**This text will be bold**
> Block quotes use a `greater than` sign
[You can add links](http://tl10k.dev)
`Highlighting` uses a ` backtick which is different from a single quote
You can also highlight a code block with three backticks. Syntax highlighting may also be available depending on the Markdown parser
\`\`\`python
print("Hello World")
\`\`\`
\`\`\`yaml
elements:
  - id: 11
    type: fruit
    name: apple
  - id: 12
    type: fruit
    name: banana
  - id: 13
    type: vegetable
    name: carrot
  - id: 14
    type: vegetable
    name: onion
\`\`\`
Markdown preview¶
Adding images and diagrams¶
There are a few ways to add images.
- You could use the same format as a link but starting with an exclamation mark `!`
![Alternate text](./images/1.png)
- Alternatively, you can also use HTML within Markdown files, so you could use an `<img>` tag, which would allow you to specify the width
<img src="./images/1.png" width="300">
Styling content
As noted above, HTML can be used within Markdown, so you could also change the colour of elements using a `<span>` tag, for example.
- Mermaid and similar libraries can create diagrams such as flowcharts, Gantt charts, or mindmaps from text
flowchart TD
A[Start: Issue Reported]
B{Is it a known issue?}
C[Lookup Documentation]
D[Attempt to Replicate Issue]
E{Can issue be replicated?}
F[Log Issue]
G[Perform Troubleshooting Steps]
H{Is issue resolved?}
I[Escalate Issue]
J[End: Issue Resolved]
A --> B
B -- Yes --> C
C --> G
B -- No --> D
D --> E
E -- Yes --> G
E -- No --> F
G --> H
H -- Yes --> J
H -- No --> I
I --> J
Converting between formats¶
Although the tools in the example below use Markdown, in some cases you may need other formats. I find Pandoc works well for this purpose.
Pandoc can convert documents between various formats.
Examples:
- Markdown to PDF
pandoc my_documentation.md -o my_documentation.pdf
- Markdown to Word
pandoc my_documentation.md -o my_documentation.docx
- Word to PDF
pandoc my_documentation.docx -o my_documentation.pdf
Info
Depending on the source and target formats, some of the formatting may not convert correctly.
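If you have a whole folder of Markdown files to convert, the same command can be scripted. The following is a minimal Python sketch, assuming Pandoc is installed and on your PATH. The helper names `build_pandoc_command` and `convert_folder` are made up for illustration and are not part of Pandoc.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(source: Path, target_ext: str) -> list[str]:
    """Return the pandoc command that converts `source` to `target_ext`."""
    target = source.with_suffix(f".{target_ext}")
    return ["pandoc", str(source), "-o", str(target)]


def convert_folder(folder: Path, target_ext: str = "docx") -> list[list[str]]:
    """Convert every Markdown file in `folder` and return the commands used."""
    commands = [
        build_pandoc_command(md_file, target_ext)
        for md_file in sorted(folder.glob("*.md"))
    ]
    # Only call pandoc if it is actually installed on this machine.
    if shutil.which("pandoc"):
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Calling `convert_folder(Path("docs"))` would produce a Word copy next to each Markdown file in a hypothetical `docs` folder.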
Dynamic content¶
As previously mentioned, documentation as code in the context of this post refers to treating documentation like software. However, sometimes you might also want to generate some documentation automatically.
Here are some examples of what's possible.
Generating Terraform documentation¶
If you're building Terraform modules which include a large number of inputs and outputs, it can be challenging to maintain up-to-date and comprehensive documentation. The `terraform-docs` tool automates this process and can generate documentation in several formats, including Markdown, JSON, and YAML.
Dynamic content from an API¶
Perhaps you want to automatically generate documentation from the latest information found across multiple systems. Combining a Python script, a Jinja template, and an API allows you to quickly build documentation without having to write a word in a markdown file.
The example script below connects to Netbox and pulls device connection data. It then renders a Jinja template using this data. Finally, it writes the output to a markdown file.
Jinja is a popular templating language for Python and can be used to generate any text-based file. This makes it useful in various cases such as generating config files or code. In this case it's generating a markdown file.
The template contains placeholders for data that will be inserted when the template is rendered. The placeholders are enclosed by curly braces `{{ ... }}`.
The example template below, `connection_diagrams.j2`, has the structure for a Mermaid diagram as previously seen; however, the device connection data pulled from Netbox will be substituted into the placeholders, `{{connection.device_a}}` and `{{connection.device_b}}`.
When the script runs it creates a file, `connection_diagrams.md`, with the Mermaid code to produce a diagram of connections between devices using data taken from Netbox. This is shown in the screenshot on the left-hand side below. The right-hand side is the preview of the diagram.
This file can then be added alongside any static documentation you have. You may also want to schedule the script to run every now and then (e.g. with a cronjob) to create a refreshed diagram or document.
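To make the template-and-render step concrete, here is a minimal sketch of the idea rather than the original script. It assumes the `jinja2` package is installed, uses an inline template as a stand-in for `connection_diagrams.j2`, and replaces the Netbox API call with hard-coded sample data. The device names and the function names are hypothetical.

```python
from pathlib import Path

from jinja2 import Environment

# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3

# Hypothetical stand-in for the contents of connection_diagrams.j2.
TEMPLATE_SOURCE = """\
{{ fence }}mermaid
flowchart LR
{% for connection in connections %}
    {{ connection.device_a }} --- {{ connection.device_b }}
{% endfor %}
{{ fence }}
"""

# Sample data; a real script would build this list from the Netbox API.
SAMPLE_CONNECTIONS = [
    {"device_a": "switch1", "device_b": "router1"},
    {"device_a": "switch2", "device_b": "router1"},
]


def render_connections(connections: list[dict]) -> str:
    """Fill the template placeholders with the connection data."""
    # trim_blocks/lstrip_blocks stop the for/endfor tags leaving blank lines.
    env = Environment(trim_blocks=True, lstrip_blocks=True)
    template = env.from_string(TEMPLATE_SOURCE)
    return template.render(connections=connections, fence=FENCE)


def write_diagram(connections: list[dict], path: str = "connection_diagrams.md") -> None:
    """Write the rendered Mermaid block to a Markdown file."""
    Path(path).write_text(render_connections(connections), encoding="utf-8")
```

Calling `write_diagram(SAMPLE_CONNECTIONS)` would create `connection_diagrams.md` with one Mermaid link per connection. Swapping the sample list for data pulled from Netbox is the only part that changes in a real script.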
Format, test and store¶
Once the content is written it needs to be stored. Keeping with the theme of the post we can use a Version Control System (VCS) in the same way one would be used for source code. Github and Gitlab are two popular options and this example will use Github.
Both are centralized frontends built on Git, the distributed version control software, and both provide additional features on top of it.
Organizations often put change requests or approval processes in place to reduce the risk of downtime. Although a documentation change may not directly impact a production environment, having out of date, incorrect, or missing docs can cause issues at a later time. Therefore similar approval structures can be implemented with these centralized platforms, including tracking comments and discussions related to documentation changes.
These platforms also offer different ways to automatically run custom workflows such as a formatting job. This helps to not only catch issues but to reduce friction of creating and maintaining documentation. This is covered at a later stage.
Common Git terminology¶
To get started there are a few key components to understand.
- Repository (Repo) - a folder for storing version-controlled files
- Working Directory - the visible directory and its contents
- Versioned Files - files you have asked Git to track
- Un-Versioned Files - files in your working directory not tracked by Git
- Commit - a snapshot in time (of your version-controlled files)
- Branches - allow you to keep different versions of your code cleanly separated
A very brief intro of the Git workflow¶
For a simple Git workflow there are four key stages.
`git init` will initialize a new/empty folder as a Git repository, i.e. it creates the hidden `.git` folder with all metadata etc. Alternatively, you can `git clone` an existing repository, which will create a local copy of all the files tracked by Git and the existing history.
Once a repository is created you can run the following commands to create a new commit and then push that commit to the remote repository (e.g. one hosted on Github)
A quick analogy
`git add` is like selecting which files you want to include in your compressed file. You're telling the system which files you want to package up, but you haven't created the compressed file yet.
`git commit -m` is like compressing the files into a single zipped file and giving it a name that describes what's inside. You're finalizing the package and providing a summary of the changes.
`git push origin main` is like uploading your zipped file to cloud storage. `origin` represents the destination of the file (like a specific cloud drive URL), and `main` is the specific folder or bucket on the destination.
- `git add` - stages changes for the next commit. When you modify files in your repository, Git recognizes that a file has changed but doesn't incorporate the changes until you tell it to do so. So, `git add` is the command that tells Git to take a snapshot of the changes you've made in those files and prepare them for a commit.
- `git commit -m "some descriptive message here"` - used to save your changes to the local repository (e.g. your laptop). Think of this stage like bundling all the added files into a package. The `-m` option stands for "message", which allows you to provide a short message describing the changes you made in this commit. This doesn't actually push the commit to the remote repository.
- `git push origin main` - sends the committed changes in your local repository (e.g. laptop) to a remote repository (e.g. Github repo).
    - `origin` - This is the default name given to the remote repository you cloned from. It's essentially a shorthand name for the URL of the remote repository.
    - `main` - This is the branch that you're pushing to the remote repository. It's often "main" or "master", depending on the repository, but it can be any branch name.
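The add, commit, and push stages can also be scripted, which is handy when a job regenerates documentation and needs to publish it. This is a rough sketch only: the function names are made up, and it assumes Git is installed and the folder is already a repository with a remote called `origin`.

```python
import subprocess


def git_workflow_commands(files, message, remote="origin", branch="main"):
    """Return the add, commit and push commands in the order they are run."""
    return [
        ["git", "add", *files],            # stage the chosen files
        ["git", "commit", "-m", message],  # snapshot them locally
        ["git", "push", remote, branch],   # send the commit to the remote
    ]


def run_git_workflow(files, message, repo_dir="."):
    """Run each stage inside repo_dir, stopping at the first failure."""
    for command in git_workflow_commands(files, message):
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, `run_git_workflow(["docs/index.md"], "Update docs")` would stage, commit, and push a single hypothetical file.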
Automated checks with pre-commit hooks¶
Developers use several tools such as coding guidelines and formatters to maintain consistent code styles. Git hooks are scripts that run automatically on your local machine at specific stages, for example before each commit, and are often used to automate this formatting process (among other purposes).
Hooks can also be used for documentation.
You can write your own hooks or use a community-created hook. There are also helpful tools and frameworks such as https://pre-commit.com which make it very easy to use pre-commit hooks.
To give you an idea of tasks you could automate:
- Spell Check - pre-commit hooks can be used to run a spell check on the documentation to catch any spelling errors.
- Linters - a linter such as a markdown linter can be used to catch any syntax errors or inconsistencies such as those found in https://github.com/markdownlint/markdownlint/blob/main/docs/RULES.md
- Link Check - if your documentation includes links, a pre-commit hook can check that all links are valid and not broken.
- Docstring Check - for code documentation, a pre-commit hook can be used to ensure that all functions have accompanying docstrings.
- Grammar Check - there are tools that can check grammar in your documentation.
- Formatting - pre-commit hooks can be used to enforce consistent formatting in your documentation.
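If you do decide to write your own hook, it can be as small as a script that receives file names and returns a non-zero exit code when something is wrong. The sketch below is a hypothetical example of that shape. It flags trailing whitespace, which is a deliberately simple rule, and note that it would also flag the two trailing spaces Markdown uses for a line break.

```python
from pathlib import Path


def find_trailing_whitespace(text: str) -> list[int]:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def main(paths: list[str]) -> int:
    """Check each file and return 1 if any problem was found, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for number in find_trailing_whitespace(text):
            print(f"{path}:{number}: trailing whitespace")
            failed = True
    return 1 if failed else 0


# As a standalone hook script, this file would end with:
#     import sys
#     sys.exit(main(sys.argv[1:]))
```

A non-zero return value is what causes the commit to be rejected, giving the author a chance to fix the file and commit again.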
An example¶
I'm using the `pre-commit` framework in the following example. There is a `.pre-commit-config.yaml` file which contains the hooks I want to use, and I also have a `.mdl_style.rb` configuration file for the markdownlint hook.
Files are added using `git add`, but when the `git commit` command is entered, the pre-commit tests are run first, before the files are committed.
Serve¶
There are many options to host and serve the content.
You could share the markdown files with colleagues or just store them locally to read through. Since they're already stored in a centralized Git VCS (if you've been following along) you could just view them using that frontend e.g. Github. In many cases platforms such as Github will also render the markdown files.
A third option would be to create a static website from the content. A static site generator is a tool that creates static HTML pages from raw data, such as markdown files, and a template. These HTML pages are pre-built when the site is deployed, so when a user visits the site, they are served the pre-rendered HTML.
There are many options available but in this example we will look at Mkdocs which is geared towards building project documentation. In fact this blog is built using Mkdocs and the Material theme. Mkdocs takes markdown files as input and generates static HTML, CSS, and javascript assets which can then be served to a user.
I have Mkdocs installed on my laptop and run the commands below to generate the static site.
Mkdocs structure¶
A minimal setup requires a directory which contains the documentation in Markdown format. In this case it's stored in the `docs/` folder. The second requirement, which you can see in the screenshot below, is an `mkdocs.yml` file to store the Mkdocs configuration.
There are four main parts to the configuration below. At the top you should see the site details. Following on is the theme of the Mkdocs page. In this case it's Material but there are many themes available. The theme section contains relevant configuration properties such as which features to enable and colour schemes.
Below the theme you will find any additional markdown extensions or plugins you want to use.
The fourth section and perhaps most important is the navigation. This tells Mkdocs where to find the markdown files you wish to publish and you can have nested categories as seen in the screenshot.
Developing and testing the static site¶
If you need to test the site as you're writing the content you can use the `mkdocs serve` command, which starts the built-in development server.
Once it's running you can navigate to `http://localhost:8000` (8000 is the default port) in your browser to see a local copy of your site.
The server also supports auto-reloading, which means it will automatically rebuild your documentation and refresh your browser whenever a file is modified.
Building the site¶
Info
Depending on how you want to host the site, this step may be optional (e.g. you won't require the `build` command when using Github Pages as shown below).
When you're ready to generate the static site you can use the `mkdocs build` command. This will create a new `site` folder containing all the HTML, CSS, and JavaScript assets.
Hosting and serving the content with Github Pages¶
In some cases you may want to serve the content in the site folder using your own web server (e.g. Apache, NGINX), for example as an internal document portal for your team.
Another option which may be simpler is to use a static site hosting service such as Github Pages. Rather than running your own web server, Github Pages serves content found in a specified branch of a Github repository. You can find the configuration for Pages under the Settings menu in the repository where you store your documentation.
The common practice is to create a `gh-pages` branch to store the contents of the site folder, although you could select any branch, including the default `main` branch.
You can use the `mkdocs gh-deploy` command to automate the building and publishing of the site.
This command will first build the site contents, and then run the Git commands to add, commit, and push the new files to the `gh-pages` branch of your document repository.
After a few minutes your new or updated site will be available at `http(s)://<username>.github.io/<repository>`. This is the default URL, although you can also set up a custom domain.
Centralized automation¶
We've seen some automation in the form of local pre-commit hooks which were used by the author to help catch any errors before the documentation change was submitted to the central repository. For example this may help with YAML/JSON formatting, spell checks, or even finding credentials and secrets buried in the docs.
It can also be helpful to have centralized workflows running each time a change is pushed to the repository. For example based on what has been covered so far, you may want to generate up to date Terraform module documentation on each commit or perhaps refresh the Netbox connectivity diagrams every night to ensure the latest copy is always available.
In my case I have many links throughout the posts and find it helpful to know if any are broken and need to be updated. You can use a tool such as Github Actions to build and run workflows to perform this and many more tasks.
Actions are custom workflows that are defined by a YAML file and triggered by GitHub events such as a push, pull request, or another specified event such as a schedule.
As you can see from the screenshot below I have a `.github/workflows` directory within the documentation repository. Within this folder, you create one or more YAML files that define your workflows. Each YAML file represents a separate workflow and contains the steps that the workflow will execute. These steps can include checking out the repository code, running scripts, installing dependencies, and deploying software.
This particular workflow runs at 12:00AM on the first of every month and contains a single job. I use the linkcheck Github Action to perform the actual checking of the links and like any CLI program, I can pass in various arguments.
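Scheduled workflows are described with a five-field cron expression, evaluated in UTC. The exact expression from the workflow file is not reproduced here, so `0 0 1 * *` below is an assumption: it is the usual way to write "midnight on the first day of every month". The small Python sketch shows how the five fields map onto a date and time. The `matches_cron` helper is hypothetical and only understands `*` and single numbers, not ranges or lists.

```python
from datetime import datetime

# Field order: minute, hour, day of month, month, day of week.
MONTHLY_SCHEDULE = "0 0 1 * *"


def matches_cron(expression: str, moment: datetime) -> bool:
    """Check a datetime against a simplified five-field cron expression.

    Each field must be "*" (any value) or a single number.
    """
    fields = expression.split()
    actual = [
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # cron counts Sunday as 0
    ]
    return all(
        field == "*" or int(field) == value
        for field, value in zip(fields, actual)
    )
```

With this expression, only a timestamp of 00:00 on day 1 of a month matches, whichever month or weekday that falls on.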
In my case I check that all my site links and all external (`-e`) links are working. I also have a number of URLs/files I want to skip, which are stored in a separate file (`.link-checker-skip-these-files`).
When the workflow is triggered Github Actions will run through the jobs/steps and perform the required actions, resulting in either a pass or fail of the workflow.
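To give a feel for what a link check involves, here is a small Python sketch of the first half of the job: collecting the links from a Markdown document, separating external from internal ones, and honouring a skip list. It is not how the linkcheck action is implemented. The function names are made up, and the step of actually requesting each URL is left out.

```python
import re
from urllib.parse import urlparse

# Matches inline Markdown links and images: [text](target) or ![alt](target)
LINK_PATTERN = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def extract_links(markdown: str) -> list[str]:
    """Return every link or image target found in the Markdown text."""
    return LINK_PATTERN.findall(markdown)


def split_links(links, skip=()):
    """Separate external (http/https) links from internal ones.

    Targets listed in `skip` are ignored entirely.
    """
    external, internal = [], []
    for link in links:
        if link in skip:
            continue
        if urlparse(link).scheme in ("http", "https"):
            external.append(link)
        else:
            internal.append(link)
    return external, internal
```

A workflow step could then fail the job if any remaining link does not resolve, which produces the pass or fail result described above.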