Elegant command line parsing in Python using docopt, schema, and yaml

Both for work and personal use, I tend to develop many command-line Python scripts that are very data-driven and configurable. In Python, it's pretty trivial to use argparse to support complex argument and option structures, and then pair it with configparser so that specific option/configuration values can be captured in files for posterity. Unfortunately, the syntax of argparse and configparser tends to be pretty verbose, and incremental changes after weeks- or months-long development gaps can be error-prone. In addition, I find the ini syntax of configparser rather limiting.

In the past year, I've switched over completely to using docopt, where you write the example usage and options in a string inside your script, and it builds the command-line invocation parser for you automatically. This is an incredibly powerful and concise way to write a self-documenting option parser. Then, you combine that with a data validation library like schema to check the incoming options, and use yaml to allow for options to be supplied from a text file, and you're pretty much in option parsing nirvana.

In this post, I'll go through a script I wrote recently that implements a very common pattern for anyone in an engineering field:

Take a file that defines an engineering simulation
Additionally, take another input file that specifies a set of parameters that define different simulation scenarios
Generate all possible combinations of parameter values
For each combination, emit the simulation file with the specific parameters
Execute an external simulator on the simulation file
Parse the results and log them in a CSV file

1 Running parametrized SPICE simulations

For the purposes of this example, I'm going to walk through a simple script that I've developed to do parameter exploration of a digital circuit using SPICE (hspice). The input is a simple spice deck that's a Mako template, and the parameters are specified in a YAML file. The script does simulation and logging of results as two separate flow steps (so you could, for example, rewrite the results file based on a previously run set of simulations).

Here's the spice file written as a Mako template. Notice that all the parameters are within the ${...} blocks. We're really not using any of the advanced features of the templating language (like conditionals or loops, or even arbitrary Python), but they're available if you need to make more complex simulation files.

*
.temp ${temp}
.option brief=1
.lib 'LIBBFILE.l' ${corner}

vvdd vdd 0 ${vdd}

.subckt xinv in out
xmp out in vdd vdd pfet w=20 l=2
xmn out in 0 0 nfet w=10 l=2
.ends xinv


xinv_t qf qt xinv

.meas tran StaticCurrentVdd avg i(vdd) from=1n to=3n
.meas tran StaticSupplyPowerVdd PARAM='-StaticCurrentVdd*${vdd}'

.tran 0.005ns 3ns sweep monte=1000

So, we have the following list of parameters that need to be supplied to this simulation file:

temp - Circuit simulation temperature
corner - Device/transistor corner (e.g. typical, fast, slow, etc)
vdd - Simulation voltage

We'll define these in another file using the YAML syntax like so:

temp:
    - 85
    - 125
vdd:
    - 0.6
    - 0.8
    - 1.0
corner:
    - TT
    - FF
    - SS

Here, we've defined values for the different parameters, and we'd like to simulate every single combination of these. So, for example, temp=125, vdd=0.6, corner=SS would be one possible substitution into the simulation file.
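To get a feel for how many scenarios this produces, here is a minimal sketch using itertools.product from the standard library. The params dictionary is typed in by hand to mirror the YAML above, and the variable names are only illustrative.

```python
from itertools import product

# Hand-written copy of the parameter file above (normally loaded from YAML).
params = {
    "temp": [85, 125],
    "vdd": [0.6, 0.8, 1.0],
    "corner": ["TT", "FF", "SS"],
}

# Build one dictionary per combination of parameter values.
names = list(params)
scenarios = [dict(zip(names, values)) for values in product(*params.values())]

print(len(scenarios))  # 2 * 3 * 3 = 18 scenarios
print(scenarios[0])
```

Each entry in scenarios is exactly the kind of substitution context described above, with one value chosen for temp, vdd, and corner.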

2 Setting up the options

First, let's start by setting up the usage of this script using docopt.

Here's the beauty of docopt: the entire options parsing is defined in the string at the start of the file. If you just type simulate.py -h, you'll get the following output:

Usage:
                simulate.py [options] PARAMFILE SPICEDECK all
                simulate.py [options] PARAMFILE SPICEDECK (sim|log)...

                simulate.py -h

Arguments:
                PARAMFILE   YAML file with variables to iterate over
                SPICEDECK   Mako templated spice deck
                all         Run all steps in the flow
                sim         Run the simulations
                log         Just collate the available results into one file

Options:
                -h --help        show this message
                -v --verbose     show more information
                --rundir=PATH    set path for running simulations in [default: runs]
                --resultsfile=FILE  set filename for writing results to [default: index.txt]

The script is set up to take in the parameter file (with all the values defined), followed by the simulation spice template file, followed by the flow step we want to run (either all, or one or more of sim/log). You can also optionally set the run directory and the name of the results file. Here's the output with a valid command line:

[1] virantha@virantha-macbook-243> python simulate.py params.yml sim.sp all
{'--help': False,
 '--resultsfile': 'index.txt',
 '--rundir': 'runs',
 '--verbose': False,
 'PARAMFILE': 'params.yml',
 'SPICEDECK': 'sim.sp',
 'all': True,
 'log': 0,
 'sim': 0}

And all the arguments are parsed into a nice dictionary! One idiosyncrasy is that, because we've defined sim/log as a repeatable group of keywords (the trailing ...), we end up with a counter for each keyword rather than a boolean. So, for example, if we specified sim as the flow step, then 'all': False and 'sim': 1 would appear in our args dictionary.
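One way to deal with those counters is to collapse them into a plain list of flow steps. The helper below is a small sketch; flow_steps is a hypothetical name, and it assumes that all simply means "sim, then log".

```python
def flow_steps(args):
    """Return the ordered list of flow steps requested on the command line."""
    if args["all"]:
        # Assumption: "all" is shorthand for running sim and then log.
        return ["sim", "log"]
    # sim/log are counters (0, 1, 2, ...), so any non-zero count means "requested".
    return [step for step in ("sim", "log") if args[step] > 0]


print(flow_steps({"all": False, "sim": 1, "log": 0}))  # ['sim']
```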

3 Validating options with schema

Now, let's do some rudimentary checking of the user-supplied options. For example, it would be nice to make sure that the parameter and spice template files actually exist and are readable, so let's add those checks using schema to validate our dictionary:

We've made sure PARAMFILE is a file and we've opened and converted it into a file handle, since we'll be passing that into the yaml loader in the next step. Next, we've just checked that the spice simulation file exists; no need to open it since we'll just be passing that off to the Mako template engine in a later step.

Very simple, and you can read the schema docs to add more complex checking of your options.

4 Using a configuration file to supply options

Now, let's add in configparser-style functionality, except we'll use the YAML syntax for more flexibility.

Now, we can supply options from a configuration file (which takes priority) in YAML format like so:

--verbose: True
SPICEDECK: sim2.sp

Let's call this file conf.yml, and now we'll get the following:

[1] virantha@virantha-macbook-243> python simulate.py params.yml sim.sp all --conf=conf.yml
{'--help': False,
 '--resultsfile': 'index.txt',
 '--rundir': 'runs',
 '--verbose': True,
 'PARAMFILE': 'params.yml',
 'SPICEDECK': 'sim2.sp',
 'all': True,
 'log': 0,
 'sim': 0}

Note that the configuration file values take precedence over the command line (you could see this example to make the command line take precedence instead). We've also put in a check that catches typos in the conf file by erroring out if it contains an option that is not specified in the docopt usage string.
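The merge-and-check logic described here can be sketched in a few lines with PyYAML's safe_load. This is an illustration under assumptions: merge_config is a hypothetical helper, and it takes the configuration text directly rather than opening the file named by --conf.

```python
import yaml


def merge_config(args, conf_text):
    """Overlay YAML config values on top of the docopt dictionary."""
    conf = yaml.safe_load(conf_text) or {}
    # Any key that docopt does not know about is most likely a typo.
    unknown = sorted(str(key) for key in set(conf) - set(args))
    if unknown:
        raise ValueError("Unknown option(s) in config file: " + ", ".join(unknown))
    merged = dict(args)
    merged.update(conf)  # config file values win over the command line
    return merged


cli_args = {"--verbose": False, "SPICEDECK": "sim.sp", "PARAMFILE": "params.yml"}
merged = merge_config(cli_args, "--verbose: True\nSPICEDECK: sim2.sp\n")
print(merged["--verbose"], merged["SPICEDECK"])
```

Swapping the precedence would only require starting from the configuration values and updating them with the options the user actually typed.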

5 Loading in the parameters

Next, let's add in some more option processing and load in the parameters for each simulation scenario from the PARAMFILE argument.

Notice that we've defined a custom loader for reading in the parameter yaml, so that we can keep everything in an OrderedDict that keeps the same order as present in the YAML file. While not strictly necessary, this makes the simulation order predictable for the user.
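For reference, a commonly used recipe for such an order-preserving loader is sketched below. It assumes PyYAML; the ordered_load name is illustrative and not necessarily what the script uses.

```python
from collections import OrderedDict

import yaml


def ordered_load(stream, Loader=yaml.SafeLoader):
    """Load YAML so that every mapping becomes an OrderedDict in file order."""

    class OrderedLoader(Loader):
        pass

    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return OrderedDict(loader.construct_pairs(node))

    # Route every YAML mapping through our OrderedDict constructor.
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_mapping
    )
    return yaml.load(stream, OrderedLoader)


text = "temp: [85, 125]\nvdd: [0.6, 0.8, 1.0]\ncorner: [TT, FF, SS]\n"
params = ordered_load(text)
print(list(params))
```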

6 Generating and running the scenarios

Now, we'll introduce a generator function that will yield scenarios based on the parameter values. This function, _get_spice_run, uses Python's built-in itertools.product function to generate the cross-product of all the parameter values. Each combination of parameter values is then provided as a context to Mako to substitute into the template simulation file (spice deck). The run method just iterates over this generator function, and calls the run_sim method to run the simulator on each scenario.
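The shape of such a generator can be sketched with the standard library alone. To keep the example self-contained, string.Template stands in for Mako here (it happens to accept the same ${name} placeholders); with Mako you would render the deck with mako.template.Template(text).render(**context) instead. The get_spice_runs name and the shortened deck are illustrative.

```python
from collections import OrderedDict
from itertools import product
from string import Template  # stand-in for Mako in this sketch

# Shortened version of the spice deck, keeping only the templated lines.
DECK = """\
.temp ${temp}
.lib 'LIBBFILE.l' ${corner}
vvdd vdd 0 ${vdd}
"""


def get_spice_runs(params, deck_text):
    """Yield (context, rendered_deck) for every combination of parameter values."""
    template = Template(deck_text)
    names = list(params)
    for values in product(*(params[name] for name in names)):
        context = OrderedDict(zip(names, values))
        yield context, template.substitute(context)


params = OrderedDict(
    [("temp", [85, 125]), ("vdd", [0.6, 0.8, 1.0]), ("corner", ["TT", "FF", "SS"])]
)
first_context, first_deck = next(get_spice_runs(params, DECK))
print(dict(first_context))
print(first_deck)
```

Because it is a generator, each deck is rendered only when the loop asks for it, so the caller can write the file and launch the simulator one scenario at a time.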

7 Logging the results and complete script

Now, we just introduce some results parsing and outputting to a CSV file for the log flow step, which gives us our complete script. Notice that the results CSV output is completely data-driven and based on the context dictionary provided for each scenario.
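To illustrate what "data-driven" means for the CSV output, here is a small sketch using csv.DictWriter, where the column names come straight from the keys of each scenario's dictionary. The write_results helper is hypothetical, and the power values are placeholders, not simulation results.

```python
import csv
import io


def write_results(rows, stream):
    """Write one CSV row per scenario; the columns come from the row keys."""
    rows = list(rows)
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Scenario context plus one measured value each (placeholder numbers only).
rows = [
    {"temp": 85, "vdd": 0.6, "corner": "TT", "StaticSupplyPowerVdd": 1.0},
    {"temp": 85, "vdd": 0.6, "corner": "FF", "StaticSupplyPowerVdd": 2.0},
]
buffer = io.StringIO()
write_results(rows, buffer)
print(buffer.getvalue())
```

Adding a new parameter to the YAML file then adds a new column to the results without touching the logging code.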
