Posted by virantha on Thu 23 June 2016

Elegant command line parsing in Python using docopt, schema, and yaml

Both for my work and personal use, I tend to develop many command-line Python scripts that are very data-driven and configurable. In Python, it's pretty trivial to use argparse to support complex argument and option structures, and then pair it with configparser so that you can write files that capture specific option/configuration values for posterity. Unfortunately, the syntax of argparse and configparser tends to be pretty verbose, and incremental changes after development gaps of weeks or months can be error-prone. In addition, I find the ini syntax of configparser rather limiting.

In the past year, I've switched over completely to using docopt, where you write the example usage and options in a string inside your script, and it builds the command-line invocation parser for you automatically. This is an incredibly powerful and concise way to write a self-documenting option parser. Then, you combine that with a data validation library like schema to check the incoming options, and use yaml to allow for options to be supplied from a text file, and you're pretty much in option parsing nirvana.

In this post, I'll go through a script I wrote recently that implements a very common pattern for anyone in an engineering field:

  • Take a file that defines an engineering simulation
  • Additionally, take another input file that specifies a set of parameters that define different simulation scenarios
  • Generate all possible combinations of parameter values
  • For each combination, emit the simulation file with the specific parameters
  • Execute an external simulator on the simulation file
  • Parse the results and log the results in a CSV file

1   Running parametrized SPICE simulations

For the purposes of this example, I'm going to walk through a simple script that I've developed to do parameter exploration of a digital circuit using SPICE (hspice). The input is a simple spice deck that's a Mako template, and the parameters are specified in a YAML file. The script does simulation and logging of results as two separate flow steps (so you could, for example, re-write the results file based on a previously run set of simulations).

Here's the spice file written as a Mako template. Notice that all the parameters are within the ${...} blocks. We're really not using any of the advanced features of the templating language (like conditionals or loops, or even arbitrary Python), but they're available if you need to make more complex simulation files.

*
.temp ${temp}
.option brief=1
.lib 'LIBBFILE.l' ${corner}

vvdd vdd 0 ${vdd}

.subckt xinv in out
xmp out in vdd vdd pfet w=20 l=2
xmn out in 0 0 nfet w=10 l=2
.ends xinv


xinv_t qf qt xinv

.meas tran StaticCurrentVdd avg i(vdd) from=1n to=3n
.meas tran StaticSupplyPowerVdd PARAM='-StaticCurrentVdd*${vdd}'

.tran 0.005ns 3ns sweep monte=1000

So, we have the following list of parameters that need to be supplied to this simulation file:

  • temp - Circuit simulation temperature
  • corner - Device/transistor corner (e.g. typical, fast, slow, etc)
  • vdd - Simulation voltage

We'll define these in another file using the YAML syntax like so:

temp:
    - 85
    - 125
vdd:
    - 0.6
    - 0.8
    - 1.0
corner:
    - TT
    - FF
    - SS

Here, we've defined values for the different parameters, and we'd like to simulate every single combination of these. So, for example, temp=125, vdd=0.6, corner=SS would be one possible substitution into the simulation file.
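
As a quick sanity check on how many runs this implies, here is a minimal sketch using only the standard library. The dictionary literal simply mirrors the YAML above, and the names parameters and combinations are placeholders for illustration rather than anything taken from the final script.

```python
from itertools import product

# Mirrors the YAML parameter file above (hard-coded to keep the sketch self-contained)
parameters = {
    "temp": [85, 125],
    "vdd": [0.6, 0.8, 1.0],
    "corner": ["TT", "FF", "SS"],
}

# One dict per scenario: every temp paired with every vdd and every corner
combinations = [
    dict(zip(parameters.keys(), values))
    for values in product(*parameters.values())
]

print(len(combinations))   # 2 * 3 * 3 = 18 scenarios
print(combinations[0])     # first scenario
print(combinations[-1])    # last scenario
```

Even this small parameter file produces 18 simulations, which is why it pays to generate the decks automatically.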

2   Setting up the options

First, let's start by setting up the usage of this script using docopt.

"""Simulate

Usage:
        simulate.py [options] PARAMFILE SPICEDECK all
        simulate.py [options] PARAMFILE SPICEDECK (sim|log)...

        simulate.py -h

Arguments:
        PARAMFILE   YAML file with variables to iterate over
        SPICEDECK   Mako templated spice deck
        all         Run all steps in the flow
        sim         Run the simulations
        log         Just collate the available results into one file

Options:
        -h --help        show this message
        -v --verbose     show more information
        --rundir=PATH    set path for running simulations in [default: runs]
        --resultsfile=FILE  set filename for writing results to [default: index.txt]

"""
from docopt import docopt

class Sim(object):

    def get_options(self):
        args = docopt(__doc__)
        print(args)

if __name__ == "__main__":
    prog = Sim()
    prog.get_options()

Here's the beauty of docopt: the entire options parsing is defined in the string at the start of the file. If you just type simulate.py -h, you'll get the following output:

Usage:
                simulate.py [options] PARAMFILE SPICEDECK all
                simulate.py [options] PARAMFILE SPICEDECK (sim|log)...

                simulate.py -h

Arguments:
                PARAMFILE   YAML file with variables to iterate over
                SPICEDECK   Mako templated spice deck
                all         Run all steps in the flow
                sim         Run the simulations
                log         Just collate the available results into one file

Options:
                -h --help        show this message
                -v --verbose     show more information
                --rundir=PATH    set path for running simulations in [default: runs]
                --resultsfile=FILE  set filename for writing results to [default: index.txt]

The script is set up to take in the parameter file (with all the values defined), followed by the simulation spice template file, followed by the flow step we want to run (either all, or one or more of sim/log). You can also optionally set the run directory and the name of the results file. Here's the output with a valid command line:

[1] virantha@virantha-macbook-243> python simulate.py params.yml sim.sp all
{'--help': False,
 '--resultsfile': 'index.txt',
 '--rundir': 'runs',
 '--verbose': False,
 'PARAMFILE': 'params.yml',
 'SPICEDECK': 'sim.sp',
 'all': True,
 'log': 0,
 'sim': 0}

And all the arguments are parsed into a nice dictionary! One idiosyncrasy: because we've defined sim/log as one or more optional keywords, we end up with a counter for each keyword. So, for example, if we specified sim as a flow step, then 'all': False and 'sim': 1 would appear in our args dictionary.
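
To make the counter behavior concrete, here is a small sketch of how those dictionary entries could be turned into a list of flow steps. The helper name flow_steps is hypothetical, and the input dictionaries are hand-written to look like the docopt output above rather than produced by docopt itself.

```python
def flow_steps(args):
    """Return the list of flow steps requested in a docopt-style args dict."""
    if args["all"]:
        return ["sim", "log"]
    # 'sim' and 'log' are counters, so any value above zero means "requested"
    return [step for step in ("sim", "log") if args[step] > 0]

# Dictionaries shaped like the docopt output shown above
all_args = {"all": True, "sim": 0, "log": 0}
sim_only_args = {"all": False, "sim": 1, "log": 0}

print(flow_steps(all_args))       # ['sim', 'log']
print(flow_steps(sim_only_args))  # ['sim']
```

Treating the counters as "greater than zero" also means that repeating a keyword on the command line is harmless.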

3   Validating options with schema

Now, let's do some rudimentary checking of the user-supplied options. For example, it would be nice to make sure that the parameter and spice template files actually exist and are readable, so let's add those checks using schema to validate our dictionary:

"""Simulate

Usage:
        simulate.py [options] PARAMFILE SPICEDECK all
        simulate.py [options] PARAMFILE SPICEDECK (sim|log)...

        simulate.py -h

Arguments:
        PARAMFILE   YAML file with variables to iterate over
        SPICEDECK   Mako templated spice deck
        all         Run all steps in the flow
        sim         Run the simulations
        log         Just collate the available results into one file

Options:
        -h --help        show this message
        -v --verbose     show more information
        --rundir=PATH    set path for running simulations in [default: runs]
        --resultsfile=FILE  set filename for writing results to [default: index.txt]

"""
from docopt import docopt
import os
from schema import Schema, And, Optional, Or, Use, SchemaError

class Sim(object):

    def get_options(self):
        args = docopt(__doc__)
        print(args)
        schema = Schema({
            'PARAMFILE': Use(open, error='PARAMFILE should be readable'),
            'SPICEDECK': os.path.isfile,
            object: object
            })
        try:
            args = schema.validate(args)
        except SchemaError as e:
            exit(e)

if __name__ == "__main__":
    prog = Sim()
    prog.get_options()

We've made sure PARAMFILE is readable by opening it: Use(open) replaces the filename with an open file handle, which we'll pass to the yaml loader in the next step. For SPICEDECK, we've just checked that the file exists; there's no need to open it, since we'll hand the filename to the Mako template engine in a later step.

Very simple, and you can read the schema docs to add more complex checking of your options.
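
For readers who want to see what those two schema entries amount to, here is a rough plain-Python equivalent with no schema dependency. It is only a sketch: check_paths is a hypothetical helper, the error messages are my own wording, and it ignores everything else schema offers.

```python
import os
import tempfile

def check_paths(args):
    """Hand-rolled stand-in for the two schema checks above (hypothetical helper)."""
    checked = dict(args)  # unchecked keys pass through, like `object: object`
    try:
        # Like Use(open): replace the filename with an open file handle
        checked["PARAMFILE"] = open(args["PARAMFILE"])
    except (OSError, TypeError):
        raise ValueError("PARAMFILE should be readable")
    # Like os.path.isfile: validate only, keep the filename as a string
    if not os.path.isfile(args["SPICEDECK"]):
        checked["PARAMFILE"].close()
        raise ValueError("SPICEDECK should be an existing file")
    return checked

# Try it out with two throwaway files
with tempfile.TemporaryDirectory() as tmp:
    param_path = os.path.join(tmp, "params.yml")
    deck_path = os.path.join(tmp, "sim.sp")
    for path in (param_path, deck_path):
        with open(path, "w") as f:
            f.write("* placeholder\n")

    checked = check_paths({"PARAMFILE": param_path, "SPICEDECK": deck_path})
    handle_is_open = not checked["PARAMFILE"].closed
    deck_is_still_a_string = isinstance(checked["SPICEDECK"], str)
    checked["PARAMFILE"].close()

    # A missing parameter file is rejected with a readable message
    try:
        check_paths({"PARAMFILE": os.path.join(tmp, "missing.yml"), "SPICEDECK": deck_path})
    except ValueError as e:
        missing_error = str(e)

print(handle_is_open, deck_is_still_a_string, missing_error)
```

The schema version expresses the same intent in two lines, which is the main reason to prefer it.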

4   Using a configuration file to supply options

Now, let's add in configparser-style functionality, except we'll use the YAML syntax for more flexibility.

"""Simulate

Usage:
        simulate.py [options] PARAMFILE SPICEDECK all
        simulate.py [options] PARAMFILE SPICEDECK (sim|log)...
        simulate.py --conf=FILE
        simulate.py -h

Arguments:
        PARAMFILE   YAML file with variables to iterate over
        SPICEDECK   Mako templated spice deck
        all         Run all steps in the flow
        sim         Run the simulations
        log         Just collate the available results into one file

Options:
        -h --help        show this message
        -v --verbose     show more information
        --rundir=PATH    set path for running simulations in [default: runs]
        --resultsfile=FILE  set filename for writing results to [default: index.txt]
        --conf=FILE      load options from file

"""
from docopt import docopt
import os, sys
from schema import Schema, And, Optional, Or, Use, SchemaError
import yaml

class Sim(object):


    def merge_args(self, conf_args, orig_args):
        """ Return new dict with args, and then conf_args merged in.
            Make sure that any keys in conf_args are also present in args
        """
        args = {}
        for k in conf_args.keys():
            if k not in orig_args:
                print("ERROR: Configuration file has unknown option %s" % k)
                sys.exit(-1)

        args.update(orig_args)
        args.update(conf_args)
        return args


    def get_options(self):
        args = docopt(__doc__)

        if args['--conf']:
            with open(args['--conf']) as f:
                # safe_load avoids constructing arbitrary Python objects from the file
                conf_args = yaml.safe_load(f)
        else:
            conf_args = {}

        args = self.merge_args(conf_args, args)
        print (args)
        schema = Schema({
            'PARAMFILE': Use(open, error='PARAMFILE should be readable'),
            'SPICEDECK': os.path.isfile,
            object: object
            })
        try:
            args = schema.validate(args)
        except SchemaError as e:
            exit(e)


if __name__ == "__main__":
    prog = Sim()
    prog.get_options()

Now, we can supply options from a configuration file (which takes priority) in YAML format like so:

--verbose: True
SPICEDECK: sim2.sp

Let's call this file conf.yml, and now we'll get the following:

[1] virantha@virantha-macbook-243> python simulate.py params.yml sim.sp all --conf=conf.yml
{'--help': False,
 '--resultsfile': 'index.txt',
 '--rundir': 'runs',
 '--verbose': True,
 'PARAMFILE': 'params.yml',
 'SPICEDECK': 'sim2.sp',
 'all': True,
 'log': 0,
 'sim': 0}

Note that the configuration file values take precedence over the command line (docopt's own config-file example shows one way to make the command line take precedence instead). We've also put in a check to catch any typos in the conf file by erroring out if an option not specified in the docopt usage string is found.
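
The precedence rule and the typo check can be seen in isolation with plain dictionaries. This is a sketch of the same idea as merge_args in the listing above, written as a standalone function that raises ValueError instead of exiting; that change, and the deliberately misspelled key, are my own for illustration.

```python
def merge_args(conf_args, cli_args):
    """Merge config-file values over command-line values; reject unknown keys."""
    unknown = [k for k in conf_args if k not in cli_args]
    if unknown:
        raise ValueError("Configuration file has unknown option(s): %s" % ", ".join(unknown))
    merged = dict(cli_args)    # start from what docopt parsed
    merged.update(conf_args)   # config file wins on any shared key
    return merged

cli_args = {"--verbose": False, "SPICEDECK": "sim.sp", "--rundir": "runs"}
conf_args = {"--verbose": True, "SPICEDECK": "sim2.sp"}

merged = merge_args(conf_args, cli_args)
print(merged)

try:
    merge_args({"--verbsoe": True}, cli_args)   # deliberate typo in the key
except ValueError as e:
    typo_error = str(e)
    print("rejected:", typo_error)
```

Because the merge builds a new dictionary, the original docopt result is left untouched.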

5   Loading in the parameters

Next, let's add in some more option processing and load in the parameters for each simulation scenario from the PARAMFILE argument.

"""Simulate

Usage:
        simulate.py [options] PARAMFILE SPICEDECK all
        simulate.py [options] PARAMFILE SPICEDECK (sim|log)...
        simulate.py --conf=FILE
        simulate.py -h

Arguments:
        PARAMFILE   YAML file with variables to iterate over
        SPICEDECK   Mako templated spice deck
        all         Run all steps in the flow
        sim         Run the simulations
        log         Just collate the available results into one file

Options:
        -h --help        show this message
        -v --verbose     show more information
        --rundir=PATH    set path for running simulations in [default: runs]
        --resultsfile=FILE  set filename for writing results to [default: index.txt]
        --conf=FILE      load options from file

"""
from docopt import docopt
import os, sys
from schema import Schema, And, Optional, Or, Use, SchemaError
import yaml
from collections import OrderedDict

def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
    """ Helper function to allow yaml load routine to use an OrderedDict instead of regular dict.
        This helps keep things sane when ordering the runs and printing out results
    """
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

class Sim(object):


    def merge_args(self, conf_args, orig_args):
        """ Return new dict with args, and then conf_args merged in.
            Make sure that any keys in conf_args are also present in args
        """
        args = {}
        for k in conf_args.keys():
            if k not in orig_args:
                print("ERROR: Configuration file has unknown option %s" % k)
                sys.exit(-1)

        args.update(orig_args)
        args.update(conf_args)
        return args


    def get_options(self):
        args = docopt(__doc__)

        if args['--conf']:
            with open(args['--conf']) as f:
                # safe_load avoids constructing arbitrary Python objects from the file
                conf_args = yaml.safe_load(f)
        else:
            conf_args = {}

        args = self.merge_args(conf_args, args)
        print (args)
        schema = Schema({
            'PARAMFILE': Use(open, error='PARAMFILE should be readable'),
            'SPICEDECK': os.path.isfile,
            object: object
            })
        try:
            args = schema.validate(args)
        except SchemaError as e:
            exit(e)

        self.flow = ['sim', 'log']
        if args['all'] == 0:
            if args['sim'] == 0: self.flow.remove('sim')
            if args['log'] == 0: self.flow.remove('log')

        self.parameters = ordered_load(args['PARAMFILE'])
        self.run_dir = args['--rundir']
        self.results_file = args['--resultsfile']
        self.spice_filename = args['SPICEDECK']
        print(self.parameters)


if __name__ == "__main__":
    prog = Sim()
    prog.get_options()

Notice that we've defined a custom loader for reading in the parameter yaml, so that we can keep everything in an OrderedDict that keeps the same order as present in the YAML file. While not strictly necessary, this makes the simulation order predictable for the user.

6   Generating and running the scenarios

Now, we'll introduce a generator function that will yield scenarios based on the parameter values. This function, _get_spice_run, uses the product function from Python's built-in itertools module to generate the cross-product of all the parameter values. Each combination of parameter values is then provided as a context to Mako to substitute into the template simulation file (spice deck). The run method just iterates over this generator function, and calls the run_sim method to run the simulator on each scenario.

"""Simulate

Usage:
        simulate.py [options] PARAMFILE SPICEDECK all
        simulate.py [options] PARAMFILE SPICEDECK (sim|log)...
        simulate.py --conf=FILE
        simulate.py -h

Arguments:
        PARAMFILE   YAML file with variables to iterate over
        SPICEDECK   Mako templated spice deck
        all         Run all steps in the flow
        sim         Run the simulations
        log         Just collate the available results into one file

Options:
        -h --help        show this message
        -v --verbose     show more information
        --rundir=PATH    set path for running simulations in [default: runs]
        --resultsfile=FILE  set filename for writing results to [default: index.txt]
        --conf=FILE      load options from file

"""
from docopt import docopt
import os, sys, subprocess
from schema import Schema, And, Optional, Or, Use, SchemaError
import yaml
from collections import OrderedDict
from mako.template import Template
from mako.lookup import TemplateLookup
from itertools import product

def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
    """ Helper function to allow yaml load routine to use an OrderedDict instead of regular dict.
        This helps keep things sane when ordering the runs and printing out results
    """
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

class Sim(object):


    def merge_args(self, conf_args, orig_args):
        """ Return new dict with args, and then conf_args merged in.
            Make sure that any keys in conf_args are also present in args
        """
        args = {}
        for k in conf_args.keys():
            if k not in orig_args:
                print("ERROR: Configuration file has unknown option %s" % k)
                sys.exit(-1)

        args.update(orig_args)
        args.update(conf_args)
        return args

    def _get_spice_run(self, filename):
        mylookup = TemplateLookup(directories=['.'])
        my_template = Template(filename=filename, lookup=mylookup)
        # For each cross-product of all the parameter values
        for params in product(*self.parameters.values()):

            # Generate a dict of param name to value for the context of the template
            context = OrderedDict(zip(self.parameters.keys(), params))
            yield (context, my_template.render(**context))

    def get_options(self):
        args = docopt(__doc__)

        if args['--conf']:
            with open(args['--conf']) as f:
                # safe_load avoids constructing arbitrary Python objects from the file
                conf_args = yaml.safe_load(f)
        else:
            conf_args = {}

        args = self.merge_args(conf_args, args)
        print (args)
        schema = Schema({
            'PARAMFILE': Use(open, error='PARAMFILE should be readable'),
            'SPICEDECK': os.path.isfile,
            object: object
            })
        try:
            args = schema.validate(args)
        except SchemaError as e:
            exit(e)

        self.flow = ['sim', 'log']
        if args['all'] == 0:
            if args['sim'] == 0: self.flow.remove('sim')
            if args['log'] == 0: self.flow.remove('log')

        self.parameters = ordered_load(args['PARAMFILE'])
        self.run_dir = args['--rundir']
        self.results_file = args['--resultsfile']
        self.spice_filename = args['SPICEDECK']

    def _make_dirs(self, d):
        if not os.path.exists(d):
            os.makedirs(d)

    def run_sim(self, index, parent_dir, spice_deck):
        cwd = os.getcwd()
        run_dir = os.path.join(parent_dir, str(index))
        self._make_dirs(run_dir)
        os.chdir(run_dir)
        with open('run.sp', 'w') as f:
            f.write(spice_deck)
        try:
            output = subprocess.check_output(['hspice', 'run.sp'])
        except subprocess.CalledProcessError as e:
            print("WARNING: %s" % e)
            output = e.output
        os.chdir(cwd)
        return output

    def run(self):
        run_dir = self.run_dir
        self._make_dirs(run_dir)
        for i, (context,spice_deck) in enumerate(self._get_spice_run(self.spice_filename)):
            print(context.values())
            if 'sim' in self.flow:
                output = self.run_sim(i, run_dir, spice_deck)

if __name__ == "__main__":
    prog = Sim()
    prog.get_options()
    prog.run()
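
If you'd like to experiment with the generator pattern without installing Mako or hspice, here is a stripped-down sketch. Note the swap: it uses string.Template from the standard library as a stand-in for Mako (both happen to accept ${name} placeholders), and the two-line deck and the function name scenarios are made up for illustration.

```python
from itertools import product
from string import Template

# Two-line stand-in for the spice deck; string.Template also uses ${name} placeholders
DECK = Template(".temp ${temp}\nvvdd vdd 0 ${vdd}\n")

def scenarios(parameters, template):
    """Yield (context, rendered_text) for every combination of parameter values."""
    names = list(parameters.keys())
    for values in product(*parameters.values()):
        context = dict(zip(names, values))
        yield context, template.substitute(context)

parameters = {"temp": [85, 125], "vdd": [0.6, 0.8]}
runs = list(scenarios(parameters, DECK))

for index, (context, deck) in enumerate(runs):
    print(index, context)
```

Because scenarios is a generator, each deck is only rendered when the loop asks for it, which keeps memory use flat even for large parameter sweeps.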

7   Logging the results and complete script

Now, we just introduce some results parsing and outputting to a CSV file for the log flow step, which gives us our complete script. Notice that the results CSV output is completely data-driven and based on the context dictionary provided for each scenario.

"""Simulate

Usage:
        simulate.py [options] PARAMFILE SPICEDECK all
        simulate.py [options] PARAMFILE SPICEDECK (sim|log)...
        simulate.py --conf=FILE
        simulate.py -h

Arguments:
        PARAMFILE   YAML file with variables to iterate over
        SPICEDECK   Mako templated spice deck
        all         Run all steps in the flow
        sim         Run the simulations
        log         Just collate the available results into one file

Options:
        -h --help        show this message
        -v --verbose     show more information
        --rundir=PATH    set path for running simulations in [default: runs]
        --resultsfile=FILE  set filename for writing results to [default: index.txt]
        --conf=FILE      load options from file

"""
from docopt import docopt
import os, sys, subprocess
from schema import Schema, And, Optional, Or, Use, SchemaError
import yaml
from collections import OrderedDict
from mako.template import Template
from mako.lookup import TemplateLookup
from itertools import product

def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
    """ Helper function to allow yaml load routine to use an OrderedDict instead of regular dict.
        This helps keep things sane when ordering the runs and printing out results
    """
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

class Sim(object):


    def merge_args(self, conf_args, orig_args):
        """ Return new dict with args, and then conf_args merged in.
            Make sure that any keys in conf_args are also present in args
        """
        args = {}
        for k in conf_args.keys():
            if k not in orig_args:
                print("ERROR: Configuration file has unknown option %s" % k)
                sys.exit(-1)

        args.update(orig_args)
        args.update(conf_args)
        return args

    def _get_spice_run(self, filename):
        mylookup = TemplateLookup(directories=['.'])
        my_template = Template(filename=filename, lookup=mylookup)
        # For each cross-product of all the parameter values
        for params in product(*self.parameters.values()):

            # Generate a dict of param name to value for the context of the template
            context = OrderedDict(zip(self.parameters.keys(), params))
            yield (context, my_template.render(**context))

    def get_options(self):
        args = docopt(__doc__)

        if args['--conf']:
            with open(args['--conf']) as f:
                # safe_load avoids constructing arbitrary Python objects from the file
                conf_args = yaml.safe_load(f)
        else:
            conf_args = {}

        args = self.merge_args(conf_args, args)
        print (args)
        schema = Schema({
            'PARAMFILE': Use(open, error='PARAMFILE should be readable'),
            'SPICEDECK': os.path.isfile,
            object: object
            })
        try:
            args = schema.validate(args)
        except SchemaError as e:
            exit(e)

        self.flow = ['sim', 'log']
        if args['all'] == 0:
            if args['sim'] == 0: self.flow.remove('sim')
            if args['log'] == 0: self.flow.remove('log')

        self.parameters = ordered_load(args['PARAMFILE'])
        self.run_dir = args['--rundir']
        self.results_file = args['--resultsfile']
        self.spice_filename = args['SPICEDECK']

    def _make_dirs(self, d):
        if not os.path.exists(d):
            os.makedirs(d)

    def run_sim(self, index, parent_dir, spice_deck):
        cwd = os.getcwd()
        run_dir = os.path.join(parent_dir, str(index))
        self._make_dirs(run_dir)
        os.chdir(run_dir)
        with open('run.sp', 'w') as f:
            f.write(spice_deck)
        try:
            output = subprocess.check_output(['hspice', 'run.sp'])
        except subprocess.CalledProcessError as e:
            print("WARNING: %s" % e)
            output = e.output
        os.chdir(cwd)
        return output

    def get_log(self, index, parent_dir):
        run_dir = os.path.join(parent_dir, str(index))
        output_file = os.path.join(run_dir, 'run.mpp0')
        with open(output_file) as f:
            results_text = f.readlines()
        return results_text

    def get_results(self, results_text):
        # Pull mean/median/stdev from the first line for our measurement
        res = OrderedDict()
        for line in results_text:
            if line.startswith('staticsupplypowervdd'):
                splits = line.split()
                res['mean'] = splits[1]
                res['median'] = splits[2]
                res['stdev'] = splits[3]
                return res
        return None  # measurement line not found

    def run(self):
        run_dir = self.run_dir
        self._make_dirs(run_dir)
        # The context manager closes the results file even if a run fails
        with open(os.path.join(run_dir, self.results_file), 'w') as f:
            param_names = list(self.parameters.keys())
            print(param_names)
            # Header: fixed result columns, then one column per parameter
            f.write(','.join(['Run', 'Mean', 'Median', 'Stdev'] + param_names) + '\n')
            for i, (context, spice_deck) in enumerate(self._get_spice_run(self.spice_filename)):
                print(list(context.values()))

                if 'sim' in self.flow:
                    output = self.run_sim(i, run_dir, spice_deck)

                if 'log' in self.flow:
                    output = self.get_log(i, run_dir)
                    results = self.get_results(output)
                    if results is None:
                        print("WARNING: no staticsupplypowervdd result for run %d" % i)
                        continue
                    row = [str(i)] + list(results.values()) + [str(x) for x in context.values()]
                    f.write(','.join(row) + '\n')
                    f.flush()
                    print(results)

if __name__ == "__main__":
    prog = Sim()
    prog.get_options()
    prog.run()
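
The log step can also be tried on its own, without any simulator output on disk. The sketch below parses a measurement line and writes a CSV row with the standard csv module instead of joining strings by hand. The sample line is made up purely for illustration and is not real hspice output, and parse_measurement is a hypothetical name.

```python
import csv
import io

def parse_measurement(lines, name="staticsupplypowervdd"):
    """Return mean/median/stdev strings from the first line starting with `name`, or None."""
    for line in lines:
        if line.startswith(name):
            fields = line.split()
            return {"mean": fields[1], "median": fields[2], "stdev": fields[3]}
    return None

# Made-up input for illustration only; not real simulator output
sample_lines = [
    "some header text\n",
    "staticsupplypowervdd 1.0e-06 9.0e-07 2.0e-07\n",
]
context = {"temp": 85, "vdd": 0.6, "corner": "TT"}

buffer = io.StringIO()  # stands in for the results file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Run", "Mean", "Median", "Stdev"] + list(context.keys()))

results = parse_measurement(sample_lines)
if results is not None:   # skip runs whose measurement line is missing
    writer.writerow([0] + list(results.values()) + list(context.values()))

csv_text = buffer.getvalue()
print(csv_text)
```

Using csv.writer has the side benefit of quoting any parameter value that happens to contain a comma.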

© Virantha Ekanayake.