Developer Guide¶
- Created
2022-04-19
- Last Updated
2022-04-28
This document aims to be a guide for those contributing to the repository.
Local Development¶
Prerequisites:
git for version control
Python (3.9 or higher) for development
poetry for Python dependency management
Getting the source code¶
$ git clone git@github.com:sourcery-ai/sourcery-analytics.git
Install Dependencies¶
From the top-level sourcery-analytics
directory:
$ poetry install
Note
you should already have installed poetry as indicated in the prerequisites.
Run Tests¶
From the top-level sourcery-analytics
directory:
$ poetry run pytest
Run tests with a coverage report, including missing lines:
$ poetry run pytest --cov=sourcery_analytics --cov-report term-missing
Build this documentation¶
$ poetry run sphinx-apidoc -eMTf --templatedir ./docs/source/_templates/apidoc -o docs/source/api sourcery_analytics
$ poetry run sphinx-build -b html docs/source docs/build
Releases¶
Create a new section in CHANGELOG.md
Add a new release in GitHub (see release documentation), creating a new tag
CI in
on_release.yml
will apply the tag as the build version and publish to PyPi
Architecture¶
Parsing¶
In order to analzye code, we need to parse it into a structure we can manipulate. Code is typically parsed into an Abstract Syntax Tree, or AST 1. Python provides a standard implementation 2 to parse code into an AST, but it misses features we need for analysis, most notably a link from a child node to its parent.
As a result, we’ve opted to use astroid as the principal parser. As well as providing an enhanced AST, astroid provides several convenient parsing functions which make testing and developing interfaces much easier than the built-in Python parser.
Visitors¶
The Visitor pattern 3 is a well-known pattern for analyzing trees of any type. At a high-level, it separates the calculation over elements (also known as nodes) within the tree from the calculation handling the traversal of the tree.
For code analysis, we typically need to calculate some property of a node, such as its “complexity”, with respect to the context of the node, for instance whether or not it is in a conditional.
Underlying the high-level analysis in sourcery-analytics
is a set of generic visitors which operate
on astroid’s ASTs (see visitors
). Visitors implement two methods: _enter
, which handles the context,
and _touch
which returns a “fact” about the node, based on the context. Keeping these separate helps us be very
clear about how the calculation works. For an example, see cognitive_complexity
, in which the visitor
increments its context penalty for nested structures, and returns the complexity of individual nodes.
What about walking the tree? Well, the list of sub-nodes of a node is a “fact” about that node, so we can implement
the walker as a visitor! This is the job of the TreeVisitor
which is used throughout the codebase.
Let’s dig a bit further into how the TreeVisitor
works, as it’s important for development.
The Tree Visitor¶
Let’s set up a short example.
>>> import astroid
>>> from sourcery_analytics.visitors import TreeVisitor
>>> src = '''
... def add(x, y):
... z = x + y
... return z
... '''
>>> node = astroid.extract_node(src)
>>> tree_visitor = TreeVisitor()
By default, the TreeVisitor
will return every sub-node of the node as an iterator.
>>> tree_visitor.visit(node)
<generator object TreeVisitor._visit at 0x...>
>>> list(tree_visitor.visit(node))
[<FunctionDef.add l.2 at 0x...>, <Arguments l.2 at 0x...>, <AssignName.x l.2 at 0x...>, <AssignName.y l.2 at 0x...>, <Assign l.3 at 0x...>, <AssignName.z l.3 at 0x...>, <BinOp l.3 at 0x...>, <Name.x l.3 at 0x...>, <Name.y l.3 at 0x...>, <Return l.4 at 0x...>, <Name.z l.4 at 0x...>]
Instead of returning the nodes, we can use a sub-visitor to return alternative information.
One useful generic visitor is the FunctionVisitor
which wraps a function for use as a visitor.
Let’s return just the name of each node in the tree:
>>> from sourcery_analytics.visitors import FunctionVisitor
>>> name_visitor = FunctionVisitor(lambda node: node.__class__.__name__)
>>> tree_visitor = TreeVisitor(name_visitor)
>>> list(tree_visitor.visit(node))
['FunctionDef', 'Arguments', 'AssignName', 'AssignName', 'Assign', 'AssignName', 'BinOp', 'Name', 'Name', 'Return', 'Name']
How about counting the nodes in the tree? The philosophy in sourcery-analytics is to break this down:
Question: what is number of nodes in one node? Answer: 1
Question: how do we aggregate in that case? Answer:
sum()
.
>>> tree_visitor = TreeVisitor(FunctionVisitor(lambda node: 1), sum)
>>> tree_visitor.visit(node)
11
Of course, there are other ways to calculate this, but the flexibility of the tree visitor means it is useful
throughout sourcery-analytics
. See the source for extractors
, analysis
, or
metrics.cognitive_complexity
for some examples.