fst
Version 0.3.2
Introduction
pfst (Python Formatted Syntax Tree) exists in order to allow quick and easy modification of Python source without
losing formatting or comments. The goal is simple, Pythonic, container-like access to the AST, with the ability to
modify any node while preserving formatting in the rest of the tree.
Yes, we said "formatting" and "AST" in the same sentence.
Normally AST nodes don't store any explicit formatting, much less comments. pfst works by attaching an FST node to
each existing Python AST node as an .f attribute (a type-safe accessor, castf(), is also provided). The FST nodes keep
extra structure information and the original source, and provide the interface to format-preserving operations. Each
operation through FST nodes is a simultaneous edit of the AST tree and the source code, and the two are kept
synchronized so that the current source will always parse to the current tree.
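For contrast, the sketch below uses only the standard library ast module (no pfst calls) to show what is lost without FST nodes: a plain parse and unparse round trip keeps the tree but drops the comment and the original layout. The variable names are illustrative.

```python
import ast

# Source with a trailing comment and a one-line function body.
source = "def func(): pass  # comment\n"

tree = ast.parse(source)
regenerated = ast.unparse(tree)  # rebuilt from the tree alone

print(regenerated)

# The comment is gone because plain AST nodes never stored it.
comment_survived = "# comment" in regenerated

# The structure itself is intact: both texts parse to the same tree.
same_structure = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```

The regenerated text is valid Python with the same structure, but nothing of the author's formatting remains.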
pfst automatically handles:
- Operator precedence and parentheses
- Indentation and line continuations
- Commas, semicolons, and tuple edge cases
- Comments and docstrings
- Various Python version-specific syntax quirks
- Lots more...
If you just want to dive into the examples then go to fst.docs.d14_examples.
Getting Started
Since pfst is built directly on Python's standard AST nodes, if you are familiar with those then you already know
the FST node structure. Our focus on simple Pythonic operations means you can get up to speed quickly.
- Parse source
>>> import ast, fst # pip install pfst, import fst
>>> a = fst.parse('def func(): pass # comment')
- Modify via .f
>>> f = a.body[0].f
>>> f.returns = ast.Name('int') # use nodes or text
>>> f.args.append('arg: int = 0')
>>> f.body.extend('call() # call comment\n\nreturn arg')
>>> f.put_docstr("I'm a happy\nlittle docstring")
>>> f.body[1:1] = '\n'
- View formatted source
>>> print(f.src)
def func(arg: int = 0) -> int:
    """I'm a happy
    little docstring"""
    pass # comment
    call() # call comment
    return arg
- Verify AST synchronization
>>> print(ast.unparse(a))
def func(arg: int=0) -> int:
    """I'm a happy
    little docstring"""
    pass
    call()
    return arg
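The synchronization property shown above can be stated with the standard library alone: the current source, parsed fresh, should produce the same tree as the one being edited. The helper below is a hypothetical illustration of that invariant, not pfst's own verification code.

```python
import ast


def source_matches_tree(source: str, tree: ast.AST) -> bool:
    """Return True if parsing `source` yields a tree structurally equal to `tree`.

    ast.dump() omits line and column attributes by default, so differences in
    formatting or comments do not affect the comparison.
    """
    return ast.dump(ast.parse(source)) == ast.dump(tree)


tree = ast.parse("x = 1  # keep me")

print(source_matches_tree("x=1", tree))    # same structure, different layout
print(source_matches_tree("x = 2", tree))  # different constant
```

Extra instance attributes such as .f are not AST fields, so ast.dump() ignores them and a comparison of this kind is unaffected by their presence.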
Beyond basic editing, pfst provides syntax-ordered traversal, scope symbol analysis, structural pattern matching and
substitution, and a mechanism for reconciling external AST mutations with the formatted tree, preserving comments and
layout wherever the structure still permits it.
Here is an example of more advanced substitution usage.
>>> from fst.match import *
>>> print(FST('i = j.k = a + b[c]').sub(Mexpr(ctx=Load), 'log(__FST_)', True).src)
i = log(j).k = log(a) + log(log(b)[log(c)])
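To make clear what this substitution does, here is a rough standard-library counterpart written with ast.NodeTransformer. It illustrates the transformation only, not how pfst implements sub(): it wraps every expression that has a Load context in a log() call, visiting children first so that nested expressions are wrapped too. The class name WrapLoads is made up for this sketch.

```python
import ast


class WrapLoads(ast.NodeTransformer):
    """Wrap each expression with a Load context in a call to log()."""

    def visit(self, node):
        # Transform children first so inner expressions are wrapped before
        # their parents.
        node = self.generic_visit(node)

        if isinstance(getattr(node, "ctx", None), ast.Load):
            # The new `log` Name is created after traversal of this subtree,
            # so it is never visited and never wrapped itself.
            return ast.Call(func=ast.Name("log", ast.Load()), args=[node], keywords=[])

        return node


tree = WrapLoads().visit(ast.parse("i = j.k = a + b[c]"))
result = ast.unparse(tree)

print(result)
```

For this input the sketch yields the same text as the pfst one-liner, but because the result is regenerated by ast.unparse(), any comments or custom layout in a real input would be lost.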
Comparison with LibCST
LibCST is a powerful, industrial-grade library built for precise, large-scale codemods. It models Python as a fully concrete syntax tree, preserving every token and piece of whitespace. That gives you complete control, but it also means you are responsible for managing formatting details when making non-trivial changes.
pfst takes a different approach. Instead of requiring you to explicitly manage formatting, it treats layout as something to preserve and reconcile automatically. You focus on structural and semantic transformations and pfst handles the "formatting math" needed to keep the result clean and stable.
In short:
- LibCST: “Do exactly what I say.”
- pfst: “Do what I mean.”
Here is a concrete example with minimal code for both LibCST and pfst to add a keyword argument with a comment to all
logger.info() calls which don't already have the specific keyword argument correlation_id.
LibCST function:
from libcst import *
from libcst.matchers import *
def inject_logging_metadata(src: str) -> str:
    module = parse_module(src)
    new_arg = (parse_module('f(correlation_id=CID # blah\n)')
               .body[0].body[0].value.args[0])
    class AddArg(CSTTransformer):
        def leave_Call(self, _, node):
            if matches(node.func, Attribute(Name('logger'), Name('info'))):
                if not any(a.keyword and a.keyword.value == 'correlation_id'
                           for a in node.args):
                    return node.with_changes(args=[*node.args, new_arg])
            return node
    module = module.visit(AddArg())
    return module.code
pfst function:
from fst import *
from fst.match import *
def inject_logging_metadata(src: str) -> str:
    module = FST(src)
    for m in module.search(MCall(
        func=MAttribute('logger', 'info'),
        keywords=MNOT([MQSTAR, Mkeyword('correlation_id'), MQSTAR]),
    )):
        m.matched.append('correlation_id=CID # blah', trivia=())
    return module.src
Input source:
logger.info('Hello world...') # hey
logger.info('Already have id', correlation_id=other_cid) # ho
logger.info() # its off to work we go
class cls:
def method(self, thing, extra):
(logger) . info( # start
f'a {thing}', # this is fine
extra=extra, # also this
) # end
LibCST output:
logger.info('Hello world...', correlation_id=CID # blah
) # hey
logger.info('Already have id', correlation_id=other_cid) # ho
logger.info(correlation_id=CID # blah
) # its off to work we go
class cls:
def method(self, thing, extra):
(logger) . info( # start
f'a {thing}', # this is fine
extra=extra, # also this
correlation_id=CID # blah
) # end
pfst output:
logger.info('Hello world...', correlation_id=CID # blah
) # hey
logger.info('Already have id', correlation_id=other_cid) # ho
logger.info(correlation_id=CID # blah
) # its off to work we go
class cls:
def method(self, thing, extra):
(logger) . info( # start
f'a {thing}', # this is fine
extra=extra, # also this
correlation_id=CID # blah
) # end
If you want LibCST to align the argument, it's significantly more code, but that can be left to a formatter after the file processing.
Notes
Disclaimer: You can reformat a large codebase with pfst, but it won't be quite as sprightly as other libraries better suited to the task. The main focus of pfst is not necessarily speed but ease of use and correct handling of all the weird cases of Python syntax, so that functional code always results. Use a formatter as needed afterwards.
pfst was written and tested on Python versions 3.10 through 3.15a.
pfst does not do any parsing of its own but rather relies on the builtin Python parser. This means you get perfect parsing but also that it is limited to the syntax of the running Python version (many options exist for running any specific version of Python).
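Because parsing is delegated to the running interpreter, whether a given piece of source is accepted depends on that interpreter's version. The sketch below shows this with the standard library parser alone. The helper name parses() is hypothetical, and the type alias statement is used only as an example of syntax introduced in Python 3.12.

```python
import ast
import sys


def parses(source: str) -> bool:
    """Return True if the running interpreter's parser accepts `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The `type X = int` statement is only valid syntax on Python 3.12 and later.
supports_type_alias = sys.version_info >= (3, 12)

print(parses("x = 1"))
print(parses("type X = int") == supports_type_alias)
```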
pfst validates for parsability, not compilability. This means that *a, *b = c and def f(a, a): pass are both
considered valid even though they are uncompilable.
If you will be playing with pfst then the FST.dump() method will be your friend.
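The parsability versus compilability distinction noted above can be observed with the standard library alone, since ast.parse() stops after parsing while compile() also runs the later compiler checks. This is a minimal sketch; the helper names are hypothetical and no pfst API is used.

```python
import ast


def parses(source: str) -> bool:
    """Return True if the source can be parsed into an AST."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def compiles(source: str) -> bool:
    """Return True if the source also passes the compiler's checks."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Both snippets are accepted by the parser but rejected at compile time.
examples = ["*a, *b = c", "def f(a, a): pass"]
results = {src: (parses(src), compiles(src)) for src in examples}

for src, (can_parse, can_compile) in results.items():
    print(f"{src!r}: parses={can_parse}, compiles={can_compile}")
```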