AST-Based Code Formatting Tools | Web Formatter Blog

Introduction to AST-Based Formatting

Code formatting is a critical aspect of software development that ensures readability, consistency, and maintainability. While traditional formatters often rely on regular expressions and character manipulation, AST-based formatters take a more sophisticated approach by working with the code's underlying structure. This guide explores how Abstract Syntax Tree (AST) based formatting tools function and why they offer superior results for modern development workflows.

What is an Abstract Syntax Tree?

An Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code. Each node of the tree denotes a construct in the source code:

The root node represents the entire program or module
Branch nodes represent control structures, functions, classes, etc.
Leaf nodes represent variables, literals, operators, etc.

Unlike the raw text of source code, an AST captures the syntactic structure of the code and the hierarchical relationships between its elements. This structured representation allows tools to analyze and transform code according to the language's grammar rather than just manipulating text.

// This JavaScript code:
function add(a, b) {
  return a + b;
}

// Becomes an AST structure (simplified):
{
  "type": "Program",
  "body": [{
    "type": "FunctionDeclaration",
    "id": {
      "type": "Identifier",
      "name": "add"
    },
    "params": [
      {
        "type": "Identifier",
        "name": "a"
      },
      {
        "type": "Identifier",
        "name": "b"
      }
    ],
    "body": {
      "type": "BlockStatement",
      "body": [{
        "type": "ReturnStatement",
        "argument": {
          "type": "BinaryExpression",
          "operator": "+",
          "left": {
            "type": "Identifier",
            "name": "a"
          },
          "right": {
            "type": "Identifier",
            "name": "b"
          }
        }
      }]
    }
  }]
}
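
To make the tree structure concrete, here is a minimal sketch that walks a tree shaped like the one above and collects every identifier name. The `collectIdentifiers` helper is hypothetical (it is not part of any library), and it assumes every node is a plain object with a `type` field, as in ESTree-style trees.

```javascript
// Hypothetical helper: gather the names of all Identifier nodes in an ESTree-style AST.
function collectIdentifiers(node, names = []) {
  if (Array.isArray(node)) {
    node.forEach(child => collectIdentifiers(child, names));
  } else if (node && typeof node === 'object') {
    if (node.type === 'Identifier') names.push(node.name);
    // Recurse into every property; strings and numbers are ignored above.
    Object.values(node).forEach(child => collectIdentifiers(child, names));
  }
  return names;
}

// A trimmed version of the tree shown above: return a + b;
const returnStatement = {
  type: 'ReturnStatement',
  argument: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'a' },
    right: { type: 'Identifier', name: 'b' }
  }
};

console.log(collectIdentifiers(returnStatement)); // [ 'a', 'b' ]
```

The same question asked of raw text ("which names does this code use?") would need to avoid keywords, strings, and comments; on the tree it is a simple filter on node type.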

AST vs. Regex-Based Formatting

To understand the advantages of AST-based formatters, let's compare them with traditional regex-based approaches:

Feature	Regex-Based Formatters	AST-Based Formatters
Code Understanding	Treats code as text patterns	Understands code semantics and structure
Language Support	Often language-specific with limited support	Can support multiple languages with appropriate parsers
Accuracy	Can cause syntax errors with complex constructs	Preserves code semantics and avoids syntax errors
Transformation Capabilities	Limited to text replacement operations	Can perform complex code transformations
Edge Cases	Struggles with comments, strings, and complex syntax	Properly handles code context and special cases
Performance	Generally faster for simple operations	May be slower but more reliable for complex codebases
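
The "Edge Cases" row is easiest to see with a concrete failure. The sketch below uses a hypothetical regex rule, `spaceAroundPlus`, that puts spaces around every `+`. Because it sees only characters, it also rewrites the contents of a string literal, which changes what the program does.

```javascript
// Hypothetical regex-based rule: put one space on each side of every "+".
function spaceAroundPlus(source) {
  return source.replace(/\s*\+\s*/g, ' + ');
}

const source = 'const label = "a+b"; const sum = a+b;';
const formatted = spaceAroundPlus(source);

console.log(formatted);
// const label = "a + b"; const sum = a + b;
// The expression was formatted as intended, but the string literal "a+b"
// was changed too, so the program no longer behaves the same.
```

An AST-based formatter sees the first `+` as part of a string literal node rather than an operator, so it leaves the string untouched.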

Benefits of AST-Based Formatters

AST-based formatting tools offer several significant advantages:

Semantic Awareness: Understand code's meaning, not just its text representation
Consistency: Apply formatting rules with greater consistency across complex codebases
Safety: Preserve code functionality while changing its formatting
Extensibility: Provide a foundation for additional code analysis and transformation
Language Agnosticism: With appropriate parsers, the same concepts apply across languages
Integration: Work well with other tools in modern development ecosystems

Popular AST-Based Formatting Tools

Several widely used tools rely on ASTs for code formatting and transformation:

Prettier

Prettier is an opinionated code formatter that supports multiple languages. It parses code into an AST, then regenerates the code with consistent formatting.

// Install Prettier
npm install --save-dev prettier

// Format a file
npx prettier --write source.js

// Configuration in .prettierrc.json
{
  "printWidth": 100,
  "tabWidth": 2,
  "singleQuote": true,
  "trailingComma": "es5",
  "bracketSpacing": true,
  "semi": true
}

ESLint

While primarily a linting tool, ESLint uses an AST to analyze JavaScript code and can automatically fix many formatting issues. Its pluggable architecture allows extension with custom rules.

// Install ESLint
npm install --save-dev eslint

// Initialize ESLint configuration
npx eslint --init

// Run ESLint with auto-fix
npx eslint --fix .

// Example rules in .eslintrc.json (the legacy config format; ESLint 9 and later default to eslint.config.js)
{
  "rules": {
    "indent": ["error", 2],
    "quotes": ["error", "single"],
    "semi": ["error", "always"]
  }
}

Babel

Babel is a JavaScript compiler that transforms code by rewriting its AST. Beyond its primary transpilation role, it provides tools for AST manipulation that can be used for formatting.

// Simple Babel plugin for code transformation
module.exports = function(babel) {
  const { types: t } = babel;
  
  return {
    visitor: {
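      // Transform var declarations to let
      // (var is function-scoped and let is block-scoped, so this is only safe
      // when the code does not rely on var's function scoping or hoisting)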
      VariableDeclaration(path) {
        if (path.node.kind === 'var') {
          path.node.kind = 'let';
        }
      }
    }
  };
};

TypeScript Compiler

The TypeScript compiler builds an AST for type checking and transpilation to JavaScript. Its API can be used to build formatting tools with type awareness.

// Example using TypeScript's compiler API to parse TypeScript code
import * as ts from 'typescript';

const sourceFile = ts.createSourceFile(
  'example.ts',
  'function greet(name: string) { return "Hello, " + name; }',
  ts.ScriptTarget.Latest
);

// Visit and process nodes in the AST
ts.forEachChild(sourceFile, node => {
  if (ts.isFunctionDeclaration(node) && node.name) {
    console.log(`Found function: ${node.name.text}`);
  }
});

Creating a Custom AST Parser

Understanding how to create a basic AST parser provides insight into how formatting tools work internally.

Steps in AST Parsing

Building the tree typically involves three steps:

Lexical Analysis (Tokenization): Break the source code into tokens (keywords, identifiers, operators, etc.)
Syntactic Analysis (Parsing): Analyze tokens according to grammar rules to create the syntax tree
Semantic Analysis: Verify the parsed tree follows language-specific rules

// Simplified example of a basic parser in JavaScript
function tokenize(code) {
  // Convert code string into tokens
  const tokens = [];
  // ...tokenization logic
  return tokens;
}

function parse(tokens) {
  // Convert tokens into an AST
  const ast = { type: 'Program', body: [] };
  // ...parsing logic
  return ast;
}

function compile(code) {
  const tokens = tokenize(code);
  const ast = parse(tokens);
  return ast;
}
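
The skeleton above leaves the tokenization and parsing logic elided. As a rough illustration of what could go there, the following sketch handles a deliberately tiny language of identifiers, integers, `+` and `*`. The grammar, the token shapes, and the helper names inside `parse` are assumptions made for this example; a real JavaScript parser is far larger.

```javascript
// Tokenizer for a deliberately tiny language: identifiers, integers, "+" and "*".
function tokenize(code) {
  const tokens = [];
  const pattern = /\s*([A-Za-z_]\w*|\d+|[+*])\s*/y; // sticky: matches only at lastIndex
  while (pattern.lastIndex < code.length) {
    const start = pattern.lastIndex;
    const match = pattern.exec(code);
    if (!match) throw new SyntaxError(`Unexpected input near position ${start}`);
    const value = match[1];
    const type = /^\d/.test(value) ? 'Number' : /^[+*]$/.test(value) ? 'Operator' : 'Identifier';
    tokens.push({ type, value });
  }
  return tokens;
}

// Recursive-descent parser: "*" binds tighter than "+", both are left-associative.
function parse(tokens) {
  let pos = 0;

  function parsePrimary() {
    const token = tokens[pos++];
    if (!token || token.type === 'Operator') {
      throw new SyntaxError('Expected an identifier or a number');
    }
    return token.type === 'Number'
      ? { type: 'Literal', value: Number(token.value) }
      : { type: 'Identifier', name: token.value };
  }

  // Builds a chain of BinaryExpression nodes for one operator at one precedence level.
  function parseBinary(operator, parseOperand) {
    let left = parseOperand();
    while (pos < tokens.length && tokens[pos].type === 'Operator' && tokens[pos].value === operator) {
      pos++;
      left = { type: 'BinaryExpression', operator, left, right: parseOperand() };
    }
    return left;
  }

  const parseTerm = () => parseBinary('*', parsePrimary);
  const parseSum = () => parseBinary('+', parseTerm);

  const expression = parseSum();
  if (pos < tokens.length) throw new SyntaxError(`Unexpected token ${tokens[pos].value}`);
  return { type: 'Program', body: [{ type: 'ExpressionStatement', expression }] };
}

const ast = parse(tokenize('a + b * 2'));
// "*" binds tighter than "+", so the tree represents a + (b * 2).
console.log(JSON.stringify(ast.body[0].expression, null, 2));
```

Operator precedence lives in the shape of the tree, which is why a formatter working from the AST can change spacing freely without changing how an expression is grouped.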

AST Transformation

Once the AST is created, we can traverse and transform it to apply formatting rules:

function transform(ast) {
  // Visitor pattern for traversing and modifying the AST
  function visit(node) {
    // Apply transformations based on node type
    switch (node.type) {
      case 'FunctionDeclaration':
        // Format function declarations
        // (formatParameters is a placeholder for a helper that is not defined here)
        node.params = formatParameters(node.params);
        break;
      // Handle other node types...
    }
    
    // Recursively visit child nodes
    Object.keys(node).forEach(key => {
      const child = node[key];
      if (child && typeof child === 'object') {
        if (Array.isArray(child)) {
          child.forEach(item => {
            if (item && typeof item === 'object') {
              visit(item);
            }
          });
        } else {
          visit(child);
        }
      }
    });
    
    return node;
  }
  
  return visit(ast);
}
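
For a version of this traversal that runs end to end, the sketch below applies one concrete formatting rule: prefer single quotes for string literals. The `walk` and `preferSingleQuotes` functions are hypothetical names, and the example assumes ESTree-style `Literal` nodes that keep their original source text in a `raw` property.

```javascript
// Generic traversal: call visitor(node) for every object that has a string `type`.
function walk(node, visitor) {
  if (Array.isArray(node)) {
    node.forEach(child => walk(child, visitor));
    return;
  }
  if (!node || typeof node !== 'object') return;
  if (typeof node.type === 'string') visitor(node);
  Object.values(node).forEach(child => walk(child, visitor));
}

// Hypothetical rule: rewrite double-quoted string literals to single quotes.
function preferSingleQuotes(node) {
  if (node.type !== 'Literal' || typeof node.value !== 'string') return;
  if (typeof node.raw !== 'string' || node.raw[0] !== '"') return;
  const inner = node.raw.slice(1, -1);
  // Skip strings containing quotes or escapes, where a plain swap would be unsafe.
  if (/['\\]/.test(inner)) return;
  node.raw = "'" + inner + "'";
}

// Simplified AST for: const greeting = "hello";
const ast = {
  type: 'VariableDeclaration',
  kind: 'const',
  declarations: [{
    type: 'VariableDeclarator',
    id: { type: 'Identifier', name: 'greeting' },
    init: { type: 'Literal', value: 'hello', raw: '"hello"' }
  }]
};

walk(ast, preferSingleQuotes);
console.log(ast.declarations[0].init.raw); // 'hello'
```

Only the `raw` text changes; the literal's `value` stays the same, which is what keeps a formatting change from altering behavior.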

Code Generation

Finally, we convert the transformed AST back into formatted code:

function generate(ast) {
  // Convert AST back to code
  let code = '';
  
  function emit(node, indent = 0) {
    const indentation = ' '.repeat(indent);
    
    switch (node.type) {
      case 'Program':
        node.body.forEach(item => {
          emit(item, indent);
          code += '\n';
        });
        break;
      case 'FunctionDeclaration':
        code += indentation + 'function ' + node.id.name + '(';
        code += node.params.map(p => p.name).join(', ');
        code += ') {\n';
        emit(node.body, indent + 2);
        code += indentation + '}';
        break;
      // Handle other node types...
    }
  }
  
  emit(ast);
  return code;
}

// Complete formatting process
function format(sourceCode) {
  const ast = compile(sourceCode);
  const transformedAst = transform(ast);
  return generate(transformedAst);
}
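
The generator above handles only two node types. As a sketch of how the remaining cases might look, the following version covers every node type in the `add` example from earlier in this article and returns strings instead of appending to a shared variable. It is an illustration under those assumptions, not how any particular formatter is implemented.

```javascript
// Sketch of a code generator for the node types used in the `add` example.
// No parentheses are emitted, so nested expressions with mixed precedence are out of scope.
function generate(node, indent = 0) {
  const pad = ' '.repeat(indent);
  switch (node.type) {
    case 'Program':
      return node.body.map(item => generate(item, indent)).join('\n') + '\n';
    case 'FunctionDeclaration': {
      const params = node.params.map(p => generate(p)).join(', ');
      return `${pad}function ${node.id.name}(${params}) ${generate(node.body, indent)}`;
    }
    case 'BlockStatement': {
      if (node.body.length === 0) return '{}';
      const lines = node.body.map(item => generate(item, indent + 2));
      return `{\n${lines.join('\n')}\n${pad}}`;
    }
    case 'ReturnStatement':
      return `${pad}return ${generate(node.argument)};`;
    case 'BinaryExpression':
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case 'Identifier':
      return node.name;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const ast = {
  type: 'Program',
  body: [{
    type: 'FunctionDeclaration',
    id: { type: 'Identifier', name: 'add' },
    params: [{ type: 'Identifier', name: 'a' }, { type: 'Identifier', name: 'b' }],
    body: {
      type: 'BlockStatement',
      body: [{
        type: 'ReturnStatement',
        argument: {
          type: 'BinaryExpression',
          operator: '+',
          left: { type: 'Identifier', name: 'a' },
          right: { type: 'Identifier', name: 'b' }
        }
      }]
    }
  }]
};

console.log(generate(ast));
// function add(a, b) {
//   return a + b;
// }
```

Because the output is built only from the tree, the result is the same however the original source was spaced or indented.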

Real-World Use Cases

AST-based tools extend beyond basic formatting to enable various code transformation scenarios:

Linting and Static Analysis

ASTs enable deep analysis of code to detect potential bugs, enforce style guides, and identify anti-patterns. Tools like ESLint use the AST to understand code flow and relationships.

// ESLint rule to enforce proper error handling
module.exports = {
  create: function(context) {
    return {
      CatchClause: function(node) {
        if (node.body.body.length === 0) {
          context.report({
            node: node,
            message: "Empty catch block is not allowed"
          });
        }
      }
    };
  }
};

Automated Refactoring

AST-based tools can safely rename variables, extract functions, and perform other complex refactorings while preserving code semantics.

// Using jscodeshift for automated refactoring
module.exports = function(fileInfo, api) {
  const j = api.jscodeshift;
  const root = j(fileInfo.source);

  // Find all instances of jQuery's $.ajax() and convert to fetch
  return root
    .find(j.CallExpression, {
      callee: {
        type: 'MemberExpression',
        object: { type: 'Identifier', name: '$' },
        property: { type: 'Identifier', name: 'ajax' }
      }
    })
    .replaceWith(path => {
      const ajaxCall = path.node.arguments[0];
      // Transform $.ajax() to fetch()
      // ...transformation logic
      return j.callExpression(
        j.identifier('fetch'),
        [/* transformed arguments */]
      );
    })
    .toSource();
};

Code Minification

AST-based minifiers like Terser analyze code structure to perform optimizations that would be unsafe with simple text manipulation.

// Original code
function calculateTotal(items) {
  const TAX_RATE = 0.07;
  let result = 0;
  for (let i = 0; i < items.length; i++) {
    result += items[i].price;
  }
  return result * (1 + TAX_RATE);
}

// After AST-based minification
function calculateTotal(a){const b=.07;let c=0;for(let d=0;d<a.length;d++)c+=a[d].price;return c*(1+b)}



Transpilation

Transpilers like Babel convert modern JavaScript to backward-compatible versions by transforming the AST.

// Modern JavaScript (ES2022)
const getUser = async (id) => {
  try {
    const response = await fetch(`/api/users/${id}`);
    if (!response.ok) throw new Error('User not found');
    return await response.json();
  } catch (error) {
    console.error(error?.message ?? 'Unknown error');
    return null;
  }
};

// Transpiled to ES5
"use strict";
var getUser = function getUser(id) {
  return regeneratorRuntime.async(function getUser$(_context) {
    while (1) {
      switch (_context.prev = _context.next) {
        case 0:
          _context.prev = 0;
          _context.next = 3;
          return regeneratorRuntime.awrap(fetch("/api/users/".concat(id)));
        // ...rest of transpiled code
      }
    }
  });
};

Challenges and Limitations

Despite their advantages, AST-based formatters face several challenges:

Performance Overhead: Parsing code into an AST and generating code again is more computationally intensive than simple text manipulation
Comment Preservation: Maintaining the position and association of comments when transforming ASTs is notoriously difficult
Whitespace Control: Precise control over whitespace can be challenging when generating code from an AST
Custom Syntax: Code with non-standard syntax (like JSX) requires specialized parsers
Configuration Complexity: More sophisticated tools often have more complex configuration options
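
Comment preservation is hard because comments are usually not part of the tree, so a tool has to decide which node each comment belongs to. The sketch below shows one simple heuristic: attach a comment to the next node that starts after it, or to the last node if none follows. The `attachComments` helper and the `start`/`end` offsets are assumptions for illustration; real formatters use more elaborate rules.

```javascript
// Hypothetical helper: attach each comment to the nearest following node,
// or to the last node when nothing follows. Assumes nodes are sorted by start offset.
function attachComments(nodes, comments) {
  for (const comment of comments) {
    const next = nodes.find(node => node.start >= comment.end);
    const target = next || nodes[nodes.length - 1];
    if (!target) continue; // no nodes at all: nothing to attach to
    const key = next ? 'leadingComments' : 'trailingComments';
    target[key] = target[key] || [];
    target[key].push(comment.text);
  }
  return nodes;
}

// Offsets a parser might report for this source:
//   // setup
//   let a = 1;
//   let b = 2; // done
const nodes = [
  { type: 'VariableDeclaration', start: 9, end: 19 },
  { type: 'VariableDeclaration', start: 20, end: 30 }
];
const comments = [
  { text: '// setup', start: 0, end: 8 },
  { text: '// done', start: 31, end: 38 }
];

attachComments(nodes, comments);
console.log(nodes[0].leadingComments);  // [ '// setup' ]
console.log(nodes[1].trailingComments); // [ '// done' ]
```

The heuristic already shows the ambiguity: a comment sitting between two statements could describe either one, and the tool has to guess.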
        
      

Performance Considerations

When working with AST-based formatters, consider these performance optimizations:

Incremental Parsing: Only re-parse files that have changed
Caching: Cache AST representations to avoid redundant parsing
Parallelization: Process multiple files concurrently
Selective Formatting: Format only the necessary parts of large files
Memory Management: Large ASTs can consume significant memory; consider streaming approaches
        
      
// Performance optimization with worker threads
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const fs = require('fs');
const prettier = require('prettier');

if (isMainThread) {
  // Main thread distributes work to worker threads (one worker per file here;
  // a fixed-size pool scales better for long file lists)
  function formatInParallel(files) {
    return Promise.all(files.map(file => {
      return new Promise((resolve, reject) => {
        const worker = new Worker(__filename, { workerData: file });
        worker.on('message', resolve);
        worker.on('error', reject);
      });
    }));
  }
} else {
  // Worker thread formats a single file
  const filePath = workerData;
  prettier.resolveConfig(filePath)
    // Passing filepath lets Prettier infer the parser from the file extension
    .then(options => prettier.format(fs.readFileSync(filePath, 'utf8'), { ...options, filepath: filePath }))
    .then(formattedCode => parentPort.postMessage({ file: filePath, code: formattedCode }));
}
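
The caching suggestion from the list above can be sketched in a few lines. This example keys the cache on a hash of the source text, so unchanged content is never parsed twice. `createParseCache` is a hypothetical helper, and `fakeParse` is a stand-in for whatever parser a real tool would call.

```javascript
const crypto = require('crypto');

// Hypothetical cache: maps a SHA-256 hash of the source text to its parsed AST.
function createParseCache(parse) {
  const cache = new Map();
  return function cachedParse(source) {
    const key = crypto.createHash('sha256').update(source).digest('hex');
    if (!cache.has(key)) cache.set(key, parse(source));
    return cache.get(key);
  };
}

// Stand-in parser that counts how often it is actually invoked.
let parseCalls = 0;
function fakeParse(source) {
  parseCalls++;
  return { type: 'Program', sourceLength: source.length };
}

const cachedParse = createParseCache(fakeParse);
cachedParse('let a = 1;');
cachedParse('let a = 1;'); // same content: served from the cache
cachedParse('let b = 2;'); // different content: parsed again
console.log(parseCalls); // 2
```

One design caveat: the cache hands back the same AST object each time, so a transform that mutates the tree should work on a copy.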

Best Practices

To get the most from AST-based formatters in your development workflow:

Standardize: Use the same formatter across your entire codebase
Automate: Configure formatters to run automatically on save or pre-commit
Version Control: Commit formatter configurations to ensure consistency across the team
CI Integration: Verify formatting in continuous integration pipelines
Editor Integration: Use editor plugins for immediate feedback
Progressive Adoption: For large legacy codebases, format files as they're modified

A common pattern is to combine tools for maximum effectiveness:

// package.json
{
  "scripts": {
    "format": "prettier --write \"**/*.{js,jsx,ts,tsx,json,md}\"",
    "lint": "eslint \"**/*.{js,jsx,ts,tsx}\" --fix",
    "check-format": "prettier --check \"**/*.{js,jsx,ts,tsx,json,md}\"",
    "validate": "npm run check-format && npm run lint"
  },
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged"
    }
  },
  "lint-staged": {
    "*.{js,jsx,ts,tsx}": [
      "prettier --write",
      "eslint --fix"
    ],
    "*.{json,md,html,css}": [
      "prettier --write"
    ]
  }
}

Conclusion

AST-based code formatting represents a significant advancement over traditional text-manipulation approaches. By understanding and manipulating code at a structural level, these tools provide more intelligent, reliable, and powerful formatting capabilities.

Modern development workflows increasingly rely on AST-based tools not just for formatting, but for a wide range of code transformation tasks. Understanding how these tools work "under the hood" helps developers make better use of them and even extend them for project-specific needs.

As programming languages and development practices evolve, AST-based tools will continue to play a crucial role in maintaining code quality, consistency, and developer productivity.