Architecture Overview¶
Codefang follows the Unix philosophy: small, focused tools joined by pipes. The project ships two binaries that can be used independently or composed together in pipelines, CI systems, and AI agent workflows.
Two Binaries¶
| Binary | Purpose | Entry point |
|---|---|---|
uast | Parse source code into Universal Abstract Syntax Trees | cmd/uast/ |
codefang | Run static and history analyzers on code and repositories | cmd/codefang/ |
Both CLIs are built with Cobra and share the same pkg/ libraries.
# Parse a Go file into UAST JSON
uast parse main.go
# Run static complexity analysis
codefang run -a static/complexity .
# Run git history analysis
codefang run -a history/burndown,history/devs .
# Compose: parse, then analyze
uast parse main.go | codefang run -a static/* --format json
Package Structure¶
CLI Layer¶
| Package | Description |
|---|---|
cmd/codefang/ | CLI entry point for the analyzer. Cobra commands for run, mcp, etc. |
cmd/uast/ | CLI entry point for the UAST parser. Commands: parse, query, diff, explore, server. |
Core Libraries¶
| Package | Description |
|---|---|
pkg/uast/ | UAST parser engine. Tree-sitter integration, DSL engine, language mappings, pre-compiled matchers. |
pkg/analyzers/ | All analysis logic -- static analyzers and history analyzers. |
pkg/framework/ | Pipeline orchestration: runner, coordinator, streaming, blob/diff/UAST pipelines, profiling, watchdog. |
pkg/gitlib/ | Git operations via libgit2 (git2go): repository, commit, tree, changes, worker pool, batch processing. |
pkg/config/ | Configuration system: types with mapstructure tags, Viper-based loader, compiled defaults, validation. |
pkg/mcp/ | Model Context Protocol server: tools for codefang_analyze, uast_parse, codefang_history. |
pkg/observability/ | OpenTelemetry integration: tracing, RED metrics, structured logging, HTTP middleware, attribute filter. |
pkg/streaming/ | Streaming pipeline planner: chunk sizing, memory budgets, double-buffered pipelining. |
pkg/cache/ | Generic LRU cache used by blob and diff caches. |
pkg/checkpoint/ | Checkpoint manager for crash recovery across streaming chunks. |
pkg/budget/ | Memory budget solver for auto-tuning pipeline parameters. |
Analyzers (pkg/analyzers/)¶
Shared Components (plumbing/)¶
The plumbing package provides the shared pipeline components that all history analyzers depend on. These run as "core" analyzers in the pipeline before any leaf analyzers consume their output.
| Component | File | Purpose |
|---|---|---|
TreeDiffAnalyzer | tree_diff.go | Computes per-commit tree diffs via libgit2 |
BlobCacheAnalyzer | blob_cache.go | Caches blob content for efficient re-reads |
FileDiffAnalyzer | file_diff.go | Computes file-level diffs from blobs |
IdentityDetector | identity.go | Maps commit authors to canonical identities |
LanguagesDetectionAnalyzer | languages.go | Detects file languages via enry |
TicksSinceStart | ticks.go | Assigns tick indices to commits for time-series |
LinesStatsCalculator | line_stats.go | Computes per-commit line addition/deletion stats |
UASTChangesAnalyzer | uast.go | Parses UAST for changed files |
Static Analyzers (analyze/)¶
| Analyzer | ID | Description |
|---|---|---|
| Complexity | static/complexity | Cyclomatic complexity per function |
| Comments | static/comments | Comment density and documentation coverage |
| Halstead | static/halstead | Halstead software science metrics |
| Cohesion | static/cohesion | Class/module cohesion metrics |
| Imports | static/imports | Import graph and dependency analysis |
The analyze/ package also contains the static analysis service, analyzer registry, factory, output formatting (JSON, text, compact, YAML, plot, binary, timeseries), and cross-format conversion logic.
History Analyzers¶
| Analyzer | ID | Description |
|---|---|---|
| Burndown | history/burndown | Code age and survival analysis |
| Couples | history/couples | File co-change coupling detection |
| Devs | history/devs | Developer contribution statistics |
| File History | history/file-history | Per-file change timeline |
| Sentiment | history/sentiment | Code comment sentiment over time |
| Shotness | history/shotness | Function-level change frequency |
| Typos | history/typos | Identifier typo detection in diffs |
| Imports | history/imports | Import evolution over time |
| Anomaly | history/anomaly | Temporal anomaly detection via Z-score |
| Quality | history/quality | UAST-based quality metrics over time |
Data Flow¶
The following diagram shows how data flows through the system during a combined static + history analysis run.
flowchart TD
subgraph Input
REPO[Git Repository]
SRC[Source Files]
end
subgraph gitlib["pkg/gitlib (libgit2)"]
OPEN[Open Repository]
COMMITS[Load Commits]
TREEDIFF[Tree Diff]
BLOBS[Blob Lookup]
end
subgraph uast_engine["pkg/uast (Tree-sitter)"]
DETECT[Language Detection<br/><em>enry</em>]
PARSE[DSL Parser]
NODES[UAST Nodes]
end
subgraph framework["pkg/framework"]
COORD[Coordinator<br/><em>worker pool</em>]
BLOB_PIPE[Blob Pipeline]
DIFF_PIPE[Diff Pipeline]
UAST_PIPE[UAST Pipeline]
RUNNER[Runner<br/><em>chunk processing</em>]
end
subgraph analyzers["pkg/analyzers"]
PLUMB[Plumbing<br/><em>core analyzers</em>]
STATIC[Static Analyzers<br/><em>complexity, cohesion,<br/>halstead, comments, imports</em>]
HISTORY[History Analyzers<br/><em>burndown, couples, devs,<br/>sentiment, anomaly, ...</em>]
end
subgraph output["Output"]
FORMAT[Output Formatter]
JSON_OUT[JSON]
TEXT_OUT[Text / Compact]
YAML_OUT[YAML]
PLOT_OUT[Plot]
TS_OUT[TimeSeries]
end
REPO --> OPEN --> COMMITS
COMMITS --> COORD
COORD --> BLOB_PIPE --> BLOBS
COORD --> DIFF_PIPE --> TREEDIFF
COORD --> UAST_PIPE --> PARSE
SRC --> DETECT --> PARSE --> NODES
BLOBS --> PLUMB
TREEDIFF --> PLUMB
NODES --> PLUMB
PLUMB --> HISTORY
SRC --> STATIC
RUNNER --> COORD
STATIC --> FORMAT
HISTORY --> FORMAT
FORMAT --> JSON_OUT
FORMAT --> TEXT_OUT
FORMAT --> YAML_OUT
FORMAT --> PLOT_OUT
FORMAT --> TS_OUT Two Analysis Modes¶
Codefang operates in two distinct modes that can run independently or be combined in a single codefang run invocation.
flowchart LR
subgraph static_mode["Static Analysis Mode"]
direction TB
S1[Read source files from disk]
S2[Parse to UAST via Tree-sitter]
S3[Run static analyzers<br/><em>complexity, cohesion,<br/>halstead, comments, imports</em>]
S4[Produce per-file metrics]
S1 --> S2 --> S3 --> S4
end
subgraph history_mode["History Analysis Mode"]
direction TB
H1[Open Git repository via libgit2]
H2[Walk commit history]
H3[Coordinator pipelines<br/><em>blob, diff, UAST workers</em>]
H4[Core plumbing analyzers<br/><em>tree_diff, blob_cache,<br/>identity, ticks, ...</em>]
H5[Leaf history analyzers<br/><em>burndown, couples, devs,<br/>sentiment, anomaly, ...</em>]
H6[Streaming chunks +<br/>hibernate/boot cycles]
H1 --> H2 --> H3 --> H4 --> H5
H2 --> H6
end
CMD["codefang run -a static/*,history/*"]
CMD --> static_mode
CMD --> history_mode
static_mode --> MERGE[Combined Output]
history_mode --> MERGE
MERGE --> OUT["JSON / Text / YAML / Plot / TimeSeries"] Static Mode¶
- Reads source files from the filesystem.
- Parses each file into a UAST using Tree-sitter with DSL-based mappings.
- Runs selected static analyzers (complexity, cohesion, halstead, comments, imports).
- Produces per-file and aggregate metrics.
Static analysis is fast, parallelized across files, and requires no Git history.
History Mode¶
- Opens the Git repository via libgit2 (supports both normal and bare repos).
- Loads the commit history (optionally filtered by
--limit,--since,--first-parent). - The Coordinator orchestrates a worker pool with three pipeline stages: blob loading, diff computation, and UAST parsing.
- Core plumbing analyzers (tree diff, blob cache, identity detection, tick assignment, line stats, language detection, UAST changes) process each commit first.
- Leaf history analyzers consume the plumbing output and accumulate their state using the generic aggregator framework or custom memory-efficient data structures.
- For large repositories, the streaming pipeline splits commits into memory-bounded chunks with hibernate/boot cycles and optional double-buffered pipelining. The
BaseHistoryAnalyzermanages state serialization transparently. - Checkpointing after each chunk enables crash recovery.
Combined Mode¶
When both static and history analyzers are selected, Codefang runs both phases sequentially, encodes each phase to an internal binary format, then decodes and merges the results into a single output document in the requested format.
Pipeline Architecture¶
The history analysis pipeline is built around a Runner that coordinates the full lifecycle:
Each ProcessChunk call:
- Feeds commits to the Coordinator worker pool.
- The Coordinator dispatches work across three parallel pipeline stages (blob, diff, UAST).
- Collected
CommitDatais fed sequentially to each analyzer'sConsumemethod. - Between chunks, hibernatable analyzers serialize their state to compact form and reboot.
The Runner supports two execution strategies:
- Single-pass: All commits in one chunk (small repos or unlimited memory).
- Streaming: Memory-bounded chunks with hibernate/boot cycles, planned by the
streaming.Planner. See Streaming Pipeline for details.
Configuration Layers¶
Configuration follows a clear priority chain:
The pkg/config/ package uses Viper for loading and merging. Environment variables use the CODEFANG_ prefix with underscore-separated nesting (e.g., CODEFANG_PIPELINE_WORKERS=8).
See the Configuration reference for the full config file structure and validation rules.
Observability¶
Every layer of the pipeline is instrumented with OpenTelemetry:
- Tracing: Hierarchical spans from
codefang.runthrough coordinator pipeline, chunk processing, individual analyzers, git operations, and UAST parsing. - Metrics: RED (Rate, Errors, Duration) metrics plus analysis-specific counters (commits, chunks, cache hit rates).
- Logging: Structured
sloglogging with automatic trace context injection.
See Observability for the full instrumentation guide.