Configuration¶

Codefang uses a layered configuration system built on Viper. Settings can come from a YAML config file, environment variables, or CLI flags, and they merge with a well-defined priority order.

Configuration Sources¶

Search Order¶

When no explicit --config flag is provided, Codefang searches for a .codefang.yaml file in the following locations (first match wins):

Priority	Location
1	Current working directory (`./.codefang.yaml`)
2	User home directory (`$HOME/.codefang.yaml`)
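
The lookup above can be pictured as a simple first-match loop. The following is a minimal sketch, not Codefang's actual implementation; the function name `find_config` and its parameters are hypothetical and exist only for illustration.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".codefang.yaml"


def find_config(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first existing config file, checking cwd before home.

    Hypothetical helper: mirrors the documented search order only.
    """
    for directory in (cwd, home):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate  # first match wins
    return None  # no file found; other sources still apply
```

Calling `find_config(Path.cwd(), Path.home())` would return the project-local file when both locations contain one.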

Merge Priority¶

When the same setting is specified in multiple sources, the highest-priority source wins:

CLI flags  >  Environment variables  >  Config file  >  Compiled defaults

Override Example

A config file sets pipeline.workers: 2, but the command line passes --workers 8. The effective value is 8 because CLI flags have the highest priority.
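
To make the layering concrete, here is a small sketch that models each source as a flat dictionary and resolves lookups in priority order with `collections.ChainMap`. It illustrates the merge rule only; the real merging is done by Viper, and the dictionaries and the `effective_settings` name are assumptions made for this example.

```python
from collections import ChainMap


def effective_settings(cli, env, config_file, defaults):
    """Resolve keys from the highest-priority source that defines them."""
    # ChainMap searches its maps left to right, so order encodes priority.
    return ChainMap(cli, env, config_file, defaults)


defaults = {"pipeline.workers": 0, "checkpoint.enabled": True}
config_file = {"pipeline.workers": 2}
env = {}
cli = {"pipeline.workers": 8}

settings = effective_settings(cli, env, config_file, defaults)
print(settings["pipeline.workers"])    # 8 (CLI flag beats the config file)
print(settings["checkpoint.enabled"])  # True (falls through to the default)
```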

Explicit Config Path¶

Use --config to point at a specific file:

codefang run -a 'history/*' --config /etc/codefang/production.yaml .

Environment Variables¶

Every configuration key can be set through an environment variable. To build the variable name, take the full key path, replace each . with _, convert it to upper case, and add the CODEFANG_ prefix.

Config Key	Environment Variable
`pipeline.workers`	`CODEFANG_PIPELINE_WORKERS`
`pipeline.memory_budget`	`CODEFANG_PIPELINE_MEMORY_BUDGET`
`history.burndown.granularity`	`CODEFANG_HISTORY_BURNDOWN_GRANULARITY`
`history.sentiment.gap`	`CODEFANG_HISTORY_SENTIMENT_GAP`
`checkpoint.enabled`	`CODEFANG_CHECKPOINT_ENABLED`

# Set workers and memory budget via the environment
export CODEFANG_PIPELINE_WORKERS=4
export CODEFANG_PIPELINE_MEMORY_BUDGET=2GiB
codefang run -a 'history/*' .
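
The naming rule shown in the table can be written as a one-line transformation. This is a sketch for scripting purposes (for example, generating CI variables); `env_var_name` is a hypothetical helper, not part of Codefang.

```python
def env_var_name(key: str, prefix: str = "CODEFANG") -> str:
    """Map a dotted config key to its environment variable name."""
    # Dots become underscores, the result is upper-cased and prefixed.
    return f"{prefix}_{key.replace('.', '_').upper()}"


print(env_var_name("pipeline.memory_budget"))
# CODEFANG_PIPELINE_MEMORY_BUDGET
```

Because keys may themselves contain underscores, the mapping is one-way: the variable name alone does not tell you where the nesting boundaries are.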

Full Configuration Reference¶

Below is the complete .codefang.yaml file with all fields set to their compiled defaults:

analyzers: []

pipeline:
  workers: 0              # 0 = auto (GOMAXPROCS)
  memory_budget: ""       # e.g. "4GiB"
  blob_cache_size: ""     # e.g. "1GB" (default when empty)
  diff_cache_size: 0      # 0 = default (10000)
  blob_arena_size: ""     # e.g. "4MB" (default when empty)
  commit_batch_size: 0    # 0 = default (100)
  gogc: 0                 # 0 = Go default (100)
  ballast_size: "0"       # "0" = disabled
  memory_limit: ""        # e.g. "8GiB"
  worker_timeout: ""      # e.g. "60s"
  # Advanced pipeline tuning.
  uast_spill_threshold: 32
  intra_commit_parallel_threshold: 4
  max_intra_commit_workers: 4
  max_uast_blob_size: 262144          # 256 KiB
  uast_parse_timeout: "10s"
  max_changes_per_commit: 10000
  max_diff_batch_size: 1000
  memory_budget_ratio: 50
  memory_budget_cap: "2GiB"
  memory_limit_ratio: 75
  # Extended pipeline tuning.
  uast_spill_trim_interval: 16
  native_trim_interval: 10
  max_streaming_buffering: 3
  drain_prefetch_timeout: "30s"
  sampler_interval: "2s"
  worker_ratio: 100
  uast_worker_ratio: 40
  leaf_worker_divisor: 3
  min_leaf_workers: 4
  buffer_size_multiplier: 2
  budget_limit_ratio: 95
  system_ram_limit_ratio: 90
  diff_job_buffer_multiplier: 10
  static_max_workers: 8
  malloc_trim_interval: 50
  static_memory_limit_ratio: 90

history:
  couples:
    coupling_threshold_high: 10
    ownership_few_threshold: 3
    ownership_moderate_threshold: 5
    batch_coupling_threshold: 100
    hll_precision: 10
    top_k_per_file: 100
    min_edge_weight: 2
  burndown:
    granularity: 30
    sampling: 30
    track_files: false
    track_people: false
    hibernation_threshold: 1000
    hibernation_to_disk: true
    hibernation_directory: ""
    debug: false
    goroutines: 0
  devs:
    consider_empty_commits: false
    anonymize: false
    bus_factor_threshold: 0.5
    risk_threshold_critical: 90.0
    risk_threshold_high: 80.0
    risk_threshold_medium: 60.0
    active_threshold_ratio: 0.7
    default_active_days: 90
    hll_precision: 14
  file_history:
    hotspot_threshold_critical: 50
    hotspot_threshold_high: 30
    hotspot_threshold_medium: 15
  imports:
    goroutines: 4
    max_file_size: 1048576
    max_dependency_risk_rows: 30
  sentiment:
    min_comment_length: 20
    gap: 0.5
    neutralizer_weight: 0.8
    max_weight_ratio: 3.0
    positive_threshold: 0.6
    negative_threshold: 0.4
    trend_threshold: 0.1
    low_sentiment_risk_thresh: 0.2
  clones:
    max_clone_pairs: 1000
    num_hashes: 128
    num_bands: 16
    num_rows: 8
    shingle_size: 5
    similarity_type2: 0.8
    similarity_type3: 0.5
    threshold_ratio_yellow: 0.1
    threshold_ratio_red: 0.3
    threshold_pairs_yellow: 5
    threshold_pairs_red: 20
  shotness:
    dsl_struct: 'filter(.roles has "Function")'
    dsl_name: ".props.name"
  typos:
    max_distance: 4
  anomaly:
    threshold: 2.0
    window_size: 20

checkpoint:
  enabled: true
  dir: ""
  resume: true
  clear_prev: false

Section Reference¶

`analyzers`¶

Field	Type	Default	Description
`analyzers`	`[]string`	`[]`	Default list of analyzer IDs to run when `-a` is not provided on the command line. Accepts the same IDs and glob patterns as the `-a` flag.

analyzers:
  - static/complexity
  - history/burndown
  - history/devs

`pipeline`¶

Resource and tuning knobs for the analysis pipeline.

Field	Type	Default	Description	Validation
`workers`	`int`	`0`	Number of parallel workers. `0` uses `GOMAXPROCS`.	Must be >= 0
`memory_budget`	`string`	`""`	Total memory budget for auto-tuning cache sizes (e.g. `"512MB"`, `"2GiB"`). Empty means no budget-based tuning.	Valid byte-size string or empty
`blob_cache_size`	`string`	`""`	Maximum size of the blob cache (e.g. `"256MB"`, `"1GB"`). Empty uses the built-in default of 1 GB.	Valid byte-size string or empty
`diff_cache_size`	`int`	`0`	Maximum number of entries in the diff cache. `0` uses the built-in default of 10000.	Must be >= 0
`blob_arena_size`	`string`	`""`	Memory arena allocation for blob loading (e.g. `"4MB"`). Empty uses the built-in default of 4 MB.	Valid byte-size string or empty
`commit_batch_size`	`int`	`0`	Number of commits processed per batch. `0` uses the built-in default of 100.	Must be >= 0
`gogc`	`int`	`0`	Go garbage collector target percentage. `0` uses the Go default of 100. Higher values reduce GC frequency at the cost of memory.	Must be >= 0
`ballast_size`	`string`	`"0"`	GC ballast allocation size. `"0"` disables ballast. Useful for reducing GC pauses in memory-rich environments.	Valid byte-size string
`memory_limit`	`string`	`""`	Soft memory limit passed to the Go runtime (`GOMEMLIMIT`). Empty means no limit.	Valid byte-size string or empty
`worker_timeout`	`string`	`""`	Maximum duration a single worker may run before being terminated (e.g. `"60s"`, `"5m"`). Empty means no timeout.	Valid Go duration string or empty
`uast_spill_threshold`	`int`	`32`	File changes per commit before UAST trees are spilled to disk to cap memory.	Must be >= 0
`intra_commit_parallel_threshold`	`int`	`4`	Minimum file changes in a commit before intra-commit parallel UAST parsing is used.	Must be >= 0
`max_intra_commit_workers`	`int`	`4`	Maximum goroutines for parallel UAST parsing within a single commit.	Must be >= 0
`max_uast_blob_size`	`int`	`262144`	Maximum blob size in bytes for UAST parsing (256 KiB). Larger files are skipped.	Must be >= 0
`uast_parse_timeout`	`string`	`"10s"`	Per-file timeout for UAST parsing. Prevents pathological tree-sitter behavior.	Valid Go duration string or empty
`max_changes_per_commit`	`int`	`10000`	Commits with more file changes than this are skipped entirely.	Must be >= 0
`max_diff_batch_size`	`int`	`1000`	Maximum number of diff requests batched together for efficiency.	Must be >= 0
`memory_budget_ratio`	`int`	`50`	Percentage of system RAM to use as the auto-detected memory budget.	Must be 0–100
`memory_budget_cap`	`string`	`"2GiB"`	Maximum auto-detected memory budget.	Valid byte-size string or empty
`memory_limit_ratio`	`int`	`75`	Percentage of system RAM to use as Go's soft memory limit.	Must be 0–100
`uast_spill_trim_interval`	`int`	`16`	How often to call `MallocTrim` during UAST spill-mode parsing (every N commits).	Must be >= 0
`native_trim_interval`	`int`	`10`	How often to call `malloc_trim` within a chunk (every N commits).	Must be >= 0
`max_streaming_buffering`	`int`	`3`	Maximum buffering factor for streaming (1=single, 2=double, 3=triple).	Must be >= 1
`drain_prefetch_timeout`	`string`	`"30s"`	How long to wait for abandoned prefetch goroutines to finish; after the timeout they are left to leak.	Valid Go duration string
`sampler_interval`	`string`	`"2s"`	Pipeline sampler polling interval for memory triage logging.	Valid Go duration string
`worker_ratio`	`int`	`100`	Percentage of CPU cores to use for pipeline workers.	Must be 0–100
`uast_worker_ratio`	`int`	`40`	Percentage of CPU cores to use for UAST pipeline workers.	Must be 0–100
`leaf_worker_divisor`	`int`	`3`	Leaf worker count = `NumCPU / divisor`.	Must be >= 1
`min_leaf_workers`	`int`	`4`	Minimum number of leaf workers when enabled.	Must be >= 1
`buffer_size_multiplier`	`int`	`2`	Buffer size = `workers * multiplier`.	Must be >= 1
`budget_limit_ratio`	`int`	`95`	Percentage of memory budget used as Go's soft memory limit.	Must be 0–100
`system_ram_limit_ratio`	`int`	`90`	Memory limit cap as percentage of system RAM.	Must be 0–100
`diff_job_buffer_multiplier`	`int`	`10`	Scales the diff job queue buffer relative to pipeline buffer size.	Must be >= 1
`static_max_workers`	`int`	`8`	Maximum concurrent workers for static analysis phase.	Must be >= 1
`malloc_trim_interval`	`int`	`50`	Files between `malloc_trim` calls in static analysis. `-1` disables.	Must be >= -1
`static_memory_limit_ratio`	`int`	`90`	Percentage of budget applied as Go's memory limit during static phase.	Must be 0–100
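
Several of the ratio and divisor fields above describe simple arithmetic. The sketch below shows one plausible reading of `memory_budget_ratio` with `memory_budget_cap`, `leaf_worker_divisor` with `min_leaf_workers`, and `buffer_size_multiplier`. The function names, the use of integer division, and the way the minimum and cap are combined are assumptions made for illustration; the table does not spell out those details.

```python
GIB = 1024 ** 3


def auto_memory_budget(system_ram: int, ratio_percent: int = 50,
                       cap: int = 2 * GIB) -> int:
    """Budget in bytes: a percentage of system RAM, limited by the cap."""
    return min(system_ram * ratio_percent // 100, cap)


def leaf_workers(num_cpu: int, divisor: int = 3, minimum: int = 4) -> int:
    """NumCPU / divisor, never below the configured minimum (assumed)."""
    return max(minimum, num_cpu // divisor)


def buffer_size(workers: int, multiplier: int = 2) -> int:
    """Buffer size = workers * multiplier."""
    return workers * multiplier


print(auto_memory_budget(16 * GIB) // GIB)  # 2 (50% would be 8 GiB, capped)
print(leaf_workers(24))                     # 8
print(buffer_size(8))                       # 16
```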

Memory Budget Auto-Tuning

When memory_budget is set, Codefang automatically calculates optimal values for blob_cache_size, diff_cache_size, and blob_arena_size. Explicit values for those fields take precedence over the auto-tuned ones.

`history.couples`¶

Controls the file coupling and ownership analyzer.

Field	Type	Default	Description	Validation
`coupling_threshold_high`	`int`	`10`	Minimum co-change count for a file pair to be classified as "high" coupling.	Must be >= 0
`ownership_few_threshold`	`int`	`3`	Maximum number of contributors for a file to be in the "few owners" bucket.	Must be >= 0
`ownership_moderate_threshold`	`int`	`5`	Maximum number of contributors for a file to be in the "moderate owners" bucket.	Must be >= 0
`batch_coupling_threshold`	`int`	`100`	Maximum number of file pairs per commit considered for coupling. Limits quadratic growth on large commits.	Must be >= 0
`hll_precision`	`int`	`10`	HyperLogLog precision for contributor count sketches. Higher = more accurate, more memory. Valid range: 4–18.	Must be 4–18 when set
`top_k_per_file`	`int`	`100`	Maximum coupling pairs per file in store output.	Must be >= 0
`min_edge_weight`	`int`	`2`	Minimum co-change count for a coupling edge to be included.	Must be >= 0
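
To judge the accuracy side of the `hll_precision` trade-off, the standard HyperLogLog result is useful: a sketch with precision p uses 2^p registers and has a relative standard error of roughly 1.04 / sqrt(2^p). The snippet below applies that textbook formula; how many bytes each register costs depends on the implementation and is not covered here. The helper names are hypothetical.

```python
import math


def hll_registers(precision: int) -> int:
    """Number of registers for a HyperLogLog sketch of the given precision."""
    if not 4 <= precision <= 18:
        raise ValueError("precision must be in the range 4-18")
    return 1 << precision


def hll_standard_error(precision: int) -> float:
    """Approximate relative standard error: 1.04 / sqrt(registers)."""
    return 1.04 / math.sqrt(hll_registers(precision))


for p in (10, 14):
    print(p, hll_registers(p), round(hll_standard_error(p) * 100, 2))
# 10 1024 3.25
# 14 16384 0.81
```

With the default of 10 used here, counts are expected to be within a few percent; the default of 14 used by `history.devs` brings that under one percent at sixteen times the register count.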

`history.burndown`¶

Controls the burndown (code ownership aging) analyzer.

Field	Type	Default	Description	Validation
`granularity`	`int`	`30`	Time granularity in days for burndown bands.	Must be > 0
`sampling`	`int`	`30`	Sampling interval in days for burndown snapshots.	Must be > 0
`track_files`	`bool`	`false`	Track per-file burndown data. Increases memory usage.	--
`track_people`	`bool`	`false`	Track per-person burndown data. Increases memory usage.	--
`hibernation_threshold`	`int`	`1000`	Number of file entries before hibernation activates.	--
`hibernation_to_disk`	`bool`	`true`	Spill hibernated state to disk instead of keeping in memory.	--
`hibernation_directory`	`string`	`""`	Directory for hibernated state files. Empty uses a temp directory.	--
`debug`	`bool`	`false`	Enable verbose debug output for the burndown analyzer.	--
`goroutines`	`int`	`0`	Parallel goroutines for burndown computation. `0` uses a sensible default.	--

`history.devs`¶

Controls the developer activity analyzer.

Field	Type	Default	Description	Validation
`consider_empty_commits`	`bool`	`false`	Include empty (no-diff) commits in developer statistics.	--
`anonymize`	`bool`	`false`	Replace developer names with anonymous identifiers in output.	--
`bus_factor_threshold`	`float64`	`0.5`	Cumulative ownership fraction at which the bus factor count stops.	Must be 0.0–1.0
`risk_threshold_critical`	`float64`	`90.0`	Ownership percentage above which a file is at critical risk (single-owner).	Must be 0–100
`risk_threshold_high`	`float64`	`80.0`	Ownership percentage above which a file is at high risk.	Must be 0–100
`risk_threshold_medium`	`float64`	`60.0`	Ownership percentage above which a file is at medium risk.	Must be 0–100
`active_threshold_ratio`	`float64`	`0.7`	Fraction of the most-active developer's commits required to consider a developer "active".	Must be 0.0–1.0
`default_active_days`	`int`	`90`	Lookback window in days for determining whether a developer is currently active.	Must be >= 0
`hll_precision`	`int`	`14`	HyperLogLog precision for developer count sketches. Higher = more accurate, more memory. Valid range: 4–18.	Must be 4–18 when set
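
The role of `bus_factor_threshold` is easiest to see with a small calculation. The sketch below uses a common definition of bus factor: sort contributors by ownership share and count how many are needed before their cumulative share reaches the threshold. This is an assumption about the general technique, not a copy of Codefang's code, and `bus_factor` is a hypothetical name.

```python
from typing import Mapping


def bus_factor(ownership: Mapping[str, float], threshold: float = 0.5) -> int:
    """Count the top owners needed to reach the cumulative threshold."""
    total = sum(ownership.values())
    if total <= 0:
        return 0
    cumulative = 0.0
    shares = sorted(ownership.values(), reverse=True)
    for count, share in enumerate(shares, start=1):
        cumulative += share / total
        if cumulative >= threshold:
            return count  # counting stops once the threshold is reached
    return len(shares)


print(bus_factor({"alice": 60, "bob": 30, "carol": 10}))  # 1
print(bus_factor({"alice": 40, "bob": 35, "carol": 25}))  # 2
```

Raising the threshold toward 1.0 makes the count larger, because more contributors are needed to cover the required share.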

`history.file_history`¶

Controls the file history (churn and hotspot) analyzer.

Field	Type	Default	Description	Validation
`hotspot_threshold_critical`	`int`	`50`	Commit count above which a file is classified as a critical hotspot.	Must be >= 0
`hotspot_threshold_high`	`int`	`30`	Commit count above which a file is classified as a high hotspot.	Must be >= 0
`hotspot_threshold_medium`	`int`	`15`	Commit count above which a file is classified as a medium hotspot.	Must be >= 0
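
Read together, the three thresholds form a ladder that is checked from the top down. A minimal sketch follows, assuming strict "above" comparisons as the descriptions state; the `"none"` label for files below every threshold and the function name are placeholders chosen for this example.

```python
def hotspot_level(commits: int, critical: int = 50, high: int = 30,
                  medium: int = 15) -> str:
    """Classify a file by commit count using the three thresholds."""
    if commits > critical:
        return "critical"
    if commits > high:
        return "high"
    if commits > medium:
        return "medium"
    return "none"  # placeholder label, not a documented value


print(hotspot_level(51))  # critical
print(hotspot_level(50))  # high
print(hotspot_level(15))  # none
```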

`history.imports`¶

Controls the import/dependency history analyzer.

Field	Type	Default	Description	Validation
`goroutines`	`int`	`4`	Number of parallel goroutines for import extraction.	Must be > 0
`max_file_size`	`int`	`1048576`	Maximum file size in bytes to analyze for imports (1 MiB default).	Must be > 0
`max_dependency_risk_rows`	`int`	`30`	Maximum number of rows in the dependency risk table in plot output.	Must be >= 0

`history.sentiment`¶

Controls the comment sentiment analyzer.

Field	Type	Default	Description	Validation
`min_comment_length`	`int`	`20`	Minimum comment character length to include in sentiment analysis.	Must be > 0
`gap`	`float64`	`0.5`	Sentiment classification gap threshold. Comments whose scores fall within this gap of neutral are considered neutral.	Must be 0.0–1.0
`neutralizer_weight`	`float64`	`0.8`	How strongly SE-domain adjustments affect the final score. `0` = no effect, `1` = full adjustment.	Must be 0.0–1.0
`max_weight_ratio`	`float64`	`3.0`	Maximum weight ratio for comment length weighting. Prevents single long comments from dominating.	Must be > 0
`positive_threshold`	`float64`	`0.6`	Sentiment score at or above this is classified as "positive".	Must be 0.0–1.0
`negative_threshold`	`float64`	`0.4`	Sentiment score at or below this is classified as "negative".	Must be 0.0–1.0
`trend_threshold`	`float64`	`0.1`	Minimum change in sentiment needed to classify a trend as "improving" or "declining".	Must be >= 0
`low_sentiment_risk_thresh`	`float64`	`0.2`	Sentiment at or below this is flagged as HIGH risk (vs MEDIUM).	Must be 0.0–1.0
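
The classification thresholds can be summarized in a few lines. This sketch covers only `positive_threshold`, `negative_threshold`, and `low_sentiment_risk_thresh` as described in the table; it deliberately ignores `gap` and the weighting fields, whose exact interaction is not documented here. Both function names are hypothetical.

```python
def classify_sentiment(score: float, positive_threshold: float = 0.6,
                       negative_threshold: float = 0.4) -> str:
    """Label a score using inclusive thresholds, as the table describes."""
    if score >= positive_threshold:
        return "positive"
    if score <= negative_threshold:
        return "negative"
    return "neutral"


def sentiment_risk(score: float, low_sentiment_risk_thresh: float = 0.2) -> str:
    """Risk for a low-sentiment score: HIGH at or below the threshold."""
    return "HIGH" if score <= low_sentiment_risk_thresh else "MEDIUM"


print(classify_sentiment(0.6))  # positive
print(classify_sentiment(0.5))  # neutral
print(sentiment_risk(0.2))      # HIGH
```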

`history.clones`¶

Controls the clone detection analyzer.

Field	Type	Default	Description	Validation
`max_clone_pairs`	`int`	`1000`	Maximum number of clone pairs reported in the aggregated result.	Must be >= 0
`num_hashes`	`int`	`128`	MinHash signature size. More hashes = better accuracy, more memory.	Must be > 0
`num_bands`	`int`	`16`	Number of LSH bands. `num_bands * num_rows` must equal `num_hashes`.	Must be > 0
`num_rows`	`int`	`8`	Number of rows per LSH band.	Must be > 0
`shingle_size`	`int`	`5`	Token shingle window size for MinHash input.	Must be > 0
`similarity_type2`	`float64`	`0.8`	Minimum Jaccard similarity for Type-2 (renamed) clone detection.	Must be 0.0–1.0
`similarity_type3`	`float64`	`0.5`	Minimum Jaccard similarity for Type-3 (near-miss) clone detection.	Must be 0.0–1.0
`threshold_ratio_yellow`	`float64`	`0.1`	Clone ratio above which a yellow warning is issued.	Must be 0.0–1.0
`threshold_ratio_red`	`float64`	`0.3`	Clone ratio above which a red warning is issued.	Must be 0.0–1.0
`threshold_pairs_yellow`	`int`	`5`	Clone pair count above which a yellow warning is issued.	Must be >= 0
`threshold_pairs_red`	`int`	`20`	Clone pair count above which a red warning is issued.	Must be >= 0
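
When changing `num_bands` and `num_rows`, two textbook LSH formulas help predict the effect. For b bands of r rows, the similarity at which a pair becomes likely to be a candidate is approximately (1/b)^(1/r), and the probability that a pair with Jaccard similarity s shares at least one band is 1 - (1 - s^r)^b. The sketch below applies these general formulas and checks the documented constraint; it says nothing about any additional filtering Codefang may perform, and the function names are hypothetical.

```python
def check_lsh_params(num_hashes: int, num_bands: int, num_rows: int) -> None:
    """Enforce the documented constraint num_bands * num_rows == num_hashes."""
    if num_bands * num_rows != num_hashes:
        raise ValueError("num_bands * num_rows must equal num_hashes")


def lsh_threshold(num_bands: int, num_rows: int) -> float:
    """Approximate similarity where candidate probability rises sharply."""
    return (1.0 / num_bands) ** (1.0 / num_rows)


def candidate_probability(similarity: float, num_bands: int,
                          num_rows: int) -> float:
    """Probability that two items collide in at least one band."""
    return 1.0 - (1.0 - similarity ** num_rows) ** num_bands


check_lsh_params(128, 16, 8)          # defaults satisfy the constraint
print(round(lsh_threshold(16, 8), 3))  # 0.707
```

With the defaults, the banding threshold sits at about 0.71, so more bands with fewer rows would be needed to surface lower-similarity pairs as candidates more often.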

`history.shotness`¶

Controls the shotness (function co-change) analyzer.

Field	Type	Default	Description	Validation
`dsl_struct`	`string`	`filter(.roles has "Function")`	DSL expression to identify structural elements (functions, methods) for co-change tracking.	Valid UAST DSL expression
`dsl_name`	`string`	`.props.name`	DSL expression to extract the name of each structural element.	Valid UAST DSL path expression

Custom Shotness Targets

Track co-change at the class level instead of the function level:

history:
  shotness:
    dsl_struct: 'filter(.type == "Class")'
    dsl_name: ".props.name"

`history.typos`¶

Controls the typo detection analyzer.

Field	Type	Default	Description	Validation
`max_distance`	`int`	`4`	Maximum Levenshtein edit distance for two identifiers to be considered a potential typo pair.	Must be > 0
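
For reference, Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is the standard dynamic-programming algorithm with a `max_distance` check on top; `is_typo_candidate` is a hypothetical helper, and the analyzer may apply further rules that are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_typo_candidate(a: str, b: str, max_distance: int = 4) -> bool:
    """Different identifiers whose distance does not exceed max_distance."""
    return 0 < levenshtein(a, b) <= max_distance


print(levenshtein("recieve", "receive"))  # 2
print(is_typo_candidate("recieveData", "receiveData"))  # True
```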

`history.anomaly`¶

Controls the temporal anomaly detection analyzer.

Field	Type	Default	Description	Validation
`threshold`	`float64`	`2.0`	Z-score threshold for flagging a commit as anomalous. Lower values are more sensitive.	Must be > 0
`window_size`	`int`	`20`	Sliding window size (in commits) for computing the rolling mean and standard deviation.	Must be >= 2
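
The interplay of `threshold` and `window_size` can be illustrated with a generic rolling Z-score detector. In this sketch each value is compared against the mean and population standard deviation of the preceding `window_size` values. Whether Codefang's window includes the current commit, which deviation estimator it uses, and which metrics it scores are not stated here, so treat those choices as assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def anomalous_indices(values: Sequence[float], threshold: float = 2.0,
                      window_size: int = 20) -> List[int]:
    """Return indices whose Z-score against the previous window is too large."""
    flagged = []
    for i in range(window_size, len(values)):
        window = values[i - window_size:i]
        mu = mean(window)
        sigma = pstdev(window)
        if sigma == 0:
            continue  # a flat window gives no scale to measure against
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Twenty commits alternating between 9 and 11 changes, then a spike of 30.
history = [9, 11] * 10 + [30]
print(anomalous_indices(history))  # [20]
```

Lowering `threshold` flags smaller deviations, while a larger `window_size` makes the baseline slower to adapt to recent changes.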

`checkpoint`¶

Controls checkpoint and resume behavior for long-running history analyses.

Field	Type	Default	Description	Validation
`enabled`	`bool`	`true`	Enable periodic checkpointing for crash recovery.	--
`dir`	`string`	`""`	Directory for checkpoint files. Empty uses `~/.codefang/checkpoints`.	Valid directory path or empty
`resume`	`bool`	`true`	Automatically resume from an existing checkpoint if one is found.	--
`clear_prev`	`bool`	`false`	Delete any existing checkpoint data before starting a new run.	--

Checkpoint Directory Permissions

The checkpoint directory must be writable by the user running Codefang. When using the default (~/.codefang/checkpoints), the directory is created automatically on first use.

Minimal Examples¶

CI / Headless

.codefang.yaml

analyzers:
  - history/burndown
  - history/devs

pipeline:
  workers: 2
  memory_budget: "1GiB"

checkpoint:
  enabled: false

Large Repository

.codefang.yaml

pipeline:
  workers: 8
  memory_budget: "8GiB"
  gogc: 50
  commit_batch_size: 200

history:
  burndown:
    hibernation_to_disk: true
    goroutines: 4

checkpoint:
  enabled: true
  dir: "/tmp/codefang-checkpoint"

Security Audit

.codefang.yaml

analyzers:
  - static/complexity
  - static/cohesion
  - history/anomaly

history:
  anomaly:
    threshold: 1.5
    window_size: 30
