Continuous Integration for Mechanical Design: A Pytest-Style Workflow for STEP Assemblies

Software has had continuous integration for a quarter of a century. Every pull request runs a unit-test suite, a linter, a type checker, and a coverage gate before a human is allowed to look at it. Mechanical design has the engineer's eyeballs and an expensive prototype.

CADCLAW is the open-source framework that closes that gap. It is software for makers, an extension of the M3-CRETE work into a tool that any mechanical engineer can use to build hardware: machines that make stuff. It runs validation gates over a STEP assembly the same way pytest runs assertions over a Python module: on every commit, in CI, with structured machine-readable findings and a pass/fail exit code. It is not a CAD authoring tool. It is the layer of automated checks that your CAD repository has never had.

This piece is for mechanical, aerospace, robotics, and machine-design engineers who are tired of catching CAD bugs in fabrication. It explains the gap, walks through CADCLAW's five validation families, gives you a working pytest-style test, and shows the exact GitHub Actions YAML to run it on every pull request. It finishes with a case study from the M3-CRETE concrete-printer project, the assembly CADCLAW was first developed against, where the harness caught a back-side gantry-plate clip that would otherwise have surfaced under an angle grinder.

1. The CAD-test gap

A modern Python codebase ships with pytest, mypy, ruff, branch protection, and a dashboard that turns red the moment any of them complains. None of that exists for CAD. The closest most teams come to "CAD CI" is a senior engineer rotating the assembly model in Fusion before sign-off, looking for the obvious clips. This works for a small bracket. It does not scale to a large gantry where a sub-millimetre clip on a back-side rail is invisible from any single camera angle.

The bugs this gap ships to fabrication fall into a small set of categories:

  - Dimensional: the drawing says one thing, the STEP says another (a plate authored at the wrong thickness, a mis-typed parameter in a placement script).
  - Kinematic: a locked-up degree of freedom, or an actuator undersized for the payload it carries.
  - Tolerance: a stack-up that passes at nominal dimensions and fails at the limits.
  - Adjacency: parts that should be near each other but aren't (a motor far from its mount, a fastener missing from its hole).
  - Interference and serviceability: parts that occupy the same space, or an assembly that cannot physically come apart.

These are not exotic failure modes. They are the everyday output of any non-trivial mechanical-design process. Software engineers solved the equivalent classes (off-by-one, type confusion, unhandled null, race condition) by encoding them as automated assertions in CI. Mechanical engineering has never had the runtime; the STEP file format and the CAD applications that produce it were never built around continuous validation.

CADCLAW is what that runtime looks like.

A small note on the analogy. pytest is exact: a Python value either equals the expected value, or it doesn't. CAD geometry is analog. A part is not binary present or absent; it can be slightly the wrong size, slightly clipping, slightly misplaced. That difference is real, and CADCLAW's design accommodates it. Findings are emitted with severity (PASS, WARN, FAIL), with structured evidence (bbox tuples, overlap volumes, distance values), and against a project-defined tolerance budget. The pytest-style merge gate is still pass-or-fail; under the hood, the gate is reasoning about real numbers with real noise. That is what makes CADCLAW different from a CAD-vendor "model checker": the framework is designed for the messy reality of mechanical parts, not the cleanroom of a single-vendor parametric tree.

2. What CADCLAW is

CADCLAW is an open-source Python framework that loads STEP assemblies and runs validation gates against them. It is MIT-licensed, distributed on PyPI as cadclaw, archived on Zenodo with DOI 10.5281/zenodo.19647391, and authored by Sunnyday Technologies. It is software for makers: a tool to enable mechanical engineers to make machines so they can make stuff. The audience is anyone driving hardware from a CAD repository, from aerospace bracketry to robotics chassis to custom industrial fixtures, optical mounts, surgical instruments, and prosumer 3D printers. M3-CRETE is the project the framework was first deployed against, and serves as a published case study; it is not the boundary of CADCLAW's scope.

Architecturally, CADCLAW has three layers:

  1. cadharness is a Python package containing the gate implementations: cadharness.inventory, cadharness.interference, cadharness.adjacency, cadharness.dimensional, cadharness.kinematics, cadharness.tolerance, cadharness.disassembly, cadharness.parity, cadharness.bom_audit. Each module is independently importable and follows the same pattern: a *Check class, a *Result dataclass, structured Finding objects with severity and evidence.
  2. cadclaw CLI is a console-script entry point that drives the harness from a declarative cadclaw.yaml rule file. Subcommands include cadclaw doctor, cadclaw harness, cadclaw bom-audit, cadclaw inspect, cadclaw publish-audit, and cadclaw claim-audit. Exit codes are pytest-style: 0 pass, 1 fail, 2 warn-only, 3 internal error.
  3. cadclaw_mcp is an MCP (Model Context Protocol) server that exposes every gate as a tool an AI coding assistant can call directly. We come back to this in section 6.
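The shared Check/Result/Finding pattern can be sketched in plain Python. The field names beyond severity and evidence are illustrative assumptions, not CADCLAW's exact API:

```python
from dataclasses import dataclass, field
from enum import Enum

class Severity(Enum):
    PASS = "PASS"
    WARN = "WARN"
    FAIL = "FAIL"

@dataclass
class Finding:
    gate: str                     # which gate emitted this finding
    severity: Severity            # PASS / WARN / FAIL
    message: str                  # human-readable summary
    evidence: dict = field(default_factory=dict)  # bbox tuples, volumes, distances

@dataclass
class GateResult:
    findings: list

    @property
    def passed(self) -> bool:
        # WARN findings do not block the merge; only FAIL does
        # (mirroring the exit-code split between 1 and 2).
        return all(f.severity is not Severity.FAIL for f in self.findings)

result = GateResult(findings=[
    Finding("dimensional", Severity.PASS, "mount thin axis 4.0mm"),
    Finding("interference", Severity.WARN, "belt within 0.8mm of cbeam",
            evidence={"clearance_mm": 0.8}),
])
print(result.passed)  # True: a WARN alone does not fail the gate
```

The point of the pattern is that every gate's output is machine-readable first: the harness aggregates Finding objects into a report rather than parsing log text.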

The framework is honest about its scope. CADCLAW reads geometry; it does not certify designs. It does not substitute for FEA, fatigue analysis, or physical testing. Every report ships with a confidence budget that lists what was checked, what was not, and what assumptions were baked in (mm units, rigid bodies, STEP exports faithful to the native CAD model). The README explicitly disclaims structural certification, hidden-suppressed-part detection in the native CAD package, vendor-stock validation, and physical-build conformance. CADCLAW does the geometric checks. The engineer is still the engineer.

What it does do is run those geometric checks on every commit, in CI, with no human in the loop, with copy-pasteable assertions and a structured report. That is the missing infrastructure.

3. The five gate families

CADCLAW ships a set of gates that cluster into five families that map cleanly onto the CAD-bug taxonomy from section 1. Each gate is implemented as a module under cadharness/; each takes a parts list and rule objects, returns a result dataclass, and emits structured findings the harness aggregates into a report.

Dimensional

The dimensional gate catches the "the drawing says one thing, the STEP says another" class of bug. Implementation lives in cadharness/dimensional.py. The user defines DimRule instances against bbox-signature labels:

from cadharness.dimensional import DimRule

DimRule(label='ymount', thin_axis=4.0, thin_tol=0.5)

DimensionalCheck.run() walks every part with that label, sorts the bounding-box dimensions, and asserts the smallest axis is within tolerance of the declared thin_axis. A violation reports the actual dimensions and a human-readable message. This is the gate that catches a wrong-thickness regression, say a box(80, 90, 5) authored where box(80, 90, 4) was intended, because the smallest sorted bbox dimension drifts out of tolerance.
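The thin-axis assertion is a few lines of arithmetic. This is a simplification of the check described above, with the bounding box reduced to a plain (dx, dy, dz) tuple:

```python
def thin_axis_ok(bbox, thin_axis, thin_tol):
    """Smallest bounding-box dimension must sit within tolerance of the
    declared thin axis (a sketch of the DimensionalCheck logic)."""
    return abs(min(bbox) - thin_axis) <= thin_tol

# A correctly authored 4 mm mount plate passes regardless of axis order.
assert thin_axis_ok((80.0, 4.0, 90.0), thin_axis=4.0, thin_tol=0.5)

# A 5 mm plate authored by mistake fails the same rule.
assert not thin_axis_ok((80.0, 5.0, 90.0), thin_axis=4.0, thin_tol=0.3)
```

Because the check sorts (or here, takes the minimum of) the bbox, it is invariant to how the part happens to be oriented in the assembly.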

Kinematic

The kinematics gate detects locked-up degrees of freedom and undersized actuators. Implementation lives in cadharness/kinematics.py. Three computations are exposed: beam_deflection() (Euler-Bernoulli simply-supported beam with point load and distributed self-weight), motor_torque_budget() (acceleration plus friction plus gravity force budget against motor holding torque, with belt efficiency and torque derating), and belt tension / tooth-skip resistance for GT2 drives.

Worked example: an X-axis carrying a moderate payload on a long extrusion, driven by a NEMA23 through a small pulley. motor_torque_budget(...) returns a MotorResult with a safety_factor field. If the safety factor drops below the project's threshold, the gate fails. This is the bug that ships when an engineer swaps to a heavier toolhead and forgets to re-check the motor budget; in fabrication it manifests as missed steps under acceleration.
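The budget arithmetic can be sketched as follows. The formulas, constants, and function signature here are illustrative assumptions for a horizontal belt axis, not CADCLAW's exact implementation:

```python
def motor_safety_factor(mass_kg, accel_ms2, mu, pulley_radius_m,
                        holding_torque_nm, belt_efficiency=0.95,
                        derate=0.5, g=9.81):
    """Sketch of a linear-axis torque budget: force to move the payload,
    converted to shaft torque, compared against derated holding torque."""
    f_accel = mass_kg * accel_ms2        # F = m*a
    f_friction = mu * mass_kg * g        # sliding friction, horizontal axis
    torque_required = (f_accel + f_friction) * pulley_radius_m / belt_efficiency
    torque_available = holding_torque_nm * derate  # steppers lose torque with speed
    return torque_available / torque_required

# 5 kg carriage, 2 m/s^2 acceleration, 12 mm pulley, 1.9 N.m NEMA23.
sf = motor_safety_factor(mass_kg=5.0, accel_ms2=2.0, mu=0.1,
                         pulley_radius_m=0.006, holding_torque_nm=1.9)
print(f"safety factor {sf:.1f}")
```

Double the payload or the acceleration and the safety factor halves, which is exactly the regression a heavier-toolhead commit introduces.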

Tolerance

Tolerance stack-up is the gate that historically required a high-end commercial seat of CETOL or 3DCS. CADCLAW's cadharness/tolerance.py does it in vanilla Python. A ToleranceChain accumulates Dimension objects with nominal, plus, minus, distribution, and direction fields, then analyze() reports three answers in one pass: worst-case (sum of all tolerances, conservative), RSS (root-sum-square, statistical), and Monte Carlo (sampled from each dimension's declared distribution). The result includes a Cpk process-capability index and a per-dimension variance decomposition, so the engineer can see which dimension is dominating the stack and where to spend tolerance budget.

A real failure mode this catches: motor-shaft alignment. A chain of beam length, shim, plate, and motor offset, each with its own tolerance, can pass RSS while failing worst-case. The report tells you which dimension contributes the most variance. That is a tolerance-budget conversation grounded in numbers.
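The three answers can be sketched in vanilla Python. This is an illustrative simplification of the analysis described above, with symmetric normal sampling assumed for the Monte Carlo pass:

```python
import random
import statistics

def stack_analysis(dims, n=20000, seed=0):
    """Worst-case, RSS, and Monte Carlo spread for a 1-D tolerance chain.
    dims: list of (nominal, plus, minus) tuples, all in mm."""
    nominal = sum(nom for nom, _, _ in dims)
    worst = sum(max(p, m) for _, p, m in dims)          # every tolerance at its limit
    rss = sum(((p + m) / 2) ** 2 for _, p, m in dims) ** 0.5
    rng = random.Random(seed)
    # Treat each +/-tol band as +/-3 sigma of a normal distribution.
    samples = [sum(nom + rng.gauss(0, (p + m) / 6) for nom, p, m in dims)
               for _ in range(n)]
    return nominal, worst, rss, statistics.stdev(samples)

# Beam length + shim + plate + motor offset, each with its own tolerance.
dims = [(100.0, 0.2, 0.2), (3.0, 0.1, 0.1), (6.0, 0.05, 0.05), (25.0, 0.15, 0.15)]
nom, worst, rss, mc_sigma = stack_analysis(dims)
print(f"nominal {nom}  worst-case +/-{worst:.2f}  RSS +/-{rss:.3f}  MC 3sigma +/-{3 * mc_sigma:.3f}")
```

Note the 3-sigma Monte Carlo spread lands near the RSS value and well inside worst-case; a chain that passes RSS but fails worst-case is exactly the motor-shaft-alignment conversation described above.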

Adjacency

The adjacency gate (cadharness/adjacency.py) catches the "motor far from its mount" class of bug: parts that should be near each other but aren't. The user declares AdjacencyRule(source='motor', target='bracket', max_distance=...). The AdjacencyCheck.run() method groups parts by label, finds the nearest target for every source, and emits an AdjacencyViolation if the distance exceeds the threshold. Floating parts (no target of the right type anywhere) report nearest_distance=inf.

This is also the gate that catches missing fasteners (no m5_screw within design clearance of a bracket_hole) and unintended scattering (a pulley lost in space relative to any motor). It is the cheapest gate to write and one of the most useful on any non-trivial assembly.
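The nearest-target logic is small enough to sketch. Parts are reduced to (label, centroid) tuples here; the real gate works over loaded STEP solids:

```python
import math
from collections import defaultdict

def check_adjacency(parts, rules):
    """parts: list of (label, centroid) tuples.
    rules: list of (source, target, max_distance) tuples.
    Returns (source, target, nearest_distance) violations; nearest_distance
    is inf when no target of the right type exists anywhere."""
    by_label = defaultdict(list)
    for label, centroid in parts:
        by_label[label].append(centroid)
    violations = []
    for source, target, max_dist in rules:
        for c in by_label[source]:
            nearest = min((math.dist(c, t) for t in by_label[target]),
                          default=math.inf)
            if nearest > max_dist:
                violations.append((source, target, nearest))
    return violations

parts = [("motor", (0, 0, 0)), ("mount", (30, 0, 0)),
         ("motor", (500, 0, 0))]           # second motor is floating
print(check_adjacency(parts, [("motor", "mount", 80)]))
# [('motor', 'mount', 470.0)]
```

The first motor has a mount 30 mm away and passes; the second is 470 mm from the nearest mount and fires the rule.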

Interference and disassembly

Two gates share this slot because they are the geometric-truth checks. cadharness/interference.py does pairwise BRep boolean intersection (BRepAlgoAPI_Common) with a bbox pre-filter; reported overlaps are real volumes, not bbox approximations. Every reported Clip carries a suggest_axis, suggest_shift_mm, and clearance_mm field, so the report reads plate at (...) clips cbeam by (volume), shift +Y by (distance) to clear with running clearance instead of leaving the engineer to derive the fix vector by hand.

cadharness/disassembly.py answers the dual question: can this thing actually come apart? DisassemblySequence.auto_sequence() orders parts by type priority and distance from the assembly centroid, then computes a per-part removal axis. export_frames() writes individual STEP files showing each step of the disassembly. If the auto-sequencer cannot find a removal direction that doesn't pass through another solid, you have a serviceability bug, and it shows up before you ship.
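The bbox pre-filter and the shift suggestion can be sketched together. This is an axis-aligned approximation only; the real gate confirms each candidate with an exact BRep boolean before reporting, and the sign convention here is an assumption:

```python
def aabb_overlap(a, b):
    """Overlap extents between two ((min_xyz), (max_xyz)) boxes, or None
    if they are separated on any axis (the cheap pre-filter that gates
    the expensive boolean intersection)."""
    ext = []
    for axis in range(3):
        lo = max(a[0][axis], b[0][axis])
        hi = min(a[1][axis], b[1][axis])
        if hi <= lo:
            return None          # separated on this axis: no clip possible
        ext.append(hi - lo)
    return tuple(ext)

def suggest_shift(ext, clearance=1.0):
    """The smallest overlap extent is the cheapest axis along which to
    resolve the clip; add running clearance on top."""
    axis = min(range(3), key=lambda i: ext[i])
    return "XYZ"[axis], ext[axis] + clearance

plate = ((0.0, 0.0, 0.0), (80.0, 4.5, 90.0))
cbeam = ((-10.0, 4.0, -10.0), (1000.0, 44.0, 70.0))
ext = aabb_overlap(plate, cbeam)
if ext:
    axis, shift = suggest_shift(ext)
    print(f"clip extents {ext}; shift {axis} by {shift:.1f} mm to clear")
```

Here the plate overruns the beam by a 0.5 mm sliver in Y (plate-wide in X, beam-tall in Z), and the suggested fix is a 1.5 mm Y shift: the same shape of finding the gantry-plate case study in section 7 describes.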

4. A worked example

Let's write a CADCLAW test for an imagined gantry_corner sub-assembly: one C-beam at a fixed length, two NEMA23 motors, one motor-mount plate, one belt. We want to assert: parts are present, nothing clips, motors have a bracket nearby, and the mount plate is the right thickness.

from cadharness.harness import Harness
from cadharness.adjacency import AdjacencyRule
from cadharness.dimensional import DimRule

def test_gantry_corner():
    h = Harness("CAD/gantry_corner.step")

    h.add_inventory(
        labels={
            (40.0, 80.0, 1000.0): 'cbeam',
            (56.4, 56.4, 76.6):   'motor',
            (4.0, 80.0, 90.0):    'mount',
        },
        expected={'cbeam': 1, 'motor': 2, 'mount': 1, 'belt': 1},
    )

    h.add_interference(skip_labels={'belt'}, min_clearance_mm=1.0)

    h.add_adjacency(rules=[
        AdjacencyRule(source='motor', target='mount', max_distance=80),
    ])

    h.add_dimensional(rules=[
        DimRule(label='mount', thin_axis=4.0, thin_tol=0.3),
    ])

    report = h.run()
    assert report.passed, str(report)

Drop that in tests/test_gantry.py, run pytest tests/test_gantry.py, and you have continuous integration for the gantry-corner assembly.

When it fails, the report is structured. Suppose the engineer authored mount with the wrong thickness and re-exported the STEP. The output is:

CAD HARNESS REPORT - FAILED
  [PASS] inventory
  [PASS] interference
  [PASS] adjacency
  [FAIL] dimensional
         thin axis 5.0mm, expected 4 +/- 0.3mm

If the engineer instead shifted the mount plate so it clips a beam, the interference gate fires:

  [FAIL] interference
         mount at (1495, 540, 366) clips cbeam
         shift +Y to clear with running clearance

The fix vector is in the report. The engineer applies the shift in Fusion, re-exports, re-runs. The cycle is the same one a software engineer runs against a unit-test failure: read the assertion, fix the cause, rerun until green. The difference is that the assertion is over geometry, and the cost of skipping it is a fabricated part rather than a runtime exception.

This is the workflow described under "place authored parts; do not generate them" in CADCLAW's AGENTS.md. The engineer authors geometry in their CAD package of choice (Fusion, Rhino, FreeCAD, SolidWorks); a CadQuery script places copies and emits a unified STEP; CADCLAW asserts against the unified STEP. CADCLAW does not draw parts. It checks them.

5. CADCLAW in CI (GitHub Actions)

CADCLAW is pip install cadclaw and a single console script. That makes the GitHub Actions configuration boring, which is the point.

# .github/workflows/cad-check.yml
name: CAD assembly validation

on:
  pull_request:
    paths:
      - 'CAD/**.step'
      - 'CAD/**.py'
      - 'cadclaw.yaml'
      - 'bom/data.json'
  push:
    branches: [main]

jobs:
  cadclaw:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4
        with:
          lfs: true   # STEP files are usually in LFS

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install CADCLAW
        run: |
          python -m pip install --upgrade pip
          python -m pip install cadclaw

      - name: Verify environment
        run: cadclaw doctor

      - name: Run validation harness
        run: cadclaw harness --rules cadclaw.yaml --report-format md -o cadclaw-report.md

      - name: BOM-vs-CAD audit
        run: cadclaw bom-audit --rules cadclaw.yaml

      - name: Upload report artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: cadclaw-report
          path: cadclaw-report.md

A few notes on this workflow. The paths: filter ensures the harness only runs on PRs that actually touch geometry, BOM, or rules, not on a README typo. cadclaw doctor runs first because in our experience the most common CI failure is a CadQuery / OCC version skew, and doctor reports it in plain English instead of an opaque BRep traceback. The --report-format md flag emits a Markdown report you can post as a PR comment via a follow-up step (omitted here for brevity). The if: always() on the artifact upload preserves the report on failure, which is exactly when you want it.

Exit codes follow pytest: 0 is green, 1 is a fail, 2 is warn-only (the harness ran; nothing blocked but WARN findings exist), 3 is an internal error. Branch protection on main should require the cadclaw job to pass; that gives mechanical engineering the same merge gate that software engineering has had for decades.

The M3-CRETE concrete-printer project uses exactly this pattern: every CAD pull request triggers cadclaw harness against the assembly STEP, and a failing harness blocks merge in the same way a failing pytest would. Section 7 describes what that catches in practice.

6. The MCP server

The fastest-growing class of CAD-editing collaborator in 2026 is not a human. AI coding assistants (Claude, OpenAI's Operator, Google's coding agents, and the broader cohort that has emerged since 2024) are now routinely editing CadQuery scripts, build123d files, and assembly drivers. Without a check layer, that workflow is dangerous. Field testing on real hardware projects has documented sessions where parametric plate generation produced motor-mount hole patterns that were uniformly misaligned with the assembly; after repeated rounds of generating, critiquing, and stripping code, the session ended with less code than it started with. That experience is the reason AGENTS.md exists in the CADCLAW repository.

The MCP server is the integration layer that makes the AI-assisted workflow safe. CADCLAW ships cadclaw_mcp, a Model Context Protocol server that exposes every gate as a callable tool, so an assistant connected over MCP can run the same interference, adjacency, dimensional, tolerance, and kinematic checks the CLI runs and receive structured findings in return.

Configuration is a few lines in the host's MCP config. For Claude Code:

{
  "mcpServers": {
    "cadclaw": {
      "command": "python",
      "args": ["-m", "cadclaw_mcp"],
      "cwd": "/path/to/CADCLAW"
    }
  }
}

The protocol is open. Any MCP-compatible host (Claude Desktop, Claude Code, Cursor, and a growing list of others) can drive the harness without code generation.

The crucial design decision: the MCP server only exposes CADCLAW's own checks. It does not give the assistant access to your CAD application, your file system outside the loaded STEP, or anything outside the harness. The assistant cannot edit your Fusion model. It can validate the STEP export, propose a fix vector from interference.suggest_shift_mm, and ask you to apply it. The AI is a reader and reasoner; the human is the author. This is the same separation of concerns that makes CI safe in software: the test runner is sandboxed from the codebase it tests.

The full loop is: human prompt, assistant edits a CadQuery placement script, CADCLAW regenerates the assembly STEP, MCP check_interference reports findings, assistant interprets and proposes the next edit, human approves. CADCLAW is the runtime that keeps that loop honest. Without it, the assistant is generating geometry that may or may not fit; with it, every iteration passes through the same gates a human-authored design would.

7. Case study: validating M3-CRETE

CADCLAW was developed alongside M3-CRETE, Sunnyday Technologies' open-source concrete 3D printer, and that project is the published deployment we have permission to cite. M3-CRETE is exactly the assembly density at which manual visual review breaks down: a pallet-scale gantry with NEMA23 motors on every motion axis, V-wheels on both faces of the X-rail, anti-racking belt drives on Y, self-tramming belt drives on Z, and a forest of mounts, brackets, and shims. The pattern generalises to any mechanical-hardware project with a non-trivial part count and a CAD-driven manufacturing pipeline.

The most recent CADCLAW field test ran against the M3-CRETE v0.6 release, a real project rather than a fixture, and exercised the harness end-to-end. The findings were structured around what worked, what false-positived, and what surfaced during cleanup.

Verified working (the regression-test surface):

Interference families caught before fabrication. The canonical CADCLAW success on M3-CRETE is documented in the v0.6 field-test report and the related session log. An X-carriage gantry plate, after a series of edits to add the X-axis motor mount, ended up clipping the rear X-rail. The clip was a sliver in Y, plate-wide in X, motor-tall in Z, visually invisible from the default Fusion view, and geometrically certain to bind during assembly. The interference gate fired with structured evidence (overlap volume, bbox tuples, suggested shift axis, suggested shift distance), the engineer applied the shift in Fusion, re-saved, re-ran the harness, and watched the gate go green. Total time from finding to fix: minutes. Time to discover the same bug at the fabricator: at minimum a partial rebuild, more likely a custom shim and a multi-day delay.

That single finding paid for the project. The same gate has caught related interference families across M3-CRETE's history: cantilever-pickoff plates running into rails, motor mounts clipping their own brackets after an offset refactor, splice connectors sitting proud of the wheel path, and Z-carriage versus cross-brace clearance at extremes of travel. Many of these were not user-edits; they were drift. A Fusion visibility-toggle export that silently dropped a part. A CadQuery placement script that picked up an off-by-one constant from a refactor. A BOM update that didn't match the geometry it was supposed to describe. This is the everyday output of a real mechanical-design process, and CADCLAW catches it because the harness runs every commit, not just the ones a human remembers to check.

False-positives the field test surfaced (and the next-version backlog). The v0.6 field test caught its own bugs, too, and that is the right kind of honesty for a CI tool. forbidden_terms substring matching flagged the BOM's intentional anti-substitution warnings ("do not substitute the belt used on Y/Z here") because the dumb match treated negation and assertion as equivalent. claim_audit.stale_terms flagged the README's CC BY-SA 4.0 attribution to OpenBuilds, deletion of which would itself be a license violation. The default forbidden_absolute "validated" flagged a sentence that described a third-party service's training data, not an M3-CRETE claim. cad.count_mismatch reported per-rule against the same label, producing redundant findings rather than one aggregated diff. Each of these is in scope for the v0.7 milestone (graduated negation-aware matching, attribution-block exemption, configurable absolute terms, and rule aggregation).

A new finding surfaced during cleanup that the field test had not anticipated: the BOM legitimately specifies a small spare-part quantity on top of the design count for some line items (because some suppliers only sell in pairs, or to leave a margin for kit packaging), and the v0.6 rule schema conflates design count with order count. Workarounds against the v0.6 surface failed, and the right fix is a rule-schema change adding expected_design_qty and spare_qty fields. That is now the v0.7 MED-6 line.

The pattern generalises. Anywhere the cost of a fabrication mistake is more than the cost of a CI run, CADCLAW is positive expected value: aerospace bracketry, robotics chassis, custom industrial fixtures, optical mounts, surgical instruments, prosumer 3D printers, machine-tool fixtures, and bench-scale research apparatus. The M3-CRETE field test is the one we publish because we own it; the gates are the same regardless of the assembly's domain.

8. Where CADCLAW is going

Active backlog items for the next milestone include: STEP AP242 PMI attribute support, so semantic dimensions and tolerances embedded in the STEP file can drive the dimensional and tolerance gates directly; parametric assembly diffing, so a pull request can show exactly which parts moved between commits; expanded kinematic gates for rotational axes and ball-screw drives; the aggregation and negation-aware matching items from the recent field test; and tighter integration with the BOM-audit gate, so a single rule file can describe the complete mechanical CI surface.

Further out, the most interesting unknowns are not technical but social. CAD CI works well for an open-source hardware project with a coherent CadQuery placement layer and a Git-tracked STEP. It does not yet have a clean answer for shops that operate entirely inside SolidWorks PDM, where the canonical model is a binary native file and the STEP is an export artefact rather than a source of truth. The honest answer there is the same as for any CI tool: validation runs against the artefact you ship, and if your artefact diverges from your authoring, you have a process problem upstream of the test runner. CADCLAW's parity gate, which compares two STEP exports for signature drift, is the small first step toward catching that divergence in CI rather than at the fabricator. Track the roadmap in the CADCLAW repository's HANDOFF.md and the GitHub milestones.
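The parity idea fits in a few lines. This sketch compares two exports by rounded sorted-bbox signatures; the signature scheme and tolerance handling are assumptions for illustration, not the gate's real implementation:

```python
from collections import Counter

def signature_drift(parts_a, parts_b, tol=0.1):
    """Compare two exports (each a list of bbox (dx, dy, dz) tuples) and
    report which part signatures appeared or disappeared between them."""
    def sig(parts):
        # Sort each bbox so orientation doesn't matter, then snap to the
        # tolerance grid so sub-tolerance noise doesn't register as drift.
        return Counter(tuple(round(d / tol) * tol for d in sorted(bbox))
                       for bbox in parts)
    a, b = sig(parts_a), sig(parts_b)
    return {"missing": sorted(a - b), "extra": sorted(b - a)}

before = [(40.0, 80.0, 1000.0), (4.0, 80.0, 90.0)]
after_export = [(40.0, 80.0, 1000.0)]       # plate silently dropped on re-export
print(signature_drift(before, after_export))
```

A Fusion visibility-toggle export that silently drops a part, the drift failure mode from section 7, shows up as a non-empty "missing" list the moment the new STEP lands in a pull request.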

Install

pip install cadclaw
cadclaw doctor                              # verify environment
python examples/init_rules.py --step my.step --bom bom.json   # scaffold cadclaw.yaml
cadclaw harness --rules cadclaw.yaml        # run every declared gate
cadclaw bom-audit --rules cadclaw.yaml      # run the BOM-vs-CAD audit standalone

A handful of lines from a clean Python 3.10+ environment to a working CAD CI pipeline. Distributed via PyPI, source on GitHub, MIT-licensed, no commercial CAD software required for CADCLAW's own checks.

Cite this work

If you use CADCLAW in published research or derivative work, please cite via the project's CITATION.cff:

Sonnentag, N. (2026). CADCLAW: Automated validation framework for
STEP-based CAD assemblies [Software]. Sunnyday Technologies.
https://github.com/sunnyday-technologies/CADCLAW
DOI: 10.5281/zenodo.19647391

A CITATION.cff file is shipped in the repository for automated citation tooling (Zenodo, GitHub's citation widget, JOSS).
