
How AuditProven Shield Works

From Documents to Audit-Ready Reports in Eight Steps

AuditProven Shield operates as a deterministic pipeline. Every step is reproducible, traceable, and verifiable. There is no randomness, no generative AI hallucination, and no manual intervention required between upload and delivery.


Step 1: Upload Your Documents

Upload your organization's policy documents in PDF, DOCX, or plain text format. Shield accepts security policies, employee handbooks, incident response plans, vendor management procedures, change management documentation, and any other document that describes how your organization implements controls.

What happens technically: Each document is parsed into structured sections. PDF files are processed with layout-aware extraction that preserves tables and reading order. DOCX files are parsed with heading hierarchy preservation. Every extracted section receives a SHA-256 hash — this is the first link in the provenance chain.
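The hashing step can be sketched in a few lines. The exact fields Shield folds into the digest are not public, so the `doc_id`/`section_path` layout below is an assumption for illustration:

```python
import hashlib

def section_hash(doc_id: str, section_path: str, text: str) -> str:
    # Normalize whitespace so re-parsing the same document yields the
    # same digest, then hash the section's identity plus its content.
    # (Field layout is illustrative, not Shield's actual scheme.)
    payload = "\x1f".join([doc_id, section_path, " ".join(text.split())])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same input always produces the same 64-character hex digest, which is what makes the chain reproducible and verifiable later.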

You see: A list of uploaded documents with section counts, total pages, and a confirmation that parsing completed successfully.


Step 2: Select Your Framework

Choose one or more compliance frameworks to assess against: SOC 2, ISO 27001, GDPR, HIPAA, PCI DSS, or NIST CSF. For multi-framework assessments, Shield identifies where a single control satisfies requirements across multiple frameworks.

What happens technically: The compliance knowledge graph loads the selected framework's requirement nodes — between 25 and 120 individual requirements depending on the framework. Each requirement carries a detailed definition describing what must be done, how it is typically implemented, and what evidence auditors examine.
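A requirement node can be pictured as a small record; the field names and the sample values below are illustrative, not Shield's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirement:
    framework: str        # e.g. "SOC 2" or "ISO 27001"
    req_id: str           # identifier style varies by framework
    definition: str       # what must be done
    implementation: str   # how it is typically implemented
    evidence: str         # what auditors examine

# Hypothetical example node (wording is illustrative):
req = Requirement(
    framework="SOC 2",
    req_id="CC6.1",
    definition="Logical access to systems is restricted to authorized users.",
    implementation="Role-based access control with periodic access reviews.",
    evidence="Access review records, IAM policy exports, offboarding tickets.",
)
```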

You see: The framework name, version, total requirement count, and a brief scope description.


Step 3: Statement Extraction

Shield reads every sentence in every uploaded document and classifies it by compliance function.

What happens technically: Sentences are classified into four categories using pattern matching on modal verbs, control indicators, and evidence markers. "All employees must use MFA" is classified as an OBLIGATION with MUST strength. "The company implements RBAC" is classified as a CONTROL. "As documented in the audit log" is classified as an EVIDENCE_REF. Everything else is NARRATIVE context.
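A minimal version of this rule-based classifier looks like the sketch below. The pattern lists and the precedence order are assumptions; Shield's real vocabulary of modal verbs, control indicators, and evidence markers is larger:

```python
import re

# Hypothetical pattern lists; the production vocabulary is larger.
OBLIGATION = re.compile(r"\b(must|shall|required to)\b", re.IGNORECASE)
CONTROL = re.compile(r"\b(implements?|enforces?|maintains?)\b", re.IGNORECASE)
EVIDENCE = re.compile(r"\b(as documented in|audit log|recorded in)\b", re.IGNORECASE)

def classify(sentence: str) -> str:
    # Assumed precedence: obligations first, then controls, then
    # evidence references; anything unmatched is narrative context.
    if OBLIGATION.search(sentence):
        return "OBLIGATION"
    if CONTROL.search(sentence):
        return "CONTROL"
    if EVIDENCE.search(sentence):
        return "EVIDENCE_REF"
    return "NARRATIVE"
```

Because the rules are fixed patterns rather than a generative model, the same sentence is always classified the same way.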

You see: A summary showing how many obligations, controls, evidence references, and narrative sentences were extracted, along with the compliance keywords found.


Step 4: Domain Classification

Each extracted statement is classified into one of twenty compliance domains: access control, encryption, incident response, business continuity, change management, asset management, HR security, physical security, vendor risk, data classification, logging and monitoring, network security, application security, identity management, privacy, risk assessment, security awareness, vulnerability management, configuration hardening, and backup and recovery.

What happens technically: Classification uses TF-IDF cosine similarity against the compliance knowledge graph's domain definitions, boosted by domain-specific keyword matching. Each statement receives a domain assignment with a confidence score.
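The similarity scoring can be stripped down to the sketch below, which uses raw term-frequency cosine against three hypothetical domain definitions; the IDF weighting and the keyword boost are omitted for brevity:

```python
import math
from collections import Counter

# Hypothetical, heavily abbreviated domain definitions.
DOMAINS = {
    "access_control": "access authorization mfa role least privilege account",
    "encryption": "encryption tls aes key cryptographic cipher",
    "incident_response": "incident response breach triage escalation",
}

def vec(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify_domain(statement: str):
    # Score the statement against every domain definition and return
    # the best match with its confidence score.
    sv = vec(statement)
    scores = {d: cosine(sv, vec(defn)) for d, defn in DOMAINS.items()}
    best = max(scores, key=scores.get)
    return best, round(scores[best], 3)
```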

You see: A breakdown of statements by domain, showing which areas of your documentation are well-covered and which are sparse.


Step 5: Requirement Mapping

Classified statements are mapped to specific framework requirements. Your access control obligation is matched to the specific SOC 2, ISO 27001, or HIPAA requirement it addresses.

What happens technically: TF-IDF cosine similarity compares each classified statement against the definitions of all requirements in the selected framework. Matches receive a confidence score: HIGH (above 0.6), MEDIUM (0.3 to 0.6), or LOW (below 0.3). The mapping records which document, which section, and which sentence addresses which requirement.
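Banding the similarity scores and recording the mapping is straightforward; this sketch uses the thresholds stated above, while the shape of the mapping record is an assumption:

```python
def confidence_band(score: float) -> str:
    # Thresholds from the pipeline description:
    # HIGH above 0.6, MEDIUM from 0.3 to 0.6, LOW below 0.3.
    if score > 0.6:
        return "HIGH"
    if score >= 0.3:
        return "MEDIUM"
    return "LOW"

def map_statement(statement_id, requirement_id, score, doc, section):
    # Each mapping records exactly where the statement came from,
    # feeding the provenance chain. (Record shape is illustrative.)
    return {"statement": statement_id, "requirement": requirement_id,
            "score": score, "confidence": confidence_band(score),
            "source": {"document": doc, "section": section}}
```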

You see: A mapping table showing each requirement and whether it is addressed, weakly addressed, or not addressed. Each mapping shows the source document and section.


Step 6: Gap Analysis

Every framework requirement that has no matching policy statement is identified as a gap. Requirements with only low-confidence matches are flagged as weak. Controls with no evidence references are flagged as needing evidence.

What happens technically: Gaps are scored by risk propagation. A gap in access control affects downstream requirements in encryption, monitoring, and vendor management, so the risk score reflects how many other controls depend on the missing one. Priority is assigned automatically: CRITICAL (risk above 0.8), HIGH (above 0.5), MEDIUM (above 0.2), and LOW (0.2 or below).
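Risk propagation can be modeled as a walk over a control dependency graph. In the sketch below the edge set and the base/weight constants are hypothetical, while the priority thresholds are the ones stated above:

```python
def downstream(domain, edges):
    # Transitive closure: every control that directly or indirectly
    # depends on `domain`. `edges` maps control -> upstream controls.
    hit, frontier = set(), {domain}
    while frontier:
        frontier = {d for d, ups in edges.items()
                    if frontier & set(ups)} - hit
        hit |= frontier
    return hit

def gap_priority(domain, edges, base=0.3, weight=0.15):
    # base and weight are illustrative constants, not Shield's.
    risk = min(1.0, base + weight * len(downstream(domain, edges)))
    if risk > 0.8:
        return "CRITICAL"
    if risk > 0.5:
        return "HIGH"
    if risk > 0.2:
        return "MEDIUM"
    return "LOW"
```

A gap with many downstream dependents (such as access control) is pushed up the priority list, while an isolated gap stays lower even if it matches no policy statement at all.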

You see: A prioritized gap list with severity color coding, affected downstream controls, and suggested remediation for each gap. Remediation text is drawn from the requirement's own implementation guidance in the knowledge graph.


Step 7: Report Generation

Shield generates the complete compliance report with ten sections.

What happens technically: Control narratives are composed using template-based generation with strict caps — each template pattern is used at most twice per section to prevent repetitive language. Every generated sentence passes an eleven-pattern garble detection gate before inclusion. Every sentence receives a provenance record linking it to the source document, section, page, and requirement. If all templates for a category are exhausted, a plain-but-grammatical fallback is used. The complete set of provenance records is sealed into a Merkle tree.
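Sealing the provenance records into a Merkle tree looks roughly like this. Duplicating the last node on odd-sized levels is one common convention; whether Shield uses that convention, or this exact leaf encoding, is an assumption:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records: list) -> bytes:
    # Hash each provenance record into a leaf, then combine pairwise
    # until a single root remains. Any change to any record changes
    # the root, which is what makes the report tamper-evident.
    level = [sha256(r) for r in records] or [sha256(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate odd node (assumed convention)
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```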

You see: The full ten-section report, including executive summary, control narratives, gap analysis, evidence matrix, risk assessment, and remediation plan.


Step 8: Export and Verify

Download your compliance package in your preferred format. Verify the provenance chain.

What happens technically: The report is exported as JSON (machine-readable), PDF (print-ready with professional formatting), DOCX (editable with heading styles and TOC), or XLSX (evidence matrix as a filterable spreadsheet). The provenance appendix contains the Merkle root hash and all individual claim hashes.
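Checking a single claim against the Merkle root needs only the claim's hash and its sibling hashes along the path to the root (a Merkle inclusion proof); the `(side, sibling)` proof layout below is an assumption:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_claim(claim_hash: bytes, proof, root: bytes) -> bool:
    # Recompute the path from the claim's leaf hash up to the root.
    # `proof` is a list of (side, sibling_hash) pairs, leaf to root;
    # `side` says which side the sibling sits on at that level.
    node = claim_hash
    for side, sibling in proof:
        node = sha256(sibling + node) if side == "left" else sha256(node + sibling)
    return node == root
```

This is why verification does not require trusting Shield: anyone holding the report, the root hash, and a claim's proof can recompute the chain independently.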

You see: Download buttons for each format, the Merkle root hash, and a verification tool where you can check any individual claim's provenance chain.


The Provenance Guarantee

At the end of this process, every sentence in your compliance report can answer the question: "Where did this come from?" The answer is not "a model generated it." The answer is a cryptographic proof pointing to a specific section of a specific document that you uploaded.

This is the AuditProven Shield provenance guarantee. It is what makes our reports fundamentally different from any AI-generated compliance document.