Data Integrity Requirements in Compliance Verification
Data integrity requirements establish the conditions under which records, measurements, and reported values produced during compliance verification can be trusted to accurately represent the underlying operational reality. This page covers the regulatory foundations of data integrity obligations, the mechanisms through which verifiers assess whether data meets those standards, and the decision criteria used when data quality is in dispute. Across federal environmental, healthcare, and financial frameworks, failures of data integrity have produced enforcement actions, invalidated verification outcomes, and exposed organizations to substantial liability.
Definition and scope
Data integrity, in the compliance verification context, refers to the completeness, consistency, accuracy, and traceability of information submitted to or relied upon by a verifying body. The U.S. Food and Drug Administration (FDA) defines data integrity as the extent to which all data are complete, consistent, and accurate throughout the data lifecycle — a formulation that has been adopted by inspection programs under 21 CFR Parts 210 and 211 for pharmaceutical manufacturers. The U.S. Environmental Protection Agency (EPA) applies analogous standards under 40 CFR Part 75, which governs continuous emissions monitoring systems (CEMS) data submitted to agency reporting portals.
Scope encompasses the full lifecycle of a data record: initial capture, storage, transmission, transformation, and archival. A record that is accurate at the point of measurement but altered — even inadvertently — during aggregation or reporting fails to meet integrity standards. Documentation requirements for compliance verification interact directly with data integrity obligations because the record-keeping format itself determines whether a trail sufficient for independent reconstruction exists.
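One common technical control for detecting alteration between capture and reporting is to fingerprint each record at the point of measurement and recompute the fingerprint at later lifecycle stages. The sketch below is illustrative only: the record fields are hypothetical and no agency mandates this particular scheme.

```python
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Compute a stable SHA-256 digest of a record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical record captured at the point of measurement.
captured = {"unit": "stack_1", "param": "SO2", "value": 41.7, "ts": "2024-03-01T06:00Z"}
digest_at_capture = fingerprint(captured)

# Before reporting, recompute and compare: any change, even an inadvertent
# one introduced during aggregation, produces a different digest.
reported = dict(captured)
assert fingerprint(reported) == digest_at_capture   # unaltered: matches

reported["value"] = 39.9                            # altered in transit
assert fingerprint(reported) != digest_at_capture   # alteration detected
```

Canonicalizing the record (sorted keys, fixed separators) before hashing matters: otherwise two byte-level representations of the same logical record would yield different digests.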
Data integrity requirements apply across two broad categories:
- Electronic records and signatures — governed at the federal level by 21 CFR Part 11 (FDA) and by National Institute of Standards and Technology (NIST) SP 800-53 controls for federal information systems.
- Physical and observational records — governed by agency-specific field data collection protocols, chain-of-custody standards, and laboratory accreditation requirements under programs such as the EPA's National Environmental Laboratory Accreditation Program (NELAP).
How it works
Verifiers assess data integrity through a structured evaluation that proceeds in discrete phases:
- Record identification — Cataloging all data streams relevant to the compliance obligation, including raw sensor outputs, manually entered values, calculated fields, and third-party submissions.
- Metadata and audit trail review — Examining system-generated logs for timestamps, user identifiers, edit histories, and access records. Under 21 CFR Part 11, audit trails must be computer-generated and must capture date, time, and the identity of the operator for any modification of an electronic record.
- Gap and anomaly detection — Comparing reported values against independent reference data, process parameters, or contemporaneous operational records. EPA Method 19 under 40 CFR Part 60, for example, provides calculation procedures that verifiers use to cross-check reported stack flow rates against fuel use data.
- Chain of custody confirmation — Tracing physical samples or electronic datasets from origin to final reporting destination. Chain-of-custody verification is a distinct but closely related discipline that overlaps with data integrity review when physical specimens are involved.
- Corrective action assessment — Determining whether anomalies represent isolated recording errors or systemic control failures. The distinction governs whether a finding triggers a minor nonconformance or a material finding requiring escalation. Nonconformance findings in verification are classified partly on the basis of whether underlying data integrity was compromised.
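The metadata and audit trail review phase above lends itself to automated screening. The following sketch checks each modification entry for the attribution elements 21 CFR Part 11 expects (date, time, operator identity); the field names (`user`, `ts`, `field`, `old`, `new`) are a hypothetical schema, not any agency's data layout.

```python
# Attribution elements each modification entry is expected to carry.
# Field names here are hypothetical, not a regulatory schema.
REQUIRED = ("user", "ts", "field", "old", "new")

def review_audit_trail(entries):
    """Return (index, missing_fields) findings for entries lacking attribution."""
    findings = []
    for i, entry in enumerate(entries):
        missing = [k for k in REQUIRED if not entry.get(k)]
        if missing:
            findings.append((i, missing))
    return findings

trail = [
    {"user": "analyst7", "ts": "2024-02-03T10:14Z",
     "field": "result", "old": "98.2", "new": "98.4"},
    {"user": "", "ts": "2024-02-03T10:20Z",
     "field": "result", "old": "98.4", "new": "99.1"},  # no operator identity
]
print(review_audit_trail(trail))  # → [(1, ['user'])]
```

A finding from a screen like this would feed the corrective action assessment phase, where the verifier decides whether the missing attribution is an isolated lapse or a systemic control failure.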
Evidence standards in compliance verification set the threshold above which data anomalies become material. A verifier cannot arbitrarily reject data; rejection must be grounded in a documented finding that the data fails a specific integrity criterion.
Common scenarios
Pharmaceutical manufacturing — FDA 483 observations frequently cite data integrity deficiencies including overwriting of raw chromatography data, deletion of out-of-specification test results, and shared login credentials that prevent attribution of data entries to individual analysts. The FDA's 2018 guidance document on data integrity and CGMP compliance identifies these as violations of 21 CFR 211.68 and 211.194.
Environmental emissions reporting — Facilities subject to EPA's Acid Rain Program under 40 CFR Part 75 must submit quarterly emissions data via the EPA's Electronic Data Reporting (EDR) system. Substitution data codes, which replace missing CEMS measurements with conservative values per the regulation's missing data substitution hierarchy, are themselves subject to integrity review — verifiers confirm that facilities applied the correct substitution tier rather than selecting more favorable values.
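The verifier's check described above, confirming that a facility applied the correct substitution tier rather than a more favorable value, can be sketched as follows. This is a deliberate simplification: Part 75's actual missing-data hierarchy is considerably more detailed, and the tiers, thresholds, and lookback logic here are illustrative assumptions only.

```python
# Hypothetical simplification of a missing-data substitution review.
# The 95% availability cutoff and lookback rules are illustrative,
# not the actual Part 75 hierarchy.
def required_substitute(availability_pct: float, hourly_history: list) -> float:
    """Return the minimum substitute value the hierarchy would require."""
    if availability_pct >= 95.0:
        # High monitor availability: an averaging-based substitute suffices.
        return sum(hourly_history) / len(hourly_history)
    # Lower availability: conservative maximum from the lookback period.
    return max(hourly_history)

history = [41.2, 39.8, 44.5, 40.1]
required = required_substitute(92.0, history)  # below 95% → lookback maximum
assert required == 44.5

# Verifier check: the reported substitute must be at least as conservative.
reported_substitute = 40.1
flagged = reported_substitute < required       # True → under-substitution finding
```

The verification question is directional: the facility may report a value more conservative than required, but a less conservative one is exactly the "more favorable value" selection the review is designed to catch.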
Financial and federal grant compliance — The Office of Management and Budget (OMB) Uniform Guidance at 2 CFR Part 200 requires that expenditures of federal funds be supported by source documentation and that the supporting financial records be maintained for a minimum of 3 years from the date of submission of the final expenditure report (2 CFR § 200.334). Verifiers under Single Audit requirements assess whether grantees' accounting systems produce traceable, unaltered records.
Healthcare billing records — Centers for Medicare & Medicaid Services (CMS) audit contractors examine claims data for edit anomalies, duplicate billing patterns, and mismatches between clinical documentation and billed service codes, all of which are treated as potential data integrity failures under the False Claims Act framework.
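A duplicate-billing pattern of the kind audit contractors screen for can be found by grouping claims on identifying fields and flagging repeated combinations. The claim fields below are a hypothetical layout for illustration, not a CMS data specification.

```python
from collections import Counter

# Illustrative duplicate-billing screen; the claim fields are a
# hypothetical layout, not a CMS data specification.
claims = [
    {"beneficiary": "B001", "code": "99213", "date": "2024-01-10"},
    {"beneficiary": "B001", "code": "99213", "date": "2024-01-10"},  # duplicate
    {"beneficiary": "B002", "code": "99214", "date": "2024-01-10"},
]

counts = Counter((c["beneficiary"], c["code"], c["date"]) for c in claims)
duplicates = [key for key, n in counts.items() if n > 1]
print(duplicates)  # → [('B001', '99213', '2024-01-10')]
```

A hit from a screen like this is a potential integrity failure, not a conclusion: the contractor still compares the flagged claims against clinical documentation before treating the pattern as improper billing.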
Decision boundaries
The central classification question in data integrity review is whether a data deficiency constitutes a recording error or a data falsification. These categories carry fundamentally different regulatory consequences.
A recording error — such as a transposed digit in a manual entry that is correctable through the contemporaneous raw record — typically results in a corrected submission and a minor finding. Data falsification — the intentional deletion, alteration, or backdating of records to misrepresent compliance status — triggers enforcement referral and may invoke penalties for false verification claims under statutes including 18 U.S.C. § 1001 (false statements to federal agencies) and the Clean Air Act's criminal provisions at 42 U.S.C. § 7413(c).
A second boundary governs materiality: not every data anomaly invalidates a verification outcome. Materiality in compliance verification determines whether a data gap is large enough to alter the verifier's overall conclusion. Quantitative materiality thresholds differ by program — EPA's Greenhouse Gas Reporting Rule at 40 CFR Part 98 specifies that missing data substitution values must not exceed 20% of annual operating hours for certain source categories before triggering additional reporting obligations.
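The 20%-of-operating-hours screen described above reduces to a simple ratio test. The function and variable names below are illustrative, not regulatory terms.

```python
# Sketch of the materiality screen for substitute data; names are
# illustrative, not regulatory terminology.
def exceeds_substitution_threshold(substituted_hours: int,
                                   operating_hours: int,
                                   threshold: float = 0.20) -> bool:
    """True if substitute data covers more than the allowed share of hours."""
    return substituted_hours / operating_hours > threshold

assert not exceeds_substitution_threshold(1500, 8000)  # 18.75%: within limit
assert exceeds_substitution_threshold(1700, 8000)      # 21.25%: triggers reporting
```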
A third boundary distinguishes systemic from isolated integrity failures. A single misfiled record in a 12-month dataset is isolated. A pattern of missing audit trails across 8 of 12 production batches in a pharmaceutical facility is systemic — requiring root cause analysis and corrective action under CAPA frameworks before a verification opinion can be issued. Corrective action and verification follow-up protocols govern how systemic findings are resolved before a compliance status determination is finalized.
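The systemic-versus-isolated boundary can be expressed as a rate test over the affected records, though in practice the cutoff is a matter of verifier judgment rather than a fixed number. The 25% rate below is an illustrative assumption, not a regulatory figure.

```python
# Hypothetical classifier for the systemic/isolated boundary; the 25%
# cutoff is an illustrative assumption, not a regulatory figure.
def classify_failures(affected: int, total: int,
                      systemic_rate: float = 0.25) -> str:
    """Classify integrity failures by the share of records affected."""
    if affected == 0:
        return "none"
    return "systemic" if affected / total >= systemic_rate else "isolated"

assert classify_failures(1, 12) == "isolated"  # single misfiled record
assert classify_failures(8, 12) == "systemic"  # missing trails in 8 of 12 batches
```

A "systemic" result in this sketch corresponds to the root-cause-analysis path described above: the verification opinion waits until CAPA-style corrective action resolves the pattern.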