Data intake + governance checklist

The operational gate before any institutional cohort enters modeling.

This checklist turns governance and intake into an explicit workflow with checks at each stage: before transfer, after receipt, during the linkage audit, during the schema audit, and before any dataset is allowed into the analytical stack.

Core principle

The project does not need direct identifiers or full health-system integration to begin. It does need stable pseudonymized linkage, explicit date logic, documented units and device lineage, and a narrow scope for the first pilot.

Pseudonymized linkage
Explicit date logic
Units and lineage
No fabricated joins

Downloads

Markdown checklist
JSON summary
Pilot protocol
Data requirements pack

Stages

Intake should be a gate, not an assumption

A dataset should not enter modeling merely because the transfer succeeded. It should enter only after linkage, units, scope, and governance have each passed an explicit review.

Before transfer

named data owner
cohort description
transfer path
DUA or approval path

File receipt

manifest match
delivery integrity
source system logged
raw delivery preserved read-only
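
The manifest-match and delivery-integrity items can be checked mechanically. The following is a minimal sketch, assuming the data owner supplies a manifest that maps each relative file name to a SHA-256 digest. The manifest layout and the names `sha256_of` and `check_delivery` are illustrative assumptions, not part of the checklist.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large deliveries stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_delivery(delivery_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files on disk with a manifest of {relative_name: expected_sha256}.

    The manifest layout is an assumption made for illustration.
    """
    present = {
        p.relative_to(delivery_dir).as_posix()
        for p in delivery_dir.rglob("*")
        if p.is_file()
    }
    expected = set(manifest)
    return {
        # Listed in the manifest but not delivered.
        "missing": sorted(expected - present),
        # Delivered but not listed in the manifest.
        "unexpected": sorted(present - expected),
        # Delivered and listed, but the digest does not match.
        "corrupted": sorted(
            name
            for name in expected & present
            if sha256_of(delivery_dir / name) != manifest[name].lower()
        ),
    }
```

A receipt passes this check only when all three lists are empty. The function only reads the delivery, which fits the requirement that the raw delivery be preserved read-only.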

Identity and linkage

stable `participant_uid`
explicit participant-night linkage
no fabricated joins
date logic documented
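
The linkage items above can be audited before any join is attempted. This is a sketch under stated assumptions: rows are plain dictionaries, the participant-night key is (`participant_uid`, `sleep_date`), and `sleep_date` is a hypothetical column name standing in for whatever date field the documented date logic defines. The function names are illustrative.

```python
from collections import Counter

# participant_uid comes from the checklist; sleep_date is an assumed column name.
KEY_FIELDS = ("participant_uid", "sleep_date")


def link_key(row):
    """Build the explicit participant-night key for one row."""
    return tuple(row.get(field) for field in KEY_FIELDS)


def audit_linkage(nights, reports):
    """Check that reports can join to nights on explicit keys only.

    Nothing here relies on row position, so a join that would
    depend on row order surfaces as incomplete or orphaned rows.
    """
    night_keys = [link_key(row) for row in nights]
    # Night rows with a missing or empty key component.
    incomplete = [i for i, key in enumerate(night_keys) if not all(key)]
    counts = Counter(key for key in night_keys if all(key))
    # Keys that appear more than once would make the join ambiguous.
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    known = set(counts)
    # Report rows whose key matches no night row.
    orphans = [i for i, row in enumerate(reports) if link_key(row) not in known]
    return {
        "incomplete_night_rows": incomplete,
        "duplicate_keys": duplicates,
        "orphan_report_rows": orphans,
        "passes": not (incomplete or duplicates or orphans),
    }
```

Any non-empty list is a reason to go back to the data owner rather than to repair the key locally, since a locally repaired key is a fabricated join.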

Schema and unit audit

field mapping complete
units normalized
device modality documented
score semantics documented
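
A field-mapping and unit check can be sketched as follows. The mapping layout (`{source_column: {"target": ..., "unit": ...}}`) and the convention that unitless fields declare an explicit marker such as `"dimensionless"` instead of leaving the unit blank are assumptions made for this example.

```python
def audit_schema(source_columns, field_map):
    """Report source columns with no mapping or no declared unit.

    field_map is assumed to look like
    {source_column: {"target": str, "unit": str}}.
    """
    unmapped = sorted(c for c in source_columns if c not in field_map)
    missing_unit = sorted(
        c
        for c in source_columns
        if c in field_map and not field_map[c].get("unit")
    )
    return {
        "unmapped": unmapped,
        "missing_unit": missing_unit,
        "passes": not (unmapped or missing_unit),
    }
```

This only confirms that a mapping and a unit are declared. Whether the declared unit is correct, and what a score actually means, still needs documentation from the source.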

Quality and feasibility

participant count
median nights
endpoint density
missingness summary
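
The four feasibility figures can be computed from the participant-night table alone. The sketch below assumes one dictionary per participant-night and treats endpoint density as the fraction of participant-nights with a non-missing endpoint value. That definition, the function name, and the treatment of `None` and empty strings as missing are assumptions for illustration.

```python
from collections import Counter
from statistics import median


def feasibility_summary(nights, endpoint_field, audited_fields):
    """Summarize cohort size, depth, endpoint density, and missingness."""
    total = len(nights)
    nights_per_participant = Counter(row["participant_uid"] for row in nights)

    def is_missing(value):
        # Assumed convention: None and empty string both count as missing.
        return value is None or value == ""

    def missing_fraction(field):
        if total == 0:
            return None
        return sum(is_missing(row.get(field)) for row in nights) / total

    endpoint_missing = missing_fraction(endpoint_field)
    return {
        "participant_count": len(nights_per_participant),
        "median_nights": (
            median(nights_per_participant.values())
            if nights_per_participant
            else None
        ),
        "endpoint_density": (
            None if endpoint_missing is None else 1 - endpoint_missing
        ),
        "missingness": {f: missing_fraction(f) for f in audited_fields},
    }
```

The summary describes the dataset but does not judge it. Thresholds for an acceptable participant count or endpoint density belong to the pilot scope, not to this function.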

Decision gate

`accept_for_pilot`
`accept_with_limits`
`hold_pending_clarification`
`reject_for_current_scope`
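
The four outcomes can be recorded as a closed set of values so that every dataset ends in exactly one state. The sketch below uses the outcome names from the list above. The precedence rule in `decide` (hard stops first, then open questions, then documented limits) is an assumption, not a rule stated by the checklist.

```python
from enum import Enum


class IntakeDecision(str, Enum):
    ACCEPT_FOR_PILOT = "accept_for_pilot"
    ACCEPT_WITH_LIMITS = "accept_with_limits"
    HOLD_PENDING_CLARIFICATION = "hold_pending_clarification"
    REJECT_FOR_CURRENT_SCOPE = "reject_for_current_scope"


def decide(hard_stops, open_questions, limits):
    """Map review findings to one decision.

    Assumed precedence: any hard stop rejects, any unanswered
    question holds, any documented limit restricts acceptance.
    """
    if hard_stops:
        return IntakeDecision.REJECT_FOR_CURRENT_SCOPE
    if open_questions:
        return IntakeDecision.HOLD_PENDING_CLARIFICATION
    if limits:
        return IntakeDecision.ACCEPT_WITH_LIMITS
    return IntakeDecision.ACCEPT_FOR_PILOT
```

A project might reasonably route some hard stops to a hold instead of a rejection when the data owner can fix them. The value of the explicit function is that the rule is written down once and applied the same way to every cohort.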

Red flags

These should stop the workflow

The point of a checklist is to block weak datasets early, not to rationalize them after transfer.

Hard stops

join would depend on row order
direct identifiers appear in analytical tables
no documented logic linking sleep date to report date
missing unit definitions for core physiology
only aggregate exports are available
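
One of these hard stops, direct identifiers in analytical tables, lends itself to a cheap first-pass scan of column names. The sketch below is a heuristic only: the pattern list is hypothetical, it inspects names and not values, and a clean result does not prove that a table is free of identifiers.

```python
import re

# Hypothetical name fragments; a real project would maintain a reviewed list.
IDENTIFIER_PATTERN = re.compile(
    r"(^|_)(name|email|phone|ssn|mrn|dob|address)(_|$)",
    re.IGNORECASE,
)


def find_identifier_columns(columns):
    """Return column names that look like direct identifiers."""
    return [c for c in columns if IDENTIFIER_PATTERN.search(c)]
```

For example, `find_identifier_columns(["participant_uid", "patient_name", "hr_mean"])` flags only `patient_name`. Anything flagged should stop the workflow until the column is removed or explained.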

Minimum required governance posture

pseudonymized subject IDs
no direct identifiers in analytical tables
narrow documented research scope
file-level provenance
raw and transformed outputs separated