Rare disease patient data is trapped in hundreds of siloed registries that can't talk to each other, making every clinical trial start from scratch
There are over 10,000 known rare diseases, and for most of them, the total body of clinical knowledge exists in fragmented registries run by individual hospitals, patient advocacy groups, pharma companies, and academic consortia. These registries use different data standards, different consent frameworks, different variable definitions, and different access policies. A researcher studying Niemann-Pick disease type C cannot combine data from the NIH registry, the European registry, the pharma-sponsored registry, and the patient advocacy registry into a single dataset without years of negotiation and manual harmonization. The result is that each new clinical trial for a rare disease must re-recruit patients and re-collect natural history data from scratch, as if previous research never happened.
This fragmentation directly harms patients. Families are asked to fill out the same intake questionnaires, provide the same medical records, and undergo the same assessments for multiple overlapping registries -- a burden that the Orphanet Journal of Rare Diseases has documented as increasingly unsustainable. Meanwhile, researchers can't answer basic questions about disease progression because the data sits in five places, none of which has enough patients on its own for an adequately powered analysis, and none of which can be legally or technically combined with the others. Drug developers can't use existing registry data as external control arms for clinical trials because the data wasn't collected with consistent endpoints.
The structural cause is a collective action problem. Each stakeholder has rational reasons to maintain its own registry: pharma companies want proprietary competitive advantage, academic groups need to control data for publications, patient groups want to maintain trust with their community, and hospitals face HIPAA and GDPR constraints on sharing. The GA4GH (Global Alliance for Genomics and Health) has been working on data-sharing standards for rare diseases in line with the FAIR data principles, but adoption is voluntary and slow. The European Reference Networks have attempted cross-border registry harmonization but face persistent challenges with non-interoperable systems. Until someone solves the incentive problem -- making it more valuable to share data than to hoard it -- rare disease research will continue to waste millions of dollars and years of patient time re-collecting information that already exists somewhere in a silo nobody can access.
Evidence
Orphanet Journal of Rare Diseases: 'Data silos are undermining drug development and failing rare disease patients': https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-01806-4
GA4GH session on rare disease data sharing challenges (2024): https://www.ga4gh.org/news_item/uncovering-and-overcoming-common-data-sharing-challenges-in-the-rare-disease-landscape/
Orphanet Journal of Rare Diseases on FAIRification of fragmented rare disease registries: https://ojrd.biomedcentral.com/articles/10.1186/s13023-022-02558-5
Orphanet Journal of Rare Diseases: 'Sharing is caring' call for a new era of rare disease R&D: https://ojrd.biomedcentral.com/articles/10.1186/s13023-022-02529-w