Healthsheet
The Healthsheet is a structured data quality document that accompanies each dataset. It is based on the Datasheets for Datasets framework and documents how the data was created, what it contains, and how it should and should not be used.
Access the Healthsheet at /app/datasets/[datasetId]/healthsheet/, replacing [datasetId] with the dataset's ID.
Responses are written in Markdown using the built-in editor.
Sections
Motivation
Why was the dataset created? Who created it and for what purpose? Was there specific funding or a commissioning organization?
Composition
What does the dataset contain? What types of instances are included (e.g., images, measurements, text)? How many instances are there? Is there a label or ground truth? Does the dataset contain sensitive information such as personally identifiable data, health information, or genetic data?
Collection
How was the data collected? Who collected it? What mechanisms or instruments were used? Over what timeframe? Does the dataset relate to people? If so, were participants notified and did they consent?
Preprocessing
Was any preprocessing, cleaning, or labeling applied to the raw data? If so, what was done? Was the raw data saved in addition to the processed version? Is the preprocessing software available?
Distribution
How is the dataset distributed? What license applies? Will it be updated? Is there a DOI or persistent identifier? Are there any export controls or restrictions?
Uses
What tasks has the dataset been used for already? What tasks could it be used for? Are there tasks it should not be used for? Are there potential harms from misuse?
Maintenance
Who is responsible for maintaining the dataset? How can errors be reported? Will the dataset be updated over time? Will older versions be supported?
Why Complete the Healthsheet?
A complete Healthsheet helps downstream users:
- Understand whether the dataset is appropriate for their use case
- Identify potential biases or limitations
- Comply with ethical and legal requirements when using the data
- Reproduce or extend the original research
Healthsheet content is reviewed as part of the publishing workflow before a dataset is made public.
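Because the Healthsheet is reviewed before publishing, it can help to check for unanswered sections first. The following is a minimal sketch, not part of the platform: the section names and the URL pattern come from this page, while the `responses` dictionary layout and the helper names `healthsheet_path` and `missing_sections` are assumptions made for illustration.

```python
# Hypothetical sketch: section names and the route come from this page;
# the dict-of-Markdown-strings layout is an assumption, not the platform's format.
SECTIONS = [
    "Motivation",
    "Composition",
    "Collection",
    "Preprocessing",
    "Distribution",
    "Uses",
    "Maintenance",
]


def healthsheet_path(dataset_id: str) -> str:
    """Fill the [datasetId] placeholder in the route shown on this page."""
    return f"/app/datasets/{dataset_id}/healthsheet/"


def missing_sections(responses: dict[str, str]) -> list[str]:
    """Return the sections whose Markdown response is absent or blank."""
    return [name for name in SECTIONS if not responses.get(name, "").strip()]


# Example draft with only some sections answered (made-up content).
draft = {
    "Motivation": "Created to study **sleep patterns** in adults.",
    "Composition": "   ",  # whitespace only, so it counts as unanswered
    "Uses": "Not intended for clinical diagnosis.",
}

print(healthsheet_path("example-id"))
print(missing_sections(draft))
```

Treating whitespace-only responses as unanswered is a design choice in this sketch; an actual review may apply stricter criteria.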