The c-VEDA Dataset

DOI: 10.25720/veda-cmrh

The c-VEDA dataset contains data collected and processed by the c-VEDA study from an accelerated longitudinal cohort with planned missingness, comprising over 10,000 individuals in three age ranges (6-11, 12-17 and 18-23 years). Participants were recruited in 7 centres covering 5 geographical regions of India: Punjab and adjoining states (PGIMER), Eastern Coalfields (KOLKATA), Northeast India (IMPHAL), Bengaluru & Mysuru (MYSORE, NIMHANS, SJRI) and Chittoor (RISHIVALLEY).

Age bands

Age band ID	Age band name	Age range
C1	Children	6-11
C2	Adolescents	12-17
C3	Adults	18-23
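
The age band table can be expressed as a small lookup. The following is a minimal sketch, assuming ages are given in completed years and that both bounds of each range are inclusive; the names `AGE_BANDS` and `age_band` are illustrative and not part of any c-VEDA software.

```python
# Age bands copied from the table above: (band ID, band name, min age, max age).
# Assumption: ages are in completed years and both bounds are inclusive.
AGE_BANDS = [
    ("C1", "Children", 6, 11),
    ("C2", "Adolescents", 12, 17),
    ("C3", "Adults", 18, 23),
]


def age_band(age_years):
    """Return the age band ID for an age in years, or None if outside 6-23."""
    for band_id, _name, low, high in AGE_BANDS:
        if low <= age_years <= high:
            return band_id
    return None


if __name__ == "__main__":
    for age in (6, 12, 23, 30):
        print(age, age_band(age))
```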

Recruitment centres

Centre ID	Centre name	Site	Location	MRI
11	PGIMER	Postgraduate Institute of Medical Education & Research	Chandigarh, Punjab	Siemens Verio
12	IMPHAL	Regional Institute of Medical Sciences	Imphal, Manipur
13	KOLKATA	National Institute of Occupational Health Centre	Kolkata, West Bengal	Siemens Trio
14	RISHIVALLEY	Rishi Valley Rural Health Centre	Rishi Valley, Andhra Pradesh	acquired in NIMHANS
15	MYSORE	CSI Holdsworth Memorial Hospital	Mysuru, Karnataka	Philips Ingenia
16	NIMHANS	National Institute of Mental Health and Neurosciences	Bengaluru, Karnataka	Siemens Skyra / Philips Ingenia
17	SJRI	St. John’s Research Institute	Bengaluru, Karnataka	acquired in NIMHANS

Time points

Baseline (BL) data are collected at inclusion and follow-ups are scheduled either 1 year (FU1) or 2 years (FU2) after inclusion.
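
As an illustration of the schedule, the nominal follow-up date can be derived from the inclusion date. This is only a sketch: it assumes that "1 year" and "2 years" mean the same calendar day in a later year, and the handling of 29 February is an arbitrary choice. The actual scheduling rules and tolerance windows used by the study are not described here, and the function name `nominal_follow_up` is hypothetical.

```python
from datetime import date

# Time point codes from the text above, mapped to years after inclusion.
FOLLOW_UP_YEARS = {"FU1": 1, "FU2": 2}


def nominal_follow_up(inclusion, time_point):
    """Return the nominal date of a follow-up given the inclusion (BL) date."""
    years = FOLLOW_UP_YEARS[time_point]
    try:
        return inclusion.replace(year=inclusion.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        # Assumption: fall back to 28 February.
        return inclusion.replace(year=inclusion.year + years, day=28)


if __name__ == "__main__":
    baseline = date(2017, 3, 15)
    print(nominal_follow_up(baseline, "FU1"))
    print(nominal_follow_up(baseline, "FU2"))
```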

Data collection and management

Biological sampling and MRI scanner protocols are described in Standard Operating Procedures. Details about the actual MRI sequences used by each centre are available via SFTP:
sftp://cveda.nimhans.ac.in/documentation/mri/

Other data have been collected using Psytools. We provide a short overview of the c-VEDA instruments. The detailed Excel resource file used to build the Psytools questionnaires is available via SFTP only:
sftp://cveda.nimhans.ac.in/documentation/psytools/
Some of the questionnaires are used under restrictive licensing, so this Excel resource file must remain within the c-VEDA consortium. Please do not disseminate it.

We have written specific software and scripts for imaging data collection and initial processing, which we publish and maintain on GitHub. The cveda_databank wiki outlines the data acquisition and management process from the perspective of the databank operators.

Data and samples are pseudonymized once before leaving acquisition centres. Then data are pseudonymized a second time by the database team, acting as a trusted third party, distinct from acquisition centres. We provide a list of valid participant identifiers to help end-users detect and investigate possible identifier errors.
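
Checking identifiers against the list of valid participant identifiers might look like the sketch below. It assumes the list has been saved as a plain text file with one identifier per line; the actual format of the list is not specified here, and the function names and example identifiers are hypothetical.

```python
def load_valid_ids(path):
    """Read identifiers from a text file, one per line, ignoring blank lines.

    Assumption: the list of valid identifiers is available in this format.
    """
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def split_by_validity(candidate_ids, valid_ids):
    """Split candidate identifiers into (known, unknown), preserving order."""
    known = [pid for pid in candidate_ids if pid in valid_ids]
    unknown = [pid for pid in candidate_ids if pid not in valid_ids]
    return known, unknown


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    valid = {"ID-A", "ID-C"}
    known, unknown = split_by_validity(["ID-A", "ID-B", "ID-C"], valid)
    print("known:", known)
    print("to investigate:", unknown)
```

Identifiers that end up in the second list are candidates for typos or mislabelled files and can be investigated further.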

Data processing

See cveda_processing on GitHub.

Data publication

We release formal versions of the c-VEDA dataset, appending additional data or fixing existing data with each release:

Dataset release details

Dataset version	DOI	Release date	Download link
0.9	10.25720/veda-c09w	2018-11-21	`sftp://cveda.nimhans.ac.in/data/0.9/`
1.0	10.25720/veda-c10e	2018-11-21	`sftp://cveda.nimhans.ac.in/data/1.0/`
1.1	10.25720/veda-c11f	2019-04-17	`sftp://cveda.nimhans.ac.in/data/1.1/`
1.2	10.25720/veda-c12g	2019-05-28	`sftp://cveda.nimhans.ac.in/data/1.2/`
1.3	10.25720/veda-c13h	2019-06-26	`sftp://cveda.nimhans.ac.in/data/1.3/`
1.4	10.25720/veda-c14j	2019-10-16	`sftp://cveda.nimhans.ac.in/data/1.4/`
1.5	10.25720/veda-c15k	2021-06-07	`sftp://cveda.nimhans.ac.in/data/1.5/`
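
For scripted access, the release table can be kept as a small lookup. The sketch below copies versions, DOIs and release dates from the table above and rebuilds the download link from the version number. The helper names are illustrative, and the code only builds URLs: it does not connect to the server, which requires SFTP credentials.

```python
# Releases copied from the table above: version -> (DOI, release date).
RELEASES = {
    "0.9": ("10.25720/veda-c09w", "2018-11-21"),
    "1.0": ("10.25720/veda-c10e", "2018-11-21"),
    "1.1": ("10.25720/veda-c11f", "2019-04-17"),
    "1.2": ("10.25720/veda-c12g", "2019-05-28"),
    "1.3": ("10.25720/veda-c13h", "2019-06-26"),
    "1.4": ("10.25720/veda-c14j", "2019-10-16"),
    "1.5": ("10.25720/veda-c15k", "2021-06-07"),
}

BASE_URL = "sftp://cveda.nimhans.ac.in/data/"


def download_url(version):
    """Return the SFTP download link for a listed dataset version."""
    if version not in RELEASES:
        raise KeyError(f"unknown dataset version: {version}")
    return f"{BASE_URL}{version}/"


def latest_version():
    """Return the highest listed version, comparing numerically by component."""
    return max(RELEASES, key=lambda v: tuple(int(part) for part in v.split(".")))


if __name__ == "__main__":
    version = latest_version()
    doi, released = RELEASES[version]
    print(version, doi, released, download_url(version))
```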