The BIOS project has generated RNA-sequencing and DNA methylation data for over 4000 individuals. In addition, GoNL-imputed genotypes were generated from existing genotypes, and several phenotypes and demographic variables were collected for the same set of samples. A relational, SQL-based (Postgres) metadatabase (MDb) was created to store the collected large-scale multi-omics data in a structured way. Metadata and quantifications from the RP4 metabolomics project were also added to this database.

The metadatabase is available on the BIOS-VM.

MDb contents

The MDb contains as much meta-information as possible for all samples and data types: location of the (raw) data on srm, md5 checksum verification, quality control information, links between the different identifiers used (person_id, dna_id, etc.) and phenotype information. The data have been separated into a number of entities (tables), as described below:

Table                        Description
person                       Information about persons (including associated ids)
relation                     Relationship information between persons
gwas                         Information about GWAS runs
imputation                   Information about performed genotype imputations
visit                        Phenotypes and other information related to the collection of samples
dna_sample                   Information about DNA samples
methylation_450k_run         Information about Illumina 450k methylation array runs
methylation_450k_freeze      Which methylation runs are included in which data freezes (and freeze subsets)
rna_sample                   Information about RNA samples
rna_run                      Information about RNAseq runs
rna_merged_run               Which RNAseq runs are included in merged RNA runs
rna_freeze                   Which RNAseq runs are included in which data freezes (and freeze subsets)
nightingale_run              Information about Nightingale runs
nightingale_quantification   Metabolomics quantification measurements

The listTables function can be used to retrieve a list of table names as well:

listTables()
## Accessing the 'rp3_rp4_meta' database at 'localhost:5432' as user 'guest'.
##  [1] "methylation_450k_run"       "methylation_450k_freeze"   
##  [3] "rna_merged_run"             "imputation"                
##  [5] "rna_run"                    "rna_freeze"                
##  [7] "visit"                      "nightingale_run"           
##  [9] "nightingale_quantification" "relation"                  
## [11] "gwas"                       "dna_sample"                
## [13] "person"                     "rna_sample"

Available views

Views are predefined SQL queries which can be used to extract a subset of the available information from the database. The names of the available views can be retrieved using the listViews-function:

listViews()
## Accessing the 'rp3_rp4_meta' database at 'localhost:5432' as user 'guest'.
##  [1] "freeze2rnaseq"                 "freeze1methylation"           
##  [3] "freeze2methylation"            "getfastq"                     
##  [5] "getidat"                       "minimalphenotypes"            
##  [7] "persontogwas_includingmztwins" "freeze1rnaseq"                
##  [9] "getids"                        "allphenotypes"                
## [11] "cellcounts"                    "getmethylationruns"           
## [13] "getrelations"                  "getrnaseqruns"                
## [15] "methylationsamplesheet"        "getimputations"               
## [17] "rnaseqsamplesheet"
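
Before requesting a view, it can be useful to check that its name is actually available. The following is a minimal sketch, not part of the package: hasView is a hypothetical helper, and the vector of names is copied by hand from the output above so that the example runs without database access. Because the names are listed in lowercase, the requested name is lowercased before comparing.

```r
# View names copied from the listViews() output above.
# On the BIOS-VM one would use instead: views <- listViews()
views <- c("freeze2rnaseq", "freeze1methylation", "freeze2methylation",
           "getfastq", "getidat", "minimalphenotypes",
           "persontogwas_includingmztwins", "freeze1rnaseq", "getids",
           "allphenotypes", "cellcounts", "getmethylationruns",
           "getrelations", "getrnaseqruns", "methylationsamplesheet",
           "getimputations", "rnaseqsamplesheet")

# hasView() is a hypothetical helper (an assumption, not a package function).
# It compares names in lowercase, so "getIDs" and "getids" are treated alike.
hasView <- function(name, available) {
  tolower(name) %in% tolower(available)
}

hasView("getIDs", views)         # TRUE
hasView("freeze3rnaseq", views)  # FALSE
```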

Retrieving views and querying the database

To retrieve a view, use the getSQLview-function with the name of the view. Note that view names are not case sensitive.

head(getSQLview("getids"))
## Accessing the 'rp3_rp4_meta' database at 'localhost:5432' as user 'guest'.
##          ids    bios_id         uuid   biobank_id person_id pheno_id
## 1 CODAM-2175 CODAM-2175 BIOS78A709E9        CODAM      2175     2175
## 2   LLS-1114   LLS-1114 BIOS75EAD30E LLS_PARTOFFS      1114     1114
## 3   LLS-1331   LLS-1331 BIOS30EA25EA LLS_PARTOFFS      1331     1331
## 4   LLS-2058   LLS-2058 BIOS2A2A1392 LLS_PARTOFFS      2058     2058
## 5   LLS-2177   LLS-2177 BIOS8593A04A LLS_PARTOFFS      2177     2177
## 6    LLS-234    LLS-234 BIOS8187488D LLS_PARTOFFS       234      234
##   biobank_gwas_id dna_id rna_id                rna_note gonl_id
## 1            2175   2175   2175 library-prep: succeeded    <NA>
## 2            1114   1114   1114 library-prep: succeeded    <NA>
## 3            1331   1331   1331 library-prep: succeeded    <NA>
## 4            2058   2058   2058 library-prep: succeeded    <NA>
## 5            2177   2177   2177 library-prep: succeeded    <NA>
## 6             234    234    234 library-prep: succeeded    <NA>
##   old_gonl_id cg_id in_rp3
## 1        <NA>  <NA>  FALSE
## 2        <NA>  <NA>  FALSE
## 3        <NA>  <NA>  FALSE
## 4        <NA>  <NA>  FALSE
## 5        <NA>  <NA>  FALSE
## 6        <NA>  <NA>  FALSE
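
The result of a view is shown above as an ordinary data frame, so it can presumably be subset with standard R tools. The sketch below assumes exactly that. To keep it runnable without database access, a few columns of the six rows shown above are typed in by hand; on the BIOS-VM the first statement would be replaced by a call to getSQLview.

```r
# A few columns of the rows shown above, entered by hand.
# On the BIOS-VM one would use instead: ids <- getSQLview("getids")
ids <- data.frame(
  ids        = c("CODAM-2175", "LLS-1114", "LLS-1331",
                 "LLS-2058", "LLS-2177", "LLS-234"),
  uuid       = c("BIOS78A709E9", "BIOS75EAD30E", "BIOS30EA25EA",
                 "BIOS2A2A1392", "BIOS8593A04A", "BIOS8187488D"),
  biobank_id = c("CODAM", rep("LLS_PARTOFFS", 5)),
  in_rp3     = rep(FALSE, 6),
  stringsAsFactors = FALSE
)

# Keep only the LLS_PARTOFFS samples and two identifier columns.
lls <- ids[ids$biobank_id == "LLS_PARTOFFS", c("ids", "uuid")]
nrow(lls)  # 5

# Count the number of samples per biobank.
table(ids$biobank_id)
```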

We can always add views if necessary; please contact Leon Mei.

If you have a Postgres account on the BIOS-VM, you can also query the tables directly using the runQuery-function. This function is a thin wrapper around the dbGetQuery-function from the RPostgreSQL-package, so that package (or any other API that interacts with Postgres) can also be used directly.

runQuery("SELECT * FROM visit;")
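
For users who prefer to call RPostgreSQL directly, the sketch below shows one possible way to do so. It is not the package's implementation: queryMDb is a hypothetical helper, the database name, host and port are taken from the connection messages printed in this document, and the user name and password are placeholders for a personal Postgres account. The DBI and RPostgreSQL packages are assumed to be installed.

```r
# queryMDb() is a hypothetical helper (an assumption, not a package function).
# Defaults for dbname, host and port follow the connection messages shown
# in this document; user and password must be supplied by the caller.
queryMDb <- function(sql, user, password,
                     dbname = "rp3_rp4_meta", host = "localhost", port = 5432) {
  con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                        dbname = dbname, host = host, port = port,
                        user = user, password = password)
  # Close the connection when the function exits, even if the query fails.
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, sql)
}

# Usage on the BIOS-VM (credentials are placeholders):
# visits <- queryMDb("SELECT * FROM visit LIMIT 5;",
#                    user = "<your_user>", password = "<your_password>")
```

Adding a LIMIT clause while exploring a table avoids transferring the full table for a first look.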

Database versioning

The database is built from data and SQL scripts stored on the LUMC git server. To retrieve the hash of the commit used to build the database, the mdbVersion-function can be used. This hash can be regarded as the version of the database.

mdbVersion()
## Accessing the 'rp3_rp4_meta' database at 'localhost:5432' as user 'guest'.
## [1] "ebc3cde14445ad8bea4dbe112b62f1e5cda1ef39"
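
Since the hash identifies the database build, it may be worth storing it alongside any result derived from the MDb. The following is a minimal sketch of one way to do that; tagWithVersion is a hypothetical helper, the data frame is a placeholder result, and the hash is copied from the output above so that the example runs without database access.

```r
# tagWithVersion() is a hypothetical helper (an assumption, not a package
# function). It attaches the database version to an R object as an attribute.
tagWithVersion <- function(x, version) {
  attr(x, "mdb_version") <- version
  x
}

# Hash copied from the output above.
# On the BIOS-VM one would use instead: version <- mdbVersion()
version <- "ebc3cde14445ad8bea4dbe112b62f1e5cda1ef39"

# Placeholder result, used only to illustrate the tagging.
result <- tagWithVersion(data.frame(n_samples = 6), version)
attr(result, "mdb_version")

# Abbreviated form of the hash, e.g. for use in file names.
substr(version, 1, 7)  # "ebc3cde"
```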