Download
To build and run the software, we first need to download the scripts and binary files. In addition, a fully functioning EVAdb requires several third-party datasets. Data from gnomAD, dbSNP and other sources are used in many filtering steps to find rare variants or variants associated with the patient's phenotype. After downloading the scripts and Dockerfiles, we will build the container images.
Are pre-built Docker containers available for EVAdb?
Currently, the Docker images for EVAdb are not available on a public Docker registry. Therefore, the images must be built on a local machine before use.
EVAdb
All files necessary to deploy EVAdb can be found in the EVAdb Git repository (https://github.com/mri-ihg/EVAdb). The docker branch contains the Dockerfiles and docker-compose scripts for building and installing the software.
Proxy
If your build host can only reach the internet through a restrictive proxy, you have to adapt the Dockerfiles. Currently, it is necessary to set the http_proxy and https_proxy environment variables.
# Add directly after the FROM line of each Dockerfile
ENV http_proxy=http://<USER>:<PW>@<IP>:<PORT>/
ENV https_proxy=http://<USER>:<PW>@<IP>:<PORT>/
RUN git config --global https.proxy ${https_proxy} \
&& git config --global http.proxy ${http_proxy} \
&& git config --global https.proxyAuthMethod basic \
&& git config --global http.proxyAuthMethod basic
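Editing every Dockerfile by hand is error-prone, so the ENV part of this step can be scripted. The following is a minimal sketch, not part of EVAdb: `add_proxy_env` is a hypothetical helper that prints a Dockerfile with the two proxy ENV lines added after each FROM line, since ENV instructions are only valid once a build stage has been opened with FROM. It does not add the `git config` lines, which additionally require git to be installed in the image.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with EVAdb).
# Usage: add_proxy_env <Dockerfile> <proxy-url>
# Prints the Dockerfile to stdout with proxy ENV lines after every FROM line.
add_proxy_env() {
    # The proxy URL is passed through the environment so awk does not
    # interpret backslashes in the password as escape sequences.
    PROXY_URL="$2" awk '
        { print }
        toupper($1) == "FROM" {
            print "ENV http_proxy=" ENVIRON["PROXY_URL"]
            print "ENV https_proxy=" ENVIRON["PROXY_URL"]
        }
    ' "$1"
}
```

For example, `add_proxy_env Dockerfile "http://<USER>:<PW>@<IP>:<PORT>/" > Dockerfile.proxy` writes a patched copy and leaves the original untouched.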
To clone the EVAdb docker branch, use
git clone -b docker https://github.com/mri-ihg/EVAdb.git
Third-Party Data
When building the container images and on first startup of the application, it is recommended to have some third-party datasets available. The application can run without this data for development purposes, but in a production setup these datasets should be present for all features to work as expected (e.g. gnomAD filtering).
hg19 vs GRCh38
All library file URLs have to be adjusted for hg38
If you intend to process hg38 data with the current version of EVAdb, make sure to include the correct library files. All URLs in the table below are for hg19.
Dataset Storage
Some of the datasets are very large. The gnomAD genomes file, for example, exceeds 200 GB in size. Make sure to have enough disk space available before downloading these assets.
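The available space can be checked before starting a long download. The sketch below assumes a POSIX shell with `df` and `awk`; the helper name `has_free_space` is hypothetical and not part of EVAdb.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the filesystem holding DIR
# has at least MIN_GB gigabytes available.
# Usage: has_free_space <dir> <min-gb>
has_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

A call such as `has_free_space /data 250 || echo "not enough space" >&2` could guard the download; the path `/data` and the 250 GB threshold are placeholders to adapt to the datasets you actually fetch.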
Dataset | Description | URL |
---|---|---|
dbNSFP | PolyPhen-2 and SIFT scores are taken from dbNSFP | ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFPv3.5a.zip |
CADD | CADD scores | https://krishna.gs.washington.edu/download/CADD/v1.6/GRCh37/whole_genome_SNVs.tsv.gz |
gnomAD (exomes) | Genome Aggregation Database, exome build | https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.vcf.bgz |
gnomAD (genomes) | Genome Aggregation Database, genome build | https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz |
OMIM | Online Mendelian Inheritance in Man | Request individual access at omim.org |
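The downloads from the table can be scripted. The following is a minimal sketch, assuming `wget` is installed; the `DATA_DIR` variable, its default location and the `download_datasets` function are illustrative choices, not something EVAdb prescribes. OMIM is left out because access has to be requested individually.

```shell
#!/bin/sh
# Assumed target directory; override by exporting DATA_DIR beforehand.
DATA_DIR=${DATA_DIR:-./evadb-data}

# hg19 URLs copied from the table above, one per line.
DATASET_URLS='
ftp://dbnsfp:dbnsfp@dbnsfp.softgenetics.com/dbNSFPv3.5a.zip
https://krishna.gs.washington.edu/download/CADD/v1.6/GRCh37/whole_genome_SNVs.tsv.gz
https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.vcf.bgz
https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz
'

# Hypothetical helper: fetch every URL in DATASET_URLS into DATA_DIR.
download_datasets() {
    mkdir -p "$DATA_DIR"
    for url in $DATASET_URLS; do
        # -c resumes a partially downloaded file, -P sets the target directory.
        wget -c -P "$DATA_DIR" "$url" || echo "download failed: $url" >&2
    done
}
```

Defining the function does not start any transfer; run `download_datasets` once enough disk space is available. Because of `wget -c`, the function can be run again after an interruption to resume the large gnomAD files.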