
IGVF background

The Impact of Genomic Variation on Function (IGVF) Consortium aims to understand how genomic variation affects genome function, which in turn impacts phenotype. The NHGRI is funding this collaborative program that brings together teams of investigators who will use state-of-the-art experimental and computational approaches to model, predict, characterize and map genome function, how genome function shapes phenotype, and how these processes are affected by genomic variation. These joint efforts will produce a catalog of the impact of genomic variants on genome function and phenotypes.

The IGVF Catalog described in the last sentence is available through a web interface and two programmatic interfaces. A separate Data Portal, with its own web and programmatic interfaces, provides raw and processed data for download. This package, rigvf, focuses on the Catalog and not the Data Portal.

The IGVF Catalog is a form of knowledge graph. Its nodes are biological entities such as variants, genes, and pathways, and its edges are relationships between nodes, e.g. empirically measured effects of variants on cis-regulatory elements (CREs) or on transcripts and proteins. Edges may carry metadata, including the cell type context and how the association was measured, e.g. which experiment or predictive model.
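
To make the node and edge vocabulary concrete, the following is a minimal sketch of a knowledge graph held in two base R data frames. It is not the Catalog's schema: every identifier, label, and metadata value is invented for illustration, and the column names (`from`, `to`, `context`, `method`) are assumptions.

```r
# Toy knowledge graph in base R. All identifiers and values are made up
# for illustration and do not come from the IGVF Catalog.
nodes <- data.frame(
    id   = c("variant_1", "variant_2", "gene_1", "cre_1"),
    type = c("variant", "variant", "gene", "CRE")
)

# Each edge links two nodes and carries its own metadata.
edges <- data.frame(
    from    = c("variant_1", "variant_2", "variant_1"),
    to      = c("gene_1", "gene_1", "cre_1"),
    label   = c("eQTL", "eQTL", "regulatory effect"),
    context = c("liver", "pancreas", "liver"),
    method  = c("experiment", "experiment", "predictive model")
)

# Edges pointing at one gene, joined to the type of the originating node.
gene_edges <- merge(
    edges[edges$to == "gene_1", ],
    nodes,
    by.x = "from", by.y = "id"
)
gene_edges[, c("from", "type", "to", "label", "context")]
```

Asking "which variants are linked to this gene, and in what context?" then amounts to filtering edges by their target node and reading the edge metadata.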

This package

This proof-of-concept package illustrates how to access the IGVF Catalog. Only limited functionality is implemented.

Catalog API

The IGVF offers two programmatic interfaces. The ‘catalog’ API (https://api.catalog.igvf.org/#) is preferred, with queries optimized to return relevant information. Queries are simple REST requests implemented using the httr2 package. Here we query variants associated with the gene “GCK”; other identifiers, e.g. Ensembl gene ids, can be used instead.

rigvf::gene_variants(gene_name = "GCK")
#> # A tibble: 25 × 9
#>    `sequence variant`      gene  label log10pvalue effect_size source source_url
#>    <chr>                   <chr> <chr>       <dbl>       <dbl> <chr>  <chr>     
#>  1 variants/8c6a683829bcb… gene… eQTL         4.89       0.274 GTEx   https://s…
#>  2 variants/cf796b5a16212… gene… eQTL         5.76       0.221 GTEx   https://s…
#>  3 variants/9a36af4633321… gene… eQTL         6.17      -0.266 GTEx   https://s…
#>  4 variants/2fefe07a0750b… gene… eQTL         3.69       0.158 GTEx   https://s…
#>  5 variants/ab6df1152a643… gene… eQTL        16.9       -0.353 GTEx   https://s…
#>  6 variants/92833b52621e5… gene… eQTL         4.86      -0.170 GTEx   https://s…
#>  7 variants/bceca4e6ac3cd… gene… eQTL         4.63      -0.340 GTEx   https://s…
#>  8 variants/0a8ba63e5451a… gene… eQTL         4.94       0.215 GTEx   https://s…
#>  9 variants/80f639e0da643… gene… eQTL         6.59      -0.330 GTEx   https://s…
#> 10 variants/7f4ca6f1cfd70… gene… eQTL         4.10      -0.165 GTEx   https://s…
#> # ℹ 15 more rows
#> # ℹ 2 more variables: biological_context <chr>, chr <chr>

The same query can be made with an Ensembl gene id. With verbose = TRUE, the sequence variant and gene columns come back as nested lists rather than identifier strings.

response <- rigvf::gene_variants(gene_id = "ENSG00000106633", verbose = TRUE)
response
#> # A tibble: 25 × 9
#>    `sequence variant` gene              label log10pvalue effect_size source
#>    <list>             <list>            <chr>       <dbl>       <dbl> <chr> 
#>  1 <named list [14]>  <named list [11]> eQTL         4.89       0.274 GTEx  
#>  2 <named list [14]>  <named list [11]> eQTL         5.76       0.221 GTEx  
#>  3 <named list [14]>  <named list [11]> eQTL         6.17      -0.266 GTEx  
#>  4 <named list [14]>  <named list [11]> eQTL         3.69       0.158 GTEx  
#>  5 <named list [14]>  <named list [11]> eQTL        16.9       -0.353 GTEx  
#>  6 <named list [14]>  <named list [11]> eQTL         4.86      -0.170 GTEx  
#>  7 <named list [14]>  <named list [11]> eQTL         4.63      -0.340 GTEx  
#>  8 <named list [14]>  <named list [11]> eQTL         4.94       0.215 GTEx  
#>  9 <named list [14]>  <named list [11]> eQTL         6.59      -0.330 GTEx  
#> 10 <named list [14]>  <named list [11]> eQTL         4.10      -0.165 GTEx  
#> # ℹ 15 more rows
#> # ℹ 3 more variables: source_url <chr>, biological_context <chr>, chr <chr>

A list-column can be expanded into ordinary columns with tidyr::unnest_wider(), shown here for the sequence variant column.

response |>
    dplyr::select(`sequence variant`) |>
    tidyr::unnest_wider(`sequence variant`)
#> # A tibble: 25 × 14
#>    organism     `_id`    chr      pos rsid  ref   alt   spdi  hgvs  qual  filter
#>    <chr>        <chr>    <chr>  <int> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> 
#>  1 Homo sapiens 8c6a683… chr7  4.41e7 rs25… G     A     NC_0… NC_0… .     NA    
#>  2 Homo sapiens cf796b5… chr7  4.41e7 rs29… T     C     NC_0… NC_0… .     NA    
#>  3 Homo sapiens 9a36af4… chr7  4.41e7 rs11… GT    G     NC_0… NC_0… .     NA    
#>  4 Homo sapiens 2fefe07… chr7  4.43e7 rs28… G     A     NC_0… NC_0… .     NA    
#>  5 Homo sapiens ab6df11… chr7  4.41e7 rs22… C     G     NC_0… NC_0… .     NA    
#>  6 Homo sapiens 92833b5… chr7  4.41e7 rs41… G     A     NC_0… NC_0… .     NA    
#>  7 Homo sapiens bceca4e… chr7  4.40e7 rs76… T     G     NC_0… NC_0… .     NA    
#>  8 Homo sapiens 0a8ba63… chr7  4.41e7 rs25… A     G     NC_0… NC_0… .     NA    
#>  9 Homo sapiens 80f639e… chr7  4.41e7 rs14… A     AG    NC_0… NC_0… .     NA    
#> 10 Homo sapiens 7f4ca6f… chr7  4.42e7 rs29… A     T     NC_0… NC_0… .     NA    
#> # ℹ 15 more rows
#> # ℹ 3 more variables: annotations <list>, source <chr>, source_url <chr>
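
Once a result is in hand, ordinary data frame operations apply. The sketch below re-types the first five rows of the log10pvalue and effect_size columns printed above, so that it runs without network access. The cutoff of 5 is arbitrary, and treating a larger log10pvalue as stronger evidence is an assumption, since the column's definition is not stated here.

```r
# First five rows of the printed result above, re-typed by hand so the
# example runs offline; `row` is the row number in that output.
assoc <- data.frame(
    row         = 1:5,
    label       = "eQTL",
    log10pvalue = c(4.89, 5.76, 6.17, 3.69, 16.9),
    effect_size = c(0.274, 0.221, -0.266, 0.158, -0.353)
)

# Direction of effect, taken from the sign of the effect size.
assoc$direction <- ifelse(assoc$effect_size > 0, "positive", "negative")

# Keep rows above a hypothetical cutoff, largest log10pvalue first
# (assumes a larger value means stronger evidence).
cutoff <- 5
strong <- assoc[assoc$log10pvalue > cutoff, ]
strong <- strong[order(strong$log10pvalue, decreasing = TRUE), ]
strong
```

On the tibble returned by rigvf::gene_variants(), the same steps could be written with dplyr::filter() and dplyr::arrange().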

ArangoDB API

The ‘ArangoDB’ REST API provides flexibility but requires greater understanding of the ArangoDB Query Language (AQL) and the database schema. Documentation is available in the database under the ‘REST API’ tab of the ‘Support’ menu item, using username ‘guest’ and password ‘guestigvfcatalog’.

The following queries the database directly for variants of an Ensembl gene id, keeping only records whose score exceeds threshold.

rigvf::db_gene_variants("ENSG00000106633", threshold = 0.85)
#> # A tibble: 40 × 9
#>    `_key`              `_id` `_from` `_to` `_rev` `score:long` source source_url
#>    <chr>               <chr> <chr>   <chr> <chr>         <dbl> <chr>  <chr>     
#>  1 genic_chr7_4415452… regu… regula… gene… _g5CU…        0.989 ENCOD… https://w…
#>  2 promoter_chr7_4415… regu… regula… gene… _g5CU…        0.869 ENCOD… https://w…
#>  3 genic_chr7_4414584… regu… regula… gene… _g5CU…        0.948 ENCOD… https://w…
#>  4 promoter_chr7_4415… regu… regula… gene… _g5CU…        1.00  ENCOD… https://w…
#>  5 promoter_chr7_4415… regu… regula… gene… _g5CU…        1.00  ENCOD… https://w…
#>  6 intergenic_chr7_44… regu… regula… gene… _g5CU…        0.959 ENCOD… https://w…
#>  7 genic_chr7_4415544… regu… regula… gene… _g5CV…        0.942 ENCOD… https://w…
#>  8 promoter_chr7_4415… regu… regula… gene… _g5CV…        0.929 ENCOD… https://w…
#>  9 intergenic_chr7_44… regu… regula… gene… _g5CV…        0.936 ENCOD… https://w…
#> 10 promoter_chr7_4415… regu… regula… gene… _g5CW…        0.966 ENCOD… https://w…
#> # ℹ 30 more rows
#> # ℹ 1 more variable: biological_context <chr>

The AQL behind this query is

aql <- system.file(package = "rigvf", "aql", "gene_variants.aql")
readLines(aql) |> noquote()
#> [1] FOR l IN regulatory_regions_genes     
#> [2]     FILTER l._to == @geneid           
#> [3]     FILTER l.`score:long` > @threshold
#> [4]     RETURN l
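
For readers unfamiliar with AQL, the two FILTER clauses can be mimicked in base R. This is only an illustration of the query's logic on an invented edge table, not how rigvf or ArangoDB executes it. The keys, scores, and the R column names `from`, `to`, and `score` (standing in for `_from`, `_to`, and `score:long`) are assumptions.

```r
# Toy edge table; all keys and scores are invented for illustration.
# R-friendly column names stand in for `_from`, `_to`, and `score:long`.
regulatory_regions_genes <- data.frame(
    from  = c("regions/r1", "regions/r2", "regions/r3", "regions/r4"),
    to    = c("genes/g1", "genes/g1", "genes/g2", "genes/g1"),
    score = c(0.99, 0.85, 0.97, 0.40)
)

# Stand-ins for the AQL bind parameters @geneid and @threshold.
geneid <- "genes/g1"
threshold <- 0.85

# FILTER l._to == @geneid, then FILTER l.`score:long` > @threshold
hits <- subset(regulatory_regions_genes, to == geneid & score > threshold)
hits
```

The comparison is strict, so an edge whose score equals the threshold exactly (the second row here) is excluded.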

The help page ?db_queries outlines other available user-facing functions. See ?arango for more developer-oriented information.

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_1.8.9    dplyr_1.1.4       compiler_4.4.2    tidyselect_1.2.1 
#>  [5] rigvf_0.0.2       tidyr_1.3.1       jquerylib_0.1.4   systemfonts_1.1.0
#>  [9] textshaping_0.4.1 yaml_2.3.10       fastmap_1.2.0     R6_2.5.1         
#> [13] rjsoncons_1.3.1   generics_0.1.3    curl_6.1.0        httr2_1.0.7      
#> [17] knitr_1.49        tibble_3.2.1      desc_1.4.3        bslib_0.8.0      
#> [21] pillar_1.10.1     rlang_1.1.4       utf8_1.2.4        cachem_1.1.0     
#> [25] xfun_0.50         fs_1.6.5          sass_0.4.9        memoise_2.0.1    
#> [29] cli_3.6.3         withr_3.0.2       pkgdown_2.1.1     magrittr_2.0.3   
#> [33] digest_0.6.37     rappdirs_0.3.3    lifecycle_1.0.4   vctrs_0.6.5      
#> [37] evaluate_1.0.3    glue_1.8.0        whisker_0.4.1     ragg_1.3.3       
#> [41] rmarkdown_2.29    purrr_1.0.2       tools_4.4.2       pkgconfig_2.0.3  
#> [45] htmltools_0.5.8.1