IGVF background
The Impact of Genomic Variation on Function (IGVF) Consortium aims to understand how genomic variation affects genome function, which in turn impacts phenotype. The NHGRI is funding this collaborative program, which brings together teams of investigators who will use state-of-the-art experimental and computational approaches to model, predict, characterize and map genome function, how genome function shapes phenotype, and how these processes are affected by genomic variation. These joint efforts will produce a catalog of the impact of genomic variants on genome function and phenotypes.
The IGVF Catalog described above is available through a number of interfaces, including a web interface and two programmatic interfaces. In addition, there is a Data Portal, where raw and processed data can be downloaded, with its own web interface and programmatic interface. This package, rigvf, focuses on the Catalog and not the Data Portal.
The IGVF Catalog is a form of knowledge graph, where the nodes are biological entities such as variants, genes, pathways, etc. and edges are relationships between such nodes, e.g. empirically measured effects of variants on cis-regulatory elements (CREs) or on transcripts and proteins. These edges may have metadata including information about cell type context and information about how the association was measured, e.g. which experiment or predictive model.
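The node-and-edge structure described above can be sketched with two small tables in base R. This is only an illustration: the identifiers, contexts, and sources below are made up and do not come from the Catalog, and the real schema is richer.

```r
# Toy node table: each row is a biological entity (identifiers are invented).
nodes <- data.frame(
  id   = c("variants/v1", "variants/v2", "genes/g1"),
  type = c("variant", "variant", "gene"),
  stringsAsFactors = FALSE
)

# Toy edge table: each row links two nodes and carries its own metadata,
# such as the cell type context and how the association was measured.
edges <- data.frame(
  from               = c("variants/v1", "variants/v2"),
  to                 = c("genes/g1", "genes/g1"),
  label              = c("eQTL", "eQTL"),
  biological_context = c("liver", "pancreas"),
  source             = c("experiment A", "model B"),
  stringsAsFactors = FALSE
)

# Traversal: find every edge pointing at a gene, then look up the source nodes.
edges_to_gene <- edges[edges$to == "genes/g1", ]
linked_nodes <- nodes[nodes$id %in% edges_to_gene$from, ]
linked_nodes
```

Keeping metadata on the edges rather than the nodes means the same variant-gene pair can appear more than once, for example once per cell type or once per measurement method.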
This package
This proof-of-concept illustrates how to access the IGVF Catalog. Only limited functionality is implemented.
Catalog API
The IGVF offers two programmatic interfaces. The ‘catalog’ API (https://api.catalog.igvf.org/#) is preferred; it provides optimized queries for relevant information. Queries are simple REST requests, implemented using the httr2 package. Here we query variants associated with the gene “GCK”; one could also use other identifiers, e.g., Ensembl gene IDs.
rigvf::gene_variants(gene_name = "GCK")
#> # A tibble: 25 × 9
#> `sequence variant` gene label log10pvalue effect_size source source_url
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 variants/8c6a683829bcb… gene… eQTL 4.89 0.274 GTEx https://s…
#> 2 variants/cf796b5a16212… gene… eQTL 5.76 0.221 GTEx https://s…
#> 3 variants/9a36af4633321… gene… eQTL 6.17 -0.266 GTEx https://s…
#> 4 variants/2fefe07a0750b… gene… eQTL 3.69 0.158 GTEx https://s…
#> 5 variants/ab6df1152a643… gene… eQTL 16.9 -0.353 GTEx https://s…
#> 6 variants/92833b52621e5… gene… eQTL 4.86 -0.170 GTEx https://s…
#> 7 variants/bceca4e6ac3cd… gene… eQTL 4.63 -0.340 GTEx https://s…
#> 8 variants/0a8ba63e5451a… gene… eQTL 4.94 0.215 GTEx https://s…
#> 9 variants/80f639e0da643… gene… eQTL 6.59 -0.330 GTEx https://s…
#> 10 variants/7f4ca6f1cfd70… gene… eQTL 4.10 -0.165 GTEx https://s…
#> # ℹ 15 more rows
#> # ℹ 2 more variables: biological_context <chr>, chr <chr>
With verbose = TRUE, the `sequence variant` and gene columns are returned as nested lists rather than identifiers; the nested fields can then be expanded with tidyr::unnest_wider().
response <- rigvf::gene_variants(gene_id = "ENSG00000106633", verbose = TRUE)
response
#> # A tibble: 25 × 9
#> `sequence variant` gene label log10pvalue effect_size source
#> <list> <list> <chr> <dbl> <dbl> <chr>
#> 1 <named list [14]> <named list [11]> eQTL 4.89 0.274 GTEx
#> 2 <named list [14]> <named list [11]> eQTL 5.76 0.221 GTEx
#> 3 <named list [14]> <named list [11]> eQTL 6.17 -0.266 GTEx
#> 4 <named list [14]> <named list [11]> eQTL 3.69 0.158 GTEx
#> 5 <named list [14]> <named list [11]> eQTL 16.9 -0.353 GTEx
#> 6 <named list [14]> <named list [11]> eQTL 4.86 -0.170 GTEx
#> 7 <named list [14]> <named list [11]> eQTL 4.63 -0.340 GTEx
#> 8 <named list [14]> <named list [11]> eQTL 4.94 0.215 GTEx
#> 9 <named list [14]> <named list [11]> eQTL 6.59 -0.330 GTEx
#> 10 <named list [14]> <named list [11]> eQTL 4.10 -0.165 GTEx
#> # ℹ 15 more rows
#> # ℹ 3 more variables: source_url <chr>, biological_context <chr>, chr <chr>
response |>
    dplyr::select(`sequence variant`) |>
    tidyr::unnest_wider(`sequence variant`)
#> # A tibble: 25 × 14
#> organism `_id` chr pos rsid ref alt spdi hgvs qual filter
#> <chr> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
#> 1 Homo sapiens 8c6a683… chr7 4.41e7 rs25… G A NC_0… NC_0… . NA
#> 2 Homo sapiens cf796b5… chr7 4.41e7 rs29… T C NC_0… NC_0… . NA
#> 3 Homo sapiens 9a36af4… chr7 4.41e7 rs11… GT G NC_0… NC_0… . NA
#> 4 Homo sapiens 2fefe07… chr7 4.43e7 rs28… G A NC_0… NC_0… . NA
#> 5 Homo sapiens ab6df11… chr7 4.41e7 rs22… C G NC_0… NC_0… . NA
#> 6 Homo sapiens 92833b5… chr7 4.41e7 rs41… G A NC_0… NC_0… . NA
#> 7 Homo sapiens bceca4e… chr7 4.40e7 rs76… T G NC_0… NC_0… . NA
#> 8 Homo sapiens 0a8ba63… chr7 4.41e7 rs25… A G NC_0… NC_0… . NA
#> 9 Homo sapiens 80f639e… chr7 4.41e7 rs14… A AG NC_0… NC_0… . NA
#> 10 Homo sapiens 7f4ca6f… chr7 4.42e7 rs29… A T NC_0… NC_0… . NA
#> # ℹ 15 more rows
#> # ℹ 3 more variables: annotations <list>, source <chr>, source_url <chr>
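The REST requests mentioned at the start of this section can be sketched with httr2 directly. This is a minimal illustration and not necessarily how rigvf builds its requests: the base URL comes from the text above, but the path "genes/variants" and the query parameter name are assumptions made for the example. The request is only constructed, not sent.

```r
# Base URL taken from the text; the path "genes/variants" is a hypothetical
# endpoint used only for illustration.
req <- httr2::request("https://api.catalog.igvf.org") |>
    httr2::req_url_path_append("genes", "variants") |>
    httr2::req_url_query(gene_name = "GCK")

# Inspect the URL that would be requested, without contacting the server.
req$url

# To send the request and parse a JSON body (requires network access):
# resp <- httr2::req_perform(req)
# httr2::resp_body_json(resp)
```

Building the request as an object before performing it makes it easy to check the URL and query parameters while developing a new query function.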
ArangoDB API
The ‘ArangoDB’ REST API provides more flexibility but requires a greater understanding of the ArangoDB Query Language (AQL) and the database schema. Documentation is available in the database under the ‘Support’ menu item, ‘REST API’ tab, using username ‘guest’ and password ‘guestigvfcatalog’.
The following queries the database directly for variants of an Ensembl gene ID, keeping only records whose score exceeds the given threshold.
rigvf::db_gene_variants("ENSG00000106633", threshold = 0.85)
#> # A tibble: 40 × 9
#> `_key` `_id` `_from` `_to` `_rev` `score:long` source source_url
#> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 genic_chr7_4415452… regu… regula… gene… _g5CU… 0.989 ENCOD… https://w…
#> 2 promoter_chr7_4415… regu… regula… gene… _g5CU… 0.869 ENCOD… https://w…
#> 3 genic_chr7_4414584… regu… regula… gene… _g5CU… 0.948 ENCOD… https://w…
#> 4 promoter_chr7_4415… regu… regula… gene… _g5CU… 1.00 ENCOD… https://w…
#> 5 promoter_chr7_4415… regu… regula… gene… _g5CU… 1.00 ENCOD… https://w…
#> 6 intergenic_chr7_44… regu… regula… gene… _g5CU… 0.959 ENCOD… https://w…
#> 7 genic_chr7_4415544… regu… regula… gene… _g5CV… 0.942 ENCOD… https://w…
#> 8 promoter_chr7_4415… regu… regula… gene… _g5CV… 0.929 ENCOD… https://w…
#> 9 intergenic_chr7_44… regu… regula… gene… _g5CV… 0.936 ENCOD… https://w…
#> 10 promoter_chr7_4415… regu… regula… gene… _g5CW… 0.966 ENCOD… https://w…
#> # ℹ 30 more rows
#> # ℹ 1 more variable: biological_context <chr>
The AQL query behind this function is stored in the package:
aql <- system.file(package = "rigvf", "aql", "gene_variants.aql")
readLines(aql) |> noquote()
#> [1] FOR l IN regulatory_regions_genes
#> [2] FILTER l._to == @geneid
#> [3] FILTER l.`score:long` > @threshold
#> [4] RETURN l
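To make the two FILTER steps concrete, the same logic can be mimicked on a local data frame in base R. This is a sketch under stated assumptions: the rows are invented, and the "genes/..." form of the `_to` values is assumed rather than taken from the database.

```r
# Toy stand-in for the `regulatory_regions_genes` edge collection; the keys,
# targets, and scores below are invented for illustration.
edges <- data.frame(
    `_key`       = c("region_a", "region_b", "region_c"),
    `_to`        = c("genes/ENSG00000106633", "genes/ENSG00000106633", "genes/OTHER"),
    `score:long` = c(0.95, 0.40, 0.99),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Bind parameters, playing the role of @geneid and @threshold in the AQL.
geneid <- "genes/ENSG00000106633"
threshold <- 0.85

# FILTER l._to == @geneid, then FILTER l.`score:long` > @threshold.
hits <- edges[edges[["_to"]] == geneid & edges[["score:long"]] > threshold, ]
hits
```

In the real query the filtering runs inside the database, so only matching edges are transferred; the bind parameters keep the query text fixed while the gene and threshold vary.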
The help page ?db_queries outlines other available user-facing functions. See ?arango for more developer-oriented information.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_1.8.9 dplyr_1.1.4 compiler_4.4.2 tidyselect_1.2.1
#> [5] rigvf_0.0.2 tidyr_1.3.1 jquerylib_0.1.4 systemfonts_1.1.0
#> [9] textshaping_0.4.1 yaml_2.3.10 fastmap_1.2.0 R6_2.5.1
#> [13] rjsoncons_1.3.1 generics_0.1.3 curl_6.1.0 httr2_1.0.7
#> [17] knitr_1.49 tibble_3.2.1 desc_1.4.3 bslib_0.8.0
#> [21] pillar_1.10.1 rlang_1.1.4 utf8_1.2.4 cachem_1.1.0
#> [25] xfun_0.50 fs_1.6.5 sass_0.4.9 memoise_2.0.1
#> [29] cli_3.6.3 withr_3.0.2 pkgdown_2.1.1 magrittr_2.0.3
#> [33] digest_0.6.37 rappdirs_0.3.3 lifecycle_1.0.4 vctrs_0.6.5
#> [37] evaluate_1.0.3 glue_1.8.0 whisker_0.4.1 ragg_1.3.3
#> [41] rmarkdown_2.29 purrr_1.0.2 tools_4.4.2 pkgconfig_2.0.3
#> [45] htmltools_0.5.8.1