Introduction to gtexr

The GTEx Portal API V2 enables programmatic access to data available from the Genotype-Tissue Expression Portal. The gtexr package wraps this API, providing an R function for each API endpoint.

Shiny app

Users can try out all functions interactively with the ⭐gtexr shiny app⭐, which pre-populates the query parameters with those from the first working example in each function's documentation.


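The rest of this vignette outlines some example applications of gtexr. The examples also use functions from dplyr (e.g. select(), pull()) and purrr (e.g. set_names(), map()), so load these packages with library() alongside gtexr before running them.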


Get build 37 coordinates for a variant

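get_variant(snpId = "rs1410858") |>
  # split the build 37 variant ID into its components, keeping the original column
  tidyr::separate(
    col = b37VariantId,
    into = c("chromosome", "position", "reference_allele",
             "alternative_allele", "genome_build"),
    sep = "_",
    remove = FALSE
  ) |>
  select(snpId, b37VariantId, chromosome:genome_build)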
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 1
#> # A tibble: 1 × 7
#>   snpId     b37VariantId chromosome position reference_allele alternative_allele
#>   <chr>     <chr>        <chr>      <chr>    <chr>            <chr>             
#> 1 rs1410858 1_153182116… 1          1531821… C                A                 
#> # ℹ 1 more variable: genome_build <chr>
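For readers who prefer base R, the splitting step can be sketched without tidyr. The helper split_variant_id() below is hypothetical (it is not part of gtexr) and assumes every ID is underscore-delimited with exactly five fields, matching the five column names used above. It is demonstrated on the build 38 ID chr1_153209640_C_A_b38 that GTEx returns for rs1410858, because the build 37 ID is truncated in the printed tibble.

```r
# Hypothetical helper (not part of gtexr): split underscore-delimited
# GTEx-style variant IDs into their five components.
split_variant_id <- function(ids) {
  parts <- strsplit(ids, "_", fixed = TRUE)
  # Assumption: every ID has exactly five fields
  stopifnot(all(lengths(parts) == 5))
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(out) <- c("chromosome", "position", "reference_allele",
                  "alternative_allele", "genome_build")
  out
}

split_variant_id("chr1_153209640_C_A_b38")
```

As with tidyr::separate(), position comes back as a character column; wrap it in as.integer() if numeric comparisons are needed.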

Convert gene symbol to versioned GENCODE ID

Use get_gene() or get_genes()

get_genes("CRP") |>
  select(geneSymbol, gencodeId)
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 1
#> # A tibble: 1 × 2
#>   geneSymbol gencodeId         
#>   <chr>      <chr>             
#> 1 CRP        ENSG00000132693.12
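A versioned GENCODE ID is an Ensembl gene ID followed by a dot and a version number (here .12). Some external resources expect unversioned IDs, so it can be handy to strip the suffix. The helper below is a hypothetical base R sketch, not a gtexr function, and assumes the version is always a trailing dot followed by digits.

```r
# Hypothetical helper: strip a trailing ".<digits>" version suffix from
# GENCODE/Ensembl gene IDs. IDs without a suffix are returned unchanged.
strip_gencode_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

strip_gencode_version(c("ENSG00000132693.12", "ENSG00000132693"))
#> [1] "ENSG00000132693" "ENSG00000132693"
```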

Convert rsID to GTEx variant ID

Use get_variant()

get_variant(snpId = "rs1410858") |>
  select(snpId, variantId)
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 1
#> # A tibble: 1 × 2
#>   snpId     variantId             
#>   <chr>     <chr>                 
#> 1 rs1410858 chr1_153209640_C_A_b38
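The GTEx variant ID above packs the chromosome, build 38 position, reference allele and alternative allele into one underscore-delimited string with a b38 suffix. Assuming that layout holds generally (it is inferred from this example ID, not from the API documentation), an ID can be assembled from its parts with sprintf(). make_gtex_variant_id() is a hypothetical helper, not a gtexr function.

```r
# Hypothetical helper: assemble a GTEx-style build 38 variant ID.
# Layout inferred from "chr1_153209640_C_A_b38": chr<chrom>_<pos>_<ref>_<alt>_b38
make_gtex_variant_id <- function(chromosome, position, ref, alt) {
  # "%d" with as.integer() avoids scientific notation for round positions
  sprintf("chr%s_%d_%s_%s_b38", chromosome, as.integer(position), ref, alt)
}

make_gtex_variant_id("1", 153209640, "C", "A")
#> [1] "chr1_153209640_C_A_b38"
```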

For a gene of interest, which tissues have significant cis-eQTLs?

Use get_significant_single_tissue_eqtls(). Note that this function requires versioned GENCODE IDs, so first convert the gene symbol with get_genes().

gene_symbol_of_interest <- "CRP"

gene_gencodeId_of_interest <- get_genes(gene_symbol_of_interest) |>
  pull(gencodeId) |>
  # hide the paging info printed by get_genes()
  suppressMessages()

gene_gencodeId_of_interest |>
  get_significant_single_tissue_eqtls() |>
  distinct(geneSymbol, gencodeId, tissueSiteDetailId)
#> ── Paging info ─────────────────────────────────────────────────────────────────
#> • numberOfPages = 1
#> • page = 0
#> • maxItemsPerPage = 250
#> • totalNumberOfItems = 93
#> # A tibble: 3 × 3
#>   geneSymbol gencodeId          tissueSiteDetailId                 
#>   <chr>      <chr>              <chr>                              
#> 1 CRP        ENSG00000132693.12 Thyroid                            
#> 2 CRP        ENSG00000132693.12 Esophagus_Gastroesophageal_Junction
#> 3 CRP        ENSG00000132693.12 Muscle_Skeletal
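The endpoint returns one row per significant eQTL, so a tissue with several significant variants for CRP appears several times; this is why 93 items reduce to three rows after distinct(). The sketch below mimics that step in base R on a small toy data frame. The rows are illustrative placeholders, not real GTEx results.

```r
# Toy stand-in for the endpoint's output: one row per significant eQTL.
# These rows are made up for illustration and are NOT real GTEx results.
eqtls <- data.frame(
  geneSymbol = "CRP",
  gencodeId = "ENSG00000132693.12",
  tissueSiteDetailId = c("Thyroid", "Thyroid", "Muscle_Skeletal",
                         "Thyroid", "Muscle_Skeletal")
)

# Base R equivalent of dplyr::distinct() on these three columns
unique(eqtls[c("geneSymbol", "gencodeId", "tissueSiteDetailId")])

# Number of significant eQTLs per tissue
table(eqtls$tissueSiteDetailId)
```

With the real output, replacing the final distinct() call with dplyr's count(tissueSiteDetailId) gives the same per-tissue tally.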

Get data for non-eQTL variants

Some analyses (e.g. Mendelian randomisation) require data for variants that may or may not be significant eQTLs. Use calculate_expression_quantitative_trait_loci() with purrr::map() to retrieve data for multiple variants:

variants_of_interest <- c("rs12119111", "rs6605071", "rs1053870")

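variants_of_interest |>
  # name the vector so that bind_rows(.id = "rsid") can label each result
  set_names() |>
  map(
    \(x) calculate_expression_quantitative_trait_loci(
      tissueSiteDetailId = "Liver",
      gencodeId = "ENSG00000237973.1",
      variantId = x
    )
  ) |>
  bind_rows(.id = "rsid") |>
  # optionally, reformat output - first extract genomic coordinates and alleles
  tidyr::separate(
    col = "variantId",
    into = c("chromosome", "position", "reference_allele",
             "alternative_allele", "genome_build"),
    sep = "_"
  ) |>
  # ...then calculate the alternative allele frequency
  mutate(
    alt_allele_count = (2 * homoAltCount) + hetCount,
    total_allele_count = 2 * (homoAltCount + hetCount + homoRefCount),
    alternative_allele_frequency = alt_allele_count / total_allele_count
  ) |>
  # ...and finally rename and reorder the columns of interest
  select(
    rsid,
    beta = nes,
    se = error,
    pValue,
    minor_allele_frequency = maf,
    alternative_allele_frequency,
    chromosome:genome_build,
    tissueSiteDetailId
  )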
#> # A tibble: 3 × 12
#>   rsid         beta     se  pValue minor_allele_frequency alternative_allele_f…¹
#>   <chr>       <dbl>  <dbl>   <dbl>                  <dbl>                  <dbl>
#> 1 rs121191…  0.0270 0.0670 6.88e-1                 0.365                   0.635
#> 2 rs6605071 -0.601  0.166  3.88e-4                 0.0409                  0.959
#> 3 rs1053870  0.0247 0.0738 7.38e-1                 0.214                   0.214
#> # ℹ abbreviated name: ¹​alternative_allele_frequency
#> # ℹ 6 more variables: chromosome <chr>, position <chr>, reference_allele <chr>,
#> #   alternative_allele <chr>, genome_build <chr>, tissueSiteDetailId <chr>
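The mutate() step estimates the alternative allele frequency by counting alleles: each sample is diploid, so homozygous-alternative samples carry two alternative alleles and heterozygous samples one, out of twice the number of samples in total. The minor allele frequency is the smaller of that value and one minus it, which is consistent with the output above: maf equals 1 - alternative_allele_frequency for rs12119111 and rs6605071 (where the alternative allele is the more common one) and equals alternative_allele_frequency for rs1053870. A base R sketch with made-up genotype counts:

```r
# Alternative allele frequency from genotype counts (diploid samples):
# homozygous-alt samples contribute 2 alt alleles, heterozygotes contribute 1.
alt_allele_frequency <- function(homoRefCount, hetCount, homoAltCount) {
  alt_allele_count <- 2 * homoAltCount + hetCount
  total_allele_count <- 2 * (homoRefCount + hetCount + homoAltCount)
  alt_allele_count / total_allele_count
}

# Minor allele frequency: the frequency of the rarer of the two alleles
minor_allele_frequency <- function(alt_freq) pmin(alt_freq, 1 - alt_freq)

# Made-up counts for 100 samples, for illustration only
af <- alt_allele_frequency(homoRefCount = 10, hetCount = 40, homoAltCount = 50)
af
#> [1] 0.7
minor_allele_frequency(af)
#> [1] 0.3
```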

