Survey: Enrichr
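A quick skim of the Enrichr papers
I had never actually read them
The tool's concept and implementation are simple, but its coverage is broad, which makes it powerful
It would be good to learn the "what to do and what not to do" philosophy of maayanlab, a lab that keeps producing useful tools
But I'm short on time, so for now I'll just get a grasp of what the papers say
Three papers have been published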
1. Chen, E. Y., Tan, C. M., Kou, Y., Duan, Q., Wang, Z., Meirelles, G. V., Clark, N. R., & Ma’ayan, A. (2013). Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. In BMC Bioinformatics (Vol. 14, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1186/1471-2105-14-128
2. Kuleshov, M. V., Jones, M. R., Rouillard, A. D., Fernandez, N. F., Duan, Q., Wang, Z., Koplev, S., Jenkins, S. L., Jagodnik, K. M., Lachmann, A., McDermott, M. G., Monteiro, C. D., Gundersen, G. W., & Ma’ayan, A. (2016). Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. In Nucleic Acids Research (Vol. 44, Issue W1, pp. W90–W97). Oxford University Press (OUP). https://doi.org/10.1093/nar/gkw377
3. Xie, Z., Bailey, A., Kuleshov, M. V., Clarke, D. J. B., Evangelista, J. E., Jenkins, S. L., Lachmann, A., Wojciechowicz, M. L., Kropiwnicki, E., Jagodnik, K. M., Jeon, M., & Ma’ayan, A. (2021). Gene Set Knowledge Discovery with Enrichr. In Current Protocols (Vol. 1, Issue 3). Wiley. https://doi.org/10.1002/cpz1.90
Paper 1, BMC Bioinformatics 2013
Cited by 3419 !!!!!!
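Covers the details of the datasets, the method used to evaluate the tests, and a demonstration with an interpretation of its results
The structure may be a useful reference for the TogoDX paper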
Background/Objective
Everything is Gene Ontology and everything is Fisher's exact test, but the latter is biased by sample size
While many gene-set libraries and tools for performing enrichment analysis already exist, there is a growing need for them and there are more ways to improve and validate gene set enrichment methods
There are many gene-set libraries and tools for doing GSEA => they need to be evaluated
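The Fisher's exact test mentioned above can be sketched as a one-sided hypergeometric tail over the overlap between an input gene list and a gene set. This is a minimal standard-library sketch, not Enrichr's implementation; the function name, the background size of 20,000 genes, and the example sizes are all assumptions for illustration.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, set_size, background_size):
    """One-sided Fisher's exact test p-value for over-representation.

    Returns P(X >= overlap), where X is the number of genes shared between
    a random list of `list_size` genes and a gene set of `set_size` genes,
    both drawn from `background_size` genes (hypergeometric distribution).
    """
    max_overlap = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: same overlap, gene sets of different sizes.
    for set_size in (50, 200, 1000):
        p = fisher_enrichment_p(overlap=5, list_size=100,
                                set_size=set_size, background_size=20000)
        print(f"set_size={set_size:5d}  p={p:.3e}")
```

Running the loop with a fixed overlap shows how the size of the gene set alone moves the p-value, which is the kind of size dependence the note above refers to.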
Achievements
Enrichr, an integrative web-based and mobile software application that
includes many new gene-set libraries
a new approach to rank enriched terms
powerful interactive visualizations of the results in new ways.
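As far as I understand the 2013 paper, the "new approach to rank enriched terms" combines the Fisher p-value with a z-score that measures how far a term's rank deviates from its expected rank under random input lists, giving a combined score c = ln(p) * z. The sketch below is written under that assumption; the function names and the null ranks are hypothetical, and the way Enrichr precomputes the expected ranks is not reproduced here.

```python
import math
from statistics import mean, stdev


def rank_z_score(observed_rank, null_ranks):
    """Z-score of a term's observed rank against ranks from random gene lists.

    `null_ranks` stands in for the ranks the same term received when random
    input lists were submitted (hypothetical data in this sketch).
    """
    return (observed_rank - mean(null_ranks)) / stdev(null_ranks)


def combined_score(p_value, z_score):
    """Combined score c = ln(p) * z (assumed form of the ranking score)."""
    return math.log(p_value) * z_score


if __name__ == "__main__":
    z = rank_z_score(observed_rank=2, null_ranks=[8, 10, 12])
    print("z-score:", z)
    print("combined score:", combined_score(p_value=0.001, z_score=z))
```

A term that ranks much higher than it does for random lists gets a negative z-score, and since ln(p) is also negative for p < 1, the product is positive and grows with both signals.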
Significance
contains 35 gene-set libraries where some libraries are borrowed from other tools while many other libraries are newly created and only available in Enrichr
Dataset summary
Transcription category (6 gene-set libraries)
The ChIP-x Enrichment Analysis (ChEA) database (Original)
PWMs from TRANSFAC and JASPAR
Transcription factor target genes inferred from PWMs for the human genome were downloaded from the UCSC Genome Browser
The ENCODE transcription factor gene-set library
The Histone modification from the NIH Roadmap Epigenomics
The microRNA gene set created from the TargetScan online database
Pathways category
The pathway associated gene-set libraries created from
WikiPathways, KEGG, BioCarta, and Reactome
The Kinase Enrichment Analysis (KEA) gene-set library
Expression2Kinases
created from a recent study that profiled nuclear complexes in human breast cancer cell lines after applying over 3000 immuno-precipitations followed by mass-spectrometry (IP-MS) experiments using over 1000 different antibodies
created from the mammalian complexes database, CORUM
Ontology category
gene-set libraries created from the three gene ontology trees [6]
from the knockout mouse phenotypes ontology developed by the Jackson Lab from their MGI-MP browser [38]
Disease/drugs category
gene set libraries created from
the Connectivity Map database [39], GeneSigDB [40], MSigDB [5], OMIM [41], and VirusMINT [42]
Cell type category
four gene-set libraries:
genes highly expressed in human and mouse tissues extracted from the Mouse and Human Gene Atlases [44]
genes highly expressed in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) [45] and NCI-60 [46]
Dataset (copy-paste)
The transcription category provides six gene-set libraries that attempt to link differentially expressed genes with the transcriptional machinery
The ChIP-x Enrichment Analysis (ChEA) database is our own resource for storing putative targets for transcription factors extracted from publications that report experiments of profiling transcription factors binding to DNA in mammalian cells
PWMs from TRANSFAC and JASPAR were used to scan the promoters of all human genes in the region −2000 and +500 from the transcription start site (TSS).
Transcription factor target genes inferred from PWMs for the human genome were downloaded from the UCSC Genome Browser FTP site which contains many resources for gene and sequence annotations
The ENCODE transcription factor gene-set library is the fourth method to create a transcription factor/target gene set library
The Histone modification gene-set library was created by processing experiments from the NIH Roadmap Epigenomics
The microRNA gene set library was created by processing data from the TargetScan online database
The pathways category includes gene-set libraries from well-known pathway databases
The pathway associated gene-set libraries were created from each of the above databases by converting members of each pathway from each pathway database to a list of human genes
WikiPathways [25], KEGG [26], BioCarta, and Reactome [27]
The Kinase Enrichment Analysis (KEA) gene-set library contains human or mouse kinases and their known substrates collected from literature reports as provided by six kinase-substrate databases: HPRD [32], PhosphoSite [33], PhosphoPoint [34], Phospho.Elm [35], NetworKIN [36], and MINT [37].
Expression2Kinases
created from a recent study that profiled nuclear complexes in human breast cancer cell lines after applying over 3000 immuno-precipitations followed by mass-spectrometry (IP-MS) experiments using over 1000 different antibodies
created from the mammalian complexes database, CORUM
The ontology category contains gene-set libraries created from the three gene ontology trees [6] and from the knockout mouse phenotypes ontology developed by the Jackson Lab from their MGI-MP browser [38].
The disease/drugs category has gene set libraries created from the Connectivity Map database [39], GeneSigDB [40], MSigDB [5], OMIM [41], and VirusMINT [42].
The Connectivity Map (CMAP) database [39] contains over 6,000 Affymetrix microarray gene expression experiments where human cancer cell lines were treated with over 1,300 drugs, many of them FDA approved, and changes in expression were measured after six hours
The GeneSigDB gene-set library was borrowed from the GeneSigDB database
The OMIM gene-set library was created directly from the NCBI’s OMIM Morbid Map
The VirusMINT gene-set library was created from the VirusMINT database
The MSigDB computational and MSigDB oncogenic signature gene-set libraries were borrowed from the MSigDB database from categories C4 and C6
The cell type category is made of four gene-set libraries: genes highly expressed in human and mouse tissues extracted from the Mouse and Human Gene Atlases [44] and genes highly expressed in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) [45] and NCI-60 [46].
Paper 2, NAR 2016
Cited by 4174 !!!!
Updates
The new gene set libraries that were added include differentially expressed genes after drug, gene, disease and pathogen perturbations extracted from the national center for biotechnology information (NCBI) gene expression omnibus (GEO) through a crowdsourcing project
submit fuzzy sets
upload BED files
a calendar that shows the number of lists submitted each day
an improved application programming interface (API)
an enhanced help documentation
an improved Find a Gene feature, and visualization of the results as clustergrams.
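For the fuzzy-set submission listed above, my understanding is that each gene is entered with a membership degree between 0 and 1 on the same line, separated by a comma. The sketch below formats such a list under that assumption; the function name and the example weights are hypothetical, and the exact input format should be checked against the Enrichr help pages.

```python
def format_fuzzy_list(memberships):
    """Format a {gene: membership} mapping as 'GENE,weight' lines.

    Assumes the fuzzy input format is one gene per line with a membership
    degree in [0, 1] after a comma (an assumption, not confirmed here).
    """
    lines = []
    for gene, weight in memberships.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"membership for {gene} must be in [0, 1]: {weight}")
        lines.append(f"{gene.strip()},{weight:g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical membership degrees.
    print(format_fuzzy_list({"STAT3": 1.0, "TP53": 0.5, "MYC": 0.25}))
```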
Paper 3, Current Protocols 2021
Cited by 147 !!!
Protocol paper
Basic Protocol 1: Analyzing lists of differentially expressed genes from transcriptomics, proteomics and phosphoproteomics, GWAS studies, or other experimental studies
Basic Protocol 2: Searching Enrichr by a single gene or key search term
Basic Protocol 3: Preparing raw or processed RNA-seq data through BioJupies in preparation for Enrichr analysis
Basic Protocol 4: Analyzing gene sets for model organisms using modEnrichr
Basic Protocol 5: Using Enrichr in Geneshot
Basic Protocol 6: Using Enrichr in ARCHS4
Basic Protocol 7: Using the enrichment analysis visualization Appyter to visualize Enrichr results
Basic Protocol 8: Using the Enrichr API
Basic Protocol 9: Adding an Enrichr button to a website
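Basic Protocol 8 (the Enrichr API) can be sketched as a two-step flow: upload a gene list, then request enrichment results for one library. This is a rough standard-library sketch, not the protocol's code. The base URL, the `addList` and `enrich` endpoints, the `userListId` and `backgroundType` parameters, and the column order of the result rows are assumptions taken from my reading of the public API documentation and should be verified before use; the library name in the example is only a placeholder.

```python
import json
import urllib.parse
import urllib.request
import uuid

ENRICHR_URL = "https://maayanlab.cloud/Enrichr"  # assumed base URL


def build_add_list_request(genes, description):
    """Build a multipart/form-data POST request that uploads a gene list."""
    boundary = uuid.uuid4().hex
    fields = {"list": "\n".join(genes), "description": description}
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        f"{ENRICHR_URL}/addList",
        data="".join(parts).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def enrich_url(user_list_id, library):
    """Build the GET URL for the results of one gene-set library."""
    query = urllib.parse.urlencode(
        {"userListId": user_list_id, "backgroundType": library}
    )
    return f"{ENRICHR_URL}/enrich?{query}"


def parse_enrich_response(payload, library):
    """Pick a few columns from each result row (assumed column order)."""
    return [
        {"rank": row[0], "term": row[1], "p_value": row[2],
         "combined_score": row[4], "genes": row[5]}
        for row in payload[library]
    ]


def run_enrichment(genes, library, description="example list"):
    """Upload the list and fetch results. Performs real network calls."""
    with urllib.request.urlopen(build_add_list_request(genes, description)) as resp:
        user_list_id = json.load(resp)["userListId"]
    with urllib.request.urlopen(enrich_url(user_list_id, library)) as resp:
        return parse_enrich_response(json.load(resp), library)
```

Keeping request building and response parsing separate from `run_enrichment` makes the sketch checkable offline; only `run_enrichment` touches the network.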