LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19

  • Ouyang, Sizhuo (Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)
  • Wang, Yuxing (Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)
  • Zhou, Kaiyin (Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)
  • Xia, Jingbo (Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University)
  • Received : 2021.03.17
  • Accepted : 2021.09.13
  • Published : 2021.09.30

Abstract

The literature on coronavirus disease 2019 (COVID-19) is growing dramatically, and the increased volume of text makes large-scale text mining and knowledge discovery possible. Curation of these texts has therefore become a crucial issue for the biomedical natural language processing (BioNLP) community, as a way to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their own annotations or merge external ones. Inspired by the integration of multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. The corpus carries 12 labels over 50,018 COVID-19 abstracts in LitCovid: Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC. It contains sufficiently abundant information to help unveil hidden knowledge about the pathological mechanism of COVID-19.
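To make the idea of cross-annotation concrete, the following is a minimal sketch of merging span annotations from several sources over one shared text. It is not the authors' pipeline or the PubAnnotation API. It assumes PubAnnotation-style denotations (a `span` with `begin`/`end` character offsets and an `obj` label), and the function names `span_of` and `merge_annotations`, the example sentence, and the example annotations are all hypothetical illustrations. Only the grouping of the 12 labels by source is taken from the abstract.

```python
# Label sets per source, as listed in the abstract.
SOURCE_LABELS = {
    "PubTator": {"Mutation", "Species", "Gene", "Disease"},
    "OGER": {"GO", "CHEBI"},
    "AGAC": {"Var", "MPA", "CPA", "NegReg", "PosReg", "Reg"},
}


def span_of(text, phrase):
    """Return the character span of the first occurrence of phrase in text."""
    begin = text.index(phrase)  # raises ValueError if the phrase is absent
    return {"begin": begin, "end": begin + len(phrase)}


def merge_annotations(text, annotations_by_source):
    """Merge denotations from several sources into one list over a shared text.

    Because every source annotates the same text, character offsets are
    directly comparable and the merge is a validated, sorted union.
    """
    merged = []
    for source, denotations in annotations_by_source.items():
        allowed = SOURCE_LABELS[source]
        for d in denotations:
            begin, end = d["span"]["begin"], d["span"]["end"]
            if d["obj"] not in allowed:
                raise ValueError(f"{d['obj']!r} is not a {source} label")
            if not (0 <= begin < end <= len(text)):
                raise ValueError(f"span {begin}-{end} is outside the text")
            merged.append({
                "source": source,
                "span": {"begin": begin, "end": end},
                "obj": d["obj"],
                "text": text[begin:end],
            })
    # Order by position so overlapping annotations from different sources sit together.
    merged.sort(key=lambda d: (d["span"]["begin"], d["span"]["end"], d["source"]))
    for i, d in enumerate(merged, start=1):
        d["id"] = f"T{i}"
    return {"text": text, "denotations": merged}


# Hypothetical sentence and annotations, for illustration only.
TEXT = "A mutation in ACE2 reduces viral entry in COVID-19."
EXAMPLE = {
    "PubTator": [
        {"span": span_of(TEXT, "ACE2"), "obj": "Gene"},
        {"span": span_of(TEXT, "COVID-19"), "obj": "Disease"},
    ],
    "OGER": [{"span": span_of(TEXT, "viral entry"), "obj": "GO"}],
    "AGAC": [
        {"span": span_of(TEXT, "mutation"), "obj": "Var"},
        {"span": span_of(TEXT, "reduces"), "obj": "NegReg"},
    ],
}

merged = merge_annotations(TEXT, EXAMPLE)
for d in merged["denotations"]:
    print(d["id"], d["source"], d["obj"], repr(d["text"]))
```

Keeping the `source` field on each merged denotation preserves provenance, so annotations from different resources that overlap on the same span remain distinguishable after the merge.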

Acknowledgement

This work was partially funded by the HZAU intramural innovative science funding, grant no. 2662021JC008. We would like to express our gratitude for the many instructive discussions at the BLAH7 Hackathon (https://blah7.linkedannotation.org/home). Qingyu Chen generously introduced the LitCovid and PubTator annotation services. Fabio Rinaldi introduced OGER. Steven Vercruysse kindly offered the knowledge representation template used to visualize the logic mined with AGAC.
