Introduction

With rapid advances in molecular biology and genomics, a multitude of connections between RNA and diseases have been uncovered, making the efficient and accurate extraction of RNA-disease relationships (RD relationships) from the extensive biomedical literature crucial for advancing research in this field. RDscan is a novel text mining method built on the pre-training and fine-tuning strategy; it uses pre-trained biomedical large language models to automatically extract RD-related information from a vast corpus of literature. To train and evaluate RDscan, we constructed a dedicated RD corpus comprising 2,082 positive and 2,000 negative statements, alongside an independent test dataset. With the Bioformer and BioBERT pre-trained models fine-tuned on this corpus, RDscan demonstrated strong performance in text classification and named entity recognition (NER) tasks. In summary, RDscan is the first text mining tool specifically designed for RD relationship extraction.

Please cite us: RDscan: Extracting RNA-disease relationship from the literature based on pre-training model, Methods, 2024, DOI: 10.1016/j.ymeth.2024.05.012



Sister Projects:

About Us: