ItLnc-BXE: a Bagging-XGBoost-ensemble method with multiple features for identification of plant lncRNAs

November 01, 2019 ยท Entered Twilight ยท + Add venue

๐ŸŒ… TWILIGHT: Old Age
Predates the code-sharing era โ€” a pioneer of its time

"Last commit was 6.0 years ago (โ‰ฅ5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: README.md, Supplementary.docx, src

Authors Guangyan Zhang, Ziru Liu, Jichen Dai, Zilan Yu, Shuai Liu, Wen Zhang arXiv ID 1911.00185 Category q-bio.GN Cross-listed cs.LG Citations 0 Repository https://github.com/BioMedicalBigDataMiningLab/ItLnc-BXE โญ 2 Last Checked 2 months ago
Abstract
Motivation: Since long non-coding RNAs (lncRNAs) have involved in a wide range of functions in cellular and developmental processes, an increasing number of methods have been proposed for distinguishing lncRNAs from coding RNAs. However, most of the existing methods are designed for lncRNAs in animal systems, and only a few methods focus on the plant lncRNA identification. Different from lncRNAs in animal systems, plant lncRNAs have distinct characteristics. It is desirable to develop a computational method for accurate and robust identification of plant lncRNAs. Results: Herein, we present a plant lncRNA identification method ItLnc-BXE, which utilizes multiple features and the ensemble learning strategy. First, a diversity of lncRNA features is collected and filtered by feature selection to represent RNA transcripts. Then, several base learners are trained and further combined into a single meta-learner by ensemble learning, and thus an ItLnc-BXE model is constructed. ItLnc-BXE models are evaluated on datasets of six plant species, the results show that ItLnc-BXE outperforms other state-of-the-art plant lncRNA identification methods, achieving better and robust performances (AUC>95.91%). We also perform some experiments about cross-species lncRNA identification, and the results indicate that dicots-based and monocots-based models can be used to accurately identify lncRNAs in lower plant species, such as mosses and algae. Availability: source codes are available at https://github.com/BioMedicalBigDataMiningLab/ItLnc-BXE. Contact: zhangwen@mail.hzau.edu.cn (or) zhangwen@whu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” q-bio.GN