Though academic research is a slow-moving and rigid process, the rate of scientific output has exploded in the last 50 years.[1] According to Bornmann and Mutz (2014)[2], scientific knowledge doubles roughly every 9 years. In areas like healthcare, the doubling time is as short as three years and is expected to shrink to about 10 weeks by the early 2020s[3]. For overwhelmed researchers tasked with navigating the growing stack of scientific literature, the value isn’t in having so much new knowledge, but in being able to find the relevant insights when they need them[1].
This growth is particularly acute in relatively new fields of knowledge such as microRNA diagnostics[5]. In response, Miroculus built Loom.bio, a search tool that uses machine learning and graphs to determine the relationships between specific microRNAs, diseases, and genes.
MicroRNAs are small non-coding RNA molecules, whose primary role is to regulate the expression of our genes. Their discovery in circulation in body fluids such as blood plasma/serum, urine and saliva has been followed up by a multitude of studies, providing evidence that detection of specific microRNA molecules can give clues about a person’s health status and may therefore be used as biomarkers for various conditions.
The Loom dataset is one of the inputs to the machine learning models we use to identify relevant microRNAs in a disease of interest, and we are making it accessible and open because we believe it may prove valuable in accelerating research efforts in the microRNA space.
Loom is an up-to-date snapshot of the microRNA literature landscape that we built to expedite our own research. Despite more focused attempts to distill the PubMed abstract corpus into microRNA insights, such as miRCancer[4] (microRNA-cancer relationships, no longer available) and miRTex[5] (microRNA-gene relationships), as of today there is no compelling way to access much of the microRNA research.
Using Loom's easy-to-use, interactive UI, a researcher can quickly locate the relevant sentences across many publications that relate specific microRNAs to her disease or gene of interest. With this tool, our objective is to provide a visually compelling and complete overview of how microRNAs relate to specific diseases and genes.
The Loom backend comprises 4 micro-services. The first is a listener that fetches new publications from the NCBI databases on a daily basis: PubMed for abstracts and PMC for full-text open-access publications. Then, a natural language processor scans each publication, breaking it down into its constituent sentences and detecting mentions of microRNAs, genes, and diseases. Within each sentence, a machine-learned scorer evaluates the type of relationship and its strength on a scale from 0 to 1, and writes the results to a graph database.
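To make the sentence-level processing more concrete, here is a minimal Python sketch of the splitting, mention detection, and scoring steps. It is not Loom's actual code: the regular expression for microRNA names, the tiny disease and gene dictionaries, the cue-word scorer (a stand-in for the machine-learned one), and all function names are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed pattern for mentions such as "miR-29a", "microRNA-142" or "hsa-miR-21-5p".
MIRNA_RE = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|miRNA|microRNA)-\d+[a-z]?(?:-[35]p)?\b",
    re.IGNORECASE,
)

# Toy dictionaries; a real system would rely on curated vocabularies.
DISEASES = {"stomach cancer", "breast cancer"}
GENES = {"PTEN", "TP53"}

# Word stems that hint at a stated relationship (illustrative only).
CUE_WORDS = ("regulat", "target", "suppress", "promot")


@dataclass(frozen=True)
class Relation:
    mirna: str
    target: str
    target_type: str  # "disease" or "gene"
    sentence: str
    score: float      # relationship strength in [0, 1]


def split_sentences(text):
    """Naive splitter: breaks on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_mentions(sentence):
    """Return sorted microRNA, disease, and gene mentions found in one sentence."""
    mirnas = sorted({m.group(0) for m in MIRNA_RE.finditer(sentence)})
    lowered = sentence.lower()
    diseases = sorted(d for d in DISEASES if d in lowered)
    genes = sorted(g for g in GENES if re.search(rf"\b{re.escape(g)}\b", sentence))
    return mirnas, diseases, genes


def score_sentence(sentence):
    """Placeholder for the learned scorer: higher when a cue word is present."""
    lowered = sentence.lower()
    return 0.9 if any(cue in lowered for cue in CUE_WORDS) else 0.3


def extract_relations(text):
    """Emit one Relation per (microRNA, disease or gene) pair sharing a sentence."""
    relations = []
    for sentence in split_sentences(text):
        mirnas, diseases, genes = find_mentions(sentence)
        if not mirnas:
            continue
        score = score_sentence(sentence)
        for mirna in mirnas:
            for disease in diseases:
                relations.append(Relation(mirna, disease, "disease", sentence, score))
            for gene in genes:
                relations.append(Relation(mirna, gene, "gene", sentence, score))
    return relations
```

Each `Relation` corresponds to one edge that could be written to a graph store, with the sentence kept as supporting evidence. The naive splitter will mis-handle abbreviations such as "et al.", so a production pipeline would likely use a dedicated sentence tokenizer.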
The resulting graph database is then queried in real time by the UI to retrieve the sentences and relationships the user is interested in. In the example video below, with just a couple of clicks, the user gets a list of the many scientific findings relating stomach cancer to microRNA-29a, then browses the genes related to microRNA-142.
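The two lookups described above can be sketched with a small in-memory graph. The post does not name the graph database or its query language, so the `RelationGraph` class, its methods, and the placeholder sentences and scores below are purely hypothetical stand-ins rather than Loom's real schema or data.

```python
from collections import defaultdict


class RelationGraph:
    """Toy stand-in for a graph database of microRNA relationships."""

    def __init__(self):
        # (mirna, target_type) -> {target -> [(sentence, score), ...]}
        self._edges = defaultdict(lambda: defaultdict(list))

    def add(self, mirna, target, target_type, sentence, score):
        """Store one scored sentence as evidence for a microRNA-target edge."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be in [0, 1]")
        self._edges[(mirna, target_type)][target].append((sentence, score))

    def sentences(self, mirna, target, target_type, min_score=0.0):
        """Sentences linking a microRNA to a target, strongest first."""
        hits = self._edges.get((mirna, target_type), {}).get(target, [])
        ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
        return [(s, sc) for s, sc in ranked if sc >= min_score]

    def related(self, mirna, target_type):
        """Targets of one type linked to a microRNA, ranked by best sentence score."""
        targets = self._edges.get((mirna, target_type), {})
        return sorted(
            targets,
            key=lambda t: max(sc for _, sc in targets[t]),
            reverse=True,
        )


# Placeholder data: sentences, gene names, and scores are invented for illustration.
graph = RelationGraph()
graph.add("miR-29a", "stomach cancer", "disease", "Placeholder sentence 1.", 0.4)
graph.add("miR-29a", "stomach cancer", "disease", "Placeholder sentence 2.", 0.9)
graph.add("miR-142", "GENE_A", "gene", "Placeholder sentence 3.", 0.7)
graph.add("miR-142", "GENE_B", "gene", "Placeholder sentence 4.", 0.8)

# Findings relating stomach cancer to miR-29a, then genes related to miR-142.
findings = graph.sentences("miR-29a", "stomach cancer", "disease")
genes = graph.related("miR-142", "gene")
```

Keeping the evidence sentences on each edge is what lets a UI show the supporting text alongside the relationship, and ranking by score is one simple way to surface the strongest findings first.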
We at Miroculus believe that other groups and companies might benefit from the things we have learned and challenges we have come across. Because of this, we decided to make Loom open access. If you use it, find it useful, and/or have any questions or feedback about it, we would love to hear about it!
Lutz Bornmann, Ruediger Mutz (2014). "Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references". Journal of the Association for Information Science and Technology. Wiley-Blackwell. arXiv:1402.4578
Boya Xie, Qin Ding, Hongjin Han, Di Wu (2013). "miRCancer: a microRNA-cancer association database constructed by text mining on literature". Bioinformatics (Oxford, England), Vol. 29, No. 5, pp. 638-644, doi:10.1093/bioinformatics/btt014
Li G, Ross KE, Arighi CN, Peng Y, Wu CH, Vijay-Shanker K (2015). "miRTex: A Text Mining System for miRNA-Gene Relation Extraction". PLoS Comput Biol 11(9): e1004391. doi:10.1371/journal.pcbi.1004391