International Business Machines Corporation
Automated document filtration and priority scoring for document searching and access
Abstract:
Computer-based methods, systems, and computer-readable media are provided for managing documents within a content repository or within document subsets. Documents may be pre-processed to be machine readable and classified within the content repository into one or more categories, based either on the number of times classification terms appear in a specific section of the document or on an article type tag. Document subsets may be generated based on user-defined terms. Documents may be associated with specific cancer types, genes, gene variants, and drugs by comparing relevant search terms to specific sections of the documents. A request for processing the documents may include one or more search terms, each pertaining to one or more of gene, gene variant, drug, and cancer terms. A priority score may be determined for each document based on the frequency of one or more of the search terms in each of the specific sections, and the documents may be ranked from highest to lowest total priority score.
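
The scoring and ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The document layout (a dict with an "id" and a "sections" mapping), the function names, whole-word case-insensitive matching, and the optional per-section weights are all assumptions made for illustration; the abstract states only that the score depends on search-term frequency in each section and that documents are ranked by total score.

```python
import re


def section_term_counts(text, terms):
    """Count case-insensitive whole-word occurrences of each term in text."""
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def priority_score(document, terms, section_weights=None):
    """Sum term frequencies over all sections of one document.

    section_weights is a hypothetical extension: sections that are not
    listed get weight 1.0, so the default is a plain frequency sum.
    """
    weights = section_weights or {}
    total = 0.0
    for section, text in document["sections"].items():
        frequency = sum(section_term_counts(text, terms).values())
        total += weights.get(section, 1.0) * frequency
    return total


def rank_documents(documents, terms, section_weights=None):
    """Return (document id, total score) pairs, highest score first."""
    scored = [
        (doc["id"], priority_score(doc, terms, section_weights))
        for doc in documents
    ]
    # sorted() is stable, so documents with equal scores keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_documents(docs, ["BRAF", "vemurafenib"])` on a list of such documents returns the ids ordered from highest to lowest total score. Passing something like `section_weights={"title": 2.0}` would make title matches count double, which is one possible way to let some sections matter more than others.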
Utility
30 Nov 2018
13 Jul 2021