VMware, Inc.
AUTOMATED METHODS AND SYSTEMS THAT FACILITATE ROOT-CAUSE ANALYSIS OF DISTRIBUTED-APPLICATION OPERATIONAL PROBLEMS AND FAILURES BY GENERATING NOISE-SUBTRACTED CALL-TRACE-CLASSIFICATION RULES
Abstract:
The current document is directed to methods and systems that employ call traces collected by one or more call-trace services to generate call-trace-classification rules to facilitate root-cause analysis of distributed-application operational problems and failures. In a described implementation, a set of automatically labeled call traces is partitioned by the generated call-trace-classification rules. Call-trace-classification-rule generation is constrained to produce relatively simple rules with greater-than-threshold confidences and coverages. The call-trace-classification rules may point to particular services and service failures, which provides useful information to distributed-application and distributed-computer-system managers and administrators attempting to diagnose operational problems and failures that arise during execution of distributed applications within distributed computer systems. A first dataset is collected during normal distributed-application operation and a second dataset is collected during problem-associated or failure-associated operation of the distributed application. The first and second datasets are used to generate noise-subtracted call-trace-classification rules and/or diagnostic suggestions.
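The abstract does not spell out how rules are generated or how noise is subtracted, so the following is only an illustrative sketch under stated assumptions, not the claimed method. It assumes that a call trace can be reduced to a set of "service:status" strings plus an automatically assigned label, that a rule is a conjunction of at most a few such strings, that confidence is the fraction of matching traces carrying the rule's label, that coverage is the fraction of traces with that label that the rule matches, and that noise subtraction means discarding rules that are also generated from the normal-operation dataset. All names (`Trace`, `Rule`, `generate_rules`, `noise_subtracted_rules`) and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Trace:
    # Hypothetical representation: "service:status" strings seen in the trace.
    features: FrozenSet[str]
    label: str  # automatically assigned label, e.g. "error" or "ok"


@dataclass(frozen=True)
class Rule:
    conditions: FrozenSet[str]  # every condition must appear in a trace
    label: str                  # label the rule predicts for matching traces


def confidence_and_coverage(rule: Rule, traces: List[Trace]) -> Tuple[float, float]:
    """Assumed definitions: confidence = correct / matched,
    coverage = correct / traces carrying the rule's label."""
    matched = [t for t in traces if rule.conditions <= t.features]
    correct = [t for t in matched if t.label == rule.label]
    with_label = [t for t in traces if t.label == rule.label]
    confidence = len(correct) / len(matched) if matched else 0.0
    coverage = len(correct) / len(with_label) if with_label else 0.0
    return confidence, coverage


def generate_rules(traces: List[Trace], label: str, max_conditions: int = 2,
                   min_conf: float = 0.8, min_cov: float = 0.3) -> Set[Rule]:
    """Enumerate short conjunctions (keeping rules simple) and retain those
    whose confidence and coverage meet the thresholds."""
    features = sorted(set().union(*(t.features for t in traces))) if traces else []
    rules: Set[Rule] = set()
    for k in range(1, max_conditions + 1):
        for combo in combinations(features, k):
            rule = Rule(frozenset(combo), label)
            conf, cov = confidence_and_coverage(rule, traces)
            if conf >= min_conf and cov >= min_cov:
                rules.add(rule)
    return rules


def noise_subtracted_rules(normal: List[Trace], problem: List[Trace],
                           label: str = "error") -> Set[Rule]:
    """Assumed noise subtraction: drop rules that normal operation also yields."""
    return generate_rules(problem, label) - generate_rules(normal, label)


def T(label: str, *features: str) -> Trace:
    # Small helper for building example traces.
    return Trace(frozenset(features), label)


# First dataset: normal operation, with a background cache timeout.
normal = [T("error", "gateway:200", "cache:timeout"),
          T("ok", "gateway:200", "cache:200"),
          T("ok", "gateway:200", "cache:200")]

# Second dataset: problem-associated operation, with new payment failures.
problem = [T("error", "gateway:200", "cache:timeout"),
           T("error", "gateway:200", "payments:500"),
           T("error", "gateway:200", "payments:500"),
           T("ok", "gateway:200", "payments:200")]

result = noise_subtracted_rules(normal, problem)
for rule in sorted(result, key=lambda r: sorted(r.conditions)):
    print(sorted(rule.conditions), "->", rule.label)
```

In this toy data the cache-timeout rule is generated from both datasets and is therefore subtracted, leaving only rules that mention the hypothetical payments service. Exhaustive enumeration is used purely for brevity; it would not scale to realistic numbers of services.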
Utility
1 Oct 2021
24 Feb 2022