International Business Machines Corporation
Optimized incident management using hierarchical clusters of metrics
Abstract:
A method, a computer system, and a computer program product are provided for clustering operational parameter values in a micro-service architecture used in a computing infrastructure. The computer system measures a plurality of operational parameter values of elements of the computing infrastructure and logs identifiers of the elements that caused a problem situation, together with the related problem resolution times. The computer system clusters the operational parameter values of the elements that caused the problem situation according to a correlation function. The computer system orders the operational parameter values within each cluster and the elements that caused the problem situation. The computer system periodically repeats the clustering and the ordering such that the resulting sequence of operational parameter values and elements is indicative of the resolution time required for a new problem situation.
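The clustering and ordering steps described in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes Pearson correlation as the correlation function, single-linkage grouping above a fixed threshold, and ordering by mean logged resolution time. The metric names, the `element.metric` naming scheme, the threshold, and the sample data are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length samples; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cluster_metrics(series, threshold=0.8):
    """Group metric names whose |correlation| reaches the threshold (single linkage)."""
    names = sorted(series)
    parent = {n: n for n in names}  # union-find forest, one root per cluster

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(series[a], series[b])) >= threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def order_cluster(cluster, resolution_minutes):
    """Sort a cluster's metrics by the mean logged resolution time of the
    owning element, slowest first (an assumed ordering criterion)."""
    def key(metric):
        element = metric.split(".")[0]  # assumed "element.metric" naming
        times = resolution_minutes.get(element, [])
        return mean(times) if times else 0.0
    return sorted(cluster, key=key, reverse=True)


# Hypothetical measurements for elements that caused a problem situation.
series = {
    "db.latency_ms": [10, 20, 30, 40, 50],
    "api.error_rate": [1, 2, 3, 4, 5],
    "cache.hit_ratio": [0.9, 0.1, 0.8, 0.2, 0.85],
}
# Hypothetical logged resolution times, in minutes, per element.
resolution_minutes = {"db": [120, 90], "api": [30], "cache": [15]}

clusters = [order_cluster(c, resolution_minutes) for c in cluster_metrics(series)]
```

Re-running `cluster_metrics` and `order_cluster` on a schedule would correspond to the periodic clustering and ordering the abstract mentions. A production system would more likely use an established hierarchical clustering library than this hand-rolled grouping.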
Utility
11 Jul 2019
2 Nov 2021