Intel Corporation
Technologies for monitoring node cluster health
Abstract:
Technologies for monitoring node cluster health include a plurality of managed nodes of a node cluster communicatively coupled across a data network to a resource manager server. The resource manager server is configured to receive health data, via an out-of-band network, from each of the managed nodes of the node cluster. The resource manager server is further configured to identify whether a managed node of the plurality of managed nodes has indicated a failure, determine a cause of the failure, and classify the failure as either a soft failure or a hard failure as a function of the received health data and the cause of the failure. Additionally, the resource manager server is configured to transmit a health state change event to each of the other managed nodes of the plurality of managed nodes of the node cluster. Other embodiments are described herein.
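
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: every name in it (HealthData, HARD_CAUSES, classify_failure, build_state_change_events, process_reports) is hypothetical, and the rule that separates soft from hard failures is an assumption, since the abstract says only that the classification is a function of the health data and the cause. Transport over the out-of-band network is left out; events are returned as plain dictionaries.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class FailureClass(Enum):
    SOFT = "soft"
    HARD = "hard"


@dataclass
class HealthData:
    """Hypothetical health report received from one managed node."""
    node_id: str
    failed: bool
    cause: Optional[str] = None   # hypothetical cause label
    recoverable: bool = True      # hypothetical hint carried in the health data


# Assumption: these causes count as hard failures. The abstract lists none.
HARD_CAUSES = {"disk", "memory", "power"}


def classify_failure(report: HealthData) -> Optional[FailureClass]:
    """Return None for a healthy node, otherwise a soft or hard classification."""
    if not report.failed:
        return None
    if report.cause in HARD_CAUSES or not report.recoverable:
        return FailureClass.HARD
    return FailureClass.SOFT


def build_state_change_events(
    reports: List[HealthData], failed_id: str, failure_class: FailureClass
) -> List[Dict[str, str]]:
    """Build one health state change event for each of the other managed nodes."""
    return [
        {"to": r.node_id, "failed_node": failed_id, "class": failure_class.value}
        for r in reports
        if r.node_id != failed_id
    ]


def process_reports(reports: List[HealthData]) -> List[Dict[str, str]]:
    """Classify every reported failure and collect the events to transmit."""
    events: List[Dict[str, str]] = []
    for report in reports:
        failure_class = classify_failure(report)
        if failure_class is not None:
            events.extend(
                build_state_change_events(reports, report.node_id, failure_class)
            )
    return events
```

With three nodes, one healthy and two failed, process_reports yields two events per failed node, one addressed to each of the other managed nodes.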
Utility
29 Nov 2017
17 Aug 2021