Amazon.com, Inc.
Detecting and mitigating hardware component slow failures

Abstract:

Techniques for monitoring computer system components for degraded operation are described. In some embodiments, a baseline performance metric value is received from a system monitoring service, and a request that was generated by a first computing device and is directed to an input/output (I/O) device is received. A timer is started whose duration is based on the baseline performance metric value, and the received request is sent to the I/O device. If the timer's duration expires before a response to the request is received from the I/O device, an error message is generated. The generated error message is sent to a second computing device to cause the second computing device to perform at least one action in response to it.
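A minimal sketch of the timer-based detection described in the abstract, written under several assumptions that are not from the source. All names (`SlowFailureProxy`, `ErrorMessage`, `notify`) are hypothetical. The timer duration is assumed to be the baseline latency multiplied by a fixed factor, since the abstract only says the duration is "based on" the baseline. A Python callback stands in for sending the error message to the second computing device, and a plain callable stands in for the I/O device.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ErrorMessage:
    """Hypothetical error message emitted when the I/O device is too slow."""
    request_id: str
    timeout_s: float
    reason: str


class SlowFailureProxy:
    """Hypothetical intermediary between a requester and an I/O device."""

    def __init__(self, baseline_latency_s: float, multiplier: float,
                 notify: Callable[[ErrorMessage], None]) -> None:
        # Assumption: timer duration = baseline metric value * fixed multiplier.
        self.timeout_s = baseline_latency_s * multiplier
        # Stand-in for sending the error message to a second computing device.
        self.notify = notify
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def submit(self, request_id: str,
               device_io: Callable[[], bytes]) -> Optional[bytes]:
        # Forward the request to the (simulated) I/O device.
        future = self._pool.submit(device_io)
        try:
            # result(timeout=...) acts as the timer: it waits at most timeout_s.
            return future.result(timeout=self.timeout_s)
        except concurrent.futures.TimeoutError:
            # Timer expired before the device responded: report a slow failure.
            # Note: the device call itself is not aborted; it keeps running.
            self.notify(ErrorMessage(request_id, self.timeout_s,
                                     "I/O device response exceeded timer"))
            return None

    def close(self) -> None:
        # Do not block on requests that are still stuck in the device.
        self._pool.shutdown(wait=False)


# Usage: a healthy device answers in time; a degraded one trips the timer.
received: list = []
proxy = SlowFailureProxy(baseline_latency_s=0.02, multiplier=5.0,
                         notify=received.append)


def healthy_read() -> bytes:
    return b"ok"


def degraded_read() -> bytes:
    time.sleep(0.5)  # well beyond the ~0.1 s timer
    return b"late"


fast = proxy.submit("req-1", healthy_read)
slow = proxy.submit("req-2", degraded_read)
proxy.close()
```

Deriving the timeout from a baseline, rather than using a fixed constant, lets a device that is still responding, but far more slowly than usual, be flagged before it fails outright. In this sketch, the receiver of the error message decides what action to take, such as draining or replacing the device.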

Status:

Grant

Type:

Utility

Filing date:

28 Jun 2018

Issue date:

14 Sep 2021