Microsoft Corporation
Efficient freshness crawl scheduling

Abstract:

The technology described herein builds an optimal refresh schedule by minimizing a cost function constrained by an available refresh bandwidth. The cost function receives an importance score for a content item and a change rate for the content item as input in order to optimize the schedule. The cost function is considered optimized when a refresh schedule is found that minimizes the cost while using the available bandwidth and no more. The technology can build an optimized schedule to refresh content with incomplete change data, content with complete change data, or a mixture of content with and without complete change data. It can also re-learn content item change rates from its own schedule execution history and re-compute the refresh schedule, ensuring that this schedule takes into account the latest trends in content item updates.
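
The constrained minimization described in the abstract can be illustrated with a small sketch. The abstract does not state the cost function, so everything specific here is an assumption: changes to each item are modeled as a Poisson process, an item's cost is its importance multiplied by the expected fraction of time its cached copy is stale under evenly spaced refreshes, and the bandwidth constraint is enforced with a Lagrange multiplier found by bisection. The names `marginal_gain`, `rate_for_multiplier`, and `build_schedule` are hypothetical and do not come from the patent.

```python
import math

# ASSUMPTION (not from the patent): an item that changes as a Poisson process
# with rate c and is refreshed at evenly spaced intervals 1/r is fresh for an
# expected fraction F(r) = (r / c) * (1 - exp(-c / r)) of the time.
# The assumed cost is sum_i importance_i * (1 - F(r_i)).

def marginal_gain(importance, change_rate, refresh_rate):
    """Derivative of importance * F(r) with respect to r."""
    x = change_rate / refresh_rate
    return (importance / change_rate) * (1.0 - (1.0 + x) * math.exp(-x))

def rate_for_multiplier(importance, change_rate, lam, cap):
    """Refresh rate at which the marginal gain equals the multiplier lam."""
    # The gain is largest as r -> 0, where it tends to importance / change_rate.
    # If even that does not exceed lam, the item gets no refreshes at all.
    if importance / change_rate <= lam:
        return 0.0
    lo, hi = 1e-12, cap
    for _ in range(200):  # the gain decreases in r, so bisection applies
        mid = 0.5 * (lo + hi)
        if marginal_gain(importance, change_rate, mid) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def build_schedule(items, bandwidth):
    """items: list of (importance, change_rate); returns refresh rates.

    The rates sum to the available bandwidth (up to numerical tolerance)
    and never exceed it.
    """
    lo, hi = 0.0, max(w / c for w, c in items)
    for _ in range(100):  # total refresh rate decreases as lam grows
        lam = 0.5 * (lo + hi)
        total = sum(rate_for_multiplier(w, c, lam, bandwidth) for w, c in items)
        if total > bandwidth:
            lo = lam
        else:
            hi = lam
    return [rate_for_multiplier(w, c, hi, bandwidth) for w, c in items]

# Two items with the same change rate; the first is twice as important.
rates = build_schedule([(2.0, 1.0), (1.0, 1.0)], bandwidth=2.0)
```

Under these assumptions the more important item receives the larger share of the refresh budget, and the rates together use the available bandwidth without exceeding it. Re-learning change rates from execution history, which the abstract also mentions, would amount to re-estimating each `change_rate` and calling `build_schedule` again.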

Status:

Grant

Type:

Utility

Filing date:

22 May 2019

Issue date:

5 Jul 2022