Meta Platforms, Inc.
URL normalization
Last updated:
Abstract:
In one embodiment, a method includes receiving a plurality of uniform resource identifiers (URI's) associated with a particular domain. Each of the URI's identifies a content page comprising one or more signature elements. The method further includes, for each URI in the plurality of URI's, successively testing the URI to identify a core of the URI and any unnecessary elements of the URI. The core of the URI is sufficient to retrieve a version of the content page including all of its signature elements. The method additionally includes, for each URI in the plurality of URI's, updating a set of rules based on the identified core and the identified unnecessary elements. The set of rules establishes a normalized version of the URI.
Utility
30 May 2019
26 Oct 2021