Microsoft Corporation
INFERRING INFORMATION ABOUT A WEBPAGE BASED UPON A UNIFORM RESOURCE LOCATOR OF THE WEBPAGE

Last updated:

Abstract:

Described herein are technologies related to inferring information about a webpage based upon semantics of a uniform resource locator (URL) of the webpage. The URL is tokenized to create a sequence of tokens. An embedding for the URL is generated based upon the sequence of tokens, wherein the embedding is representative of semantics of the URL. Based upon the embedding for the URL, information about the webpage pointed to by the URL is inferred. The webpage is then retrieved, and information is extracted from the webpage based upon the information inferred about the webpage.
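
The tokenize, embed, and infer steps of the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say how the embedding is produced, so a hashed bag-of-tokens vector stands in for it here, and inference is approximated by cosine similarity against per-label prototype vectors. The function names, the DIM constant, the labels, and the example URL are all hypothetical. Retrieval and extraction are omitted.

```python
import hashlib
import math
import re

DIM = 64  # embedding width; an arbitrary choice for this sketch


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^0-9a-zA-Z]+", url.lower()) if t]


def embed_tokens(tokens: list[str], dim: int = DIM) -> list[float]:
    """Stand-in embedding: hash each token to a bucket, then L2-normalize.

    A learned model would normally be used here; hashing is an assumption
    made only to keep the example self-contained.
    """
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def infer_label(url: str, prototypes: dict[str, list[float]]) -> str:
    """Return the label whose prototype vector is closest to the URL embedding."""
    emb = embed_tokens(tokenize_url(url))
    return max(prototypes, key=lambda label: cosine(emb, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical label prototypes built from hand-picked tokens.
    prototypes = {
        "shopping": embed_tokens(["shop", "product", "cart"]),
        "news": embed_tokens(["news", "story", "article"]),
    }
    url = "https://example.com/shop/product/123"
    print(tokenize_url(url))
    print(infer_label(url, prototypes))
```

In a fuller pipeline, the inferred label could decide whether the page is worth retrieving and which extraction routine to apply once it has been fetched.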

Status:

Application

Type:

Utility

Filing date:

5 Feb 2021

Issue date:

11 Aug 2022