Microsoft Corporation
INFERRING INFORMATION ABOUT A WEBPAGE BASED UPON A UNIFORM RESOURCE LOCATOR OF THE WEBPAGE
Abstract:
Described herein are technologies related to inferring information about a webpage based upon semantics of a uniform resource locator (URL) of the webpage. The URL is tokenized to create a sequence of tokens. An embedding for the URL is generated based upon the sequence of tokens, wherein the embedding is representative of semantics of the URL. Based upon the embedding for the URL, information about the webpage pointed to by the URL is inferred, the webpage is retrieved, and information is extracted from the webpage based upon the information inferred about the webpage.
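The abstract does not specify the tokenizer, the embedding model, or the inference step, so the following is only a minimal sketch under stated assumptions. It assumes tokens are produced by splitting the host, path, and query on non-alphanumeric characters. It replaces the learned, semantics-bearing embedding with a feature-hashing stand-in (which, unlike a sequence model, ignores token order). It treats inference as nearest-prototype cosine similarity against a few labeled example URLs. The names `PROTOTYPES`, `EXTRACTION_PLAN`, `tokenize_url`, `embed_url`, and `infer_label`, the labels, and the example URLs are all hypothetical. Retrieval and extraction are represented only by choosing which fields an extractor would target.

```python
import hashlib
import math
import re
from urllib.parse import urlsplit

DIM = 1024  # hypothetical embedding size for the hashing stand-in


def tokenize_url(url: str) -> list[str]:
    """Split a URL's host, path, and query into lowercase alphanumeric tokens."""
    parts = urlsplit(url.lower())
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [t for t in re.split(r"[^a-z0-9]+", text) if t]


def _hash_token(token: str) -> tuple[int, float]:
    """Map a token to a (bucket, sign) pair; hashlib is stable across runs, unlike hash()."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % DIM
    sign = 1.0 if digest[4] & 1 else -1.0
    return index, sign


def embed_url(url: str) -> list[float]:
    """Feature-hashing stand-in for a learned URL embedding, L2-normalized."""
    vec = [0.0] * DIM
    for token in tokenize_url(url):
        index, sign = _hash_token(token)
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Hypothetical labeled example URLs used as prototypes for inference.
PROTOTYPES = {
    "https://shop.example.com/products/blue-running-shoes": "product",
    "https://news.example.com/2021/02/05/markets-rally-article": "news_article",
}

# Hypothetical fields an extractor would target once the page type is inferred.
EXTRACTION_PLAN = {
    "product": ["title", "price"],
    "news_article": ["headline", "publish_date"],
}


def infer_label(url: str, prototypes: dict[str, str]) -> str:
    """Return the label of the prototype URL whose embedding is most cosine-similar."""
    query = embed_url(url)

    def cosine(other_url: str) -> float:
        # Both vectors are unit-length (or zero), so the dot product is the cosine.
        return sum(a * b for a, b in zip(query, embed_url(other_url)))

    return prototypes[max(prototypes, key=cosine)]


if __name__ == "__main__":
    label = infer_label("https://shop.example.com/products/red-running-shoes", PROTOTYPES)
    print(label, EXTRACTION_PLAN[label])
```

Because the page type is inferred from the URL alone, an extractor could in principle be selected before the webpage is fetched; in a real system the hashing step would presumably be replaced by a trained encoder over the token sequence.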
Status:
Application
Type:
Utility
Filing date:
5 Feb 2021
Issue date:
11 Aug 2022