Microsoft Corporation
TOKENIZING ALPHANUMERIC TEXT THROUGH USE OF FINITE STATE MACHINES

Last updated:

Abstract:

Described herein are technologies related to tokenizing alphanumeric text through use of a tokenization algorithm that is at least partially implemented as a finite state machine. The tokenization algorithm is configured to output numeric identifiers that represent tokens or sub-tokens in the alphanumeric text.
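
For illustration only, the following minimal Python sketch shows one way a finite state machine could map alphanumeric text to numeric identifiers for tokens or sub-tokens. It is not the method claimed in the application. The vocabulary, the identifier values, the greedy longest-match strategy, and the names `VOCAB`, `UNKNOWN_ID`, `build_fsm`, and `tokenize` are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical vocabulary mapping tokens or sub-tokens to numeric identifiers.
VOCAB: Dict[str, int] = {"token": 0, "ize": 1, "r": 2, "s": 3, "42": 4}
UNKNOWN_ID = -1  # hypothetical identifier for characters the vocabulary does not cover


def build_fsm(vocab: Dict[str, int]) -> Tuple[List[Dict[str, int]], Dict[int, int]]:
    """Build a trie-shaped finite state machine from the vocabulary.

    State 0 is the start state. transitions[state][char] is the next state,
    and accepting[state] is the identifier emitted if matching stops there.
    """
    transitions: List[Dict[str, int]] = [{}]
    accepting: Dict[int, int] = {}
    for piece, piece_id in vocab.items():
        state = 0
        for ch in piece:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting[state] = piece_id
    return transitions, accepting


def tokenize(text: str,
             transitions: List[Dict[str, int]],
             accepting: Dict[int, int]) -> List[int]:
    """Run the machine repeatedly, emitting the longest match at each position."""
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        state = 0
        best: Optional[Tuple[int, int]] = None  # (end position, identifier)
        i = pos
        # Follow transitions while the next character is accepted.
        while i < len(text) and text[i] in transitions[state]:
            state = transitions[state][text[i]]
            i += 1
            if state in accepting:
                best = (i, accepting[state])
        if best is None:
            # No vocabulary entry starts here: emit the unknown identifier
            # and skip one character.
            ids.append(UNKNOWN_ID)
            pos += 1
        else:
            ids.append(best[1])
            pos = best[0]
    return ids


transitions, accepting = build_fsm(VOCAB)
print(tokenize("tokenizers42", transitions, accepting))
```

With this hypothetical vocabulary, the input "tokenizers42" is split into the sub-tokens "token", "ize", "r", "s", and "42", each reported by its numeric identifier.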

Status:
Application
Type:

Utility

Filing date:

2 Mar 2021

Issue date:

8 Sep 2022