International Business Machines Corporation
Searching in multilevel clustered vector-based data

Last updated: 21 Sep 2022

Abstract:

A multilevel clustered data set for multidimensional vectors is created by defining a plurality of clusters based on each of the signed dimensions of the vectors, each dimension functioning as an axis. Each vector is assigned to a cluster by measuring cosine similarity between the vector and each axis. Sub-clusters are defined as ranges of cosine similarity values within a cluster, and each vector is assigned to the appropriate range based on its cosine similarity value with the axis of the cluster. Searching for a vector that matches a new vector is efficiently achieved in near-constant time by measuring cosine similarity for the new vector with each axis to identify the closest cluster, reusing the cosine similarity of the new vector and axis to determine which sub-cluster corresponds to the appropriate range of values, and then comparing each vector within the sub-cluster until a match is found or ruled out.
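
The two-level lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the fixed number of equal-width ranges (N_BINS), and the use of a cosine-similarity threshold to decide what counts as a "match" are all assumptions made for illustration.

```python
import math
from collections import defaultdict

N_BINS = 10  # assumed number of cosine-similarity ranges per cluster


def cluster_key(vec, n_bins=N_BINS):
    """Return (dimension, sign, range index) for a nonzero vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("zero vector has no cosine similarity")
    # Cosine similarity with the signed axis +/-e_i is +/-vec[i] / norm,
    # so the closest axis is the component with the largest magnitude.
    dim = max(range(len(vec)), key=lambda i: abs(vec[i]))
    sign = 1 if vec[dim] >= 0 else -1
    cos = abs(vec[dim]) / norm  # lies in (0, 1]
    # Reuse the same cosine value to select the sub-cluster (range).
    rng = min(int(cos * n_bins), n_bins - 1)
    return dim, sign, rng


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_index(vectors, n_bins=N_BINS):
    """Group vectors by (axis, sign, cosine range)."""
    index = defaultdict(list)
    for v in vectors:
        index[cluster_key(v, n_bins)].append(v)
    return index


def search(index, query, threshold=0.99, n_bins=N_BINS):
    """Scan only the query's sub-cluster; threshold is an assumed match rule."""
    for cand in index.get(cluster_key(query, n_bins), []):
        if cosine(query, cand) >= threshold:
            return cand
    return None


index = build_index([(1.0, 0.1, 0.0), (0.0, -1.0, 0.2), (0.5, 0.5, 0.7)])
match = search(index, (2.0, 0.2, 0.0))     # same direction as the first vector
no_match = search(index, (0.0, 1.0, 0.0))  # positive axis; stored vector is on the negative axis
```

Because the sketch inspects only one sub-cluster, a stored vector whose cosine value falls just across a range boundary from the query would be missed; handling that case is outside what this illustration attempts.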

Status:

Grant

Type:

Utility

Filing date:

16 Jan 2020

Issue date:

20 Sep 2022
