SAP SE
Distributed vectorized representations of source code commits

Last updated:

Abstract:

Distributed vector representations of source code commits are generated to become part of a data corpus for machine learning (ML) for analyzing source code. The code commit is received, and time information is referenced to split the source code into pre-change source code and post-change source code. The pre-change source code is converted into a first code representation (e.g., based on a graph model), and the post-change source code into a second code representation. A first particle is generated from the first code representation, and a second particle is generated from the second code representation. The first particle and the second particle are compared to create a delta. The delta is transformed into a first commit vector by referencing an embedding matrix to numerically encode the first particle and the second particle. Following classification, the commit vector is stored in a data corpus for performing ML analysis upon source code.
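
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions made only for illustration: Python's standard `ast` module stands in for the graph-based code representation, a "particle" is taken to be a count of AST node types, the embedding matrix holds random rather than learned rows, and the pre-change and post-change sources are passed in directly instead of being split using time information. All function and class names (`code_representation`, `particles`, `delta`, `EmbeddingMatrix`, `commit_vector`) are hypothetical.

```python
import ast
import random
from collections import Counter

EMBED_DIM = 8  # assumed embedding width, chosen arbitrarily


def code_representation(source):
    """Stand-in for the code representation: a Python AST (assumption)."""
    return ast.parse(source)


def particles(tree):
    """Hypothetical particle extraction: count the AST node types."""
    return Counter(type(node).__name__ for node in ast.walk(tree))


def delta(pre, post):
    """Signed difference: positive means added by the commit, negative removed."""
    keys = set(pre) | set(post)
    return {k: post[k] - pre[k] for k in keys if post[k] != pre[k]}


class EmbeddingMatrix:
    """Toy embedding lookup; rows are random here, not learned."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self._rng = random.Random(seed)
        self._rows = {}

    def row(self, token):
        # Create a row lazily the first time a token is seen.
        if token not in self._rows:
            self._rows[token] = [self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self._rows[token]


def commit_vector(d, emb):
    """Encode a delta as the count-weighted sum of embedding rows."""
    vec = [0.0] * emb.dim
    for token in sorted(d):  # sorted so row creation order is deterministic
        row = emb.row(token)
        for i in range(emb.dim):
            vec[i] += d[token] * row[i]
    return vec


# Example commit: a guard clause is added to a function.
pre_src = "def f(x):\n    return x\n"
post_src = "def f(x):\n    if x is None:\n        return 0\n    return x\n"

emb = EmbeddingMatrix(EMBED_DIM)
d = delta(particles(code_representation(pre_src)),
          particles(code_representation(post_src)))
vec = commit_vector(d, emb)

# The corpus is a plain list here; the label is a placeholder for the
# classification step mentioned in the abstract.
corpus = [{"vector": vec, "label": "unclassified"}]
```

In this sketch a commit that changes nothing yields an empty delta and therefore an all-zero vector, while a commit that adds an `if` statement contributes the embedding row for `If` with a positive weight. A real system would presumably use a richer representation and a trained embedding.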

Status:
Grant
Type:

Utility

Filing date:

26 Oct 2020

Issue date:

19 Jul 2022