Meta Platforms, Inc.
Processing Multimodal User Input for Assistant Systems

Abstract:

In one embodiment, a method includes: receiving, from a client system associated with a first user, a user input based on one or more modalities, at least one of which is a visual modality; identifying, based on one or more machine-learning models, one or more subjects associated with the user input based on the visual modality; determining, based on the one or more machine-learning models, one or more attributes associated with the one or more subjects, respectively; resolving one or more entities corresponding to the one or more subjects based on the determined one or more attributes; executing one or more tasks associated with the one or more resolved entities; and sending, to the client system associated with the first user, instructions for presenting a communication content comprising information associated with the executed one or more tasks responsive to the user input.
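The claimed method is a pipeline: subjects are identified from the visual modality, attributes are determined for each subject, entities are resolved from those attributes, tasks are executed against the resolved entities, and the results are returned to the client. The sketch below illustrates that control flow only; all function names and the toy "models" are hypothetical stand-ins, not the patent's actual machine-learning components.

```python
def identify_subjects(visual_input: str) -> list[str]:
    # Hypothetical stand-in for an ML detector: treat a comma-separated
    # scene description as the list of detected subjects.
    return visual_input.split(",")

def determine_attributes(subject: str) -> dict:
    # Hypothetical attribute model: attach a label and a dummy confidence.
    return {"label": subject.strip(), "confidence": 0.9}

def resolve_entity(attributes: dict) -> str:
    # Map determined attributes to a canonical entity id
    # (here, simply the normalized label).
    return attributes["label"].lower()

def execute_task(entity: str) -> str:
    # Execute a task associated with the resolved entity,
    # e.g. an information lookup.
    return f"looked up info for {entity}"

def handle_user_input(visual_input: str) -> list[str]:
    # End-to-end pipeline mirroring the claim's ordering of steps.
    subjects = identify_subjects(visual_input)
    attrs = [determine_attributes(s) for s in subjects]
    entities = [resolve_entity(a) for a in attrs]
    results = [execute_task(e) for e in entities]
    return results  # communication content sent back to the client system
```

For example, `handle_user_input("Eiffel Tower, Seine")` returns one task result per detected subject. The point of the sketch is the staged data flow (subjects → attributes → entities → tasks), not any particular model.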

Status: Application
Type: Utility
Filing date: 2 Aug 2018
Issue date: 24 Oct 2019