Alibaba Group Holding Limited
Word segmentation system, method and device

Abstract:

In an optical character recognition system, a word segmentation method comprising: acquiring a sample image that comprises a word spacing marker or a non-word spacing marker; processing the sample image with a convolutional neural network to obtain a first eigenvector corresponding to the sample image, and a word spacing probability value and/or a non-word spacing probability value corresponding to the first eigenvector; acquiring a to-be-tested image, and processing the to-be-tested image with the convolutional neural network to obtain a second eigenvector corresponding to the to-be-tested image, and a word spacing probability value or a non-word spacing probability value corresponding to the second eigenvector; and performing word segmentation on the to-be-tested image by using the word spacing probability value or the non-word spacing probability value obtained for the second eigenvector. In embodiments, word segmentation can be performed accurately, improving the accuracy and speed of word segmentation and enhancing the user experience.
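
The final step of the abstract, segmenting by probability value, can be illustrated with a minimal sketch. The abstract does not say how the probability values are turned into word boundaries, so everything here is an assumption made for illustration: the to-be-tested image is taken to be scanned left to right in fixed positions, each position already has a word spacing probability from the network, and a position counts as a gap when that probability reaches a hypothetical `threshold`. The function name `segment_by_spacing` is likewise hypothetical.

```python
from typing import List, Tuple


def segment_by_spacing(
    spacing_probs: List[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Split a left-to-right sequence of positions into word ranges.

    spacing_probs[i] is the (assumed) word spacing probability at position i.
    Returns (start, end) index pairs, end exclusive, one per word.
    """
    words: List[Tuple[int, int]] = []
    start = None  # index where the current word began, or None if in a gap
    for i, p in enumerate(spacing_probs):
        is_gap = p >= threshold  # assumption: simple thresholding decides gaps
        if not is_gap and start is None:
            start = i  # a word begins at the first non-gap position
        elif is_gap and start is not None:
            words.append((start, i))  # a gap closes the current word
            start = None
    if start is not None:
        words.append((start, len(spacing_probs)))  # word runs to the end
    return words


if __name__ == "__main__":
    probs = [0.9, 0.1, 0.2, 0.8, 0.95, 0.1, 0.05, 0.3, 0.7]
    print(segment_by_spacing(probs))
```

With a non-word spacing probability instead, the same loop would apply with the comparison inverted. The threshold value is a tuning choice and is not taken from the patent.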

Status:
Grant
Type:

Utility

Filing date:

16 Feb 2017

Issue date:

27 Oct 2020