Alibaba Group Holding Limited
Word segmentation system, method and device
Abstract:
A word segmentation method for an optical character recognition system, comprising: acquiring a sample image comprising a word spacing marker or a non-word spacing marker; processing the sample image with a convolutional neural network to obtain a first eigenvector corresponding to the sample image, and a word spacing probability value and/or a non-word spacing probability value corresponding to the first eigenvector; acquiring a to-be-tested image, and processing the to-be-tested image with the convolutional neural network to obtain a second eigenvector corresponding to the to-be-tested image, and a word spacing probability value or a non-word spacing probability value corresponding to the second eigenvector; and performing word segmentation on the to-be-tested image by using the word spacing probability value or the non-word spacing probability value obtained for the second eigenvector. In embodiments, word segmentation can be performed accurately, which improves the accuracy and speed of the word segmentation and enhances the user experience.
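The final step of the abstract, segmenting by word spacing probability values, can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the convolutional neural network has already produced one word spacing probability per horizontal window position of the to-be-tested image, and the function name `segment_by_spacing`, the 0.5 threshold, and the sample values are all hypothetical.

```python
from typing import List, Tuple


def segment_by_spacing(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of words.

    probs[i] is the (assumed) word spacing probability at window position i.
    Positions with probability >= threshold are treated as gaps between words.
    The threshold value is an illustrative assumption, not taken from the source.
    """
    words: List[Tuple[int, int]] = []
    start = None  # start index of the word currently being scanned, if any
    for i, p in enumerate(probs):
        is_gap = p >= threshold
        if not is_gap and start is None:
            start = i                   # a new word begins here
        elif is_gap and start is not None:
            words.append((start, i))    # the current word ends before the gap
            start = None
    if start is not None:
        words.append((start, len(probs)))  # a word runs to the end of the image
    return words


# Hypothetical per-position probabilities for a line containing three words.
probs = [0.1, 0.2, 0.9, 0.95, 0.1, 0.05, 0.3, 0.8, 0.2]
print(segment_by_spacing(probs))
```

Each returned range could then be mapped back to pixel columns to crop individual words; how window positions correspond to pixels depends on details the abstract does not give.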
Utility
16 Feb 2017
27 Oct 2020