Title : Line segmentation of Tibetan ancient documents based on seam carving

Abstract :

Line segmentation of Tibetan ancient documents is one of the key steps in character recognition. Adhesion between adjacent lines and broken characters often degrade the quality of line segmentation. In this paper, we perform line segmentation on binary images of Tibetan ancient documents and propose an energy-minimizing seam carving method. The steps are as follows: (1) the Radon transform is used to correct skew; (2) the document image is projected horizontally, and the projection profile is smoothed to accurately detect the positions of the core areas and the number of text lines; (3) isolated upper vowels and broken strokes are classified to reduce their interference with the seam carving path; (4) the gradient energy, the distance-from-core-area energy, the distance-from-text-area energy, and the passing-through-text-area energy are weighted to obtain the energy map; (5) seam carving is applied within the line segmentation region of the energy map to separate the text lines, and the result is combined with the classification from step (3) to obtain the final text line segmentation. The experimental results show that the proposed method can solve line segmentation problems in Tibetan ancient documents, such as partial adhesion and broken strokes, and further improves the accuracy of line segmentation.
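Steps (2) and (5) can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function names, the moving-average window, and the toy energy map are assumptions made for illustration, and the weighted four-term energy map of step (4) is replaced by a precomputed array. The first function smooths a horizontal projection profile; the second finds a minimum-energy left-to-right seam by dynamic programming, which is the standard seam carving recurrence.

```python
import numpy as np

def smoothed_row_projection(binary, window=5):
    """Row-wise ink count, smoothed with a moving average.

    `window` is an assumed parameter, not a value from the paper.
    """
    profile = binary.sum(axis=1).astype(float)
    kernel = np.ones(window) / window
    return np.convolve(profile, kernel, mode="same")

def min_energy_horizontal_seam(energy):
    """Return, for each column, the row of a minimum-energy left-to-right seam.

    The seam may move by at most one row between neighbouring columns.
    """
    rows, cols = energy.shape
    cost = energy.astype(float).copy()
    back = np.zeros((rows, cols), dtype=int)
    idx = np.arange(rows)
    for x in range(1, cols):
        prev = cost[:, x - 1]
        # Cumulative cost of the predecessor at row y-1, y, y+1
        # (infinite where the predecessor would fall outside the image).
        up = np.concatenate(([np.inf], prev[:-1]))
        down = np.concatenate((prev[1:], [np.inf]))
        stacked = np.stack((up, prev, down))
        choice = stacked.argmin(axis=0)  # 0 -> y-1, 1 -> y, 2 -> y+1
        cost[:, x] += stacked[choice, idx]
        back[:, x] = idx + choice - 1
    # Backtrack from the cheapest end point in the last column.
    seam = np.empty(cols, dtype=int)
    seam[-1] = int(cost[:, -1].argmin())
    for x in range(cols - 1, 0, -1):
        seam[x - 1] = back[seam[x], x]
    return seam

# Toy energy map (assumed): high energy everywhere except a gap between
# two text lines that shifts down by one row halfway across the image.
toy_energy = np.ones((6, 6))
toy_energy[2, :3] = 0.0
toy_energy[3, 3:] = 0.0
toy_seam = min_energy_horizontal_seam(toy_energy)
print(toy_seam.tolist())
```

In this toy case the seam follows the zero-energy gap as it shifts from one row to the next. On a real page the search would be restricted to the region between two detected core areas, so that the seam cannot escape to an unrelated blank margin.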

