patents.google.com

JPH0475187A - In-list character recognizing device - Google Patents

️Tue Mar 10 1992

In-list character recognizing device

Info

Publication number

JPH0475187A

Authority

Japan

Prior art keywords

character

characters

recognition

cell

unit

Prior art date

1990-07-17

Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)

Granted

Application number

JP2188922A

Other languages

Japanese (ja)

Other versions

JP2995809B2 (en)

Inventor

Noboru Nakamura

昇中村

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Panasonic Holdings Corp

Original Assignee

Matsushita Electric Industrial Co Ltd

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

1990-07-17

Filing date

1990-07-17

Publication date

1992-03-10

1990-07-17 Application filed by Matsushita Electric Industrial Co Ltd

1990-07-17 Priority to JP2188922A

1992-03-10 Publication of JPH0475187A

1999-12-27 Application granted

1999-12-27 Publication of JP2995809B2

2014-12-27 Anticipated expiration

Status: Expired - Lifetime

Landscapes

Character Discrimination (AREA)
Character Input (AREA)

Abstract

PURPOSE: To increase the recognition rate and the recognition speed by determining the character type from the number of character strings in the cells of a table and the features of their right-end and left-end positions, and using that character type preferentially. CONSTITUTION: This device is provided with a recognition command part 1, an image input part 14, an image memory 2, a table structure extraction part 3, a character string extraction part 4, a character type prediction part 5, a character recognition part 6, a character pattern dictionary 7, an external output part 8, and a CPU. The character type prediction part 5 estimates English characters when a cell contains plural character strings. Otherwise, it estimates Japanese characters when the character strings in cells other than the head cell are not aligned at their right ends, or are aligned at both their right and left ends, and estimates numerals when the right ends are aligned but the left ends are not. The character recognition part 6 then compares and determines characters while giving priority to the estimated character type. Consequently, the order of matching with the character pattern dictionary 7 is changed to prioritize the estimated character type, which decreases the number of matching operations against the dictionary and increases both the recognition rate and the recognition speed for characters in the table.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a character recognition device that recognizes characters in a table quickly and accurately.

(Prior Art) Conventionally, characters in a table cell were treated as ordinary characters, and the character type was determined without using the vertical and horizontal relationships among the characters in the table. As a result, the character type determination was inaccurate and slow.

(Problems to be Solved by the Invention) An object of the present invention is to provide an in-table character recognition device that can determine the character type of characters in a table more accurately and at higher speed.

(Means for Solving the Problems) Looking at the cells stacked vertically in a table column, English text tends to comprise multiple character strings within a cell, Japanese text tends to form a single character string, and numerals tend to have aligned right ends but unaligned left ends. The character type can therefore be estimated from the positions of the characters within the cells. Based on this finding, the present invention uses these characteristics of in-table characters to achieve the object. In summary, the invention is an in-table character recognition device comprising: a recognition command unit that issues a character recognition command in response to instructions from a keyboard, mouse, or the like; an image input unit that inputs the characters in a table, together with the table, as an image; an image memory that stores the image data from the image input unit; a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; a character string extraction unit that extracts the character strings from each cell and extracts their left end and right end positions; a character type prediction unit that estimates the character type from the number of character strings in the cell and the character positions extracted by the character string extraction unit; a character recognition unit that determines each character by comparing sample characters with the characters in the image data, giving priority to the character type estimated by the character type prediction unit; a character pattern dictionary that stores the sample characters; an external output unit that outputs the characters determined by the character recognition unit; and a CPU that controls each of these units.
In the character type prediction unit, if a cell contains multiple character strings, the character type is estimated to be English characters. Otherwise, if the right end positions of the character strings in the cells other than the head cell are not aligned, or if both the right ends and the left ends are aligned, the character type is estimated to be Japanese characters; and if the right ends are aligned but the left ends are not, the character type is estimated to be numerals. The character recognition unit then compares and determines characters with priority given to the estimated character type.
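The estimation rule described above can be illustrated with a minimal Python sketch. This is not code from the patent. The function names `is_aligned` and `predict_column_type`, the `(left, right)` pixel-pair representation of a character string, and the alignment tolerance `tol` are all assumptions made for illustration; the patent does not say how "aligned" is measured.

```python
from typing import List, Tuple

# Hypothetical representation: a cell is a list of character strings,
# each string given by its (left_x, right_x) end positions in pixels.
Cell = List[Tuple[int, int]]


def is_aligned(positions: List[int], tol: int) -> bool:
    """Treat positions as aligned when they differ by at most tol pixels (assumed criterion)."""
    return max(positions) - min(positions) <= tol


def predict_column_type(column: List[Cell], tol: int = 2) -> str:
    """Estimate the character type of one table column.

    The head cell (column[0]) is skipped, as in the patent text.
    """
    body = [cell for cell in column[1:] if cell]  # ignore empty cells
    if not body:
        return "unknown"
    # Any cell with more than one character string -> English.
    if any(len(cell) > 1 for cell in body):
        return "english"
    lefts = [cell[0][0] for cell in body]
    rights = [cell[0][1] for cell in body]
    # Right ends aligned but left ends not aligned -> numerals.
    if is_aligned(rights, tol) and not is_aligned(lefts, tol):
        return "number"
    # Right ends not aligned, or both ends aligned -> Japanese.
    return "japanese"


prices = [[(12, 48)], [(30, 60)], [(20, 60)], [(41, 61)]]
names = [[(10, 40)], [(10, 40)], [(10, 70)], [(10, 55)]]
notes = [[(10, 40)], [(10, 50), (10, 35)], [(10, 44)]]
print(predict_column_type(prices))  # number
print(predict_column_type(names))   # japanese
print(predict_column_type(notes))   # english
```

The tolerance is a design choice of this sketch: scanned text rarely lines up to the exact pixel, so some slack is needed before two edges can be called aligned.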

(Function) In recognizing characters in a table, the present invention focuses on the fact that items arranged vertically are likely to use the same character type. The table structure extraction unit extracts cells from the image data, the character string extraction unit obtains the number of character strings in each cell and their right end and left end positions, and the character type prediction unit estimates the character type using the in-table character characteristics described above. The order of matching against the recognition dictionary is thereby changed to give priority to the estimated character type, which reduces the number of matching operations against the recognition dictionary and raises both the recognition rate and the recognition speed for characters in the table.
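A small Python sketch may help show why reordering the dictionary search reduces the number of matching operations. Everything here is a toy assumption rather than the patent's implementation: the 3x3 bit-tuple glyphs, the pixel-agreement `similarity` measure, the dictionary layout keyed by character type, and the threshold value are hypothetical. Only the idea of searching the estimated type first and accepting a sample once its similarity passes a predetermined value comes from the text.

```python
from typing import Dict, List, Optional, Tuple

Glyph = Tuple[int, ...]  # toy 3x3 bitmap flattened to 9 bits (assumption)

# Hypothetical character pattern dictionary, grouped by character type.
DICTIONARY: Dict[str, List[Tuple[str, Glyph]]] = {
    "english": [("I", (0, 1, 0, 0, 1, 0, 0, 1, 0)),
                ("O", (1, 1, 1, 1, 0, 1, 1, 1, 0))],
    "number": [("0", (1, 1, 1, 1, 0, 1, 1, 1, 1)),
               ("1", (0, 1, 0, 1, 1, 0, 0, 1, 0))],
}


def similarity(a: Glyph, b: Glyph) -> float:
    """Fraction of pixels on which the two glyphs agree (toy measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def recognize(glyph: Glyph, predicted: str,
              threshold: float = 0.85) -> Tuple[Optional[str], int]:
    """Search the predicted character type first, then the remaining types.

    Returns the first label whose similarity reaches the threshold,
    together with the number of comparisons performed.
    """
    order = [predicted] + [kind for kind in DICTIONARY if kind != predicted]
    comparisons = 0
    for kind in order:
        for label, sample in DICTIONARY.get(kind, []):
            comparisons += 1
            if similarity(glyph, sample) >= threshold:
                return label, comparisons
    return None, comparisons


zero = (1, 1, 1, 1, 0, 1, 1, 1, 1)
print(recognize(zero, "number"))   # ('0', 1)
print(recognize(zero, "english"))  # ('O', 2)
```

In this toy data the zero glyph is also close enough to the letter "O" to pass the threshold, so searching the English samples first returns the wrong character after two comparisons, while searching the numerals first returns "0" after one.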

(Example) An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram of the in-table character recognition device. Reference numeral 1 denotes a recognition command unit that issues a command to perform character recognition in response to instructions from a keyboard, mouse, or the like; 2, an image memory that stores the image data to be recognized; 3, a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; 4, a character string extraction unit that extracts character strings from each cell and extracts their left end and right end positions; 5, a character type prediction unit that estimates the character type of each column of the table; 6, a character recognition unit that compares the input with sample characters in order, starting from the estimated character type, and outputs a character when its similarity is closer than a predetermined value; 7, a character pattern dictionary that stores the sample characters; 8, an external output unit that outputs the characters recognized from the table; and 14, an image input unit that inputs the characters in the table, together with the table, as an image.

FIG. 2 is a hardware configuration diagram of the in-table character recognition device. Reference numeral 9 denotes a RAM that stores the image memory, the table structure, the predicted character types, and so on. 10 is a ROM that stores the character pattern dictionary and the program. 11 is an RS-232C interface that handles the recognition command and character output. 12 is a CPU that controls the operation of the program. 13 is a scanner that captures an image into the image memory and constitutes the image input unit 14.
The overall processing flow will be explained using the flowchart of FIG. 3. The table structure extraction unit 3 obtains the table format, represented by horizontal and vertical line segments, from the image data captured by the scanner 13 (step 1). The rectangular areas separated vertically and horizontally by this table format are called cells. If the table has no column to process, the process ends (step 2); otherwise, the character string extraction unit 4 obtains the character strings for all cells of the column except its head cell (step 3). Next, in the character type prediction unit 5, if even one of those cells contains multiple character strings, the column is predicted to be occupied by English characters (steps 4, 5, and 6). When each cell contains a single character string, the left end and right end positions of the character string are obtained (step 7). If the right end positions of the cells other than the head cell of the column are not aligned, the character strings in the cells of this column are estimated to be Japanese characters (steps 8 and 9). If the right end positions are aligned and the left end positions are not, they are estimated to be numerals (step 11); if both the right ends and the left ends are aligned, they are estimated to be Japanese characters (step 9). The character recognition unit 6 recognizes the characters in the cells of the column according to the estimated character type (step 12).
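The text does not say how the character string extraction unit finds the strings and their end positions inside a cell. One common approach, shown here purely as an assumption, is a projection profile: consecutive rows that contain ink are grouped into one character string, and the leftmost and rightmost inked columns of that group give its end positions. The function name `extract_strings` and the 0/1 row-list bitmap format are hypothetical, and the sketch assumes horizontal text, no skew, and a cell image from which the ruled lines have already been removed.

```python
from typing import List, Tuple


def extract_strings(cell: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (left_x, right_x) for each character string in a binary cell image.

    cell is a list of pixel rows, where 1 marks ink and 0 marks background.
    A character string is a maximal run of consecutive rows containing ink.
    """
    strings: List[Tuple[int, int]] = []
    band: List[List[int]] = []  # rows belonging to the current text line
    for row in cell + [[]]:     # empty sentinel row flushes the last band
        if any(row):
            band.append(row)
        elif band:
            inked = [x for r in band for x, v in enumerate(r) if v]
            strings.append((min(inked), max(inked)))
            band = []
    return strings


cell = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
print(extract_strings(cell))  # [(1, 4), (2, 5)]
```

The length of the returned list corresponds to the number of character strings used in steps 4 to 6, and the pairs correspond to the left end and right end positions obtained in step 7.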

The explanation is supplemented below with a specific example of the overall process.

First, the table structure extraction unit 3 obtains horizontal and vertical line segments from the normalized image data (see FIG. 4) and extracts the cells formed by them (FIG. 5).

Next, since each of these cells contains exactly one character string, the left end and right end positions of the character strings are obtained (FIG. 6 shows the memory state of the coordinate values of the left end and right end positions). Then, in the character type prediction unit 5, the second and third columns, excluding their first cells, have aligned right end positions and unaligned left end positions, so they are estimated to be numerals. In the character recognition unit 6, the dictionary search order is changed according to these estimates and the discrimination level at recognition time is lowered, so that characters such as "0" are no longer misrecognized, and because fewer sample characters in the dictionary need to be searched, characters can be determined in a shorter processing time.

(Effects of the Invention) As described above, the present invention determines the character type from the number of character strings in table cells and the characteristics of their right end and left end positions, and uses that character type preferentially, thereby achieving both a high recognition rate and a high recognition speed.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the present invention; FIG. 2 is a hardware configuration diagram of the embodiment; FIG. 3 is a flowchart of the embodiment; FIG. 4 is an explanatory diagram showing an example of a table to be recognized; FIG. 5 is an explanatory diagram showing how each cell is named; FIG. 6 is an explanatory diagram showing the coordinates of the left and right ends of the character string in each cell; and FIG. 7 is an explanatory diagram showing the character type predicted for each column. Patent applicant: Matsushita Electric Industrial Co., Ltd.

Claims (1)

[Claims] 1) An in-table character recognition device comprising: a recognition command unit that issues a character recognition command in response to instructions from a keyboard, mouse, or the like; an image input unit that inputs the characters in a table, together with the table, as an image; an image memory that stores the image data from the image input unit; a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; a character string extraction unit that extracts the character strings from each cell and extracts their left end and right end positions; a character type prediction unit that estimates the character type from the number of character strings in the cell and the character positions extracted by the character string extraction unit; a character recognition unit that determines each character by comparing sample characters with the characters in the image data, giving priority to the character type estimated by the character type prediction unit; a character pattern dictionary that stores the sample characters; an external output unit that outputs the characters determined by the character recognition unit; and a CPU that controls each of these units, wherein the character type prediction unit estimates English characters if a cell contains multiple character strings, otherwise estimates Japanese characters if the right end positions of the character strings in cells other than the head cell are not aligned or if both the right ends and the left ends are aligned, and estimates numerals if the right ends are aligned but the left ends are not, and wherein the character recognition unit compares and determines characters with priority given to the estimated character type.

JP2188922A 1990-07-17 1990-07-17 In-table character recognition device Expired - Lifetime JP2995809B2 (en)

Priority Applications (1)

Application Number	Priority Date	Filing Date	Title
JP2188922A JP2995809B2 (en)	1990-07-17	1990-07-17	In-table character recognition device

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
JP2188922A JP2995809B2 (en)	1990-07-17	1990-07-17	In-table character recognition device

Publications (2)

Publication Number	Publication Date
JPH0475187A (en)	1992-03-10
JP2995809B2 (en)	1999-12-27

Family

ID=16232234

Family Applications (1)

Application Number	Priority Date	Filing Date	Title
JP2188922A Expired - Lifetime JP2995809B2 (en)	1990-07-17	1990-07-17	In-table character recognition device

Country Status (1)

Country	Link
JP (1)	JP2995809B2 (en)

1990
- 1990-07-17 JP JP2188922A patent/JP2995809B2/en not_active Expired - Lifetime

Cited By (4)

* Cited by examiner, † Cited by third party

Publication number	Priority date	Publication date	Assignee	Title
CN111680688A (en) *	2020-06-10	2020-09-18	创新奇智(成都)科技有限公司	Character recognition method and device, electronic equipment and storage medium
CN111680688B (en) *	2020-06-10	2023-08-08	创新奇智(成都)科技有限公司	Character recognition method and device, electronic equipment and storage medium
CN112070087A (en) *	2020-09-14	2020-12-11	成都主导软件技术有限公司	Train number identification method and device with end position and readable storage medium
CN112070087B (en) *	2020-09-14	2023-06-02	成都主导软件技术有限公司	Train number identification method and device with end bit and readable storage medium

Also Published As

Publication number	Publication date
JP2995809B2 (en)	1999-12-27

Publication	Publication Date	Title
US5579408A (en)	1996-11-26	Character recognition method and apparatus
US11475688B2 (en)	2022-10-18	Information processing apparatus and information processing method for extracting information from document image
CN105260428A (en)	2016-01-20	Picture processing method and apparatus
US5621818A (en)	1997-04-15	Document recognition apparatus
US20110222789A1 (en)	2011-09-15	Character string sensing device, character evaluating device, image processing device, character string sensing method, character evaluation method, control program, and recording medium
JPH0475187A (en)	1992-03-10	In-list character recognizing device
JPH07160822A (en)	1995-06-23	Pattern recognizing method
US20220392107A1 (en)	2022-12-08	Image processing apparatus, image processing method, image capturing apparatus, and non-transitory computer-readable storage medium
KR19980058361A (en)	1998-09-25	Korean Character Recognition Method and System
KR19990049667A (en)	1999-07-05	Korean Character Recognition Method
KR940007345B1 (en)	1994-08-13	On-line recognitin method of hand-written korean character
JP3526821B2 (en)	2004-05-17	Document search device
JPH0331981A (en)	1991-02-12	Character recognizing device
JPH0744656A (en)	1995-02-14	Handwritten character recognizing device
JP3897999B2 (en)	2007-03-28	Handwritten character recognition method
JP3763262B2 (en)	2006-04-05	Handwritten character recognition device
JP3074691B2 (en)	2000-08-07	Character recognition device
JP3157530B2 (en)	2001-04-16	Character extraction method
JPH0475552B2 (en)	1992-12-01
JPH09259218A (en)	1997-10-03	Word input device and word input method
JPS63269267A (en)	1988-11-07	Character recognizing device
Kim et al.	1996	Segmentation of touching characters in printed Korean/English document recognition
JPH05346974A (en)	1993-12-27	Character recognizing device
JP2982221B2 (en)	1999-11-22	Character reader
Chaudhury et al.	2012	Efficient segmentation of characters in printed Bengali texts

Legal Events

Date	Code	Title	Description
2007-10-02	FPAY	Renewal fee payment (event date is renewal date of database)	Free format text: PAYMENT UNTIL: 20081029 Year of fee payment: 9
2008-10-07	FPAY	Renewal fee payment (event date is renewal date of database)	Free format text: PAYMENT UNTIL: 20091029 Year of fee payment: 10
2009-10-01	FPAY	Renewal fee payment (event date is renewal date of database)	Free format text: PAYMENT UNTIL: 20091029 Year of fee payment: 10
2009-10-06	FPAY	Renewal fee payment (event date is renewal date of database)	Free format text: PAYMENT UNTIL: 20101029 Year of fee payment: 11
2010-07-17	EXPY	Cancellation because of completion of term
2010-09-29	FPAY	Renewal fee payment (event date is renewal date of database)	Free format text: PAYMENT UNTIL: 20101029 Year of fee payment: 11
