JPH0475187A - In-list character recognizing device - Google Patents
- ️Tue Mar 10 1992
Publication number
- JPH0475187A
Application number
- JP2188922A
Authority
- JP
- Japan
Prior art keywords
- character
- characters
- recognition
- cell
- unit
Prior art date
- 1990-07-17
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Character Discrimination (AREA)
- Character Input (AREA)
Abstract
PURPOSE: To increase both the recognition rate and the recognition speed by determining the character kind from the number of character strings in each cell of a list (table) and from the alignment of their right-end and left-end positions, and by giving that kind priority during recognition. CONSTITUTION: The device comprises a recognition command part 1, an image input part 14, an image memory 2, a list structure extraction part 3, a character string extraction part 4, a character kind prediction part 5, a character recognition part 6, a character pattern dictionary 7, an external output part 8, and a CPU. The character kind prediction part 5 estimates English characters when a cell contains plural character strings. Otherwise, it estimates Japanese characters when the character strings outside the head cell are either not aligned at their right ends or aligned at both their right and left ends, and it estimates numbers when the right ends are aligned but the left ends are not. The character recognition part 6 compares and determines characters while giving priority to the estimated character kind. Consequently, the order of matching against the recognition dictionary 7 is changed to put the estimated character kind first, which reduces the number of matching operations against the dictionary and increases both the recognition rate and the recognition speed for characters in the list.
Description
DETAILED DESCRIPTION OF THE INVENTION
(Field of Industrial Application)
The present invention relates to a character recognition device that recognizes characters in a table quickly and accurately.
(Prior Art)
Conventionally, characters in the cells of a table were treated as ordinary characters, and the character type was determined without using the vertical and horizontal relationships among the characters in the table. As a result, the character type determination was inaccurate and the determination processing was slow.
(Problems to be Solved by the Invention)
An object of the present invention is to provide an in-table character recognition device that can determine the character type of characters in a table more accurately and at higher speed.
(Means for Solving the Problems)
In general, looking at the cells stacked vertically in a table column, cells of English text tend to contain multiple character strings, cells of Japanese text tend to contain a single character string, and numerals tend to have aligned right-end positions but unaligned left-end positions. The character type can therefore be estimated from the positions of the characters within the cells. Based on this finding, the present invention achieves the above object by exploiting these characteristics of in-table characters.
The gist of the invention is an in-table character recognition device comprising: a recognition command unit that issues a character recognition command in response to instructions from a keyboard, mouse, or the like; an image input unit that inputs the characters in a table, together with the table, as an image; an image memory that stores the image data from the image input unit; a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; a character string extraction unit that takes the character strings out of each cell and extracts their left-end and right-end positions; a character type prediction unit that estimates the character type from the number of character strings in the cells and the character positions extracted by the character string extraction unit; a character recognition unit that determines each character by comparing sample characters with the characters in the image data, giving priority to the character type estimated by the character type prediction unit; a character pattern dictionary that stores the sample characters; an external output unit that outputs the characters determined by the character recognition unit; and a CPU that controls each of these units.
The character type prediction unit estimates English characters if a cell contains multiple character strings. Otherwise, it estimates Japanese characters if the right-end positions of the character strings in the cells other than the head cell are not aligned, or if both the right ends and the left ends are aligned, and it estimates numerals if the right ends are aligned but the left ends are not. The character recognition unit then compares and determines characters with priority given to the estimated character type.
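The three-way estimation rule described above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the `TextLine` type, the function names, the pixel tolerance `tol`, and the `"unknown"` result for a column without any character strings are all assumptions introduced here, since the text does not say how "aligned" is measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    left: int   # x-coordinate of the left end of a character string
    right: int  # x-coordinate of the right end of a character string


def aligned(values: List[int], tol: int = 2) -> bool:
    """True when all coordinates agree to within `tol` pixels (assumed tolerance)."""
    return max(values) - min(values) <= tol


def predict_column_kind(cells: List[List[TextLine]], tol: int = 2) -> str:
    """Estimate the character type of one column.

    `cells` holds the character strings of each cell in the column,
    with the head cell already excluded.
    """
    # Any cell holding more than one character string -> English.
    if any(len(lines) > 1 for lines in cells):
        return "english"
    lines = [cell[0] for cell in cells if cell]
    if not lines:
        return "unknown"  # assumption: the text does not cover empty columns
    right_ok = aligned([line.right for line in lines], tol)
    left_ok = aligned([line.left for line in lines], tol)
    # Right ends aligned, left ends ragged -> numerals.
    if right_ok and not left_ok:
        return "digits"
    # Right ends ragged, or both ends aligned -> Japanese.
    return "japanese"
```

For instance, a column whose strings end at nearly the same x-coordinate but start at different ones is classified as `"digits"`, which matches the typical right-justified layout of numeric columns.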
(Function)
The present invention focuses on the fact that, in a table, items stacked in the vertical direction are likely to use the same character type. The table structure extraction unit extracts the cells from the image data; the character string extraction unit obtains the number of character strings in each cell and their right-end and left-end positions; and the character type prediction unit estimates the character type using the characteristics of in-table characters described above. The order of matching against the recognition dictionary is thereby changed to give priority to the estimated character type, which reduces the number of matching operations against the recognition dictionary and raises both the recognition rate and the recognition speed for characters in the table.
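The effect of changing the matching order can be illustrated with a small Python sketch. Everything in it is hypothetical: the dictionary layout, the toy tuple "glyphs" standing in for character images, the `similarity` measure, and the threshold values are assumptions chosen only to show how searching the estimated character type first, and stopping at the first sufficiently similar sample, reduces the number of comparisons.

```python
from typing import Dict, List, Optional, Tuple

Glyph = Tuple[int, ...]  # toy feature vector standing in for a character image


def similarity(a: Glyph, b: Glyph) -> float:
    """Toy similarity: fraction of positions at which the two vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def recognize(
    glyph: Glyph,
    dictionary: Dict[str, List[Tuple[str, Glyph]]],
    predicted: str,
    threshold: float = 0.9,
) -> Tuple[Optional[str], int]:
    """Return (recognized character, number of dictionary comparisons)."""
    # Search the estimated character type first, then the remaining types.
    order = [predicted] + [kind for kind in dictionary if kind != predicted]
    comparisons = 0
    best_char, best_score = None, -1.0
    for kind in order:
        for char, sample in dictionary.get(kind, []):
            comparisons += 1
            score = similarity(glyph, sample)
            if score >= threshold:
                return char, comparisons  # close enough: stop searching
            if score > best_score:
                best_char, best_score = char, score
    return best_char, comparisons  # fall back to the best candidate seen


# Hypothetical two-type dictionary with look-alike samples "O" and "0".
DICTIONARY = {
    "english": [("O", (1, 1, 1, 1, 0, 1, 1, 1)), ("I", (0, 1, 0, 0, 1, 0, 0, 1))],
    "digits": [("0", (1, 1, 1, 1, 0, 1, 1, 0)), ("1", (0, 1, 0, 0, 1, 0, 1, 1))],
}
```

With this toy data, a clean "1" glyph is found after two comparisons when `"digits"` is predicted but only after four when `"english"` is predicted, and a noisy glyph that resembles both "O" and "0" resolves to "0" in a column predicted to hold numerals when a looser threshold such as 0.7 is used.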
(Example)
An embodiment of the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram of the in-table character recognition device. Reference numeral 1 denotes a recognition command unit that issues a command to perform character recognition in response to instructions from a keyboard, mouse, or the like. 2 is an image memory that stores the image data to be recognized; 3 is a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; 4 is a character string extraction unit that takes the character strings out of each cell and extracts their left-end and right-end positions; 5 is a character type prediction unit that estimates the character type of each table column; 6 is a character recognition unit that compares the input with sample characters in order, starting from the estimated character type, and outputs a sample character as the result when its similarity is closer than a predetermined value; 7 is a character pattern dictionary that stores the sample characters; 8 is an external output unit that outputs the characters recognized from the table; and 14 is an image input unit that inputs the characters in the table, together with the table, as an image.
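As a rough illustration of what the character string extraction unit 4 produces, the following Python sketch splits a binarized cell image into text lines and reports the left-end and right-end column of each. The description does not say how the strings are segmented, so the row-projection approach, the function name `extract_strings`, and the list-of-lists image format (1 = ink, 0 = background) are all assumptions made for this example.

```python
from typing import List, Tuple


def extract_strings(cell: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (left, right) column indices for each text line in a cell.

    `cell` is a non-empty binary image given as rows of 0/1 values.
    Consecutive rows containing ink are grouped into one text line.
    """
    strings: List[Tuple[int, int]] = []
    band: List[List[int]] = []  # rows of the text line currently being collected
    blank = [0] * len(cell[0])
    # A trailing blank row acts as a sentinel that closes the last text line.
    for row in cell + [blank]:
        if any(row):
            band.append(row)
        elif band:
            inked = [x for x in range(len(row)) if any(r[x] for r in band)]
            strings.append((inked[0], inked[-1]))
            band = []
    return strings
```

The length of the returned list gives the number of character strings in the cell, and the tuples give the end positions that the character type prediction unit 5 compares across the cells of a column.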
FIG. 2 is a hardware configuration diagram of the in-table character recognition device. 9 is a RAM that stores the image memory, the table structure, the predicted character types, and so on. 10 is a ROM that stores the character pattern dictionary and the program. 11 is an RS-232C interface that handles the recognition command and the character output. 12 is a CPU that controls the operation of the program. 13 is a scanner that captures an image into the image memory and constitutes the image input unit 14.
The overall flow of processing is explained using the flowchart of FIG. 3. The table structure extraction unit 3 obtains the table layout, represented by horizontal and vertical line segments, from the image data captured by the scanner 13 (step 1). The rectangular regions separated vertically and horizontally by this layout are called cells. If no column of the table remains, processing ends (step 2); otherwise, the character string extraction unit 4 obtains the character strings for every cell of the column except its head cell (step 3). Next, in the character type prediction unit 5, if even one of those cells contains multiple character strings, the column is predicted to consist of English characters (steps 4, 5, 6). If each cell contains a single character string, the left-end and right-end positions of the character strings are obtained (step 7). If the right-end positions of the cells other than the head cell of the column are not aligned, the character strings in the cells of the column are estimated to be Japanese characters (steps 8, 9). If the right-end positions are aligned and the left-end positions are not, the strings are estimated to be numerals (step 11); if both the right ends and the left ends are aligned, they are estimated to be Japanese characters (step 9). The character recognition unit 6 then recognizes the characters in the cells of the column in accordance with the estimated character type (step 12).
The explanation is supplemented below using a concrete example of the overall processing.
First, the table structure extraction unit 3 obtains the horizontal and vertical line segments from the normalized image data (see FIG. 4) and extracts the cells formed by them (FIG. 5).
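Once the ruled lines are known, decomposing the table into cells is straightforward for a regular grid. The Python sketch below assumes, purely for illustration, that line detection has already yielded the x-coordinates of the vertical rules and the y-coordinates of the horizontal rules and that the table has no merged cells; the function name and the `(left, top, right, bottom)` cell format are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def cells_from_rules(xs: List[int], ys: List[int]) -> List[List[Cell]]:
    """Build the cells of a regular grid, grouped by column.

    `xs` are x-coordinates of vertical rules and `ys` are y-coordinates
    of horizontal rules. The result is indexed as columns[col][row].
    """
    xs, ys = sorted(xs), sorted(ys)
    return [
        [(x0, y0, x1, y1) for y0, y1 in zip(ys, ys[1:])]
        for x0, x1 in zip(xs, xs[1:])
    ]
```

Grouping the cells by column rather than by row is a deliberate choice here, because the character type is estimated per column and the head cell of each column is then simply the first element of its list.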
Next, since every one of these cells contains exactly one character string, the left-end and right-end positions of the character strings are obtained (FIG. 6 shows the memory state holding the coordinate values of the left-end and right-end positions). The character type prediction unit 5 then finds that, in the second and third columns, excluding the first cell, the right-end positions are aligned and the left-end positions are not, and therefore estimates that these columns contain numerals. In accordance with these estimates, the character recognition unit 6 changes the dictionary search order and lowers the discrimination level used during recognition. As a result, look-alike characters such as "0" (zero) and "O" are no longer misrecognized, and because fewer sample characters in the dictionary need to be searched, characters can be determined in a shorter processing time.
(Effects of the Invention)
As described above, the present invention determines the character type from the number of character strings of the in-table characters and from the characteristics of their right-end and left-end positions, and uses that type preferentially, thereby achieving both a high recognition rate and a high recognition speed.
(Brief Description of the Drawings)
FIG. 1 is a block diagram of an embodiment of the present invention; FIG. 2 is a hardware configuration diagram of the embodiment; FIG. 3 is a flowchart of the embodiment; FIG. 4 is an explanatory diagram showing an example of a table to be recognized; FIG. 5 is an explanatory diagram showing how each cell is named; FIG. 6 is an explanatory diagram showing the coordinate positions of the left and right ends of the character string in each cell; and FIG. 7 is an explanatory diagram showing the character type predicted for each column.
Patent applicant: Matsushita Electric Industrial Co., Ltd.
Claims (1)
[Scope of Claims]
1) An in-table character recognition device comprising: a recognition command unit that issues a character recognition command in response to instructions from a keyboard, mouse, or the like; an image input unit that inputs the characters in a table, together with the table, as an image; an image memory that stores the image data from the image input unit; a table structure extraction unit that extracts the table structure from the image data and decomposes it into cells; a character string extraction unit that takes the character strings out of each cell and extracts their left-end and right-end positions; a character type prediction unit that estimates the character type from the number of character strings in the cells and the character positions extracted by the character string extraction unit; a character recognition unit that determines each character by comparing sample characters with the characters in the image data, giving priority to the character type estimated by the character type prediction unit; a character pattern dictionary that stores the sample characters; an external output unit that outputs the characters determined by the character recognition unit; and a CPU that controls each of these units, wherein the character type prediction unit estimates English characters if a cell contains multiple character strings, otherwise estimates Japanese characters if the right-end positions of the character strings in the cells other than the head cell are not aligned or if both the right ends and the left ends are aligned, and estimates numerals if the right ends are aligned but the left ends are not, and wherein the character recognition unit compares and determines characters with priority given to the estimated character type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2188922A JP2995809B2 (en) | 1990-07-17 | 1990-07-17 | In-table character recognition device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2188922A JP2995809B2 (en) | 1990-07-17 | 1990-07-17 | In-table character recognition device |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH0475187A | 1992-03-10 |
JP2995809B2 | 1999-12-27 |
Family
ID=16232234
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
JP2188922A (JP2995809B2) | Expired - Lifetime | 1990-07-17 | 1990-07-17 | In-table character recognition device |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2995809B2 (en) |
Cited By (4)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111680688A (en) * | 2020-06-10 | 2020-09-18 | 创新奇智(成都)科技有限公司 | Character recognition method and device, electronic equipment and storage medium |
CN111680688B (en) * | 2020-06-10 | 2023-08-08 | 创新奇智(成都)科技有限公司 | Character recognition method and device, electronic equipment and storage medium |
CN112070087A (en) * | 2020-09-14 | 2020-12-11 | 成都主导软件技术有限公司 | Train number identification method and device with end position and readable storage medium |
CN112070087B (en) * | 2020-09-14 | 2023-06-02 | 成都主导软件技术有限公司 | Train number identification method and device with end bit and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
JP2995809B2 (en) | 1999-12-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5579408A (en) | 1996-11-26 | Character recognition method and apparatus |
US11475688B2 (en) | 2022-10-18 | Information processing apparatus and information processing method for extracting information from document image |
CN105260428A (en) | 2016-01-20 | Picture processing method and apparatus |
US5621818A (en) | 1997-04-15 | Document recognition apparatus |
US20110222789A1 (en) | 2011-09-15 | Character string sensing device, character evaluating device, image processing device, character string sensing method, character evaluation method, control program, and recording medium |
JPH0475187A (en) | 1992-03-10 | In-list character recognizing device |
JPH07160822A (en) | 1995-06-23 | Pattern recognizing method |
US20220392107A1 (en) | 2022-12-08 | Image processing apparatus, image processing method, image capturing apparatus, and non-transitory computer-readable storage medium |
KR19980058361A (en) | 1998-09-25 | Korean Character Recognition Method and System |
KR19990049667A (en) | 1999-07-05 | Korean Character Recognition Method |
KR940007345B1 (en) | 1994-08-13 | On-line recognitin method of hand-written korean character |
JP3526821B2 (en) | 2004-05-17 | Document search device |
JPH0331981A (en) | 1991-02-12 | Character recognizing device |
JPH0744656A (en) | 1995-02-14 | Handwritten character recognizing device |
JP3897999B2 (en) | 2007-03-28 | Handwritten character recognition method |
JP3763262B2 (en) | 2006-04-05 | Handwritten character recognition device |
JP3074691B2 (en) | 2000-08-07 | Character recognition device |
JP3157530B2 (en) | 2001-04-16 | Character extraction method |
JPH0475552B2 (en) | 1992-12-01 | |
JPH09259218A (en) | 1997-10-03 | Word input device and word input method |
JPS63269267A (en) | 1988-11-07 | Character recognizing device |
Kim et al. | 1996 | Segmentation of touching characters in printed Korean/English document recognition |
JPH05346974A (en) | 1993-12-27 | Character recognizing device |
JP2982221B2 (en) | 1999-11-22 | Character reader |
Chaudhury et al. | 2012 | Efficient segmentation of characters in printed Bengali texts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2007-10-02 | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20081029; Year of fee payment: 9 |
2008-10-07 | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20091029; Year of fee payment: 10 |
2009-10-01 | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20091029; Year of fee payment: 10 |
2009-10-06 | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20101029; Year of fee payment: 11 |
2010-07-17 | EXPY | Cancellation because of completion of term | |
2010-09-29 | FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20101029; Year of fee payment: 11 |