CN112885331A - Dialect voice dialing communication equipment based on small vocabulary voice recognition technology - Google Patents
- Tue Jun 01 2021
Info
Publication number
- CN112885331A
Authority
- CN
- China
Prior art keywords
- voice
- feature
- module
- feature extraction
- recognition technology
Prior art date
- 2021-03-09
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/005—Language recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
The invention discloses dialect voice dialing communication equipment based on small vocabulary voice recognition technology. The equipment comprises a voice recording module, a voice feature extraction module, a feature storage database, a feature matching module and an operation execution module. The output end of the voice recording module is connected with the input end of the voice feature extraction module; the output end of the voice feature extraction module is connected with the input end of the feature storage database; the voice feature extraction module and the feature storage database are each connected with the feature matching module; and the output end of the feature matching module is connected with the input end of the operation execution module. The invention uses a one-person-one-database approach: the user records speech for the high-frequency operations they are likely to use, and a voice feature database corresponding to those operations is built. When the mobile phone is operated by voice, the features of the spoken command are compared directly with the features recorded in the database, which improves the accuracy of voice operation recognition and makes the mobile phone easier to use.
Description
Technical Field
The invention relates to the technical field of voice recognition, in particular to dialect voice dialing communication equipment based on a small vocabulary voice recognition technology.
Background
With the explosive development of mobile internet technology, mobile payment, food delivery ordering, online shopping and the like have become an indispensable part of daily life, and the smartphone has become a necessity. However, in some underdeveloped areas, many elderly people with little formal education or with visual impairment cannot use mobile phones normally; for them, the mobile internet brings inconvenience rather than convenience.
Traditional voice recognition technology transcribes speech into text and then performs semantic analysis on the text to obtain the instruction issued by the user. For the user group described above, this approach has several problems: 1) existing voice recognition technology can recognize only Mandarin or dialects with many speakers, and cannot recognize dialects spoken by small populations; 2) the recognition functions are too varied, which does nothing to reduce the learning cost for these users; 3) the accuracy is not high enough, and the meaning of an utterance is easily misrecognized.
Moreover, because these users have limited Mandarin proficiency, education and learning ability, they can hardly learn the complex operations of a mobile phone. Traditional voice recognition technology therefore fails to help them and can even make the phone harder for elderly people to use.
Disclosure of Invention
The technical problem to be solved by the invention is to provide dialect voice dialing communication equipment based on a small vocabulary voice recognition technology, so that the accuracy of voice operation recognition is improved, and the use difficulty is reduced.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows.
Dialect voice dialing communication equipment based on a small vocabulary voice recognition technology comprises a voice recording module for entering data related to operations on a mobile phone, a voice feature extraction module for converting the voice data into voice features, a feature storage database for storing the voice features and their corresponding operations, a feature matching module for comparing the voice features of the current utterance, one by one, with the features in the feature storage database, and an operation execution module for controlling the mobile phone to execute the voice operation. The output end of the voice recording module is connected with the input end of the voice feature extraction module, the output end of the voice feature extraction module is connected with the input end of the feature storage database, the voice feature extraction module and the feature storage database are each connected with the feature matching module, and the output end of the feature matching module is connected with the input end of the operation execution module.
In the dialect voice dialing communication equipment based on the small vocabulary voice recognition technology, the voice recording module is used by the mobile phone user to read aloud, one by one, the voice data for a limited set of operations and to record each item into the mobile phone, and a unique ID is assigned to each recording.
In the dialect voice dialing communication equipment based on the small vocabulary voice recognition technology, the voice feature extraction module adopts the MFCC feature extraction technology for extraction.
In the dialect voice dialing communication equipment based on the small vocabulary voice recognition technology, the MFCC feature extraction steps comprise preprocessing, fast Fourier transform, Mel filter bank, logarithm operation, discrete cosine transform and dynamic feature extraction.
In the dialect voice dialing communication equipment based on the small vocabulary voice recognition technology, the extracted features and the corresponding operation voice ID and operation ID are stored in the feature storage database.
In the dialect voice dialing communication equipment based on the small vocabulary voice recognition technology, the feature storage database stores the voice features locally on the mobile phone or uploads them to a server.
In the dialect voice dialing communication equipment based on the small vocabulary voice recognition technology, if the similarity of the best-matching feature found by the feature matching module is within an acceptable threshold, the corresponding operation is executed by the operation execution module; if the similarity is below the acceptable threshold, the user is reminded to speak the command again, or to select the correct operation so that the feature database can be updated.
Due to the adoption of the technical scheme, the technical progress of the invention is as follows.
The invention uses a one-person-one-database approach: the user records speech for the high-frequency operations they are likely to use, and a voice feature database corresponding to those operations is built. When the mobile phone is operated by voice, the features of the spoken command are compared directly with the features recorded in the database, which improves the accuracy of voice operation recognition and makes the mobile phone easier to use.
Drawings
FIG. 1 is a block diagram of the process for creating the operation voice feature library according to the present invention;
FIG. 2 is a block diagram of the process for extracting operation voice features according to the present invention;
FIG. 3 is a block diagram of the process for matching the target operation according to the present invention.
Detailed Description
The invention will be described in further detail below with reference to the figures and specific examples.
Dialect voice dialing communication equipment based on a small vocabulary voice recognition technology comprises a voice recording module, a voice feature extraction module, a feature storage database, a feature matching module and an operation execution module. The voice recording module is used for entering data related to operations on the mobile phone, the voice feature extraction module is used for converting the voice data into voice features, the feature storage database is used for storing the voice features and their corresponding operations, the feature matching module is used for comparing the voice features of the current utterance, one by one, with the features in the feature storage database, and the operation execution module is used for executing the voice operation. The output end of the voice recording module is connected with the input end of the voice feature extraction module, the output end of the voice feature extraction module is connected with the input end of the feature storage database, the voice feature extraction module and the feature storage database are each connected with the feature matching module, and the output end of the feature matching module is connected with the input end of the operation execution module.
The voice recording module records voice data; the voice feature extraction module preprocesses the recorded data and converts it into voice features; and the feature storage database stores those features, thereby building the operation voice feature library. The flow chart is shown in FIG. 1. Specifically, the mobile phone user personally reads aloud, one by one, the voice data for the limited set of operations that may be needed, and each item is recorded into the mobile phone and assigned a unique ID for later reference. The blank segments at the beginning and end of each recording are deleted so that only the effective data remains, and noise reduction is applied to the operation voice data to facilitate subsequent recognition. The voice format is generally a WMA series standard format; for the WMA standard, refer to the IEC61939 standard WMA specification series documents.
After preprocessing, the original voice segment has been reduced to its effective portion, which is stored in a pending-processing area of the computer system. This greatly reduces the amount of data that has to be processed.
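The trimming of blank leading and trailing segments described above can be illustrated with a minimal sketch. The patent does not say how blank data is detected, so the frame-energy rule, the function name `trim_silence`, and the parameters `frame_len` and `threshold_ratio` are all assumptions made for illustration; noise reduction is not covered here.

```python
def trim_silence(samples, frame_len=400, threshold_ratio=0.1):
    """Drop leading and trailing low-energy frames (hypothetical helper).

    A frame counts as "effective" when its mean energy exceeds
    threshold_ratio times the energy of the loudest frame.
    """
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return list(samples)
    # Mean energy of each complete frame; a trailing partial frame is ignored.
    energies = [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]
    limit = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > limit]
    if not active:
        return []  # the whole recording is blank
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return list(samples[start:end])


# Example: 800 blank samples, 1200 voiced samples, 800 blank samples.
recording = [0.0] * 800 + [1.0] * 1200 + [0.0] * 800
effective = trim_silence(recording)
```

Only whole frames are kept, so the cut points fall on multiples of `frame_len`; a real implementation would probably use overlapping frames and a noise-floor estimate.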
The voice feature extraction module converts voice data into voice features using the MFCC (Mel-frequency cepstral coefficient) feature extraction technology. Its flow chart is shown in FIG. 2 and comprises the steps of preprocessing, fast Fourier transform, Mel filter bank, logarithm operation, discrete cosine transform and dynamic feature extraction.
The extracted MFCC features are stored through the feature storage database. Where they are stored depends on the computing power of the mobile phone: if the phone is powerful enough to perform feature extraction, feature comparison and similar operations quickly enough not to affect the user experience, the features are stored locally to save server resources; if the phone's local computing power is insufficient, the features are uploaded to a server and the computation is performed there.
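The six MFCC steps listed above can be sketched in plain Python as follows. This is an illustrative sketch only, not the patent's implementation: the function names, the sample rate, frame length, hop size, filter count, pre-emphasis factor 0.97, and the choice to skip the zeroth coefficient are all assumptions, and a naive DFT stands in for the fast Fourier transform to keep the example dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the Mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    hz_points = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * hz / sample_rate)) for hz in hz_points]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    if not signal:
        return []
    # 1) Preprocessing: pre-emphasis, framing, Hamming window.
    emphasized = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [emphasized[start + i] * window[i] for i in range(frame_len)]
        # 2) Fourier transform to a power spectrum.
        power = power_spectrum(frame)
        # 3) Mel filter bank and 4) logarithm (floored to avoid log(0)).
        log_energy = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10)) for row in bank]
        # 5) DCT-II; coefficients 1..n_ceps are kept, the zeroth is skipped.
        ceps = [
            sum(e * math.cos(math.pi * n * (k + 0.5) / n_filters) for k, e in enumerate(log_energy))
            for n in range(1, n_ceps + 1)
        ]
        features.append(ceps)
    return features


def add_deltas(features):
    """6) Dynamic features: first-order differences between neighbouring frames."""
    out = []
    for t, frame in enumerate(features):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, len(features) - 1)]
        out.append(frame + [(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out
```

A call such as `add_deltas(mfcc(samples))` yields one feature vector per frame, with the static coefficients followed by their deltas. On a phone, an optimized FFT and vectorized arithmetic would replace the loops shown here.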
After the features are extracted, the features and the corresponding operation voice ID and operation ID form the content of the whole database.
In addition to the database, a separate area must be created in ROM for storing the sample voices. Each sample voice is named with its ID, which corresponds to the sample voice ID recorded with its features in the database.
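One possible layout for the database contents described above (features, operation voice ID, operation ID) is sketched below with SQLite. The table name `voice_features`, the column names, the JSON encoding of the features, and the `.wma` file naming in `sample_filename` are hypothetical choices made for illustration; the patent does not specify a storage engine or schema.

```python
import json
import sqlite3


def create_feature_db(path=":memory:"):
    """Open (or create) a per-user feature database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voice_features ("
        " voice_id TEXT PRIMARY KEY,"      # unique ID of the recorded sample voice
        " operation_id TEXT NOT NULL,"     # operation the recording stands for
        " features TEXT NOT NULL)"         # feature frames serialized as JSON
    )
    return conn


def store_feature(conn, voice_id, operation_id, features):
    conn.execute(
        "INSERT OR REPLACE INTO voice_features VALUES (?, ?, ?)",
        (voice_id, operation_id, json.dumps(features)),
    )
    conn.commit()


def load_features(conn):
    rows = conn.execute(
        "SELECT voice_id, operation_id, features FROM voice_features ORDER BY voice_id"
    )
    return [(v, o, json.loads(f)) for v, o, f in rows]


def sample_filename(voice_id):
    """Name of the stored sample recording, keyed by the same voice ID."""
    return f"{voice_id}.wma"


conn = create_feature_db()
store_feature(conn, "v001", "call_daughter", [[1.0, 2.0], [1.5, 2.5]])
entries = load_features(conn)
```

Keying both the database row and the audio file by the same voice ID keeps the two stores linked without storing audio inside the database.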
The feature matching module matches the target operation based on the similarity between the features of the voice the user speaks when using the mobile phone and the stored features. The flow chart is shown in FIG. 3. First, the features are compared: the voice features of the current utterance are compared with the features in the voice database, the feature similarities are calculated and sorted, and the voice feature with the highest similarity is selected. The operation ID associated with that feature is taken as the operation the user intends.
The result is then fed back through the operation execution module: if the similarity of the best-matching feature is within the acceptable threshold, the corresponding operation is executed. If the similarity is below the acceptable threshold, the user is reminded to speak the command again, or to select the correct operation so that the feature database can be updated.
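The compare, sort, and threshold logic described above might look like the following sketch. The patent does not name a similarity measure or a threshold value, so the cosine similarity of per-utterance mean feature vectors, the default threshold of 0.9, and the names `mean_vector`, `cosine_similarity` and `match_operation` are assumptions used only to make the control flow concrete.

```python
import math


def mean_vector(frames):
    """Average the per-frame feature vectors of one utterance."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_operation(query_frames, database, threshold=0.9):
    """Return (operation_id, similarity).

    database is a list of (operation_id, frames) pairs. operation_id is None
    when even the best match falls below the threshold, which is the case
    where the user would be asked to repeat or correct the command.
    """
    query = mean_vector(query_frames)
    scored = sorted(
        ((cosine_similarity(query, mean_vector(frames)), op_id) for op_id, frames in database),
        reverse=True,
    )
    if not scored:
        return None, 0.0
    best_score, best_op = scored[0]
    if best_score >= threshold:
        return best_op, best_score
    return None, best_score


database = [
    ("call_son", [[1.0, 0.0], [1.0, 0.0]]),
    ("open_radio", [[0.0, 1.0]]),
]
accepted, score = match_operation([[0.9, 0.1]], database)
rejected, low_score = match_operation([[1.0, 1.0]], database)
```

In this example the first query is accepted as `call_son`, while the second sits equally far from both templates and is rejected. Averaging frames discards timing information, so a sequence-aware measure would likely be preferable in practice.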
The method of operating a mobile phone using the small vocabulary voice recognition technology of the invention comprises the following steps:
1) With the help of a mobile phone salesperson or another user familiar with operating the phone, the data related to the operations is entered. The data includes the voice data for each high-frequency operation, the voice data for each contact, and the mobile phone number of the corresponding contact. The voice data is read aloud by the user in person under the guidance of the salesperson.
Because the user records the pronunciation of each contact's name and each high-frequency operation, steps such as text recognition and natural language understanding are avoided: the voice recognition operation does not need to first recognize the words of a particular language and then interpret grammar, semantics and context.
2) The voice feature extraction module is started to convert the voice data into voice features, and the voice features and the corresponding operation contents are stored in the feature storage database.
3) When using the device, the user presses a dedicated button and speaks the voice data for the operation to be executed (consistent with the content recorded when the device was set up).
4) The voice feature extraction module is started to convert the voice data into voice features, that is, to extract the features of the recorded command issued by the user.
5) The feature matching module compares the voice features of the current utterance, one by one, with the features in the feature storage database and finds the feature at the shortest distance from them, that is, the closest feature.
6) The operation execution module executes the corresponding operation.
The invention uses small vocabulary voice recognition technology and avoids the text recognition and semantic analysis of traditional voice recognition. It directly compares the features of the spoken command with the features of the recordings the user pre-recorded in the pattern library; the only requirement is that the command spoken by the user be the same as the pre-recorded one, so the approach can be applied to any language or dialect. Applied to a mobile phone, the technology lets people with poor eyesight or little formal education use the basic functions of the phone by voice, and avoids the barrier that traditional voice recognition creates by demanding a high level of standard Mandarin.
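Step 5 speaks of finding the stored feature at the shortest distance but does not name a distance measure. Dynamic time warping (DTW) is a common choice for comparing feature sequences of different lengths in small-vocabulary template matching, and it is used below purely as an assumed stand-in; `frame_distance`, `dtw_distance` and `nearest_template` are hypothetical names.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Classic dynamic-programming DTW over two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_template(query, templates):
    """templates maps operation_id -> feature sequence; returns the closest operation_id."""
    return min(templates, key=lambda op: dtw_distance(query, templates[op]))


templates = {
    "call_son": [[0.0], [1.0], [2.0]],
    "open_radio": [[5.0], [5.0]],
}
# The same command spoken more slowly: the first frame is held longer.
query = [[0.0], [0.0], [1.0], [2.0]]
best = nearest_template(query, templates)
```

Because DTW stretches one sequence against the other, the slower repetition in the example still aligns with `call_son` at zero cost. `nearest_template` raises `ValueError` on an empty template set, so a caller would need to handle a database with no recordings.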
Claims (7)
1. Dialect voice dialing communication equipment based on small vocabulary voice recognition technology, characterized in that: the equipment comprises a voice recording module for entering data related to operations on a mobile phone, a voice feature extraction module for converting the voice data into voice features, a feature storage database for storing the voice features and their corresponding operations, a feature matching module for comparing the voice features of the current utterance, one by one, with the features in the feature storage database, and an operation execution module for controlling the mobile phone to execute the voice operation, wherein the output end of the voice recording module is connected with the input end of the voice feature extraction module, the output end of the voice feature extraction module is connected with the input end of the feature storage database, the voice feature extraction module and the feature storage database are each connected with the feature matching module, and the output end of the feature matching module is connected with the input end of the operation execution module.
2. The dialect voice dialing communication device of claim 1 based on small vocabulary voice recognition technology, wherein: the voice recording module is used by the mobile phone user to read aloud, one by one, the voice data for a limited set of operations and to record each item into the mobile phone, and assigns a unique ID to each recording.
3. The dialect voice dialing communication device of claim 1 based on small vocabulary voice recognition technology, wherein: the voice feature extraction module adopts MFCC feature extraction technology for extraction.
4. The dialect voice dialing communication device of claim 3 based on small vocabulary voice recognition technology, wherein: the MFCC feature extraction steps comprise preprocessing, fast Fourier transform, Mel filter bank, logarithm operation, discrete cosine transform and dynamic feature extraction.
5. The dialect voice dialing communication device of claim 1 based on small vocabulary voice recognition technology, wherein: the feature storage database stores the extracted features and the operation voice ID and the operation ID corresponding to the extracted features.
6. The dialect voice dialing communication device of claim 1 based on small vocabulary voice recognition technology, wherein: the feature storage database stores the voice features locally in the mobile phone or uploads the voice features to a server.
7. The dialect voice dialing communication device of claim 1 based on small vocabulary voice recognition technology, wherein: the operation execution module executes the corresponding operation if the similarity of the best-matching feature found by the feature matching module is within an acceptable threshold; if the similarity is below the acceptable threshold, the user is reminded to speak the command again, or to select the correct operation so that the feature database can be updated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110253408.XA CN112885331A (en) | 2021-03-09 | 2021-03-09 | Dialect voice dialing communication equipment based on small vocabulary voice recognition technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112885331A (en) | 2021-06-01 |
Family
ID=76053841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110253408.XA Pending CN112885331A (en) | 2021-03-09 | 2021-03-09 | Dialect voice dialing communication equipment based on small vocabulary voice recognition technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112885331A (en) |
Citations (5)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102209154A (en) * | 2010-03-31 | 2011-10-05 | 鸿富锦精密工业(深圳)有限公司 | Voice dialing system and method thereof |
CN104717339A (en) * | 2013-12-13 | 2015-06-17 | 富泰华工业(深圳)有限公司 | Contact query system, contact query method and communication device |
US20150381787A1 (en) * | 2014-06-26 | 2015-12-31 | Anatoliy Babayev | Voice Only Phone and Method of Operation |
CN105653596A (en) * | 2015-12-22 | 2016-06-08 | 惠州Tcl移动通信有限公司 | Quick startup method and device of specific function on the basis of voice frequency comparison |
CN111798845A (en) * | 2020-05-28 | 2020-10-20 | 厦门快商通科技股份有限公司 | Off-line voice control method and device for smart home and storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2021-06-01 | PB01 | Publication | |
2021-06-18 | SE01 | Entry into force of request for substantive examination | |
2022-12-09 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210601 |