patents.google.com

CN110211606B - A replay attack detection method for voice authentication system - Google Patents

Tue Apr 06 2021

Info

Publication number

CN110211606B

Authority

China

Prior art keywords

voice

value

sequence

polarity

signal

Prior art date

2019-04-12

Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)

Active

Application number

CN201910303649.3A

Other languages

Chinese (zh)

Other versions

CN110211606A (en)

Inventor

冀晓宇

龙颜

徐文渊

闫琛

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Zhejiang University ZJU

Original Assignee

Zhejiang University ZJU

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2019-04-12

Filing date

2019-04-12

Publication date

2021-04-06

2019-04-12 Application filed by Zhejiang University ZJU

2019-04-12 Priority to CN201910303649.3A

2019-09-06 Publication of CN110211606A

2021-04-06 Application granted

2021-04-06 Publication of CN110211606B

Status: Active

2039-04-12 Anticipated expiration

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

Engineering & Computer Science (AREA)
Computational Linguistics (AREA)
Signal Processing (AREA)
Health & Medical Sciences (AREA)
Audiology, Speech & Language Pathology (AREA)
Human Computer Interaction (AREA)
Physics & Mathematics (AREA)
Acoustics & Sound (AREA)
Multimedia (AREA)
Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Collating Specific Patterns (AREA)
Lock And Its Accessories (AREA)

Abstract

The invention discloses a replay attack detection method for a voice authentication system based on the time-domain polarity of the voice signal. The voice authentication system collects and records a voice signal, extracts its positive-polarity and negative-polarity parts, compares the proportions of the two, and judges whether the voice signal is a replay attack or live voice: if the proportions of the positive-polarity and negative-polarity parts differ greatly and the proportion of the positive-polarity signal is higher than that of the negative-polarity signal, the signal is regarded as live voice; if the proportions are comparable, or the proportion of the positive-polarity signal is not higher than that of the negative-polarity signal, the signal is regarded as a replay attack. The invention can accurately and effectively detect replay attacks in a voice authentication system.

Description

Replay attack detection method of voice authentication system

Technical Field

The invention belongs to the technical field of voice authentication and security, and in particular relates to a software processing method for detecting replay attacks against a voice authentication system.

Background

A voice authentication system is a security authentication system that uses voice authentication technology to extract speaker-specific voice features and identifies the speaker by matching voice feature patterns. Because it has low hardware requirements, low cost, and a simple authentication procedure, and because it allows remote, contactless authentication, it has gradually become a mainstream means of user authentication and access control. However, existing voice authentication systems are generally vulnerable to replay attacks.

A replay attack against a voice authentication system means that an attacker records genuine voice samples of a legitimate user in advance and plays them back through a loudspeaker, either directly or after splicing, in order to deceive the system. A replay attack requires no knowledge of speech signal processing on the attacker's part, and with advances in electronics, high-quality, low-cost loudspeakers have become common. Together these make the replay attack the easiest yet most threatening attack on voice authentication systems; at the same time, it is extremely difficult to detect and defend against.

Detecting and defending against replay attacks requires an understanding of the acoustic-to-electric and electric-to-acoustic conversion mechanisms of microphones and loudspeakers. Both are transducers that convert between sound waves and electrical signals. In a microphone, sound waves set a diaphragm vibrating, and the mechanical energy of that vibration is converted into the electrical energy of a signal through Faraday electromagnetic induction. A loudspeaker works in the reverse direction: it converts the electrical signal into kinetic energy of its diaphragm, which disturbs the air to form sound waves and thereby restores the sound that existed before conversion into an electrical signal.

Ideally, the conversions performed by the microphone and the loudspeaker are exactly reciprocal, i.e., as shown in Fig. 1, acoustic signal 1 should be identical to acoustic signal 2. In reality, however, the two signals usually differ, for two main reasons: 1) in the electrical signal path of the microphone and loudspeaker, circuits such as power amplifiers, input and output filters, and AD/DA converters introduce noise into the electrical signal; 2) when the diaphragm vibrates to perform electro-acoustic and acoustic-electric conversion, various mechanical resistances alter its motion pattern, so the signals before and after conversion are inconsistent.

In a replay attack, the voice signal (here, an abstraction covering both the acoustic and the electrical signal) travels from the person being impersonated to the microphone of the voice authentication system through one more set of microphone and loudspeaker hardware, the attack hardware, than in direct authentication by a live user. The replayed voice signal therefore contains more noise, and more distortion caused by changes in the diaphragm motion pattern, than a live one. In theory, replay attacks can be detected and defended against by detecting these distortions.

Many related studies detect replay attacks by detecting the noise introduced by the attack hardware. Such methods have low detection accuracy and are strongly affected by the quality of the microphone and loudspeaker used in the replay attack. No research, however, has focused on the distortion of the voice signal caused by changes in the diaphragm motion pattern along the hardware path of the attack device.

Disclosure of Invention

In order to solve the technical problems in the background art, the invention provides a replay attack detection method of a voice authentication system based on the time domain polarity of a voice signal, which can accurately and effectively detect the replay attack by detecting the time domain polarity characteristic of the voice signal collected by the voice authentication system.

The invention adopts the following technical scheme:

The invention collects and records the voice signal through the voice authentication system, extracts the positive-polarity and negative-polarity parts of the voice signal, compares their proportions, and judges whether the voice signal is a replay attack (sound played back by a recording device) or live voice (sound emitted by a live user):

if the proportions of the positive-polarity and negative-polarity parts differ greatly and the proportion of the positive-polarity signal is higher than that of the negative-polarity signal, the signal is regarded as live voice;

if the proportions are comparable, or the proportion of the positive-polarity signal is not higher than that of the negative-polarity signal, the signal is regarded as a replay attack.

The method specifically comprises the following steps:

1) Voice activity detection is performed on the voice signal collected by the voice authentication system at a given sampling frequency; noise in the voice signal is removed, and the human voice portion of the audio signal is extracted as the pure human voice part;

The voice activity detection used by the method mainly judges, from signal amplitude and duration, whether a given segment of the voice signal is pure human voice or noise.

2) Perform the polarity index calculation on the resulting time-domain pure human voice signal:

The pure human voice signal sequence S contains N sampling points. Among them, the number of sampling points with a positive sample value is $N_{pos}$, and the absolute value of the sum of their sample values is $|Sum_{pos}|$; the number of sampling points with a negative sample value is $N_{neg}$, and the absolute value of the sum of their sample values is $|Sum_{neg}|$. The polarity value I is obtained from these quantities by the following formula:

[formula not reproduced in this text]
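The four quantities defined in step 2) can be computed directly from the samples. The following is a minimal Python sketch under stated assumptions: the names `PolarityStats` and `polarity_stats` are hypothetical and do not come from the patent, and because the formula that combines these quantities into the polarity value I is not reproduced in this text, the sketch stops at the four statistics rather than guessing it.

```python
from typing import NamedTuple, Sequence


class PolarityStats(NamedTuple):
    """The four quantities of step 2) (hypothetical container name)."""
    n_pos: int       # N_pos: number of samples with a positive value
    sum_pos: float   # |Sum_pos|: absolute value of the sum of positive samples
    n_neg: int       # N_neg: number of samples with a negative value
    sum_neg: float   # |Sum_neg|: absolute value of the sum of negative samples


def polarity_stats(s: Sequence[float]) -> PolarityStats:
    """Compute N_pos, |Sum_pos|, N_neg and |Sum_neg| for a voice sequence S."""
    pos = [v for v in s if v > 0]
    neg = [v for v in s if v < 0]
    # Samples equal to zero belong to neither polarity and are ignored.
    return PolarityStats(len(pos), abs(sum(pos)), len(neg), abs(sum(neg)))
```

Any polarity index built from these statistics would then be compared against the threshold in step 3).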

Step 1) is specifically as follows:

1.2) Extract all sampling points of the voice signal Sa whose sample value has an absolute value greater than the signal amplitude threshold $|A_{thr}|$, forming a first sequence $(Sa_{i_1}, Sa_{i_2}, Sa_{i_3}, \dots, Sa_{i_x})$ with $1 \le i_1 < i_2 < i_3 < \dots < i_x < N$, where i is the index ordinal value of a sampling point in the voice signal sequence Sa and N is the total number of sampling points in Sa;

1.3) In the first sequence $(Sa_{i_1}, Sa_{i_2}, Sa_{i_3}, \dots, Sa_{i_x})$, initially take the $i_1$-th sampling point as the reference sampling point and traverse toward later sampling points, examining the index ordinal value of each one: if the difference between the index ordinal values of the $i_p$-th and the $i_{p-1}$-th sampling points is greater than a preset ordinal threshold $D_1$, then all sampling points of the first sequence between the $i_1$-th and the $i_{p-1}$-th sampling points constitute the 1st subset sequence $Ssub_1$;

1.4) Then, taking the $i_p$-th sampling point as the new reference sampling point, keep repeating step 1.3) toward later sampling points, forming each further subset sequence from the sampling points of the first sequence between the $i_q$-th (q > p) sampling point and its nearest preceding reference sampling point, until the last sampling point $Sa_{i_x}$ has been traversed and the y-th subset sequence $Ssub_y$ is obtained;

1.5) For the 1st subset sequence $Ssub_1$ through the y-th subset sequence $Ssub_y$ (y > 1), check whether each subset sequence satisfies the condition that the difference between the maximum and minimum index ordinal values of its sampling points is greater than a preset index threshold $D_2$; finally, all subset sequences that satisfy this condition are combined into the pure human voice signal sequence S.
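Steps 1.2) through 1.5) amount to an amplitude-and-duration voice activity detector: keep loud samples, group them into bursts separated by gaps longer than D₁, and keep only bursts that span more than D₂ indices. The following Python sketch illustrates that reading. The function name `extract_voiced_samples` and its parameter names are hypothetical, step 1.1) is not given in this text and is therefore not modelled, and the threshold values in the usage example are arbitrary.

```python
from typing import List, Sequence


def extract_voiced_samples(sa: Sequence[float], a_thr: float,
                           d1: int, d2: int) -> List[float]:
    """Sketch of steps 1.2) to 1.5); returns the pure-voice sequence S."""
    # Step 1.2: indices of samples whose absolute value exceeds |Athr|.
    loud = [i for i, v in enumerate(sa) if abs(v) > abs(a_thr)]
    if not loud:
        return []

    # Steps 1.3 and 1.4: start a new subset whenever two neighbouring
    # loud samples are more than D1 index positions apart.
    subsets = [[loud[0]]]
    for prev, cur in zip(loud, loud[1:]):
        if cur - prev > d1:
            subsets.append([cur])
        else:
            subsets[-1].append(cur)

    # Step 1.5: keep subsets whose index span (max index minus min index)
    # exceeds D2, then concatenate their sample values into S.
    kept = [sub for sub in subsets if sub[-1] - sub[0] > d2]
    return [sa[i] for sub in kept for i in sub]


# Usage with arbitrary thresholds: the isolated spike at index 8 is dropped
# because its subset spans 0 indices, which is not greater than D2 = 1.
signal = [0, 5, -6, 7, 0, 0, 0, 0, 9, 0, 0]
voiced = extract_voiced_samples(signal, a_thr=1, d1=2, d2=1)
```

Following the text, S contains only the above-threshold samples of each kept subset, not the quieter samples lying between them.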

The invention finds that during live authentication, because the way human vocal cords vibrate to produce sound is relatively fixed, live voice recorded directly by the authentication system generally shows a large difference between the proportions of the positive-polarity and negative-polarity parts of the signal, with the positive-polarity proportion higher than the negative-polarity proportion.

In a replay attack, the hardware path of the attack equipment changes the vibration pattern of the diaphragm, so the voice signal generally shows comparable proportions of positive-polarity and negative-polarity parts, or even a negative-polarity proportion higher than the positive-polarity proportion.

By comparing the positive-polarity and negative-polarity parts (the time-domain polarity) of the voice signal collected by the hardware of the voice authentication system, the invention can simply and effectively judge whether the voice signal comes from a live speaker or from a replay attack loudspeaker.

The invention has the beneficial effects that:

The invention detects and defends against replay attacks by processing only the time-domain signal used for voice authentication. The method is simple and effective, has few processing steps and low algorithmic complexity, and offers high accuracy and low latency. Moreover, because the detected feature is unrelated to the noise introduced in the electrical signal paths of the microphone and loudspeaker, the detection success rate is not affected by the sound quality of the microphone and loudspeaker used in the replay attack; that is, the method defends equally well against attacks launched with loudspeakers and microphones of different quality grades.

The invention can accurately and effectively detect the replay attack in the voice authentication system.

Drawings

Fig. 1 is a schematic diagram of the conversion process of a microphone and a speaker in an ideal case.

Fig. 2 is a flow chart of the detection method of the present invention.

Fig. 3 is a voice signal detection diagram of the embodiment.

Detailed Description

The invention is further illustrated by the following figures and examples.

The specific implementation process of the invention is as follows:

1) Voice activity detection is performed on the voice signal collected by the voice authentication system at a given sampling frequency; noise in the voice signal is removed, and the human voice portion of the audio signal is extracted as the pure human voice part;

1.2) Extract all sampling points of the voice signal Sa whose sample value has an absolute value greater than the signal amplitude threshold $|A_{thr}|$, forming a first sequence $(Sa_{i_1}, Sa_{i_2}, Sa_{i_3}, \dots, Sa_{i_x})$, where $Sa_{i_1}, Sa_{i_2}, Sa_{i_3}, \dots, Sa_{i_x}$ are the sample values of the $i_1$-th through $i_x$-th sampling points, with $1 \le i_1 < i_2 < i_3 < \dots < i_x < N$; i is the index ordinal value of a sampling point in the voice signal sequence Sa, and N is the total number of sampling points in Sa;

1.4) Then, taking the $i_p$-th sampling point as the new reference sampling point, keep repeating step 1.3) toward later sampling points, forming each further subset sequence from the sampling points of the first sequence between the $i_q$-th (q > p) sampling point and its nearest preceding reference sampling point, until the last sampling point $Sa_{i_x}$ has been traversed and the y-th subset sequence $Ssub_y$ is obtained;

2) Perform the polarity index calculation on the resulting time-domain pure human voice signal:

3) Compare the obtained polarity value I with a preset polarity threshold $I_{thr}$: when the polarity value I is greater than the polarity threshold, i.e., $I > I_{thr}$, the voice signal matches the polarity characteristics of a live user's voice signal and is judged to be live voice; otherwise, it is judged to be a replay attack.
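The decision in step 3) is a single threshold comparison. The following Python sketch assumes the polarity value I has already been computed. The function name `classify_voice` and the string labels are hypothetical, and the default threshold of 0.52 is borrowed from Example two rather than prescribed by the method itself.

```python
def classify_voice(polarity_value: float, i_thr: float = 0.52) -> str:
    """Step 3): label a signal as live voice or replay attack.

    A polarity value strictly greater than the threshold I_thr is treated
    as live voice; anything else is treated as a replay attack.
    """
    return "live" if polarity_value > i_thr else "replay"


# Polarity indices reported for the two signals in the first embodiment.
live_label = classify_voice(0.583)
replay_label = classify_voice(0.494)
```

Because the comparison is strict, a value exactly equal to the threshold falls on the replay side, matching the "otherwise" branch of step 3).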

Example one:

In Fig. 3, the upper channel is a live authentication voice signal obtained by the voice authentication system, and the lower channel is a voice signal obtained from a replay attack through a HiVi loudspeaker. The positive-polarity proportion of the live voice signal is clearly much higher than its negative-polarity proportion, whereas the replay attack signal shows the opposite. After the first two steps of the detection method (voice activity detection and polarity index calculation), the polarity index of the live authentication voice signal is 0.583, clearly greater than the 0.494 of the replay attack voice signal.

Example two:

In this example, live voice was collected from 20 people in total (14 men and 6 women), and replay attacks were carried out with 8 loudspeakers spanning a wide range of quality, including the HiVi loudspeaker. The decision threshold was set to 0.52: voice with a polarity index greater than 0.52 is judged to be live voice, and otherwise it is judged to be a replay attack. The resulting accuracy of detecting live voice and of detecting replay attacks is 96.5%.

Claims (1)

1. A replay attack detection method for a voice authentication system, characterized in that: the voice authentication system collects and records a voice signal, extracts the positive-polarity and negative-polarity parts of the voice signal, compares their proportions, and judges whether the voice signal is a replay attack or live voice: if the proportions of the positive-polarity and negative-polarity parts differ greatly and the proportion of the positive-polarity signal is higher than that of the negative-polarity signal, the signal is regarded as live voice; if the proportions are comparable, or the proportion of the positive-polarity signal is not higher than that of the negative-polarity signal, the signal is regarded as a replay attack;

the method comprises the following specific steps:

2) perform the polarity index calculation on the resulting time-domain pure human voice signal:

the step 1) is specifically as follows:

1.5) for the 1st subset sequence $Ssub_1$ through the y-th subset sequence $Ssub_y$ (y > 1), check whether each subset sequence satisfies the condition that the difference between the maximum and minimum index ordinal values of its sampling points is greater than a preset index threshold $D_2$; finally, all subset sequences that satisfy this condition are combined into the pure human voice signal sequence S.

CN201910303649.3A 2019-04-12 2019-04-12 A replay attack detection method for voice authentication system Active CN110211606B (en)

Priority Applications (1)

Application Number	Priority Date	Filing Date	Title
CN201910303649.3A CN110211606B (en)	2019-04-12	2019-04-12	A replay attack detection method for voice authentication system

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
CN201910303649.3A CN110211606B (en)	2019-04-12	2019-04-12	A replay attack detection method for voice authentication system

Publications (2)

Publication Number	Publication Date
CN110211606A (en)	2019-09-06
CN110211606B (en)	2021-04-06

Family

ID=67785410

Family Applications (1)

Application Number	Title	Priority Date	Filing Date
CN201910303649.3A Active CN110211606B (en)	2019-04-12	2019-04-12	A replay attack detection method for voice authentication system

Country Status (1)

Country	Link
CN (1)	CN110211606B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party

Publication number	Priority date	Publication date	Assignee	Title
CN111243600A (en) *	2020-01-10	2020-06-05	浙江大学	A detection method of speech spoofing attack based on sound field and field pattern
CN112151038B (en) *	2020-09-10	2022-12-16	达闼机器人股份有限公司	Voice replay attack detection method and device, readable storage medium and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party

Publication number	Priority date	Publication date	Assignee	Title
CN1928991B (en) *	2006-07-20	2012-07-11	中山大学	An audio watermarking method against synchronization attack
CN101115124B (en) *	2006-07-26	2012-04-18	日电（中国）有限公司	Method and device for identifying media program based on audio watermark
CN106297772B (en) *	2016-08-24	2019-06-25	武汉大学	Replay attack detection method based on the voice signal distorted characteristic that loudspeaker introduces
CN106531172B (en) *	2016-11-23	2019-06-14	湖北大学	Speaker voice playback identification method and system based on environmental noise change detection
CN109448759A (en) *	2018-12-28	2019-03-08	武汉大学	A kind of anti-voice authentication spoofing attack detection method based on gas explosion sound

Also Published As

Publication number	Publication date
CN110211606A (en)	2019-09-06

Publication	Publication Date	Title
Nassi et al.	2020	Lamphone: Real-time passive sound recovery from light bulb vibrations
CN104040627B (en)	2017-07-21	Method and apparatus for wind noise detection
Malik	2013	Acoustic environment identification and its applications to audio forensics
US20200043484A1 (en)	2020-02-06	Detection of replay attack
US20120290297A1 (en)	2012-11-15	Speaker Liveness Detection
CN102246541A (en)	2011-11-16	System, method and hearing aids for in situ occlusion effect measurement
CN106664486A (en)	2017-05-10	Method and apparatus for wind noise detection
WO2016184138A1 (en)	2016-11-24	Method, mobile terminal and computer storage medium for adjusting audio parameters
CN110211606B (en)	2021-04-06	A replay attack detection method for voice authentication system
Shang et al.	2020	Voice liveness detection for voice assistants using ear canal pressure
CN112312258B (en)	2023-04-07	Intelligent earphone with hearing protection and hearing compensation
Ganguly et al.	2017	Real-time Smartphone implementation of noise-robust Speech source localization algorithm for hearing aid users
Li et al.	2023	Enrollment-stage backdoor attacks on speaker recognition systems via adversarial ultrasound
JP2000148184A (en)	2000-05-26	Speech recognizing device
CN111243600A (en)	2020-06-05	A detection method of speech spoofing attack based on sound field and field pattern
CN111161753A (en)	2020-05-15	Safe voice interaction method and system based on intelligent terminal
Shang et al.	2018	Srvoice: A robust sparse representation-based liveness detection system
CN116453537B (en)	2023-09-05	Method and system for improving audio information transmission effect
Anand et al.	2017	Coresident evil: Noisy vibrational pairing in the face of co-located acoustic eavesdropping
CN116599742A (en)	2023-08-15	Equipment authentication method based on equipment audio sensor
You et al.	2019	Device Feature Extractor for Replay Spoofing Detection.
CN112581975B (en)	2024-05-17	Ultrasonic voice command defense method based on signal aliasing and binaural correlation
JP2015125184A (en)	2015-07-06	Sound signal processing device and program
CN115989683B (en)	2025-01-07	Method and system for authentication and compensation
CN106328159B (en)	2021-07-09	Audio stream processing method and device

Legal Events

Date	Code	Title
2019-09-06	PB01	Publication
2019-10-08	SE01	Entry into force of request for substantive examination
2021-04-06	GR01	Patent grant
