
JP2016177153A - Communication system, communication method, and program - Google Patents

Thu Oct 06 2016



Publication number

JP2016177153A


Japan

Prior art keywords

signal

terminal

packet

slave terminal

packet transmission

Prior art date

2015-03-20

Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)

Granted

Application number

JP2015057620A

Other languages

Japanese (ja)

Other versions

JP6377557B2 (en)

Inventor

達也加古

Tatsuya Kako


和則小林

Kazunori Kobayashi


仲大室

Hitoshi Omuro


Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Nippon Telegraph and Telephone Corp

Original Assignee

Nippon Telegraph and Telephone Corp

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2015-03-20

Filing date

2015-03-20

Publication date

2016-10-06

2015-03-20 Application filed by Nippon Telegraph and Telephone Corp

2015-03-20 Priority to JP2015057620A

2016-10-06 Publication of JP2016177153A

2018-08-22 Application granted

2018-08-22 Publication of JP6377557B2

Status: Active

2035-03-20 Anticipated expiration

Landscapes

Studio Devices (AREA)
Circuit For Audible Band Transducer (AREA)
Small-Scale Networks (AREA)

Abstract

PROBLEM TO BE SOLVED: To synchronize communication among terminals disposed over a wide range.

SOLUTION: A slave terminal 2 stores a signal to be communicated in a packet and transmits the packet to a packet reception unit 121. The packet reception unit 121 receives the packet from the slave terminal 2 and extracts the signal. A delay amount determination unit 124 measures the packet transmission time for each slave terminal 2 and takes the arithmetic mean of the packet transmission times as the delay amount of that slave terminal 2. A delay buffer processing unit 125 delays the signal of each slave terminal 2 by an amount corresponding to that terminal's delay amount to generate a delayed signal.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a communication technique in which a signal generated by a slave terminal is transmitted to a master terminal over a network, and the master terminal synchronizes the signals and performs desired processing.

When a service that uses audio (for example, video conferencing, video recording, or security recording) is used on a digital terminal (for example, a smartphone, personal computer, video camera, car navigation system, or television), the sound for the service is captured with the microphone built into the terminal. However, it can be difficult to capture sound that originates far from the terminal, for example when the terminal has only a monaural microphone, or when the microphone is tuned for hands-free pickup and has a poor S/N ratio for distant sound. Video cameras usually have a stereo microphone, but its low directivity makes it difficult to pinpoint a distant sound.

Conventional techniques that address these problems are described in Non-Patent Documents 1 to 5. Non-Patent Document 1 describes a technique that improves the acoustic characteristics of the microphone by connecting an external stereo or multi-channel microphone to the digital terminal through an earphone jack or a USB (Universal Serial Bus) port, so that only sound from the appropriate direction is captured. Non-Patent Document 2 describes a wired, unidirectional clip-on microphone. Non-Patent Document 3 describes a directional shotgun-type monaural microphone. Non-Patent Document 4 describes a technique for capturing sound far from the digital terminal: the terminal is paired with a headset over Bluetooth (registered trademark), and the sound from the headset is transmitted to the terminal wirelessly. Non-Patent Document 5 describes a television that captures voice through a microphone built into the handheld remote control. In the following description, the digital terminal that controls the audio service is called the master terminal, and a device that captures audio at a location away from the master terminal is called a slave terminal.

[Non-Patent Document 1] Zoom Corporation, "iQ7 Operation Manual", [online], [retrieved February 24, 2015], Internet <URL: http://www.zoom.co.jp/download/J_iQ7.pdf>
[Non-Patent Document 2] Audio-Technica Corporation, "AT9902iS | Microphone", [online], [retrieved February 24, 2015], Internet <URL: https://www.audio-technica.co.jp/atj/show_model.php?modelId=970>
[Non-Patent Document 3] Audio-Technica Corporation, "AT9913iS | Microphone", [online], [retrieved February 24, 2015], Internet <URL: https://www.audio-technica.co.jp/atj/show_model.php?modelId=971>
[Non-Patent Document 4] Elecom Co., Ltd., "LBT-MPHS510 Series, LBT-PCHS510 Series Instruction Manual", [online], [retrieved February 24, 2015], Internet <URL: http://www.elecom.co.jp//support/manual/avd/headphone/bluetooth/LBT-HS510_manual_v2.pdf>
[Non-Patent Document 5] Panasonic Corporation, "4K-compatible TV AX800/AX800F Series (LCD)", [online], [retrieved February 24, 2015], Internet <URL: http://panasonic.jp/viera/products/ax800_800f/>

With the method of connecting an external shotgun microphone, sound from the direction in which the microphone is pointed can be captured with higher sensitivity, but noise in that direction is picked up with higher sensitivity as well. Even a shotgun microphone has a limited pickup range: the S/N ratio degrades when capturing the voice of a speaker who is, for example, about 3 meters away from the microphone.

With the method of connecting a headset over Bluetooth, sound far from the master terminal can be captured, but sound entering the master terminal's own microphone is cut off. In use cases that need sound from a wide area, such as video recording or audio conferencing, the required range of sound therefore cannot be captured. The same applies when sound is picked up with the microphone of a TV remote control: only the remote control's audio is used, and the master terminal's own microphone pickup is cut off.

If the signals picked up by the master terminal and the slave terminal were simply mixed without regard to time synchronization, the transmission delay of the communication network or of Bluetooth could cause the sound to be heard twice, degrading the sound quality.

In view of the above, an object of the present invention is to provide a communication technique that can synchronize communication between terminals placed over a wide area by adjusting timing based on the packet transmission time between the master terminal and the slave terminal.

To solve the above problem, the communication system of the present invention includes a master terminal and at least one slave terminal. The slave terminal includes a packet transmission unit that stores a signal to be communicated in a packet and transmits the packet to the master terminal. The master terminal includes: a packet reception unit that receives the packet from the slave terminal and extracts the signal; a delay amount determination unit that measures the packet transmission time for each slave terminal and takes the arithmetic mean of the packet transmission times as the delay amount of that slave terminal; and a delay buffer processing unit that, for each slave terminal, delays the signal by an amount corresponding to that slave terminal's delay amount to generate a delayed signal.

According to the communication technique of the present invention, communication between terminals placed over a wide area can be synchronized by adjusting timing based on the packet transmission time between the master terminal and the slave terminal. When the invention is applied to services that use audio, a master terminal that provides the service and slave terminals that have a sound pickup function are connected over a network, which increases the number of microphones and allows sound to be picked up over a wide area. When combined with a video recording service, a master terminal equipped with a mixing function can process audio from multiple microphones in real time on a single screen. A signal that captures sound over a wide area can improve the audio quality of content when used with, for example, an audio conferencing system, video content production, or a video distribution service.

FIG. 1 is a diagram illustrating the functional configuration of the communication system of the first embodiment.
FIG. 2 is a diagram illustrating the functional configuration of the audio processing unit.
FIG. 3 is a diagram illustrating the processing flow of the communication method of the first embodiment.
FIG. 4 is a diagram illustrating the processing flow of the connection control unit.
FIG. 5 is a diagram illustrating the display method of the GUI control unit of the first embodiment.
FIG. 6 is a diagram illustrating the functional configuration of the communication system of the second embodiment.
FIG. 7 is a diagram illustrating the display method of the GUI control unit of the second embodiment.

Embodiments of the present invention are described in detail below. In the drawings, components that have the same function are given the same reference numerals, and duplicate description is omitted.

[First embodiment]
The first embodiment is a communication system that connects a master terminal, which is a digital terminal, with slave terminals and performs desired audio processing on the audio acquired by each terminal. This embodiment describes an example in which a slave terminal acquires audio and transmits it to the master terminal, and the master terminal performs speech enhancement that emphasizes the voice of a target speaker, using both the audio from the slave terminal and the audio it acquired itself.

As illustrated in FIG. 1, the communication system of this embodiment includes one master terminal 1 and n (≥ 1) slave terminals 2_1, …, 2_n. The master terminal 1 and the slave terminals 2_1, …, 2_n are connected so that they can communicate via a communication network 9. The communication network 9 is a packet-switched network configured so that the connected devices can communicate with one another; it may use, for example, a wireless LAN (Local Area Network) such as Wi-Fi, or short-range wireless communication such as NFC (Near Field Communication) or Bluetooth. The master terminal 1 includes a microphone M_0, a connection control unit 10, a GUI control unit 11, and an audio processing unit 12. Each slave terminal 2_k (k ∈ {1, …, n}) includes a microphone M_k, a connection control unit 20_k, an A/D conversion unit 21_k, an encoding processing unit 22_k, and a packet transmission unit 23_k. Although FIG. 1 shows the microphones M_0, …, M_n as built into the terminals, they may instead be external microphones connected through any of the terminals' interfaces.

As illustrated in FIG. 2, the audio processing unit 12 of the master terminal 1 includes an A/D conversion unit 120, n packet reception units 121_1, …, 121_n, n decoding processing units 122_1, …, 122_n, n microphone buffer processing units 123_1, …, 123_n, a delay amount determination unit 124, n+1 delay buffer processing units 125_0, …, 125_n, n audio delay amount estimation units 126_1, …, 126_n, a speaker enhancement processing unit 127, n+1 noise removal units 128_0, …, 128_n, and a mixing unit 129.

Each of the master terminal 1 and the slave terminals 2 is, for example, a special device configured by loading a special program into a known or dedicated computer that has a central processing unit (CPU), a main storage device (RAM: Random Access Memory), and so on. Each terminal executes each process under the control of the CPU, for example. Data input to a terminal and data obtained in each process are stored, for example, in the main storage device, and are read out to the CPU as needed for use in other processing. At least part of each processing unit of each terminal may be implemented in hardware such as an integrated circuit.

Specifically, the master terminal 1 is an information processing device with an audio acquisition function and a wireless communication function, such as a personal computer, smartphone, or tablet. The slave terminal 2 may likewise be an information processing device with an audio acquisition function and a wireless communication function, such as a personal computer, smartphone, or tablet, or it may be a dedicated digital device with an audio acquisition function, such as the headsets and remote controls used in the conventional techniques.

The processing procedure of the communication method of the first embodiment is described with reference to FIG. 3. The procedure is described below for an arbitrary single slave terminal 2_k; when multiple slave terminals are used, the same processing is performed for each of them.

In steps S10 and S20, the connection control unit 10 of the master terminal 1 and the connection control unit 20_k of the slave terminal 2_k establish a communication connection between the master terminal 1 and the slave terminal 2_k. Possible connection methods include IP (Internet Protocol) communication over a network such as a wireless or wired LAN, short-range wireless communication such as Bluetooth or NFC, and peer-to-peer communication in which terminals communicate directly, such as Wi-Fi Direct (registered trademark) or Multipeer Connectivity. For connection control of IP communication over a wireless or wired LAN, the method used in the known technique described in Reference 1 can be used, for example.
[Reference 1] Nippon Telegraph and Telephone Corporation, "Easily turn a smartphone you already own into a wireless microphone: development of 'amplitude spectrum beamformer technology' that lets smartphones serve as extension microphones for videophones and video conferencing", [online], [retrieved February 24, 2015], Internet <URL: http://www.ntt.co.jp/news2014/1401/140129a.html>

A connection control sequence using NFC and Wi-Fi Direct is described with reference to FIG. 4. First, the slave terminal 2_k initializes Wi-Fi Direct; during this initialization it acquires its own MAC (Media Access Control) address. At the same time, the master terminal 1 initializes Wi-Fi Direct; during this initialization it creates a Wi-Fi Direct group and generates the Wi-Fi Direct group owner address.

Next, network connection information such as the MAC address is acquired for each of the slave terminals 2_1, …, 2_n that connect to the same network as the master terminal 1. One way to acquire this information is as follows. Short-range wireless communication such as NFC is performed between the master terminal 1 and the slave terminal 2_k. At this point the user is asked to approve the information transfer, although the information may also be sent without approval. To ask the user, for example, a dialog announcing the information transfer is displayed on the screen of the master terminal 1 and the user selects an approve button. After approval, the master terminal 1 uses short-range wireless communication such as NFC to notify the slave terminal 2_k of the MAC address, group owner address, codec type, port number, and other information needed for the connection. If the connection is not possible, an error code such as "-1" is sent to the slave terminal 2_k.

The slave terminal 2_k makes a Wi-Fi Direct connection to the group owner address received from the master terminal 1, and notifies the master terminal 1 of the IP address and MAC address it uses for Wi-Fi Direct. The master terminal 1 waits for a fixed time to receive the IP address and MAC address from the slave terminal 2_k. For example, it waits 10 seconds; if the addresses have not arrived by then, it refuses the connection and initializes Wi-Fi Direct again.

Once the connection is established, the slave terminal 2_k acquires the group owner address and connects to it over Wi-Fi Direct. Connecting to the group owner address causes an update of the Wi-Fi Direct group's connected-device list to be sent to the master terminal 1 and the slave terminal 2_k. The master terminal 1 opens a receive port for receiving information from the slave terminal 2_k using a communication protocol such as UDP. The receive port number may be chosen arbitrarily; for example, ports from 18081 upward are opened. The slave terminal 2_k then starts sending audio packets, using a communication protocol such as UDP, to the specified receive port at the Wi-Fi Direct group owner address of the master terminal 1.
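The receive-port step can be sketched as follows. This is a minimal illustration, assuming plain UDP sockets on the loopback interface rather than an actual Wi-Fi Direct link; the function names `open_receive_port` and `send_voice_packet` are hypothetical, and only the starting port number 18081 comes from the text.

```python
import socket

BASE_PORT = 18081  # example starting port mentioned in the text


def open_receive_port(host="127.0.0.1", base_port=BASE_PORT, max_tries=100):
    """Master side: bind a UDP socket to the first free port >= base_port."""
    for port in range(base_port, base_port + max_tries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError:
            sock.close()  # port already in use, try the next one
    raise RuntimeError("no free receive port found")


def send_voice_packet(payload, host, port):
    """Slave side: send one datagram to the master's receive port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(payload, (host, port))
```

In this sketch the master would pass the chosen port number to the slave during the NFC exchange, and then read datagrams from the returned socket with `recvfrom`.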

In step S120, the A/D conversion unit 120 of the master terminal 1 samples the sound observed with the microphone M_0 connected to the master terminal 1 and obtains a digital observation signal. The digital observation signal is sent to the delay buffer processing unit 125_0 and the delay amount determination unit 124.

In step S21, the A/D conversion unit 21_k of the slave terminal 2_k samples the sound observed with the microphone M_k connected to the slave terminal 2_k and obtains a digital observation signal. The digital observation signal is sent to the encoding processing unit 22_k.

In step S22, the encoding processing unit 22_k of the slave terminal 2_k receives the observation signal from the A/D conversion unit 21_k and compresses it with a codec, for example Opus or SILK. The codec information and the encoded, compressed audio signal are sent to the packet transmission unit 23_k. When an uncompressed PCM (Pulse Code Modulation) signal is transmitted without a codec, the encoding processing unit 22_k performs no processing.

In step S23, the packet transmission unit 23_k of the slave terminal 2_k receives the compressed observation signal and the codec information from the encoding processing unit 22_k, stores the codec information in the packet header and the observation signal in the packet payload, and transmits the packet to the packet reception unit 121_k of the master terminal 1.
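The header and payload split can be illustrated with a small packing routine. The text only says that the codec information goes in the header and the observation signal in the payload; the concrete byte layout below (codec ID, sequence number, payload length), the codec numbering, and the function names are assumptions made for this sketch.

```python
import struct

# Hypothetical codec numbering; the text names PCM, Opus and SILK only.
CODEC_IDS = {"pcm": 0, "opus": 1, "silk": 2}
CODEC_NAMES = {v: k for k, v in CODEC_IDS.items()}

# Assumed header: 1-byte codec ID, 4-byte sequence number, 2-byte payload
# length, all in network byte order with no padding (7 bytes total).
HEADER = struct.Struct("!BIH")


def pack_packet(codec, seq, payload):
    """Slave side: prepend the header to the (possibly compressed) signal."""
    return HEADER.pack(CODEC_IDS[codec], seq, len(payload)) + payload


def unpack_packet(packet):
    """Master side: split a packet into codec name, sequence number, signal."""
    codec_id, seq, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return CODEC_NAMES[codec_id], seq, payload
```

Carrying the codec in every packet lets the receiver choose the decoder per packet, which matches the description of the decoding step that follows.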

In step S121, the packet reception unit 121_k of the master terminal 1 receives the packet sent from the packet transmission unit 23_k of the slave terminal 2_k and extracts the codec information stored in the packet header and the observation signal stored in the payload. The extracted codec information and observation signal are sent to the decoding processing unit 122_k.

In step S122, the decoding processing unit 122_k of the master terminal 1 decodes the observation signal using the codec information received from the packet reception unit 121_k. If an uncompressed PCM signal was received, the decoding processing unit 122_k performs no processing; if a codec was applied, the observation signal is decoded according to the codec information. The decoded observation signal is sent to the microphone buffer processing unit 123_k.

The microphone buffer processing unit 123_k receives the observation signal from the decoding processing unit 122_k and buffers an amount of the observation signal equal to the signal length used as the unit of subsequent processing. The buffering may be fixed-length, or the buffer length may be changed dynamically based on the packet transmission time described later, as in a dynamic delay buffer. The buffered observation signal is sent to the delay buffer processing unit 125_k and the delay amount determination unit 124.

In step S124, the delay amount determination unit 124 of the master terminal 1 determines the delay amount to apply to each observation signal in order to synchronize the observation signals acquired from the terminals. Because the master terminal 1 and the slave terminal 2_k exchange observation signals over a packet-based communication network, the delay due to packet transmission time must be taken into account. When the master terminal 1 and the slave terminals 2_k are far apart and the speaker's voice cannot be observed at some of the slave terminals 2_k, the delays cannot be aligned by using the commonly observed audio as a cue. Therefore, the packet transmission time between the master terminal 1 and the slave terminal 2_k is measured for each slave terminal 2_k, and the delay amount needed to align the observation signals acquired at the terminals is obtained from that packet transmission time.

The packet transmission time between the master terminal 1 and the slave terminal 2_k is obtained by measuring the round-trip time (RTT). The master terminal 1 sends a packet for measuring the round-trip time (hereinafter, an RTT measurement packet) to the slave terminal 2_k. After receiving the RTT measurement packet, the slave terminal 2_k sends a reply packet for measuring the round-trip time (hereinafter, an RTT measurement reply packet) to the master terminal 1. The master terminal 1 computes the difference between the reception time of the RTT measurement reply packet and the transmission time of the RTT measurement packet and takes it as the round-trip time T_k between the master terminal 1 and the slave terminal 2_k. Because the round-trip time covers both directions, the packet transmission time from the slave terminal 2_k to the master terminal 1 is t_k(n) = T_k/2. Here, t_k(n) is the time taken to transmit a packet from the slave terminal 2_k, and n is the order in which packets arrive. The round-trip time is measured periodically at an arbitrary interval, for example every 500 milliseconds.
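The RTT halving can be sketched as below. The function names and the injectable `exchange` and `clock` parameters are hypothetical conveniences for illustration; the sketch also assumes, as the text implicitly does, that the outbound and return paths take equal time.

```python
import time


def one_way_time(sent_at, received_at):
    """Return t_k(n) = T_k / 2 from the send and receive timestamps."""
    rtt = received_at - sent_at  # round-trip time T_k
    if rtt < 0:
        raise ValueError("reply received before request was sent")
    return rtt / 2.0


def measure_transmission_time(exchange, clock=time.monotonic):
    """Time one request/reply exchange and return the one-way estimate.

    `exchange` is any callable that sends the RTT measurement packet and
    blocks until the RTT measurement reply packet arrives.
    """
    sent_at = clock()
    exchange()
    return one_way_time(sent_at, clock())
```

A monotonic clock is used by default so that wall-clock adjustments on the master terminal cannot produce negative or inflated round-trip times.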

Next, the measured packet transmission time t_k(n) is used to calculate the delay amount for the observation signal received from each slave terminal 2_k. When packets are sent over a communication network, the packet transmission time between source and destination fluctuates. The delay amount is determined in a way that tolerates this fluctuation: the fluctuation is assumed to follow a Gaussian distribution, and the delay time is estimated robustly from the packet transmission times. Let τ_0k be the true time required to transmit a packet from the slave terminal 2_k to the master terminal 1, and let the fluctuation of the packet transmission time be the noise ε(n). If this noise follows a Gaussian distribution with mean 0 and variance σ_0, that is, ε(n) ~ N(0, σ_0), the packet transmission time can be written as t_k(n) = τ_0k + ε(n). In other words, the fluctuation is the difference between the measured and the true packet transmission time. The maximum likelihood estimate of the true packet transmission time τ_0k is obtained as the arithmetic mean E[t_k(n)] of the measured packet transmission times t_k(n).
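The step from the Gaussian noise model to the arithmetic mean can be written out as follows. This derivation is a standard result added for illustration; it assumes the N measurements are independent, and the symbol N (the number of measurements) does not appear in the original text.

```latex
\begin{align}
t_k(n) &= \tau_{0k} + \varepsilon(n), \qquad \varepsilon(n) \sim \mathcal{N}(0, \sigma_0) \\
\log p\bigl(t_k(1), \dots, t_k(N) \mid \tau_{0k}\bigr)
  &= -\frac{N}{2}\log(2\pi\sigma_0)
     - \frac{1}{2\sigma_0}\sum_{n=1}^{N}\bigl(t_k(n) - \tau_{0k}\bigr)^2 \\
\frac{\partial \log p}{\partial \tau_{0k}}
  &= \frac{1}{\sigma_0}\sum_{n=1}^{N}\bigl(t_k(n) - \tau_{0k}\bigr) = 0
  \quad\Longrightarrow\quad
  \hat{\tau}_{0k} = \frac{1}{N}\sum_{n=1}^{N} t_k(n)
\end{align}
```

The estimate does not depend on the variance σ_0, so the delay amount can be computed without knowing how strongly the network jitters.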

However, the packet transmission time can become much longer because of various factors, such as the transmission path or buffering in network equipment such as a wireless LAN router. Such large delays distort the arithmetic mean E[t_k(n)]. The estimate is therefore computed after removing outliers. Assuming that the fluctuation of the packet transmission time follows a Gaussian distribution, the mean M_T_k and the variance V_T_k of the distribution are obtained from the packet transmission time t_k(n) by the following equations.

Here, E[·] is an expectation operation that computes the average over a fixed number of packets, and τ_k is the audio delay amount of the slave terminal 2_k. The audio delay amount τ_k is obtained by the audio delay amount estimation unit 126_k described later. If τ_k has never been updated, it is given the initial value τ_k = 0.
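The equations themselves are not reproduced in this text, so the following is only a sketch of one plausible reading: a mean and variance taken over the most recent fixed number of packets. The class name `WindowedStats`, the window size of 20, and in particular the additive way τ_k is applied to each sample are assumptions made for illustration; the sign and placement of τ_k in the original equations cannot be recovered from the text.

```python
from collections import deque
from statistics import fmean, pvariance


class WindowedStats:
    """Mean and variance of packet transmission times over a fixed packet count."""

    def __init__(self, window: int = 20):
        # Only the latest `window` samples are kept (the "fixed number of packets").
        self.samples = deque(maxlen=window)

    def add(self, t_kn: float, tau_k: float = 0.0) -> None:
        # ASSUMPTION: the audio delay amount is applied as an additive correction.
        self.samples.append(t_kn + tau_k)

    def mean(self) -> float:        # plays the role of M_T_k
        return fmean(self.samples)

    def variance(self) -> float:    # plays the role of V_T_k
        return pvariance(self.samples)
```

With τ_k left at its initial value of 0, this reduces to a plain windowed mean and variance of t_k(n).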

Using the obtained mean M_T_k and variance V_T_k, the packet transmission times are screened so that sampling excludes outliers of the distribution. First, the probability q of observing the packet transmission time t_k(n) is calculated by the following equation.

Here, G(·) denotes the Gaussian distribution. For a packet transmission time t_k(n) that falls within the 5% rejection region of a two-sided test on the probability q, the mean M_T_k and the variance V_T_k of the arrival time are not updated.
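A minimal sketch of this screening step, using Python's standard `statistics.NormalDist`. The function names are hypothetical, and reading the "two-sided test" as a two-sided tail probability compared against 0.05 is an interpretation, since the original equation is not reproduced in this text. The variance must be greater than zero.

```python
from statistics import NormalDist


def gaussian_q(t: float, mean: float, variance: float) -> float:
    """q = G(t | M_T_k, V_T_k): Gaussian density at the observed time."""
    return NormalDist(mean, variance ** 0.5).pdf(t)


def in_rejection_region(t: float, mean: float, variance: float,
                        level: float = 0.05) -> bool:
    """True if t lies in the two-sided rejection region at the given level."""
    dist = NormalDist(mean, variance ** 0.5)
    # Probability mass further from the mean than t, on one side.
    tail = 1.0 - dist.cdf(mean + abs(t - mean))
    return 2.0 * tail < level
```

Samples for which `in_rejection_region` returns True would simply be skipped when the mean and variance are updated.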

Alternatively, an evaluation value z is calculated from the mean M_T_k and the variance V_T_k by the following equation.

Here, N is the number of packet transmission times adopted into the average so far. When β is 1, the tolerance around the mean M_T_k narrows as the number of observations N grows, and the estimate converges to a particular arrival time. Setting β = 1/N, so that the denominator becomes √V_T_k, keeps the variance term from changing as observations accumulate, so outliers can also be removed using the plain mean and variance. β takes a value in the range 0 < β < 1. When the evaluation value z exceeds the threshold r, the arrival time is rejected. For example, with r = 1.96, the n-th packet transmission time t_k(n) is rejected if z < -r or z > r. Alternatively, the threshold may be set on the probability value q = G(t_k(n) | M_T_k, V_T_k) of the packet transmission time t_k(n). For example, assuming a spread of ±10 milliseconds and setting r = 0.0058, the mean M_T_k and the variance V_T_k are updated with every packet transmission time t_k(n) except those for which q < r.
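The equation for z is not reproduced in this text. The sketch below assumes the form z = (t_k(n) - M_T_k) / √(V_T_k / (βN)), chosen only because it matches both properties stated above: with β = 1 the tolerance shrinks as N grows, and with β = 1/N the denominator reduces to √V_T_k. It also shows that the example threshold r = 0.0058 is approximately the Gaussian density at 1.96 standard deviations if "±10 milliseconds" is read as a standard deviation of 10 ms. Function names are hypothetical.

```python
import math


def evaluation_z(t: float, mean: float, variance: float,
                 n_used: int, beta: float) -> float:
    # ASSUMED form of the missing equation; see the lead-in above.
    return (t - mean) / math.sqrt(variance / (beta * n_used))


def is_outlier(z: float, r: float = 1.96) -> bool:
    """Reject when z < -r or z > r."""
    return z < -r or z > r


def density_threshold(sigma: float, r: float = 1.96) -> float:
    """Gaussian density at r standard deviations from the mean."""
    return math.exp(-0.5 * r * r) / (sigma * math.sqrt(2.0 * math.pi))
```

Under these assumptions, `density_threshold(10.0)` is about 0.0058, so the z-based and q-based thresholds in the text's examples describe the same cut.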

As a result, a stable mean and variance can be obtained even when packet transmission times fluctuate widely, and the delay can be determined robustly against that fluctuation, which suppresses discontinuities in the recorded sound.

The delay amount determination unit 124 updates the mean M_T_k and the variance V_T_k using the packet transmission times t_k that remain after outliers are excluded. The update can be computed at fixed intervals or sequentially. The observed packet transmission times may be recorded and the mean and variance recomputed each time a new one is observed, or the values may be updated by sequential calculation. The sequential update uses the following equations.

Here, α is a positive real number in the range 0 ≤ α < 1, for example 0.1. M'_T_k and V'_T_k are the updated values of M_T_k and V_T_k, respectively. The mean M_T_k of the observed packet transmission times is the delay that the delay buffer processing unit 125_k applies to the observation signal from the slave terminal 2_k.
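The update equations are not reproduced in this text. A common sequential form that fits the description is an exponentially weighted update, sketched here. Both the weighting (α applied to the new sample, 1 - α to the previous estimate) and the variance recursion are assumptions; the original may weight the terms the other way around.

```python
from typing import Tuple


def sequential_update(mean: float, variance: float, t: float,
                      alpha: float = 0.1) -> Tuple[float, float]:
    """Return (M'_T_k, V'_T_k) after observing one accepted sample t.

    ASSUMPTION: exponentially weighted form with weight alpha on the new sample.
    """
    new_mean = (1.0 - alpha) * mean + alpha * t
    new_variance = (1.0 - alpha) * variance + alpha * (t - new_mean) ** 2
    return new_mean, new_variance
```

Under this form, α = 0.1 lets a single sample move the mean by a tenth of its deviation, so one late packet that passes the outlier test still has limited effect.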

On receiving the audio delay amount τ_k from the audio delay amount estimation unit 126_k, the delay amount determination unit 124 updates the mean packet transmission time M_T_k. The updated mean M'_T_k is sent to the delay buffer processing unit 125_k. To update M_T_k, let M_max be the largest of the mean packet transmission times M_T_k, compute the difference between M_max and each M_T_k, and then correct it by τ_k. The slave terminal with the largest delay thus receives zero delay, while slave terminals with smaller delays receive the difference, so the delays of all slave terminals are aligned. The mean M_T_k is updated by the following equation.
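The alignment idea can be sketched as below. The equation is not reproduced in this text, so the sketch covers only the M_max - M_T_k part that the paragraph states explicitly; the τ_k correction is left out because its sign and form are not recoverable here. The function name and the use of a dictionary keyed by terminal name are illustrative choices.

```python
from typing import Dict


def alignment_delays(mean_times: Dict[str, float]) -> Dict[str, float]:
    """Map each terminal's mean transmission time to the extra delay it needs.

    The slowest terminal gets 0; every other terminal gets M_max - M_T_k.
    The tau_k correction described in the text is intentionally omitted.
    """
    m_max = max(mean_times.values())
    return {name: m_max - m for name, m in mean_times.items()}
```

For example, means of 30, 80, and 55 ms yield added delays of 50, 0, and 25 ms, so all three streams line up with the slowest one.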

The delay buffer processing unit 125_k applies a delay corresponding to the mean packet transmission time M'_T_k received from the delay amount determination unit 124 to the observation signal from the slave terminal 2_k. The delayed observation signal is sent to the audio delay amount estimation unit 126_k.
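One way to realize such a delay buffer is a simple FIFO delay line, sketched below. The class name, the helper `ms_to_samples`, and the 16 kHz sample rate are assumptions for illustration; the text does not specify a sample rate or a buffer structure.

```python
from collections import deque


def ms_to_samples(delay_ms: float, sample_rate_hz: int = 16000) -> int:
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000.0)


class DelayLine:
    """Delays a sample stream by a fixed number of samples, padding with silence."""

    def __init__(self, delay_samples: int):
        self.buf = deque([0.0] * delay_samples)

    def process(self, sample: float) -> float:
        self.buf.append(sample)      # newest sample enters at the back
        return self.buf.popleft()    # oldest sample leaves at the front
```

A real implementation would also need to resize the buffer when M'_T_k changes, which this sketch does not attempt.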

In step S126, the audio delay amount estimation unit 126_k of the master terminal 1 estimates an audio delay amount that indicates the relative offset of the audio contained in the observation signals. When the same audio appears in the observation signals from several terminals, the waveform information is used to calculate the delay of the audio between terminals, and that audio delay amount is used to re-correct the packet transmission time. This corrects errors in the estimated packet transmission delay. It also resolves the problem that, when the same sound reaches the microphones of several slave terminals 2 and the master terminal 1 and the signals are mixed without correcting the delay, the sound is heard doubled.

To determine whether an observation signal contains the same audio as other observation signals, the audio delay amount estimation unit 126_k performs signal detection on each observation signal. Let x_k(t) be the observation signal from each slave terminal 2 and from the master terminal 1, where t is the sample index. A VAD (voice activity detection) method may be used for signal detection, but a simple signal processing method based on noise-level estimation and a threshold is described here. First, the observation signal is smoothed over a few tens of milliseconds. Smoothing is performed by the following equation.

Here, β is a constant satisfying β < 1. Defining the time constant as processing interval / (1 - β), the smoothed signal used for noise estimation has a time constant of 150 milliseconds. A second smoothed signal with a shorter time constant of 40 milliseconds is prepared for signal comparison. Next, the noise signal power is updated as follows.

This N(t) is multiplied by a constant α to give the noise threshold. The constant α is, for example, 2.5. If the signal with the 40-millisecond time constant exceeds this threshold, time t is judged to be a time at which an audio signal is observed. If an audio signal is judged to have been observed at fewer than two slave terminals 2, the audio delay amount τ_k is not estimated. If an audio signal is judged to have been observed at two or more slave terminals 2, the audio delay amount τ_k is estimated between the detected audio signals.
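The smoothing and noise-update equations are not reproduced in this text, so the following is a sketch under stated assumptions rather than the patent's method. It assumes first-order recursive smoothing of per-frame power, a 10 ms processing interval, and a noise estimate that follows the slow (150 ms) signal only while no audio is detected. Only the relation β = 1 - interval / time constant, the two time constants, and α = 2.5 come from the text. The input list must not be empty.

```python
from typing import List


def beta_for(time_constant_ms: float, interval_ms: float) -> float:
    """Invert time constant = interval / (1 - beta)."""
    return 1.0 - interval_ms / time_constant_ms


def detect_audio(frame_powers: List[float], interval_ms: float = 10.0,
                 alpha: float = 2.5) -> List[bool]:
    """Flag frames whose fast-smoothed power exceeds alpha times the noise power."""
    beta_slow = beta_for(150.0, interval_ms)   # used for noise estimation
    beta_fast = beta_for(40.0, interval_ms)    # used for signal comparison
    slow = fast = noise = frame_powers[0]
    flags = []
    for p in frame_powers:
        slow = beta_slow * slow + (1.0 - beta_slow) * p
        fast = beta_fast * fast + (1.0 - beta_fast) * p
        active = fast > alpha * noise
        if not active:
            noise = slow   # ASSUMPTION: noise power N(t) is frozen during audio
        flags.append(active)
    return flags
```

Running this per terminal and counting how many terminals are flagged at the same time gives the "two or more terminals" condition that gates the delay estimation.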

1. When an audio signal is judged to have been observed at the master terminal 1, the observation signal of the master terminal 1 is used as the reference signal x_kM(n), and its cross-correlation with the observation signal x_k(n) of each slave terminal 2_k judged to have observed an audio signal is obtained by the following equation. The number of samples m that maximizes the cross-correlation is taken as the relative sound transmission time offset between the master terminal 1, used as the reference, and the slave terminal 2_k, used as the comparison target.

The obtained audio delay amount τ_k is used to correct the delay M_T_k applied to the observation signal of the slave terminal 2_k. Because the delay is corrected in the delay amount determination unit 124, the audio delay amount τ_k is passed to the delay amount determination unit 124.

2. When no audio signal is observed at the master terminal 1 but audio signals are judged to have been observed at multiple slave terminals 2, any one of those slave terminals 2 is selected as the reference. Let the selected slave terminal 2 be the slave terminal 2_k'. The cross-correlation between the observation signal x_k'(n) of the slave terminal 2_k' and the observation signal x_k(n) of each other slave terminal 2_k that observed an audio signal is obtained by the following equation. The number of samples m that maximizes the cross-correlation is taken as the relative sound transmission time offset between the slave terminal 2_k', used as the reference, and the slave terminal 2_k, used as the comparison target.
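Both cases above reduce to finding the lag that maximizes the cross-correlation between a reference signal and a comparison signal. A minimal pure-Python sketch follows; the function name, the `max_lag` search bound, and the sign convention (a positive result means the target lags the reference) are choices made here, since the original equation is not reproduced in this text.

```python
from typing import Sequence


def best_lag(reference: Sequence[float], target: Sequence[float],
             max_lag: int) -> int:
    """Return the lag m in [-max_lag, max_lag] maximizing sum_n ref[n] * target[n + m]."""

    def correlation(m: int) -> float:
        total = 0.0
        for n, r in enumerate(reference):
            j = n + m
            if 0 <= j < len(target):   # ignore samples that fall outside the target
                total += r * target[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=correlation)
```

Dividing the returned sample count by the sample rate gives the offset in seconds. For long signals an FFT-based correlation would be much faster than this direct O(N × max_lag) loop.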

In step S127, the speaker emphasis unit 127 of the master terminal 1 applies a speech enhancement method for asynchronous distributed microphones. The speech enhancement may use, for example, the asynchronous microphone array processing described in Reference 2. Alternatively, as described in Reference 3, microphone array processing may be performed that aligns the arrival time differences of the PCM signals and emphasizes only sound from a specific direction, or, as described in Reference 4, noise may be suppressed by spectral subtraction, which subtracts the noise spectral components on the frequency spectrum so that only the sound of a specific speaker remains. The speaker-emphasized audio signal is sent to the mixing unit 129.
[Reference 2] Tatsuya Kako, Kazunori Kobayashi, Hitoshi Omuro, "Proposal of an Amplitude Spectrum Beamformer for Asynchronous Distributed Microphone Arrays", Proceedings of the 2013 Spring Meeting of the Acoustical Society of Japan, 1-P-5, 2013
[Reference 3] Futoshi Asano, "Sound Array Signal Processing", Corona Publishing, 2011
[Reference 4] Ryo Mukai et al., "Residual Noise Removal after Sound Source Separation by Non-stationary Spectral Subtraction", Acoustical Society of Japan Autumn Meeting, 2010

In step S128, the noise removal unit 128_k of the master terminal 1 receives the delayed observation signal from the delay buffer processing unit 125_k and performs noise reduction of stationary noise on it. Noise reduction can be implemented, for example, using the spectral subtraction described in Reference 5. The noise-removed observation signal is sent to the mixing unit 129.
[Reference 5] Steven Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction", IEEE Trans. Acoust. Speech and Signal Processing, Vol. ASSP-27, pp. 113-120, 1979

In step S129, the mixing unit 129 of the master terminal 1 receives audio signals from the noise removal units 128_0, …, 128_n and the speaker emphasis unit 127, and receives the gain for each signal from the GUI control unit 11. Each audio signal is multiplied by a coefficient A_k that scales its volume according to the received gain, and the power is calculated from the multiplied signal. The power is the mean of the squared multiplied signal over a unit interval. The unit interval is, for example, 500 milliseconds, so the power is calculated every 500 milliseconds. The calculated power values are sent to the GUI control unit 11. The multiplied audio signals are then mixed: the sum of the audio signals is computed to obtain the mixed signal. The mixed signal is multiplied by the coefficient A_mix obtained from the GUI control unit 11 to give the processed audio signal, which is output. The output can be connected to a video conference system, a speech recognition system, the audio input of a video recording, and so on, to provide the desired service.
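The mixing step can be sketched as follows. The function names are hypothetical, and the sketch assumes all input signals have the same length; the block size that corresponds to 500 milliseconds depends on the sample rate, which the text does not specify (8000 samples at an assumed 16 kHz).

```python
from typing import List, Sequence, Tuple


def block_power(signal: Sequence[float], block: int) -> List[float]:
    """Mean of the squared samples over consecutive blocks of `block` samples."""
    powers = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        powers.append(sum(s * s for s in chunk) / len(chunk))
    return powers


def mix(signals: Sequence[Sequence[float]], gains: Sequence[float],
        a_mix: float) -> Tuple[List[List[float]], List[float]]:
    """Scale each signal by its gain A_k, sum them, then scale the sum by A_mix."""
    scaled = [[g * s for s in sig] for sig, g in zip(signals, gains)]
    mixed = [a_mix * sum(column) for column in zip(*scaled)]
    return scaled, mixed
```

The per-signal powers for the level meters would be computed by calling `block_power` on each entry of `scaled`, that is, after the gain has been applied, as the text describes.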

The GUI control unit 11 adjusts the volume for the mixing process. FIG. 5 shows a display image of the GUI control unit 11. The GUI control unit 11 has a slide bar 101 for adjusting the gain of each audio signal, a button 102 for muting each audio signal, a button that indicates the connection state of each slave terminal by color or the like, a level meter 103 that shows the gain value obtained from the microphone of each slave terminal, a button 104 for controlling recording, and a status display area 105 that shows the connection state of the slave terminals and similar information. The value of the level meter bar is determined from the power value received from the mixing unit 129. The gain A_k that the mixing unit 129 applies to each audio signal is obtained from the gain adjustment slide bar 101. The gain may be a continuous value or a discrete value. For example, with 15 discrete steps, the maximum value is +7, which corresponds to a gain of 11.2 times (21 dB); the middle value is 0, which corresponds to 1.0 times (0 dB); and the minimum value is -7, which corresponds to 0.0 times (-∞ dB). When the mute button is on, the gain A_k is set to 0.0 times; when the mute button is off, the gain obtained from the slide bar is used as A_k. This gain A_k is passed to the mixing unit 129. A gain value A_mix for the processed audio signal after mixing is likewise passed from a slide bar.
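The 15-step example can be sketched as a mapping from slider position to linear gain. The text gives only three points (+7 is 21 dB, 0 is 0 dB, and -7 is -∞ dB), so the 3 dB spacing of the intermediate steps used here is an assumption, as is the function name.

```python
def slider_to_gain(step: int, muted: bool = False,
                   db_per_step: float = 3.0, lowest: int = -7) -> float:
    """Convert a slider step in [-7, +7] to a linear amplitude gain A_k."""
    if muted or step <= lowest:
        return 0.0   # mute, or the bottom step, which the text maps to -inf dB
    # Amplitude gain from decibels: 10^(dB / 20).
    return 10.0 ** (step * db_per_step / 20.0)
```

With these defaults, step +7 gives 10^(21/20), which is about 11.2, matching the factor quoted in the text.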

[Second Embodiment]
The second embodiment is a communication system in which the master terminal has a video recording function, and distant sound picked up using multiple slave terminals is added to the video recorded by the master terminal and output. The slave terminals in this embodiment are, for example, smartphones. Audio acquired by a smartphone is transmitted to the master terminal, which performs speech enhancement to emphasize the target speaker's voice in the acquired audio signals. The system also lets the user freely mix the audio signals acquired by the individual microphones using a smartphone at hand.

As illustrated in FIG. 6, the master terminal 3 of this embodiment includes, in addition to the microphone M_0, the connection control unit 10, the GUI control unit 11, and the audio processing unit 12, a video camera V, a video processing unit 13, a buffer processing unit 14, and a video output unit 15. The video camera V is a device that has light-receiving elements and can capture video. It may be built into the master terminal 3 as shown in FIG. 6, or it may be a peripheral device such as a webcam connected to the master terminal 3 through any of various interfaces.

The video processing unit 13 samples the video captured by the video camera V and converts it into a digital video signal. The resulting video signal is sent to the buffer processing unit 14.

The buffer processing unit 14 delays the video signal output by the video processing unit 13 using the delay amounts calculated by the delay amount determination unit 124 of the audio processing unit 12. It receives the delay amounts M_T_1, …, M_T_n of the slave terminals 2_1, …, 2_n from the delay amount determination unit 124 and, based on them, buffers frames of the video signal to delay it. The delay applied is the largest of the delay amounts M_T_0, …, M_T_n. This removes the frame offset between the video signal and the audio signal. The delayed video signal is sent to the video output unit 15.
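Translating the largest audio delay into a number of video frames to buffer can be sketched as below. The function name and the 30 frames-per-second default are assumptions; the text does not state a frame rate or a rounding rule, and rounding to the nearest frame is used here.

```python
from typing import Sequence


def video_delay_frames(delays_ms: Sequence[float], fps: float = 30.0) -> int:
    """Number of video frames to buffer so that video matches the slowest audio path."""
    largest = max(delays_ms)                # largest delay amount among the terminals
    return round(largest * fps / 1000.0)    # nearest whole frame
```

For delays of 20, 120, and 66 ms at 30 fps, this buffers 4 frames (120 ms is 3.6 frame periods).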

The video output unit 15 receives the delayed video signal output by the buffer processing unit 14 and the processed audio signal output by the audio processing unit 12, and attaches the processed audio signal to the delayed video signal to generate a video signal with audio. The video signal with audio is encoded with a codec appropriate to the downstream service and output. For example, for output to a video distribution service, the video is encoded as MP4 (MPEG-4) and the audio as AAC (Advanced Audio Coding).

As shown in FIG. 7, the GUI control unit 11 of this embodiment has a display area 106 for the video being captured by the video camera V, a slide bar 101 for adjusting the gain of each audio signal, a button 102 for muting each audio signal, a button that indicates the connection state of each slave terminal by color or the like, a level meter 103 that shows the gain value obtained from the microphone of each slave terminal, a button 104 for controlling audio and video recording, and a status display area 105 that shows the connection state of the slave terminals and similar information. The value of the level meter bar is determined from the power value received from the mixing unit 129. The gain A_k that the mixing unit 129 applies to each audio signal is obtained from the gain adjustment slide bar. The gain may be a continuous value or a discrete value. For example, with 15 discrete steps, the maximum value is +7, which corresponds to a gain of 11.2 times (21 dB); the middle value is 0, which corresponds to 1.0 times (0 dB); and the minimum value is -7, which corresponds to 0.0 times (-∞ dB). When the mute button is on, the gain A_k is set to 0.0 times; when the mute button is off, the gain obtained from the slide bar is used as A_k. This gain A_k is passed to the mixing unit 129. A gain value A_mix for the processed audio signal after mixing is likewise passed from a slide bar.

Needless to say, the present invention is not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the invention. The various processes described in the embodiments may be executed not only in time series in the order described but also in parallel or individually, depending on the processing capability of the device that executes them or as needed.

[Program and recording medium]
When the various processing functions of the devices described in the embodiments are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on a computer realizes the processing functions of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in the storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program, for example, first stores the program recorded on the portable recording medium, or the program transferred from the server computer, in its own storage device. When executing a process, the computer reads the program stored in its own recording medium and executes the process according to the program it has read. As another way of executing the program, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time a program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and retrieval of results. The program in this embodiment includes information that is provided for processing by a computer and is equivalent to a program (such as data that is not a direct command to a computer but has the property of defining the computer's processing).

In this embodiment, the apparatus is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

1, 3 Master terminal
2 Slave terminal
9 Communication network
10 Connection control unit
11 GUI control unit
12 Audio processing unit
13 Video processing unit
14 Buffer processing unit
15 Video output unit
20 Connection control unit
21 A/D conversion unit
22 Encoding processing unit
23 Packet transmission unit
120 A/D conversion unit
121 Packet reception unit
122 Decoding processing unit
123 Microphone buffer processing unit
124 Delay amount determination unit
125 Delay buffer processing unit
126 Audio delay amount estimation unit
127 Speaker emphasis processing unit
128 Noise removal unit
129 Mixing unit

Claims (8)

マスター端末と少なくとも１台のスレーブ端末とを含む通信システムであって、
上記スレーブ端末は、通信対象の信号をパケットに格納して上記マスター端末へ送信するパケット送信部を含み、
上記マスター端末は、
上記スレーブ端末から上記パケットを受信し上記信号を取り出すパケット受信部と、
上記スレーブ端末ごとにパケット伝送時間を計測し、上記パケット伝送時間の算術平均を当該スレーブ端末の遅延量として求める遅延量決定部と、
上記スレーブ端末ごとに上記信号に対して当該スレーブ端末の遅延量に対応する遅延を与えて遅延後信号を生成する遅延バッファ処理部と、
を含む通信システム。 A communication system including a master terminal and at least one slave terminal,
The slave terminal includes a packet transmission unit that stores a signal to be communicated in a packet and transmits the packet to the master terminal,
The master terminal
A packet receiver that receives the packet from the slave terminal and extracts the signal;
A delay amount determination unit that measures the packet transmission time for each slave terminal and obtains the arithmetic average of the packet transmission time as the delay amount of the slave terminal;
A delay buffer processing unit that generates a delayed signal by giving a delay corresponding to the delay amount of the slave terminal to the signal for each slave terminal;
A communication system including: 請求項１に記載の通信システムであって、
上記スレーブ端末は、当該スレーブ端末に接続されたマイクを用いてデジタルの観測信号を取得するＡ／Ｄ変換部をさらに含み、
上記パケット送信部は、上記観測信号を通信対象の信号としてパケットに格納し、上記マスター端末へ送信するものであり、
上記マスター端末は、
当該マスター端末に接続されたマイクを用いてデジタルの観測信号を取得するＡ／Ｄ変換部と、
上記マスター端末および上記スレーブ端末が取得した複数の観測信号のうち音声が含まれる観測信号の各組について相互相関が最大となる時間差を当該観測信号の音声遅延量として求める音声遅延量推定部と、
をさらに含み、
上記遅延量決定部は、上記音声遅延量を用いて上記パケット伝送時間を補正して上記パケット伝送時間の算術平均を計算するものである
通信システム。 The communication system according to claim 1,
The slave terminal further includes an A / D converter that acquires a digital observation signal using a microphone connected to the slave terminal,
The packet transmission unit stores the observation signal as a communication target signal in a packet, and transmits the packet to the master terminal.
The master terminal
An A / D converter that acquires a digital observation signal using a microphone connected to the master terminal;
A voice delay amount estimation unit for obtaining a time difference at which the cross-correlation is maximum for each set of observation signals including voice among the plurality of observation signals acquired by the master terminal and the slave terminal;
Further including
The delay amount determination unit corrects the packet transmission time using the voice delay amount and calculates an arithmetic average of the packet transmission time. 請求項２に記載の通信システムであって、
上記遅延量決定部は、計測したパケット伝送時間と真のパケット伝送時間の差がガウス分布に従うと仮定して上記パケット伝送時間から上記ガウス分布の平均と分散を求め、上記ガウス分布の平均と分散を用いて上記パケット伝送時間が外れ値であるか否かを判定し、外れ値であると判定されたパケット伝送時間は算術平均の計算に用いないものである
The communication system according to claim 2, wherein:
The delay amount determination unit assumes that the difference between the measured packet transmission time and the true packet transmission time follows a Gaussian distribution, obtains the mean and variance of the Gaussian distribution from the packet transmission times, determines from that mean and variance whether each packet transmission time is an outlier, and does not use packet transmission times determined to be outliers in calculating the arithmetic average.
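The outlier handling can be sketched as follows. The claim only says that the mean and variance of the fitted Gaussian are used to judge outliers. The specific rule here (discard values more than `k` standard deviations from the mean, with `k = 2.0`) and the name `robust_delay` are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def robust_delay(times, k=2.0):
    """Arithmetic mean of packet transmission times after discarding
    values that lie more than k standard deviations from the mean.
    The threshold k is an assumed parameter, not taken from the claim."""
    mu = fmean(times)
    sigma = pstdev(times, mu)
    kept = [t for t in times if abs(t - mu) <= k * sigma]
    return fmean(kept)
```

With nine measurements of 10 ms and one of 100 ms, the fitted mean is 19 ms and the standard deviation is 27 ms, so the 100 ms value is discarded and the delay amount is 10 ms.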
The communication system according to claim 2 or 3, wherein:
The master terminal further includes a speaker emphasis processing unit that performs speaker emphasis processing, which emphasizes the voice of a specific speaker, on the delayed signal.
The communication system according to any one of claims 2 to 4, wherein the master terminal further includes:
A noise removing unit that removes noise from the delayed signal and generates a noise-removed audio signal; and
A mixing unit that calculates the sum of the noise-removed audio signals and generates a processed audio signal.
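The noise removal and mixing stages can be sketched as below. The claim does not name a noise removal method, so a simple amplitude gate stands in for it here. That gate is a placeholder technique, not the patent's method, and `noise_gate`, `mix`, and the threshold value are hypothetical.

```python
def noise_gate(signal, threshold):
    """Placeholder noise removal: zero out samples whose magnitude is
    below the threshold (an assumed stand-in for the unspecified method)."""
    return [x if abs(x) >= threshold else 0.0 for x in signal]


def mix(signals):
    """Sample-wise sum of the noise-removed signals; shorter signals
    contribute nothing past their end."""
    length = max(len(s) for s in signals)
    return [sum(s[i] for s in signals if i < len(s)) for i in range(length)]
```

Mixing is a plain sum, as in the claim. A real system would probably also guard against clipping after summation.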
The communication system according to any one of claims 2 to 5, wherein the master terminal further includes:
A video processing unit that acquires a digital video signal using a video camera connected to the master terminal;
A buffer processing unit that generates a delayed video signal by giving the video signal a delay corresponding to the largest of the delay amounts of the slave terminals; and
A video output unit that generates a video signal with audio by adding a signal based on the delayed signal to the delayed video signal.
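The video buffering can be sketched as a fixed-length frame queue whose length comes from the largest slave delay amount. The frame rate parameter, the rounding to whole frames, and the names `video_delay_frames` and `FrameDelayBuffer` are assumptions for illustration only.

```python
from collections import deque


def video_delay_frames(slave_delays, fps):
    """Convert the largest slave delay amount (seconds) to whole frames."""
    return round(max(slave_delays.values()) * fps)


class FrameDelayBuffer:
    """Holds frames back by a fixed number of frames."""

    def __init__(self, delay_frames):
        self._queue = deque()
        self._delay = delay_frames

    def push(self, frame):
        """Store a new frame; return the frame from `delay_frames` pushes
        ago, or None while the buffer is still filling."""
        self._queue.append(frame)
        if len(self._queue) > self._delay:
            return self._queue.popleft()
        return None
```

Each frame returned by `push` would then be paired with the delayed audio to form the video signal with audio.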
A communication method including:
A packet transmission step in which at least one slave terminal stores a signal to be communicated in a packet and transmits the packet to a master terminal;
A packet receiving step in which the master terminal receives the packet from the slave terminal and extracts the signal;
A delay amount determination step in which the master terminal measures the packet transmission time for each slave terminal and obtains the arithmetic average of the packet transmission time as the delay amount of that slave terminal; and
A delay buffer processing step in which the master terminal generates a delayed signal by giving the signal, for each slave terminal, a delay corresponding to the delay amount of that slave terminal.
A program for causing a computer to function as the master terminal according to any one of claims 1 to 6.

JP2015057620A 2015-03-20 2015-03-20 Communication system, communication method, and program Active JP6377557B2 (en)

Priority Applications (1)

Application Number	Priority Date	Filing Date	Title
JP2015057620A JP6377557B2 (en)	2015-03-20	2015-03-20	Communication system, communication method, and program

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
JP2015057620A JP6377557B2 (en)	2015-03-20	2015-03-20	Communication system, communication method, and program

Publications (2)

Publication Number	Publication Date
JP2016177153A (en)	2016-10-06
JP6377557B2 (en)	2018-08-22

Family

ID=57069043

Family Applications (1)

Application Number	Priority Date	Filing Date	Title
JP2015057620A Active JP6377557B2 (en)	2015-03-20	2015-03-20	Communication system, communication method, and program