CN111444817A - A person image recognition method, device, electronic device and storage medium - Google Patents
- ️Fri Jul 24 2020
Technical Field
The present invention relates to the technical field of image processing, and in particular to a person image recognition method, device, electronic device, and storage medium.
Background
With the development of deep neural networks and deep learning, the strong learning ability of deep neural networks has brought success in a growing number of fields. Performance in face recognition is particularly strong and even exceeds the accuracy of manual recognition. For example, to identify people in film and television videos, existing methods mainly perform face recognition on the video images: they compute the distance (such as the Euclidean distance) between the standard face feature vectors stored in a database and the face feature vector detected in an image frame, and use a distance threshold to judge how well the face image in the current frame matches the face images in the database. If the distance is less than the threshold, recognition succeeds; otherwise it fails. The identity corresponding to the best-matching feature vector in the database is taken as the identity of the current person.
Specifically, for person identification, the existing solution converts a face image into a feature vector (such as a 512-dimensional feature vector) and, based on some distance metric (such as the Euclidean distance), judges whether the minimum distance between the standard vectors in the database and the face vector to be recognized satisfies a set threshold condition, which determines the face recognition result. However, unlike recognition scenarios such as face unlocking and face payment, recognition scenarios in film and television videos are complex and changeable: poses vary, face angles and expressions change widely, and shots switch frequently. Therefore, if the identities of people in a video are recognized by face recognition alone, recognitions are easily missed and the recognition success rate is low.
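The threshold-based matching described above can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the toy 3-dimensional vectors (standing in for 512-dimensional features), and the threshold value are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_identity(query, gallery, threshold):
    """Return the identity of the closest gallery vector, or None.

    `gallery` maps an identity to its standard feature vector.
    Recognition succeeds only if the minimum distance is below `threshold`.
    """
    best_id, best_dist = None, float("inf")
    for identity, vector in gallery.items():
        dist = euclidean(query, vector)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id if best_dist < threshold else None


# Hypothetical toy gallery; real systems would use e.g. 512-dimensional vectors.
gallery = {"person1": [0.0, 0.0, 1.0], "person2": [1.0, 0.0, 0.0]}
print(match_identity([0.1, 0.0, 0.9], gallery, threshold=0.5))  # person1
print(match_identity([5.0, 5.0, 5.0], gallery, threshold=0.5))  # None
```

The second call shows the failure mode the background section is concerned with: a face whose features drift too far from the stored vector (for example, because of pose or expression) is simply not recognized.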
Summary of the Invention
Embodiments of the present invention provide a person image recognition method, device, electronic device, and storage medium, to solve the problem in the prior art that recognizing the identities of people in a video through face recognition is prone to missed recognitions and has a low recognition success rate.
To address the above technical problem, in a first aspect, an embodiment of the present invention provides a person image recognition method, including:
for any identity information recognized from the frames of a video through face recognition, acquiring a first image in the video in which the identity information was not recognized through face recognition;
according to the face recognition area corresponding to the identity information in a second image, identifying, through an image tracking algorithm, the target image area corresponding to the identity information in the first image;
wherein the second image is an image in the video in which the identity information was successfully recognized through face recognition.
In a second aspect, an embodiment of the present invention provides a person image recognition device, including:
an acquisition module, configured to acquire, for any identity information recognized from the frames of a video through face recognition, a first image in the video in which the identity information was not recognized through face recognition;
a recognition module, configured to identify, according to the face recognition area corresponding to the identity information in a second image and through an image tracking algorithm, the target image area corresponding to the identity information in the first image.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the person image recognition method described above when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the person image recognition methods described above.
In the person image recognition method, device, electronic device, and storage medium provided by the embodiments of the present invention, after identity information is recognized from a video through face recognition, the first images in which a given identity information was not recognized through face recognition are re-recognized with an image tracking algorithm, so that the image tracking algorithm supplements the face recognition results. Combining face recognition with an image tracking algorithm improves the success rate of recognizing the person image for that identity information and reduces missed recognitions.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the person image recognition method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall flow of identifying a person through face detection, recognition, and tracking provided by another embodiment of the present invention;
FIG. 3 is a schematic diagram of the principle by which the tracking network model tracks a person image, provided by another embodiment of the present invention;
FIG. 4 is a schematic diagram of the principle for calculating the area overlap, provided by another embodiment of the present invention;
FIG. 5 is a structural block diagram of the person image recognition device provided by another embodiment of the present invention;
FIG. 6 is a physical structure diagram of the electronic device provided by another embodiment of the present invention.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Existing methods for recognizing people in videos have a low recognition success rate. For example, the environment or shot switching can leave the face in some frames without sufficiently complete and clear facial features. To address this, the present application provides a person image recognition method, mainly used to identify the people appearing in a video (for example, a movie or a movie clip), so that a user can learn which people appear in each frame before watching the video. The method can be executed by any device, for example a computer, a server, or a mobile phone. FIG. 1 is a schematic flowchart of the person image recognition method provided by this embodiment. Referring to FIG. 1, the method includes:
Step 101: for any identity information recognized from the frames of a video through face recognition, acquire a first image in the video in which the identity information was not recognized through face recognition.
Identity information is information used to distinguish different people, for example a person's name, certificate number, or biometric information; this embodiment places no specific limitation on it.
Step 102: according to the face recognition area corresponding to the identity information in a second image, identify, through an image tracking algorithm, the target image area corresponding to the identity information in the first image.
There are many kinds of image tracking algorithms, for example tracking an image by similarity or by change of image position; this embodiment places no specific limitation on this. For each first image, the target image area corresponding to a given identity information can be identified from the face recognition area corresponding to that identity information in any second image of the video. (Specifically, the target tracking area is determined from the face recognition area corresponding to the identity information in the second image, and the image tracking algorithm is executed on the basis of the target tracking area.) However, to improve the recognition success rate of the image tracking algorithm, the target image area can instead be identified from a second image whose playback time is before that of the first image and close to it (for example, the second image whose playback time is closest to that of the first image).
In the person image recognition method provided by the embodiments of the present invention, after identity information is recognized from a video through face recognition, the first images in which a given identity information was not recognized through face recognition are re-recognized with an image tracking algorithm, so that the image tracking algorithm supplements the face recognition results. Combining face recognition with an image tracking algorithm improves the success rate of recognizing the person image for that identity information and reduces missed recognitions.
To improve the recognition success rate, this embodiment divides the first images into video segments. For each video segment, the correlation between images in the video is used to determine a corresponding tracking image, so that the identity information appearing in that segment can be recognized. Further, on the basis of the above embodiment, identifying, according to the face recognition area corresponding to the identity information in the second image and through an image tracking algorithm, the target image area corresponding to the identity information in the first image includes:
acquiring any initial video segment composed of first images that are consecutive in video playback time as the video segment to be recognized, and acquiring, as the tracking image, a second image whose video playback time is before the playback time of the first frame of the video segment to be recognized and separated from it by a preset duration;
taking the face recognition area corresponding to the identity information in the tracking image as the target tracking area, and identifying, according to the target tracking area and through an image tracking algorithm, the target image area in the video segment to be recognized.
Note that the video segments to be recognized must be determined separately for each piece of identity information. A video segment to be recognized can be extracted from first images that are consecutive in video playback time. The video playback time of a frame is the time point at which that frame is shown when the entire video is played without interruption.
The process of determining the initial video segments is illustrated below by example:
For the three pieces of identity information "person1", "person2", and "person3", the initial video segments must be determined separately for each. Table 1 is the timeline of the recognition results for the different identity information provided by this embodiment. Referring to Table 1, person1 is successfully recognized in the three time periods "00:00:05:880-00:01:10:200", "00:02:39:360-00:03:13:060", and "00:04:37:540-00:05:05:340", and is not recognized in the time periods "00:01:10:200-00:02:39:360", "00:03:13:060-00:04:37:540", and "00:05:05:340-end of video". These three unrecognized time periods are the three initial video segments determined for person1, and "00:01:10:200", "00:03:13:060", and "00:05:05:340" are their respective start times.
Table 1. Timeline of the video segments determined for different identity information
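The person1 example can be reproduced with a short sketch that derives the unrecognized periods from the recognized ones. This is an illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, timestamps are assumed to be in `HH:MM:SS:mmm` form, and the video end time `00:06:00:000` is invented because the source only says "end of video". The period before the first recognition is skipped here, which matches the three segments listed for person1 (no earlier recognized frame exists to track from).

```python
def parse_ts(ts):
    """Convert an 'HH:MM:SS:mmm' timestamp to milliseconds."""
    h, m, s, ms = (int(part) for part in ts.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def initial_segments(recognized, video_end):
    """Return the unrecognized gap that follows each recognized period.

    `recognized` is a list of (start, end) timestamp pairs for one identity;
    the result is a list of (start_ms, end_ms) pairs.
    """
    spans = sorted((parse_ts(a), parse_ts(b)) for a, b in recognized)
    end = parse_ts(video_end)
    gaps = []
    for i, (_, stop) in enumerate(spans):
        next_start = spans[i + 1][0] if i + 1 < len(spans) else end
        if next_start > stop:
            gaps.append((stop, next_start))
    return gaps


person1 = [
    ("00:00:05:880", "00:01:10:200"),
    ("00:02:39:360", "00:03:13:060"),
    ("00:04:37:540", "00:05:05:340"),
]
# The video end time is a made-up value for this example.
segments = initial_segments(person1, video_end="00:06:00:000")
print(segments)  # [(70200, 159360), (193060, 277540), (305340, 360000)]
```

Each returned start time corresponds to one of the start points named in the example ("00:01:10:200", "00:03:13:060", "00:05:05:340").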
In this embodiment, for each initial video segment serving as a video segment to be recognized, one second image that precedes the initial video segment and is close to it in video playback time is determined as the tracking image. Recognizing the identity information in the initial video segment through the tracking image exploits the correlation between the images of the video and helps improve the accuracy of recognizing that identity information.
Note that the preset duration is a configured value. For example, it may equal the interval in video playback time between two adjacent frames (in which case the tracking image is the image that precedes the initial video segment and is closest in playback time to its first frame), or it may equal the interval in video playback time between any image and the second or third frame before it; this embodiment places no specific limitation on this.
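Working at the frame level, the choice of tracking image can be sketched as follows. This is a hedged illustration: the per-frame boolean list, the function name, and expressing the preset duration as a frame offset (`offset=1` for the adjacent frame, `2` or `3` for earlier frames) are assumptions made for the example.

```python
def find_tracking_frames(recognized, offset=1):
    """Pair each run of unrecognized frames with a tracking frame.

    `recognized[i]` is True if the identity was recognized in frame i.
    Returns (tracking_index, start, end) triples, where frames
    start..end-1 form the initial segment and the tracking frame lies
    `offset` frames before `start`. Runs without a recognized frame at
    that position are skipped.
    """
    result = []
    i, n = 0, len(recognized)
    while i < n:
        if recognized[i]:
            i += 1
            continue
        start = i
        while i < n and not recognized[i]:
            i += 1
        track = start - offset
        if track >= 0 and recognized[track]:
            result.append((track, start, i))
    return result


flags = [True, True, False, False, False, True, False, False]
print(find_tracking_frames(flags))            # [(1, 2, 5), (5, 6, 8)]
print(find_tracking_frames(flags, offset=2))  # [(0, 2, 5)]
```

With `offset=2` the second run is dropped because frame 4 is itself unrecognized; a real system would need a policy for that case, which the source does not specify.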
The present invention divides the first images into initial video segments and determines a tracking image for each initial video segment, obtaining a target video segment composed of the tracking image and the initial video segment. Because the tracking image precedes the initial video segment and is strongly correlated with it, using the face recognition area in the tracking image can further improve the recognition success rate for the target video segment.
To further illustrate the specific flow of the person recognition method in the present application, FIG. 2 is a schematic diagram of the overall flow of identifying a person through face detection, recognition, and tracking provided by this embodiment. Referring to FIG. 2, the process includes the following steps:
Step 1: perform face detection on the images in the video;
Step 2: perform face recognition;
Step 3: compute the unrecognized timeline intervals;
Step 4: extract the interval segments worth tracking;
Step 5: improve the discriminability of the target to be tracked;
Step 6: track the target area;
Step 7: improve the person recognition rate.
To improve recognition efficiency, the first images of each video segment can also be pruned. Further, on the basis of the above embodiments, acquiring any initial video segment composed of first images that are consecutive in video playback time as the video segment to be recognized includes:
for any initial video segment composed of first images that are consecutive in video playback time, determining, from the initial video segment, the first images suspected of containing an image area corresponding to the identity information, according to the position information of the target tracking area in the tracking image and the position information of the face recognition areas appearing in each first image of the initial video segment;
taking the video segment composed of the first images suspected of containing an image area corresponding to the identity information as the video segment to be recognized.
The position information of a face recognition area includes its coordinate position and size.
Specifically, the area overlap between the face recognition areas in adjacent frames, computed from their position information, can be used to make a preliminary judgment of whether the later frame is suspected to contain the same face.
This embodiment shortens the initial video segment according to whether its images are suspected of containing an image area corresponding to the identity information, and uses the shortened segment as the video segment to be recognized. This reduces the computation required by the image tracking algorithm and improves recognition efficiency.
In some cases (for example, when the person is occluded), the face recognition area provides very few features, and it is difficult to recognize the person successfully from those features alone. This embodiment therefore enlarges the face recognition area and uses the enlarged person recognition image as the tracking image. Because the person recognition image contains more features, this helps further improve recognition accuracy. Further, on the basis of the above embodiments, identifying, according to the face recognition area corresponding to the identity information in the second image and through an image tracking algorithm, the target image area corresponding to the identity information in the first image includes:
according to the person recognition area formed by enlarging the face recognition area corresponding to the identity information in the second image, identifying, through an image tracking algorithm, the target image area corresponding to the identity information in the first image.
The face recognition area corresponding to the identity information in the second image is enlarged according to a set enlargement rule to obtain the person recognition area.
Specifically, the person recognition area obtained by the enlargement is used as the target tracking area, and the target image area in the first image is identified through the image tracking algorithm.
The set enlargement rule enlarges the face recognition area corresponding to the identity information in the second image according to a formula (the formula image is not reproduced in this text), where x and y are the coordinates of the bounding box of the face recognition area relative to the top-left corner of the image, and w and h are the width and height of the bounding box.
That is, relative to the face recognition area, the top-left corner of the enlarged person recognition area has its abscissa reduced by 0.3 times the width w of the face recognition area and its ordinate reduced by 0.3 times the height h of the face recognition area; the width of the person recognition area is increased by 1.3 times the width w, and its height is increased by 1.3 times the height h.
This embodiment enlarges the face recognition area and initializes the tracking area of the image tracking algorithm with the resulting person recognition area. The enlarged person recognition area includes more of the person's features, which can further improve the success rate of recognizing the person with that identity information and reduce missed recognitions.
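The verbal description of the enlargement rule can be sketched in code. Since the formula itself is not available in this text, the sketch reads the description literally: the top-left corner moves by 0.3·w and 0.3·h, and the width and height each grow by 1.3 times their original value (so the new width is w + 1.3·w). That reading, the function name, and the clipping to the image bounds are all assumptions; the `shift` and `grow` parameters are exposed so another reading can be substituted.

```python
def enlarge_face_box(box, image_size, shift=0.3, grow=1.3):
    """Enlarge a face box (x, y, w, h) into a person recognition box.

    (x, y) is the top-left corner relative to the image's top-left corner.
    The result is clipped to the image, which is an added safeguard that
    the source text does not mention.
    """
    x, y, w, h = box
    img_w, img_h = image_size
    new_x = x - shift * w      # move the corner left by 0.3 * w
    new_y = y - shift * h      # move the corner up by 0.3 * h
    new_w = w + grow * w       # literal reading: width increases by 1.3 * w
    new_h = h + grow * h       # literal reading: height increases by 1.3 * h
    left = max(0.0, new_x)
    top = max(0.0, new_y)
    right = min(float(img_w), new_x + new_w)
    bottom = min(float(img_h), new_y + new_h)
    return (left, top, right - left, bottom - top)


# Hypothetical face box in a 1920x1080 frame.
person_box = enlarge_face_box((100, 50, 40, 60), (1920, 1080))
print(person_box)
```

Because the corner moves up and left by less than the size grows, the enlarged box under this reading extends further down and to the right, which tends to take in more of the torso below the face.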
This embodiment provides a specific implementation for identifying the target image area corresponding to the identity information. Further, on the basis of the above embodiments, identifying, according to the target tracking area and through an image tracking algorithm, the target image area in the initial video segment includes:
determining a target video segment that is composed of the tracking image and the video segment to be recognized and has the tracking image as its first frame;
inputting the target video segment, together with the position information of the target tracking area in the first frame of the target video segment, into a pre-trained tracking network model, which outputs, for each first image in the target video segment, the similarity between each image area contained in that first image and the target tracking area;
identifying the target image area in the target video segment according to the similarities output by the tracking network model for each first image.
The tracking network model is trained by machine learning on the position information of the sample person tracking area marked in the first frame of a sample video and on the image areas marked in the sample video as showing the same person as the sample person tracking area. Given an input video and the position information of the person tracking area in its first frame, the model identifies the image areas in the video that show the same person as the person tracking area.
Identifying the target image area in the target video segment according to the similarities output by the tracking network model for each first image includes:
for any first image in the target video segment, judging whether the first image contains an image area whose similarity to the target tracking area is greater than a similarity threshold; if so, taking the image area in the first image with the greatest similarity to the target tracking area as the target image area; otherwise, the target image area does not exist in the first image.
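The per-image decision rule above reduces to "take the best-scoring area, but only if it clears the threshold". A minimal sketch follows; the function name and the representation of candidates as `(region, similarity)` pairs are assumptions for illustration, and the similarity values are invented.

```python
def select_target_region(candidates, threshold):
    """Pick the target image area for one first image.

    `candidates` is a list of (region, similarity) pairs produced by the
    tracker for that image. Returns the region with the greatest similarity
    if it exceeds `threshold`, otherwise None (no target area in the image).
    """
    if not candidates:
        return None
    region, score = max(candidates, key=lambda c: c[1])
    return region if score > threshold else None


# Hypothetical regions (x, y, w, h) with made-up similarity scores.
frame_candidates = [((10, 10, 50, 80), 0.42), ((200, 30, 50, 80), 0.91)]
print(select_target_region(frame_candidates, threshold=0.8))  # (200, 30, 50, 80)
print(select_target_region(frame_candidates, threshold=0.95))  # None
```

Checking only the maximum is equivalent to first asking whether any area exceeds the threshold and then taking the most similar one.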
The tracking network model is a "Siamese network" based on a deep neural network.
Specifically, starting from the initialized target tracking area, the tracking network model compares and analyzes the subsequent images of the video, takes the area most similar to the target tracking area as the tracking result, and continues tracking frame by frame. For step 6 above, FIG. 3 is a schematic diagram of the principle by which the tracking network model provided by this embodiment tracks a person image. This embodiment uses a Siamese-network tracking method based on a deep neural network to complete the tracking task. Referring to FIG. 3, z is the template image, that is, the enlarged bounding box area; x is a subsequent image frame of the video; φ is a convolutional neural network that maps an image area into feature space; and * is the convolution operation. The 6×6×128 feature is convolved with the 22×22×128 feature to output a 17×17 tracking score matrix, which expresses how similar each position in the search area x is to the template z. The image area corresponding to the point with the highest similarity is the tracker's result, and a successfully tracked image x has the same person identity as the template image z.
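The shape arithmetic of the final correlation step (a 6×6×128 template slid over a 22×22×128 search feature gives 22 − 6 + 1 = 17 positions per axis) can be checked with a small pure-Python sketch. The network φ is not implemented here: random numbers stand in for its feature maps, and all function names are hypothetical.

```python
import random


def cross_correlate(template, search):
    """Slide a (th x tw x c) template over a (sh x sw x c) search feature.

    Both inputs are nested lists indexed [row][col][channel]. Returns a
    (sh - th + 1) x (sw - tw + 1) score map of channel-summed dot products.
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    scores = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            total = 0.0
            for di in range(th):
                for dj in range(tw):
                    t_vec = template[di][dj]
                    s_vec = search[i + di][j + dj]
                    total += sum(a * b for a, b in zip(t_vec, s_vec))
            row.append(total)
        scores.append(row)
    return scores


def argmax_2d(scores):
    """Return the (row, col) of the highest score."""
    best = max((v, i, j) for i, r in enumerate(scores) for j, v in enumerate(r))
    return best[1], best[2]


rng = random.Random(0)
# Random stand-in for the search feature map phi(x).
search = [[[rng.uniform(-1, 1) for _ in range(128)] for _ in range(22)]
          for _ in range(22)]
# Cut the stand-in template phi(z) out of the search map at row 5, column 9,
# so the location of the correlation peak is known in advance.
template = [row[9:15] for row in search[5:11]]

scores = cross_correlate(template, search)
print(len(scores), len(scores[0]))  # 17 17
print(argmax_2d(scores))            # (5, 9)
```

In a real tracker the peak position in the 17×17 map would be mapped back to image coordinates to give the tracked area; that mapping depends on the network's stride and is outside what the source describes.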
本实施例通过预先训练的跟踪网络模型对目标跟踪区域进行跟踪,基于相似度的比对实现了对某一身份信息的图像区域的识别过程,实现了对人脸识别未成功识别的第一图像的重识别,提高了该身份信息的识别成功率。In this embodiment, the target tracking area is tracked by the pre-trained tracking network model, and the recognition process of the image area of a certain identity information is realized based on the comparison based on the similarity, and the first image that is not successfully recognized by face recognition is realized. The re-identification of the identity information improves the identification success rate of the identity information.
This embodiment provides a specific implementation of how to "trim" a video segment. Further, on the basis of the above embodiments, determining, from the initial video segment, the first images suspected of containing an image region corresponding to the identity information, according to the position information of the target tracking region in the tracking image and the position information of the face recognition regions appearing in each first image of the initial video segment, includes:
Starting from the first-frame first image of the initial video segment, performing a detection operation in a loop until a first image that does not contain an image region corresponding to the identity information is detected, or until the detection operation has been performed on the last first image of the initial video segment;
wherein the detection operation includes:
On the first execution, determining, according to the position information of the target tracking region in the tracking image and the position information of each face recognition region in the first-frame first image of the initial video segment, a first area overlap between each face recognition region in the first-frame first image and the target tracking region; if any first area overlap is greater than an overlap threshold, the first-frame first image is suspected of containing an image region corresponding to the identity information, and the face recognition region with the largest first area overlap is taken as the suspected image region corresponding to the identity information in that image; otherwise, the first-frame first image does not contain an image region corresponding to the identity information;
On any later execution, obtaining the suspected image region determined by the previous detection operation and determining, according to the position information of that suspected image region in the first image on which the previous detection operation was performed and the position information of each face recognition region in the first image currently being checked, a second area overlap between each face recognition region in the current first image and the suspected image region; if any second area overlap is greater than the overlap threshold, the current first image is suspected of containing an image region corresponding to the identity information, and the face recognition region with the largest second area overlap is taken as the suspected image region corresponding to the identity information in the current first image; otherwise, the current first image does not contain an image region corresponding to the identity information;
wherein the area overlap is determined from a first area, occupied by the overlapping part of the two regions, and the sum of the first area and a second area, occupied by the non-overlapping parts of the two regions.
Specifically, the area overlap equals the ratio of the first area to that sum.
Regarding the calculation of the area overlap, FIG. 4 is a schematic diagram of the calculation principle provided by this embodiment. Referring to FIG. 4, for two face recognition regions, the area overlap is the ratio of the first area, occupied by their intersection (the overlapping part of the two regions), to the area of their union (the sum of the first area and the second area occupied by the non-overlapping parts), that is, the intersection over union (IoU).
In FIG. 4, the two boxes represent the coordinate positions of the face regions detected in two consecutive frames; "intersection" denotes the area where the two regions overlap in the coordinate system, and "union" denotes the total area covered by the two regions. The area overlap satisfies IoU ∈ [0, 1]. This embodiment sets the IoU threshold (the overlap threshold) to 0.5; an IoU below the threshold indicates that the subsequent video scene changes substantially and is less trackable. The closer the IoU is to 1, the greater the overlap of the face regions in the two consecutive frames, the more likely the video segment is continuous, and the greater its tracking value.
The overlap threshold is a preset value, for example 0.5.
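The intersection-over-union calculation described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation; it assumes boxes are given as (x, y, w, h) with the origin at the upper-left corner of the image, and the function name `iou` is hypothetical.

```python
from typing import Tuple

# (x, y, w, h): upper-left corner plus width and height (assumed layout).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero when disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    # Union = both areas minus the part counted twice.
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Two identical boxes give an IoU of 1, and two disjoint boxes give 0, matching the stated range IoU ∈ [0, 1].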
Specifically, for step 4 above, only continuous, gradually changing video segments are worth tracking and can plausibly be tracked successfully. Based on the unrecognized video segments output by step 3 and the positions of the unrecognized face bounding boxes output by step 2, within each unrecognized segment the overlap of the face bounding box positions across consecutive video frames is computed, starting from the successfully recognized face bounding box in the first frame. The bounding box overlap gives a preliminary judgment of the segment's continuity and is used to select segments worth tracking: the higher the overlap, the more likely the video is continuous and the more likely tracking will succeed. As shown in FIG. 4, the overlap of face bounding boxes can be computed as the intersection over union of the face regions in two consecutive frames.
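The frame-by-frame filtering described above can be sketched as a loop that chains each frame's best-matching box to the next frame. This is a sketch under stated assumptions: boxes are (x, y, w, h) tuples, each frame is represented only by its list of detected face boxes, and the names `iou` and `trim_segment` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def trim_segment(target_box: Box,
                 frames: Sequence[Sequence[Box]],
                 threshold: float = 0.5) -> List[Box]:
    """Return the suspected box for each leading frame that stays continuous.

    The first frame is compared with the target tracking region; every later
    frame is compared with the suspected box found in the previous frame.
    The loop stops at the first frame with no box above the threshold.
    """
    suspected: List[Box] = []
    reference = target_box
    for boxes in frames:
        best = max(boxes, key=lambda b: iou(reference, b), default=None)
        if best is None or iou(reference, best) <= threshold:
            break  # continuity is broken; later frames are discarded
        suspected.append(best)
        reference = best
    return suspected
```

The length of the returned list is the number of frames kept from the start of the initial segment, so a segment whose first frame already fails the check yields an empty list.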
For step 5 above, based on the selection result of step 4, the target tracker is initialized with the successfully recognized face bounding box in the first frame of each segment as the initial tracking region. The face region found by face detection contains only facial features and excludes information such as the person's hairstyle, clothing style, color, and background; tracking the face region alone is therefore not very accurate, and the tracking result tends to "drift", causing tracking to fail. This proposal initializes the tracker with an enlarged face detection bounding box. The enlarged region brings in image information around the face, which makes the tracking region more discriminative and helps improve the accuracy and robustness of the tracking result.
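One way to enlarge a face bounding box before initializing a tracker is sketched below. The text does not give a scale factor or say how the box is anchored, so scaling about the box center, clamping to the image bounds, and the name `enlarge_box` are all assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed layout


def enlarge_box(box: Box, scale: float,
                image_w: float, image_h: float) -> Box:
    """Scale a box about its center and clamp it to the image bounds."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    # Clamp each edge so the enlarged box never leaves the image.
    left = max(0.0, cx - new_w / 2.0)
    top = max(0.0, cy - new_h / 2.0)
    right = min(float(image_w), cx + new_w / 2.0)
    bottom = min(float(image_h), cy + new_h / 2.0)
    return (left, top, right - left, bottom - top)
```

With a hypothetical scale of 2.0, a 20 by 20 face box in the middle of a 100 by 100 image becomes a 40 by 40 region that also covers hair and part of the clothing; near an image border the enlarged region is cropped instead.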
This embodiment further shortens the initial video segment using the area overlap of face recognition regions, extracting the images of the initial video segment with good continuity to form the target video segment. This makes the image tracking algorithm more effective on the target video segment and improves recognition efficiency and the recognition success rate.
Further, on the basis of the above embodiments, the method also includes:
Determining, through face detection, the face recognition regions contained in each frame of the video; for each face recognition region, determining through face recognition the similarity between the face recognition region and each reference face image in a database, and determining the identity information corresponding to the face recognition region according to those similarities;
wherein the database includes the correspondence between identity information and reference face images.
The method further includes: determining, through face detection, the position information of each face recognition region in the image, wherein the position information includes the position of the face recognition region in the image and the size of the face region.
Determining the position information of each face recognition region in the image through face detection includes:
Detecting the face recognition regions in each frame of the video through face detection, and saving a four-tuple (x, y, w, h) for each face recognition region, where x and y are the coordinates of the face recognition region relative to the upper-left corner of the image, and w and h are the width and height of the face recognition region.
Specifically, for step 1 above, the face detection algorithm can detect the face regions in an image, the bounding box coordinates of each face region, and the position coordinates of the facial features. Face detection is performed on all frames of the video to obtain the position information of all face bounding boxes. Each bounding box is stored as a four-tuple (x, y, w, h).
Determining the identity information corresponding to the face recognition region according to the similarity between the face recognition region and each reference face image in the database includes: for each face recognition region, converting the face recognition region into a fixed-length feature vector; determining the similarity between the face recognition region and each reference face image in the database from the converted feature vector and the feature vectors stored in the database; and taking, among the similarities that exceed the face recognition threshold, the identity information of the reference face image with the highest similarity as the identity information corresponding to the face recognition region.
Determining the similarity between the face recognition region and each reference face image in the database from the converted feature vector and the feature vectors stored in the database includes calculating the similarity by the formula

$$\mathrm{dist}(x_i, x_j) = \lVert x_i - x_j \rVert_2 = \sqrt{\sum_{k=1}^{512} \left(x_{i,k} - x_{j,k}\right)^2}$$

where $x_i$ and $x_j$ denote two 512-dimensional face feature vectors and $\mathrm{dist} \in [0, +\infty)$ is the Euclidean distance between them. If dist is less than the face recognition threshold, $x_i$ and $x_j$ share the same identity and face recognition succeeds; otherwise, face recognition fails.
Specifically, for step 2 above, based on the face detection result of step 1, each detected face image is converted into a fixed-length feature vector (for example, a 512-dimensional vector), and the best match between the detected feature vector and the feature vectors stored in the database is computed to determine the identity of the face.
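The best-match step can be sketched as a nearest-neighbor search under Euclidean distance. This is an illustration under assumptions: the database is modeled as a plain dictionary from identity name to feature vector, the threshold value is hypothetical, and the usage below uses short vectors rather than 512-dimensional ones for brevity. The name `identify` is not from the source.

```python
import math
from typing import Dict, Optional, Sequence


def identify(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             threshold: float) -> Optional[str]:
    """Return the identity whose stored vector is closest to the query.

    Recognition succeeds only when the smallest Euclidean distance is
    below the threshold; otherwise None is returned.
    """
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, reference in database.items():
        d = math.dist(query, reference)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None


# Usage with toy 3-dimensional vectors and an assumed threshold of 0.5.
db = {"alice": [0.0, 0.0, 1.0], "bob": [1.0, 0.0, 0.0]}
match = identify([0.0, 0.1, 0.9], db, threshold=0.5)
```

A face whose nearest stored vector is still farther than the threshold is reported as unrecognized, which is what later makes the frame a candidate for tracking.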
Through face recognition, this embodiment identifies the easily recognizable images appearing in the video.
For step 7 above, based on steps 1 to 6, the final recognition result for the persons in the video covers not only the easily recognized face images but also persons who are hard to recognize because of the environment or the camera shot, which raises the recognition success rate for persons in the video.
Thus, this application combines face detection, face recognition, and target tracking to raise the recognition rate of persons in video. The easily recognized parts of the video are handled by face recognition, and the parts that are hard to recognize are handled by target tracking. Introducing target tracking raises the recognition rate in scenes involving occlusion, lighting changes, profile faces, lowered heads, exaggerated expressions, and long shots. In addition, this application splits the video into segments based on the face detection and face recognition results, and selects segments with potential tracking value by computing the IoU of face regions. Moreover, when the target tracker is initialized, the face bounding box is scaled so that it takes in image information such as the person's hairstyle, clothing style, color, and background, which makes the tracking region more discriminative and improves the accuracy and robustness of the tracking result.
FIG. 5 is a structural block diagram of the person image recognition device provided by this embodiment. Referring to FIG. 5, the device includes an acquisition module 501 and a recognition module 502, wherein:
The acquisition module 501 is configured to, for any identity information recognized from the frames of a video through face recognition, acquire the first images in the video in which the identity information was not recognized through face recognition;
The recognition module 502 is configured to recognize, from the first images and through an image tracking algorithm, the target image region corresponding to the identity information, according to the face recognition region corresponding to the identity information in a second image;
wherein the second image is an image in the video in which the identity information was successfully recognized through face recognition.
The person image recognition device provided by this embodiment is applicable to the person image recognition method provided by the above embodiments, and details are not repeated here.
With the person image recognition device provided by this embodiment, after identity information is recognized from a video through face recognition, for each piece of identity information the first images in which face recognition did not recognize that identity are re-examined with an image tracking algorithm, so that the tracking result supplements the face recognition result. Combining face recognition with an image tracking algorithm raises the success rate of recognizing the person image for that identity and reduces missed recognitions.
Further, on the basis of the above embodiment, the recognition module is also configured to:
Acquire any initial video segment composed of first images that are consecutive in video playback time, as the video segment to be recognized, and acquire, as the tracking image, a second image whose playback time precedes the playback time of the first frame of the video segment to be recognized and is separated from it by a preset duration;
Take the face recognition region corresponding to the identity information in the tracking image as the target tracking region and, according to the target tracking region, recognize the target image region from the video segment to be recognized through an image tracking algorithm.
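Finding the initial video segments and their tracking images can be sketched as a scan over per-frame recognition flags. This is a simplified illustration: it assumes one boolean per frame saying whether the identity was recognized, expresses the preset interval as a frame count `gap` rather than a duration, and uses the hypothetical name `unrecognized_segments`.

```python
from typing import List, Sequence, Tuple


def unrecognized_segments(recognized: Sequence[bool],
                          gap: int = 1) -> List[Tuple[int, int, int]]:
    """Return (tracking_index, start, end) for each run of unrecognized frames.

    `start` and `end` are inclusive frame indices of the run. The tracking
    frame lies `gap` frames before the run and must itself be a recognized
    frame; runs without such a frame are skipped.
    """
    segments: List[Tuple[int, int, int]] = []
    i, n = 0, len(recognized)
    while i < n:
        if recognized[i]:
            i += 1
            continue
        start = i
        while i < n and not recognized[i]:
            i += 1  # extend the run of consecutive unrecognized frames
        tracking = start - gap
        if tracking >= 0 and recognized[tracking]:
            segments.append((tracking, start, i - 1))
    return segments
```

For the flags `[True, True, False, False, True, False]` this yields two segments, frames 2 to 3 tracked from frame 1 and frame 5 tracked from frame 4. A run at the very start of the video has no earlier recognized frame and is therefore not returned.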
Further, on the basis of the above embodiments,
acquiring any initial video segment composed of first images that are consecutive in video playback time, as the video segment to be recognized, includes:
For any initial video segment composed of first images that are consecutive in video playback time, determining, from the initial video segment, the first images suspected of containing an image region corresponding to the identity information, according to the position information of the target tracking region in the tracking image and the position information of the face recognition regions appearing in each first image of the initial video segment;
Taking the video segment composed of the first images suspected of containing an image region corresponding to the identity information as the video segment to be recognized.
Further, on the basis of the above embodiments, recognizing, from the first images and through an image tracking algorithm, the target image region corresponding to the identity information according to the face recognition region corresponding to the identity information in the second image includes:
Recognizing, from the first images and through an image tracking algorithm, the target image region corresponding to the identity information according to a person recognition region formed by enlarging the face recognition region corresponding to the identity information in the second image.
Further, on the basis of the above embodiments,
recognizing the target image region from the initial video segment through an image tracking algorithm according to the target tracking region includes:
Determining a target video segment that consists of the tracking image and the video segment to be recognized, with the tracking image as its first frame;
Inputting the target video segment, together with the position information of the target tracking region in the first frame of the target video segment, into a pre-trained tracking network model, which outputs the similarity between the target tracking region and each image region contained in each first image of the target video segment;
Recognizing the target image region from the target video segment according to the similarities output by the tracking network model for each first image;
wherein the tracking network model is trained through machine learning on the position information of a sample person tracking region marked in the first frame of a sample video and on the image regions marked in the sample video as showing the same person as the sample person tracking region; given an input video and the position information of a person tracking region in its first frame, the model recognizes the image regions in the video that show the same person as the person tracking region.
Further, on the basis of the above embodiments, determining, from the initial video segment, the first images suspected of containing an image region corresponding to the identity information, according to the position information of the target tracking region in the tracking image and the position information of the face recognition regions appearing in each first image of the initial video segment, includes:
Starting from the first-frame first image of the initial video segment, performing a detection operation in a loop until a first image that does not contain an image region corresponding to the identity information is detected, or until the detection operation has been performed on the last first image of the initial video segment;
wherein the detection operation includes:
On the first execution, determining, according to the position information of the target tracking region in the tracking image and the position information of each face recognition region in the first-frame first image of the initial video segment, a first area overlap between each face recognition region in the first-frame first image and the target tracking region; if any first area overlap is greater than an overlap threshold, the first-frame first image is suspected of containing an image region corresponding to the identity information, and the face recognition region with the largest first area overlap is taken as the suspected image region corresponding to the identity information in that image; otherwise, the first-frame first image does not contain an image region corresponding to the identity information;
On any later execution, obtaining the suspected image region determined by the previous detection operation and determining, according to the position information of that suspected image region in the first image on which the previous detection operation was performed and the position information of each face recognition region in the first image currently being checked, a second area overlap between each face recognition region in the current first image and the suspected image region; if any second area overlap is greater than the overlap threshold, the current first image is suspected of containing an image region corresponding to the identity information, and the face recognition region with the largest second area overlap is taken as the suspected image region corresponding to the identity information in the current first image; otherwise, the current first image does not contain an image region corresponding to the identity information;
wherein the area overlap is determined from a first area, occupied by the overlapping part of the two regions, and the sum of the first area and a second area, occupied by the non-overlapping parts of the two regions.
Further, on the basis of the above embodiments, the device is also configured to:
Determine, through face detection, the face recognition regions contained in each frame of the video; for each face recognition region, determine through face recognition the similarity between the face recognition region and each reference face image in a database, and determine the identity information corresponding to the face recognition region according to those similarities;
wherein the database includes the correspondence between identity information and reference face images.
FIG. 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 6, the electronic device may include a processor 601, a communications interface 602, a memory 603, and a communication bus 604, where the processor 601, the communications interface 602, and the memory 603 communicate with one another through the communication bus 604. The processor 601 can call logic instructions in the memory 603 to perform the following method: for any identity information recognized from the frames of a video through face recognition, acquiring the first images in the video in which the identity information was not recognized through face recognition; and recognizing, from the first images and through an image tracking algorithm, the target image region corresponding to the identity information, according to the face recognition region corresponding to the identity information in a second image; wherein the second image is an image in the video in which the identity information was successfully recognized through face recognition.
In addition, when the logic instructions in the memory 603 are implemented as software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Further, an embodiment of the present invention discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can perform the methods provided by the above method embodiments, for example: for any identity information recognized from the frames of a video through face recognition, acquiring the first images in the video in which the identity information was not recognized through face recognition; and recognizing, from the first images and through an image tracking algorithm, the target image region corresponding to the identity information, according to the face recognition region corresponding to the identity information in a second image; wherein the second image is an image in the video in which the identity information was successfully recognized through face recognition.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the method provided by the above embodiments, for example: for any identity information recognized from the frames of a video through face recognition, acquiring the first images in the video in which the identity information was not recognized through face recognition; and recognizing, from the first images and through an image tracking algorithm, the target image region corresponding to the identity information, according to the face recognition region corresponding to the identity information in a second image; wherein the second image is an image in the video in which the identity information was successfully recognized through face recognition.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by software together with a necessary general-purpose hardware platform, and of course also by hardware. On this understanding, the above technical solutions, in essence, or the parts that contribute to the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.