
CN112084911A - Human face feature point positioning method and system based on global attention - Google Patents

  • Published: Tue Dec 15 2020



Info

Publication number: CN112084911A (granted as CN112084911B)
Application number: CN202010886980.5A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: layer, human face, feature point, face, global
Prior art date: 2020-08-28
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 张卡, 何佳, 戴亮亮, 尼秀明
Current and original assignee: Anhui Qingxin Internet Information Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Priority date: 2020-08-28 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2020-08-28
Publication date: 2020-12-15

Timeline:

  • 2020-08-28: Application filed by Anhui Qingxin Internet Information Technology Co ltd; priority to CN202010886980.5A
  • 2020-12-15: Publication of CN112084911A
  • 2023-03-07: Application granted; publication of CN112084911B
  • 2040-08-28: Anticipated expiration


Classifications

  • G06V40/161 (Human faces: detection; localisation; normalisation)
  • G06F18/214 (Generating training patterns; bootstrap methods, e.g. bagging or boosting)
  • G06N3/045 (Neural networks: combinations of networks)
  • G06N3/048 (Neural networks: activation functions)
  • G06N3/08 (Neural networks: learning methods)
  • G06V40/168 (Human faces: feature extraction; face representation)


Abstract

The invention discloses a human face feature point positioning method and system based on global attention. The method comprises the following steps: acquiring a local image of a human face; inputting the acquired image into a pre-trained global-attention-based face feature point positioning model, which, after a forward pass over the input face local image, directly outputs the positions of the face feature points. Based on deep learning, the method and system use a residual network mechanism together with a global attention fusion mechanism to extract fused features of the face image that carry both global and local semantic information, so that the deep neural network model can take both the global and the local information of the face image into account and accurately calculate the positions of the face feature points, making the positioning more accurate and more robust.

Description

Human face feature point positioning method and system based on global attention

Technical Field

The invention relates to the technical field of face recognition, in particular to a face feature point positioning method and system based on global attention.

Background

Face feature point positioning refers to locating, by machine vision, the positions of key facial feature points in a face image; these key points include organ positions such as the mouth corners, eye corners and nose tip, as well as the face contour. Face feature point positioning is the technical basis of application fields such as face recognition systems, expression recognition systems and face attribute analysis systems, and its quality directly affects the reliability and accuracy of this subsequent work.

Over the past 20 years, face feature point positioning has remained a research hotspot in machine vision, and many classical algorithms have emerged. These algorithms fall into the following categories:

(1) Face feature point positioning algorithms based on traditional techniques. These are mainly statistical face-shape-model methods and cascaded regression methods, such as the classical algorithms ASM, AAM, SDM and LBF. They exploit the geometric relations between facial organs and obtain the final feature point positions through statistical methods and cascaded optimization. Their capacity for expressing face features is limited, however, and they do not consider the shape constraints between feature points, so their positioning errors are large.

(2) Face feature point positioning algorithms based on deep learning. In recent years, deep learning, which loosely mimics the neural networks of the human brain and can perform accurate nonlinear prediction, has attracted wide attention and application in many fields, and a group of classical face feature point positioning network frameworks has appeared, such as MDM (Mnemonic Descent Method), PFLD (PFLD: A Practical Facial Landmark Detector) and TCDCN (Facial Landmark Detection by Deep Multi-task Learning). These algorithms capture deep semantic features of the face with a convolutional neural network model and obtain the final feature point positions from those features, either through multi-branch (multi-task) training or through iterative optimization with several cascaded neural network models. Compared with the traditional algorithms they greatly improve positioning accuracy, but they rely mainly on local semantic information of the face and cannot fully exploit the overall geometric information of the facial organs, so some positioning error remains.

Disclosure of Invention

The invention provides a human face feature point positioning method and system based on global attention, which can solve the problems described in the background section.

In order to achieve the purpose, the invention adopts the following technical scheme:

a face feature point positioning method based on global attention comprises the following steps:

acquiring a local image of a human face;

inputting the acquired image into a human face feature point positioning model based on global attention trained in advance, and directly outputting the position of a human face feature point after the forward operation of the human face feature point positioning model based on the global attention for the input human face local image;

wherein,

the network structure of the human face feature point positioning model based on global attention comprises:

the conv0 layer is a convolutional layer with a kernel size of 7x7 and a span of 2x2;

the maxpool0 layer is a maximum pooling layer with a kernel size of 2x2 and a span of 2x2;

the conv0 layer and the maxpool0 layer together form a feature map resolution rapid-reduction network;

resblock0, resblock1, resblock2 and resblock3 are resblock residual modules of a resnet network;

GFM0, GFM1, GFM2 and GFM3 are all global attention fusion modules;

the ave-pool layer is a global mean pooling layer; the fc layer is a fully connected layer whose output feature is 2xN-dimensional, where N is the number of face feature points.
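The effect of the feature map resolution rapid-reduction network can be checked with simple output-size arithmetic. The sketch below is illustrative only and assumes a padding of 3 for the 7x7 conv0 layer (the usual resnet choice; the patent does not state the padding):

```python
def out_size(size, kernel, stride, pad):
    # Output spatial size of a convolution or pooling layer (floor convention).
    return (size + 2 * pad - kernel) // stride + 1

h = out_size(224, kernel=7, stride=2, pad=3)  # conv0: 224 -> 112
h = out_size(h, kernel=2, stride=2, pad=0)    # maxpool0: 112 -> 56
```

Under this assumption the two layers together cut the 224 x 224 input to a 56 x 56 feature map, a 16-fold reduction in pixel count, before the residual stages run.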

Further, the specific network structure of the resblock residual module includes:

the rconv2 layer is a convolutional layer with a kernel size of 1x1 and a span of 2x2, the rconv0 layer is a convolutional layer with a kernel size of 3x3 and a span of 2x2, the rconv1, rconv3 and rconv4 layers are convolutional layers with a kernel size of 3x3 and a span of 1x1, and the eltsum0 and eltsum1 layers are merging layers that combine several input feature maps into one output feature map by element-wise addition.
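The element-wise merge performed by the eltsum layers can be sketched in plain Python (an illustration only; the shapes and values are invented):

```python
def eltsum(*feature_maps):
    # Merge equally-sized 2-D feature maps by element-wise addition,
    # as the eltsum0/eltsum1 layers do for the residual shortcut.
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(f[r][c] for f in feature_maps) for c in range(cols)]
            for r in range(rows)]

shortcut = [[1.0, 2.0], [3.0, 4.0]]   # identity / shortcut branch
residual = [[0.5, 0.5], [0.5, 0.5]]   # convolutional branch output
merged = eltsum(shortcut, residual)   # [[1.5, 2.5], [3.5, 4.5]]
```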

Further, the specific network structure of the global attention fusion module includes:

gfmconv0, gfmconv1 and gfmconv2 are convolutional layers with a kernel size of 1x1 and a span of 1x1; reshape0, reshape1, reshape2 and reshape3 are feature-size conversion layers that adjust the size of an input feature to meet the requirements of subsequent feature-layer operations;

globalavgpool0 is a global mean pooling layer over the feature map channel dimension, and globalmaxpool0 is a global maximum pooling layer over the feature map channel dimension; their output feature maps are concatenated along the channel dimension; gfmconv is a convolutional layer with a kernel size of 7x7 and a span of 1x1 that extracts an importance weight for each pixel position of the input feature map;

the sigmoid layer is a sigmoid-type activation function; the scale layer is a pixel weighting layer that weights the input feature map pixel by pixel, the weighting calculation being given by formula (1); globalavgpool0, globalmaxpool0, gfmconv, sigmoid and scale together form a spatial attention mechanism module; the softmax layer performs a softmax-type activation along the 2nd dimension of the input feature map to obtain a probability distribution over the input feature vector;

matmul0 and matmul1 are feature map multiplication layers that follow the general matrix multiplication rule; matsum is a feature map addition layer that combines two input feature maps into one output feature map by element-wise addition;

O_c(x, y) = w(x, y) * I_c(x, y)    (1)

where O_c(x, y) is the value at position (x, y) of the c-th channel of the output weighted feature map, w(x, y) is the importance weight at position (x, y) of the input feature map, and I_c(x, y) is the value at position (x, y) of the c-th channel of the input feature map.

Further, the training step of the human face feature point location model based on global attention is as follows:

S21, acquiring training sample images: collect face images in various scenes, under various lighting conditions and at various angles, obtain the local region image of each face with an existing face detection algorithm, then label the positions of the N feature points on each face local image and record the feature point position information;

S22, designing a target loss function of the deep neural network model;

and S23, training the deep neural network model: feed the labeled face sample image set into the designed deep neural network model and learn the relevant model parameters.

Further, the target loss function is a mean square error (MSE) loss function.
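A minimal sketch of the mean square error loss over the flat 2N-dimensional landmark vector (the coordinate values below are invented for illustration):

```python
def mse_loss(predicted, target):
    # Mean square error over the flat vector of 2N landmark coordinates.
    assert len(predicted) == len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

# N = 2 feature points -> a 2N = 4 dimensional vector (x0, y0, x1, y1)
loss = mse_loss([0.5, 0.4, 0.52, 0.61], [0.5, 0.4, 0.5, 0.6])
```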

In another aspect, the present invention provides a system for locating facial feature points based on global attention, including the following units:

the image acquisition unit is used for acquiring a local image of a human face;

and the face feature point positioning unit is used for inputting the acquired image into a pre-trained face feature point positioning model based on global attention, and directly outputting the position of the face feature point after the forward operation of the face feature point positioning model based on the global attention on the input face local image.

Further, the system also comprises the following sub-units:

the training sample acquisition unit, used for acquiring training sample images: collecting face images in various scenes, under various lighting conditions and at various angles, obtaining the local region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each face local image and recording the feature point position information;

the target loss function design unit is used for designing a target loss function of the deep neural network model;

and the deep neural network model training unit is used for sending the labeled human face sample image set into the well-defined deep neural network model and learning related model parameters.

According to the technical scheme above, the global-attention-based face feature point positioning method and system use, on the basis of deep learning, a residual network mechanism and a global attention fusion mechanism to extract fused features of the face image carrying both global and local semantic information, so that the deep neural network model can take both the global and the local information of the face image into account, accurately calculate the positions of the face feature points, position them more accurately, and be more robust.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a model building flow diagram of the present invention;

FIG. 3 is a diagram of the deep neural network model architecture;

FIG. 4 is a block diagram of a resblock residual module;

FIG. 5 is a block diagram of a global attention fusion module;

wherein the alphanumeric label next to each module graphic gives the output feature map size of that module, namely: feature map height x feature map width x number of feature map channels.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.

As shown in FIG. 1, the global-attention-based human face feature point positioning method of this embodiment is carried out on the premise that a single face local image has already been acquired, and comprises the following steps:

acquiring a local image of a human face;

inputting the acquired image into a human face feature point positioning model based on global attention trained in advance, and directly outputting the position of a human face feature point after the forward operation of the human face feature point positioning model based on the global attention for the input human face local image;

the following is a detailed description:

as shown in fig. 2: the construction steps of the human face feature point positioning model trained in advance based on the global attention are as follows:

S1, designing the deep neural network model. The function of the deep neural network model designed by the invention is to fuse the local semantic information and the global semantic information of the face image by means of a carefully designed network, extract the fused features of the face image, and accurately calculate the positions of the face feature points. The invention uses a convolutional neural network (CNN). For convenience of description, some terms are defined: feature resolution refers to feature height x feature width; feature size refers to feature height x feature width x number of feature channels; kernel size refers to kernel width x kernel height; and span refers to width span x height span. Every convolutional layer is followed by a batch normalization layer and a nonlinear activation layer. The design of the deep neural network model specifically comprises the following steps:

S11, designing the input image of the deep neural network model. The input image adopted by the invention is a 3-channel RGB image with a resolution of 224 x 224; the larger the input image, the more detail it contains, which favours accurate positioning of the face feature points.

S12, designing the main (backbone) network of the deep neural network model. The backbone acquires the local semantic information of the face image while the model can still see its global semantic information, extracting fused features that carry both; the quality of this fused feature extraction directly affects the accuracy of the subsequent feature point positioning. As noted in step S11, the input image used by the invention is large, which works against fast operation of the model, so an efficient network that can quickly extract features from the input face image is required. As shown in FIG. 3, the invention adopts an improved classical resnet structure as the backbone, wherein the conv0 layer is a convolutional layer with a kernel size of 7x7 and a span of 2x2, and the maxpool0 layer is a maximum pooling layer with a kernel size of 2x2 and a span of 2x2; together they form a feature map resolution rapid-reduction network whose main function is to quickly lower the feature map resolution and cut the computation of subsequent operations while keeping most image details. resblock0, resblock1, resblock2 and resblock3 are resblock residual modules of a resnet network; GFM0, GFM1, GFM2 and GFM3 are Global Fuse Modules (global attention fusion modules); the ave-pool layer is a global mean pooling layer; and the fc layer is a fully connected layer whose output feature is 2xN-dimensional, where N is the number of face feature points.

The specific network structure of the resblock residual module is shown in FIG. 4: the rconv2 layer is a convolutional layer with a kernel size of 1x1 and a span of 2x2; the rconv0 layer is a convolutional layer with a kernel size of 3x3 and a span of 2x2; the rconv1, rconv3 and rconv4 layers are convolutional layers with a kernel size of 3x3 and a span of 1x1; and the eltsum0 and eltsum1 layers are merging layers that combine several input feature maps into one output feature map by element-wise addition.

The specific network structure of the global attention fusion module GFM is shown in FIG. 5: gfmconv0, gfmconv1 and gfmconv2 are convolutional layers with a kernel size of 1x1 and a span of 1x1; reshape0, reshape1, reshape2 and reshape3 are feature-size conversion layers whose main function is to adjust the input feature size to meet the requirements of subsequent feature-layer operations; globalavgpool0 is a global mean pooling layer over the feature map channel dimension, and globalmaxpool0 is a global maximum pooling layer over the feature map channel dimension, their output feature maps being concatenated along the channel dimension; gfmconv is a convolutional layer with a kernel size of 7x7 and a span of 1x1 whose main function is to extract an importance weight for each pixel position of the input feature map; the sigmoid layer is a sigmoid-type activation function; the scale layer is a pixel weighting layer that weights the input feature map pixel by pixel, the weighting calculation being given by formula (1), and globalavgpool0, globalmaxpool0, gfmconv, sigmoid and scale together form a spatial attention mechanism module; the softmax layer mainly performs a softmax-type activation along the 2nd dimension of the input feature map to obtain a probability distribution over the input feature vector; matmul0 and matmul1 are feature map multiplication layers that follow the general matrix multiplication rule; and matsum is a feature map addition layer mainly used to combine two input feature maps into one output feature map by element-wise addition.

O_c(x, y) = w(x, y) * I_c(x, y)    (1)

where O_c(x, y) is the value at position (x, y) of the c-th channel of the output weighted feature map, w(x, y) is the importance weight at position (x, y) of the input feature map, and I_c(x, y) is the value at position (x, y) of the c-th channel of the input feature map.

S2, training the deep neural network model: the parameters of the deep neural network model are optimized on a large amount of labeled training sample data so that the model can accurately locate the positions of the face feature points. The specific steps are as follows:

S21, acquiring training sample images: mainly collecting face images in various scenes, under various lighting conditions and at various angles, obtaining the local region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each face local image and recording the feature point position information;

s22, designing a target loss function of the deep neural network model, wherein the target loss function is a Mean Square Error (MSE) loss function.

S23, training a deep neural network model, mainly sending a labeled face sample image set into the well-defined deep neural network model, and learning related model parameters;

and S3, using the deep neural network model: for any given face local image, the positions of the face feature points are output directly after the forward operation of the deep neural network model.
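The fc layer emits a flat 2N-dimensional vector; the sketch below turns it back into N (x, y) pixel points, under the assumption (not stated in the patent) that the outputs are normalized to [0, 1] relative to the input face crop:

```python
def decode_landmarks(fc_output, width, height):
    # Pair up the flat 2N vector and scale each pair to pixel coordinates.
    assert len(fc_output) % 2 == 0
    return [(fc_output[i] * width, fc_output[i + 1] * height)
            for i in range(0, len(fc_output), 2)]

# N = 2 feature points on a 224 x 224 face crop
points = decode_landmarks([0.25, 0.5, 0.75, 0.5], 224, 224)
```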

In summary, the global-attention-based face feature point positioning method and system of the present invention use, on the basis of deep learning, a residual network mechanism and a global attention fusion mechanism to extract fused features of the face image carrying both global and local semantic information, so that the deep neural network model can take both the global and the local information of the face image into account, accurately calculate the positions of the face feature points, position them more accurately, and be more robust.

In another aspect, the present invention provides a system for locating facial feature points based on global attention, including the following units:

the image acquisition unit is used for acquiring a local image of a human face;

and the face feature point positioning unit is used for inputting the acquired image into a pre-trained face feature point positioning model based on global attention, and directly outputting the position of the face feature point after the forward operation of the face feature point positioning model based on the global attention on the input face local image.

Further, the system also comprises the following sub-units:

the training sample acquisition unit, used for acquiring training sample images: collecting face images in various scenes, under various lighting conditions and at various angles, obtaining the local region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each face local image and recording the feature point position information;

the target loss function design unit is used for designing a target loss function of the deep neural network model;

and the deep neural network model training unit is used for sending the labeled human face sample image set into the well-defined deep neural network model and learning related model parameters.

In a third aspect, the present invention also discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.

It is understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention; for the explanation, examples and beneficial effects of the related contents, reference may be made to the corresponding parts of the method.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A human face feature point positioning method based on global attention is characterized by comprising the following steps:

acquiring a local image of a human face;

inputting the acquired image into a pre-trained global-attention-based human face feature point positioning model, which, after performing a forward pass on the input human face local image, directly outputs the positions of the human face feature points;

wherein,

the network structure of the human face feature point positioning model based on global attention comprises:

the conv0 layer is a convolutional layer with a kernel size of 7 × 7 and a stride of 2 × 2;

the maxpool0 layer is a maximum pooling layer with a kernel size of 2 × 2 and a stride of 2 × 2;

the conv0 layer and the maxpool0 layer jointly form a feature map resolution rapid reduction network;

resblock0, resblock1, resblock2 and resblock3 are resblock residual modules of a resnet network;

GFM0, GFM1, GFM2, GFM3 are all global attention fusion modules;

the ave-pool layer is a global average pooling layer; the fc layer is a fully connected layer whose output feature is 2 × N-dimensional, where N represents the number of human face feature points.
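As an illustrative sketch outside the patent text, the resolution reduction performed by the network can be traced with the standard convolution output-size formula. The input resolution of 112 × 112, the padding values, and the assumption that each resblock halves the resolution are illustrative choices, not taken from the claims.

```python
# Illustrative sketch (assumptions noted below, not from the patent text):
# trace the feature map resolution through the rapid-reduction front end
# (conv0 + maxpool0) and four assumed stride-2 resblocks.

def conv_out(size, kernel, stride, pad):
    """Standard convolution/pooling output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

size = 112                                          # assumed input resolution
size = conv_out(size, kernel=7, stride=2, pad=3)    # conv0: 7x7, stride 2 -> 56
size = conv_out(size, kernel=2, stride=2, pad=0)    # maxpool0: 2x2, stride 2 -> 28
for _ in range(4):                                  # resblock0..3, assumed stride 2
    size = conv_out(size, kernel=3, stride=2, pad=1)
print(size)  # spatial size entering the ave-pool layer
```

Under these assumptions the ave-pool layer then collapses the remaining spatial grid to 1 × 1 before the fc layer emits the 2 × N coordinates.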

2. The global attention-based human face feature point positioning method according to claim 1, characterized in that: the specific network structure of the resblock residual module comprises:

the rconv2 layer is a convolutional layer with a kernel size of 1x1 and a stride of 2x2, the rconv0 layer is a convolutional layer with a kernel size of 3x3 and a stride of 2x2, the rconv1, rconv3 and rconv4 layers are convolutional layers with a kernel size of 3x3 and a stride of 1x1, and the eltsum0 and eltsum1 layers are merging layers used for merging multiple input feature maps into one output feature map by corresponding-element addition.
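The eltsum merge described above is a plain corresponding-element addition of feature maps, which can be sketched as follows (the shapes are illustrative assumptions, not taken from the claims):

```python
# Illustrative sketch (shapes assumed): the eltsum layers of the resblock
# combine two branches by corresponding-element addition.
import numpy as np

branch_a = np.random.rand(8, 14, 14)   # e.g. shortcut path (rconv2 output)
branch_b = np.random.rand(8, 14, 14)   # e.g. stacked 3x3 convolution path

merged = branch_a + branch_b           # eltsum: element-wise addition
```

Both branches must have identical shapes for the merge to be well defined, which is why the shortcut path uses a stride-2 1x1 convolution when the main path downsamples.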

3. The global attention-based human face feature point positioning method according to claim 2, wherein: the specific network structure of the global attention fusion module comprises:

gfmconv0, gfmconv1 and gfmconv2 are convolutional layers with a kernel size of 1 × 1 and a stride of 1 × 1; reshape0, reshape1, reshape2 and reshape3 are feature size conversion layers used for adjusting the size of the input feature to meet the requirements of subsequent feature layer operations;

globalavgpool0 is a global average pooling layer over the feature map channel dimension, and globalmaxpool0 is a global maximum pooling layer over the feature map channel dimension; the output feature maps of the globalavgpool0 layer and the globalmaxpool0 layer are concatenated along the channel dimension; gfmconv is a convolutional layer with a kernel size of 7 × 7 and a stride of 1 × 1, used for extracting an importance weight for each pixel position of the input feature map;

the sigmoid layer is a sigmoid-type activation function; the scale layer is a pixel weighting layer used for weighting the input feature map pixel by pixel, where the weighting calculation is given by formula (1); globalavgpool0, globalmaxpool0, gfmconv, sigmoid and scale together form a spatial attention mechanism module; the softmax layer performs a softmax-type activation along the 2nd dimension of the input feature map to obtain a probability distribution over the input feature vector;

matmul0 and matmul1 are both feature map multiplication layers and follow the general matrix multiplication rule; matsum is a feature map addition layer used for merging two input feature maps into one output feature map by corresponding-element addition;

O_c(x, y) = w(x, y) × I_c(x, y)    (1)

wherein O_c(x, y) represents the value at position (x, y) of the c-th channel of the output weighted feature map, w(x, y) represents the importance weight at position (x, y) of the input feature map, and I_c(x, y) represents the value at position (x, y) of the c-th channel of the input feature map.
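Formula (1) shares one importance weight per pixel position across all channels, which can be sketched with NumPy broadcasting (the shapes below are illustrative assumptions, not taken from the patent):

```python
# Illustrative sketch (shapes assumed): formula (1),
# O_c(x, y) = w(x, y) * I_c(x, y), via NumPy broadcasting.
import numpy as np

C, H, W = 3, 4, 4                       # channels, height, width (assumed)
feat = np.random.rand(C, H, W)          # input feature map I_c(x, y)
weight = np.random.rand(H, W)           # importance weight w(x, y), one per pixel

# Broadcasting the single-channel weight map across all C channels
# weights every channel identically at each pixel position.
out = weight[None, :, :] * feat
```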

4. The global attention-based human face feature point positioning method according to claim 1, characterized in that:

the training steps of the human face feature point positioning model based on the global attention are as follows:

s21, acquiring training sample images, namely collecting face images under various scenes, various light rays and various angles, acquiring a local area image of each face through the existing face detection algorithm, then labeling the positions of N feature points on each face local image, and recording the position information of the feature points;

s22, designing a target loss function of the deep neural network model;

and S23, training the deep neural network model, that is, feeding the labeled face sample image set into the defined deep neural network model and learning the relevant model parameters.

5. The global attention-based human face feature point positioning method according to claim 4, wherein: the target loss function adopts a mean square error loss function.
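As an illustrative sketch, the mean square error loss over the 2 × N output coordinates can be written as follows (the point count N = 5 and the coordinate values are assumed for the example, not taken from the patent):

```python
# Illustrative sketch (values assumed): mean square error between the
# fc layer's 2N predicted coordinates and the labeled coordinates.
import numpy as np

N = 5
pred = np.random.rand(2 * N)           # predicted (x1, y1, ..., xN, yN)
label = np.random.rand(2 * N)          # annotated feature point positions

mse = np.mean((pred - label) ** 2)     # mean square error loss
```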

6. A human face feature point positioning system based on global attention, characterized in that the system comprises the following units: the image acquisition unit is used for acquiring a local image of a human face;

and the face feature point positioning unit is used for inputting the acquired image into a pre-trained global-attention-based face feature point positioning model, which, after performing a forward pass on the input face local image, directly outputs the positions of the face feature points.

7. The global attention-based human face feature point positioning system according to claim 6, characterized in that it also comprises the following sub-units,

the training sample acquisition unit is used for acquiring training sample images, that is, collecting face images under various scenes, lighting conditions and angles, obtaining a local area image of each face with an existing face detection algorithm, then labeling the positions of N feature points on each face local image, and recording the position information of these feature points;

the target loss function design unit is used for designing a target loss function of the deep neural network model;

and the deep neural network model training unit is used for feeding the labeled face sample image set into the defined deep neural network model and learning the relevant model parameters.

CN202010886980.5A 2020-08-28 2020-08-28 Human face feature point positioning method and system based on global attention Active CN112084911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010886980.5A CN112084911B (en) 2020-08-28 2020-08-28 Human face feature point positioning method and system based on global attention

Publications (2)

Publication Number Publication Date
CN112084911A true CN112084911A (en) 2020-12-15
CN112084911B CN112084911B (en) 2023-03-07

Family

ID=73728873

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268885A (en) * 2017-01-03 2018-07-10 京东方科技集团股份有限公司 Feature point detecting method, equipment and computer readable storage medium
CN109872306A (en) * 2019-01-28 2019-06-11 腾讯科技(深圳)有限公司 Medical image cutting method, device and storage medium
CN110287846A (en) * 2019-06-19 2019-09-27 南京云智控产业技术研究院有限公司 A face key point detection method based on attention mechanism
CN110287857A (en) * 2019-06-20 2019-09-27 厦门美图之家科技有限公司 A kind of training method of characteristic point detection model
KR20190113119A (en) * 2018-03-27 2019-10-08 삼성전자주식회사 Method of calculating attention for convolutional neural network
CN110427867A (en) * 2019-07-30 2019-11-08 华中科技大学 Human facial expression recognition method and system based on residual error attention mechanism
CN110610129A (en) * 2019-08-05 2019-12-24 华中科技大学 A deep learning face recognition system and method based on self-attention mechanism
CN110675406A (en) * 2019-09-16 2020-01-10 南京信息工程大学 CT image kidney segmentation algorithm based on residual double-attention depth network
CN110728312A (en) * 2019-09-29 2020-01-24 浙江大学 A dry eye classification system based on regional adaptive attention network
US20200151424A1 (en) * 2018-11-09 2020-05-14 Sap Se Landmark-free face attribute prediction
CN111160085A (en) * 2019-11-19 2020-05-15 天津中科智能识别产业技术研究院有限公司 Human body image key point posture estimation method
CN111274977A (en) * 2020-01-22 2020-06-12 中能国际建筑投资集团有限公司 Multitask convolution neural network model, using method, device and storage medium
CN111325751A (en) * 2020-03-18 2020-06-23 重庆理工大学 CT image segmentation system based on attention convolution neural network
CN111476184A (en) * 2020-04-13 2020-07-31 河南理工大学 A Human Keypoint Detection Method Based on Dual Attention Mechanism

Non-Patent Citations (4)

Title
AMIT KUMAR等: "KEPLER: Simultaneous estimation of keypoints and 3D pose of unconstrained faces in a unified framework by learning efficient H-CNN regressors", 《IMAGE AND VISION COMPUTING》 *
ZHEN QIN等: "SRPRID: Pedestrian Re-Identification Based on Super-Resolution Images", 《IEEE ACCESS》 *
曾家建: "基于深度学习的人脸关键点检测和人脸属性分析", 《中国优秀硕士论文全文数据库信息科技辑》 *
秦晓飞等: "基于注意力模型的人脸关键点检测算法", 《光学仪器》 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN112084912A (en) * 2020-08-28 2020-12-15 安徽清新互联信息科技有限公司 Face feature point positioning method and system based on self-adaptive information enhancement
CN112084912B (en) * 2020-08-28 2024-08-20 安徽清新互联信息科技有限公司 Face feature point positioning method and system based on self-adaptive information enhancement
CN112488049A (en) * 2020-12-16 2021-03-12 哈尔滨市科佳通用机电股份有限公司 Fault identification method for foreign matter clamped between traction motor and shaft of motor train unit
WO2022151535A1 (en) * 2021-01-15 2022-07-21 苏州大学 Deep learning-based face feature point detection method
CN113065402A (en) * 2021-03-05 2021-07-02 四川翼飞视科技有限公司 Face detection method based on deformed attention mechanism
CN113065402B (en) * 2021-03-05 2022-12-09 四川翼飞视科技有限公司 Face detection method based on deformation attention mechanism
CN114743277A (en) * 2022-04-22 2022-07-12 南京亚信软件有限公司 Liveness detection method, device, electronic device, storage medium and program product

Legal Events

Date Code Title Description
2020-12-15 PB01 Publication
2021-01-01 SE01 Entry into force of request for substantive examination
2023-03-07 GR01 Patent grant
2023-08-29 PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method and system for facial feature point localization based on global attention

Effective date of registration: 20230811

Granted publication date: 20230307

Pledgee: Anhui pilot Free Trade Zone Hefei area sub branch of Huishang Bank Co.,Ltd.

Pledgor: ANHUI QINGXIN INTERNET INFORMATION TECHNOLOGY Co.,Ltd.

Registration number: Y2023980051775

2024-09-13 PC01 Cancellation of the registration of the contract for pledge of patent right
