CN112084911A - Human face feature point positioning method and system based on global attention - Google Patents
Info
- Publication number: CN112084911A
- Application number: CN202010886980.5A
- Authority: CN (China)
- Prior art keywords: layer, human face, feature point, face, global
- Prior art date: 2020-08-28
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The invention discloses a method and system for locating facial feature points based on global attention, the method comprising: acquiring a partial image of a face; and inputting the acquired image into a pre-trained global-attention-based facial feature point location model, which directly outputs the positions of the facial feature points after a forward pass over the input partial face image. Based on deep learning, the method and system use a residual network mechanism and a global attention fusion mechanism to extract fused features of the face image that carry both global and local semantic information, so that the deep neural network model can take account of both the global and the local information of the face image, accurately compute the positions of the facial feature points, and locate them more accurately and more robustly.
Description
Technical Field
The invention relates to the technical field of face recognition, and in particular to a method and system for locating facial feature points based on global attention.
Background
Facial feature point location refers to locating the key feature points of the face in a face image by machine vision techniques; the key feature points include organ positions such as the mouth corners, eye corners and nose tip, as well as positions such as the face contour. Facial feature point location is the technical basis of application fields such as face recognition systems, expression recognition systems and face attribute analysis systems, and its quality directly affects the reliability and accuracy of subsequent work.
Over the past twenty years, facial feature point location algorithms have been a research hotspot in the machine vision field, and many classical algorithms have emerged. The specific algorithms can be divided into the following categories:
(1) Facial feature point location algorithms based on traditional techniques. These are mainly statistical face shape model methods and cascaded regression methods, with classical algorithms such as ASM, AAM, SDM and LBF. They exploit the geometric position relations of the facial organs and obtain the final positions of the facial feature points by statistical methods and cascaded optimization. Their capacity to express the extracted facial features is limited and they do not consider the shape constraints between the facial feature points, so their feature point location error is large.
(2) Facial feature point location algorithms based on deep learning. In recent years, deep learning, which can emulate the neural networks of the human brain and make accurate nonlinear predictions, has attracted wide attention and application in many fields, and a group of classical facial feature point location network frameworks has appeared, such as MDM (Mnemonic Descent Method), PFLD (PFLD: A Practical Facial Landmark Detector) and TCDCN (Facial Landmark Detection by Deep Multi-task Learning). These algorithms use a convolutional neural network model to capture deep semantic features of the face and obtain the final positions of the facial feature points from those features, either through a multi-branch task training mode or through iterative optimization training with several cascaded neural network models. Compared with the traditional algorithms, they greatly improve the location accuracy, but they mainly use the local semantic information of the face to locate the feature points and cannot comprehensively exploit the overall geometric information of the facial organs, so certain location errors remain.
Disclosure of Invention
The invention provides a method and system for locating facial feature points based on global attention, which can solve the problems described in the background.
In order to achieve the purpose, the invention adopts the following technical scheme:
A method for locating facial feature points based on global attention comprises the following steps:
acquiring a partial image of a face;
inputting the acquired image into a pre-trained global-attention-based facial feature point location model, which directly outputs the positions of the facial feature points after a forward pass over the input partial face image;
wherein,
the network structure of the global-attention-based facial feature point location model comprises:
the conv0 layer is a convolutional layer with a kernel size of 7 × 7 and a span of 2 × 2;
the maxpool0 layer is a maximum pooling layer with a kernel size of 2 × 2 and a span of 2 × 2;
the conv0 layer and the maxpool0 layer together form a network for rapidly reducing the feature map resolution;
resblock0, resblock1, resblock2 and resblock3 are resblock residual modules of a resnet network;
GFM0, GFM1, GFM2, GFM3 are all global attention fusion modules;
the ave-pool layer is a global mean pooling layer; the fc layer is a fully connected layer whose output feature is 2×N-dimensional, where N represents the number of facial feature points. A code sketch of this backbone is given below.
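For illustration, a minimal PyTorch sketch of a backbone with this shape follows. The class name, channel widths, padding values and the factories passed in for the residual and global attention fusion modules are assumptions made for this example; the patent fixes only the layer types, kernel sizes and spans listed above.

```python
import torch.nn as nn

class GlobalAttentionLandmarkNet(nn.Module):
    """Sketch of the described backbone: a rapid resolution-reduction
    stem (conv0 + maxpool0), residual stages each followed by a global
    attention fusion module, global mean pooling, and a fully connected
    head producing 2N coordinates."""

    def __init__(self, num_points: int, res_block, gfm_block,
                 widths=(64, 128, 256, 512)):
        super().__init__()
        # conv0: 7x7 convolution, span 2x2 (padding is an assumption)
        self.conv0 = nn.Sequential(
            nn.Conv2d(3, widths[0], 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(widths[0]),
            nn.ReLU(inplace=True))
        # maxpool0: 2x2 max pooling, span 2x2
        self.maxpool0 = nn.MaxPool2d(2, stride=2)
        # resblock0..3 interleaved with GFM0..3
        stages, in_c = [], widths[0]
        for c in widths:
            stages += [res_block(in_c, c), gfm_block(c)]
            in_c = c
        self.stages = nn.Sequential(*stages)
        self.ave_pool = nn.AdaptiveAvgPool2d(1)   # global mean pooling
        self.fc = nn.Linear(widths[-1], 2 * num_points)

    def forward(self, x):                # x: (batch, 3, 224, 224)
        x = self.maxpool0(self.conv0(x))
        x = self.stages(x)
        x = self.ave_pool(x).flatten(1)
        return self.fc(x)                # (batch, 2N) landmark coordinates
```

With ResBlock and GFM implemented as sketched in the following sections, `GlobalAttentionLandmarkNet(68, ResBlock, GFM)` would map a (1, 3, 224, 224) batch to 136 values, i.e. N = 68 (x, y) pairs; the 68-point scheme is only an assumed example.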
Further, the specific network structure of the resblock residual module includes:
the rconv2 layer is a convolutional layer with a kernel size of 1×1 and a span of 2×2, the rconv0 layer is a convolutional layer with a kernel size of 3×3 and a span of 2×2, the rconv1, rconv3 and rconv4 layers are convolutional layers with a kernel size of 3×3 and a span of 1×1, and the eltsum0 and eltsum1 layers are merging layers used to merge several input feature maps into one output feature map by element-wise addition.
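A hedged PyTorch sketch of a residual module with this layout follows; the layer names mirror the patent's, while the padding and the placement of batch normalization and ReLU are assumptions.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Sketch of the described residual module: rconv0 (3x3, span 2x2)
    followed by rconv1 (3x3, span 1x1), with a 1x1 span-2x2 shortcut
    rconv2 merged by eltsum0; then rconv3 and rconv4 (both 3x3, span
    1x1) with an identity shortcut merged by eltsum1."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()

        def conv_bn_relu(cin, cout, kernel, stride):
            # each convolution is followed by batch normalization and a
            # nonlinear activation, as stated in the description
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel, stride=stride,
                          padding=kernel // 2, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True))

        self.rconv0 = conv_bn_relu(in_channels, out_channels, 3, 2)
        self.rconv1 = conv_bn_relu(out_channels, out_channels, 3, 1)
        self.rconv2 = conv_bn_relu(in_channels, out_channels, 1, 2)
        self.rconv3 = conv_bn_relu(out_channels, out_channels, 3, 1)
        self.rconv4 = conv_bn_relu(out_channels, out_channels, 3, 1)

    def forward(self, x):
        y = self.rconv1(self.rconv0(x)) + self.rconv2(x)   # eltsum0
        return self.rconv4(self.rconv3(y)) + y             # eltsum1
```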
Further, the specific network structure of the global attention fusion module includes:
gfmconv0, gfmconv1 and gfmconv2 are convolutional layers with a kernel size of 1 × 1 and a span of 1 × 1, and reshape0, reshape1, reshape2 and reshape3 are feature size conversion layers used to adjust the size of an input feature to meet the requirements of subsequent feature layer operations;
globalavgpool0 is a global mean pooling layer over the feature map channel dimension, and globalmaxpool0 is a global maximum pooling layer over the feature map channel dimension; the output feature maps of the globalavgpool0 layer and the globalmaxpool0 layer are concatenated along the channel dimension; gfmconv is a convolutional layer with a kernel size of 7 × 7 and a span of 1 × 1, used to extract an importance weight for each pixel position of the input feature map;
the sigmoid layer is a sigmoid-type activation function; the scale layer is a pixel weighting layer used to weight the input feature map pixel by pixel, the weighting calculation being given by formula (1); globalavgpool0, globalmaxpool0, gfmconv, sigmoid and scale together form a spatial attention mechanism module; the softmax layer performs a softmax-type activation over the 2nd dimension of the input feature map to obtain a probability distribution over the input feature vector;
matmul0 and matmul1 are both feature map multiplication layers and follow the general matrix multiplication rule; matsum is a feature map addition layer used to merge two input feature maps into one output feature map by element-wise addition;
O_c(x, y) = w(x, y) · I_c(x, y)    (1)
where O_c(x, y) denotes the value at position (x, y) of the c-th channel of the output weighted feature map, w(x, y) denotes the importance weight at position (x, y) of the input feature map, and I_c(x, y) denotes the value at position (x, y) of the c-th channel of the input feature map.
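Reading the module as a spatial attention gate (globalavgpool0, globalmaxpool0, gfmconv, sigmoid, scale) whose output feeds a non-local-style global attention built from the three 1 × 1 convolutions, the softmax and the two matrix multiplications, a plausible PyTorch sketch is given below. The exact wiring between the two branches is an assumption, since the precise connections are carried by FIG. 5 rather than the text.

```python
import torch
import torch.nn as nn

class GFM(nn.Module):
    """Sketch of a global attention fusion module: a spatial attention
    gate (pooling over the channel axis, 7x7 conv, sigmoid, scale)
    followed by non-local-style global attention over all pixel
    positions, merged back onto the input by a residual addition."""

    def __init__(self, channels: int):
        super().__init__()
        # gfmconv: 7x7 conv over the concatenated channel-wise pools
        self.gfmconv = nn.Conv2d(2, 1, 7, padding=3, bias=False)
        # gfmconv0/1/2: 1x1 convs producing query / key / value maps
        self.gfmconv0 = nn.Conv2d(channels, channels, 1)
        self.gfmconv1 = nn.Conv2d(channels, channels, 1)
        self.gfmconv2 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        # globalavgpool0 / globalmaxpool0 over the channel dimension,
        # concatenated along the channel axis
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        weights = torch.sigmoid(self.gfmconv(pooled))  # per-pixel weights
        xw = x * weights                               # scale layer, eq. (1)
        # reshape0..2: flatten spatial positions for matrix multiplication
        q = self.gfmconv0(xw).reshape(b, c, h * w)
        k = self.gfmconv1(xw).reshape(b, c, h * w)
        v = self.gfmconv2(xw).reshape(b, c, h * w)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # matmul0+softmax
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)  # matmul1+reshape3
        return out + x                                 # matsum residual merge
```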
Further, the training step of the human face feature point location model based on global attention is as follows:
S21, acquiring training sample images: collecting face images in various scenes, under various lighting conditions and from various angles, obtaining a partial region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each partial face image and recording the position information of the feature points;
S22, designing the target loss function of the deep neural network model;
S23, training the deep neural network model: feeding the labeled face sample image set into the constructed deep neural network model and learning the relevant model parameters.
Further, the target loss function adopts a mean square error loss function.
In another aspect, the invention provides a system for locating facial feature points based on global attention, comprising the following units:
an image acquisition unit for acquiring a partial image of a face;
a facial feature point location unit for inputting the acquired image into a pre-trained global-attention-based facial feature point location model, which directly outputs the positions of the facial feature points after a forward pass over the input partial face image.
Further, the system also comprises the following sub-units:
a training sample acquisition unit for acquiring training sample images, namely collecting face images in various scenes, under various lighting conditions and from various angles, obtaining a partial region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each partial face image and recording the position information of the feature points;
a target loss function design unit for designing the target loss function of the deep neural network model;
and a deep neural network model training unit for feeding the labeled face sample image set into the constructed deep neural network model and learning the relevant model parameters.
According to the above technical scheme, the method and system for locating facial feature points based on global attention use a residual network mechanism and a global attention fusion mechanism, based on deep learning, to extract fused features of the face image that carry both global and local semantic information. The deep neural network model can therefore take account of both the global and the local information of the face image and accurately compute the positions of the facial feature points, so that the location of the facial feature points is more accurate and more robust.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a model building flow diagram of the present invention;
FIG. 3 is a diagram of the deep neural network model architecture;
FIG. 4 is a block diagram of a resblock residual module;
FIG. 5 is a block diagram of a global attention fusion module;
wherein the alphanumeric label next to each module graphic represents the output feature map size of that module, namely: feature map height × feature map width × number of feature map channels.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
This embodiment is carried out on the premise that a single partial face image has already been acquired.
As shown in FIG. 1, the method for locating facial feature points based on global attention in this embodiment comprises the following steps:
acquiring a partial image of a face;
inputting the acquired image into a pre-trained global-attention-based facial feature point location model, which directly outputs the positions of the facial feature points after a forward pass over the input partial face image;
the following is a detailed description:
As shown in FIG. 2, the steps for constructing the pre-trained global-attention-based facial feature point location model are as follows:
S1, designing the deep neural network model. The main function of the deep neural network model designed by the invention is to fuse the local semantic information of the face image with its global semantic information by means of a well-designed deep neural network, extract the fused features of the two, and accurately compute the positions of the facial feature points. The invention uses a convolutional neural network (CNN). For convenience of description, some terms are defined: feature resolution refers to feature height × feature width; feature size refers to feature height × feature width × number of feature channels; kernel size refers to kernel width × kernel height; and span refers to width span × height span. Each convolutional layer is followed by a batch normalization layer and a nonlinear activation layer. The design of the deep neural network model specifically comprises the following steps:
S11, designing the input image of the deep neural network model. The input adopted by the invention is a 3-channel RGB image with a resolution of 224 × 224; the larger the input image, the more detail it contains, which is conducive to accurate location of the facial feature points. A preprocessing sketch is given below.
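As a hedged illustration of preparing such an input from a cropped face region, a torchvision-based preprocessing sketch follows; the normalization statistics (common ImageNet values) and the file name are assumptions, not specified by the patent.

```python
from PIL import Image
from torchvision import transforms

# Resize the cropped face region to the 224 x 224 3-channel RGB input.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # assumed statistics
                         std=[0.229, 0.224, 0.225]),
])

face = Image.open("face_crop.jpg").convert("RGB")  # hypothetical crop
x = preprocess(face).unsqueeze(0)                  # shape: (1, 3, 224, 224)
```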
S12, designing the main network of the deep neural network model. The main network acquires the local semantic information of the face image while the model can also see the global semantic information of the face image, and extracts fused features carrying both; the quality of this fused feature extraction directly affects the accuracy of the subsequent feature point location. As can be seen from step S11, the input image adopted by the invention is large, which is unfavorable for fast operation of the deep neural network model, so an efficient network that can quickly extract features from the input face image is required. As shown in FIG. 3, the invention adopts an improved classical resnet structure as the model main network, wherein the conv0 layer is a convolutional layer with a kernel size of 7 × 7 and a span of 2 × 2; the maxpool0 layer is a maximum pooling layer with a kernel size of 2 × 2 and a span of 2 × 2; the conv0 layer and the maxpool0 layer together form a network for rapidly reducing the feature map resolution, whose main function is to quickly lower the feature map resolution and reduce the computation of subsequent operations while keeping more image details; resblock0, resblock1, resblock2 and resblock3 are resblock residual modules of the resnet network; GFM0, GFM1, GFM2 and GFM3 are all global attention fusion modules (Global Fuse Module, GFM); the ave-pool layer is a global mean pooling layer; the fc layer is a fully connected layer whose output feature is 2×N-dimensional, where N represents the number of facial feature points.
The specific network structure of the resblock residual module is shown in FIG. 4: the rconv2 layer is a convolutional layer with a kernel size of 1×1 and a span of 2×2, the rconv0 layer is a convolutional layer with a kernel size of 3×3 and a span of 2×2, the rconv1, rconv3 and rconv4 layers are convolutional layers with a kernel size of 3×3 and a span of 1×1, and the eltsum0 and eltsum1 layers are merging layers used to merge several input feature maps into one output feature map by element-wise addition.
The specific network structure of the global attention fusion module GFM is shown in FIG. 5: gfmconv0, gfmconv1 and gfmconv2 are convolutional layers with a kernel size of 1 × 1 and a span of 1 × 1, and reshape0, reshape1, reshape2 and reshape3 are feature size conversion layers whose main function is to adjust the input feature size to meet the requirements of subsequent feature layer operations; globalavgpool0 is a global mean pooling layer over the feature map channel dimension, and globalmaxpool0 is a global maximum pooling layer over the feature map channel dimension; the output feature maps of the globalavgpool0 layer and the globalmaxpool0 layer are concatenated along the channel dimension; gfmconv is a convolutional layer with a kernel size of 7 × 7 and a span of 1 × 1, mainly used to extract an importance weight for each pixel position of the input feature map; the sigmoid layer is a sigmoid-type activation function; the scale layer is a pixel weighting layer used to weight the input feature map pixel by pixel, the weighting calculation being given by formula (1); globalavgpool0, globalmaxpool0, gfmconv, sigmoid and scale together form a spatial attention mechanism module; the softmax layer mainly performs a softmax-type activation over the 2nd dimension of the input feature map to obtain a probability distribution over the input feature vector; matmul0 and matmul1 are both feature map multiplication layers and follow the general matrix multiplication rule; matsum is a feature map addition layer, mainly used to merge two input feature maps into one output feature map by element-wise addition.
O_c(x, y) = w(x, y) · I_c(x, y)    (1)
where O_c(x, y) denotes the value at position (x, y) of the c-th channel of the output weighted feature map, w(x, y) denotes the importance weight at position (x, y) of the input feature map, and I_c(x, y) denotes the value at position (x, y) of the c-th channel of the input feature map.
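To make the weighting in formula (1) concrete, the snippet below applies a single-channel weight map to every channel of a feature map by broadcasting; the tensor sizes are arbitrary illustrative choices.

```python
import torch

c, h, w = 8, 14, 14              # illustrative feature map size
feat = torch.randn(c, h, w)      # input feature map I
wmap = torch.rand(1, h, w)       # importance weights w(x, y) in [0, 1]

out = wmap * feat                # O_c(x, y) = w(x, y) * I_c(x, y)
assert out.shape == (c, h, w)    # every channel shares the same weights
```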
S2, training the deep neural network model. The parameters of the deep neural network model are optimized mainly with a large amount of labeled training sample data, so that the model can accurately locate the positions of the facial feature points. The specific steps are as follows:
S21, acquiring training sample images: mainly collecting face images in various scenes, under various lighting conditions and from various angles, obtaining a partial region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each partial face image and recording the position information of the feature points;
S22, designing the target loss function of the deep neural network model, wherein the target loss function is a mean square error (MSE) loss function;
S23, training the deep neural network model: mainly feeding the labeled face sample image set into the constructed deep neural network model and learning the relevant model parameters;
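As a hedged illustration of steps S22 and S23, the mean square error loss over the N labeled feature points can be written in a standard form (the exact normalization is an assumption, since the patent only names the loss type) as

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{2N} \sum_{i=1}^{N} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]$$

where (x_i, y_i) are the labeled coordinates of the i-th feature point and (x̂_i, ŷ_i) are the coordinates predicted by the fc layer. A minimal PyTorch training sketch follows; the optimizer, learning rate, batch size and in-memory dataset wrapper are assumptions made for illustration, as the patent does not fix these hyper-parameters.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train(model: nn.Module, images: torch.Tensor, landmarks: torch.Tensor,
          epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    """images: (M, 3, 224, 224) face crops; landmarks: (M, 2N) labeled
    coordinates, matching the 2N-dimensional output of the fc layer."""
    loader = DataLoader(TensorDataset(images, landmarks),
                        batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed choice
    mse = nn.MSELoss()                      # the MSE loss designed in S22
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = mse(model(x), y)         # predicted vs labeled points
            loss.backward()
            optimizer.step()
    return model
```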
S3, using the deep neural network model: for any given partial face image, the positions of the facial feature points are directly output after a forward pass of the deep neural network model.
In summary, the method and system for locating facial feature points based on global attention of the invention use a residual network mechanism and a global attention fusion mechanism, based on deep learning, to extract fused features of the face image that carry both global and local semantic information, so that the deep neural network model can take account of both the global and the local information of the face image, accurately compute the positions of the facial feature points, and achieve more accurate and more robust location.
In another aspect, the invention provides a system for locating facial feature points based on global attention, comprising the following units:
an image acquisition unit for acquiring a partial image of a face;
a facial feature point location unit for inputting the acquired image into a pre-trained global-attention-based facial feature point location model, which directly outputs the positions of the facial feature points after a forward pass over the input partial face image.
Further, the system also comprises the following sub-units:
a training sample acquisition unit for acquiring training sample images, namely collecting face images in various scenes, under various lighting conditions and from various angles, obtaining a partial region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each partial face image and recording the position information of the feature points;
a target loss function design unit for designing the target loss function of the deep neural network model;
and a deep neural network model training unit for feeding the labeled face sample image set into the constructed deep neural network model and learning the relevant model parameters.
In a third aspect, the present invention also discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.
It can be understood that the system provided by the embodiment of the invention corresponds to the method provided by the embodiment of the invention; for explanations, examples and beneficial effects of the related content, reference may be made to the corresponding parts of the method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (7)
1. A method for locating facial feature points based on global attention, characterized by comprising the following steps:
acquiring a partial image of a face;
inputting the acquired image into a pre-trained global-attention-based facial feature point location model, which directly outputs the positions of the facial feature points after a forward pass over the input partial face image;
wherein,
the network structure of the global-attention-based facial feature point location model comprises:
the conv0 layer is a convolutional layer with a kernel size of 7 × 7 and a span of 2 × 2;
the maxpool0 layer is a maximum pooling layer with a kernel size of 2 × 2 and a span of 2 × 2;
the conv0 layer and the maxpool0 layer together form a network for rapidly reducing the feature map resolution;
resblock0, resblock1, resblock2 and resblock3 are resblock residual modules of a resnet network;
GFM0, GFM1, GFM2, GFM3 are all global attention fusion modules;
the ave-pool layer is a global mean pooling layer; the fc layer is a fully connected layer whose output feature is 2×N-dimensional, where N represents the number of facial feature points.
2. The global attention-based human face feature point positioning method according to claim 1, characterized in that: the specific network structure of the resblock residual module comprises:
the rconv2 layer is a convolutional layer with a kernel size of 1×1 and a span of 2×2, the rconv0 layer is a convolutional layer with a kernel size of 3×3 and a span of 2×2, the rconv1, rconv3 and rconv4 layers are convolutional layers with a kernel size of 3×3 and a span of 1×1, and the eltsum0 and eltsum1 layers are merging layers used to merge several input feature maps into one output feature map by element-wise addition.
3. The global attention-based human face feature point positioning method according to claim 2, wherein: the specific network structure of the global attention fusion module comprises:
gfmconv0, gfmconv1 and gfmconv2 are convolutional layers with a kernel size of 1 × 1 and a span of 1 × 1, and reshape0, reshape1, reshape2 and reshape3 are feature size conversion layers used to adjust the size of an input feature to meet the requirements of subsequent feature layer operations;
globalavgpool0 is a global mean pooling layer over the feature map channel dimension, and globalmaxpool0 is a global maximum pooling layer over the feature map channel dimension; the output feature maps of the globalavgpool0 layer and the globalmaxpool0 layer are concatenated along the channel dimension; gfmconv is a convolutional layer with a kernel size of 7 × 7 and a span of 1 × 1, used to extract an importance weight for each pixel position of the input feature map;
the sigmoid layer is a sigmoid-type activation function; the scale layer is a pixel weighting layer used to weight the input feature map pixel by pixel, the weighting calculation being given by formula (1); globalavgpool0, globalmaxpool0, gfmconv, sigmoid and scale together form a spatial attention mechanism module; the softmax layer performs a softmax-type activation over the 2nd dimension of the input feature map to obtain a probability distribution over the input feature vector;
matmul0 and matmul1 are both feature map multiplication layers and follow the general matrix multiplication rule; matsum is a feature map addition layer used to merge two input feature maps into one output feature map by element-wise addition;
O_c(x, y) = w(x, y) · I_c(x, y)    (1)
where O_c(x, y) denotes the value at position (x, y) of the c-th channel of the output weighted feature map, w(x, y) denotes the importance weight at position (x, y) of the input feature map, and I_c(x, y) denotes the value at position (x, y) of the c-th channel of the input feature map.
4. The global attention-based human face feature point positioning method according to claim 1, characterized in that:
the training steps of the human face feature point positioning model based on the global attention are as follows:
S21, acquiring training sample images: collecting face images in various scenes, under various lighting conditions and from various angles, obtaining a partial region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each partial face image and recording the position information of the feature points;
S22, designing the target loss function of the deep neural network model;
S23, training the deep neural network model: feeding the labeled face sample image set into the constructed deep neural network model and learning the relevant model parameters.
5. The global attention-based human face feature point positioning method according to claim 4, wherein: the target loss function adopts a mean square error loss function.
6. A system for locating facial feature points based on global attention, characterized by comprising the following units:
an image acquisition unit for acquiring a partial image of a face;
a facial feature point location unit for inputting the acquired image into a pre-trained global-attention-based facial feature point location model, which directly outputs the positions of the facial feature points after a forward pass over the input partial face image.
7. The system for locating facial feature points based on global attention according to claim 6, characterized by further comprising the following sub-units:
a training sample acquisition unit for acquiring training sample images, namely collecting face images in various scenes, under various lighting conditions and from various angles, obtaining a partial region image of each face with an existing face detection algorithm, then labeling the positions of the N feature points on each partial face image and recording the position information of the feature points;
a target loss function design unit for designing the target loss function of the deep neural network model;
and a deep neural network model training unit for feeding the labeled face sample image set into the constructed deep neural network model and learning the relevant model parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010886980.5A CN112084911B (en) | 2020-08-28 | 2020-08-28 | Human face feature point positioning method and system based on global attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010886980.5A CN112084911B (en) | 2020-08-28 | 2020-08-28 | Human face feature point positioning method and system based on global attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112084911A (en) | 2020-12-15 |
CN112084911B (en) | 2023-03-07 |
Family
ID=73728873
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010886980.5A Active CN112084911B (en) | 2020-08-28 | 2020-08-28 | Human face feature point positioning method and system based on global attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112084911B (en) |
Cited By (5)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN112084912A (en) * | 2020-08-28 | 2020-12-15 | 安徽清新互联信息科技有限公司 | Face feature point positioning method and system based on self-adaptive information enhancement |
CN112488049A (en) * | 2020-12-16 | 2021-03-12 | 哈尔滨市科佳通用机电股份有限公司 | Fault identification method for foreign matter clamped between traction motor and shaft of motor train unit |
CN113065402A (en) * | 2021-03-05 | 2021-07-02 | 四川翼飞视科技有限公司 | Face detection method based on deformed attention mechanism |
CN114743277A (en) * | 2022-04-22 | 2022-07-12 | 南京亚信软件有限公司 | Liveness detection method, device, electronic device, storage medium and program product |
WO2022151535A1 (en) * | 2021-01-15 | 2022-07-21 | 苏州大学 | Deep learning-based face feature point detection method |
Citations (14)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN108268885A (en) * | 2017-01-03 | 2018-07-10 | 京东方科技集团股份有限公司 | Feature point detecting method, equipment and computer readable storage medium |
CN109872306A (en) * | 2019-01-28 | 2019-06-11 | 腾讯科技(深圳)有限公司 | Medical image cutting method, device and storage medium |
CN110287846A (en) * | 2019-06-19 | 2019-09-27 | 南京云智控产业技术研究院有限公司 | A face key point detection method based on attention mechanism |
CN110287857A (en) * | 2019-06-20 | 2019-09-27 | 厦门美图之家科技有限公司 | A kind of training method of characteristic point detection model |
KR20190113119A (en) * | 2018-03-27 | 2019-10-08 | 삼성전자주식회사 | Method of calculating attention for convolutional neural network |
CN110427867A (en) * | 2019-07-30 | 2019-11-08 | 华中科技大学 | Human facial expression recognition method and system based on residual error attention mechanism |
CN110610129A (en) * | 2019-08-05 | 2019-12-24 | 华中科技大学 | A deep learning face recognition system and method based on self-attention mechanism |
CN110675406A (en) * | 2019-09-16 | 2020-01-10 | 南京信息工程大学 | CT image kidney segmentation algorithm based on residual double-attention depth network |
CN110728312A (en) * | 2019-09-29 | 2020-01-24 | 浙江大学 | A dry eye classification system based on regional adaptive attention network |
US20200151424A1 (en) * | 2018-11-09 | 2020-05-14 | Sap Se | Landmark-free face attribute prediction |
CN111160085A (en) * | 2019-11-19 | 2020-05-15 | 天津中科智能识别产业技术研究院有限公司 | Human body image key point posture estimation method |
CN111274977A (en) * | 2020-01-22 | 2020-06-12 | 中能国际建筑投资集团有限公司 | Multitask convolution neural network model, using method, device and storage medium |
CN111325751A (en) * | 2020-03-18 | 2020-06-23 | 重庆理工大学 | CT image segmentation system based on attention convolution neural network |
CN111476184A (en) * | 2020-04-13 | 2020-07-31 | 河南理工大学 | A Human Keypoint Detection Method Based on Dual Attention Mechanism |
2020
- 2020-08-28 CN CN202010886980.5A patent/CN112084911B/en active Active
Patent Citations (14)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN108268885A (en) * | 2017-01-03 | 2018-07-10 | 京东方科技集团股份有限公司 | Feature point detecting method, equipment and computer readable storage medium |
KR20190113119A (en) * | 2018-03-27 | 2019-10-08 | 삼성전자주식회사 | Method of calculating attention for convolutional neural network |
US20200151424A1 (en) * | 2018-11-09 | 2020-05-14 | Sap Se | Landmark-free face attribute prediction |
CN109872306A (en) * | 2019-01-28 | 2019-06-11 | 腾讯科技(深圳)有限公司 | Medical image cutting method, device and storage medium |
CN110287846A (en) * | 2019-06-19 | 2019-09-27 | 南京云智控产业技术研究院有限公司 | A face key point detection method based on attention mechanism |
CN110287857A (en) * | 2019-06-20 | 2019-09-27 | 厦门美图之家科技有限公司 | A kind of training method of characteristic point detection model |
CN110427867A (en) * | 2019-07-30 | 2019-11-08 | 华中科技大学 | Human facial expression recognition method and system based on residual error attention mechanism |
CN110610129A (en) * | 2019-08-05 | 2019-12-24 | 华中科技大学 | A deep learning face recognition system and method based on self-attention mechanism |
CN110675406A (en) * | 2019-09-16 | 2020-01-10 | 南京信息工程大学 | CT image kidney segmentation algorithm based on residual double-attention depth network |
CN110728312A (en) * | 2019-09-29 | 2020-01-24 | 浙江大学 | A dry eye classification system based on regional adaptive attention network |
CN111160085A (en) * | 2019-11-19 | 2020-05-15 | 天津中科智能识别产业技术研究院有限公司 | Human body image key point posture estimation method |
CN111274977A (en) * | 2020-01-22 | 2020-06-12 | 中能国际建筑投资集团有限公司 | Multitask convolution neural network model, using method, device and storage medium |
CN111325751A (en) * | 2020-03-18 | 2020-06-23 | 重庆理工大学 | CT image segmentation system based on attention convolution neural network |
CN111476184A (en) * | 2020-04-13 | 2020-07-31 | 河南理工大学 | A Human Keypoint Detection Method Based on Dual Attention Mechanism |
Non-Patent Citations (4)
* Cited by examiner, † Cited by third party
Title |
---|
AMIT KUMAR et al.: "KEPLER: Simultaneous estimation of keypoints and 3D pose of unconstrained faces in a unified framework by learning efficient H-CNN regressors", Image and Vision Computing * |
ZHEN QIN et al.: "SRPRID: Pedestrian Re-Identification Based on Super-Resolution Images", IEEE Access * |
ZENG JIAJIAN: "Face key point detection and face attribute analysis based on deep learning", China Master's Theses Full-text Database, Information Science and Technology * |
QIN XIAOFEI et al.: "Face key point detection algorithm based on attention model", Optical Instruments (《光学仪器》) * |
Cited By (7)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN112084912A (en) * | 2020-08-28 | 2020-12-15 | 安徽清新互联信息科技有限公司 | Face feature point positioning method and system based on self-adaptive information enhancement |
CN112084912B (en) * | 2020-08-28 | 2024-08-20 | 安徽清新互联信息科技有限公司 | Face feature point positioning method and system based on self-adaptive information enhancement |
CN112488049A (en) * | 2020-12-16 | 2021-03-12 | 哈尔滨市科佳通用机电股份有限公司 | Fault identification method for foreign matter clamped between traction motor and shaft of motor train unit |
WO2022151535A1 (en) * | 2021-01-15 | 2022-07-21 | 苏州大学 | Deep learning-based face feature point detection method |
CN113065402A (en) * | 2021-03-05 | 2021-07-02 | 四川翼飞视科技有限公司 | Face detection method based on deformed attention mechanism |
CN113065402B (en) * | 2021-03-05 | 2022-12-09 | 四川翼飞视科技有限公司 | Face detection method based on deformation attention mechanism |
CN114743277A (en) * | 2022-04-22 | 2022-07-12 | 南京亚信软件有限公司 | Liveness detection method, device, electronic device, storage medium and program product |
Also Published As
Publication number | Publication date |
---|---|
CN112084911B (en) | 2023-03-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112084911B (en) | 2023-03-07 | Human face feature point positioning method and system based on global attention |
CN111696110B (en) | 2022-04-01 | Scene segmentation method and system |
CN110610210B (en) | 2022-03-25 | A multi-target detection method |
CN111027576B (en) | 2020-10-30 | Co-saliency detection method based on co-saliency generative adversarial network |
CN111914782A (en) | 2020-11-10 | Human face and detection method and device of feature points of human face, electronic equipment and storage medium |
CN111353544B (en) | 2023-07-25 | A Target Detection Method Based on Improved Mixed Pooling-YOLOV3 |
CN112784756B (en) | 2022-08-26 | Human body identification tracking method |
WO2023151237A1 (en) | 2023-08-17 | Face pose estimation method and apparatus, electronic device, and storage medium |
CN116740362B (en) | 2023-11-21 | An attention-based lightweight asymmetric scene semantic segmentation method and system |
CN114842026A (en) | 2022-08-02 | Real-time fan blade image segmentation method and system |
CN117541587B (en) | 2024-04-02 | Solar panel defect detection method, system, electronic equipment and storage medium |
CN114972780A (en) | 2022-08-30 | Lightweight target detection network based on improved YOLOv5 |
CN115797808A (en) | 2023-03-14 | Unmanned aerial vehicle inspection defect image identification method, system, device and medium |
CN111401335A (en) | 2020-07-10 | Key point detection method and device and storage medium |
CN118504645B (en) | 2024-11-08 | Multi-mode large model training method, robot motion prediction method and processing device |
CN111881746B (en) | 2024-04-02 | Face feature point positioning method and system based on information fusion |
CN112487911B (en) | 2024-05-24 | Real-time pedestrian detection method and device based on improvement yolov under intelligent monitoring environment |
CN118262273A (en) | 2024-06-28 | A self-supervised video anomaly detection method combined with self-attention module |
Wang et al. | 2021 | An improved YOLOv3 object detection network for mobile augmented reality |
CN118015276A (en) | 2024-05-10 | A semi-supervised semantic segmentation method based on dual-path multi-scale |
CN115376195B (en) | 2023-01-13 | Method for training multi-scale network model and face key point detection method |
CN114140524B (en) | 2024-08-23 | Closed loop detection system and method for multi-scale feature fusion |
CN112541469B (en) | 2023-09-08 | Crowd counting method and system based on self-adaptive classification |
CN112084912B (en) | 2024-08-20 | Face feature point positioning method and system based on self-adaptive information enhancement |
CN115908809A (en) | 2023-04-04 | Target detection method and system based on scale division |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2020-12-15 | PB01 | Publication | |
2021-01-01 | SE01 | Entry into force of request for substantive examination | |
2023-03-07 | GR01 | Patent grant | |
2023-08-29 | PE01 | Entry into force of the registration of the contract for pledge of patent right | |
Denomination of invention: A method and system for facial feature point localization based on global attention
Effective date of registration: 2023-08-11
Granted publication date: 2023-03-07
Pledgee: Anhui Pilot Free Trade Zone Hefei Area Sub-branch of Huishang Bank Co., Ltd.
Pledgor: ANHUI QINGXIN INTERNET INFORMATION TECHNOLOGY Co., Ltd.
Registration number: Y2023980051775
2024-09-13 | PC01 | Cancellation of the registration of the contract for pledge of patent right | |
Granted publication date: 2023-03-07
Pledgee: Anhui Pilot Free Trade Zone Hefei Area Sub-branch of Huishang Bank Co., Ltd.
Pledgor: ANHUI QINGXIN INTERNET INFORMATION TECHNOLOGY Co., Ltd.
Registration number: Y2023980051775