
CN117911704A - Image segmentation method based on neural network and electronic equipment - Google Patents

Fri Apr 19 2024

Disclosure of Invention

To address the shortcomings of the prior art, the present application provides a neural-network-based image segmentation method and an electronic device.

The technical effects to be achieved by the present application are realized by the following scheme:

In a first aspect, the present application provides a neural-network-based image segmentation method, the method comprising:

S1, acquiring an image to be segmented, and performing data enhancement on the image to be segmented;

S2, combining an initial five-layer-deep U-net++ network with attention gate modules, adding a corresponding attention gate module for each up-sampling step, wherein the attention gate modules are used to strengthen the performance of each convolution module;

S3, training the five-layer-deep initial U-net++ network with a smooth progressive training method using a smoothing parameter α to obtain a target U-net++ network, wherein the initial value of the smoothing parameter α is 1.

In some embodiments, performing data enhancement on the image to be segmented includes:

determining a first segmentation size and a first step size according to the size of the image to be segmented;

and segmenting the image to be segmented according to the first segmentation size and the first step size to obtain segmented images.

In some embodiments, performing data enhancement on the image to be segmented further includes:

When the image to be segmented is a training image, discarding the edge portions that cannot form a complete segment, wherein a training image is an image used for training the initial U-net++ network to obtain the target U-net++ network;

When the image to be segmented is a test image, padding the edge portions that cannot form a complete segment to obtain padded images, and taking the padded images as segmented images.

In some embodiments, training the five-layer-deep initial U-net++ network with the smooth progressive training method and the smoothing parameter α to obtain the target U-net++ network includes:

Setting the five convolution modules corresponding to the down-sampling path of the initial U-net++ network as convolution module 1, convolution module 2, convolution module 3, convolution module 4, and convolution module 5, respectively;

and arranging a residual module between every two adjacent convolution modules to smooth the output of the convolution modules.

In some embodiments, training the five-layer-deep initial U-net++ network with the smoothing parameter α to obtain the target U-net++ network further includes:

before training starts, all layers of the initial U-net++ network are in a frozen state;

after training starts, network layer 1 is unfrozen first, the segmented image is input into convolution module 1, and convolution module 1 processes it to obtain output A1.

In some embodiments, training the five-layer-deep initial U-net++ network with the smooth progressive training method to obtain the target U-net++ network further includes:

unfreezing network layer 2, inputting A1 into convolution module 2, which processes A1 to obtain A21; inputting A21 into the residual module between convolution module 2 and convolution module 3, which processes A21 to obtain A22; and computing A2 = A22·(1 − α) + A21·α, outputting A2 as the result, and continuing the training process;

and after ten epochs, calculating the intersection-over-union (IoU); if the IoU is greater than a set threshold, stopping training to obtain the trained target U-net++ network.

In some embodiments, if the IoU is less than or equal to the set threshold, then

Unfreezing network layer 3, and fixing the residual module between convolution module 2 and convolution module 3, wherein fixing means that the product of the smoothing parameter α and the output of convolution module 2 is no longer used as part of the output;

For training convolution module 3 and the residual module between convolution module 3 and convolution module 4, repeating the training process of convolution module 2 and the residual module between convolution module 2 and convolution module 3; after ten epochs, calculating the IoU and judging whether to continue unfreezing further network layers, until the target U-net++ network is obtained.

In some embodiments, each residual module includes two network layers, and the set threshold is 92%.

In a second aspect, the present application provides an electronic device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the preceding claims when executing the computer program.

In a third aspect, the present application provides a computer readable storage medium storing one or more programs executable by one or more processors to implement the method of any of the preceding claims.

The neural-network-based image segmentation method provided by the present application adopts a smooth progressive training method combined with a smoothing parameter, which improves the flexibility of the U-net neural network, optimizes the training strategy, saves training time, improves efficiency, and reduces cost.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It is to be noted that, unless otherwise defined, technical or scientific terms used in one or more embodiments of the present application should be taken in the general sense understood by one of ordinary skill in the art to which the present application belongs. The terms "first," "second," and the like in one or more embodiments of the present application do not denote any order, quantity, or importance, but are used to distinguish one element from another. The word "comprising," "comprises," or the like means that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. The term "connected" and the like is not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper," "lower," "left," "right," etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.

Various non-limiting embodiments of the present application are described in detail below with reference to the attached drawing figures.

The application provides a novel neural network, the SAU-NetPP network (see fig. 1), short for Super Attention U-Net++ (SAU-Net++). It is a novel deep-learning neural network that addresses the low generalization of networks in segmentation tasks, and it also has training-adaptive capability, region-of-interest emphasis marking, extremely fine image-detail analysis, and macroscopic image-distribution perception. In the figure, each rectangular block consists of an attention mechanism and a convolution block; an upward arrow represents an up-sampling operation, a downward arrow a down-sampling operation, and a dotted line the long and short skip connections in the network. In the depth-wise down-sampling after x1_0, the backbone neural network on the left side of the figure has a progressive residual evolution module, and the network produces 5 outputs (x0_1, x0_2, x0_3, x0_4, and x0_5), corresponding to the deep-supervision outputs of the 5 depth levels, respectively.

The neural-network-based image segmentation method of the present application mainly involves the following concepts:

1. Data enhancement: a step-wise cutting approach is used to increase the robustness and richness of the data set as much as possible.

2. Creating the training model: based on a self-developed optimization of U-net++, a 5th depth layer is added, and an Attention Gate module is added during each up-sampling; the architecture of the model is configurable.

3. A training phase: a dynamically scaled cross-entropy loss function (focal loss) is used to adjust the tendency weights of the training classes, and the stochastic weight averaging (SWA) training method is used to relieve the weight-oscillation problem.

4. In the training process, a smooth progressive U-net training method can be adopted: using smoothing parameters, the parameters of the upper layers are gradually transferred to the lower layers during training, and the simple initial U-net framework (2 or 3 layers) gradually progresses to 4 or even 5 layers.

The method effectively solves the common U-net's insufficient perception of image detail and the insufficient parameter utilization of U-net-derived networks, and makes the network architecture more flexible and adaptable.

First, the image segmentation method based on the neural network according to the present application will be described in detail with reference to fig. 2.

The application provides an image segmentation method based on a neural network, which comprises the following steps:

S1, acquiring an image to be segmented, and performing data enhancement on the image to be segmented;

S2, combining an initial five-layer-deep U-net++ network with Attention Gate modules, adding a corresponding Attention Gate module for each up-sampling step, wherein the Attention Gate modules are used to strengthen the performance of each convolution module;

S3, training the five-layer-deep initial U-net++ network with a smooth progressive training method using a smoothing parameter α to obtain a target U-net++ network, wherein the initial value of the smoothing parameter α is 1.

The initial U-net++ network may be, for example, an SAU-NetPP network.

In some embodiments, performing data enhancement on the image to be segmented includes:

determining a first segmentation size and a first step size according to the size of the image to be segmented;

and segmenting the image to be segmented according to the first segmentation size and the first step size to obtain segmented images.

In some embodiments, performing data enhancement on the image to be segmented further includes:

When the image to be segmented is a training image, discarding the edge portions that cannot form a complete segment, wherein a training image is an image used for training the initial U-net++ network to obtain the target U-net++ network;

When the image to be segmented is a test image, padding the edge portions that cannot form a complete segment to obtain padded images, and taking the padded images as segmented images.

By way of example, a larger irregular picture may be cut into regular 256 × 256 square tiles using a step-wise cutting method. Because the step size and the edges may not divide evenly during cutting, there is a portion to be filled (padding): in the training stage, the superfluous edge portion of the image is removed by rounding the tile count down, reducing experimental error; in the test stage, all cut tiles are put into use, i.e., the redundant edge portions are completed by padding, which increases robustness and improves the richness of the data set.

For example: a 600 × 600 image is cut into tiles of size 256 × 256 with a step size of 64. Then (600 − 256)/64 = 5.375 steps are obtained, i.e., after 5 full steps of segmentation in one direction, the remaining part is less than one step. The segmentation rule is that in the training phase the 0.375 remainder is discarded, giving 6 tiles in one direction and 6 × 6 = 36 tiles for the complete image;

in the test phase, the image is extended to ensure full coverage of the predicted area, i.e., the 0.375 remainder is padded and counted: the last edge strip of length 600 − 576 = 24 must be extended into a tile of edge length 256, so the portion 256 − 24 = 232 is filled using the image background, and the image yields 7 × 7 = 49 final tiles (the sketch below reproduces this arithmetic).
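Illustratively, the tile arithmetic above can be reproduced in a few lines; in the following sketch the function and variable names are illustrative, not from the patent:

```python
import math

def tile_counts(image: int = 600, tile: int = 256, stride: int = 64):
    """Reproduce the 600 x 600 worked example above (names illustrative)."""
    steps = (image - tile) / stride                 # (600 - 256) / 64 = 5.375
    train_per_axis = math.floor(steps) + 1          # training: drop the 0.375 -> 6
    covered = (train_per_axis - 1) * stride + tile  # 5 * 64 + 256 = 576
    leftover = image - covered                      # 600 - 576 = 24 edge pixels
    pad = tile - leftover                           # 256 - 24 = 232 pixels to fill
    test_per_axis = train_per_axis + 1              # test: keep the padded tile -> 7
    return train_per_axis ** 2, test_per_axis ** 2, pad

print(tile_counts())  # (36, 49, 232)
```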

Illustratively, the original U-net++ has a depth of 4 layers, and a 4-layer-deep U-net++ is often the depth with the most pronounced training effect; however, due to the limitation of the receptive field, details are easily overlooked in some complex image segmentation projects. Therefore, the present application uses a 5-layer-deep U-net++ as its basic framework, so that the network can go deeper and explore the image in finer detail.

Here, the receptive field denotes the region of the input image to which a pixel on the feature map output by each layer of the convolutional neural network is mapped; in other words, it is the size of the input-image region that a given point on the feature map reflects.
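For reference, a standard recurrence from the convolutional-network literature (not stated in the patent) gives the receptive field r_l of layer l as r_l = r_{l−1} + (k_l − 1) · s_1 · s_2 · … · s_{l−1}, with r_0 = 1, where k_l is the kernel size of layer l and s_i the stride of layer i. Each stride-2 down-sampling multiplies the growth term for all deeper layers, which is why a fifth depth level sees a markedly larger region of the input than a fourth.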

In the present application, convolution modules 1 to 5 of U-net++ correspond to x1_0, x2_0, x3_0, x4_0, and x5_0, respectively; that is, convolution module 1 and x1_0 refer to the same module, and likewise for the other convolution modules.

Specifically, on the 5-layer-deep U-net++, an attention gate is added at each up-sampling stage; this strengthens a convolution module so that it evolves toward the feature image obtained by up-sampling the convolution block of the next layer, and an α attention coefficient is trained to enhance the portion of interest. The attention gate makes image features more prominent and edges more visible. The attention gates also need not be implemented in all stacking operations: optionally, attention gates may be added at one or several layers of corresponding depth, which those skilled in the art can adjust according to the actual situation.

For example, as shown in fig. 3, for convolution modules 1 (x1_0) and 2 (x2_0) of U-net++: the x2_0 module corresponds to the attention gate module on the right side; x2_0 performs a deconvolution through the up-sampling module, and the stacking module performs a feature-stacking operation that combines the attention result with the output of the previous convolution layer. Stacking the attention result of x2_0 (after up-sampling deconvolution) together with x1_0 is regarded as an AG operation, which strengthens the behavior of the convolution block within the network.
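Illustratively, the patent does not disclose the attention gate's internal equations, so the sketch below follows the common additive formulation from the Attention U-Net literature; the class name, channel arguments, and the assumption that g and x share a spatial size are all illustrative:

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate sketch (Attention U-Net style, assumed).
    g: gating features from the deeper level (e.g., x2_0 after up-sampling);
    x: skip-connection features from the shallower level (e.g., x1_0).
    g and x are assumed here to share the same spatial size."""
    def __init__(self, g_channels: int, x_channels: int, inter_channels: int):
        super().__init__()
        self.w_g = nn.Conv2d(g_channels, inter_channels, kernel_size=1)
        self.w_x = nn.Conv2d(x_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # attention coefficients: sigmoid(psi(relu(W_g g + W_x x)))
        attn = torch.sigmoid(self.psi(torch.relu(self.w_g(g) + self.w_x(x))))
        return x * attn  # emphasize the region of interest in the skip features
```

The gated skip features would then be concatenated with the up-sampled deeper features in the stacking module described above.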

In some embodiments, training the five-layer-deep initial U-net++ network with the smooth progressive training method and the smoothing parameter α to obtain the target U-net++ network includes:

Setting the five convolution modules corresponding to the down-sampling path of the initial U-net++ network as convolution module 1, convolution module 2, convolution module 3, convolution module 4, and convolution module 5, respectively;

and arranging a residual module between every two adjacent convolution modules to smooth the output of the convolution modules.

In some embodiments, training the five-layer-deep initial U-net++ network with the smooth progressive training method to obtain the target U-net++ network further includes:

before training starts, all layers of the initial U-net++ network are in a frozen state;

after training starts, network layer 1 is unfrozen first, the segmented image is input into convolution module 1, and convolution module 1 processes it to obtain output A1.

In some embodiments, training the five-layer-deep initial U-net++ network with the smooth progressive training method to obtain the target U-net++ network further includes:

unfreezing network layer 2, inputting A1 into convolution module 2, which processes A1 to obtain A21; inputting A21 into the residual module between convolution module 2 and convolution module 3, which processes A21 to obtain A22; and computing A2 = A22·(1 − α) + A21·α, outputting A2 as the result, and continuing the training process;

After each training generation (i.e., epoch), the smoothing parameter α is attenuated by 0.1; after ten epochs, the intersection-over-union (IoU) is calculated, and if the IoU is greater than the set threshold, training is stopped and the trained target U-net++ network is obtained.

Illustratively, after the residual modules are added (i.e., a residual module between each convolution module and the one following it) and the smoothing parameter α is set, then, as in the residual module of fig. 3, α multiplies the arc edge X of the residual module (the arc edge X represents the result passed directly from the previous layer into the next convolution module), while the output after passing through the residual module, Y, is multiplied by (1 − α). The initial value of α is 1, and α gradually decreases by 0.1 until it reaches 0, so the result through the residual module is X·α + Y·(1 − α). In this way the residual module smooths the network into an increased depth, and the residual edge fades out as the network evolves to the next layer. Specifically, each residual module includes two network layers.
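Illustratively, a minimal sketch of such a smoothing residual module is given below; the two-convolution body follows the statement that each residual module includes two network layers, while the channel handling and helper names are assumptions:

```python
import torch
import torch.nn as nn

class SmoothedResidualBlock(nn.Module):
    """Sketch of the smoothing residual module: blends the direct path X
    (the previous module's output) with the residual-branch output Y as
    X*alpha + Y*(1 - alpha), with alpha decaying from 1 toward 0."""
    def __init__(self, channels: int):
        super().__init__()
        # "each residual module includes two network layers"
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.alpha = 1.0  # smoothing parameter, decayed externally per epoch

    def decay_alpha(self, step: float = 0.1) -> None:
        self.alpha = max(0.0, self.alpha - step)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.body(x)  # residual-branch output (A22 in the text above)
        # A2 = A21*alpha + A22*(1 - alpha); at alpha = 0 the arc edge X vanishes
        return x * self.alpha + y * (1.0 - self.alpha)
```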

At the same time, the parts of the network that have not yet evolved remain frozen; a layer's network is not unfrozen until the deepening evolution reaches it.

The decay of α is set to 0.1 per epoch. Every 10 epochs of training, the model automatically evaluates its loss and/or intersection-over-union (IoU); if the IoU exceeds 92%, training stops, and if the model does not reach this standard, the next network layer is unfrozen and trained.

Illustratively, starting from x1_0 (because x0_0 does not involve a residual module): after 10 epochs it is necessary to compare the IoU with the set threshold and determine whether x2_0 needs to be unfrozen. During the 10 epochs of training x1_0, the smoothing parameter α decreases step by step, so over those 10 epochs the residual module's arc edge X goes from present to absent; its function is lubrication and smooth transition, paving the way for unfreezing the next layer.

In particular, no evolution occurs at the x0_0 level, because residual modules and AG operations begin only from the x1_0 level. When the smoothing parameter is handed to x1_0, the up-sampling and right-side feature modules of the network are activated, while all network structure below x1_0, together with its up-sampling and feature-stacking parts, remains inactive, forming a small U-shaped structure. After this structure has trained for 10 epochs, the current state of the residual structure at x0_0 is fixed and the arc edge X disappears; at this point the method judges whether to start the evolution of the next network level and evaluates the output for one round. Each evolution deepens the network by one depth level and widens it (adding up-sampling modules on the right side, and so on), forming a new output and thus a new deep-supervision output; the deep supervision keeps deepening as training proceeds. The residual module's arc edge X goes from present to absent, providing lubrication and smooth transition and paving the way for the next layer's unfreezing.

Here, epoch is an important concept in neural network training: one epoch equals one pass of training over all samples in the training set. When the complete data set passes through the neural network once and returns once, i.e., one forward propagation and one backward propagation, that process is called an epoch.

In some embodiments, if the IoU is less than or equal to the set threshold, then

Unfreezing network layer 3, and fixing the residual module between convolution module 2 and convolution module 3, wherein fixing means that the product of the smoothing parameter α and the output of convolution module 2 is no longer used as part of the output, i.e., the smoothing adjustment is no longer needed and this product term is omitted from the output calculation;

For training convolution module 3 and the residual module between convolution module 3 and convolution module 4, repeating the training process of convolution module 2 and the residual module between convolution module 2 and convolution module 3; after ten epochs, calculating the intersection-over-union (IoU) and judging whether to continue unfreezing further network layers, until the target U-net++ network is obtained.

In some embodiments, each residual module includes two network layers, and the set threshold is 92%.
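Illustratively, the per-level schedule described in the embodiments above could be driven as follows; this sketch reuses the SmoothedResidualBlock from the earlier sketch, the callables stand in for ordinary training plumbing, and every name is illustrative rather than taken from the patent:

```python
from typing import Callable, Dict

def progressive_train(
    unfreeze_level: Callable[[int], None],
    residual_blocks: Dict[int, "SmoothedResidualBlock"],
    train_one_epoch: Callable[[], None],
    evaluate_iou: Callable[[], float],
    iou_threshold: float = 0.92,  # the 92% threshold set above
    max_level: int = 5,
) -> None:
    for level in range(1, max_level + 1):    # network layers 1..5
        unfreeze_level(level)                # thaw only the evolving layer
        block = residual_blocks.get(level)   # residual module after this layer
        for _ in range(10):                  # ten epochs per level
            train_one_epoch()
            if block is not None:
                block.decay_alpha(0.1)       # alpha: 1.0 -> 0.0 over ten epochs
        if block is not None:
            block.alpha = 0.0                # "fix" the module: drop the alpha*X term
        if evaluate_iou() > iou_threshold:
            break                            # training stops; target network obtained
```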

The neural-network-based image segmentation method and electronic device provided by the present application introduce a smooth progressive training method; combined with the smoothing parameter, this improves the flexibility of the U-net neural network while optimizing the training strategy, saving training time, improving efficiency, and reducing cost.

The following gives an example introduction to the other training techniques and the development environment involved in the present application:

By using focal loss, the tendency weights of the training classes are adjusted so that training emphasizes the classes whose IoU is low under ordinary training; the weights are fully user-definable, giving a large degree of freedom in tuning the training.
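Illustratively, a minimal multi-class focal loss can be written as below; the patent names focal loss but gives no hyperparameters, so the focusing factor gamma = 2 and the per-class weight interface are assumptions:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, weight: torch.Tensor = None) -> torch.Tensor:
    """Dynamically scaled cross entropy: `weight` carries the user-defined
    per-class tendency weights; gamma down-weights easy examples."""
    ce = F.cross_entropy(logits, targets, weight=weight, reduction="none")
    pt = torch.exp(-ce)                       # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()  # focus training on hard classes
```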

By using the stochastic weight averaging (SWA) training method, checkpoints on the optimization trajectory are taken at the end of optimization and the weights of k such checkpoints are averaged to obtain the final network weights, which relieves the weight-oscillation problem and yields a smoother, more generalizable solution than conventional training. During training, one of three learning-rate schedules (cyclic, constant, or decaying) can be selected, applying different training modes to different tasks; the effect is more pronounced when training for 10 or more epochs.
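Illustratively, PyTorch's built-in SWA utilities match this description; in the sketch below the epoch split, the SWA learning rate, and the loader/loss names are illustrative choices rather than values from the patent:

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR

def train_with_swa(model, optimizer, loss_fn, train_loader,
                   epochs: int = 20, swa_start: int = 10):
    swa_model = AveragedModel(model)               # running average of weights
    swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # constant SWA learning rate
    for epoch in range(epochs):
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss_fn(model(inputs), targets).backward()
            optimizer.step()
        if epoch >= swa_start:                     # average checkpoints on the trajectory
            swa_model.update_parameters(model)
            swa_scheduler.step()
    # recompute batch-norm statistics for the averaged weights
    torch.optim.swa_utils.update_bn(train_loader, swa_model)
    return swa_model
```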

The image segmentation method based on the neural network can realize the following technical effects:

(1) A 5-layer-deep U-net++ model is adopted, with alternating long and short skip connections and deep-supervision image output channels at 5 depths, making it suitable for monitoring the effect of networks of different depths.

(2) During each up-sampling of the image, an Attention Gate mechanism is added to enhance the network's capture of the portion of interest.

(3) Traditional cross entropy is replaced by the focal loss method: after the images are annotated and read, the proportion of each class is calculated, and the recognition weights are dynamically adjusted according to the needs of the scene; the deep-supervision results of each layer are compared intelligently to give the solution with the best combination of effect, model parameters, and training time.

(4) A novel progressive training method for U-net and its derivative networks is provided: the backbone feature-extraction network of the U-net is gradually deepened by means of progressive residual blocks (a similar process is shown in fig. 4), realizing the progressive growth of the neural network; the residual network simulates a smooth transition of this process, and the freezing and unfreezing of the network evolve the network and the parameter information within it. In this way, the training strategy can be optimized, training time can be saved, and the network depth suited to the data can be found.

It should be noted that the method according to one or more embodiments of the present application may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of one or more embodiments of the present application, the devices interacting with each other to accomplish the methods.

It should be noted that the foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Based on the same inventive concept, the application also discloses an electronic device corresponding to the method of any embodiment;

Specifically, fig. 5 shows a schematic hardware structure of an electronic device implementing the neural-network-based image segmentation method. The device may include: processor 410, memory 420, input/output interface 430, communication interface 440, and bus 450, wherein processor 410, memory 420, input/output interface 430, and communication interface 440 are communicatively coupled to each other within the device via bus 450.

The processor 410 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical solutions provided by the embodiments of the present application.

The memory 420 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), static storage, dynamic storage, etc. Memory 420 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present application are implemented in software or firmware, the associated program code is stored in memory 420 and invoked for execution by processor 410.

The input/output interface 430 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown in the figure) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.

The communication interface 440 is used to connect a communication module (not shown) to enable communication interaction between the device and other devices. The communication module may communicate in a wired manner (e.g., USB, network cable) or wirelessly (e.g., mobile network, Wi-Fi, Bluetooth).

Bus 450 includes a path to transfer information between components of the device (e.g., processor 410, memory 420, input/output interface 430, and communication interface 440).

It should be noted that although the above device only shows the processor 410, the memory 420, the input/output interface 430, the communication interface 440, and the bus 450, in the implementation, the device may further include other components necessary to achieve normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary for implementing the embodiments of the present application, and not all the components shown in the drawings.

The electronic device of the foregoing embodiment is configured to implement the corresponding neural-network-based image segmentation method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

Based on the same inventive concept, and corresponding to any of the methods described above, one or more embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the neural-network-based image segmentation method of any of the above embodiments.

The computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device.

The storage medium of the foregoing embodiments stores computer instructions for causing the computer to perform the neural network-based image segmentation method according to any one of the foregoing embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.

Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the application (including the claims) is limited to these examples; combinations of features of the above embodiments or in different embodiments are also possible within the spirit of the application, steps may be implemented in any order and there are many other variations of the different aspects of one or more embodiments of the application described above which are not provided in detail for the sake of brevity.

Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure one or more embodiments of the application. Furthermore, the apparatus may be shown in block diagram form in order to avoid obscuring the embodiment(s) of the present application, and also in view of the fact that specifics with respect to implementation of such block diagram apparatus are highly dependent upon the platform on which the embodiment(s) of the present application are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that one or more embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.

While the application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.

The present application is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements and others which are within the spirit and principle of the one or more embodiments of the application are intended to be included within the scope of the application.