patents.google.com

CN111526119A - Abnormal flow detection method and device, electronic equipment and computer readable medium - Google Patents

Tue Aug 11 2020

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Referring to fig. 1, a flow 100 of one embodiment of an abnormal traffic detection method according to the present application is shown. The execution subject of the abnormal traffic detection method may be a server, which may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple devices or as a single device. When the server is software, it may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not particularly limited herein. The abnormal traffic detection method comprises the following steps:

Step 101, obtaining an access log of a user to be tested.

In this embodiment, the execution subject of the abnormal traffic detection method may obtain an access log of the user to be tested. The access log records each interface accessed by the user to be tested and the access time of each interface. Here, an interface may refer to a URL (Uniform Resource Locator) in the platform monitored by the execution subject. The interface recorded in the access log may therefore be a specific URL accessed by the user to be tested.

Taking a life service platform as an example, a user can access the interfaces of the various services in the platform through a client application of the platform. In practice, after opening the client application, the user can access the interface of each service by clicking the icon of that service. The URL of each interface accessed by the user, together with the access time, may be recorded in the user's access log.

By way of example, the platform may provide a variety of services such as take-away services, food services, hotel services, leisure and entertainment services, movie and performance services, taxi-hailing services, and the like. After the user accesses the food service, the leisure and entertainment service, and the movie and performance service in sequence through the client application of the platform, the execution subject can record, in order, the URL of the interface of the food service and its access time, the URL of the interface of the leisure and entertainment service and its access time, and the URL of the interface of the movie and performance service and its access time in the user's access log.

Step 102, generating a path timing diagram of the interfaces accessed by the user to be tested based on the access log.

In this embodiment, the execution subject may generate a path timing diagram of the interfaces accessed by the user to be tested based on the interfaces and the access times recorded in the access log. The path timing diagram characterizes the path of the interfaces accessed by the user, that is, the order in which the user accesses the interfaces. For example, if the user accesses interface a, interface b and interface c in sequence, the path is a-b-c. The path timing diagram may be an image in various forms, such as a single-channel image or a three-channel image.

As one example, the path timing diagram may be converted from a graph. Specifically, a coordinate system may be constructed with time as the horizontal axis and the interface as the vertical axis first. Then, for each access record in the access log, a coordinate point may be plotted in the coordinate system based on the interface and the access time recorded in the access record, thereby obtaining a coordinate graph. The coordinate graph may then be converted to a rectangular image.

In the process of converting into the image, a rectangular image corresponding to the coordinate graph can be constructed firstly, and the horizontal direction of the rectangular image represents the time and the vertical direction represents the interface. The rectangular image is then divided into rectangular blocks according to interface and time. Then, for each coordinate point drawn in the coordinate map, a rectangular block in which the coordinate point should be set in the rectangular image may be determined based on the coordinates of the point, and the pixel value of the rectangular block may be set. For example, the value may be set to a preset fixed value, or may be set to a pixel value corresponding to a statistical result based on statistics of data in the access record. The pixel value here may be set to a pixel value of a single channel or a pixel value of three channels, which is not limited in the present application.

Note that, for rectangular blocks to which no plotted coordinate points are mapped, the pixel values of these rectangular blocks may be set as default values, and the default values may be made different from the pixel values of the rectangular blocks to which the plotted coordinate points are mapped. For example, a default value may be set to zero, etc.

As yet another example, the path timing diagram may also be constructed directly. Specifically, a rectangular image may be constructed first, such that the horizontal direction of the rectangular image represents time and the vertical direction represents the interface. The rectangular image is then divided into rectangular blocks according to interface and time. Then, for each access record in the access log, the rectangular block corresponding to the access record in the rectangular image may be located based on the interface and the access time recorded in the access record, and a pixel value may be set for that rectangular block. For example, the value may be set to a preset fixed value, or may be set to a pixel value corresponding to a statistical result based on statistics of data in the access record. The pixel value here may be a single-channel pixel value or a three-channel pixel value, which is not limited in the present application. For rectangular blocks to which no access record is mapped, the pixel values may be set to a default value that differs from the pixel values of the rectangular blocks to which access records are mapped. For example, the default value may be set to zero.
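The direct construction described above can be sketched as follows. This is a minimal illustration and not the application's implementation: the function name `build_path_timing_image`, the fixed time-bin width, the integer interface codes starting at 1, and the fixed pixel value 255 are all assumptions introduced for the example.

```python
from typing import List, Tuple


def build_path_timing_image(
    records: List[Tuple[int, float]],  # (interface code, access time in seconds); hypothetical format
    num_interfaces: int,
    time_bin_seconds: float,
    num_time_bins: int,
    hit_value: int = 255,      # assumed preset fixed value for accessed blocks
    default_value: int = 0,    # default value for blocks with no access record
) -> List[List[int]]:
    """Return a single-channel image: rows are interfaces, columns are time bins."""
    image = [[default_value] * num_time_bins for _ in range(num_interfaces)]
    for code, access_time in records:
        row = code - 1                               # interface codes are assumed to start at 1
        col = int(access_time // time_bin_seconds)   # time bin that contains this access
        if 0 <= row < num_interfaces and 0 <= col < num_time_bins:
            image[row][col] = hit_value
    return image


# A user who accesses interfaces 1, 2 and 3 in sequence (path a-b-c).
records = [(1, 0.0), (2, 3.5), (3, 7.0)]
image = build_path_timing_image(records, num_interfaces=3,
                                time_bin_seconds=2.0, num_time_bins=4)
for row in image:
    print(row)
```

Each rectangular block is reduced to one pixel here; a real image could scale every block up to several pixels without changing the idea.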

By converting the access log into an image, the image can contain information such as the interfaces accessed by the user to be tested, the access times, the access order, and the access distribution. When the abnormal traffic detection model automatically extracts features from the image and performs detection in the subsequent step, the information in the access log can therefore be fully utilized, which improves the information utilization rate and the accuracy of abnormal traffic detection.

Step 103, inputting the path timing diagram into a pre-trained abnormal traffic detection model to obtain an abnormal traffic detection result.

In this embodiment, the execution body may input the path timing chart to an abnormal traffic detection model trained in advance, so as to obtain an abnormal traffic detection result. The abnormal traffic detection result here can be used to indicate whether the access traffic generated by the user to be tested is abnormal traffic.

The abnormal traffic detection model herein may be used to detect the category of the path timing graph. In practice, the categories of the path timing diagram can be divided into two categories, which respectively correspond to the abnormal traffic and the normal traffic, for example, the abnormal traffic can be represented by 1, and the normal traffic can be represented by 0. If the output result of the abnormal flow detection model is 1, the abnormal flow detection result can indicate that the access flow generated by the user to be detected is abnormal flow. If the output result of the abnormal flow detection model is 0, the abnormal flow detection result can indicate that the access flow generated by the user to be detected is normal flow.

The abnormal flow detection model can be obtained by training various image processing models by adopting a machine learning method (such as a supervised learning method). As an example, the image processing model used may be a Convolutional Neural Network (CNN) of various structures. In practice, the convolutional neural network is a feedforward neural network, and its artificial neurons can respond to a part of surrounding units in the coverage range, and have excellent performance on image processing, so that an abnormal flow detection model can be obtained by training using the convolutional neural network. In general, convolutional neural networks may include convolutional layers, pooling layers, fully-connected layers, and the like. Convolutional layers may be used to extract image features. The pooling layer may be used to down-sample image features. The fully-connected layer can function as a classifier in the convolutional neural network to output a classification result.

As an example, the abnormal traffic detection model may include three convolutional layers, three pooling layers, and three fully-connected layers. The abnormal flow detection model sequentially comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer, a third pooling layer, a first full-connection layer, a second full-connection layer and a third full-connection layer from a shallow layer to a deep layer.
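To make the layer ordering concrete, the sketch below traces how the feature-map shape would change through three convolution/pooling pairs followed by three fully-connected layers. The source specifies only the number and order of layers; the 3x3 kernels with padding 1, the 2x2 pooling, the channel counts (16, 32, 64), the fully-connected sizes (128, 32, 2), and the 64x64 three-channel input are assumptions chosen purely for illustration.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial output size of a pooling layer (standard formula)."""
    return (size - kernel) // stride + 1


def trace_shapes(height, width, in_channels,
                 conv_channels=(16, 32, 64),   # assumed channel counts
                 fc_sizes=(128, 32, 2)):       # assumed sizes; 2 outputs = normal / abnormal
    shapes = [("input", (in_channels, height, width))]
    h, w = height, width
    for i, channels in enumerate(conv_channels, start=1):
        h, w = conv_out(h), conv_out(w)
        shapes.append((f"conv{i}", (channels, h, w)))
        h, w = pool_out(h), pool_out(w)
        shapes.append((f"pool{i}", (channels, h, w)))
    shapes.append(("flatten", (conv_channels[-1] * h * w,)))
    for i, units in enumerate(fc_sizes, start=1):
        shapes.append((f"fc{i}", (units,)))
    return shapes


shapes = trace_shapes(64, 64, 3)
for name, shape in shapes:
    print(name, shape)
```

Under these assumed hyperparameters each pooling layer halves the height and width, so the third pooling layer outputs 64 feature maps of size 8x8 before the fully-connected layers act as the classifier.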

Therefore, the access log of the user to be detected is converted into an image form, the image is automatically subjected to feature extraction and detection through the abnormal flow detection model, and the abnormal flow identification problem can be converted into an image classification problem. In the process, the abnormal flow detection model automatically extracts the features of the image, so that the complex image features are obtained. Therefore, compared with the existing mode of manually extracting the features, the method has the advantages that the labor cost is reduced, the access information in the form of the image can be fully utilized, the features which cannot be manually extracted are extracted, the information utilization rate is improved, and the accuracy of abnormal flow detection is improved.

In some optional implementations of the present embodiment, the abnormal traffic detection model may be obtained by training through the following sub-steps S11 and S12:

Sub-step S11, obtaining a sample set.

The sample set may include path timing diagram samples of the interfaces accessed by normal users and path timing diagram samples of the interfaces accessed by abnormal users. Each path timing diagram sample may carry label information indicating whether the traffic generated by the corresponding user is abnormal traffic.

As an example, fig. 2 is a schematic diagram of a path timing diagram of the interfaces accessed by a normal user, and fig. 3 is a schematic diagram of a path timing diagram of the interfaces accessed by an abnormal user. As can be seen by comparing fig. 2 and fig. 3, a normal user accesses more types of interfaces, the accessed interfaces are more dispersed, continuous access to the same interface is less frequent, and the accesses show no periodicity. An abnormal user accesses few types of interfaces, the accessed interfaces are concentrated, continuous access to the same interface is common, and the accesses are approximately periodic.

Optionally, the sample set in the sub-step S11 may be generated by:

First, a normal user data set and an abnormal user data set are obtained. The normal user data set includes the access logs of normal users. The abnormal user data set includes the access logs of abnormal users. Each access log records the interfaces accessed by the user and the access time of each interface.

Second, for each user involved in the normal user data set and the abnormal user data set, a path timing diagram of the interfaces accessed by the user is generated based on the access log of the user. Thus, a path timing diagram of each normal user and a path timing diagram of each abnormal user can be obtained.

It should be noted that the generation manner of the path timing diagram in this step is basically the same as the generation manner of the path timing diagram in step 102, and the description thereof is omitted here.

Third, label information indicating normal traffic is added to the path timing diagram of each normal user, and label information indicating abnormal traffic is added to the path timing diagram of each abnormal user. As an example, label 1 may be added to the path timing diagram of an abnormal user, and label 0 may be added to the path timing diagram of a normal user.

Fourth, the path timing diagram samples to which label information has been added are collected to generate the sample set.

Sub-step S12, using a machine learning method, training with the path timing diagram samples in the sample set as input and the label information corresponding to each input path timing diagram sample as the expected output, to obtain the abnormal traffic detection model.

In the training process, the path timing diagram samples can be input into the model one by one to obtain the detection result output by the model. A loss value may then be determined based on the output detection result and the label information corresponding to the input path timing diagram sample. The loss value represents the difference between the detection result output by the model and the actual label information: the larger the loss value, the larger the difference. The loss value may be determined based on various existing loss functions. The loss value can then be used to update the parameters of the model. Therefore, every time one path timing diagram sample is input, the parameters of the model can be updated once based on the label information corresponding to that sample.

In practice, whether training is complete may be determined in a number of ways. As an example, when the probability that the detection result output by the model matches the corresponding label information reaches a preset value (e.g., 98%), model training may be considered complete. As yet another example, model training may be considered complete when the number of training iterations equals a preset number. As yet another example, model training may be considered complete when the loss value of the model converges. The trained model is the abnormal traffic detection model.
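The per-sample update and the three stopping criteria can be illustrated with a deliberately tiny stand-in model. The sketch below is not the convolutional network of the embodiment: it swaps in a logistic-regression classifier over two hand-made features so that it runs with the standard library alone. The feature values, learning rate, and thresholds are assumptions for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, lr=0.5, max_epochs=200, target_accuracy=0.98, tol=1e-6):
    """Train on (features, label) pairs; label 1 = abnormal, 0 = normal."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    prev_loss, reason = float("inf"), "max_epochs"   # criterion: preset number of passes
    for _ in range(max_epochs):
        total_loss = 0.0
        for x, y in samples:                         # samples are fed in one by one
            p = predict(weights, bias, x)
            p_safe = min(max(p, 1e-12), 1 - 1e-12)
            total_loss -= y * math.log(p_safe) + (1 - y) * math.log(1 - p_safe)
            grad = p - y                             # cross-entropy gradient w.r.t. the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad                        # parameters updated once per sample
        mean_loss = total_loss / len(samples)
        correct = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in samples)
        if correct / len(samples) >= target_accuracy:
            reason = "accuracy"                      # criterion: match rate reaches preset value
            break
        if abs(prev_loss - mean_loss) < tol:
            reason = "converged"                     # criterion: loss value has converged
            break
        prev_loss = mean_loss
    return weights, bias, reason


# Hypothetical features: (share of distinct interfaces, share of repeated consecutive accesses).
samples = [((0.8, 0.1), 0), ((0.7, 0.2), 0), ((0.1, 0.9), 1), ((0.2, 0.8), 1)]
weights, bias, reason = train(samples)
print(reason, [int(predict(weights, bias, x) >= 0.5) for x, _ in samples])
```

In a real setting the match rate would normally be measured on held-out samples rather than on the training samples themselves; the sketch checks the training samples only to stay short.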

In the method provided by the embodiment of the application, the access log of the user to be tested is obtained, then the path timing diagram of the interfaces accessed by the user to be tested is generated based on the interfaces accessed by the user and the access time of each interface recorded in the access log, and finally the path timing diagram is input to the pre-trained abnormal traffic detection model to obtain an abnormal traffic detection result indicating whether the access traffic generated by the user to be tested is abnormal traffic. On the one hand, because the access log of the user to be tested is converted into an image, the abnormal traffic identification problem is converted into an image classification problem, and the abnormal traffic detection model can automatically extract features from the image and perform detection, which reduces labor cost compared with the existing approach of manually extracting features. On the other hand, because the image contains information such as the interfaces accessed by the user to be tested, the access times, the access order, and the access distribution, automatic feature extraction and detection by the abnormal traffic detection model can make full use of the information in the access log and can extract features that cannot be extracted manually, which improves the information utilization rate and the accuracy of abnormal traffic detection.

With further reference to fig. 4, a flow 400 of yet another embodiment of an abnormal traffic detection method is shown. The flow 400 of the abnormal traffic detection method includes the following steps:

Step 401, obtaining an access log of a user to be tested.

Step 401 in this embodiment can refer to step 101 in the embodiment corresponding to fig. 1, and is not described herein again.

Step 402, constructing a coordinate system with time as the horizontal axis and the codes of the interfaces as the vertical axis.

In this embodiment, the execution subject of the abnormal traffic detection method may construct a coordinate system with time as a horizontal axis and codes of the interface as a vertical axis. The coordinate system here may be a rectangular coordinate system.

Here, since the URL of the interface is usually complex, the interface can be encoded in advance (for example, the encoding is a number such as 1, 2, or 3), thereby simplifying the representation of the interface. Thus, the scale of the vertical axis may be represented in the coding of the interface.

Here, different interfaces have different codes, and the correspondence between the interfaces and the codes can be determined in various ways, such as manual setting, setting based on the access volume, or pre-determining based on the most relevant path of the interfaces accessed by normal users. The most relevant path may be determined based on the paths along which normal users access the interfaces.

In some optional implementations of the present embodiment, the encoding of the interface is predetermined by sub-step S21 to sub-step S25 as follows:

Sub-step S21, acquiring a normal user data set.

The normal user data set may include a large number of historical access logs for normal users. The historical access log records the interfaces accessed by normal users and the access time of each interface. It will be appreciated that when the number of historical access logs in the normal user data set is sufficiently large, the interfaces involved in the historical access logs may constitute the full number of interfaces in the monitored platform.

Sub-step S22, counting the access volume of each interface involved in the normal user data set, and sorting the interfaces involved in the normal user data set in descending order of access volume.

Sub-step S23, determining an initial code of each interface based on the sorted order of the interfaces, and determining an initial code sequence of the interfaces accessed by each normal user based on the access log of that normal user and the initial code of each interface.

Here, the rank of an interface in the sorted order may be used directly as the initial code of the interface. For example, suppose there are 7 interfaces a, b, c, d, e, f, g, and sorting them in descending order of access volume gives c, b, a, e, d, f, g. In this case, the initial code of interface c may be 1, that of interface b may be 2, that of interface a may be 3, that of interface e may be 4, that of interface d may be 5, that of interface f may be 6, and that of interface g may be 7. If a user then accesses the interfaces in the order a-d-b-c, the corresponding initial code sequence is 3-5-2-1.

Optionally, in sub-step S23, since the access volume of the interfaces at the tail of the sorted result is very small, the execution subject may further assign the same initial code to all tail interfaces in order to simplify statistics and reduce the data volume. In practice, for each interface, if the rank of the interface is less than or equal to a preset threshold (which may be denoted as N), the rank of the interface may be used as its initial code. If the rank of the interface is greater than the preset threshold N, a preset code may be used as its initial code. Here, the preset code may be any value greater than the preset threshold, such as N + 1. Continuing with the above example, if N is set to 5, the initial codes of interface f and interface g are both 6.
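Sub-steps S22 and S23, including the optional tail threshold N, can be sketched as follows. The function name `initial_codes`, the log format (one list of interface names per user), and the alphabetical tie-break for equal access volumes are assumptions made for the example; the access volumes are invented so that the ranking matches the example above.

```python
from collections import Counter
from typing import Dict, List, Optional


def initial_codes(access_logs: List[List[str]],
                  tail_threshold: Optional[int] = None) -> Dict[str, int]:
    """Rank interfaces by access volume (descending) and use the rank as the code.

    Interfaces ranked after `tail_threshold` (N) all share the code N + 1.
    """
    counts = Counter(iface for log in access_logs for iface in log)
    # Ties are broken alphabetically here; the source does not specify a tie-break.
    ranked = sorted(counts, key=lambda iface: (-counts[iface], iface))
    codes = {}
    for rank, iface in enumerate(ranked, start=1):
        if tail_threshold is not None and rank > tail_threshold:
            codes[iface] = tail_threshold + 1
        else:
            codes[iface] = rank
    return codes


# Invented access volumes that reproduce the ranking c, b, a, e, d, f, g.
volumes = {"c": 7, "b": 6, "a": 5, "e": 4, "d": 3, "f": 2, "g": 1}
logs = [[iface] * n for iface, n in volumes.items()]

codes = initial_codes(logs, tail_threshold=5)
sequence = [codes[iface] for iface in ["a", "d", "b", "c"]]
print(codes)     # f and g share code 6 because N = 5
print(sequence)  # initial code sequence for the access order a-d-b-c
```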

Sub-step S24, determining, based on the initial code sequences, the most relevant path of the interfaces accessed by normal users, the most relevant path being composed of the initial codes of the interfaces involved in the normal user data set.

Here, the initial code sequence of the interfaces accessed by each normal user can be used as an access path, and a weighted path relation graph can be generated based on the access paths of the normal users. Then, a minimum spanning tree algorithm is used to determine the minimum spanning tree of the path relation graph. A conventional minimum spanning tree algorithm, such as the Kruskal algorithm or the Prim algorithm, may be used here. The initial codes of the interfaces corresponding to the nodes are then ordered based on the positions of the nodes in the minimum spanning tree, thereby obtaining the most relevant path of the interfaces accessed by normal users.

Optionally, in sub-step S24, the most relevant path may be determined by:

First, each initial code is taken as a node, the number of times that two initial codes appear as adjacent codes in the same initial code sequence is taken as the weight of the connection line between the corresponding nodes, and a path relation graph of the interfaces accessed by normal users is constructed.

As an example, fig. 5 is a schematic diagram of a path relation graph. As shown in fig. 5, there are 6 initial codes, which are 1, 2, 3, 4, 5, and 6, respectively, where the initial code 1 corresponds to the interface c, the initial code 2 corresponds to the interface b, the initial code 3 corresponds to the interface a, the initial code 4 corresponds to the interface e, the initial code 5 corresponds to the interface d, and the initial code 6 corresponds to the interfaces f and g.

Thus, the path relation graph includes 6 nodes in total, 1, 2, 3, 4, 5, and 6.

Node 1 and node 3 appear as adjacent codes in the same initial code sequence 10 times. Node 2 and node 3 appear as adjacent codes in the same initial code sequence 2 times. Node 1 and node 4 appear as adjacent codes in the same initial code sequence 5 times. Node 1 and node 5 appear as adjacent codes in the same initial code sequence 4 times. Node 4 and node 6 appear as adjacent codes in the same initial code sequence 3 times. Node 5 and node 6 appear as adjacent codes in the same initial code sequence 1 time.
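Counting adjacent code pairs to build such a weighted graph can be sketched as below. The function name `build_path_relation_graph` and the three short sequences are hypothetical; the sketch treats the graph as undirected and skips a code followed by itself, both of which are assumptions, since the source does not say how repeated codes are handled.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_path_relation_graph(code_sequences: List[List[int]]) -> Dict[Tuple[int, int], int]:
    """Map each unordered pair of codes to the number of times they are adjacent."""
    weights = Counter()
    for sequence in code_sequences:
        for left, right in zip(sequence, sequence[1:]):
            if left != right:                      # assumed: no self-loops
                weights[tuple(sorted((left, right)))] += 1
    return dict(weights)


# Hypothetical initial code sequences of three normal users.
sequences = [[3, 1, 4], [1, 3, 2], [4, 6, 5]]
graph = build_path_relation_graph(sequences)
print(graph)
```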

Second, the reciprocal of the weight of each connection line in the path relation graph is taken as the new weight, and the minimum spanning tree of the path relation graph with the updated weights is determined using a minimum spanning tree algorithm.

Continuing with the path relation graph illustrated in fig. 5 as an example, the weight of the connection line between node 1 and node 3 is updated from 10 to 0.1. The weight of the connection line between node 2 and node 3 is updated from 2 to 0.5. The weight of the connection line between node 1 and node 4 is updated from 5 to 0.2. The weight of the connection line between node 1 and node 5 is updated from 4 to 0.25. The weight of the connection line between node 4 and node 6 is updated from 3 to 0.33. The weight of the connection line between node 5 and node 6 is updated from 1 to 1, that is, it is unchanged.

Taking the Prim minimum spanning tree algorithm as an example, its core steps are as follows. Let V be the set of all vertices of a weighted connected graph (the path relation graph in the embodiment of the present application is a weighted connected graph), let E be the set of its edges, and let U be the set of vertices already in the minimum spanning tree. Starting from any vertex v in the graph, with U = {v}, the following operation is repeated: among all edges (u, w) ∈ E with u ∈ U and w ∈ V - U, find the edge with the minimum weight, add the edge (u, w) to the set of found edges, and add the vertex w to the set U. When U = V, the minimum spanning tree has been found.

Third, all nodes in the minimum spanning tree are arranged in order from the root node to the leaf nodes to generate the most relevant path of the interfaces accessed by normal users.

Thus, taking the path relation graph illustrated in fig. 5 as an example, any node of the path relation graph (e.g., node 1) is taken as the root node and the graph is processed with the Prim minimum spanning tree algorithm, which yields the most relevant path 1-3-4-5-6-2. The interfaces corresponding to the most relevant path are interface c, interface a, interface e, interface d, interfaces f and g, and interface b in sequence.
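The fig. 5 example can be reproduced with a short sketch of Prim's algorithm on the reciprocal weights. One assumption needs stating plainly: the source says the nodes are arranged "from the root node to the leaf nodes" without defining a traversal, so the sketch uses the order in which Prim's algorithm adds nodes to the tree, which is one reading that yields the path given in the example. The function name `most_relevant_path` is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def most_relevant_path(edge_counts: Dict[Tuple[int, int], int], root: int) -> List[int]:
    """Run Prim's algorithm on reciprocal weights; return nodes in the order they join the tree."""
    adjacency: Dict[int, List[Tuple[float, int]]] = {}
    for (u, v), count in edge_counts.items():
        weight = 1.0 / count                 # frequent adjacency -> small weight
        adjacency.setdefault(u, []).append((weight, v))
        adjacency.setdefault(v, []).append((weight, u))

    order, in_tree = [root], {root}
    frontier = list(adjacency.get(root, []))
    heapq.heapify(frontier)
    while frontier:
        _, node = heapq.heappop(frontier)    # cheapest edge leaving the tree
        if node in in_tree:
            continue
        in_tree.add(node)
        order.append(node)
        for edge in adjacency[node]:
            if edge[1] not in in_tree:
                heapq.heappush(frontier, edge)
    return order


# Adjacency counts from the fig. 5 example.
fig5_counts = {(1, 3): 10, (2, 3): 2, (1, 4): 5, (1, 5): 4, (4, 6): 3, (5, 6): 1}
path = most_relevant_path(fig5_counts, root=1)
print("-".join(str(node) for node in path))
```

Taking reciprocals means the minimum spanning tree keeps the connection lines between the codes that most often appear next to each other.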

Sub-step S25, for each interface involved in the normal user data set, taking the position of the initial code of the interface in the most relevant path as the final code of the interface, to replace the initial code of the interface.

Continuing with the above example, the most relevant path is 1-3-4-5-6-2, which corresponds to interface c, interface a, interface e, interface d, interfaces f and g, and interface b in turn. Since the initial code 3 of interface a is at position 2 in the most relevant path, the final code of interface a is 2. Since the initial code 2 of interface b is at position 6 in the most relevant path, the final code of interface b is 6. Since the initial code 1 of interface c is at position 1 in the most relevant path, the final code of interface c is 1. Since the initial code 5 of interface d is at position 4 in the most relevant path, the final code of interface d is 4. Since the initial code 4 of interface e is at position 3 in the most relevant path, the final code of interface e is 3. Since the initial code 6 of interfaces f and g is at position 5 in the most relevant path, the final code of interfaces f and g is 5.
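The remapping in sub-step S25 amounts to a position lookup, sketched below with the values from the example. The function name `final_codes` is hypothetical.

```python
from typing import Dict, List


def final_codes(initial_code_of: Dict[str, int], path: List[int]) -> Dict[str, int]:
    """Replace each interface's initial code by its 1-based position in the most relevant path."""
    position = {code: index for index, code in enumerate(path, start=1)}
    return {iface: position[code] for iface, code in initial_code_of.items()}


initial = {"a": 3, "b": 2, "c": 1, "d": 5, "e": 4, "f": 6, "g": 6}
final = final_codes(initial, [1, 3, 4, 5, 6, 2])
print(final)
```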

By using the order of the interfaces arranged in the most relevant path as the coding of the interfaces, compared with manual coding or coding by using other rules, the path sequence diagram can reflect the relationship between the interfaces, and the closer the interfaces are, the stronger the relationship is. Therefore, the characteristics extracted from the path timing diagram through the abnormal flow detection model comprise the spatial structure correlation characteristics among the pixel points, and the accuracy of the abnormal flow detection result is improved.

Step 403, taking the interfaces accessed by the user to be tested as target interfaces, and drawing coordinate points in the coordinate system based on the code and the access time of each target interface to obtain a coordinate graph.

In this embodiment, the execution subject may use an interface accessed by a user to be tested as a target interface, and draw a coordinate point in a coordinate system based on the code and the access time of each target interface to obtain a coordinate graph.

In some optional implementation manners of this embodiment, the access time of the first target interface accessed by the user to be tested may be first used as the initial time. Then, for each target interface, the difference between the access time of the target interface and the initial time is used as an abscissa, the code of the target interface is used as an ordinate, and a coordinate point corresponding to the target interface is drawn in the coordinate system.
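The coordinate computation in this optional implementation can be sketched as follows. The function name `to_coordinate_points`, the use of numeric timestamps in seconds, and the sample access records are assumptions for illustration; the interface codes reuse the final codes from the earlier example.

```python
from typing import Dict, List, Tuple


def to_coordinate_points(accesses: List[Tuple[str, float]],
                         code_of: Dict[str, int]) -> List[Tuple[float, int]]:
    """Return (time since first access, interface code) for every access record."""
    if not accesses:
        return []
    ordered = sorted(accesses, key=lambda record: record[1])
    initial_time = ordered[0][1]            # access time of the first target interface
    return [(access_time - initial_time, code_of[iface]) for iface, access_time in ordered]


code_of = {"c": 1, "a": 2, "e": 3}
accesses = [("c", 1000.0), ("a", 1004.0), ("e", 1010.0)]   # hypothetical timestamps in seconds
points = to_coordinate_points(accesses, code_of)
print(points)
```

Using the offset from the first access as the abscissa makes the diagrams of different users start at the same origin regardless of when they were active.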

Step 404, generating a path timing diagram of the interfaces accessed by the user to be tested based on the coordinate graph.

In this embodiment, the execution main body may convert the coordinate graph into a path sequence diagram of the user access interface to be tested. In the process of converting into the image, a rectangular image corresponding to the coordinate graph can be constructed firstly, and the horizontal direction of the rectangular image represents the time and the vertical direction represents the interface. The rectangular image is then divided into rectangular blocks according to interface and time. Then, for each coordinate point drawn in the coordinate map, a rectangular block in which the coordinate point should be set in the rectangular image may be determined based on the coordinates of the point, and the pixel value of the rectangular block may be set. For example, the value may be set to a preset fixed value, or may be set to a pixel value corresponding to a statistical result based on statistics of data in the access record. The pixel value here may be set to a pixel value of a single channel or a pixel value of three channels, which is not limited in the present application.

Note that, for rectangular blocks to which no plotted coordinate points are mapped, the pixel values of these rectangular blocks may be set as default values, and the default values may be made different from the pixel values of the rectangular blocks to which the plotted coordinate points are mapped. For example, a default value may be set to zero, etc. Therefore, the path sequence diagram of the user access interface to be tested can be obtained.

In some optional implementations of this embodiment, the path timing diagram may be a three-channel image, and the pixel values in the image may be obtained based on statistics of information of the access interface. Specifically, the path timing diagram may be generated as follows:

First, for each coordinate point in the coordinate graph, the abscissa of the coordinate point is taken as a target time, and the three-channel pixel value corresponding to the coordinate point is determined based on the number of target interfaces accessed by the user to be tested at the target time, the accumulated number of target interfaces accessed as of the target time, and the number of target interfaces accessed before the target time. Here, each of these numbers may be converted into a value within the range [0, 255] according to a preset numerical conversion relationship (such as a formula or a function), thereby obtaining RGB (Red, Green, Blue) three-channel pixel values.

Second, the coordinate graph is converted into a three-channel image based on the three-channel pixel values corresponding to the coordinate points, and the three-channel image is taken as the path timing diagram of the interfaces accessed by the user to be tested. Specifically, a rectangular image corresponding to the coordinate graph may be constructed in the above-described manner and divided into a plurality of rectangular blocks. Then, for each coordinate point drawn in the coordinate graph, the three-channel pixel value corresponding to the coordinate point is used as the pixel value of the rectangular block to which the coordinate point is mapped in the rectangular image. The pixel values of the rectangular blocks that do not correspond to any plotted coordinate point are set to a default value, such as zero. Thus, the path timing diagram can be obtained.
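A possible way to derive the three channel values is sketched below. The source leaves both the exact statistics and the conversion relationship open, so several choices here are assumptions: accesses are counted per record rather than per distinct interface, the three counts are assigned to R, G and B in the order they are listed, and the conversion is a linear scale clamped to [0, 255]. The names `to_channel` and `three_channel_values` are hypothetical.

```python
from typing import Dict, List, Tuple


def to_channel(count: int, scale: int = 25) -> int:
    """Assumed conversion relationship: linear scaling clamped to [0, 255]."""
    return max(0, min(255, count * scale))


def three_channel_values(points: List[Tuple[float, int]]
                         ) -> Dict[Tuple[float, int], Tuple[int, int, int]]:
    """Map each (time offset, interface code) point to an (R, G, B) pixel value."""
    times = [t for t, _ in points]
    pixels = {}
    for t, code in points:
        at_time = sum(1 for x in times if x == t)   # accesses at the target time
        before = sum(1 for x in times if x < t)     # accesses before the target time
        cumulative = before + at_time               # accesses up to and including the target time
        pixels[(t, code)] = (to_channel(at_time), to_channel(cumulative), to_channel(before))
    return pixels


points = [(0, 1), (0, 2), (4, 2), (10, 3)]
pixels = three_channel_values(points)
for point, rgb in pixels.items():
    print(point, rgb)
```

Blocks that receive no point would keep the default value, for example (0, 0, 0).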

The image carries the interfaces accessed by the user to be tested together with the access time, order, and distribution of those accesses, and may further carry spatial-structure correlation features among pixels. Having the abnormal traffic detection model automatically extract features from the image and perform detection therefore makes full use of the information in the access log, which improves information utilization and the accuracy of abnormal traffic detection.

Step 405, inputting the path sequence diagram into a pre-trained abnormal traffic detection model to obtain an abnormal traffic detection result.

Step 405 in this embodiment can refer to step 103 in the corresponding embodiment of fig. 1, and is not described herein again.

It should be noted that, in this embodiment, the path sequence diagrams of the normal users and abnormal users in the samples used to train the abnormal traffic detection model may be generated in the same way as the path sequence diagram of the user to be tested, as described in steps 402 to 404; details are not repeated here.

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 1, the flow 400 of the abnormal traffic detection method in this embodiment involves a step of generating the path sequence diagram based on a coordinate graph whose vertical axis is the code of the interface, and a step of determining the code of each interface based on the most relevant path along which normal users access interfaces. Because the order in which the interfaces are arranged in the most relevant path is used as their codes, the path sequence diagram can, unlike with manual coding or coding by other rules, reflect the relationships among the interfaces: the closer two interfaces are, the stronger their relationship. Therefore, the features extracted from the path sequence diagram by the abnormal traffic detection model include not only information such as the interfaces accessed by the user to be tested and the access time, order, and distribution of those accesses, but may also include spatial-structure correlation features among pixels. Having the model automatically extract features from the image and perform detection thus makes full use of the information in the access log, improving information utilization and the accuracy of abnormal traffic detection.

With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an abnormal traffic detection apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which can be applied to various electronic devices.

As shown in fig. 6, the abnormal traffic detection apparatus 600 according to the present embodiment includes: an obtaining unit 601 configured to obtain an access log of a user to be tested, where the access log records the interfaces accessed by the user to be tested and the access time of each interface; a generating unit 602 configured to generate a path sequence diagram of the interfaces accessed by the user to be tested based on the access log; and a detecting unit 603 configured to input the path sequence diagram into a pre-trained abnormal traffic detection model to obtain an abnormal traffic detection result, where the abnormal traffic detection result indicates whether the access traffic generated by the user to be tested is abnormal traffic.

In some optional implementations of this embodiment, the generating unit 602 is further configured to: construct a coordinate system with time as the horizontal axis and the codes of the interfaces as the vertical axis; take the interfaces accessed by the user to be tested as target interfaces, and plot coordinate points in the coordinate system based on the code and access time of each target interface to obtain a coordinate graph; and generate the path sequence diagram of the interfaces accessed by the user to be tested based on the coordinate graph.

In some optional implementations of this embodiment, the code of each interface is predetermined based on a most relevant path along which normal users access interfaces, the most relevant path being determined from the paths along which normal users access interfaces.

In some optional implementations of this embodiment, the generating unit 602 is further configured to: take the access time of the first target interface accessed by the user to be tested as an initial time; and, for each target interface, take the difference between the access time of the target interface and the initial time as the abscissa and the code of the target interface as the ordinate, and plot the coordinate point corresponding to the target interface in the coordinate system.
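As a minimal sketch of this plotting step, the snippet below derives coordinate points from an access log. The record layout `(interface, access_time)`, the `codes` dictionary, and the function name are assumptions made for illustration; the application does not prescribe a data format.

```python
def plot_coordinate_points(access_log, codes):
    """Return (abscissa, ordinate) pairs for one user's access log.

    access_log: list of (interface, access_time) records (hypothetical layout).
    codes: dict mapping each interface to its integer code (assumed to be precomputed).
    """
    if not access_log:
        return []
    ordered = sorted(access_log, key=lambda record: record[1])
    initial_time = ordered[0][1]  # access time of the first target interface
    # abscissa = offset from the initial time, ordinate = code of the interface
    return [(time - initial_time, codes[interface]) for interface, time in ordered]


log = [("/cart", 105), ("/home", 100), ("/pay", 130)]
codes = {"/home": 1, "/cart": 2, "/pay": 3}
points = plot_coordinate_points(log, codes)
```

Using offsets from the first access rather than absolute timestamps keeps diagrams from different users on a common horizontal scale.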

In some optional implementations of this embodiment, the generating unit 602 is further configured to: for each coordinate point in the coordinate graph, take the abscissa of the coordinate point as a target time, and determine the three-channel pixel value corresponding to the coordinate point based on the number of target interfaces accessed by the user to be tested at the target time, the accumulated number of target interfaces accessed at the target time, and the number of target interfaces accessed before the target time; and convert the coordinate graph into a three-channel image based on the three-channel pixel values corresponding to the coordinate points, and take the three-channel image as the path sequence diagram of the interfaces accessed by the user to be tested.

In some optional implementations of this embodiment, the code of each interface is predetermined by: acquiring a normal user data set, where the normal user data set includes historical access logs of normal users, and each historical access log records the interfaces accessed by a normal user and the access time of each interface; counting the number of accesses to each interface involved in the normal user data set, and sorting those interfaces in descending order of access count; determining an initial code for each interface based on its position in the sorted order, and determining, for each normal user, an initial code sequence of the interfaces accessed by that user based on the user's access log and the initial codes of the interfaces; determining a most relevant path along which normal users access interfaces based on the initial code sequences, where the most relevant path is formed by the initial codes of the interfaces involved in the normal user data set; and, for each interface involved in the normal user data set, taking the position at which the initial code of the interface is arranged in the most relevant path as the final code of the interface, replacing its initial code.

In some optional implementations of this embodiment, the determining the initial encoding of the interfaces based on the arrangement order of the interfaces includes: for each interface, if the arrangement order of the interface is less than or equal to a preset threshold, taking the arrangement order of the interface as the initial code of the interface; and if the arrangement sequence of the interface is greater than the preset threshold, taking a preset code as the initial code of the interface, wherein the preset code is greater than the preset threshold.
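The ranking and thresholding rule can be sketched as below. The function name, the log layout, the alphabetical tie-break between equally frequent interfaces, and the default preset code `threshold + 1` are assumptions for illustration only; the text requires only that the preset code be greater than the preset threshold.

```python
from collections import Counter


def initial_codes(normal_logs, threshold, preset_code=None):
    """Assign initial codes from access counts in the normal user data set.

    normal_logs: list of access logs, each a list of (interface, access_time) records.
    Interfaces ranked within the threshold get their rank as code; all others
    share a single preset code that is greater than the threshold.
    """
    if preset_code is None:
        preset_code = threshold + 1  # hypothetical default
    if preset_code <= threshold:
        raise ValueError("preset code must be greater than the threshold")
    counts = Counter(interface for log in normal_logs for interface, _ in log)
    # Descending access count; ties broken alphabetically (an assumption).
    ranked = sorted(counts, key=lambda interface: (-counts[interface], interface))
    return {
        interface: rank if rank <= threshold else preset_code
        for rank, interface in enumerate(ranked, start=1)
    }


logs = [
    [("/home", 0), ("/cart", 5)],
    [("/home", 0), ("/pay", 3), ("/home", 9)],
    [("/cart", 1), ("/help", 2)],
]
codes = initial_codes(logs, threshold=2)
```

Collapsing rarely accessed interfaces onto one code bounds the height of the coordinate graph regardless of how many distinct URLs appear in the logs.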

In some optional implementations of this embodiment, the determining the most relevant path of the normal user access interface based on each initial coding sequence includes: respectively taking each initial code as a node, taking the frequency of occurrence of every two initial codes in the same initial code sequence as adjacent codes as the weight of corresponding node connection lines, and constructing a path relation graph of a normal user access interface; taking the reciprocal of the weight of each node connecting line in the path relation graph as a new weight, and determining the minimum spanning tree of the path relation graph after updating the weight based on a minimum spanning tree algorithm; and arranging all nodes in the minimum spanning tree according to the sequence from the root node to the leaf node to generate the most relevant path of the normal user access interface.
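One possible reading of this procedure is sketched below using Prim's algorithm. The text does not name a specific minimum spanning tree algorithm, a root, or how a branching tree is flattened into a single sequence, so the following choices are assumptions: the smallest initial code is used as the root, nodes are listed in depth-first pre-order with lighter edges visited first, and nodes unreachable from the root are appended at the end.

```python
import heapq
from collections import defaultdict


def most_relevant_path(sequences, root=None):
    """Order initial codes along a minimum spanning tree of the path relationship graph."""
    adjacency_count = defaultdict(int)
    nodes = set()
    for seq in sequences:
        nodes.update(seq)
        for a, b in zip(seq, seq[1:]):
            if a != b:
                adjacency_count[frozenset((a, b))] += 1  # undirected edge weight
    if not nodes:
        return []
    graph = defaultdict(list)
    for pair, weight in adjacency_count.items():
        a, b = tuple(pair)
        graph[a].append((1.0 / weight, b))  # reciprocal: frequent pairs become cheap edges
        graph[b].append((1.0 / weight, a))
    root = min(nodes) if root is None else root
    # Prim's algorithm, recording each node's children in the spanning tree.
    children = defaultdict(list)
    visited = {root}
    heap = [(cost, root, nxt) for cost, nxt in graph[root]]
    heapq.heapify(heap)
    while heap:
        cost, parent, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        children[parent].append((cost, node))
        for next_cost, nxt in graph[node]:
            if nxt not in visited:
                heapq.heappush(heap, (next_cost, node, nxt))
    # Depth-first pre-order from root to leaves, cheapest child first.
    path, stack = [], [root]
    while stack:
        node = stack.pop()
        path.append(node)
        for _, child in sorted(children[node], reverse=True):
            stack.append(child)
    return path + sorted(nodes - visited)


path = most_relevant_path([[1, 3, 2], [1, 3, 2], [3, 2, 4], [1, 4]])
# Final code of an interface = position of its initial code in the path.
final_codes = {initial: position for position, initial in enumerate(path, start=1)}
```

Taking reciprocals means that pairs of interfaces that are often accessed one after the other end up adjacent in the tree, and hence receive neighbouring final codes.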

In some optional implementations of this embodiment, the abnormal traffic detection model is obtained by training through the following steps: acquiring a sample set, where the sample set includes path sequence diagram samples of interfaces accessed by normal users and path sequence diagram samples of interfaces accessed by abnormal users, and each path sequence diagram sample carries labeling information indicating whether the traffic generated by the user is abnormal traffic; and taking the path sequence diagram samples in the sample set as input and the labeling information corresponding to each input sample as output, training with a machine learning method to obtain the abnormal traffic detection model.

In some optional implementations of this embodiment, the sample set is generated by: acquiring a normal user data set and an abnormal user data set, where the normal user data set includes access logs of normal users, the abnormal user data set includes access logs of abnormal users, and each access log records the interfaces accessed by a user and the access time of each interface; for each user involved in the normal user data set and the abnormal user data set, generating a path sequence diagram of the interfaces accessed by the user based on the user's access log; adding labeling information indicating normal traffic to the path sequence diagrams of normal users, and adding labeling information indicating abnormal traffic to the path sequence diagrams of abnormal users; and aggregating the labeled path sequence diagram samples to generate the sample set.
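A minimal sketch of assembling such a sample set is given below. The label values `0` (normal) and `1` (abnormal), the function name, and the `to_diagram` callable that turns one access log into a path sequence diagram are hypothetical; any diagram generator, such as the one described for steps 402 to 404, could be passed in.

```python
NORMAL, ABNORMAL = 0, 1  # hypothetical label values


def build_sample_set(normal_logs, abnormal_logs, to_diagram):
    """Pair each user's path sequence diagram with its labeling information.

    to_diagram: caller-supplied function mapping one access log to a diagram.
    Returns a list of (diagram, label) samples.
    """
    samples = [(to_diagram(log), NORMAL) for log in normal_logs]
    samples += [(to_diagram(log), ABNORMAL) for log in abnormal_logs]
    return samples


# Stand-in generator for demonstration: uses the log length as a "diagram".
samples = build_sample_set([[1], [1, 2]], [[1, 2, 3]], to_diagram=len)
```

The resulting pairs can then be fed to whichever image classifier is chosen as the abnormal traffic detection model.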

The apparatus provided by the above embodiment of the application obtains the access log of the user to be tested, generates the path sequence diagram of the interfaces accessed by the user to be tested based on the interfaces and the access time of each interface recorded in the access log, and finally inputs the path sequence diagram into the pre-trained abnormal traffic detection model, thereby obtaining an abnormal traffic detection result indicating whether the access traffic generated by the user to be tested is abnormal traffic. On the one hand, converting the access log of the user to be tested into an image turns the abnormal traffic identification problem into an image classification problem, so that the abnormal traffic detection model can automatically extract features from the image and perform detection; compared with the existing approach of manually extracting features, this reduces labor cost. On the other hand, because the image contains information such as the interfaces accessed by the user to be tested and the access time, order, and distribution of those accesses, automatic feature extraction and detection by the model can capture features that cannot be extracted manually and make full use of the information in the access log, improving information utilization and the accuracy of abnormal traffic detection.

Referring now to fig. 7, a block diagram of a computer system 700 suitable for implementing the electronic device of an embodiment of the present application is shown. The electronic device shown in fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application.

It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The units described may also be provided in a processor, where the names of the units do not in some cases constitute a limitation of the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire an access log of a user to be tested, where the access log records the interfaces accessed by the user to be tested and the access time of each interface; generate a path sequence diagram of the interfaces accessed by the user to be tested based on the access log; and input the path sequence diagram into a pre-trained abnormal traffic detection model to obtain an abnormal traffic detection result, where the abnormal traffic detection result indicates whether the access traffic generated by the user to be tested is abnormal traffic.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.