CN112291075B - Network fault positioning method and device, computer equipment and storage medium - Google Patents
- ️Tue Aug 30 2022
Info
Publication number
- CN112291075B
Application number
- CN201910668662.9A
Authority
- CN (China)
Prior art keywords
- network
- information
- user
- network device
- fault
Prior art date
- 2019-07-23
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/065—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Embodiments of the present invention relate to the technical field of network fault diagnosis and disclose a network fault locating method. The method includes: acquiring user status information in the network; determining, according to the user status information, whether a fault alarm has occurred on a network device in the network; when a fault alarm has occurred on the network device, acquiring the layer 2 path tree topology information of all users under the network device; and determining, according to the layer 2 path tree topology information, the node information of the faulty network device in the network. In this way, the embodiments monitor fluctuations in the user account data of network devices in real time and filter out interference factors by means of the TCA two-factor clustering algorithm.
Description
Technical field
Embodiments of the present invention relate to the technical field of network fault diagnosis, and in particular to a network fault locating method and apparatus, a computer device, and a storage medium.
Background
With the development of Internet technology and the build-out of broadband networks by the major telecom operators, home broadband services have become widespread. As the number of devices and users keeps growing, network fault monitoring and service assurance for home broadband are becoming increasingly important. The existing home broadband fault monitoring process has no effective, automated service fault monitoring system for large-scale faults; it typically relies on traditional OMC network management monitoring and complaint work orders to determine that a broadband fault has occurred. Referring to Fig. 1, in the existing process the OMC network management system only monitors device alarm information; staff then analyze the alarms one by one to decide whether service is affected, and dispatch emergency repair only after confirming that service is actually impaired. With complaint work orders, a fault can be confirmed only after service has been impaired and user complaints have surged.
In implementing the embodiments of the present invention, the inventors found that this way of determining broadband network faults is time-consuming, labor-intensive, and inefficient, and that fault warnings lag far behind user complaints, which makes maintenance extremely passive. Specifically, existing methods for discovering and warning of home broadband service faults have the following shortcomings:
1) Lack of effective broadband service monitoring: Home broadband services cover a wide area and have a huge user base. With more than a million users, relying on traditional monitoring of device OMC network management alarms is a single-channel approach, and OMC alarms are device-level and cannot be refined to the user level. A device alarm does not necessarily mean that user service is impaired, so judging real service faults from it is imprecise. In addition, alarms cannot be associated with specific users; user information has to be looked up manually in the resource management platform, which makes effective early warning of user faults difficult.
2) Faults are not discovered in time: Traditional monitoring watches OMC network management alarms, then analyzes them one by one to decide whether user service is affected. This is time-consuming, labor-intensive, and inefficient, and makes it hard to detect impaired user service promptly. Warning by complaint work order means the fault is learned of only after it has occurred and complaints have surged; this is after-the-fact warning and runs counter to the maintenance team's goal of proactive early warning.
3) Slow confirmation of fault recovery: The volume of home broadband service is huge. After a batch fault has been repaired, traditional confirmation methods such as on-site visits, follow-up calls, and back-end data checks require a large investment of staff time and may involve confirmation across specialist departments. Two or more rounds of confirmation are often needed, and a single effective confirmation is hard to achieve.
Therefore, a broadband network fault locating method capable of intelligently analyzing and diagnosing faults is urgently needed.
Summary of the invention
In view of the above problems, embodiments of the present invention provide a network fault locating and early-warning method that overcomes, or at least partially solves, the above problems.
According to one aspect of the embodiments of the present invention, a network fault locating and early-warning method is provided, the method including:
acquiring user status information in the network;
determining, according to the user status information, whether a fault alarm has occurred on a network device in the network;
when a fault alarm has occurred on the network device, acquiring the layer 2 path tree topology information of all users under the network device;
determining, according to the layer 2 path tree topology information, the node information of the faulty network device in the network.
In an optional manner, determining according to the user status information whether a fault alarm has occurred on a network device in the network includes:
acquiring, for each network device in the network, the number of users under it whose user status is offline;
when the number of users is greater than a preset first threshold, determining that a fault alarm has occurred on the network device.
In an optional manner, acquiring the layer 2 path tree topology information of all users under the network device includes:
acquiring the information of all users under the network device;
acquiring, according to the user information, the information of the network devices that each user passes through;
forming the user's layer 2 path tree topology information according to the information of the network devices that the user passes through.
In an optional manner, after acquiring the layer 2 path tree topology information of all users under the network device, the method further includes:
acquiring the OMC alarms, centralized fault alarms, and database logic tables of the network device;
analyzing the OMC alarms, centralized fault alarms, and database logic tables with a two-factor clustering algorithm to obtain the user's offline cause;
when the user's offline cause is power loss, removing the user.
According to another aspect of the embodiments of the present invention, a network fault locating apparatus is provided, including:
a user status information acquisition module, configured to acquire user status information in the network;
a judgment module, configured to determine, according to the user status information, whether a fault alarm has occurred on a network device in the network;
a topology analysis module, configured to acquire, when a fault alarm has occurred on the network device, the layer 2 path tree topology information of all users under the network device;
a locating module, configured to determine, according to the layer 2 path tree topology information, the node information of the faulty network device in the network.
According to another aspect of the embodiments of the present invention, a computer device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
acquiring user status information in the network;
determining, according to the user status information, whether a fault alarm has occurred on a network device in the network;
when a fault alarm has occurred on the network device, acquiring the layer 2 path tree topology information of all users under the network device;
determining, according to the layer 2 path tree topology information, the node information of the faulty network device in the network.
According to yet another aspect of the embodiments of the present invention, a computer storage medium is provided, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform the following operations:
acquiring user status information in the network;
determining, according to the user status information, whether a fault alarm has occurred on a network device in the network;
when a fault alarm has occurred on the network device, acquiring the layer 2 path tree topology information of all users under the network device;
determining, according to the layer 2 path tree topology information, the node information of the faulty network device in the network.
When a fault alarm occurs on a network device, the embodiments of the present invention acquire the layer 2 path tree topology information, which enables network faults to be located quickly and greatly reduces labor costs. Monitoring is performed at the service level, achieving genuinely user-level monitoring that truly reflects impairment of user service, and enabling control over the full life cycle of a fault, from rapid discovery to confirmation of service recovery.
The above is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the embodiments are more readily apparent, specific embodiments of the present invention are set out below.
Description of drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. The same reference symbols denote the same components throughout the drawings. In the drawings:
Fig. 1 shows the service architecture of existing fault early warning for a home broadband service network;
Fig. 2 shows a schematic diagram of the association between network devices and user accounts established on the basis of PPPoE+ according to an embodiment of the present invention;
Fig. 3 shows a flowchart of a network fault locating method according to an embodiment of the present invention;
Fig. 4 shows a flowchart of a network fault locating method according to another embodiment of the present invention;
Fig. 5 shows a schematic structural diagram of a network fault locating apparatus according to an embodiment of the present invention;
Fig. 6 shows a schematic structural diagram of a network fault locating device according to an embodiment of the present invention;
Fig. 7 shows a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention will be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Fig. 3 shows a flowchart of an embodiment of the network fault locating method provided by the embodiments of the present invention. The method is applied in a network fault locating device, which analyzes and locates network faults in home broadband services. As shown in Fig. 3, the method includes the following steps:
Step 110: Acquire user status information in the network.
Before this step, the PPPoE+ function must first be deployed on the devices at every layer of the network. Based on the network device information that each user account passes through across all network elements, a layer 2 path tree topology is built that associates the network device information of all network elements with user accounts.
The present invention deploys the PPPoE+ function on the devices at every layer of the network in order to obtain the information of the network devices that a user account passes through. PPPoE+ (also known as PPPoE Intermediate Agent) extends the PPPoE protocol packets: the access device intercepts the protocol packets of the PPPoE discovery phase, inserts in the upstream direction the physical information of the network devices that the user account passes through, strips the user's physical information in the downstream direction, and then forwards the packets.
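The insert-upstream, strip-downstream behaviour can be sketched at the byte level as follows. This is only an illustration: the patent does not specify the tag layout, so the sketch assumes the vendor-specific PPPoE tag (type 0x0105, defined in RFC 2516) with the Broadband Forum enterprise number and an Agent Circuit ID sub-option, as commonly used by PPPoE intermediate agents. The function names and the circuit ID string are hypothetical, and updating the PPPoE header length field is omitted.

```python
import struct

VENDOR_SPECIFIC = 0x0105     # PPPoE tag type "Vendor-Specific" (RFC 2516)
BBF_VENDOR_ID = 0x00000DE9   # enterprise number 3561 (assumed vendor for line info)
SUB_CIRCUIT_ID = 0x01        # assumed sub-option: Agent Circuit ID


def encode_tag(tag_type, value):
    """Encode one PPPoE tag as type (2 bytes), length (2 bytes), value."""
    return struct.pack("!HH", tag_type, len(value)) + value


def iter_tags(payload):
    """Yield (tag_type, value) pairs from a PPPoE discovery tag payload."""
    offset = 0
    while offset + 4 <= len(payload):
        tag_type, length = struct.unpack_from("!HH", payload, offset)
        yield tag_type, payload[offset + 4:offset + 4 + length]
        offset += 4 + length


def insert_line_info(payload, circuit_id):
    """Upstream direction: append a vendor-specific tag carrying the circuit ID."""
    sub_option = struct.pack("!BB", SUB_CIRCUIT_ID, len(circuit_id)) + circuit_id
    value = struct.pack("!I", BBF_VENDOR_ID) + sub_option
    return payload + encode_tag(VENDOR_SPECIFIC, value)


def strip_line_info(payload):
    """Downstream direction: drop the line-info tag, keep every other tag."""
    vendor = struct.pack("!I", BBF_VENDOR_ID)
    kept = b""
    for tag_type, value in iter_tags(payload):
        if tag_type == VENDOR_SPECIFIC and value[:4] == vendor:
            continue
        kept += encode_tag(tag_type, value)
    return kept
```

Inserting a tag and then stripping it returns the original tag payload, which is the round trip the access device performs between the user side and the BRAS side.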
Deploying the PPPoE+ function on the devices at every layer of the network therefore yields both the information of the network devices that a user account passes through and the user status information associated with each network device node. From the association between each network device and the user accounts, the user status information of the user accounts corresponding to each network device can be obtained. User status information indicates whether a user account is online, offline, or powered off. Because PPPoE+ is deployed, the network device information along each user account's path is available, and the device path tree of each user account serves as the monitoring unit for tracking real-time fluctuations in the account's user status information. Specifically, the alarm information of a monitoring unit can be obtained by correlating OMC network management alarms with the centralized fault platform, from which the offline status of the corresponding user accounts is determined.
Step 120: Determine, according to the user status information, whether a fault alarm has occurred on a network device in the network.
Because the user status information was acquired in step 110, analyzing it gives the number of users whose status is offline under each network device, and this number is used to determine whether a fault alarm has occurred on the device. Specifically, this embodiment sets a first threshold on the number of offline users; when the number of offline users exceeds the preset first threshold, a fault alarm has occurred on the network device.
The first threshold can be configured according to the actual requirements of the broadband network. In one implementation of the present invention, to prevent an accidentally low configured threshold from generating a flood of early warnings, the program applies conditional checks when processing the first threshold. The first-threshold rules are as follows: a PON port triggers an early warning when it has at least 20 offline users and an offline rate of 90% or more; an OLT triggers an early warning when it has at least 200 offline users or an offline rate of 90% or more; a SW or BRAS triggers an early warning when it has more than 1000 offline users.
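The per-device rules above can be sketched as a single predicate. This is a minimal illustration under stated assumptions: the `DeviceStats` type and function names are hypothetical, the offline rate is taken as offline users divided by total users under the device, and "90% or more" is read as inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceStats:
    device_type: str    # "PON", "OLT", "SW" or "BRAS"
    offline_users: int  # users currently offline under this device
    total_users: int    # all users whose path passes through this device


def offline_rate(stats):
    """Fraction of the device's users that are offline (0.0 if it has none)."""
    if stats.total_users == 0:
        return 0.0
    return stats.offline_users / stats.total_users


def triggers_early_warning(stats):
    """Apply the example first-threshold rules for each device type."""
    rate = offline_rate(stats)
    if stats.device_type == "PON":
        # Both conditions are required for a PON port.
        return stats.offline_users >= 20 and rate >= 0.9
    if stats.device_type == "OLT":
        # Either condition is sufficient for an OLT.
        return stats.offline_users >= 200 or rate >= 0.9
    if stats.device_type in ("SW", "BRAS"):
        return stats.offline_users > 1000
    raise ValueError(f"unknown device type: {stats.device_type}")
```

The minimum-count condition on the PON port is what keeps a port with only a handful of subscribers from raising a warning whenever most of them happen to be offline.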
In this embodiment, users who are offline because of power loss are not counted as a fault, so they must be removed when analyzing the number of offline users. Users offline due to power loss are identified as follows:
acquire the OMC alarms, centralized fault alarms, and database logic tables of the network device;
analyze the OMC alarms, centralized fault alarms, and database logic tables with the two-factor clustering algorithm to obtain the user's offline cause;
when the user's offline cause is power loss, remove the user.
Specifically, the TCA (two-factor clustering algorithm) is used. Alarm information is obtained by correlating OMC network management alarms with the centralized fault platform, and the offline user account data in the alarm information is analyzed and classified by fault cause. Using the OID information of each alarm, the position in the layer 2 path tree topology of the network device associated with the alarm is matched and the associated user account information is identified, from which the reason the account went offline is determined. For example, offline account data associated with an OMC DGI alarm indicates that the user lost power; a user who goes offline voluntarily sends a presence message, whereas a disconnection caused by a network fault sends no notification, so an offline record accompanied by a presence message is judged to be false-alarm data. Offline records caused by two classes of factors are then filtered out, namely non-fault data (voluntary logoff, flash disconnects, etc.) and false-alarm data (user power loss, residential-area power outage, etc.), so that only the offline data of user accounts that have actually failed is retained. It is also determined whether the user recovered from a flash disconnect; if so, the disconnect is treated as caused by power loss.
Specifically, it is determined whether the offline user account flash-disconnected or came back online within a short time; for example, with an error-correction time of 2 minutes, no early warning is raised if the account recovers within 2 minutes. When a user whose status is offline comes back online within the predetermined time, the user is removed from the offline user data. This analysis finally yields the offline data of the user accounts that have actually failed.
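The filtering described above can be sketched as a rule-based pass over offline records. Note the substitution: the patent names a two-factor clustering algorithm but gives no clustering details, so this sketch only applies the two families of exclusion rules (non-fault data and false-alarm data) and is not a clustering implementation. The `OfflineRecord` fields and the 120-second window (taken from the 2-minute example) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

RECOVERY_WINDOW_S = 120  # example error-correction time of 2 minutes


@dataclass
class OfflineRecord:
    account: str
    offline_at: float                        # timestamp in seconds
    back_online_at: Optional[float] = None   # None if still offline
    has_power_loss_alarm: bool = False       # e.g. matched to an OMC DGI alarm
    voluntary_logoff: bool = False           # user went offline on purpose


def is_real_fault(record, window=RECOVERY_WINDOW_S):
    """Return True only if no exclusion rule explains the offline record."""
    if record.has_power_loss_alarm:
        return False  # false-alarm factor: user or area power loss
    if record.voluntary_logoff:
        return False  # non-fault factor: voluntary logoff
    if record.back_online_at is not None:
        if record.back_online_at - record.offline_at <= window:
            return False  # non-fault factor: flash disconnect that recovered
    return True


def filter_real_faults(records, window=RECOVERY_WINDOW_S):
    """Keep only the offline records that count toward fault detection."""
    return [r for r in records if is_real_fault(r, window)]
```

The records that survive this filter are the ones that would be counted against the first threshold and the alarm levels.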
Step 130: When a fault alarm has occurred on the network device, acquire the layer 2 path tree topology information of all users under the network device.
In this embodiment, as described above, the layer 2 path tree topology information of all users under the network device is obtained through the following steps:
acquire the information of all users under the network device;
acquire, according to the user information, the information of the network devices that each user passes through;
form the user's layer 2 path tree topology information according to the information of the network devices that the user passes through.
Fig. 2 is a schematic diagram of the PPPoE+ network device path. Specifically, deploying the PPPoE+ function on the devices at every layer of the network to obtain the information of the network devices that a user account passes through mainly includes the following steps:
Step 1301: Deploy the PPPoE+ function on network devices such as the BRAS (Broadband Remote Access Server) and the OLT (Optical Line Terminal), so that when a home broadband user account goes through 3A (AAA) authentication and accounting, it carries network device information such as the OLT, PON port, and VLAN.
Specifically, when a packet carrying PPPoE+ tag information passes through a port of a network device and the port mode is replace, the information in the tag is replaced with that device's local device information. The information of the network devices that the corresponding user account passes through across all network elements can therefore be obtained.
Step 1302: Read MAC addresses via SNMP OIDs to find the SW (switch) associated with the home broadband service VLAN, thereby building the bottom-up OLT-SW-BRAS user path tree topology.
SNMP (Simple Network Management Protocol) works as follows: a network device runs an SNMP agent as a daemon, and the daemon responds to requests from the network. The agent exposes a large number of OIDs (Object Identifiers). By using SNMP to query a given SW, the relevant device information can be read through the device's OIDs, including the MAC address, IP address, and port number of every PC connected under the SW; the details are not repeated here.
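As an illustration of reading MAC addresses through OIDs, the sketch below decodes rows of an SNMP walk into a MAC-to-port mapping. The patent does not name the OIDs it uses; this assumes the switch exposes the standard BRIDGE-MIB forwarding table, where `dot1dTpFdbPort` (1.3.6.1.2.1.17.4.3.1.2) is indexed by the six octets of the MAC address written as decimal sub-identifiers. The walk itself (performed by whatever SNMP client is available) is outside the sketch, and the function names are hypothetical.

```python
DOT1D_TP_FDB_PORT = "1.3.6.1.2.1.17.4.3.1.2"  # BRIDGE-MIB dot1dTpFdbPort


def fdb_entry(oid, value):
    """Decode one walk row (OID string, value) into (mac, bridge_port)."""
    prefix = DOT1D_TP_FDB_PORT + "."
    if not oid.startswith(prefix):
        raise ValueError(f"not a dot1dTpFdbPort instance: {oid}")
    # The OID suffix is the MAC address, one decimal number per octet.
    octets = [int(part) for part in oid[len(prefix):].split(".")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"malformed MAC index in OID: {oid}")
    mac = ":".join(f"{o:02x}" for o in octets)
    return mac, int(value)


def mac_to_port(walk_rows):
    """Build a {mac: bridge_port} table from an iterable of walk rows."""
    return dict(fdb_entry(oid, value) for oid, value in walk_rows)
```

The value returned is the bridge port number on which the MAC address was learned, which is what links a downstream device to a specific switch port when the path tree is assembled.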
Finally, steps 1301 and 1302 establish the bottom-up OLT-SW-BRAS user path tree topology.
Step 1303: Interface with the 3A system to pull data. By parsing the RADIUS packets obtained in real time, extract the user account and the PPPoE+ content from each packet, and from these the network device information such as the PON port, devices, and related configuration that the user's traffic passes through. A layer 2 path is then built per user account, in the form "user account - PON port - OLT - SW - BRAS". The paths of many user accounts together form a layer 2 path tree, giving the layer 2 path tree topology that associates network device information with user accounts, that is, a network path profile of each user account. This forms the monitoring unit and allows the number of online users under a given device or node to be monitored in real time.
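Assembling per-account paths into a tree and counting online users per node can be sketched as follows. This is a minimal illustration, assuming each path is already available as a bottom-up tuple of globally unique node identifiers (PON port, OLT, SW, BRAS); the function names and identifier format are hypothetical, and RADIUS parsing is not shown.

```python
from collections import defaultdict


def build_path_tree(user_paths):
    """Merge bottom-up per-account paths into a tree.

    user_paths maps account -> (pon_port, olt, sw, bras).
    Returns (parent, users_under): parent maps each node to the node above
    it, and users_under maps each node to the accounts routed through it.
    """
    parent = {}
    users_under = defaultdict(set)
    for account, path in user_paths.items():
        for lower, upper in zip(path, path[1:]):
            parent[lower] = upper
        for node in path:
            users_under[node].add(account)
    return parent, dict(users_under)


def online_counts(users_under, online_accounts):
    """Count, for every node, how many of its accounts are currently online."""
    return {
        node: len(accounts & online_accounts)
        for node, accounts in users_under.items()
    }
```

Recomputing `online_counts` whenever the set of online accounts changes gives the per-node fluctuation that the monitoring unit tracks.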
Thus, when a fault alarm occurs on a network device, the topology information of that device is obtained from the layer 2 path tree topology information, and the information of the faulty network device node is determined from the device's topology information.
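One possible way to pick the faulty node from the path tree is sketched below. The patent does not state the exact localisation rule, so the heuristic here is an assumption: take the lowest node that lies on the path of every offline account that survived filtering. Paths are assumed to be bottom-up tuples (PON port, OLT, SW, BRAS); all names are hypothetical.

```python
def locate_fault_node(user_paths, offline_accounts):
    """Return the lowest node shared by all offline accounts, or None.

    user_paths maps account -> bottom-up tuple of node identifiers.
    This is an illustrative heuristic, not the patent's stated method.
    """
    paths = [user_paths[account] for account in offline_accounts]
    if not paths:
        return None
    # Nodes that appear on every offline account's path.
    common = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Paths are bottom-up, so the first common node is the lowest one.
    for node in paths[0]:
        if node in common:
            return node
    return None
```

For example, if every offline account sits under the same OLT but on different PON ports, the function returns that OLT; if the offline accounts span several OLTs behind one switch, it returns the switch.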
Step 140: determine, from the Layer 2 path tree topology information, the information of the faulty network device node in the network.
In this embodiment, the alarm level is determined from the number of users whose status is offline and the preset alarm-level thresholds, and network fault maintenance is carried out according to the alarm level. Specifically, three fault warning levels are defined: blue, yellow and red. Each level corresponds to a different tier of offline user account counts, and the tiers can be configured to suit actual requirements. For example, blue may correspond to 200 faulty offline user accounts, yellow to 500 and red to 1000. When the number of faulty offline users reaches a warning level, fault warning information is generated immediately. It includes the number of faulty offline users, the faulty network device node and a brief alarm summary derived from analysing the faulty offline user account data, so that the fault scope and fault cause can be judged accurately. At the same time, front-line maintenance personnel are notified by SMS (the first channel).
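As a small illustration of the level mapping, the sketch below assumes the three tiers are ordered blue, yellow, red and uses the example values 200, 500 and 1000. The function name and the "reaches" (greater than or equal) comparison are assumptions.

```python
# Example tiers, highest first; the values are configurable in practice.
LEVEL_THRESHOLDS = [("red", 1000), ("yellow", 500), ("blue", 200)]


def alarm_level(offline_count, thresholds=LEVEL_THRESHOLDS):
    """Return the highest level whose threshold is reached, or None."""
    for level, minimum in thresholds:
        if offline_count >= minimum:
            return level
    return None  # below the lowest tier: no warning
```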
When a fault alarm occurs on a network device, this embodiment of the present invention obtains the Layer 2 path tree topology information, which enables rapid location of the network fault and greatly reduces manual effort. Monitoring is performed at the service level and is genuinely user-oriented, so it truly reflects impairment of user services and supports control of the full fault life cycle, from rapid detection to confirmation of service recovery.
FIG. 4 is a flowchart of another embodiment of the network fault locating method of the present invention. The method is applied in a network fault locating device, which analyses and locates network faults in home broadband services. As shown in FIG. 4, the method includes the following steps:
Step 210: obtain the network device information along the path of each user account across all network elements, form a Layer 2 path tree topology that associates the network device information of all network elements with user accounts, monitor the status of the network devices in the network per user account, and obtain user status information.
This solution deploys the PPPoE+ function on devices at every layer of the network in order to obtain the network device information along the path of each user account. PPPoE+ (also known as PPPoE Intermediate Agent) extends the PPPoE protocol messages: the access device intercepts the protocol messages of the PPPoE discovery stage, inserts the physical information of the network devices traversed by the user account in the upstream direction, strips the user's physical information in the downstream direction, and then forwards the messages. Deploying the PPPoE+ function on devices at every layer of the network therefore yields the network device information along the path of each user account.
Obtaining this network device information by deploying the PPPoE+ function on devices at every layer of the network mainly includes the following steps:
Step 2101: deploy the PPPoE+ function on network devices such as the BRAS (Broadband Remote Access Server) and the OLT (Optical Line Terminal), so that a home broadband user account carries network device information such as the OLT, PON port and VLAN when it reaches 3A authentication and accounting.
Specifically, when a message carrying PPPoE+ tag information passes through a port of a network device and the port mode is replace, the information in the tag is replaced with that device's own local device information, so the corresponding network device information can be obtained.
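The replace behaviour can be pictured with a toy Python model. It does not parse real PPPoE frames; the dictionary representation of a tag and the function name are assumptions, and only the replace mode described in the text is modelled.

```python
def apply_port_policy(tag, local_info, mode):
    """Return the tag content carried onward after a packet crosses a port.

    tag:        dict with the tag content already present in the packet
    local_info: dict describing this device (for example device name and port)
    mode:       port mode; only "replace" is described in the text
    """
    if mode == "replace":
        # Overwrite whatever arrived with this device's own information.
        return dict(local_info)
    # Other modes are outside the scope of the text; this sketch simply
    # leaves the tag unchanged for them.
    return tag
```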
Step 2102: read MAC addresses through SNMP OIDs to find the SW (switch) associated with the home broadband service VLAN, thereby establishing the bottom-up OLT-SW-BRAS user path tree topology.
SNMP is the Simple Network Management Protocol. A network device runs an SNMP agent as a daemon, and the daemon responds to requests arriving from the network. The SNMP agent exposes a large number of OIDs (Object Identifiers). Information about a given SW can therefore be retrieved over SNMP through the device's OIDs, including the MAC address, IP address and port number of every PC connected under that SW; the details are not repeated here.
Finally, steps 2101 and 2102 establish the bottom-up OLT-SW-BRAS user path tree topology, that is, the network path profile of each user account.
Step 2103: interface with the 3A (AAA) system to fetch data. The RADIUS messages obtained in real time are parsed to extract the user's broadband account and the PPPoE+ content, from which the network device information along the user's access path (ports, devices and related configuration information) is taken. A Layer 2 path is then established per user account, forming a "user account-PON port-OLT-SW-BRAS" Layer 2 path. The paths of many user accounts form a Layer 2 path tree, and the result is a Layer 2 path tree topology that associates network device information with user accounts. This topology serves as the monitoring unit and enables real-time monitoring of the number of online users on a given device or node.
Step 220: check the user status information under the network device, obtain the offline user account data, and determine whether the offline user account data reaches the warning threshold of the network device. If so, go to step 230; if not, no fault warning is generated.
Checking the user account status under the network device, obtaining the offline user account data and determining whether that data reaches the warning threshold are implemented through the following steps:
Step 2201: based on the Layer 2 path tree topology that associates the network device information of all network elements with user accounts, set up real-time monitoring tasks for all network elements, monitor abnormal fluctuations in user numbers around the clock, check the user status information under each network device, and obtain the offline user account data.
Step 2202: determine whether the offline user data under the network device reaches the preset warning threshold. If not, no warning is raised; if so, go to step 230.
Step 230: when a fault alarm occurs on the network device, obtain the Layer 2 path tree topology information of all users under the network device, analyse the offline user account data, identify the user accounts that are offline because of power loss, and filter them out to obtain the faulty offline user account data.
This is implemented through the following steps:
Step 2301: correlate the alarm information and determine whether all the offline users are offline because of power loss on the user side. If so, no fault warning information is generated; otherwise, go to step 2302. The TCA (two-factor clustering algorithm) is used here: alarm information is obtained by correlating OMC network management alarms with the centralized fault platform, the offline user account data in the alarm information is analysed, and the causes of the offline records are classified. Using the OID information in the alarm, the position in the Layer 2 path tree topology of the network device associated with the alarm is matched and the user accounts associated with the alarm are identified, from which the reason each user account went offline is determined. For example, offline user account data that correlates with a DGI alarm from the OMC indicates that the user has lost power and is judged to be false-alarm data; a user who goes offline voluntarily sends a presence message, whereas a disconnection caused by a network fault sends no notification.
Step 2302: filter out the offline user account data caused by power loss.
Step 2303: determine whether the filtered offline user account data represents a flash disconnection that has already recovered. If so, no fault warning is generated; otherwise, fault warning information is generated. Specifically, determine whether each offline user account in the filtered data has experienced a flash disconnection or come back online within a short time. For example, with an error-correction window of 2 minutes, an account that recovers within 2 minutes is treated as a flash disconnection or short-term recovery, and no fault warning is raised for it.
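Steps 2301 to 2303 can be read as a three-stage filter. The sketch below is one possible rendering in Python; the record type, the cause labels and the function names are assumptions, and the cause is taken as already determined by the alarm correlation of step 2301.

```python
from dataclasses import dataclass
from typing import Optional

FLASH_WINDOW_S = 120  # the 2-minute error-correction window from the text


@dataclass
class OfflineRecord:
    account: str
    offline_at: float                       # seconds, any common time base
    cause: str                              # "power_off" or "unknown" (hypothetical labels)
    back_online_at: Optional[float] = None  # None while still offline


def is_flash(rec, window=FLASH_WINDOW_S):
    """True if the account came back online within the window."""
    return rec.back_online_at is not None and rec.back_online_at - rec.offline_at <= window


def filter_faults(records):
    """Return the records that should count toward a fault warning."""
    # Step 2301: if every record is a power loss, there is nothing to warn about.
    if all(r.cause == "power_off" for r in records):
        return []
    # Step 2302: drop records caused by power loss.
    kept = [r for r in records if r.cause != "power_off"]
    # Step 2303: drop flash disconnections and short-term recoveries.
    return [r for r in kept if not is_flash(r)]
```

An empty result means no fault warning is generated; otherwise the length of the result is the offline count compared against the warning tiers.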
Step 240: determine, from the Layer 2 path tree topology information, the information of the faulty network device node in the network.
Fault warning information is generated according to the number of offline users remaining after the final filtering. In this embodiment, three fault warning levels are defined: blue, yellow and red. Each level corresponds to a different tier of faulty offline user account counts, and the tiers can be configured to suit actual requirements. In this embodiment, blue may correspond to 200 faulty offline user accounts, yellow to 500 and red to 1000. When the number of faulty offline users reaches a warning level, fault warning information is generated immediately. It includes the number of faulty offline users, the faulty network device node and a brief alarm summary derived from analysing the faulty offline user account data, so that the fault scope and fault cause can be judged accurately. At the same time, front-line maintenance personnel are notified by SMS (the first channel).
In this embodiment, real-time GIS map monitoring is also provided. Network devices currently in fault are displayed on a GIS map according to their fault warning level, showing in real time the site location, device information, fault warning level and other information of the devices currently under warning, so that the fault situation can be grasped more intuitively, in real time and comprehensively.
Step 250: send an SMS containing the fault warning information to the terminals of front-line maintenance personnel, so that they can start emergency repair quickly on the basis of the fault warning information. The fault warning information also includes fault scope information and cause information.
Step 260: update the offline user account data and determine whether it has fallen below the warning threshold; if so, the fault warning is cleared. Specifically, after fault warning information is generated, the back end creates a warning monitoring process that tracks how the offline users under the warning come back online and how complaints develop. Complaint data is fetched directly from the complaint platform. Once the repair takes effect, users gradually come back online and complaints level off. A real-time recovery curve of the offline user account data and a complaint curve are plotted. From these curves it can be judged intuitively whether the fault has really been resolved and whether it can be closed out, achieving control of the full life cycle from fault discovery to handling.
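The recovery tracking in step 260 can be sketched as follows. The function name, the batch representation of reconnection events and the strict "below threshold" comparison are assumptions; the complaint curve is omitted.

```python
def recovery_curve(offline_accounts, reconnect_batches, threshold):
    """Return a list of (remaining_offline, cleared) points.

    offline_accounts:  accounts that were offline when the warning was raised
    reconnect_batches: one iterable of accounts per polling interval, listing
                       those seen back online during that interval
    threshold:         warning threshold; the warning clears once the
                       remaining offline count falls below it
    """
    remaining = set(offline_accounts)
    curve = []
    for batch in reconnect_batches:
        remaining -= set(batch)
        curve.append((len(remaining), len(remaining) < threshold))
    return curve
```

Plotting the first element of each point over time gives the recovery curve described above.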
When a fault alarm occurs on a network device, this embodiment of the present invention obtains the Layer 2 path tree topology information, which enables rapid location of the network fault and greatly reduces manual effort. Monitoring is performed at the service level and is genuinely user-oriented, so it truly reflects impairment of user services and supports control of the full fault life cycle, from rapid detection to confirmation of service recovery. Real-time GIS monitoring shows the location and device information of the sites currently under warning, so the fault situation can be grasped intuitively, in real time and comprehensively. Furthermore, pushing an SMS containing the fault warning information to front-line maintenance personnel means they are notified to take action as soon as a batch fault is detected, which improves the efficiency of network maintenance.
FIG. 5 is a schematic structural diagram of an embodiment of the network fault locating apparatus of the present invention. The network fault locating apparatus 300 includes a user status information acquisition module 310, a judgment module 320, a topology analysis module 330 and a locating module 340.
The user status information acquisition module 310 is configured to acquire the user status information in the network.
The judgment module 320 is configured to determine, from the user status information, whether a fault alarm has occurred on a network device in the network.
The topology analysis module 330 is configured to obtain, when a fault alarm occurs on the network device, the Layer 2 path tree topology information of all users under the network device.
The locating module 340 is configured to determine, from the Layer 2 path tree topology information, the information of the faulty network device node in the network.
In an optional implementation, the user status information acquisition module 310 acquires the user status information in the network by means of the PPPoE+ function deployed on devices at every layer of the network.
Specifically, PPPoE+ (also known as PPPoE Intermediate Agent) extends the PPPoE protocol messages: the access device intercepts the protocol messages of the PPPoE discovery stage, inserts the physical information of the network devices traversed by the user account in the upstream direction, strips the user's physical information in the downstream direction, and then forwards the messages. Deploying the PPPoE+ function on devices at every layer of the network therefore yields the network device information along the path of each user account, together with the user status information associated with each network device node. From the association between each network device and its user accounts, the user status information of the user accounts under every network device in the network can be obtained. The user status information indicates whether a user account is online, offline or powered off. Because the PPPoE+ function is deployed, the network device information along the path of each user account can be obtained, and the device path tree of each user account is then used as the monitoring unit to monitor real-time fluctuations in user status. Specifically, the alarm information of a monitoring unit can be obtained by correlating OMC network management alarms with the centralized fault platform, from which the offline status of the corresponding user accounts is determined.
In an optional implementation, the judgment module 320 determines from the user status information whether a fault alarm has occurred on a network device in the network. Because the user status information acquisition module 310 has already acquired the user status information, the judgment module 320 can analyse it to determine the number of users whose status is offline under each network device, and decide from that number whether a fault alarm has occurred. Specifically, this embodiment sets a first threshold on the number of offline users; when the number of offline users exceeds the preset first threshold, a fault alarm occurs on that network device. The first threshold can be configured to suit the actual requirements of the broadband network. In one implementation of the present invention, to prevent an accidentally low threshold from generating a flood of warnings, conditions are enforced in the program logic. The first-threshold rules are as follows: a PON port triggers a warning only with at least 20 offline users and an offline rate of 90% or more; an OLT triggers a warning with at least 200 offline users or an offline rate of 90% or more; a SW or BRAS triggers a warning when its offline users exceed 1000. In this embodiment, users who are offline because of power loss are not counted as faults, so they must be removed when the number of offline users is analysed. The judgment module 320 analyses users offline because of power loss as follows:
Obtain the OMC alarms, centralized fault alarms and database logic tables of the network device.
Analyse the OMC alarms, centralized fault alarms and database logic tables with the two-factor clustering algorithm to obtain the reason each user is offline.
When the reason a user is offline is power loss, remove that user.
Specifically, the TCA (two-factor clustering algorithm) is used: alarm information is obtained by correlating OMC network management alarms with the centralized fault platform, the offline user account data in the alarm information is analysed, and the causes of the offline records are classified. Using the OID information in the alarm, the position in the Layer 2 path tree topology of the network device associated with the alarm is matched and the user accounts associated with the alarm are identified, from which the reason each user account went offline is determined. For example, offline user account data that correlates with a DGI alarm from the OMC indicates that the user has lost power and is judged to be false-alarm data; a user who goes offline voluntarily sends a presence message, whereas a disconnection caused by a network fault sends no notification. The offline records produced by the two classes of factors, non-fault data (voluntary logout, flash disconnection and the like) and false-alarm data (user power loss, residential-area power loss and the like), are then filtered out, and only the offline user account data of genuine faults is kept. It is also determined whether a user has recovered from a flash disconnection; if so, the record is treated in the same way as one caused by power loss. Specifically, it is determined whether the offline user account experienced a flash disconnection or came back online within a short time; for example, with an error-correction window of 2 minutes, no warning is raised for an account that recovers within 2 minutes. When a user whose status is offline comes back online within the predetermined time, that user is removed from the offline user data. This analysis finally yields the offline user account data of genuine faults.
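The example first-threshold rules stated for the judgment module 320 (PON port, OLT, SW and BRAS) can be written as a small predicate, applied to the offline count that remains after the filtering described above. The function name and device-type labels are assumptions, and "90% or more" is read as inclusive.

```python
def triggers_warning(device_type, offline, total):
    """Apply the example first-threshold rules to one device.

    offline: offline users under the device (after filtering)
    total:   all users associated with the device
    """
    rate = offline / total if total else 0.0
    if device_type == "PON":
        # Both conditions are required, which guards against tiny PON ports.
        return offline >= 20 and rate >= 0.9
    if device_type == "OLT":
        return offline >= 200 or rate >= 0.9
    if device_type in ("SW", "BRAS"):
        return offline > 1000
    raise ValueError(f"unknown device type: {device_type}")
```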
In an optional implementation, the topology analysis module 330 obtains the Layer 2 path tree topology information of all users under the network device through the following process:
obtain the information of all users under the network device;
obtain, from the user information, the information of the network devices each user passes through;
form the user's Layer 2 path tree topology information from the information of the network devices the user passes through.
Specifically, obtaining the network device information along the path of each user account by deploying the PPPoE+ function on devices at every layer of the network mainly includes the following process:
Deploy the PPPoE+ function on network devices such as the BRAS (Broadband Remote Access Server) and the OLT (Optical Line Terminal), and obtain the network device information, such as the OLT, PON port and VLAN, that a home broadband user account carries when it reaches 3A authentication and accounting.
Specifically, when a message carrying PPPoE+ tag information passes through a port of a network device and the port mode is replace, the information in the tag is replaced with that device's own local device information, so the network device information along the path of the corresponding user account across all network elements can be obtained.
Read MAC addresses through SNMP OIDs to find the SW (switch) associated with the home broadband service VLAN, thereby establishing the bottom-up OLT-SW-BRAS user path tree topology.
SNMP is the Simple Network Management Protocol. A network device runs an SNMP agent as a daemon, and the daemon responds to requests arriving from the network. The SNMP agent exposes a large number of OIDs (Object Identifiers). Information about a given SW can therefore be retrieved over SNMP through the device's OIDs, including the MAC address, IP address and port number of every PC connected under that SW; the details are not repeated here. In this way the bottom-up OLT-SW-BRAS user path tree topology is established.
Data is fetched by interfacing with the 3A (AAA) system. The RADIUS messages obtained in real time are parsed to extract the user account and the PPPoE+ content, from which the network device information along the user's access path (PON port, devices and related configuration information) is taken. A Layer 2 path is then established per user account, forming a "user account-PON port-OLT-SW-BRAS" Layer 2 path. The paths of many user accounts form a Layer 2 path tree, and the result is a Layer 2 path tree topology that associates network device information with user accounts, that is, a network path profile of each user account. This topology serves as the monitoring unit and enables real-time monitoring of the number of online users on a given device or node.
Thus, when a fault alarm occurs on a network device, the topology analysis module 330 obtains the topology information of that network device from the Layer 2 path tree topology information, and the information of the faulty network device node is determined from the topology information of the network device.
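The text does not spell out how the faulty node is chosen from the topology. One plausible heuristic, shown here purely as an assumption, is to pick the node closest to the root whose users are almost all offline. The function name, the 0.9 ratio and the data layout (node path mapped to a set of accounts) are all hypothetical.

```python
def locate_fault_nodes(members, offline, min_ratio=0.9):
    """Return the topmost nodes whose offline ratio reaches min_ratio.

    members: {root-first path tuple: set of accounts passing through the node}
    offline: set of accounts currently counted as faulty offline
    """
    suspects = {
        node
        for node, accounts in members.items()
        if accounts and len(accounts & offline) / len(accounts) >= min_ratio
    }
    # Keep a suspect only if none of its ancestors is also a suspect.
    return sorted(
        node
        for node in suspects
        if not any(node[:k] in suspects for k in range(1, len(node)))
    )


# Illustrative topology: one BRAS, one SW, two OLTs with two users each.
example_members = {
    ("BRAS-1",): {"u1", "u2", "u3", "u4"},
    ("BRAS-1", "SW-1"): {"u1", "u2", "u3", "u4"},
    ("BRAS-1", "SW-1", "OLT-A"): {"u1", "u2"},
    ("BRAS-1", "SW-1", "OLT-B"): {"u3", "u4"},
}
```

With only the two users under OLT-A offline, the heuristic points at OLT-A; with all four offline it points at the topmost shared node instead.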
In an optional implementation, the locating module 340 generates fault warning information according to the number of offline users remaining after the final filtering. In this embodiment, three fault warning levels are defined: blue, yellow and red. Each level corresponds to a different tier of faulty offline user account counts, and the tiers can be configured to suit actual requirements. In this embodiment, blue may correspond to 200 faulty offline user accounts, yellow to 500 and red to 1000. When the number of faulty offline users reaches a warning level, fault warning information is generated immediately. It includes the number of faulty offline users, the faulty network device node and a brief alarm summary derived from analysing the faulty offline user account data, so that the fault scope and fault cause can be judged accurately. At the same time, front-line maintenance personnel are notified by SMS (the first channel). Using a two-factor clustering algorithm based on PPPoE+, the present invention establishes an intelligent network fault locating system for home broadband services: it builds the network path profile of each user account through PPPoE+, monitors fluctuations in the user account data of network devices in real time, and uses the TCA two-factor clustering algorithm to filter out interfering factors and keep genuine faults, achieving the beneficial effect of immediate warning.
FIG. 6 is a schematic structural diagram of an embodiment of the network fault locating device 400 of the present invention. As shown in FIG. 6, the network fault locating device 400 includes a centralized fault platform 401, a GIS real-time monitoring apparatus 402, an SMS push module 403, a resource management platform 404, an OMC network management platform 405, a billing platform 406, and the home broadband network fault early-warning apparatus 300 described above.
The centralized fault platform 401, the GIS real-time monitoring device 402, the short message push module 403, the resource management platform 404, the OMC network management platform 405, and the billing platform 406 are each connected to the home broadband network fault early warning device 300. By deploying PPPoE+, the network fault location apparatus 300 obtains network device information from the centralized fault platform 401, the resource management platform 404, the OMC network management platform 405, and the billing platform 406 in real time. From this information it generates a Layer 2 path tree topology that associates network device information with user accounts, that is, the network path portrait of the user accounts, and generates fault warning information.
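As a rough illustration of such a path tree, the sketch below indexes user accounts by every device on their Layer 2 path, using the "user account-PON port-OLT-SW-BRAS" path form given in claim 1. The record layout, the field names in `PATH_FIELDS`, and the function `build_path_tree` are assumptions made for the example, not the patent's data model.

```python
from collections import defaultdict

# Hypothetical record layout: one Layer 2 path per user account,
# listed from the access side (PON port) up to the BRAS.
PATH_FIELDS = ("pon_port", "olt", "sw", "bras")


def build_path_tree(user_paths):
    """Map each (device type, device id) to the set of accounts routed through it."""
    tree = defaultdict(set)
    for account, path in user_paths.items():
        for field in PATH_FIELDS:
            tree[(field, path[field])].add(account)
    return tree


# Illustrative data only; the device names are made up.
example_paths = {
    "user-a": {"pon_port": "olt1/pon1", "olt": "olt1", "sw": "sw1", "bras": "bras1"},
    "user-b": {"pon_port": "olt1/pon2", "olt": "olt1", "sw": "sw1", "bras": "bras1"},
    "user-c": {"pon_port": "olt2/pon1", "olt": "olt2", "sw": "sw1", "bras": "bras1"},
}
example_tree = build_path_tree(example_paths)
```

With this index, looking up a device returns every account whose path crosses it, which is the association between network device information and user accounts that the paragraph describes.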
According to the fault warning information from the network fault location apparatus 300, the GIS real-time monitoring device 402 displays the currently faulty network devices on a GIS map by fault warning level. It presents the site location, device information, fault warning level, and other information of the network devices currently under warning, so that the fault situation can be grasped intuitively, in real time, and comprehensively.
The short message push module 403 sends a short message containing the fault warning information to the terminals of front-line maintenance personnel, so that they can quickly begin fault repair based on that information. The fault warning information further includes fault range information and cause information.
Using a two-factor clustering algorithm based on PPPoE+, the present invention establishes an intelligent monitoring and rapid fault detection system for home broadband services: it builds a network path portrait of each user account through PPPoE+, monitors fluctuations in the user account data of network devices in real time, and uses the TCA two-factor clustering algorithm to filter out interference factors and retain real faults, achieving the beneficial effect of immediate warning. Further, GIS real-time monitoring presents the location and device information of the sites currently under warning, so that the fault situation can be grasped intuitively, in real time, and comprehensively. By pushing a short message containing the fault warning information to front-line maintenance personnel, the relevant personnel can be notified to take action as soon as batch faults are detected, which improves the efficiency of network maintenance.
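The clustering algorithm itself is not reproduced in this section. The sketch below only illustrates the two filtering rules stated in claims 3 and 4: users whose offline cause is power loss are dropped, and users who come back online within a predetermined time are dropped. The `OfflineRecord` type, the `"power_off"` cause label, and the 300-second default are hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OfflineRecord:
    account: str
    offline_at: float                       # seconds since some reference time
    cause: str                              # hypothetical label, e.g. "power_off"
    back_online_at: Optional[float] = None  # None if the user is still offline


def filter_real_faults(records: List[OfflineRecord],
                       grace_seconds: float = 300.0) -> List[OfflineRecord]:
    """Keep only offline records that still look like genuine network faults."""
    kept = []
    for record in records:
        if record.cause == "power_off":
            continue  # claim 3: power-loss offline users are removed
        if (record.back_online_at is not None
                and record.back_online_at - record.offline_at <= grace_seconds):
            continue  # claim 4: users back online within the window are removed
        kept.append(record)
    return kept
```

The number of records that survive this filter is the count that would then be compared against the warning thresholds.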
An embodiment of the present invention provides a non-volatile computer storage medium for executing the network fault location method of the foregoing embodiments. The computer storage medium stores at least one executable instruction, and the computer-executable instruction can execute the network fault location and early warning method in any of the foregoing method embodiments.
Specifically, the executable instructions can be used to cause the processor to perform the following operations:
obtaining user status information in the network;
determining, according to the user status information, whether a fault alarm occurs on a network device in the network;
when a fault alarm occurs on the network device, obtaining the Layer 2 path tree topology information of all users under the network device;
determining, according to the Layer 2 path tree topology information, information on the failed network device node in the network.
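The operations above can be sketched as a single function. The source does not state the exact rule for picking the failed node, so this sketch assumes one plausible reading: once the offline count exceeds the first threshold (claim 2), the suspect is the device closest to the access side that all offline users share. `PATH_ORDER`, `locate_fault`, and the record layout are hypothetical.

```python
# Device levels on a user's Layer 2 path, access side first (per claim 1).
PATH_ORDER = ("pon_port", "olt", "sw", "bras")


def locate_fault(user_paths, offline_accounts, first_threshold):
    """Return (level, device) shared by all offline users, or None if no alarm.

    user_paths maps account -> {"pon_port": ..., "olt": ..., "sw": ..., "bras": ...}.
    """
    # Claim 2: a fault alarm occurs when the offline count exceeds the threshold.
    if len(offline_accounts) <= first_threshold:
        return None
    # Walk from the access side upward and stop at the first level where
    # every offline user passes through the same device (an assumption).
    for level in PATH_ORDER:
        devices = {user_paths[account][level] for account in offline_accounts}
        if len(devices) == 1:
            return (level, devices.pop())
    return None
```

For example, if two users on different PON ports of the same OLT go offline together, the walk skips the PON level (two distinct ports) and reports the OLT.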
When a fault alarm occurs on a network device, the embodiment of the present invention obtains the Layer 2 path tree topology information, which enables rapid location of network faults and greatly reduces manual effort. Monitoring is performed at the service level, giving genuinely user-level monitoring that reflects actual damage to user services, and the full fault life cycle is managed, from rapid detection to confirmation of service recovery.
FIG. 7 shows a schematic structural diagram of an embodiment of a computer device of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computer device. As shown in FIG. 7, the computer device may include a processor 502, a communications interface 504, a memory 506, and a communication bus 508.
The processor 502, the communication interface 504, and the memory 506 communicate with each other through the communication bus 508. The communication interface 504 is used to communicate with network elements of other devices, such as clients or other servers. The processor 502 is configured to execute the program 510, and specifically may execute the relevant steps of the above network fault location method embodiments for the computer device.
Specifically, the program 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is used to store the program 510. The memory 506 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to execute the network fault location method in the foregoing method embodiments.
When a fault alarm occurs on a network device, the embodiment of the present invention obtains the Layer 2 path tree topology information, which enables rapid location of network faults and greatly reduces manual effort. Monitoring is performed at the service level, giving genuinely user-level monitoring that reflects actual damage to user services, and the full fault life cycle is managed, from rapid detection to confirmation of service recovery.
The algorithms or displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used with the teachings herein. The structure required to construct such a system is apparent from the above description. Furthermore, embodiments of the present invention are not directed to any particular programming language. It should be understood that various programming languages may be used to implement the invention described herein, and that the above descriptions of specific languages are intended to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure an understanding of this description.
Similarly, it should be understood that, in the above description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together into a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the inventive aspects. This method of disclosure, however, should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-described embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not denote any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be understood as limiting the order of execution.
Claims (9)
1. A network fault location method, characterized in that the method comprises:
obtaining user status information in the network;
determining, according to the user status information, whether a fault alarm occurs on a network device in the network;
when a fault alarm occurs on the network device, obtaining the Layer 2 path tree topology information of all users under the network device, including: obtaining information on all users under the network device, obtaining, according to the user information, information on the network devices each user passes through, and forming the user's Layer 2 path tree topology information from that network device information; wherein one Layer 2 path of the form "user account-PON port-OLT-SW-BRAS" is established per user account, and the paths of multiple user accounts form a Layer 2 path tree, yielding a Layer 2 path tree topology in which network device information is associated with user accounts;
determining, according to the Layer 2 path tree topology information, information on the failed network device node in the network.

2. The network fault location method according to claim 1, characterized in that determining, according to the user status information, whether a fault alarm occurs on a network device in the network comprises:
obtaining, for each network device in the network, the number of users under it whose user status is offline;
when the number of users is greater than a preset first threshold, a fault alarm occurs on the network device.

3. The network fault location method according to claim 2, characterized in that, after obtaining the Layer 2 path tree topology information of all users under the network device, the method further comprises:
obtaining the OMC alarms, centralized fault alarms, and database logic table of the network device;
analyzing the OMC alarms, centralized fault alarms, and database logic table with a two-factor clustering algorithm to obtain the cause of each user going offline;
when the cause of a user going offline is power loss, deleting the user.

4. The network fault location method according to claim 2, characterized in that, after obtaining the Layer 2 path tree topology information of all users under the network device, the method further comprises:
when it is determined that a user whose user status is offline comes back online within a predetermined time, deleting the user.

5. The network fault location method according to claim 3 or 4, characterized in that determining, according to the Layer 2 path tree topology information, information on the failed network device node in the network comprises:
obtaining, according to the Layer 2 path tree topology information, the topology information of the network device on which the fault alarm occurs;
determining, according to the topology information of the network device, the information on the failed network device node.

6. The network fault location method according to claim 5, characterized in that, after determining the information on the failed network device node in the network, the method further comprises:
determining a corresponding alarm level according to the number of users whose user status is offline and preset alarm level thresholds;
performing network fault maintenance according to the alarm level.

7. A network fault location apparatus, characterized in that the location apparatus comprises:
a user status information acquisition module, configured to obtain user status information in the network;
a judgment module, configured to determine, according to the user status information, whether a fault alarm occurs on a network device in the network;
a topology analysis module, configured to obtain, when a fault alarm occurs on the network device, the Layer 2 path tree topology information of all users under the network device, including: obtaining information on all users under the network device, obtaining, according to the user information, information on the network devices each user passes through, and forming the user's Layer 2 path tree topology information from that network device information; wherein one Layer 2 path of the form "user account-PON port-OLT-SW-BRAS" is established per user account, and the paths of multiple user accounts form a Layer 2 path tree, yielding a Layer 2 path tree topology in which network device information is associated with user accounts;
a location module, configured to determine, according to the Layer 2 path tree topology information, information on the failed network device node in the network.

8. A computer device, characterized by comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to execute the network fault location method according to any one of claims 1-6.

9. A computer storage medium, characterized in that the storage medium stores at least one executable instruction, and the executable instruction causes a processor to execute the network fault location method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910668662.9A CN112291075B (en) | 2019-07-23 | 2019-07-23 | Network fault positioning method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910668662.9A CN112291075B (en) | 2019-07-23 | 2019-07-23 | Network fault positioning method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112291075A (en) | 2021-01-29 |
CN112291075B (en) | 2022-08-30 |
Family
ID=74418597
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910668662.9A Active CN112291075B (en) | 2019-07-23 | 2019-07-23 | Network fault positioning method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112291075B (en) |
Families Citing this family (6)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115086154B (en) * | 2021-03-11 | 2024-08-06 | 中国电信股份有限公司 | Fault delimiting method and device, storage medium and electronic equipment |
CN115314358B (en) * | 2021-05-08 | 2024-04-09 | 中国移动通信集团福建有限公司 | Method and device for monitoring faults of dummy network elements of home wide network |
CN115484141B (en) * | 2021-06-15 | 2023-08-15 | 中国移动通信集团河南有限公司 | User determination method, device, equipment and storage medium |
CN114245242B (en) * | 2021-12-23 | 2023-10-27 | 海南神州泰岳软件有限公司 | User offline detection method and device and electronic equipment |
CN115426244B (en) * | 2022-08-09 | 2024-03-15 | 武汉虹信技术服务有限责任公司 | Network equipment fault detection method based on big data |
CN117544828A (en) * | 2023-12-28 | 2024-02-09 | 成都网丁科技有限公司 | IPTV network fault positioning method |
Citations (5)
* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101047618A (en) * | 2006-03-29 | 2007-10-03 | 华为技术有限公司 | Method and system for acquiring network route information |
CN101640612A (en) * | 2009-09-07 | 2010-02-03 | 杭州华三通信技术有限公司 | Method and device for flow path discovery and fault fast positioning |
CN103986604A (en) * | 2014-05-23 | 2014-08-13 | 华为技术有限公司 | Network fault location method and device |
EP2993824A2 (en) * | 2014-09-08 | 2016-03-09 | Alcatel Lucent | Fault monitoring in multi-domain networks |
CN107659423A (en) * | 2016-07-25 | 2018-02-02 | 南京中兴新软件有限责任公司 | Method for processing business and device |
2019-07-23: CN application CN201910668662.9A, patent CN112291075B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN112291075A (en) | 2021-01-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112291075B (en) | 2022-08-30 | Network fault positioning method and device, computer equipment and storage medium |
WO2018126645A1 (en) | 2018-07-12 | Communication network management method and apparatus therefor |
CN103178991B (en) | 2016-06-22 | A kind of method and system of Multi net voting association analysis |
CN106130761B (en) | 2019-06-18 | The recognition methods of the failed network device of data center and device |
CN106656588A (en) | 2017-05-10 | Fault locating method and device for intelligent substation |
CN102611568B (en) | 2016-03-30 | A kind of failure service path diagnostic method and device |
CN102158360A (en) | 2011-08-17 | Network fault self-diagnosis method based on causal relationship positioning of time factors |
CN105450472A (en) | 2016-03-30 | Method and device for automatically acquiring states of physical components of servers |
CN111030873A (en) | 2020-04-17 | Fault diagnosis method and device |
CN102546274A (en) | 2012-07-04 | Alarm monitoring method and alarm monitoring equipment in communication service |
Stiawan et al. | 2016 | Anomaly detection and monitoring in Internet of Things communication |
CN104113448A (en) | 2014-10-22 | Method for automatically finding and monitoring devices in local area network |
CN106452915B (en) | 2020-03-13 | Method and device for discovering MPLS VPN network topology |
CN102594613B (en) | 2015-04-08 | Method and device for failure diagnosis of multi-protocol label switching virtual private network (MPLS VPN) |
CN110752959A (en) | 2020-02-04 | An intelligent substation process layer physical link fault location system |
CN103634166B (en) | 2017-05-03 | Equipment survival detection method and equipment survival detection device |
CN116248479A (en) | 2023-06-09 | Network path detection method, device, equipment and storage medium |
CN110620693A (en) | 2019-12-27 | Railway station route remote restart control system and method based on Internet of things |
WO2019079961A1 (en) | 2019-05-02 | Method and device for determining shared risk link group |
CN101431435B (en) | 2012-01-04 | Connection-oriented service configuration and management method |
CN117650964A (en) | 2024-03-05 | Intelligent network operation and maintenance management system |
CN113766363B (en) | 2023-04-07 | Fault monitoring method and device and computing equipment |
CN115834365A (en) | 2023-03-21 | Method, device and equipment for home wide service diagnosis based on novel network |
JP2019145893A (en) | 2019-08-29 | Topology determination device, topology determination method, topology determination program, and communication system |
CN107864057B (en) | 2020-12-25 | Online automatic checking and alarming method based on networking state |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2021-01-29 | PB01 | Publication | |
2021-02-23 | SE01 | Entry into force of request for substantive examination | |
2022-08-30 | GR01 | Patent grant | |