Incident and Alert Event Categories

Let's look at the incident and alert event categories in Prisma SD-WAN.
Different types of events trigger alerts and incidents generated in the system. These events are categorized broadly as hardware and software issues, device related issues, peering issues, site level issues, tunnel issues, application performance issues, and secure fabric link issues. Based on the type of event, these issues may originate from the ION device or the controller.
Hardware and Software Issues
Hardware Issues—Hardware issues are raised by the device and deal with device hardware problems. For example:
  • Power supply unit failure (Incident—warning).
  • Memory DIMM failure (Incident—critical).
  • Disk SMART threshold exceeding (Alert).
  • Disk failure (IO error) (Incident—critical).
Software Issues—Software issues are raised by the device and deal with device software problems. For example:
  • Software module fails to start or restart (Incident).
  • High CPU usage (Alert).
  • High disk usage (Alert).
  • Corrupt database (Incident).
Device Related Issues
Device level issues include offline devices and devices with missing or incorrect interface, routing, DHCP, port, or NAT configurations.
For example, when a device is in an offline state for an extended period of time, it impacts the VPNs connected to other sites. For device-to-controller connectivity issues, troubleshoot the device through the device toolkit, using utilities such as Ping and Curl. You may also check device reachability issues using Ping or SSH.
Device Interface Issues—Interface issues are raised by the device and deal with device interface problems. For example:
  • Interface down (Incident—warning).
  • Excessive errors on the interfaces (Alert).
Device Registration Issues—Registration issues are raised by the device or the controller, depending on the incident, and deal with device registration problems. For example:
  • Failure to establish CIC channel with the controller (Incident—warning).
  • Failure to retrieve inventory after repeated attempts (Incident—warning).
BGP Peering Issues
Routing level issues include routing protocol configuration issues, BGP neighbor establishment issues, static routing issues such as misconfigured static routes, and issues with prefix learning or advertisement.
  • For BGP session establishment related issues, check for an incorrect AS number or incorrect global parameters, check for an incorrect BGP peer IP address, and check whether BGP multi-hop is required.
  • For BGP peer type issues (data center only), check that the correct peer type (core, edge, or classic) is selected. The edge peer only learns prefixes.
  • For static routing related issues, check the configuration for administrative distance, next hop (interface, IP, self), and local or global scope, which blocks or allows advertisement into the Prisma SD-WAN fabric.
  • For prefix learning and advertisement issues, check the route map configurations on Prisma SD-WAN and BGP peer devices, check for interactions with other routing protocols in the enterprise network, and check for split or no-split prefix scenarios.
This category of events deals with Border Gateway Protocol (BGP) peering issues in the data center. For example:
  • Peering with a WAN edge router is down (Incident—warning).
  • Peering with a core router is down (Incident—warning).
  • Routes learned from WAN edge indicate private WAN is down for a branch site (Incident—warning).
Site Level Issues
Site level issues may impact a single site or all sites. Select a site with an issue and quickly isolate site, link, or circuit issues. You can also isolate the issues through a quick view of incidents for a site.
Site related issues include overall quality of a link, VPNs down or VPNs configured for another site, connectivity issues, circuit issues, incorrect circuit definitions or inaccurate characteristics for a circuit, bandwidth issues, device assignment for a site, or issues with prefixes at a site.
A few other site level issues include scenarios where none of the applications are prioritized. If no applications are prioritized, check QoS, and then check if LQM is enabled or disabled. If the flow browser indicates that traffic is bridged, despite a configured port and an existing policy, then the issue could be the result of an interface without a circuit label.
Site level incidents are raised by the device or the controller, depending on the incident, and deal with site level connectivity issues. For example:
  • Direct Internet path unavailable at a branch site (Incident—warning).
  • Private WAN path unavailable at a branch site (Incident—warning).
  • All paths unavailable at a branch site (Incident—critical), raised by the controller.
  • Loss of connectivity with the device/site (Incident—critical), raised by the controller.
  • Path labels specified in policies not available/assigned for the site.
From Release 5.5.1, two new incidents are generated for site connectivity issues.
SITE_CONNECTIVITY_DOWN
An incident is raised when the site has lost connectivity with the controller and with all of the remote branch and data center sites. The behavior differs between a branch and a data center.
  • At a data center site, this incident is raised when all the network secure fabric links are down.
  • At a branch site, this incident is raised when all network secure fabric links are down and all devices at the site are disconnected from the controller for more than 10 minutes.
The incident is cleared when connectivity is re-established or if at least one of the VPNs is up.
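The raise conditions above can be sketched as a small predicate. This is only an illustration of the described behavior, not controller code: the SiteState type, its field names, and the function name are hypothetical, and the sketch assumes the 10-minute threshold applies to the device disconnection at a branch site.

```python
from dataclasses import dataclass, field


@dataclass
class SiteState:
    """Hypothetical snapshot of a site; field names are illustrative only."""
    is_data_center: bool
    # One entry per network secure fabric link; True means the link is up.
    fabric_links_up: list[bool] = field(default_factory=list)
    # Minutes for which ALL devices at the site have been disconnected
    # from the controller (0 if at least one device is connected).
    minutes_all_devices_disconnected: float = 0.0


def site_connectivity_down(site: SiteState, threshold_minutes: float = 10.0) -> bool:
    """Return True if SITE_CONNECTIVITY_DOWN would be raised under the stated rules."""
    # Assumption: a site with no secure fabric links is not treated as down.
    all_links_down = bool(site.fabric_links_up) and not any(site.fabric_links_up)
    if site.is_data_center:
        # Data center: all network secure fabric links are down.
        return all_links_down
    # Branch: all links down AND all devices disconnected for more than 10 minutes.
    return all_links_down and site.minutes_all_devices_disconnected > threshold_minutes
```

Because the incident clears as soon as one VPN is up, a single True entry in fabric_links_up makes the predicate return False for both site types.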
SITE_CONNECTIVITY_DEGRADED
The controller raises this incident when any one of the following conditions exists:
  • The Direct Internet or Direct Private WAN is down.
  • At least one of the secure fabric links at the site is down.
  • At least one of the service links is down.
This incident is cleared when all WAN paths are up.
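As a rough sketch of the three conditions above, the check reduces to "any WAN path is down". The function and parameter names are hypothetical, and the sketch does not model how this incident interacts with SITE_CONNECTIVITY_DOWN when every path is down.

```python
def site_connectivity_degraded(
    direct_paths_up: list[bool],
    fabric_links_up: list[bool],
    service_links_up: list[bool],
) -> bool:
    """Return True if SITE_CONNECTIVITY_DEGRADED would be raised.

    Each list holds one flag per path (True means up):
    - direct_paths_up: Direct Internet and Direct Private WAN paths
    - fabric_links_up: secure fabric links at the site
    - service_links_up: service links at the site
    """
    all_paths = direct_paths_up + fabric_links_up + service_links_up
    # Raised if any path is down; cleared only when all WAN paths are up.
    return not all(all_paths)
```

For example, a site with both direct paths up, two secure fabric links up, and one of two service links down would be reported as degraded.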
Secure Fabric Link Issues
A secure fabric link is between a branch site and a data center or between two branch sites. Palo Alto Networks enables virtual private network (VPN) overlays on all public and private circuits between a branch site and a data center. VPN overlays between Palo Alto Networks branch sites are disabled by default. You can selectively enable or disable these VPNs on the Prisma SD-WAN web interface.
This category of events deals with virtual private network (VPN) link connectivity issues. For example:
  • A warning incident is raised when all the VPN links from an active branch site for a given secure fabric link are down (Incident—warning).
  • An informational incident is raised if there is more than one VPN link from the active branch site for a given secure fabric link, and at least one link is up and at least one is down (Incident—informational).
  • The following VPN link related incidents are aggregated as secure fabric link incidents (Incident—warning when secure fabric link is down and informational when secure fabric link is degraded):
    • NETWORK_VPNBFD_DOWN
    • NETWORK_VPNLINK_DOWN
    • NETWORK_VPNPEER_UNAVAILABLE
    • NETWORK_VPNPEER_UNREACHABLE
    • NETWORK_VPNSS_UNAVAILABLE
    • NETWORK_VPNSS_MISMATCH
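The severity rules above (warning when the secure fabric link is down, informational when it is degraded) can be illustrated with a short function. The function name and the list-of-flags input are assumptions made for this sketch, not the product's API.

```python
from typing import Optional


def fabric_link_severity(vpn_links_up: list[bool]) -> Optional[str]:
    """Map the state of the VPN links in one secure fabric link to a severity.

    vpn_links_up holds one flag per VPN link from the active branch site
    (True means the VPN link is up). Returns None when no incident applies.
    """
    if not vpn_links_up:
        # Assumption: no VPN links configured means nothing to report.
        return None
    if not any(vpn_links_up):
        # Every VPN link is down: the secure fabric link is down.
        return "warning"
    if not all(vpn_links_up):
        # At least one link up and at least one down: degraded.
        return "informational"
    return None
```

A secure fabric link with a single VPN link can only be fully up or fully down, so the informational case arises only when there is more than one VPN link.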
The following incidents indicate the underlay network connectivity for a given circuit is down:
  • DEVICEHW_INTERFACE_DOWN
  • NETWORK_DIRECTINTERNET_DOWN
  • NETWORK_DIRECTPRIVATE_DOWN
Incidents raised for the corresponding secure fabric links that use the underlay connectivity are suppressed.
When an administrator configures Admin Down for an interface, this condition suppresses all the corresponding raised secure fabric link incidents and this is displayed in the Reason field of the incident.
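The suppression behavior can be sketched as follows. The three incident codes come from the list above; the function name and the reason strings it returns are hypothetical placeholders, since the exact text shown in the Reason field is not specified here.

```python
from typing import Optional

# Incidents that indicate the underlay connectivity for a circuit is down.
UNDERLAY_DOWN_INCIDENTS = frozenset({
    "DEVICEHW_INTERFACE_DOWN",
    "NETWORK_DIRECTINTERNET_DOWN",
    "NETWORK_DIRECTPRIVATE_DOWN",
})


def suppression_reason(
    active_circuit_incidents: set[str],
    interface_admin_down: bool,
) -> Optional[str]:
    """Return why secure fabric link incidents on a circuit are suppressed.

    Returns None when they are not suppressed. The returned strings are
    illustrative stand-ins for the incident's Reason field.
    """
    if interface_admin_down:
        # Admin Down on the interface suppresses the related incidents.
        return "admin_down"
    underlay = UNDERLAY_DOWN_INCIDENTS & active_circuit_incidents
    if underlay:
        # An underlay incident already explains the outage on this circuit.
        return "underlay_down:" + ",".join(sorted(underlay))
    return None
```

The intent is that an operator sees one root-cause incident for the circuit rather than a burst of secure fabric link incidents that all stem from the same underlay failure.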
Service Endpoint Level Issues
Service endpoint level issues occur when a service link to a single endpoint goes down. In that case, an ION device raises a DEVICEHW_INTERFACE_DOWN incident. From Release 5.6.1, when a DEVICEHW_INTERFACE_DOWN incident is raised by an ION device for at least two standard VPN interfaces to the same service endpoint, the controller raises the NETWORK_STANDARD_VPN_ENDPOINT_DOWN summary incident.
This incident is raised only when it is associated with Endpoint destinations and not Peer destinations.
The NETWORK_STANDARD_VPN_ENDPOINT_DOWN incident is not raised when:
  • The parent interface is in Admin Down state.
  • There exists a DEVICEHW_INTERFACE_DOWN incident on the parent interface.
  • The parent interface does not have a WAN label attached.
  • The parent interface is in operationally down state.
  • The device is disconnected from the controller.
  • There exists a Layer 3 reachability issue on the parent interface.
The NETWORK_STANDARD_VPN_ENDPOINT_DOWN incident is cleared when only one standard VPN interface down incident remains for the service endpoint and all the others have been cleared.
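The raise and exclusion rules for the summary incident can be combined into one check. This is a sketch under simplifying assumptions: all names are hypothetical, and it models a single parent interface shared by the standard VPN interfaces, which may not match every deployment.

```python
from dataclasses import dataclass


@dataclass
class ParentInterface:
    """Hypothetical view of the parent interface; defaults describe a healthy one."""
    admin_down: bool = False
    oper_down: bool = False
    has_interface_down_incident: bool = False
    has_wan_label: bool = True
    layer3_reachable: bool = True


def endpoint_down_should_raise(
    down_vpn_interfaces: int,
    destination_type: str,
    parent: ParentInterface,
    device_connected: bool,
) -> bool:
    """Return True if NETWORK_STANDARD_VPN_ENDPOINT_DOWN would be raised."""
    # Only Endpoint destinations qualify, not Peer destinations.
    if destination_type != "endpoint":
        return False
    # At least two standard VPN interfaces to the same endpoint must be down.
    if down_vpn_interfaces < 2:
        return False
    # Any one of the listed exclusions prevents the summary incident.
    excluded = (
        parent.admin_down
        or parent.oper_down
        or parent.has_interface_down_incident
        or not parent.has_wan_label
        or not parent.layer3_reachable
        or not device_connected
    )
    return not excluded
```

The same threshold explains the clearing rule: once only one standard VPN interface down incident remains, the count check fails and the summary incident no longer applies.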
Logical Interface Level Issues
When a parent interface is down, the controller raises an incident for the parent interface and suppresses the interface down incidents raised by its logical interfaces.
The controller clears the incident only when the parent interface is up. Once the parent interface is up, the controller does not suppress the interface down incident for any child interface that is still down.
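The parent and child suppression behavior can be illustrated by computing which interface down incidents remain visible. The function name and the interface names in the usage note are hypothetical; the sketch only mirrors the rule described above.

```python
def visible_interface_down_incidents(
    parent_name: str,
    parent_up: bool,
    children_up: dict[str, bool],
) -> list[str]:
    """Return the interfaces whose interface down incident is shown.

    children_up maps each logical (child) interface name to True if it is up.
    """
    if not parent_up:
        # Parent down: only the parent incident is shown; child incidents
        # are suppressed regardless of the children's own state.
        return [parent_name]
    # Parent up: its incident is cleared, and any child that is still down
    # keeps its own, unsuppressed incident.
    return [name for name, up in children_up.items() if not up]
```

For example, with a parent named "1" and children "1.100" and "1.200", a parent outage yields only "1", while a healthy parent with "1.200" down yields only "1.200".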