This is a troubleshooting process you can follow to track down high CPU problems in your network. The root cause is different each time, but the steps are mostly the same.
The problem: high CPU since market open.
C6500#show ver
System image file is "disk0:s72033-advipservicesk9_wan-mz.122-33.SXH.bin"
=-=
Below we see 77% total CPU, with 33% from interrupt-level traffic and about 40% from the IP Input process.
C6500#show proc cpu | exc 0.00
CPU utilization for five seconds: 77%/33%; one minute: 75%; five minutes: 77%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
5 58144468 3708230 15679 0.87% 0.30% 0.29% 0 Check heaps
140 1850756 45849063 40 0.07% 0.11% 0.13% 0 CDP Protocol
146 106922548 674564539 158 40.09% 41.10% 41.95% 0 IP Input
168 15712 17711 887 0.23% 0.03% 0.22% 1 SSH Process
335 136091476 995454760 136 1.11% 0.51% 0.45% 0 Port manager per
374 18563992 190871473 97 0.15% 0.29% 0.28% 0 IGMP Input
376 11477444 194064823 59 0.15% 0.19% 0.18% 0 PIM Process
377 114620 192409350 0 0.15% 0.06% 0.06% 0 Mwheel Process
C6500#
=-=
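The first line of that output packs two numbers into "77%/33%": the total CPU and the share spent at interrupt level. Subtracting one from the other gives the process-level share, which is where IP Input lives. As a rough sketch (the function name parse_cpu_line is made up for this example and is not part of any Cisco tool), the split can be done in Python like this:

```python
import re

def parse_cpu_line(line):
    """Split the 'show proc cpu' summary line into total,
    interrupt, and process-level percentages (five-second values)."""
    m = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if m is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt = int(m.group(1)), int(m.group(2))
    # Process-level CPU is whatever is left after interrupt-level work.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}

line = "CPU utilization for five seconds: 77%/33%; one minute: 75%; five minutes: 77%"
print(parse_cpu_line(line))
# {'total': 77, 'interrupt': 33, 'process': 44}
```

The 44% process-level figure lines up with IP Input at about 40% plus the smaller processes in the list.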
Next, we cleared the counters and then looked at the VLAN interfaces to see which had the most input queue drops. Vlan10 and Vlan200 seem to be getting hit the hardest.
C6500#show int | inc is up|drop
Vlan10 is up, line protocol is up
Input queue: 16/75/416/416 (size/max/drops/flushes); Total output drops: 0
Vlan200 is up, line protocol is up
Input queue: 4/75/565/565 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Loopback0 is up, line protocol is up
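On a switch with many interfaces this output gets long, so it can help to rank the interfaces by drops with a script. The sketch below is only an illustration (input_drops is a hypothetical helper, and the sample text is pasted from the output above). It assumes that an "Input queue" line with no "is up" line directly before it belongs to an interface whose status line did not match the filter, so those lines are skipped.

```python
import re

def input_drops(output):
    """Map each 'is up' interface to its input-queue drop count."""
    drops = {}
    current = None  # interface named on the previous line, if any
    for line in output.splitlines():
        line = line.strip()
        up = re.match(r"(\S+) is up", line)
        if up:
            current = up.group(1)
            continue
        q = re.match(r"Input queue: \d+/\d+/(\d+)/\d+", line)
        if q and current is not None:
            drops[current] = int(q.group(1))
        # Any other line (including an orphan queue line) resets the pairing.
        current = None
    return drops

sample = """\
Vlan10 is up, line protocol is up
Input queue: 16/75/416/416 (size/max/drops/flushes); Total output drops: 0
Vlan200 is up, line protocol is up
Input queue: 4/75/565/565 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Loopback0 is up, line protocol is up
"""

ranked = sorted(input_drops(sample).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
# [('Vlan200', 565), ('Vlan10', 416)]
```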
Next, we dump the buffers to see what kind of traffic is hitting the input buffers of VLANs 10 and 200. It is all multicast traffic.
C6500#show buffers input-interface vlan 10 packet | inc source:
source: 10.5.1.54, destination: 239.248.10.134, id: 0x0000, ttl: 15,
source: 10.5.1.54, destination: 239.248.10.134, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.54, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,
source: 10.5.1.54, destination: 239.248.10.132, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,
=-=
C6500#show buffers input-interface vlan 200 packet | inc source:
source: 10.5.200.103, destination: 239.248.10.175, id: 0x0000, ttl: 15,
source: 10.5.200.108, destination: 239.248.10.145, id: 0x0000, ttl: 15,
source: 10.5.200.112, destination: 239.248.10.224, id: 0x0000, ttl: 15,
source: 10.5.200.103, destination: 239.248.10.175, id: 0x0000, ttl: 15,
source: 10.5.200.108, destination: 239.248.10.146, id: 0x0000, ttl: 15,
source: 10.5.200.113, destination: 239.248.10.94, id: 0x0000, ttl: 15,
=-=
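With a bigger buffer dump, eyeballing the destinations gets tedious. Here is a minimal sketch, using only the Python standard library, that tallies the source/destination pairs and confirms every destination is a multicast address. The helper names tally_flows and all_multicast are invented for this example, and the sample is the Vlan10 output from above.

```python
import ipaddress
import re
from collections import Counter

def tally_flows(output):
    """Count how often each (source, destination) pair appears in the dump."""
    flows = Counter()
    for m in re.finditer(r"source: ([\d.]+), destination: ([\d.]+),", output):
        flows[(m.group(1), m.group(2))] += 1
    return flows

def all_multicast(flows):
    """True if every destination falls in the IPv4 multicast range (224.0.0.0/4)."""
    return all(ipaddress.ip_address(dst).is_multicast for _, dst in flows)

sample = """\
source: 10.5.1.54, destination: 239.248.10.134, id: 0x0000, ttl: 15,
source: 10.5.1.54, destination: 239.248.10.134, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.54, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,
source: 10.5.1.54, destination: 239.248.10.132, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,
"""

flows = tally_flows(sample)
for (src, dst), count in flows.items():
    print(src, "->", dst, "x", count)
print("all multicast:", all_multicast(flows))
```

The per-flow counts also give a quick hint about which stream to look at first in the mroute table.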
So we focus on one multicast stream to see why it is being punted to the CPU for processing. We look at the mroute table and see many of the multicast routes in "Registering, Partial-SC". This indicates that the DR is trying to register with the rendezvous point (RP), but the process is not completing.
C6500#show ip mroute 239.248.10.134
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
L - Local, P - Pruned, R - RP-bit set, F - Register flag,
T - SPT-bit set, J - Join SPT, M - MSDP created entry,
X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
U - URD, I - Received Source Specific Host Report,
Z - Multicast Tunnel, z - MDT-data group sender,
Y - Joined MDT-data group, y - Sending to MDT-data group
V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
Timers: Uptime/Expires
Interface state: Interface, Next-Hop or VCD, State/Mode
(*, 239.248.10.134), 02:06:47/stopped, RP 10.7.240.240, flags: SJCF
Incoming interface: Vlan6, RPF nbr 10.5.20.5, Partial-SC
Outgoing interface list:
Vlan10, Forward/Sparse, 01:44:18/00:02:48, H
(10.5.1.54, 239.248.10.134), 01:55:47/00:02:59, flags: PFT
Incoming interface: Vlan10, RPF nbr 0.0.0.0, Registering, Partial-SC
Outgoing interface list: Null
C6500#
=-=
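To find every stream stuck in this state rather than checking groups one at a time, the full "show ip mroute" output can be scanned for entries whose incoming interface line says "Registering". This is only a sketch under that assumption about the output layout, and registering_entries is a made-up helper name. The sample is trimmed from the output above.

```python
import re

def registering_entries(output):
    """Return (source, group) pairs whose incoming interface line
    shows the 'Registering' state."""
    entries = []
    current = None  # (source, group) of the entry being read
    for line in output.splitlines():
        line = line.strip()
        head = re.match(r"\(([^,\s]+), ([^)\s]+)\),", line)
        if head:
            current = (head.group(1), head.group(2))
        elif line.startswith("Incoming interface:") and current:
            if "Registering" in line:
                entries.append(current)
            current = None
    return entries

sample = """\
(*, 239.248.10.134), 02:06:47/stopped, RP 10.7.240.240, flags: SJCF
Incoming interface: Vlan6, RPF nbr 10.5.20.5, Partial-SC
Outgoing interface list:
Vlan10, Forward/Sparse, 01:44:18/00:02:48, H
(10.5.1.54, 239.248.10.134), 01:55:47/00:02:59, flags: PFT
Incoming interface: Vlan10, RPF nbr 0.0.0.0, Registering, Partial-SC
Outgoing interface list: Null
"""

print(registering_entries(sample))
# [('10.5.1.54', '239.248.10.134')]
```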
So we look at the RP information and see several static RP statements.
C6500#show run | inc ip pim rp
ip pim rp-address 10.7.240.240 <-- may not be needed
ip pim rp-address 198.140.52.4 AAAA
ip pim rp-address 198.140.52.3 BBBB
ip pim rp-address 198.140.52.1 CCCC
ip pim rp-address 198.140.52.2 DDDD
ip pim rp-address 198.140.33.5 EEEE
ip pim rp-address 198.140.33.2 FFFF
=-=
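We set up a temporary rate limit for the Partial-SC packets hitting the CPU, allowing only 10 per second (non-intrusive). With the rate limiter in place, the five-second CPU is now in the 10-20% range, which is in line with the 72-hour historical average. The one-minute and five-minute averages in the output below are still high because they had not yet caught up when the output was captured. The customer will look into removing the invalid RP config.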
C6500 (config)#mls rate-limit multicast ipv4 partial 10
C6500 (config)#do show proc cpu | exc 0.00
CPU utilization for five seconds: 13%/7%; one minute: 67%; five minutes: 71%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
146 107426044 674852622 159 3.67% 36.28% 38.19% 0 IP Input
168 20684 22373 924 0.15% 0.14% 0.35% 1 SSH Process
335 136098248 995490856 136 1.43% 0.50% 0.51% 0 Port manager per
374 18567412 190884074 97 0.23% 0.26% 0.28% 0 IGMP Input
386 41108136 139317998 295 0.07% 0.13% 0.12% 0 SNMP ENGINE
C6500 (config)#
Hope this helps!