High CPU on a Catalyst switch running IOS

This is a troubleshooting process you can follow to solve high CPU problems in your network. The root cause is always something different, but the steps are mostly the same.

The reported symptom: high CPU since market open. We start by checking the software version.

C6500#show ver

System image file is "disk0:s72033-advipservicesk9_wan-mz.122-33.SXH.bin"

=-=

Below we see 77% total CPU, with 33% from interrupt-level traffic and about 40% from the IP Input process.

C6500#show proc cpu | exc 0.00

CPU utilization for five seconds: 77%/33%; one minute: 75%; five minutes: 77%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

5 58144468 3708230 15679 0.87% 0.30% 0.29% 0 Check heaps

140 1850756 45849063 40 0.07% 0.11% 0.13% 0 CDP Protocol

146 106922548 674564539 158 40.09% 41.10% 41.95% 0 IP Input

168 15712 17711 887 0.23% 0.03% 0.22% 1 SSH Process

335 136091476 995454760 136 1.11% 0.51% 0.45% 0 Port manager per

374 18563992 190871473 97 0.15% 0.29% 0.28% 0 IGMP Input

376 11477444 194064823 59 0.15% 0.19% 0.18% 0 PIM Process

377 114620 192409350 0 0.15% 0.06% 0.06% 0 Mwheel Process

C6500#
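
The first line of the output packs two numbers into "77%/33%": total CPU and the share spent at interrupt level. The difference is the process-level load, which is where IP Input lives. A minimal Python sketch of that arithmetic follows; the function name `split_cpu` is hypothetical and the parser only handles the five-second field of the header line shown above.

```python
import re

def split_cpu(header_line):
    """Split the 'five seconds: total%/interrupt%' field of the header line."""
    m = re.search(r"five seconds:\s*(\d+)%/(\d+)%", header_line)
    if m is None:
        raise ValueError("not a 'show proc cpu' header line")
    total, interrupt = int(m.group(1)), int(m.group(2))
    # Whatever is not interrupt-level is spent in scheduled processes.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}

header = "CPU utilization for five seconds: 77%/33%; one minute: 75%; five minutes: 77%"
print(split_cpu(header))  # {'total': 77, 'interrupt': 33, 'process': 44}
```

Here the 44% process-level share is almost entirely the 40.09% reported for IP Input, which is why that process is the one to chase.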

=-=

Next we clear the counters and then look at the VLAN interfaces to see which ones have the most input queue drops. Vlan10 and Vlan200 seem to be getting hit the hardest.

C6500#show int | inc is up|drop

Vlan10 is up, line protocol is up

Input queue: 16/75/416/416 (size/max/drops/flushes); Total output drops: 0

Vlan200 is up, line protocol is up

Input queue: 4/75/565/565 (size/max/drops/flushes); Total output drops: 0

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0

Loopback0 is up, line protocol is up
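
On a switch with many interfaces, the filtered output is easier to read once it is ranked by drops. The sketch below is one way to do that in Python, assuming the output has been saved as text; `input_drops` is a hypothetical helper, and it skips any "Input queue" line that has no "is up" line directly before it (such as the third one above).

```python
import re

def input_drops(output):
    """Return (interface, input queue drops) pairs, highest drops first."""
    drops = {}
    current = None  # interface named by the most recent "is up" line
    for raw in output.splitlines():
        line = raw.strip()
        up = re.match(r"(\S+) is up", line)
        if up:
            current = up.group(1)
            continue
        queue = re.match(r"Input queue: (\d+)/(\d+)/(\d+)/(\d+)", line)
        if queue and current is not None:
            drops[current] = int(queue.group(3))  # third field is drops
            current = None  # a queue line belongs to one interface only
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)

sample = """\
Vlan10 is up, line protocol is up
Input queue: 16/75/416/416 (size/max/drops/flushes); Total output drops: 0
Vlan200 is up, line protocol is up
Input queue: 4/75/565/565 (size/max/drops/flushes); Total output drops: 0
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Loopback0 is up, line protocol is up
"""
print(input_drops(sample))  # [('Vlan200', 565), ('Vlan10', 416)]
```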

Next, we dump the buffers to see what kind of traffic is sitting in the input queues of Vlan10 and Vlan200. It is all multicast traffic.

C6500#show buffers input-interface vlan 10 packet | inc source:

source: 10.5.1.54, destination: 239.248.10.134, id: 0x0000, ttl: 15,

source: 10.5.1.78, destination: 239.248.10.54, id: 0x0000, ttl: 15,

source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,

source: 10.5.1.54, destination: 239.248.10.132, id: 0x0000, ttl: 15,

source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,

=-=

C6500#show buffers input-interface vlan 200 packet | inc source:

source: 10.5.200.103, destination: 239.248.10.175, id: 0x0000, ttl: 15,

source: 10.5.200.108, destination: 239.248.10.145, id: 0x0000, ttl: 15,

source: 10.5.200.112, destination: 239.248.10.224, id: 0x0000, ttl: 15,

source: 10.5.200.103, destination: 239.248.10.175, id: 0x0000, ttl: 15,

source: 10.5.200.108, destination: 239.248.10.146, id: 0x0000, ttl: 15,

source: 10.5.200.113, destination: 239.248.10.94, id: 0x0000, ttl: 15,
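
When the buffer dump is long, a quick tally shows which streams dominate and confirms that every destination really is a multicast group. This is a rough Python sketch under the assumption that the filtered output was saved as text; `tally_streams` is a hypothetical name, and the sample is the Vlan10 output from above.

```python
import re
from collections import Counter
from ipaddress import ip_address

def tally_streams(output):
    """Count (source, destination) pairs in the filtered buffer output."""
    pairs = re.findall(r"source: ([\d.]+), destination: ([\d.]+),", output)
    return Counter(pairs)

sample = """\
source: 10.5.1.54, destination: 239.248.10.134, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.54, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,
source: 10.5.1.54, destination: 239.248.10.132, id: 0x0000, ttl: 15,
source: 10.5.1.78, destination: 239.248.10.55, id: 0x0000, ttl: 15,
"""
streams = tally_streams(sample)
# 224.0.0.0/4 is the IPv4 multicast range; is_multicast checks exactly that.
all_multicast = all(ip_address(group).is_multicast for _, group in streams)
for (source, group), count in streams.most_common():
    print(source, group, count)
print("all multicast:", all_multicast)
```

Any stream from the tally can then be fed to "show ip mroute" to see why it is reaching the CPU.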

=-=

So we focus on one multicast stream to see why it is being punted to the CPU for processing. The mroute table shows many of the multicast routes in the "Registering, Partial-SC" state. This indicates that the DR is trying to register the source with the rendezvous point (RP), but the registration process is not completing. Partial-SC (partial shortcut) entries are not fully hardware switched, so their packets also reach the route processor.

C6500#show ip mroute 239.248.10.134

IP Multicast Routing Table

Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,

L - Local, P - Pruned, R - RP-bit set, F - Register flag,

T - SPT-bit set, J - Join SPT, M - MSDP created entry,

X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,

U - URD, I - Received Source Specific Host Report,

Z - Multicast Tunnel, z - MDT-data group sender,

Y - Joined MDT-data group, y - Sending to MDT-data group

V - RD & Vector, v - Vector

Outgoing interface flags: H - Hardware switched, A - Assert winner

Timers: Uptime/Expires

Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 239.248.10.134), 02:06:47/stopped, RP 10.7.240.240, flags: SJCF

Incoming interface: Vlan6, RPF nbr 10.5.20.5, Partial-SC

Outgoing interface list:

Vlan10, Forward/Sparse, 01:44:18/00:02:48, H

(10.5.1.54, 239.248.10.134), 01:55:47/00:02:59, flags: PFT

Incoming interface: Vlan10, RPF nbr 0.0.0.0, Registering, Partial-SC

Outgoing interface list: Null

C6500#
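
The flag strings are easier to reason about once they are spelled out. The Python sketch below simply applies the legend printed at the top of the output to a flag string; `decode_flags` is a hypothetical helper and the table is copied from that legend, nothing more.

```python
# Flag legend as printed by "show ip mroute" in the output above.
MROUTE_FLAGS = {
    "D": "Dense", "S": "Sparse", "B": "Bidir Group", "s": "SSM Group",
    "C": "Connected", "L": "Local", "P": "Pruned", "R": "RP-bit set",
    "F": "Register flag", "T": "SPT-bit set", "J": "Join SPT",
    "M": "MSDP created entry", "X": "Proxy Join Timer Running",
    "A": "Candidate for MSDP Advertisement", "U": "URD",
    "I": "Received Source Specific Host Report", "Z": "Multicast Tunnel",
    "z": "MDT-data group sender", "Y": "Joined MDT-data group",
    "y": "Sending to MDT-data group", "V": "RD & Vector", "v": "Vector",
}

def decode_flags(flags):
    """Expand a flag string such as 'PFT' into the legend's descriptions."""
    return [MROUTE_FLAGS.get(ch, "unknown flag " + ch) for ch in flags]

print(decode_flags("SJCF"))  # ['Sparse', 'Join SPT', 'Connected', 'Register flag']
print(decode_flags("PFT"))   # ['Pruned', 'Register flag', 'SPT-bit set']
```

For the (S,G) entry, the Register flag together with the "Registering" state and a Null outgoing interface list is the combination that points at the RP.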

=-=

So we look at the RP information and see several static RP statements.

C6500#show run | inc ip pim rp

ip pim rp-address 10.7.240.240 <-- may not be needed

ip pim rp-address 198.140.52.4 AAAA

ip pim rp-address 198.140.52.3 BBBB

ip pim rp-address 198.140.52.1 CCCC

ip pim rp-address 198.140.52.2 DDDD

ip pim rp-address 198.140.33.5 EEEE

ip pim rp-address 198.140.33.2 FFFF

=-=

We set up a temporary rate limit for the Partial-SC packets hitting the CPU, allowing only 10 per second (non-intrusive). With the rate limiter in place, the five-second CPU is now in the 10-20% range, which is in line with the 72-hour historical average. The one-minute and five-minute averages take a little longer to come down. The customer will look into removing the invalid RP configuration.

C6500(config)#mls rate-limit multicast ipv4 partial 10

C6500(config)#do show proc cpu | exc 0.00

CPU utilization for five seconds: 13%/7%; one minute: 67%; five minutes: 71%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

146 107426044 674852622 159 3.67% 36.28% 38.19% 0 IP Input

168 20684 22373 924 0.15% 0.14% 0.35% 1 SSH Process

335 136098248 995490856 136 1.43% 0.50% 0.51% 0 Port manager per

374 18567412 190884074 97 0.23% 0.26% 0.28% 0 IGMP Input

386 41108136 139317998 295 0.07% 0.13% 0.12% 0 SNMP ENGINE

C6500(config)#

Hope this helps!
