Wednesday, February 25, 2009

PIM NBMA, DR and RPF issues

Below is the topology. RIP is running everywhere, PIM-SM on all interfaces and everyone has R4 at 192.168.100.4 as the static RP.


R1 has the following config on its LAN interface:
interface Ethernet0/0
ip address 192.168.0.1 255.255.255.0
ip pim sparse-mode
ip igmp join-group 239.0.0.1
Let's ping from R6:
R6#ping 239.0.0.1 re 5  

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:
.....
R6#
Hmmm....what gives? Let's look at R4:
R4#sho ip pim neighbor
PIM Neighbor Table
Neighbor Interface Uptime/Expires Ver DR
Address Prio/Mode
192.168.34.3 Ethernet0/0 03:29:50/00:01:39 v2 1 / S
192.168.100.2 Serial0/0 02:25:22/00:01:38 v2 1 / S
192.168.100.5 Serial0/0 02:25:22/00:01:39 v2 1 / DR S
192.168.100.1 Serial0/0 02:25:22/00:01:38 v2 1 / S

R4#sho ip mroute 239.0.0.1 | be \(
(*, 239.0.0.1), 00:24:31/00:02:33, RP 192.168.100.4, flags: S
Incoming interface: Null, RPF nbr 0.0.0.0
Outgoing interface list:
Serial0/0, 192.168.100.2, Forward/Sparse, 00:24:31/00:02:33

(192.168.56.6, 239.0.0.1), 00:02:03/00:02:45, flags: T
Incoming interface: Serial0/0, RPF nbr 192.168.100.5
Outgoing interface list:
Serial0/0, 192.168.100.2, Forward/Sparse, 00:02:03/00:00:57

R4#
Well, it looks like R2 is showing up in the OIL, but why isn't R1? It is a PIM neighbor after all. The reason is that R2 has won the DR election on the LAN and has the right to forward traffic, so it is the neighbor that sends PIM joins to R4. R1 receives the traffic, but it comes in on its LAN interface and thus fails the RPF check.

R1#debug ip mpacket
IP multicast packets debugging is on
03:40:21: IP(0): s=192.168.56.6 (Ethernet0/0) d=239.0.0.1 id=197, ttl=251, prot=1, len=114(100), not RPF interface
03:40:23: IP(0): s=192.168.56.6 (Ethernet0/0) d=239.0.0.1 id=198, ttl=251, prot=1, len=114(100), not RPF interface


It is important to remember we have at least two ways to resolve this:

1) Make R1 the DR

R1(config)#int e0/0
R1(config-if)#ip pim dr-priority 3000

R6#ping 239.0.0.1 re 1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:

Reply to request 0 from 192.168.100.1, 60 ms
R6#

R1(config-if)#^Z
03:41:47: IP(0): s=192.168.56.6 (Serial0/0) d=239.0.0.1 (Ethernet0/0) id=207, ttl=252, prot=1, len=100(100), mforward
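
Why raising the priority works: the DR election on a segment prefers the highest DR priority and breaks ties with the highest IP address. Here is a minimal Python sketch of that rule. The elect_dr helper is purely illustrative (it is not an IOS or library function), and the addresses are R1's LAN address from the config above and R2's address as used in the static mroute in fix 2:

```python
import ipaddress

def elect_dr(routers):
    """Return the name of the PIM DR: highest priority first, then highest IP.

    routers maps a router name to a (dr_priority, lan_ip) tuple.
    """
    return max(
        routers,
        key=lambda name: (routers[name][0], ipaddress.IPv4Address(routers[name][1])),
    )

lan = {"R1": (1, "192.168.0.1"), "R2": (1, "192.168.0.2")}
print(elect_dr(lan))   # equal priorities, so the higher IP decides

lan["R1"] = (3000, "192.168.0.1")
print(elect_dr(lan))   # R1 after "ip pim dr-priority 3000"
```

With default priorities the tie falls to the higher address, which is why R2 was the one sending joins to R4.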


2) Static mroute to R2 for 192.168.56.6

R1(config)#int e0/0
R1(config-if)#no ip pim dr-priority 3000
R1(config-if)#exit
R1(config)#ip mroute 192.168.56.0 255.255.255.0 192.168.0.2

Make sure to clear the mroutes, otherwise previous state may mislead you :)

R4#clear ip mroute *

R6#ping 239.0.0.1 re 1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:

Reply to request 0 from 192.168.100.1, 56 ms
R6#
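
The RPF failure and the static mroute fix can be sketched in a few lines of Python. This is a simplified model, not how IOS implements it: it assumes R1's unicast route to 192.168.56.0/24 points out Serial0/0 (consistent with the mforward debug line earlier), it assumes any matching static mroute is consulted before the unicast table, and the function names are made up for illustration:

```python
import ipaddress

def rpf_interface(source, unicast_routes, static_mroutes=()):
    """Return the interface a packet from `source` is expected to arrive on."""
    src = ipaddress.IPv4Address(source)
    # Simplification: a matching static mroute wins over the unicast table.
    for table in (static_mroutes, unicast_routes):
        matches = [
            (ipaddress.IPv4Network(prefix), ifname)
            for prefix, ifname in table
            if src in ipaddress.IPv4Network(prefix)
        ]
        if matches:
            # Longest prefix match within the chosen table.
            return max(matches, key=lambda m: m[0].prefixlen)[1]
    return None

def rpf_check(source, arrival_if, unicast_routes, static_mroutes=()):
    """True if the packet arrived on the interface RPF expects."""
    return rpf_interface(source, unicast_routes, static_mroutes) == arrival_if

# Assumed tables on R1 (hypothetical values for illustration).
unicast = [("192.168.56.0/24", "Serial0/0")]
mroutes = [("192.168.56.0/24", "Ethernet0/0")]  # next hop 192.168.0.2 is on the LAN

print(rpf_check("192.168.56.6", "Ethernet0/0", unicast))           # original failure
print(rpf_check("192.168.56.6", "Ethernet0/0", unicast, mroutes))  # with the static mroute
```

The static mroute does not change where traffic flows; it only changes which interface R1 considers valid for traffic from that source.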


This is one of those labs where I had no idea where I was going and I ended up with a nice troubleshooting scenario. If multicast is one of your weaknesses, then I highly recommend digging in and making something happen. Debug ip mpacket works best with "no ip mroute-cache" on your interfaces. In this scenario, I started troubleshooting on R5, then worked my way around to resolve the issue :)

Monday, February 23, 2009

PIM Forwarder and the Assert Mechanism

I know, it's a cool name for a band, huh? Ladies and gentlemen...PIM Forwarder and the Assert Mechanism! Anyways, I always get confused about PIM DR and PIM Forwarder so this is to clear up my confusion. Here we take a look at PIM Forwarder and how to verify the assert process is working.

Here is the topology:


Here is what I have enabled:
-RIP on all interfaces
-ip multicast-routing on all routers
-ip pim sparse-dense on all interfaces
-ip igmp join-group 239.0.0.1 on R5 ethernet

For debugging:
-no ip mroute-cache
-debug ip mpacket
-ping

Scenario 1: R2 is the PIM Forwarder based on highest IP

From R4 we ping twice:
R4#ping 239.0.0.1 re 2

Type escape sequence to abort.
Sending 2, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:

Reply to request 0 from 192.168.0.5, 20 ms
Reply to request 0 from 192.168.0.5, 20 ms
Reply to request 1 from 192.168.0.5, 8 ms
On R1 and R2 we see the following:

R1#
*Mar 2 02:05:36.795: IP(0): s=192.168.34.4 (Serial0/1) d=239.0.0.1 (Ethernet0/0) id=70, ttl=253, prot=1, len=100(100), mforward
*Mar 2 02:05:36.799: IP(0): s=192.168.34.4 (Ethernet0/0) d=239.0.0.1 id=70, ttl=252, prot=1, len=114(100), not RPF interface
*Mar 2 02:05:38.787: IP(0): s=192.168.34.4 (Ethernet0/0) d=239.0.0.1 id=71, ttl=252, prot=1, len=114(100), not RPF interface

R2#
*Mar 1 02:25:00.567: IP(0): s=192.168.34.4 (Serial0/1) d=239.0.0.1 (Ethernet0/0) id=70, ttl=253, prot=1, len=100(100), mforward
*Mar 1 02:25:00.571: IP(0): s=192.168.34.4 (Ethernet0/0) d=239.0.0.1 id=70, ttl=252, prot=1, len=114(100), not RPF interface
*Mar 1 02:25:02.559: IP(0): s=192.168.34.4 (Serial0/1) d=239.0.0.1 (Ethernet0/0) id=71, ttl=253, prot=1, len=100(100), mforward


Notice that each router sent the first packet onto the LAN and R5 responded to both. We can tell because R4 got two replies. What also happened is that R1 and R2 each saw that very same packet on their LAN interfaces. Immediately the PIM Assert process took over. Because both routers have the same AD (120, the RIP default) and metric (2) to the source, R2 won the right to forward based on highest IP.

Next we see that the second packet only gets forwarded by R2. Here we see that R2 has the A (Assert Winner) flag in its mroute entry. R1 has pruned that same interface.
R2#sho ip mroute 239.0.0.1 192.168.34.4 | be \(
(192.168.34.4, 239.0.0.1), 00:00:39/00:02:26, flags: T
Incoming interface: Serial0/1, RPF nbr 192.168.23.3
Outgoing interface list:
Ethernet0/0, Forward/Sparse-Dense, 00:00:39/00:00:00, A

R1#sho ip mroute 239.0.0.1 192.168.34.4 | be \(
(192.168.34.4, 239.0.0.1), 00:01:27/00:01:34, flags: PT
Incoming interface: Serial0/1, RPF nbr 192.168.13.3
Outgoing interface list:
Ethernet0/0, Prune/Sparse-Dense, 00:01:27/00:01:32

Scenario 2: R1 is the PIM Forwarder based on lowest AD

Now we change R1's AD for RIP below the default of 120:
R1(config)#router rip
R1(config-router)#distance 89
We see the same behavior from R4's perspective but now R1 has won the Assert process and is forwarding group 239.0.0.1 onto the LAN:
R4#ping 239.0.0.1 re 2

Type escape sequence to abort.
Sending 2, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:

Reply to request 0 from 192.168.0.5, 12 ms
Reply to request 0 from 192.168.0.5, 12 ms
Reply to request 1 from 192.168.0.5, 8 ms
R4#

R1#sho ip mroute 239.0.0.1 192.168.34.4 | be \(
(192.168.34.4, 239.0.0.1), 00:00:07/00:02:54, flags: T
Incoming interface: Serial0/1, RPF nbr 192.168.13.3
Outgoing interface list:
Ethernet0/0, Forward/Sparse-Dense, 00:00:07/00:00:00, A

R1#
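
Both scenarios follow the same comparison order: lowest AD, then lowest metric, then highest IP address on the shared segment. Here is a small Python sketch of that ordering. The assert_winner helper is hypothetical, and R1 and R2 are assumed to be 192.168.0.1 and 192.168.0.2 on the LAN (only R5's 192.168.0.5 appears in the output above):

```python
import ipaddress

def assert_winner(candidates):
    """Pick the PIM assert winner.

    candidates maps a router name to (admin_distance, metric, lan_ip).
    """
    def rank(name):
        ad, metric, ip = candidates[name]
        # Lower AD and lower metric are better; a higher IP is better,
        # so it is negated to fit into a single min() comparison.
        return (ad, metric, -int(ipaddress.IPv4Address(ip)))
    return min(candidates, key=rank)

# Scenario 1: AD and metric tie, so the highest IP decides.
print(assert_winner({"R1": (120, 2, "192.168.0.1"), "R2": (120, 2, "192.168.0.2")}))

# Scenario 2: R1 lowers its distance to 89 and wins regardless of IP.
print(assert_winner({"R1": (89, 2, "192.168.0.1"), "R2": (120, 2, "192.168.0.2")}))
```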

How Route-Reflector clusters prevent loops

This is the topology I used to get familiar with the concept:


The idea is fairly easy to understand. You never want to learn routes from someone who learned them from you (directly or indirectly). I brought up the peerings one by one to step through the process.

Here is the route on R1:

R1#sho ip bgp 200.0.0.0
BGP routing table entry for 200.0.0.0/8, version 12
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
3
100, (Received from a RR-client)
6.6.6.6 (metric 2) from 6.6.6.6 (6.6.6.6)
Origin IGP, metric 0, localpref 100, valid, internal, best
R1#


Now on R2, we see the first case of the originator-id as set by R1. We also see the beginning of the cluster-list:

R2#sho ip bgp 200.0.0.0
BGP routing table entry for 200.0.0.0/8, version 9
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
2
100
6.6.6.6 (metric 3) from 1.1.1.1 (1.1.1.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
Originator: 6.6.6.6, Cluster list: 1.1.1.1
R2#


R2 appends itself to the cluster-list before advertising to R5:

R5#sho ip bgp 200.0.0.0
BGP routing table entry for 200.0.0.0/8, version 12
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
2
100
6.6.6.6 (metric 2) from 2.2.2.2 (2.2.2.2)
Origin IGP, metric 0, localpref 100, valid, internal, best
Originator: 6.6.6.6, Cluster list: 2.2.2.2, 1.1.1.1
R5#


Eventually, these are the messages we get on R6 and R2, respectively.

R6#
*Mar 1 00:44:55.807: BGP(0): 5.5.5.5 rcv UPDATE about 201.0.0.0/8 -- DENIED due to: ORIGINATOR is us;
*Mar 1 00:44:55.811: BGP(0): 5.5.5.5 rcv UPDATE about 200.0.0.0/8 -- DENIED due to: ORIGINATOR is us;


R2#
*Mar 1 00:53:39.075: BGP(0): 3.3.3.3 rcv UPDATE about 201.0.0.0/8 -- DENIED due to: CLUSTERLIST contains our own cluster ID;
*Mar 1 00:53:39.083: BGP(0): 3.3.3.3 rcv UPDATE about 200.0.0.0/8 -- DENIED due to: CLUSTERLIST contains our own cluster ID;
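
The two checks behind those messages can be sketched in Python. This is a rough model under a couple of assumptions: each reflector's cluster ID defaults to its router ID (which matches the cluster lists shown above), and the reflect/accept helpers are invented for illustration rather than taken from any BGP implementation:

```python
def reflect(route, cluster_id, learned_from):
    """Return the route as a reflector would advertise it.

    The originator is set only if it is not already present, and the
    reflector's cluster ID is prepended to the cluster list.
    """
    return {
        "prefix": route["prefix"],
        "originator": route.get("originator") or learned_from,
        "cluster_list": [cluster_id] + route.get("cluster_list", []),
    }

def accept(route, router_id, cluster_id):
    """Apply the two loop-prevention checks to a received route."""
    if route.get("originator") == router_id:
        return "DENIED: ORIGINATOR is us"
    if cluster_id in route.get("cluster_list", []):
        return "DENIED: CLUSTERLIST contains our own cluster ID"
    return "accepted"

from_r6 = {"prefix": "200.0.0.0/8"}
via_r1 = reflect(from_r6, "1.1.1.1", learned_from="6.6.6.6")
via_r2 = reflect(via_r1, "2.2.2.2", learned_from="1.1.1.1")

print(via_r2["originator"], via_r2["cluster_list"])
print(accept(via_r2, router_id="6.6.6.6", cluster_id="6.6.6.6"))  # back at R6
print(accept(via_r2, router_id="2.2.2.2", cluster_id="2.2.2.2"))  # back at R2
```

The originator check protects the router that injected the route, while the cluster-list check protects every reflector the update has already passed through.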

My new favorite IOS message

I don't know what my previous one was, but this is the new one:

R1(config-if)#traffic-shape rate 64000 ?
<0-100000000> bits per interval, sustained

R1(config-if)#traffic-shape rate 64000 640
less than 1000 bits in an interval doesn't make sense
R1(config-if)#

Saturday, February 14, 2009

Watch the RIP metric when summarizing redistributed routes

I was reading through the GS archives and saw this interesting issue about the metrics of summarized routes after being redistributed into RIP.

Scenario:

R1---RIP---R2---OSPF---R5---5.5.5.5/32

R2 is redistributing OSPF to RIP as follows:
R2#sho run | sec router rip
router rip
version 2
redistribute ospf 1 metric 2
network 192.168.0.0
no auto-summary
R1 has the following route:
R1#sho ip route | sec 5.0.0.0
5.0.0.0/32 is subnetted, 1 subnets
R 5.5.5.5 [120/2] via 192.168.0.2, 00:00:11, FastEthernet0/0
1) Manual Summary
R2(config)#int f0/0
R2(config-if)#ip summary-address rip 5.0.0.0 255.0.0.0

R1#sho ip route | sec 5.0.0.0
R 5.0.0.0/8 [120/3] via 192.168.0.2, 00:00:01, FastEthernet0/0
The metric increased by 1.

2) Auto-summary
R2(config-if)#router rip
R2(config-router)#auto-summary

R1#sho ip route | sec 5.0.0.0
R 5.0.0.0/8 [120/2] via 192.168.0.2, 00:00:05, FastEthernet0/0
The metric is the same as when redistributed.

CCIE Assessor Lab Review

I don't know how much I can say about this so I will keep it brief. I purchased both assessor labs, one for today and one for tomorrow. I just completed the first one in about 2 hours. That left a good chunk of time to verify and run the assessment. I only missed two tasks and they were very simple mistakes.

My one worry was that it would take a while to get used to the topology and the web interface. I spent about 30 minutes last night reading the user guide and it was a smooth transition getting used to the GUI and the controls. This should not worry you.

I redrew a diagram and kept a task/point tracker. I read the lab before I started and at first glance it seemed to be pretty easy. There are some things that will leave you scratching your head and that is good. The best part: There were no errors or typos in any tasks or drawings! :)

The telnet sessions are Java based and you have to open one in each window and then arrange them on your screen. I opened R1 first, then moved on so they were arranged in my taskbar in order. I don't expect many differences for tomorrow's session, so hopefully I do well.

Friday, February 13, 2009

OSPF filtering issue when virtual-links are present

Here is the topology I will start off with:


R4 has two INTER-area routes to 1.1.1.1:
R4#sho ip route 1.1.1.1
Routing entry for 1.1.1.1/32
Known via "ospf 1", distance 110, metric 4, type inter area
Last update from 192.168.45.5 on Serial1/0, 00:00:11 ago
Routing Descriptor Blocks:
192.168.45.5, from 5.5.5.5, 00:00:11 ago, via Serial1/0
Route metric is 4, traffic share count is 1
* 192.168.34.3, from 2.2.2.2, 00:00:11 ago, via Serial1/1
Route metric is 4, traffic share count is 1
If we want to filter the path from R2 through 192.168.34.3 we could do it this way:
R4(config)#access-list 1 permit 1.1.1.1
R4(config)#access-list 2 permit 2.2.2.2
R4(config)#route-map OSPF deny 10
R4(config-route-map)#match ip address 1
R4(config-route-map)#match ip route-source 2
R4(config-route-map)#route-map OSPF permit 20
R4(config-route-map)#router ospf 1
R4(config-router)#distribute-list route-map OSPF in
R4(config-router)#^Z

R4#sho ip route 1.1.1.1
Routing entry for 1.1.1.1/32
Known via "ospf 1", distance 110, metric 4, type inter area
Last update from 192.168.45.5 on Serial1/0, 00:00:12 ago
Routing Descriptor Blocks:
* 192.168.45.5, from 5.5.5.5, 00:00:12 ago, via Serial1/0
Route metric is 4, traffic share count is 1
But let's say we have a task that asks us to create a new area attached to R4 as follows:


Now we need two virtual-links, and look at what happened to our route to 1.1.1.1.
R4(config)#router ospf 1                 
R4(config-router)#area 1 virtual-link 5.5.5.5
R4(config-router)#area 1 virtual-link 2.2.2.2

*Mar 3 01:31:13.935: %OSPF-5-ADJCHG: Process 1, Nbr 5.5.5.5 on
OSPF_VL2 from LOADING to FULL, Loading Done

*Mar 3 01:31:16.979: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on
OSPF_VL3 from LOADING to FULL, Loading Done


R4#sho ip route 1.1.1.1
Routing entry for 1.1.1.1/32
Known via "ospf 1", distance 110, metric 4, type intra area
Last update from 192.168.34.3 on Serial1/1, 00:00:00 ago
Routing Descriptor Blocks:
* 192.168.45.5, from 1.1.1.1, 00:00:00 ago, via Serial1/0
Route metric is 4, traffic share count is 1
192.168.34.3, from 1.1.1.1, 00:00:00 ago, via Serial1/1
Route metric is 4, traffic share count is 1
What gives? Well now we are learning 1.1.1.1 as an INTRA-area route so the router-ID advertising the LSA has changed. We are now learning the route from type-1 LSAs originated by R1 directly in Area 0. If we filter based on router-id we will lose both paths so now we need to filter based on next-hop:
R4(config)#access-list 3 permit 192.168.34.3
R4(config)#no route-map OSPF
R4(config)#route-map OSPF deny 10
R4(config-route-map)#match ip add 1
R4(config-route-map)#match ip next-hop 3
R4(config-route-map)#route-map OSPF pe 20
R4(config-route-map)#^Z

R4#sho ip route 1.1.1.1
Routing entry for 1.1.1.1/32
Known via "ospf 1", distance 110, metric 4, type intra area
Last update from 192.168.45.5 on Serial1/0, 00:00:02 ago
Routing Descriptor Blocks:
* 192.168.45.5, from 1.1.1.1, 00:00:02 ago, via Serial1/0
Route metric is 4, traffic share count is 1
All of this change could of course have been prevented had we read ahead :-)
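
To see why the first route-map stopped doing anything and the second one works, here is a small Python sketch of the deny 10 entry. It assumes that multiple match statements of different types in one route-map entry must all match (they are ANDed), and the distribute_list_in helper and the path dictionaries are made up to mirror the show output above:

```python
def distribute_list_in(paths, deny_prefix, **criteria):
    """Drop paths for deny_prefix that satisfy ALL criteria.

    Anything that does not hit the deny entry falls through to the
    empty "permit 20" entry and is kept.
    """
    kept = []
    for path in paths:
        hit = path["prefix"] == deny_prefix and all(
            path[field] == value for field, value in criteria.items()
        )
        if not hit:
            kept.append(path)
    return kept

# Inter-area: each ABR is the route source of its own type-3 LSA.
before_vl = [
    {"prefix": "1.1.1.1", "route_source": "5.5.5.5", "next_hop": "192.168.45.5"},
    {"prefix": "1.1.1.1", "route_source": "2.2.2.2", "next_hop": "192.168.34.3"},
]
# Intra-area after the virtual-links: R1 is the source of both paths.
after_vl = [
    {"prefix": "1.1.1.1", "route_source": "1.1.1.1", "next_hop": "192.168.45.5"},
    {"prefix": "1.1.1.1", "route_source": "1.1.1.1", "next_hop": "192.168.34.3"},
]

print(len(distribute_list_in(before_vl, "1.1.1.1", route_source="2.2.2.2")))
print(len(distribute_list_in(after_vl, "1.1.1.1", route_source="2.2.2.2")))
print(len(distribute_list_in(after_vl, "1.1.1.1", next_hop="192.168.34.3")))
```

Matching on the route source 1.1.1.1 instead would hit both paths, which is the "lose both paths" case mentioned above.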

Thursday, February 12, 2009

OSPF on unnumbered links

I was reviewing the OSPF chapter in the CCIE exam guide today and something irked me. It said that OSPF neighbors will become adjacent if one or both of the neighbors are using unnumbered interfaces between them. I swear this was not the case as I had experienced it before, so I labbed it up.
R3#sho ip ospf ne

Neighbor ID Pri State Dead Time Address Interface
2.2.2.2 0 FULL/ - 00:00:37 192.168.23.2 Serial1/1
4.4.4.4 0 FULL/ - 00:00:39 192.168.34.4 Serial1/0
R3#

R3(config)#int s1/0
R3(config-if)#ip unnumbered lo 0
R3(config-if)#
*Mar 2 06:31:01.600: %OSPF-5-ADJCHG: Process 1, Nbr 4.4.4.4 on Serial1/0
from FULL to DOWN, Neighbor Down: Interface down or detached

The adjacency will not come back up. Let's configure R4:
R4(config)#int s1/1
R4(config-if)#ip unnumbered lo 0
R4(config-if)#
*Mar 2 06:33:14.288: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Serial1/1
from LOADING to FULL, Loading Done

There we go! If one side is unnumbered, the other side needs to be also. I am running 12.4(7) so maybe this was not the case awhile ago, but right now it seems so. There are a few other mistakes in this chapter, especially in the beginning quiz - SO QUESTION (LAB) EVERYTHING!

SNMP - sending traps to specific hosts

This was an issue I ran into awhile ago. I was trying to send BGP traps to one host, and PIM traps to another. As you can see below, BGP traps were getting sent to both hosts when I used version 1.

When I had version 2c specified, traps were only sent to the host configured for BGP. I do not know if this is a difference in the protocol, but it is something you may want to be aware of if you need to send traps to different hosts.

Version 1, traps get sent to both hosts:
R1#sho run | inc snmp
snmp-server enable traps bgp
snmp-server enable traps pim
snmp-server host 2.2.2.2 public bgp
snmp-server host 3.3.3.3 public pim

R1#clear ip bgp *
R1#
00:11:49: %BGP-5-ADJCHANGE: neighbor 172.12.14.4 Down User reset
00:11:49: SNMP: Queuing packet to 2.2.2.2
00:11:49: SNMP: V1 Trap, ent bgp, addr 172.12.12.1, gentrap 6, spectrap 2
bgpPeerEntry.14.172.12.14.4 = 00 00
bgpPeerEntry.2.172.12.14.4 = 1
00:11:49: SNMP: Queuing packet to 3.3.3.3
00:11:49: SNMP: V1 Trap, ent bgp, addr 172.12.13.1, gentrap 6, spectrap 2
bgpPeerEntry.14.172.12.14.4 = 00 00
bgpPeerEntry.2.172.12.14.4 = 1
00:11:49: SNMP: Packet sent via UDP to 2.2.2.2
00:11:49: SNMP: Packet sent via UDP to 3.3.3.3
Version 2c, traps get sent to one as desired:
R1#sho run | inc snmp
snmp-server enable traps bgp
snmp-server enable traps pim
snmp-server host 2.2.2.2 version 2c public bgp
snmp-server host 3.3.3.3 version 2c public pim

R1#clear ip bgp *
R1#
00:13:09: %BGP-5-ADJCHANGE: neighbor 172.12.14.4 Down User reset
R1#
00:13:09: SNMP: Queuing packet to 2.2.2.2
00:13:09: SNMP: V2 Trap, reqid 21, errstat 0, erridx 0
sysUpTime.0 = 78967
snmpTrapOID.0 = bgpTraps.2
bgpPeerEntry.14.172.12.14.4 = 00 00
bgpPeerEntry.2.172.12.14.4 = 1
00:13:09: SNMP: Packet sent via UDP to 2.2.2.2
R1#

Wednesday, February 11, 2009

Messin' around with multicast boundary

I got a multicast lab in dynamips going so I thought I would just play around with some lesser known commands and learn how they actually work.

Here is the topology:

R5---R6---R1---R2---R3---R4

R1 = MA and RP for 232/8, 233/8, 234/8

R4#show ip pim rp mapping
PIM Group-to-RP Mappings

Group(s) 232.0.0.0/8
RP 1.1.1.1 (?), v2v1
Info source: 1.1.1.1 (?), elected via Auto-RP
Uptime: 00:10:08, expires: 00:02:48
Group(s) 233.0.0.0/8
RP 1.1.1.1 (?), v2v1
Info source: 1.1.1.1 (?), elected via Auto-RP
Uptime: 00:10:08, expires: 00:02:47
Group(s) 234.0.0.0/8
RP 1.1.1.1 (?), v2v1
Info source: 1.1.1.1 (?), elected via Auto-RP
Uptime: 00:10:08, expires: 00:02:46

R4 has the following on Loopback 0:

interface Loopback0
ip address 4.4.4.4 255.255.255.255
ip pim sparse-mode
ip igmp join-group 233.0.0.1
ip igmp join-group 234.0.0.1

R3 has set up a multicast boundary as follows:

access-list 1 permit 232.0.0.0 0.255.255.255
access-list 1 permit 233.0.0.0 0.255.255.255

interface Serial1/0
ip address 192.168.34.3 255.255.255.0
ip pim sparse-mode
ip multicast boundary 1

Now R3 only allows PIM joins that are in 232/8 or 233/8.

R3#sho ip mroute 234.0.0.1
Group 234.0.0.1 not found
R3#

Let's ping 233.0.0.1:

R6#ping 233.0.0.1 re 100

Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 233.0.0.1, timeout is 2 seconds:
......................................................................
..........

Whoa now, what gives? Well...remember we only allowed 2 groups...what does Auto-RP use to propagate messages? Group 224.0.1.40! So even if you start passing traffic to 233.0.0.1 after you enable the boundary, eventually R3 will lose state for the Auto-RP discovery group and R4 will lose the RP information. All multicast traffic will then fail the RPF check.

So here is our modified ACL on R3:

R3#sho run | inc access
access-list 1 permit 224.0.1.40
access-list 1 permit 233.0.0.0 0.255.255.255
access-list 1 permit 232.0.0.0 0.255.255.255

224.0.1.39 is what the MA's listen to so we don't need to worry about that for this example. Now we can ping:

R6#ping 233.0.0.1 re 2

Type escape sequence to abort.
Sending 2, 100-byte ICMP Echos to 233.0.0.1, timeout is 2 seconds:

Reply to request 0 from 192.168.34.4, 212 ms
Reply to request 0 from 192.168.34.4, 216 ms
Reply to request 1 from 192.168.34.4, 184 ms
Reply to request 1 from 192.168.34.4, 184 ms
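
The effect of the boundary ACL before and after adding 224.0.1.40 can be checked with a short Python sketch of standard ACL wildcard matching. The acl_permits helper is illustrative only; it models the permit entries plus the implicit deny at the end and nothing else about the boundary feature:

```python
import ipaddress

def acl_permits(acl, group):
    """Return True if a standard ACL of (address, wildcard) permits `group`."""
    g = int(ipaddress.IPv4Address(group))
    for base, wildcard in acl:
        # Bits set in the wildcard are "don't care"; compare only the rest.
        care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
        if (g & care) == (int(ipaddress.IPv4Address(base)) & care):
            return True
    return False  # implicit deny at the end of every ACL

original = [("232.0.0.0", "0.255.255.255"), ("233.0.0.0", "0.255.255.255")]
modified = [("224.0.1.40", "0.0.0.0")] + original

for group in ("233.0.0.1", "234.0.0.1", "224.0.1.40"):
    print(group, acl_permits(original, group), acl_permits(modified, group))
```

Only the Auto-RP discovery group changes between the two lists, which is the one entry R4 needs in order to keep learning the RP.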

Now this seems a little inefficient, right? Why should R4 even know about the RP if R3 is going to prevent mroute state from being created for 234.0.0.1 on that interface? If we could prevent R4 from learning that RP information, that would be great. Well, on R3 we can modify the boundary as follows:

R3(config)#int s1/0
R3(config-if)#ip multicast boundary 1 filter-autorp

Now R3 only sends RP information for the groups permitted in the ACL:

R4#show ip pim rp mapping
PIM Group-to-RP Mappings

Group(s) 232.0.0.0/8
RP 1.1.1.1 (?), v2v1
Info source: 1.1.1.1 (?), elected via Auto-RP
Uptime: 00:00:03, expires: 00:02:55
Group(s) 233.0.0.0/8
RP 1.1.1.1 (?), v2v1
Info source: 1.1.1.1 (?), elected via Auto-RP
Uptime: 00:00:03, expires: 00:02:53
R4#

Multicast TTL-Threshold

Maybe I am misunderstanding some things, but documents and books always say that the TTL of a packet must be higher than the threshold to be forwarded. From the 12.4 command reference:

ip multicast ttl-threshold

Usage Guidelines

"Only multicast packets with a TTL value greater than the threshold are forwarded out the interface."

Oh yeah?! I guess it depends on when you look at the TTL. Consider the network:

R1----R2----R3----R4

PIM-DM is enabled everywhere.
R4 has joined 239.0.0.1
R1 is sending pings which have 255 TTL when sent from R1.
R2 receives the PING, decrements the TTL to 254 before sending to R3.

So if we set TTL threshold to 254 on R2's interface to R3, it should block it right? No:

R2(config)#int s1/0
R2(config-if)#ip multicast ttl-threshold 254

R1#ping 239.0.0.1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:

Reply to request 0 from 192.168.34.4, 164 ms
R1#

The router will still pass packets that have a TTL equal to the threshold if it was the router that decremented the TTL to reach that value. Here we see 255 will fail:

R2(config)#int s1/0
R2(config-if)#ip multicast ttl-threshold 255

R1#ping 239.0.0.1

Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:
.
R1#
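
One reading that fits both results is that the comparison uses the TTL the packet arrived with, before the router decrements it. The Python sketch below compares that reading with the literal "after decrement" reading. Both functions are my own hypothetical models of the observed lab behavior, not a description of IOS internals:

```python
def passes_before_decrement(arrival_ttl, threshold):
    """Compare the TTL as received, then decrement and forward."""
    return arrival_ttl > threshold

def passes_after_decrement(arrival_ttl, threshold):
    """Decrement first, then require the outgoing TTL to exceed the threshold."""
    return (arrival_ttl - 1) > threshold

# R1 sends with TTL 255, so that is what R2 receives.
for threshold in (254, 255):
    print(threshold,
          passes_before_decrement(255, threshold),
          passes_after_decrement(255, threshold))
```

Only the first function reproduces what the pings showed: forwarded at a threshold of 254, dropped at 255.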

Sunday, February 8, 2009

IE Mock Lab 4 Review

If you plan on taking this lab, don't read this post as it may contain some spoilers. I took this lab yesterday and did okay, though I could have done a lot better. I got a 73, but I finished in about 4 hours. After verifying the whole lab for the next 2 hours, I didn't really make any major changes. In fact, the only error I noticed was that I had an OSPF key configured wrong. Turns out, there was more...

"Do not configure anything on SW3 for this task." This refers to every thing in this task!

In a hub-and-spoke topology, we were only allowed one map statement on one of the spokes. This means we need that map statement for L3/L2 resolution to the other spoke, then rely on INARP for spoke-to-hub resolution. On the hub, I mapped my local IP for self-ping (which was not required), which in effect turns off INARP for that IP. The bottom line is INARP has to be enabled on one end. My mappings did show up dynamically, but the grader said this wouldn't work after a reboot. I am going to lab this up again and verify.

I missed 3 tasks (out of 13) in the IGP section. One was impossible (IMO), but by looking at the SG it appears the task itself was worded incorrectly. It had to do with summarizing in OSPF on R3 and R5. The SG has them summarizing SW2 and R5, which are in the same area, so it would have worked. The task said to summarize R3 and R5, which are not in the same area. Another IGP task required a tunnel with new addressing. I think this violates the rule (clearly stated at the beginning) that we are not allowed to add any addresses. Lastly, I failed to redistribute a BB link into an IGP.

My traffic filter in the security task was fine except I didn't allow IGMP, which then caused me to miss one multicast task. Two Birds, One stone. Meh.

The other two tasks I missed involved TFTP: Limiting access to a router's config via SNMP, and TFTP boot. These were definitely doable, I was just unfamiliar with a couple commands and configured "half-solutions" which are just as good as "no-solutions" :-)

This lab was rated a 9 (from what I hear the real thing is about a 7 or so) and IE recommends taking it within the final month of preparation. The grade report wasn't too detailed but then again, there was not a whole lot to explain. The tasks I did miss were very simple mistakes. There is also a report that says how well you did in relation to other people who took this lab. For example, I got an 11% in NAT, which means 89% of people also got this right. Each task has a breakdown like this.

I try not to put too much stock in that though. I just want to learn from my mistakes and work on time management. I know a couple people who failed miserably on mocks and then passed the lab. And I am sure there are people who did great on mocks, then failed the real thing. I would rather be part of that first group :-)

Wednesday, February 4, 2009

Overlapping/Duplicate AS-External-LSA IDs

I was reading OSPF: Anatomy of an Internet Routing Protocol by John T. Moy today and I came across an issue with AS-external LSA Link-State IDs. The LSA uses the network address as the identifier. If one router was to generate multiple Type 5 LSA's with the same network number but different masks, only 1 would be advertised because the LSA ID would be the same.

The book was published in 1998 and at the time there was no way of dealing with this. After doing this lab, I realized there was a way and it had since been documented in Appendix E of RFC 2328:

RFC 2328 Appendix E

Here I create 3 static routes, that all end up with the same network number and would normally have the same LSA ID:

R1(config)#ip route 192.9.0.0 255.255.0.0 Null0
R1(config)#ip route 192.9.0.0 255.255.254.0 Null0
R1(config)#ip route 192.9.0.0 255.255.255.0 Null0
R1(config)#router ospf 1
R1(config-router)#redistribute static subnets

Let's see what the LSA IDs are:

R1#sho ip osp database | inc 192.9
192.9.0.0 1.1.1.1 246 0x80000001 0x00933F 0
192.9.0.255 1.1.1.1 149 0x80000001 0x00933F 0
192.9.1.255 1.1.1.1 234 0x80000001 0x00834F 0
R1#

R1#sho ip ospf database external 192.9.0.0

OSPF Router with ID (1.1.1.1) (Process ID 1)

Type-5 AS External Link States

LS age: 14
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 192.9.0.0 (External Network Number )
Advertising Router: 1.1.1.1
LS Seq Number: 80000003
Checksum: 0x8F41
Length: 36
Network Mask: /16
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 0

The router gives the last 2 networks the broadcast address of their respective network as the Link State ID. The /16 network got the network address as the ID. I wonder if order of operations has anything to do with it.

R1(config)#no ip route 192.9.0.0 255.255.0.0 Null0
R1(config)#no ip route 192.9.0.0 255.255.254.0 Null0
R1(config)#no ip route 192.9.0.0 255.255.255.0 Null0
R1(config)#ip route 192.9.0.0 255.255.255.0 Null0

Ok, so now the /24 is the only one in there and it is using 192.9.0.0 as its ID:

R1#sho ip osp database external 192.9.0.0

OSPF Router with ID (1.1.1.1) (Process ID 1)

Type-5 AS External Link States

LS age: 36
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 192.9.0.0 (External Network Number )
Advertising Router: 1.1.1.1
LS Seq Number: 80000001
Checksum: 0x933F
Length: 36
Network Mask: /24
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 0

What happens if we add a /16 now?

R1(config)#ip route 192.9.0.0 255.255.0.0 Null0

R1#sho ip osp database external 192.9.0.0

OSPF Router with ID (1.1.1.1) (Process ID 1)

Type-5 AS External Link States

LS age: 12
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 192.9.0.0 (External Network Number )
Advertising Router: 1.1.1.1
LS Seq Number: 80000002
Checksum: 0x9140
Length: 36
Network Mask: /16
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 0

R1#

The /16 stole the ID from the /24!

R1#sho ip osp database | inc 192.9
192.9.0.0 1.1.1.1 45 0x80000002 0x009140 0
192.9.0.255 1.1.1.1 45 0x80000001 0x00933F 0
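
Both experiments line up with a simplified reading of the RFC 2328 Appendix E procedure: the shorter mask keeps the network address as its Link State ID, and the longer mask gets the address with its host bits set. Here is a Python sketch of that reading. The originate helper is hypothetical and ignores the corner cases the appendix does not resolve:

```python
import ipaddress

def originate(lsdb, prefix):
    """Assign a Link State ID to `prefix`, renumbering on a collision.

    lsdb maps a Link State ID (IPv4Address) to the network it describes.
    """
    net = ipaddress.IPv4Network(prefix)
    na = net.network_address
    existing = lsdb.get(na)
    if existing is None or existing == net:
        lsdb[na] = net
    elif net.prefixlen > existing.prefixlen:
        # New network is more specific: it takes the host-bits-set ID.
        lsdb[net.broadcast_address] = net
    else:
        # New network is less specific: it takes over the plain ID and
        # the existing network is re-originated with host bits set.
        lsdb[na] = net
        lsdb[existing.broadcast_address] = existing
    return lsdb

def ids(lsdb):
    """Render the database as {link_state_id: network} strings."""
    return {str(k): str(v) for k, v in lsdb.items()}

first = {}
for p in ("192.9.0.0/16", "192.9.0.0/23", "192.9.0.0/24"):
    originate(first, p)
print(ids(first))

second = {}
for p in ("192.9.0.0/24", "192.9.0.0/16"):
    originate(second, p)
print(ids(second))
```

In the second run the /24 starts out with 192.9.0.0 and is moved to 192.9.0.255 once the /16 shows up, which is the "stolen ID" seen above.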

Tuesday, February 3, 2009

How OSPF transit capability can prevent virtual-link routing loops

I ran into the command "capability transit" some time ago but never really understood how it worked. The explanation in the RFC and the DocCD may seem pretty vague unless you understand what issues cause it to be necessary or desirable. It is on by default so you probably will never have any issues with it, but I find it an interesting feature to look into. And by doing so, you tend to learn more about how OSPF works.

In this lab, I turn it off so we can see what issues arise. We will focus on R2's path to R4's loopback of 4.4.4.4. Each router's interface IP address ends with the router number so we can easily tell where traffic is flowing. Here is the topology:


I disabled capability transit on all routers, but I found that in this lab R2 is where the action is, so that might be the only place we need to do it:

router ospf 1
no capability transit

Now we begin...

R1 has a virtual link to R3 in order to connect area 234 to area 0. This works fine. R3 has become an ABR and R2 will use R3 to get to R4's loopback:

R2#sho ip route 4.4.4.4
Routing entry for 4.4.4.4/32
Known via "ospf 1", distance 110, metric 66, type inter area
Last update from 192.168.23.3 on Serial1/0, 00:00:07 ago
Routing Descriptor Blocks:
* 192.168.23.3, from 3.3.3.3, 00:00:07 ago, via Serial1/0
Route metric is 66, traffic share count is 1



Now let's say R2 needs to add a network to area 2 as follows

R2(config)#int lo 0
R2(config-if)#ip address 2.2.2.2 255.255.255.255
R2(config-if)#ip ospf 1 area 2

Since R2 does not have an interface in area 0 we can build a virtual-link to R1:

R2(config)#router ospf 1
R2(config-router)#area 123 virtual-link 1.1.1.1

R1(config)#router ospf 1
R1(config-router)#area 123 virtual-link 2.2.2.2
*Mar 1 00:59:19.191: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on
OSPF_VL3 from LOADING to FULL, Loading Done


Perfect, right?

Let's take a look at that route towards R4 again:

R2#sho ip route 4.4.4.4
Routing entry for 4.4.4.4/32
Known via "ospf 1", distance 110, metric 194, type inter area
Last update from 192.168.12.1 on Serial1/1, 00:00:04 ago
Routing Descriptor Blocks:
* 192.168.12.1, from 3.3.3.3, 00:00:04 ago, via Serial1/1
Route metric is 194, traffic share count is 1

Oh-no...Let's trace:

R2#trace 4.4.4.4

Type escape sequence to abort.
Tracing the route to 4.4.4.4

1 192.168.12.1 72 msec 24 msec 8 msec
2 192.168.12.2 56 msec 20 msec 84 msec
3 * * *
4 * * *

We have a loop, all right. To fix it, on R2:

R2(config)#router ospf 1
R2(config-router)#capability transit

R2#clear ip ospf process
Reset ALL OSPF processes? [no]: yes
R2#

A few moments later:

R2#sho ip route 4.4.4.4
Routing entry for 4.4.4.4/32
Known via "ospf 1", distance 110, metric 66, type inter area
Last update from 192.168.23.3 on Serial1/0, 00:00:18 ago
Routing Descriptor Blocks:
* 192.168.23.3, from 3.3.3.3, 00:00:18 ago, via Serial1/0
Route metric is 66, traffic share count is 1

From cisco.com

"The OSPF Area Transit Capability feature provides an OSPF Area Border Router (ABR) with the ability to discover shorter paths through the transit area for forwarding traffic that would normally need to travel through the virtual-link path."

OSPF Area Transit Capability

So in this case, we have allowed R2 to use its direct path to R3 instead of its own path through the backbone area. We have basically made area 123 a transit area that can carry traffic to destinations not in its own area. We are flowing from Area 0 (R2 is an ABR now) to Area 123 to Area 234!

Since this command is enabled by default on recent IOS versions, I am not sure you would ever run into this issue in the lab. However, it is still an interesting feature and it is always good to know what's really going on under the hood :-)

Monday, February 2, 2009

3560 QoS: Per-port per-vlan policing

I know the name is scary, but I do dig Catalyst QoS. This is the second of back-to-back posts on the subject. This one is a little more complex than classification, so I decided on a Visio for it:


Per-VLAN policing in the 3560s is different from the 3550s because there is no "match VLAN" clause available. Instead you create hierarchical policies and attach them to the SVI.

Here is the scenario:

VLAN100 will be policed to 64k (192.168.100.0/24)
VLAN200 will be policed to 128k (192.168.200.0/24)

Because of bursts, I was not able to get these exact rates, but you will see how these policies are applied and the effect they have on traffic flow. Plus you can always play with the burst sizes on your own :)

Here is the tracker I created on R2:

access-list 1 permit 192.168.100.1
access-list 1 permit 192.168.100.3
access-list 2 permit 192.168.200.5
!
class-map match-any VLAN100
match access-group 1
class-map match-any VLAN200
match access-group 2
!
policy-map TRACKER
class VLAN100
class VLAN200
!
interface Ethernet0/0
no ip address
load-interval 30
full-duplex
!
interface Ethernet0/0.100
encapsulation dot1Q 100
ip address 192.168.100.2 255.255.255.0
service-policy input TRACKER
!
interface Ethernet0/0.200
encapsulation dot1Q 200
ip address 192.168.200.2 255.255.255.0
service-policy input TRACKER

All configuration is being done on SW2. There really is not an order of operations to follow, but basically you just need to make sure class-maps and policy-maps are created before you apply them. The logical flow is what you want to get used to. Otherwise you will be jumping into and out of classes and policies, reconfiguring them like I did :)

At our child (aka "second") level we have a class-map that matches the interface and we have our policer. The interface matching here is what is referred to in the first clause of "per-port per-vlan" policing.

class-map match-all TRUNK
match input-interface FastEthernet0/13
!
policy-map VLAN100-POLICER
class TRUNK
police 64000 12000 exceed-action drop
policy-map VLAN200-POLICER
class TRUNK
police 128000 24000 exceed-action drop

As far as I know, this "bottom" or "second" level class-map can only match input-interface. And this second level policy must be a policer.
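As a quick sanity check on the burst sizes, the sketch below works out how long one full burst lasts at the policed rate. It assumes the first number in the "police" command is the rate in bits per second and the second is the burst in bytes; the function and variable names are mine.

```python
# (rate in bits per second, burst in bytes), as in the "police" lines above.
policers = {
    "VLAN100-POLICER": (64_000, 12_000),
    "VLAN200-POLICER": (128_000, 24_000),
}

def burst_seconds(rate_bps, burst_bytes):
    """Time needed to send one full burst at the policed rate."""
    return burst_bytes * 8 / rate_bps

for name, (rate, burst) in policers.items():
    print(f"{name}: burst covers {burst_seconds(rate, burst):.1f} s at {rate} bps")
```

Under that assumption both policers allow the same burst duration, so the two VLANs differ only in rate.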

Now, at the parent level we create a new class to match IP traffic and then apply our child policies below that. This top-level class must match an ACL (match protocol ip gave me errors when applying the policy).

access-list 100 permit ip any any
!
class-map match-all IP
match access-group 100
!
policy-map VLAN100-PARENT
class IP
set ip precedence 1
service-policy VLAN100-POLICER
policy-map VLAN200-PARENT
class IP
set ip precedence 2
service-policy VLAN200-POLICER

Notice that I have the "set ip precedence" clause in our parent policies. These first level policies are required to have an action. You will get an error message stating this if you try to apply it to the SVI without an action:

SW2(config)#int vlan 100
SW2(config-if)#service-policy input VLAN100-PARENT
%QoS: No action is configured in the policymap VLAN100-PARENT classmap IP, or it is being modified.


So make sure you have a set or trust clause in there. Now we can apply them to the SVIs:

mls qos
!
interface FastEthernet0/13
mls qos vlan-based
!
interface Vlan100
no ip address
service-policy input VLAN100-PARENT
!
interface Vlan200
no ip address
service-policy input VLAN200-PARENT

From R1, R3 and R5 I will send a bunch of pings to R2:

R1#ping 192.168.100.2 re 1000000
R3#ping 192.168.100.2 re 1000000
R5#ping 192.168.200.2 re 1000000

Let's look at R2 after a few minutes.

R2#sho policy-map interface e0/0.100 | section VLAN100
Class-map: VLAN100 (match-any)
107819 packets, 12722642 bytes
30 second offered rate 50000 bps
Match: access-group 1
107819 packets, 12722642 bytes
30 second rate 50000 bps

R2#sho policy-map interface e0/0.200 | section VLAN200
Class-map: VLAN200 (match-any)
156873 packets, 18511014 bytes
30 second offered rate 107000 bps
Match: access-group 2
156873 packets, 18511014 bytes
30 second rate 107000 bps

We don't see the limits of 64k and 128k being reached, but the drops on the senders indicate that policing is working. And we can also tell VLAN 200 is getting roughly twice the bandwidth that VLAN 100 is getting. We could get closer to the limit by adjusting the burst sizes appropriately.
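For a rough idea of how close we got, this sketch compares the offered rates R2 reported with the configured policer rates. The numbers are copied from the outputs above; the variable names are just for illustration.

```python
# Offered rates reported by R2 (bps) versus the policer rates configured on SW2.
observed = {"VLAN100": 50_000, "VLAN200": 107_000}
configured = {"VLAN100": 64_000, "VLAN200": 128_000}

for vlan in observed:
    share = observed[vlan] / configured[vlan]
    print(f"{vlan}: {observed[vlan]} of {configured[vlan]} bps ({share:.0%})")

# How much more VLAN 200 gets through compared with VLAN 100.
ratio = observed["VLAN200"] / observed["VLAN100"]
print(f"VLAN200/VLAN100 ratio: {ratio:.2f}")
```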

Key things to remember:
  • Child classes use match input-interface
  • Child policies use police
  • Parent classes match ACL (I think you can also match dscp, maybe others)
  • Parent policies must have an action (e.g. set or trust)
  • Apply parent policies to SVI
I strongly recommend getting your hands dirty with these configurations if you want to master them. I read a lot about switch qos, but it wasn't until I started playing around with scenarios like this that I got a better understanding of how to do it and what is required. If we truly understand what each QoS method does, then we should have no trouble deciphering what we are asked to do on the lab :)

3560 QoS: VLAN-Based Classification

This is a topic I learned about while reading blogs over at IE. Here is the original:

Comparing Traffic Policing Features in the 3550 and 3560 switches

I have the following topology:

R1----|
R3---SW1---SW2---R2
R5----|

R1,R3 are in vlan 100, 192.168.100.0/24
R5 is in vlan 200, 192.168.200.0/24

R2 is on a trunked port with the following configuration:

interface Ethernet0/0.100
encapsulation dot1Q 100
ip address 192.168.100.2 255.255.255.0
ip accounting precedence input
no snmp trap link-status
!
interface Ethernet0/0.200
encapsulation dot1Q 200
ip address 192.168.200.2 255.255.255.0
ip accounting precedence input
no snmp trap link-status

On SW2 we will enable vlan-based qos and then mark traffic based on ACLs. First we make the ACLs:

ip access-list extended ICMP
permit icmp any any
ip access-list extended TCP
permit tcp any any

Next we make our class-maps and policy-maps:

class-map match-all ICMP
match access-group name ICMP
class-map match-all TCP
match access-group name TCP

policy-map VLAN
class TCP
set ip precedence 5
class ICMP
set ip precedence 3

Next enable mls qos, vlan-based qos and apply the policy to an SVI. Note that the SVI does not need an IP address:

mls qos

interface FastEthernet0/13
switchport trunk encapsulation dot1q
switchport trunk native vlan 50
switchport mode trunk
mls qos vlan-based

int vlan 100
service-policy input VLAN
int vlan 200
service-policy input VLAN

Now run some tests. Here I ping and telnet from R5, telnet from R1, and then ping from R3:

R5#ping 192.168.200.2 rep 100

Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.200.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 1/3/4 ms
R5#

R5#telnet 192.168.200.2
Trying 192.168.200.2 ... Open

R2>exit

[Connection to 192.168.200.2 closed by foreign host]
R5#

R1#telnet 192.168.100.2
Trying 192.168.100.2 ... Open

R2>exit

[Connection to 192.168.100.2 closed by foreign host]
R1#

R3#ping 192.168.100.2 re 50

Type escape sequence to abort.
Sending 50, 100-byte ICMP Echos to 192.168.100.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (50/50), round-trip min/avg/max = 1/3/4 ms
R3#

Verify on R2:

R2#sho int precedence
Ethernet0/0.100
Input
Precedence 3: 50 packets, 5900 bytes
Precedence 5: 46 packets, 2953 bytes
Ethernet0/0.200
Input
Precedence 3: 100 packets, 11800 bytes
Precedence 5: 15 packets, 969 bytes
R2#
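The counters can be cross-checked against the tests with a few lines of Python. This is only a sketch over the numbers shown above. The 118 bytes per precedence 3 packet would be consistent with a 100-byte ping plus 18 bytes of Ethernet and 802.1Q header, but that breakdown is my assumption, not something the output states.

```python
# Counters copied from "sho int precedence" on R2: (packets, bytes).
counters = {
    ("Ethernet0/0.100", 3): (50, 5900),
    ("Ethernet0/0.100", 5): (46, 2953),
    ("Ethernet0/0.200", 3): (100, 11800),
    ("Ethernet0/0.200", 5): (15, 969),
}
# Pings sent in the tests above, per subinterface (R3 on VLAN 100, R5 on VLAN 200).
pings_sent = {"Ethernet0/0.100": 50, "Ethernet0/0.200": 100}

for (intf, prec), (pkts, nbytes) in counters.items():
    print(f"{intf} precedence {prec}: {pkts} packets, {nbytes / pkts:.1f} bytes/packet")

# Every ping should show up as a precedence 3 packet if the ICMP class matched.
icmp_marked = {intf: counters[(intf, 3)][0] for intf in pings_sent}
print("ICMP counts match pings sent:", icmp_marked == pings_sent)
```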

Sunday, February 1, 2009

TCP Load Balancing, Destination NAT

The "ip nat inside destination" command can be used to split up the load from what looks like one global destination across several inside hosts. This behaves very much like server load balancing, just without all the health checks.

Below is the topology. I have static default routes from R1, R2, and R3 pointing to R4. R7 has a static route to each serial link.


Here is R4's config:

interface FastEthernet0/0
ip address 192.168.0.4 255.255.255.0
ip nat inside
ip virtual-reassembly
!
interface Serial1/0
ip address 192.168.45.4 255.255.255.0
ip verify unicast reverse-path
ip nat outside
ip virtual-reassembly
serial restart-delay 0
!
interface Serial1/1
ip address 192.168.46.4 255.255.255.0
ip verify unicast reverse-path
ip nat outside
ip virtual-reassembly
!
ip route 0.0.0.0 0.0.0.0 192.168.45.5
ip route 0.0.0.0 0.0.0.0 192.168.46.6
!
ip nat pool POOL 192.168.0.1 192.168.0.3 prefix-length 24 type rotary
ip nat inside destination list 10 pool POOL
!
access-list 10 permit 192.168.45.10
access-list 10 permit 192.168.46.10

From R7 we will verify:

R7#telnet 192.168.45.10
Trying 192.168.45.10 ... Open

R1>
R1>exit

[Connection to 192.168.45.10 closed by foreign host]
R7#telnet 192.168.45.10
Trying 192.168.45.10 ... Open

R2>exit

[Connection to 192.168.45.10 closed by foreign host]
R7#telnet 192.168.45.10
Trying 192.168.45.10 ... Open

R3>exit

[Connection to 192.168.45.10 closed by foreign host]
R7#telnet 192.168.46.10
Trying 192.168.46.10 ... Open

R1>exit

[Connection to 192.168.46.10 closed by foreign host]
R7#telnet 192.168.46.10
Trying 192.168.46.10 ... Open

R2>exit

[Connection to 192.168.46.10 closed by foreign host]
R7#

R4's NAT table:

R4#sho ip nat translations
Pro Inside global Inside local Outside local Outside global
tcp 192.168.45.10:23 192.168.0.1:23 200.0.0.7:51519 200.0.0.7:51519
tcp 192.168.46.10:23 192.168.0.1:23 200.0.0.7:64139 200.0.0.7:64139
tcp 192.168.46.10:23 192.168.0.2:23 200.0.0.7:11691 200.0.0.7:11691
tcp 192.168.45.10:23 192.168.0.2:23 200.0.0.7:62913 200.0.0.7:62913
tcp 192.168.45.10:23 192.168.0.3:23 200.0.0.7:17295 200.0.0.7:17295

I used two links just to show the flexibility of this configuration. I was playing around with route-map NAT failover/LB and then decided to work on this scenario.
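The rotation seen in the telnet sessions can be modeled as simple round-robin over the pool. This is a simplified sketch, not IOS's actual implementation: it assumes one shared rotation across both global addresses, which is what the NAT table above suggests. The function name rotary_assign is made up for the example.

```python
from itertools import cycle

# Inside local addresses from "ip nat pool POOL ... type rotary".
pool = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]

# Global destinations R7 connected to, in the order of the telnet sessions above.
sessions = [
    "192.168.45.10", "192.168.45.10", "192.168.45.10",
    "192.168.46.10", "192.168.46.10",
]

def rotary_assign(destinations, pool):
    """Pair each new connection with the next pool address, round-robin."""
    next_host = cycle(pool)
    return [(dst, next(next_host)) for dst in destinations]

assignments = rotary_assign(sessions, pool)
for dst, inside in assignments:
    print(f"{dst}:23 -> {inside}:23")
```

The resulting order of inside hosts lines up with the R1, R2, R3, R1, R2 prompts seen from R7.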

NTP - How long is too long?

This is how long I waited for NTP to sync today:

R2(config)#ntp server 136.10.4.4
R2(config)#^Z
Feb 1 19:26:53.915: %SYS-5-CONFIG_I: Configured from console by console

Feb 1 19:37:11.852: NTP Core(NOTICE): Clock is synchronized.


More than 10 minutes. It should be noted that the clocks were only seconds apart to begin with. Code on these routers is 12.4(22)T. I don't know if I have ever waited so long but it's unbelievably ridiculous.
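For the exact wait, the two log timestamps can be subtracted directly. The logs do not print a year, so 2009 is assumed from the post date; it does not affect the difference.

```python
from datetime import datetime

# Timestamps from the CONFIG_I and "Clock is synchronized" log lines above.
# The year is an assumption; the router logs only show "Feb 1".
configured = datetime(2009, 2, 1, 19, 26, 53, 915000)
synced = datetime(2009, 2, 1, 19, 37, 11, 852000)

elapsed = synced - configured
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"Waited {int(minutes)} min {seconds:.3f} s for sync")
```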

Then I enable authentication:

R4(config)#ntp authentication-key 1 md5 ipexpert

R2(config)#ntp authentication-key 1 md5 ipexpert
R2(config)#ntp trusted-key 1
R2(config)#ntp authenticate
R2(config)#ntp server 136.10.4.4 key 1

Feb 1 19:45:02.628: NTP Core(INFO): key (1) added.
Feb 1 19:45:02.752: NTP Core(INFO): key (1) marked as trusted.
Feb 1 19:45:03.276: NTP Core(INFO): system event 'event_clock_reset' (0x05) status 'sync_alarm, sync_unspec, 10 events, event_peer/strat_chg' (0xC0A4)
Feb 1 19:45:03.276: NTP Core(NOTICE): Clock synchronization lost.


Peers never come up; I get this every so often (debug ntp all):

.Feb 1 19:45:47.852: NTP message sent to 136.10.4.4, from interface 'Loopback0' (136.10.2.2).
.Feb 1 19:45:47.852: NTP message received from 136.10.4.4 on interface 'Loopback0' (136.10.2.2).
.Feb 1 19:45:47.852: NTP Core(DEBUG): ntp_receive: message received
.Feb 1 19:45:47.852: NTP Core(DEBUG): ntp_receive: peer is 0x674B9DF8, next action is 1.
.Feb 1 19:45:47.852: NTP Core(NOTICE): ntp_receive: dropping message: crypto-NAK.

.Feb 1 19:50:52.852: NTP message sent to 136.10.4.4, from interface 'Loopback0' (136.10.2.2).
.Feb 1 19:50:52.852: NTP message received from 136.10.4.4 on interface 'Loopback0' (136.10.2.2).
.Feb 1 19:50:52.852: NTP Core(DEBUG): ntp_receive: message received
.Feb 1 19:50:52.852: NTP Core(DEBUG): ntp_receive: peer is 0x674B9DF8, next action is 1.
.Feb 1 19:50:52.852: NTP Core(NOTICE): ntp_receive: dropping message: crypto-NAK.


Here we are, still:

.Feb 1 19:58:19.851: NTP message sent to 136.10.4.4, from interface 'Loopback0' (136.10.2.2).
.Feb 1 19:58:19.851: NTP message received from 136.10.4.4 on interface 'Loopback0' (136.10.2.2).
.Feb 1 19:58:19.851: NTP Core(DEBUG): ntp_receive: message received
.Feb 1 19:58:19.851: NTP Core(DEBUG): ntp_receive: peer is 0x674B9DF8, next action is 1.
.Feb 1 19:58:19.851: NTP Core(NOTICE): ntp_receive: dropping message: crypto-NAK.

So, for kicks on the master I do this:

R4(config)#ntp authenticate
R4(config)#ntp trusted-key 1


I now get a new message on R2:

.Feb 1 20:01:20.851: NTP message sent to 136.10.4.4, from interface 'Loopback0' (136.10.2.2).
.Feb 1 20:01:20.851: NTP message received from 136.10.4.4 on interface 'Loopback0' (136.10.2.2).
.Feb 1 20:01:20.851: NTP Core(DEBUG): ntp_receive: message received
.Feb 1 20:01:20.851: NTP Core(DEBUG): ntp_receive: peer is 0x674B9DF8, next action is 1.
.Feb 1 20:01:20.851: NTP Core(DEBUG): receive: packet given to process_packet


This looks promising:

R2#
.Feb 1 20:03:30.851: NTP message sent to 136.10.4.4, from interface 'Loopback0' (136.10.2.2).
.Feb 1 20:03:30.851: NTP message received from 136.10.4.4 on interface 'Loopback0' (136.10.2.2).
.Feb 1 20:03:30.851: NTP Core(DEBUG): ntp_receive: message received
.Feb 1 20:03:30.851: NTP Core(DEBUG): ntp_receive: peer is 0x674B9DF8, next action is 1.
.Feb 1 20:03:30.851: NTP Core(DEBUG): receive: packet given to process_packet
.Feb 1 20:03:30.851: NTP Core(DEBUG): Peer becomes reachable, poll set to 6.
.Feb 1 20:03:30.851: NTP Core(INFO): peer 136.10.4.4 event 'event_reach' (0x84) status 'unreach, conf, auth, 1 event, event_reach' (0xE014)


TA-DA!

.Feb 1 20:06:43.851: NTP Core(NOTICE): Clock is synchronized.

I have never had to enable trusted-key on the master before. Watch this:

R4(config)#no ntp trusted-key 1

Back on R2:

R2#
Feb 1 20:07:47.851: NTP message sent to 136.10.4.4, from interface 'Loopback0' (136.10.2.2).
Feb 1 20:07:47.851: NTP message received from 136.10.4.4 on interface 'Loopback0' (136.10.2.2).
Feb 1 20:07:47.851: NTP Core(DEBUG): ntp_receive: message received
Feb 1 20:07:47.851: NTP Core(DEBUG): ntp_receive: peer is 0x674B9DF8, next action is 1.
Feb 1 20:07:47.851: NTP Core(INFO): system event 'event_clock_reset' (0x05) status 'sync_alarm, sync_unspec, 15 events, event_peer/strat_chg' (0xC0F4)
Feb 1 20:07:47.851: NTP Core(NOTICE): Clock synchronization lost.
.Feb 1 20:07:47.851: NTP Core(NOTICE): ntp_receive: dropping message: crypto-NAK.


Maybe something has changed in this T train, but it looks like we need "ntp trusted-key" on the Master now. I am not an NTP guru by any means, but if you look at some of my other NTP blogs, you will see I didn't need this command. Note that I only needed "trusted-key" on the Master, not "ntp authenticate", even though I showed it above. Removing "ntp authenticate" from the Master did not cause sync loss. Something to keep in mind if you find yourself singing the NTP blues.

Oh, and while you are waiting for the sync - go configure something else in the meantime!