Problem: the routing, forwarding, and management protocols that we run in data centers were designed for the general LAN setting and are proving inadequate along a number of dimensions.
With that in mind, there are five requirements (R1 to R5) for future data center networks:
- R1: Any VM may migrate to any physical machine without changing its IP address (changing it would break pre-existing TCP connections and application-level state).
- R2: An administrator should not need to configure any switch before deployment (otherwise, switches would have to be reconfigured whenever they are moved).
- R3: Any end host should be able to communicate efficiently with any other end host along any of the available physical paths (fault tolerance).
- R4: There should be no forwarding loops (especially important in a data center carrying huge volumes of traffic).
- R5: Failure detection should be rapid and efficient.
R1 and R2 require a single layer-2 fabric => a VM's IP address is unaffected when it migrates.
R3 requires MAC forwarding tables with a very large number of entries => impractical with current switch hardware.
R5 requires an efficient routing protocol.
Forwarding
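Layer 3: small forwarding tables (because IP addresses are assigned hierarchically) and failures are easily detected, but adding a new switch imposes an administrative burden.
Layer 2: less administrative overhead, but poor scalability (flat MAC addresses cannot be aggregated, so a switch needs one entry per host).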
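To make the table-size contrast concrete, here is a small hypothetical calculation (not taken from the paper): a switch that must reach every host needs one entry per host with flat addresses, but with hierarchical addresses it can aggregate each remote pod into a single prefix. The pod and host counts are assumptions, chosen to match a fat tree built from 48-port switches (48 pods of 576 hosts).

```python
# Hypothetical illustration: forwarding-table entries a switch needs to reach
# every host, under flat vs. hierarchical addressing. Assumes hosts are spread
# evenly over pods and that each remote pod collapses into one prefix.

def flat_entries(num_pods: int, hosts_per_pod: int) -> int:
    """Flat (layer-2 style) addresses: one entry per host in the network."""
    return num_pods * hosts_per_pod


def hierarchical_entries(num_pods: int, hosts_per_pod: int) -> int:
    """Hierarchical (layer-3 style) addresses: one aggregated prefix per
    remote pod, plus one entry per host inside the local pod."""
    return (num_pods - 1) + hosts_per_pod


if __name__ == "__main__":
    pods, hosts = 48, 576
    print(flat_entries(pods, hosts))          # 27648
    print(hierarchical_entries(pods, hosts))  # 623
```

Real routing tables hold other routes too; the point is only that aggregation makes the count grow with the number of pods rather than the number of hosts.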
PortLand => Ethernet-compatible forwarding, routing, and ARP, with the goal of meeting R1 to R5.
- Scalable layer-2 routing, forwarding, and addressing
- A fabric manager maintains mappings between IP addresses and PMACs. A pseudo MAC (PMAC) is a hierarchical address that encodes a host's location in the topology => efficient forwarding and routing, as well as VM migration.
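A sketch of how a hierarchical PMAC can be packed and unpacked, assuming the 48-bit pod.position.port.vmid layout described in the PortLand paper (16, 8, 8, and 16 bits respectively). The function names are hypothetical.

```python
# Pack/unpack a PMAC, assuming the pod.position.port.vmid layout
# (16 + 8 + 8 + 16 = 48 bits, pod in the most significant bits).

def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    """Pack the four location fields into a colon-separated MAC string."""
    for value, bits in ((pod, 16), (position, 8), (port, 8), (vmid, 16)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    raw = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in raw.to_bytes(6, "big"))


def decode_pmac(pmac: str) -> tuple[int, int, int, int]:
    """Recover (pod, position, port, vmid) from a PMAC string."""
    raw = int.from_bytes(bytes.fromhex(pmac.replace(":", "")), "big")
    return ((raw >> 32) & 0xFFFF, (raw >> 24) & 0xFF,
            (raw >> 16) & 0xFF, raw & 0xFFFF)


if __name__ == "__main__":
    pmac = encode_pmac(2, 1, 3, 1)
    print(pmac)               # 00:02:01:03:00:01
    print(decode_pmac(pmac))  # (2, 1, 3, 1)
```

Because the pod occupies the leading bits, a core switch can forward on that field alone (one entry per pod) instead of one entry per host.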
How does it work?
Case 1: a packet from a host whose MAC address is unknown arrives at that host's ingress switch (IS)
1 - The IS creates an entry in its local PMAC table mapping the host's IP and actual MAC to a newly assigned PMAC
2 - The IS sends this mapping to the fabric manager
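The ingress switch rewrites the source address of the host's outgoing frames from its actual MAC to its PMAC, so the rest of the fabric only sees hierarchical addresses.
The egress switch rewrites the destination PMAC back to the actual MAC for any traffic destined to a host connected to that switch, which maintains the illusion of an unmodified MAC address at the destination host.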
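The Case 1 steps and the two rewrites can be sketched as follows. This is a minimal model under stated assumptions: frames are dicts, the fabric manager is reduced to an IP-to-PMAC dictionary, PMACs use the pod.position.port.vmid layout, and vmids are handed out per port starting from 0. The class and method names are hypothetical, and the local table here is keyed by actual MAC only, whereas the notes describe it as mapping both IP and MAC.

```python
class FabricManager:
    """Hypothetical stand-in that only stores IP -> PMAC registrations."""

    def __init__(self):
        self.ip_to_pmac = {}

    def register(self, ip, pmac):
        self.ip_to_pmac[ip] = pmac


class EdgeSwitch:
    def __init__(self, pod, position, fabric_manager):
        self.pod, self.position = pod, position
        self.fm = fabric_manager
        self.amac_to_pmac = {}  # local PMAC table (actual MAC -> PMAC)
        self.pmac_to_amac = {}  # reverse view used when delivering to hosts
        self.next_vmid = {}     # next free vmid on each host-facing port

    def _new_pmac(self, port):
        vmid = self.next_vmid.get(port, 0)
        self.next_vmid[port] = vmid + 1
        raw = (self.pod << 32) | (self.position << 24) | (port << 16) | vmid
        return ":".join(f"{b:02x}" for b in raw.to_bytes(6, "big"))

    def from_host(self, frame, port):
        """Ingress: learn an unknown source MAC, then rewrite source MAC -> PMAC."""
        amac = frame["src_mac"]
        if amac not in self.amac_to_pmac:
            pmac = self._new_pmac(port)              # step 1: local entry
            self.amac_to_pmac[amac] = pmac
            self.pmac_to_amac[pmac] = amac
            self.fm.register(frame["src_ip"], pmac)  # step 2: notify fabric manager
        return {**frame, "src_mac": self.amac_to_pmac[amac]}

    def to_host(self, frame):
        """Egress: rewrite destination PMAC -> actual MAC before delivery."""
        return {**frame, "dst_mac": self.pmac_to_amac[frame["dst_mac"]]}


if __name__ == "__main__":
    fm = FabricManager()
    sw = EdgeSwitch(pod=2, position=1, fabric_manager=fm)
    out = sw.from_host({"src_mac": "aa:bb:cc:dd:ee:01", "src_ip": "10.2.1.5"}, port=3)
    print(out["src_mac"])  # 00:02:01:03:00:00
```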
Case 2: a host broadcasts an ARP request to resolve the MAC address of an IP address
1 - The IS intercepts the broadcast request and forwards it to the fabric manager
2 - If the IP exists in the fabric manager's table, the fabric manager returns the corresponding PMAC, and the IS answers the requesting host with an ARP reply carrying that PMAC
3 - If the IP does not exist in the fabric manager's table, the request is broadcast to all other pods
4 - The target host's ARP reply is rewritten by its edge switch (actual MAC replaced with PMAC) and forwarded to both the fabric manager and the requesting host
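The Case 2 lookup with its broadcast fallback can be sketched as a single function. This assumes the fabric manager's state is a plain IP-to-PMAC dictionary and models the broadcast as a caller-supplied function that returns the target's (already rewritten) PMAC, or None if nobody answers; both are hypothetical simplifications.

```python
from typing import Callable, Optional


def resolve_arp(target_ip: str,
                fm_table: dict[str, str],
                broadcast: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the PMAC the ingress switch would put in its ARP reply."""
    # Steps 1-2: the intercepted broadcast becomes a query to the fabric manager.
    pmac = fm_table.get(target_ip)
    if pmac is not None:
        return pmac
    # Step 3: the fabric manager has no entry, so fall back to a broadcast.
    pmac = broadcast(target_ip)
    # Step 4: the target's rewritten reply also reaches the fabric manager,
    # which records the mapping for later queries.
    if pmac is not None:
        fm_table[target_ip] = pmac
    return pmac
```

Caching the reply in step 4 means the broadcast is only paid once per unknown IP; later requests are answered by the fabric manager directly.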
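Case 3: a newly migrated VM sends a gratuitous ARP announcing its IP-to-MAC mapping from its new location. This ARP is forwarded to the fabric manager, which records the VM's new PMAC.
1 - Other hosts that still cache the VM's old PMAC can no longer reach it, because no host exists at that PMAC any more.
2 - The fabric manager sends an invalidation message to the VM's previous edge switch, which sets up an entry to trap subsequent packets destined to the invalidated PMAC; the switch then replies to each such sender with a unicast gratuitous ARP carrying the new PMAC.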
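A minimal sketch of Case 3, under these assumptions: the fabric manager's state is an IP-to-PMAC dictionary, an edge switch is located by the pod.position prefix of a PMAC (its first three bytes, i.e. the first 8 characters of the colon-separated string), and, following the paper, the old switch answers a trapped frame with a unicast gratuitous ARP carrying the new PMAC, modelled here as a dict. All names are hypothetical.

```python
class EdgeSwitch:
    """Hypothetical model of the VM's previous edge switch."""

    def __init__(self):
        self.trapped = {}  # invalidated PMAC -> PMAC the VM now has

    def invalidate(self, old_pmac, new_pmac):
        self.trapped[old_pmac] = new_pmac

    def handle(self, frame):
        """For a frame sent to a trapped PMAC, return a unicast gratuitous ARP
        (as a dict) telling the sender the new PMAC; otherwise return None
        (normal forwarding is not modelled)."""
        new_pmac = self.trapped.get(frame["dst_mac"])
        if new_pmac is None:
            return None
        return {"type": "gratuitous_arp", "unicast_to": frame["src_ip"],
                "ip": frame["dst_ip"], "pmac": new_pmac}


def on_gratuitous_arp(fm_table, switches, ip, new_pmac):
    """Fabric manager: record the new PMAC, then invalidate the old PMAC at
    the switch identified by its pod.position prefix."""
    old_pmac = fm_table.get(ip)
    fm_table[ip] = new_pmac
    if old_pmac is not None and old_pmac != new_pmac:
        old_switch = switches.get(old_pmac[:8])  # 16-bit pod + 8-bit position
        if old_switch is not None:
            old_switch.invalidate(old_pmac, new_pmac)
```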
Gratuitous ARP: an ARP packet whose source and destination IP addresses are both set to the IP of the issuing host, sent to the broadcast MAC address. It is used for:
- Detecting IP conflicts: when a machine receives an ARP request whose source IP matches its own, it knows there is an IP conflict.
- Updating ARP tables: a host broadcasts a gratuitous ARP reply so that other hosts update their ARP tables.
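The packet layout above can be made concrete with the standard ARP field order from RFC 826 (hardware type, protocol type, address lengths, operation, then sender and target hardware/protocol addresses). The helper names are hypothetical, and the target hardware address is left as zeros, which is one common choice; implementations differ on this field.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6
ETH_P_ARP = 0x0806
ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28-byte ARP payload for Ethernet/IPv4


def build_gratuitous_arp(mac: bytes, ip: str, reply: bool = False) -> bytes:
    """Ethernet frame with a gratuitous ARP: sender IP == target IP, to broadcast."""
    ip_bytes = ipaddress.IPv4Address(ip).packed
    oper = 2 if reply else 1  # 1 = request, 2 = reply
    eth = struct.pack("!6s6sH", BROADCAST_MAC, mac, ETH_P_ARP)
    arp = struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, oper,
                      mac, ip_bytes,           # sender hardware / protocol address
                      b"\x00" * 6, ip_bytes)   # target hardware / protocol address
    return eth + arp


def parse_arp(frame: bytes):
    """Return (oper, sha, spa, tha, tpa), or None if the frame is not ARP."""
    if len(frame) < 42 or struct.unpack("!H", frame[12:14])[0] != ETH_P_ARP:
        return None
    return struct.unpack(ARP_FORMAT, frame[14:42])[4:]


def is_gratuitous(frame: bytes) -> bool:
    """Gratuitous if the sender and target protocol addresses are equal."""
    arp = parse_arp(frame)
    return arp is not None and arp[2] == arp[4]


def ip_conflict(frame: bytes, my_ip: str, my_mac: bytes) -> bool:
    """Another machine (different MAC) uses our IP as its sender address."""
    arp = parse_arp(frame)
    return (arp is not None
            and arp[2] == ipaddress.IPv4Address(my_ip).packed
            and arp[1] != my_mac)
```

The sender-MAC comparison in `ip_conflict` keeps a host from flagging its own announcement as a conflict.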
Distributed Location Discovery
PortLand switches use their position in the global topology to perform more efficient forwarding and routing.
PortLand switches periodically send a Location Discovery Message (LDM) out of all their ports, both to set their positions and to monitor liveness in steady state => helps detect switch and link failures.
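Liveness monitoring with LDMs comes down to a per-port timeout: a port that has not delivered an LDM recently is treated as failed. The sketch below is hypothetical; the 50 ms timeout is an assumption rather than PortLand's setting, and the clock is injectable so the logic can be tested without sleeping.

```python
import time
from typing import Callable, Iterable


class LivenessMonitor:
    """Track when an LDM last arrived on each port; a port that stays silent
    longer than `timeout` seconds is reported as failed."""

    def __init__(self, ports: Iterable[int], timeout: float = 0.05,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        now = clock()
        self.last_seen = {p: now for p in ports}

    def on_ldm(self, port: int) -> None:
        """Record that an LDM was just received on `port`."""
        self.last_seen[port] = self.clock()

    def failed_ports(self) -> list[int]:
        """Ports whose last LDM is older than the timeout, in sorted order."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
```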
Packets are always forwarded up to either an aggregation or a core switch and then down toward their ultimate destination => no forwarding loops, since once a packet starts travelling down it never goes back up.
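The up-then-down rule can be checked mechanically. Assuming a three-tier fat tree with levels 0 (edge), 1 (aggregation), and 2 (core), the hypothetical function below accepts a path only if the switch level rises by one per hop and then falls by one per hop. Such a path has at most 4 hops, so a packet cannot circulate indefinitely.

```python
# Levels assumed: 0 = edge, 1 = aggregation, 2 = core.

def is_up_down(levels: list[int]) -> bool:
    """True if the path only climbs one level per hop, then only descends
    one level per hop (an empty path is rejected)."""
    i = 0
    while i + 1 < len(levels) and levels[i + 1] == levels[i] + 1:
        i += 1  # upward phase
    while i + 1 < len(levels) and levels[i + 1] == levels[i] - 1:
        i += 1  # downward phase
    return i == len(levels) - 1
```

A path such as edge, aggregation, edge, aggregation, edge is rejected because it climbs again after descending, which is exactly the pattern that could form a loop.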
Comments:
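- Requires changing the way conventional switches work (edge switches must rewrite MAC addresses and intercept ARP)
- Fabric manager => a centralized point of failure, and a potential scalability bottleneck given the number of mapping entries it must hold
- Rewriting MAC to PMAC (and back) at edge switches may add forwarding delay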