Maintenance and power savings in large multiplane data center fabrics

Ramakrishnan Chokkanathapuram Sundaram, Pascal Thubert

Semantic Scholar (2020)

Abstract
Within a Data Center (DC) environment, network upgrades are often challenging and may consume significant amounts of time and network administrator resources. Additionally, DC networks tend to consume large amounts of energy and dissipate considerable amounts of heat that can be challenging to evacuate in densely populated fabrics. To address these challenges, techniques are presented herein that support, possibly among other things: the construction of a network model; the use of Machine Learning (ML) to predict low- and high-load periods and, in low-load periods, to determine the ratio of resources that may be taken offline; updating the equal-cost multi-path (ECMP) rules in the leaves to avoid selected planes, so that all of the spine and super-spine nodes of a plane can be taken offline; and upgrading one of the super-spine nodes and then one of the spine nodes, to ensure that there is always a rollback path in case of a problem. If the upgrade is successful, the techniques may proceed to upgrade all of the super-spine nodes and all of the spine nodes.

DETAILED DESCRIPTION

Network upgrades are often painful and consume a significant number of cycles. Among other things, network administrators need to plan maintenance windows and often opt for a period of downtime. A facility such as In-Service Software Upgrade (ISSU) provides a mechanism to upgrade a network device with zero data-plane downtime. While ISSU can change the software version on a device, it cannot change the device's configuration. Methods such as fast reload help change the configuration of a device: after reloading, the device comes up with the new configuration following a fixed amount of downtime.
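The abstract's second step, predicting low- and high-load periods and then deciding how much of the fabric can go offline in a low period, can be illustrated with a minimal sketch. The disclosure does not specify its ML model, so the sketch substitutes a simple hour-of-week average baseline; the function names, the utilization scale (fraction of full-fabric capacity), and the `max_util` headroom target are hypothetical. It also assumes equal-capacity planes with traffic spread evenly by ECMP, so taking k of P planes offline multiplies the load on each remaining plane by P/(P-k).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the disclosure's ML model: average utilization
# per hour of the week, learned from historical telemetry samples.

def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to a slot 0..167 (Monday 00:00 is slot 0)."""
    return ts.weekday() * 24 + ts.hour

def build_baseline(samples: Iterable[Tuple[datetime, float]]) -> Dict[int, float]:
    """Average fabric utilization (0.0-1.0) observed in each hour-of-week slot."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, util in samples:
        buckets[hour_of_week(ts)].append(util)
    return {slot: mean(values) for slot, values in buckets.items()}

def low_load_slots(baseline: Dict[int, float], threshold: float) -> List[int]:
    """Slots whose predicted utilization is below the threshold."""
    return sorted(slot for slot, util in baseline.items() if util < threshold)

def planes_that_can_sleep(load: float, num_planes: int, max_util: float) -> int:
    """Largest k such that the remaining num_planes - k planes stay at or
    below max_util, assuming equal-capacity planes and even ECMP spreading.
    At least one plane always stays up."""
    k_max = 0
    for k in range(num_planes):
        per_plane = load * num_planes / (num_planes - k)
        if per_plane > max_util:
            break  # load per plane only grows as more planes sleep
        k_max = k
    return k_max
```

For example, with four planes, a predicted load of 0.15 and a 0.5 headroom target, `planes_that_can_sleep(0.15, 4, 0.5)` returns 2: the two remaining planes would each run at 0.3, whereas leaving a single plane would push it to 0.6.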
Sundaram and Thubert: MAINTENANCE AND POWER SAVINGS IN LARGE MULTIPLANE DATA CENTER FAB. Published by Technical Disclosure Commons, 2020.

Both ISSU and fast reload make assumptions about the scale of the environment, that is, about what can be brought up and running within a certain amount of time. ISSU is mostly nondisruptive under certain constraints, but because there is some control-plane downtime, new flows are not guaranteed to be learned during that window. Also, ISSU platform-upgrade times vary with the router form factor, for example between a top-of-rack (ToR) device and an end-of-row (EoR) device. In-service upgrades do not change configurations, and configuration changes can incur additional downtime, as with a fast reload in which a router is booted with a new configuration. The downtime can depend on the type of configuration being pushed. Additionally, applying a new image to a router typically restarts the routing process, and taking a link offline may cause packet drops until routing has recovered across the network.

Aspects of the techniques presented herein load balance around a device (e.g., a router) before it is taken offline. Aspects of the techniques presented herein employ an ML and Artificial Intelligence (AI) approach, together with software-defined networking (SDN) technology, to understand the traffic patterns, the utilization of network components, feature dependencies, etc. at different times, and to predict low-utilization time slots for the components (e.g., nodes) in support of software upgrades for the various routing planes in a DC fabric. Further aspects of the techniques presented herein employ planned traffic rerouting, approaching optimal conditions, to achieve a smooth upgrade with little or no downtime.

DC networks tend to consume large amounts of energy.
In turn, DC systems (for example, NX-OS-based systems) operating at high speed dissipate considerable amounts of heat that can be challenging to evacuate in densely populated fabrics. These characteristics persist even during periods of lower utilization, which depend on the time of day, the day of the week, and special days such as Christmas and Black Friday. Green operations consist of shutting down some routers and routing around the resulting failures. However, simply powering off a router causes a service disruption until the network converges, so more subtle mechanisms are required. For instance, a router can set the overload bit in the Intermediate System to Intermediate System (IS-IS) protocol, or advertise high costs on its links, to indicate that it is no longer willing to route before it is actually shut down (a make-before-break approach). Aspects of the techniques presented herein support detecting a window of opportunity in which the network is less loaded, reducing the load of a router to nominally zero within that window, and taking the router offline either for an upgrade or simply to save energy.

Defensive Publications Series, Art. 3703 [2020] https://www.tdcommons.org/dpubs_series/3703

An additional challenge with the upgrade activity described above is coexistence. The new image or new configuration of an upgraded router may behave differently from the old version: even though it interacts well with other upgraded routers, it may not do so in a brownfield environment comprising legacy routers. Upgrades therefore create an ordering problem: they must be completed in a certain order and must allow rollback. The challenges described above are normally approached as a network problem.
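The ordering and rollback requirement can be illustrated with a minimal sketch of the canary order given in the abstract: upgrade one super-spine node, then one spine node, and only if both stay healthy proceed to the remaining super-spine and spine nodes, rolling back everything upgraded so far on the first failure. The `Plane` type and the `upgrade`, `healthy`, and `rollback` callbacks are hypothetical placeholders for whatever orchestration and health checks an operator actually uses, and the plane is assumed to have already been drained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Plane:
    """Hypothetical model of one fabric plane (leaves are shared, so omitted)."""
    name: str
    super_spines: List[str]
    spines: List[str]

def staged_upgrade(
    plane: Plane,
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Tuple[bool, List[str]]:
    """Upgrade a drained plane canary-first; roll back on the first failure.

    Returns (success, nodes upgraded in order)."""
    # Canaries: one super-spine, then one spine (slices tolerate empty lists).
    canaries = plane.super_spines[:1] + plane.spines[:1]
    rest = plane.super_spines[1:] + plane.spines[1:]
    upgraded: List[str] = []
    for node in canaries + rest:
        upgrade(node)
        upgraded.append(node)
        if not healthy(node):
            # Undo in reverse order so the plane returns to the old version.
            for done in reversed(upgraded):
                rollback(done)
            return False, upgraded
    return True, upgraded
```

Because the canaries are a single super-spine and a single spine, a faulty image is caught while the rest of the plane still runs the old version and can serve as the rollback target.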
In a traditional routing world, one may have, for example, a main path (e.g., computed with Shortest Path First (SPF)), Traffic Engineering (TE) paths (e.g., SDN), and a Fast Reroute (FRR) path (e.g., Topology-Independent Loop-Free Alternate (TI-LFA)). Any of these methods may be used to route around a router in order to take it offline. However, because of the coexistence problem this is not sufficient: there is still a problem of scheduling and rollback that requires special attention from an operator to avoid stalled nodes in the network.

DCs differ from a classical interior gateway protocol (IGP) environment, and aspects of the techniques presented herein leverage that difference to support both new upgrade mechanisms and green operations. On the one hand, a DC relies on massive ECMP routing, in which all of the leaf-to-leaf paths are essentially equivalent and usable. Thus, in a DC the problem need not be addressed as a routing change, but rather as a forwarding change in which the ECMP operation is altered to avoid some routers. On the other hand, large DC fabrics are organized in planes. As one example, consider Facebook's design, in which each plane may be depicted as a different color, as illustrated in Figure 1, below.

Figure 1: Exemplary Facebook Design

From all of the above, several fundamental observations can be made, possibly among others:
1. Traffic is always leaf-in/leaf-out, the leaves being the nodes at the bottom of the diagram.
2. The ingress leaf selects the plane.
3. All nodes above the leaf level belong to a single plane (i.e., each has one single color, whereas the leaves are white, combining all of the colors).
4. Routing happens within a plane all the way from ingress to egress.
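Observations 2 and 4 mean that draining a plane only requires the ingress leaves to stop selecting it. The minimal sketch below models a leaf's per-flow plane choice and removes drained planes from the candidate set. It substitutes rendezvous (highest-random-weight) hashing for the hardware ECMP hash, which the disclosure does not specify, so that draining one plane moves only the flows that were on it; the plane names and five-tuple layout are hypothetical.

```python
import hashlib
from typing import Collection, Sequence, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
FiveTuple = Tuple[str, str, int, int, int]

def _weight(flow: FiveTuple, plane: str) -> int:
    """Deterministic pseudo-random weight of a (flow, plane) pair."""
    key = "|".join(str(field) for field in flow) + "#" + plane
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def select_plane(flow: FiveTuple, planes: Sequence[str],
                 drained: Collection[str] = ()) -> str:
    """Pick the plane for a flow at the ingress leaf, skipping drained planes."""
    candidates = [p for p in planes if p not in drained]
    if not candidates:
        raise RuntimeError("all planes are drained; no path for this flow")
    # Highest weight wins, so removing a plane only reassigns the flows it carried.
    return max(candidates, key=lambda p: _weight(flow, p))
```

A modulo-based hash over the member list would instead reshuffle many unaffected flows whenever a plane is removed, which is why some form of resilient or consistent hashing suits deliberate drains.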
From these observations, aspects of the techniques presented herein leverage the partitioning of the cloud network into planes to selectively put a full plane to sleep and then upgrade it. Elements of particular interest within these techniques are discussed below.

A first element of the techniques presented herein supports the construction of a network model to predict flow patterns in a DC fabric. One activity that may take place during the construction of the network model is topology discovery. The network model learns the link connectivity, capacity, etc. of the routers and builds a topological view, from which it also learns the various routing planes. Data samples of link usage at different times, node usage, CPU consumption, and flow patterns may be collected from telemetry data, learned, and classified per routing plane in the fabric. The samples are chosen such that they reflect the peaks of the various dimensions used. The model is also able to predict flow occurrences and flow patterns using learning methods, as well as the set of flows most likely to be impacted when a given routing plane is taken down. The patterns of long-lived (elephant) flows are learned by carrying the elephant-flow trap tables in telemetry. The model also looks at port queue utilization and congestion, routing patterns, routing tables, etc., tracks low-utilization times for each node, and calculates the effect of a node removal or a routing change. For each plane, a table may be constructed that is indexed by timestamp and lists the routers, with parameters such as congestion, bandwidth, multicast state, and links as different dimensions. Windows of minimal load on a router at various times may then be matched against the window needed for the upgrade.
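Matching low-load windows to the time an upgrade needs can be sketched over the per-plane table described above. This minimal sketch assumes regularly spaced samples, each holding a utilization value per router of the plane, and conservatively takes the busiest router as the plane's load; the threshold, row layout, and function names are hypothetical rather than taken from the disclosure.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# One row of a plane's table: (timestamp, {router name: utilization 0.0-1.0}).
Row = Tuple[datetime, Dict[str, float]]

def plane_load(metrics: Dict[str, float]) -> float:
    """Conservative plane load: the utilization of its busiest router."""
    return max(metrics.values())

def find_upgrade_window(rows: List[Row], threshold: float, needed: timedelta,
                        step: timedelta) -> Optional[Tuple[datetime, datetime]]:
    """Earliest window of length `needed` in which every sample of the plane
    stays at or below `threshold`. `rows` must be sorted by timestamp and
    spaced `step` apart; a gap in the samples breaks a window."""
    start: Optional[datetime] = None
    prev: Optional[datetime] = None
    for ts, metrics in rows:
        contiguous = prev is not None and ts - prev == step
        if plane_load(metrics) <= threshold:
            if start is None or not contiguous:
                start = ts
            # Each sample is taken to cover the interval [ts, ts + step).
            if ts + step - start >= needed:
                return start, start + needed
        else:
            start = None
        prev = ts
    return None
```

Taking the maximum rather than the mean errs toward skipping a window in which a single router is still busy, which matches the goal of draining every node in the plane to nominally zero load.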
Additionally, high-priority flows that may be present on the router during the window may be predicted. Another activity that may take place during the construction of a network model is the development of a range of measurements. Such measurements may include, for example:
1. Plane and node utilization. A value that combines various relations, such as CPU usage, route states, unicast flows, multicast flows, congestion on the links, and link utilization, on a per-plane basis. In addition to arriving at a specific number, this also determines how each of these dimensions contributes to the node utilization or plane load. For a node utilization value, one may identify how eac