Machine Learning for Robust Network Design: A New Perspective


This post covers the paper “Machine Learning for Robust Network Design: A New Perspective”.

Terminology and Abbreviations

GAT: Graph Attention Network

MLU: Maximum Link Utilization

LP / ILP: Linear Programming / Integer Linear Programming



  • Network architecture must provide resilience under numerous failures, including hardware faults, software bugs in the control plane, and misconfiguration by the network operator.
  • ML can identify critical failures impacting network performance from many possible failure scenarios.
  • ML provides a common kernel that can benefit many robust network design problems.
  • $GAT(\text{topology, traffic demand, routing decisions, and target failure scenarios}) = \text{failure impact predictions}$
  • Notation: $m$ is the number of links in the network topology, and $f$ is the number of simultaneous link failures.
  • Each of the $m$ links can be up or down, so there are $2^m$ possible link states in total.
  • For a network topology with $m$ links, considering only combinations of $f$ simultaneous link failures already puts $O(m^f)$ failure scenarios under consideration, so the LP/ILP problems grow super-linearly with the network scale. Moreover, the solution time of such LP/ILP problems also increases super-linearly with the number of decision variables and constraints.
  • An MLU increase indicates the degree to which congestion in the network increases.
  • A graph attention network (GAT) [12] is a type of graph neural network. It is used here to predict the impact of target failure scenarios and to figure out the critical failure scenarios, capturing, e.g., how neighboring links are affected by a failed link in the network.
  • Two links are neighbors in the graph if they share a common endpoint in the original topology.
  • To resolve the common kernel, the authors propose a GAT-based algorithm that evaluates the failure impact and figures out the critical failure scenarios among multiple link failures, and they apply it to three typical recast robust network design problems.
  • The authors show that a GAT-based function approximation can accurately predict the failure impact, detect the critical failure scenarios, and enhance the scalability of three important classes of robust network design problems.
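
To make the scenario blow-up concrete, here is a minimal Python sketch. It counts and enumerates the scenarios with exactly $f$ failed links out of $m$, and it builds the "links that share a common endpoint" neighbor relation from the notes above. The topology and the function names (`count_failure_scenarios`, `enumerate_failure_scenarios`, `link_neighbors`) are hypothetical and chosen only for illustration; none of this is the paper's code, and no GAT is involved.

```python
from itertools import combinations
from math import comb


def count_failure_scenarios(m: int, f: int) -> int:
    """Number of ways to choose exactly f failed links out of m, i.e. C(m, f)."""
    return comb(m, f)


def enumerate_failure_scenarios(links, f):
    """Return an iterator over every set of exactly f simultaneously failed links."""
    return combinations(links, f)


def link_neighbors(links):
    """Map each link to the set of other links that share an endpoint with it."""
    neighbors = {link: set() for link in links}
    for a, b in combinations(links, 2):
        if set(a) & set(b):  # the two links have at least one endpoint in common
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


# Hypothetical toy topology (an assumption for illustration): a 4-node ring plus one chord.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]

for f in (1, 2, 3):
    print(f"f={f}: {count_failure_scenarios(len(links), f)} scenarios")

# With m = 100 links the count grows quickly with f.
print(count_failure_scenarios(100, 2))  # 4950
print(count_failure_scenarios(100, 3))  # 161700

# Links adjacent to ("A", "B") in the toy topology.
print(sorted(link_neighbors(links)[("A", "B")]))
```

Going from $f = 2$ to $f = 3$ at $m = 100$ already multiplies the number of scenarios by more than 30. This is the motivation for predicting which scenarios are critical instead of feeding all of them into an LP/ILP.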