In recent years, data center networks have attracted significant attention and have scaled rapidly to meet the growing demands of modern applications. A major driver of this expansion is the role these networks play in advancing artificial intelligence (AI) and machine learning (ML). As applications such as natural language processing models (e.g., OpenAI’s GPT series) and image recognition algorithms for autonomous vehicles continue to evolve, data center networks provide the computational infrastructure required to train and deploy these AI and ML models. Significant effort has recently been dedicated to improving the performance of data center networks beyond that of standard Clos-based topologies such as Fat-Tree, which often lag in performance. One approach is to adopt alternative data center network topologies. Researchers have accordingly explored topologies based on Expander Graphs (EGs), such as Jellyfish, Xpander, and STRAT, exploiting their sparse and incremental nature. This thesis investigates the STructured Re-Arranged Topology (STRAT) as a potentially robust and efficient design for next-generation data centers. To benchmark STRAT against well-known Expander data centers, a robustness framework based on geometric and connectivity-based metrics, along with throughput metrics, is adopted. The findings reveal that STRAT outperforms well-known Expander architectures, positioning STRAT, and Expander-based designs more broadly, as promising alternatives that surpass the performance of present Clos-based topologies. These observations are validated through extensive flow-level and packet-level simulations, which demonstrate STRAT’s superior performance compared to other Expanders.
The evolution of modern network technology has also been shaped by the advent of programmable switches, which marks a major shift for data center networks. The programmable data plane of ASIC switches has become a key enabler of innovation and efficiency in these networks. Its versatility is evident in diverse applications, such as employing ML for network classification, enabling dynamic routing mechanisms that operate at line rate, and implementing In-band Network Telemetry (INT) for fine-grained network visibility. These applications show how the programmable data plane overcomes traditional limitations and brings greater adaptability and performance to data center networks. Building on this foundation, this thesis introduces a novel routing algorithm prototyped on the BMv2 virtual programmable switch using the P4 programming language. This implementation demonstrates how routing strategies, Expander-based topologies, and the programmable data plane can be combined. Notably, the novel routing algorithm outperforms the traditional Equal-Cost Multi-Path (ECMP) algorithm, affirming its potential as a promising solution for harnessing the abundant path diversity inherent in next-generation Expander data center topologies.