Five Things I Learned at the “Ethernet in the Age of AI” Conference

This is the first issue of my CIR newsletter, “From My Desk,” in which I plan to cover new developments in optical networking, AI, and quantum technology with an eye to economics and use cases.

Last month I attended the “Ethernet in the Age of AI” event organized by the Ethernet Alliance (https://ethernetalliance.org) at the Santa Clara Convention Center.  My two days in the Valley (https://www.imdb.com/title/tt0115438/) provided excellent insight into the path forward for AI data centers and Ethernet’s role in that advancement.  It also gave me a few reasons for personal nostalgia: AI was the first of many technologies that I became seriously obsessed with.

Actually, back in the 1970s I thought of doing a PhD in AI, until I realized that, at the time, AI was a science project, not a market. Also, if I remained in the UK (where I grew up) I could only do an AI PhD at the University of Edinburgh – a great school, but in a city where the weather was less than friendly.  In the end I moved to the mid-Atlantic region of the US.  After a few years of apprenticeship at top-notch analyst firms I started my own analyst firm focused on high-speed networking, although I have also followed the progress of the AI market all these years.  Apart from one name change and an extension of the coverage, it is still very much the CIR (www.cir-inc.com) I run today.

I digress!  I learned a lot at the “Ethernet in the Age of AI” event – too much to cram into one newsletter.  However, my big takeaways from the talks and panels at the event itself (and my own thoughts afterwards about what I heard) can be summarized in the four points below.

#1 AI will be the fastest-growing network workload, but three kinds of AI stand out.  In both cloud and enterprise (private) data centers, we can count on AI dominating networks for perhaps a decade or longer.  But what kind of AI will be doing the dominating?  Based on both revenue potential and the ability to protect IP, three kinds of AI strike me as likely to make a special contribution to growth and revenues. These are shown in the Exhibit below, along with their impact on the network.

#2 Servers will be rethought and redesigned to accommodate the needs of the AI data center.  For LLMs, at least, more memory will be required in servers, and this will most likely be distributed memory spread across multiple processors in a server or across multiple servers.  More processing power will be needed too.  Enter the “superchip,” led by NVIDIA (www.nvidia.com), whose Grace Blackwell node combines two GPUs with one CPU.  Despite the presence of NVIDIA, this is still somewhat virgin territory, (1) presenting opportunities for semiconductor startups and (2) supporting the current shakeup in the semiconductor industry – think NVIDIA replacing Intel (www.intel.com) in the Dow.
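To make the memory point concrete, here is a minimal back-of-the-envelope sketch in Python.  The per-device memory figure, headroom factor, and model sizes are my own illustrative assumptions, not vendor specifications.

```python
import math

# Back-of-the-envelope: how many accelerators are needed just to hold a
# model's weights? All figures are illustrative assumptions, not specs.
BYTES_PER_PARAM = 2        # FP16 weights; INT8 or INT4 would halve or quarter this
DEVICE_MEMORY_GB = 192     # assumed memory per accelerator
USABLE_FRACTION = 0.7      # headroom left for activations, KV cache, runtime overhead

def devices_needed(params_billions: float) -> int:
    """Minimum number of devices whose usable memory can hold the weights alone."""
    weights_gb = params_billions * BYTES_PER_PARAM      # 1e9 params x 2 bytes = 2 GB per billion
    usable_gb = DEVICE_MEMORY_GB * USABLE_FRACTION
    return max(1, math.ceil(weights_gb / usable_gb))

for size_b in (7, 70, 400, 1000):                       # hypothetical model sizes, billions of params
    print(f"{size_b:>5}B params -> ~{devices_needed(size_b)} accelerator(s) for weights alone")
```

Once the count climbs past what a single server can house, the memory is by definition distributed, and the network between servers becomes part of the memory system – which is exactly why AI is reshaping server (and network) design.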

 

Selected AI Technologies and Their Impact on the Data Center

Text-to-video AI
Current status: The subject of considerable R&D and product development, with small firms rushing into this space.
Use case: Driven initially by the need to enhance digital marketing with avatars and clips.  Beyond that, AI-based video offers some awesome opportunities in entertainment, gaming, training and signage – the natural next step after text-to-speech.
Impact on data centers: Strong impact, fueled by the data-heavy nature of video and the fact that the potential market is very large.

Neural networks
Current status: The underpinning infrastructure for deep learning.
Use case: Mirrors brain structure, which means there is less need to analyze the AI to understand its “thought processes.”  Neural networks have considerable untapped potential to recognize patterns and to draw insight from large data sets.
Impact on data centers: Neural networks seem to have strong advantages and potential, and may come to dominate AI traffic.

Small language models (SLMs)
Current status: In development and often focused on a particular application.
Use case: Stripped-down AI packages.  Training, deployment, and maintenance are considerably less resource-intensive than for classic LLMs, making SLMs a viable option for smaller enterprises or specific departments within larger organizations.  Better still, this cost efficiency does not come at the expense of performance.
Impact on data centers: May relieve the AI data center of some of its current bandwidth and power requirements; the smaller size of SLMs translates directly into lower computational and financial costs.
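Extending the earlier back-of-the-envelope logic to the SLM row above, here is a hypothetical comparison of what it might cost simply to keep enough accelerators running to hold a classic LLM versus an SLM.  The hourly price, per-GPU memory, and model sizes are placeholders of mine, not market data.

```python
import math

# Illustrative only: rough monthly serving cost for an LLM vs. an SLM.
# All prices, memory figures, and model sizes are hypothetical.
GPU_HOUR_USD = 2.50        # assumed hourly price of one accelerator
GPU_MEMORY_GB = 80         # assumed memory per accelerator
USABLE_FRACTION = 0.7      # headroom for activations and runtime overhead
BYTES_PER_PARAM = 2        # FP16 weights

def monthly_cost(params_billions: float) -> float:
    """Cost of enough GPUs to hold the weights, running 24x7 for 30 days."""
    weights_gb = params_billions * BYTES_PER_PARAM
    gpus = max(1, math.ceil(weights_gb / (GPU_MEMORY_GB * USABLE_FRACTION)))
    return gpus * GPU_HOUR_USD * 24 * 30

print(f"70B-parameter LLM: ~${monthly_cost(70):,.0f} per month")
print(f" 3B-parameter SLM: ~${monthly_cost(3):,.0f} per month")
```

Real deployments will of course depend on utilization, quantization, and traffic, but the order-of-magnitude gap is the point.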

 

#3 The optical transceiver market will explode in a way that we haven’t seen since the beginning of optical enterprise networks in the 1990s.  At the “Ethernet in the Age of AI” conference, the focus was (guess what?) on Ethernet.  Everyone now agrees that InfiniBand will be murdered by Ethernet in the data center (https://en.wikipedia.org/wiki/Cluedo), but there is disagreement over how long the killing will take.  InfiniBand currently offers speeds up to 200 Gbps and beyond, which is highly suitable for AI workloads involving massive data transfers.  But with 800G Ethernet now commonplace and Synopsys (https://www.synopsys.com/) claiming the industry’s first 1.6T Ethernet core, InfiniBand seems doomed.

Recalling that Ethernet once ran at 10 Mbps, 1.6T Ethernet is gasp-provoking.  And yet, from a long-term business perspective, some caution is needed.  The economics of Ethernet is impressively solid and therefore not friendly to hyperbole or to building a new technology sector.  Ethernet is based on a philosophy of reuse: subcomponents, signaling rates, and so on are borrowed from one generation to the next.  This is conducive to low cost/Mbit but tends to create an Ethernet sector that can be hard to break into.
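As a crude illustration of that cost-per-bit logic – with per-port prices that are purely hypothetical placeholders, not market data – the reuse-driven pattern the Ethernet community aims for looks something like this:

```python
# Illustrative only: Ethernet's reuse philosophy aims to make each speed
# jump cheaper per bit. The per-port prices below are hypothetical.
hypothetical_port_prices = {   # port speed (Gbps) -> assumed price (USD)
    100:   400,
    400:  1000,
    800:  1800,
    1600: 3200,
}

for gbps, price in hypothetical_port_prices.items():
    print(f"{gbps:>5}G port at ~${price:>5}: ~${price / gbps:.2f} per Gbps")
```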

The good news – at least if you are looking for opportunities in AI data center networking infrastructure – is that sooner or later, AI will also move beyond Ethernet.  Two points are important here.  First, there seems to me to be some handwaving when it comes to the ability of Ethernet to handle latency for AI.  Second, there are thermal problems that must be overcome.  According to one source, under pressure to get AI onto the network, the power envelope per rack could increase beyond 100 kW.
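That 100 kW figure is easy to sanity-check with some rough arithmetic.  The component counts and power draws below are my own illustrative assumptions, not measurements from any specific rack design:

```python
# Illustrative only: how a dense AI rack can push past 100 kW.
accelerators_per_rack = 72     # assumed accelerator count in a dense rack
watts_per_accelerator = 1000   # assumed per-accelerator draw under load
cpu_memory_storage_w = 15_000  # assumed host CPUs, DRAM, and storage
networking_w = 8_000           # assumed in-rack switches and optics
cooling_overhead = 0.10        # assumed fans/pumps as a fraction of IT load

it_load_w = accelerators_per_rack * watts_per_accelerator + cpu_memory_storage_w + networking_w
total_w = it_load_w * (1 + cooling_overhead)
print(f"Estimated rack power envelope: ~{total_w / 1000:.0f} kW")
```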

#4 Cool Runnings for AI

At some point – and I hesitate to say when – we are going to move beyond Ethernet.  The new solutions will most likely involve (1) co-packaged optics (CPO)/optical integration and/or (2) quantum processors (QPUs).  There is much to say about these technologies as facilitators of bandwidth and adequate latency, but I’ll save that for another day.  For now, we’ll just note that both provide their own solutions to our current thermal concerns.

CPO as envisioned by the OIF (https://www.oiforum.com/) already has some cool thinking (pun intended) in its Implementation Agreements (IAs).  As far as QPUs are concerned, they don’t have an implicit cooling mechanism as such, but the literature on QPUs suggests a widespread hope that they will evolve to a point where they are more thermally efficient than the MPUs that form much of the infrastructure of today’s AI data centers.

And then there is liquid cooling, which is increasingly everyone’s favorite solution for the AI data center heat problem.  NVIDIA and many of the cloud service providers already have this technology, but deploying it – whether by retrofitting existing facilities or building new data centers – may prove challenging and costly.

These topics will be explored in CIR’s forthcoming report, Networks and Power Requirements for AI Data Centers: A Ten-Year Market Forecast and Technology Assessment (to be released in November 2024).

Until next time, when From My Desk will turn its attention to the Quantum + AI event in New York.

Lawrence Gasman

President

lawrence@cir-inc.com
