The energetics of computation in AI: scale is not free and lessons from the bar-tailed godwit
Aeronautical engineers are still fascinated by birds like the bar-tailed godwit, which flies 7,000 miles each year from Alaska to New Zealand without stopping for food or rest [21]. It makes the journey by relying on an inner “GPS” whose workings remain a mystery. The record holder, based on satellite tracking, flew 8,100 miles (13,000 kilometers) in 10 days [23].
Birds have a highly efficient energy metabolism. For example, researchers found that pigeons not only have a higher number of neurons compared to mammals of similar size, but these neurons consume three times less glucose [1].
Birds develop fascinating learning and cognitive abilities such as spatial memory, episodic-like memory, high-acuity vision, motor control enabling sophisticated flight maneuvers, and even vocal learning. Pigeons have shown the ability to detect breast cancer in radiology images after a few days of reinforcement training [18].
Some researchers attribute the navigation abilities of bar-tailed godwits and other migratory birds to quantum effects in their eyes allowing them to perceive Earth’s magnetic field lines [2]. A bar-tailed godwit that was being tracked recently made a U-turn back to Alaska after flying for 57 hours non-stop [23].
Scale is not free
In a paper titled Energy limitation as a selective pressure on the evolution of sensory systems [3], Niven and Laughlin write from an evolutionary biology perspective:
“Excess signal processing capacity in sensory systems is severely penalized by increased energetic costs producing a Law of Diminishing Returns.”
Energetic costs represent a selective pressure not only on the evolution of biological organisms but also on engineered AI systems. Ideas about the role of scale in the progress of AI (scaling laws, scale is all you need, scaling maximalism, or the bitter lesson [13]) must consider the energetics of computation and its costs. We need to prepare for the end of Moore’s Law.
Compute costs are an important consideration in the real world because most organizations do not have Google-scale or Microsoft-scale data and compute infrastructure and many AI-enabled systems (like autonomous vehicles) and devices will actually operate at the edge of the network.
OpenAI’s impressive achievements with ChatGPT were made possible by Microsoft’s $13 billion investment to cover compute costs on the Azure supercomputer. As a result, Microsoft now has enormous leverage over OpenAI and its future.
Concerns over AI sovereignty in the UK have prompted a study titled Independent Review of The Future of Compute: Final report and recommendations [19] which was published in March 2023. Other European countries have also expressed their concerns over their autonomy and sovereignty in AI. The European Laboratory for Learning and Intelligent Systems (ELLIS) was created “to help ensure Europe’s sovereignty and competitiveness in the field of modern AI”.
At the time of writing, Europe is still dealing with an energy crisis resulting from Russia’s invasion of Ukraine which started in February 2022 [17]. In an article in Foreign Affairs titled The Age of Energy Insecurity — How the Fight for Resources Is Upending Geopolitics [20], Jason Bordoff and Meghan L. O’Sullivan argue for the convergence of energy security and climate security as policy priorities.
Energetic costs are not just financial but also environmental. In Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models [22], Li et al. write:
“For example, training GPT-3 in Microsoft’s state-of-the-art U.S. data centers can directly consume 700,000 liters of clean freshwater (enough for producing 370 BMW cars or 320 Tesla electric vehicles) and the water consumption would have been tripled if training were done in Microsoft's Asian data centers, but such information has been kept as a secret.”
Going nuclear is not the answer
Microsoft is considering nuclear power [28, 29, 30] as a solution to the energy demands of its AI business.
The safety and reliability of nuclear reactors have been proven over six decades of operations around the world. However, nuclear waste disposal remains an environmental and financial challenge [31]. In fact, a study titled Nuclear waste from small modular reactors [34] published in 2022 by researchers from Stanford University and the University of British Columbia in the Proceedings of the National Academy of Sciences states:
“Results reveal that water-, molten salt–, and sodium-cooled SMR [small modular reactors] designs will increase the volume of nuclear waste in need of management and disposal by factors of 2 to 30.”
I would like to learn more about Microsoft’s plans regarding the storage and handling of nuclear waste and the decommissioning of these reactors over long time horizons (decades or even centuries) without shifting the costs to taxpayers [32].
According to a 2023 report by the US National Academies of Sciences, Engineering, and Medicine titled Merits and Viability of Different Nuclear Fuel Cycles and Technology Options and the Waste Aspects of Advanced Nuclear Reactors [33]:
“As the United States nears the 40th anniversary of the Nuclear Waste Policy Act (NWPA) (Public Law 97-425) and its Amendments (Public Law 100-203, Part E), there is no clear path forward for the siting, licensing, and construction of a geologic repository for the disposal of highly radioactive waste (mainly commercial spent nuclear fuel).”
The report then recommends:
“As a top priority, the committee highlights that Congress will need to establish a single-mission entity with responsibility for managing and disposing of commercial nuclear waste.”
Admiral Hyman G. Rickover is known as the “Father of the Nuclear Navy” for his pioneering work on the Naval Nuclear Propulsion Program and the safety and reliability of nuclear reactors. President Jimmy Carter called him “the greatest engineer who ever lived.” During a hearing before the Joint Economic Committee of the Congress of the United States in 1982, Rickover remarked [35]:
“There are, of course, many other things mankind is doing which, in the broadest sense, are having an adverse impact, such as using up scarce resources. I think the human race is ultimately going to wreck itself. It is important that we control these forces and try to eliminate them.
In this broad philosophical sense, I do not believe that nuclear power is worth the present benefits since it creates radiation. You might ask why do I design nuclear-powered ships? That is because it is a necessary evil. I would sink them all.”
Quality data is not free and will run out
What about training with more data? In the paper Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning [14], Villalobos et al. write:
“Our analysis indicates that the stock of high-quality language data will be exhausted soon; likely before 2026.”
There are also ongoing legal cases and class action lawsuits against certain companies that have released Generative AI products for the unauthorized mass collection and use of online content posted by internet users and commercial publishers.
The News/Media Alliance Group representing the New York Times and 2,200 members produced a White Paper titled How the Pervasive Copying of Expressive Works to Train And Fuel Generative Artificial Intelligence Systems Is Copyright Infringement and Not a Fair Use [27].
Evolving engineered AI systems toward better energy efficiency requires the right level of abstraction and separation of concerns. The neuroscientist David Marr proposed three levels of analysis of intelligent systems: computational, algorithmic, and implementational. The last of these is the hardware or physical substrate, and it can co-evolve with the other two. Progress in AI will therefore move along these three axes.
The computational level
When quality data is scarce, you need inductive biases derived from human knowledge of the domain. Note that this was not the case in AlphaGo, where massive data was machine-generated through self-play, the game was fully observable, and Google-scale compute was available.
The computational level is about understanding the cognitive problem at hand. That understanding will come from treating AI not as pure statistical learning but as the interdisciplinary field that it truly is, integrating bodies of knowledge from other disciplines like neuroscience, the cognitive sciences, biology, physics, and complexity science.
AI is software and you can’t build autonomy software by ignoring the requirements. At Atlantic AI Labs, an introductory course on key concepts in neuroscience and the cognitive sciences is part of the enrollment process. Python, Julia, Linear Algebra, Calculus, and Machine Learning come later.
British psychiatrist Robin Murray once remarked that “We won’t be able to understand the brain. It is the most complex thing in the universe” [4]. I am optimistic that science will continue to make progress in unraveling the mysteries of the mind. For example, advances in neurophysiological tools for measuring brain activity and new data analysis methods based on machine learning are aiding discovery in neuroscience. There is indeed a mutually enriching relationship between AI and neuroscience.
As the mathematician David Hilbert put it: Wir müssen wissen – wir werden wissen, which means “We must know, we will know.” Interestingly, Hilbert was responding to the maxim ignoramus et ignorabimus, which means “We do not know and will not know,” first articulated by the German physiologist Emil du Bois-Reymond while expressing his opinion on the limits of scientific knowledge.
The algorithmic level
While Deep Learning using stochastic gradient descent and backpropagation is currently the leading approach to AI in vision, language, and speech, we need to keep an open mind about other approaches emerging in the future.
In his recent paper presented at the NeurIPS 2022 conference titled The forward-forward algorithm: Some preliminary investigations [5], Geoffrey Hinton writes:
“As a model of how cortex learns, backpropagation remains implausible despite considerable effort to invent ways in which it could be implemented by real neurons. There is no convincing evidence that cortex explicitly propagates error derivatives or stores neural activities for use in a subsequent backward pass.”
In a talk titled Aetherial Symbols [15] at an AI Symposium in March 2015, Geoffrey Hinton was dismissive of the role of symbols comparing them to Aether. To be fair, Hinton also said:
“The future depends on some graduate student who is deeply suspicious of everything I have said.”
Recent work by Dehaene et al. [16] suggests that in humans, symbols (in domains like language, mathematics, and music) play a role in compressing information for memory storage and also enable compositionality, the ability to compose new arbitrary concepts from existing ones. According to Dehaene, these recursive and hierarchical tree structures evolved from neural circuits involved in spatial navigation and the representation of geometric shapes, such as those found in prehistoric cave engravings.
For example, the laws of physics are expressed in very compact mathematical formulae with few parameters compared to the 175 billion parameters of a language model like GPT-3. These laws of physics have served us well in the field of aerospace. In [26], Udrescu and Tegmark developed a recursive multidimensional Symbolic Regression algorithm that was able to discover 100 equations from the Feynman Lectures on Physics.
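To make the idea of recovering a compact formula from data concrete, here is a minimal sketch in Python. It is not the algorithm of [26]: it is a plain brute-force search over a tiny, hypothetical family of candidate formulas of the form c * m^a * v^b, and the names `fit_formula`, `make_samples`, `COEFFS`, and `EXPONENTS` are assumptions made for illustration only.

```python
import itertools
import random

# Hypothetical search space: formulas of the form c * m**a * v**b.
COEFFS = (0.5, 1.0, 2.0)
EXPONENTS = (0, 1, 2, 3)


def make_samples(n=20, seed=0):
    """Generate (mass, speed, energy) triples from E = 0.5 * m * v**2."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        m, v = rng.uniform(1.0, 5.0), rng.uniform(1.0, 5.0)
        samples.append((m, v, 0.5 * m * v**2))
    return samples


def fit_formula(samples):
    """Return the (c, a, b) whose formula has the lowest squared error."""
    def loss(params):
        c, a, b = params
        return sum((c * m**a * v**b - e) ** 2 for m, v, e in samples)

    return min(itertools.product(COEFFS, EXPONENTS, EXPONENTS), key=loss)
```

Calling `fit_formula(make_samples())` searches 48 candidates and should select the three numbers that describe kinetic energy. The point of the toy is the contrast in description length: three numbers here versus billions of weights in a neural model.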
Implementing symbols in AI systems does not necessarily mean a return to the logic-based expert systems of the 1980s, also known as GOFAI or Good Old-Fashioned Artificial Intelligence. Two promising directions of research, both also leading to neuromorphic computing and energy-efficient AI, are Hyperdimensional Computing, also known as Vector Symbolic Architectures (VSA) [25], and the Assembly Calculus introduced in [24] by Papadimitriou et al.
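As a rough illustration of how symbols can live in high-dimensional vectors, the sketch below uses one common VSA flavor: random bipolar hypervectors, binding by elementwise multiplication, and bundling by majority sign. The dimensionality, the role and filler names, and the helper functions are all assumptions chosen for the example, not an implementation taken from [25].

```python
import random

DIM = 10_000  # hypervector dimensionality (an illustrative choice)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Binding: elementwise multiplication (its own inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Bundling: elementwise sum followed by sign (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product, in the range [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def decode(record, role, codebook):
    """Unbind a role from a record and return the closest filler name."""
    noisy = bind(record, role)
    return max(codebook, key=lambda name: similarity(noisy, codebook[name]))


def example(seed=0):
    """Store three role-filler pairs in one vector, then query the color."""
    rng = random.Random(seed)
    roles = {name: random_hv(rng) for name in ("color", "shape", "size")}
    fillers = {name: random_hv(rng) for name in ("red", "blue", "circle", "small")}
    record = bundle(
        bind(roles["color"], fillers["red"]),
        bind(roles["shape"], fillers["circle"]),
        bind(roles["size"], fillers["small"]),
    )
    return decode(record, roles["color"], fillers)
```

The whole record is a single vector of the same size as its parts, and a query is one multiplication plus a nearest-neighbor lookup. Simple, noise-tolerant operations of this kind are what make the approach a candidate for low-power hardware.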
The implementational level: the physics of computation
This is the physical substrate or hardware. Neuromorphic computing [7-10] and advances in the co-design of algorithm and hardware could be a part of the solution.
In the same paper mentioned above [5], Hinton explains why his proposed forward-forward (FF) algorithm could be suitable for analog computing and introduces the concept of Mortal Computation, which questions the current approach of decoupling hardware and software, as a path to energy efficiency. Hinton describes the FF algorithm as follows:
“The idea is to replace the forward and backward passes of backpropagation by two forward passes that operate in exactly the same way as each other, but on different data and with opposite objectives. The positive pass operates on real data and adjusts the weights to increase the goodness in every hidden layer. The negative pass operates on ‘negative data’ and adjusts the weights to decrease the goodness in every hidden layer.”
In the paper, two different measures of goodness are described: the sum of the squared neural activities and the negative sum of the squared activities.
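The two passes can be sketched for a single layer in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation from [5]: it uses the sum of squared activities as goodness, treats sigmoid(goodness - theta) as the probability that an input is positive, omits biases and the normalization between layers, and trains on a hypothetical toy pair of inputs. The threshold `theta`, the learning rate, and all function names are choices made for the example.

```python
import math
import random


def relu_layer(weights, x):
    """Forward pass of one fully connected ReLU layer (bias omitted)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def goodness(activities):
    """Goodness measured as the sum of squared neural activities."""
    return sum(a * a for a in activities)


def ff_step(weights, x, is_positive, theta=2.0, lr=0.05):
    """One local update: raise goodness on positive data, lower it on negative."""
    y = relu_layer(weights, x)
    # Probability that x is "positive" is sigmoid(goodness - theta).
    p = 1.0 / (1.0 + math.exp(theta - goodness(y)))
    # Gradient of the log-likelihood of the correct label w.r.t. goodness.
    dg = (1.0 - p) if is_positive else -p
    for j, row in enumerate(weights):
        for i, xi in enumerate(x):
            # d(goodness)/d(w_ji) = 2 * y_j * x_i (zero when the ReLU is off).
            row[i] += lr * dg * 2.0 * y[j] * xi


def train_toy_layer(steps=200, seed=0):
    """Train one 4x4 layer on a toy positive/negative pair; return both goodness values."""
    rng = random.Random(seed)
    weights = [[rng.uniform(0.1, 0.5) for _ in range(4)] for _ in range(4)]
    positive = [1.0, 0.0, 1.0, 0.0]
    negative = [0.0, 1.0, 0.0, 1.0]
    for _ in range(steps):
        ff_step(weights, positive, is_positive=True)
        ff_step(weights, negative, is_positive=False)
    return (goodness(relu_layer(weights, positive)),
            goodness(relu_layer(weights, negative)))
```

After training, the goodness of the positive input should sit above the threshold and that of the negative input below it. Note that each update uses only quantities available inside the layer during a forward pass; no error derivatives are propagated backward.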
In the paper Computability of Optimizers [6], Lee et al. provide a proof of the non-computability of optimizers for neural networks on digital hardware modeled by a Turing machine.
Despite all the hype and the scarcity of real-world problem-solving implementations of Quantum Computing [11], we should not discount the possibility of a breakthrough, given how fundamental Quantum Mechanics is to describing the physical world. As Joe Altepeter, who leads DARPA’s Utility-Scale Quantum Computing (US2QC) program, put it:
“The goal of US2QC is to reduce the danger of strategic surprise from underexplored quantum computing systems” [12].
References
[1] von Eugen, Kaya, Heike Endepols, Alexander Drzezga, Bernd Neumaier, Onur Güntürkün, Heiko Backes, and Felix Ströckens. "Avian neurons consume three times less glucose than mammalian neurons." Current Biology 32, no. 19 (2022): 4306-4313.
[2] “How Migrating Birds Use Quantum Effects to Navigate” https://www.scientificamerican.com/article/how-migrating-birds-use-quantum-effects-to-navigate/. Retrieved March 23, 2023.
[3] Niven, Jeremy E., and Simon B. Laughlin. "Energy limitation as a selective pressure on the evolution of sensory systems." Journal of Experimental Biology 211, no. 11 (2008): 1792-1804.
[4] “The brain is the most complex thing in the universe” https://www.bbc.com/news/uk-scotland-18233409. Retrieved March 23, 2023.
[5] Hinton, Geoffrey. "The forward-forward algorithm: Some preliminary investigations." arXiv preprint arXiv:2212.13345 (2022).
[6] Lee, Yunseok, Holger Boche, and Gitta Kutyniok. "Computability of Optimizers." arXiv preprint arXiv:2301.06148 (2023).
[7] Schuman, Catherine D., Shruti R. Kulkarni, Maryam Parsa, J. Parker Mitchell, Prasanna Date, and Bill Kay. "Opportunities for neuromorphic computing algorithms and applications." Nature Computational Science 2, no. 1 (2022): 10-19.
[8] Rao, Arjun, Philipp Plank, Andreas Wild, and Wolfgang Maass. "A long short-term memory for AI applications in spike-based neuromorphic hardware." Nature Machine Intelligence 4, no. 5 (2022): 467-479.
[9] Li, Guoqi; Deng, Lei; Tang, Huajing; Pan, Gang; Tian, Yonghong; Roy, Kaushik; et al. (2023): Brain Inspired Computing: A Systematic Survey and Future Trends. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21837027.v1
[10] Christensen, Dennis V., Regina Dittmann, Bernabe Linares-Barranco, Abu Sebastian, Manuel Le Gallo, Andrea Redaelli, Stefan Slesazeck et al. "2022 roadmap on neuromorphic computing and engineering." Neuromorphic Computing and Engineering 2, no. 2 (2022): 022501.
[11] “DARPA’s explorations in quantum computing search for the art of the possible in the realm of the improbable” https://breakingdefense.com/2022/12/darpas-explorations-in-quantum-computing-search-for-the-art-of-the-possible-in-the-realm-of-the-improbable/. Retrieved March 23, 2023.
[12] “DARPA Collaborates with Commercial Partners to Accelerate Quantum Computing.” https://www.darpa.mil/news-events/2023-01-31a. Retrieved March 23, 2023.
[13] “The Bitter Lesson” http://www.incompleteideas.net/IncIdeas/BitterLesson.html. Retrieved March 23, 2023.
[14] Villalobos, Pablo, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, and Anson Ho. "Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning." arXiv preprint arXiv:2211.04325 (2022).
[15] AAAI Spring Symposium on KRR, Stanford University, CA, March 23-25, 2015, https://sites.google.com/site/krr2015/home/schedule
[16] Dehaene, Stanislas, Fosca Al Roumi, Yair Lakretz, Samuel Planton, and Mathias Sablé-Meyer. "Symbols and mental programs: a hypothesis about human singularity." Trends in Cognitive Sciences (2022).
[17] Europe will still be fighting an energy crisis in 2023 https://www.cnn.com/2022/12/12/energy/europe-energy-2023-iea/index.html
[18] Levenson, Richard M., Elizabeth A. Krupinski, Victor M. Navarro, and Edward A. Wasserman. "Pigeons (Columba livia) as trainable observers of pathology and radiology breast cancer images." PloS one 10, no. 11 (2015): e0141357.
[19] Independent Review of The Future of Compute: Final report and recommendations https://www.gov.uk/government/publications/future-of-compute-review/the-future-of-compute-report-of-the-review-of-independent-panel-of-experts
[20] The Age of Energy Insecurity — How the Fight for Resources Is Upending Geopolitics By Jason Bordoff and Meghan L. O’Sullivan https://www.foreignaffairs.com/world/energy-insecurity-climate-change-geopolitics-resources
[21] These Mighty Shorebirds Keep Breaking Flight Records—And You Can Follow Along https://www.audubon.org/news/these-mighty-shorebirds-keep-breaking-flight-records-and-you-can-follow-along
[22] Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models https://arxiv.org/abs/2304.03271
[23] Godwit (birds) migration tracked (Global) - BBC News - 14th November 2021
[24] Papadimitriou, Christos H., Santosh S. Vempala, Daniel Mitropolsky, Michael Collins, and Wolfgang Maass. "Brain computation by assemblies of neurons." Proceedings of the National Academy of Sciences 117, no. 25 (2020): 14464-14472.
[25] Kleyko, Denis, Mike Davies, Edward Paxon Frady, Pentti Kanerva, Spencer J. Kent, Bruno A. Olshausen, Evgeny Osipov et al. "Vector symbolic architectures as a computing framework for emerging hardware." Proceedings of the IEEE 110, no. 10 (2022): 1538-1571.
[26] Udrescu, Silviu-Marian, and Max Tegmark. "AI Feynman: A physics-inspired method for symbolic regression." Science Advances 6, no. 16 (2020): eaay2631.
[27] The News/Media Alliance, "How the Pervasive Copying of Expressive Works to Train and Fuel Generative Artificial Intelligence Systems Is Copyright Infringement And Not a Fair Use". October 31, 2023 https://www.newsmediaalliance.org/generative-ai-white-paper/
[28] Drake Bennett, "Microsoft Sees Artificial Intelligence and Nuclear Energy as Dynamic Duo". Bloomberg. September 29, 2023. https://www.bloomberg.com/news/newsletters/2023-09-29/microsoft-msft-sees-artificial-intelligence-and-nuclear-energy-as-dynamic-duo
[29] Jon Gold. "Microsoft’s data centers are going nuclear". Computerworld. Sep 25, 2023. https://www.computerworld.com/article/3707472/microsofts-data-centers-are-going-nuclear.amp.html
[30] Jennifer Hiller. "Microsoft Targets Nuclear to Power AI Operations". WSJ. Dec. 12, 2023. https://www.wsj.com/tech/ai/microsoft-targets-nuclear-to-power-ai-operations-e10ff798
[31] Allison Macfarlane and Rodney C. Ewing. "Nuclear Waste Is Piling Up. Does the U.S. Have a Plan?" Scientific American. March 6, 2023. https://www.scientificamerican.com/article/nuclear-waste-is-piling-up-does-the-u-s-have-a-plan/
[32] Laura Strickler. "Cost to taxpayers to clean up nuclear waste jumps $100 billion in a year". NBC News. https://www.nbcnews.com/news/all/cost-taxpayers-clean-nuclear-waste-jumps-100-billion-year-n963586
[33] National Academies of Sciences, Engineering, and Medicine. 2023. Merits and Viability of Different Nuclear Fuel Cycles and Technology Options and the Waste Aspects of Advanced Nuclear Reactors. Washington, DC: The National Academies Press. https://www.nationalacademies.org/our-work/merits-and-viability-of-different-nuclear-fuel-cycles-and-technology-options-and-the-waste-aspects-of-advanced-nuclear-reactors
[34] Krall, Lindsay M., Allison M. Macfarlane, and Rodney C. Ewing. "Nuclear waste from small modular reactors." Proceedings of the National Academy of Sciences 119, no. 23 (2022): e2111833119.
[35] Economics of Defense Policy: Hearing before the Joint Economic Committee, Congress of the United States, 97th Cong., 2nd sess., Pt. 1 (1982).