Bosch Research | Large language models

This story is part of Bosch Research Blog

Bosch researcher Alessandro Oltramari explains the current challenges with Large Language Models

To build human-like intelligence into machines, we should design artificial models of cognition along with brain-inspired neural networks. Integrating cognitive architectures with generative models is the key to endowing intelligent systems with reasoning capabilities, thus unlocking the full potential of AI for real-world applications.

Large language models (LLMs) — where do we stand?

The long-standing debate on what intelligence is and what it entails has recently been rekindled by the advent of ChatGPT, whose remarkable conversational fluency has spurred interest among experts and the wider public alike. The service already averages more than a billion monthly visits. OpenAI's large language model (LLM) shows the magnitude of the social phenomenon sparked by generative AI, which is raising high expectations in industry and business markets. [i] Although this new AI wave is moving at an unprecedented pace and stimulating public curiosity and engagement, many challenges remain.

In particular, two questions arise: How can we prevent large language models from making trivial mistakes? And what is the origin of this problem?

A recent paper [ii] demonstrates that the most popular LLMs trained on “A is B” may fail to predict that “B is A”. For example, an AI that cannot reliably infer “The founder of SpaceX is Elon Musk” from “Elon Musk is the founder of SpaceX” should raise a flag or two. The Internet keeps an unforgiving record of LLMs’ fiascos, which are significant not in and of themselves, but as symptoms of an intrinsic reasoning deficiency, whether related to spatial, temporal, logical, causal, arithmetical or commonsense knowledge. So far, attempts to address this issue have mostly focused on scaling the neural networks at the core of LLMs, e.g., the number of parameters and layers, and the amount of training data. Although this approach has brought incremental gains in accuracy, it has not delivered a breakthrough. Most of the LLMs’ errors apparently will not go away simply by increasing the amount of training data and the computational scale of the “artificial brains”, i.e., the powerful transformer networks responsible for this AI revolution.
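The “A is B” versus “B is A” asymmetry can be made concrete with a deliberately simple sketch. It is not the method of the cited paper and says nothing about how a real LLM stores knowledge. It only contrasts a memory that keeps facts in the direction it saw them with one that also applies an explicit symmetry rule. The class names `DirectionalMemory` and `SymmetricMemory` are hypothetical, chosen for illustration.

```python
# Toy illustration only: class names and behavior are assumptions made for
# this sketch, not a description of any real language model.

class DirectionalMemory:
    """Stores 'A is B' purely as a lookup from A to B."""

    def __init__(self):
        self._forward = {}

    def learn(self, subject, description):
        self._forward[subject] = description

    def query(self, key):
        # Only answers in the direction the fact was learned.
        return self._forward.get(key)


class SymmetricMemory(DirectionalMemory):
    """Adds one explicit rule: if 'A is B' holds, then 'B is A' holds too."""

    def __init__(self):
        super().__init__()
        self._backward = {}

    def learn(self, subject, description):
        super().learn(subject, description)
        self._backward[description] = subject

    def query(self, key):
        answer = super().query(key)
        if answer is not None:
            return answer
        # Fall back to the reversed direction supplied by the rule.
        return self._backward.get(key)


directional = DirectionalMemory()
directional.learn("Elon Musk", "the founder of SpaceX")
print(directional.query("Elon Musk"))              # the founder of SpaceX
print(directional.query("the founder of SpaceX"))  # None

symmetric = SymmetricMemory()
symmetric.learn("Elon Musk", "the founder of SpaceX")
print(symmetric.query("the founder of SpaceX"))    # Elon Musk
```

The second class answers the reversed question only because the rule was written down explicitly. Showing it more examples would not have had that effect.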

Is there something else missing?

One hypothesis is that, to get machines closer to human-like intelligence, we should design artificial models of cognition along with brain-inspired network structures. [iii]

Human intelligence, uncontroversially considered the highest manifestation of intelligence in nature, emerges from a complex architecture that has evolved over millions of years. Despite all the progress in neuroimaging, the functioning of the human brain is not yet fully understood. One distinctive property of human intelligence is reasoning, i.e., the ability to generalize over knowledge acquired via direct or mediated experience, and to exhibit robust behavior in novel situations. Reasoning activates several different areas of the brain. However, mapping these functions to the corresponding neural substrata does not explain how reasoning works. In this regard, the theory formulated by the Nobel laureate Daniel Kahneman is useful. The psychologist described two ways of thinking, namely System 1 and System 2. System 1 is a near-instantaneous process; it happens automatically, intuitively, and with little effort. It is driven by instinct and our experiences. System 2 is slower, requires more effort, and is associated with conscious and logical thinking.

Thinking with System 1: a near-instantaneous process

State-of-the-art LLMs are based on pattern recognition, a method that falls under System 1. They can predict how to generate or complete a sentence by leveraging the co-occurrence of words statistically learned from huge training data sets. However, this skill does not prevent LLM-based language tools from making mistakes.
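To give a rough feel for what “leveraging the co-occurrence of words” means, here is a minimal sketch of next-word prediction from bigram counts. This is a drastic simplification and only a stand-in: real LLMs use transformer networks rather than count tables. The function names `train_bigrams` and `predict_next` and the three-sentence corpus are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is directly followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature "training set".
corpus = [
    "the bottle hit the wall",
    "the bottle broke",
    "the bottle broke on the floor",
]
model = train_bigrams(corpus)

print(predict_next(model, "bottle"))  # broke ("broke" seen twice, "hit" once)
print(predict_next(model, "wine"))    # None: "wine" never occurred in training
```

The sketch completes familiar patterns well and has nothing to say about words it has never seen. In miniature, that is the limitation at issue here.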

But it is not all about language. One of the biggest challenges in making cars drive fully autonomously is performance degradation outside the training data, which can prevent self-driving vehicles from safely adapting to unseen scenarios. If human drivers made decisions based only on what they perceive, every street corner and intersection would be plagued by accidents. Current machine learning methods often prove weak when they are required to generalize beyond the training data. [iv]

Thinking with System 2: a slower process that requires more effort

Humans, by contrast, are very good at generalizing from a few examples and filling gaps in experience with System 2 thinking. When asked what happens after a bottle of red wine is thrown against a wall, we can answer with the utmost certainty that the bottle will shatter and the wall will be wet and stained red, without the need for any empirical proof. We know that a forceful impact between a fragile material and a hard surface typically ends with the former being altered, if not destroyed. Compared to current deep learning architectures based on accelerated computing, human reasoning capabilities are even more impressive when we factor in what the Nobel laureate and Turing Award winner Herbert Simon used to call bounded rationality, i.e., the notion that humans operate with limited knowledge and are subject to time constraints when they make decisions in the real world. For instance, learner drivers need only limited instruction to learn how to drive a car safely and to adapt their knowledge to novel situations.

Machines with human-like intelligence

As these arguments suggest, replicating the core aspects of human cognition at the computational level seems to be a sensible path towards machines that truly exhibit human-like intelligence. This is indeed the direction pursued by research in cognitive architectures [v], which are computational artifacts that attempt to capture the invariant mechanisms of human cognition, including those underlying the functions of control, learning, memory, adaptivity and action. Using a combination of symbolic and sub-symbolic methods, nowadays referred to as neuro-symbolic AI, cognitive architectures have for many years been applied to practical problems in critical domains where reasoning is key to successful task execution, e.g., reproducing the behavior of an airplane pilot, modeling the strategies of cyber defenders and attackers, or tutoring university-level students in mathematics.

The potential of cognitive architectures for industry is enormous, especially in sectors where both technical operators and ordinary users are increasingly aided by AI, e.g., manufacturing, smart environments or customer service support. However, without equipping cognitive architectures with powerful pattern recognition capabilities at scale, that potential could remain largely untapped.

In summary, the synergistic integration of cognitive architectures and generative models represents a necessary step to enable both System 1 and System 2 capabilities in machines.

But what would integration of both systems look like?

From October 25 to 27, 2023, a dedicated symposium of the Association for the Advancement of Artificial Intelligence will take place. Scientists from around the world, including the writer of this blog, Bosch Research expert Alessandro Oltramari, are gathering at the Westin Arlington Gateway in Arlington, Virginia, USA, to start tackling this question. Stay tuned for highlights and takeaways from this event.

What are your thoughts on this topic?

Please feel free to share them or to contact me directly.

Author: Alessandro Oltramari

Alessandro joined Bosch Research in 2016, after working as a postdoctoral fellow at Carnegie Mellon University, USA. At Bosch, he focuses on neuro-symbolic reasoning for decision support systems. Alessandro’s primary interest is to investigate how semantic resources can be integrated with data-driven algorithms, and help humans and machines make sense of the physical and digital worlds. Alessandro holds a PhD in Cognitive Science from the University of Trento, Italy.
