Reestablishing the Boundary Between Man and Machine 

Image by Google DeepMind

In the past couple of years, the terms “machine learning,” “artificial intelligence,” and “neural network” have become increasingly popular buzzwords. Everyone’s talking about how “artificial intelligence will replace humans” and how “artificial general intelligence is on the horizon.” Except, in what ways do these complex mathematical models actually mirror the brain? To what extent do they behave like one, and is it just imitation or actual emulation?

The name neural network (NN) is strikingly similar to biological terms like nerve or neuron. Unsurprisingly, this is no coincidence; many emerging machine learning models base their architecture on the brain. The interconnectivity of neurons, the flow of information: machine learning uses biological systems as its foundation. Except, to what degree are neural networks structurally similar to the biological brain, and where does the mimicry stop?

At the core of sentient life lies the brain, conveying information through electrical signals. When a neuron’s firing threshold is reached, an action potential cascades along its axon; at the synapse, it triggers the release of neurotransmitters into the synaptic cleft, the gap between the axon terminal and a neighboring neuron’s dendrite. The flow of information depends not only on signals moving throughout the brain, but also on how the constituent parts respond and interact. Additionally, depending on the task performed or the information conveyed, signals traverse different paths; synaptic strength and efficacy determine the ease of communication between neurons. While this is only a general overview of biological neural architecture, it is sufficient, in the context of machine learning, to begin understanding where the two systems intersect. Although there are similarities between the human brain and NNs, AI remains far simpler than the brain it imitates.

An artificial NN, in its simplest form, is a series of functions. A layer takes inputs, mathematically shapes them, and produces an output. The simplest example of an “NN” is one performing linear regression: fitting a line as close to some given points as possible. Here, the input, a point’s x position in the xy plane, is related to an output y via the equation y = wx + b (1), where w and b are constants. We call these constants the weight and bias, respectively. An NN has a node with this equation that adjusts these two constants, often referred to as parameters, to better fit the points in space. To get a better fit, the model computes the distance from the intended output: imagine we want the line to pass through the point (3, 6), but the model instead passes through (3, 4); we want to reshape the line so it moves two units higher, and the model adjusts its parameters accordingly. A bit of complicated math called gradient descent (the idea that, standing at a point on a function, we can figure out which direction to move to reduce its value, here our distance from the intended output, as quickly as possible) lets us optimize w and b. A full layer has many of these nodes, each computing a weighted sum of the inputs plus a bias. Additionally, a layer of nonlinear functions, called activation functions, takes the outputs of the previous layer so the model can fit data that isn’t linearly associated. For example, points tracing a parabolic trajectory can be fit by a curve ax² + bx + c, but not by a line of the form wx + b, so activation functions give the model that extra bit of expressive potential. For different tasks, functions other than (1) are used too, such as a convolution (f ∗ g)(t) for image processing, since it picks up on nuances in spatial features not captured by standard operators, or recurrent layers for language, which evaluate information across time and stay conscious of how later phrases in a passage can allude to ones that came far earlier. These, though, are not synonymous with biological systems; they’re more of a clever afterthought built on the existing architecture.
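
To make this concrete, here is a minimal sketch of the single-node “NN” described above, fitting y = wx + b to a handful of points with gradient descent. The toy dataset, learning rate, and step count are illustrative choices for this sketch, not anything from a specific library or paper.

```python
# A minimal sketch of a single-node "NN": fitting y = w*x + b to data
# points via gradient descent. All values here are illustrative.
import numpy as np

# Toy data: points roughly along y = 2x + 1, with a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
y = 2 * x + 1 + rng.normal(0, 0.3, size=x.shape)

w, b = 0.0, 0.0          # the two adjustable parameters
learning_rate = 0.01
n_steps = 1000

for _ in range(n_steps):
    y_pred = w * x + b                 # the node's output, equation (1)
    error = y_pred - y                 # distance from the intended output
    # Gradients of the mean squared error with respect to w and b:
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step "downhill" to reduce the error -- this is gradient descent.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should land near w=2, b=1
```

Each loop iteration is one “learning” step: compute how far the line is from the points, then nudge w and b downhill. A real network repeats the same idea across millions of parameters at once, with activation functions between layers so it can fit curves as well as lines.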

Looking at NNs, there are obvious connections to the structural elements of the brain. The inputs to individual nodes are like dendrites receiving signals, shaping them, and passing them on to later neurons. The formula (1) resembles synaptic strength, modeling how strong the connection between two points is; in fact, w is denoted the weight because we consider it the weight of the connection between two neurons, or how much information is shaped during its transition. Activation functions model firing thresholds, rapidly increasing, decreasing, or flattening different signals depending on the neuron, albeit more abstractly. The outputs are much like the axon carrying information to the next neuron, data flowing from one place to the next. Additionally, the continual adjustment of parameters like weight and bias models the brain’s actual learning and neuroplasticity, strengthening or weakening connections with experience (data). Furthermore, in many models, the layers of neurons tend to be structured hierarchically, like vision models that start by identifying edges, then shapes, then objects, much like the brain does in cortical processing.
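
As a rough illustration of how those analogies line up, here is a minimal forward pass through a tiny two-layer network, with comments mapping each piece to its biological counterpart. The numbers and shapes are arbitrary, chosen only for the sketch.

```python
# A minimal forward pass through a two-layer network, annotated with
# the brain analogies above. Values and shapes are illustrative only.
import numpy as np

def relu(z):
    # Activation function: a crude stand-in for a firing threshold --
    # the "neuron" outputs nothing until its input crosses zero.
    return np.maximum(0, z)

x = np.array([0.5, -1.2, 3.0])     # incoming signals ("dendrites")

W1 = np.array([[0.2, -0.5, 0.1],   # synaptic strengths (weights)
               [1.0, 0.3, -0.7]])
b1 = np.array([0.1, -0.2])

h = relu(W1 @ x + b1)              # weighted sum, then the threshold

W2 = np.array([[0.6, -0.4]])
b2 = np.array([0.05])
y = W2 @ h + b2                    # output carried to the "next neuron"
print(y)
```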

Except, while NNs set out to model the brain, they only mimic its complexity rather than match it. Even the most complex models have only billions of adjustable parameters, orders of magnitude fewer than the estimated 100 trillion synaptic connections in the brain. Neural networks are mathematically optimal, at least with the math we currently have access to, but they’re also slaves to it, only able to shape and change as their internal numbers do rather than flexing like a living being. The electrochemical systems and neurotransmitters of the brain behave far more dynamically than NNs, fluctuating naturally over time rather than relying on a lengthy mathematical calculation to correct themselves. Additionally, every node in a model mathematically responds to whatever input it receives, whereas the human brain can prevent a signal’s passage through a certain region. We haven’t devised architecture complex enough to completely shut off sections of a deployed model, so information flows through the entire network, even the portions specialized for subtasks irrelevant to the one at hand. Moreover, NNs are trained on datasets rather than adapting to real-world scenarios on the fly. We can teach neural networks to complete simple tasks, but they need immense volumes of data to learn; just look up the ARC Prize and see how simple its tasks are, yet how much AI struggles to solve them, relying on thousands upon thousands of training steps to complete elementary puzzles. The reality is that NNs are still far from the complexity of the human brain, unable to gain mastery over many subjects as living beings do, instead specializing in one category. Some attempts have been made to apply our understanding of the brain to refine machine learning, but progress has been slow. Recently, a paper on the “Baby Dragon Hatchling” was published, mimicking the brain through spiking neural networks that incorporate neuronal and synaptic state while baking in a concept of time with “Hebbian-like” (neurons that fire together wire together) update rules. And while it initially generated a bit of buzz, people lost interest when they realized it was just another model claiming to mimic the brain while falling quite short. Regardless, it’s incredible to know that people are trying, as the Baby Dragon Hatchling does perform better than its “brain-like” predecessors. Perhaps in the beauty of the brain lies the key to advancing AI.
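
For a sense of what “Hebbian-like” means in code, here is a generic textbook form of the rule, where a connection strengthens whenever the two neurons it joins are active together. To be clear, this is a common illustrative formulation, not the specific update used in the Baby Dragon Hatchling paper.

```python
# A minimal, generic Hebbian update sketch: the weight between two
# neurons grows when their activities coincide ("fire together, wire
# together"). Not the actual Baby Dragon Hatchling mechanism.
import numpy as np

def hebbian_step(w, pre, post, lr=0.01, decay=0.001):
    """One Hebbian update for a weight matrix w.

    pre:  activity vector of the presynaptic layer
    post: activity vector of the postsynaptic layer
    The outer product strengthens weights between co-active pairs;
    the decay term keeps weights from growing without bound.
    """
    return w + lr * np.outer(post, pre) - decay * w

# Illustrative use: two co-active neurons strengthen their connection.
w = np.zeros((2, 2))
pre = np.array([1.0, 0.0])
post = np.array([1.0, 0.0])
for _ in range(100):
    w = hebbian_step(w, pre, post)
print(w)  # w[0, 0] grows; the other entries stay near zero
```

Note that, unlike gradient descent, nothing here computes a distance from an intended output; the weights change purely from local activity, which is part of what makes such rules feel closer to biology.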
