A Low-Energy Implementation of Finite Automata by Optimal-Size Neural Nets

N/A
N/A
Protected

Academic year: 2022

Podíl "A Low-Energy Implementation of Finite Automata by Optimal-Size Neural Nets"

Copied!
8
0
0

Načítání.... (zobrazit plný text nyní)

Fulltext

(1)

Automata by Optimal-Size Neural Nets

Jiří Šíma⋆

Institute of Computer Science, Academy of Sciences of the Czech Republic, P. O. Box 5, 18207 Prague 8, Czech Republic, sima@cs.cas.cz

Abstract. Recently, a new so-called energy complexity measure has been introduced and studied for feedforward perceptron networks. This measure is inspired by the fact that biological neurons require more energy to transmit a spike than not to fire, and that the activity of neurons in the brain is quite sparse, with only about 1% of neurons firing. We investigate the energy complexity for recurrent networks, which bounds the number of active neurons at any time instant of a computation. We prove that any deterministic finite automaton with $m$ states can be simulated by a neural network of optimal size $s=\Theta(\sqrt{m})$ with time overhead $O(s/e)$ per one input bit, using energy $O(e)$, for any $e=\Omega(\log s)$ and $e=O(s)$, which shows a time-energy tradeoff in recurrent networks.

1 Introduction

In biological neural networks the energy cost of a firing neuron is relatively high while the energy supplied to the brain is limited, and hence the activity of neurons in the brain is quite sparse, with only about 1% of neurons firing [4]. This is in contrast to artificial neural networks in which, on average, every second unit fires during a computation. This fact has recently motivated the definition of a new complexity measure for feedforward perceptron networks (threshold circuits), the so-called energy complexity [11], which is the maximum number of units in the network which output 1, taken over all the inputs to the circuit. The energy has been shown to be closely related by tradeoff results to other complexity measures such as the network size (i.e., the number of neurons) [13, 15], the circuit depth (i.e., parallel computational time) [12, 13], and the fan-in (i.e., the maximum number of inputs to a single unit) [10]. In addition, energy complexity has found its use in circuit complexity, e.g. as a tool for proving lower bounds [14].

In this paper, we investigate for the first time the energy complexity of recurrent neural networks, which we define to be the maximum number of neurons outputting 1 at any time instant, taken over all possible computations. It has been known for a long time that the computational power of binary-state recurrent networks corresponds to that of finite automata, since a network of $s$ units can reach only a finite number (at most $2^s$) of different states [8]. A simple way of simulating a given deterministic finite automaton $A$ with $m$ states by a neural network $N$ of size $O(m)$ is to implement each of the $2m$ transitions of $A$ (having a 0- and a 1-transition for each state) by a single unit in $N$ which checks whether the input bit agrees with the respective type of transition [6]. Clearly, this simple linear-size implementation of finite automata requires only constant energy.

⋆ Research was supported by the projects GA ČR P202/10/1333 and RVO: 67985807.
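For illustration, the following Python sketch (a hypothetical toy example, not code from the paper) spells out this one-unit-per-transition idea: a transition unit fires exactly when its source state is active and the current input bit matches its transition type.

```python
# A minimal sketch of the linear-size simulation: one unit per transition of A.
# The parity automaton below is a made-up example, not taken from the paper.

def step(active_state, bit, delta):
    """Simulate one update of the 2m transition units.

    The unit for transition (q, b) fires iff state q is currently active and
    the input bit equals b; firing units then activate their target states.
    """
    next_active = set()
    for (q, b), q_next in delta.items():
        # threshold test: indicator(q active) + indicator(bit == b) >= 2
        if (q == active_state) + (bit == b) >= 2:
            next_active.add(q_next)
    assert len(next_active) == 1          # A is deterministic
    return next_active.pop()

# hypothetical automaton: parity of the number of 1s seen so far
delta = {("even", 0): "even", ("even", 1): "odd",
         ("odd", 0): "odd", ("odd", 1): "even"}

state = "even"
for x in [1, 0, 1, 1]:
    state = step(state, x, delta)
print(state)  # -> "odd"
```

At any time only the transition units of the current state can fire, and exactly one of them does, which is why the energy of this naive construction stays constant.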

Much effort has been devoted to reducing the size of neural automata (e.g. [1–3, 9]), and indeed, neural networks of size $\Theta(\sqrt{m})$ implementing a given deterministic finite automaton with $m$ states were proposed and proven to be size-optimal [2, 3]. A natural question arises: what is the energy consumption when simulating finite automata by optimal-size neural networks? We answer this question by proving a tradeoff between the energy and the time overhead of the simulation.

In particular, we prove that an optimal-size neural network of $s=\Theta(\sqrt{m})$ units can be constructed to simulate a deterministic finite automaton with $m$ states using energy $O(e)$, for any $e=\Omega(\log s)$ and $e=O(s)$, while the time overhead for processing one input bit is $O(s/e)$. For this purpose, we adapt the asymptotically optimal method of threshold circuit synthesis due to Lupanov [5].

This paper is organized as follows. In Section 2, the main result is formulated after a brief review of the basic definitions. The subsequent two sections are devoted to the technical proof: Section 3 deals with a decomposition of the transition function and Section 4 describes the construction of low-energy neural automata. Section 5 concludes with some remarks on lower bounds on the energy complexity of neural network automata.

2 Neural Networks as Finite Automata

We will first specify the model of a recurrent neural network $N$. The network consists of $s$ units (neurons), indexed as $V=\{1,\ldots,s\}$, where $s$ is called the network size. The units are connected into an oriented graph representing the architecture of $N$, in which each edge $(i,j)$ leading from unit $i$ to $j$ is labeled with an integer weight $w(i,j)$. The absence of a connection within the architecture corresponds to a zero weight between the respective neurons, and vice versa.

The computational dynamics of $N$ determines for each unit $j\in V$ its binary state (output) $y_j^{(t)}\in\{0,1\}$ at discrete time instants $t=0,1,2,\ldots$. We say that neuron $j$ is active (fires) at time $t$ if $y_j^{(t)}=1$, while $j$ is passive for $y_j^{(t)}=0$. This establishes the network state $\mathbf{y}^{(t)}=(y_1^{(t)},\ldots,y_s^{(t)})\in\{0,1\}^s$ at each discrete time instant $t\geq 0$. At the beginning of a computation, $N$ is placed in an initial state $\mathbf{y}^{(0)}$. At discrete time instant $t\geq 0$, the excitation of any neuron $j\in V$ is defined as $\xi_j^{(t)}=\sum_{i=1}^{s}w(i,j)y_i^{(t)}-h(j)$, including an integer threshold $h(j)$ local to unit $j$. At the next instant $t+1$, the neurons $j\in\alpha_{t+1}$ from a selected subset $\alpha_{t+1}\subseteq V$ update their states $y_j^{(t+1)}=H(\xi_j^{(t)})$ in parallel by applying the Heaviside function $H(\xi)$, which is defined to be 1 for $\xi\geq 0$ and 0 for $\xi<0$. The remaining units $j\in V\setminus\alpha_{t+1}$ do not change their outputs, that is, $y_j^{(t+1)}=y_j^{(t)}$ for $j\notin\alpha_{t+1}$. In this way, the new network state $\mathbf{y}^{(t+1)}$ at time $t+1$ is determined.

We define the energy complexity of $N$ to be the maximum number of active units $\sum_{j=1}^{s}y_j^{(t)}$ at any time instant $t\geq 0$, taken over all computations of $N$.
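The following sketch (an illustrative reading of these definitions, with made-up toy weights and an arbitrary update schedule) applies the Heaviside rule to a selected subset $\alpha_t$ at each step and records the energy, i.e. the maximum number of simultaneously active units, along the computation.

```python
import numpy as np

def run(W, h, y0, schedule):
    """Discrete-time dynamics of a binary-state recurrent network.

    W[i, j] is the weight of edge (i, j), h[j] the threshold of unit j,
    y0 the initial state, and schedule a list of update sets alpha_t.
    Returns the final state and the energy (max number of active units).
    """
    y = np.array(y0, dtype=int)
    energy = int(y.sum())
    for alpha in schedule:
        xi = y @ W - h                            # excitations of all units
        y_new = y.copy()
        for j in alpha:                           # only units in alpha_t update
            y_new[j] = 1 if xi[j] >= 0 else 0     # Heaviside activation
        y = y_new
        energy = max(energy, int(y.sum()))
    return y, energy

# toy 3-unit network; weights, thresholds and schedule are arbitrary examples
W = np.array([[0, 2, -1],
              [1, 0,  1],
              [0, 1,  0]])
h = np.array([1, 1, 2])
y, e = run(W, h, y0=[1, 0, 0], schedule=[{1}, {2}, {0, 1}])
print(y, e)                                       # -> [1 1 0] 2
```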


The computational power of recurrent neural networks has been studied analogously to the traditional models of computation, so that the networks are exploited as acceptors of formal languages $L\subseteq\{0,1\}^*$ over the binary alphabet. For the finite networks that are to recognize regular languages, the following input/output protocol has been used [1–3, 7–9]. A binary input word (string) $\mathbf{x}=x_1\ldots x_n\in\{0,1\}^n$ of arbitrary length $n\geq 0$ is sequentially presented to the network bit by bit via an input neuron $\mathrm{in}\in V$. The state of this unit is externally set (and clamped) to the respective input bits at prescribed time instants, regardless of any influence from the remaining neurons in the network, that is, $y_{\mathrm{in}}^{(\tau(i-1))}=x_i$ for $i=1,\ldots,n$, where an integer parameter $\tau\geq 1$ is the period or time overhead for processing a single input bit. Then, an output neuron $\mathrm{out}\in V$ signals at time $\tau n$ whether the input word belongs to the underlying language $L$, that is, $y_{\mathrm{out}}^{(\tau n)}=1$ for $\mathbf{x}\in L$, whereas $y_{\mathrm{out}}^{(\tau n)}=0$ for $\mathbf{x}\notin L$.

Now we can formulate our main result concerning a low-energy implementation of finite automata by optimal-size neural nets:

Theorem 1. A given deterministic finite automaton $A$ with $m$ states can be simulated by a neural network $N$ of optimal size $s=\Theta(\sqrt{m})$ neurons with time overhead $O(s/e)$ per one input bit, using energy $O(e)$, where $e$ is any function satisfying $e=\Omega(\log s)$ and $e=O(s)$.

Proof. A set $Q$ of $m$ states of a given deterministic finite automaton $A$ can be arbitrarily enumerated so that each $q\in Q$ is binary encoded using $p=\lceil\log m\rceil+1$ bits, including one additional bit which indicates the final states. Then, the respective transition function $\delta:Q\times\{0,1\}\longrightarrow Q$ of $A$, producing its new state $q_{\mathrm{new}}=\delta(q_{\mathrm{old}},x)\in Q$ from the old state $q_{\mathrm{old}}\in Q$ and input bit $x\in\{0,1\}$, can be viewed as a vector Boolean function $f:\{0,1\}^{p+1}\longrightarrow\{0,1\}^p$ in terms of the binary encoding of states. In the following two sections we will adapt the asymptotically optimal method of threshold circuit synthesis due to Lupanov [5] to implement $f$ by a low-energy recurrent neural network.
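A minimal sketch of this reduction, on a hypothetical two-state automaton, encodes each state on $p=\lceil\log m\rceil+1$ bits (the extra bit flags final states) and exposes $\delta$ as a Boolean vector function $f$ on $p+1$ bits:

```python
from math import ceil, log2

def encode_automaton(states, finals, delta):
    """Turn a DFA into a Boolean vector function f: {0,1}^(p+1) -> {0,1}^p.

    Each state q gets a p-bit code: ceil(log m) bits for its index plus one
    extra bit indicating whether q is final.  f maps (code of q, input bit x)
    to the code of delta(q, x).
    """
    m = len(states)
    p = ceil(log2(m)) + 1
    index = {q: i for i, q in enumerate(states)}

    def code(q):
        bits = [(index[q] >> i) & 1 for i in range(p - 1)]
        return tuple(bits + [1 if q in finals else 0])

    def f(code_and_bit):
        *qcode, x = code_and_bit
        q = states[sum(b << i for i, b in enumerate(qcode[:-1]))]
        return code(delta[(q, x)])

    return p, code, f

# hypothetical automaton accepting words with an odd number of 1s
states = ["even", "odd"]
delta = {("even", 0): "even", ("even", 1): "odd",
         ("odd", 0): "odd", ("odd", 1): "even"}
p, code, f = encode_automaton(states, {"odd"}, delta)
print(p, code("even"), f((*code("even"), 1)))   # -> 2 (0, 0) (1, 1)
```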

3 The Transition Function Decomposition

The $p+1$ arguments of the vector function $f(\mathbf{u},\mathbf{v},\mathbf{z})$ are split into three groups $\mathbf{u}=(u_1,\ldots,u_{p_1})$, $\mathbf{v}=(v_1,\ldots,v_{p_2})$, and $\mathbf{z}=(z_1,\ldots,z_{p_3})$, respectively, where $p_1=\lfloor(p+1-\log p-\log(p+1-\log p))/2\rfloor$, $p_3=\lfloor\log(p+1-\log p)-2\rfloor$, and $p_2=p+1-p_3-p_1$. Then each function element $f_k:\{0,1\}^{p+1}\longrightarrow\{0,1\}$ ($1\leq k\leq p$) of the vector function $f=(f_1,\ldots,f_p)$ is decomposed to
\[
f_k(\mathbf{u},\mathbf{v},\mathbf{z})=\bigvee_{\mathbf{c}\in\{0,1\}^{p_3}}\left(f_k(\mathbf{u},\mathbf{v},\mathbf{c})\wedge\bigwedge_{j=1}^{p_3}\ell^{c_j}(z_j)\right),\qquad(1)
\]
where the respective literals are defined as $\ell^c(z)=z$ for $c=1$ and $\ell^c(z)=\neg z$ for $c=0$. Furthermore, we define vector functions $g_k:\{0,1\}^{p_1+p_2}\longrightarrow\{0,1\}^{p_1}$ for $k=1,\ldots,p$ as

\[
g_k(\mathbf{u},\mathbf{v})=\bigl(f_k(\mathbf{u},\mathbf{v},[0]_{p_3}),f_k(\mathbf{u},\mathbf{v},[1]_{p_3}),\ldots,f_k(\mathbf{u},\mathbf{v},[2^{p_3}-1]_{p_3}),0,\ldots,0\bigr)\qquad(2)
\]


where $[j]_n=\mathbf{c}=(c_1,\ldots,c_n)\in\{0,1\}^n$ denotes the $n$-bit binary representation of an integer $j\geq 0$, that is, $j=\langle\mathbf{c}\rangle=\sum_{i=1}^{n}2^{i-1}c_i$. The vector produced by $g_k$ in (2) has $p_1$ elements, out of which the first $2^{p_3}$ items are defined using $f_k$ for all possible values of the argument $\mathbf{z}\in\{0,1\}^{p_3}$, while the remaining ones are 0s, which is a correct definition since $2^{p_3}<p_1$ for sufficiently large $p$.
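The next sketch computes the split and exhaustively verifies that (1) and (2) reproduce $f_k$; the component function and the small group sizes used for the check are made-up toy choices, since the formulas for $p_1,p_2,p_3$ only pay off for larger $p$.

```python
from itertools import product
from math import floor, log2

def split(p):
    """The split of the p+1 arguments into groups of sizes p1, p2, p3."""
    p1 = floor((p + 1 - log2(p) - log2(p + 1 - log2(p))) / 2)
    p3 = floor(log2(p + 1 - log2(p)) - 2)
    p2 = p + 1 - p3 - p1
    return p1, p2, p3

def check_decomposition(fk, p1, p2, p3):
    """Verify (1) and (2): fk(u,v,z) equals the <z>-th entry of g_k(u,v)."""
    assert 2 ** p3 < p1                            # padding in (2) is well defined
    for u in product((0, 1), repeat=p1):
        for v in product((0, 1), repeat=p2):
            # g_k(u,v) as in (2): fk on every c in {0,1}^p3, padded with 0s
            g = [fk(u, v, c) for c in product((0, 1), repeat=p3)]
            g += [0] * (p1 - len(g))
            for z in product((0, 1), repeat=p3):
                j = sum(zi << i for i, zi in enumerate(z))        # j = <z>
                rhs = any(fk(u, v, c) and z == c                  # formula (1)
                          for c in product((0, 1), repeat=p3))
                assert fk(u, v, z) == int(rhs) == g[j]

def fk(u, v, z):                                   # a toy component function
    return (sum(u) + sum(v) + sum(z)) % 2

print("split for p = 21:", split(21))              # sizes for up to 2^20 states
check_decomposition(fk, p1=4, p2=3, p3=1)          # small toy sizes for a fast check
print("decomposition (1)-(2) verified for the toy fk")
```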

Denote $r=p_1-1$. For each $g_k$ ($1\leq k\leq p$), we will construct four vector functions $g_k^a:\{0,1\}^{r+p_2}\longrightarrow\{0,1\}^{p_1}$ and $h_k^a:\{0,1\}^{r+p_2}\longrightarrow\{0,1\}^{p_1}$ for $a\in\{0,1\}$ such that
\[
g_k(a,\mathbf{u}',\mathbf{v})=g_k^a(\mathbf{u}',\mathbf{v})\oplus h_k^a(\mathbf{u}',\mathbf{v})\qquad(3)
\]
for any $a\in\{0,1\}$, $\mathbf{u}'\in\{0,1\}^{r}$, and $\mathbf{v}\in\{0,1\}^{p_2}$, where $\oplus$ denotes the bitwise parity (i.e., $\mathbf{z}=\mathbf{x}\oplus\mathbf{y}\in\{0,1\}^n$ is defined for vectors $\mathbf{x}=(x_1,\ldots,x_n)\in\{0,1\}^n$, $\mathbf{y}=(y_1,\ldots,y_n)\in\{0,1\}^n$, and $\mathbf{z}=(z_1,\ldots,z_n)\in\{0,1\}^n$ as $z_i=1$ iff $x_i\neq y_i$ for every $i=1,\ldots,n$), which is an associative operation. In addition, the construction will guarantee that for any $a\in\{0,1\}$, $\mathbf{v}\in\{0,1\}^{p_2}$, and $\mathbf{u}_1',\mathbf{u}_2'\in\{0,1\}^{r}$,
\[
\text{if } \mathbf{u}_1'\neq\mathbf{u}_2', \text{ then } g_k^a(\mathbf{u}_1',\mathbf{v})\neq g_k^a(\mathbf{u}_2',\mathbf{v}) \text{ and } h_k^a(\mathbf{u}_1',\mathbf{v})\neq h_k^a(\mathbf{u}_2',\mathbf{v}).\qquad(4)
\]
For any $\mathbf{v}\in\{0,1\}^{p_2}$, the function values of $g_k^a$ are defined inductively: $g_k^a([i]_r,\mathbf{v})\in\{0,1\}^{p_1}\setminus G_k^a(i,\mathbf{v})$ is chosen arbitrarily for $i=0,\ldots,2^r-1$, where
\[
G_k^a(i,\mathbf{v})=\{g_k^a([j]_r,\mathbf{v}),\; g_k(a,[i]_r,\mathbf{v})\oplus g_k(a,[j]_r,\mathbf{v})\oplus g_k^a([j]_r,\mathbf{v})\mid j=0,\ldots,i-1\},\qquad(5)
\]
and the functions $h_k^a$ are defined so that equation (3) is met:
\[
h_k^a(\mathbf{u}',\mathbf{v})=g_k(a,\mathbf{u}',\mathbf{v})\oplus g_k^a(\mathbf{u}',\mathbf{v}).\qquad(6)
\]
Note that $\emptyset=G_k^a(0,\mathbf{v})\subseteq G_k^a(1,\mathbf{v})\subseteq\cdots\subseteq G_k^a(2^r-1,\mathbf{v})$ and $|G_k^a(i,\mathbf{v})|\leq 2i$ according to (5), which implies $|G_k^a(i,\mathbf{v})|\leq|G_k^a(2^r-1,\mathbf{v})|\leq 2(2^r-1)$. Hence, $|\{0,1\}^{p_1}\setminus G_k^a(i,\mathbf{v})|\geq 2^{p_1}-2(2^r-1)=2$, which ensures that $g_k^a(\mathbf{u}',\mathbf{v})$ is correctly defined for all arguments $\mathbf{u}'\in\{0,1\}^{r}$. Moreover, condition (4) is satisfied because for any $i,j\in\{0,\ldots,2^r-1\}$ such that $i>j$, definition (5) secures $g_k^a([i]_r,\mathbf{v})\neq g_k^a([j]_r,\mathbf{v})$ and $h_k^a([i]_r,\mathbf{v})=g_k(a,[i]_r,\mathbf{v})\oplus g_k^a([i]_r,\mathbf{v})\neq g_k(a,[i]_r,\mathbf{v})\oplus g_k(a,[i]_r,\mathbf{v})\oplus g_k(a,[j]_r,\mathbf{v})\oplus g_k^a([j]_r,\mathbf{v})=h_k^a([j]_r,\mathbf{v})$ by using (6) and the fact that $\mathbf{x}\oplus\mathbf{x}\oplus\mathbf{y}=\mathbf{y}$.
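A direct transcription of the inductive definition (5)–(6) into Python, on made-up toy parameters with $p_1=r+1$ and a hypothetical $g_k$, makes properties (3) and (4) concrete:

```python
from itertools import product

def bits(j, n):                 # [j]_n, least significant bit first
    return tuple((j >> i) & 1 for i in range(n))

def xor(x, y):                  # bitwise parity of two 0/1 tuples
    return tuple(a ^ b for a, b in zip(x, y))

def build_ga_ha(gk, a, r, p1, p2):
    """Construct g^a_k and h^a_k so that g_k(a,u',v) = g^a_k XOR h^a_k (3)
    and both are injective in u' for every fixed v (property (4))."""
    ga, ha = {}, {}
    for v in product((0, 1), repeat=p2):
        for i in range(2 ** r):
            u_i = bits(i, r)
            forbidden = set()                       # the set G^a_k(i, v) of (5)
            for j in range(i):
                u_j = bits(j, r)
                forbidden.add(ga[u_j, v])
                forbidden.add(xor(xor(gk(a, u_i, v), gk(a, u_j, v)), ga[u_j, v]))
            # pick any value outside G^a_k(i, v); at least two choices remain
            choice = next(c for c in product((0, 1), repeat=p1)
                          if c not in forbidden)
            ga[u_i, v] = choice
            ha[u_i, v] = xor(gk(a, u_i, v), choice)  # definition (6)
    return ga, ha

# toy parameters and toy g_k (hypothetical, just to exercise the construction)
r, p1, p2 = 2, 3, 1
def gk(a, u, v):
    return bits((a + 2 * sum(u) + 3 * sum(v)) % (2 ** p1), p1)

ga, ha = build_ga_ha(gk, a=1, r=r, p1=p1, p2=p2)
for v in product((0, 1), repeat=p2):
    us = list(product((0, 1), repeat=r))
    assert all(xor(ga[u, v], ha[u, v]) == gk(1, u, v) for u in us)            # (3)
    assert len({ga[u, v] for u in us}) == len({ha[u, v] for u in us}) == 2 ** r  # (4)
print("properties (3) and (4) hold for the toy g_k")
```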

We further decompose $g_k^a$ and $h_k^a$ by using functions $\varphi_k^a:\{0,1\}^{r+p_2}\longrightarrow\{0,\ldots,2^{p_1}-1\}$ and $\psi_k^a:\{0,1\}^{r+p_2}\longrightarrow\{0,\ldots,2^{p_1}-1\}$ as
\[
g_k^a(\mathbf{u}',\mathbf{v})=[\varphi_k^a(\mathbf{u}',\mathbf{v})]_{p_1}\quad\text{and}\quad h_k^a(\mathbf{u}',\mathbf{v})=[\psi_k^a(\mathbf{u}',\mathbf{v})]_{p_1},\qquad(7)
\]
respectively, which satisfy, for any $a\in\{0,1\}$, $\mathbf{v}\in\{0,1\}^{p_2}$, and $\mathbf{u}_1',\mathbf{u}_2'\in\{0,1\}^{r}$,
\[
\text{if } \mathbf{u}_1'\neq\mathbf{u}_2', \text{ then } \varphi_k^a(\mathbf{u}_1',\mathbf{v})\neq\varphi_k^a(\mathbf{u}_2',\mathbf{v}) \text{ and } \psi_k^a(\mathbf{u}_1',\mathbf{v})\neq\psi_k^a(\mathbf{u}_2',\mathbf{v})\qquad(8)
\]
according to (4). Now we can plug (2), (3), and (7) into (1), which results in

\[
f_k(a,\mathbf{u}',\mathbf{v},\mathbf{z})=\bigvee_{\mathbf{c}\in\{0,1\}^{p_3}}\left((g_k(a,\mathbf{u}',\mathbf{v}))_{\langle\mathbf{c}\rangle}\wedge\bigwedge_{i=1}^{p_3}\ell^{c_i}(z_i)\right)
\]
\[
=\bigvee_{\mathbf{c}\in\{0,1\}^{p_3}}\left(\left(\bigl([\varphi_k^a(\mathbf{u}',\mathbf{v})]_{p_1}\bigr)_{\langle\mathbf{c}\rangle}\wedge\neg\bigl([\psi_k^a(\mathbf{u}',\mathbf{v})]_{p_1}\bigr)_{\langle\mathbf{c}\rangle}\wedge\bigwedge_{i=1}^{p_3}\ell^{c_i}(z_i)\right)\vee\left(\neg\bigl([\varphi_k^a(\mathbf{u}',\mathbf{v})]_{p_1}\bigr)_{\langle\mathbf{c}\rangle}\wedge\bigl([\psi_k^a(\mathbf{u}',\mathbf{v})]_{p_1}\bigr)_{\langle\mathbf{c}\rangle}\wedge\bigwedge_{i=1}^{p_3}\ell^{c_i}(z_i)\right)\right),\qquad(9)
\]
where $(\mathbf{x})_i$ denotes the $i$th element of the vector $\mathbf{x}$.

4 The Finite Automaton Implementation

In this section, we describe the construction of a low-energy recurrent neural network $N$ simulating a given finite automaton $A$. In particular, the set of neurons $V$ is composed of four disjoint layers $V=\nu_0\cup\nu_1\cup\nu_2\cup\nu_3$. The current state of $A$ and an input bit are stored using $p+1$ neurons which constitute layer $\nu_0$. Thus, the set $\nu_0$ includes the input neuron $\mathrm{in}\in\nu_0$ and the output neuron $\mathrm{out}\in\nu_0$, which saves the bit (in the state encoding) that indicates the final states. We will implement formula (9) in $N$ for evaluating the transition function $f$ in terms of the binary encoding of states in order to compute the new state of $A$. Layer $\nu_0=\{\mathrm{in}\}\cup\nu_{01}\cup\nu_{02}\cup\nu_{03}$ is disjointly split into four parts corresponding to the partition of the arguments of $f(a,\mathbf{u}',\mathbf{v},\mathbf{z})$, respectively, that is, $\nu_{01}=\{u_1,\ldots,u_r\}$, $\nu_{02}=\{v_1,\ldots,v_{p_2}\}$, and $\nu_{03}=\{z_1,\ldots,z_{p_3}\}$.

The next layer $\nu_1=\nu_{11}\cup\nu_{12}$ consists of $2^{p_2}$ neurons in $\nu_{11}=\{\mu_{\langle\mathbf{b}\rangle}\mid\mathbf{b}\in\{0,1\}^{p_2}\}$ for computing all possible monomials $\bigwedge_{i=1}^{p_2}\ell^{b_i}(v_i)$ over the input variables $\mathbf{v}$, and two control units in $\nu_{12}=\{\kappa_0^0,\kappa_0^1\}$ which indicate the input bit value. This is implemented by weights $w(v_i,\mu_{\langle\mathbf{b}\rangle})=2b_i-1$ for $i=1,\ldots,p_2$, and threshold $h(\mu_{\langle\mathbf{b}\rangle})=\sum_{i=1}^{p_2}b_i$, for any $\mathbf{b}=(b_1,\ldots,b_{p_2})\in\{0,1\}^{p_2}$, so that $\mu_{\langle\mathbf{b}\rangle}$ fires iff $\mathbf{b}=\mathbf{v}$. In addition, we define $w(\mathrm{in},\kappa_0^1)=-w(\mathrm{in},\kappa_0^0)=1$ and $h(\kappa_0^1)=1$, $h(\kappa_0^0)=0$, which ensures that $y_{\mathrm{in}}=1$ iff $\kappa_0^1$ fires iff $\kappa_0^0$ is passive.
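As a quick sanity check on these weights (a standalone hypothetical snippet), a unit with weights $2b_i-1$ and threshold $\sum_i b_i$ indeed fires exactly when $\mathbf{v}=\mathbf{b}$:

```python
from itertools import product

def monomial_unit_fires(b, v):
    """Threshold unit mu_<b>: weights 2*b_i - 1 on v_i, threshold sum(b_i)."""
    excitation = sum((2 * bi - 1) * vi for bi, vi in zip(b, v)) - sum(b)
    return excitation >= 0

p2 = 3   # toy size of the v-group (an arbitrary choice for the check)
for b in product((0, 1), repeat=p2):
    for v in product((0, 1), repeat=p2):
        assert monomial_unit_fires(b, v) == (b == v)
print("mu_<b> fires exactly when v = b")
```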

Furthermore, layer $\nu_2=\nu_{21}\cup\nu_{22}$, where $\nu_{21}=\{\gamma_{kj}^{\varphi a},\lambda_{kj}^{\varphi a},\gamma_{kj}^{\psi a},\lambda_{kj}^{\psi a}\mid 1\leq k\leq p,\ a\in\{0,1\},\ j=0,\ldots,2^{p_1}-1\}$ and $\nu_{22}=\{\kappa_i^a\mid a\in\{0,1\},\ i=1,\ldots,d+1\}$ with $d=\lceil 2p2^{p_1}/e\rceil$, serves for a low-energy computation of the functions $\varphi_k^a(\mathbf{u}',\mathbf{v})$ and $\psi_k^a(\mathbf{u}',\mathbf{v})$. We will first show how to implement the functions $\varphi_k^a(\mathbf{u}',\mathbf{v})$ for any $1\leq k\leq p$ and $a\in\{0,1\}$ with no constraints on the energy, by using the outputs of neurons from $\nu_{01}$ and $\nu_{11}$. In particular, $2^{p_1}$ pairs of neurons $\gamma_{kj}^{\varphi a},\lambda_{kj}^{\varphi a}\in\nu_{21}$ for $j=0,\ldots,2^{p_1}-1$ are employed, having zero thresholds for now (their thresholds will be defined below for the low-energy implementation) and weights $w(u_i,\gamma_{kj}^{\varphi a})=-w(u_i,\lambda_{kj}^{\varphi a})=2^{i-1}$ for $i=1,\ldots,r$, and $w(\mu_{\langle\mathbf{b}\rangle},\lambda_{kj}^{\varphi a})=-w(\mu_{\langle\mathbf{b}\rangle},\gamma_{kj}^{\varphi a})=d_{kj}^{\varphi a\mathbf{b}}\in\{0,\ldots,2^r-1\}$ such that $j=\varphi_k^a([d_{kj}^{\varphi a\mathbf{b}}]_r,\mathbf{b})\in\{0,\ldots,2^{p_1}-1\}$ for $\mathbf{b}\in\{0,1\}^{p_2}$. Note that $d_{kj}^{\varphi a\mathbf{b}}$ is uniquely defined according to (8). It follows that for given $\mathbf{u}'\in\{0,1\}^r$ and $\mathbf{v}\in\{0,1\}^{p_2}$, neuron $\gamma_{kj}^{\varphi a}$ fires iff $\sum_{i=1}^{r}w(u_i,\gamma_{kj}^{\varphi a})y_{u_i}+\sum_{\mathbf{b}\in\{0,1\}^{p_2}}w(\mu_{\langle\mathbf{b}\rangle},\gamma_{kj}^{\varphi a})y_{\mu_{\langle\mathbf{b}\rangle}}\geq 0$ iff $\sum_{i=1}^{r}2^{i-1}u_i'-d_{kj}^{\varphi a\mathbf{v}}\geq 0$ iff $\langle\mathbf{u}'\rangle\geq d_{kj}^{\varphi a\mathbf{v}}$, since $y_{\mu_{\langle\mathbf{b}\rangle}}=1$ iff $\mathbf{b}=\mathbf{v}$. Similarly, neuron $\lambda_{kj}^{\varphi a}$ is active iff $\langle\mathbf{u}'\rangle\leq d_{kj}^{\varphi a\mathbf{v}}$. Hence, both neurons $\gamma_{kj}^{\varphi a}$ and $\lambda_{kj}^{\varphi a}$ fire at the same time iff $\langle\mathbf{u}'\rangle=d_{kj}^{\varphi a\mathbf{v}}$ iff $j=\varphi_k^a(\mathbf{u}',\mathbf{v})$, which implements the function $\varphi_k^a(\mathbf{u}',\mathbf{v})$. The functions $\psi_k^a(\mathbf{u}',\mathbf{v})$ for any $1\leq k\leq p$ and $a\in\{0,1\}$ are implemented analogously (replace $\varphi$ by $\psi$ above) using $2^{p_1}$ pairs of neurons $\gamma_{kj}^{\psi a},\lambda_{kj}^{\psi a}\in\nu_{21}$ for $j=0,\ldots,2^{p_1}-1$, that is, both units $\gamma_{kj}^{\psi a}$ and $\lambda_{kj}^{\psi a}$ are active iff $j=\psi_k^a(\mathbf{u}',\mathbf{v})$.
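The mechanism behind each pair is a two-sided threshold test: $\gamma$ fires iff $\langle\mathbf{u}'\rangle\geq d$ and $\lambda$ fires iff $\langle\mathbf{u}'\rangle\leq d$, so both fire exactly when $\langle\mathbf{u}'\rangle=d$. A small sketch with an arbitrary toy $r$ verifies this behaviour:

```python
from itertools import product

def code(u):                      # <u'> = sum of 2^(i-1) * u'_i
    return sum(2 ** i * ui for i, ui in enumerate(u))

def gamma_fires(u, d):            # gamma-type unit: fires iff <u'> >= d
    return code(u) - d >= 0

def lam_fires(u, d):              # lambda-type unit (mirrored weights): <u'> <= d
    return d - code(u) >= 0

r = 3                             # toy length of u' (arbitrary)
for u in product((0, 1), repeat=r):
    for d in range(2 ** r):
        both = gamma_fires(u, d) and lam_fires(u, d)
        assert both == (code(u) == d)
print("the pair (gamma, lambda) fires together exactly when <u'> = d")
```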

We employ the control units $\kappa_i^a\in\nu_{12}\cup\nu_{22}$ for $a\in\{0,1\}$ and $i=0,\ldots,d+1$ for synchronizing the computation of the functions $\varphi_k^a(\mathbf{u}',\mathbf{v}),\psi_k^a(\mathbf{u}',\mathbf{v})$ by the neurons from $\nu_{21}$, so that their energy consumption is bounded by $e+2$. For this purpose, we split the set $\nu_{21}=\nu_{21}^0\cup\nu_{21}^1$ into two parts $\nu_{21}^a=\{\gamma_{kj}^{\varphi a},\lambda_{kj}^{\varphi a},\gamma_{kj}^{\psi a},\lambda_{kj}^{\psi a}\mid 1\leq k\leq p,\ j=0,\ldots,2^{p_1}-1\}$ of size $4p2^{p_1}$ according to $a\in\{0,1\}$, and each such part is further partitioned into $d$ blocks of size at most $2e$, that is, $\nu_{21}^a=\bigcup_{i=1}^{d}\beta_i^a$ where $|\beta_i^a|\leq 2e$. In addition, we require for every $i=1,\ldots,d$ that if $\gamma_{kj}^{\varphi a}\in\beta_i^a$, then $\lambda_{kj}^{\varphi a}\in\beta_i^a$, and if $\gamma_{kj}^{\psi a}\in\beta_i^a$, then $\lambda_{kj}^{\psi a}\in\beta_i^a$. For any $1\leq i\leq d$ and $a\in\{0,1\}$, the neurons in block $\beta_i^a$ are activated by the control unit $\kappa_{i-1}^a$ using the weights $w(\kappa_{i-1}^a,j)=W$ for all $j\in\beta_i^a$, while all neurons $j\in\nu_{21}$ are blocked by thresholds $h(j)=W$, where $W=2^r$, if there is no support from a corresponding control unit. For the current input bit $y_{\mathrm{in}}=a\in\{0,1\}$, the control units $\kappa_0^a,\ldots,\kappa_{d+1}^a$ fire successively one by one, which is achieved by weights $w(\kappa_i^a,\kappa_{i+1}^a)=1$ for $i=0,\ldots,d$, $w(\kappa_i^a,\kappa_0^a)=-1$ for $i=0,\ldots,d+1$, and thresholds $h(\kappa_i^a)=1$ for $i=1,\ldots,d+1$. This ensures that only the neurons from one block $\beta_i^a$ of size at most $2e$ can fire at the same time. In fact, we know that just one unit of each pair $\gamma_{kj}^{\varphi a},\lambda_{kj}^{\varphi a}\in\beta_i^a$ or $\gamma_{kj}^{\psi a},\lambda_{kj}^{\psi a}\in\beta_i^a$ is active, except for the special pairs of both firing units $\gamma_{kj_\varphi}^{\varphi a},\lambda_{kj_\varphi}^{\varphi a}$ and $\gamma_{kj_\psi}^{\psi a},\lambda_{kj_\psi}^{\psi a}$ such that $\varphi_k^a(\mathbf{u}',\mathbf{v})=j_\varphi$ and $\psi_k^a(\mathbf{u}',\mathbf{v})=j_\psi$, respectively. Hence, the energy consumption of $\nu_{21}$ is bounded by $e+2$.
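A schematic version of this partition (toy parameters; blocks built by plain slicing, since the construction leaves the particular assignment free) checks that $d=\lceil 2p2^{p_1}/e\rceil$ blocks of at most $2e$ units suffice while keeping every $(\gamma,\lambda)$ pair inside a single block:

```python
from math import ceil

def partition_blocks(p, p1, e):
    """Partition the 2*p*2^p1 (gamma, lambda) pairs of one half nu_21^a into
    d = ceil(2*p*2^p1 / e) blocks of at most 2e units, pairs kept together."""
    pairs = [(f, k, j) for f in ("phi", "psi")
             for k in range(1, p + 1) for j in range(2 ** p1)]
    d = ceil(2 * p * 2 ** p1 / e)
    per_block = ceil(len(pairs) / d)               # at most e pairs per block
    blocks = [pairs[i * per_block:(i + 1) * per_block] for i in range(d)]
    assert all(2 * len(block) <= 2 * e for block in blocks)
    return blocks

# hypothetical sizes: p state bits, 2^p1 candidate values, energy budget e
blocks = partition_blocks(p=4, p1=3, e=8)
print(len(blocks), "blocks, at most", 2 * max(map(len, blocks)), "units enabled at once")
```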

Finally, we must also guarantee that the resulting function values $\varphi_k^a(\mathbf{u}',\mathbf{v})=j_\varphi$, $\psi_k^a(\mathbf{u}',\mathbf{v})=j_\psi$ are stored, that is, the neurons $\gamma_{kj_\varphi}^{\varphi a},\lambda_{kj_\varphi}^{\varphi a},\gamma_{kj_\psi}^{\psi a},\lambda_{kj_\psi}^{\psi a}$ remain active without any support from the corresponding control units until all blocks perform their computation, which is indicated by the control unit $\kappa_{d+1}^a$. Neuron $\kappa_{d+1}^a$ then resets all neurons in $\nu_{21}$ before becoming itself passive. This is implemented by symmetric weights $w(\gamma_{kj}^{\varphi a},\lambda_{kj}^{\varphi a})=w(\lambda_{kj}^{\varphi a},\gamma_{kj}^{\varphi a})=w(\gamma_{kj}^{\psi a},\lambda_{kj}^{\psi a})=w(\lambda_{kj}^{\psi a},\gamma_{kj}^{\psi a})=W$ for $a\in\{0,1\}$, $k=1,\ldots,p$, $j=0,\ldots,2^{p_1}-1$, and $w(\kappa_{d+1}^a,j)=-W$ for all $j\in\nu_{21}$.

Layer $\nu_3=\{\pi_{k\langle\mathbf{c}\rangle},\varrho_{k\langle\mathbf{c}\rangle}\mid 1\leq k\leq p,\ \mathbf{c}\in\{0,1\}^{p_3}\}$ is composed of $2^{p_3}$ pairs of neurons $\pi_{k\langle\mathbf{c}\rangle},\varrho_{k\langle\mathbf{c}\rangle}$ for each $k=1,\ldots,p$, which compute $([\varphi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}\wedge\neg([\psi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}\wedge\bigwedge_{i=1}^{p_3}\ell^{c_i}(z_i)$ and $\neg([\varphi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}\wedge([\psi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}\wedge\bigwedge_{i=1}^{p_3}\ell^{c_i}(z_i)$ from (9), respectively, for the current input $y_{\mathrm{in}}=a\in\{0,1\}$, by using the states of the neurons from $\nu_{03}$ and the outputs of the units $\gamma_{kj}^{\varphi a},\lambda_{kj}^{\varphi a},\gamma_{kj}^{\psi a},\lambda_{kj}^{\psi a}\in\nu_{21}$ for $j=0,\ldots,2^{p_1}-1$ after $\kappa_{d+1}^a$ fires. For $\mathbf{c}\in\{0,1\}^{p_3}$, we define the weights $w(\gamma_{kj}^{\varphi a},\pi_{k\langle\mathbf{c}\rangle})=w(\lambda_{kj}^{\varphi a},\pi_{k\langle\mathbf{c}\rangle})=-w(\gamma_{kj}^{\psi a},\pi_{k\langle\mathbf{c}\rangle})=-w(\lambda_{kj}^{\psi a},\pi_{k\langle\mathbf{c}\rangle})=-w(\gamma_{kj}^{\varphi a},\varrho_{k\langle\mathbf{c}\rangle})=-w(\lambda_{kj}^{\varphi a},\varrho_{k\langle\mathbf{c}\rangle})=w(\gamma_{kj}^{\psi a},\varrho_{k\langle\mathbf{c}\rangle})=w(\lambda_{kj}^{\psi a},\varrho_{k\langle\mathbf{c}\rangle})=([j]_{p_1})_{\langle\mathbf{c}\rangle}$ for $a\in\{0,1\}$, $j=0,\ldots,2^{p_1}-1$, and $w(z_i,\pi_{k\langle\mathbf{c}\rangle})=w(z_i,\varrho_{k\langle\mathbf{c}\rangle})=2c_i-1$ for $i=1,\ldots,p_3$, and thresholds $h(\pi_{k\langle\mathbf{c}\rangle})=h(\varrho_{k\langle\mathbf{c}\rangle})=1+\sum_{i=1}^{p_3}c_i$. Hence, neuron $\pi_{k\langle\mathbf{c}\rangle}$ is active iff $([\varphi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}=1$ and $([\psi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}=0$ for $y_{\mathrm{in}}=a$, and $y_{z_i}=c_i$ for $i=1,\ldots,p_3$, since only one pair of neurons $\gamma_{kj_\varphi}^{\varphi a},\lambda_{kj_\varphi}^{\varphi a}$ for $0\leq j_\varphi\leq 2^{p_1}-1$ fires, such that $j_\varphi=\varphi_k^a(\mathbf{u}',\mathbf{v})$, and only one pair of units $\gamma_{kj_\psi}^{\psi a},\lambda_{kj_\psi}^{\psi a}$ for $0\leq j_\psi\leq 2^{p_1}-1$ is active, such that $j_\psi=\psi_k^a(\mathbf{u}',\mathbf{v})$, while the remaining units in $\nu_{21}$ are passive after $\kappa_{d+1}^a$ fires. Analogously, neuron $\varrho_{k\langle\mathbf{c}\rangle}$ fires iff $([\varphi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}=0$ and $([\psi_k^a(\mathbf{u}',\mathbf{v})]_{p_1})_{\langle\mathbf{c}\rangle}=1$ for $y_{\mathrm{in}}=a$, and $y_{z_i}=c_i$ for $i=1,\ldots,p_3$.

It follows that for any $1\leq k\leq p$, at most one unit among $\pi_{k\langle\mathbf{c}\rangle},\varrho_{k\langle\mathbf{c}\rangle}\in\nu_3$ for $\mathbf{c}\in\{0,1\}^{p_3}$ is active, which determines the value of $f_k(a,\mathbf{u}',\mathbf{v},\mathbf{z})$ for $y_{\mathrm{in}}=a$ according to (9). Thus, the binary encoding $f(a,\mathbf{u}',\mathbf{v},\mathbf{z})$ of the new state of automaton $A$ is computed as the disjunctions (9) for $k=1,\ldots,p$ by the units from $\nu_0\setminus\{\mathrm{in}\}$ (which rewrite the old state of $A$), using the recurrent connections leading from the neurons of $\nu_3$. After re-indexing the units in layer $\nu_0\setminus\{\mathrm{in}\}=\{1,\ldots,p\}$ properly, for each $k=1,\ldots,p$ we define weights $w(\pi_{k\langle\mathbf{c}\rangle},k)=w(\varrho_{k\langle\mathbf{c}\rangle},k)=1$ for every $\mathbf{c}\in\{0,1\}^{p_3}$, and threshold $h(k)=1$.

Now we specify the computational dynamics of the neural network $N$ simulating the finite automaton $A$. At the beginning, the states of the neurons from $\nu_0\setminus\{\mathrm{in}\}$ encode an initial state of $A$. Each bit $x_i$ ($1\leq i\leq n$) of the input word $\mathbf{x}=x_1\ldots x_n$, which is read by the input neuron $\mathrm{in}\in\nu_0$ at time instant $\tau(i-1)$ (i.e. $y_{\mathrm{in}}^{(\tau(i-1))}=x_i$), is processed by $N$ within the desired period of $\tau=d+4=O(p2^{p_1}/e)=O(\sqrt{2^p}/e)=O(\sqrt{m}/e)$ time steps. The states of the neurons in $N$ are successively updated in the order following the architecture of layers. Thus, we define the sets $\alpha_t$ of units updated at time instants $t\geq 1$ as $\alpha_{\tau(i-1)+1}=\nu_1$, $\alpha_{\tau(i-1)+j+1}=\nu_{12}\cup\nu_2$ for $j=1,\ldots,d+1$, $\alpha_{\tau(i-1)+d+3}=\nu_{12}\cup\nu_2\cup\nu_3$, and $\alpha_{\tau i}=\nu_0\setminus\{\mathrm{in}\}$, for $i=1,\ldots,n$. Eventually, the output neuron $\mathrm{out}\in\nu_0$ signals at time instant $\tau n$ whether the input word $\mathbf{x}$ belongs to the underlying language $L$, that is, $y_{\mathrm{out}}^{(\tau n)}=1$ iff $\mathbf{x}\in L$.
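A literal transcription of this update schedule, with placeholder layer names standing in for the actual neuron sets, reads as follows:

```python
def schedule_for_bit(i, tau, layers):
    """Update sets alpha_t used while the i-th input bit is processed.

    layers maps the names nu_1, nu_12, nu_2, nu_3 and nu_0 without the input
    neuron to neuron sets; the number of blocks d is recovered from tau = d + 4.
    """
    d = tau - 4
    t0 = tau * (i - 1)
    alphas = {t0 + 1: layers["nu_1"]}
    for j in range(1, d + 2):
        alphas[t0 + j + 1] = layers["nu_12"] | layers["nu_2"]
    alphas[t0 + d + 3] = layers["nu_12"] | layers["nu_2"] | layers["nu_3"]
    alphas[t0 + tau] = layers["nu_0_without_in"]
    return alphas

# placeholder layer contents, purely illustrative
layers = {"nu_1": {"mu units", "kappa_0"}, "nu_12": {"kappa_0"},
          "nu_2": {"gamma units", "lambda units", "kappa units"},
          "nu_3": {"pi units", "rho units"}, "nu_0_without_in": {"state bits"}}
print(sorted(schedule_for_bit(1, tau=7, layers=layers).items()))
```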

The size of $N$ simulating the finite automaton $A$ with $m$ states can be expressed as $s=|\nu_0|+|\nu_1|+|\nu_2|+|\nu_3|=(p+1)+(2^{p_2}+2)+(8p2^{p_1}+2(d+1))+2p2^{p_3}=O(\sqrt{2^p})=O(\sqrt{m})$ in terms of $m$, which matches the known lower bound [2, 3].

Finally, the energy consumption can be bounded for particular layers as follows.

Layer $\nu_0$ can possibly require all $p+1$ units to fire for storing the binary encoding of the current automaton state. Moreover, there is only one active unit among the neurons in $\nu_{11}$, which serve for evaluating all possible monomials over the input variables $\mathbf{v}$, and also only one control unit from $\nu_{12}\cup\nu_{22}$ fires at any one time instant. In addition, we know that the energy consumption of $\nu_{21}$ is at most $e+2$, and at most $p$ neurons among $\pi_{k\langle\mathbf{c}\rangle},\varrho_{k\langle\mathbf{c}\rangle}$ from $\nu_3$ fire (one for each $k=1,\ldots,p$). Altogether, the global energy consumption of $N$ is bounded by $e+2p+5=O(e+\log s)=O(e)$, as $e=\Omega(\log s)$ is assumed. This completes the proof of the theorem. ⊓⊔
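For concrete numbers, the following rough calculator plugs the counts above into the size, period and energy bounds; the rounding choices are assumptions where the text only gives asymptotics.

```python
from math import ceil, floor, log2

def resources(m, e):
    """Estimate size s, period tau and energy bound of N for an m-state DFA.

    Uses the counts from the text: |nu_0| = p+1, |nu_1| = 2^p2 + 2,
    |nu_2| = 8p*2^p1 + 2(d+1), |nu_3| = 2p*2^p3, with d = ceil(2p*2^p1 / e).
    """
    p = ceil(log2(m)) + 1
    p1 = floor((p + 1 - log2(p) - log2(p + 1 - log2(p))) / 2)
    p3 = floor(log2(p + 1 - log2(p)) - 2)
    p2 = p + 1 - p3 - p1
    d = ceil(2 * p * 2 ** p1 / e)
    s = (p + 1) + (2 ** p2 + 2) + (8 * p * 2 ** p1 + 2 * (d + 1)) + 2 * p * 2 ** p3
    return {"size s": s, "period tau": d + 4, "energy bound": e + 2 * p + 5}

print(resources(m=10 ** 6, e=64))
```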

5 Conclusions

We have, for the first time, applied the energy complexity measure to recurrent neural nets. This measure has recently been introduced and studied for feedforward perceptron networks. The binary-state recurrent neural networks recognize exactly the regular languages, so we have investigated their energy consumption when simulating finite automata with the asymptotically optimal number of neurons. We have presented a low-energy implementation of finite automata by optimal-size neural nets with a tradeoff between the time overhead for processing one input bit and the energy, which varies from the logarithm of the network size to the full network size. In the full paper, we will also present lower bounds on the energy complexity of neural network automata. In particular, for time overhead $\tau=O(1)$, the energy satisfies $e\geq s^{\varepsilon}$ for some real constant $\varepsilon$, $0<\varepsilon<1$, and for infinitely many $s$, while for $\tau=O(\log^{\varepsilon}s)$ we have shown that $e=\Omega(s\log\log s/\log^{\eta}s)$ for any $\eta>\varepsilon$. It remains an open problem for further research whether these bounds can be improved.

References

1. Alon, N., Dewdney, A.K., Ott, T.J.: Efficient simulation of finite automata by neural nets. Journal of the ACM 38(2), 495–514 (1991)

2. Horne, B.G., Hush, D.R.: Bounds on the complexity of recurrent neural network implementations of finite state machines. Neural Networks 9(2), 243–252 (1996)

3. Indyk, P.: Optimal simulation of automata by neural nets. In: Mayr, E.W., Puech, C. (eds.) Proceedings of the STACS 1995 Twelfth Annual Symposium on Theoretical Aspects of Computer Science. LNCS, vol. 900, pp. 337–348 (1995)

4. Lennie, P.: The cost of cortical computation. Current Biology 13(6), 493–497 (2003)

5. Lupanov, O.: On the synthesis of threshold circuits. Problemy Kibernetiki 26, 109–140 (1973)

6. Minsky, M.: Computation: Finite and Infinite Machines. Prentice-Hall, Englewood Cliffs (1967)

7. Siegelmann, H.T., Sontag, E.D.: Computational power of neural networks. Journal of Computer and System Sciences 50(1), 132–150 (1995)

8. Šíma, J., Orponen, P.: General-purpose computation with neural networks: A survey of complexity theoretic results. Neural Computation 15(12), 2727–2778 (2003)

9. Šíma, J., Wiedermann, J.: Theory of neuromata. Journal of the ACM 45(1), 155–178 (1998)

10. Suzuki, A., Uchizawa, K., Zhou, X.: Energy and fan-in of threshold circuits computing Mod functions. In: Ogihara, M., Tarui, J. (eds.) Proceedings of the TAMC 2011 Eighth Annual Conference on Theory and Applications of Models of Computation. LNCS, vol. 6648, pp. 154–163 (2011)

11. Uchizawa, K., Douglas, R., Maass, W.: On the computational power of threshold circuits with sparse activity. Neural Computation 18(12), 2994–3008 (2006)

12. Uchizawa, K., Nishizeki, T., Takimoto, E.: Energy and depth of threshold circuits. Theoretical Computer Science 411(44–46), 3938–3946 (2010)

13. Uchizawa, K., Takimoto, E.: Exponential lower bounds on the size of constant-depth threshold circuits with small energy complexity. Theoretical Computer Science 407(1–3), 474–487 (2008)

14. Uchizawa, K., Takimoto, E.: Lower bounds for linear decision trees via an energy complexity argument. In: Murlak, F., Sankowski, P. (eds.) Proceedings of the MFCS 2011 Thirty-Sixth International Symposium on Mathematical Foundations of Computer Science. LNCS, vol. 6907, pp. 568–579 (2011)

15. Uchizawa, K., Takimoto, E., Nishizeki, T.: Size-energy tradeoffs for unate circuits computing symmetric Boolean functions. Theoretical Computer Science 412(8–10), 773–782 (2011)
