Ist das ein Graph oder kann das weg? Funktionales Deep Learning in Haskell

Compiling to Categories

“As so often, I find that talks that I’m giving are basically explaining what Conal Elliot and Ed Kmett have done several years ago.”

Siemens Anomaly App

Neuronale Netze sind doch nur Funktionen

neuralNet wL bL ... w1 b1 = 
  layer wL bL 
  . ... 
  . layer w1 b1

layer weightMatrix biasVector =
  fmap activationFunction
  . vectorAddition biasVector
  . matrixVectorProduct weightMatrix

$⇒$ Komposition purer Funktionen

Deep learning community: hold my beer

”[…] layers (die im modernen maschinellen Lernen als zustandsbehaftete Funktionen mit impliziten Parametern verstanden werden sollten) werden typischerweise als Python-Klassen dargestellt, deren Konstruktoren ihre Parameter erzeugen und initialisieren […]”

Übersetzt aus:

Trainingsalgorithmus

Finde Parameter, für die das NN auf gegebenem Trainingsdatensatz möglichst gute Ergebnisse liefert
Güte durch skalarwertige Fehlerfunktion beurteilt
$⇒$ Löse Optimierungsproblem:

$\textrm{argmin}_{ω ∈ Ω} (\textrm{loss} ˆ \textrm{neuralNet} (ω; \textrm{data}))$

Gradient Descent

$w_n+1 = w_n - α ⋅ \frac {∂ f}{∂ w}$

Automatic Differentiation

$nn = l_L ˆ … ˆ l_1$

$⇒$ Ableitung: $Dnn = Dl_L ˆ … ˆ Dl_1$

Funktionskomposition ist assoziativ, erlaubt Auswertung in beliebiger Reihenfolge
Aufwand “vorwärts” abhängig von Eingangsdimension (hier: Anzahl der Parameter; heute bis $10¹¹$)
Aufwand “rückwärts” abhängig von Ausgangsdimension (hier: 1)

Reverse Automatic Differentiation

$Dnn(v) = Dl_L(l_L-1(…)) ˆ … ˆ Dl_1(v)$

Um $Dl_i$ zu berechnen, müssen wir die Ausgabe von $l_{i - 1}$ kennen

$⇒$ “Wengert-Liste”

Deep Learning Bibliotheken sind im Wesentlichen Werkzeuge zur Generierung und Verwaltung von Berechnungsgraphen.

TensorFlow Graphen

Implementierung in TensorFlow

class SimpleNN:

    def __init__(self, dimIn dimOut):
        self.dims = [dimIn, dimIn, dimIn, dimOut, dimOut]
        self.weights = [] 
        self.biases = []
        for i in range(4):
            self.weights.append(
                tf.Variable(tf.random.normal(shape=(self.dims[i+1],self.dims[i])))
                )
            self.biases.append(
                tf.Variable(tf.random.normal(shape=(self.dims[i+1],1)))
                )
    
    def __call__(self, x):
        inputs = tf.convert_to_tensor([x], dtype=tf.float32)
        out = tf.matmul(self.weights[0], 
                        inputs, transpose_b=True) + self.biases[0]
        out = tf.tanh(out)
        out = tf.matmul(self.weights[1], out) + self.biases[1]
        out = tf.nn.relu(out)
        out = tf.matmul(self.weights[2], out) + self.biases[2]
        out = tf.tanh(out)
        
        return tf.matmul(self.weights[3], out) + self.biases[3]

Neuronale Netze mit ConCat

simpleNN ::
  ( KnownNat m,
    KnownNat n,
    Functor f,
    Foldable f, 
    Floating num,
    ...
  ) =>
  SimpleNNParameters f m n num -> 
  f m num -> 
  f n num
simpleNN = 
  affine 
  @. affTanh 
  @. affRelu 
  @. affTanh

(@.) :: 
  (q s -> b -> c) -> 
  (p s -> a -> b) -> 
  ((q :*: p) s -> a -> c)
(g @. f) (q :*: p) = g q . f p

type p --* q = q :.: p

type Bump h = h :*: Par1

bump :: Num s => a s -> Bump a s
bump a = a :*: Par1 1

type a --+ b = Bump a --* b

type SimpleNNParameters (f :: Nat -> * -> *) m n =
  ( (f n --+ f n)
      :*: (f m --+ f n)
      :*: (f m --+ f m)
      :*: (f m --+ f m)
  )

(<.>) :: (Foldable a, Zip a, Additive s, Num s) 
      => a s -> a s -> s
xs <.> ys = sumA (zipWith (*) xs ys)

linear :: (Foldable a, Zip a, Functor b, Additive s, Num s)
       => (a --* b) s -> (a s -> b s)
linear (Comp1 ba) a = (<.> a) <$> ba

affine :: (Foldable a, Zip a, Functor b, Additive s, Num s)
       => (a --+ b) s -> (a s -> b s)
affine m = linear m . bump

affRelu :: 
  ( Foldable a, 
    Zip a, 
    Functor b, 
    Ord s, 
    Additive s, 
    Num s
  ) => 
  (a --+ b) s -> (a s -> b s)
affRelu l = relus . affine l

ConCat Funktionsweise

Nutzt Isomorphie zwischen Lambda-Kalkülen und kartesisch abgeschlossenen Kategorien (CCC) [*]
Übersetzt Haskell-Core in kategorielle Sprache
Ausdrücke in kategorieller Sprache können in beliebigen CCCs interpretiert werden
Abstrahiert dadurch Haskells Funktionspfeil {{{inline-hs((->))}}}

Beispiel einer ConCat-Transformation

magSqr :: Num a => (a, a) -> a
magSqr (a, b) = sqr a + sqr b

$⇒$ ConCat:

$magSqr = addC ˆ (mulC ˆ (exl \triangle exl) \triangle mulC ˆ (exr \triangle exr))$

In Kategorie der Graphen – src_haskell[:exports code]{(a, a) `Graph` a}:

Generalized Derivatives

Idee: Ergänze Funktion um ihre Ableitung

$a \mapsto f(a) ⇒ a \mapsto (f(a), f’(a))$

Kategorie der generalisierten Ableitungen:

newtype GD k a b = D {unD :: a -> b :* (a `k` b)}

Komposition für Generalized Derivatives

$a \mapsto (f(a), f’(a))$

instance Category k => Category (GD k) where 
  ...
  D g . D f = 
    D (\ a -> 
          let (b, f') = f a
              (c, g') = g b
           in (c, g' . f')
      )

Kettenregel: $(g ˆ f)’(x) = g’(f(x)) ˆ f’(x)$

Multiplikation für Generalized Derivatives

$a \mapsto (f(a), f’(a))$

instance (LinearCat k s, Additive s, Num s) => NumCat (GD k) s where  
  ...
  mulC    = D (mulC &&& \ (u,v) -> scale v |||| scale u)

Produktregel: $(f(x) ⋅ g(x))’ = f’(x) ⋅ g(x) + f(x) ⋅ g’(x)$

Forward Automatic Differentiation

Duale Kategorien

Im Dual einer Kategorie drehen sich alle Pfeile um

$a → b ⇒ b → a$

In Haskell:

newtype Dual k a b = Dual (b `k` a)

Beispiele Dualer Morphismen

instance Category k => Category (Dual k) where
  ...
  -- flip :: (a -> b -> c) -> b -> a -> c
  (.) = inAbst2 (flip (.))

instance CoproductPCat k => ProductCat (Dual k) where
  ...
  -- exl :: (a, b) -> a; inlP :: a -> (a, b)
  exl = abst inlP

Reverse Automatic Differentiation

Graphenfreie Gradienten

type RAD = GD (Dual (-+>))

grad :: Num s => (a -> s) -> (a -> a)
grad = friemelOutGrad . toCcc @RAD

nnGrad :: parameters -> parameters
nnGrad = grad (loss . nn)

Beschleunigtes Deep Learning in Haskell

“Data.Array.Accelerate defines an embedded array language for computations for high-performance computing in Haskell. […] These computations may then be online compiled and executed on a range of architectures.”

Kategorie der Accelerate-Funktionen:

newtype AccFun a b where
  AccFun :: (AccValue a -> AccValue b) -> AccFun a b

ConCelerate: ConCat + Accelerate

simpleNN :: (SimpleNNConstraints f m n num) => SimpleNN f m n num
simpleNN = affine @. affTanh @. affRelu @. affTanh

simpleNNGrad ::
  (KnownNat m, KnownNat n) =>
  (Vector m Double, Vector n Double) ->
  SimpleNNParameters m n Double ->
  SimpleNNParameters m n Double
simpleNNGrad = errGrad simpleNN

simpleNNGradAccFun ::
  (KnownNat m, KnownNat n) =>
  (Vector m Double, Vector n Double) ->
  SimpleNNParameters m n Double `AccFun` SimpleNNParameters m n Double
simpleNNGradAccFun pair = toCcc (simpleNNGrad pair)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dlih.org

dlih.org

Ist das ein Graph oder kann das weg? Funktionales Deep Learning in Haskell

Compiling to Categories

Siemens Anomaly App

Neuronale Netze sind doch nur Funktionen

Deep learning community: hold my beer

Trainingsalgorithmus

Gradient Descent

Automatic Differentiation

Reverse Automatic Differentiation

TensorFlow Graphen

Implementierung in TensorFlow

Neuronale Netze mit ConCat

ConCat Funktionsweise

Beispiel einer ConCat-Transformation

Generalized Derivatives

Komposition für Generalized Derivatives

Multiplikation für Generalized Derivatives

Forward Automatic Differentiation

Duale Kategorien

Beispiele Dualer Morphismen

Reverse Automatic Differentiation

Graphenfreie Gradienten

Beschleunigtes Deep Learning in Haskell

ConCelerate: ConCat + Accelerate

Vielen Dank!

Files

dlih.org

Latest commit

History

dlih.org

File metadata and controls

Ist das ein Graph oder kann das weg? Funktionales Deep Learning in Haskell

Compiling to Categories

Siemens Anomaly App

Neuronale Netze sind doch nur Funktionen

Deep learning community: hold my beer

Trainingsalgorithmus

Gradient Descent

Automatic Differentiation

Reverse Automatic Differentiation

TensorFlow Graphen

Implementierung in TensorFlow

Neuronale Netze mit ConCat

ConCat Funktionsweise

Beispiel einer ConCat-Transformation

Generalized Derivatives

Komposition für Generalized Derivatives

Multiplikation für Generalized Derivatives

Forward Automatic Differentiation

Duale Kategorien

Beispiele Dualer Morphismen

Reverse Automatic Differentiation

Graphenfreie Gradienten

Beschleunigtes Deep Learning in Haskell

ConCelerate: ConCat + Accelerate

Vielen Dank!