An Asynchronous Dataflow Implementation – Part 2

An excerpt from the upcoming book, “Dataflow and Reactive Programming Systems”

Asynchronous Dataflow Architecture

Main Data Types

Besides the Fire Pattern data type below, all of these can be implemented as classes in an Object Oriented languages, a struct in C or the equivalent in your language of choice. The itemized elements under each data type are the members of the type.

Port Address
Defines a unique location (combination of node ID and port ID) of a port within this engine

  • Node Id – Unique to engine
  • Port Id – Unique to node only

Data Token
A container for the data value and its location

  • Value – Any data value
  • Port Address – Current location (port) of token

Execute Token
Combines everything needed to fire a node

  • Data Tokens – A collection of all tokens on inputs of node
  • Node – A means to activate the node

Node
A runtime node combines the Node Definition and a unique Node ID

  • Node Definition – Defines how to activate the node
  • Node Id – Engine wide, unique id

Node Definition
A node declaration and definition. A single Node Definition may be used to define many runtime nodes that all act the same as the Node Definition — just with different Node IDs.

  • Node’s activation function – Function that does the real work
  • List of Ports – All the ports on this node
  • Fire Patterns – A collection of Fire Patterns

Arc
An Arc is a connection between to two ports

  • Source Port Address – Address of the output port
  • Sink Port Address – Address of the input port

Fire Pattern
This is a union type in C/C++, a sum type in Haskell, or, in object oriented languages, a base class of Fire Pattern with one sub-class for Exists, one for Empty and so on. This could also be implemented as an enumeration.

A Fire Pattern for a single port is one of the following:

  • Exists – A token must exist on this input
  • Empty – Input must be empty
  • Don’t Care – Doesn’t matter if a token is available or not
  • Always – Ignores all other patterns, and forces the whole fire pattern to match

The pattern for the whole node is simply a collection of the all the Fire Patterns for that node.

Token Store
A collection of all the Data Tokens in the system. Read and written to by the Enable Unit only. The tokens in here represent the complete state of the program.

Node Store
A collection of all the nodes in the system. Note changing this at runtime allows for dynamic node re-definitions.

Arc Store
A collection of all the connections in the system. Note changing this at runtime allows for dynamic graphs.

Dataflow Program
The Node Store and Arc Store together are everything needed to define a dataflow program. This is loaded into the engine before execution.

  • Node Store – A collection of all the nodes in the program.
  • Arc Store – A collection of all the arcs in the program.

Implementation Components

IO Unit

Input: Data Token
Local Output: Data Token
External Output: Data Token

The IO Unit is the interface to the engine. Tokens coming arriving at the input port with internal addresses are directed to the “local” port of the component and those with external addresses are directed to the “external” port.

Transmit Unit

Input: Data Token
Output: Data Token

Token movement along arcs are implemented with this unit. The Transmit Unit looks at the tokens address and compares it to a lookup table of connections in the system (Arc Store). If it finds that the new token’s address is on an output port, then it will make one copy of the token for each input port and give it the address of that input port. This action is equivalent to moving the token along the arc and sending a copy down each path.

The lookup table is an encoding of all the connections in the program. Changing the values in this table changes the graph, allowing for runtime dynamism.

Enable Unit

Input: Data Token
Output: Execute Token

The Enable Unit looks at the incoming tokens and compares them to a store of waiting tokens. The Token Store holds all the tokens currently moving through the program. By comparing the waiting tokens with the node’s fire pattern (pulled from the Token Store), the Enable Unit can determine if a node can fire.

For activated nodes, it creates and sends an Execute Token that packages together all the data tokens for the inputs of the node and a way to invoke the node itself. The tokens are removed from the Token Store and the node definition is copied from the Node Store.

If the incoming token does not cause a node to activate, then it will save the new token in the Token Store for later use.

This implementation allows for per-node firing patterns. The original Manchester Processor design had one firing pattern for every node… all inputs must have tokens waiting before the node can activate. And since all nodes in the original design only had 1 or 2 inputs, the Manchester architecture’s Enable Unit didn’t need access to the node’s definition.

Due to the addition of per-node firing patterns and more than 2 inputs allowed per node, this design requires Enable to connect to the Node Store and the Token Store while the Manchester design only needed a connection to the Token Store.

The Enable Unit is the only component that writes to the Token Store so no special locking is needed for multi-threaded operation.

Execute Unit

Input: Execute Token
Output: Data Token

With the contents of the Execute Token, this unit has everything it needs fire nodes.

It takes the collection of data tokens from the Execute Token and passes it to the node definition for evaluation. In this implementation the node definition is simply a function that takes a collection of tokens and returns another (possibly empty) collection of output tokens.

The node’s function only deals with what I call, “local tokens.” They are just like the regular data tokens (that I refer to in this context as a “global token”) without the Node ID field. Nodes should be isolated and not have to know anything about the outside world. The node’s ID is external to the definition of the node itself. It doesn’t matter if there are 10 nodes, all with the same definition and different node IDs, they should all operate exactly the same. What the node does know about is its Port IDs. Port IDs are unique to the node definition. The node’s function returns a collection of local tokens with addresses of output ports that exist on the node.

The Execute Unit must first convert a global token to a local token. It does this by simply stripping off the node’s ID but retaining the port ID and data value. It calls the function with the locals (input tokens) and gets back a collection of locals (output tokens). The unit converts these to globals by adding the current node’s ID back to the tokens along with the port ID and data value.

In the original Manchester design, nodes where defined by an op-code in the same way that instructions in a Von Neumann processor are defined. The Execute Unit knew how to execute an op-code it received so the Execute Token only needed to include an op-code number instead of the full node definition like this implementation requires. In software it costs the same to pass an integer as it does to pass the full node definition and makes the design more general.

Haskell source code can be found at https://github.com/dataflowbook/

Next time we will step through the execution of an example dataflow program.

Tags:

Comments are closed.

Learn the Theory and Application of Dataflow and Reactive Programming

Download Sample

Dataflow Programming in .Net with Detailed Examples using the TPL Dataflow Library

Download Sample