Intro to Graphix
Graphix is a programming language using the dataflow paradigm. It is particularly well suited to building user interfaces, and interacting with resources in netidx. Dataflow languages like Graphix are "reactive", like the popular UI library, except at the language level instead of just as a library. A Graphix program is compiled to a directed graph, operations (such as +) are graph nodes, edges represent paths data can take through the program. Consider,
2 + 2
This compiles to a graph like,
const(2) ==> + <== const(2)
When executed the program will have 1 output, 4. Which is exactly what you'd expect and is no different from a non data flow program. We need a more complex example to see the difference,
let x = cast<i64>(net::subscribe("/foo")?)?;
print(x * 10)
net::subscribe, subscribes to a netidx path and returns it's value, or an error (more on ? later). Now lets see what happens, the graph we get from this program looks something like this,
const(10) ==
|
|
v
const("/foo") => net::subscribe => cast<i64> => * => print
Unlike the first example, the value of net::subscribe
isn't a constant, it can
change if the value published in netidx changes. Graphix programs never
terminate on their own, they are just graphs, if one of their dependent nodes
changes, then they update. So if the published value of "/foo" is initially 10,
then this program will print 100, if the value of "/foo" changes to 5 then the
output of the program will change to 50, and so on forever.
This is a powerful way to think about programming, and it's especially well suited to building user interfaces and transforming data streams.
Besides being a dataflow language Graphix tries hard to be a normal language that would feel familiar to anyone who knows a modern functional programming language. Some of it's features are,
- lexically scoped
- expression oriented
- strongly statically typed
- type inference
- structural type discipline
- parametric polymorphism
- algebraic data types
- pattern matching
- first class functions, and closures
Installing Graphix
To install the Graphix shell from source you need to install a rust build environment. See here for instructions on how to do that for your platform. Once you have that set up, you can just run
cargo install graphix-shell
That should build the graphix
command and install it in your ~/.cargo/bin
directory. On linux you may need to install kerberos headers, as well as clang
libs for gssapi to build properly (on linux). On debian/ubuntu install
libclang-dev
, and libkrb5-dev
. On other distributions the names will be
similar.
Core Language
This section documents the core language constructs in Graphix.
Fundamental Types
Graphix has a few fundamental data types, the Graphix shell is a good way to
explore them by trying out small Graphix expressions. You can run the Graphix
shell by invoking graphix
with no arguments.
Numbers
i32
, u32
, i64
, u64
, f32
, f64
, and decimal
are the fundamental
numeric types in Graphix. Literals are written with their type prefixed, except
for i64
and f64
which are written bare. for example, u32:3
is a u32
literal value.
decimal
is an exact decimal representation for performing financial
calculations without rounding or floating point approximation errors.
The basic arithmetic operations are implemented on all the number types with all the other number types.
Operation | Operator |
---|---|
Add | + |
Subtract | - |
Multiply | * |
Divide | / |
Mod | % |
The compiler will let you do arithmatic on different types of numbers directly without casting, however the return type of the operation will be the set of all the types in the operation, representing that either type could be returned. If you try to pass this result to a function that wants a specific numeric type, it will fail at compile time.
〉1. + 1
-: [i64, f64]
2
〉let f = |x: i64| x * 10
〉f(1. + 1)
error: in expr
Caused by:
0: at: line: 1, column: 3, in: (f64:1. + i64:1)
1: at: line: 1, column: 3, in: (f64:1. + i64:1)
2: type mismatch '_1046: i64 does not contain [i64, f64]
Division by zero is raised as an error to the nearest error handler (more on that later) and will be printed to stderr by the shell if it is never handled. Overflow and underflow are handled by wrapping,
〉0 / 0
thread 'tokio-runtime-worker' panicked at /home/eric/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/mod.rs:319:5:
attempt to divide by zero
-: i64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
unhandled error: error:"in expr at line: 1, column: 1 attempt to divide by zero"
〉u32:0 - u32:1
-: u32
4294967295
The thread panic message is an artifact of how the overflow error is handled at runtime, it is safe to continue using the shell and runtime if such an error occurrs. However the particular arith operation that caused the error will not update, which may cause problems depending on what your program is doing with it.
Number Sets
There are a few sets of number types that classify numbers into various kinds.
Number
being the most broad, it contains all the number types. Int
contains
only integers, Real
contains only reals (decimal plus the two float types),
SInt
contains signed integers, UInt
contains unsigned integers.
Bool
Boolean literals are written as true
and false
, and the name of the boolean
type is bool
.
Boolean expressions using &&
, ||
, and !
are supported. These operators
only operate on bool
. They can be grouped with parenthesis. For example,
〉true && false
-: bool
false
〉true || false
-: bool
true
〉!true
-: bool
false
〉!1
error: in expr
Caused by:
0: at: line: 1, column: 2, in: i64:1
1: type mismatch bool does not contain i64
Duration
A time duration. The type name is duration
, and the literals are written as,
duration:1.0s
, duration:1.0ms
, duration:1.0us
, duration:1.0ns
. Durations
can be added, and can be multiplied and divided by scalars.
〉duration:1.0s + duration:1.0s
-: duration
2.s
〉duration:1.0s * 50
-: duration
50.s
〉duration:1.0s / 50
-: duration
0.02s
DateTime
A date and time in the UTC time zone. The type name is datetime
and literals
are written in RFC3339 format inside quotes. For example,
datetime:"2020-01-01T00:00:00Z"
. You can add and subtract duration
from
datetime
.
〉datetime:"2020-01-01T00:00:00Z" + duration:30.s
-: datetime
2020-01-01 00:00:30 UTC
You can enter datetime
literals in local time and they will be converted to UTC. For example,
〉datetime:"2020-01-01T00:00:00-04:00"
-: datetime
2020-01-01 04:00:00 UTC
String
Strings in Graphix are UTF8 encoded text. The type name is string
and the
literal is written in quotes "this is a string"
. C style escape sequences are
supported, "this is \" a string with a quote and a \n"
. Non printable
characters such as newline will be escaped by default when strings are printed
to the console, you can use print
to print the raw string including control
characters.
String Interpolation
String literals can contain expressions that will be evaluated and joined to the string,
such expressions are surrounded by unescaped []
in the string. For example,
〉let row = 1
〉let column = 999
〉"/foo/bar/[row]/[column]"
-: string
"/foo/bar/1/999"
Values in an interpolation need not be strings, they will be cast to a string
when they are used. You can write a literal [
or ]
in a string by escaping
it.
〉"this is a string with a \[ and a \] but it isn't an interpolation"
-: string
"this is a string with a [ and a ] but it isn't an interpolation"
Any
The Any
type is a type that unifies with any other type, it corresponds to the
underlying variant type that represents all values in Graphix (and netidx). It
is not used very often, as it provides very few guarantees, however it has it's
place. For example, Any
is the type returned by net::subscribe
, indicating
that any valid netidx value can come from the network. Usually the first thing
you do with an Any
type is call cast
to turn it into the type you expect (or
an error), or use a select
expression to match it's type (more on select later).
Null
Null is nothing, just like in many other languages. Unlike most other languages
null
is a type not a catch all. If the type of a value does not include null
then it can't be null. The set ['a, null]
(alias Option<'a>
) is commonly
used to represent things that will sometimes return null
.
Array
Arrays are immutable, contiguous, and homogenous. They are parameterized,
Array<string>
indicates an array of strings. Arrays are zero indexed a[0]
is
the first element. Array elements can be any type, including other arrays at
arbitrary levels of nesting. There is a special array
(case sensitive), that
represents the fundamental array type in the underlying value representation.
Array literals are written like [x, y, z]
. There are many functions in the
array
module of the standard library for working with arrays.
Array Slicing and Indexing
Graphix supports array subslicing, the syntax will be familar to Rust programmers.
a[2..]
a slice from index 2 to the end of the arraya[..4]
a slice from the beginning of the array to index 3a[1..3]
a slice from index 1 to index 2a[-1]
the last element in the arraya[-2]
the second to last element in the array
..=
is not supported however, the second part of the slice will always be the
exclusive bound. Literal numbers can always be replaced with a Graphix
expression, e.g. a[i..j]
is perfectly valid.
Mutability and Implementation
Arrays are not mutable, like all other Graphix values. All operations that "change" an array, actually create a new array leaving the old one unchanged. This is even true of the connect operator, which we will talk more about later.
There are a couple of important notes to understand about the implementation of Arrays.
-
Arrays are memory pooled, in almost all cases (besides really huge arrays) creating an array does not actually allocate any memory, it just reuses a previously used array that has since become unused. This makes using arrays a lot more efficient than you might expect.
-
Arrays are contiguous in memory, there is no funny business going on (looking at you lua). This means they are generally very memory efficient, each element is 3 machine words, and fast to access. However there are a few cases where this causes a problem, such as building up an array by appending one element at a time. This is sadly an O(N^2) operation on arrays. You may wish to use another data structure for this kind of operation.
-
Array slices are zero copy. They do not allocate memory, and they do not clone any of the array's elements, they simply create a light weight view into the array. This means algorithms that progressively deconstruct an array by slicing are O(N) not O(N^2) and the constants are very fast.
Tuples
Tuples are written (x, y)
, they can be of arbitrary length, and each element
may have a different type. Tuples may be indexed using numeric field indexes.
Consider
let x = (1, 2, 3, 4);
x.0
Will print 1.
Map
Maps in Graphix are key-value data structures with O(log(N)) lookup, insert, and
remove operations. Maps are parameterized by their key and value type, for
example Map<string, i64>
indicates a map with string keys and integer values.
There are many functions for working with maps in the map
standard library
module
Map Literals
Maps can be constructed using the {key => value}
syntax:
〉{"a" => 1, "b" => 2, "c" => 3}
-: Map<'_1893: string, '_1895: i64>
{"a" => 1, "b" => 2, "c" => 3}
Keys and values can be any Graphix type, for example here is a map where the key
is a Map<string, i64>
.
{{"foo" => 42} => "foo", {"bar" => 42} => "bar"}
-: Map<'_1919: Map<'_1915: string, '_1917: i64>, '_1921: string>
{{"bar" => 42} => "bar", {"foo" => 42} => "foo"}
Map Indexing
Maps can be indexed using the map{key}
syntax to retrieve values:
〉let m = {"a" => 1, "b" => 2, "c" => 3}
〉m{"b"}
-: ['_1907: i64, Error<`MapKeyError(string)>]
2
If a key is not present in the map, indexing returns a MapKeyError
:
〉m{"missing"}
-: ['_1907: i64, Error<`MapKeyError(string)>]
error:["MapKeyError", "map key \"missing\" not found"]
Mutability and Implementation
Like all Graphix values, maps are immutable. All operations that "change" a map actually create a new map, leaving the original unchanged. Maps are memory pooled and very efficient - creating new maps typically reuses existing memory rather than allocating new memory.
Maps maintain their key-value pairs in a balanced tree structure, ensuring O(log(N)) performance for all operations regardless of map size.
Error
Error is the built in error type. It carries a type parameter indicating the
type of error, for example Error<`MapKeyError(string)>
is an error that
carries a `MapKeyError
variant. You can access the inner error value
using e.0
e.g.,
〉let e = error(`MapKeyError("no such key"))
〉e.0
-: `MapKeyError(string)
`MapKeyError("no such key")
More information about dealing with errors is available in the section on error handling.
Let Binds
Let bindings introduce names that are visible in their scope after they are defined.
let x = 2 + 2 + x; // compile error x isn't defined yet
let y = x + 1 // ok
The same name can be used again in the same scope, it will shadow the previous value.
let x = 1;
let x = x + 1; // ok uses the previous definition
x == 2 // true
You can annotate the binding with a type, which will then be enforced at compile time. Sometimes this is necessary in order to help type inference.
let x: Number = 1; // note x will be of type Number even though it's an i64
let y: string = x + 1; // compile time type error
You can use patterns in let binds as long as they will always match.
let (x, y) = (3, "hello"); // binds x to 3 and y to "hello"
x == 3; // true
y == "hello" // true
You can mix type annotations with pattern matches
let (x, y): (i64, string) = (3, "hello")
You can assign documentation to a let bind using a ///
comment. Documentation
will be displayed in the shell when the user tab completes and will be made
available by the lsp server.
// this is a normal comment
let x = 1;
/// this is documentation for y
let y = 2;
Connect
Connect, written x <- expr
is where things get interesting in Graphix. The
sharp eyed may have noticed that up until now there was no way to introduce a
cycle in the graph. Connect is the only graph operator in Graphix, it allows you
to connect one part of the graph to another by name, causing the output of the
right side to flow to the name on the left side. Consider,
let x = "off"
x <- time::timer(duration:1.0s, false) ~ "on"
print(x)
This program will first print off
, and after 1 second it will print on
. Note
the ~
operator means, when the expression on the left updates return the
current value of the expression on the right (called the sample operator). The
graph we created looks like,
const("off") ===============> "x" =======> print
^
|
time::timer ====> sample =====
^
|
const("on") =====
We can also build an infinite loop with connect. This won't crash the program, and it won't stop other parts of the program from being evaluated,
let x = 0;
x <- x + 1;
print(x)
This program will print all the i64s from 0 to MAX and then will wrap around. It will print numbers forever. You might notice, and you might wonder, why does it start from zero, shouldn't it start from 1? After all we increment x BEFORE the print right? Well, no, not actually, it will start at 0, for the same reason this infinite loop won't lock up the program or cause other expressions not to be evaluated. Graphix programs are evaluated in cycles, a batch of updates from the network, timers, and other IO is processed into a set of all events that happened "now", then the parts of the program that care about those particular events are evaluated, and then the main loop goes back to waiting for events.
What connect does is it schedules an update to x
for the next cycle, the
current cycle proceeds as normal to it's conclusion as if the connect didn't
happen yet, because it didn't. In the above case the event loop would never
wait, because there is always work to do adding 1 to x
, however it will still
check for other events every cycle.
When combined with other operations, specifically select, connect becomes a powerful general looping construct, and is the only way to write a loop in Graphix. A quick example,
let count = {
let x = 0;
select x {
n if n < 10 => x <- x + 1,
_ => never() // never() never updates
};
x
};
count
This program creates a bind count
that will update with the values 0 to 10. If
you put it in a file test.gx
and execute it using graphix ./test.gx
it will
print 0 to 10 and then wait.
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
0
1
2
3
4
5
6
7
8
9
10
Is Connect Mutation?
Connect causes let bound names to update, so it's kind of mutation. Kind of. A
better way to think about it is that every let bound value is a pipe with
multiple producers and multiple consumers. Connect adds a new producer to the
pipe. The values being produced are immutable, an array [1, 2, 3]
will always
and forever be [1, 2, 3]
, but a new array [1, 2, 3, 4]
might be pushed into
the same pipe [1, 2, 3]
came from, and that might make it appear that the
array changed. The difference is, if you captured the original [1, 2, 3]
and
put it somewhere, a new [1, 2, 3, 4]
arriving on the pipe can't change the
original array.
Blocks
A block is a group of code between {
and }
that has it's own scope, and
evaluates to the last value in the block. Expressions in a block are ;
separated, meaning every expression except the last one must end in a ;
and
it is illegal for a block to have just one expression (it will not parse).
You can use blocks to hide intermediate variables from outer scopes, and to group code together in a logical way.
let important_thing = {
let x = 0;
let y = x + 1;
43 - y
};
x; // compile error, x isn't in scope
y; // compile error, y isn't in scope
important_thing
This program won't compile because you can't reference y and x from outside the block scope, but if you removed those refernces it would print a very important number. Blocks are valid anywhere an expression is valid, and they are just expressions. They will become very important when we introduce lambda expressions.
Use
Use allows you to bring names in modules into your current scope so they can be used without prefixing.
net::subscribe(...); // call subscribe in the net module
use net;
subscribe(...) // same function
Use is valid anywhere expressions are valid
let list = {
use array;
map([1, 2, 3, 4, 5], |x| x * 2)
};
list
will print [2, 4, 6, 8, 10]
let list = {
use array;
map([1, 2, 3, 4, 5], |x| x * 2)
};
map(list, |x| x * 2)
will not compile, e.g.
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
Error: in file "/home/eric/test.gx"
Caused by:
at line: 5, column: 1 map not defined
Use shadows earlier declarations in it's scope. Consider,
let map = |a, f| "hello you called map!";
let list = {
use array;
map([1, 2, 3, 4, 5], |x| x * 2)
};
(list, map(list, |x| x * 2))
prints
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
([2, 4, 6, 8, 10], "hello you called map!")
Select
Select lets us create a graph node with multiple possible output paths that will choose one path for each value based on a set of conditions. Kind of like,
| if foo > 0 => ...
|
|
ref(foo) => select | if foo < 0 => ...
|
|
| otherwise => ...
is written as
select foo {
n if n > 0 => ...,
n if n < 0 => ...,
n => ...
}
select takes an expression as an argument and then evaluates one or more "arms". Each arm consists of an optional type predicate, a destructuring pattern, and an optional guard clause. If the type predicate matches, the pattern matches, and the guard evaluates to true then the arm is "selected". Only one arm may be selected at a time, the arms are evaluated in lexical order, and first arm to be selected is chosen as the one and only selected arm.
The code on the right side of the selected arm is the only code that is evaluated by select, all other code is "asleep", it will not be evaluated until it is selected (and if it has netidx subscriptions or published values they will be unsubscribed and unpublished until it is selected again).
Matching Types
Consider we want to select from a value of type [Array<i64>, i64, null]
,
let x: [Array<i64>, i64, null] = null;
x <- time::timer(duration:1.s, false) ~ [1, 2, 3, 4, 5];
x <- time::timer(duration:2.s, false) ~ 7;
select x {
Array<i64> as a => array::fold(a, 0, |s, x| s + x),
i64 as n => n,
null as _ => 42
}
This program will print 42, 15, 7 and then wait. The compiler will check that you have handled all the possible cases. If we remove the null case from this select we will get a compile error.
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
Error: in file "/home/eric/test.gx"
Caused by:
missing match cases type mismatch [i64, Array<i64>] does not contain [[i64, null], Array<i64>]
If you read this carefully you can see that the compiler is building up a set of types that we did match, and checking that it contains the argument type. This goes both ways, a match case that could never match is also an error.
let x: [Array<i64>, i64, null] = null;
x <- time::timer(duration:1.s, false) ~ [1, 2, 3, 4, 5];
x <- time::timer(duration:2.s, false) ~ 7;
select x {
Array<i64> as a => array::fold(a, 0, |s, x| s + x),
i64 as n => n,
f64 as n => cast<i64>(n)?,
null as _ => 42
}
Here we've added an f64
match case, but the argument type can never contain an
f64
so we will get a compile error.
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
Error: in file "/home/eric/test.gx"
Caused by:
pattern f64 will never match null, unused match cases
The diagnostic message gives you an insight into the compiler's thinking. What
it is saying is that, by the time it's gotten to looking at the f64
pattern,
the only type left in the argument that hasn't already been matched is null
,
and since f64
doesn't unify with null
it is sure this pattern can never
match.
Guarded patterns can always not match because of the guard, so they do not subtract from the argument type set. You are required to match without a guard at some point. No analysis is done to determine if your guard covers the entire range of a type.
let x: [Array<i64>, i64, null] = null;
x <- time::timer(duration:1.s, false) ~ [1, 2, 3, 4, 5];
x <- time::timer(duration:2.s, false) ~ 7;
select x {
Array<i64> as a => array::fold(a, 0, |s, x| s + x),
i64 as n if n > 10 => n,
null as _ => 42
}
This will fail with a missing match case because the i64
pattern is guarded
and no unguarded pattern exists that matches i64
.
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
Error: in file "/home/eric/test.gx"
Caused by:
missing match cases type mismatch [null, Array<i64>] does not contain [[i64, null], Array<i64>]
This is the same error you would get if you omitted the i64
match case
entirely.
Matching Structure
The type predicate is optional in a pattern, and the more commonly used pattern is structural. Graphix supports several kinds of structural matching,
- array slices
- tuples
- structs
- variants
- literals, ignore
NB: In most contexts you can match the entire value as well as parts of it's
structure by adding a v@
pattern before the pattern. You will see this in many
of the examples.
Slice Patterns
Suppose we want to classify arrays that have at least two elements vs arrays that don't, and we want to return a variant with a triple of the first two elements and the rest of the array or `Short with the whole array.
let a = [1, 2, 3, 4];
a <- [1];
a <- [5, 6];
select a {
[x, y, tl..] => `Ok((x, y, tl)),
a => `Short(a)
}
This program will print,
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
`Ok((1, 2, [3, 4]))
`Short([1])
`Ok((5, 6, []))
The following kinds of slice patterns are supported,
-
whole slice, with binds, or literals, e.g.
[1, x, 2, y]
matches a 4 element array and binds it's 2nd and 4th element tox
andy
respectivly. -
head pattern, like the above program, e.g.
[(x, y), ..]
matches the first pair in an array of pairs and ignores the rest of the array, binding the pair elements tox
andy
. You can also name the remainder, as we saw, e.g.[(x, y), tl..]
does the same thing, but binds the rest of the array totl
-
tail pattern, just like the head pattern, but for the end of the array. e.g.
[hd.., {foo, bar}]
matches the last element of an array of structs with fieldsfoo
andbar
, bindinghd
to the array minus the last element, andfoo
to field foo andbar
to field bar.
Structure patterns (all of the differnt types) can be nested to any depth.
Tuple Patterns
Tuple patterns allow you to match tuples. Compared to slice patterns they are
fairly simple. You must specify every field of the tuple, you can choose to bind
it, or ignore it with _
. e.g.
("I", "am", "a", "happy", "tuple", w, _, "patterns")
Struct Patterns
Struct patterns, like tuple patterns, are pretty simple.
{ x, y }
if you like the field names then there is no need to change them{ x: x_coord, y: y_coord }
but if you need to use a different name you can{ x, .. }
you don't have to write every field
Consider
let a = {x: 54, y: 23};
a <- {x: 21, y: 88};
a <- {x: 5, y: 42};
a <- {x: 23, y: 32};
select a {
{x, y: _} if (x < 10) || (x > 50) => `VWall,
{y, x: _} if (y < 10) || (y > 40) => `HWall,
{x, y} => `Ok(x, y)
}
does some 2d bounds checking, and will output
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
`VWall
`HWall
`VWall
`Ok(23, 32)
You might be tempted to replace y: _
with ..
as it would be shorter.
Unfortunatly this will confuse the type checker, because the Graphix type system
is structural saying {x, ..}
without any other information could match ANY
struct with a field called x
. This is currently too much for the type checker
to handle,
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
Error: in file "/home/eric/test.gx"
Caused by:
pattern {x: '_1040} will never match {x: i64, y: i64}, unused match cases
The error is slightly confusing at first, until you understand that since we
don't know the type of {x, ..}
we don't think it will match the argument type,
and therefore the match case is unused. This actually saves us a lot of trouble
here, because the last match is exhaustive, if we didn't check for unused match
cases this program would compile, but it wouldn't work. You can easily fix this
by naming the type, and for larger structs it's often worth it if you only need
a few fields.
type T = {x: i64, y: i64};
let a = {x: 54, y: 23};
a <- {x: 21, y: 88};
a <- {x: 5, y: 42};
a <- {x: 23, y: 32};
select a {
T as {x, ..} if (x < 10) || (x > 50) => `VWall,
T as {y, ..} if (y < 10) || (y > 40) => `HWall,
{x, y} => `Ok(x, y)
}
Here since we've included the type pattern T
in our partial patterns the
program compiles and runs.
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
`VWall
`HWall
`VWall
`Ok(23, 32)
Note that we never told the compiler that a
is of
type T
. In fact T
is just an alias for {x: i64, y: i64}
which is the type
of a
. We could in fact write our patterns without the alias,
{x: i64, y: i64} as {x, ..} if (x < 10) || (x > 50) => `VWall
The type alias just makes the code less verbose without changing the semantics.
Variant Patterns
Variant patterns match variants. Consider,
let v: [`Bare, `Arg(i64), `MoreArg(string, i64)] = `Bare;
v <- `Arg(42);
v <- `MoreArg("hello world", 42);
select v {
`Bare => "it's bare, no argument",
`Arg(i) => "it has an arg [i]",
x@ `MoreArg(s, n) => "it's big [x] with args \"[s]\" and [n]"
}
produces
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
"it's bare, no argument"
"it has an arg 42"
"it's big `MoreArg(\"hello world\", 42) with args \"hello world\" and 42"
Variant patterns enforce the same kinds of match case checking as all the other pattern types
let v: [`Bare, `Arg(i64), `MoreArg(string, i64)] = `Bare;
v <- `Arg(42);
v <- `MoreArg("hello world", 42);
select v {
`Bare => "it's bare, no argument",
`Arg(i) => "it has an arg [i]",
x@ `MoreArg(s, n) => "it's big [x] with args \"[s]\" and [n]",
`Wrong => "this won't compile"
}
yields
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
Error: in file "/home/eric/test.gx"
Caused by:
pattern `Wrong will never match [`Arg(i64), `MoreArg(string, i64)], unused match cases
Literals, Ignore
You can match literals as well as bind variables, as you may have noticed, and
the special pattern _
means match anything and don't bind it to a variable.
Missing Features
A significant missing feature from patterns vs other languages is support for multiple alternative patterns in one arm. I plan to add this at some point.
Select and Connect
Using select and connect together is one way to iterate in Graphix. Consider,
let a = [1, 2, 3, 4, 5];
let len = 0;
select a {
[x, tl..] => {
len <- len + 1;
a <- tl
},
_ => len
}
produces
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
5
This is not normally how we would get the length of an array in Graphix, or even
how we would do something with every element of an array (see array::map
and
array::fold
), however it illustrates the power of select and connect together.
Error Handling
Errors in Graphix are represented by the Error<'a>
type. A new instance of
which can be created with the error
function. e.g.
〉error(`Foo)
-: Error<'a: `Foo>
error:"Foo"
Try Catch and ?
While errors are normal values, and can be matched in select, they can also be
thrown and handled like exceptions. The ?
operator throws errors generated by
the expression on it's left to the nearest try catch block in dynamic scope. for
example,
〉let a = [1, 2, 3, 4]
〉try a[15]? catch(e) => println(e)
-: i64
error:[["cause", null], ["error", ["ArrayIndexError", "array index out of bounds"]], ["ori", [["parent", null], ["source", "Unspecified"], ["text", "try a[15]? catch(e) => println(e)"]]], ["pos", [["column", i32:5], ["line", i32:1]]]]
Catches the array index error and prints it's full context to stdout. Every
error raised with ?
is wrapped in an ErrChain
struct, the full definition of
which is,
type Pos = {
line: i32,
column: i32
};
type Source = [
`File(string),
`Netidx(string),
`Internal(string),
`Unspecified
];
type Ori = {
parent: [Ori, null],
source: Source,
text: string
};
type ErrChain<'a> = {
cause: [ErrChain<'a>, null],
error: 'a,
ori: Ori,
pos: Pos
}
This gives the full context of where the error happened, and whether it was previously caught and reraised, giving the full history back to the first time it was ever raised.
The scope is dynamic, not lexical, mirroring exception systems that unwind the stack,
〉let div0 = try |x| x / 0 catch(e) => println(e ~ "never triggered")
〉try div0(0) catch(e) => println(e)
-: i64
error:[["cause", null], ["error", ["ArithError", "attempt to divide by zero"]], ["ori", [["parent", null], ["source", "Unspecified"], ["text", "let div0 = try |x| x / 0 catch(e) => println(e ~ \"never triggered\")"]]], ["pos", [["column", i32:20], ["line", i32:1]]]]
The catch surrounding the function call site, not the definition site, is the one triggered.
Try Catch Block Value
The try catch block always evaluates to the last value inside the try catch, never to the value of the catch block. An error being raised to try catch does not stop the execution of nodes in the try catch.
Checked Errors
Graphix function types are annotated by the type of error they might raise. In
most cases this is automatic, but for some higher order functions it may be
neccessary to specify it explicitly. For example array map has type
fn(Array<'a>, fn('a) -> 'b throws 'e) -> Array<'b> throws 'e
indicating that
while the map function itself does not throw any errors, it will throw any
errors the function passed to it throws. This is all in the service of being
able to statically check the type of thrown errors, for example,
let a = [0, 1, 2, 3];
try a[0]? + a[1]?
catch(e) => select (e.0).error {
`ArithError(s) => println("arithmetic operation error [s]"),
`ArrayIndexError(s) => println("array index error [s]")
}
There are two types of errors that can happen in this example, and the compiler knows that. If you were to omit one of them, then the example would not compile. Suppose we remove the pattern for ArrayIndexError, we would get,
Error: in file "/home/eric/test.gx"
Caused by:
0: at: line: 3, column: 13, in: select (e.0).error {`ArithError(s) => ..
1: missing match cases type mismatch `ArithError('_1897: string) does not contain '_1895: [`ArithError(string), `ArrayIndexError(string)]
You'll recognize that this is just the normal select exhaustivness checking at work. Since errors are just normal types, the important point is the compiler knows the type of every error at compile time, everything else flows from there.
Unhandled Errors
By default when evaluating a file, the compiler will print a warning whenever an
error raised by ?
is not handled explicitly by a try catch block. Arithmetic
errors such as overflow do not generate this warning by default. Using -W
flags you can change the compilers behavior in this respect.
The $ Operator, aka Or Never
The $
operator goes in the same position as ?
, and is best described as "or
never". It the expression on it's left is a non error, then $
doesn't do
anything, otherwise it returns nothing. This is a concise way of writing,
select might_fail(1, 2, 3) {
error as _ => never(),
v => v
}
can instead be written as,
might_fail(1, 2, 3)$
Functions
Functions are first class values. They can be stored in variables, in data structures, and they can be passed around to other functions. Etc. They are defined with the syntax,
|arg0, arg1, ...| body
This is often combined with a let bind to make a named function.
let f = |x, y| x + y + 1
f
is now bound to the lambda that adds it's two arguments and 1. You can also
use structure patterns in function arguments as long as the pattern will always
match.
let g =|(x, y), z| x + y + z
Type annotations can be used to constrain the argument types and the return type,
let g = |(x, y): (f64, f64), z: f64| -> f64 x + y + z
Functions are called with the following syntax.
f(1, 1)
Would return 3. If the function is stored in a data structure, then sometimes you need parenthesis to call it.
(s.f)(1, 1)
Would call the function f
in the struct s
.
Labeled and Optional Arguments
Functions can have labeled and also optional arguments. Labeled arguments need not be specified in order, and optional arguments don't need to be specified at all. When declaring a function you must specify the labeled and optional arguments before any non labeled arguments.
let f = |#lbl1, #lbl2, arg| ...
In this case lbl1 and 2 are not optional, but are labeled. You can call f with
either labeled argument in either order. e.g. f(#lbl2, #lbl1, a)
.
let f = |#opt = null, a| ...
opt
need not be specifed when f
is called, if it isn't specified then it
will be null
. e.g. f(2)
is a valid way to call f
. You can also apply type
constraints to labeled and optional arguments.
let f = |#opt: [i64, null] = null, a| ..
Specifies that opt
can be either an i64
or null
and by default it is null.
The compiler implements subtyping for functions with optional arguments. For
example if you write a function that takes a function with a labeled argument
foo
, you can pass any function that has a labeled argument foo
, even if it
also has other optional arguments. The non optional and non labeled arguments
must match, of course. For example,
let f = |g: fn(#foo:i64, i64) -> i64, x: i64| g(#foo:x, x);
let g = |#foo:i64, #bar: i64 = 0, x: i64| foo + bar + x;
f(g, 42) // valid call
outputs
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
84
Lexical Closures
Functions can reference variables outside of their definition. These variables are captured by the function definition, and remain valid no matter where the closure is called. For example,
let f = {
let v = cast<i64>(net::subscribe("/local/foo")$)$;
|n| v + n
};
f(2)
f
captures v
and can use it even when it is called from a scope where v
isn't visible. Closures allow functions to encapsulate data, just like an object
in OOP.
Functions are First Class values
We can store a function in a structure, which can itself be stored in a data structure, a file, or even sent across the network to another instance of the same program. Here we build a struct that maintains a count, and a function to operate on the count, returning a new struct of the same type with a different count.
type T = { count: i64, f: fn(T) -> T };
let t = { count: 0, f: |t: T| {t with count: t.count + 1} };
(t.f)(t)
when run this example will output,
{count: 1, f: 158}
158 is the lambda id, it's the actual value that is stored to represent a function.
Late Binding
Functions are always late bound. Late binding means that the runtime actually
figures out which function is going to be called at runtime, not compile time.
At compile time we only know the type of the function we are going to call. This
complicates the compiler significantly, but it is a powerful abstraction tool.
For example we can create two structs of type T
that each contain a different
implementation of f
, and we can use them interchangibly with any function that
accepts a T
. In this simple example we create one implementation of f
that
increments the count, and one that decrements it.
type T = { count: i64, f: fn(T) -> T };
let ts: Array<T> = [
{ count: 0, f: |t: T| {t with count: t.count + 1} },
{ count: 0, f: |t: T| {t with count: t.count - 1} }
];
let t = array::iter(ts);
(t.f)(t)
when run this example will output,
{count: 1, f: 158}
{count: -1, f: 159}
You can clearly see that f is bound to different functions by the runtime since the lambda ids (158 and 159) are different. While Graphix is not an object oriented language, you can use closures and late binding to achieve some of the same outcomes as OOP.
Polymorphism
All functions are polymorphic, even without annotations, argument and return types are inferred at each call site, and thus may differ from one site to another. Any internal constraints are calculated when the definition is compiled and are enforced at each call site. For example consider,
〉let f = |x, y| x + y
〉f
-: fn<'_2073: unbound: Error<ErrChain<`ArithError(string)>>, '_2069: unbound: Number, '_2067: unbound: Number, '_2071: unbound: Number>('_2067: unbound, '_2069: unbound) -> '_2071: unbound throws '_2073: unbound
159
The type is a bit of a mouthfull, lets format it a bit so it's easier to read.
fn<'_2073: unbound: Error<ErrChain<`ArithError(string)>>,
'_2069: unbound: Number,
'_2067: unbound: Number,
'_2071: unbound: Number>
('_2067: unbound, '_2069: unbound) -> '_2071: unbound throws '_2073: unbound
After fn the stuff between the <>
are the type constraints, the syntax in this
readout is a colon separated list of,
- type variable name, for example '_2073
- current value, or unbound if there is no current value
- constraint type
We can remove the (unbound) current values and it becomes even easier to read,
fn<'_2073: Error<ErrChain<`ArithError(string)>>,
'_2069: Number,
'_2067: Number,
'_2071: Number>
('_2067, '_2069) -> '_2071 throws '_2073
Here we can see that '_2067
, '_2069
, and '_2071
represent the two
arguments and the return type of the function. They are all unbound, meaning
that when the function is used they can have any type. They are also all
constrained to Number
, and this will be enforced when the function is called,
it's arguments must be numbers and it will return a number. We learned this
because internally the function uses +
, which operates on numbers, this
constraint was then propagated to the otherwise free variables representing the
args and the return type.
So in plain English this says that the arguments to the function can by any type as long as it is a number, and the function will return some type which is a number. None of the three numbers need to be the same type of number.
Finally lets address throws '_2073
. This states that the function may throw an
error, and if it does it's type will be '_2073
, which in this case is
constrained to be
Error<ErrChain<`ArithError(string)>>
.
This is what happens in the case of overflow, underflow, and other arithmetic
errors. The throws
clause of the type is used by the try catch(e) => ...
expression to compute the type of e
, which is just the union of all the throws
types within the try catch
.
We can indeed call f
with different number types, and it works just fine,
〉f(1.0, 1)
-: Number
2
The type we get back really depends on the values we pass. For example,
〉f(1.1212, 1)
-: Number
2.1212
Wherever we use f
the compiler will force us to handle every possible case in
the Number
type
Explicit Type Specifications
While the compiler does a pretty good job of inferring the types of functions,
sometimes you want to express a constraint that can't be inferred. Suppose we
wanted to modify the example in the last section to say that while you can pass
any type of number to f
, it has to be the same type for both arguments, and
the return type will be the same as the argument type. We can say that using
type annotations.
〉let f = 'a: Number |x: 'a, y: 'a| -> 'a x + y
〉f
-: fn<'a: unbound: Number, '_2101: unbound: Error<ErrChain<`ArithError(string)>>>('a: unbound, 'a: unbound) -> 'a: unbound throws '_2101: unbound
160
In type annotations of lambda expressions,
- The constraints come before the first
|
, separated by commas if there are multiple constrained type variables. e.g.'a: Number
- Each argument may optionally have a
: Type
after it, and this will set it's type, e.g.x: 'a
- After the second
|
you can optionally include an-> Type
which will set the return type of the function, e.g.-> 'a
- After the return type, you can optionally specify a throws type,
throws Type
, which will set the type that is thrown by the function
The type we ended up with is actually quite a bit simpler, but lets format it anyway,
fn<'a: Number,
'_2101: Error<ErrChain<`ArithError(string)>>>
('a, 'a) -> 'a throws '_2101
We just have two variables now, 'a
representing both argument types and the
return type, and '_2101
representing the throws type. We can still call this f
with any number type,
〉f(1.212, 2.0)
-: f64
3.2119999999999997
However notice that we get back the explicit type we passed in,
〉f(2, 2)
-: i64
4
In one case f64
, in the other i64
. We can't pass numbers of different types,
〉f(1, 1.2)
error: in expr
Caused by:
0: at: line: 1, column: 6, in: f64:1.2
1: type mismatch 'a: i64 does not contain f64
Here the compiler is saying that 'a
is already initialized as i64
and i64
doesn't unify with f64
.
Higher Order Functions
Since functions are first class, they can take other functions as arguments, and even return functions. These relationships can be often inferred automatically without issue, but sometimes annotations are required.
〉 let apply = |x: 'a, f: fn('a) -> 'b throws 'e| -> 'b throws 'e f(x)
〉 apply
-: fn<'e: unbound: _>('a: unbound, fn('a: unbound) -> 'b: unbound throws 'e: unbound) -> 'b: unbound throws 'e: unbound
163
Here we've specified a single argument apply, it takes an argument, and a
function f
, and calls f
on the argument. Note that we've explicitly said
that whatever type of error f
throws, apply
will throw as well. That was
constrained by the compiler to _
meaning basically this could throw anything
or also not throw at all, it just depends on f
.
We can see a more practical example in the type of array::map
(this
implementation of which I will not repeat here), which is,
fn(Array<'a>, fn('a) -> 'b throws 'e) -> Array<'b> throws 'e
So map takes an array of 'a
, and a function mapping 'a
to 'b
and possibly
throwing 'e
and returns an array of 'b
possibly throwing 'e
.
Recursion
Detailed Semantics
Considering the underlying execution model functions might be better described as "polymorphic graph templates", in that they allow you to specify a part of the graph once, and then use it multiple times with different types each time. Most of the time this difference in semantics doesn't matter. Most of the time. Consider,
let n = cast<i64>(net::subscribe("/hev/stats/power")?)?;
f(n, 1)
What happens here? Does f
get "called" every time n
updates? Does it only
work for the first n
? Does it explode? Lets transform it like the compiler
would in order to understand it better,
let n = cast<i64>(net::subscribe("/hev/stats/power")?)?;
n + 1 + 1
The "arguments" to the function call were plugged into the holes in the graph template and then the whole template is copied to the call site, and from then on the graph runs as normal.
Lets revisit an earlier example where we used select and connect to find the length of an array. Suppose we want to generalize that into a function,
let len = |a: Array<'a>| {
let sum = 0;
select a {
[x, tl..] => {
sum <- sum + 1;
a <- tl
},
_ => sum
}
}
Lets just ignore the 'a
for now. Here we have a function that takes an array
with any element type and returns it's length. Brilliant, lets call it,
let a = [1, 2, 3, 4, 5];
len(a)
and when we run this we get,
eric@katana ~ [1]> proj/graphix/target/debug/graphix ./test.gx
5
That's the right answer. Are we done? Noooooooo. No we are not done. Lets see what happens if we do,
let a = [1, 2, 3, 4, 5];
a <- [1, 2, 3];
a <- [1, 2];
len(a)
this results in,
eric@katana ~> proj/graphix/target/debug/graphix ./test.gx
4
What!? That's not even wrong. That's just nonsense, what happened? The key to
understanding this problem is that there is just one call site, which means we
instantiated this little reusable bit of graph one time, just one time. That
means there is just one sum
, one a
, basically just one graph. When we use
connect to iterate we are using graph traversal cycles to do a new element of
the array every cycle until we are done. It will take 5 cycles for the first
array to be done, and that's the problem, because we update a
with a whole new
array in cycle 1 and again in cycle 2. That's why we get 4, it's determanistic,
we will get 4 every time.
- the first cycle we add 1 to
sum
and set the innera
totl
(it's not the same variable as the outera
, which is why the chaos isn't even greater). But the outera
also gets set to[1, 2, 3]
and that overwrites the inner set because it happens after it (because that's just the way the runtime works). - the second cycle we add 1 to
sum
and set the innera
to[2, 3]
and the outera
to[1, 2]
- the third cycle we add 1 to
sum
and set the innera
to[2]
- the 4th cycle we add 1 to
sum
and seta
to[]
- the 5th cycle we update our return value with
sum
, which is now 4
We can only fix this be understanding that we're programming a graph. I tried to
make Graphix as much like a normal language as possible, but this is where we
depart from that possibility. The general idea is, we need to queue updates to
the outer a
until we're done processing the current one. For that we have a
builtin called queue
, here is the correct implementation
let len = |a: Array<'a>| {
let clock = once(null);
let q = queue(#clock, a);
let sum = 0;
select q {
[x, tl..] => {
sum <- sum + 1;
q <- tl
},
_ => {
clock <- null;
sum <- 0;
once(sum)
}
}
}
Every time clock
updates queue
will output something it has queued, or if it
has nothing queued it will store that the next thing that arrives can go out
immediatly. So the first a
will immediatly pass through the queue, but
anything after that will be held. Then the normal select loop will run, except
it will look at q
instead of a
now, so that a
can update without
disturbing it. When we get to the terminating case, we update for next cycle
clock
with null
and sum
with 0 and we return once(sum)
. We return
once(sum)
instead of just sum
because removing something from the queue
takes one cycle, so it will be two cycles before we start on the next array, and
in the mean time the existing array will still be empty, meaning the second
select arm will still be selected, and sum
is updating to 0 which we do not
want to return. If we run this with the same set of examples we will get the
correct answer,
eric@katana ~> proj/graphix/target/debug/graphix ./cycle_iter.gx
5
3
2