Writing Built in Functions

As mentioned in the introduction you can extend Graphix by writing built in functions in Rust. This chapter will deep dive into the full API. If you just want to write a pure function see the Overview, or better yet the Creating Packages guide.

Most users should create a package rather than using these traits directly. The defpackage! macro handles registration and module setup automatically. This chapter is for understanding the internals.

In order to implement a built-in Graphix function you must implement two traits, graphix_compiler::BuiltIn and graphix_compiler::Apply. See the rustdoc for details. These two traits give you more control than the CachedArgs method we covered in the overview. Lets look at the simplest possible example.

Understanding The Once Function

The once function evaluates its argument every cycle and passes through one and only one update. The update method is the most important method of Apply, it is called every cycle and returns something only when the node being updated has “updated”. The meaning of that is specific to what the node does, but in the case of once it means that the argument to once updated, and once has not already seen an update. Consider the example program,

let clock = sys::time::timer(1, true);
println(once(clock))

We expect this example to print the datetime exactly one time. Lets dig in to how that actually works. The clock created by sys::time::timer will tick once per second forever. The sys::time::timer built-in will call set_timer in the Rt, which is part of the ExecCtx. This will schedule a cycle to happen 1 second from now, and will also register that this toplevel node (let clock = ...) depends on the timer event. When the timer event happens the approximate sequence of events is,

let clock = sys::time::timer(1, true), update called on toplevel node (Bind)
- sys::time::timer(1, true), bind calls update on its rhs
- sys::time::timer checks events to see if it should update, returns Some(DateTime(..))
- bind sets the id of clock in events to Value::DateTime(..)
- Rt checks for nodes that depend on clock schedules println(..)
println(once(clock)), update called on toplevel node (CallSite)
- once(clock), println calls update on its argument, once(clock)
- once::update calls update on clock
- ref clock checks events to see if it updated, returns Some(Value::DateTime(..))
- once::update checks if it’s the first time its argument has updated, it is
- once::update returns Some(Value::DateTime(..))
- println prints the datetime

Implementing Once

#![allow(unused)]
fn main() {
use anyhow::Result;
use graphix_compiler::{
    expr::ExprId, typ::FnType, Apply, BuiltIn, Event, ExecCtx, Node, Rt, Scope, UserEvent,
};
use netidx_value::Value;

#[derive(Debug)]
struct Once {
    val: bool,
}

impl<R: Rt, E: UserEvent> BuiltIn<R, E> for Once {
    const NAME: &str = "core_once";

    fn init<'a, 'b, 'c>(
        _ctx: &'a mut ExecCtx<R, E>,
        _typ: &'a FnType,
        _resolved_typ: Option<&'a FnType>,
        _scope: &'b Scope,
        _from: &'c [Node<R, E>],
        _top_id: ExprId,
    ) -> Result<Box<dyn Apply<R, E>>> {
        Ok(Box::new(Once { val: false }))
    }
}

impl<R: Rt, E: UserEvent> Apply<R, E> for Once {
    fn update(
        &mut self,
        ctx: &mut ExecCtx<R, E>,
        from: &mut [Node<R, E>],
        event: &mut Event<E>,
    ) -> Option<Value> {
        match from {
            [s] => s.update(ctx, event).and_then(|v| {
                if self.val {
                    None
                } else {
                    self.val = true;
                    Some(v)
                }
            }),
            _ => None,
        }
    }

    fn sleep(&mut self, _ctx: &mut ExecCtx<R, E>) {
        self.val = false
    }
}
}

The BuiltIn trait is for construction. It declares the built-in’s name. The function’s type is declared in the .gx file where the builtin is bound — all arguments and the return type must be annotated. For example, once would be bound as:

let once = |v: 'a| -> 'a 'core_once

The init method is called when the built-in is instantiated at a call site. It receives the execution context, the concrete function type, the lexical scope, the argument nodes, and the top-level expression id. In this case we don’t care about any of that information, but it will be useful later.

The most important method of Apply is update. sleep is expected to reset all the internal state and unregister anything registered with the context.

Higher Order Functions

Right, now that the easy stuff is out of the way, lets see how we can implement a built-in that takes another Graphix function as an argument. This gets compiler guts all over the place, sorry about that. Again lets look at the simplest example from the standard library, which is array::group.

#![allow(unused)]
fn main() {
use anyhow::bail;
use compact_str::format_compact;
use graphix_compiler::{
    expr::ExprId,
    genn,
    node::Node,
    typ::{FnType, Typ, Type},
    Apply, BindId, BuiltIn, Event, ExecCtx, LambdaId, Refs, Rt, Scope, UserEvent,
};
use netidx_value::Value;
use smallvec::{smallvec, SmallVec};
use std::collections::VecDeque;

#[derive(Debug)]
pub(super) struct Group<R: Rt, E: UserEvent> {
    queue: VecDeque<Value>,
    buf: SmallVec<[Value; 16]>,
    pred: Node<R, E>,
    ready: bool,
    pid: BindId,
    nid: BindId,
    xid: BindId,
}

impl<R: Rt, E: UserEvent> BuiltIn<R, E> for Group<R, E> {
    const NAME: &str = "array_group";

    fn init<'a, 'b, 'c>(
        ctx: &'a mut ExecCtx<R, E>,
        typ: &'a FnType,
        _resolved_typ: Option<&'a FnType>,
        scope: &'b Scope,
        from: &'c [Node<R, E>],
        top_id: ExprId,
    ) -> Result<Box<dyn Apply<R, E>>> {
        match from {
            [_, _] => {
                let scope =
                    scope.append(&format_compact!("fn{}", LambdaId::new().inner()));
                let n_typ = Type::Primitive(Typ::I64.into());
                let etyp = typ.args[0].typ.clone();
                let mftyp = match &typ.args[1].typ {
                    Type::Fn(ft) => ft.clone(),
                    t => bail!("expected function not {t}"),
                };
                let (nid, n) =
                    genn::bind(ctx, &scope.lexical, "n", n_typ.clone(), top_id);
                let (xid, x) =
                    genn::bind(ctx, &scope.lexical, "x", etyp.clone(), top_id);
                let pid = BindId::new();
                let fnode =
                    genn::reference(ctx, pid, Type::Fn(mftyp.clone()), top_id);
                let pred = genn::apply(fnode, scope, vec![n, x], &mftyp, top_id);
                Ok(Box::new(Self {
                    queue: VecDeque::new(),
                    buf: smallvec![],
                    pred,
                    ready: true,
                    pid,
                    nid,
                    xid,
                }))
            }
            _ => bail!("expected two arguments"),
        }
    }
}

impl<R: Rt, E: UserEvent> Apply<R, E> for Group<R, E> {
    fn update(
        &mut self,
        ctx: &mut ExecCtx<R, E>,
        from: &mut [Node<R, E>],
        event: &mut Event<E>,
    ) -> Option<Value> {
        macro_rules! set {
            ($v:expr) => {{
                self.ready = false;
                self.buf.push($v.clone());
                let len = Value::I64(self.buf.len() as i64);
                ctx.cached.insert(self.nid, len.clone());
                event.variables.insert(self.nid, len);
                ctx.cached.insert(self.xid, $v.clone());
                event.variables.insert(self.xid, $v);
            }};
        }
        if let Some(v) = from[0].update(ctx, event) {
            self.queue.push_back(v);
        }
        if let Some(v) = from[1].update(ctx, event) {
            ctx.cached.insert(self.pid, v.clone());
            event.variables.insert(self.pid, v);
        }
        if self.ready && self.queue.len() > 0 {
            let v = self.queue.pop_front().unwrap();
            set!(v);
        }
        loop {
            match self.pred.update(ctx, event) {
                None => break None,
                Some(v) => {
                    self.ready = true;
                    match v {
                        Value::Bool(true) => {
                            break Some(Value::Array(
                                netidx_value::ValArray::from_iter_exact(
                                    self.buf.drain(..),
                                ),
                            ))
                        }
                        _ => match self.queue.pop_front() {
                            None => break None,
                            Some(v) => set!(v),
                        },
                    }
                }
            }
        }
    }

    fn typecheck(
        &mut self,
        ctx: &mut ExecCtx<R, E>,
        _from: &mut [Node<R, E>],
    ) -> anyhow::Result<()> {
        self.pred.typecheck(ctx)
    }

    fn refs(&self, refs: &mut Refs) {
        self.pred.refs(refs)
    }

    fn delete(&mut self, ctx: &mut ExecCtx<R, E>) {
        ctx.cached.remove(&self.nid);
        ctx.cached.remove(&self.pid);
        ctx.cached.remove(&self.xid);
        self.pred.delete(ctx);
    }

    fn sleep(&mut self, ctx: &mut ExecCtx<R, E>) {
        self.pred.sleep(ctx);
    }
}
}

This implements array::group, which given an argument, stores that argument’s updates internally, and creates an array out of them when the predicate returns true. Its type is

fn(v: 'a, f: fn(len: i64, x: 'a) -> bool) -> Array<'a>

For example,

let n = seq(0, 100);
array::group(n, |_, n| (n == 50) || (n == 99))

seq(0, 100) updates 100 times from 0 to 99. The array::group will create two arrays, one containing [0, .. 50] and the other containing [51, .. 99].

The implementation needs to build a Node representing the predicate. Node is the fundamental type of everything in the graph. Ultimately the entire program compiles to a node. The kind of node we need to create here is a function call site, that will handle all the details of late binding, optional arguments, default args, etc. The genn module is specifically for generating nodes.

Typecheck

Because we generated code, we have to hook into the typecheck compiler phase and make sure the type checker runs on it. This requires that we implement the typecheck method. In our case all we have to do is typecheck our generated call site.

Two-Phase Typecheck for Higher-Order Functions

Higher-order builtins must return TypecheckResult::NeedsCallSite and handle the TypecheckPhase::CallSite(resolved) phase. This is required so that type information from the call site propagates through to the inner predicate function. Without this, if the user passes a builtin like json::read that requires a concrete return type, the type checker won’t be able to verify or initialize it.

The pattern is:

#![allow(unused)]
fn main() {
fn typecheck(
    &mut self,
    ctx: &mut ExecCtx<R, E>,
    _from: &mut [Node<R, E>],
    phase: TypecheckPhase<'_>,
) -> anyhow::Result<TypecheckResult> {
    // During CallSite phase, update stored types from the resolved FnType
    if let TypecheckPhase::CallSite(resolved) = phase {
        self.mftyp = match &resolved.args[PRED_INDEX].typ {
            Type::Fn(ft) => ft.clone(),
            t => bail!("expected a function not {t}"),
        };
        // Update any other stored types (element type, etc.)
    }
    // Create and typecheck the inner CallSite — this is critical
    // because it pushes deferred checks that cascade type information
    // to inner builtins (e.g. json::read getting its cast_typ set)
    let (_, node) = genn::bind(ctx, &self.scope.lexical, "x", self.etyp.clone(), self.top_id);
    let ft = self.mftyp.clone();
    let fnode = genn::reference(ctx, self.predid, Type::Fn(ft.clone()), self.top_id);
    let mut node = genn::apply(fnode, self.scope.clone(), vec![node], &ft, self.top_id);
    node.typecheck(ctx)?;
    node.delete(ctx);
    Ok(TypecheckResult::NeedsCallSite)
}
}

The NeedsCallSite return value tells the compiler to store this builtin for deferred type checking. When the outer call site’s deferred check runs, it calls typecheck(CallSite(resolved)) with the fully resolved function type. The builtin updates its stored predicate type (mftyp) from the resolved type, then creates and typechecks an inner CallSite node. This inner typecheck pushes its own deferred checks, cascading type information to any inner builtins that need it.

For builtins that store a persistent predicate node (like filter or group), the CallSite phase should rebuild the predicate node with the resolved types, since the original was built with unresolved type variables.

BindIds and Refs

BindId is a very fundamental type in compiler guts. The Event struct contains two tables indexed by it. The most important is variables. Every bound variable has a BindId. If a variable has updated this cycle, then its updated value will be in the variables table indexed by its BindId. In order to call this predicate function we actually create three different variables and store their BindIds as xid, nid, and pid. genn::reference returns a reference Node and the BindId of the variable it is referencing. Since those ref nodes become the arguments to the predicate call site we create, xid and nid allow us to control the arguments passed into the function. We just have to set xid and nid in Event::variables before we update the predicate in order to call the function. This may cause it to update immediately, or, it may depend on something else that needs to update before it will update. Either way, once we’ve set xid and nid once and called update on the predicate we’ve done our duty (it may never update, and that’s ok). That just leaves pid, what is it for? Well, earlier it was mentioned that functions are always late bound. This is how that works. The lambda argument we were passed from[1], whatever kind of node it is, will ultimately update and return a Lambda, which a compiled function. So every cycle we need to call update on this node just like any other node, because the Lambda we are calling might change, and if that happens the call site we created with genn::apply needs to know about it. Luckily we don’t have to handle any of the wonderful details of late binding beyond this simple passing through of updates, the call site will take care of that.

ExecCtx::cached, refs, delete

What is all this ctx.cached stuff? Well, when call sites get initialized for the first time, or when a branch of select wakes from sleep, it turns out we need to know what the current value of every variable they depend on is. Which means we need to cache globally the current value of every variable. So if you’re setting variables, 90% chance you need to update cached.

And this also explains another function that you have to implement when you’re generating nodes, which is refs. Turns out we need to know all the variables a node depends on, so we can set them when it’s being woken up from sleep, or stood up at a call site for the first time.

That just leaves delete. The structure of the graph changes at runtime, and we need to keep everything straight. It would be nice if we could do this with Drop, but that would require holding a reference to the ExecCtx at every Node. I’d really rather not pay to wrap every access to the context in a mutex, so we’re doing it the hard way (for now).

Keyboard shortcuts

The Graphix Programming Language