Wednesday, December 5, 2012

Introducing Mojo

During my last couple of months at Rocket Ninja (AKA ThirdMotion) I did a lot of work in Node.js. My time with it brought me to one inevitable conclusion: Node.js is a powerful tool, but JavaScript is NOT a good server-side scripting language. The trouble is that JavaScript is single threaded, while servers necessarily deal with many potentially blocking external resources. The result is a HUGE reliance on callback methods. In many Node.js apps I've seen callbacks nest five deep, callbacks pass callbacks they've been passed on to other callbacks, callbacks get wrangled by libraries written just for that purpose, and so on. This makes code that is hard to debug, difficult to maintain, impossible to refactor and challenging to train people on. JavaScript's single threaded nature forces Node.js code into complex design patterns that turn any medium to large project into a tangle of callbacks. It would be nice if Node used a language that handles potentially blocking operations better, with multiple threads or processes.

Thus, charmed by the idea of Node.js but frustrated by JavaScript, I set out to find something better. There isn't really anything. Erlang promises much the same thing using an actor-style concurrent environment, but the language's syntax is so far removed from typical C- and Pascal-based languages that getting good with it would take time that's hard to justify. Other common scripting languages, like Python and PHP, are also effectively single threaded and would suffer the same problem as JavaScript.

So, suddenly having more free time than usual, I dusted off an old scripting language I started working on a decade ago and determined to create my own ideal server side scripting language using a multiple process model of my own design. The end goal is to replace the V8 engine in Node.js, though that is a long, long way away.

The language is called Mojo. The prototype is nearly functionally complete, so I've decided to post some information and solicit feedback on the design. This is not nearly all there is to it, just a brief overview of the process model and syntax. I'll fill in details as I go.



Mojo is a versatile, object-oriented, multi-process scripting language intended to be instantly familiar to anyone who has used JavaScript, Java, C++ or C# before. Its loosely actor-based process model is designed to make managing any number of processes easy and clean, without the need for complex locking or synchronization mechanisms.
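The core idea is that one process owns some state and everyone else sends it messages instead of sharing that state. It can be sketched in Python as a loose analogy. This sketch assumes a `threading.Thread` stands in for a Mojo process and a `queue.Queue` stands in for its queue of pending calls. The names `counter_loop` and `client` are made up for illustration. Four client threads send 4,000 increments in total. Only the counter thread ever touches `count`, so the total comes out right with no locks in the application code. The queue does its own locking internally.

```python
import queue
import threading

counter_mailbox = queue.Queue()  # messages for the "counter process"
count = 0                        # state owned solely by the counter thread


def counter_loop():
    """Hypothetical owner process: applies messages one at a time."""
    global count
    while (msg := counter_mailbox.get()) is not None:  # None means "stop"
        count += msg


def client():
    """Hypothetical caller: fire-and-forget messages, no shared-state lock."""
    for _ in range(1000):
        counter_mailbox.put(1)


counter = threading.Thread(target=counter_loop)
counter.start()
clients = [threading.Thread(target=client) for _ in range(4)]
for t in clients:
    t.start()
for t in clients:
    t.join()
counter_mailbox.put(None)  # queued after every increment, so all get applied
counter.join()
```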

All code in Mojo lives in classes. When the VM starts, it instantiates an internal process class that runs the startup code, or any code passed in to be executed. This instance is called the “root” process instance. If your program simply consists of

printf("Hello World");

then that code is executed as part of the root instance's constructor.

To create a new process in Mojo, you create a class to manage it. A process class is defined just as you would define any other class. It's a best practice to name the class “Process<name>”.

class ProcessA{
    var count = 0;
    method initialize(){
        // do something
    }

    method doAddCount(amount){
        count += amount;
    }

    method getCount(){
        return count;
    }
}

Constructor methods in Mojo are named “initialize”. You can use them in anonymous classes as well.

You start a new process the same way you create a new instance, but with “spawn” instead of “new”:

var testProc = spawn ProcessA();

This creates a new class instance and a new process to run it. It's a best practice to name a class instance that is a process “<name>Proc”. Such an instance is referred to as a process instance, or just a process. The instance's initialize method executes when the process is created. The process does not terminate when it runs out of code to run, though; it hangs around waiting for more things to do. The class instance that creates a process is considered that process's owner. Only the owner or the process itself can kill a process. Its owner can kill testProc with
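A rough Python analogue of `spawn` and `exit()` follows. It assumes that a thread plus a mailbox queue is a fair stand-in for a Mojo process. `Proc` and `spawn` are hypothetical names, not part of Mojo. The sketch shows two things. First, `initialize` (here `__init__`) runs on the new thread. Second, that thread then stays alive, idling on its mailbox, until its owner calls `exit()`.

```python
import queue
import threading


class ProcessA:
    def __init__(self):  # plays the role of Mojo's initialize()
        self.count = 0
        self.initialized_on = threading.current_thread()


class Proc:
    """Hypothetical stand-in for a spawned Mojo process: one dedicated
    thread that builds the instance, then idles on a mailbox."""

    def __init__(self, cls, *args):
        self._mailbox = queue.Queue()
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, args=(cls, args))
        self._thread.start()
        self._ready.wait()  # return once initialize() has finished

    def _run(self, cls, args):
        try:
            self.instance = cls(*args)  # initialize() runs on the new thread
        finally:
            self._ready.set()
        # Out of code to run, the process stays alive waiting for more work.
        while (job := self._mailbox.get()) is not None:
            job()

    def exit(self):
        # In Mojo only the owner (or the process itself) may do this; this
        # sketch does not enforce that and expects a call from another thread.
        self._mailbox.put(None)  # None is the "stop" message
        self._thread.join()


def spawn(cls, *args):
    """Hypothetical analogue of `spawn ProcessA()`."""
    return Proc(cls, *args)


testProc = spawn(ProcessA)
alive_after_init = testProc._thread.is_alive()  # still running, just idle
testProc.exit()
```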

testProc.exit();

To get testProc to do something, you call a method on it:

testProc.doAddCount(5);

This causes the testProc process to execute the code in doAddCount(). The calling process does not execute code owned by another process. This brings up the somewhat complex issue of code ownership. As I mentioned before, all code is part of a class. When an instance of a class is created, that instance (and its code) is owned by the class instance that called “new” to create it. That owning instance is itself owned by the instance that created it, and so on up an ownership chain. Somewhere up that chain is an instance that is also a process, even if it's all the way up at the root instance. That process runs all the code in that ownership chain.
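This sketch uses the same hypothetical Python terms: a thread plus a mailbox stands in for a process, and `Proc` and `call` are illustrative names. A blocking call like `testProc.doAddCount(5)` can be modelled by posting the method call to the process's mailbox and waiting on a one-slot reply queue. The method body runs on the process's thread, never the caller's. To keep the sketch short, the `ProcessA` instance is built on the caller's thread rather than inside the process as Mojo's `spawn` would do.

```python
import queue
import threading


class ProcessA:
    def __init__(self):
        self.count = 0
        self.ran_on = None

    def doAddCount(self, amount):
        self.ran_on = threading.current_thread()  # record who executed this
        self.count += amount

    def getCount(self):
        return self.count


class Proc:
    """Hypothetical stand-in for a Mojo process: a thread plus a mailbox."""

    def __init__(self, instance):
        self.instance = instance
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while (job := self._mailbox.get()) is not None:
            job()

    def call(self, name, *args):
        """Analogue of `proc.method(args)`: the process runs the method
        while the caller blocks until the result comes back."""
        reply = queue.Queue(maxsize=1)

        def job():
            try:
                reply.put((True, getattr(self.instance, name)(*args)))
            except Exception as exc:  # hand errors back to the caller
                reply.put((False, exc))

        self._mailbox.put(job)
        ok, value = reply.get()  # the calling thread waits here
        if not ok:
            raise value
        return value

    def exit(self):
        self._mailbox.put(None)
        self._thread.join()


testProc = Proc(ProcessA())
testProc.call("doAddCount", 5)
total = testProc.call("getCount")
testProc.exit()
```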

When a new class instance is also a new process (created via “spawn”), the instance is owned by the class instance whose code created it. It still runs its own code, and it owns and runs the code of any instances it creates.
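The ownership rule can be modelled as a plain tree walk. This Python sketch assumes each instance simply records its owner and whether it was created with `spawn`. `Instance` and `running_process` are hypothetical helpers, not Mojo internals. To find the process that runs an instance's code, walk up the owner links to the first instance that is a process.

```python
from __future__ import annotations

from typing import Optional


class Instance:
    """Hypothetical model of a Mojo class instance in the ownership chain."""

    def __init__(self, name: str, owner: Optional[Instance] = None,
                 is_process: bool = False):
        self.name = name
        self.owner = owner            # the instance whose code created this one
        self.is_process = is_process  # True if created with `spawn`


def running_process(inst: Instance) -> Instance:
    """Return the nearest instance up the ownership chain that is a process."""
    while not inst.is_process:
        inst = inst.owner  # the root instance is always a process, so this ends
    return inst


root = Instance("root", is_process=True)
handler = Instance("handler", owner=root)                    # `new` in root code
worker = Instance("worker", owner=handler, is_process=True)  # `spawn` in handler
helper = Instance("helper", owner=worker)                    # `new` in worker code
```

Here `worker` is still owned by `handler`. Even so, its own code and `helper`'s code run on `worker` itself, which matches the `spawn` case.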

With the above syntax, testProc.doAddCount(5), the calling process blocks while waiting for the testProc process to finish running doAddCount(). If testProc is a busy process this could take some time, which is obviously not optimal. Why have multiple processes if they have to wait for one another? There is another way to call a method in Mojo:

testProc:doAddCount(5);

This is a non-blocking method call: the calling process keeps executing while doAddCount() is pushed onto the testProc process's execution stack. If doAddCount() were to return a value, that value would be lost.
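This continues the same hypothetical Python analogy: a thread plus a mailbox, with `Proc` and `send` as illustrative names. A non-blocking call just queues the work and returns. The `gate` event is a demo-only device that keeps the process "busy". It lets you see the caller get control back before `doAddCount()` runs. The method's return value goes nowhere, mirroring the lost return value in Mojo.

```python
import queue
import threading


class ProcessA:
    def __init__(self):
        self.count = 0
        self.gate = threading.Event()  # demo only: holds the process "busy"

    def doAddCount(self, amount):
        self.gate.wait()
        self.count += amount
        return self.count  # nobody receives this value


class Proc:
    """Hypothetical stand-in for a Mojo process: a thread plus a mailbox."""

    def __init__(self, instance):
        self.instance = instance
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while (job := self._mailbox.get()) is not None:
            job()

    def send(self, name, *args):
        """Analogue of `proc:method(args)`: queue the call, return at once."""
        method = getattr(self.instance, name)
        self._mailbox.put(lambda: method(*args))  # result is discarded

    def exit(self):
        self._mailbox.put(None)  # queued behind any pending calls
        self._thread.join()


testProc = Proc(ProcessA())
result = testProc.send("doAddCount", 5)  # returns immediately
count_before = testProc.instance.count   # peek for the demo: not applied yet
testProc.instance.gate.set()             # let the busy process continue
testProc.exit()
```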

Let's look at a more involved example:

class RequestHandler{

    var pool = [];

    class DoStuffProcess{
        method doIt(stuff){
            stuff = stuff + " done.";
            this.owner:doneStuff(stuff, this);
        }

    } // class DoStuffProcess

    method handleRequest(request){
        var handlerProc = null;
        if (pool.length > 0){
            handlerProc = pool.pop();
        } else {
            handlerProc = spawn DoStuffProcess();
        }

        handlerProc:doIt(request);
    }

    method doneStuff(stuff, workerProc){
        pool.push(workerProc);
        printf(stuff);
    }

} // class RequestHandler

var handler = new RequestHandler();
handler.handleRequest("Test");
handler.handleRequest("Test1");
handler.handleRequest("Test2");
handler.handleRequest("Test3");

In this example, a RequestHandler instance maintains a pool of DoStuffProcess processes, creating new ones as needed. Notice that we do not need any special locking or synchronization around the pop() and push() calls. The process that owns the RequestHandler instance (in this case the root process) is the only process that will ever execute handleRequest() and doneStuff(). So we need not worry about two processes trying to pop() the same worker from the pool at the exact same time, or two processes trying to push() new entries into the pool at the exact same time. Also notice that the RequestHandler does not need to keep track of the DoStuffProcess instances it creates. The processes themselves call doneStuff() to get added back to the pool when they are ready for more work. On top of that, if something goes wrong and a worker process dies, the handler doesn't need to do anything about it.
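Here is one way the pool example might translate to Python, again only as an analogy. Threads with mailboxes stand in for processes, and the main thread plays the root process by draining its own `root_mailbox`. The names `Actor`, `send`, `root_mailbox`, `spawned` and `output` are hypothetical. Workers report back by posting `done_stuff` onto the root's mailbox, the equivalent of `this.owner:doneStuff(...)`. As a result, `pool` is only ever touched by one thread and needs no lock.

```python
import queue
import threading


class Actor:
    """Hypothetical stand-in for a Mojo process: a thread draining a mailbox."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=self._loop)
        self.thread.start()

    def _loop(self):
        while (job := self.mailbox.get()) is not None:
            job()

    def send(self, fn, *args):
        """Like `proc:method(args)`: queue the call, return immediately."""
        self.mailbox.put(lambda: fn(*args))

    def exit(self):
        self.mailbox.put(None)
        self.thread.join()


class DoStuffProcess(Actor):
    def __init__(self, owner):
        self.owner = owner
        super().__init__()

    def do_it(self, stuff):
        stuff = stuff + " done."
        # this.owner:doneStuff(stuff, this) -> post to the owner's process.
        self.owner.root_mailbox.put(lambda: self.owner.done_stuff(stuff, self))


class RequestHandler:
    def __init__(self, root_mailbox):
        self.root_mailbox = root_mailbox  # mailbox of the owning (root) process
        self.pool = []
        self.output = []
        self.spawned = 0

    def handle_request(self, request):
        if self.pool:
            worker = self.pool.pop()
        else:
            worker = DoStuffProcess(self)
            self.spawned += 1
        worker.send(worker.do_it, request)

    def done_stuff(self, stuff, worker):
        # Runs only on the root thread, so self.pool needs no lock.
        self.pool.append(worker)
        self.output.append(stuff)
        print(stuff)


root_mailbox = queue.Queue()  # the root process's queue of pending calls
handler = RequestHandler(root_mailbox)
for request in ["Test", "Test1", "Test2", "Test3"]:
    handler.handle_request(request)

# The root process services queued done_stuff() calls one at a time.
while len(handler.output) < 4:
    root_mailbox.get()()

for worker in handler.pool:  # shut the idle workers down so the script exits
    worker.exit()
```

All four requests arrive before the root drains its mailbox, so four workers get spawned here. The order of the printed lines depends on thread scheduling.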

We can limit pool growth by modifying doneStuff() like this:

method doneStuff(stuff, workerProc){
    if (pool.length < 101){
        pool.push(workerProc);
    } else {
        workerProc.exit();
    }
    printf(stuff);
}
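The capped version reduces to a simple size check. This Python sketch uses a hypothetical `Worker` class whose `exit()` only records that it was called, so the pool logic can be shown without any threads. `MAX_POOL` mirrors the `101` in the snippet above. Up to 101 idle workers are kept, and any beyond that are shut down.

```python
MAX_POOL = 101  # mirrors `pool.length < 101` in the Mojo snippet


class Worker:
    """Hypothetical stand-in for a worker process handle."""

    def __init__(self):
        self.exited = False

    def exit(self):
        self.exited = True


class RequestHandler:
    def __init__(self):
        self.pool = []

    def done_stuff(self, stuff, worker):
        if len(self.pool) < MAX_POOL:
            self.pool.append(worker)  # keep it around for the next request
        else:
            worker.exit()             # surplus worker: shut it down explicitly
        # printf(stuff) omitted to keep the demo quiet


handler = RequestHandler()
workers = [Worker() for _ in range(103)]
for w in workers:
    handler.done_stuff("done", w)
```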

We must explicitly exit these processes, or they will hang around forever waiting for something to call one of their methods. (Or maybe the garbage collector will terminate processes that are no longer referenced anywhere; I'm not sure yet.)

So that's a brief overview of Mojo's process model. Please ask any questions you have, or point out any absurdities I'm instigating.

Thanks!