Caffeinated Simpleton

Posts Tagged ‘JavaScript’

JavaScript is Not Perfect

After posting cobra, one of the things I heard most was “don’t try to make JavaScript be something it’s not”. This is good advice, but I feel that in this case it was given too hastily. Cobra did not come out of a desire to make JavaScript more like Python, even though that was the result. Cobra came out of careful consideration of how to make JavaScript better.

Problems with JavaScript

I think most people who happen upon this lonely corner of the internet are already pretty familiar with the flaws of JavaScript, but I’ll mention a couple of the most glaring faults briefly.

Scope

JavaScript’s scoping rules are dumb. Variables default to the global scope unless told otherwise and the this keyword, which is supposed to point to the current instance, actually points to the global object unless told otherwise. These scoping rules cause huge problems for beginning JavaScript programmers and trip up the most experienced programmers if they aren’t paying attention. Since these odd scoping rules default to the global object, and not to some error state, errors can go by unnoticed for long periods of time.

Object Syntax

JavaScript claims to be a prototypal language. It lies. It’s a language that also has prototypes. In a true prototypal language, there are objects. Objects can be derived from other objects, either by copying the parent object or by linking to it. JavaScript does something similar with its objects’ built in prototype attribute. Its prototype points to an object that it replicates the behavior of. However, it doesn’t replicate this behavior until after you create a new object. Let me demonstrate with some unsupported features of JavaScript:

//This is my base object. It's a pretty simple 4 step dance.
Dance = {
    danceAround: function() {},
    steps: 4
}
 
// Let it be known that I wouldn't know what a tango was if I saw it.
// (Unless there was a rose in somebody's mouth, of course)
Tango = {
     steps: 34,
     dip: function(){}
}
// Now we pull some true prototype magic.
Tango.__proto__ = Dance;
 
// This works then:
Tango.danceAround();

That’s true prototypal behavior. There’s no “new” keyword, just behaviors that can be stolen by other objects. Unfortunately, JavaScript was thrown together in (I believe) about 15 minutes, minus a 5 minute coffee break. The writers realized that people were going to flip if it didn’t resemble the popular languages of the time, so they implemented a prototype-based language, pulled the prototypes into a separate property, and added a “new” keyword. This led to syntax like the following:

//This is my base object. It's a pretty simple 4 step dance.
Dance = {
    danceAround: function() {},
    steps: 4
}
 
Tango = function() {
    this.dip = function() {};
    this.steps = 37;
}
Tango.prototype = Dance;

That’s not quite as pretty. Not only is it not as pretty, but there are some fundamental flaws. The dip function gets recreated every time a new Tango object is instantiated. This isn’t a big deal most of the time, but once a year the king has a ball and all of a sudden you have 1000 partners tangoing about, and with them, 1000 identical copies of the dip function.

Another flaw is that the prototype is not in the lookup path until a new instance is created. In this example, Tango.danceAround is not defined, because the prototype only applies to objects made with new; the Tango function itself never consults Dance. A “class” in most languages is a definition of object behavior, and instances of a class are objects that behave as the class defines. This is very close to how prototypes behave in JavaScript.
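Both flaws are easy to see in a console. The snippet below is a small sketch that reuses the Dance and Tango names from above; the result variables are only there for illustration.

```javascript
var Dance = {
    danceAround: function () { return "dancing"; },
    steps: 4
};

function Tango() {
    this.dip = function () {};
    this.steps = 37;
}
Tango.prototype = Dance;

// The constructor function itself does not look anything up on Dance...
var onConstructor = typeof Tango.danceAround;

// ...but objects created with `new` do.
var first = new Tango();
var second = new Tango();
var onInstance = typeof first.danceAround;

// Every instance gets its own copy of dip, while danceAround is shared.
var dipIsShared = first.dip === second.dip;
var danceIsShared = first.danceAround === second.danceAround;

console.log(onConstructor, onInstance, dipIsShared, danceIsShared);
// undefined function false true
```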

To summarize, JavaScript isn’t quite a prototype based language, it isn’t quite a class based language, and the syntax for doing either is ugly (for more on the ugliness, check out one of my previous posts).

Fixing the Problems

Fortunately, JavaScript is awesome in most ways. It’s so flexible that fixing the issues I’ve spelled out above is no problem.

Fixing Scope

You can’t entirely fix JavaScript scoping. Local variables which aren’t declared with var become global, and there’s nothing you can do about that.[1] You can, however, fix the this object. You can wrap any given object method in another method which asserts that its this will be set to a specific object.

MyDance = {
     danceAround: function() { console.log(this.cheer) },
     cheer: "WHOOO!"
}
 
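// Keep a reference to the original method; otherwise the wrapper would call itself forever.
var originalDanceAround = MyDance.danceAround;
MyDance.danceAround = function() {
    return originalDanceAround.apply(MyDance, arguments);
}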

This is called binding, and it makes this be what you would want it to be in most instances. My initial thought was to just bind all object methods to their instances at the time of instantiation. However, I don’t really like this idea. For one thing, it changes the language. For another, some libraries (I’m looking at you jQuery) like to mess around with this. If this is expected to be something, I do not want to change that. What I needed was a shadow this, a variable that was always present and always pointed to the instance of the object. Luckily, there was an easy solution. Python always passes its instance object as the first parameter of any method of a class. I could replicate this behavior easily using essentially the same binding code and have the code look familiar to Python programmers everywhere. So I did. Every instance method in Cobra has “self” passed to it as the first parameter, which is automatically guaranteed to be the instance, no matter what. No binding required.
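As a rough sketch of the idea (this is not Cobra's actual source), a wrapper can prepend the instance to the argument list and leave this alone. The withSelf helper and the dancer object below are made up for illustration.

```javascript
// Hypothetical helper: wraps a method so the instance always arrives
// as the first argument, whatever `this` happens to be at call time.
function withSelf(instance, method) {
    return function () {
        var args = [instance].concat(Array.prototype.slice.call(arguments));
        return method.apply(this, args); // `this` is passed through untouched
    };
}

var dancer = { cheer: "WHOOO!" };
dancer.danceAround = withSelf(dancer, function (self, times) {
    var cheers = [];
    for (var i = 0; i < times; i++) {
        cheers.push(self.cheer);
    }
    return cheers.join(" ");
});

// Detaching the method loses `this`, but `self` still points at dancer.
var detached = dancer.danceAround;
var result = detached(2);

// A library that rebinds `this` does not disturb `self` either.
var rebound = dancer.danceAround.call({ cheer: "nope" }, 1);

console.log(result);  // WHOOO! WHOOO!
console.log(rebound); // WHOOO!
```

The difference from plain binding is that this is still whatever the caller set it to, so code that relies on a library rebinding this keeps working.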

Fixing Object Syntax

It is fairly easy to get JavaScript to behave like a true prototypal language. However, I don’t much care for true prototypal behavior since it still leaves an ugly syntax. My solution was to create a “Class” object that will implicitly apply prototypes.

Instead of:

MyNewThingy.prototype.doSomeStuff = function () {};
MyNewThingy.prototype.doMoreStuff = function() {};

We can do:

MyNewThingy = new Class ({
    doSomeStuff: function() {},
    doMoreStuff: function() {}
});

These end up being exactly the same, except the latter is clearer and cleaner in my opinion.

Inheritance is a bit tricky in the first case. To achieve prototypal inheritance, I have to do some magic.

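Base = {
   basicStuff: function() {}
}
// A constructor whose prototype is Base, so the objects it builds link to Base.
ThingyPrototype = function() {
      this.doSomeStuff = function () {};
      this.doMoreStuff = function() {};
}
ThingyPrototype.prototype = Base;
 
ChildThingy = function() {};
// An instance carries the two new methods and still falls back to Base.
ChildThingy.prototype = new ThingyPrototype();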

Now ChildThingy inherits from Base and has some of its own functions in its prototype. Cobra takes care of all of this for you:

Base = new Class({
    basicStuff: function() {}
});
ChildThingy = new Class({
    __extends__: Base,
    childsOwnStuff: function() {}
});

Again, I think this is a lot clearer and cleaner.
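For the curious, here is a minimal sketch of how a helper along these lines could work. SketchClass is a hypothetical stand-in, not Cobra's real implementation: it only copies methods onto a prototype and honors __extends__, and it leaves out __init__ and the self parameter entirely. It also assumes Object.create is available.

```javascript
// Hypothetical, simplified stand-in for a Class helper.
function SketchClass(definition) {
    var parent = definition.__extends__;

    function Constructor() {}

    if (parent) {
        // Link to the parent's prototype without running the parent constructor.
        Constructor.prototype = Object.create(parent.prototype);
        Constructor.prototype.constructor = Constructor;
    }

    // Copy every method from the definition onto the new prototype.
    for (var name in definition) {
        if (Object.prototype.hasOwnProperty.call(definition, name) &&
                name !== "__extends__") {
            Constructor.prototype[name] = definition[name];
        }
    }

    // Returning a function from a constructor makes `new SketchClass(...)`
    // evaluate to that function instead of a fresh object.
    return Constructor;
}

var SketchBase = new SketchClass({
    basicStuff: function () { return "basic"; }
});
var SketchChild = new SketchClass({
    __extends__: SketchBase,
    childsOwnStuff: function () { return "child"; }
});

var thingy = new SketchChild();
console.log(thingy.basicStuff(), thingy.childsOwnStuff()); // basic child
console.log(thingy instanceof SketchBase);                 // true
```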

If you put both these fixes together, you end up with Cobra, which you can read all about here.

To wrap things up, augmenting JavaScript to fix its flaws is not a bad thing. The question is what to add. I haven’t used Cobra for anything yet, but it’s my current pet project. We’ll see if it really makes JavaScript that much more pleasant.

  1. If you don’t care about standards or cross-platform compatibility, check out Firefox’s built-in __parent__ attribute, which lets you mess around with enclosing scopes.

Cobra: A Little JavaScript Class Library

In my last post, I talked a bit about the problems I saw with trying to express new types and objects in JavaScript. It’s not so much that anything is difficult to do, it’s that doing different things requires very different (and sometimes very verbose) syntax. I tried a few different things to trick JavaScript into behaving differently, but in the end, I realized that perhaps just keeping things simple was the best thing to do. I wrote down a few small features for my little library.

  • Classes. Not really the full blown classes of Java lore, but more a shortcut way of doing ClassName.prototype.functionName = function(args).
  • Inheritance. I wanted my classes to support the prototypal inheritance built into JavaScript.
  • Singletons. I wanted to be able to create single instances of classes without leaving a class around that might confuse people.
  • Namespaces. I eventually realized that Object was good enough for this.

I also wanted some convention for how to deal with private methods and properties. In JavaScript, it’s pretty easy to have private members, and it’s pretty easy to do prototypal inheritance, but it’s a bit messier to do both.

My final bit of inspiration was Python. Python doesn’t care about privacy; it just trusts the developer not to mess things up. I like this philosophy. I also like the explicit self in Python methods. A lot of people don’t like self, but I think it makes it very obvious what instance your code is acting on. I took these things to heart and wrote Cobra, a class system for JavaScript that’s really simple and looks a whole lot like Python. Without further ado, let me show you some examples of Cobra.

Classes

/* This is our base class. In its initialization function,
 * all it does is set things that are true for
 * all living animals.
 */
var Animal = new Class({
    __init__: function(self) {
        self.breathes = true;
    }
});
 
/* Feline extends Animal and overrides its initialization function.
 * Notice that it calls its parent's __init__ function, just to be safe.
 */
var Feline = new Class({
    __extends__: Animal,
    __init__: function(self) {
        Class.super(Feline, '__init__', self);
        self.claws = true;
        self.furry = true;
    },
    says: function(self) {
        console.log ('GRRRRR');
    }
});
 
/* This is a cat. It inherits from Feline, and therefore also inherits from
 * Animal. It says something a bit different from most felines.
 */
var Cat = new Class({
    __extends__: Feline,
    __init__: function(self) {
        Class.super(Cat, '__init__', self);
        self.weight = 'very little';
    },
    says: function(self) {
        console.log('MEOW');
    }
});
 
/* Tigers are like most Felines except that they weigh a lot.
 */
var Tiger = new Class({
    __extends__: Feline,
    __init__: function(self) {
        Class.super(Tiger, '__init__', self);
        self.weight = 'quite a bit';
    }
});

If we try using these classes, you’ll see that they work as they should:

>>> sneakers = new Cat();
Object breathes=true claws=true furry=true
>>> sneakers.breathes
true
>>> sneakers.claws
true
>>> sneakers.furry
true
>>> sneakers.weight
"very little"
>>> sneakers.says();
MEOW
>>> tigger = new Tiger()
Object breathes=true claws=true furry=true
>>> tigger.says()
GRRRRR

You’ve probably noticed something very strange at this point (at least for the JavaScript world). Every instance method takes “self” as its first parameter. This parameter is the instance. Whether “this” is currently bound to the window or any other random object, “self” will always be the instance of the class. It’s a nice property, and you can still use “this” however you may wish. Just remember that your methods have to take “self” as their first arguments or weird things will happen.

Singletons

As I mentioned in my last post, I’ve had some issues creating singletons in JavaScript*. There is more than one way to do it, and you have to change a whole lot to go from one form to the other. If you no longer want your object to be a singleton, that might be a whole new refactoring pain. So I created a simple Singleton class that uses the same syntax as the Class type above, but it immediately discards its class and returns a single instance. This is nice because creating it is exactly the same as creating a class. You don’t have to think about it at all, and if you eventually realize a singleton was a bad idea, it’s trivial to convert it to a class.

var sanFranciscoZoo = new Singleton({
    __init__: function(self) {
        self.cats = [new Tiger(), new Tiger(), new Cat()];
    }
});

That’s pretty much it.
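To make "discards its class and returns a single instance" concrete, here is a hedged sketch of one way to do it. SketchSingleton and sketchZoo are hypothetical names, and this is not Cobra's actual code; it only shows the shape of the trick.

```javascript
// Hypothetical sketch: build a throwaway constructor, make one instance,
// and hand back only the instance.
function SketchSingleton(definition) {
    function Hidden() {}

    for (var name in definition) {
        if (Object.prototype.hasOwnProperty.call(definition, name)) {
            Hidden.prototype[name] = definition[name];
        }
    }

    var instance = new Hidden();
    if (typeof instance.__init__ === "function") {
        instance.__init__(instance); // pass the instance in as self
    }

    // Returning an object from a constructor replaces the result of `new`,
    // so Hidden never escapes this function.
    return instance;
}

var sketchZoo = new SketchSingleton({
    __init__: function (self) {
        self.cats = ["tiger", "tiger", "cat"];
    }
});

// The result is a plain object, so trying to instantiate it again throws.
var threw = false;
try {
    new sketchZoo();
} catch (e) {
    threw = true;
}

console.log(sketchZoo.cats.length, threw); // 3 true
```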

Namespaces, statics, and privacy

Everything else needs to be solved by convention. By looking at a method, it’s easy to tell whether it is static or not. If “self” is the first parameter, then it’s not static. So how do you create static methods? Well, for now, you don’t. If you have a group of static methods, stick them in an Object, which can be used as a namespace. You can even do the equivalent of C++’s “using”:

with (MyApp.Utils) {
    //this would normally be referenced as MyApp.Utils.utilityFunction();
    utilityFunction();
}

How about privacy? It’s handled the same as Python. Everything stuck into “self” is public, so use a leading underscore to indicate that others shouldn’t touch that data or method.

var Secretive = new Class({
    __init__: function(self) {
         self._doNotTouch = 5;
    },
    _doNotCall: function(self, add) {
        self._doNotTouch += add;
    }
});

I know that a lot of people don’t like the underscore approach (including JavaScript hero Doug Crockford), but it’s simple, consistent, and clear. I’m stickin’ with it.

Try It

I would love it if people would try it out on their own and tell me what they think. The source is available on bitbucket. Any problems can be reported in the comments or be filed here. There certainly might be some; I just whipped this up a few hours ago.

*When I say singleton, I mean a single instance of a class. In the design pattern world, this means that doing “new MySingleton()” would always return the same instance. I find that deceptive, so I just want it to throw an error when you try to make a “new” singleton.

A better object oriented JavaScript

There are some problems with objects in JavaScript. The system is beautiful in its simplicity, but at times the JavaScript approach to objects is limiting and confusing. I have found this to be especially true when trying to develop large pieces of software with many components written by different people that need to interact.

To deal with these situations, I have set out to provide an object system that has a consistent syntax across a variety of situations, can be introspected with the console as much as possible, and exploits the strengths of the JavaScript object system. This isn’t easy, but by starting with a solid foundation and adding a few key features, I feel that I’m well on my way.

First, let me illustrate some specific issues that I come across in day to day development.

The first is static classes. These are typically manager type classes that regulate access to resources. Connection managers are a good example. An important attribute for these classes is that they cannot be re-instantiated. They are singletons. One approach to these is the namespace:

ConnectionManager = {
     connect: function(data) {
         do_stuff_with_data(data);
     },
     disconnect: function() {
        close_my_connection();
     }
}

This does not leave behind a constructor, so it is not possible to accidentally create another ConnectionManager object. However, there is no constructor at all. If I need to do some initialization of this object, I can do the following:

ConnectionManager = new (function() {
     var my = this;
     var my_private_data = initialize_something();
     my.connect = function(data) {
          do_something_with_data(data, my_private_data);
     }
 
     my.disconnect = function(data) {
         close_my_connection();
     }
})();

This form has a couple of advantages. First, you can initialize things. This often has to happen, so it already has an advantage over the first form. Secondly, you can control your scope. In the preceding example, I assign this to my. From any of the member functions, I am then free to use my in place of this with the knowledge that it will be the correct object. I never have to worry about binding. Finally, you can hide private data. By keeping private data in an enclosing scope, you keep it from being used improperly. This isn’t to say that I think private variables are a necessary part of an OO system, but when I’m designing an interface, it’s a bit more pleasant when the person I’m designing the interface for can inspect my object without wading through a bunch of implementation details.

Beyond the Singleton case, there are more traditional object oriented issues. Without any help from a library, a typical class definition would look something like this:

Connection = function(){
     initialize_some_stuff();
};
 
Connection.prototype.permissionMap = {
     1: 'read',
     2: 'read-write',
     3: 'none'
};
 
Connection.prototype.open = function() {
     open_this_connection();
};
 
Connection.prototype.close = function() {
     close_this_connection();
};

This is a typical JavaScript class that uses prototypes. Prototypes are a necessary part of defining classes in JavaScript, since a prototype holds all the properties that will be shared across all instances of a class. This is essentially how most people expect classes to work (coming from a more traditional OO point of view).

Also included in this example is a static member. The permissionMap above provides static data shared across all instances of a class. This is a good thing to do for constants, jump tables, string tables, or anything else that doesn’t change but might take up a lot of memory. Data that changes on an instance-by-instance basis should not be defined as a member of the prototype: when one instance modifies it in place, that change is reflected across all instances. Such data should be established and initialized in the constructor (the Connection function above).
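A short sketch makes the difference visible. SketchConnection, sharedHistory, and ownHistory are made-up names for illustration; only permissionMap comes from the example above.

```javascript
function SketchConnection() {
    // Per-instance data belongs in the constructor.
    this.ownHistory = [];
}

// Fine: constant data shared by every instance.
SketchConnection.prototype.permissionMap = {
    1: 'read',
    2: 'read-write',
    3: 'none'
};

// A mistake: mutable data on the prototype is shared too.
SketchConnection.prototype.sharedHistory = [];

var a = new SketchConnection();
var b = new SketchConnection();

a.sharedHistory.push("opened"); // changes the one array everybody sees
a.ownHistory.push("opened");    // changes only a's array

console.log(b.sharedHistory.length);                // 1
console.log(b.ownHistory.length);                   // 0
console.log(a.permissionMap === b.permissionMap);   // true
```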

There are some problems with this syntax. The first is that it’s not obvious what’s going on. People who use prototypes all the time know what’s going on, but it is never explicitly stated. For instance, permissionMap is a static variable. I’ve explained why, but nowhere is that clear in the code. This form is also quite verbose. I get so tired of typing Connection.prototype that I usually just make a shortcut variable and delete it once the method definitions are done. Finally, this syntax doesn’t really look like any of the other objects we defined. It looks like a bunch of functions with the same prefix. Nowhere is it implied that this is a class.

Out of those basic problems, I have started trying to define a common syntax that can fit any of the situations above, plus basic inheritance (which is basically just copying and extending a prototype). In addition, I’m building in as many introspection features as possible to make interfaces defined in this way a snap to sit down and figure out how to use.

I haven’t made that much progress yet, but I was given a big head start by basing all of this off of the outstanding mootools Class system. I really enjoy their code, their approach to modifying JavaScript, and their minimalism. Mootools’ system, in turn, is based on Dean Edwards’ Base.js. As you can see, I have a nice body of work backing me up here. It should make my job much easier.

I’ll go into more detail about how I want to solve these problems in a later post. For now, you can follow development on my fork of mootools-core at github.

A Threading Model Overview

I noticed in a story on Hacker News that many people do not understand the differences in threading implementations between programming languages. In the single processor days, understanding the threading model that you were working with was not that important. With more than one core, it is a good thing to know. This is an overview.

The Beginning (C and Native Threads)

The first threading model we will look at is the standard OS level thread. Every modern OS has support for this, though the APIs change from OS to OS. Basically, a thread is a process that can run on its own processor, is scheduled by the OS scheduler, and can block. It acts just like its own process except that it shares resources with every other thread in the process. This mainly means that memory and file descriptors are shared between all threads in a process. This is what people mean by “native threading”. From C on Linux, you can use these threads by linking with the pthread library. BSDs generally support pthreads as well, and Windows does its own thing that is very similar.

Java and Green Threads

When Java came out, it introduced a different type of threading model to the world called green threads. Green threads are essentially simulated threads. The Java virtual machine would take care of switching between different green threads, but the virtual machine itself would only run in one OS thread. This generally has some advantages. OS threads have almost as much overhead as a process on most POSIX systems. It is also usually slower to switch between native threads than it is between green threads.

This can mean that in some situations, green threads are much preferable to native threads. A system can usually support a much higher number of green threads than OS threads. For instance, it would be practical to spawn a new green thread for every new connection on a web server, but it is not generally practical to spawn a new native thread for every incoming HTTP connection.

There are disadvantages, however. The biggest is that you cannot have two threads running at the same time. There is only one native thread, so it is the only thread that gets scheduled. Even if there are multiple CPUs and multiple green threads, only one CPU will be running any given green thread at any given time. This is because it all looks like one thread to the OS scheduler.
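To give a feel for what "simulated threads" means, here is a toy sketch in JavaScript (the language of the other posts on this page), not Java. It assumes generator support, and it is cooperative: each green thread hands control back with yield, which is only one of several ways a runtime can switch between them. The worker and runAll names are made up.

```javascript
// Each "green thread" is a generator; yield marks where it gives up the CPU.
function* worker(name, steps, log) {
    for (var i = 1; i <= steps; i++) {
        log.push(name + ":" + i);
        yield;
    }
}

// A round-robin scheduler that runs entirely on the one real thread.
function runAll(threads) {
    var queue = threads.slice();
    while (queue.length > 0) {
        var current = queue.shift();
        if (!current.next().done) {
            queue.push(current); // not finished, so it goes to the back of the line
        }
    }
}

var log = [];
runAll([worker("a", 2, log), worker("b", 3, log)]);

console.log(log.join(" ")); // a:1 b:1 a:2 b:2 b:3
```

The two workers interleave, but at no point do they run at the same time. However many cores the machine has, the OS only ever sees the one thread that is running the scheduler loop.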

Java has supported native threading since version 1.2, and it has been the default for some time now.

Python

Python is one of my favorite scripting languages, and was one of the first scripting languages to offer threading. Python exposes a threading module that manipulates native threads. This means that Python can benefit from all the advantages of true native threading, except for one catch.

Python has a global interpreter lock (GIL). This lock is necessary to keep Python threads from corrupting the global state of the interpreter. This means that no two Python instructions can be running simultaneously. The GIL gets released every 100 Python instructions or so and another Python thread is free to acquire the lock and begin executing.

On the face of it, this seems like a major flaw. However, in practice it is not that big of a deal. Any thread that blocks will generally release the GIL. C extensions can also release the GIL whenever they are not interacting with the Python/C API, so CPU intensive operations can be carried out in C without blocking the executing Python threads. The only situation in which the GIL proves problematic is when you have more than one CPU bound thread written in Python on a multi-core machine.

Stackless Python is an implementation of Python that brings “tasklets” (essentially green threads) to Python. The greenlet module is derived from their work and is compatible with the standard CPython implementation.

Ruby

Ruby’s threading model is and always has been in a state of flux. Ruby’s original implementation only supported cooperative green threads. These work fine in many situations, but they do not take advantage of multiple processors.

JRuby mapped Ruby’s threads straight to Java’s threads, which are generally OS native threads. This doesn’t work. Since Ruby’s threads are cooperative, there is no need to synchronize between the threads. Every thread can be assured that no other thread is accessing a resource while it is accessing it. This breaks down in JRuby, since native threads are generally preemptive, meaning any thread could be accessing any shared data at any time.

Because of the mismatches and the desire for native threading from the C Ruby folks, it was decided that Ruby would move to native threading in Ruby 2.0. In Ruby 1.9, a different interpreter was swapped into the standard Ruby distribution. 1.9 adds a threading model it calls Fibers, which as far as I know are a more efficient implementation of green threads.

In short, Ruby’s threading model is a poorly documented mess.

Perl

Perl has an interesting threading model, one which Mozilla borrowed for SpiderMonkey if I’m not mistaken. Instead of having a global interpreter lock like Python, Perl makes all global state thread local and spawns off a new interpreter with each new thread. This allows for true native threading. There are two catches though.

First, you must explicitly make variables available to threads outside your own. This is the nature of everything being thread local. The values must then be kept up to date across threads.

The second catch is that every new thread is very expensive to create. The interpreter is not small, and duplicating it with every thread makes for a lot of overhead.

Erlang, JavaScript, C# and so on

There are a lot of other models out there that people play with from time to time. Erlang, for instance, has a shared nothing architecture that forces you to use lightweight, user-land processes over threading. This is actually an outstanding architecture for parallel programming since it takes out all of the headaches involved with synchronizing memory, and the processes are so lightweight you can generally just spawn as many of them as you want.

JavaScript is usually not thought of as a language that supports threading, but it needs to support it for a browser implemented largely in JavaScript like Mozilla. Its threading model is very similar to Perl’s.

C# uses native threads.

Well, I hope that makes the whole threading picture a little bit clearer. Please let me know if anything is confusing or if I messed anything up. I don’t know everything, after all.