Archive for January, 2009
The Internet is Fluff
I’ve written three blog posts recently, all at various technical levels. The first was a post about a little JavaScript class library called Cobra that I had written. It detailed what problems the library solved and how to use it. The library was nothing special, just a few lines, but it was unique and got a thousand views or so. The next post was a very technical post on how and why Cobra was written the way it was. It assumed the reader had a fairly good knowledge of JavaScript and went into some deep issues on how the internals of inheritance and scoping work in JavaScript. It got very little attention. My last post was mostly fluff. It just talked about how I’ve hopped on the git bandwagon. Nothing controversial, nothing enlightening, nothing really worth reading. It has been, by far, the most popular of the three.
This bothers me. It does not bother me that my blog is not that popular; I keep it mostly for me. It does not bother me that posts that are deeply technical are not the most popular articles. I might not be a very good writer, and I might not be saying anything that interesting. What does bother me is that the most helpful posts I have written are probably going to be the least read. The only true value on this blog is posts that go into the internals of how something works. These are the posts that, hopefully, people will read and get something out of. I do a lot of work gathering the knowledge that I do have, and if I can pass it on efficiently, then I am accomplishing my goal. However, Google is the reality of how the internet works. If Google doesn’t see a lot of links coming into your page, then you won’t get any visitors. That’s fine, except that you also won’t get that one guy who really wants to understand the internals of JavaScript inheritance; that one guy who really just wants a library to allow some consistency in JavaScript class declarations without getting in the way.
Now I’m not saying I’m the one guy who can answer these questions. JavaScript inheritance is fairly well covered by the likes of JavaScript gurus Douglas Crockford and John Resig. However, the concept extends well beyond this specific topic. When a person develops deeply technical knowledge of a new topic, his knowledge will be lost to the internet until enough people have developed the same knowledge to recognize the first guy’s contributions. Once there is a big enough base, all these people can cross-reference each other and gain critical mass on Google. Then, finally, the topic can hit the mainstream. The problem is, the necessary knowledge for a less technical person to get into a topic was there months, or even years ago. It was just impossible to find.
Solving this problem is really, really hard. Ideally, a search engine could capture the meaning of an article and register that it is something original, something that explains something generally not known. It would then have to infer that a person searching wants something more technical than what he has found so far. I would go as far as to say that today, that’s impossible. Instead, the “semantic web” is probably the evolution that has to happen to make this work. With quality tagging, I should be able to indicate that an article is about JavaScript, inheritance, and that it is not for beginners. I should be able to indicate that links to other articles about JavaScript inheritance are not just random links to things I’m discussing, but are references to articles that deal with the exact same issue. I should be able to indicate which references explain things at a higher level and which give a more general overview. I should be able to qualify how the rest of the body of knowledge relates to my small contribution.
This, in my mind, is the promise of the semantic web. The problem is, of course, that it requires a lot more work. As an author, I have to describe how my contribution fits into the rest of the web. As a search engine, results are less a linear ranking, and more a web of how topics interrelate. Designing an interface that pulls in all the relevant information and is still easy to use will be difficult.
Someday somebody will figure out how to achieve similar results without the up front effort. Otherwise, my dreams will probably never be achieved. Until then, however, I’ll continue to watch as baseless opinions win the day and potentially new knowledge is lost in the sludge somewhere near the end of Google’s long tail.
Fine. Git is Awesome.
I’ve been a big fan of Mercurial for a while now. Harmonize is all kept in Mercurial, and most of my side projects are as well. Mercurial is great for a number of reasons. It’s easy to use, not only for former subversion users, but also at a conceptual level. Understanding how DVCSs work is easy when you’re using Mercurial. The vocabulary is sparse and makes a lot of sense. As added bonuses, it runs on Windows just as well as it runs on every other OS, and it’s written in Python (my favorite language).
However, I’ve decided to allow git into the very exclusive club of version control systems that don’t suck. Git makes two, and it might become my favorite as I get more comfortable with it.
To pause for a moment, one of the things git does not do better is github. Github has, perhaps, a slight advantage over bitbucket because of the size of the user base, but that doesn’t make it inherently better. It’s just a better social experience for now. From what I can tell, bitbucket matches github feature for feature and outdoes it in some areas. I’m a huge fan.
That being said, there are a few things git does better than mercurial. The first is editing commit histories. Mercurial supports a similar feature through extensions, but git has a leg up by coming with the ability to alter any part of your commit history built in. This allows things like cherry picking, merging batches of commits into one, and cleaning up commit logs. Mercurial can kind of do these things, but it’s much better supported in git. Plus, as a relative newcomer to git, I’m sure I haven’t even scratched the surface of what’s possible.
The second advantage that goes to git is local branching. Now let’s be clear here. Mercurial does support branching in much the same way git does. However, if you ever push to a remote repository, your branches must exist there or you must create them there. Local branches are nice because they don’t need to clutter up whatever master repository you have. You can have your own branches for whatever you’re up to and don’t need to mess with what the rest of the world is doing. I use this all the time and love it.
In the end though, there is one feature that has pulled me toward git and kept me there, and that’s git-svn. My current employer uses Subversion, and since that is not one of the version control systems that don’t suck, I quickly became agitated. I desperately tried to use Mercurial with SVN, to no avail. There is pretty good support for mirroring a SVN repository in Mercurial, but pushing back to svn quickly became a nightmare. I even found a script that did a pretty good job, but in the end, the support wasn’t there.
Git-svn, however, is a different story. It utilizes git’s amazing ability to manipulate commit histories to allow you to pull down changes from an SVN server, do your own thing, commit as much as you want in your git repository, and then when you’re ready, push it all back to the SVN server. It’s incredible. I can do all the local branching I want, play with things, finalize a feature, whatever, and in the master branch keep up with the day to day bug fixes. When I’m ready, I just merge and push. Now I don’t even think about dealing with straight SVN. The first thing I do if I’m going to be doing anything beyond reading code is pull it into a fresh git repository. From there, I’m free to use version control as it should be used without bothering anybody else. At the end of the day, I can either push all my commits out, or modify them into a few larger, more well defined commits. I pretty much never fight with Subversion anymore, and that makes me a happier person.
Despite what I’ve said here, I still love Mercurial. It’s such a simple and well designed piece of software that it makes me smile a bit. It’s like the Mac where git is like Linux. I work on Linux because I need the raw power, but at the end of the day, I really want a mac to be sitting there beside me. So make your own decisions, but at this point I’m willing to admit it. Git is awesome.
JavaScript is Not Perfect
After posting Cobra, one of the things I heard most was “don’t try to make JavaScript be something it’s not”. This is good advice, but I feel that in this case it was given too hastily. Cobra did not come out of a desire to make JavaScript more like Python, even though that was the result. Cobra came out of careful consideration of how to make JavaScript better.
Problems with JavaScript
I think most people who happen upon this lonely corner of the internet are already pretty familiar with the flaws of JavaScript, but I’ll mention a couple of the most glaring faults briefly.
Scope
JavaScript’s scoping rules are dumb. Variables default to the global scope unless told otherwise and the this keyword, which is supposed to point to the current instance, actually points to the global object unless told otherwise. These scoping rules cause huge problems for beginning JavaScript programmers and trip up the most experienced programmers if they aren’t paying attention. Since these odd scoping rules default to the global object, and not to some error state, errors can go by unnoticed for long periods of time.
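Here is a minimal sketch of both pitfalls. The names (leakedStepCount, dancer, whoAmI) are made up for illustration, and I build the functions with the Function constructor only so that they run in non-strict mode no matter where the snippet is pasted.

```javascript
// Hypothetical names throughout. Function-constructor bodies are always
// non-strict, which is the behavior described above.

// 1. Assigning to an undeclared variable silently creates a global.
var countSteps = new Function("leakedStepCount = 4; return leakedStepCount;");
countSteps();
var leakedValue = globalThis.leakedStepCount; // the "local" escaped

// 2. `this` depends on how a function is called, not where it was defined.
var dancer = { whoAmI: new Function("return this;") };
var calledAsMethod = dancer.whoAmI(); // dancer
var detached = dancer.whoAmI;
var calledBare = detached();          // the global object, and no error
```

Neither case raises an error, which is exactly why these bugs can hide for so long.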
Object Syntax
JavaScript claims to be a prototypal language. It lies. It’s a language that also has prototypes. In a true prototypal language, there are objects. Objects can be derived from other objects, either by copying the parent object or by linking to it. JavaScript does something similar with its objects’ built in prototype attribute. Its prototype points to an object that it replicates the behavior of. However, it doesn’t replicate this behavior until after you create a new object. Let me demonstrate with some unsupported features of JavaScript:
// This is my base object. It's a pretty simple 4 step dance.
Dance = {
    danceAround: function() {},
    steps: 4
}

// Let it be known that I wouldn't know what a tango was if I saw it.
// (Unless there was a rose in somebody's mouth, of course)
Tango = {
    steps: 34,
    dip: function() {}
}

// Now we pull some true prototype magic.
Tango.__proto__ = Dance;

// This works then:
Tango.danceAround();
That’s true prototypal behavior. There’s no “new” keyword, just behaviors that can be stolen by other objects. Unfortunately, JavaScript was thrown together in (I believe) about 15 minutes, minus a 5 minute coffee break. The writers realized that people were going to flip if it didn’t resemble the popular languages of the time, so they implemented a prototype-based language, pulled the prototypes into a separate property, and added a “new” keyword. This led to syntax like the following:
// This is my base object. It's a pretty simple 4 step dance.
Dance = {
    danceAround: function() {},
    steps: 4
}

Tango = function() {
    this.dip = function() {};
    this.steps = 37;
}

Tango.prototype = Dance;
That’s not quite as pretty. Not only is it not as pretty, but there are some fundamental flaws. The dip function gets recreated every time a new Tango object is instantiated. This isn’t a big deal most of the time, but once a year the king has a ball and all of a sudden you have 1000 partners tangoing about, and with them, 1000 identical copies of the dip function.
Another flaw is that the prototype is not in the lookup path of the object until a new instance is instantiated. In this example, Tango.danceAround is not defined. This is because prototypes are not applied to objects until a new instance of the object is instantiated. A “class” in most languages is a definition of object behavior. Instances of a class are objects that behave as defined by the class. This is very close to how prototypes behave in JavaScript.
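Both flaws are easy to see in a few lines. This is a small sketch rather than the exact example above: it hangs danceAround directly on Tango.prototype instead of assigning Dance, and the variable names are only for illustration.

```javascript
// Per-instance method: a fresh function object is built on every `new`.
function Tango() {
  this.dip = function () {};
}

// Shared method: a single function object lives on the prototype.
Tango.prototype.danceAround = function () { return "dancing"; };

var first = new Tango();
var second = new Tango();

var dipIsCopied = first.dip !== second.dip;                   // true
var danceIsShared = first.danceAround === second.danceAround; // true

// The constructor itself does not look anything up on its own `prototype`.
var onConstructor = typeof Tango.danceAround; // "undefined"
var onInstance = typeof first.danceAround;    // "function"
```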
To summarize, JavaScript isn’t quite a prototype based language, it isn’t quite a class based language, and the syntax for doing either is ugly (for more on the ugliness, check out one of my previous posts).
Fixing the Problems
Fortunately, JavaScript is awesome in most ways. It’s so flexible that fixing the issues I’ve spelled out above is no problem.
Fixing Scope
You can’t entirely fix JavaScript scoping. Local variables which aren’t declared with var become global, and there’s nothing you can do about that.[1] You can, however, fix the this object. You can wrap any given object method in another method which asserts that its this will be set to a specific object.
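MyDance = {
    danceAround: function() { console.log(this.cheer) },
    cheer: "WHOOO!"
}

// Keep a reference to the original method. Without it, the wrapper
// would call itself forever.
var unboundDanceAround = MyDance.danceAround;
MyDance.danceAround = function() {
    return unboundDanceAround.apply(MyDance, arguments);
}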
This is called binding, and it makes this be what you would want it to be in most instances. My initial thought was to just bind all object methods to their instances at the time of instantiation. However, I don’t really like this idea. For one thing, it changes the language. For another, some libraries (I’m looking at you jQuery) like to mess around with this. If this is expected to be something, I do not want to change that. What I needed was a shadow this, a variable that was always present and always pointed to the instance of the object. Luckily, there was an easy solution. Python always passes its instance object as the first parameter of any method of a class. I could replicate this behavior easily using essentially the same binding code and have the code look familiar to Python programmers everywhere. So I did. Every instance method in Cobra has “self” passed to it as the first parameter, which is automatically guaranteed to be the instance, no matter what. No binding required.
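To make the idea concrete, here is a rough sketch of how a wrapper can hand the instance to a method as its first argument while leaving this untouched. The passSelf helper is hypothetical and written just for this illustration; it is not Cobra’s actual source.

```javascript
// Hypothetical helper, not Cobra's actual source: wrap a method so the
// instance is always passed as the first argument, whatever `this` is.
function passSelf(instance, method) {
  return function () {
    var args = [instance].concat(Array.prototype.slice.call(arguments));
    return method.apply(this, args); // `this` is left alone for the caller
  };
}

var dance = { cheer: "WHOOO!" };
dance.danceAround = passSelf(dance, function (self, suffix) {
  return self.cheer + suffix;
});

var direct = dance.danceAround("!");

// Even when somebody rebinds `this`, `self` is still the original instance.
var detachedDance = dance.danceAround;
var rebound = detachedDance.call({ cheer: "boo" }, "?");
```

In this sketch a library that rebinds this still gets what it expects, and the method still has a reliable handle on its own instance.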
Fixing Object Syntax
It is fairly easy to get JavaScript to behave like a true prototypal language. However, I don’t much care for true prototypal behavior since it still leaves an ugly syntax. My solution was to create a “Class” object that will implicitly apply prototypes.
Instead of:
MyNewThingy.prototype.doSomeStuff = function() {};
MyNewThingy.prototype.doMoreStuff = function() {};
We can do:
MyNewThingy = new Class({
    doSomeStuff: function() {},
    doMoreStuff: function() {}
});
These end up being exactly the same, except the latter is clearer and cleaner in my opinion.
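For the curious, a helper like this does not take much code. The sketch below is a hypothetical stand-in called makeClass, written only to show the idea of copying methods onto a prototype; Cobra’s real implementation may well differ.

```javascript
// Hypothetical stand-in for the idea behind `new Class({...})`; Cobra's real
// implementation may differ. It copies each method onto a fresh prototype.
function makeClass(methods) {
  function Constructor() {}
  for (var name in methods) {
    if (Object.prototype.hasOwnProperty.call(methods, name)) {
      Constructor.prototype[name] = methods[name];
    }
  }
  return Constructor;
}

var MyNewThingy = makeClass({
  doSomeStuff: function () { return "some"; },
  doMoreStuff: function () { return "more"; }
});

var thingy = new MyNewThingy();
var otherThingy = new MyNewThingy();
var some = thingy.doSomeStuff();
var methodsAreShared = thingy.doMoreStuff === otherThingy.doMoreStuff;
```

Because the methods land on the prototype, every instance shares one copy of each function, just as with the hand-written prototype assignments.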
Inheritance is a bit tricky in the first case. To achieve prototypal inheritance, I have to do some magic.
Base = {
    basicStuff: function() {}
}

// This has to be a constructor, not an instance, so that its
// prototype can be pointed at Base.
Thingy = function() {
    this.doSomeStuff = function() {};
    this.doMoreStuff = function() {};
}
Thingy.prototype = Base;

ChildThingy = function() {};
ChildThingy.prototype = new Thingy();
Now ChildThingy inherits from Base and has some of its own functions in its prototype. Cobra takes care of all of this for you:
Base = new Class({
    basicStuff: function() {}
});

ChildThingy = new Class({
    __extends__: Base,
    childsOwnStuff: function() {}
});
Again, I think this is a lot clearer and cleaner.
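If you are wondering what has to happen behind an __extends__ key, here is one plausible sketch. The extendClass helper and its intermediate Link constructor are my own illustration of the usual prototype-chaining trick, not Cobra’s source.

```javascript
// Hypothetical helper, not Cobra's source: build a constructor whose
// prototype chains to the parent's prototype, then add the child's methods.
function extendClass(Parent, methods) {
  function Child() {}
  function Link() {} // empty constructor, so Parent's body never runs here
  Link.prototype = Parent.prototype;
  Child.prototype = new Link();
  Child.prototype.constructor = Child;
  for (var name in methods) {
    if (Object.prototype.hasOwnProperty.call(methods, name)) {
      Child.prototype[name] = methods[name];
    }
  }
  return Child;
}

function Base() {}
Base.prototype.basicStuff = function () { return "basic"; };

var ChildThingy = extendClass(Base, {
  childsOwnStuff: function () { return "child"; }
});

var child = new ChildThingy();
var inherited = child.basicStuff();
var own = child.childsOwnStuff();
var isBase = child instanceof Base;
```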
If you put both these fixes together, you end up with Cobra, which you can read all about here.
To wrap things up, augmenting JavaScript to fix its flaws is not a bad thing. The question is what to add. I haven’t used Cobra for anything yet, but it’s my current pet project. We’ll see if it really makes JavaScript that much more pleasant.
[1] If you don’t care about standards or cross-platform compatibility, check out Firefox’s built-in __parent__ attribute, which lets you mess around with enclosing scopes.
Cobra: A Little JavaScript Class Library
In my last post, I talked a bit about the problems I saw with trying to express new types and objects in JavaScript. It’s not so much that anything is difficult to do, it’s that doing different things requires very different (and sometimes very verbose) syntax. I tried a few different things to trick JavaScript into behaving differently, but in the end, I realized that perhaps just keeping things simple was the best thing to do. I wrote down a few small features for my little library.
- Classes. Not really the full blown classes of Java lore, but more a shortcut way of doing ClassName.prototype.functionName = function(args).
- Inheritance. I wanted my classes to support the prototypal inheritance built into JavaScript.
- Singletons. I wanted to be able to create single instances of classes without leaving a class around that might confuse people.
- Namespaces. I eventually realized that Object was good enough for this.
I also wanted some convention for how to deal with private methods and properties. In JavaScript, it’s pretty easy to have private members, and it’s pretty easy to do prototypal inheritance, but it’s a bit messier to do both.
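To show what I mean by messier, here is a small sketch. Vault is a made-up example: the closure keeps secret truly private, but only methods created inside the constructor can see it, so they cannot be shared on the prototype.

```javascript
// Closure-based privacy: `secret` is unreachable from outside, but the
// accessor has to be created inside the constructor, once per instance.
function Vault(secret) {
  this.reveal = function () { return secret; };
}

// Prototype methods are shared, but they only see what hangs off `this`.
Vault.prototype.describe = function () {
  return "secret on this: " + typeof this.secret;
};

var vaultA = new Vault("tango");
var vaultB = new Vault("waltz");

var revealed = vaultA.reveal();
var hidden = vaultA.secret;
var revealIsPerInstance = vaultA.reveal !== vaultB.reveal;
var describeIsShared = vaultA.describe === vaultB.describe;
var description = vaultA.describe();
```

You get real privacy or shared methods, and combining the two takes extra plumbing.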
My final bit of inspiration was Python. Python doesn’t care about privacy, it just trusts the developer not to mess things up. I like this philosophy. I also like the explicit self in python methods. A lot of people don’t like self, but I think it makes it very obvious what instance your code is acting on. I took these things to heart and wrote Cobra, a class system for JavaScript that’s really simple and looks a whole lot like Python. Without further ado, let me show you some examples of Cobra.
Classes
/* This is our base class. In its initialization function,
 * all it does is set things that are true for
 * all living animals.
 */
var Animal = new Class({
    __init__: function(self) {
        self.breathes = true;
    }
});

/* Feline extends Animal and overrides its initialization function.
 * Notice that it calls its parent's __init__ function, just to be safe.
 */
var Feline = new Class({
    __extends__: Animal,
    __init__: function(self) {
        Class.super(Feline, '__init__', self);
        self.claws = true;
        self.furry = true;
    },
    says: function(self) {
        console.log('GRRRRR');
    }
});

/* This is a cat. It inherits from Feline, and therefore also inherits from
 * Animal. It says something a bit different from most felines.
 */
var Cat = new Class({
    __extends__: Feline,
    __init__: function(self) {
        Class.super(Cat, '__init__', self);
        self.weight = 'very little';
    },
    says: function(self) {
        console.log('MEOW');
    }
});

/* Tigers are like most Felines except that they weigh a lot. */
var Tiger = new Class({
    __extends__: Feline,
    __init__: function(self) {
        Class.super(Tiger, '__init__', self);
        self.weight = 'quite a bit';
    }
});
If we try using these classes, you’ll see that they work as they should:
>>> sneakers = new Cat();
Object breathes=true claws=true furry=true
>>> sneakers.breathes
true
>>> sneakers.claws
true
>>> sneakers.furry
true
>>> sneakers.weight
"very little"
>>> sneakers.says();
MEOW
>>> tigger = new Tiger()
Object breathes=true claws=true furry=true
>>> tigger.says()
GRRRRR
You’ve probably noticed something very strange at this point (at least for the JavaScript world). Every instance method takes “self” as its first parameter. This parameter is the instance. Whether “this” is currently bound to the window or any other random object, “self” will always be the instance of the class. It’s a nice property, and you can still use “this” however you may wish. Just remember that your methods have to take “self” as their first arguments or weird things will happen.
Singletons
As I mentioned in my last post, I’ve had some issues creating singletons in JavaScript*. There is more than one way to do it, and you have to change a whole lot to go from one form to the other. If you no longer want your object to be a singleton, that might be a whole new refactoring pain. So I created a simple Singleton class that uses the same syntax as the Class type above, but it immediately discards its class and returns a single instance. This is nice because creating it is exactly the same as creating a class. You don’t have to think about it at all, and if you eventually realize a singleton was a bad idea, it’s trivial to convert it to a class.
var sanFranciscoZoo = new Singleton({
    __init__: function(self) {
        self.cats = [new Tiger(), new Tiger(), new Cat()];
    }
});
That’s pretty much it.
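The "discard the class, keep the instance" idea can be sketched in a few lines. The makeSingleton helper below is hypothetical and only illustrates the approach; it is not how Cobra’s Singleton is necessarily written, and the strings in cats stand in for real Tiger and Cat instances.

```javascript
// Hypothetical sketch, not Cobra's source: build a throwaway constructor,
// create its only instance, and hand back just the instance.
function makeSingleton(definition) {
  function Hidden() {}
  for (var name in definition) {
    if (Object.prototype.hasOwnProperty.call(definition, name)) {
      Hidden.prototype[name] = definition[name];
    }
  }
  var instance = new Hidden();
  if (typeof instance.__init__ === "function") {
    instance.__init__(instance); // pass the instance along as `self`
  }
  return instance;
}

var zoo = makeSingleton({
  __init__: function (self) { self.cats = ["tiger", "tiger", "cat"]; }
});

var catCount = zoo.cats.length;

// The result is a plain object, not a constructor, so `new` fails loudly.
var newFails = false;
try {
  new zoo();
} catch (e) {
  newFails = e instanceof TypeError;
}
```

Since the constructor never leaves the helper, nobody can accidentally make a second instance.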
Namespaces, statics, and privacy
Everything else needs to be solved by convention. By looking at a method, it’s easy to tell whether it is static or not. If “self” is the first parameter, then it’s not static. So how do you create static methods? Well, for now, you don’t. If you have a group of static methods, stick them in an Object, which can be used as a namespace. You can even do the equivalent of C++’s “using”:
with (MyApp.Utils) {
    // this would normally be referenced as MyApp.Utils.utilityFunction();
    utilityFunction();
}
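One caveat: the with statement is rejected in strict mode code, so it may not be an option everywhere. A plain local alias gets most of the same convenience. The MyApp.Utils object below is a hypothetical namespace mirroring the example above.

```javascript
// Hypothetical namespace, mirroring the MyApp.Utils example above.
var MyApp = {
  Utils: {
    utilityFunction: function () { return "utility ran"; }
  }
};

// A local alias gives the short name without `with`.
// This is safe here because utilityFunction does not rely on `this`.
var utilityFunction = MyApp.Utils.utilityFunction;
var result = utilityFunction();
```

The alias only works cleanly for functions that do not depend on this, which fits the static helpers described here.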
How about privacy? It’s handled the same as Python. Everything stuck into “self” is public, so use a leading underscore to indicate that others shouldn’t touch that data or method.
var Secretive = new Class({
    __init__: function(self) {
        self._doNotTouch = 5;
    },
    _doNotCall: function(self, add) {
        self._doNotTouch += add;
    }
});
I know that a lot of people don’t like the underscore approach (including JavaScript hero Doug Crockford), but it’s simple, consistent, and clear. I’m stickin’ with it.
Try It
I would love it if people would try it out on their own and tell me what they think. The source is available on bitbucket. Any problems can be reported in the comments or be filed here. There certainly might be some, I just whipped this up a few hours ago.
*When I say singleton, I mean a single instance of a class. In the design pattern world, this means that doing “new MySingleton()” would always return the same instance. I find that deceptive, so I just want it to throw an error when you try to make a “new” singleton.