Posts Tagged ‘JavaScript’
A better object oriented JavaScript
There are some problems with objects in JavaScript. The system is beautiful in its simplicity, but at times the JavaScript approach to objects is limiting and confusing. I have found this to be especially true when trying to develop large pieces of software with many components written by different people that need to interact.
To deal with these situations, I have set out to provide an object system that has a consistent syntax across a variety of situations, can be introspected with the console as much as possible, and exploits the strengths of the JavaScript object system. This isn’t easy, but by starting with a solid foundation and adding a few key features, I feel that I’m well on my way.
First, let me illustrate some specific issues that I come across in day-to-day development.
The first is static classes. These are typically manager type classes that regulate access to resources. Connection managers are a good example. An important attribute for these classes is that they cannot be re-instantiated. They are singletons. One approach to these is the namespace:
    ConnectionManager = {
        connect: function(data) {
            do_stuff_with_data(data);
        },
        disconnect: function() {
            close_my_connection();
        }
    };
This does not leave behind a constructor, so it is not possible to accidentally create another ConnectionManager object. However, there is no constructor at all. If I need to do some initialization of this object, I can do the following:
    ConnectionManager = new (function() {
        var my = this;
        var my_private_data = initialize_something();

        my.connect = function(data) {
            do_something_with_data(data, my_private_data);
        };

        my.disconnect = function(data) {
            close_my_connection();
        };
    })();
This form has a couple of advantages. First, you can initialize things. This often has to happen, so it already has an advantage over the first form. Secondly, you can control your scope. In the preceding example, I assign this to my. From any of the member functions, I am then free to use my in place of this with the knowledge that it will be the correct object. I never have to worry about binding. Finally, you can hide private data. By keeping private data in an enclosing scope, you keep it from being used improperly. This isn’t to say that I think private variables are a necessary part of an OO system, but when I’m designing an interface, it’s a bit more pleasant when the person I’m designing the interface for can inspect my object without wading through a bunch of implementation details.
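To make the scope and privacy points concrete, here is a minimal sketch of the same pattern. The names initializeSomething, describe, and privateData are hypothetical stand-ins invented for illustration, not part of any real library.

```javascript
// Hypothetical stand-in for initialize_something() from the example above.
function initializeSomething() {
  return { opened: 0 };
}

var ConnectionManager = new (function () {
  var my = this;                            // stable reference to the singleton
  var privateData = initializeSomething();  // visible only inside this closure

  my.connect = function (data) {
    privateData.opened += 1;
    // Uses `my`, not `this`, so it works no matter how it is called.
    return my.describe(data);
  };

  my.describe = function (data) {
    return 'connection #' + privateData.opened + ' for ' + data;
  };
})();

// Detach the method from the object. A method written with `this`
// would break here; one written with `my` does not.
var connect = ConnectionManager.connect;
var result = connect('db');

// Only the public interface shows up when inspecting the object.
var publicNames = Object.keys(ConnectionManager);
```

Here result is 'connection #1 for db' and publicNames is ['connect', 'describe']: the private state never appears on the object itself.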
Beyond the Singleton case, there are more traditional object oriented issues. Without any help from a library, a typical class definition would look something like this:
    Connection = function() {
        initialize_some_stuff();
    };

    Connection.prototype.permissionMap = {
        1: 'read',
        2: 'read-write',
        3: 'none'
    };

    Connection.prototype.open = function() {
        open_this_connection();
    };

    Connection.prototype.close = function() {
        close_this_connection();
    };
This is a typical JavaScript class that uses prototypes. Prototypes are a necessary part of defining classes in JavaScript, since a prototype holds the properties that are shared across all instances of a class. This is essentially how most people expect classes to work (coming from a more traditional OO point of view).
Also included in this example is a static member. The permissionMap above is an example of providing static data across all instances of a class. This is a good thing to do for constants, jump tables, string tables, or anything else that doesn’t change but might take up a lot of memory. Data that changes on an instance-by-instance basis should not be defined as a member of the prototype: if one instance modifies a shared object or array in place, that change will be reflected across all instances. That data should be established and initialized in the constructor (the Connection function above).
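The difference between prototype data and constructor data is easy to see in a short sketch. The sharedLog and listeners properties below are hypothetical, added only to show the pitfall; sharedLog is placed on the prototype deliberately as the wrong way to do it.

```javascript
function Connection() {
  // Per-instance state belongs in the constructor.
  this.listeners = [];
}

// Shared, read-only data is a good fit for the prototype.
Connection.prototype.permissionMap = { 1: 'read', 2: 'read-write', 3: 'none' };

// Deliberate mistake: mutable state on the prototype is shared by everyone.
Connection.prototype.sharedLog = [];

var a = new Connection();
var b = new Connection();

a.sharedLog.push('opened by a'); // mutates the one array on the prototype
a.listeners.push('only a');      // mutates a's own array

var leakedEntries = b.sharedLog.length; // b sees a's entry
var ownEntries = b.listeners.length;    // b's own array is untouched

// Plain assignment is different: it creates an own property on `a`
// that shadows the prototype's value, leaving `b` alone.
a.permissionMap = { 1: 'read' };
var bPermission = b.permissionMap[2];
```

After this runs, leakedEntries is 1, ownEntries is 0, and bPermission is still 'read-write'. In other words, the danger is in-place mutation of a shared object, not assignment.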
There are some problems with this syntax. The first is that it’s not obvious what’s going on. People who use prototypes all the time know what’s going on, but it is never explicitly stated. For instance, permissionMap is a static variable. I’ve explained why, but nowhere is that clear in the code. This form is also quite verbose. I get so tired of typing Connection.prototype that I usually just make a shortcut variable and delete it after the method definitions are all over with. Finally, this syntax doesn’t really look like any of the other objects we defined. It looks like a bunch of functions with the same prefix. Nowhere is it implied that this is a class.
Out of those basic problems, I have started trying to define a common syntax that can fit any of the situations above, plus basic inheritance (which is basically just copying and extending a prototype). In addition, I’m building in as many introspection features as possible to make interfaces defined in this way a snap to sit down and figure out how to use.
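For reference, this is roughly what "copying and extending a prototype" looks like in plain JavaScript with no library help. It is only a sketch of the baseline, not the syntax I am proposing and not how MooTools does it; SecureConnection and the host argument are hypothetical names for illustration.

```javascript
function Connection(host) {
  this.host = host;
}

Connection.prototype.open = function () {
  return 'open ' + this.host;
};

function SecureConnection(host) {
  // Run the parent constructor against the new object.
  Connection.call(this, host);
}

// "Copy" the parent prototype by creating an object that delegates to it...
SecureConnection.prototype = Object.create(Connection.prototype);
SecureConnection.prototype.constructor = SecureConnection;

// ...then extend it, calling up to the parent method by hand.
SecureConnection.prototype.open = function () {
  return Connection.prototype.open.call(this) + ' (encrypted)';
};

var secure = new SecureConnection('example.org');
var opened = secure.open();
var isConnection = secure instanceof Connection;
```

Here opened is 'open example.org (encrypted)' and isConnection is true. It works, but the intent is spread across five separate statements, which is exactly the kind of noise a common class syntax should remove.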
I haven’t made that much progress yet, but I was given a big head start by basing all of this off of the outstanding MooTools Class system. I really enjoy their code, their approach to modifying JavaScript, and their minimalism. MooTools’ system, in turn, is based on Dean Edwards’ Base.js. As you can see, I have a nice body of work backing me up here. It should make my job much easier.
I’ll go into more detail about how I want to solve these problems in a later post. For now, you can follow development on my fork of mootools-core on GitHub.
A Threading Model Overview
I noticed in a story on Hacker News that many people do not understand the differences in threading implementations between different programming languages. In the single processor days, understanding the threading model that you were working with was not that important. With more than one core, it is a good thing to know. This is an overview.
The Beginning (C and Native Threads)
The first threading model we will look at is the standard OS level thread. Every modern OS has support for this, though the APIs change from OS to OS. Basically, a thread is an independent flow of execution that can run on its own processor, is scheduled by the OS scheduler, and can block. It acts just like its own process except that it shares resources with every other thread in the process. This mainly means that memory and file descriptors are shared between all threads in a process. This is what people mean by “native threading”. From C on Linux, you can use these threads by linking with the pthread library. BSDs generally support pthreads as well, and Windows does its own thing that is very similar.
Java and Green Threads
When Java came out, it introduced a different type of threading model to the world called green threads. Green threads are essentially simulated threads. The Java virtual machine would take care of switching between different green threads, but the virtual machine itself would only run in one OS thread. This generally has some advantages. OS threads have almost as much overhead as a process on most POSIX systems. It is also usually slower to switch between native threads than it is between green threads.
This can mean that in some situations, green threads are much preferable to native threads. A system can usually support a much higher number of green threads than OS threads. For instance, it would be practical to spawn a new green thread for every new connection on a web server, but it is not generally practical to spawn a new native thread for every incoming HTTP connection.
There are disadvantages, however. The biggest is that you cannot have two threads running at the same time. There is only one native thread, so it is the only thread that gets scheduled. Even if there are multiple CPUs and multiple green threads, only one CPU will be running any given green thread at any given time. This is because it all looks like one thread to the OS scheduler.
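The idea of many simulated threads sharing one real thread can be sketched with generators. This is a simplification and an assumption on my part, not how the JVM did it: the tasks here switch cooperatively at each yield, and worker and runAll are hypothetical names made up for this example.

```javascript
// Each "green thread" is a generator that yields whenever it is
// willing to let another task run.
function* worker(name, steps, log) {
  for (var i = 1; i <= steps; i++) {
    log.push(name + ' step ' + i);
    yield; // hand control back to the scheduler
  }
}

// A round-robin scheduler. It runs entirely inside one OS thread,
// so only one task ever makes progress at a time.
function runAll(tasks) {
  var queue = tasks.slice();
  while (queue.length > 0) {
    var task = queue.shift();
    var state = task.next(); // resume the task until its next yield
    if (!state.done) {
      queue.push(task);      // not finished: go to the back of the line
    }
  }
}

var log = [];
runAll([worker('A', 2, log), worker('B', 3, log)]);
```

After this runs, log holds 'A step 1', 'B step 1', 'A step 2', 'B step 2', 'B step 3'. The work is interleaved, but never simultaneous: adding more CPUs would not make it finish any faster, because the OS only ever sees the single thread running the scheduler loop.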
Java has supported native threading since version 1.2, and it has been the default for some time now.
Python
Python is one of my favorite scripting languages, and was one of the first scripting languages to offer threading. Python exposes a threading module that manipulates native threads. This means that Python can benefit from all the advantages of true native threading, except for one catch.
Python has a global interpreter lock (GIL). This lock is necessary to keep Python threads from corrupting the global state of the interpreter. This means that no two threads can be executing Python bytecode simultaneously. In Python 2, the GIL gets released every 100 bytecode instructions or so (since Python 3.2, a switch is requested after a time interval instead, 5 milliseconds by default), and another Python thread is free to acquire the lock and begin executing.
On the face of it, this seems like a major flaw. However, in practice it is not that big of a deal. Any thread that blocks will generally release the GIL. C extensions can also release the GIL whenever they are not interacting with the Python/C API, so CPU intensive operations can be carried out in C without blocking the executing Python threads. The only situation in which the GIL proves problematic is when you have more than one CPU bound thread written in Python on a multi-core machine.
Stackless Python is an implementation of Python that brings “tasklets” (essentially green threads) to Python. The greenlet module is derived from their work and is compatible with the standard CPython implementation.
Ruby
Ruby’s threading model is and always has been in a state of flux. Ruby’s original implementation only supported cooperative green threads. These work fine in many situations, but they do not take advantage of multiple processors.
JRuby mapped Ruby’s threads straight to Java’s threads, which are generally OS native threads. This doesn’t work. Since Ruby’s threads are cooperative, there is no need to synchronize between the threads. Every thread can be assured that no other thread is accessing a resource while it is accessing it. This breaks down in JRuby, since native threads are generally preemptive, meaning any thread could be accessing any shared data at any time.
Because of the mismatches and the desire for native threading from the C Ruby folks, it was decided that Ruby would move to native threading in Ruby 2.0. In Ruby 1.9, a different interpreter was swapped into the standard Ruby distribution. 1.9 adds a threading model it calls Fibers, which as far as I know are a more efficient implementation of green threads.
In short, Ruby’s threading model is a poorly documented mess.
Perl
Perl has an interesting threading model, one which Mozilla borrowed for SpiderMonkey if I’m not mistaken. Instead of having a global interpreter lock like Python, Perl makes all global state thread local and spawns off a new interpreter with each new thread. This allows for true native threading. There are two catches though.
First, you must explicitly make variables available to threads outside your own. This is the nature of everything being thread local. The values must then be kept up to date across threads.
The second catch is that every new thread is very expensive to create. The interpreter is not small, and duplicating it with every thread makes for a lot of overhead.
Erlang, JavaScript, C# and so on
There are a lot of other models out there that people play with from time to time. Erlang, for instance, has a shared nothing architecture that forces you to use lightweight, user-land processes over threading. This is actually an outstanding architecture for parallel programming since it takes out all of the headaches involved with synchronizing memory, and the processes are so lightweight you can generally just spawn as many of them as you want.
JavaScript is usually not thought of as a language that supports threading, but it needs to support it for a browser implemented largely in JavaScript like Mozilla. Its threading model is very similar to Perl’s.
C# uses native threads.
Well, I hope that makes the whole threading picture a little bit clearer. Please let me know if anything is confusing or if I messed anything up. I don’t know everything, after all.