Posts Tagged ‘Python’
Cobra: A Little JavaScript Class Library
In my last post, I talked a bit about the problems I saw with trying to express new types and objects in JavaScript. It’s not so much that anything is difficult to do, it’s that doing different things requires very different (and sometimes very verbose) syntax. I tried a few different things to trick JavaScript into behaving differently, but in the end, I realized that perhaps just keeping things simple was the best thing to do. I wrote down a few small features for my little library.
- Classes. Not really the full blown classes of Java lore, but more a shortcut way of doing ClassName.prototype.functionName = function(args).
- Inheritance. I wanted my classes to support the prototypal inheritance built into JavaScript.
- Singletons. I wanted to be able to create single instances of classes without leaving a class around that might confuse people.
- Namespaces. I eventually realized that Object was good enough for this.
I also wanted some convention for how to deal with private methods and properties. In JavaScript, it’s pretty easy to have private members, and it’s pretty easy to do prototypal inheritance, but it’s a bit messier to do both.
My final bit of inspiration was Python. Python doesn’t care about privacy; it just trusts the developer not to mess things up. I like this philosophy. I also like the explicit self in Python methods. A lot of people don’t like self, but I think it makes it very obvious what instance your code is acting on. I took these things to heart and wrote Cobra, a class system for JavaScript that’s really simple and looks a whole lot like Python. Without further ado, let me show you some examples of Cobra.
Classes
/* This is our base class. In its initialization function,
 * all it does is set things that are true for
 * all living animals. */
var Animal = new Class({
    __init__: function(self) {
        self.breathes = true;
    }
});

/* Feline extends Animal and overrides its initialization function.
 * Notice that it calls its parent's __init__ function, just to be safe. */
var Feline = new Class({
    __extends__: Animal,
    __init__: function(self) {
        Class.super(Feline, '__init__', self);
        self.claws = true;
        self.furry = true;
    },
    says: function(self) {
        console.log('GRRRRR');
    }
});

/* This is a cat. It inherits from Feline, and therefore also inherits from
 * Animal. It says something a bit different from most felines. */
var Cat = new Class({
    __extends__: Feline,
    __init__: function(self) {
        Class.super(Cat, '__init__', self);
        self.weight = 'very little';
    },
    says: function(self) {
        console.log('MEOW');
    }
});

/* Tigers are like most Felines except that they weigh a lot. */
var Tiger = new Class({
    __extends__: Feline,
    __init__: function(self) {
        Class.super(Tiger, '__init__', self);
        self.weight = 'quite a bit';
    }
});
If we try using these classes, you’ll see that they work as they should:
>>> sneakers = new Cat();
Object breathes=true claws=true furry=true
>>> sneakers.breathes
true
>>> sneakers.claws
true
>>> sneakers.furry
true
>>> sneakers.weight
"very little"
>>> sneakers.says();
MEOW
>>> tigger = new Tiger()
Object breathes=true claws=true furry=true
>>> tigger.says()
GRRRRR
You’ve probably noticed something very strange at this point (at least for the JavaScript world). Every instance method takes “self” as its first parameter. This parameter is the instance. Whether “this” is currently bound to the window or any other random object, “self” will always be the instance of the class. It’s a nice property, and you can still use “this” however you may wish. Just remember that your methods have to take “self” as their first arguments or weird things will happen.
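To make the “self” idea concrete, here is a rough sketch of one way a class helper could guarantee that the first argument is always the instance. This is not Cobra’s source. SketchClass and Counter are names I made up for illustration, and the sketch binds methods per instance rather than going through the prototype chain, so it ignores inheritance entirely.

```javascript
// Hypothetical sketch, not Cobra's implementation: build a constructor whose
// methods always receive the instance as their first argument.
function SketchClass(definition) {
  function Constructor() {
    var self = this;
    // Wrap every function in the definition so it is called with "self" first.
    Object.keys(definition).forEach(function (name) {
      var fn = definition[name];
      if (typeof fn !== 'function') return;
      self[name] = function () {
        var args = Array.prototype.slice.call(arguments);
        // "this" is passed through untouched; "self" is always the instance.
        return fn.apply(this, [self].concat(args));
      };
    });
    if (typeof self.__init__ === 'function') {
      self.__init__.apply(self, arguments);
    }
  }
  return Constructor;
}

var Counter = SketchClass({
  __init__: function (self, start) { self.count = start; },
  increment: function (self, by) { self.count += by; return self.count; }
});

var counter = new Counter(10);
var detached = counter.increment; // "this" is lost once the method is detached...
var result = detached(5);         // ...but "self" still points at counter
```

Because the wrapper closes over the instance, the detached call still updates counter.count, which is the property described above.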
Singletons
As I mentioned in my last post, I’ve had some issues creating singletons in JavaScript*. There is more than one way to do it, and you have to change a whole lot to go from one form to the other. If you no longer want your object to be a singleton, that might be a whole new refactoring pain. So I created a simple Singleton class that uses the same syntax as the Class type above, but it immediately discards its class and returns a single instance. This is nice because creating it is exactly the same as creating a class. You don’t have to think about it at all, and if you eventually realize a singleton was a bad idea, it’s trivial to convert it to a class.
var sanFranciscoZoo = new Singleton({
    __init__: function(self) {
        self.cats = [new Tiger(), new Tiger(), new Cat()];
    }
});
That’s pretty much it.
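If you are wondering how “new” can hand back an instance instead of a class, here is a small sketch of the trick. Again, this is my own illustration and not Cobra’s code: SketchSingleton and zoo are hypothetical names. It relies on a standard JavaScript rule: when a constructor returns an object, “new” yields that object.

```javascript
// Hypothetical sketch, not Cobra's implementation.
function SketchSingleton(definition) {
  var instance = {};
  Object.keys(definition).forEach(function (name) {
    var member = definition[name];
    if (typeof member === 'function') {
      // Same convention as the classes: pass the instance in as "self".
      instance[name] = function () {
        var args = Array.prototype.slice.call(arguments);
        return member.apply(this, [instance].concat(args));
      };
    } else {
      instance[name] = member;
    }
  });
  if (typeof instance.__init__ === 'function') {
    instance.__init__();
  }
  // Returning an object from a constructor replaces the usual result of "new",
  // so no reusable class is left behind.
  return instance;
}

var zoo = new SketchSingleton({
  __init__: function (self) { self.animals = ['tiger', 'tiger', 'cat']; },
  count: function (self) { return self.animals.length; }
});

var animalCount = zoo.count();

// The result is a plain object, so trying to instantiate it again fails.
var secondAttemptFailed = false;
try {
  new zoo();
} catch (e) {
  secondAttemptFailed = e instanceof TypeError;
}
```

Since zoo is an ordinary object and not a function, “new zoo()” throws a TypeError, which is the behavior the footnote below asks for.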
Namespaces, statics, and privacy
Everything else needs to be solved by convention. By looking at a method, it’s easy to tell whether it is static or not. If “self” is the first parameter, then it’s not static. So how do you create static methods? Well, for now, you don’t. If you have a group of static methods, stick them in an Object, which can be used as a namespace. You can even do the equivalent of C++’s “using”:
with (MyApp.Utils) {
    // this would normally be referenced as MyApp.Utils.utilityFunction();
    utilityFunction();
}
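One caveat: “with” is not allowed in strict mode code, so if your scripts opt into strict mode you need another way to shorten those names. A plain local alias does the job. The contents of MyApp.Utils below are made up for the sake of the example.

```javascript
// MyApp.Utils is the namespace from the example above; the function body is
// invented for illustration.
var MyApp = {
  Utils: {
    utilityFunction: function () { return 'utility ran'; }
  }
};

// Pull the one name you need into a local variable...
var utilityFunction = MyApp.Utils.utilityFunction;
var viaAlias = utilityFunction();

// ...or alias the whole namespace under a shorter name.
var U = MyApp.Utils;
var viaNamespace = U.utilityFunction();
```

Detaching a function like this is only safe when it does not rely on “this”, which is exactly the case for the static-style helpers described here.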
How about privacy? It’s handled the same as Python. Everything stuck into “self” is public, so use a leading underscore to indicate that others shouldn’t touch that data or method.
var Secretive = new Class({
    __init__: function(self) {
        self._doNotTouch = 5;
    },
    _doNotCall: function(self, add) {
        self._doNotTouch += add;
    }
});
I know that a lot of people don’t like the underscore approach (including JavaScript hero Doug Crockford), but it’s simple, consistent, and clear. I’m stickin’ with it.
Try It
I would love it if people would try it out on their own and tell me what they think. The source is available on Bitbucket. Any problems can be reported in the comments or be filed here. There certainly might be some, since I just whipped this up a few hours ago.
*When I say singleton, I mean a single instance of a class. In the design pattern world, this means that doing “new MySingleton()” would always return the same instance. I find that deceptive, so I just want it to throw an error when you try to make a “new” singleton.
A Threading Model Overview
I noticed in a story on Hacker News that many people do not understand the differences in threading implementations between programming languages. In the single-processor days, understanding the threading model that you were working with was not that important. With more than one core, it is a good thing to know. This is an overview.
The Beginning (C and Native Threads)
The first threading model we will look at is the standard OS-level thread. Every modern OS has support for this, though the APIs change from OS to OS. Basically, a thread is a process that can run on its own processor, is scheduled by the OS scheduler, and can block. It acts just like its own process except that it shares resources with every other thread in the process. This mainly means that memory and file descriptors are shared between all threads in a process. This is what people mean by “native threading”. From C on Linux, you can use these threads by linking with the pthread library. BSDs generally support pthreads as well, and Windows does its own thing that is very similar.
Java and Green Threads
When Java came out, it introduced a different type of threading model to the world called green threads. Green threads are essentially simulated threads. The Java virtual machine would take care of switching between different green threads, but the virtual machine itself would only run in one OS thread. This generally has some advantages. OS threads have almost as much overhead as a process on most POSIX systems. It is also usually slower to switch between native threads than it is between green threads.
This can mean that in some situations, green threads are much preferable to native threads. A system can usually support a much higher number of green threads than OS threads. For instance, it would be practical to spawn a new green thread for every new connection on a web server, but it is not generally practical to spawn a new native thread for every incoming HTTP connection.
There are disadvantages, however. The biggest is that you cannot have two threads running at the same time. There is only one native thread, so it is the only thread that gets scheduled. Even if there are multiple CPUs and multiple green threads, only one CPU will be running any given green thread at any given time. This is because it all looks like one thread to the OS scheduler.
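To make “simulated threads” a bit more concrete, here is a toy sketch using JavaScript generators. It is not how any JVM implemented green threads, and it is purely cooperative (each task decides when to yield). The names worker and runAll are made up for this example. The point is only that the switching happens in user code, inside a single OS thread.

```javascript
// Each "green thread" is a generator that yields whenever it is willing to
// let another one run.
function* worker(name, steps, log) {
  for (var i = 1; i <= steps; i++) {
    log.push(name + ' step ' + i);
    yield; // hand control back to the scheduler
  }
}

// Toy round-robin scheduler: run each task until its next yield, then move it
// to the back of the queue unless it has finished.
function runAll(threads) {
  var queue = threads.slice();
  while (queue.length > 0) {
    var current = queue.shift();
    var state = current.next();
    if (!state.done) {
      queue.push(current);
    }
  }
}

var log = [];
runAll([worker('A', 2, log), worker('B', 3, log)]);
// log now holds the interleaved steps of A and B
```

The two workers take turns, but at no point do they run at the same time: there is only the one real thread doing all of the work, which is the limitation described above.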
Java has supported native threading since version 1.2, and it has been the default for some time now.
Python
Python is one of my favorite scripting languages, and was one of the first scripting languages to offer threading. Python exposes a threading module that manipulates native threads. This means that Python can benefit from all the advantages of true native threading, except for one catch.
Python has a global interpreter lock (GIL). This lock is necessary to keep Python threads from corrupting the global state of the interpreter. This means that no two Python instructions can be running simultaneously. The GIL gets released every 100 Python instructions or so and another Python thread is free to acquire the lock and begin executing.
On the face of it, this seems like a major flaw. However, in practice it is not that big of a deal. Any thread that blocks will generally release the GIL. C extensions can also release the GIL whenever they are not interacting with the Python/C API, so CPU-intensive operations can be carried out in C without blocking the executing Python threads. The only situation in which the GIL proves problematic is when you have more than one CPU-bound thread written in Python on a multi-core machine.
Stackless Python is an implementation of Python that brings “tasklets” (essentially green threads) to Python. The greenlet module is derived from their work and is compatible with the standard CPython implementation.
Ruby
Ruby’s threading model is and always has been in a state of flux. Ruby’s original implementation only supported cooperative green threads. These work fine in many situations, but they do not take advantage of multiple processors.
JRuby mapped Ruby’s threads straight to Java’s threads, which are generally OS native threads. This doesn’t work. Since Ruby’s threads are cooperative, there is no need to synchronize between the threads. Every thread can be assured that no other thread is accessing a resource while it is accessing it. This breaks down in JRuby, since native threads are generally preemptive, meaning any thread could be accessing any shared data at any time.
Because of the mismatches and the desire for native threading from the C Ruby folks, it was decided that Ruby would move to native threading in Ruby 2.0. In Ruby 1.9, a different interpreter was swapped into the standard Ruby distribution. 1.9 adds a threading model it calls Fibers, which as far as I know are a more efficient implementation of green threads.
In short, Ruby’s threading model is a poorly documented mess.
Perl
Perl has an interesting threading model, one which Mozilla borrowed for SpiderMonkey if I’m not mistaken. Instead of having a global interpreter lock like Python, Perl makes all global state thread local and spawns off a new interpreter with each new thread. This allows for true native threading. There are two catches though.
First, you must explicitly make variables available to threads outside your own. This is the nature of everything being thread local. The values must then be kept up to date across threads.
The second catch is that every new thread is very expensive to create. The interpreter is not small, and duplicating it with every thread makes for a lot of overhead.
Erlang, JavaScript, C# and so on
There are a lot of other models out there that people play with from time to time. Erlang, for instance, has a shared nothing architecture that forces you to use lightweight, user-land processes over threading. This is actually an outstanding architecture for parallel programming since it takes out all of the headaches involved with synchronizing memory, and the processes are so lightweight you can generally just spawn as many of them as you want.
JavaScript is usually not thought of as a language that supports threading, but it needs to support it for a browser implemented largely in JavaScript like Mozilla. Its threading model is very similar to Perl’s.
C# uses native threads.
Well, I hope that makes the whole threading picture a little bit clearer. Please let me know if anything is confusing or if I messed anything up. I don’t know everything, after all.
Django round 2
Last year when I was going through the various Python frameworks, I eventually dismissed Django in favor of TurboGears. From TurboGears I settled on Pylons as my web framework of choice, and excepting some quirks and some bad documentation, I’ve been very happy.
Last night I had the privilege of revisiting Django to do a programming problem as part of an interview. I was tasked with making a little blog system that supported pingbacks. Having a lot more experience with frameworks now, I was able to really sit back and think about what I liked and what I didn’t, with respect to Pylons.
First of all, the emphasis on loosely coupled components is terrific. One of the major problems we realized with harmonize was that our componentization was not fine grained enough and required too much knowledge of the code base to effectively navigate. Django is right to emphasize this from the beginning.
Django’s documentation is also quite a bit better organized than Pylons’. They have a standardized approach that details all their components, and good tutorials and an alright book. On the other hand, the documentation for Pylons is spotty and disorganized, with the documentation for its components being hit and miss. The docs for SQLAlchemy are second to none, but the docs for Beaker, the caching framework shipped with Pylons, barely exist.
The actual components of Django could use some help, however.
Django does not have a robust web server built in, like Pylons, so the recommended deployment strategy is with mod_python and Apache. This is hardly the lightweight, scalable approach that has become popular lately. I feel that an excellent static web server that can also serve as a reverse proxy is a great way of getting the most out of your servers. When developing new apps, you don’t want to be giving huge amounts of memory to Apache. Your young app is most likely going to need that memory!
Django’s model is pretty good, but it clearly lacks the power that SQLAlchemy has. SQLAlchemy is an amazing bit of technology with support for everything from programmatically constructing SQL to sharding databases to a fully declarative ORM. Django’s model is just an ORM. I didn’t find querying to be very intuitive either (attributes just magically appear in objects that represent relationships), but it may just be that I am used to SQLAlchemy’s model. SQLAlchemy uses a “session” in a way that makes a lot of sense once you read a bit about it. I never know when Django is flushing transactions because there’s no real concept of a session.
By far the worst part of Django is the templating system. Perhaps I wasn’t using it correctly, but I hated it. After spending all summer with a robust and powerful templating solution in Pylons (Mako), I was completely taken aback at the simplicity and incompleteness of the Django solution. Beyond optimizations like the lack of compiling and caching of templates, the Django system doesn’t even support basic Python syntax. You can pass variables, define blocks, and do basic if statements and loops, but beyond that, you can’t actually execute any code. Simple things like accessing a single element of a list seemed to be unsupported. I could not import modules or call functions, and forget about blocks of Python code.
The syntax itself of the templating system seemed poorly thought out. The template tags did not look like HTML tags, Python code, or the code of any other language. Instead they use “{% %}” to denote blocks and “{{ }}” to denote where the values of variables should be inserted. This seems completely arbitrary. I feel that Mako’s habit of making its syntax look like regular tags is much cleaner and more natural.
Finally there was the URL routing system, in which I had to define the route of every URL I wanted. This may have just been my own lack of knowledge, but I really like the Routes strategy (if not Routes itself) in Pylons. Routes just automatically routes URLs to a class and function, but allows you to specify custom routes that behave very similarly to Django’s. The difference is that you don’t need to specify a complicated regular expression in the average case. Normally things just work.
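To show what I mean by routing that “just works”, here is a rough sketch of a convention-based default, written in JavaScript to match the Cobra examples earlier on this page. It is not Routes’ actual code. The dispatch function, the path convention, and the controllers object are all hypothetical.

```javascript
// Hypothetical convention: /<controller>/<action>/<args...>, with "index" as
// the default action. The controller objects below are invented for the demo.
function dispatch(path, controllers) {
  var parts = path.split('/').filter(function (p) { return p.length > 0; });
  var controller = controllers[parts[0]];
  var action = parts[1] || 'index';
  if (!controller || typeof controller[action] !== 'function') {
    return null; // no match: a real framework would answer with a 404 here
  }
  // Any remaining path segments are passed to the action as arguments.
  return controller[action].apply(controller, parts.slice(2));
}

var controllers = {
  blog: {
    index: function () { return 'list of posts'; },
    view: function (id) { return 'post ' + id; }
  }
};

var listing = dispatch('/blog', controllers);
var single = dispatch('/blog/view/42', controllers);
var missing = dispatch('/nope', controllers);
```

With a default like this, adding a new action to a controller makes its URL available without touching a route table, and custom routes are only needed for the URLs that break the convention.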
Django defends their URL mapper by saying that other solutions are Black Magic and that explicit is better than implicit. I think that’s ridiculous. You can easily look in the routes.py file in a Pylons project to see exactly how they achieve their “Magic”. Furthermore, Django uses magic all over their model, both in how objects are constructed and in how you query the model. The Django URL routing could definitely benefit from some useful defaults.
Well, I think that’s all I want to say on the topic right now. Django seems like a very nice framework, and a bit more organized than Pylons. In the end, however, I’ll stick with the superior components Pylons ships with, and maybe try to help them out with their organizational issues.