Java, please stop ruining my fun.
I don’t like Java. I haven’t learned Java well because I don’t enjoy using it. I don’t enjoy using it because it’s verbose, for one, but mostly because it’s constantly making things hard for me to do. I know there are ways to do what I want, after all, millions of people use Java successfully every day, but I don’t know what they are. Furthermore, finding out what they are is excruciatingly painful.
I recently did a series of articles on a project I was doing to learn Clojure. It kind of petered out for a number of reasons, but one constant annoyance in learning Clojure was dealing with the Java-isms. Java has given Clojure a vast library of high quality software essentially for free, but it’s also brought on a lot of the pain, much of which I think needs to be fixed before Clojure can have the nice feel of my favorite dynamic languages.
Installing Clojure
The first thing one has to do is install Clojure. It’s not a package in Ubuntu yet, but it’s young, so that’s ok cause we’re veterans and don’t need no stinking packages. To compile, we just download the source and type “ant”.
And that’s it. There’s no install process that makes a nice pretty “clojure” command that takes us to the REPL or executes scripts that are passed to it. To run clojure, you need to run it using Java:
$ java -cp clojure.jar clojure.lang.Repl
That is a lot to type just to get a Repl, and getting a usable command line is even harder. After installing JLine ConsoleRunner, you need to get the library into your classpath (a rant on which is upcoming) and then run
$ java -cp jline-0.9.91.jar:clojure.jar jline.ConsoleRunner clojure.lang.Repl
Not exactly intuitive, but whatever. We put it in a bash script, put it in our path, and head off to the races. After a while, we have a few lines of a quality script we would like to save and run. How do we do that?
Obviously, it’s:
$ java -cp clojure.jar clojure.lang.Script my-script.clj
This assumes that clojure.jar is in the same directory as the script you want to run. If you don’t have clojure.jar there, you must provide a specific path to the jar file. There is no idea of a default directory where Java will look for jar files. You must provide every single jar file to Java at runtime.
Contrast this with the Python install process:
$ sudo apt-get install python $ python ... Have fun in the interpreter ... Write a script $ python my_script.py
Simple.
The Classpath
First of all, I’m no expert on the classpath, but it seems like an unholy abomination thrust upon us by invisible powers that must be extinguished at all costs. It would appear, and again, I am no expert, but it would appear that every single dependency of a program must be explicitly passed to Java at the time you run your program. I wrote a bash script to automate the process, but viewing the command line for running my simple Compojure-based webapp is apalling:
java -Djava.library.path=/usr/local/lib -cp :/mnt/data/Users/justin/bin/compojure/compojure.jar:/mnt/data/Users/justin/bin/compojure/deps/clojure-contrib.jar:/mnt/data/Users/justin/bin/compojure/deps/clojure.jar:/mnt/data/Users/justin/bin/compojure/deps/fact.jar:/mnt/data/Users/justin/bin/compojure/deps/jetty-6.1.14.jar:/mnt/data/Users/justin/bin/compojure/deps/jetty-util-6.1.14.jar:/mnt/data/Users/justin/bin/compojure/deps/re-rand.jar:/mnt/data/Users/justin/bin/compojure/deps/servlet-api-2.5-6.1.14.jar:/mnt/data/Users/justin/lib/clj-http-client.jar:/mnt/data/Users/justin/lib/clojure-contrib.jar:/mnt/data/Users/justin/lib/clojure.jar:/mnt/data/Users/justin/lib/commons-codec-1.3.jar:/mnt/data/Users/justin/lib/commons-httpclient-3.1.jar:/mnt/data/Users/justin/lib/commons-io-1.4-javadoc.jar:/mnt/data/Users/justin/lib/commons-io-1.4-sources.jar:/mnt/data/Users/justin/lib/commons-io-1.4.jar:/mnt/data/Users/justin/lib/commons-logging-1.1.1-javadoc.jar:/mnt/data/Users/justin/lib/commons-logging-1.1.1-sources.jar:/mnt/data/Users/justin/lib/commons-logging-1.1.1.jar:/mnt/data/Users/justin/lib/commons-logging-adapters-1.1.1.jar:/mnt/data/Users/justin/lib/commons-logging-api-1.1.1.jar:/mnt/data/Users/justin/lib/commons-logging-tests.jar:/mnt/data/Users/justin/lib/compojure.jar:/mnt/data/Users/justin/lib/jline-0.9.94.jar:/mnt/data/Users/justin/lib/tokyo-cabinet-clj.jar:/mnt/data/Users/justin/lib/tokyo-cabinet.jar:/mnt/data/Users/justin/lib/tokyocabinet.jar:/mnt/data/Users/justin/lib/tokyotyrant-0.6.jar clojure.lang.Script index.clj
That is bad. That is not correct, that is not how software should be designed, I object. Every other language I can think of off the top of my head (except JavaScript) has some structured way of finding its dependencies, and most have a way of adding additional rules to that search should the defaults not be adequate. While this can lead to “DLL hell”, I do not see how the Java situation is any better when everybody just ends up with scripts to automate the process and then those scripts pick up the wrong things and you can’t figure out why.
The classpath makes me very upset. If Clojure can find a way to mask it, I would appreciate it very much.
Maven
First of all, what the hell is Maven? A quick trip to their site reveals a huge chunk of text with hundreds of links and an initial sentence that describes it as:
Maven, a Yiddish word meaning accumulator of knowledge, was originally started as an attempt to simplify the build processes in the Jakarta Turbine project.
I went to the site with some hope that it would provide some relief to my dependency issues (All I want is “pip install”, or “gem install”), and I get greeted with a dense paragraph of history combined with some mumbo-jumbo about “best practices”.
After reading a bit I find that Maven downloads and builds dependencies and installs them in a local repository, along with the library you are trying to compile. Perfect! Sounds like exactly what I want. However, it doesn’t mention anything about the classpath. Am I still responsible for dealing with all that muck, even though it’s tucking my libraries in a hidden directory (implying that it’s responsible for managing them)?
To answer that question I need to wade through dozens of other pages that alternately describe how to accomplish basic tasks and lecture me on software engineering. Finally I come to the conclusion that while Maven does indeed find dependencies for you, it does not actually help you execute programs with those dependencies in place. This means you either need a script that automatically passes your entire maven local repository to Java, or you need to know the dependencies that Maven was conveniently supposed to hide from you. To top it off, it doesn’t play well with Clojure. Completely useless.
(For the record, there is a Maven extension that does exactly this.)
The Last Word
Dependency management is a hard problem that all languages must learn to deal with. Higher level languages have an even harder time in that they must not only deal with whatever dependencies they have written in their own language, but also with extensions written in other languages. Clojure, which is still very young, suffers tremendously from the godawful environment that Java has ensconsed itself in. I am largely a veteran of the *nix world, which seems quite different from the world Java developers have built around themselves. They have their own tools, their own build systems, their own set of “best practices”, and the Apache foundation. What I have seen in my brief saunter over the wall has appalled me. It has appalled me far more than similar saunters into the somewhat exciting world of Microsoft and .NET. It strikes me very much as a world in need of fixing, and I hope that Clojure (or Scala) can do it. Heck, I may even do my part to help.
But probably I’ll just run back to Python.
The New GOP
Speculating about the future of the Republican party has become a popular conversation amongst the political pundits. The most popular discussions revolve around who will be the face of the rebranded GOP that will inevitably come about. That isn’t a very interesting topic. In all probability, the big names now will fade and a fresh, charismatic politician will emerge onto the scene and carry the new platform forward. The more important issue is deciding what that platform will look like. There are a few things that must be in that platform to be successful.
Renounce the Religious Right
The religious right is the worst thing to happen to the GOP. There is nothing wrong with their beliefs, I am a Christian myself, but their desire to impose those beliefs on the rest of the country gets them nothing but resentment and a reputation for narrow-minded ignorance. It is vitally important that the influence of this group is greatly reduced or removed entirely. Their needs can be met by the party without their face being on it.
Make Peace with the Gay Community
While it’s true that gay rights don’t have popular support, the active opposition of them by the Republican party makes them look, for lack of a better word, mean. To undo this damage, Republicans should start vehemently defending personal rights from intrusion by the government. This goes along with the principles of small government that Republicans are supposedly for. Government intrusion into peoples lives should be kept to a minimum, and when it is unclear whether a person has a right to do something, err in favor of the right. A Republican that can defend the rights of every citizen, gay or not, will come across as much more principled than somebody writing anti-gay marriage bills just because they know it will pass.
Take Back National Security
The Republican dominance of national security issues took a big hit during the Bush administration. Whether or not Bush’s strategies are vindicated, the Republican party still needs to repair the immediate damage to their reputation. One way to draw a clear line between the two parties is free trade. The democratic caucus is too diversified to take a strong position on free trade, and Republicans would be well advised to exploit the clear links between trade and security.
Get a Pragmatic Energy Policy
Republicans don’t have anything to say about energy. They need to find something. Whether they align themselves with the Democrats or strike out and find something new, the “drill at home” argument isn’t going to cut it. An underrepresented energy policy that the Republicans could take up is one oriented around nuclear power. By simultaneously pushing for nuclear power plants and for more electic motors, their policy would be easy to understand and very green.
Win Back the Latino Vote
The Republicans should have the Latino vote. Their small business support and smart economic policies should speak to Latinos. However, the immigration debate has aliented large numbers of them. The Republicans should come out with a sensible immigration bill that puts amnesty at the top. A party oriented around less goverment interference in people’s lives should apply those principles to any person who wants to be an American.
Find a Good Tone
One of the biggest differences between Democrats and Republicans is that Democrats tend to regulate and Repulicans tend to incentivize. This is a point that works well for Republicans. If they can show that they are trying to acheive the same goals as the rest of the country but want to leave choice in the equation, they can almost always win the debate against somebody trying to achieve those goals through force. People do not like to be forced into things, and the Republicans won’t do it. They’ll just charge you a bit.
Stop Being Corrupt
Every Republican that gets caught in some corruption scheme or affair reflects poorly on the entire party. Democrats have this problem too, but since they are in power, it will probably be happening more often to them for a while. If the Republicans can have a few scandal free years, they can emerge as the honest brokers for the people. Trust is probably the most important thing you can have in politics, and the Republicans have some time to earn theirs back.
If the Republicans can do these few things, as well as define clear and workable alternative plans for hot issues like education and health care reform, they will be able to compete again. Where the Republican party is currently associated with hate, foolish wars, closed borders, guns, and rednecks, a few smart changes could position them as the party defending the people from government encroachment, accepting of every person, and ready to do what it takes to keep Americans safe and prosperous. That’s a party people can vote for, and one that we haven’t had in a very long time. Perhaps losing power was the best thing to ever happen to the GOP.
The Asynchronous Conversation
Conversations are marvelous ways to spend time. One person says one little thing, which causes somebody else to think of something else, and that causes yet another thought in another corner. They are organic, engaging, educational, and brilliant fun. For many of us, a simple conversation is what being “social” means, and many social events are organized around this one thing that we almost all love to do.
Bringing a conversation to the internet is an idea as old as the thing itself. The internet is designed around sending messages back and forth. Communication and the internet have practically become synonymous. One of the huge advantages of the internet is the ability to communicate asynchronously. However, no internet technology has really been able to capture the elegant simplicity of a casual, social conversation and make it asynchronous.
Twitter has changed that. They have achieved the asynchronous converation where no other technology has been able to. I know that’s a bold statement, but let me explain. Conversation, in the social sense, is not about getting things done. Conversation is about bouncing a thought out there and seeing what bounces back. Conversation is about talking to whoever in the room will listen, just because you want to hear what they have to say. Conversation is about expressing yourself and yielding the floor to others. Twitter captures all of these things, and removes the necessity of being in the same place and time.
The “@” replies are really what gave this capability to twitter. By being able to view the replies of people you know to people you know, your twitter feed becomes a chat room transcript, except you don’t have to stay logged in and you aren’t expected to answer. It becomes your email inbox except there’s no “To: ” list, the messages are just sentences, and you don’t need a reason to post. It becomes your IM except any other friend can throw in his two cents. In fact, it’s welcome.
While the “@” replies make Twitter a channel on which you can have a conversation, it’s the 140 character limit that forces the conversational feel. Whenever you are not limited in time or length, people tend to go on too long. In a real conversation, somebody will usually interrupt somebody else who is taking up too much time to make his point. The asynchronous nature of much of the internet does not offer these interruptions, so there’s no impetus for conversation sized-snippets of thought. Twitter’s brilliant move to limit statements to 140 characters forces single thoughts to be expressed, which is exactly what happens in a good conversation.
These two simple design decisions have created a phenomenon that I used to mock mercilessly but now embrace. The web-wide conversation is a great way to casually keep in touch with people you rarely get to have a real conversation with, and a great way to have a little fun during work without distracting you for hours on end, like facebook can do. Twitter is probably not the end of network news, real time search is probably not the end of Google, and it’s probably not without some element of narcissm. However, I’ve grown to enjoy this simple technology immensely just because it lets me talk a bit more to a few more people. More conversations are never a bad thing.
Exploring memcmp
Yesterday I had a problem. I had a chunk of binary data saved in a file that I had extracted from another file. I needed to find where this chunk of data had come from. I wanted to know if it appeared in the original file more than once, and if so, where. I also wanted to see if it appeared in a number of other files.
There is no command in Linux (that I know of) that does this, so I set out to write my own. I used the obvious and naive implementation. I memory mapped the file containing the chunk I was looking for (the needle) and the file containing all the data I was looking through (the haystack). Then I walked through the haystack byte by byte and ran memcmp to see if it was the same.
int compare(char *needle_pa, int needle_size, char *haystack_pa, int haystack_size) { int i; int found = 0; for (i = 0; i < haystack_size-needle_size; i++) { if (memcmp(needle_pa, haystack_pa + i, needle_size) == 0) { printf("Found needle at byte offset %x\n", i); found++; } } return found; }
This was fast enough for my purposes, but intellectual curiosity lead me to try and optimize this function. There are a number of ways to speed this up, but the one that seemed most fun was to compare more bits at once. This lead me to explore in internals of memcmp.
glibc’s memcmp
Glibc comes with a memcmp implementation, of course. You can view the source here. It’s quite simple. If the passed buffer is not aligned on a word boundary, iterate byte by byte doing comparison via subtraction until you get to a word boundary. Then loop through word by word comparing with the equality operator (==). Once you get to the end of all the full words, compare the final few bytes by subtraction again.
This implementation is pretty good. It’s fast on most platforms as it maximizes aligned comparisons. It’s very portable as it does not take advantage of any processor specific features. However, the fact that it does not take advantage of processor features means that it could be faster.
SIMD (Single Instruction Multiple Data)
Many platforms have SIMD instructions. SIMD instructions are very specific vector instructions that allow you to manipulate large chunks of data with a single instruction. A common operation found in SIMD instruction sets is string comparison, which is what essentially what memcmp does. For instance, in the x86 architecture, there are instructions to compare strings 16 bytes at a time. That’s 4x the throughput of the glibc implementation.
Taking advantage of SIMD instructions is a bit tricky. They’re not supported on all platforms, even in the same processor family. Intel, for instance, has SSE1, SSE2, and SSE3 instruction sets, which all add new instructions not found in the previous iteration. Generally there’s no reason to go through the pain of using these instructions unless you absolutely know you need the speed boost. Today, we’re just doing it for fun.
GCC built-in functions
GCC comes with a number of SIMD optimized replacements for common libc functions. Memcmp is one of those functions that GCC optimizes. However, turning on this optimization is a bit tricky. Again, since every platform supports different instructions, you have to be very specific about what features are supported in order to get the maximum optimization. GCC will always create the most compatible binary it can unless specifically told otherwise, which is the right thing to do.
To compile with GCC’s optimized memcmp, we run the following:
gcc -Wall findneedle.c -O3 -march=opteron -o findneedle
This tells gcc to optimize as much as it can (-O3) and that it can expect to be running on a processor with the same features as the Opteron (x86-64, SSE1-3). I’m running this on an Intel Core Duo, which supports the same feature set. This gives us a version of the findneedle program that does not use the standard memcmp implementation, but instead uses GCC’s SIMD optimized version.
The Results
My test was simple: I timed how long it took to a 16K needle in a 328MB haystack.
At first I was doing this on a virtual machine. The most interesting result of any of this is that the program running in a vmware image slowed down significantly. It went from an average of ~4 seconds to an average of ~10 seconds. Thinking that the virtual machine might be emulating the SIMD instructions, I compiled on my mac itself. In that situation I saw the expected speedup. The average search time went from ~4.4 seconds to ~4.1 seconds. That’s a 7% speedup!
Clearly, these results show that optimizing your program to take advantage of processor features does not always help very much. However, if you’re compiling your own programs (I’m looking at you, gentoo fanatics), it does help to make sure that your processor options are set correctly. Just switching a few flags can cause core functionality to be improved, and that makes the whole experience that much better.
Also, assume your virtual machine is a 486. It’s way faster that way.
Update:
There was a good debate about optimizing the algorithm itself over on hacker news. jws was nice enough to do a rundown on a number of different algorithms. Not surprisingly, the one demonstrated above is indeed the most naive.
6th: Clojure Agents
Flockr is slow. I would profile it, but profiling in Java is a pain, so I’m going to commit a cardinal sin and guess as to what is causing the slowdown. Every time a user loads their flockr page, a bunch of synchronous http requests go out to twitter, return, and the responses are rendered. This could easily happen in a parallel fashion, and my guess is that the slowdown is caused by doing these requests synchronously.
“The Right Way” to solve this problem is using non-blocking I/O. However, that doesn’t let me play with fancy clojure features. Clojure has Agents built in. Agents are bits of code that execute asynchronously and in a thread safe manner. They allow concurrency without any of the usual pains.
Internally, I believe agents are simply functions distributed to a thread pool. Since state is immutable in clojure, it’s pretty trivial to make that arrangement thread safe. However, this is not the optimal solution for my problem as it is limited by the capabilities of the thread pool. The highest throughput would be through non-blocking IO, but we’ll use agents for now for the sake of learning Clojure.
Ok. This might get complicated. However, Berlin Brown was very helpful in getting this all figured out.
First we need to construct the agents. Since there are a dynamic number of channels, we need to put them in a data structure. I just construct a single agent for every twitter channel in column 1 and stick those in a list (the output of map), and then the same thing for the second column.
Constructing the agents is pretty easy. All we need to do is delcare an agent with agent and its default value. If I were to read it right when I created it (by doing @<agent-ref>), I would get the empty string.
I then pass the freshly constructed agent to send-off. send-off takes an agent and the function that will modify the agent. The return value of the generic function that I pass in will become the new value of the agent at some unspecified time in the future. send-off itself returns a reference to its agent immediately.
After running those first two maps I have two lists of agents, which represent the two columns of content. I then need to wait for all that content to get filled in. To do that, I use await. await takes any number of agents that it will wait for before continuing. If I did not wait for the agents to finish, I would return a blank page to the user! Not wanting to do that, I take my two lists of agents, concatenate them, and them use them as the arguments to await using apply.
After that, it’s easy! I have all the rendered channels in my lists of agents, so I iterate through each one, stick them in their columns by dereferencing (@ch-agent) and then send the whole thing off.
The question is whether it really improves performance. Without understanding the internal implementation of the thread pool, we are still limited by the slowest response from Twitter. This is very unfortunate, and in the end, this should be handled on the client side with JavaScript. That’s no fun though, so we’ll just optimize the server side and see what kind of performance we can squeeze out of the thread pool (I’m pretty sure mine is still sub-optimal). Even this fairly straightforward default configuration did cut average response time in half, however, so that’s a pretty good start.
As always, the entirety of the code is available on github.