Caffeinated Simpleton

Posts Tagged ‘Learning-Clojure’

Java, please stop ruining my fun.

I don’t like Java. I haven’t learned Java well because I don’t enjoy using it. I don’t enjoy using it because it’s verbose, for one, but mostly because it’s constantly making things hard for me to do. I know there are ways to do what I want (after all, millions of people use Java successfully every day), but I don’t know what they are. Furthermore, finding out what they are is excruciatingly painful.

I recently did a series of articles on a project I was doing to learn Clojure. It kind of petered out for a number of reasons, but one constant annoyance in learning Clojure was dealing with the Java-isms. Java has given Clojure a vast library of high quality software essentially for free, but it’s also brought on a lot of the pain, much of which I think needs to be fixed before Clojure can have the nice feel of my favorite dynamic languages.

Installing Clojure

The first thing one has to do is install Clojure. It’s not a package in Ubuntu yet, but it’s young, so that’s ok cause we’re veterans and don’t need no stinking packages. To compile, we just download the source and type “ant”.

And that’s it. There’s no install process that makes a nice pretty “clojure” command that takes us to the REPL or executes scripts that are passed to it. To run clojure, you need to run it using Java:

$ java -cp clojure.jar clojure.lang.Repl

That is a lot to type just to get a Repl, and getting a usable command line is even harder. After installing JLine ConsoleRunner, you need to get the library into your classpath (a rant on which is upcoming) and then run

$ java -cp jline-0.9.91.jar:clojure.jar jline.ConsoleRunner clojure.lang.Repl

Not exactly intuitive, but whatever. We put it in a bash script, put it in our path, and head off to the races. After a while, we have a few lines of a quality script we would like to save and run. How do we do that?

Obviously, it’s:

$ java -cp clojure.jar clojure.lang.Script my-script.clj

This assumes that clojure.jar is in the same directory as the script you want to run. If you don’t have clojure.jar there, you must provide a specific path to the jar file. There is no idea of a default directory where Java will look for jar files. You must provide every single jar file to Java at runtime.

Contrast this with the Python install process:

$ sudo apt-get install python
$ python
... Have fun in the interpreter
... Write a script
$ python my_script.py

Simple.

The Classpath

First of all, I’m no expert on the classpath, but it seems like an unholy abomination thrust upon us by invisible powers that must be extinguished at all costs. It would appear, and again, I am no expert, but it would appear that every single dependency of a program must be explicitly passed to Java at the time you run your program. I wrote a bash script to automate the process, but viewing the command line for running my simple Compojure-based webapp is appalling:

java -Djava.library.path=/usr/local/lib -cp :/mnt/data/Users/justin/bin/compojure/compojure.jar:/mnt/data/Users/justin/bin/compojure/deps/clojure-contrib.jar:/mnt/data/Users/justin/bin/compojure/deps/clojure.jar:/mnt/data/Users/justin/bin/compojure/deps/fact.jar:/mnt/data/Users/justin/bin/compojure/deps/jetty-6.1.14.jar:/mnt/data/Users/justin/bin/compojure/deps/jetty-util-6.1.14.jar:/mnt/data/Users/justin/bin/compojure/deps/re-rand.jar:/mnt/data/Users/justin/bin/compojure/deps/servlet-api-2.5-6.1.14.jar:/mnt/data/Users/justin/lib/clj-http-client.jar:/mnt/data/Users/justin/lib/clojure-contrib.jar:/mnt/data/Users/justin/lib/clojure.jar:/mnt/data/Users/justin/lib/commons-codec-1.3.jar:/mnt/data/Users/justin/lib/commons-httpclient-3.1.jar:/mnt/data/Users/justin/lib/commons-io-1.4-javadoc.jar:/mnt/data/Users/justin/lib/commons-io-1.4-sources.jar:/mnt/data/Users/justin/lib/commons-io-1.4.jar:/mnt/data/Users/justin/lib/commons-logging-1.1.1-javadoc.jar:/mnt/data/Users/justin/lib/commons-logging-1.1.1-sources.jar:/mnt/data/Users/justin/lib/commons-logging-1.1.1.jar:/mnt/data/Users/justin/lib/commons-logging-adapters-1.1.1.jar:/mnt/data/Users/justin/lib/commons-logging-api-1.1.1.jar:/mnt/data/Users/justin/lib/commons-logging-tests.jar:/mnt/data/Users/justin/lib/compojure.jar:/mnt/data/Users/justin/lib/jline-0.9.94.jar:/mnt/data/Users/justin/lib/tokyo-cabinet-clj.jar:/mnt/data/Users/justin/lib/tokyo-cabinet.jar:/mnt/data/Users/justin/lib/tokyocabinet.jar:/mnt/data/Users/justin/lib/tokyotyrant-0.6.jar clojure.lang.Script index.clj

That is bad. That is not correct, that is not how software should be designed, I object. Every other language I can think of off the top of my head (except JavaScript) has some structured way of finding its dependencies, and most have a way of adding additional rules to that search should the defaults not be adequate. While this can lead to “DLL hell”, I do not see how the Java situation is any better when everybody just ends up with scripts to automate the process and then those scripts pick up the wrong things and you can’t figure out why.

The classpath makes me very upset. If Clojure can find a way to mask it, I would appreciate it very much.

Maven

First of all, what the hell is Maven? A quick trip to their site reveals a huge chunk of text with hundreds of links and an initial sentence that describes it as:

Maven, a Yiddish word meaning accumulator of knowledge, was originally started as an attempt to simplify the build processes in the Jakarta Turbine project.

I went to the site with some hope that it would provide some relief to my dependency issues (All I want is “pip install”, or “gem install”), and I get greeted with a dense paragraph of history combined with some mumbo-jumbo about “best practices”.

After reading a bit I find that Maven downloads and builds dependencies and installs them in a local repository, along with the library you are trying to compile. Perfect! Sounds like exactly what I want. However, it doesn’t mention anything about the classpath. Am I still responsible for dealing with all that muck, even though it’s tucking my libraries in a hidden directory (implying that it’s responsible for managing them)?

To answer that question I need to wade through dozens of other pages that alternately describe how to accomplish basic tasks and lecture me on software engineering. Finally I come to the conclusion that while Maven does indeed find dependencies for you, it does not actually help you execute programs with those dependencies in place. This means you either need a script that automatically passes your entire maven local repository to Java, or you need to know the dependencies that Maven was conveniently supposed to hide from you. To top it off, it doesn’t play well with Clojure. Completely useless.

(For the record, there is a Maven extension that does exactly this.)

The Last Word

Dependency management is a hard problem that all languages must learn to deal with. Higher level languages have an even harder time in that they must not only deal with whatever dependencies they have written in their own language, but also with extensions written in other languages. Clojure, which is still very young, suffers tremendously from the godawful environment that Java has ensconced itself in. I am largely a veteran of the *nix world, which seems quite different from the world Java developers have built around themselves. They have their own tools, their own build systems, their own set of “best practices”, and the Apache foundation. What I have seen in my brief saunter over the wall has appalled me. It has appalled me far more than similar saunters into the somewhat exciting world of Microsoft and .NET. It strikes me very much as a world in need of fixing, and I hope that Clojure (or Scala) can do it. Heck, I may even do my part to help.

But probably I’ll just run back to Python.

6th: Clojure Agents

Flockr is slow. I would profile it, but profiling in Java is a pain, so I’m going to commit a cardinal sin and guess as to what is causing the slowdown. Every time a user loads their flockr page, a bunch of synchronous http requests go out to twitter, return, and the responses are rendered. This could easily happen in a parallel fashion, and my guess is that the slowdown is caused by doing these requests synchronously.

“The Right Way” to solve this problem is using non-blocking I/O. However, that doesn’t let me play with fancy clojure features. Clojure has Agents built in. An agent holds a piece of state, and the functions you send to it run asynchronously, one at a time per agent, so updates are thread safe. They allow concurrency without most of the usual pains.

Internally, I believe agents are simply functions distributed to a thread pool. Since values are immutable in clojure, it’s pretty trivial to make that arrangement thread safe. However, this is not the optimal solution for my problem, as every in-flight request still ties up a thread from the pool while it waits. The highest throughput would be through non-blocking IO, but we’ll use agents for now for the sake of learning Clojure.

Ok. This might get complicated. However, Berlin Brown was very helpful in getting this all figured out.

First we need to construct the agents. Since there are a dynamic number of channels, we need to put them in a data structure. I just construct a single agent for every twitter channel in column 1 and stick those in a list (the output of map), and then the same thing for the second column.

Constructing the agents is pretty easy. All we need to do is call agent with the agent’s initial value. If I were to read it right when I created it (by doing @<agent-ref>), I would get the empty string.

I then pass the freshly constructed agent to send-off. send-off takes an agent and the function that will modify the agent. The return value of the generic function that I pass in will become the new value of the agent at some unspecified time in the future. send-off itself returns a reference to its agent immediately.

After running those first two maps I have two lists of agents, which represent the two columns of content. I then need to wait for all that content to get filled in. To do that, I use await. await takes any number of agents and blocks until the actions sent to them so far have finished. If I did not wait for the agents to finish, I would return a blank page to the user! Not wanting to do that, I take my two lists of agents, concatenate them, and then use them as the arguments to await using apply.

After that, it’s easy! I have all the rendered channels in my lists of agents, so I iterate through each one, stick them in their columns by dereferencing (@ch-agent) and then send the whole thing off.
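The steps above can be sketched roughly as follows. This is not the actual flockr code: fetch-channel is a hypothetical stand-in that sleeps instead of calling Twitter, render-columns is a name I made up for illustration, and mapv assumes Clojure 1.4 or later.

```clojure
;; Hypothetical stand-in for the real Twitter request: sleeps briefly to
;; simulate network latency, then returns a "rendered" channel string.
(defn fetch-channel [channel]
  (Thread/sleep 100)
  (str "<div>" channel "</div>"))

(defn render-columns
  "Fetches every channel in both columns concurrently and returns the
  rendered strings, grouped by column."
  [col1-channels col2-channels]
  (let [start   (fn [channel]
                  ;; Each agent starts out holding the empty string.
                  ;; send-off queues the action and returns the agent
                  ;; immediately; the action's return value becomes the
                  ;; agent's new value some time later.
                  (send-off (agent "")
                            (fn [_old-value] (fetch-channel channel))))
        ;; mapv is eager, so every request is started right here.
        agents1 (mapv start col1-channels)
        agents2 (mapv start col2-channels)]
    ;; Block until every action sent above has finished.
    (apply await (concat agents1 agents2))
    ;; Dereferencing now yields the rendered content, not the empty string.
    {:column-1 (mapv deref agents1)
     :column-2 (mapv deref agents2)}))
```

Calling (render-columns ["a" "b"] ["c"]) should take roughly as long as one fetch rather than three. One thing to keep in mind when running something like this as a standalone script: the agent thread pools keep the JVM alive for a while after the work is done, so call (shutdown-agents) before exiting.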

The question is whether it really improves performance. Even without understanding the internal implementation of the thread pool, we know each page load is still limited by the slowest response from Twitter. This is very unfortunate, and in the end, this should be handled on the client side with JavaScript. That’s no fun though, so we’ll just optimize the server side and see what kind of performance we can squeeze out of the thread pool (I’m pretty sure mine is still sub-optimal). Even this fairly straightforward default configuration did cut average response time in half, however, so that’s a pretty good start.

As always, the entirety of the code is available on github.

Fifth: Static Storage and Tokyo Cabinet

If you’ve been following, you know that I’m trying to build the Web 2.0iest site out there. In fact, this is so Web 2.0, I’m tempted to call it Web 2.1. I’m using only the hottest language (Clojure) and the coolest social networking APIs (twitter). Now I’m kicking it up a notch and using the newest player in the key/value database arena, Tokyo Cabinet.

Introduction

Tokyo Cabinet is a simple, small, fast key/value store. Similar to DBM, it’s a very basic database. If you combine it with Tokyo Tyrant, it becomes a very capable, scalable network database (like mysql or couchdb). However, we don’t really need those things right now, so I’ll just be using Tokyo Cabinet straight up. It comes with a Java interface, so I’ll be using that. I will be using Tokyo Cabinet to store and retrieve user preferences.

Setting up Tokyo Cabinet

Going with the latest and greatest does have its drawbacks. Tokyo Cabinet isn’t packaged up and easy to install yet, so there is some setup involved. In fact, I would recommend not using this for anything beyond your own satisfaction. The project is out of Japan, and the support in English isn’t really there yet. Luckily, since it’s so small and simple, it’s really not that bad.

  • Install Tokyo Cabinet
    • Download the source from sourceforge.
    • Unpackage
    • Make sure you have the development headers for zlib and bzip2
    • ./configure
    • make
    • make check
    • make install
  • Install Java bindings
    • Download the source from sourceforge
    • Make sure you have jni.h. I didn’t and installed the ubuntu package “default-jdk-builddep”. That in turn installed most of the current open source software in existence, which seemed to work.
    • Run the configure script. Mine screwed up the paths to all my java utilities, so I manually edited the Makefile to get it to work. Perhaps that’s not “The Right Way” to do things, but it worked.
    • Make sure tokyocabinet.jar ends up in your classpath
    • Make sure libjtokyocabinet.* is in your java.library.path. I haven’t figured out a good system for configuring Java yet, so I just have it defined in my “compojure” bash script.

Storing Data

The next challenge for me (but luckily not for you) was to write a nice clojure wrapper over the Java interface for Tokyo Cabinet. This introduced me to all sorts of new concepts in clojure, and I wouldn’t have survived if it weren’t for hiredman and others on the #clojure IRC channel. Those guys are great!

I wanted the API to be simple and clear, and this is what I came up with:

To accomplish this, I ended up with the wrapper below.

Whew. Pretty intense stuff. There’s all sorts of new stuff here for the beginning lisper, so let’s step through this line by line, though we’ll skip a few of the less interesting lines.

 (declare *db*)

This is pretty straightforward. It declares *db* in the tokyo-cabinet namespace without giving it a value. Wrapping a name in asterisks (*<var name>*) is the lisp convention for globals that are meant to be rebound dynamically.

(defmacro use [filename & body]

Macros are what lispers tend to rave about, and this is my first one. Macros in lisp are basically the same concept as in C: you can substitute whatever you like into the place where the macro is used, at compile time. The difference is that lisp’s macro system is part of the language itself, so you can do absolutely anything. They do have their drawbacks, however, in that they’re exceptionally difficult to debug. After all, the whole thing is being substituted into your original source, so errors come up as if they had happened inline.

The “&” symbol might also be new to some of you. It allows for the macro to be passed an arbitrary number of arguments after the name of the database that we’re operating on. In our case, we’ll be executing the passed expressions in the context of the open database. Therefore any calls to put and/or get will operate on that particular database.

`(with-open [hdb# (HDB.)]

Oh boy. This line is basically magic. The backtick (`) is the syntax-quote reader macro. Everything from the backtick until the following expression is closed is the source that will be substituted into the caller’s source. There are two kinds of quote in clojure, backtick (`) and quote ('). They have one important difference. The backtick version resolves every symbol inside it to a namespace, so anything defined in this file is qualified with tokyo-cabinet. The quote version does not do this.

The # suffix is the next confusing thing. When you have a macro, there’s a chance that you will be overwriting a symbol in the original source. If the caller of “use” already had “hdb” defined, I would overwrite it. Inside a backtick form, a symbol ending in # is automatically replaced with a unique name, so all references to hdb# will actually refer to something like hdb__1829__auto__. Finally we’re using with-open. This macro binds hdb# to the new HDB instance for all the expressions passed to it. It then calls .close on hdb# when it exits. It basically does exactly what we want.

(.open hdb# ~filename (bit-or HDB/OWRITER HDB/OCREAT))

The new thing here is the “~”. The tilde is the unquote macro. Since we’re in a quoted form (due to the backtick on the previous line), we can’t actually access filename. filename would need to be defined in our caller. The tilde pops us out of the quoting for a second to pull in the passed filename. After that, we continue to be quoted.

We still have a problem here. “get” and “put” both refer to tokyo-cabinet/*db*, but *db* is not defined. Luckily, we can easily fix that.

(binding [*db* hdb#]

This binds *db* to the value of hdb# for whatever expressions are passed in. Since we’re using the backtick instead of the quote, *db* is automatically converted to tokyo-cabinet/*db*, which is what we want.

(do ~@body))))

“~@” is the last bit of magic. It is like ~, except that it takes a sequence and splices each member into the surrounding form, rather than inserting the whole sequence as a single list. This is what we want: every expression passed in gets executed in the context we’ve defined.
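To see all of those pieces working together without installing Tokyo Cabinet, here is a minimal sketch of the same macro shape. It is not the real wrapper: a java.io.StringWriter stands in for the HDB handle, and with-db, put, and the label argument are hypothetical names for illustration (I avoided calling the macro “use” here so it doesn’t clash with clojure.core/use). It also assumes Clojure 1.3 or later, where a var has to be marked ^:dynamic before binding can rebind it.

```clojure
(import 'java.io.StringWriter)

;; Stand-in for the database handle. ^:dynamic is what lets `binding`
;; rebind it on Clojure 1.3 and later.
(def ^:dynamic *db* nil)

(defn put
  "Hypothetical stand-in for the wrapper's put: appends a key=value line
  to whatever *db* is currently bound to."
  [k v]
  (.write ^StringWriter *db* (str k "=" v "\n")))

(defmacro with-db
  "Opens a resource, binds it to *db*, runs body, closes the resource,
  and returns everything that was written."
  [label & body]
  `(with-open [db# (StringWriter.)]       ; db# becomes a unique symbol
     (.write db# (str "# " ~label "\n"))  ; ~ inserts the caller's argument
     (binding [*db* db#]                  ; *db* is namespace-qualified
       ~@body                             ; ~@ splices in each body form
       (str db#))))
```

A call like (with-db "prefs" (put "user" "justin") (put "theme" "dark")) runs both puts against the same writer and hands back what was written. Running (macroexpand-1 '(with-db "prefs" (put "a" 1))) at the REPL is a handy way to see the generated symbol and the namespace-qualified names for yourself.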

As time goes on, I will add more of the Java API into this wrapper. You can follow the project on github if you’re interested in the updates.

That’s all for now! We can now store and retrieve things. After I spend some time building that into the site, we’ll be largely done with the Clojure bits. I’m sure a couple more things will come up though!

Fourth: Regular Expressions in Clojure

Things are cruising right along now in creating my awesome twitter portal in clojure. So far we have gotten set up with compojure, started using the twitter API to grab data, and built some forms to make sure the data is relevant to the logged in user. The next little chore is to find URLs in tweets and make them into actual, clickable links. I want to keep this simple for now, so we’ll just find http:// or https:// and link that.

The Code

It turns out that the code to do this is really simple. Clojure just uses Java’s regular expression engine, but integrates it into the language a bit more cleanly than Java does. A big thanks to Fatvat for basically walking me through it.

Nothing too complicated here, but there is an interesting new concept. For the first time ever, Clojure doesn’t do everything we want and we talk to Java. This is one of the most powerful attributes of Clojure. Even though it’s a young language, it’s built on a mature platform that does basically everything you need. In this case, we wanted a copy of the “text” string with every URL replaced by a link (Java strings are immutable, so .replaceAll returns a new string). Going through a stateful Java object isn’t exactly kosher in a functional language, but I didn’t want to slice and dice the text when there was a perfectly usable Java method that would do the replacement for me.

Anyway, how does this work? “.replaceAll” is a method of java.util.regex.Matcher. What we’re trying to express in Java is:

In clojure, re-matcher returns a java.util.regex.Matcher built by applying a Pattern instance to a string (“text”). So, we’re calling the .replaceAll method on the object returned by re-matcher. The Pattern itself is written as a regex literal (the #"..." reader macro). This is exactly what we want, expressed in a nice, functional style. After the instance that we’re operating on, we can pass additional arguments to the method. In this case we pass the replacement string.
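A minimal sketch of what such a function can look like is below. The pattern and the replacement markup are my assumptions for illustration, not necessarily what the site uses; the pattern simply grabs everything up to the next whitespace, so trailing punctuation ends up inside the link.

```clojure
(defn urlize
  "Wraps every http:// or https:// URL in text in an HTML anchor tag."
  [text]
  ;; Roughly the Java being expressed:
  ;;   Pattern.compile("https?://\\S+").matcher(text).replaceAll(...)
  ;; #"..." is a compiled java.util.regex.Pattern, and re-matcher returns
  ;; a java.util.regex.Matcher for that pattern over text.
  (.replaceAll (re-matcher #"https?://\S+" text)
               ;; In a Java replacement string, $0 is the whole match.
               "<a href=\"$0\">$0</a>"))
```

Text without any URLs comes back unchanged, and the original string is never modified.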

Another thing you might notice is the string in the urlize function definition. Clojure has extensive support for metadata, which is something that I’ve largely ignored. In its simplest form, you can pass a string to defn as I have done, and that will be included as the docstring. The language also includes introspection features to pull these things out, but I have yet to investigate them in depth.
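For the curious, pulling a docstring back out looks something like this. The shout function is a made-up example; the point is only that the docstring is stored as metadata on the var that defn creates.

```clojure
(defn shout
  "Returns s in upper case."
  [s]
  (.toUpperCase ^String s))

;; The docstring lives in the metadata of the var (#'shout), not on the
;; function value itself, so we ask the var for its metadata.
(def shout-doc (:doc (meta #'shout)))
```

At the REPL, (doc shout) from clojure.repl prints the same string along with the argument list.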

Again, pretty straightforward, and now we’re starting to do some real damage. I think I’m going to dive into JavaScript and CSS for a while, but I’ll be back soon with static storage. It should be fun! As always, all the code is on github.

Third: Using Sessions in Compojure

After the first two posts in this series, we have a site that displays the twitter public timeline and twitter searches. Now let’s customize this by allowing users to access their own twitter accounts.

To do that, we need to allow users to log in to their twitter accounts. Twitter hasn’t rolled out their OAuth solution yet, so we’ll have to ask users to trust us with their twitter names and passwords. For now, we won’t store the password, so all we need to do is get it out of a form and store it in a session.

Processing Forms in Compojure

This is pretty easy. Compojure provides several things to the servlet, including a “params” hash-map that holds the request parameters, which we will pass to a “login” controller. I won’t bore you with details of creating the form itself, so here’s the server side code that processes “twitter-user” and “twitter-password”.

As you can see, we have some new syntactic goodies to learn about. “session” is a reference to a map, provided by Compojure. It is thread safe, which in clojure is done through transactional memory. Dealing with that transaction is what most of this code does. dosync is a macro that allows many expressions to be executed in one transaction. alter takes a reference (session), a function that will alter the referenced value (assoc) and the arguments for the altering function. We finish our dosync call, which commits the transaction, and our session is all set.

After we set up our session, we redirect to the url of the user. We pull the user name out of the session. The “@” symbol is a reader macro that expands to a call to the deref function, which allows us to get values out of references. If no name was provided, we kick back to the home page.
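Here is a small sketch of that flow that runs without Compojure. The session ref below is a stand-in I define myself, the keyword keys mirror the form field names above, and the redirect paths are hypothetical; the function just returns the path instead of issuing a real redirect.

```clojure
;; Stand-in for Compojure's session: a ref holding an immutable map.
(def session (ref {}))

(defn login
  "Stores the submitted credentials in the session and returns the path
  to redirect to."
  [params]
  (dosync
    ;; alter calls (assoc current-map k v ...) inside the transaction and
    ;; stores the result back in the ref.
    (alter session assoc
           :twitter-user     (:twitter-user params)
           :twitter-password (:twitter-password params)))
  ;; @session is shorthand for (deref session).
  (if-let [user (:twitter-user @session)]
    (str "/" user)
    "/"))
```

So (login {:twitter-user "justin" :twitter-password "secret"}) returns the user’s path, while a submission with no name falls back to the home page.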

Pretty clean and painless. Clojure isn’t so bad.