The Asynchronous Conversation
Conversations are marvelous ways to spend time. One person says one little thing, which causes somebody else to think of something else, and that causes yet another thought in another corner. They are organic, engaging, educational, and brilliant fun. For many of us, a simple conversation is what being “social” means, and many social events are organized around this one thing that we almost all love to do.
Bringing a conversation to the internet is an idea as old as the thing itself. The internet is designed around sending messages back and forth. Communication and the internet have practically become synonymous. One of the huge advantages of the internet is the ability to communicate asynchronously. However, no internet technology has really been able to capture the elegant simplicity of a casual, social conversation and make it asynchronous.
Twitter has changed that. They have achieved the asynchronous converation where no other technology has been able to. I know that’s a bold statement, but let me explain. Conversation, in the social sense, is not about getting things done. Conversation is about bouncing a thought out there and seeing what bounces back. Conversation is about talking to whoever in the room will listen, just because you want to hear what they have to say. Conversation is about expressing yourself and yielding the floor to others. Twitter captures all of these things, and removes the necessity of being in the same place and time.
The “@” replies are really what gave this capability to twitter. By being able to view the replies of people you know to people you know, your twitter feed becomes a chat room transcript, except you don’t have to stay logged in and you aren’t expected to answer. It becomes your email inbox except there’s no “To: ” list, the messages are just sentences, and you don’t need a reason to post. It becomes your IM except any other friend can throw in his two cents. In fact, it’s welcome.
While the “@” replies make Twitter a channel on which you can have a conversation, it’s the 140 character limit that forces the conversational feel. Whenever you are not limited in time or length, people tend to go on too long. In a real conversation, somebody will usually interrupt somebody else who is taking up too much time to make his point. The asynchronous nature of much of the internet does not offer these interruptions, so there’s no impetus for conversation sized-snippets of thought. Twitter’s brilliant move to limit statements to 140 characters forces single thoughts to be expressed, which is exactly what happens in a good conversation.
These two simple design decisions have created a phenomenon that I used to mock mercilessly but now embrace. The web-wide conversation is a great way to casually keep in touch with people you rarely get to have a real conversation with, and a great way to have a little fun during work without distracting you for hours on end, like facebook can do. Twitter is probably not the end of network news, real time search is probably not the end of Google, and it’s probably not without some element of narcissm. However, I’ve grown to enjoy this simple technology immensely just because it lets me talk a bit more to a few more people. More conversations are never a bad thing.
Exploring memcmp
Yesterday I had a problem. I had a chunk of binary data saved in a file that I had extracted from another file. I needed to find where this chunk of data had come from. I wanted to know if it appeared in the original file more than once, and if so, where. I also wanted to see if it appeared in a number of other files.
There is no command in Linux (that I know of) that does this, so I set out to write my own. I used the obvious and naive implementation. I memory mapped the file containing the chunk I was looking for (the needle) and the file containing all the data I was looking through (the haystack). Then I walked through the haystack byte by byte and ran memcmp to see if it was the same.
int compare(char *needle_pa, int needle_size, char *haystack_pa, int haystack_size) { int i; int found = 0; for (i = 0; i < haystack_size-needle_size; i++) { if (memcmp(needle_pa, haystack_pa + i, needle_size) == 0) { printf("Found needle at byte offset %x\n", i); found++; } } return found; }
This was fast enough for my purposes, but intellectual curiosity lead me to try and optimize this function. There are a number of ways to speed this up, but the one that seemed most fun was to compare more bits at once. This lead me to explore in internals of memcmp.
glibc’s memcmp
Glibc comes with a memcmp implementation, of course. You can view the source here. It’s quite simple. If the passed buffer is not aligned on a word boundary, iterate byte by byte doing comparison via subtraction until you get to a word boundary. Then loop through word by word comparing with the equality operator (==). Once you get to the end of all the full words, compare the final few bytes by subtraction again.
This implementation is pretty good. It’s fast on most platforms as it maximizes aligned comparisons. It’s very portable as it does not take advantage of any processor specific features. However, the fact that it does not take advantage of processor features means that it could be faster.
SIMD (Single Instruction Multiple Data)
Many platforms have SIMD instructions. SIMD instructions are very specific vector instructions that allow you to manipulate large chunks of data with a single instruction. A common operation found in SIMD instruction sets is string comparison, which is what essentially what memcmp does. For instance, in the x86 architecture, there are instructions to compare strings 16 bytes at a time. That’s 4x the throughput of the glibc implementation.
Taking advantage of SIMD instructions is a bit tricky. They’re not supported on all platforms, even in the same processor family. Intel, for instance, has SSE1, SSE2, and SSE3 instruction sets, which all add new instructions not found in the previous iteration. Generally there’s no reason to go through the pain of using these instructions unless you absolutely know you need the speed boost. Today, we’re just doing it for fun.
GCC built-in functions
GCC comes with a number of SIMD optimized replacements for common libc functions. Memcmp is one of those functions that GCC optimizes. However, turning on this optimization is a bit tricky. Again, since every platform supports different instructions, you have to be very specific about what features are supported in order to get the maximum optimization. GCC will always create the most compatible binary it can unless specifically told otherwise, which is the right thing to do.
To compile with GCC’s optimized memcmp, we run the following:
gcc -Wall findneedle.c -O3 -march=opteron -o findneedle
This tells gcc to optimize as much as it can (-O3) and that it can expect to be running on a processor with the same features as the Opteron (x86-64, SSE1-3). I’m running this on an Intel Core Duo, which supports the same feature set. This gives us a version of the findneedle program that does not use the standard memcmp implementation, but instead uses GCC’s SIMD optimized version.
The Results
My test was simple: I timed how long it took to a 16K needle in a 328MB haystack.
At first I was doing this on a virtual machine. The most interesting result of any of this is that the program running in a vmware image slowed down significantly. It went from an average of ~4 seconds to an average of ~10 seconds. Thinking that the virtual machine might be emulating the SIMD instructions, I compiled on my mac itself. In that situation I saw the expected speedup. The average search time went from ~4.4 seconds to ~4.1 seconds. That’s a 7% speedup!
Clearly, these results show that optimizing your program to take advantage of processor features does not always help very much. However, if you’re compiling your own programs (I’m looking at you, gentoo fanatics), it does help to make sure that your processor options are set correctly. Just switching a few flags can cause core functionality to be improved, and that makes the whole experience that much better.
Also, assume your virtual machine is a 486. It’s way faster that way.
Update:
There was a good debate about optimizing the algorithm itself over on hacker news. jws was nice enough to do a rundown on a number of different algorithms. Not surprisingly, the one demonstrated above is indeed the most naive.
6th: Clojure Agents
Flockr is slow. I would profile it, but profiling in Java is a pain, so I’m going to commit a cardinal sin and guess as to what is causing the slowdown. Every time a user loads their flockr page, a bunch of synchronous http requests go out to twitter, return, and the responses are rendered. This could easily happen in a parallel fashion, and my guess is that the slowdown is caused by doing these requests synchronously.
“The Right Way” to solve this problem is using non-blocking I/O. However, that doesn’t let me play with fancy clojure features. Clojure has Agents built in. Agents are bits of code that execute asynchronously and in a thread safe manner. They allow concurrency without any of the usual pains.
Internally, I believe agents are simply functions distributed to a thread pool. Since state is immutable in clojure, it’s pretty trivial to make that arrangement thread safe. However, this is not the optimal solution for my problem as it is limited by the capabilities of the thread pool. The highest throughput would be through non-blocking IO, but we’ll use agents for now for the sake of learning Clojure.
Ok. This might get complicated. However, Berlin Brown was very helpful in getting this all figured out.
First we need to construct the agents. Since there are a dynamic number of channels, we need to put them in a data structure. I just construct a single agent for every twitter channel in column 1 and stick those in a list (the output of map), and then the same thing for the second column.
Constructing the agents is pretty easy. All we need to do is delcare an agent with agent and its default value. If I were to read it right when I created it (by doing @<agent-ref>), I would get the empty string.
I then pass the freshly constructed agent to send-off. send-off takes an agent and the function that will modify the agent. The return value of the generic function that I pass in will become the new value of the agent at some unspecified time in the future. send-off itself returns a reference to its agent immediately.
After running those first two maps I have two lists of agents, which represent the two columns of content. I then need to wait for all that content to get filled in. To do that, I use await. await takes any number of agents that it will wait for before continuing. If I did not wait for the agents to finish, I would return a blank page to the user! Not wanting to do that, I take my two lists of agents, concatenate them, and them use them as the arguments to await using apply.
After that, it’s easy! I have all the rendered channels in my lists of agents, so I iterate through each one, stick them in their columns by dereferencing (@ch-agent) and then send the whole thing off.
The question is whether it really improves performance. Without understanding the internal implementation of the thread pool, we are still limited by the slowest response from Twitter. This is very unfortunate, and in the end, this should be handled on the client side with JavaScript. That’s no fun though, so we’ll just optimize the server side and see what kind of performance we can squeeze out of the thread pool (I’m pretty sure mine is still sub-optimal). Even this fairly straightforward default configuration did cut average response time in half, however, so that’s a pretty good start.
As always, the entirety of the code is available on github.
Invest to Stimulate
The nice thing about a bubble bursting is that you know you’re not in a bubble. Now is the time to look at how we should be investing for the long term, both on personal and governmental levels.
Who Can Help
The recently passed stimulus bill (not the currently debated spending bill) focused on a few main areas. The first was “shovel ready” government projects. These projects are mere stopgaps. They employ people for a short time and then they are finished. Building a bridge or a road, while nice, creates a minimum amount of additional value. Coming up with a single idea that transforms our energy industry, on the other hand, potentially creates huge amounts of additional value. It is these types of innovations that need to be funded, and that was what a part of the stimulus bill was aimed at. Funding “green” technology and IT companies that would transform healthcare by making it efficient.
The problem with these investments is that the markets are artificial, or at least unproven. The market for energy efficient vehicles and other appliances will be created, in large part, by another part of the stimulus bill. Raw incentives and tax cuts are being handed out to those who will purchase the products the stimulus funded the development of. It’s a market, perhaps, but the value is all what the government put into it. If there is actual demand for these energy solutions, they will succeed wihout subsidies (and be extraordinarily lucrative with the government subsidies).
Determining what energy solutions will actually create new markets with obscene profits and goodness for all mankind is not an easy task. Luckily, we have professionals that do this on a day to day basis. These people are called “investors”. There are many types of investors, but they all have one goal: to invest in securities that will increase in value. Since they share this common goal, they have come up with numerous ways of determining whether something will make money or not. They are not always right, but they are way better at it than anybody else.
Fixing Societies Woes
The ideal situation, then, is to somehow get these investors the capital and time necessary to find the corners of the market that will flourish given enough time and capital and allow the investors to give these markets time and capital. As you can see, this all requires time and capital. There is an institution in this country that has plenty of both, and that is the Social Security Administration.
The Social Security Administrations currently looks over a fund of nearly 2.5 trillion dollars. That is enough money to fund just about everything. What’s more, people don’t need it until they are at least 62. This gives approximately 40 years from the time people start investing until the fund needs to reach maturity. There has never, ever been a 20 year period, let alone a 40 year period, in which the stock market on average has not returned a better return on investment than any other security.
It is my feeling that privatizing social security provides us with the capital necessary to bust out of any economic crunch, this one included. However, capital alone is not sufficient.
Privatizing Social Security, Responsibly
When the housing bubble burst a year ago, people were stunned to learn that their retirement savings had been slashed by huge percentages. I can’t even imagine the emotional impact of helplessly watching your grand retirement being washed away in a matter of weeks. This cannot be allowed to happen to Social Security, and it’s my belief that it needn’t happen to either Social Security or 401k’s.
As Cass Sunstein and Richard Thaler argue in Nudge, people are very unlikely to change from default choices. By following some of the recommendations there, we can make privatizing Social Security safer than the 401k’s people use today.
An ideal system would offer a very good default with a limited number of good alternatives, while also allowing the reckless individual to invest however he pleases. The default option for a person’s Social Security should be a private fund that invests heavily in index funds with a small percentage set aside for risky venture and low risk bond investments. It should also be age aware. By the time the individual reaches age 55 (or whatever is deemed appropriate), they should be exposed to the minimum amount of risk. A stock market crash on your 62nd birthday should ruin your day, not your retirement.
Investing in the Game Changers
The final piece of the puzzle that is necessary for true economic stimulus is investing in the potentially explosive markets. Getting in on the next Google or Microsoft: the next big thing. These are the companies that will employ vast numbers of people in short periods of time, provided they have the capital to grow. Currently, this niche is filled almost exclusively by Venture Capitalists. I feel that while these players certainly have a role to fill, current legislation practically prohibits the American public from taking advantage of these opportunities.
Sarbanes-Oxley was a bill passed shortly after the Enron scandal. It imposes very strict accounting requirements on public companies which end up costing these companies millions of dollars a year. Before these regulations existed, having an IPO (Initial Public Offering) was a good way to raise a round of capital to expand. After Sarbanes-Oxley, only companies that have enough money to offset the prohibitive costs of entry have an IPO. This means no regular Americans get to invest in the little guys that are steadily hiring and trying to grow with as little money as possible. The irony is that Enron was successfully prosecuted on laws that were already on the books, not Sarbanes-Oxley. These laws had just been poorly enforced until the scandal broke.
Now I do not mean to suggest that we should throw our Social Security into the venture capital arena. What I do mean to suggest is that a well balanced fund should invest a small percentage in risky investments, and that small companies should be able to benefit from these investments. Two percent of 2.5 trillion is still 50 billion dollars, and that is enough to fund whatever the next big thing may be. If we are careful, it could pay back in spades.
Stimulus Through Prudent Investment
With the index funds at ten year lows, this is a buyers market. The only way to bring the markets up, along with the economy, is to invest in those companies that are poised for expansion. Nothing is more about investing in the future than Social Security, and it seems obvious that these two identical goals should be combined. By removing the obstacles to investment, providing the capital for investment, and putting it in the hands of professionals, we can get America’s economy back on track.
Fifth: Static Storage and Tokyo Cabinet
If you’ve been following, you know that I’m trying to build the Web 2.0iest site out there. In fact, this is so Web 2.0, I’m tempted to call it Web 2.1. I’m using only the hottest language (Clojure) and the coolest social networking APIs (twitter). Now I’m kicking it up a notch and using the newest player in the key/value database arena, Tokyo Cabinet.
Introduction
Tokyo Cabinet is a simple, small, fast key/value store. Similar to DBM, it’s a very basic database. If you combine it with Tokyo Tyrant, it becomes a very capable, scalable network database (like mysql or couchdb). However, we don’t really need those things right now, so I’ll just be using Tokyo Cabinet straight up. It comes with a Java interface, so I’ll be using that. I will be using Tokyo Cabinet to store and retrieve user preferences.
Setting up Tokyo Cabinet
Going with the latest and greatest does have its drawbacks. Tokyo Cabinet isn’t packaged up and easy to install yet, so there is some setup involved. In fact, I would recommend not using this for anything beyond your own satisfaction. The project is out of Japan, and the support in English isn’t really there yet. Luckily, since it’s so small and simple, it’s really not that bad.
- Install Tokyo Cabinet
- Download the source from sourceforge.
- Unpackage
- Make sure you have the development headers for zlib and bzip2
-
./configure
-
make
-
make check
-
make install
- Install Java bindings
- Download the source from sourceforge
- Make sure you have jni.h. I didn’t and installed the ubuntu package “default-jdk-builddep”. That in turn installed most of the current open source software in existence, which seemed to work.
- Run the configure script. Mine screwed up the paths to all my java utilities, so I manually edited the Makefile to get it to work. Perhaps that’s not “The Right Way” to do things, but it worked.
- Make sure tokyocabinet.jar ends up in your classpath
- Make sure the libjtokyocabinet.* is in your java.library.path. I haven’t figured out a good system for configuring Java yet, so I just have it defined in my “compojure” bash script.
Storing Data
The next challenge for me (but luckily not for you) was to write a nice clojure wrapper over the Java interface for Tokyo Cabinet. This introduced me to all sorts of new concepts in clojure, and I wouldn’t have survived if it weren’t for hiredman and others on the #clojure IRC channel. Those guys are great!
I wanted the API to be simple and clear, and this is what I came up with:
To accomplish this, I ended up with the wrapper below.
Whew. Pretty intense stuff. There’s all sorts of new stuff here for the beginning lisper, so let’s step through this line by line, though we’ll skip a few of the less interesting lines.
(declare *db*)
This is pretty straightforward. It declares *db* in the tokyo-cabinet namespace. *<var name>* is the convention for declaring globals in lisp.
(defmacro use [filename & body]
Macros are what lispers tend to rave about, and this is my first one. Macros in lisp are basically the same concept as in C, you can substitute whatever you like into the place where it’s used at compile time. The difference is that lisp’s macro system is part of the language itself, so you can do absolutely anything. They do have their drawbacks, however, in that they’re exceptionally difficult to debug. After all, the whole thing is being substituted into your original source, so errors come up as if they had happened inline.
The “&” symbol might also be new to some of you. It allows for the macro to be passed an arbitrary number of arguments after the name of the database that we’re operating on. In our case, we’ll be executing the passed expressions in the context of the open database. Therefore any calls to put and/or get will operate on that particular database.
`(with-open [hdb# (HDB.)]
Oh boy. This line is basically magic. The backtick (`) is a form quote macro. This means that everything after from the backtick until the following expression is closed is the source that will be substituted into the caller’s source. There are two forms of form quotes in clojure, backtick (`) and quote (‘). They have one important difference. The backtick version namespaces all things declared with the macro to the current namespace. So in this case, anything declared would be namespaced to tokyo-cabinet. The quote version does not do this.
The # macro is the next confusing thing. When you have a macro, there’s a chance that you will be overwriting a symbol in the original source. If the caller of “use” already had “hdb” defined, I would overwrite it. The # macro automatically generates a unique name, so all references to #hdb will actually refer to something like hdb_1829_auto. Finally we’re using with-open. This macro binds hdb# to the HDB() instance for all the expressions passed to it. It then calls .close on hdb# when it exits. It basically does exactly what we want.
(.open hdb# ~filename (bit-or HDB/OWRITER HDB/OCREAT))
The new thing here is the “~”. The tilde is the unquote macro. Since we’re in a quoted form (due to the backtick on the previous line), we can’t actually access filename. filename would need to be defined in our caller. The tilde pops us out of the quoting for a second to pull in the passed filename. After that, we continue to be quoted.
We still have a problem here. “get” and “put” both refer to tokyo-cabinet/*db*, but *db* is not defined. Luckily, we can easily fix that.
(binding [*db* hdb#]
This binds hdb# to *db* for whatever expressions are passed in. Since we’re using the backtick instead of the quote, this is automatically converted to tokyo-cabinet/*db*, which is what we want.
(do ~@body))))
“~@” is the last bit of magic. Basically this is the same as ~, except you can pass a sequence and it will apply ~ to every member. This is what we want, for every expression passed to be executed in the context we’ve defined.
As time goes on, I will add more of the Java API into this wrapper. You can follow the project on github if you’re interested in the updates.
That’s all for now! We can now store and retrieve things. After I spend some time building that into the site, we’ll be largely done with the Clojure bits. I’m sure a couple more things will come up though!