Fundamental attribution error of programming

Sam Stephenson is the creator of the Prototype JavaScript framework and of rbenv, the competitor to RVM. He recently wrote an interesting article, “You are not your code”, about why programmers are not their product. Are you?

This is in fact what programmers do quite often: they identify themselves with their code. After all, they have written and created every line and every character. They have invented the names, the functions, and the structures. Nobody else knows their code as well as they do. They own their “precious” code. Programmers are like little gods who like to rule their own universe.

The advantage is obvious: if the software is successful and you identify with it, it is your success. The drawback: if the software is not successful and you identify with it, it is your failure. This is similar to a sports team: if the team wins, everybody wants to take part in the success. If the team continues to lose, everybody starts to blame each other: the president blames the trainer, the trainer the players, the players each other, etc.

It often works to claim ownership of something because people have a lot of cognitive biases. One of these biases is the fundamental attribution error in psychology: we have a tendency to over-emphasize personality-based explanations and ignore the role of other influences (for instance situational ones). We also tend to attribute great events to great men, which is known as the great man theory.

While it is debatable whether this is a good thing or not, the developer of a modern web application can hardly claim to be its only author. In the early days of PCs, it was only the programmer and the CPU that mattered, at least if you programmed the machine directly in assembly language. Then came the first high-level programming languages for systems with disk operating systems like CP/M or various forms of DOS. Together with graphical user interfaces, object-oriented programming languages arrived, and for the web, comfortable high-level languages with garbage collection like Java, Ruby or Python appeared. Today we have 4 or 5 layers between the programmer and the CPU: for example, a Ruby program is written in Ruby, Ruby is written in C, C is compiled to assembly, and assembly boils down to machine code.

And this is only the language itself. A modern web application is like an iceberg: the stuff above the surface is written by you and your team, the stuff below by countless others. It is not only the language and the tools for editing and debugging; a web application is based on a lot of different servers and systems:

  • the operating system like MacOS or Linux
  • the web server like Apache or Nginx
  • the web server modules like Phusion Passenger
  • the database server like MySQL or PostgreSQL
  • the caching server like Memcached or Redis
  • the mail server and mail transfer agents like Postfix or Sendmail
  • the message queue processing server like ActiveMQ, RabbitMQ or ZeroMQ

Then there are also the languages and version management systems, frameworks and libraries, gems and plugins, written by countless other developers:

  • languages like C, Ruby, Python or JavaScript
  • version management systems like SVN, Git, RVM or rbenv
  • frameworks like Rails or Django
  • libraries like Prototype or jQuery
  • gems and plugins for pagination, authentication, etc.

In order to build a modern application, you set up different servers and configure them, choose a language, a framework and suitable libraries, and finally select different plugins and gems and stick them together in a unique way. If you have done all this, you can hardly claim you have created the system. And yet we tend to do it.

Therefore, if you are a Ruby developer and you have produced more than others, it is not because you are taller or smarter. It is probably because you are standing on the shoulders of many others.

(The sourcecode photo is from Flickr user nyuhuhuu)

Background Jobs in Ruby on Rails

Short response times are critical for every web application. Time-consuming operations and long-running tasks that require intensive computation often cannot be processed during the normal HTTP request/response cycle, because the application would quickly become unresponsive. The solution is background job processing: if you want to keep your app fast and responsive, move those long-running tasks into background processes. Once a job has been placed in a background queue, the application can return a response immediately. Examples include:

  1. pushing some data to a slow external service
  2. pulling data from a slow external service
  3. accessing a remote API (like posting something to Twitter)
  4. number crunching tasks
  5. managing large uploads or downloads
  6. processing huge multi-media files
  7. generating large PDFs or PDF reports
  8. generating image thumbnails
  9. sending of bulk email, newsletters or SMS
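
The basic idea can be sketched in plain Ruby without any plugin. The following is a minimal sketch, assuming a single in-process worker thread; the class BackgroundQueue and its methods are hypothetical names made up for this illustration, not part of Rails or of any of the libraries discussed here.

```ruby
require 'thread'

# Hypothetical in-process job queue; not part of Rails or any plugin.
class BackgroundQueue
  attr_reader :results

  def initialize
    @jobs    = Queue.new   # thread-safe FIFO, pop blocks while it is empty
    @results = Queue.new
    # One worker thread processes jobs until it sees the :shutdown marker.
    @worker = Thread.new do
      while (job = @jobs.pop) != :shutdown
        @results << job.call
      end
    end
  end

  # Called from the request cycle: store the work and return immediately.
  def enqueue(&job)
    @jobs << job
    :queued
  end

  # Let the worker finish everything that is queued, then stop it.
  def shutdown
    @jobs << :shutdown
    @worker.join
  end
end

queue  = BackgroundQueue.new
status = queue.enqueue { sleep 0.1; 12 * 12 }  # stand-in for slow work
puts "request answered with #{status}"        # printed before the job is done
queue.shutdown
puts "job result: #{queue.results.pop}"
```

A queue that lives inside the web process is lost when that process restarts, which is one reason the solutions below keep the jobs in a database or in a separate queue server.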

Many big sites use background jobs and job queues to process time-consuming operations, for example Amazon, GitHub and Twitter, at least in their early days. Amazon now offers a queue service named Amazon SQS. Chris Wanstrath has written a short history of using background jobs at GitHub. Twitter has tried various ways of using background processes, too.

There are in fact many options for performing background processing in Rails. The official wiki entry for background processes in Ruby on Rails is a bit sparse, and so is the entry about background jobs in Rails in the older wiki. The overview by Andy Stewart about long-running tasks in Rails without much effort is better. Tobin Harris lists 6 ways to run background jobs in Ruby on Rails.

In general, one can distinguish between different forms of task storage and task execution. The task storage can be the database, which means persistence and durability, or a message queue (Amazon SQS, WebSphere MQ, RabbitMQ, ...), which means high performance if the queue operates in memory. The task execution can be immediate, for example by an always-running background daemon, or periodic, for instance by a cron job. For recurring jobs that have to be executed at a certain time of day, a rake task or a script/runner call controlled by a cron job is a good solution. This is perfect for jobs that should run once a day. If a task must be processed as soon as possible, usually some form of storage comes into play. The jobs can be stored in a message queue (MQ), in the database (DB) or not at all:

No queues or task storage

Spawn

=> use processes and threads, just fork a new process or create a new thread for each job

Database-driven Job Queues

BackgroundDRb, Background Job (BJ), DelayedJob (DJ), JobFu, Background-Fu

=> store jobs in persistent job queue table (database-driven Job queues), persistent but slow, closely integrated and tightly coupled to Rails project, use the SQL database as a message bus

Message Queues

Sparrow, Starling, Kestrel, RabbitMQ, Apache ActiveMQ, Amazon SQS, Beanstalkd

=> store jobs in memory (message queue), not always persistent but very fast, loosely coupled to Rails project, use a real message bus

Now let us take a closer look at the different solutions to the same problem. Users prefer the options that are simple to set up and simple to use. The simplest solutions are the database-driven job queues Background Job (BJ) and DelayedJob (DJ). DelayedJob is so popular that it also has two similar clones, JobFu and Background-Fu. Even simpler is to use no database or queue at all.

No queues or task storage

Spawn

– small plugin for Rails to easily fork or thread long-running code blocks
– executes task in new background process by creating a new child process (Forking) or new thread (threading)
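
The two execution modes can be illustrated with plain Ruby. This is only a sketch of the underlying technique, assuming a platform where fork is available; the helper names run_in_fork and run_in_thread are hypothetical and are not the Spawn plugin's API.

```ruby
# Run a block in a forked child process.
# Returns the waiter thread created by Process.detach; its value is the
# child's Process::Status once the child has exited.
def run_in_fork(&block)
  pid = Process.fork do
    block.call
    $stdout.flush
    exit!(0)            # leave the child without running the parent's at_exit hooks
  end
  Process.detach(pid)   # reap the child so it does not linger as a zombie
end

# Run a block in a new thread of the current process.
def run_in_thread(&block)
  Thread.new(&block)
end

waiter = run_in_fork { sleep 0.1 }   # stand-in for a long-running task
thread = run_in_thread { 2 + 2 }
puts "the request can return now"
puts "thread result: #{thread.value}"
puts "child exited cleanly: #{waiter.value.success?}"
```

A forked child gets its own copy of the process memory, so a crash there does not take the web process down, while a thread is cheaper to start but shares everything with the request that created it.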

Plugins for Database-driven Job Queues

BackgroundDRb

Written by Ezra Zygmuntowicz, originally used Distributed Ruby (DRb)
– stores jobs in persistent job queue table
– queue is processed using specific, hard-wired worker objects
– queue is filled by “MiddleMan” objects
– BackgroundDRb server “backgroundrb” contains workers and listens on certain port for new events/requests

Background Job (BJ)

– stores jobs in persistent job queue table and processes the jobs of the table in the background
– simple and robust solution, but how you structure the jobs is largely up to you. Jobs have no direct connection to Rails models or their methods; Bj runs jobs as command line applications, i.e. it just runs the ruby or bash scripts you specify. Good “general purpose solution”.
– you need to use script/runner or rake task commands to access Rails models, which will load the entire Rails environment for each job
– no daemon process for processing the jobs, workers are not persistent, only one background process started or signalled for each stored job

DelayedJob (DJ)

– Written by Tobias Lütke (Tobi), used by GitHub in the past
– needs the daemons gem to create a background daemon process
– stores jobs in persistent job queue table (“delayed_jobs”)
– you can turn any method call into a job to be processed later: a job is queued by calling send_later(method, params) on any object. You can also use custom job classes.
– the queue is processed by a rake task (rake jobs:work) or a script “delayed_job” which starts and stops a daemon process to process the queue
– the daemon will check for queued background jobs every 5 seconds
– good documentation, many tutorials, for example here, here, here or here.
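
The principle behind a database-driven queue can be sketched in a few lines. This is a simplified illustration and not DelayedJob's implementation: an Array stands in for the jobs table, and the names JOBS_TABLE, LaterCalls, call_later, Mailer and work_off are all hypothetical.

```ruby
# Stand-in for the jobs table: each row holds one serialized method call.
JOBS_TABLE = []

module LaterCalls
  # Serialize receiver, method name and arguments into a "row"
  # instead of performing the call right away.
  def call_later(method_name, *args)
    JOBS_TABLE << { :payload => Marshal.dump([self, method_name, args]),
                    :created_at => Time.now }
    nil
  end
end

class Mailer
  include LaterCalls

  def deliver(address)
    "mail sent to #{address}"
  end
end

# One polling pass of the worker: load each row, perform the call,
# and remove the row from the table.
def work_off(table = JOBS_TABLE)
  results = []
  until table.empty?
    row = table.shift
    receiver, method_name, args = Marshal.load(row[:payload])
    results << receiver.send(method_name, *args)
  end
  results
end

Mailer.new.call_later(:deliver, "user@example.com")  # returns immediately
puts work_off.inspect
# A daemon would repeat the pass at a fixed interval, e.g.
#   loop { work_off; sleep 5 }
```

With a real table the worker also has to lock a row before running it, so that two workers never pick up the same job; that part is left out here.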

JobFu

– written by Jon Stenqvist
– needs the daemons gem to create a background daemon process
– similar to delayed_job (a delayed_job clone)
– stores jobs in persistent job queue table (“jobs”)
– you can turn any method call into a job to be processed later (using the “Backgrounded handler” syntax)

Background-Fu

– written by Jacek Becela
– needs the daemons gem to create a background daemon process
– similar to delayed_job (a delayed_job clone)
– stores jobs in persistent job queue table (“jobs”)
– the daemon will check for queued background jobs every 5 seconds

Message Queue Servers

Message queue servers are available in various languages: Erlang (RabbitMQ), C (beanstalkd), Ruby (Starling or Sparrow), Scala (Kestrel) or Java (ActiveMQ). A short overview can be found here.

Sparrow

– written by Alex MacCaw
– Sparrow is a lightweight queue written in Ruby that “speaks memcache”

Starling

– written by Blaine Cook at Twitter
– Starling is a message queue server that speaks the memcached protocol
– written in Ruby
– stores jobs in memory (message queue)
– Ruby client: for instance Workling
– documentation: some good tutorials, for example the railscast about starling and workling or this blog post about starling

Kestrel

– written by Robey Pointer
– Starling clone written in Scala (a port of Starling from Ruby to Scala)
– Queues are stored in memory, but logged on disk
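
The idea of an in-memory queue that is also logged on disk can be sketched as follows. This is an illustration of the general technique only, not Kestrel's actual journal format; the class JournaledQueue is a hypothetical name, and the sketch assumes that items are single-line strings.

```ruby
require 'tmpdir'

# Hypothetical journaled queue: items live in an Array, and every change
# is appended to a log file so the queue can be rebuilt after a restart.
class JournaledQueue
  def initialize(path)
    @path  = path
    @items = []
    replay if File.exist?(path)
    @log = File.open(path, 'a')
  end

  def push(item)
    @log.puts("PUSH #{item}")
    @log.flush
    @items.push(item)
  end

  def pop
    return nil if @items.empty?
    @log.puts("POP")
    @log.flush
    @items.shift
  end

  def size
    @items.size
  end

  def close
    @log.close
  end

  private

  # Rebuild the in-memory state by re-applying every logged operation.
  def replay
    File.foreach(@path) do |line|
      op, item = line.chomp.split(' ', 2)
      op == 'PUSH' ? @items.push(item) : @items.shift
    end
  end
end

path = File.join(Dir.mktmpdir, 'queue.log')
q = JournaledQueue.new(path)
q.push('resize image 1')
q.push('resize image 2')
q.pop
q.close

restarted = JournaledQueue.new(path)  # simulates a server restart
puts restarted.pop                    # prints "resize image 2"
```

Reads and writes are served from memory, and the log is only read back at startup, which is how such a server can be fast and still survive a restart.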

RabbitMQ

– RabbitMQ is a Message Queue Server in Erlang
– stores jobs in memory (message queue)
– Ruby client: AMQP/Carrot/Workling/Minion

Apache ActiveMQ

– ActiveMQ is an open source message broker in Java
– Ruby client: ActiveMessaging can be used to access ActiveMQ

Beanstalkd

– written by Philotic, Inc. to improve the response time of a Facebook application
– in-memory workqueue service mostly written in C
– Ruby client: async-observer or beanstalk-client-ruby
– Documentation: http://nubyonrails.com/articles/about-this-blog-beanstalk-messaging-queue

Amazon SQS

– Amazon Simple Queue Service (SQS)
– Ruby client: workling

( The alarm clock picture is from Flickr user tripplehelix )


RabbitMQ and AMQP in Ruby

Ilya Grigorik recently wrote a nice summary about message handling with AMQP. Rany Keddo, the author of the Workling plugin, wrote another nice post about RabbitMQ with amqp. What is the simplest example of using an AMQP message queue from Ruby? And what is an AMQP message queue anyway?

Message queues are useful for storing a large number of items that cannot be processed immediately and, more generally, for achieving loose coupling between the parts of an application. There are two standard protocols for asynchronous message queues, AMQP and XMPP. AMQP is the lighter one and uses a binary format, while XMPP uses XML. AMQP stands for Advanced Message Queuing Protocol; it is an open standard application layer protocol for message-oriented middleware.

To use AMQP in Ruby, you can use either the carrot gem (a synchronous AMQP client without using EventMachine) or the amqp gem (a simple AMQP driver for Ruby using EventMachine). You can install them both at once with

$ sudo gem install amqp carrot

You also need to install a server that implements AMQP. RabbitMQ is an open source AMQP broker written in Erlang. To install the Erlang package and the RabbitMQ daemon under Ubuntu or Debian, try

$ sudo apt-get install rabbitmq-server

Then you can start or stop the RabbitMQ daemon (the Linux counterpart of a service in Windows) with

$ sudo /etc/init.d/rabbitmq-server start
$ sudo /etc/init.d/rabbitmq-server stop

Here is a simple example of a publish/subscribe message exchange using an AMQP message queue. The information flow in the publish/subscribe pattern mirrors the client/server pattern: here the server is sending (publishing) and the client is listening (subscribing), whereas in the client/server pattern the server is listening and the client is sending. Run ruby client.rb in one console and ruby server.rb in another:

client.rb (subscriber)

require 'rubygems'
require 'amqp'
require 'mq'

AMQP.start(:host => 'localhost' ) do
 q = MQ.new.queue('tasks')
 q.subscribe do |msg|
   puts msg
 end
end

server.rb (publisher)

require 'rubygems'
require 'amqp'
require 'mq'

AMQP.start(:host => 'localhost') do
 MQ.queue('tasks').publish("hello world")
 MQ.queue('tasks').publish("it is #{Time.now}")
 AMQP.stop { EM.stop }
end

It is also possible to use EM.run instead of AMQP.start. It is necessary to call AMQP.stop in the publisher; otherwise the EventMachine event loop keeps running and the script never exits. Another possibility is using the carrot gem:

client.rb (subscriber)

require 'rubygems'
require 'carrot'

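# :durable => true asks the broker to keep the queue across restarts
q = Carrot.queue('tasks', :durable => true)

puts "count: #{q.message_count}"
# pop returns nil once the queue is empty, which ends the loop
while msg = q.pop(:ack => true)
 puts msg
 q.ack   # acknowledge the message so the broker can discard it
end
Carrot.stop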

server.rb (publisher)

require 'rubygems'
require 'carrot'
# declare the queue with the same options as the subscriber
q = Carrot.queue('tasks', :durable => true)
q.publish("hello world")
q.publish("it is #{Time.now}")