Short response times are critical for every web application. Time-consuming operations and long-running tasks that require intensive computation often cannot be processed during the normal HTTP request/response cycle without making the application unresponsive. The solution is background job processing: if you want to keep your app fast and responsive, move those long-running tasks into background processes. Once a job has been placed in a background queue, the application can return a response immediately. Examples include:

  1. pushing some data to a slow external service
  2. pulling data from a slow external service
  3. accessing a remote API (like posting something to Twitter)
  4. number crunching tasks
  5. managing large uploads or downloads
  6. processing huge multi-media files
  7. generating large PDFs or PDF reports
  8. generating image thumbnails
  9. sending bulk email, newsletters or SMS

Many big sites use background jobs and job queues to process time-consuming operations, for example Amazon, GitHub and Twitter, at least in their early days. Amazon now offers a queue service named Amazon SQS. Chris Wanstrath has written a short history of background jobs at GitHub. Twitter has tried various ways of using background processes, too.

There are in fact many options for background processing in Rails. The official wiki entry for background processes in Ruby on Rails is a bit sparse, and so is the entry about background jobs in Rails in the older wiki. Andy Stewart's overview of long-running tasks in Rails without much effort is better. Tobin Harris lists 6 ways to run background jobs in Ruby on Rails.

In general, one can distinguish between different forms of task storage and task execution. Tasks can be stored in the database, which means persistence and durability, or in a message queue (Amazon SQS, WebSphere MQ, RabbitMQ, ...), which means high performance if the queue operates in memory. Tasks can be executed immediately, for example by an always-running background daemon, or periodically, for instance by a cron job. For recurring jobs that have to be executed at a certain time of day, a rake task or script/runner invoked by a cron job is a good solution; this is perfect for jobs that should run once a day. If a task must be processed as soon as possible, usually some form of storage comes into play. The jobs can be stored in a message queue (MQ), in the database (DB) or not at all:

No queues or task storage

Spawn

=> use processes and threads, just fork a new process or create a new thread for each job

Database-driven Job Queues

BackgroundDRb, Background Job (BJ), DelayedJob (DJ), JobFu, Background-Fu

=> store jobs in persistent job queue table (database-driven Job queues), persistent but slow, closely integrated and tightly coupled to Rails project, use the SQL database as a message bus

Message Queues

Sparrow, Starling, Kestrel, RabbitMQ, Apache ActiveMQ, Amazon SQS, Beanstalkd

=> store jobs in memory (message queue), not always persistent but very fast, loosely coupled to Rails project, use a real message bus
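
To illustrate the message queue idea, here is a minimal sketch in plain Ruby. It uses the thread-safe Queue class from Ruby's core library as an in-process stand-in for a real message queue server; the names JOBS, enqueue and start_worker are made up for this example and do not belong to any of the libraries listed above.

```ruby
# Minimal in-process stand-in for a message queue; a real broker such as
# RabbitMQ or Beanstalkd would run as a separate server process.
JOBS = Queue.new   # thread-safe FIFO from Ruby's core library

# Producer side: the web app only enqueues a small message and moves on.
def enqueue(type, payload)
  JOBS << { type: type, payload: payload }
end

# Consumer side: a worker pops messages until it sees the :stop marker.
def start_worker(processed)
  Thread.new do
    loop do
      message = JOBS.pop                 # blocks while the queue is empty
      break if message[:type] == :stop
      processed << "#{message[:type]}:#{message[:payload]}"
    end
  end
end

processed = []
worker = start_worker(processed)
enqueue(:email, "newsletter-42")
enqueue(:thumbnail, "photo.png")
enqueue(:stop, nil)
worker.join
puts processed.inspect
```

The producer and the consumer share nothing but the message format, which is what makes this approach loosely coupled. Because the queue lives only in memory, pending messages are lost if the process dies.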

Now let us take a closer look at the different solutions to the same problem. Users prefer options that are simple to set up and simple to use. The simplest solutions are the database-driven job queues Background Job (BJ) and DelayedJob (DJ). DelayedJob is so popular that it also has two similar clones, JobFu and Background-Fu. Even simpler is to use no database or queue at all.

No queues or task storage

Spawn

– small plugin for Rails to easily fork or thread long-running code blocks
– executes the task in the background by creating a new child process (forking) or a new thread (threading)
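
The threading variant can be sketched in plain Ruby as follows. This is not the Spawn plugin's API: run_in_background and handle_request are hypothetical helpers that only show how a request handler can start the slow work and return right away.

```ruby
# Hypothetical helper, not the Spawn plugin's API: run a block in a
# background thread and return the Thread so the caller can join it later.
def run_in_background(&job)
  Thread.new do
    begin
      job.call
    rescue StandardError => e
      # A failure in the job must not take down the request thread.
      warn "background job failed: #{e.message}"
      nil
    end
  end
end

# Simulated request handler: starts the slow work and returns immediately.
def handle_request(results)
  worker = run_in_background do
    sleep 0.1                    # stands in for a slow task
    results << :thumbnail_done
  end
  [:response_sent, worker]
end

results = Queue.new
status, worker = handle_request(results)
puts status                      # the response is ready before the job finishes
worker.join                      # only for the demo: wait for the job
puts results.pop
```

Nothing is stored anywhere, so a job that is still running when the process exits is simply lost. That is the price of having no queue or task storage.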

Plugins for Database-driven Job Queues

BackgroundDRb

Written by Ezra Zygmuntowicz, originally used Distributed Ruby (DRb)
– stores jobs in persistent job queue table
– queue is processed using specific, hard-wired worker objects
– queue is filled by “MiddleMan” objects
– BackgroundDRb server “backgroundrb” contains workers and listens on certain port for new events/requests

Background Job (BJ)

– stores jobs in persistent job queue table and processes the jobs of the table in the background
– simple and robust solution, but how you structure the jobs is largely up to you. Jobs have no direct connection to Rails models or their methods: Bj runs jobs as command-line applications, i.e. it just runs the Ruby or bash scripts you specify. A good “general purpose solution”.
– you need to use script/runner or rake tasks to access Rails models, which loads the entire Rails environment for each job
– no daemon process for processing the jobs, workers are not persistent, only one background process started or signalled for each stored job
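
The "jobs are command-line applications" idea can be sketched roughly like this. It is not Bj's actual code: JobRow is a hypothetical in-memory stand-in for the jobs table, and run_pending is an assumed name for a single pass over the pending rows.

```ruby
require "open3"
require "rbconfig"

# Hypothetical in-memory stand-in for a jobs table; Bj keeps jobs in the database.
JobRow = Struct.new(:id, :command, :state, :output, :exit_status)

# Run every pending row as its own operating system process and record the result.
def run_pending(rows)
  rows.select { |row| row.state == :pending }.each do |row|
    output, status = Open3.capture2e(*row.command)   # stdout and stderr combined
    row.output = output
    row.exit_status = status.exitstatus
    row.state = status.success? ? :finished : :failed
  end
end

ruby = RbConfig.ruby   # path of the Ruby interpreter running this script
rows = [
  JobRow.new(1, [ruby, "-e", "puts 6 * 7"], :pending),
  JobRow.new(2, [ruby, "-e", "exit 3"], :pending)
]
run_pending(rows)
rows.each { |row| puts "job #{row.id}: #{row.state} (exit #{row.exit_status})" }
```

Each job pays the start-up cost of a fresh process, which is why loading the whole Rails environment per job makes this approach slow for many small jobs.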

DelayedJob (DJ)

– Written by Tobias Lütke (Tobi), used by GitHub in the past
– needs the daemons gem to create a background daemon process
– stores jobs in persistent job queue table (“delayed_jobs”)
– you can turn any method call into a job to be processed later: a job is queued by calling send_later(method, params) on any object. You can also use custom job classes.
– the queue is processed by a rake task (rake jobs:work) or a script “delayed_job” which starts and stops a daemon process to process the queue
– the daemon will check for queued background jobs every 5 seconds
– good documentation and many tutorials
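
The idea of turning a method call into a stored job can be sketched as follows. This is not DelayedJob's implementation: TinyJobTable, its methods and the Mailer struct are hypothetical, the "table" is an in-memory array, and Marshal is used for serialization only because it ships with Ruby.

```ruby
# Hypothetical stand-in for a jobs table: each row holds a serialized call.
class TinyJobTable
  Row = Struct.new(:id, :handler, :run_at)

  def initialize
    @rows = []
    @next_id = 0
  end

  # Store "receiver.method_name(*args)" as a row instead of calling it now.
  def enqueue(receiver, method_name, *args)
    handler = Marshal.dump([receiver, method_name, args])
    @rows << Row.new(@next_id += 1, handler, Time.now)
    @rows.last.id
  end

  # One polling pass: run every due row, remove it, return the results.
  def work_off(now = Time.now)
    due, @rows = @rows.partition { |row| row.run_at <= now }
    due.map do |row|
      receiver, method_name, args = Marshal.load(row.handler)
      receiver.public_send(method_name, *args)
    end
  end

  def size
    @rows.size
  end
end

# Example receiver; any object that can be serialized would do.
Mailer = Struct.new(:sender) do
  def deliver(recipient)
    "mail from #{sender} to #{recipient}"
  end
end

table = TinyJobTable.new
table.enqueue(Mailer.new("app@example.com"), :deliver, "user@example.com")
puts table.size               # the call is stored, not executed
puts table.work_off.inspect   # a worker pass executes it
```

A daemon in this style would call work_off in a loop and sleep for a few seconds between passes, which corresponds to the polling interval mentioned above.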

JobFu

– written by Jon Stenqvist
– needs the daemons gem to create a background daemon process
– similar to delayed_job (a delayed_job clone)
– stores jobs in persistent job queue table (“jobs”)
– you can turn any method call into a job to be processed later (using the “Backgrounded handler” syntax)

Background-Fu

– written by Jacek Becela
– needs the daemons gem to create a background daemon process
– similar to delayed_job (a delayed_job clone)
– stores jobs in persistent job queue table (“jobs”)
– the daemon will check for queued background jobs every 5 seconds

Message Queue Servers

Message queue servers are written in various languages: Erlang (RabbitMQ), C (beanstalkd), Ruby (Starling and Sparrow), Scala (Kestrel) and Java (ActiveMQ).

Sparrow

– written by Alex MacCaw
– Sparrow is a lightweight queue written in Ruby that “speaks memcache”

Starling

– written by Blaine Cook at Twitter
– Starling is a message queue server that speaks the MemCache protocol
– written in Ruby
– stores jobs in memory (message queue)
– Ruby client: for instance Workling
– documentation: some good tutorials, for example the railscast about starling and workling or this blog post about starling

Kestrel

– written by Robey Pointer
– Starling clone written in Scala (a port of Starling from Ruby to Scala)
– Queues are stored in memory, but logged on disk

RabbitMQ

– RabbitMQ is a Message Queue Server in Erlang
– stores jobs in memory (message queue)
– Ruby client: AMQP/Carrot/Workling/Minion

Apache ActiveMQ

– ActiveMQ is an open source message broker in Java
– Ruby client: ActiveMessaging can be used to access ActiveMQ

Beanstalkd

– written by Philotic, Inc. to improve the response time of a Facebook application
– in-memory workqueue service mostly written in C
– Ruby client: async-observer or beanstalk-client-ruby
– Documentation: http://nubyonrails.com/articles/about-this-blog-beanstalk-messaging-queue

Amazon SQS

Amazon Simple Queue Service
– Ruby client: workling

( The alarm clock picture is from Flickr user tripplehelix )

