Task queues, Redis, Python, Celery, RQ

  • There are many instances where your application has some expensive work to do, that would be not good to execute in the hot path. (e.g. responding to an HTTP request)
  • The typical solution is to enqueue a task in a queue and have a worker process it “offline”
  • In Python, Celery is a popular library for this. It uses backends for the actual queue. Popular backends are RabbitMQ and Redis.
  • RQ is another Python that supports Redis only.
  • RabbitMQ is designed to be a queue โ€” it’s in the name.
  • Redis is a in memory key value store/database. (or “data structure” store). It includes a number of primitives that might be used to implement a queue. It’s basically a in memory hashmap. Keys are strings in a flat namespace. Value are a set of supported fundamental data types. Everything is serialized as it’s interacted with IPC.
    • List โ€” (Implemented in RQ.) You can use opcodes like LPUSH and RPOP.
    • Pub/Sub โ€” Unsure about this. (Possibly implemented in Celery?)
    • Stream โ€” Advanced but apparently is not implemented in either Celery or RQ.
  • Celery has sleek Pythonic syntactic sugar for specifying a “Task” and then calling it from the client (web app). It completely abstracts the queue. It returns a future to the client (AsyncResult) โ€” the interface is conceptually similar to std::async in C++.
  • RQ is less opinionated and any callable can be passed into this .enqueue() function, with arguments to call it with. This has the advantage that the expensive code does not need to have Celery as a dependency (to decorate it). However that is not a real downside, as you can always keep things separate by making Celery wrappers around otherwise dependency free Python code. But it is an addition level of layers.
  • Heroku offers support for this โ€” you just need to add a new process to your Procfile for the celery/RQ worker process. Both celery/RQ generate a main.
  • Redis has other uses beyond being a queue: it can be a simple cache that you application accesses on the same server before accessing the real database server. It can also be used to implement a distributed lock (sounds fancy but is basically just a single entry in redis that clients can check to see if a “lock” is taken. Of course it’s more complicated than just that). Redis also supports transactions, in a similar way to transactional memory on CPUs. In general there are direct parallels from from local parallel programming to nearly everything in this distributed system world. That said there are unique elements โ€” like the Redis distributed lock includes concepts like a timeout and a nonce in case the client that acquired the lock crashes or disappears forever. That is generally not something you’d see in a mutex implementation. Another difference is that even though accessing Redis is shared mutable state, clients probably don’t need some other out of band mutex because Redis implements atomicity likely. That’s different than local systems because even if the shared, mutable data type you’re writing to/reading from locally is atomic (like a int/word), you should still use a mutex/atomic locally due to instruction reordering (mutexes and atomics insert barriers).

Any thoughts?