Patterns of SOA: Background Job

This entry in our “Patterns of Service-oriented Architecture” series is a very common pattern, but it still bears discussion: running code in a background process instead of in a synchronous request that a Consumer might be waiting on.

Intent

Run expensive work in a background job instead of a web request to avoid timeouts, poor user experience, and unnecessary resource use.

Motivation

When a user or system makes a request over HTTP, not all work needed to fulfill that request is required to give a complete response. This differs from Asynchronous Transaction, where long-running work is needed to return a response.

Here, any work not needed to respond to the request can be done later or offline. This is often a surprising amount of work! Performing it offline allows the web request to deliver a response as quickly as possible by doing only the work that is necessary to generate that response.

For example, when we process a returned shipment in our warehouse, the warehouse associate indicates which items were returned. All they need to complete their work is an acknowledgement that the system recorded the returned items. However, our returns business process will also automatically charge the customer if they didn’t pay for their Fix, or refund them if they accidentally returned an item they paid for.

The associate processing the return doesn’t need to wait for that charge or refund to complete—it’s irrelevant to their job. So, we run that code in a background job.

Applicability

Any work not needed to produce a definitive response to a request should go into a background job. This is often ancillary bookkeeping or other side work that needs to happen as a result of the request, but that doesn’t affect the response given to the user or client.

Note that if you can design your solution as a Parameterless Job, that is preferable, because you can simply run it periodically rather than explicitly queue it from code that could fail. Imagine a simple two-step process: update a database, then queue a Background Job. If the queuing fails (e.g. a momentary network outage to your queuing system), you have updated the database but lost the job. While you can queue the job inside a Database Transaction, this creates other problems (such as holding locks on a possibly shared table for too long). A Parameterless Job has none of these downsides, since it runs on a schedule and doesn’t have to be told explicitly what work to do: it examines the state of the system at the time it runs to decide what to do.
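To make the Parameterless Job idea concrete, here is a minimal Python sketch. It assumes a hypothetical returns table, modeled as an in-memory dict, with a `settled` flag. The names `Return`, `record_return`, `settle`, and `settle_pending_returns` are illustrative and do not come from any real system. The web request only writes a row. A scheduler (cron, or whatever your job system provides) would call `settle_pending_returns()` every few minutes, and the job works out what to do from the state it finds.

```python
from dataclasses import dataclass


@dataclass
class Return:
    return_id: int
    customer_id: int
    amount_owed_cents: int  # positive: charge the customer; negative: refund them
    settled: bool = False


# Hypothetical in-memory stand-in for a "returns" database table.
RETURNS: dict[int, Return] = {}


def record_return(return_id: int, customer_id: int, amount_owed_cents: int) -> None:
    """Web request path: only writes the row; there is no queuing step that could fail."""
    RETURNS[return_id] = Return(return_id, customer_id, amount_owed_cents)


def settle(ret: Return) -> None:
    """Hypothetical business logic; a real version would call a payment service."""
    ret.settled = True


def settle_pending_returns() -> list[int]:
    """Parameterless job: takes no arguments and finds its work from current state."""
    settled_ids = []
    for ret in RETURNS.values():
        # Only returns that owe money in either direction and are not yet settled.
        if not ret.settled and ret.amount_owed_cents != 0:
            settle(ret)
            settled_ids.append(ret.return_id)
    return settled_ids
```

Because the job derives its work from state, running it twice is harmless: the second run finds nothing pending. If a run fails partway through, the next scheduled run picks up whatever is still unsettled.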

Structure

The Background Job pattern looks like so:

In words:

1. The Consumer makes a request of a Service’s endpoint (e.g. a Controller in an MVC architecture).
2. The Service endpoint does the synchronous processing needed to return a response.
3. The Service queues a background job to perform the remainder of the needed processing.
4. The Background Job picks up the work and defers it to the underlying Business Logic (e.g. a service object), which does the actual work.

It’s customary that the code making up the Background Job only exists to determine what needs to be done. It doesn’t actually do the work, but defers to another class or object.
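The four steps above, plus the convention that the job only delegates, can be sketched in Python. The sketch uses the standard library's `queue.Queue` and a worker thread as a stand-in for a real job system. Everything here is hypothetical:

- `process_return_endpoint` plays the Service endpoint.
- `SettleReturnJob` plays the Background Job.
- `ReturnSettler` plays the Business Logic service object.

A production system would use a durable queue consumed by separate worker processes, not an in-memory queue that is lost when the process exits.

```python
import queue
import threading

# Hypothetical in-process queue standing in for a durable job queue.
job_queue: queue.Queue = queue.Queue()

recorded_items: list[str] = []
settled_returns: list[int] = []


class ReturnSettler:
    """Business Logic (a service object) that does the actual work."""

    def settle(self, return_id: int) -> None:
        # A real implementation would charge or refund via a payment provider.
        settled_returns.append(return_id)


class SettleReturnJob:
    """Background Job: decides what to do, then defers to the business logic."""

    def perform(self, return_id: int) -> None:
        ReturnSettler().settle(return_id)


# Maps job names in queued messages to job classes.
JOBS = {"SettleReturnJob": SettleReturnJob}


def process_return_endpoint(return_id: int, item_ids: list[str]) -> dict:
    """Service endpoint (e.g. a controller action) called by the Consumer."""
    # Step 2: only the synchronous work needed for the response.
    recorded_items.extend(item_ids)
    # Step 3: queue the remainder instead of doing it inline.
    job_queue.put({"job": "SettleReturnJob", "args": [return_id]})
    return {"status": "recorded", "items": item_ids}


def worker() -> None:
    """Step 4: pull jobs off the queue and run them until a None sentinel arrives."""
    while True:
        message = job_queue.get()
        if message is None:
            job_queue.task_done()
            break
        JOBS[message["job"]]().perform(*message["args"])
        job_queue.task_done()
```

To try it:

1. Start `worker` in a `threading.Thread`.
2. Call `process_return_endpoint(42, ["sku-1", "sku-2"])`.
3. Put `None` on the queue to stop the worker.

The endpoint returns as soon as the items are recorded; settlement happens afterward on the worker thread.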

Anti-Patterns/Gotchas

In a large service-oriented architecture, it’s often important that the code to perform particular logic live in the right place. Using a Background Job implies that the logic being backgrounded belongs in the application that is servicing the original request. Sometimes, however, that logic belongs elsewhere, and it might be more appropriate to Send a Message and allow another Service to handle the ancillary logic.

Background jobs can fail just like any other piece of code, so you need to be mindful of what happens when a job fails. A good practice is to configure all jobs for automatic retry. This in turn requires that each job be an Idempotent Operation, so it can be safely retried without its effect being felt more than once. The payoff is that you don’t have to babysit failed jobs or intervene manually. Still, jobs can fail permanently (e.g. an extended network outage or a bug in the code), so you’ll need some system in place to allow manual retry or manual clearing of jobs.
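Here is a minimal Python sketch of automatic retry paired with an Idempotent Operation. `TransientError`, `settle_return`, and `run_with_retry` are hypothetical names. The in-memory `already_settled` set stands in for whatever durable record your system would use. Jobs that exhaust their attempts land in `dead_jobs` so a person can retry or clear them manually.

One caveat: if the process crashed after the payment call but before the job recorded success, a retry would charge again. Real systems often close that gap by passing an idempotency key to the payment provider.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a network blip)."""


# Stand-in for a durable record of which returns were already settled.
already_settled: set[int] = set()
charges_made: list[int] = []
# Jobs that failed permanently, kept for manual retry or clearing.
dead_jobs: list[tuple[int, str]] = []


def settle_return(return_id: int, payment_call) -> None:
    """Idempotent: calling it again for a settled return does nothing."""
    if return_id in already_settled:
        return
    payment_call(return_id)
    already_settled.add(return_id)
    charges_made.append(return_id)


def run_with_retry(return_id: int, payment_call, max_attempts: int = 3,
                   base_delay: float = 0.01) -> bool:
    """Retries transient failures with exponential backoff; parks the job if they persist."""
    for attempt in range(1, max_attempts + 1):
        try:
            settle_return(return_id, payment_call)
            return True
        except TransientError as exc:
            if attempt == max_attempts:
                dead_jobs.append((return_id, str(exc)))
                return False
            time.sleep(base_delay * 2 ** (attempt - 1))
    return False
```

Because `settle_return` checks `already_settled` first, retrying a job that already succeeded is a no-op rather than a second charge.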

Dave Copeland

May 31, 2017 - Washington, DC
