Alright! Most of us have built applications with scheduler methods, mainly for handling background tasks that don't involve user or inter-service interaction. In this article, we will look at the problem that kicks in when you start to horizontally scale an application that contains schedulers. Horizontal scaling, also known as X-axis scaling, ensures high availability: if a node goes down for some unknown reason, the remaining instances keep serving.
The scheduled processes in our scaled-out application will run in parallel multiple times (once per instance). Most of the time, this is not what we want. Below are some of my thought processes for ensuring a scheduled task runs only once.
Thought process #1 — Conditionally enabling the scheduler methods:
A simple, brute-force way is to define a new property that tells the application whether to enable the scheduler. The following code snippet is an example.
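A minimal sketch of this approach (the property name `is.scheduler.enabled`, the class name, and the report task are illustrative assumptions):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ReportScheduler {

    // Flag injected from the JVM system property / application properties,
    // defaulting to false when the property is absent
    @Value("${is.scheduler.enabled:false}")
    private boolean schedulerEnabled;

    // This method fires on EVERY instance; the guard clause decides
    // whether this particular instance actually does the work.
    @Scheduled(fixedRate = 60_000)
    public void generateReport() {
        if (!schedulerEnabled) {
            return; // scheduler disabled on this instance
        }
        // ... actual background work goes here ...
    }
}
```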
And use the following command to start the application.
root@XYZ-$ java -Dis.scheduler.enabled=true -jar my-app.jar
If you pass in true, the scheduler method will execute; for any other value, it will not.
But is this the best way to conditionally enable a specific scheduler? The answer is no! The problem with the above approach is that the scheduler method is still invoked on every instance, and we are merely adding a condition to enable or disable its body. If we have multiple scheduler methods, this leads to boilerplate: an if condition in every single one of them.
A better way is to use the @ConditionalOnProperty annotation, so that the scheduler object is injected as a bean into the Spring container only when the property is set on start-up. For this to work, you need to remove the @Component annotation from the top of your scheduler class. If you are wondering how this works, it is very simple: Spring's scheduler executes a @Scheduled method only if that method's bean is available in the container.
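A sketch of this, assuming the scheduler class is named ReportScheduler and no longer carries @Component:

```java
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SchedulerConfig {

    // The bean is registered only when is.scheduler.enabled=true.
    // On all other instances no bean exists, so its @Scheduled
    // methods are never picked up by Spring's scheduler.
    @Bean
    @ConditionalOnProperty(name = "is.scheduler.enabled", havingValue = "true")
    public ReportScheduler reportScheduler() {
        return new ReportScheduler();
    }
}
```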
So, this way, we can make sure the scheduler is enabled based on the given property, and activate it in only one of the load-balanced instances. But there is a problem with this approach too. Have you ever heard of the famous Murphy's law? "Anything that can go wrong will go wrong."
There is every chance that the node running the scheduler-enabled instance goes down, which would break service continuity and cause downtime.
Thought process #2 — Separating the concerns AKA Functional decomposition:
If you have worked on a microservices architecture, you might have come across this principle: separation of concerns, otherwise known as functional decomposition. Functional decomposition means breaking down a large, complex application into small functional parts. This is also known as Y-axis scaling, as shown in the figure at the top.
We can split our application into multiple smaller parts. That way, we can have the scheduled tasks in a separate microservice. We can run this in a single instance since we don’t need to share the load for background tasks.
But is this the best solution overall? I would say it is a good one, but we still have a problem. Remember the law? If the node where our single instance is running goes down, we face downtime again.
Thought process #3 — Common data source based locks:
Similar to how we lock a resource in parallel computing, we can use a lock based on a common data source to ensure that the scheduled task is executed only once. This is by far the best approach that I've come across. For this exact purpose, Lukáš Křečan has created a library named ShedLock. The high-level idea of ShedLock is pretty simple: when an instance executes a scheduler method of your application, an update call is made against the data source using the provided configuration. The configuration contains a unique name for the scheduler, and the lock row tracks three other columns: lock_until, locked_at, and locked_by. If another instance tries to run the same scheduler method, it first checks whether the lock is held; if it is, the method is not executed.
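Conceptually, acquiring the lock boils down to a conditional update like the following. This is a simplified sketch, not ShedLock's actual statement, which varies by lock provider:

```sql
-- <now>, <now + lockAtMostFor> and <hostname> are placeholders filled
-- in by the library at runtime.
UPDATE shedlock
SET lock_until = <now + lockAtMostFor>,
    locked_at  = <now>,
    locked_by  = <hostname>
WHERE name = 'generateReport'
  AND lock_until <= <now>;
-- 1 row updated  -> lock acquired, this instance runs the task
-- 0 rows updated -> another instance holds the lock, skip
```

Because the update is atomic, only one instance can win the lock for a given task name, no matter how many instances fire the schedule at the same moment.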
Let’s see how we can implement this. In the first step, we need the following dependencies.
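For a Maven build, they might look like the following. The version shown is an assumption; check ShedLock's GitHub page for the current release:

```xml
<dependency>
    <groupId>net.javacrumbs.shedlock</groupId>
    <artifactId>shedlock-spring</artifactId>
    <version>4.44.0</version>
</dependency>
<dependency>
    <groupId>net.javacrumbs.shedlock</groupId>
    <artifactId>shedlock-provider-jdbc-template</artifactId>
    <version>4.44.0</version>
</dependency>
```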
For this case, I'll use a MySQL-based data source, but ShedLock supports many data sources. We need a table named shedlock in our database. It can be created with the following DDL.
CREATE TABLE shedlock (
    name       VARCHAR(64)  NOT NULL,
    lock_until TIMESTAMP(3) NOT NULL,
    locked_at  TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
    locked_by  VARCHAR(255) NOT NULL,
    PRIMARY KEY (name)
);
The defaultLockAtMostFor and defaultLockAtLeastFor settings are the default configuration for all the scheduler locks used in our application. The "lock at most for" setting specifies the maximum amount of time the lock stays valid after being acquired, acting as a safety net in case the locking node dies mid-task. Similarly, the "lock at least for" setting specifies the minimum amount of time the lock stays valid, which stops a faster node from immediately re-running a task that just finished.
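Such a configuration might look like this; the 10m/1m durations are illustrative, not recommendations:

```java
import javax.sql.DataSource;
import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider;
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.EnableScheduling;

@Configuration
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "10m", defaultLockAtLeastFor = "1m")
public class ShedLockConfig {

    // LockProvider backed by the shedlock table in our MySQL data source
    @Bean
    public LockProvider lockProvider(DataSource dataSource) {
        return new JdbcTemplateLockProvider(new JdbcTemplate(dataSource));
    }
}
```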
Now let’s look at our scheduler implementation.
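A sketch of a locked scheduler method; the task name, cron expression, and per-task lock durations are illustrative:

```java
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ReportScheduler {

    // The name must be unique per task; it becomes the primary key
    // of the corresponding row in the shedlock table.
    @Scheduled(cron = "0 0/5 * * * ?")
    @SchedulerLock(name = "generateReport",
                   lockAtMostFor = "4m", lockAtLeastFor = "1m")
    public void generateReport() {
        // Only the instance that acquired the lock reaches this point.
        // ... actual background work goes here ...
    }
}
```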
Once we run this, we will be able to see the lock entry in our shedlock table.
Following is the high-level sequence diagram for how ShedLock works.
If you would like to understand ShedLock a bit more, you can try reading this article, Lock @Scheduled Tasks With ShedLock And Spring Boot, by Dhananjay Kr. You can also visit ShedLock’s GitHub repository directly.
So, out of my three different thought processes, if you ask me which is the best way to go, my suggestion would be a combination of #2 and #3. It is always better to have a separate microservice for each concern, and if high availability is needed for background processes, we can go ahead with scaling plus data source-based locking.