Database Caching Example

Problem

Databases play a vital role in software applications—they need to keep updated data or state, which is served by the database acting as the source of truth for the application. How the database performs affects how the entire application performs. Besides obvious factors that affect the performance of a database (hardware, database type, networking infrastructure), there are techniques designed to help you improve the performance of your database and ultimately, your applications. One way to do this is to add caching to your database. But which cache technique works best for which application requirements? This article sheds light on one strategy to implement a cache system.

Solution

A high-speed data storage layer, caching is a data buffering technique that stores frequently queried (popular) data in temporary memory. There are different ways to configure your cache relative to your application and database, like using keyboard shortcuts or creating a file shortcut on your desktop.

This article focuses on using the Cache-aside (Lazy Loading) technique to configure a cache system for your application. You will learn basic concepts of caching data, how caching improves your application’s performance, and known cons to be aware of. We will also implement a simple Lazy Loading cache system using Redis, Python, and Flask.

Prerequisites

  • Python
  • Linux OS
  • Flask
  • Visual Studio Code
  • Basic Python knowledge
  • Basic Flask knowledge
  • Redis

How Cache-aside Works

Cache-aside, as the word suggests, is a cache technique where the cache and database get queried separately.

This is how the query flows for GET requests:

  1. When the application creates a query, the application will read the present data in the cache.
  2. This query is sent to the cache first.
    1. If there is a match (cache hit) in the cache, a response is sent back to the application from the cache.
    2. But, if there is no match (cache miss) for the query in the cache, the query is sent to the database. The response from the database is sent to the cache before it is ultimately sent to the application.

Below is a diagram to represent this query/response flow:

query/response flow

The above image shows what happens when the application sends a GET request. But what happens when the application sends a write (PUT, POST, or DELETE) request? Since the Cache-aside treats the database and cache as independent entities, there is no direct communication between the database and cache.

Cache Invalidation and Expiry Policy

Cache invalidation is an important process for maintaining accurate and up-to-date data in a cache. There are several methods of cache invalidation.

This tutorial implements the Write-through Cache Invalidation. This approach will update both the database and cache when a write operation is triggered by the application. As much as this approach minimizes stale data, sometimes you might have data in a cache for a long period without updating. Since there are small windows where data can be altered in the database without the cache being updated, hosting data in the cache for too long increases the chances of stale data.

There is a cache expiry policy for this reason. The policy assigns each item in the cache a time-to-live (TTL) variable that determines how long the item will be stored in the cache until it’s marked as expired. Once an item is marked as expired, the application will remove or update the item.

To get a reasonable TTL for your application, you need to evaluate the rate at which your data changes. Avoid having a global TTL for your cache items because the expiration will run for multiple items at the same time, potentially overwhelming the database. Consider making your TTL run randomly using the Jitter pattern.

When and Why to Use Caches

Before implementing a cache system, you need to know what to cache and when. Over-caching can lead to stale data, latency, and a complex cache management system. On the other hand, under-caching can lead to longer latencies, higher database load, and a constant need to scale database resources.

To get a good idea of how to configure your cache, analyze the workload patterns of your database or data store. These analysis strategies include the Read/Write analysis, a ratio of read operations compared to write operations. Data with a high read ratio implies that data is being frequently accessed, but is rarely changed – this is an ideal situation for cache. However, when there is a high write ratio, it implies that your application is dynamic and you would need to implement strict and more complex cache invalidation strategies.

Cache-aside Pros

This Cache-aside strategy significantly helps with the following:

  • Offline Application Operations: Caches help you store data locally on a device, i.e., your application can still be used without being online. This can be seen with banking applications.
  • Resiliency: The cache-aside strategy is resilient to a potential cache-layer failure since the application can still communicate with the database despite the cache failing.
  • Database and Cache Decoupling: As the database and the cache are separated, you can implement different data models for each and manage them separately.
  • Data Persistency: Using the write-through patterns, you can ensure that the data on the database and cache is up to date.
  • Responsiveness: Implementing a good cache strategy will improve your application’s response time since the user’s requests are fetched from the cache and not through the network.

Cache-aside Cons

There are certain instances you do not want to use a cache-aside strategy. Here are some of its shortcomings:

  • Management: A cache system has to be managed; otherwise, it can introduce potential bugs to your application.
  • High Latency: Cache-aside can increase latency because it updates both the database and cache separately.

Now that you have covered some of the basics of caching, let’s implement a simple cache-aside caching system in the following section.

Example: Simple Cache-aside System

Using the Flask Framework

Flask is a Python framework for web development, having predefined functions to make web development using Python easier. In this tutorial, you will use Flask to handle API functions, making it easy to relate your resources to your endpoints, as you will see during the tutorial.

Use the following command to install Flask on your environment:

pip install flask

Creating a Database and Table

Flask can work with different databases, with SQLite3 being the default, and we will use SQLite3 to create a simple database to host user data using the following Python code:

#sqlite_db.py
 
import sqlite3
 
# Connect to SQLite database
conn = sqlite3.connect("users.db")
cursor = conn.cursor()
 
# Create users table
cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
)
""")
 
# Insert sample data
cursor.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("Alice", "alice@example.com"), ("Bob", "bob@example.com"), ("Charlie", "charlie@example.com")]
)
 
conn.commit()
conn.close()
print("Database initialized with sample users.")

To run this script, simply navigate to the folder the file is under and use the following Python command:

Python <file name>.py

Preparing the Cache and Application Endpoints

This tutorial uses a Redis cache client to create a cache for your application. To ensure that you have the Redis server installed on your machine, check by running the following code:

redis-server --version

If this does not return an instance of the Redis server, it might not be installed in your environment. To install the Redis server, run the following commands:

sudo apt update
sudo apt install redis-server

Then run the initial command again to ensure Redis is installed:

redis-server --version

Once Redis is installed, you can manually run Redis with the following code:

redis-server

At this point, you have created the SQLite3 database and populated it with sample user data and ran the Redis cache server.

To run your application, which glues the interaction between your database and requests, create a Flask application and add the following Python code:

from flask import Flask, jsonify
import sqlite3
import redis
import json
import time
 
# Initialize Flask
app = Flask(__name__)
 
# Connect to Redis
redis_client = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)
 
# Function to get user from SQLite with execution time tracking
def get_user_from_db(user_id):
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    
    start_time = time.time()  # Start timing
    cursor.execute("SELECT id, name, email FROM users WHERE id = ?", (user_id,))
    user = cursor.fetchone()
    db_time = time.time() - start_time  # Calculate execution time
    
    conn.close()
    
    if user:
        return {"id": user[0], "name": user[1], "email": user[2]}, db_time
    return None, db_time
 
# API endpoint with caching and execution time tracking
@app.route("/user/<int:user_id>", methods=["GET"])
def get_user(user_id):
    cache_key = f"user:{user_id}"
    
    # Measure cache lookup time
    start_cache_time = time.time()
    cached_user = redis_client.get(cache_key)
    cache_time = time.time() - start_cache_time  # Execution time
    
    if cached_user:
        print(f"Cache Hit ✅ (Time: {cache_time:.6f}s)")
        return jsonify({"source": "cache", "response_time": cache_time, **json.loads(cached_user)})  # Return from cache
 
    print(f"Cache Miss ❌ - Fetching from DB (Cache Time: {cache_time:.6f}s)")
    user, db_time = get_user_from_db(user_id)
    
    if user:
        redis_client.setex(cache_key, 300, json.dumps(user))  # Store in cache for 5 min
        return jsonify({"source": "database", "response_time": db_time, **user})
    
    return jsonify({"error": "User not found"}), 404
 
# API endpoint to update user (cache invalidation)
@app.route("/user/<int:user_id>", methods=["PUT"])
def update_user(user_id):
    import random
    new_name = f"User{random.randint(100, 999)}"
    new_email = f"user{random.randint(100, 999)}@example.com"
    
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    
    cursor.execute("UPDATE users SET name = ?, email = ? WHERE id = ?", (new_name, new_email, user_id))
    conn.commit()
    conn.close()
 
    # Invalidate cache
    redis_client.delete(f"user:{user_id}")
    print("Cache Invalidated ❌")
 
    return jsonify({"message": "User updated successfully", "new_name": new_name, "new_email": new_email})
 
# Home route
@app.route("/")
def home():
    return jsonify({"message": "Flask is running!"})
 
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)

While in the directory, run your code using the following python command:

Python <your-filename.py>

Testing the Cache

It is time to test how this cache system will work. For the sake of simplicity, run a test by calling read operations to will read the stored data information using this curl command:

curl http://127.0.0.1:5000/user/1

The first time you run this on your terminal, you will observe a Cache Miss and the query going to the database:

Testing the cache

As shown above, if an application has a high read ratio, it will have a few Cache Misses, but ultimately, you will start getting more and more Cache Hits.

Let’s look at the curl requests to see the time spent executing the read operations from a database compared to a cache.

curl requests

When you run the curl read operation again, the data will be pulled from the cache:

curl read opertion

Please note that this is a simplified example. You will need to run this against your data to get your results.

Conclusion

Caching can significantly increase the performance of your database if applied correctly. There are different strategies to implement a cache system, and analyses to perform to determine what and when you should cache. Additionally, we discussed a specific strategy, the cache-aside approach to creating a cache, and implemented a simple cache system.

From here, you should dive deeper into analysis to help optimize your cache and read about other strategies or approaches to caching.

Next Steps

Leave a Reply

Your email address will not be published. Required fields are marked *