Relational database records often require transitions between various statuses; for example active
, pending
, deleted
etc.
Various database structures may be used to store this status.
The most naive database design would simply store this status
field as a varchar type
-- Postgres
DROP TABLE IF EXISTS product CASCADE;
CREATE TABLE product (
product_id SERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
sku VARCHAR(50) NOT NULL,
status VARCHAR(20) NOT NULL
);
This improved solution makes use of an ENUM
type to define status.
-- Postgres
DROP TYPE IF EXISTS product_status CASCADE;
CREATE TYPE product_status AS ENUM ('in stock', 'on order', 'unavailable', 'deleted');
DROP TABLE IF EXISTS product CASCADE;
CREATE TABLE product (
product_id SERIAL PRIMARY KEY,
title VARCHAR(200) NOT NULL,
sku VARCHAR(200) NOT NULL,
status product_status
);
This limits the possible value of status
to one of ‘in stock’, ‘on order’, ‘unavailable’ or ‘deleted’.
There are several benefits of using an enum
type over a varchar
:
CHECK
constraint).enum
values are internally sorted according to their order in the enum
type declaration, not their lexicographical order. This can lead to more efficient sorting and indexing.enum
types can make your code more readable and self-documenting by making it clear what values are allowed for a field.enum
values are stored as integers, which can be more space-efficient than varchar
.However, adding new values to an enum
type requires an ALTER TYPE
statement, which can be a heavy operation if your database is large.
These enum status values have the following semantics with regards to a Product:
Value | In (warehouse) stock | On back order | Buyable | Visible in Order History |
---|---|---|---|---|
in stock |
Yes | No | Yes | Yes |
on order |
No | Yes | Yes | Yes |
unavailable |
No | No | No | Yes |
deleted |
No | No | No | No |
These now need to be implemented in business logic.
Something like:
# status.py
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class ProductStatus:
"""A data model for product status"""
is_in_stock: bool
is_on_back_order: bool
is_buyable: bool
is_active: bool
@classmethod
def create(cls, status: str) -> ProductStatus:
"""Create a `ProductStatus` instance derived from the given string"""
match status.lower():
case "in stock":
return ProductStatus(
is_in_stock=True,
is_on_back_order=False,
is_buyable=True,
is_active=True,
)
case "on order":
return ProductStatus(
is_in_stock=False,
is_on_back_order=True,
is_buyable=True,
is_active=True,
)
case "unavailable":
return ProductStatus(
is_in_stock=False,
is_on_back_order=False,
is_buyable=False,
is_active=True,
)
case "deleted":
return ProductStatus(
is_in_stock=False,
is_on_back_order=False,
is_buyable=False,
is_active=False,
)
case _:
raise ValueError(f"Unable to determine product status '{status}'")
This works well enough, but it does split the domain between the database and the code base. It would be better if we could represent the state better within the database.
In order to store these state values better in the database, we could add a few columns to the product
table:
-- Postgres
DROP TABLE IF EXISTS product CASCADE;
CREATE TABLE product (
product_id SERIAL PRIMARY KEY,
title VARCHAR(250) NOT NULL,
sku VARCHAR(200) NOT NULL,
is_in_stock BOOLEAN NOT NULL,
is_on_back_order BOOLEAN NOT NULL,
is_buyable BOOLEAN NOT NULL,
is_active BOOLEAN NOT NULL
);
This is an improvement, as we now have status attributes for each product record.
But, some limitations remain.
We cannot add any metadata to the various status flags. We also would need to add further columns if we ever needed a status that requires additional state flags. This would necessitate an ALTER
operation on our large product
table.
The best solution would be to abstract product status from the product
table.
To achieve this, we normalise the database structure by adding a foreign key to a product_status
table:
-- Postgres
DROP TABLE IF EXISTS product_status CASCADE;
CREATE TABLE product_status (
product_status_id SERIAL PRIMARY KEY,
product_status_usid VARCHAR(50) NOT NULL UNIQUE, -- unique string identifier
description VARCHAR(250) NULL,
is_in_stock BOOLEAN NOT NULL,
is_on_back_order BOOLEAN NOT NULL,
is_buyable BOOLEAN NOT NULL,
is_active BOOLEAN NOT NULL
);
DROP TABLE IF EXISTS product CASCADE;
CREATE TABLE product (
product_id SERIAL PRIMARY KEY,
title VARCHAR(250) NOT NULL,
sku VARCHAR(200) NOT NULL,
product_status_id INTEGER NOT NULL,
FOREIGN KEY (product_status_id) REFERENCES product_status (product_status_id)
);
Next, let’s create records for the various status values, and associated state flags.
-- Postgres
INSERT INTO product_status
(product_status_usid, description, is_in_stock, is_on_back_order, is_buyable, is_active)
VALUES
('in stock', 'Product is in stock', true, false, true, true),
('on order', 'Product is on back order', false, true, true, true),
('unavailable', 'Product is unavailable', false, false, false, true),
('deleted', 'Product is deleted', false, false, false, false)
;
SELECT * FROM product_status;
Which gives us:
product_status_id | product_status_usid | description | is_in_stock | is_on_back_order | is_buyable | is_active
-------------------+---------------------+--------------------------+-------------+------------------+------------+-----------
1 | in stock | Product is in stock | t | f | t | t
2 | on order | Product is on back order | f | t | t | t
3 | unavailable | Product is unavailable | f | f | f | t
4 | deleted | Product is deleted | f | f | f | f
(4 rows)
And add some junk product data:
INSERT INTO product
(title, sku, product_status_id)
VALUES
('EcoBoost Portable Charger', 'SKU-ECB-1234', 1),
('AquaPure Water Filter', 'SKU-AQPF-5678', 2),
('SolarGlow Garden Lights', 'SKU-SGL-9101', 3),
('FitFlex Yoga Mat', 'SKU-FFYM-1121', 4),
('BreezeAir Conditioner', 'SKU-BAC-3141', 1),
('CrispSound Bluetooth Speaker', 'SKU-CSBS-5161', 2),
('SmoothBlend Juicer', 'SKU-SBJ-7181', 3),
('QuickCook Microwave Oven', 'SKU-QCMO-9201', 4),
('UltraView Binoculars', 'SKU-UVB-1221', 1),
('ProFit Running Shoes', 'SKU-PFRS-3241', 1)
;
The unique string identifier (usid) product_status_usid
value is useful for reducing cognitive load when constructing queries.
For example:
SELECT
product.title,
product.sku,
product_status.description status
FROM
product
JOIN
product_status
ON
product.product_status_id=product_status.product_status_id
WHERE
product_status_usid='in stock'
;
title | sku | status
---------------------------+---------------+---------------------
EcoBoost Portable Charger | SKU-ECB-1234 | Product is in stock
BreezeAir Conditioner | SKU-BAC-3141 | Product is in stock
UltraView Binoculars | SKU-UVB-1221 | Product is in stock
ProFit Running Shoes | SKU-PFRS-3241 | Product is in stock
(4 rows)
is far easier to understand at a glance, than
SELECT
product.title,
product.sku,
product_status.description status
FROM
product
JOIN
product_status
ON
product.product_status_id=product_status.product_status_id
WHERE
product.product_status_id=1
;
Similarly, when referring to these foreign key records in code, we do not want to use a primary key integer value as a constant (as these are strictly-speaking not constant) identifier. Rather, we would want to use the usid for this.
Should we need to add a new status (for example pre-order
) to our system, it is as simple as adding a new record to the product_status
table. We may want to extend the structure for this as well. Fortunately altering the product_status
table is far quicker and less risky than doing the same to the large product
table.
-- Postgres
ALTER TABLE
product_status
ADD COLUMN
is_pre_order BOOLEAN NOT NULL DEFAULT false
;
INSERT INTO
product_status
(
product_status_usid,
description,
is_in_stock,
is_on_back_order,
is_buyable,
is_active,
is_pre_order
)
VALUES
(
'pre-order',
'Product is available for pre-order',
false,
false,
true,
true,
true
)
;
Another benefit that this abstraction offers us, is the ability to extend our architecture fairly easily. For example, to add a table to log status changes.
-- Postgres
DROP TABLE IF EXISTS product_status_log CASCADE;
CREATE TABLE product_status_log (
product_id INTEGER NOT NULL,
product_status_id INTEGER NOT NULL,
logged_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
FOREIGN KEY (product_id) REFERENCES product (product_id),
FOREIGN KEY (product_status_id) REFERENCES product_status (product_status_id)
);
CREATE INDEX idx_product_status ON product_status_log (product_id, product_status_id);
And we have a nice log
SELECT
product_status.product_status_usid status,
log.logged_at
FROM product
JOIN product_status_log log
ON product.product_id=log.product_id
JOIN product_status
ON log.product_status_id=product_status.product_status_id
WHERE
product.sku='SKU-SGL-9101'
ORDER BY
log.logged_at ASC
;
status | logged_at
-------------+-------------------------------
in stock | 2023-08-07 22:46:21.388738+02
on order | 2023-08-07 22:46:57.509255+02
in stock | 2023-08-07 22:47:01.686259+02
on order | 2023-08-07 22:47:19.070394+02
in stock | 2023-08-07 22:47:26.662571+02
unavailable | 2023-08-07 22:47:31.837687+02
deleted | 2023-08-07 22:47:37.574532+02
(7 rows)
Cheers!