How to intentionally size numeric column in PostgreSQL?
When choosing a numeric data type for a database column, we need to consider how the data is stored and used.
Context
Given a time-series data of energy consumption in watt-hour (Wh) as shown in the example below.
| Timestamp | Wh |
|---|---|
| 2024-10-23 00:00:00 +0700 | 1321 |
| 2024-10-23 00:05:00 +0700 | 1344 |
| 2024-10-23 00:10:00 +0700 | 1411 |
| 2024-10-23 00:20:00 +0700 | 1521 |
Given the price of electricity in periods per day as show in the example below.
| Date | From | To | Cost per kWh |
|---|---|---|---|
| 2024-10-23 | 00:00:00 | 00:10:00 | 8.84 |
| 2024-10-23 | 00:10:01 | 00:13:00 | 8.16 |
| 2024-10-23 | 00:13:01 | 00:30:00 | 7.41 |
The range of “Cost per kWh” is 0.00 to 99.99 inclusive.
From the two information provided, calculate the accumulated price for each row of the energy consumption data.
Quick question
What should be set for precision and scale for a numeric column in Postgresql? If you answer contains a scale of 5 that is one good answer.
Why scale = 5 is a good answer?
Going through the calculations, you will have something similar to the table below.
| Timestamp | Wh | Additional Price (THB) | Accumulated Price (THB) |
|---|---|---|---|
| 2024-10-23 00:00:00 +0700 | 1321 | 0 | 0 |
| 2024-10-23 00:05:00 +0700 | 1344 | (23 / 1000) * 8.84 = 0.20332 | 0.20332 |
| 2024-10-23 00:10:00 +0700 | 1411 | (67 / 1000) * 8.16 = 0.54672 | 0.75004 |
| 2024-10-23 00:20:00 +0700 | 1521 | (110 / 1000) * 7.41 = 0.8151 | 1.56514 |
The “Accumulated Price (THB)” column values have 5 decimal places.
Having 5 decimal places is not a conincidence. When we multiply two real numbers, one with 2 decimal places (price per kWh), another with 3 decimal places (energy consumption in kWh), we always get 5 (3+2) decimal places.
To store 5 decimal places, we need the scale to be at least 5.
Rounding errors
When we choose a scale that is less than 5, we get rounding errors.
E.g. 1.56514 becomes 1.57 when it is stored with the scale of 2 and we choose to round up.
Depending on how the data is used, the rounding errors may be acceptable or unacceptable.
In this case, we repeatedly add the “Accumulated Price (THB)” of the previous row with the “Additional Price (THB)” of the current row to produce “Accumulated Price (THB)” of the current role.
The energy consumption data comes in about 10,000 records per session and we want to show the accurate latest accummulated price. Thus, the rounding errors are not acceptable here. The rounding is done at the last step. E.g.
- When showing the accumulated price to users on UI.
- Summarize the final price to charge the customer and storing in invoices.
Let’s see what the rounding errors look like in practice. Here is the table with the scale of 2 with rounding up.
| Timestamp | Wh | Additional Price (THB) | Accumulated Price (THB) |
|---|---|---|---|
| 2024-10-23 00:00:00 +0700 | 1321 | 0 | 0 |
| 2024-10-23 00:05:00 +0700 | 1344 | (23 / 1000) * 8.84 = 0.20332 | 0.21 |
| 2024-10-23 00:10:00 +0700 | 1411 | (67 / 1000) * 8.16 = 0.54672 | 0.76 |
| 2024-10-23 00:20:00 +0700 | 1521 | (110 / 1000) * 7.41 = 0.8151 | 1.58 |
Comparing to the previous table, the accumulated rounding errors at the latest timestamp amount to 1.58 - 1.56514 = 0.01486.
In large data set, the rounding errors would snowball pretty fast.
Key Takeaways
- Understand the calculation that produces the values you’re trying store. In doing so, you will automatically know the
scaleof your data. Theprecisioncan be tricky to nail but you should have some ballpark estimate already anyway. - Depending on the usage of the data stored, you can choose to accept rounding errors or choose to not accept it.
- Rounding errors can snowball when working with large data set.
That’s it for this post ✌️ See you next time.