How to intentionally size numeric column in PostgreSQL?


When choosing a numeric data type for a database column, we need to consider how the data is stored and used.

Context

Given a time-series data of energy consumption in watt-hour (Wh) as shown in the example below.

TimestampWh
2024-10-23 00:00:00 +07001321
2024-10-23 00:05:00 +07001344
2024-10-23 00:10:00 +07001411
2024-10-23 00:20:00 +07001521

Given the price of electricity in periods per day as show in the example below.

DateFromToCost per kWh
2024-10-2300:00:0000:10:008.84
2024-10-2300:10:0100:13:008.16
2024-10-2300:13:0100:30:007.41

The range of “Cost per kWh” is 0.00 to 99.99 inclusive.

From the two information provided, calculate the accumulated price for each row of the energy consumption data.

Quick question

What should be set for precision and scale for a numeric column in Postgresql? If you answer contains a scale of 5 that is one good answer.

Why scale = 5 is a good answer?

Going through the calculations, you will have something similar to the table below.

TimestampWhAdditional Price (THB)Accumulated Price (THB)
2024-10-23 00:00:00 +0700132100
2024-10-23 00:05:00 +07001344(23 / 1000) * 8.84 = 0.203320.20332
2024-10-23 00:10:00 +07001411(67 / 1000) * 8.16 = 0.546720.75004
2024-10-23 00:20:00 +07001521(110 / 1000) * 7.41 = 0.81511.56514

The “Accumulated Price (THB)” column values have 5 decimal places.

Having 5 decimal places is not a conincidence. When we multiply two real numbers, one with 2 decimal places (price per kWh), another with 3 decimal places (energy consumption in kWh), we always get 5 (3+2) decimal places.

To store 5 decimal places, we need the scale to be at least 5.

Rounding errors

When we choose a scale that is less than 5, we get rounding errors.

E.g. 1.56514 becomes 1.57 when it is stored with the scale of 2 and we choose to round up.

Depending on how the data is used, the rounding errors may be acceptable or unacceptable.

In this case, we repeatedly add the “Accumulated Price (THB)” of the previous row with the “Additional Price (THB)” of the current row to produce “Accumulated Price (THB)” of the current role.

The energy consumption data comes in about 10,000 records per session and we want to show the accurate latest accummulated price. Thus, the rounding errors are not acceptable here. The rounding is done at the last step. E.g.

  • When showing the accumulated price to users on UI.
  • Summarize the final price to charge the customer and storing in invoices.

Let’s see what the rounding errors look like in practice. Here is the table with the scale of 2 with rounding up.

TimestampWhAdditional Price (THB)Accumulated Price (THB)
2024-10-23 00:00:00 +0700132100
2024-10-23 00:05:00 +07001344(23 / 1000) * 8.84 = 0.203320.21
2024-10-23 00:10:00 +07001411(67 / 1000) * 8.16 = 0.546720.76
2024-10-23 00:20:00 +07001521(110 / 1000) * 7.41 = 0.81511.58

Comparing to the previous table, the accumulated rounding errors at the latest timestamp amount to 1.58 - 1.56514 = 0.01486.

In large data set, the rounding errors would snowball pretty fast.

Key Takeaways

  • Understand the calculation that produces the values you’re trying store. In doing so, you will automatically know the scale of your data. The precision can be tricky to nail but you should have some ballpark estimate already anyway.
  • Depending on the usage of the data stored, you can choose to accept rounding errors or choose to not accept it.
  • Rounding errors can snowball when working with large data set.

That’s it for this post ✌️ See you next time.