
10 common PostgreSQL mistakes and how to avoid them

A lot can go wrong with a PostgreSQL installation. Worse, many problems may lurk undetected as the issue builds over a period of time, then suddenly strike with a major impact that brings it to the forefront of everyone’s attention. Whether it’s a glaring drop in performance, or a dramatic rise in resource consumption and billing costs, it’s important to identify such problems as early as possible, or, better yet, avoid them by configuring your implementation to suit the desired workload.

Drawing on Percona’s experience helping countless PostgreSQL shops over the years, we have compiled a list of the most common mistakes. Even if you think you’ve configured your PostgreSQL installation the right way, you may still find this list useful in validating your setup.

Mistake #1: Running the default configuration

PostgreSQL works right out of the box, but it’s not very well configured for your needs. The default configuration is very basic and not tuned for any specific workload. This excessively conservative configuration allows PostgreSQL to run in any environment, with the expectation that users will configure it for their needs.

The pgtune tool offers a subset of settings based on hardware resources and the type of workload. That’s a good starting point for configuring your PostgreSQL cluster based on what your workload needs. Additionally, you may have to configure the autovacuum, log, checkpoint, and WAL (write-ahead log) retention settings.
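
As a rough illustration, a pgtune-style starting point for a hypothetical 16 GB, 4-core OLTP server might look like the fragment below. These values are assumptions to adapt to your own hardware and workload, not recommendations:

```
# postgresql.conf sketch for a hypothetical 16 GB, 4-core OLTP server.
# All values are illustrative starting points, not recommendations.
shared_buffers = 4GB                  # ~25% of RAM is a common rule of thumb
effective_cache_size = 12GB           # planner hint: RAM available for caching
work_mem = 16MB                       # per sort/hash operation, per backend
maintenance_work_mem = 1GB            # used by VACUUM and CREATE INDEX
checkpoint_timeout = 15min            # spread checkpoints out
checkpoint_completion_target = 0.9
max_wal_size = 4GB                    # WAL allowed to accumulate between checkpoints
log_min_duration_statement = 250ms    # log statements slower than this
autovacuum_naptime = 30s              # check for vacuum work more often
```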

It’s really important that your server is optimally configured for any immediate future needs to avoid any unnecessary restarts. So take a look at all GUCs with the “postmaster” context in the pg_settings catalog view.

SELECT name, setting, boot_val
FROM   pg_settings
WHERE  context = 'postmaster';

This is especially important when setting up a high availability (HA) cluster, because any downtime for the primary server will degrade the cluster and cause the promotion of a standby server to the primary server role.

Mistake #2: Unoptimized database design and architecture

This point cannot be emphasized enough. I’ve personally seen organizations pay more than five times the cost they needed to, simply because of unoptimized database design and architecture.

One of the best tips here is to look at what your workload needs right now, and in the near future, rather than what might be required in six months to a year’s time. Looking too far ahead means that your tables are designed for future needs that may never be realized. And that’s just one aspect of it.

Along with this, overreliance on object-relational mapping (ORM) is also a major cause of poor performance. ORMs are used to connect applications to databases using object-oriented programming languages, and they should simplify life for your developers over time. However, it’s critical that you understand what an ORM provides and what kind of performance impact it introduces. Under the hood, an ORM may be executing multiple queries, whether that’s to combine multiple relations, to perform aggregations, or even to split up query data. Overall, you’ll experience higher latency and lower throughput on your transactions when using an ORM.

Beyond ORMs, improving your database architecture is about structuring data so that your read and write operations are optimal for indexes as well as for relations. One approach that can help is to denormalize the database, as this reduces SQL query complexity and the associated joins so that you may fetch data from fewer relations.
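
As a sketch of the idea, using hypothetical orders and customers tables, a frequently read customer name can be carried on the order row itself so that the hot read path touches one relation instead of two:

```sql
-- Normalized: every order lookup joins to customers.
SELECT o.id, o.total, c.name
FROM   orders o
JOIN   customers c ON c.id = o.customer_id
WHERE  o.id = 42;

-- Denormalized: customer_name is copied onto orders at write time,
-- trading storage and update complexity for a single-relation read.
SELECT id, total, customer_name
FROM   orders
WHERE  id = 42;
```

The trade-off is that the copied column must be kept in sync on updates, so this pays off only when reads heavily outnumber writes.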

In the end, performance is driven by a simple three-step process of “definition, measurement, and optimization” in your environment for your application and workload.

Mistake #3: Not tuning the database for the workload

Tuning for a workload requires insights into the amount of data you intend to store, the nature of the application, and the type of queries to be executed. You can always tune and benchmark your setup until you are happy with the resource consumption under a severe load.

For example, can your entire database fit into your machine’s available RAM? If yes, then you obviously would want to increase the shared_buffers value for it. Similarly, understanding the workload is key to how you configure the checkpoint and the autovacuum processes. For example, you’ll configure these very differently for an append-only workload compared to a mixed online transaction processing workload that satisfies the Transaction Processing Performance Council Type C benchmark.
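
A quick way to check whether the database fits in RAM is to compare its on-disk size to the memory on the box; shared_buffers can then be raised accordingly (the 4GB value below is only an example, and a restart is required):

```sql
-- How big is the current database on disk?
SELECT pg_size_pretty(pg_database_size(current_database()));

-- If it comfortably fits in RAM, raise shared_buffers (takes effect
-- after a restart; ~25% of RAM is a common starting point).
ALTER SYSTEM SET shared_buffers = '4GB';
```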

There are a lot of useful tools out there that provide query performance insights. You might check out my blog post on query performance insights, which discusses some of the open source options available, or see my presentation on YouTube.

At Percona, we have two tools that will help you immensely in understanding query performance patterns:

  • PMM – Percona Monitoring and Management is a free, fully open source project that provides a graphical interface with detailed system statistics and query analytics. Feel free to try out the PMM demo that caters to MySQL, MongoDB, and PostgreSQL.
  • pg_stat_monitor – This is an enhanced version of pg_stat_statements that provides more detailed insights into query performance patterns, actual query plan, and query text with parameter values. It’s available on Linux from our download page or as RPM packages from the PostgreSQL community yum repositories.

Mistake #4: Improper connection management

The connections configuration looks innocuous at first glance. However, I’ve seen cases where a very large value for max_connections has caused out of memory errors. So configuring max_connections requires some attention.

The number of cores, the amount of memory available, and the type of storage must be factored in when configuring max_connections. You don’t want to overload your server resources with connections that may never be used. Then there are kernel resources that are also being allocated per connection. The PostgreSQL kernel documentation has more details.

When clients are executing queries that take very little time, a connection pooler significantly improves performance, as the overhead of spawning a connection becomes significant in this type of workload.
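
For example, PgBouncer is a popular lightweight pooler; a minimal transaction-pooling setup might look like the sketch below (the database name, addresses, and pool sizes are placeholders, not recommendations):

```ini
; pgbouncer.ini sketch -- placeholder names and sizes
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; reuse a server connection per transaction
default_pool_size = 20    ; server connections per database/user pair
max_client_conn = 500     ; clients can far exceed server connections
```

With transaction pooling, hundreds of short-lived client connections can share a few dozen server connections, which keeps max_connections on the server small.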

Mistake #5: Vacuum isn’t working properly

Hopefully, you have not disabled autovacuum. We’ve seen in many production environments that users have disabled autovacuum altogether, usually due to some underlying issue. If autovacuum isn’t really working in your environment, there can be only three reasons for it:

  1. The vacuum process is not being triggered, or at least not as frequently as it should be.
  2. Vacuuming is too slow.
  3. The vacuum isn’t cleaning up dead rows.

Both 1 and 2 are directly related to configuration options. You can see the vacuum-related options by querying the pg_settings view.

SELECT  name
        , short_desc
        , setting
        , unit
        , CASE
            WHEN context = 'postmaster' THEN 'restart'
            WHEN context = 'sighup'     THEN 'reload'
            ELSE context
          END "server requires"
FROM    pg_settings
WHERE   name LIKE '%vacuum%';

The speed can potentially be improved by tuning autovacuum_work_mem and the number of parallel workers. The triggering of the vacuum process may be tuned via configuring scale factors or thresholds.
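
For instance, triggering and throughput can be adjusted with settings like these (the values are illustrative; the first two take effect on reload, while changing the worker count requires a restart):

```sql
-- Trigger vacuum sooner: at 5% dead tuples instead of the default 20%.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
-- Give each autovacuum worker more memory for dead-tuple tracking.
ALTER SYSTEM SET autovacuum_work_mem = '512MB';
-- Allow more workers to run concurrently (restart required).
ALTER SYSTEM SET autovacuum_max_workers = 6;
SELECT pg_reload_conf();
```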

When the vacuum process isn’t cleaning up dead tuples, it’s an indication that something is holding back key resources. The culprits could be one or more of these:

  • Long-running queries or transactions.
  • Standby servers in a replication environment with the hot_standby_feedback option turned on.
  • A larger than required value of vacuum_defer_cleanup_age.
  • Replication slots that hold down the xmin value and prevent the vacuum from cleaning dead tuples.
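
Queries along these lines can help identify which of the above is pinning the xmin horizon (a sketch; adapt the ordering and limits to your environment):

```sql
-- Oldest backends still holding an xmin, which blocks dead-tuple cleanup.
SELECT pid, state, xact_start, backend_xmin
FROM   pg_stat_activity
WHERE  backend_xmin IS NOT NULL
ORDER  BY age(backend_xmin) DESC
LIMIT  5;

-- Replication slots holding down xmin (inactive slots are prime suspects).
SELECT slot_name, active, xmin, catalog_xmin
FROM   pg_replication_slots;
```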

If you want to manage the vacuum of a relation manually, then follow Pareto’s law (aka the 80/20 rule). Tune the cluster to an optimal configuration and then tune specifically for those few tables. Remember that autovacuum or toast.autovacuum may be disabled for a specific relation by specifying the associated storage option during the create or alter statement.
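
For example, per-relation tuning for a couple of hypothetical tables might look like this:

```sql
-- Vacuum a hot table more aggressively than the cluster default...
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.02
);

-- ...or disable autovacuum for a relation entirely (rarely a good idea,
-- e.g. a staging table that is truncated and reloaded in bulk).
ALTER TABLE staging_bulk_load SET (autovacuum_enabled = off);
```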

Mistake #6: Rogue connections and long-running transactions

A number of things can hold your PostgreSQL cluster hostage, and rogue connections are one of them. Other than holding onto connection slots that could be used by other applications, rogue connections and long-running transactions hold onto key resources that can wreak havoc throughout the system. To a lesser extent, in a replication environment with hot_standby_feedback turned on, long-running transactions on the standby may prevent the vacuum on the primary server from doing its job.

Think of a buggy application that opens a transaction and stops responding thereafter. It might be holding onto locks or simply preventing the vacuum from cleaning up dead tuples, as those remain visible to such transactions. What if that application were to open a huge number of such transactions?

More often than not, you can get rid of such transactions by configuring idle_in_transaction_session_timeout to a value tuned for your queries. Of course, always keep the behavior of your application in mind whenever you start tuning the parameter.
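
The timeout can be set cluster-wide or scoped to a role; for example (the 5-minute value and the app_user role are placeholders):

```sql
-- Terminate sessions idle inside a transaction for more than 5 minutes.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- Or scope a stricter limit to a specific application role only.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '1min';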

Beyond tuning idle_in_transaction_session_timeout, monitor pg_stat_activity for any long-running queries or any sessions that are waiting for client-related events for longer than the expected amount of time. Keep an eye on the timestamps, the wait events, and the state columns.

backend_start    | 2022-10-25 09:25:07.934633+00
xact_start       | 2022-10-25 09:25:11.238065+00
query_start      | 2022-10-25 09:25:11.238065+00
state_change     | 2022-10-25 09:25:11.238381+00
wait_event_type  | Client
wait_event       | ClientRead
state            | idle in transaction
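
Output like the above can be produced with a query such as this one (the 10-minute threshold is an arbitrary example):

```sql
SELECT  pid, backend_start, xact_start, query_start,
        state_change, wait_event_type, wait_event, state
FROM    pg_stat_activity
WHERE   state = 'idle in transaction'
AND     state_change < NOW() - INTERVAL '10 minutes';
```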

Other than these, prepared transactions (especially orphaned prepared transactions) can also hold onto key system resources (locks or the xmin value). I would recommend setting up a nomenclature for prepared transactions to define their age. Say, a prepared transaction with a max age of 5 minutes may be created as PREPARE TRANSACTION 'foo_prepared 5m'.

SELECT  gid
        , prepared
        , REGEXP_REPLACE(gid, '.* ', '') AS age
FROM    pg_prepared_xacts
WHERE   prepared + CAST(regexp_replace(gid, '.* ', '') AS INTERVAL) < NOW();

This provides a scheme for applications to define the age of their prepared transactions. A cronjob or a scheduled job could then monitor and roll back any prepared transactions that remain active beyond their intended age.
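
Building on that naming scheme, one way to sketch such a job is with psql’s \gexec metacommand, which executes each row of the result as a statement (run this from cron outside any explicit transaction; ROLLBACK PREPARED cannot run inside a transaction block):

```sql
-- Generate and execute a ROLLBACK PREPARED for every prepared
-- transaction that has outlived the age encoded in its gid.
SELECT format('ROLLBACK PREPARED %L', gid)
FROM   pg_prepared_xacts
WHERE  prepared + CAST(regexp_replace(gid, '.* ', '') AS INTERVAL) < NOW()
\gexec
```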

Mistake #7: Over-indexing or under-indexing

Surely there’s nothing wrong with over-indexing a relation. Or is there? To get the best performance out of your PostgreSQL instance, it is imperative that you understand how PostgreSQL manages indexes.

There are multiple types of indexes in PostgreSQL. Each has a different use case, and each has its own overheads. B-tree is the most commonly used index type. It is used for primary keys as well. The past few major releases have seen a lot of performance-related (and debloating) improvements in B-tree indexes. Here is one of my blog posts that discusses duplicate version churns in PostgreSQL 14.

When an index scan is executed on a relation, for each matching tuple, it accesses the heap to fetch both data and visibility information, so that only the version visible to the current transaction is chosen. Over-indexing will cause updates to more indexes, therefore consuming more resources without reaping the desired benefits.

Similarly, under-indexing will cause more heap scans, which will potentially lead to more I/O operations and therefore a drop in performance.

Indexing is not just about the number of indexes you have on a relation. It is how optimized those indexes are for the desired use cases. Ideally, you would want to hit an index-only scan each time, but there are limitations. Although B-tree indexes support index-only scans for all operators, GiST and SP-GiST indexes support them only for some operators. See the documentation for more details.

Following a simple checklist can help you validate that your system is optimally set up for indexes:

  • Ensure configuration is properly set (e.g., random_page_cost is tuned for your hardware).
  • Check that statistics are up to date, or at least that the analyze or vacuum commands run on the relations with indexes. This will ensure that statistics are more or less up to date so that the planner has a better probability of choosing an index scan.
  • Create the right type of index (B-tree, hash, or another type).
  • Use indexes on the right columns. Don’t forget to include non-indexed columns to avoid heap access. Not all index types allow covering indexes, so do check the documentation.
  • Get rid of unnecessary indexes. See pg_statio_user_indexes for more insights into indexes and block hits.
  • Understand the impact of covering indexes on features like deduplication, duplicate version churns, and index-only scans.

See this wiki page on index maintenance for more useful queries.
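
To illustrate the covering-index point, the INCLUDE clause lets a B-tree index satisfy a query without heap access (the table and column names here are hypothetical):

```sql
-- B-tree covering index: customer_id is the search key, while
-- order_date and total ride along in the leaf pages via INCLUDE.
CREATE INDEX idx_orders_customer
    ON orders (customer_id)
    INCLUDE (order_date, total);

-- This query can now be answered by an index-only scan, provided
-- the visibility map is reasonably up to date (i.e., vacuum has run).
EXPLAIN SELECT customer_id, order_date, total
FROM    orders
WHERE   customer_id = 42;
```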

Mistake #8: Inadequate backups and HA

HA is not just about keeping a service up and running. It’s also about ensuring that the service responds within the defined acceptance criteria and that it satisfies the RPO (recovery point objective) and RTO (recovery time objective) targets. To match the uptime requirements and the number of nines you are targeting, refer to this wiki page for percentage calculations.
