Postgres is a hugely popular SQL database, but I didn't have much experience working with it on large datasets. One problem that landed on my plate was optimising the query for the total row count of a table that held over 5 GB of data.
SELECT count(*) FROM my_table;
The above query took roughly 30 seconds just to get the total count, which felt absurd. My first approach was to add an index and count against it.
CREATE INDEX ON my_table USING btree(id);
SELECT count(my_table.id) FROM my_table;
The above solution did reduce the time taken, but it was still too slow for something as simple as a total count. I finally came across this super awesome article by New Relic here.
A nice approach is to calculate the count ahead of time, so that when you need to query it you can read it fast. The only tradeoff is that the count isn't computed at query time; however, this greatly improves the efficiency of the query itself, so at least you won't be waiting an extremely long time for it to complete. I am just replicating the steps from the article below.
- Create a row_count table to keep tabs on the tables you want to track.
CREATE TABLE row_count (
    table_name text PRIMARY KEY,
    tally bigint
);
- Create a trigger function that will count and update your row_count table.
CREATE OR REPLACE FUNCTION do_count() RETURNS TRIGGER AS $$
BEGIN
    -- The column is table_name; "table" is a reserved word and would fail here.
    IF TG_OP = 'INSERT' THEN
        EXECUTE 'UPDATE row_count SET tally = tally + 1 WHERE table_name = ''' || TG_TABLE_NAME || '''';
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        EXECUTE 'UPDATE row_count SET tally = tally - 1 WHERE table_name = ''' || TG_TABLE_NAME || '''';
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;
- Finally, seed the table with the initial total count and set up the trigger, which will update the tally every time a row is inserted into or deleted from the table.
BEGIN;
-- To have accurate counts, the table must be seeded with the current correct numbers.
INSERT INTO row_count (table_name, tally)
VALUES ('my_table', (SELECT count(*) FROM my_table));
-- And then apply the trigger to my_table.
CREATE TRIGGER my_table_do_row_count
BEFORE INSERT OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE do_count();
COMMIT;
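The steps above can be sketched end to end as a small runnable example. Since the article's PL/pgSQL needs a live Postgres server, this sketch uses SQLite through Python's stdlib sqlite3 module instead (the trigger syntax differs, but the seeded-tally-plus-trigger idea is the same):

```python
# Runnable sketch of the trigger-maintained row count pattern.
# SQLite stands in for Postgres here so the example runs anywhere;
# the table and trigger names mirror the article's.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE row_count (table_name TEXT PRIMARY KEY, tally INTEGER);

-- Seed the counter with the current correct count.
INSERT INTO row_count (table_name, tally)
VALUES ('my_table', (SELECT count(*) FROM my_table));

-- Keep the tally in sync on every insert and delete.
CREATE TRIGGER my_table_insert_count AFTER INSERT ON my_table
BEGIN
    UPDATE row_count SET tally = tally + 1 WHERE table_name = 'my_table';
END;

CREATE TRIGGER my_table_delete_count AFTER DELETE ON my_table
BEGIN
    UPDATE row_count SET tally = tally - 1 WHERE table_name = 'my_table';
END;
""")

conn.executemany("INSERT INTO my_table (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM my_table WHERE payload = 'a'")

# Reading the count is now a single-row lookup instead of a full scan.
tally = conn.execute(
    "SELECT tally FROM row_count WHERE table_name = 'my_table'"
).fetchone()[0]
print(tally)  # 2
```

The key point is that the expensive count(*) runs exactly once, at seeding time; afterwards every read is a primary-key lookup on row_count.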
Given the above solution, you can even build more advanced queries that store counts for the same table with different WHERE clauses. Two approaches that I see: either add another column to the row_count table identifying the clause, or, since Postgres can store JSON, replace the tally number with a JSON document that maps each clause to its count.