I love relational databases. Well designed, they are the most elegant and efficient way to store data. Which is why MIG uses Postgresql, hosted by Amazon RDS.

It's the first time I use RDS for anything more than a small website. I discover its capabilities along the way.  Over the past few days, I've been investigating performance issues. The database was close to 100% CPU, and the number of DB connections maintained by the Go database package was varying a lot. Something was off.

I have worked as a junior Oracle & Postgres DBA in the past. In my limited experience, database performances are almost always due to bad queries, or bad schemas. When you wrote the queries, however, this is what you blame last, after spending hours looking for a bug in any other components outside of your control.

Eventually, I re-read my queries, and found one that looked bad enough:

// AgentByQueueAndPID returns a single agent that is located at a given queueloc and has a given PID
func (db *DB) AgentByQueueAndPID(queueloc string, pid int) (agent mig.Agent, err error) {
	err = db.c.QueryRow(`SELECT id, name, queueloc, os, version, pid, starttime, heartbeattime,
		status FROM agents WHERE queueloc=$1 AND pid=$2`, queueloc, pid).Scan(
		&agent.ID, &agent.Name, &agent.QueueLoc, &agent.OS, &agent.Version, &agent.PID,
		&agent.StartTime, &agent.HeartBeatTS, &agent.Status)
	if err != nil {
		err = fmt.Errorf("Error while retrieving agent: '%v'", err)
		return
	}
	if err == sql.ErrNoRows {
		return
	}
	return
}
The query locates an agent using its queueloc and pid values. Which is necessary to properly identify an agent, except that neither queueloc nor pid have indexes, resulting in a sequential scan of the table:
mig=> explain SELECT * FROM agents WHERE queueloc='xyz' AND pid=1234;
QUERY PLAN                                                   
--------------------------------------------------------------
 Seq Scan on agents  (cost=0.00..3796.20 rows=1 width=161)
   Filter: (((queueloc)::text = 'xyz'::text) AND (pid = 1234))
(2 rows)

This query is called ~50 times per second, and even with only 45,000 rows in the agents table, that is enough to burn all the CPU cycles on my RDS instance.

Postgres supports multicolumn indexes. The fix is simple enough: create an index on the columns queueloc and pid together.

mig=> create index agents_queueloc_pid_idx on agents(queueloc, pid);
CREATE INDEX

Which results in an immediate, drastic, reduction of the cost of the query, and CPU usage of the instance.


mig=> explain SELECT * FROM agents WHERE queueloc='xyz' AND pid=1234;
QUERY PLAN                                                     
---------------------------------------------------------------------------------------
 Index Scan using agents_queueloc_pid_idx on agents  (cost=0.41..8.43 rows=1 width=161)
   Index Cond: (((queueloc)::text = 'xyz'::text) AND (pid = 1234))
(2 rows)

migdbcpuusage.png

Immediate performance gain for a limited effort. Gotta love Postgres !