Bugless #14: k0: figure out a better postgres story for high-traffic OLTP uses - hswaw - Redmine

Bugless #14

k0: figure out a better postgres story for high-traffic OLTP uses

Added by q3k about 3 years ago. Updated almost 2 years ago.

Status: New

Priority: Normal

Assignee: (unassigned)

Category: hscloud

Description

We currently set up our postgres instances via kube/postgres.libsonnet, which places them in a single-instance deployment backed by Ceph.

This is fine for simple software, but clearly suboptimal for high-traffic use cases:

- Ceph eats IOPS for breakfast, so the effective IOPS available to postgres are tiny, which limits our ability to do sustained writes.
- Recovery from a failed node takes O(minutes), because nothing happens until Kube decides that the node is lost.
- The backup story isn't great: we take ext4 dumps via benji, and these are generally dirty (i.e. not taken from a cleanly quiesced database).

Some better strategy is needed, either using one of the Well Known Postgres Operators or NIHing our own. We don't even need sharding or autoplacement, just the ability to quickly and reliably fail over from a leader that ended up on a dead/unreachable node.
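One common building block for this kind of quick failover is lease-based leader election: the leader must keep renewing a short-lived lease in some shared store, and a standby may promote itself only after that lease has expired. The minimal Python sketch below simulates this with a fake one-second clock. `Lease`, `try_acquire` and `failover_time` are hypothetical names, not part of any operator's API, and the "shared store" is just an in-memory object. The point it illustrates is that the takeover delay is bounded by the lease TTL rather than by how long Kubernetes takes to declare the node lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical leader lease, as it might be kept in a shared store."""
    holder: Optional[str] = None
    expires_at: int = 0  # seconds on a shared, simulated clock


def try_acquire(lease: Lease, candidate: str, now: int, ttl: int) -> bool:
    """Acquire or renew the lease for `candidate` at time `now`.

    The current holder may renew at any time; any other candidate may take
    over only once the lease has expired, i.e. the holder stopped renewing.
    """
    free = lease.holder is None or now >= lease.expires_at
    if lease.holder == candidate or free:
        lease.holder = candidate
        lease.expires_at = now + ttl
        return True
    return False


def failover_time(ttl: int, leader_dies_at: int, horizon: int) -> Optional[int]:
    """Simulate one-second ticks; return the tick at which the standby leads."""
    lease = Lease()
    for now in range(horizon + 1):
        # The primary renews its lease every tick for as long as it is alive.
        if now < leader_dies_at:
            try_acquire(lease, "primary", now, ttl)
        # The standby keeps trying; it only wins once the lease has expired.
        if try_acquire(lease, "standby", now, ttl):
            return now
    return None


if __name__ == "__main__":
    t = failover_time(ttl=10, leader_dies_at=30, horizon=120)
    print(f"standby took over at t={t}s, {t - 30}s after the primary died")
```

With a 10 s TTL, the primary's last renewal happens at t=29, so the standby takes over at t=39, 9 s after the primary died. A real deployment would additionally need fencing: the old primary must stop accepting writes once it can no longer renew its lease, otherwise a partitioned primary and a freshly promoted standby could both accept writes.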

Updated by q3k about 3 years ago

Description updated

Updated by q3k almost 2 years ago

Category set to hscloud
