Project

General

Profile

Bugless #30

January boston-packets postgres fuckup cleanup, master tracker

Added by implr about 3 years ago. Updated about 1 year ago.

Status:
Assigned
Priority:
Normal
Assignee:
Category:
hscloud

Description

On 2020-12-26, postgres was upgraded from 9.x to 12.x. The init script was not updated, so after a boston reboot at 2021-01-27, 9 started again. All data written between the 26th and 2021-01-27 is in the (currently inactive) 12.
Affected databases:
  • kasownik - tentatively okay (members table), detailed transfers diff TBD
  • mail2sql
  • mailman, mailman-web #31
  • ood
  • owncloud, owncloud_sandbox FIXED in #32
  • pretix
  • spam

After all blocking issues are done, upgrade to 12.x and delete 9.


Related issues

Related to hswaw - Bugless #31: pg fuckup: recover mailman database entries from pg12NewActions
Related to hswaw - Bugless #32: pg fuckup: merge owncloudResolvedq3kActions
#1

Updated by implr about 3 years ago

  • Related to Bugless #31: pg fuckup: recover mailman database entries from pg12 added
#2

Updated by q3k about 3 years ago

  • Status changed from New to Assigned
  • Assignee set to implr
#3

Updated by q3k about 3 years ago

  • Related to Bugless #32: pg fuckup: merge owncloud added
#4

Updated by q3k about 3 years ago

implr: did you figure out when TBD is? I'd like to start merging some stuff.

#5

Updated by implr about 3 years ago

The last mtime in 12's pg_wal directory is Jan 27 21:16, so probably this. We could check logs for when the reboot exactly happened, but those have been rotated out already, so would have to check backups and meeeeh doesn't seem worth the effort

#6

Updated by implr about 3 years ago

  • Description updated (diff)
#7

Updated by implr about 3 years ago

postgres 12 configured to listen on port +1 (5433), you can connect via unix socket with -U postgres -p 5433

#8

Updated by q3k almost 3 years ago

For purposes of finding any data that is present in PG12 but missing in PG9 due to rollback we need to also have some well-defined timestamps. These timestamps can be used to determine whether a dated object within PG12 but missing from PG9 is due to a creation-and-less and not having been since removed from PG9.

Ie., if an object is found in PG12 it is assumed to have been lost due to rollback if it's missing from PG9 and its timestamp lies between the following two values:

>>> datetime(year=2021, month=1, day=25).timestamp()
1611529200.0
>>> datetime(year=2021, month=1, day=28).timestamp()
1611788400.0

Or, for your scripts:

incident_start = 1611529200 # when pg12 started to run
incident_end = 1611788400 # when we rolled back to pg9

I'm assuming these more conservatively than the above estimate. All cases of data lost due to the rollback will be manually verified before any action is taken, so it's better to have false positives than false negatives.

#9

Updated by implr almost 3 years ago

  • Description updated (diff)
#10

Updated by q3k almost 2 years ago

  • Category set to hscloud
#11

Updated by implr about 1 year ago

9.6 was ancient, out of support, and thus removed. I did a 9.6 -> 14 pg_upgrade (this time taking care to kill 9.6 for good).
The lost data in 12 is still there - just think "14" every time you see "9.6" in this issue.

Also available in: Atom PDF