Bugless #39
k0: calico has node names desynchronized with k8s
Description
When a calico node daemon first starts up, it attempts to mark that node with NetworkUnavailable until it is fully healthy.
This is currently broken on dcr01s22 and dcr01s24:
2021-03-27 12:03:54.958 [WARNING][9] startup/startup.go 1203: Failed to set NetworkUnavailable to False; will retry error=nodes "dcr01s22" not found
IIUC, calico sees dcr01s22 as 'dcr01s22', while k8s sees it as 'dcr01s22.hswaw.net'. This sends the daemon into a short retry loop on startup, which slows things down but does not break them.
I attempted calico etcd store surgery for this before, but I think I gave up because I didn't want to affect production too much. We should drain dcr01s22.hswaw.net at some point (probably after #6) and try to fix this properly.
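For finding which nodes are affected before any surgery, here is a minimal sketch of the short-name vs FQDN comparison described above. It assumes the two name lists have already been dumped by hand (e.g. from the k8s node list and the calico node list); `short_name` and `find_desynced` are hypothetical helpers, not part of calico or k8s, and `node-ok.example.net` is a made-up node used only for illustration.

```python
def short_name(name):
    """Return the host part of a node name (everything before the first dot)."""
    return name.split(".", 1)[0]


def find_desynced(calico_nodes, k8s_nodes):
    """Return (calico_name, k8s_name) pairs that look like the same host
    registered under different names (e.g. short name vs FQDN)."""
    k8s_set = set(k8s_nodes)

    # Index k8s node names by their short host part.
    by_short = {}
    for k8s_name in k8s_nodes:
        by_short.setdefault(short_name(k8s_name), []).append(k8s_name)

    pairs = []
    for calico_name in calico_nodes:
        if calico_name in k8s_set:
            continue  # names agree exactly, nothing to report
        for k8s_name in by_short.get(short_name(calico_name), []):
            pairs.append((calico_name, k8s_name))
    return pairs


if __name__ == "__main__":
    # Hypothetical input: one desynchronized node, one healthy node.
    calico = ["dcr01s22", "node-ok.example.net"]
    k8s = ["dcr01s22.hswaw.net", "node-ok.example.net"]
    for calico_name, k8s_name in find_desynced(calico, k8s):
        print(f"calico: {calico_name!r}  k8s: {k8s_name!r}")
```

This only matches on the part before the first dot, so it would not catch nodes whose names differ in some other way.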
Updated by implr over 2 years ago
- Status changed from Assigned to Accepted
Seems resolved after upgrade to 3.15, will verify some more and resolve.