Bugless #39

k0: calico has node names desynchronized with k8s

Added by q3k about 3 years ago. Updated almost 2 years ago.

Status: Accepted
Priority: Normal
Assignee:
Category: hscloud

Description

When a calico node daemon first starts up, it attempts to mark that node with NetworkUnavailable until it is fully healthy.

This is currently broken on dcr01s22 and dcr01s24:

2021-03-27 12:03:54.958 [WARNING][9] startup/startup.go 1203: Failed to set NetworkUnavailable to False; will retry error=nodes "dcr01s22" not found

IIUC, calico sees dcr01s22 as 'dcr01s22', while k8s sees it as 'dcr01s22.hswaw.net'. This makes the daemon get into a small retry loop on startup, slowing things down (but not breaking them).
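A minimal sketch of how the mismatch described above could be spotted across the whole cluster, assuming the two name lists have already been collected by hand (for example from `calicoctl get nodes` and `kubectl get nodes`). The function name `find_name_mismatches` and the `node-a.example.net` entry are hypothetical and only for illustration; this is not part of calico or hscloud tooling.

```python
def find_name_mismatches(calico_nodes, k8s_nodes):
    """Pair Calico node names with Kubernetes node names that look like the same host.

    Returns (calico_name, k8s_name) tuples for names that share a short
    hostname (the text before the first dot) but are not identical,
    e.g. a bare hostname on one side and an FQDN on the other.
    """
    k8s_set = set(k8s_nodes)

    # Index Kubernetes nodes by short hostname.
    by_short = {}
    for name in k8s_nodes:
        by_short.setdefault(name.split(".", 1)[0], []).append(name)

    mismatches = []
    for calico_name in calico_nodes:
        if calico_name in k8s_set:
            continue  # both sides agree on this name
        short = calico_name.split(".", 1)[0]
        for k8s_name in by_short.get(short, []):
            mismatches.append((calico_name, k8s_name))
    return mismatches


if __name__ == "__main__":
    # dcr01s22 is taken from this issue; node-a.example.net is a made-up
    # example of a node whose names already agree.
    calico = ["dcr01s22", "node-a.example.net"]
    k8s = ["dcr01s22.hswaw.net", "node-a.example.net"]
    for calico_name, k8s_name in find_name_mismatches(calico, k8s):
        print(f"calico={calico_name!r} k8s={k8s_name!r}")
```

Any pair reported this way is a candidate for the same "nodes ... not found" retry loop, since calico would be looking the node up under a name k8s does not know.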

I attempted calico etcd store surgery for this before, but I think I gave up because I didn't want to affect production too much. We should drain dcr01s22.hswaw.net at some point (probably after #6) and try to fix this properly.

#1

Updated by q3k over 2 years ago

  • Status changed from New to Assigned
  • Assignee set to implr

#2

Updated by implr over 2 years ago

  • Status changed from Assigned to Accepted

Seems resolved after upgrade to 3.15, will verify some more and resolve.

#3

Updated by q3k almost 2 years ago

  • Category set to hscloud
