Bugless #39: k0: calico has node names desynchronized with k8s - hswaw - Redmine

Bugless #39

k0: calico has node names desynchronized with k8s

Added by q3k about 3 years ago. Updated almost 2 years ago.

Status: Accepted

Priority: Normal

Assignee: implr

Category: hscloud

Description

When a calico node daemon first starts up, it manages that node's NetworkUnavailable condition, and attempts to set it to False once the node is fully healthy.

This is currently broken on dcr01s22 and dcr01s24:

2021-03-27 12:03:54.958 [WARNING][9] startup/startup.go 1203: Failed to set NetworkUnavailable to False; will retry error=nodes "dcr01s22" not found

IIUC, calico sees dcr01s22 as 'dcr01s22', while k8s sees it as 'dcr01s22.hswaw.net'. This makes the daemon get into a small retry loop on startup, slowing things down (but not breaking them).
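A rough sketch of how the mismatch could be spotted, assuming the two name lists have already been dumped from calico and from k8s by some other means. The function name `find_desynced_names` and the node `node-a.example.net` are made up for illustration; only dcr01s22 / dcr01s22.hswaw.net come from this ticket.

```python
def find_desynced_names(calico_nodes, k8s_nodes):
    """Return (calico_name, k8s_name) pairs that look like the same host
    registered under different names (short hostname vs. FQDN)."""
    k8s_set = set(k8s_nodes)

    # Index k8s node names by short hostname (the part before the first dot).
    by_short = {}
    for name in k8s_nodes:
        by_short.setdefault(name.split(".", 1)[0], []).append(name)

    mismatches = []
    for cname in calico_nodes:
        if cname in k8s_set:
            continue  # both sides agree on this name
        short = cname.split(".", 1)[0]
        for kname in by_short.get(short, []):
            mismatches.append((cname, kname))
    return mismatches


# Hypothetical input: one desynchronized node, one consistent node.
calico = ["dcr01s22", "node-a.example.net"]
k8s = ["dcr01s22.hswaw.net", "node-a.example.net"]
print(find_desynced_names(calico, k8s))
# [('dcr01s22', 'dcr01s22.hswaw.net')]
```

This only compares names by their short hostname, so it would miss nodes whose names differ in some other way.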

I attempted calico etcd store surgery for this before, but I think I gave up because I didn't want to affect production too much. We should drain dcr01s22.hswaw.net at some point (probably after #6) and try to fix this properly.

Updated by q3k over 2 years ago

Status changed from New to Assigned
Assignee set to implr

Updated by implr over 2 years ago

Status changed from Assigned to Accepted

Seems resolved after upgrade to 3.15, will verify some more and resolve.

Updated by q3k almost 2 years ago

Category set to hscloud
