SelfNodeRemediationConfig
self-node-remediation.medik8s.io / v1alpha1
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
apiVersion
string
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
kind
string
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
metadata
object
spec
object
SelfNodeRemediationConfigSpec defines the desired state of SelfNodeRemediationConfig
apiCheckInterval
string
The frequency for api-server connectivity check.
Valid time units are "ms", "s", "m", "h".
pattern:
^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
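As a rough illustration, the pattern above accepts duration strings made of one or more number-plus-unit segments, each optionally carrying a decimal fraction. The value below is illustrative only and is not a documented default. Note that the pattern also admits "ns", "us", and "µs", even though the description lists only four units.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Any of "15s", "1.5s", "500ms", or "1m30s" matches the pattern;
  # a bare number such as "15" does not, because a unit is required.
  apiCheckInterval: 15s
```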
apiServerTimeout
string
Timeout for each api-connectivity check.
Valid time units are "ms", "s", "m", "h".
pattern:
^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
customDsTolerations
[]object
CustomDsTolerations allows adding custom tolerations to the SNR agents that run in the DaemonSet, in order to support remediation for different types of nodes.
effect
string
Effect indicates the taint effect to match. Empty means match all taint effects.
When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
key
string
Key is the taint key that the toleration applies to. Empty means match all taint keys.
If the key is empty, operator must be Exists; this combination means to match all values and all keys.
operator
string
Operator represents a key's relationship to the value.
Valid operators are Exists and Equal. Defaults to Equal.
Exists is equivalent to wildcard for value, so that a pod can
tolerate all taints of a particular category.
tolerationSeconds
integer
TolerationSeconds represents the period of time the toleration (which must be
of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
it is not set, which means tolerate the taint forever (do not evict). Zero and
negative values will be treated as 0 (evict immediately) by the system.
format:
int64
value
string
Value is the taint value the toleration matches to.
If the operator is Exists, the value should be empty, otherwise just a regular string.
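A minimal sketch of how the toleration fields above might be combined. The taint keys and values are hypothetical and stand in for whatever taints exist on the nodes in a given cluster.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  customDsTolerations:
    # Hypothetical taint: tolerate "example.com/dedicated=storage:NoSchedule".
    - key: example.com/dedicated
      operator: Equal
      value: storage
      effect: NoSchedule
    # Hypothetical taint: tolerate any value of this key for 300 seconds.
    # tolerationSeconds is only honored because the effect is NoExecute.
    - key: example.com/maintenance
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
```

The second entry leaves value empty, as the value field's description asks for when the operator is Exists.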
endpointHealthCheckUrl
string
EndpointHealthCheckUrl is a URL that self node remediation agents running on control-plane nodes will try to access when they cannot contact their peers.
This is a part of self diagnostics which will decide whether the node should be remediated or not.
It will be ignored when empty (which is the default).
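For illustration, the field could be set as follows. The URL is a placeholder, not a real endpoint; any address that control-plane nodes can reach independently of their peers would play the same role.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Placeholder URL (assumption). Omit the field or leave it empty
  # to skip this part of the self diagnostics.
  endpointHealthCheckUrl: https://health.example.com/healthz
```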
hostPort
integer
HostPort is used for internal communication between SNR agents.
minimum:
1
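A short sketch of setting the port. The number shown is an arbitrary illustrative value that satisfies the minimum of 1; it is not taken from this reference.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Illustrative port for agent-to-agent communication (assumption).
  hostPort: 30001
```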
isSoftwareRebootEnabled
boolean
IsSoftwareRebootEnabled indicates whether the self node remediation agent will fall back to a software reboot
if the watchdog device cannot be used. When disabled, the agent uses the watchdog only,
without a fallback to software reboot.
maxApiErrorThreshold
integer
Number of failed api-server connectivity checks after which the node will start contacting its peers.
minimum:
1
minPeersForRemediation
integer
Minimum number of peer worker/control-plane nodes to attempt to contact before deciding whether the node is unhealthy.
If set to zero, no other peers are required to be present for the remediation action to occur when this
node has lost API server access.
If an insufficient number of peers is found, the node will not attempt to ask any peer nodes (if present)
whether they see that the current node has been marked unhealthy with a SelfNodeRemediation CR.
minimum:
0
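As a sketch of the zero case described above, the following configuration would let a node proceed with remediation without requiring any peers once it has lost API server access. Whether that trade-off is acceptable depends on the cluster; the value is shown only to illustrate the field.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # 0: no peers need to be present for remediation to occur.
  # A positive value requires at least that many peers to be found
  # before the node asks them about its own health.
  minPeersForRemediation: 0
```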
peerApiServerTimeout
string
The timeout for api-server connectivity check.
Valid time units are "ms", "s", "m", "h".
pattern:
^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
peerDialTimeout
string
Timeout for establishing connection to peer.
Valid time units are "ms", "s", "m", "h".
pattern:
^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
peerRequestTimeout
string
Timeout for each peer request.
Valid time units are "ms", "s", "m", "h".
pattern:
^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
peerUpdateInterval
string
The frequency for updating peers.
Valid time units are "ms", "s", "m", "h".
pattern:
^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
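The four peer-related duration fields above can be set together. The values below are illustrative assumptions chosen to match the pattern; they are not documented defaults.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  peerApiServerTimeout: 5s   # timeout for the api-server connectivity check
  peerDialTimeout: 5s        # timeout for establishing a connection to a peer
  peerRequestTimeout: 5s     # timeout for each peer request
  peerUpdateInterval: 15m    # how often the peer list is refreshed
```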
safeTimeToAssumeNodeRebootedSeconds
integer
SafeTimeToAssumeNodeRebootedSeconds is the time after which the healthy self node remediation
agents will assume the unhealthy node has been rebooted, and it is safe to recover affected workloads.
This is extremely important as starting replacement Pods while they are still running on the failed
node will likely lead to data corruption and violation of run-once semantics.
In an effort to prevent this, the operator ignores values lower than a minimum calculated from the
ApiCheckInterval, ApiServerTimeout, MaxApiErrorThreshold, PeerDialTimeout, and PeerRequestTimeout fields,
and the unhealthy node's individual watchdog timeout.
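A minimal sketch showing the fields that feed into the calculated minimum alongside the setting itself. All values are illustrative assumptions. The exact formula is not given in this reference, so the example does not claim that 180 is above the minimum for these inputs.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Inputs to the operator's calculated minimum (illustrative values).
  apiCheckInterval: 15s
  apiServerTimeout: 5s
  maxApiErrorThreshold: 3
  peerDialTimeout: 5s
  peerRequestTimeout: 5s
  # Ignored by the operator if lower than the calculated minimum.
  safeTimeToAssumeNodeRebootedSeconds: 180
```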
watchdogFilePath
string
WatchdogFilePath is the watchdog file path that should be available on each node, e.g. /dev/watchdog.
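For illustration, the watchdog path can be paired with the isSoftwareRebootEnabled field described earlier. The path is the example given in the description above; the boolean value is an assumption chosen to show the fallback case.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Watchdog device expected to be available on each node.
  watchdogFilePath: /dev/watchdog
  # Allow a software reboot when the watchdog device cannot be used.
  isSoftwareRebootEnabled: true
```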
status
object
SelfNodeRemediationConfigStatus defines the observed state of SelfNodeRemediationConfig