SelfNodeRemediationConfig

self-node-remediation.medik8s.io / v1alpha1

apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
apiVersion string
APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
kind string
Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
metadata object
spec object
SelfNodeRemediationConfigSpec defines the desired state of SelfNodeRemediationConfig
apiCheckInterval string
The frequency of the API server connectivity check. Valid time units are "ms", "s", "m", "h".
pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
apiServerTimeout string
Timeout for each api-connectivity check. Valid time units are "ms", "s", "m", "h".
pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
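As a rough illustration of the two fields above, the following manifest sets both durations. The values are assumptions chosen for the example, not documented defaults; the name `example` is reused from the sample at the top of this page.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Illustrative values only; each must match the duration pattern above.
  apiCheckInterval: 15s   # run the API server connectivity check every 15 seconds
  apiServerTimeout: 5s    # time out a single connectivity check after 5 seconds
```

Because the pattern allows one or more number-plus-unit groups, a combined value such as `1m30s` also matches.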
customDsTolerations []object
CustomDsTolerations allows adding custom tolerations to the SNR agents running in the DaemonSet, in order to support remediation for different types of nodes.
effect string
Effect indicates the taint effect to match. Empty means match all taint effects. When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
key string
Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys.
operator string
Operator represents a key's relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a pod can tolerate all taints of a particular category.
tolerationSeconds integer
TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system.
format: int64
value string
Value is the taint value the toleration matches to. If the operator is Exists, the value should be empty, otherwise just a regular string.
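A minimal sketch of how the toleration fields above might be combined. The taint keys and values are hypothetical and exist only for illustration; substitute the taints actually present on your nodes.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  customDsTolerations:
    # Tolerate one specific key/value taint (hypothetical key and value).
    - key: example.com/dedicated
      operator: Equal
      value: infra
      effect: NoSchedule
    # Tolerate any value of a key for a limited time (hypothetical key).
    # With operator Exists the value is left empty.
    - key: example.com/maintenance
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300   # only honored because the effect is NoExecute
```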
endpointHealthCheckUrl string
EndpointHealthCheckUrl is a URL that self node remediation agents running on control-plane nodes will try to access when they cannot contact their peers. This is part of the self-diagnostics that decide whether the node should be remediated. It is ignored when empty (which is the default).
hostPort integer
HostPort is used for internal communication between SNR agents.
minimum: 1
isSoftwareRebootEnabled boolean
IsSoftwareRebootEnabled indicates whether the self node remediation agent will perform a software reboot if the watchdog device cannot be used, or will use the watchdog only, without a fallback to software reboot.
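The three fields above could be set together as in the sketch below. The URL is hypothetical and the port number is an arbitrary illustrative value; neither is taken from the schema.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Hypothetical endpoint; leave the field empty to skip this check.
  endpointHealthCheckUrl: https://health.example.com/healthz
  # Illustrative port for communication between SNR agents (minimum: 1).
  hostPort: 30001
  # Allow a software reboot when the watchdog device cannot be used.
  isSoftwareRebootEnabled: true
```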
maxApiErrorThreshold integer
The number of API server connectivity errors after which the node will start contacting its peers.
minimum: 1
minPeersForRemediation integer
Minimum number of peer worker/control-plane nodes to attempt to contact before deciding whether the node is unhealthy. If set to zero, no other peers are required to be present for the remediation action to occur when this node has lost API server access. If an insufficient number of peers is found, we will not attempt to ask any peer nodes (if present) whether they see that the current node has been marked unhealthy with a SelfNodeRemediation CR.
minimum: 0
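To make the interplay of the two thresholds above concrete, here is a small sketch with assumed values (they are examples, not documented defaults).

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Illustrative: start contacting peers once this error threshold is reached (minimum: 1).
  maxApiErrorThreshold: 3
  # Illustrative: require at least one peer before asking peers about this node.
  # Setting 0 means no peers need to be present for remediation to occur
  # when this node has lost API server access.
  minPeersForRemediation: 1
```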
peerApiServerTimeout string
The timeout for api-server connectivity check. Valid time units are "ms", "s", "m", "h".
pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
peerDialTimeout string
Timeout for establishing connection to peer. Valid time units are "ms", "s", "m", "h".
pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
peerRequestTimeout string
Timeout for each peer request. Valid time units are "ms", "s", "m", "h".
pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
peerUpdateInterval string
The frequency for updating peers. Valid time units are "ms", "s", "m", "h".
pattern: ^([0-9]+(\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$
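The four peer-related durations above might be configured as in this sketch. All values are assumptions picked for illustration and only need to match the duration pattern.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Illustrative values only.
  peerApiServerTimeout: 5s   # timeout for the API server connectivity check
  peerDialTimeout: 5s        # timeout for establishing a connection to a peer
  peerRequestTimeout: 5s     # timeout for each peer request
  peerUpdateInterval: 15m    # how often the list of peers is refreshed
```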
safeTimeToAssumeNodeRebootedSeconds integer
SafeTimeToAssumeNodeRebootedSeconds is the time after which the healthy self node remediation agents will assume the unhealthy node has been rebooted, and it is safe to recover affected workloads. This is extremely important, as starting replacement Pods while the originals are still running on the failed node will likely lead to data corruption and violation of run-once semantics. In an effort to prevent this, the operator ignores values lower than a minimum calculated from the ApiCheckInterval, ApiServerTimeout, MaxApiErrorThreshold, PeerDialTimeout, and PeerRequestTimeout fields, and the unhealthy node's individual watchdog timeout.
watchdogFilePath string
WatchdogFilePath is the watchdog file path that should be available on each node, e.g. /dev/watchdog.
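A short sketch of the last two spec fields. The number of seconds is an assumed example value; as described above, the operator ignores it if it is lower than the minimum it calculates from the other timing fields and the node's watchdog timeout. The watchdog path is the one given as an example in the field description.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  name: example
spec:
  # Illustrative value; ignored by the operator if below its calculated minimum.
  safeTimeToAssumeNodeRebootedSeconds: 180
  # Path to the watchdog device that should be available on each node.
  watchdogFilePath: /dev/watchdog
```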
status object
SelfNodeRemediationConfigStatus defines the observed state of SelfNodeRemediationConfig
