kubectl get pod is very slow, it takes about 15 seconds, is there any way to improve it? #73570


Closed
gosoon opened this issue Jan 31, 2019 · 37 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/cli Categorizes an issue or PR as relevant to SIG CLI.

Comments

@gosoon
Contributor

gosoon commented Jan 31, 2019

Our Kubernetes cluster has 1000 nodes and 7100 pods. I don't think the cluster is large, but kubectl is very slow. Is there any way to improve it?

# etcdctl --version
etcdctl version: 3.3.1
API version: 2

What happened:

$ time kubectl get pod -o wide | wc -l
7100

real	0m14.045s
user	0m13.070s
sys	0m0.885s

$ kubectl get node | wc -l
1001

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.8.1
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.2.1511 (Core)
  • Kernel (e.g. uname -a): 3.10.0-514.16.1.el7.x86_64
  • Install tools:
  • Others:

/sig CLI

@gosoon gosoon added the kind/bug Categorizes issue or PR as related to a bug. label Jan 31, 2019
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 31, 2019
@WanLinghao
Contributor

It may be caused by network congestion. Try kubectl get pod -o wide -v=10 to see whether the apiserver responds slowly.
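
For example, a minimal sketch to separate server latency from kubectl's client-side rendering (the round_trippers lines in the verbose output carry the request timings; kubectl get --raw skips the table conversion entirely):

# Show only the request-timing lines from the verbose output.
kubectl get pod -o wide -v=10 2>&1 | grep round_trippers
# Fetch the raw JSON without kubectl's table conversion, for comparison.
time kubectl get --raw /api/v1/pods > /dev/null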

@gosoon
Contributor Author

gosoon commented Feb 11, 2019

@WanLinghao
Getting the data from the apiserver is very fast, but the step where kubectl converts the retrieved JSON into the output format is very slow.

@WanLinghao
Contributor

Then my best guess is that processing the JSON consumes too much time. kubectl get --server-print=true could address this, but it is only available from v1.11 onward.
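
A rough way to compare the two code paths, assuming a client new enough to have the flag (--server-print=false may print fewer columns on recent clients):

# Server-side printing: the apiserver returns a pre-rendered table.
time kubectl get pods --server-print=true -o wide > /dev/null
# Client-side printing: kubectl downloads full objects and converts them locally.
time kubectl get pods --server-print=false -o wide > /dev/null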

@grantcurell

@WanLinghao - did you happen to confirm this was the issue? We're still on 1.9.7 and I've got a box where no nodes are reporting pressure of any sort, the servers are quite beefy and under minimal load, but I'm seeing API response times of about 5500 ms.

@grantcurell

To provide additional information: the etcd pod has restarted 24 times. @Nayruden can you post what you saw here?

@Nayruden

We have a beefy system with minimal load and are seeing large latency from the api-server.

# time kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
server1.lan   Ready     master    17h       v1.9.7

real	0m2.209s
user	0m0.224s
sys	0m0.103s
# time kubectl get pods -o wide --all-namespaces
NAMESPACE        NAME                                    READY     STATUS    RESTARTS   AGE       IP             NODE
kube-system      etcd-server1.lan                        1/1       Running   3          17h       192.168.1.11   server1.lan
kube-system      heapster-5c448886d-wbbcr                1/1       Running   2          17h       10.244.0.19    server1.lan
kube-system      kube-apiserver-server1.lan              1/1       Running   5          17h       192.168.1.11   server1.lan
kube-system      kube-controller-manager-server1.lan     1/1       Running   6          17h       192.168.1.11   server1.lan
kube-system      kube-dns-6f4fd4bdf-dftm8                3/3       Running   6          17h       10.244.0.18    server1.lan
kube-system      kube-flannel-ds-675r4                   1/1       Running   3          17h       192.168.1.11   server1.lan
kube-system      kube-proxy-fb6qn                        1/1       Running   2          17h       192.168.1.11   server1.lan
kube-system      kube-scheduler-server1.lan              1/1       Running   5          17h       192.168.1.11   server1.lan
kube-system      kubernetes-dashboard-5bd6f767c7-5l9mc   1/1       Running   2          17h       10.244.0.17    server1.lan
kube-system      monitoring-grafana-65757b9656-dn4s2     1/1       Running   2          17h       10.244.0.15    server1.lan
kube-system      monitoring-influxdb-66946c9f58-fnljl    1/1       Running   2          17h       10.244.0.14    server1.lan
metallb-system   controller-57d9d74b4d-p2vrv             1/1       Running   2          17h       10.244.0.16    server1.lan
metallb-system   speaker-p2g9r                           1/1       Running   3          17h       192.168.1.11   server1.lan

real	0m2.131s
user	0m0.170s
sys	0m0.057s

None of the logs I can find suggest what the issue might be. The server has plenty of spare CPU, memory, and HDD I/O that isn't being used.

@WanLinghao
Contributor

To confirm, is the etcd pod the backend of the cluster?

@grantcurell

@WanLinghao correct

@WanLinghao
Contributor

Please check whether the etcd pod has something wrong by running:
kubectl logs etcd-server1.lan -n kube-system
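
If etcd is struggling, its log usually says so. A hedged sketch; the grep patterns are common etcd warning phrases, not an exhaustive list:

# Look for signs of slow disk I/O or leadership churn in the etcd log.
kubectl logs etcd-server1.lan -n kube-system | grep -iE 'took too long|slow|leader'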

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 15, 2019
@ilanni2460

I have the same issue.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 28, 2019
@abdennour

I have the same issue. I ran it with -v=99 for verbose output. The response from the master is received quickly; however, the rendering takes time.

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jerry3k

jerry3k commented Nov 28, 2019

/reopen

@k8s-ci-robot
Contributor

@jerry3k: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jerry3k

jerry3k commented Nov 28, 2019

The issue should be reopened, as I can't find any solution either!

@rogperez

@WanLinghao I am having the same issue, and I tried this command:
kubectl get pod -o wide -v=10
and am seeing this output:

I0325 09:20:14.147549   91727 loader.go:375] Config loaded from file:  /Users/rogerperez/.kube/config
I0325 09:20:14.153025   91727 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.17.3 (darwin/amd64) kubernetes/06ad960" 'https://35.188.253.254/api?timeout=32s'
I0325 09:22:15.332281   91727 loader.go:375] Config loaded from file:  /Users/rogerperez/.kube/config
I0325 09:22:15.333500   91727 loader.go:375] Config loaded from file:  /Users/rogerperez/.kube/config
I0325 09:22:15.334853   91727 loader.go:375] Config loaded from file:  /Users/rogerperez/.kube/config
I0325 09:22:15.342180   91727 round_trippers.go:443] GET https://35.188.253.254/api?timeout=32s  in 121193 milliseconds
I0325 09:22:15.342192   91727 round_trippers.go:449] Response Headers:
I0325 09:22:15.342809   91727 cached_discovery.go:121] skipped caching discovery info due to Get https://35.188.253.254/api?timeout=32s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0325 09:22:15.344194   91727 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.17.3 (darwin/amd64) kubernetes/06ad960" 'https://35.188.253.254/api?timeout=32s'
I0325 09:22:15.974822   91727 loader.go:375] Config loaded from file:  /Users/rogerperez/.kube/config
I0325 09:22:20.647922   91727 loader.go:375] Config loaded from file:  /Users/rogerperez/.kube/config
I0325 09:22:21.012547   91727 round_trippers.go:443] GET https://35.188.253.254/api?timeout=32s 200 OK in 5668 milliseconds
I0325 09:22:21.012576   91727 round_trippers.go:449] Response Headers:
I0325 09:22:21.012584   91727 round_trippers.go:452]     Audit-Id: 5c228dca-79b1-4bd3-942a-88609bb481a3
I0325 09:22:21.012592   91727 round_trippers.go:452]     Content-Type: application/json
I0325 09:22:21.012598   91727 round_trippers.go:452]     Content-Length: 133
I0325 09:22:21.012611   91727 round_trippers.go:452]     Date: Wed, 25 Mar 2020 15:22:20 GMT
I0325 09:22:21.028550   91727 request.go:1017] Response Body: {"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"172.24.64.2:443"}]}
I0325 09:22:21.050839   91727 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.17.3 (darwin/amd64) kubernetes/06ad960" 'https://35.188.253.254/apis?timeout=32s'
I0325 09:22:21.120790   91727 round_trippers.go:443] GET https://35.188.253.254/apis?timeout=32s 200 OK in 69 milliseconds
I0325 09:22:21.120829   91727 round_trippers.go:449] Response Headers:
I0325 09:22:21.120837   91727 round_trippers.go:452]     Audit-Id: 0419f7db-fad2-45ab-bf1f-749fe580047f
I0325 09:22:21.120847   91727 round_trippers.go:452]     Content-Type: application/json
I0325 09:22:21.120854   91727 round_trippers.go:452]     Date: Wed, 25 Mar 2020 15:22:21 GMT

As you can see, the first request takes about two minutes (121193 ms). Is there some way to clear this network congestion?
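
One low-risk thing to try (paths assume the default kubectl cache locations; older clients keep the HTTP cache in ~/.kube/http-cache, newer ones under ~/.kube/cache) is to clear the local discovery/HTTP cache and rerun with verbosity to confirm whether the slow call is still the initial /api discovery request:

# Drop the cached discovery data, then check the request timings again.
rm -rf ~/.kube/cache ~/.kube/http-cache
kubectl get pod -o wide -v=6 2>&1 | grep round_trippers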

@jerry3k

jerry3k commented Mar 25, 2020

@rogperez Did you install Kubernetes natively or through Rancher (RKE)?

@rogperez

rogperez commented Mar 25, 2020

@jerry3k According to my symlinks I used brew

kubectl -> ../Cellar/kubernetes-cli/1.17.3/bin/kubectl

@jerry3k

jerry3k commented Mar 25, 2020

@rogperez What worked for me was to clean up the host nodes and reinstall Kubernetes (I was using RKE). Here are the steps to run on all of your nodes:

  1. Clean up Docker:
docker rm -f $(docker ps -qa)
docker rmi -f $(docker images -q)
docker volume rm $(docker volume ls -q)
  2. Unmount volumes (needs su permission):
for mount in $(mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher; do umount $mount; done
  3. Remove all these directories:
sudo rm -rf /etc/ceph \
       /etc/cni \
       /etc/kubernetes \
       /opt/cni \
       /opt/rke \
       /run/secrets/kubernetes.io \
       /run/calico \
       /run/flannel \
       /var/lib/calico \
       /var/lib/etcd \
       /var/lib/cni \
       /var/lib/kubelet \
       /var/lib/rancher/rke/log \
       /var/log/containers \
       /var/log/pods \
       /var/run/calico
  4. Remove the flannel interface:
ip address show
ip link delete flannel.1
  5. Clean up your .kube folder (basically delete it and recreate the directory).
  6. Reinstall Kubernetes.

Like I said, I was using Rancher RKE for my setup and this worked. Every time I have installed k8s on a fresh VM (or bare metal) I have had this issue of kubectl being slow, and fortunately the only solution was to clean up and reinstall on the same machines.

@wingerted

We solved this problem. kubectl caches the result from the apiserver, but it is really slow because it fsyncs every time! So we linked the .kube/cache and http_cache dirs to /dev/shm. After that, everything works really well!
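
A sketch of that symlink setup, assuming the default cache locations (older kubectl keeps the HTTP cache in ~/.kube/http-cache, newer releases put everything under ~/.kube/cache, and --cache-dir can relocate it; note that /dev/shm is cleared on reboot):

# Create RAM-backed cache directories and point kubectl's cache paths at them.
mkdir -p /dev/shm/kube_cache /dev/shm/kube_http_cache
rm -rf ~/.kube/cache ~/.kube/http-cache
ln -s /dev/shm/kube_cache ~/.kube/cache
ln -s /dev/shm/kube_http_cache ~/.kube/http-cache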

@gosoon
Contributor Author

gosoon commented Aug 31, 2020

@wingerted The ~/.kube directory has very few files, about 3M in a large cluster, and placing the directory in shared memory didn't improve the speed; it's still very slow.

@wingerted

wingerted commented Aug 31, 2020

@gosoon OK, I know there are very few files in the cache, but in our environment having the cache dir off memory costs a lot.

We also have 8k pods in a namespace.

Before we linked the cache to /dev/shm, it showed:

time kubectl get pod -o wide | wc -l
7901
real      0m12.469s
user      0m11.364s
sys       0m1.902s

After

time kubectl get pod -o wide | wc -l
7901
real      0m8.614s
user      0m12.471s
sys       0m2.361s

Well, 25% is not a big improvement, since 8k pods is a lot of data to pull from the apiserver; getting namespaces shows the difference more clearly.

Before

time kubectl get ns  | wc -l
62
real      0m2.161s
user      0m0.073s
sys       0m0.163s

After

time kubectl get ns  | wc -l
62
real      0m0.082s
user      0m0.135s
sys       0m0.051s

Of course I cleared the cache before every run in the test. And our local disk is a SATA HDD.

@gosoon
Contributor Author

gosoon commented Sep 2, 2020

# time kubectl get pod | wc -l
37865

real	0m41.806s
user	0m12.310s
sys	0m1.167s

# du -sh ~/.kube
17M	/root/.kube

@wingerted Linking the cache to /dev/shm doesn't help much here; it's still very slow.

@MrAmbiG

MrAmbiG commented Dec 7, 2020

As an RKE user I echo @jerry3k's clean-up-and-reinstall steps above; they worked for me too.

@browseman

Just to provide some details: it seems to me that this is caused by the difference between the kubectl version and the k8s cluster version. I observed the same issue with v1.16.13-eks-2ba888. It's also notable that the slowdown is mostly visible when executing get all and get pods, but absent when executing get nodes.
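
If you want to check the skew yourself, a quick sketch (kubectl version accepts -o json; the upstream skew policy expects kubectl to stay within one minor version of the apiserver):

# Compare client and server gitVersion fields.
kubectl version -o json | grep gitVersion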

Here are the results from the experiments that I did:

# time /tmp/kubectl_1.20.2 get all >/dev/null
real	0m6.451s
user	0m0.661s
sys	0m0.087s

# time /tmp/kubectl_1.19.7 get all >/dev/null
real	0m7.661s
user	0m0.659s
sys	0m0.106s

# time /tmp/kubectl_1.18.15 get all >/dev/null
real	0m6.451s
user	0m0.649s
sys	0m0.085s

# time /tmp/kubectl_1.17.17 get all >/dev/null
real	0m6.433s
user	0m0.631s
sys	0m0.100s

# time /tmp/kubectl_1.16.15 get all >/dev/null
real	0m2.560s
user	0m0.627s
sys	0m0.082s

# time /tmp/kubectl_1.15.12 get all >/dev/null
real	0m2.632s
user	0m0.647s
sys	0m0.058s

Here is the same test using get nodes

# time /tmp/kubectl_1.20.2 get nodes >/dev/null
real	0m1.243s
user	0m0.564s
sys	0m0.086s

# time /tmp/kubectl_1.19.7 get nodes >/dev/null
real	0m1.199s
user	0m0.580s
sys	0m0.077s

# time /tmp/kubectl_1.18.15 get nodes >/dev/null
real	0m1.159s
user	0m0.576s
sys	0m0.074s

# time /tmp/kubectl_1.17.17 get nodes >/dev/null
real	0m1.175s
user	0m0.610s
sys	0m0.048s

# time /tmp/kubectl_1.16.15 get nodes >/dev/null
real	0m1.145s
user	0m0.562s
sys	0m0.085s

# time /tmp/kubectl_1.15.12 get nodes >/dev/null
real	0m1.272s
user	0m0.598s
sys	0m0.060s

@hrittikhere

I am not sure why, but I am facing the same thing with k3s running on openSUSE Leap 15.2. Reinstalling hasn't helped so far.

@fanchuanster

jenkins@7b4cc536a0af:~$ time kubectl get namespace -v6
I1130 13:50:18.055615 13278 loader.go:372] Config loaded from file: /var/jenkins_home/.kube/config
I1130 13:50:38.948753 13278 round_trippers.go:454] GET https://36D0F801E271A8098A58237D06FC3BE0.gr7.eu-central-1.eks.amazonaws.com/api?timeout=32s 200 OK in 20892 milliseconds
I1130 13:50:38.957243 13278 round_trippers.go:454] GET https://36D0F801E271A8098A58237D06FC3BE0.gr7.eu-central-1.eks.amazonaws.com/apis?timeout=32s 200 OK in 1 milliseconds
I1130 13:50:38.968196 13278 round_trippers.go:454] GET https://36D0F801E271A8098A58237D06FC3BE0.gr7.eu-central-1.eks.amazonaws.com/apis/apiregistration.k8s.io/v1?timeout=32s 200 OK in 4 milliseconds
I1130 13:50:38.968294 13278 round_trippers.go:454] GET https://36D0F801E271A8098A58237D06FC3BE0.gr7.eu-central-1.eks.amazonaws.com/apis/autoscaling/v2beta1?timeout=32s 200 OK in 4 milliseconds

In my situation, the first call takes about 20 seconds; all the rest are OK. Any idea? The kubectl version is 1.22 (full output below):
jenkins@7b4cc536a0af:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

@hrittikhere

For me, deallocating the VM and creating a new system with a fresh installation got things working without delays. Quite strange, because I wasn't able to pinpoint the reason, but it works and that's okay with me.

May no one face the same issue in a production cluster 🙂

@dminGod

dminGod commented Jan 1, 2022

We solved this problem. kubectl caches the result from the apiserver, but it is really slow because it fsyncs every time! So we linked the .kube/cache and http_cache dirs to /dev/shm. After that, everything works really well!

On my local instance I followed these steps and it worked really well:

rm -rf ~/.kube/cache
mkdir /dev/shm/kube_cache
ln -s /dev/shm/kube_cache ~/.kube/cache

I think it could be that my HDD was slow and kubectl was reading a lot of files on every command; you can see this in action if you run any kubectl command with high verbosity (-v 8). This basically moves your kubectl local cache into shared memory (RAM). (Please do your due diligence before doing this.)

@bigtob

bigtob commented Jan 22, 2022

I'm not sure of the underlying workings but...

mv ~/.kube/cache ~/.kube/cache-$(date +%d%m%y)
kubectl get nodes ; kubectl get pods -A

...the cache directory will be repopulated and subsequent commands appear to be much quicker thereafter.

@kotyara85

Same here on 1.21. Even on minikube it takes about 2 s to update the discovery cache, and kubectl does this pretty often, which is annoying.
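
For reference, the discovery cache that keeps being refreshed lives in per-cluster directories under the default cache path (relocatable with --cache-dir):

# Each cluster gets its own subdirectory of cached discovery documents.
ls ~/.kube/cache/discovery/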

@kong62

kong62 commented Apr 13, 2023

version 1.27

[#] time kubectl get pod -o wide | wc -l
3

real 0m0.044s
user 0m0.042s
sys 0m0.023s

[#] time kubectl get pod -n prod -o wide | wc -l
3073

real 0m3.452s
user 0m3.615s
sys 0m0.308s

@RampedIndent

RampedIndent commented Jan 18, 2024

P.S. Maybe check your .kube/config file to see whether it's 1.8 million lines long like mine.

@adamyodinsky

rm -rf ~/.kube/cache

This helped me, along with removing some clutter from my kubeconfig file.
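
If the kubeconfig itself has grown huge (as mentioned in the comment above this one), a hedged way to slim it down is to export only the current context to a separate file and point KUBECONFIG at it:

# Keep only the entries needed for the current context, with credentials inlined.
kubectl config view --minify --flatten > ~/.kube/config-current
export KUBECONFIG=~/.kube/config-current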
