JDOS (JD Datacenter Operation System) is a very large-scale container cluster system running in JD's datacenters across the world. It was designed and developed on top of Kubernetes. Today, almost all of JD's business is deployed and running on JDOS, and the number of containers in JD's production environment has reached the millions. Managing clusters at this scale is a challenging problem for JDOS developers and operators, yet JD has only two full-time SREs to manage them. This presentation will share experience in the following areas: 1. Detection and management of node components; 2. Fault detection and failure recovery for master components, especially the etcd nodes; 3. How to significantly reduce apiserver requests in order to build a much larger k8s cluster.