US ATLAS Federated Operations Team Information
The US ATLAS Federated Operations (FedOps) Team supports the operation of centrally-managed, containerized services via SLATE across the U.S. ATLAS complex. The team is made up of sysadmins from all the U.S. sites, who coordinate service deployments, monitor service performance, and provide first-line operational support.
Information for Operations Shifters
Not yet available.
Information for Sites
Setting Up a New Site
Helpful documentation for setting up a new site
The main contents of this section are based on (copied from) this Google document written by Lincoln Bryant and Ilija Vukotic:
https://docs.google.com/document/d/1NOc7EyOZpNTKwlXU_lP7wbiKHpwVqmZXNCYCGlOaXuI
Also included is information from the SLATE project documentation website:
https://slateci.io/docs/cluster/
The intention of this section is to provide a single document containing a comprehensive set of instructions to bring a new bare-metal SLATE server online running a squid service that works with both ATLAS and OSG jobs. The resulting squid instance can be used for both Frontier and CVMFS. The installation procedure has been carefully checked on a test server set up specifically for validating the procedures documented here. Please report documentation issues to <xxxx@bnl.gov>.
Steps to set up a SLATE-based squid service on a new server
The steps in the process are:
- Set up a new server with Linux and whatever the local environment requires.
- Install SLATE.
- Install Kubernetes and configure a cluster.
- Set up squid.
- Add the new squid to the ATLAS CRIC and the OSG topology.
- Modify the site gatekeepers to know about the new squid.
- Set up SLATE monitoring if desired.
The attached file, slate-install-doc-v1.txt, shows the bash commands and their outputs for steps 2 and 3. This file was created on a special test server, and there are places where one would need to modify the commands to refer to the server being set up rather than the test environment.
The following instructions assume root access on the target server.
1. Set up the new server with Linux
While you can use a virtual machine, use of a bare metal server is preferred. The server should meet these minimum requirements to run a squid service:
- 16GB RAM
- 2 CPU cores
- 100GB Disk
- 1Gbps Connectivity
- Port 3401/udp for external WLCG monitoring
- Port 32200/tcp for client access (e.g. within the site)
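As a convenience, the hardware minimums above can be checked with a short script before going further. This is an illustrative sketch of ours, not an official FedOps tool; adjust the mount point if squid's cache will live on a different filesystem.

```shell
#!/bin/sh
# Preflight check against the minimums listed above (16GB RAM, 2 CPU
# cores, 100GB disk). Hypothetical helper; warns rather than aborting.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
cores=$(nproc)
disk_gb=$(df -k --output=size / | awk 'NR==2 {print int($1/1048576)}')

warn() { echo "WARNING: $*" 1>&2; }

[ "$mem_kb"  -ge 16000000 ] || warn "less than 16GB RAM"
[ "$cores"   -ge 2 ]        || warn "fewer than 2 CPU cores"
[ "$disk_gb" -ge 100 ]      || warn "less than 100GB on /"
echo "preflight check finished"
```

The network and firewall requirements (1Gbps, ports 3401/udp and 32200/tcp) still need to be confirmed by hand against the local site setup.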
NB: If the server will also run an XCache service then the hardware requirements are significantly higher - see: xxx. In this case, reserve one disk for exclusive use by the squid disk cache and remove it from the disk array servicing XCache.
Set this server up with CentOS 7 (or equivalent) and the usual local environment (accounts, firewalls, system management tools, etc.). It is always a good idea to reboot the server before proceeding to the next step.
2. Install SLATE
Following the SLATE installation documentation, install SLATE and then Kubernetes. NB: The # character normally used for the root command prompt has been replaced by the % character to avoid a formatting issue in what is displayed by Drupal. Also, many commands have their outputs suppressed to improve readability. See the attached file slate-install-doc-v1.txt, which contains a session run on a test server showing the full output of the commands.
If you do not have a SLATE account, create one and obtain a SLATE token for your new server. <More instructions needed>. Once you have the account, join the SLATE group for your site, or create a group for your site if it is new.
Using the SLATE token that you created when registering with SLATE in place of the token shown (d43ZtGaODDEjulKrbvlzXe), create the file slate-token.sh containing:
#!/bin/sh
mkdir -p -m 0700 "$HOME/.slate"
if [ "$?" -ne 0 ] ; then
    echo "Not able to create $HOME/.slate" 1>&2
    exit 1
fi
echo "d43ZtGaODDEjulKrbvlzXe" > "$HOME/.slate/token"
if [ "$?" -ne 0 ] ; then
    echo "Not able to write token data to $HOME/.slate/token" 1>&2
    exit 1
fi
chmod 600 "$HOME/.slate/token"
echo 'https://api.slateci.io:443' > "$HOME/.slate/endpoint"
echo "SLATE access token successfully stored"
The suggested location for slate-token.sh is /root, but it can be anywhere on the system. The token itself will be stored in $HOME/.slate/token (i.e., /root/.slate/token when run as root). Now execute the slate-token.sh script:
[root@iut2-slate01 ~]% chmod 755 slate-token.sh
[root@iut2-slate01 ~]% ./slate-token.sh
SLATE access token successfully stored
Next download the SLATE tarball and check that the download is not corrupted:
[root@server ~]% curl -LO https://jenkins.slateci.io/artifacts/client/slate-linux.sha256
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 85 100 85 0 0 292 0 --:--:-- --:--:-- --:--:-- 293
[root@server ~]% curl -LO https://jenkins.slateci.io/artifacts/client/slate-linux.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1896k 100 1896k 0 0 6585k 0 --:--:-- --:--:-- --:--:-- 6607k
[root@server ~]% sha256sum -c slate-linux.sha256
slate-linux.tar.gz: OK
See the attached file slate-install-doc-v1.txt for the full messages generated by running the curl commands.
If you do not get the final "slate-linux.tar.gz: OK" message after retrying, you are likely receiving a corrupted or compromised version of SLATE. Do not proceed; ask for help.
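For sites that script their installs, the checksum step can be wrapped in a small helper so that a bad download stops the procedure automatically. The function name is ours (a sketch, not part of the SLATE tooling); the file names are those used in the transcript.

```shell
#!/bin/sh
# verify_or_abort: run sha256sum -c on the downloaded checksum file and
# refuse to continue on a mismatch. Illustrative sketch only.
verify_or_abort() {
    # $1 = checksum file, e.g. slate-linux.sha256
    if sha256sum -c "$1"; then
        echo "checksum OK - safe to unpack"
    else
        echo "checksum FAILED - do not unpack this tarball; ask for help" 1>&2
        return 1
    fi
}
```

For example, run "verify_or_abort slate-linux.sha256" immediately after the two curl commands, and only untar on success.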
Now install SLATE on the server:
[root@server ~]% tar xzvf slate-linux.tar.gz
slate
[root@server ~]% ls -ltr
total 5992
[Lines removed]
-rwxr-xr-x 1 1000 1000 4123632 Jul 30 14:37 slate
-rwxr-xr-x 1 root root 410 Aug 4 10:35 slate-token.sh
-rw-r--r-- 1 root root 85 Aug 4 10:36 slate-linux.sha256
-rw-r--r-- 1 root root 1941755 Aug 4 10:37 slate-linux.tar.gz
[root@server ~]% mv slate /usr/local/bin/slate
This completes the SLATE installation. To check that SLATE is working, list the SLATE clusters:
[root@server ~]% slate cluster list
Name Admin ID
Rice-CRC-OCI rice-crc cluster_wRzlo7q62VM
atlas-af-proto mwt2 cluster_CwuDuKE43GA
[continues displaying info about more clusters]
3. Install Kubernetes (K8s) and configure a cluster
This section is based on the SLATE documentation with modifications to support the standard US ATLAS squid setup.
The first step is to disable SELinux, turn off swapping, and turn off the local firewall:
[root@server ~]% setenforce 0
setenforce: SELinux is disabled
[root@server ~]% sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
[root@server ~]% swapoff -a
[root@server ~]% sed -e '/swap/s/^/#/g' -i /etc/fstab
[root@server ~]% systemctl disable --now firewalld
[root@server ~]% cat <<EOF > /etc/sysctl.d/k8s.conf
> net.bridge.bridge-nf-call-ip6tables = 1
> net.bridge.bridge-nf-call-iptables = 1
> EOF
[root@server ~]% sysctl --system
[SLATE-install-doc-v1.txt shows the output]
Now use yum to install packages that Kubernetes requires:
[root@server ~]% yum install yum-utils -y
[Usual yum output]
[root@server ~]% yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
[Usual yum output]
[root@server ~]% yum install docker-ce docker-ce-cli containerd.io -y
docker-ce-stable
[Usual yum output]
[root@server ~]% systemctl enable --now docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@server ~]% cat <<EOF > /etc/yum.repos.d/kubernetes.repo
> [kubernetes]
> name=Kubernetes
> baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
> enabled=1
> gpgcheck=1
> repo_gpgcheck=1
> gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
> EOF
[root@server ~]% yum install -y kubeadm kubectl kubelet --disableexcludes=kubernetes
kubernetes/signature
[Usual yum output]
Next enable Kubelet using systemctl:
[root@server ~]% systemctl enable --now kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
[root@server ~]% kubeadm init --pod-network-cidr=192.168.0.0/16
[Many lines of output shown in slate-install-doc-v1.txt]
[root@server ~]% export KUBECONFIG=/etc/kubernetes/admin.conf
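Note that the export above only lasts for the current shell session. One way to make it permanent is to append it to root's shell rc file; the helper below is a sketch of ours (the function name and the assumption that root's login shell reads ~/.bashrc are local conventions, not part of the SLATE instructions), written so that repeated runs do not add duplicate lines.

```shell
#!/bin/sh
# persist_kubeconfig: append the KUBECONFIG export to a shell rc file,
# but only once (idempotent). Hypothetical helper; adapt locally.
persist_kubeconfig() {
    # $1 = rc file, e.g. /root/.bashrc
    grep -qs 'KUBECONFIG=/etc/kubernetes/admin.conf' "$1" ||
        echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> "$1"
}
```

For example: persist_kubeconfig /root/.bashrc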
[root@server ~]% kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
[Many lines of output shown in SLATE-install-doc-v1.txt]
[root@server ~]% kubectl get nodes
NAME STATUS ROLES AGE VERSION
server.example.edu Ready control-plane,master 5h52m v1.21.3
[root@server ~]% kubectl taint nodes --all node-role.kubernetes.io/master-
node/server.example.edu untainted
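The node can take a few minutes to reach the Ready state after kubeadm init. A small parser like the one below (a hypothetical helper of ours, not part of SLATE or Kubernetes) can drive a wait loop around the output of "kubectl get nodes --no-headers":

```shell
#!/bin/sh
# all_nodes_ready: succeed only if every line of `kubectl get nodes
# --no-headers` output has a STATUS column beginning with Ready.
# Illustrative sketch only.
all_nodes_ready() {
    # $1 = output of: kubectl get nodes --no-headers
    ! echo "$1" | awk '{print $2}' | grep -qv '^Ready'
}
```

For example: until all_nodes_ready "$(kubectl get nodes --no-headers)"; do sleep 10; done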
The next step is to install the MetalLB load balancer, which here is used for assigning service IP addresses rather than for load balancing. NB: A load balancer must be installed even for clusters with a single master node and no worker nodes. First install the MetalLB package and then set up the configuration, being sure to change the example code to reflect the proper IP address range:
[root@server ~]% kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
[Usual output from applying a yaml file]
[root@iut2-slate01 ~]% cat <<EOF > metallb-config.yaml
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   namespace: metallb-system
>   name: config
> data:
>   config: |
>     address-pools:
>     - name: default
>       protocol: layer2
>       addresses:
>       - 149.165.224.242/32
> EOF
[root@iut2-slate01 ~]% kubectl apply -f metallb-config.yaml
configmap/config created
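Since the address pool must be edited for each site, it is easy to mistype the CIDR. A quick syntax check before applying the ConfigMap can catch that; the helper below is a hypothetical sketch of ours (not a MetalLB tool), and it checks only the format, not whether the range actually belongs to your site.

```shell
#!/bin/sh
# valid_cidr: crude format check for an IPv4 CIDR such as the
# 149.165.224.242/32 used in the example above. Sketch only.
valid_cidr() {
    echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$'
}
```

For example, "valid_cidr 149.165.224.242/32" succeeds, while a bare IP with no prefix length fails.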