======
Ubuntu
======

Ubuntu is a GNU/Linux distribution derived from Debian, with some
proprietary components added. It is used in many Top500 clusters,
and tinygrad uses it on the tinybox.

`<https://ubuntu.com>`_

Documentation
=============

Ubuntu Server docs:

`<https://ubuntu.com/server/docs>`_

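Install
=======

Download Ubuntu and install it.
The upstream tinybox runs Ubuntu 22.04 LTS, so running the same release
is probably a good idea.

`<https://ubuntu.com/download/server>`_

`<https://releases.ubuntu.com/22.04.3/ubuntu-22.04.3-live-server-amd64.iso>`_

Write the ISO to a USB drive. Make sure the output device is correct:
``dd`` overwrites whatever it is pointed at without asking.

.. code-block:: sh

   # Identify the USB drive first (e.g. with lsblk), then replace /dev/sdXX
   sudo dd if=ubuntu-22.04.3-live-server-amd64.iso of=/dev/sdXX bs=16M status=progress oflag=sync
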
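
Before writing the image, it is worth checking the download against the
``SHA256SUMS`` file published in the same release directory as the ISO.
A minimal sketch, assuming ``SHA256SUMS`` has already been downloaded;
``verify_iso`` is a hypothetical helper, not an Ubuntu tool:

.. code-block:: sh

   # Hypothetical helper: check an ISO against an Ubuntu SHA256SUMS file.
   # Usage: verify_iso PATH_TO_ISO [PATH_TO_SHA256SUMS]
   verify_iso() {
       local iso=$1 sums=${2:-SHA256SUMS} name
       name=$(basename "$iso")
       # Keep only the line for this ISO. SHA256SUMS marks binary files as
       # "HASH *NAME"; plain "HASH  NAME" lines are matched as well.
       awk -v f="$name" '$2 == f || $2 == "*" f' "$sums" |
           (cd "$(dirname "$iso")" && sha256sum -c -)
   }

For example, ``verify_iso ubuntu-22.04.3-live-server-amd64.iso`` run in the
download directory should print ``ubuntu-22.04.3-live-server-amd64.iso: OK``.
If no line matches, ``sha256sum`` finds nothing to check and fails.
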
Configuration
=============

Set up, perhaps as follows:

* SSH keys.

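
For SSH keys, the usual route is ``ssh-copy-id user@host`` from a
workstation. When the public key file is already on the machine, a minimal
sketch of installing it by hand follows; ``add_authorized_key`` is a
hypothetical helper and assumes a ``.pub`` file holding a single key:

.. code-block:: sh

   # Hypothetical helper: append one public key to authorized_keys, once,
   # with owner-only permissions (sshd's StrictModes rejects group- or
   # world-writable key files).
   # Usage: add_authorized_key KEY.pub [SSH_DIR]
   add_authorized_key() {
       local key ssh_dir auth
       key=$(cat "$1") || return 1
       ssh_dir=${2:-$HOME/.ssh}
       auth="$ssh_dir/authorized_keys"
       mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
       touch "$auth" && chmod 600 "$auth" || return 1
       # -x: whole-line match, -F: fixed string, so a key is never added twice.
       grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
   }

Running it twice with the same key leaves a single line in
``authorized_keys``.
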
Packages
========

Update the system and install packages from the Ubuntu repositories.

.. code-block:: sh

   # Use IPv4 for apt
   echo 'Acquire::ForceIPv4 "true";' | sudo tee /etc/apt/apt.conf.d/99force-ipv4
   # Send apt's HTTP traffic through the LAN apt cache
   echo 'Acquire::http::Proxy "http://192.168.1.1:3142";' | sudo tee /etc/apt/apt.conf.d/90cache
   # The proxy above only covers http:// sources; switch extra repos to http so they are cached too
   sudo sed -i -e 's/https:/http:/g' /etc/apt/sources.list.d/*.list
   sudo apt update
   sudo apt dist-upgrade
   sudo apt install bc bison build-essential ccache cmake-curses-gui colordiff \
       cpufrequtils devscripts dpkg-dev equivs flex gfortran git haveged host \
       libbz2-dev libdrm-dev libedit-dev libegl1-mesa-dev libelf-dev libffi-dev \
       libhdf5-openmpi-dev liblzma-dev libncurses-dev libnuma-dev \
       libopenmpi-dev libpomp2-dev libsqlite3-dev libssl-dev libsystemd-dev \
       libudev-dev libxml2-dev libxml2-utils libz3-dev libzstd-dev lshw \
       lzma-dev mesa-common-dev net-tools ninja-build nlohmann-json3-dev \
       ntpsec-ntpdate nvme-cli ocl-icd-opencl-dev openmpi-bin pahole pkg-config \
       portaudio19-dev python3-argcomplete python3-pip python3-pygments \
       python3-venv python3-virtualenv python3-yaml quilt rsync rsyslog sshfs \
       sudo swig traceroute vim xxd python3-sphinx git-lfs hwdata \
       lua5.3 liblua5.3-dev libmpfr-dev libmsgpack-dev libfmt-dev \
       environment-modules python3-numpy pybind11-dev libopengl-dev zip zsh \
       hpcc gawk googletest libdw-dev libgtest-dev libsigsegv2 \
       libbabeltrace-dev libbabeltrace1 libbison-dev libncurses5-dev \
       libtext-unidecode-perl tex-common texinfo ucx-utils libucx-dev \
       librdmacm-dev

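
With ``90cache`` in place, ``apt`` fails to fetch anything when the machine
is off the tinyrocs LAN. A minimal sketch, using a hypothetical
``use_apt_proxy`` helper, that writes the proxy setting only when the cache
port accepts a connection:

.. code-block:: sh

   # Hypothetical helper: enable the apt proxy only if HOST:PORT accepts a TCP
   # connection within 2 seconds; otherwise remove the setting.
   # The probe uses bash's /dev/tcp, so bash must be installed (it is on Ubuntu).
   # Usage (as root): use_apt_proxy HOST PORT [CONF_FILE]
   use_apt_proxy() {
       local conf=${3:-/etc/apt/apt.conf.d/90cache}
       if timeout 2 bash -c ': < "/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
           printf 'Acquire::http::Proxy "http://%s:%s";\n' "$1" "$2" > "$conf"
       else
           echo "apt proxy $1:$2 not reachable; removing $conf" >&2
           rm -f "$conf"
           return 1
       fi
   }

From a root shell, ``use_apt_proxy 192.168.1.1 3142`` reproduces the
``90cache`` file above when the cache is up. apt also has
``Acquire::http::Proxy-Auto-Detect``, which runs a command to discover the
proxy; the one-shot check is simpler but has to be re-run when the network
changes.
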
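OS Configuration
----------------

Operating system configuration.

.. code-block:: sh

   # Lazy sudo: members of the sudo group run sudo without a password.
   # Note: editing /etc/sudoers with sed bypasses visudo's syntax check.
   sudo sed -i -e 's/%sudo\tALL=(ALL:ALL) ALL/%sudo ALL=(ALL) NOPASSWD: ALL/g' /etc/sudoers

* After all packages are installed, add the user (here ``debian``) to
  these groups:

  .. code-block:: sh

     sudo adduser debian audio
     sudo adduser debian dialout
     sudo adduser debian kvm
     sudo adduser debian render
     sudo adduser debian video

* Disable various services started at boot:

  .. code-block:: sh

     sudo systemctl disable XXX
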
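
A broken ``/etc/sudoers`` can lock everyone out of ``sudo``. A safer sketch
of the same lazy-sudo rule puts it in a drop-in under ``/etc/sudoers.d``
(included by Ubuntu's default ``/etc/sudoers``) and validates it first;
``nopasswd_dropin`` and the file name ``90-nopasswd`` are hypothetical:

.. code-block:: sh

   # Hypothetical helper: write "%GROUP ALL=(ALL) NOPASSWD: ALL" to DEST,
   # validating it with visudo (when available) before installing it 0440.
   # sudo skips /etc/sudoers.d files whose names contain a '.', so avoid one.
   # Usage (as root): nopasswd_dropin sudo /etc/sudoers.d/90-nopasswd
   nopasswd_dropin() {
       local group=$1 dest=$2 tmpf
       tmpf=$(mktemp) || return 1
       printf '%%%s ALL=(ALL) NOPASSWD: ALL\n' "$group" > "$tmpf"
       if command -v visudo >/dev/null 2>&1 && ! visudo -c -q -f "$tmpf"; then
           echo "sudoers syntax check failed; $dest not written" >&2
           rm -f "$tmpf"
           return 1
       fi
       install -m 0440 "$tmpf" "$dest"
       local rc=$?
       rm -f "$tmpf"
       return $rc
   }

Afterwards, ``sudo visudo -c`` re-checks the whole sudoers configuration,
including the new drop-in.
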
User Configuration
==================

Set up the user account.
Configure it to use the caching services already available in the cluster.

ccache
------

There is a ``redis`` ``ccache`` server on the tinyrocs network, at
``192.168.1.2``. Edit ``~/.config/ccache/ccache.conf`` as follows:

.. code-block::

   remote_storage = redis://192.168.1.2
   remote_only = true
   reshare = true

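
The same settings can be written from a shell, for example when setting up
several accounts. A minimal sketch; ``write_ccache_conf`` is a hypothetical
helper, and its optional argument overrides the target directory:

.. code-block:: sh

   # Hypothetical helper: write the tinyrocs ccache settings shown above.
   # Defaults to ~/.config/ccache (or $XDG_CONFIG_HOME/ccache when set).
   # Usage: write_ccache_conf [CONF_DIR]
   write_ccache_conf() {
       local dir=${1:-${XDG_CONFIG_HOME:-$HOME/.config}/ccache}
       mkdir -p "$dir" || return 1
       printf '%s\n' \
           'remote_storage = redis://192.168.1.2' \
           'remote_only = true' \
           'reshare = true' > "$dir/ccache.conf"
   }

``ccache -p`` (``--show-config``) then prints the settings ccache actually
picked up, and ``ccache -s`` shows hit statistics after a build. With the
``redis-tools`` package installed, ``redis-cli -h 192.168.1.2 ping`` should
answer ``PONG`` when the server is reachable.
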
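PATH
----

Add the ROCm binary path and ccache (XXX) to ``~/.bashrc``:

.. code-block:: sh

   # /usr/lib/ccache holds compiler-named symlinks (gcc, g++, cc and so on)
   # that run the compiler through ccache, so it must come first.
   export PATH=/usr/lib/ccache:/opt/rocm/bin:$PATH
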
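
``~/.bashrc`` runs for every interactive shell, and nested shells inherit
``PATH``, so the line above prepends the same directories again each time.
A minimal sketch of an idempotent variant; ``path_prepend`` is a
hypothetical helper:

.. code-block:: sh

   # Hypothetical helper: put DIR at the front of PATH unless it is already
   # somewhere in PATH, so re-sourcing ~/.bashrc does not grow PATH.
   path_prepend() {
       case ":$PATH:" in
           *":$1:"*) ;;                       # already present: leave PATH alone
           *) PATH="$1${PATH:+:$PATH}" ;;
       esac
       export PATH
   }

   # Prepend in reverse order so /usr/lib/ccache ends up first.
   path_prepend /opt/rocm/bin
   path_prepend /usr/lib/ccache

A directory that is already present is not moved to the front, so this
assumes nothing earlier in the login sequence adds ``/opt/rocm/bin``
behind ``/usr/lib/ccache``.
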
Python pip cache
----------------

If the LAN ``pip`` cache ``pydev`` is available, use it by editing
``~/.config/pip/pip.conf``, for example as below. ``trusted-host`` is
needed because the cache is served over plain HTTP.

.. code-block:: ini

   [global]
   trusted-host = 192.168.1.3
   index-url = http://192.168.1.3:4040/root/pypi/+simple/

   [search]
   index = http://192.168.1.3:4040/root/pypi/

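
Since the cache is only meant to be used if available, a sketch that
writes ``pip.conf`` only when the mirror answers may help. ``use_pip_mirror``
is hypothetical; it assumes the ``/root/pypi/+simple/`` path shown above and
that ``curl`` is installed:

.. code-block:: sh

   # Hypothetical helper: point pip at the LAN mirror only if its simple index
   # answers within 2 seconds. Needs curl.
   # Usage: use_pip_mirror HOST [PORT] [CONF_DIR]
   use_pip_mirror() {
       local host=$1 port=${2:-4040}
       local dir=${3:-${XDG_CONFIG_HOME:-$HOME/.config}/pip}
       local base="http://$host:$port/root/pypi/"
       if ! curl -fsS --max-time 2 -o /dev/null "${base}+simple/"; then
           echo "pip mirror ${base} not reachable; pip.conf unchanged" >&2
           return 1
       fi
       mkdir -p "$dir" || return 1
       printf '%s\n' \
           '[global]' \
           "trusted-host = $host" \
           "index-url = ${base}+simple/" \
           '' \
           '[search]' \
           "index = $base" > "$dir/pip.conf"
   }

``use_pip_mirror 192.168.1.3`` reproduces the file above when the mirror is
up, and ``pip config list`` shows which values pip picked up. For a one-off
shell, pip also reads ``PIP_INDEX_URL`` and ``PIP_TRUSTED_HOST`` from the
environment.
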
ROCm
====

ROCm for Ubuntu 22.04 (jammy), installed with AMD's ``amdgpu-install``
package for ROCm 6.0.2.

.. code-block:: sh

   # Kernel headers (for building amdgpu-dkms) and extra modules for the running kernel
   sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
   # This package adds AMD's amdgpu and ROCm apt repositories
   wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
   sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
   sudo apt update
   sudo apt install amdgpu-dkms
   sudo apt install rocm-hip-libraries
   # A reboot here loads the newly built amdgpu kernel module
   # sudo reboot
   sudo apt install rocm-hip-sdk rocm-ml-sdk rocm-opencl-sdk rocm-openmp-sdk \
       rocm-bandwidth-test rocm-clang-ocl amdgpu-dkms-headers rocm \
       llvm-amdgpu llvm-amdgpu-runtime rocm-dkms rocm-dev rocm-libs \
       rocm-khronos-cts rocm-ocltst rocm-validation-suite \
       smi-lib-amdgpu smi-lib-amdgpu-dev \
       libstdc++-12-dev python-is-python3 \
       vulkan-amdgpu libvulkan-dev libvulkan-volk-dev vulkan-tools \
       vulkan-validationlayers-dev glslang-dev glslang-tools
   # sudo apt purge --autoremove libc6-dev-i386 libc6-dev-x32
   sudo apt install gcc-multilib

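
After the reboot, a quick sanity check is that the user is in the
``render`` and ``video`` groups added earlier and that ``rocminfo`` lists
the GPU. The check below reads the group database, so it passes right after
``adduser``; the running session only picks up new groups at the next
login. A minimal sketch; ``missing_groups`` is a hypothetical helper:

.. code-block:: sh

   # Hypothetical helper: print each GROUP that USER is not a member of.
   # Returns non-zero if any group is missing.
   # Usage: missing_groups USER GROUP...
   missing_groups() {
       local user=$1 g rc=0
       shift
       for g in "$@"; do
           if ! id -nG "$user" | tr ' ' '\n' | grep -qxF -- "$g"; then
               echo "$user is not in group $g"
               rc=1
           fi
       done
       return $rc
   }

For example, ``missing_groups debian render video && rocminfo`` prints
nothing from the helper when both groups are set, and ``rocminfo`` should
then list a GPU agent whose name starts with ``gfx``.
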
Misc
====

More tweaks.

.. code-block:: sh

   # Disable services that are not needed on these machines
   sudo systemctl disable ModemManager.service nvmefc-boot-connections.service \
       nvmf-autoconnect.service open-iscsi.service ubuntu-advantage.service \
       ufw.service unattended-upgrades.service update-notifier-download.timer \
       update-notifier-motd.timer \
       apport-autoreport.path apport-autoreport.timer apport-forward.socket \
       apt-daily.timer apt-daily-upgrade.timer fwupd-refresh.timer \
       remote-fs.target iscsid.socket motd-news.timer \
       ua-reboot-cmds.service ua-timer.timer
   # nvtop: top-like monitor for GPU usage
   sudo snap install nvtop
   # Kernel command line: set this in /etc/default/grub, then run sudo update-grub
   GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 selinux=0 apparmor=0"
   # Grow the root logical volume and its filesystem to 500G
   sudo lvresize --resizefs -L 500G /dev/ubuntu-vg/ubuntu-lv

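
The ``GRUB_CMDLINE_LINUX_DEFAULT`` line only takes effect after
``sudo update-grub`` and a reboot. A minimal sketch of making the edit to
``/etc/default/grub`` non-interactively; ``set_grub_cmdline`` is a
hypothetical helper, and it replaces any existing value rather than
appending to it:

.. code-block:: sh

   # Hypothetical helper: set GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults
   # file, replacing the existing line or appending one if there is none.
   # VALUE must not contain '|', '&' or '\', which are special to sed here.
   # Usage (as root): set_grub_cmdline /etc/default/grub "ipv6.disable=1 selinux=0 apparmor=0"
   set_grub_cmdline() {
       local file=$1 value=$2
       if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file"; then
           sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$value\"|" "$file"
       else
           printf 'GRUB_CMDLINE_LINUX_DEFAULT="%s"\n' "$value" >> "$file"
       fi
   }

After ``sudo update-grub`` and a reboot, ``cat /proc/cmdline`` shows the
parameters the running kernel was booted with.
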

* XXX Disable the sound card.
* XXX Long wait for the network to be configured ... XXX
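
On the network wait: Ubuntu Server's netplan normally hands interfaces to
systemd-networkd, and ``systemd-networkd-wait-online.service`` holds up boot
until the interfaces it manages are configured, so an unplugged port can
stall it until the timeout. Assuming that is the cause here
(``systemd-analyze blame`` should show the service near the top), marking
the port ``optional: true`` in netplan or passing ``--any`` to the wait
service are common workarounds. A minimal sketch of the latter as a drop-in;
``wait_online_any`` and the file name ``10-any.conf`` are hypothetical:

.. code-block:: sh

   # Hypothetical helper: add a drop-in so systemd-networkd-wait-online
   # returns once any interface is online instead of waiting for all of them.
   # Usage (as root): wait_online_any [UNIT_DIR]; then: systemctl daemon-reload
   wait_online_any() {
       local dir=${1:-/etc/systemd/system}/systemd-networkd-wait-online.service.d
       mkdir -p "$dir" || return 1
       # The empty ExecStart= clears the packaged command before setting a new one.
       printf '%s\n' \
           '[Service]' \
           'ExecStart=' \
           'ExecStart=/lib/systemd/systemd-networkd-wait-online --any' \
           > "$dir/10-any.conf"
   }
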