iSCSI storage on FreeBSD/FreeNAS: iSER, the block storage turbo
Performance on shared iSCSI storage is a must if you want to run many VMs on a Proxmox cluster.
Zeta Systems has examined in great detail how the individual protocols and implementations behave, and which performance can currently be expected from each of them:
http://www.zeta.systems/blog/2016/09/21/iSCSI-vs-iSER-vs-SRP-on-Ethernet-&-InfiniBand/
So it has to be iSER. Digging into it, you find that the technology has been around for a long time but apparently has gone somewhat unnoticed so far.
A presentation from 2008:
https://www.snia.org/sites/default/orig/sdc_archives/2008_presentations/wednesday/PeterDunlap_OpenSolaris_iSER.pdf
The InfiniBand basics on the FreeBSD wiki:
https://wiki.freebsd.org/InfiniBand
And here is a post from 2016 on the FreeBSD SCSI mailing list:
https://freebsd-scsi.freebsd.narkive.com/sQG1P9kO/help-to-make-iser-working
In it, Ben Rubson sums up the essentials:
Ben RUBSON:
Hi,
I went with a fresh new install to have a clean status.
What I did:
// FreeBSD 11-RC1 installation
# echo "WITH_OFED=YES" > /etc/make.conf
# cd /usr/src/ ; make buildkernel KERNCONF=GENERIC ; make installkernel KERNCONF=GENERIC
# cd /usr/src/sys/modules/mlx4 ; make ; make install
# cd /usr/src/sys/modules/mlxen ; make ; make install
# cd /usr/src/sys/modules/iser ; make ; make install
# echo "mlx4_load=YES" >> /boot/loader.conf
# echo "mlxen_load=YES" >> /boot/loader.conf
# echo "iser_load=YES" >> /boot/loader.conf
# reboot
Everything went fine, no code modition needed or whatever, perfect.
# iscsictl -A -t iqn.2012-06.com.test:target1 -p 192.168.2.2
On the target, I get:
tgtd[1912]: tgtd: login_start(502) Target iqn.2012-06.com.test:target1 is RDMA, but conn cid:0 from iqn.1994-09.org.freebsd:srv1 is TCP
Perfect, the target is correctly iSER enabled.
# iscsictl -A -t iqn.2012-06.com.test:target1 -p 192.168.2.2 -r
I still get the following error:
iscsid[1062]: 192.168.2.2 (iqn.2012-06.com.test:target1): failed to connect to 192.168.2.2 using ICL kernel proxy: ISCSIDCONNECT: Input/output error
kernel: INFO: iser_free_ib_conn_res: freeing conn 0xfffff80147771000 cma_id 0 qp 0
kernel: DEBUG: iser_conn_connect: before cv_wait: 0xfffff80147771000
kernel: INFO: iser_cma_handler: event 1 status -19 conn 0xfffff80147771000 id 0xfffff8005f0e5400
kernel: ERROR: iser_connect_error: conn 0xfffff80147771000
kernel: DEBUG: iser_conn_connect: after cv_wait: 0xfffff80147771000
iscsid[853]: child process 1062 terminated with exit status 1
Waiting for your instructions, would really be nice to have it working for 11 release.
Many thanks!
Best regards,
Ben
...
Ben RUBSON:
This was missing:
cd /usr/src/sys/modules/mlx4ib/ ; make ; make install ; kldload mlx4ib.ko
Now it works perfectly...
I'm really sorry for the useless noise on this list... I hope this above howto will help others.
Thank you again for your support, and sorry again :-/
Benchmarks will of course follow.
Best regards,
Ben
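Ben's working sequence, including the mlx4ib module that was missing at first, ends up as loader.conf entries. Below is a minimal POSIX sh sketch that adds these entries idempotently, so running it twice does not create duplicate lines. The helper name enable_module and the script's command-line usage are hypothetical. The module list comes from the thread, and the quoted "YES" follows the usual loader.conf(5) style.

```shell
#!/bin/sh
# Hypothetical helper: make sure <module>_load="YES" appears exactly once
# in a loader.conf-style file (safe to run repeatedly).
enable_module() {
    conf=$1
    mod=$2
    line="${mod}_load=\"YES\""
    # -x: whole-line match, -F: fixed string (no regex metacharacters)
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Modules from the mailing-list thread: mlx4 (core), mlxen (Ethernet),
# mlx4ib (InfiniBand verbs, the one that was missing) and iser.
# Only runs when a target file is given as the first argument.
if [ "$#" -gt 0 ]; then
    for m in mlx4 mlxen mlx4ib iser; do
        enable_module "$1" "$m"
    done
fi
```

Run it as root against the real file, for example "sh enable-iser.sh /boot/loader.conf" (the script name is made up). The modules are then loaded at the next boot.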
iSER now runs on FreeBSD, but not on FreeNAS.
A comment in the iXsystems Jira gives some hope, but nothing is certain:
Kris Moore added a comment - 11/Jan/20 2:53 PM We're considering some changes this year which may include down the road iSER support. Closing for now, but stay tuned later in 2020.
So it looks like they are working on it. Whether the feature will end up in TrueNAS Core or only in TrueNAS Enterprise, we don't know.
So for now our way forward is to implement it ourselves. We need it NOW, not in summer 2020.
Here is how the story continues.
In our tests, FreeNAS turned out not to be a viable option.
There are two problems. First, Mellanox support is insufficient. Second, OSS companies generally tend to move parts of the code to closed source and market them as an enterprise version.
Support revenue alone is not enough, so vendors try to establish recurring income to make their business more predictable.
But back to iSER.
Linux comes with excellent iSER support, and one of the leading distributions is already in use in-house anyway. So the obvious choice was to cover the remaining roles with it as well, in addition to our Proxmox VE cluster nodes: Proxmox iSER/iSCSI server, Proxmox Mail Gateway, Proxmox Backup Server.
We install Proxmox on a mirrored ZFS pool (ZFS RAID1) on two Intel NVMe SSD DC P4511 with 900 GB each. That would not have been necessary, but they were still lying on the shelf.
First step: wipe any leftover ZFS metadata, both at the end and at the beginning of the disks.
dd bs=512 if=/dev/zero of=/dev/nvme0n1 count=2048 seek=$(($(blockdev --getsz /dev/nvme0n1) - 2048))
dd bs=512 if=/dev/zero of=/dev/nvme0n1 count=2048
dd bs=512 if=/dev/zero of=/dev/nvme1n1 count=2048 seek=$(($(blockdev --getsz /dev/nvme1n1) - 2048))
dd bs=512 if=/dev/zero of=/dev/nvme1n1 count=2048
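The four dd calls can be wrapped in a function that also accepts a regular image file, so it can be tried out before touching real disks. This is a sketch, and the function name is hypothetical. It relies on ZFS keeping two of its four vdev labels in the first 512 KiB and two in the last 512 KiB of a device, so zeroing 1 MiB at each end covers them. The conv=notrunc option matters for regular files: without it, dd would truncate the file at the write position.

```shell
#!/bin/bash
# Zero the first and last 1 MiB (2048 x 512-byte sectors) of a disk or image
# file, where ZFS keeps its vdev labels. DESTRUCTIVE: check the name twice.
wipe_zfs_labels() {
    local dev=$1 sectors=2048 total
    if [ -b "$dev" ]; then
        total=$(blockdev --getsz "$dev")          # size in 512-byte sectors
    elif [ -f "$dev" ]; then
        total=$(( $(stat -c %s "$dev") / 512 ))   # regular file, e.g. a test image
    else
        echo "not a block device or file: $dev" >&2
        return 1
    fi
    if [ "$total" -lt $(( 2 * sectors )) ]; then
        echo "too small to wipe safely: $dev" >&2
        return 1
    fi
    # conv=notrunc: never truncate regular files (no effect on block devices)
    dd if=/dev/zero of="$dev" bs=512 count="$sectors" conv=notrunc status=none || return 1
    dd if=/dev/zero of="$dev" bs=512 count="$sectors" \
       seek=$(( total - sectors )) conv=notrunc status=none || return 1
}

# Usage: wipe-zfs-labels.sh /dev/nvme0n1 /dev/nvme1n1
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    for d in "$@"; do
        wipe_zfs_labels "$d" || exit 1
    done
fi
```

When the labels are still readable, "zpool labelclear -f <device>" is the ZFS-native alternative.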
After the initial installation, however, the system does not boot.
OK, so start the Proxmox VE installer in debug mode. It stops at several points, and each time you continue with exit. In the end you get a root shell with all the necessary tools.
A "zpool import" shows that the pool is online.
It can also be imported and mounted without any problems:
mkdir /mnt/rpool
zpool import -R /mnt/rpool rpool
zpool export rpool
The culprit here was the legacy BIOS boot mode.
The boot problem was solved by running "zpool destroy -f rpool", switching to UEFI and reinstalling Proxmox VE 6.4 with UEFI boot.
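Before reinstalling, it helps to confirm which mode the installer or rescue shell was actually booted in. Below is a small sketch with a made-up function name. It relies on the Linux kernel creating /sys/firmware/efi only when it was started by EFI firmware. The optional argument replaces /sys and exists only for testing.

```shell
#!/bin/bash
# Print "uefi" or "legacy" depending on how the running kernel was booted.
boot_mode() {
    local sysfs=${1:-/sys}
    # /sys/firmware/efi only exists after an EFI boot
    if [ -d "$sysfs/firmware/efi" ]; then
        echo uefi
    else
        echo legacy
    fi
}

if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    boot_mode
fi
```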
Base installation of the iSCSI/InfiniBand tools, the opensm InfiniBand subnet manager, the NFS server and targetcli-fb for managing iSER devices:
apt-get install apt-file net-tools opensm nfs-kernel-server rtr-tools scsitools targetcli-fb librdmacm1 ibacm infiniband-diags srptools vim ibutils ibverbs-providers ibverbs-utils rdmacm-utils perftest tgt open-iscsi lsscsi multipath-tools -y
apt-file update
Create /etc/rc.local with the following content. The file must be executable (chmod +x), otherwise systemd does not run it at boot:
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
# run at startup
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
/usr/local/bin/iser-performance-tuning.sh
exit 0
Create the script /usr/local/bin/iser-performance-tuning.sh and make it executable (chmod +x):
#!/bin/bash
target=pvecn0
verbose=0
messenger(){
if [ "$verbose" = "1" ]
then printf '%b\n' "$MSG"
fi
}
# iSER LIO server is target, clients are initiators
if [ "$(hostname)" = "$target" ]
then iSERtype="target"
else iSERtype="initiator"
fi
MSG="iSER type is:\t$iSERtype"
messenger
irq_affinity(){
#relevant for target and initiator
MSG="IRQ affinity setup..."
messenger
# irqbalance would overwrite the settings below; stop it first if it is installed
#systemctl stop irqbalance
IRQS=$(grep -E 'mlx4|mlx5' /proc/interrupts | awk '{print $1}' | sed 's/://')
ncores=$(grep -c processor /proc/cpuinfo)
i=0
# assign the Mellanox IRQs round-robin to CPU 0, 1, ..., ncores-1;
# smp_affinity_list takes a plain CPU number, so no hex mask is needed
for IRQ in $IRQS
do
echo "$i" > /proc/irq/$IRQ/smp_affinity_list
i=$((i + 1))
if [ "$i" = "$ncores" ]; then i=0
fi
done
}
cpu_performance_scaling(){
# relevant for target and initiator
MSG="CPU performance setup..."
messenger
# set the "performance" governor on every core that exposes cpufreq
for i in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor
do
[ -w "$i" ] || continue
echo performance > "$i"
done
}
block_layer_staging(){
# relevant for initiators only
MSG="Block layer staging..."
messenger
# Set the IO scheduler to no-operation.
# IO schedulers try to accelerate HDD access time by minimizing seeks.
# When working with SAN targets it is normally better to let the target machine do these optimizations if needed (normally a single LUN is not made of a single HDD...).
# In addition, SSDs do not suffer from seek time.
# On blk-mq kernels the no-op scheduler is called "none", on older kernels "noop".
# (The former 'echo "noop" >$i 2>&1>/dev/null' sent stdout to /dev/null and never wrote the file.)
for i in /sys/block/[a-z]*/queue/scheduler
do
echo none > "$i" 2>/dev/null || echo noop > "$i" 2>/dev/null
done
# Normally the block layer will try to merge IOs to consecutive offsets.
# On fast SAN networks it may be better not to merge, and save the CPU utilization.
for i in /sys/block/[a-z]*/queue/nomerges
do
echo 2 > "$i"
done
# The system uses physical devices to gather randomness for its random numbers generator.
# Can save some utilization by turning this off.
for i in /sys/block/[a-z]*/queue/add_random
do
echo 0 > "$i"
done
# Complete IO on a CPU in the same group as the one that submitted the request
# (2 would force completion on the very same CPU).
for i in /sys/block/[a-z]*/queue/rq_affinity
do
echo 1 > "$i"
done
}
set_nv_huge_pages(){
# relevant for targets only
MSG="Huge paging setup..."
messenger
# In case of a kernel space target such as LIO or SCST, set a low (or zero) number of HugePages (increases the page cache)
echo 0 > /proc/sys/vm/nr_hugepages
}
if [ "$iSERtype" = "target" ]
then
MSG="Running target optimizations..."
messenger
irq_affinity
cpu_performance_scaling
set_nv_huge_pages
fi
if [ "$iSERtype" = "initiator" ]
then
MSG="Running initiator optimizations..."
messenger
irq_affinity
cpu_performance_scaling
block_layer_staging
fi
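Two parts of the tuning script are easier to check in isolation. The following sketch uses hypothetical function names. First, it computes a round-robin IRQ-to-CPU plan for the kernel's /proc/irq/<n>/smp_affinity_list interface, which accepts plain CPU numbers and therefore needs no hand-built hex masks. Second, it picks the no-op scheduler from the list offered in /sys/block/<dev>/queue/scheduler. Blk-mq kernels, such as those shipped with Proxmox VE 6.x, call it "none"; older single-queue kernels call it "noop". A running irqbalance daemon may overwrite manually set affinities.

```shell
#!/bin/bash
# plan_irq_affinity NCPUS IRQ...
# Print "<irq> <cpu>" pairs, assigning IRQs to CPUs 0..NCPUS-1 round-robin.
plan_irq_affinity() {
    local ncpus=$1 i=0 irq
    shift
    for irq in "$@"; do
        echo "$irq $(( i % ncpus ))"
        i=$(( i + 1 ))
    done
}

# apply_irq_affinity: write the plan for all mlx4/mlx5 IRQs (needs root).
apply_irq_affinity() {
    local irqs irq cpu
    irqs=$(awk '/mlx4|mlx5/ { sub(":", "", $1); print $1 }' /proc/interrupts)
    # $irqs is unquoted on purpose: one argument per IRQ number
    plan_irq_affinity "$(nproc)" $irqs | while read -r irq cpu; do
        echo "$cpu" > "/proc/irq/$irq/smp_affinity_list"
    done
}

# pick_scheduler "LIST": choose "none" (blk-mq) or "noop" (legacy) from the
# content of a scheduler file, e.g. "[mq-deadline] kyber bfq none".
pick_scheduler() {
    local s
    local -a items
    read -ra items <<< "$1"        # split on whitespace without globbing
    for s in "${items[@]}"; do
        s=${s#\[}                  # strip the brackets marking the active one
        s=${s%\]}
        case $s in
            none|noop) echo "$s"; return 0 ;;
        esac
    done
    return 1
}
```

For example, block_layer_staging could use s=$(pick_scheduler "$(cat "$f")") && echo "$s" > "$f" for each scheduler file $f.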
Create /usr/local/bin/iscsi-autologin.sh:
#!/bin/bash
### enable autologin to iSCSI target ###
iscsiadm --mode node -T iqn.2020-03.de.pawott.pvecn0.hddpool -o update -n node.startup -v automatic
iscsiadm --mode node -T iqn.2020-03.de.pawott.pvecn0.ssdpool -o update -n node.startup -v automatic
cd /etc/iscsi
grep -R node.startup * | grep -v manual
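With more pools, the autologin lines can be generated from a list of target IQNs. The sketch below only prints the iscsiadm commands (a dry run), so they can be reviewed before being piped to a shell. The function name is hypothetical. Setting iface.transport_name to iser is a common way to make open-iscsi use iSER for a node record. Whether that is needed here depends on how the nodes were discovered, so treat the --iser option as an assumption.

```shell
#!/bin/bash
# autologin_cmds [--iser] IQN...
# Print (do not run) the iscsiadm commands that mark each node record for
# automatic login at boot; with --iser also switch the node to the iSER transport.
autologin_cmds() {
    local iser=0 iqn
    if [ "$1" = "--iser" ]; then
        iser=1
        shift
    fi
    for iqn in "$@"; do
        echo "iscsiadm --mode node -T $iqn -o update -n node.startup -v automatic"
        if [ "$iser" = 1 ]; then
            echo "iscsiadm --mode node -T $iqn -o update -n iface.transport_name -v iser"
        fi
    done
}

# Review the output first, then execute it, e.g.: ./autologin-cmds.sh | sh
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    autologin_cmds iqn.2020-03.de.pawott.pvecn0.hddpool \
                   iqn.2020-03.de.pawott.pvecn0.ssdpool
fi
```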
Generate the SSH keys id_rsa and id_rsa.pub and distribute them to all nodes, into
/root/.ssh/
/etc/pve/priv/authorized_keys
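Distributing the keys is easiest if appending a public key is idempotent. Below is a minimal sketch with a hypothetical function name. On a Proxmox VE cluster node, /root/.ssh/authorized_keys is normally a symlink to the cluster-wide /etc/pve/priv/authorized_keys, so one append there reaches every node.

```shell
#!/bin/bash
# add_pubkey_once PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Append the public key to an authorized_keys file unless it is already there.
add_pubkey_once() {
    local pub=$1 auth=$2 key
    key=$(cat "$pub") || return 1
    touch "$auth" || return 1
    # -x: whole-line match, -F: treat the key as a fixed string
    grep -qxF "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
}
```

Usage on a node: create a key pair only if none exists yet, with ssh-keygen -t rsa -b 4096 -N '' -f /root/.ssh/id_rsa. Then run add_pubkey_once /root/.ssh/id_rsa.pub /etc/pve/priv/authorized_keys.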
Settings for the remote shell in /root/.bashrc:
# ~/.bashrc: executed by bash(1) for non-login shells.
# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
# Note: PS1 and umask are already set in /etc/profile. You should not
# need this unless you want different defaults for root.
# PS1='${debian_chroot:+($debian_chroot)}\h:\w\$ '
# umask 022
# You may uncomment the following lines if you want `ls' to be colorized:
# export LS_OPTIONS='--color=auto'
# eval "`dircolors`"
alias ls='ls $LS_OPTIONS'
alias ll='ls $LS_OPTIONS -l'
alias l='ls $LS_OPTIONS -lA'
#
# Some more alias to avoid making mistakes:
# alias rm='rm -i'
# alias cp='cp -i'
# alias mv='mv -i'
if [ -f ~/.bash_aliases ]; then
. ~/.bash_aliases
fi
if [ -f /etc/bash.bashrc.local ]; then
. /etc/bash.bashrc.local
fi
Color codes for output, /etc/rc.status:
# Definition of boot script return messages
#
# The bootscripts should use the variables rc_done and rc_failed to
# report whether they failed or succeeded. See /etc/init.d/skeleton for
# an example how the shell functions rc_status and rc_reset are used.
#
# These functions make use of the variables rc_done and rc_failed;
# rc_done_up and rc_failed_up are the same as rc_done and rc_failed
# but contain a terminal code to move up one line before the output
# of the actual string. (This is particularly useful when the script
# starts a daemon which produces user output with a newline character)
#
# The variable rc_reset is used by the master resource control script
# /etc/init.d/rc to turn off all attributes and switch to the standard
# character set.
#
# \033 ascii ESCape
# \033[G move to column (linux console, xterm, not vt100)
# \033[C move columns forward but only upto last column
# \033[D move columns backward but only upto first column
# \033[A move rows up
# \033[B move rows down
# \033[1m switch on bold
# \033[31m switch on red
# \033[32m switch on green
# \033[33m switch on yellow
# \033[m switch off color/bold
# \017 exit alternate mode (xterm, vt100, linux console)
# \033[10m exit alternate mode (linux console)
# \015 carriage return (without newline)
if test -z "$LINES" -o -z "$COLUMNS" ; then
eval `exec 3<&1; stty size <&3 2>/dev/null | (read L C; \
echo LINES=${L:-24} COLUMNS=${C:-80})`
fi
test $LINES -eq 0 && LINES=24
test $COLUMNS -eq 0 && COLUMNS=80
export LINES COLUMNS
# Make sure we have /sbin and /usr/sbin in PATH
case $PATH in
*sbin*)
;;
*)
export PATH=/sbin:/usr/sbin:$PATH
;;
esac
if test -t 1 -a "$TERM" != "raw" -a "$TERM" != "dumb" && stty size <&1 > /dev/null 2>&1 ; then
esc=`echo -en "\033"`
extd="${esc}[1m"
warn="${esc}[1;31m"
done="${esc}[1;32m"
attn="${esc}[1;33m"
norm=`echo -en "${esc}[m\017"`
stat=`echo -en "\015${esc}[${COLUMNS}C${esc}[10D"`
rc_done="${stat}${done}done${norm}"
rc_running="${stat}${done}running${norm}"
rc_failed="${stat}${warn}failed${norm}"
rc_missed="${stat}${warn}missing${norm}"
rc_skipped="${stat}${attn}skipped${norm}"
rc_dead="${stat}${warn}dead${norm}"
rc_unused="${stat}${extd}unused${norm}"
rc_unknown="${stat}${attn}unknown${norm}"
rc_done_up="${esc}[1A${rc_done}"
rc_failed_up="${esc}[1A${rc_failed}"
rc_reset="${norm}${esc}[?25h"
rc_save="${esc}7${esc}[?25l"
rc_restore="${esc}8${esc}[?25h"
function rc_cuu () { test $1 -eq 0 && return; echo -en "\033[${1}A"; }
function rc_cud () { test $1 -eq 0 && return; echo -en "\033[${1}B"; }
function rc_timer_on () {
# Draw seconds of running timeout to column.
# Two arguments: timeout in seconds and offset
local n=$1
local c=$2
(trap "exit 0" TERM
while test $((n--)) -gt 0; do
sleep 1;
if test $n -gt 9 ; then
echo -en "${attn}\015${esc}[${c}C(${n}s)${norm} "
else
echo -en "${attn}\015${esc}[${c}C( ${n}s)${norm} "
fi
done) & _rc_timer_pid=$!
}
function rc_timer_off () {
if test -n "$_rc_timer_pid" ; then
kill -TERM $_rc_timer_pid > /dev/null 2>&1
fi
unset _rc_timer_pid
}
else
esc=""
extd=""
warn=""
done=""
attn=""
norm=""
stat=""
rc_done="..done"
rc_running="..running"
rc_failed="..failed"
rc_missed="..missing"
rc_skipped="..skipped"
rc_dead="..dead"
rc_unused="..unused"
rc_unknown="..unknown"
rc_done_up="${rc_done}"
rc_failed_up="${rc_failed}"
rc_reset=""
rc_save=""
rc_restore=""
function rc_cuu () { return; }
function rc_cud () { return; }
function rc_timer_on () { return; }
function rc_timer_off () { return; }
fi
_rc_service=${0##*/[SK][0-9][0-9]}
_rc_status=0
_rc_status_all=0
_rc_todo=$1
function rc_check ()
{
_rc_status_ret=$?
test $_rc_status_ret -eq 0 || _rc_status=$_rc_status_ret
test $_rc_status -eq 0 || _rc_status_all=$_rc_status
return $_rc_status_ret
}
function rc_reset ()
{
_rc_status=0
_rc_status_all=0
rc_check
return 0
}
if test "$_rc_todo" = "status" ; then
function rc_status ()
{
rc_check
_rc_status_ret=$_rc_status
local i
for i ; do
case "$i" in
-v|-v[1-9]|-v[1-9][0-9])
local vrt=""
local out=1
local opt="en"
test -n "${i#-v}" && vrt="$vrt${esc}[${i#-v}A" || opt="e"
case "$_rc_status" in
0) vrt="$vrt$rc_running"; ;; # service running
1) vrt="$vrt$rc_dead" ; out=2 ;; # service dead (but has pid file)
2) vrt="$vrt$rc_dead" ; out=2 ;; # service dead (but has lock file)
3) vrt="$vrt$rc_unused" ; ;; # service not running
4) vrt="$vrt$rc_unknown"; ;; # status is unknown
esac
echo -$opt "$rc_save$vrt$rc_restore" 1>&$out
# reset _rc_status to 0 after verbose case
_rc_status=0 ;;
-r) rc_reset ;;
-s) echo -e "$rc_skipped" ; rc_failed 3 ;;
-u) echo -e "$rc_unused" ; rc_failed 3 ;;
*) echo "rc_status: Usage: [-v[] [-r]|-s|-u]" 1>&2 ; return 0 ;;
esac
done
return $_rc_status_ret
}
elif test -n "$_rc_todo" ; then
function rc_status ()
{
rc_check
test "$_rc_status" -gt 7 && rc_failed 1
_rc_status_ret=$_rc_status
case "$_rc_todo" in
stop)
# program is not running which
# is success if we stop service
test "$_rc_status" -eq 7 && rc_failed 0 ;;
esac
local i
for i ; do
case "$i" in
-v|-v[1-9]|-v[1-9][0-9])
local vrt=""
local out=1
local opt="en"
test -n "${i#-v}" && vrt="$vrt${esc}[${i#-v}A" || opt="e"
case "$_rc_status" in
0) vrt="$vrt$rc_done" ; ;; # success
1) vrt="$vrt$rc_failed" ; out=2 ;; # generic or unspecified error
2) vrt="$vrt$rc_failed" ; out=2 ;; # invalid or excess args
3) vrt="$vrt$rc_missed" ; out=2 ;; # unimplemented feature
4) vrt="$vrt$rc_failed" ; out=2 ;; # insufficient privilege
5) vrt="$vrt$rc_skipped"; out=2 ;; # program is not installed
6) vrt="$vrt$rc_unused" ; out=2 ;; # program is not configured
7) vrt="$vrt$rc_failed" ; out=2 ;; # program is not running
*) vrt="$vrt$rc_failed" ; out=2 ;; # unknown (maybe used in future)
esac
echo -$opt "$rc_save$vrt$rc_restore" 1>&$out
# reset _rc_status to 0 after verbose case
_rc_status=0 ;;
-r) rc_reset ;;
-s) echo -e "$rc_skipped" 1>&2 ; rc_failed 5 ;;
-u) echo -e "$rc_unused" 1>&2 ; rc_failed 6 ;;
*) echo "rc_status: Usage: [-v[] [-r]|-s|-u]" 1>&2 ; return 0 ;;
esac
done
return $_rc_status_ret
}
else
function rc_status ()
{
rc_check
_rc_status_ret=$_rc_status
local i
for i ; do
case "$i" in
-v|-v[1-9]|-v[1-9][0-9])
local vrt=""
local out=1
local opt="en"
test -n "${i#-v}" && vrt="$vrt${esc}[${i#-v}A" || opt="e"
case "$_rc_status" in
0) vrt="$vrt$rc_done" ; ;; # success
*) vrt="$vrt$rc_failed"; out=2 ;; # failed
esac
echo -$opt "$rc_save$vrt$rc_restore" 1>&$out
# reset _rc_status to 0 after verbose case
_rc_status=0 ;;
-r) rc_reset ;;
-s) echo -e "$rc_skipped" ; return 0 ;;
-u) echo -e "$rc_unused" ; return 0 ;;
*) echo "rc_status: Usage: [-v[] [-r]|-s|-u]" 1>&2 ; return 0 ;;
esac
done
return $_rc_status_ret
}
fi
function rc_failed ()
{
rc_reset
case "$1" in
[0-7]) _rc_status=$1 ;;
"") _rc_status=1
esac
rc_check
return $_rc_status
}
function rc_exit ()
{
exit $_rc_status_all
}
function rc_confirm()
{
local timeout="30"
local answer=""
local ret=0
case "$1" in
-t) timeout=$2; shift 2 ;;
esac
local message="$@, (Y)es/(N)o/(C)ontinue? [y] "
: ${REDIRECT:=/dev/tty}
while true ; do
read -t ${timeout} -n 1 -p "${message}" answer < $REDIRECT > $REDIRECT 2>&1
case "$answer" in
[yY]|"") ret=0; break ;;
[nN]) ret=1; break ;;
[cC]) ret=2; break ;;
*) echo; continue
esac
done
echo
return $ret
}
function rc_active ()
{
local x
for x in /etc/init.d/*.d/S[0-9][0-9]${1} ; do
test -e $x || break
return 0
done
return 1
}
function rc_splash()
{
return 0
}
System-wide local settings in /etc/bash.bashrc.local:
# bash completion for the `wp` command
_wp_complete() {
local OLD_IFS="$IFS"
local cur=${COMP_WORDS[COMP_CWORD]}
IFS=$'\n'; # want to preserve spaces at the end
local opts="$(wp cli completions --line="$COMP_LINE" --point="$COMP_POINT")"
if [[ "$opts" =~ \<file\>\s* ]]
then
COMPREPLY=( $(compgen -f -- $cur) )
elif [[ $opts = "" ]]
then
COMPREPLY=( $(compgen -f -- $cur) )
else
COMPREPLY=( ${opts[*]} )
fi
IFS="$OLD_IFS"
return 0
}
complete -o nospace -F _wp_complete wp
if [ "$PS1" ]; then
PS1='\u@\h:\w\$ '
ROTH="\[\033[1;31m\]"
ROT="\[\033[0;31m\]"
GRUEN="\[\033[0;32m\]"
BLAU="\[\033[0;34m\]"
YELLOW="\[\033[1;33m\]"
NOCOLOR="\[\033[0m\]"
[ "$UID" == "0" ] && USRCLR="$ROT\\u$NOCOLOR" || USRCLR="\\u"
PS1="${ROTH}Remote: $GRUEN\$(date +%H:%M:%S)h$ROTH@$YELLOW\h ${BLAU}[$GRUEN$USRCLR$ROTH@$ROTH${debian_chroot:+($debian_chroot)}$GRUEN\w${BLAU}]\n$NOCOLOR#"
shopt -s checkwinsize
# You may uncomment the following lines if you want `ls' to be colorized:
export LS_OPTIONS='--color=auto'
eval "`dircolors`"
alias ls='ls $LS_OPTIONS'
alias ll='ls $LS_OPTIONS -l'
alias l='ls $LS_OPTIONS -lA'
alias mce='mcedit'
if [ -f /etc/bash_completion ]; then
. /etc/bash_completion
fi
if [ -f /etc/bash.aliases ]; then
. /etc/bash.aliases
fi
export LANG=de_DE.UTF-8
. /etc/rc.status
echo -en "\033[1m"
echo "${done}"
echo "UPTIME / SYSINFOS :"
echo ""
echo "___________________________________________________________________________"
echo ""
uptime
echo -en "\033[1m"
echo "___________________________________________________________________________"
echo "${norm}"
/usr/local/bin/checkspace
echo ""
echo -en "\033[1m"
fi
if [ $UID != 0 ]
then
echo ""
echo -en "\033[1m"
echo -en "\033[34m"
cal
echo -en "${done}"
else
echo -en "\033[0;31m"
echo "_________________________"
echo "| |"
echo "| root weiss was er tut! |"
echo "|________________________|"
echo ""
fi
export HISTTIMEFORMAT='%F %T '
export HISTSIZE=150000
export HISTFILESIZE=150000
export LANG=de_DE.UTF-8
export LANGUAGE=
export LC_CTYPE="de_DE.UTF-8"
export LC_NUMERIC="de_DE.UTF-8"
export LC_TIME="de_DE.UTF-8"
export LC_COLLATE="de_DE.UTF-8"
export LC_MONETARY="de_DE.UTF-8"
export LC_MESSAGES="de_DE.UTF-8"
export LC_PAPER="de_DE.UTF-8"
export LC_NAME="de_DE.UTF-8"
export LC_ADDRESS="de_DE.UTF-8"
export LC_TELEPHONE="de_DE.UTF-8"
export LC_MEASUREMENT="de_DE.UTF-8"
export LC_IDENTIFICATION="de_DE.UTF-8"
export LC_ALL="de_DE.UTF-8"
Adjust the locale:
sed -i "s/# de_DE.UTF-8 UTF-8/de_DE.UTF-8 UTF-8/" /etc/locale.gen
locale-gen
If the following message appears at SSH login:
Offending key for IP in /home/${user}/.ssh/known_hosts:220
or
Offending key for IP in /etc/ssh/ssh_known_hosts:8
do the following:
sed -i '220d' ~/.ssh/known_hosts
or
sed -i '8d' /etc/ssh/ssh_known_hosts
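The file and line number can also be taken straight from the warning. Below is a sketch with a hypothetical function name. It parses "<file>:<line>" from the last word of the message and deletes exactly that line with GNU sed. "ssh-keygen -R <host>" is the standard alternative, which removes all entries for a host.

```shell
#!/bin/bash
# remove_offending_key "MESSAGE"
# Parse "... in <file>:<line>" from an ssh "Offending key" warning and delete
# that line from the file.
remove_offending_key() {
    local ref file line
    ref=${1##* }            # last word, e.g. /etc/ssh/ssh_known_hosts:8
    file=${ref%:*}
    line=${ref##*:}
    case $line in
        ''|*[!0-9]*) echo "no line number in: $1" >&2; return 1 ;;
    esac
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    sed -i "${line}d" "$file"
}
```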
Required kernel modules on the client (ib_isert would not be needed there, but is loaded anyway):
mlx4_en 118784 0
mlx4_ib 196608 0
ib_uverbs 126976 32 mlx4_ib,rdma_ucm
ib_core 311296 13 rdma_cm,ib_ipoib,rpcrdma,mlx4_ib,ib_srp,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,ib_cm
mlx4_core 307200 2 mlx4_ib,mlx4_en
Required kernel modules on the server:
mlx4_en 118784 0
mlx4_ib 196608 0
ib_uverbs 126976 32 mlx4_ib,rdma_ucm
ib_core 311296 13 rdma_cm,ib_ipoib,rpcrdma,mlx4_ib,ib_srp,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,ib_cm
mlx4_core 307200 2 mlx4_ib,mlx4_en
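Whether all of these modules are actually loaded can be checked against /proc/modules, the same data lsmod prints. Below is a sketch with a hypothetical function name and a module list taken from the listings above. The optional second argument exists only for testing.

```shell
#!/bin/bash
# missing_modules "MODULE..." [MODULES_FILE]
# Print every listed module that does not appear in /proc/modules;
# return 1 if at least one is missing.
missing_modules() {
    local wanted=$1 file=${2:-/proc/modules} m rc=0
    for m in $wanted; do
        # the first column of /proc/modules is the module name
        if ! awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$file"; then
            echo "$m"
            rc=1
        fi
    done
    return $rc
}
```

For example, missing_modules "mlx4_core mlx4_ib mlx4_en ib_core ib_uverbs ib_iser ib_isert" prints nothing and returns 0 when all modules are present.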
Network interfaces in /etc/network/interfaces, with a separate subnet for each Mellanox InfiniBand port:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback
iface eno3 inet manual
iface eno1 inet manual
iface eno4 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
#SFP+ Bond
auto vmbr1
iface vmbr1 inet static
address 10.40.2.100/24
bridge-ports eno3
bridge-stp off
bridge-fd 0
# gateway 10.40.2.1
#Proxmox INFRASTRUCTURE/FAILOVER1
auto vmbr0
iface vmbr0 inet static
address 10.40.0.100/24
bridge-ports bond0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
#Bonding Bridge VMs
auto vmbr1:0
iface vmbr1:0 inet static
address 192.168.0.100/24
gateway 192.168.0.1
#Temporary IP alias
auto vmbr2
iface vmbr2 inet static
address 10.40.7.100/24
bridge-ports eno4
bridge-stp off
bridge-fd 0
#Proxmox FAILOVER2
auto ibp3s0
iface ibp3s0 inet static
address 10.40.10.100/24
pre-up echo connected > /sys/class/net/ibp3s0/mode
post-up /usr/sbin/ip link set ibp3s0 mtu 4092
post-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
post-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
post-up echo 1 > /proc/sys/net/ipv4/conf/ibp3s0/arp_ignore
auto ibp4s0
iface ibp4s0 inet static
address 10.40.20.100/24
pre-up echo connected > /sys/class/net/ibp4s0/mode
post-up /usr/sbin/ip link set ibp4s0 mtu 4092
post-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
post-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
post-up echo 1 > /proc/sys/net/ipv4/conf/ibp4s0/arp_ignore
auto ibp3s0d1
iface ibp3s0d1 inet static
address 10.40.30.100/24
pre-up echo connected > /sys/class/net/ibp3s0d1/mode
post-up /usr/sbin/ip link set ibp3s0d1 mtu 4092
post-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
post-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
post-up echo 1 > /proc/sys/net/ipv4/conf/ibp3s0d1/arp_ignore
auto ibp4s0d1
iface ibp4s0d1 inet static
address 10.40.40.100/24
pre-up echo connected > /sys/class/net/ibp4s0d1/mode
post-up /usr/sbin/ip link set ibp4s0d1 mtu 4092
post-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
post-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
post-up echo 1 > /proc/sys/net/ipv4/conf/ibp4s0d1/arp_ignore
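The four IPoIB stanzas differ only in interface name and address. Below is a sketch of a generator with a hypothetical function name. It prints one stanza in exactly the form used above, so additional ports can be added consistently. The default MTU of 4092 is copied from the configuration above.

```shell
#!/bin/bash
# ipoib_stanza IFACE ADDRESS/PREFIX [MTU]
# Print an /etc/network/interfaces stanza for an IPoIB port in connected mode.
ipoib_stanza() {
    local ifc=$1 addr=$2 mtu=${3:-4092}
    cat <<EOF
auto $ifc
iface $ifc inet static
address $addr
pre-up echo connected > /sys/class/net/$ifc/mode
post-up /usr/sbin/ip link set $ifc mtu $mtu
post-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
post-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
post-up echo 1 > /proc/sys/net/ipv4/conf/$ifc/arp_ignore
EOF
}

if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    ipoib_stanza ibp3s0   10.40.10.100/24
    ipoib_stanza ibp4s0   10.40.20.100/24
    ipoib_stanza ibp3s0d1 10.40.30.100/24
    ipoib_stanza ibp4s0d1 10.40.40.100/24
fi
```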
Entries in /etc/hosts on all cluster nodes, to avoid unnecessary DNS lookups:
127.0.0.1 localhost.localdomain localhost
10.40.2.100 pvecn0.pawott.de pvecn0
10.40.2.101 pvecn1.pawott.de pvecn1
10.40.2.102 pvecn2.pawott.de pvecn2
10.40.2.103 pvecn3.pawott.de pvecn3
10.40.2.104 pvecn4.pawott.de pvecn4
10.40.2.105 pvecn5.pawott.de pvecn5
10.40.2.106 pvecn6.pawott.de pvecn6
10.40.10.100 iscsi-portal.pawott.de iscsi-portal
10.40.20.100 iscsi-portal.pawott.de iscsi-portal
#10.40.30.100 iscsi-portal1.pawott.de iscsi-portal1
#10.40.20.100 iscsi-portal2.pawott.de iscsi-portal2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Create /etc/modules-load.d/mellanox.conf to load the modules at boot:
mlx4_core
mlx4_ib
mlx4_en
ib_core
#ib_addr
#ib_sa
ib_cm
rdma_cm
rdma_ucm
ib_iser
#ib_isert
## IP over Infiniband
ib_ipoib
ib_umad
#ib_mad
rpcrdma
ib_srp
ib_isert
iw_cm
ib_uverbs
svcrdma
xprtrdma
Enable iSCSI over RDMA in /etc/rdma/modules/rdma.conf:
# These modules are loaded by the system if any RDMA devices is installed
# iSCSI over RDMA client support
ib_iser
# iSCSI over RDMA target support
ib_isert
# User access to RDMA verbs (supports libibverbs)
ib_uverbs
# User access to RDMA connection management (supports librdmacm)
rdma_ucm
# RDS over RDMA support
# rds_rdma
# NFS over RDMA client support
xprtrdma
# NFS over RDMA server support
svcrdma
Port type configuration (both ports of every Mellanox card in InfiniBand mode):
for i in $(lspci | grep Mellanox | awk '{print $1}'); { echo "0000:$i ib ib" >>/etc/rdma/mlx4.conf; }
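The loop above appends on every run, so repeated execution leaves duplicate lines in /etc/rdma/mlx4.conf. The sketch below instead builds the file content from "lspci -D" output. The -D option prints the PCI domain, so the "0000:" prefix is not hard-coded. The function name is hypothetical, and the function reads lspci output on stdin so it can be tested without hardware. Each output line has the form "<PCI address> <port1 type> <port2 type>".

```shell
#!/bin/bash
# mlx4_conf_from_lspci [TYPE]
# Read "lspci -D" output on stdin and print one mlx4.conf line per Mellanox
# device, setting both ports to TYPE (default: ib).
mlx4_conf_from_lspci() {
    local type=${1:-ib}
    awk -v t="$type" '/Mellanox/ { print $1, t, t }'
}

# Overwrite (instead of append) the config file:
#   lspci -D | mlx4_conf_from_lspci ib > /etc/rdma/mlx4.conf
```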
Disable i40iw devices via /etc/modprobe.d/mlx4.conf:
echo "blacklist i40iw" >> /etc/modprobe.d/mlx4.conf
Enable the NFS server exports in /etc/exports:
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
#
/hddpool/NFS/ISO-Images 10.40.2.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/ISO-Images 10.40.2.50(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/ISO-Images 10.40.10.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/ISO-Images 10.40.20.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/ISO-Images 10.40.30.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/ISO-Images 10.40.40.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/Snapshots 10.40.2.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/Snapshots 10.40.2.50(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/Snapshots 10.40.10.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/Snapshots 10.40.20.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/Snapshots 10.40.30.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
/hddpool/NFS/Snapshots 10.40.40.0/24(rw,no_root_squash,no_all_squash,sync,no_subtree_check)
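The twelve export lines are the product of two paths and six clients. Below is a sketch of a generator that prints them. The function name is hypothetical, and the values are copied from above. After editing /etc/exports, "exportfs -ra" re-exports everything without restarting the NFS server.

```shell
#!/bin/bash
# exports_lines OPTIONS PATHS CLIENTS
# Print one /etc/exports line per (path, client) pair.
exports_lines() {
    local opts=$1 paths=$2 clients=$3 p c
    for p in $paths; do
        for c in $clients; do
            echo "$p $c($opts)"
        done
    done
}

if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    exports_lines "rw,no_root_squash,no_all_squash,sync,no_subtree_check" \
        "/hddpool/NFS/ISO-Images /hddpool/NFS/Snapshots" \
        "10.40.2.0/24 10.40.2.50 10.40.10.0/24 10.40.20.0/24 10.40.30.0/24 10.40.40.0/24"
fi
```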