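The Tudou operations team calls this setup "the poor man's Rolls-Royce". Heh! I have always used ZXTM here, but some special business requirements led me to try this architecture. I used the Tudou ops team's article as a reference, but there is very little related material online and what exists is vague, so I wrote this post.
1 For what these two packages (Spread and Wackamole) do and how they compare with ZXTM, LVS and others, see the Tudou team's post: http://blog.ops.tudou.com/wp/?p=188
2 Before installing:
Note: I also want to stress how important /etc/hosts is. Configure the IPs and hostnames you plan to use before installing, because spread needs a hostname at startup. Unlike the Tudou article, though, I don't think the name has to come from `uname -n`; more on that below.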
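Since spread looks its nodes up by hostname, a quick pre-flight check of the hosts file can save some debugging. This is only a minimal sketch: `hosts_has_entry` is a hypothetical helper name (not part of spread), and the IP/hostname pair in the usage comment is the one from my test environment.

```shell
#!/bin/sh
# hosts_has_entry FILE IP NAME   (hypothetical helper, not part of spread)
# Succeeds if FILE maps IP to NAME on a non-comment line.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        /^[[:space:]]*#/ { next }          # skip full-line comments
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example:
#   hosts_has_entry /etc/hosts 192.168.9.160 test00.dongwm.com || echo "missing entry"
```

Run it once per node on every machine, so that all three servers agree on the same name-to-IP mapping.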
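3 Installing spread:
I also chose version 4.0.0, because I hit a lot of problems when I first tried 4.1.0. You are welcome to try 4.1.0 yourself and leave me a comment.
tar zxvf spread-src-4.0.0.tar.gz && cd spread-src-4.0.0 && ./configure && make && make install
4 Installing wackamole:
Download page: http://www.cnds.jhu.edu/download/download_wackamole.cgi (fill in the requested information, then click download).
Note: three problems may come up along the way:
1 Invalid configuration `x86_64-unknown-linux-gnu': machine `x86_64-unknown' not recognized
Fix: overwrite two files in this directory with the system's libtool copies (config.sub, and presumably config.guess in the same way):
cp /usr/share/libtool/config.sub .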
2 checking size of char... configure: error: cannot compute sizeof (char)
Fix: add the lib directory of the installed spread to LD_LIBRARY_PATH. Mine was empty, so I simply assigned it:
#export LD_LIBRARY_PATH=/usr/local/lib (this is the default lib install directory)
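If your LD_LIBRARY_PATH is not empty, a plain assignment would throw away what is already there. Below is a small sketch of appending instead; `path_append` is a hypothetical helper name, and /usr/local/lib is the default directory mentioned above.

```shell
#!/bin/sh
# path_append CURRENT DIR   (hypothetical helper)
# Prints CURRENT with DIR appended, unless DIR is already one of its entries.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;       # DIR already present
        "::")     printf '%s\n' "$2" ;;       # CURRENT was empty
        *)        printf '%s\n' "$1:$2" ;;    # append to existing value
    esac
}

LD_LIBRARY_PATH=$(path_append "${LD_LIBRARY_PATH:-}" /usr/local/lib)
export LD_LIBRARY_PATH
```

Run it in the same shell session as wackamole's ./configure, since the variable only lives in that shell.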
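3 This one shows up later. When starting wackamole you may see: Starting wackamole.../usr/local/sbin/wackamole: error while loading shared libraries: libspread.so: cannot open shared object file: No such file or directory [FAILED]
Fix: this is probably because ldconfig was not run after installing spread. If that does not help, use locate to find the directory that holds the lib file, add that directory to /etc/ld.so.conf, and run ldconfig again.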
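The ld.so.conf step can be made safe to repeat. This is a sketch only: `add_lib_dir` is a hypothetical helper name, and the paths in the usage comment assume the default /usr/local/lib install location.

```shell
#!/bin/sh
# add_lib_dir CONF DIR   (hypothetical helper)
# Appends DIR to CONF only if no line of CONF is exactly DIR, so running it
# twice does not create duplicate entries.
add_lib_dir() {
    [ -f "$1" ] || touch "$1" || return 1
    grep -qxF "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Intended use as root:
#   add_lib_dir /etc/ld.so.conf /usr/local/lib && ldconfig
#   ldconfig -p | grep libspread     # the library should now be listed
```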
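For the init scripts you can look at the original Tudou blog post, but it has HTML entities mixed into the code and the spread script is buggy: both starting and killing the process have problems. I don't know whether others have run into this, but I will paste my improved scripts at the end.
5 How the configuration works:
My test environment:
CentOS 5.5
What I want to achieve:
Three real IPs (192.168.9.160, 192.168.9.161, 192.168.9.162) share three virtual IPs (192.168.9.109, 192.168.9.112, 192.168.9.113); under normal conditions each real IP holds one virtual IP.
When a machine fails, its virtual IP automatically "floats" to another machine.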
1 Configuring spread:
spread.conf mainly lists the real IP and hostname of each machine in the group that will share the virtual IPs. Here is my configuration.
First, my hosts file:
#vi /etc/hosts
192.168.9.160 test00.dongwm.com
192.168.9.161 test01.dongwm.com
192.168.9.162 test02.dongwm.com
#vi spread.conf
DaemonGroup = spread
DaemonUser = spread
EventLogFile = /usr/local/etc/spreadlog_%h.log
EventPriority = ERROR
Spread_Segment 192.168.255.255:4803 {
    test00.dongwm.com 192.168.9.160
    test01.dongwm.com 192.168.9.161
    test02.dongwm.com 192.168.9.162
} # this is the broadcast form; there is also a multicast form
Note: this process must be running on every machine
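Because the segment block is just the hosts entries with the columns swapped, it can be generated rather than typed by hand on each node. This is a rough sketch: `spread_segment` is a hypothetical helper, and it assumes a file that lists only the cluster nodes in hosts format (not the full /etc/hosts, which also contains localhost).

```shell
#!/bin/sh
# spread_segment BROADCAST PORT NODES_FILE   (hypothetical helper)
# Prints a Spread_Segment block with one "name ip" line per node entry.
spread_segment() {
    printf 'Spread_Segment %s:%s {\n' "$1" "$2"
    # node lines are "IP NAME"; spread.conf wants "NAME IP".
    # Blank lines and comment lines are skipped.
    awk '!/^[[:space:]]*(#|$)/ { printf "    %s %s\n", $2, $1 }' "$3"
    printf '}\n'
}

# Example:
#   spread_segment 192.168.255.255 4803 nodes.txt >> spread.conf
```

Generating the block from one shared file keeps the three copies of spread.conf identical, which matters because every daemon must see the same segment definition.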
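2 Configuring wackamole
#vi wackamole.conf
The other nodes then connect like this: Spread = 4803@server.dongwm.com
SpreadRetryInterval = 5s
Group = test # this works like a distributed messaging system: once you join the group you can hear everyone in it. The program for trying this out is spuser, where j means join and l means leave; worth exploring if you are interested
Control = /var/run/wack.it
Prefer None # a way to state which node should preferably hold an address. Our workload does not need it, so it is not set; see the PDF documentation on the official site for how to set it
VirtualInterfaces {
    { eth0:192.168.9.109/32 }
    { eth0:192.168.9.112/32 }
    { eth0:192.168.9.113/32 }
} # these are the IPs to virtualize
Arp-Cache = 90s
Notify {
    eth0:192.168.8.1/32 # this is your router's address; very important
    arp-cache
}
balance {
    AcquisitionsPerRound = all
    interval = 4s
}
mature = 5s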
6 Starting the services and checking the logs:
1 Create the spread user; skip this step if you configured a different user
2 Create the /var/run/spread/ directory
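The two preparation steps above might look like the sketch below. The `run` and `prepare_spread` helpers are hypothetical, the useradd options are my assumption of a reasonable system account, and DRY_RUN is set so that the block only prints what it would do; clear it and run as root to apply the changes.

```shell
#!/bin/sh
# run CMD...   (hypothetical helper)
# With DRY_RUN set, print the command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prepare_spread USER RUNDIR   (hypothetical helper)
prepare_spread() {
    # create the account only if it does not exist yet
    id "$1" >/dev/null 2>&1 || run useradd -r -s /sbin/nologin "$1"
    run mkdir -p "$2"
    run chown "$1" "$2"
}

DRY_RUN=1
prepare_spread spread /var/run/spread
```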
Start spread
Note: as before, I start this process on every machine
/etc/init.d/spread start
Check the listening ports:
tcp 0 0 0.0.0.0:4803 0.0.0.0:* LISTEN 18318/spread
udp 0 0 0.0.0.0:4803 0.0.0.0:* 18318/spread
udp 0 0 0.0.0.0:4804 0.0.0.0:* 18318/spread
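The same check can be scripted. This sketch assumes the listing above comes from something like `netstat -anp` (the exact command is my assumption) and uses a hypothetical helper, `spread_is_listening`, that reads those lines on stdin.

```shell
#!/bin/sh
# spread_is_listening PORT   (hypothetical helper)
# Reads netstat -anp style lines on stdin; succeeds if a spread process
# holds a TCP socket in LISTEN state on PORT.
spread_is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") && $7 ~ /\/spread$/ { ok = 1 }
        END { exit ok ? 0 : 1 }
    '
}

# Intended use on a node (root is needed to see process names):
#   netstat -anp | spread_is_listening 4803 && echo "spread is up"
```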
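Start wackamole:
Check the log:
It reports that the virtual IP interface has been brought up
Once all three servers are running:
Run ifconfig
You will see that one VIP has landed on each server: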
eth0      Link encap:Ethernet  HWaddr 00:50:56:91:00:1B
          inet addr:192.168.9.162  Bcast:192.168.255.255  Mask:255.255.0.0
          inet6 addr: fe80::250:56ff:fe91:1b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1054714 errors:0 dropped:0 overruns:0 frame:0
          TX packets:356497 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:123799512 (118.0 MiB)  TX bytes:94783259 (90.3 MiB)
eth0:1    Link encap:Ethernet  HWaddr 00:50:56:91:00:1B
          inet addr:192.168.9.112  Bcast:192.168.9.112  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

root@test00:~$ ifconfig
          inet addr:192.168.9.160  Bcast:192.168.255.255  Mask:255.255.0.0
          inet6 addr: fe80::250:56ff:fe91:13/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:966721 errors:0 dropped:0 overruns:0 frame:0
          TX packets:343226 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:182954295 (174.4 MiB)  TX bytes:67258649 (64.1 MiB)
eth0:3    Link encap:Ethernet  HWaddr 00:50:56:91:00:13
          inet addr:192.168.9.113  Bcast:192.168.9.113  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0      Link encap:Ethernet  HWaddr 00:50:56:91:00:15
          inet addr:192.168.9.161  Bcast:192.168.255.255  Mask:255.255.0.0
          inet6 addr: fe80::250:56ff:fe91:15/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:869456 errors:0 dropped:0 overruns:0 frame:0
          TX packets:162884 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:161753343 (154.2 MiB)  TX bytes:40910624 (39.0 MiB)
eth0:1    Link encap:Ethernet  HWaddr 00:50:56:91:00:15
          inet addr:192.168.9.109  Bcast:192.168.9.109  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
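Reading three full ifconfig dumps by eye gets tedious, so here is a small sketch that prints only the alias interfaces and their addresses. `list_vips` is a hypothetical helper and it assumes the classic net-tools output format shown above.

```shell
#!/bin/sh
# list_vips   (hypothetical helper)
# Reads classic ifconfig output on stdin and prints "alias ip" for every
# alias interface, that is, a name containing ":" such as eth0:1.
list_vips() {
    awk '
        $2 == "Link" { iface = ($1 ~ /:/) ? $1 : "" }   # remember alias names only
        iface != "" && $1 == "inet" && $2 ~ /^addr:/ {
            sub(/^addr:/, "", $2)
            print iface, $2
            iface = ""
        }
    '
}

# Intended use on each node:
#   ifconfig | list_vips
```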
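7 Experiment
How it works: I wrote a script that checks whether certain processes and services on the local machine are misbehaving. If something is wrong, the script runs a command to stop the wackamole process on that machine; once the processes and services recover, it runs a command to start wackamole again.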
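My actual monitoring script is not reproduced in this post, so what follows is only a sketch of the rule just described, not the real thing. `next_action` and `watch_once` are hypothetical names, and the health check in the example comment (`pgrep -x httpd`) is a placeholder for whatever service you care about.

```shell
#!/bin/sh
# next_action HEALTHY WACK_RUNNING   (each argument is 1 or 0)
# Prints "stop", "start" or "none" according to the rule described above.
next_action() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 1 ]; then
        echo stop       # service broken: release the VIPs
    elif [ "$1" -eq 1 ] && [ "$2" -eq 0 ]; then
        echo start      # service recovered: rejoin the group
    else
        echo none
    fi
}

# watch_once CHECK_CMD INIT_SCRIPT
# CHECK_CMD is any command line that exits 0 when the local service is healthy.
watch_once() {
    healthy=0
    running=0
    sh -c "$1" >/dev/null 2>&1 && healthy=1
    pgrep -x wackamole >/dev/null 2>&1 && running=1
    action=$(next_action "$healthy" "$running")
    [ "$action" = none ] || "$2" "$action"
}

# Example loop (placeholder check; replace it with your own):
#   while :; do watch_once "pgrep -x httpd" /etc/init.d/wackamole; sleep 2; done
```

Keeping the decision in a separate function makes it easy to test without touching a live node.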
Here I simulate a failure, and the script kills the process:
Stopping wackamole... [ OK ]
Run ifconfig:
eth0      Link encap:Ethernet  HWaddr 00:50:56:91:00:1B
          inet addr:192.168.9.162  Bcast:192.168.255.255  Mask:255.255.0.0
          inet6 addr: fe80::250:56ff:fe91:1b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1059629 errors:0 dropped:0 overruns:0 frame:0
          TX packets:359237 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:124517323 (118.7 MiB)  TX bytes:95293463 (90.8 MiB)
Running ifconfig on the other two servers shows:
eth0      Link encap:Ethernet  HWaddr 00:50:56:91:00:13
          inet addr:192.168.9.160  Bcast:192.168.255.255  Mask:255.255.0.0
          inet6 addr: fe80::250:56ff:fe91:13/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:972018 errors:0 dropped:0 overruns:0 frame:0
          TX packets:345715 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:184164965 (175.6 MiB)  TX bytes:67683923 (64.5 MiB)
eth0:1    Link encap:Ethernet  HWaddr 00:50:56:91:00:13
          inet addr:192.168.9.112  Bcast:192.168.9.112  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth0:3    Link encap:Ethernet  HWaddr 00:50:56:91:00:13
          inet addr:192.168.9.113  Bcast:192.168.9.113  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
^.^ Success, the VIP floated over
Now let's look at the HA delay. The whole time, I had ping 192.168.9.112 running on another server:
64 bytes from 192.168.9.112: icmp_seq=145 ttl=64 time=0.333 ms
64 bytes from 192.168.9.112: icmp_seq=146 ttl=64 time=0.414 ms
64 bytes from 192.168.9.112: icmp_seq=147 ttl=64 time=0.346 ms
64 bytes from 192.168.9.112: icmp_seq=148 ttl=64 time=0.373 ms
64 bytes from 192.168.9.112: icmp_seq=149 ttl=64 time=0.333 ms
64 bytes from 192.168.9.112: icmp_seq=150 ttl=64 time=0.313 ms
64 bytes from 192.168.9.112: icmp_seq=151 ttl=64 time=0.323 ms
64 bytes from 192.168.9.112: icmp_seq=152 ttl=64 time=0.324 ms
64 bytes from 192.168.9.112: icmp_seq=153 ttl=64 time=0.432 ms
64 bytes from 192.168.9.112: icmp_seq=154 ttl=64 time=0.510 ms
64 bytes from 192.168.9.112: icmp_seq=155 ttl=64 time=0.348 ms
64 bytes from 192.168.9.112: icmp_seq=156 ttl=64 time=0.303 ms
64 bytes from 192.168.9.112: icmp_seq=157 ttl=64 time=0.383 ms
64 bytes from 192.168.9.112: icmp_seq=158 ttl=64 time=0.365 ms
Look, no pause: the icmp_seq numbers run from 145 to 158 without a gap.
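For a longer test it is easier to count dropped replies than to read the sequence numbers by eye. This is a minimal sketch with a hypothetical helper, `ping_gaps`, that reads standard Linux ping output on stdin.

```shell
#!/bin/sh
# ping_gaps   (hypothetical helper)
# Reads ping output on stdin and prints how many icmp_seq numbers are
# missing between the first and the last reply seen.
ping_gaps() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; the digits follow it
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (n++ > 0 && seq > prev + 1) missing += seq - prev - 1
            prev = seq
        }
        END { print missing + 0 }
    '
}

# Intended use while failing a node over:
#   ping -c 60 192.168.9.112 | ping_gaps
```

A result of 0 means every reply between the first and last one arrived; anything larger is the number of probes lost during the failover.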
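Note: you can run the spmonitor command and choose option 0 to see the status of each node
8 My improved init scripts (credit to the original authors; I have only modified them):
1 spread: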
#
# spread This starts and stops spread
#
# chkconfig: 345 90 10
# description: This starts the spread daemon
#
# processname: spread
# config: /etc/spread.conf
# pidfile:/var/run/spread.pid
DAEMON=/usr/sbin/spread
CONFIG=/etc/spread.conf
LOG=/your/path/spread.log
HOST=`uname -n`
NAME="spread"
RETVAL=0
# Source function library.
. /etc/rc.d/init.d/functions
start() {
    echo -n "Starting $NAME..."
    # Run spread in the background; stdout and stderr both go to $LOG
    daemon "$DAEMON > $LOG 2>&1 &"
    RETVAL=$?
    [ "$RETVAL" = 0 ] && touch /var/lock/subsys/$NAME
    echo
}
stop() {
    echo -n "Stopping $NAME..."
    killproc $DAEMON
    # Record killproc's result so the lock file is only removed on success
    RETVAL=$?
    [ "$RETVAL" = 0 ] && rm -f /var/lock/subsys/$NAME
    echo
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        status $NAME
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        RETVAL=1
esac
exit $RETVAL
2 wackamole
#
# wackamole This starts and stops wackamole
#
# chkconfig: 345 95 05
# description: This starts the wackamole daemon
#
# requires: spread
# processname: wackamole
# config: /etc/wackamole.conf
# pidfile:/var/run/wackamole.pid
DAEMON=/usr/sbin/wackamole
CONFIG=/etc/wackamole.conf
NAME="wackamole"
RETVAL=0
# Source function library.
. /etc/rc.d/init.d/functions
start() {
    echo -n "Starting $NAME..."
    daemon $DAEMON -c $CONFIG
    RETVAL=$?
    [ "$RETVAL" = 0 ] && touch /var/lock/subsys/$NAME
    echo
}
stop() {
    echo -n "Stopping $NAME..."
    killproc $DAEMON
    # Record killproc's result so the lock file is only removed on success
    RETVAL=$?
    [ "$RETVAL" = 0 ] && rm -f /var/lock/subsys/$NAME
    echo
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        status $NAME
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        RETVAL=1
esac
exit $RETVAL