
linux – mount.ocfs2: Transport endpoint is not connected while mounting…?

Published 2021-05-24 22:23:03 · Category: Linux · Source: compiled from the web

I replaced a dead node in a cluster running OCFS2 on DRBD in dual-primary mode. All of the steps worked:

/proc/drbd

version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by mockbuild@builder10.centos.org, 2012-05-07 11:56:36

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:81 nr:407832 dw:106657970 dr:266340 al:179 bm:6551 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

until I tried to mount the volume:

mount -t ocfs2 /dev/drbd1 /data/webroot/
mount.ocfs2: Transport endpoint is not connected while mounting /dev/drbd1 on /data/webroot/. Check 'dmesg' for more information on this error.

/var/log/kern.log

kernel: (o2net,11427,1):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107
kernel: (mount.ocfs2,1):dlm_try_to_join_domain:1210 ERROR: status = -107
kernel: (mount.ocfs2,1):dlm_join_domain:1488 ERROR: status = -107
kernel: (mount.ocfs2,1):dlm_register_domain:1754 ERROR: status = -107
kernel: (mount.ocfs2,1):ocfs2_dlm_init:2808 ERROR: status = -107
kernel: (mount.ocfs2,1):ocfs2_mount_volume:1447 ERROR: status = -107
kernel: ocfs2: Unmounting device (147,1) on (node 1)
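A quick way to decode the repeated `status = -107` above: it is `-ENOTCONN`, the same "Transport endpoint is not connected" error that mount.ocfs2 printed. In other words, the DLM cannot join its domain because the o2net TCP link to node 0 was never established. The mapping can be confirmed on any Linux box:

```shell
# errno 107 on Linux is ENOTCONN ("Transport endpoint is not connected"),
# so every "status = -107" above is the DLM reporting the missing o2net link.
python3 -c 'import errno, os; print(errno.ENOTCONN, os.strerror(errno.ENOTCONN))'
# → 107 Transport endpoint is not connected
```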

Here is the kernel log on node 0 (192.168.3.145):

kernel: : (swapper,7):o2net_listen_data_ready:1894 bytes: 0
kernel: : (o2net,4024,3):o2net_accept_one:1800 attempt to connect from unknown node at 192.168.2.93:43868
kernel: : (o2net,3):o2net_connect_expired:1664 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
kernel: : (o2net,3):o2net_set_nn_state:478 node 1 sc: 0000000000000000 -> 0000000000000000, valid 0 -> 0, err 0 -> -107

I've made sure /etc/ocfs2/cluster.conf is identical on both nodes:

/etc/ocfs2/cluster.conf

node:
    ip_port = 7777
    ip_address = 192.168.3.145
    number = 0
    name = SVR233NTC-3145.localdomain
    cluster = cpc

node:
    ip_port = 7777
    ip_address = 192.168.2.93
    number = 1
    name = SVR022-293.localdomain
    cluster = cpc

cluster:
    node_count = 2
    name = cpc
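Beyond eyeballing the two copies, it is worth checking mechanically that `node_count` matches the number of `node:` stanzas and that the files really are byte-identical (compare checksums across the nodes). A small sketch, run here against a scratch copy of the file quoted above; on the real systems point the commands at /etc/ocfs2/cluster.conf instead:

```shell
# Write a scratch copy of the cluster.conf quoted above (on the real nodes,
# skip this step and use /etc/ocfs2/cluster.conf directly).
cat > cluster.conf <<'EOF'
node:
    ip_port = 7777
    ip_address = 192.168.3.145
    number = 0
    name = SVR233NTC-3145.localdomain
    cluster = cpc

node:
    ip_port = 7777
    ip_address = 192.168.2.93
    number = 1
    name = SVR022-293.localdomain
    cluster = cpc

cluster:
    node_count = 2
    name = cpc
EOF

# node_count must equal the number of "node:" stanzas; exit non-zero if not.
awk '/^node:/     { stanzas++ }
     /node_count/ { declared = $NF }
     END { printf "%d node stanzas, node_count = %d\n", stanzas, declared
           exit (stanzas != declared) }' cluster.conf

# And the file must be identical on both machines, e.g. compare checksums:
md5sum cluster.conf
```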

and they can connect to each other just fine:

# nc -z 192.168.3.145 7777
Connection to 192.168.3.145 7777 port [tcp/cbt] succeeded!

But the O2CB heartbeat is not active on the new node (192.168.2.93). (Note: the O2CB disk heartbeat normally only becomes active once an OCFS2 volume is actually mounted, so "Not active" before a successful mount is expected.)

/etc/init.d/o2cb status

Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster cpc: Online
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Not active

Here are the results of running tcpdump on node 1 while starting ocfs2 on node 1:

  1   0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
  2   0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
  3   0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
  4   0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
  5   0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
  6   0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181

The RST flag is sent on every 6th packet: node 0 accepts the TCP connection, reads the 32-byte payload (the PSH in packet 4), and then resets the connection, which matches the "attempt to connect from unknown node" error in its kernel log.

What else can I do to debug this?

PS:

OCFS2 versions on node 0:

> ocfs2-tools-1.4.4-1.el5
> ocfs2-2.6.18-274.12.1.el5-1.4.7-1.el5

OCFS2 versions on node 1:

> ocfs2-tools-1.4.4-1.el5
> ocfs2-2.6.18-308.el5-1.4.7-1.el5

Update 1 – Sun Dec 23 18:15:07 ICT 2012

Are both nodes on the same lan segment? No routers etc.?

No, they are two VMware servers on different subnets.

Oh, while I remember – hostnames/DNS all set up and working correctly?

Of course. I've added each node's hostname and IP address to /etc/hosts:

192.168.2.93    SVR022-293.localdomain
192.168.3.145   SVR233NTC-3145.localdomain

and they can connect to each other by hostname:

# nc -z SVR022-293.localdomain 7777
Connection to SVR022-293.localdomain 7777 port [tcp/cbt] succeeded!

# nc -z SVR233NTC-3145.localdomain 7777
Connection to SVR233NTC-3145.localdomain 7777 port [tcp/cbt] succeeded!

Update 2 – Mon Dec 24 18:32:15 ICT 2012

Found a clue: my colleague edited the /etc/ocfs2/cluster.conf file by hand while the cluster was running. So the dead node's information is still kept in /sys/kernel/config/cluster/:

# ls -l /sys/kernel/config/cluster/cpc/node/
total 0
drwxr-xr-x 2 root root 0 Dec 24 18:21 SVR150-4107.localdomain
drwxr-xr-x 2 root root 0 Dec 24 18:21 SVR233NTC-3145.localdomain

(SVR150-4107.localdomain in this case)
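This would explain the RSTs: o2net validates an incoming peer against the kernel's runtime node map under /sys/kernel/config/cluster/cpc/node/, not against cluster.conf on disk. If the listing above is from node 0, it still registers the dead SVR150-4107 and has never heard of SVR022-293, which is exactly why it reset the handshake. A sketch of the mismatch check (the `printf` lists below reproduce the two listings from this question; on a live node use `awk '/name =/ {print $NF}' /etc/ocfs2/cluster.conf | sort` and `ls /sys/kernel/config/cluster/cpc/node/ | sort` instead):

```shell
# Nodes configured on disk vs. nodes the kernel actually knows about.
printf '%s\n' SVR233NTC-3145.localdomain SVR022-293.localdomain | sort > conf.nodes
printf '%s\n' SVR233NTC-3145.localdomain SVR150-4107.localdomain | sort > live.nodes

# comm -3 prints only the lines that differ: column 1 is configured but not
# registered in the kernel, column 2 is registered but stale.
comm -3 conf.nodes live.nodes
```

Empty output would mean file and kernel agree; here it flags both the missing SVR022-293 and the stale SVR150-4107.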

I wanted to stop the cluster to remove the dead node, but I got the following error:

# /etc/init.d/o2cb stop
Stopping O2CB cluster cpc: Failed
Unable to stop cluster as heartbeat region still active

I'm sure the ocfs2 volumes are no longer mounted:

# mounted.ocfs2 -f
Device                FS     Nodes
/dev/sdb              ocfs2  Not mounted
/dev/drbd1            ocfs2  Not mounted

and there are no references left:

# ocfs2_hb_ctl -I -u 12963EAF4E16484DB81ECB0251177C26
12963EAF4E16484DB81ECB0251177C26: 0 refs
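With zero references the region should be stoppable, so two cleanup avenues suggest themselves (both hedged; check the o2cb/ocfs2_hb_ctl man pages before touching a production cluster): kill the lingering heartbeat region by UUID with `ocfs2_hb_ctl -K -u <uuid>` and retry the stop, or drop just the stale node from the live cluster. The latter works because o2cb node entries in configfs are ordinary directories that can be removed with `rmdir` (o2cb_ctl can also delete a node definition). Simulated below on a scratch tree standing in for /sys/kernel/config/cluster/cpc/node/:

```shell
# Scratch tree mimicking /sys/kernel/config/cluster/cpc/node/
# (on the real system operate on the configfs path itself, as root).
mkdir -p cpc/node/SVR150-4107.localdomain cpc/node/SVR233NTC-3145.localdomain

# Remove the dead node's directory; configfs entries are deleted the same
# way once nothing in the kernel still references them.
rmdir cpc/node/SVR150-4107.localdomain

ls cpc/node/    # only the live node should remain
```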
