Post by Adrenalin
That's strange, after recompiling the latest 8_0 that contains the patch (
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/rpc/clnt_vc.c.diff?r1=1.8.2.2.2.1;r2=1.8.2.2.2.2)
after 5 days it got stuck again with the same symptoms. I've also got some in the
FreeBSD .. 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Mar 16 22:56:51 EET
When attaching the debugger to an rpccon process, it is stuck here:
#0 0x000000080124051c in stat () from /lib/libc.so.7
http://img705.imageshack.us/img705/741/10032219218.png
Can I debug the kernel online, or how else can I help you solve
the problem?
Well, sleeping in "rpccon" means that the TCP connect has failed after a
soconnect() call. If you can get into a kernel debugger, there is a
global structure with more error information in it.
It is called: rpc_createerr
- and it has 2 enums, followed by an int. The first enum should be 12
(RPC_SYSTEMERR), which is what gets it to tsleep(.."rpccon"..), the
second enum doesn't apply to this case and the int after them should
be the errno of the soconnect() failure. (The way the code is currently
written, it could either be an error return from soconnect() or a value
set in so_error after soconnect() returns, while it is in the process
of connecting.)
So, if you can get to that 3rd field, the value there might help tell
why the TCP connect is failing. Otherwise, all I can suggest is poking
around and trying to figure out why TCP connects are failing.
- wedged network interface
- routing problem
- network infrastructure problem
...
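
For the rpc_createerr fields described above, here is a rough userland
sketch of how the three values could be interpreted once they have been
read out of the debugger. It assumes each field is a 32-bit word, laid out
as enum, enum, int as described, and that RPC_SYSTEMERR is 12. The
sketch_* names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed value, taken from the description above: RPC_SYSTEMERR == 12. */
#define SKETCH_RPC_SYSTEMERR 12

/*
 * Hypothetical helper: given the three 32-bit words dumped from
 * rpc_createerr (enum, enum, int), return the errno held in the third
 * word, or -1 if the first word is not RPC_SYSTEMERR.
 */
static int
sketch_createerr_errno(const int32_t words[3])
{
	assert(words != NULL);
	if (words[0] != SKETCH_RPC_SYSTEMERR)
		return (-1);
	return ((int)words[2]);
}

/*
 * Format a readable summary into buf. strerror() uses the errno table of
 * the machine this runs on, so run it on the same OS as the stuck client.
 */
static const char *
sketch_describe(const int32_t words[3], char *buf, size_t len)
{
	int err = sketch_createerr_errno(words);

	if (err < 0)
		snprintf(buf, len, "first enum is %d, not RPC_SYSTEMERR",
		    (int)words[0]);
	else
		snprintf(buf, len, "soconnect() errno %d (%s)", err,
		    strerror(err));
	return (buf);
}
```

Feeding it the three words copied by hand from the debugger and printing
the result of sketch_describe() should give the errno in readable form.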
(Btw, I was driven a little batty at UofG because the campus network
switch I was on would decide to inject TCP RSTs into new connection
attempts for some reason. I finally was able to determine this by
looking at packet traces on both client and server and seeing the RSTs
coming out of the network on the client end, but never sent on the
server end. It was some Cisco related parameter/issue that was never
resolved.)
Hopefully others with more TCP expertise can make suggestions w.r.t.
why the TCP connects are failing?
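
One way to poke at the failing TCP connects from userland is a small
probe that does a plain blocking connect() and reports the errno. This is
only a sketch: sketch_try_connect() is a hypothetical helper, it handles
IPv4 only, and the address and port to test are up to you (2049 is the
usual NFS port, if that is what the mount is using).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: attempt one blocking TCP connect to ipv4:port.
 * Returns 0 on success, otherwise the errno describing the failure
 * (EINVAL if the address string cannot be parsed).
 */
static int
sketch_try_connect(const char *ipv4, unsigned short port)
{
	struct sockaddr_in sin;
	int error = 0, s;

	assert(ipv4 != NULL);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ipv4, &sin.sin_addr) != 1)
		return (EINVAL);

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		return (errno);
	/* Save errno before close(), which may overwrite it. */
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		error = errno;
	close(s);
	return (error);
}
```

A result of ECONNREFUSED means a RST came back for the connection attempt,
while ETIMEDOUT points more toward packets being dropped somewhere along
the path.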
Good luck with it, rick