So I recently built a new VM with RHEL 6.6 for a production Domino server. I mounted the data and log directories to a NetApp filer volume as I’d done in the past with RHEL 5.x, and went about my way.
A couple of days later I got some alerts about the server crashing. I ran top and found that rpciod was using all of my CPUs. On top of that, the filer was maxed out and, according to vSphere, my VM was pushing 2 Gbps of network traffic.
The only way out of this mess was to hard reboot the box, as neither killing these processes nor unmounting the volumes worked.
After some investigation and looking at the logs I came across these two bug reports. Both closely match the problems I was having, and it seems that kernel 2.6.32-504.16.2.el6 may have issues with NFSv4.
After changing fstab to explicitly use NFSv3 rather than v4, the problem hasn’t recurred so far.
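For anyone wanting to try the same workaround, the fstab change looks roughly like the sketch below. The filer name, export paths, and mount points are made up for illustration, and the other mount options are just common choices rather than the exact ones from my server. The part that matters is `nfsvers=3`, which pins the mount to NFSv3 instead of letting the client negotiate v4.

```
# /etc/fstab (hypothetical filer name, export paths, and mount points)
# nfsvers=3 forces NFSv3 rather than letting the client negotiate NFSv4
filer01:/vol/domino_data  /local/notesdata  nfs  rw,hard,nfsvers=3  0 0
filer01:/vol/domino_logs  /local/translogs  nfs  rw,hard,nfsvers=3  0 0
```

After remounting, `nfsstat -m` or `grep nfs /proc/mounts` should list `vers=3` for each mount, which is a quick way to confirm the setting actually took effect.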
I could reproduce the error on a different VM, with a different mapped volume on the same filer.