Broken NFS server (or client?)

Wed Aug 8 21:50:37 PDT 2001

On Wed, 8 Aug 2001, Lincoln Peters wrote:
> >I assume the test client has a local HD since you have LILO being used for
> >the netboot testing. This is true?
> 
> No, LILO is installed on the boot floppy.  There is no hard disk.
> (although since it's a VMWare virtual machine, I can add one with little 
> difficulty)

Might be an option, but check other things for now. :-)

> >When you boot the client from the local disk, are you able to nfs mount
> >the exported volume and use it as you desire to use with a netbooted
> >system?
> 
> When I tried to mount the root filesystem on another client (it is supposed 
> to be mountable to any system that wants to mount it), I got a similar error 
> message.

Ahhhh. This is a good pointer. Let's have a look at the list of NFS
exports. Not sure if this is still going to be /etc/exports on your
NFS server, but it is a place to look. I really want to see what
rules/permissions you have for the export (who can look at it, and is it
ro or rw?). Also, could you show the ls -l of the volume and make sure that
all of the dirs leading up to the volume share point are at the very least
+x for everyone, and just for testing rwxr-xr-x (755)?
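If walking the tree by hand with ls -ld gets tedious, the same check can
be sketched in a few lines of Python. This is only a rough helper, not
anything NFS itself provides; the function name check_export_path is made
up for illustration, and the path you pass in is whatever your export
point turns out to be.

```python
import os
import stat


def check_export_path(path):
    """Return (directory, mode_string, world_executable) for every
    directory from / down to the export point, top-most first."""
    current = os.path.abspath(path)
    chain = []
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    results = []
    for directory in reversed(chain):
        mode = os.stat(directory).st_mode
        # S_IXOTH is the "x for everyone else" bit asked about above.
        world_x = bool(mode & stat.S_IXOTH)
        results.append((directory, stat.filemode(mode), world_x))
    return results
```

Run it on the server against the export point and print each tuple; any
row whose last field is False is a directory that is missing +x for
"other".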

As the other poster wrote:
> From: Mark Street <jet at sonic.net>
> Do you have portmapper running?  Is it started before nfsutils?

which is another very good point in this same direction.

Also, related to the excellent point from Mark:
Do you have any firewall rules on the server that we should know about? 
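One quick way to test both points from a client is to see whether the
portmapper port answers at all. Below is a minimal sketch, assuming the
portmapper listens on its usual TCP port 111; the helper name
tcp_port_open is hypothetical. A False result does not say whether the
service is down or a firewall is in the way, only that something is
blocking the connection.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False
```

For example, tcp_port_open("192.168.0.6", 111) run from the test client
checks the server address from this thread. I would not read much into
port 2049 here, since an NFS server may be serving over UDP only.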

Really, the line of thought is this:
Can you do it with another non-NFSroot client? If you can't, then focus on
the server side. Once you can get a different disk-based client to NFS
mount the export from the server, then we can look back at the client
side.

> >When you write about this error, are you saying that you can't even mount
> >your root filesystem via NFS from the client, or once you have the client
> >machine net-booted in Linux you get this error when you try to start up the
> >vmware application?
> 
> I can't mount the root filesystem via NFS.  I don't think that VMWare is 
> causing any problems.

I agree with your conclusion based on the data we have so far. :-)

> >Does the client get a different IP address from the network when it
> >netboots vs. when it is running from a local HD?
> 
> It does not have a local HD, but since each system that I run with VMWare 
> always seems to get the same IP address every time I run it (I'm not saying 
> that they all share a single IP address), I can (hopefully) safely assume 
> that the IP addresses are the same.

This would be another point for testing. Fire up your test NFS client (the
one with a disk) and give it the same IP address that the netbooting
client will use when it mounts its root via NFS. That way, any IP-based
authentication or checks that pass here should also pass when the IP is
moved to the real netbooting client.

> >Is the error logged from the server or the client or both?
> 
> The error can't be logged by the client since it has nowhere to log to; it 
> just gave me that "Error -101" message that I already described.  And I 
> cannot find any NFS logs on the server.

So the client just hangs in the middle of the kernel loading, giving some
grand message like
"Kernel panic unable to mount /"
?

> >"The client IP address is in the lilo configuration" how? is this an
> >"append=" statement? Often those are passed to scripts that do the actual
> >work of setting up your interfaces. Is it just getting its IP and nfs root
> >server/volume from bootp/dhcp?
> 
> The appended lines are:
> root=/dev/nfs
> nfsroot=192.168.0.6:/

This concerns me greatly. Are you really exporting your root "/" file
system from your server over NFS? Usually, you create a second system tree
in a new location off of the root file system like /usr/export/system1 and
then have your nfsroot= entry on the client look like:
nfsroot=192.168.0.6:/usr/export/system1
and then have an entry in your /etc/exports on the server like:
/usr/export/system1	*.yourdomain.com(ro,insecure,all_squash)
or perhaps
/usr/export/system1	*.yourdomain.com(ro,all_squash)

and then make sure that either
all of the IP addresses of the clients that will connect to NFS have valid
entries in the server's /etc/hosts file with FQDNs (Fully Qualified Domain
Names) *or* there is a DNS server that provides reverse lookup zones for
those IPs, so that the *.yourdomain.com check against the incoming IP
address works.

(There are other ways, but this is the most common I think.)
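To see what the server would make of a given client IP, you can do the
reverse lookup yourself and compare the result against the wildcard. The
sketch below is an approximation: it uses Python's fnmatch as a stand-in
for the export wildcard matching, which I am assuming behaves similarly,
and the function name client_matches_export is invented for this example.
Run it on the server so it uses the same /etc/hosts and resolver.

```python
import fnmatch
import socket


def client_matches_export(ip, pattern, reverse_lookup=socket.gethostbyaddr):
    """Reverse-resolve ip and compare the name with an exports-style
    wildcard such as *.yourdomain.com. Returns (hostname, matched)."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        # No /etc/hosts entry and no reverse DNS record for this IP.
        return None, False
    matched = fnmatch.fnmatch(hostname.lower(), pattern.lower())
    return hostname, matched
```

If the lookup comes back with a short name like "client1" instead of the
FQDN, the wildcard will not match, which is why the /etc/hosts entries
need the full name.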

One big issue with exporting your server's root over NFS is that even if
it is ro, you may not have all_squash and an evil user may be able to see
files they should not from a client machine. Perhaps password files, etc.
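A rough way to spot these two problems in an exports file is sketched
below. It is a simplified reader of the "path host(options)" layout shown
above, not a full parser (it ignores backslash line continuations, for
one), and the function name audit_exports and its warning strings are my
own.

```python
def audit_exports(text):
    """Return a list of (path, client, warnings) for exports-style text."""
    findings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        path = fields[0]
        clients = fields[1:] or [""]  # no client listed at all
        for client in clients:
            host, _, rest = client.partition("(")
            options = set(filter(None, rest.rstrip(")").split(",")))
            warnings = []
            if path == "/":
                warnings.append("exports the server's own root")
            if "all_squash" not in options:
                warnings.append("no all_squash")
            if "rw" in options:
                warnings.append("read-write")
            findings.append((path, host or "(any)", warnings))
    return findings
```

Feeding it the contents of /etc/exports and printing the findings gives a
quick list of entries worth a second look; an entry with an empty warning
list matches the safer pattern described above.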

> It is configured to get its network configuration from a DHCP server.  
> However, I did notice that there is no entry in the DHCP log that looks like 
> my test client.  The problem still appeared on the test client as an NFS 
> error, though.

This part confuses me:
> The problem still appeared on the test client as an NFS error, though.

So do the problem and error appear on the screen of the test client, or
do the symptoms appear on the test client? The server does still log the
error message in a server log file too, eh?

Mark Street <jet at sonic.net> has some good thoughts on troubleshooting
too. I'd go through his suggestions and look at his questions as well.

-ME

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS/CM$/IT$/LS$/S/O$ !d--(++) !s !a+++(-----) C++$(++++) U++++$(+$) P+$>+++ 
L+++$(++) E W+++$(+) N+ o K w+$>++>+++ O-@ M+$ V-$>- !PS !PE Y+ !PGP
t at -(++) 5+@ X@ R- tv- b++ DI+++ D+ G--@ e+>++>++++ h(++)>+ r*>? z?
------END GEEK CODE BLOCK------
decode: http://www.ebb.org/ungeek/ about: http://www.geekcode.com/geek.html
     Systems Department Operating Systems Analyst for the SSU Library