network troubleshooting

1.8 Given a scenario, implement the following network troubleshooting methodology

Generally speaking, troubleshooting is a process of isolation. The best troubleshooters try to determine what still works, how far it continues to work, and exactly where it breaks down. The more you know about the inner workings of a network, the better you will be able to identify the weakness or the problem from its symptoms. There are many different troubleshooting methodologies, but all share the same basic steps. In the following sections, I'll discuss the steps involved in troubleshooting a network.
To facilitate this discussion, I will also present a scenario to which you will apply your troubleshooting methodology. For this scenario, I will use technical information and terminology that I have previously discussed. Specifically, say you have a user who is complaining that she cannot access any intranet or Internet resources. Now, let's apply a troubleshooting methodology to this scenario.

Identify the problem

You go to the user's computer and verify that the user cannot connect to the Internet or any of the network resources. You will now troubleshoot the issue by gathering information and identifying the symptoms of the problem. In addition, you will question users and determine whether anything has changed that could have caused this issue.

Information gathering

You first check to make sure that the network cable is plugged in and that the link light on the network interface card is lit. If it is lit, a connection is present. If it is blinking, then it's not only connected but traffic is also passing through the connection. You should always verify the physical connections first, because often that's all you will have to fix, but unfortunately not this time. In this case, the cable is plugged in and the link light is lit, so you move on to the next step.

Identify symptoms

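By typing ipconfig /all, you determine that the computer is set to obtain an address from a DHCP server, but its address is 169.254.2.1. Your first thought is, “Hmmm, that looks like an APIPA address, doesn't it?” A client assigns itself an Automatic Private IP Addressing (APIPA) address from the 169.254.0.0/16 range when it cannot reach a DHCP server. You wonder how far this problem goes.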
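The check you just made by eye can also be expressed in code. The following is a minimal Python sketch, assuming you already have the address as a string (for example, copied from the ipconfig /all output). The function name is_apipa is hypothetical; the 169.254.0.0/16 range is the standard IPv4 link-local block that APIPA draws from.

```python
import ipaddress

# APIPA (Automatic Private IP Addressing) uses the IPv4 link-local block.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address: str) -> bool:
    """Return True if the IPv4 address falls in the APIPA range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK

print(is_apipa("169.254.2.1"))    # the address seen on the user's computer
print(is_apipa("192.168.1.20"))   # a typical private address, for contrast
```

Python's ipaddress module also exposes an is_link_local property on address objects, which would work equally well here; the explicit network is used only to make the range visible.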

Question users

Next you decide to ask around and see whether others are having the same problem. Some users report that they have a connection that seems to be working fine, while others have now lost their connection as well. You go to one of the other computers that recently lost its connection and type ipconfig, only to see that it also has an APIPA address of 169.254.5.67. You wonder what's causing this to happen.

Determine if anything has changed

You ask another network administrator, who informs you that only a few servers were due for maintenance last night, but they were supposed to be put back online by this morning. He says he will check into it and get back to you. Could that be the problem?

Establish a theory of probable cause

You reason that if those servers that were down were DHCP servers, then they may not have been available at the time when the users' computers were trying to renew their leases. In that case, the computers would end up with an APIPA address. You decide that this is the most probable cause, but you aren't done yet.
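To see why an overnight server outage can leave clients without valid addresses by morning, it helps to look at the lease timers. The sketch below assumes the default timer ratios from RFC 2131 (renewal at 50 percent of the lease, rebinding at 87.5 percent). The eight-hour lease, the timestamps, and the function name lease_timeline are made-up values for illustration, not settings from this scenario.

```python
from datetime import datetime, timedelta

def lease_timeline(obtained: datetime, lease: timedelta) -> dict:
    """Return renewal (T1), rebinding (T2), and expiry times for a lease."""
    return {
        "renew_at": obtained + lease * 0.5,     # T1: client asks its own server to renew
        "rebind_at": obtained + lease * 0.875,  # T2: client broadcasts to any DHCP server
        "expires_at": obtained + lease,         # lease ends; client must give up the address
    }

# Hypothetical example: an 8-hour lease obtained at 6:00 p.m.
timeline = lease_timeline(datetime(2024, 1, 8, 18, 0), timedelta(hours=8))
for event, when in timeline.items():
    print(f"{event}: {when:%Y-%m-%d %H:%M}")
```

With these assumed numbers, a client whose DHCP server stays offline from the renewal time onward would fail every renewal and rebinding attempt and lose its address when the lease expires, which matches the pattern of some users working while others have recently lost their connection.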

Question the obvious

You are now convinced that the DHCP servers are to blame and that it will be determined that they were not put back online. You run a quick check from one of the affected computers by typing ipconfig /renew, but it is not able to renew its address, further confirming your suspicion.

Test the theory to determine cause

You haven't heard back from the other administrator, and your users are still down. You decide that someone may have “dropped the ball.” You need to get those DHCP servers up and running, or you need to assign static addresses to those clients for now. In either case, you will need to escalate the issue to get results.

Once theory is confirmed, determine next steps to resolve problem

You decide that you will call the senior network administrator and enlist her support to help get the problem fixed. She has the contacts and authority that you do not have. In this way, you will escalate the issue and probably get some fast results.

If theory is not confirmed, re-establish new theory or escalate

If your theory had not proved correct, then you would have needed to go back to “square one” by establishing a new theory and questioning the obvious once again. In this case, you have correctly identified the real problem. Now, it's just a matter of getting it fixed.

Establish a plan of action to resolve the problem and identify potential effects

In this case, if you can just get those DHCP servers back online, then your problem should take care of itself. The operating systems on your client computers should recognize the DHCP servers as soon as they come back online, and they should obtain an address that will allow them a connection to the Internet and the intranet resources. You will verify this with the users who have been affected by taking a quick look at things once the servers are back online.
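The verification part of this plan can be sketched as a simple check that each affected client now holds an address inside the DHCP scope. This assumes a scope of 192.168.10.0/24 and the client names shown, all of which are hypothetical; substitute the values for your own network.

```python
import ipaddress

def still_affected(clients: dict[str, str], scope: str) -> list[str]:
    """Return the names of clients whose address is outside the DHCP scope."""
    network = ipaddress.ip_network(scope)
    return sorted(
        name for name, addr in clients.items()
        if ipaddress.ip_address(addr) not in network
    )

# Hypothetical readings collected after the servers came back online.
clients = {
    "ws-01": "192.168.10.34",
    "ws-02": "169.254.5.67",   # still on an APIPA address
    "ws-03": "192.168.10.51",
}
print(still_affected(clients, "192.168.10.0/24"))
```

Any client that still appears in the list has not picked up a lease yet; running ipconfig /renew on it forces a new request.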

Implement the solution or escalate as necessary

You call the senior network administrator and tell her your situation. She checks into it and finds that the DHCP servers are not online as they should be. The team was falling behind and did not fully understand the urgency of the situation. The server technicians humbly apologize for their mistake and get the DHCP servers back online.

Verify full system functionality and if applicable implement preventive measures

After you verify with the users that the problem is resolved, you contact the senior network administrator and thank her for her assistance. She asks you to send her an email relating the entire situation and how it all transpired since this morning. Based on what you say about the situation, the senior network administrator will recommend additional training for the server administrators regarding the principles of DHCP, the frequency of lease renewals in your organization, and the effects of not having a DHCP server available to the client computers when needed.

Document findings, actions, and outcomes

You sit down and write that email to the senior network administrator. You focus it not on who was at fault (everyone makes mistakes) but on documenting the actions and inactions regarding the DHCP servers and how quickly they led to problems for users in your organization. You also document how the problem was eventually resolved and the final test that you performed to ensure that all was well again. In addition, you update the server issue log so that another administrator can identify this issue more easily in the future. Now you are ready to move on to your next challenge!
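If your issue log is kept in a structured form, an entry for this incident might look something like the following. This is only a sketch: the IssueLogEntry class and its field names are hypothetical, and the values simply restate the scenario above.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class IssueLogEntry:
    """One record in a hypothetical server issue log."""
    symptoms: str
    cause: str
    actions: list[str]
    outcome: str
    prevention: str

entry = IssueLogEntry(
    symptoms="Clients set for DHCP showed 169.254.x.x addresses; no intranet or Internet access",
    cause="DHCP servers were not brought back online after scheduled maintenance",
    actions=[
        "Verified cabling and link lights",
        "Ran ipconfig /all and ipconfig /renew on affected clients",
        "Escalated to the senior network administrator",
    ],
    outcome="DHCP servers restored; affected users confirmed connectivity",
    prevention="Additional DHCP training recommended for server administrators",
)

# Serialize the record so it can be stored or attached to the email.
print(json.dumps(asdict(entry), indent=2))
```

Recording symptoms separately from the cause makes it easier for the next administrator to match a new problem against this one before knowing what is behind it.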
