Network troubleshooting
1.8 Given a scenario, implement the following network troubleshooting methodology
Generally speaking, troubleshooting is a process of isolation. The best troubleshooters determine what still works, how far it continues to work, and then exactly where it breaks down. The more you know about the inner workings of a network, the better you will be able to pinpoint the weakness or the problem from its symptoms. There are many different troubleshooting methodologies, but all share the same basic steps. In the following sections, I'll discuss the steps involved in troubleshooting a network.
To facilitate this discussion, I will also present a scenario to which you will apply your troubleshooting methodology. For this scenario, I will use technical information and terminology that I have previously discussed. Specifically, say you have a user who is complaining that she cannot access any intranet or Internet resources. Now, let's apply a troubleshooting methodology to this scenario.
Identify the problem
You go to the user's computer and verify that the user cannot connect to the Internet or any of the network resources. You will now troubleshoot the issue by gathering information and identifying the symptoms of the problem. In addition, you will question users and determine whether anything has changed that could have caused this issue.
Information gathering
You first check to make sure that the network cable is plugged in and that the link light on the network interface card is lit. If it is lit, this indicates that a connection is present. If it is blinking, then it's not only connected but traffic is being passed through the connection. You should always verify the physical connections first, because often that's all you will have to fix—but unfortunately not this time. In this case, you verify that the cable is plugged in and that the link light is lit, so you move on to the next step.
Identify symptoms
By typing ipconfig /all, you determine that the computer is set to obtain an address from a DHCP server but the address is 169.254.2.1. Your first thought is, “Hmmm, that looks like an APIPA address, doesn't it?” Addresses in the 169.254.0.0/16 range are self-assigned by a client when no DHCP server responds. You wonder how far this problem goes.
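The check you just made by eye can be written down as a small test. The following is only a sketch in Python, using the standard ipaddress module; the function name is_apipa and the second sample address are assumptions made for illustration and are not part of the scenario.

```python
import ipaddress

# APIPA (IPv4 link-local) range a client falls back to when no DHCP server answers.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address: str) -> bool:
    """Return True if the IPv4 address falls inside the APIPA range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK

print(is_apipa("169.254.2.1"))   # address reported by ipconfig /all in the scenario
print(is_apipa("192.168.1.25"))  # hypothetical DHCP-assigned address for comparison
```

An address that passes this test tells you that the client asked for a lease and got no answer; it does not tell you why the server was silent.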
Question users
Next you decide to ask around and see whether others are having the same problem. Some users report that they have a connection that seems to be working fine, while others have now lost their connection as well. You go to one of the other computers that recently lost its connection and type ipconfig, only to see that it also has an APIPA address of 169.254.5.67. You wonder what's causing this to happen.
Establish a theory of probable cause
You reason that if the servers that were down were DHCP servers, then they would not have been available when the users' computers tried to renew their leases. In that case, the computers would end up with APIPA addresses once their leases expired. This would also explain why some users still have a working connection: their leases have not yet run out. You decide that this is the most probable cause, but you aren't done yet.
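To see why clients would drop off one at a time rather than all at once, it can help to work out the renewal timeline for a single lease. The sketch below uses the default DHCP timer fractions (renewal at 50 percent of the lease, rebinding at 87.5 percent); the lease length and start time are hypothetical values chosen for illustration, and lease_milestones is an illustrative name.

```python
from datetime import datetime, timedelta

# Default timer fractions from the DHCP specification (RFC 2131).
T1_FRACTION = 0.5    # client starts trying to renew with the server that issued the lease
T2_FRACTION = 0.875  # client starts broadcasting to any DHCP server

def lease_milestones(obtained: datetime, lease: timedelta):
    """Return (renew_at, rebind_at, expires_at) for one lease."""
    return (obtained + lease * T1_FRACTION,
            obtained + lease * T2_FRACTION,
            obtained + lease)

# Hypothetical values: an 8-day lease obtained on a Monday morning.
obtained = datetime(2024, 1, 1, 8, 0)
renew_at, rebind_at, expires_at = lease_milestones(obtained, timedelta(days=8))
print("renew (T1): ", renew_at)    # 2024-01-05 08:00:00
print("rebind (T2):", rebind_at)   # 2024-01-08 08:00:00
print("expires:    ", expires_at)  # 2024-01-09 08:00:00
```

Because each client obtained its lease at a different time, each one reaches its expiry at a different time, which matches the pattern of some users working and others not.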
Question the obvious
You are now convinced that the DHCP servers are to blame and suspect that they were never put back online. You run a quick check from one of the affected computers by typing ipconfig /renew, but it is unable to renew its address, further confirming your suspicion.
Test the theory to determine cause
You haven't heard from the other administrator, and your users are still down. You decide that someone may have “dropped the ball.” You need to get those DHCP servers up and running, or you need to assign static addresses to those clients for now. In either case, this will require that you escalate the issue and get some results.
Once theory is confirmed, determine next steps to resolve problem
You decide that you will call the senior network administrator and enlist her support to help get the problem fixed. She has the contacts and authority that you do not have. In this way, you will escalate the issue and probably get some fast results.
If theory is not confirmed, re-establish new theory or escalate
If your theory had not proved correct, then you would have needed to go back to “square one” by establishing a new theory and questioning the obvious once again. In this case, you have correctly identified the real problem. Now, it's just a matter of getting it fixed.
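The test, re-theorize, or escalate loop described in these steps can be sketched as a short routine. This is only an illustration: the theory names and the hard-coded test results are hypothetical stand-ins for the checks you would actually perform, and first_confirmed is not a standard function.

```python
from typing import Callable, List, Optional, Tuple

# A theory is a description plus a test that returns True when the theory is confirmed.
Theory = Tuple[str, Callable[[], bool]]

def first_confirmed(theories: List[Theory]) -> Optional[str]:
    """Test theories in order; return the first confirmed one, or None."""
    for name, test in theories:
        if test():
            return name
    return None  # nothing confirmed: form a new theory or escalate

# Hypothetical results standing in for the real checks in the scenario.
theories = [
    ("cable unplugged", lambda: False),         # link light was lit
    ("DHCP server unavailable", lambda: True),  # ipconfig /renew failed
]
cause = first_confirmed(theories)
print(cause if cause else "no theory confirmed: escalate")
```

Ordering the list from the cheapest check to the most expensive mirrors the advice to verify physical connections first.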
Establish a plan of action to resolve the problem and identify potential effects
In this case, if you can just get those DHCP servers back online, then your problem should take care of itself. The operating systems on your client computers should detect the DHCP servers shortly after they come back online, because a Windows client holding an APIPA address keeps retrying DHCP periodically, and you can force the attempt with ipconfig /renew. The clients should then obtain an address that will allow them a connection to the Internet and the intranet resources. You will verify this with the users who have been affected by taking a quick look at things once the servers are back online.
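That quick look could be as simple as collecting each affected computer's current address and flagging any that are still self-assigned. A minimal sketch follows; the host names and the DHCP-assigned addresses are hypothetical, and still_on_apipa is an illustrative name.

```python
import ipaddress
from typing import Dict, List

def still_on_apipa(current: Dict[str, str]) -> List[str]:
    """Return the hosts whose current IPv4 address is still link-local (APIPA)."""
    return sorted(host for host, addr in current.items()
                  if ipaddress.ip_address(addr).is_link_local)

# Hypothetical survey taken after the DHCP servers are back online.
after_fix = {
    "pc-01": "192.168.1.25",
    "pc-02": "169.254.5.67",  # has not picked up a lease yet
    "pc-03": "192.168.1.31",
}
print(still_on_apipa(after_fix))  # ['pc-02']
```

Any host left in the list is a candidate for a manual ipconfig /renew before you declare the problem resolved.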
Implement the solution or escalate as necessary
You call the senior network administrator and tell her your situation. She checks into it and finds that the DHCP servers are not online as they should be. The team was falling behind and did not fully understand the urgency of the situation. The server technicians humbly apologize for their mistake and get the DHCP servers back online.
Verify full system functionality and if applicable implement preventive measures
After you verify with the users that the problem is resolved, you contact the senior network administrator and thank her for her assistance. She asks you to send her an email relating the entire situation and how it all transpired since this morning. Based on what you say about the situation, the senior network administrator will recommend additional training for the server administrators regarding the principles of DHCP, the frequency of lease renewals in your organization, and the effects of not having a DHCP server available to the client computers when needed.
Document findings, actions, and outcomes
You sit down and write that email to the senior network administrator. You focus it not on who was at fault (everyone makes mistakes) but on documenting the actions and/or inactions in regard to the DHCP server and how quickly it led to problems for users in your organization. You also document how it was eventually resolved and the final test that you performed to ensure that all was well again. In addition, you have a server issue log that you will need to update to make this issue easier to identify for another administrator. Now you are ready to move on to your next challenge!
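If your issue log is kept in a structured form, an entry for this incident might look something like the sketch below. The field names and the JSON layout are assumptions made for illustration; use whatever format your organization's log actually requires.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

@dataclass
class IssueLogEntry:
    """One record in a hypothetical server issue log."""
    summary: str
    symptoms: List[str]
    cause: str
    resolution: str
    verification: str

entry = IssueLogEntry(
    summary="Users unable to reach intranet or Internet resources",
    symptoms=[
        "Clients set to use DHCP show 169.254.x.x (APIPA) addresses",
        "ipconfig /renew fails on affected clients",
    ],
    cause="DHCP servers were offline when client leases came up for renewal",
    resolution="DHCP servers brought back online after escalation",
    verification="Affected users confirmed connectivity was restored",
)
print(json.dumps(asdict(entry), indent=2))
```

Recording the symptoms alongside the cause is what makes the entry useful to the next administrator who sees APIPA addresses on a DHCP client.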