What is a broken link?
Links let users navigate between web pages: clicking a link, or typing its address into a browser, takes the user to the target page. A broken link is a link that does not work, meaning it fails to take the user to the requested page. This can happen for several reasons, such as server-side errors, pages that no longer exist, or typing mistakes in the URL.
When a user follows a broken link, they see an error message. Valid URLs return 2XX status codes, while broken URLs return status codes in the 4XX or 5XX series: 4XX codes indicate client-side errors, and 5XX codes indicate server-side errors.
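As a minimal sketch of the status-code ranges described above, the helper below maps a numeric code to its class. The names `StatusCategories`, `categorize`, and `isBroken` are illustrative assumptions and do not come from Selenium or the JDK.

```java
// Illustrative helper; class and method names are hypothetical.
class StatusCategories {

    enum Category { INFORMATIONAL, SUCCESS, REDIRECTION, CLIENT_ERROR, SERVER_ERROR, UNKNOWN }

    // Maps an HTTP status code to its class based on the hundreds digit.
    static Category categorize(int code) {
        switch (code / 100) {
            case 1: return Category.INFORMATIONAL;
            case 2: return Category.SUCCESS;
            case 3: return Category.REDIRECTION;
            case 4: return Category.CLIENT_ERROR;
            case 5: return Category.SERVER_ERROR;
            default: return Category.UNKNOWN;
        }
    }

    // Treats a link as broken when its code falls in the 4XX or 5XX range.
    static boolean isBroken(int code) {
        Category category = categorize(code);
        return category == Category.CLIENT_ERROR || category == Category.SERVER_ERROR;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + categorize(code) + ", broken=" + isBroken(code));
        }
    }
}
```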
Reasons for broken links
Below are some common reasons for broken links.
- 400 Bad Request: The server cannot process the request because it is malformed, for example because of an invalid URL.
- 404 Not Found: The web page does not exist or has been removed by the owner.
- A firewall can block access to some websites.
- The link was typed or inserted incorrectly.
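Mistyped links, the last reason in the list above, can often be caught before any request is sent. The sketch below is one possible syntax check built on `java.net.URI`. The class and method names are hypothetical, and the check confirms only that the text parses as an absolute http or https address with a host, not that the page exists.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper; class and method names are hypothetical.
class UrlSyntaxCheck {

    // Returns true when the text parses as an absolute http(s) URI that has a host.
    static boolean looksLikeHttpUrl(String text) {
        if (text == null || text.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(text.trim());
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Illegal characters such as spaces end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.google.co.uk/maps",  // well formed
            "htp://www.google.co.uk",         // misspelled scheme
            "https//www.google.co.uk",        // missing colon
            "https://www.goo gle.co.uk"       // stray space
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeHttpUrl(sample));
        }
    }
}
```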
Why Should You Check Broken Links?
Broken links on your website create a bad experience for your users and can seriously damage the site's reputation. A website usually contains a large number of links, and testing each of them manually is time-consuming. Automating the check with Selenium WebDriver is therefore a practical solution.
Using Selenium WebDriver to find broken links
Broken links can be tested by following the steps below. The sample code runs a test against https://www.google.co.uk, and the relevant parts are discussed afterwards.
package automationproject;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Iterator;
import java.util.List;

import org.openqa.selenium.*;
import org.openqa.selenium.chrome.ChromeDriver;

public class MyBrokenLinks {

    public static void main(String[] args) {
        // Point Selenium at the local ChromeDriver binary.
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\tushar\\eclipse-workspace\\first test\\chromedriver.exe");

        String myhomePage = "https://www.google.co.uk";
        String myurl = "";
        HttpURLConnection myhuc = null;
        int responseCode = 200;

        // Create a single browser session and open the home page.
        WebDriver mydriver = new ChromeDriver();
        mydriver.manage().window().maximize();
        mydriver.get(myhomePage);

        // Collect every anchor element on the page.
        List<WebElement> mylinks = mydriver.findElements(By.tagName("a"));
        Iterator<WebElement> myit = mylinks.iterator();

        while (myit.hasNext()) {
            myurl = myit.next().getAttribute("href");
            System.out.println(myurl);

            // Skip anchors without a usable href.
            if (myurl == null || myurl.isEmpty()) {
                System.out.println("Empty URL or an Unconfigured URL");
                continue;
            }

            // Skip links that point outside the site under test.
            if (!myurl.startsWith(myhomePage)) {
                System.out.println("This URL is from another domain");
                continue;
            }

            try {
                // Send a HEAD request and read the status code.
                myhuc = (HttpURLConnection) (new URL(myurl).openConnection());
                myhuc.setRequestMethod("HEAD");
                myhuc.connect();
                responseCode = myhuc.getResponseCode();

                if (responseCode >= 400) {
                    System.out.println(myurl + " This link is broken");
                } else {
                    System.out.println(myurl + " This link is valid");
                }
            } catch (MalformedURLException ex) {
                ex.printStackTrace();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }

        mydriver.quit();
    }
}
Below are my test results.
Every link on the web page can be found through its anchor tag, <a>. The identified links are collected into a list.
List<WebElement> mylinks = mydriver.findElements(By.tagName("a"));
Then an iterator is created to step through the list of links.
Iterator<WebElement> myit = mylinks.iterator();
Identification and Validation of URLs
This step checks whether each URL is null or empty, and whether it belongs to a third-party domain. The href attribute of the anchor tag is stored in the variable myurl and then checked as follows.
myurl = myit.next().getAttribute("href");
The following code skips null or empty URLs.
if(myurl == null || myurl.isEmpty()){
System.out.println("Empty URL or an Unconfigured URL");
continue;
}
The following code determines whether the URL belongs to the site under test or to a third-party domain. URLs from other domains are skipped.
if(!myurl.startsWith(myhomePage)){
System.out.println("This URL is from another domain");
continue;
}
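A prefix comparison like the one above treats any address that merely begins with the home page text as belonging to the same site, and it rejects the same site when it is reached over a different scheme. As a hedged alternative, the sketch below compares host names with `java.net.URI`. The names `DomainCheck` and `isSameHost` are hypothetical and not part of the original test.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper; class and method names are hypothetical.
class DomainCheck {

    // Returns true when both addresses parse and share the same host name.
    static boolean isSameHost(String link, String homePage) {
        try {
            String linkHost = new URI(link).getHost();
            String homeHost = new URI(homePage).getHost();
            // getHost() is null for addresses such as mailto: links.
            return linkHost != null && linkHost.equalsIgnoreCase(homeHost);
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String home = "https://www.google.co.uk";
        // Same host, so this link would be checked.
        System.out.println(isSameHost("https://www.google.co.uk/intl/en/about", home));
        // Different host that still passes a startsWith comparison.
        System.out.println(isSameHost("https://www.google.co.uk.example.com/", home));
    }
}
```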
HTTP Request Sending
The HttpURLConnection class imported above provides methods for sending HTTP requests and reading the response codes.
myhuc = (HttpURLConnection)(new URL(myurl).openConnection());
Here the request method is set to “HEAD” instead of “GET”, so that the server returns only the headers and not the body of the document.
myhuc.setRequestMethod("HEAD");
Invoking the connect method establishes the actual connection to the URL.
myhuc.connect();
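The request steps above can be gathered into one helper. The sketch below is an assumption-laden variant rather than the article's exact code: it adds connect and read timeouts so that an unresponsive server cannot stall the test, and it releases the connection afterwards. The names `HeadRequest` and `fetchStatus` and the timeout values are illustrative.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Illustrative helper; class and method names are hypothetical.
class HeadRequest {

    // Sends a HEAD request to the given URL and returns the HTTP status code.
    static int fetchStatus(String url, int timeoutMillis) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            connection.setRequestMethod("HEAD");
            connection.setConnectTimeout(timeoutMillis); // limit time spent connecting
            connection.setReadTimeout(timeoutMillis);    // limit time spent waiting for the reply
            // getResponseCode() opens the connection if it is not open yet.
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // URL taken from the article; running this requires network access.
        System.out.println(fetchStatus("https://www.google.co.uk", 5000));
    }
}
```

Note that HttpURLConnection follows redirects by default, so the returned code normally describes the final page rather than an intermediate 3XX response.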
Validating Links
The HTTP response code is obtained with the getResponseCode() method.
responseCode = myhuc.getResponseCode();
Broken links are identified by the response code, as mentioned above. Any code greater than or equal to 400 indicates a broken link.
if(responseCode >= 400){
System.out.println(myurl+" This link is broken");
}
else{
System.out.println(myurl+" This link is valid");
}
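Printing a line for each link works for a small page, but a list of failures is easier to review or assert on. The sketch below applies the same 400-and-above rule to a map of URL to status code. The class name, method name, and sample data are hypothetical and not part of the original test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper; class and method names are hypothetical.
class BrokenLinkReport {

    // Returns the URLs whose status code is 400 or higher, in the map's iteration order.
    static List<String> brokenLinks(Map<String, Integer> statusByUrl) {
        List<String> broken = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : statusByUrl.entrySet()) {
            if (entry.getValue() >= 400) {
                broken.add(entry.getKey());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Hypothetical sample results, not real measurements.
        Map<String, Integer> results = new LinkedHashMap<>();
        results.put("https://www.google.co.uk/", 200);
        results.put("https://www.google.co.uk/no-such-page", 404);
        System.out.println("Broken links: " + brokenLinks(results));
    }
}
```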
Testing for broken links is a crucial part of building a website with a good user experience. With Selenium WebDriver, testers can identify malfunctioning links quickly, which makes this a tester-friendly way to keep a website in good shape.