HackerRank Tag Content Extractor Solution
In a tag-based language like XML or HTML, contents are enclosed between a start tag and an end tag like <tag>contents</tag>. Note that the corresponding end tag starts with a /.
Given a string of text in a tag-based language, parse this text and retrieve the contents enclosed within sequences of well-organized tags meeting the following criteria:
- The name of the start and end tags must be the same. The HTML code <h1>Hello World</h2> is not valid, because the text starts with an h1 tag and ends with a non-matching h2 tag.
- Tags can be nested, but content between nested tags is considered not valid. For example, in <h1><a>contents</a>invalid</h1>, contents is valid but invalid is not valid.
- Tags can consist of any printable characters.
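The criteria above can be checked with a regular expression that uses a backreference. The following is a minimal sketch, not the only possible approach; the class name TagContentSketch and the helper extract are hypothetical names introduced for illustration. It runs the two examples from the criteria through the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagContentSketch {
    // Group 1 captures the tag name (any characters), group 2 the content.
    // [^<]+ keeps the content free of nested tags, and the backreference \1
    // requires the end tag to repeat the start tag exactly (case-sensitive).
    private static final Pattern TAGGED = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: collects every valid content string in one line.
    static List<String> extract(String line) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAGGED.matcher(line);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // Mismatched tag names: nothing is extracted.
        System.out.println(extract("<h1>Hello World</h2>"));            // prints []
        // Only the innermost, tag-free content qualifies.
        System.out.println(extract("<h1><a>contents</a>invalid</h1>")); // prints [contents]
    }
}
```

In the second example, the outer h1 pair is rejected because its content would have to contain a < character, so only the inner a pair matches.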
Input Format
The first line of input contains a single integer, the number of lines.
The subsequent lines each contain a line of text.
Constraints
- Each line contains a maximum of printable characters.
- The total number of characters in all test cases will not exceed .
Output Format
For each line, print the content enclosed within valid tags.
If a line contains multiple instances of valid content, print out each instance of valid content on a new line; if no valid content is found, print None.
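The output rule can be isolated into a small function that turns one input line into the text to print. This is a sketch under assumptions: the class OutputFormatSketch and the method render are hypothetical names, and the matching relies on a backreference regular expression, which is one possible way to find valid content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutputFormatSketch {
    // Start tag, tag-free content, and an end tag that repeats the start tag.
    private static final Pattern TAGGED = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: returns what should be printed for one line.
    static String render(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = TAGGED.matcher(line);
        while (m.find()) {
            found.add(m.group(2));
        }
        // One instance per output line, or the literal None when nothing matched.
        return found.isEmpty() ? "None" : String.join("\n", found);
    }

    public static void main(String[] args) {
        System.out.println(render("<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>"));
        System.out.println(render("<Amee>safat codes like a ninja</amee>"));
    }
}
```

Separating the formatting from the input loop makes the None case easy to check on its own, without feeding text through standard input.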
Sample Input
4
<h1>Nayeem loves counseling</h1>
<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>
<Amee>safat codes like a ninja</amee>
<SA premium>Imtiaz has a secret crush</SA premium>
Sample Output
Nayeem loves counseling
Sanjay has no watch
So wait for a while
None
Imtiaz has a secret crush
Solution in Java 8
Approach 1.
import java.util.*;
import java.util.regex.*;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String line = in.nextLine();
            boolean matchFound = false;
            // Group 1 is the tag name, group 2 the content (no '<' allowed);
            // the backreference \1 forces the end tag to repeat the start tag.
            Pattern r = Pattern.compile("<(.+)>([^<]+)</\\1>");
            Matcher m = r.matcher(line);
            // Print every valid content instance on its own line.
            while (m.find()) {
                System.out.println(m.group(2));
                matchFound = true;
            }
            if (!matchFound) {
                System.out.println("None");
            }
            testCases--;
        }
    }
}
Approach 2.
import java.util.*;
import java.util.regex.*;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String line = in.nextLine();
            // <name>content</name>, where content contains no '<'.
            String rgx = "<(.+)>([^<]+)</\\1>";
            Pattern tags = Pattern.compile(rgx);
            Matcher m = tags.matcher(line);
            boolean match = false;
            while (m.find()) {
                System.out.println(m.group(2));
                match = true;
            }
            // No valid tag pair was found on this line.
            if (!match) {
                System.out.println("None");
            }
            testCases--;
        }
    }
}
Approach 3.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        int testCases = Integer.parseInt(scan.nextLine());
        while (testCases-- > 0) {
            String line = scan.nextLine();
            boolean matchFound = false;
            Pattern r = Pattern.compile("<(.+)>([^<]+)</\\1>");
            Matcher m = r.matcher(line);
            while (m.find()) {
                System.out.println(m.group(2));
                matchFound = true;
            }
            if (!matchFound) {
                System.out.println("None");
            }
        }
        scan.close();
    }
}