Hackerrank - Detect HTML Tags Solution

In this challenge, we're using regular expressions to detect the various tags used in an HTML document.

  • Tags come in pairs. Some tag name, x, will have an opening tag, <x>, followed by some intermediate text, followed by a closing tag, </x>. The forward slash in a closing tag will always come before the tag name.
  • The exception to this is self-closing tags, which consist of a single tag (not a pair) with a forward slash after the tag name: <br/>

Here are a few examples of tags:

  • The p tag is for paragraphs: <p>This is a paragraph</p>
  • There may be spaces before or after a tag name: < p >This is also a paragraph</p>
  • A void or empty tag involves an opening and closing tag with no intermediate characters: <p></p>

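Some tags can also have attributes, such as the a tag, which is used to add a hyperlink to another document:

<a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a>

In the above case, a is the tag name and href is an attribute having the value http://www.quackit.com/html/tutorial/html_links.cfm.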
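
As a rough illustration of the tag forms described above, the sketch below runs one assumed pattern over a handful of made-up fragments. The names TAG_NAME and samples are hypothetical, and the pattern only accepts lowercase alphanumeric names, which is an assumption about the input rather than something the problem statement guarantees.

```python
import re

# Assumed pattern: "<", optional whitespace, then a lowercase alphanumeric name.
# A closing tag starts with "</", so the "/" stops the match and it is skipped.
TAG_NAME = re.compile(r"<\s*([a-z0-9]+)")

samples = [
    "<p>This is a paragraph</p>",        # ordinary pair
    "< p >Spaces around the name</p>",   # spaces before and after the name
    "<p></p>",                           # empty pair
    "<br/>",                             # self-closing tag
    '<a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a>',
]

names_per_sample = [TAG_NAME.findall(s) for s in samples]
for sample, names in zip(samples, names_per_sample):
    print(sample, "->", names)
```

Each fragment yields exactly one name because only opening and self-closing tags match; attributes are ignored since the capture group stops at the first character that is not a lowercase letter or digit.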

Given N lines of HTML, find the tag names (ignore any attributes) and print them as a single line of lexicographically ordered semicolon-separated values (e.g.: a;div;p).

Input Format

The first line contains an integer, N, the number of HTML fragments.
Each of the N subsequent lines contains a fragment of an HTML document.
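
One way to read input in this shape is sketched below. read_fragments is a hypothetical helper, and io.StringIO stands in for standard input so the example runs on its own. It assumes the count on the first line matches the number of lines that follow.

```python
import io

def read_fragments(stream):
    """Read the count from the first line, then return that many following lines."""
    n = int(stream.readline())
    return [stream.readline().rstrip("\n") for _ in range(n)]

# io.StringIO stands in for sys.stdin here.
demo = io.StringIO("2\n<p>first fragment</p>\n<div>second fragment</div>\n")
fragments = read_fragments(demo)
print(fragments)
```

In a real submission the same function could be called with sys.stdin. Reading the whole input in one go also works, because the count line contains no tags.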


Constraints

  • Each fragment consists of ASCII characters.
  • The fragments are chosen from Wikipedia, so analyzing and observing their markup structure may help.
  • Leading and trailing spaces/indentation have been trimmed from the HTML fragments.

Output Format

Print a single line containing all of the unique tag names found in the input. Your output tags should be semicolon-separated and ordered lexicographically (i.e.: alphabetically). Do not print the same tag name more than once.
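
The output rule (unique, sorted, semicolon-separated) can be sketched in a few lines. format_tags is a hypothetical helper name. It relies on Python's default string ordering, which compares by code point and so sorts digits before lowercase letters.

```python
def format_tags(names):
    """Return the unique names, sorted lexicographically and joined by semicolons."""
    return ";".join(sorted(set(names)))

print(format_tags(["p", "a", "div", "a"]))  # prints a;div;p
```

Converting to a set first removes duplicates, so a tag that appears in several fragments is printed only once.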

Sample Input

2
<p><a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a></p>
<div class="more-info"><a href="http://www.quackit.com/html/examples/html_links_examples.cfm">More Link Examples...</a></div>

Sample Output

a;div;p

Explanation

The first fragment contains 2 tag names: p and a.
The second fragment contains 2 tag names: div and a.
Our set of unique tag names is {p, a, div}.
When we order these alphabetically and print them as semicolon-separated values, we get "a;div;p".

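Solution in Python

import re
import sys

html = sys.stdin.read()  # the leading count line contains no "<", so it never matches
# Capture the name right after "<"; closing tags begin with "</" and are skipped.
tags = set(re.findall(r"<\s*([a-z0-9]+)", html))
# Print the unique names in lexicographic order, separated by semicolons.
print(";".join(sorted(tags)))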
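
To try the same idea without piping data through standard input, the logic can be wrapped in a function. This is only a sketch: detect_tags is a hypothetical name, and the pattern assumes lowercase tag names, as in the solution above.

```python
import re

def detect_tags(html):
    """Return the unique lowercase tag names in html as a sorted, semicolon-joined string."""
    names = set(re.findall(r"<\s*([a-z0-9]+)", html))
    return ";".join(sorted(names))

# The Sample Input from this post, including the leading fragment count.
sample = (
    '2\n'
    '<p><a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a></p>\n'
    '<div class="more-info"><a href="http://www.quackit.com/html/examples/html_links_examples.cfm">More Link Examples...</a></div>\n'
)
print(detect_tags(sample))  # prints a;div;p
```

A function like this is easier to test against extra fragments than a script that reads stdin directly.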
