By Vishal Basumatary in Hackerrank — Jul 28, 2020

Hackerrank Detect HTML Tags, Attributes and Attribute Values Solution

You are given an HTML code snippet of N lines.
Your task is to detect and print all the HTML tags, attributes and attribute values.

Print the detected items in the following format:

Tag1
Tag2
-> Attribute2[0] > Attribute_value2[0]
-> Attribute2[1] > Attribute_value2[1]
-> Attribute2[2] > Attribute_value2[2]
Tag3
-> Attribute3[0] > Attribute_value3[0]

The -> symbol indicates that the tag contains an attribute. It is immediately followed by the name of the attribute and the attribute value.
The > symbol acts as a separator of attributes and attribute values.

If an HTML tag has no attribute then simply print the name of the tag.
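The formatting rules above can be sketched as a small helper. This is only an illustration: `format_tag` is a hypothetical name, not part of the problem or of any library, and it assumes the attributes arrive as (name, value) pairs.

```python
def format_tag(tag, attrs):
    """Return the output lines for one tag and its (name, value) pairs."""
    lines = [tag]  # the tag name always comes first
    for name, value in attrs:
        # "->" marks an attribute, ">" separates its name from its value
        lines.append("-> {} > {}".format(name, value))
    return lines

# A tag with attributes, then a tag without any
print("\n".join(format_tag("object", [("type", "application/x-flash"), ("width", "0")])))
print("\n".join(format_tag("head", [])))
```

A tag with no attributes yields just its name, which matches the rule stated above.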

Note: Do not detect any HTML tag, attribute or attribute value inside the HTML comment tags (<!-- Comments -->). Comments can be multiline.
All attributes have an attribute value.
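One way to satisfy the comment rule, assuming Python's standard `html.parser` module is used, is to rely on the parser itself: it reports a comment through `handle_comment` and does not call `handle_starttag` for markup inside it. The sketch below uses a hypothetical `TagCollector` class and a made-up snippet to show this.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Hypothetical helper that records start tags and comments separately."""
    def __init__(self):
        super().__init__()
        self.tags = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_comment(self, data):
        # The whole comment body arrives here as plain text
        self.comments.append(data)

# Made-up snippet with a multiline comment that contains a tag
snippet = '<object>\n<!-- <param name="movie"\n value="x" /> -->\n<param name="quality"/>\n</object>'
collector = TagCollector()
collector.feed(snippet)
collector.close()
print(collector.tags)
```

Only the `param` tag outside the comment is recorded, so no separate comment-stripping step is needed with this parser.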

Input Format

The first line contains an integer N, the number of lines in the HTML code snippet.
The next N lines contain HTML code.

Constraints

Output Format

Print the HTML tags, attributes and attribute values in order of their occurrence from top to bottom in the snippet.

Format your answers as explained in the problem statement.

Sample Input

9
<head>
<title>HTML</title>
</head>
<object type="application/x-flash" 
  data="your-file.swf" 
  width="0" height="0">
  <!-- <param name="movie" value="your-file.swf" /> -->
  <param name="quality" value="high"/>
</object>

Sample Output

head
title
object
-> type > application/x-flash
-> data > your-file.swf
-> width > 0
-> height > 0
param
-> name > quality
-> value > high

Explanation

head tag: Print the head tag only because it has no attribute.
title tag: Print the title tag only because it has no attribute.
object tag: Print the object tag. In the next lines, print the attributes type, data, width and height along with their respective values.
<!-- Comment --> tag: Don't detect anything inside it.
param tag: Print the param tag. In the next lines, print the attributes name and value along with their respective values.

Solution in python3

Approach 1.

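from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    # Called once per opening (or self-closing) tag; comments never reach it.
    def handle_starttag(self, tag, attrs):
        print(tag)
        for attr in attrs:
            print("->", attr[0], ">", attr[1])
N = int(input())
raw = ""
for i in range(N):
    # Keep a newline between lines so tokens on adjacent lines cannot fuse.
    raw += input() + "\n"
new = MyHTMLParser()
new.feed(raw)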

Approach 2.

from html.parser import HTMLParser
class Pythonist2Parser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print(tag)
        for attr, value in attrs:
            print('->', attr, '>', value)
parser = Pythonist2Parser()
for _ in range(int(input())):
    parser.feed(input())
parser.close()
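Feeding the parser one line at a time works because `feed()` buffers an incomplete tag until the rest arrives. The sketch below checks this against the sample input without reading from stdin. `RecordingParser` and `sample_lines` are hypothetical names, and the parser stores its output lines rather than printing them so the result can be inspected.

```python
from html.parser import HTMLParser

class RecordingParser(HTMLParser):
    """Hypothetical variant that stores output lines instead of printing."""
    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append(tag)
        for name, value in attrs:
            self.lines.append("-> {} > {}".format(name, value))

# Sample input from the problem, one item per line, without newlines
# (as input() would return them); trailing spaces omitted.
sample_lines = [
    '<head>',
    '<title>HTML</title>',
    '</head>',
    '<object type="application/x-flash"',
    '  data="your-file.swf"',
    '  width="0" height="0">',
    '  <!-- <param name="movie" value="your-file.swf" /> -->',
    '  <param name="quality" value="high"/>',
    '</object>',
]

parser = RecordingParser()
for line in sample_lines:
    parser.feed(line)  # the <object ...> tag spans three feed() calls
parser.close()
print("\n".join(parser.lines))
```

The recorded lines should match the Sample Output, including the four attributes of the `object` tag that were split across separate `feed()` calls.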

Approach 3.

from html.parser import HTMLParser
class Parser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print(tag)
        for attr in attrs:
            prop,value = attr
            print("-> " + prop + " > " + value)
N=int(input())
s=""
for i in range(N):
    s+= " " + input()
parser = Parser()
parser.feed(s)

Solution in python2

Approach 1.

import HTMLParser
class parseTitle(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        print tag
        for name, value in attrs:
            print '-> ' + str(name) + ' > ' + str(value)
aparser = parseTitle()
s = ""
for i in xrange(input()):
    s += raw_input()
aparser.feed(s)

Approach 2.

import re
num_lines = input()
lines = [raw_input() for i in xrange(num_lines)]
text = ' '.join(lines)
text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
from HTMLParser import HTMLParser
class Parser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print tag
        for name, val in attrs:
            print '->', name, '>', val
Parser().feed(text)
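The comment-stripping pre-pass in this approach can be sketched on its own in Python 3. The pattern `<!--.*?-->` and the names `COMMENT_RE` and `strip_comments` are assumptions made for illustration. The non-greedy `.*?` stops at the first `-->`, and `re.DOTALL` lets `.` match newlines so multiline comments are removed too.

```python
import re

# Assumed pattern: shortest run from "<!--" to the next "-->", across newlines
COMMENT_RE = re.compile(r"<!--.*?-->", flags=re.DOTALL)

def strip_comments(html):
    """Return html with every <!-- ... --> block removed."""
    return COMMENT_RE.sub("", html)

snippet = 'a<!-- <param\n name="movie" /> -->b<!-- second -->c'
print(strip_comments(snippet))  # abc
```

With a greedy `.*` the match would run from the first `<!--` to the last `-->` and also delete the `b` between the two comments, which is why the non-greedy form is used here.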

Approach 3.

"""
https://www.hackerrank.com/contests/pythonist2/challenges/detect-html-tags-attributes-and-attribute-values
"""
from HTMLParser import HTMLParser
class MyHTML(HTMLParser) :
 def handle_starttag(self, tag, attrs):
  print tag
  for attr in attrs :
   print "->",attr[0],">",attr[1]
N = int(raw_input())
html = ""
parser = MyHTML()
for _ in xrange(N) :
 line = str(raw_input()).strip()
 parser.feed(line)
