Search for required tags in a website using Python requests and Beautiful Soup
Using Python, we will check whether each URL in a list points to a page that contains an HTML table.
Assume we have a urls.txt file containing the following list of URLs, one per line:
https://www.w3schools.com/
https://www.w3schools.com/html/html_tables.asp
https://www.thepoorcoder.com
https://www.thepoorcoder.com/generating-random-marks-and-plotting-to-graph-in-python/
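A hand-edited file like this can easily pick up blank lines or stray spaces, and an empty string passed to requests raises an error. Here is a minimal sketch of a more defensive reader. The helper name load_urls and the rule that lines starting with "#" are comments are our own assumptions, not part of the original script.

```python
from pathlib import Path


def load_urls(path):
    """Return the usable lines of a text file, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Skip blank lines and lines starting with "#" (treated here as comments;
    # this convention is an assumption made for the example)
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    ]
```

Calling load_urls("urls.txt") could then stand in for the f.read().splitlines() step used in the solution.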
Solution
from bs4 import BeautifulSoup
import requests

# Open the urls.txt file that holds our list of URLs
with open("urls.txt", "r") as f:
    urls = f.read().splitlines()

# Browser-like User-Agent, since some sites block the default one
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
}

# Print the URLs found in our text file
print("List of urls are\n" + "\n".join(urls) + "\n")

# Loop through each URL
for url in urls:
    # Use requests to get the page's HTML source
    res = requests.get(url, headers=headers)
    # Use Beautiful Soup to parse the HTML page
    soup = BeautifulSoup(res.text, "html.parser")
    # Print the page title (soup.title is None when the page has no <title>)
    print("Page title:", soup.title.string if soup.title else None)
    # Search for a table using soup.find("required-tag-here")
    if soup.find("table"):
        print("Table found in", url)
    else:
        print("No table found in", url)
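The same idea extends to any tag, not just table. As a rough sketch, the check can be wrapped in a small function that counts several tags at once. The name count_tags and the sample HTML string are made up for illustration; find_all is the Beautiful Soup call that returns every match, where find returns only the first.

```python
from bs4 import BeautifulSoup


def count_tags(html, tag_names):
    """Map each tag name to the number of matching elements in html."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list of every matching element (empty if none)
    return {name: len(soup.find_all(name)) for name in tag_names}


# Hypothetical sample page, used so the example runs without a network call
sample = "<html><body><table><tr><td>1</td></tr></table><p>hi</p></body></html>"
print(count_tags(sample, ["table", "form", "p"]))
# prints {'table': 1, 'form': 0, 'p': 1}
```

Inside the loop, passing res.text as the html argument would report the counts for each downloaded page.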
Output
List of urls are
https://www.w3schools.com/
https://www.w3schools.com/html/html_tables.asp
https://www.thepoorcoder.com
https://www.thepoorcoder.com/generating-random-marks-and-plotting-to-graph-in-python/
Page title: W3Schools Online Web Tutorials
No table found in https://www.w3schools.com/
Page title: HTML Tables
Table found in https://www.w3schools.com/html/html_tables.asp
Page title: The Poor Coder
No table found in https://www.thepoorcoder.com
Page title: Generating random student marks and plotting to graph in Python
Table found in https://www.thepoorcoder.com/generating-random-marks-and-plotting-to-graph-in-python/
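The run above assumes every URL loads. If one of them is malformed, times out, or returns an error status, the script stops with an exception. One possible way to keep going is sketched below. The helper name fetch_html and the 10 second timeout are our own choices, not something the original script defines.

```python
import requests


def fetch_html(url, headers=None, timeout=10):
    """Return the page source for url, or None if the request fails."""
    try:
        res = requests.get(url, headers=headers, timeout=timeout)
        # Turn 4xx/5xx responses into exceptions
        res.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Covers invalid URLs, connection errors, timeouts and HTTP errors
        print("Could not fetch", url, "-", exc)
        return None
    return res.text
```

In the loop, html = fetch_html(url, headers) followed by a check such as "if html is None: continue" would skip pages that could not be fetched and move on to the next URL.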