We can collect every occurrence of a tag in a web page by using find_all. We pass the name of the tag and get back a list of all the matching tags in the page.
Let us find all the h2 tags of the web page.
import requests
link = "https://www.plus2net.com/html_tutorial/html_form.php"
content = requests.get(link)
from bs4 import BeautifulSoup
soup = BeautifulSoup(content.text, 'html.parser')
print(soup.find_all("h2"))
The output is:
[<h2>How to select a form component</h2>,
<h2>Form tag</h2>, <h2>Method attribute of the html form</h2>,
<h2>Action attribute</h2>,
<h2>Applications and uses of html form elements</h2>]
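Since find_all returns a list, the usual list operations apply to the result. The short sketch below illustrates this on a small inline HTML string (a made-up sample, not the page used above), so it runs without a network request. It also shows the limit parameter of find_all, which stops the search after a given number of matches.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML for illustration; it is not the real page.
html = """
<h1>Web Form tag</h1>
<h2>Form tag</h2>
<h2>Action attribute</h2>
<h2>Method attribute</h2>
"""
soup = BeautifulSoup(html, "html.parser")

headings = soup.find_all("h2")            # list-like collection of Tag objects
count = len(headings)                     # number of h2 tags found
first = headings[0]                       # indexing works as with any list
first_two = soup.find_all("h2", limit=2)  # stop after the first two matches
missing = soup.find_all("h3")             # no match gives an empty list, not an error

print(count, first, len(first_two), len(missing))
```

An empty result is returned when the tag is not present, so it is safe to loop over the output without checking first.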
If you don't want to keep the <h2> </h2> tags, print only the string inside each tag:
my_list=soup.find_all("h2")
for my_tags in my_list:
    print(my_tags.string)
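One point to keep in mind: .string gives None when a tag holds more than one child, for example a heading with an <em> tag inside it. The sketch below compares .string with get_text() on a made-up HTML string (an assumption for illustration, not taken from the page above).

```python
from bs4 import BeautifulSoup

# Made-up sample HTML; the second heading contains a nested <em> tag.
html = "<h2>Form tag</h2><h2>Method <em>attribute</em> of the form</h2>"
soup = BeautifulSoup(html, "html.parser")

# .string is None when the tag has more than one child node
strings = [tag.string for tag in soup.find_all("h2")]

# get_text() joins the text of all the descendants of the tag
texts = [tag.get_text() for tag in soup.find_all("h2")]

print(strings)
print(texts)
```

If the headings may contain nested tags, get_text() is usually the safer choice.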
Collecting all the links of a webpage
A common requirement is to collect all the links present in a web page. We will use find_all to get the links ( <a href=… > … </a> ), then read the anchor text and the URL (or address) of each link. Note that find_all returns a list of links, so we use a for loop to display them one by one.
import requests
link = "https://www.plus2net.com/html_tutorial/html_form.php"
content = requests.get(link)
from bs4 import BeautifulSoup
soup = BeautifulSoup(content.text, 'html.parser')
print(soup.find_all('a')) # all the links with string and tags
The output will be all the links present in the webpage.
Now let us collect the anchor text and the URL (or address) of each link.
my_list=soup.find_all("a")
for my_tags in my_list:
    #print(my_tags['href']) # returns the links or URLs
    print(my_tags.string) # returns the anchor text
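Some <a> tags have no href attribute, and my_tags['href'] raises a KeyError for them. The sketch below is one possible way to handle this: it reads the attribute with get(), skips anchors without an address, and converts relative addresses to full URLs with urljoin from the standard library. The base_url and the HTML string are made-up values for illustration only.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Made-up page address and HTML, used only for this illustration.
base_url = "https://www.example.com/html_tutorial/html_form.php"
html = """
<a href="html_table.php">Table</a>
<a href="/python/">Python</a>
<a name="top">No address here</a>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
for tag in soup.find_all("a"):
    href = tag.get("href")   # None when the tag has no href attribute
    if href is None:
        continue             # skip named anchors and similar tags
    # store (anchor text, absolute URL)
    links.append((tag.get_text(strip=True), urljoin(base_url, href)))

for text, url in links:
    print(text, url)
```

Another option is soup.find_all("a", href=True), which returns only the tags that carry an href attribute.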
Using regular expressions
We can pass a compiled regular expression to find_all to get all the tags whose names match the pattern.
Let us find all the h1 and h2 tags.
import requests
link = "https://www.plus2net.com/html_tutorial/html_form.php"
content = requests.get(link)
from bs4 import BeautifulSoup
soup = BeautifulSoup(content.text, 'html.parser')
import re
print(soup.find_all(re.compile("^h[12]$"))) # ^ and $ match the whole tag name: h1 or h2
We will get a single list as output:
[<h1 itemprop="headline">Web Form tag & HTML elements</h1>,
<h2>How to select a form component</h2>, <h2>Form tag</h2>,
<h2>Method attribute of the html form</h2>,
<h2>Action attribute</h2>,
<h2>Applications and uses of html form elements</h2>]
In the same way we can get all the a or div tags.
import re
print(soup.find_all(re.compile("^(a|div)$"))) # all a or div tags
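BeautifulSoup checks a compiled pattern against each tag name with the search() method of the pattern, so a pattern without ^ and $ also matches longer names that merely contain the text. The sketch below shows the difference on a made-up HTML string (an assumption for illustration), and also shows that a plain list of tag names gives the same result as the anchored pattern.

```python
import re
from bs4 import BeautifulSoup

# Made-up sample HTML with tags whose names contain the letter "a".
html = "<div><table><tr><td><a href='#'>x</a></td></tr></table><span>y</span></div>"
soup = BeautifulSoup(html, "html.parser")

# Unanchored: matches any tag name containing "a" or "div"
loose = [tag.name for tag in soup.find_all(re.compile("a|div"))]

# Anchored: matches only the tag names "a" and "div"
exact = [tag.name for tag in soup.find_all(re.compile("^(a|div)$"))]

# A list of names is matched exactly, no regular expression needed
by_list = [tag.name for tag in soup.find_all(["a", "div"])]

print(loose)
print(exact)
print(by_list)
```

For a fixed set of tag names the list form is often the easiest to read; the regular expression is more useful when the names follow a pattern, such as h1 to h6.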