find_all to get a list of specified HTML tags

We can collect every occurrence of a tag in a web page by using find_all. We pass the name of the tag, and find_all returns a list-like ResultSet of the matching tags in the order they appear in the page. If no tag matches, the result is empty.

Let us find all the h2 tags of the web page.

import requests
link = "https://www.plus2net.com/html_tutorial/html_form.php"
content = requests.get(link)

from bs4 import BeautifulSoup
soup = BeautifulSoup(content.text, 'html.parser')

print(soup.find_all("h2"))

The output is shown here

[<h2>How to select a form component</h2>,
 <h2>Form tag</h2>, <h2>Method attribute of the html form</h2>,
 <h2>Action attribute</h2>, 
 <h2>Applications and uses of html form elements</h2>]

If you don't want the <h2> and </h2> tags in the output, print only the string inside each tag.

my_list=soup.find_all("h2")
for my_tags in my_list:
    print(my_tags.string)
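
The .string attribute returns None when a tag contains more than a single piece of text, for example when part of the heading is wrapped in another tag. The sketch below compares .string with get_text() on a small HTML fragment. The fragment is made up for illustration and is not taken from the page used above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, used only for illustration
html = """
<h2>Form tag</h2>
<h2>Method <em>attribute</em> of the form</h2>
"""
soup = BeautifulSoup(html, "html.parser")

headings = soup.find_all("h2")

# .string is None when the tag has more than one child node
strings = [tag.string for tag in headings]

# get_text() joins all the text inside the tag, including nested tags
texts = [tag.get_text().strip() for tag in headings]

print(strings)
print(texts)
```

When a heading may contain nested tags, get_text() is usually the safer choice.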

Collecting all the links of a webpage

One important requirement is to collect all the links present in a web page. We will use find_all to get the links (<a href=…> … </a>), then read the anchor text and the URL (address) of each link. find_all returns a list of links, and we use a for loop to display each one.

import requests
link = "https://www.plus2net.com/html_tutorial/html_form.php"
content = requests.get(link)

from bs4 import BeautifulSoup
soup = BeautifulSoup(content.text, 'html.parser')

print(soup.find_all('a')) # all the links with string and tags

The output will be all the links present in the webpage.

Now let us collect the anchor text and the URL (address) of each link.

my_list = soup.find_all("a")
for my_tags in my_list:
    print(my_tags.get("href"))  # the URL of the link, or None if the tag has no href attribute
    print(my_tags.string)       # the anchor text, or None if the link does not hold a single string
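
Links in a page are often relative, and some a tags have no href attribute at all. The following is a minimal sketch of one way to handle both cases: href=True keeps only the tags that have an href, and urljoin from the standard library turns relative addresses into full URLs. The HTML fragment and the file name html_table.php are assumptions made for the example, not links copied from the real page.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Address of the page, used to resolve relative links
base_url = "https://www.plus2net.com/html_tutorial/html_form.php"

# Hypothetical HTML fragment, used only for illustration
html = """
<a href="html_table.php">Table tag</a>
<a href="https://www.example.com/">Example</a>
<a name="top">Anchor without href</a>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
# href=True skips <a> tags that have no href attribute
for tag in soup.find_all("a", href=True):
    text = tag.get_text().strip()
    url = urljoin(base_url, tag["href"])  # relative -> absolute
    links.append((text, url))

for text, url in links:
    print(text, url)
```

With a real page, replace html with content.text from requests.get(link).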

Using Regular expression

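We can pass a compiled regular expression to find_all to get the tags whose names match it. BeautifulSoup tests each tag name with the pattern's search() method, so a match anywhere in the name counts unless the pattern is anchored with ^ and $.
Let us find all the h1 and h2 tags.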

import requests
link = "https://www.plus2net.com/html_tutorial/html_form.php"
content = requests.get(link)

from bs4 import BeautifulSoup
soup = BeautifulSoup(content.text, 'html.parser')

import re
print(soup.find_all(re.compile("^h[12]$")))  # h1 and h2 tags only

We will get a single list as output

[<h1 itemprop="headline">Web Form tag & HTML elements</h1>,
 <h2>How to select a form component</h2>, <h2>Form tag</h2>,
 <h2>Method attribute of the html form</h2>,
 <h2>Action attribute</h2>,
 <h2>Applications and uses of html form elements</h2>]

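In the same way we can get all the a or div tags. The pattern is anchored with ^ and $ because an unanchored (a|div) would also match any tag whose name contains the letter a, such as table or span.

import re
print(soup.find_all(re.compile("^(a|div)$")))  # all a or div tags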
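
The effect of anchoring is easy to check on a small document. The sketch below runs an unanchored and an anchored pattern on the same HTML and collects the tag names each one returns. It also shows that a plain list of tag names can be passed to find_all instead of a regular expression. The HTML fragment is an assumption made for the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, used only for illustration
html = (
    "<body><h1>Title</h1><h2>Sub</h2>"
    "<div><a href='#top'>Top</a></div>"
    "<table><tr><td>cell</td></tr></table></body>"
)
soup = BeautifulSoup(html, "html.parser")

# Unanchored: search() finds 'a' inside the name 'table' as well
unanchored = [t.name for t in soup.find_all(re.compile("(a|div)"))]

# Anchored: the whole tag name must be 'a' or 'div'
anchored = [t.name for t in soup.find_all(re.compile("^(a|div)$"))]

# A list of names gives an exact match without a regular expression
headings = [t.name for t in soup.find_all(["h1", "h2"])]

print(unanchored)
print(anchored)
print(headings)
```

For a fixed set of tag names the list form is simpler to read. A regular expression is more useful when the names follow a pattern.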
