duplicated() to get duplicate rows from a DataFrame in Pandas

import pandas as pd

# Sample data: rows 5 and 6 repeat rows 3 and 1 exactly
my_dict = {
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male']
}
df = pd.DataFrame(data=my_dict)
print(df)

Output (the last two rows are duplicates: row 6 repeats row 1, and row 5 repeats row 3)

   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
5   4   Krish   Four    60  female
6   2     Max  Three    85    male

Display rows indicating duplicates

print(df.duplicated())

Output

0    False
1    False
2    False
3    False
4    False
5     True
6     True
dtype: bool
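
Because the result is a boolean Series, it can also be summarised directly. The snippet below is a minimal sketch, assuming the same sample data as above; the variable name `flags` is only used for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

flags = df.duplicated()   # boolean Series, one entry per row
print(flags.any())        # is there at least one duplicate row?
print(flags.sum())        # how many rows are flagged as duplicates?
```

Summing works because True counts as 1 and False as 0.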

We can store this status in a new column.

df['status']=df.duplicated()

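We can also base the status on duplicate values in a single column, here class1.

df['status']=df['class1'].duplicated()

Note: once the status column exists it is part of every row, so later df.duplicated() calls compare it too. The examples that follow use the DataFrame without this column.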
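
As a rough sketch of how the two assignments above differ, the snippet below rebuilds the sample DataFrame and stores both results side by side. The column names `row_dup` and `class_dup` are made up for this illustration and are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# Compute both flags before adding any column, so the new columns
# do not take part in the whole-row comparison.
row_dup = df.duplicated()               # whole row repeats an earlier row
class_dup = df['class1'].duplicated()   # class1 value was seen earlier

df['row_dup'] = row_dup
df['class_dup'] = class_dup
print(df[['id', 'class1', 'row_dup', 'class_dup']])
```

Only the first 'Four' and the first 'Three' stay False in `class_dup`, while `row_dup` is True only for the last two rows.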

Syntax

DataFrame.duplicated(subset=None, keep='first')

`keep`	Optional, default `'first'`.
	`'first'`: all duplicates are marked True except the first occurrence.
	`'last'`: all duplicates are marked True except the last occurrence.
	`False` (the boolean, not a string): all duplicates are marked True.
`subset`	Optional. Columns to be considered for identifying duplicates. By default all columns are used.

Returns a boolean Series indicating the duplicates.
DataFrame.duplicated() indicates duplicate rows (optionally based on only some column values, through `subset`).
Series.duplicated() indicates duplicate values in a single column.
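
To make the three `keep` options concrete, here is a small sketch that applies each of them to the sample data. The dictionary name `flags_by_keep` is hypothetical and only used here.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# One list of flags per keep option; note that False is the boolean.
flags_by_keep = {
    keep: df.duplicated(keep=keep).tolist()
    for keep in ('first', 'last', False)
}

for keep, flags in flags_by_keep.items():
    print(repr(keep), flags)
```

With `'first'` rows 5 and 6 are flagged, with `'last'` rows 1 and 3, and with `False` all four of them.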

Display duplicate rows only

print(df[df.duplicated()])

Output

   id   name class1  mark  gender
5   4  Krish   Four    60  female
6   2    Max  Three    85    male

With keep='last' the earlier occurrences (index 1 and 3) are flagged instead, so the same values are displayed but under different index labels.

print(df[df.duplicated(keep='last')])

Display unique rows

print(df[~df.duplicated()])

Output (without rows 5 and 6)

   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
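
The same filtering is also available through drop_duplicates(), which returns the DataFrame with duplicate rows removed. The sketch below, assuming the sample data above, checks that both routes give the same rows; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

unique_by_mask = df[~df.duplicated()]   # invert the duplicate flags
unique_by_drop = df.drop_duplicates()   # built-in shortcut, keep='first' by default

print(unique_by_mask.equals(unique_by_drop))
```

The mask form is handy when the flags are needed for something else as well, for example to count or inspect the duplicates before dropping them.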

Display based on unique value of column

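Using subset, only the listed columns are compared. With keep='last', every row whose class1 value appears again further down is flagged, so only the last row of each class is left out.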

print(df[df.duplicated(keep='last',subset=['class1'])])

Output

   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female

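We can also pass more than one column to subset. For this data the output is the same as above, since rows 5 and 6 are again the last occurrence of each class1 and gender combination.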

print(df[df.duplicated(keep='last',subset=['class1','gender'])])
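
Adding columns to `subset` can only narrow the matches, because every listed column has to agree. The sketch below illustrates this with the `name` and `mark` columns, which are chosen here for illustration and differ from the columns used above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# Rows whose name appears again later in the DataFrame
by_name = df[df.duplicated(keep='last', subset=['name'])]

# Rows whose name AND mark both appear again later
by_name_mark = df[df.duplicated(keep='last', subset=['name', 'mark'])]

print(by_name.index.tolist())
print(by_name_mark.index.tolist())
```

Row 0 drops out of the second result: John appears twice, but with marks 75 and 60, so the pair of values is not repeated.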

Without using subset
We can call duplicated() on the class1 column itself to find the first occurrence of each value, and then display those rows.

print(df[~df['class1'].duplicated(keep='first')])

Output

   id  name class1  mark  gender
0   1  John   Four    75  female
1   2   Max  Three    85    male

Duplicate rows considering more than one column values

Find the rows that repeat an earlier combination of class and mark. Here a row is considered a duplicate based on the values of two columns, class1 and mark.

print(df[df[['class1','mark']].duplicated()])

Output

   id   name class1  mark  gender
4   5   John   Four    60  female
5   4  Krish   Four    60  female
6   2    Max  Three    85    male
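
The output above leaves out the first row of each repeated combination (rows 1 and 3). If every row that shares its class and mark with another row is wanted, one option is keep=False together with subset. This is a sketch under the same sample data; `all_matches` is a name chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# keep=False flags every member of a repeated (class1, mark) pair,
# including the first occurrence.
all_matches = df[df.duplicated(subset=['class1', 'mark'], keep=False)]
print(all_matches)
```

Passing subset=['class1','mark'] compares the same columns as selecting df[['class1','mark']] first, so the two forms flag the same rows.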

