duplicated() : getting duplicate rows

import pandas as pd
# Sample data: the last two rows repeat earlier rows
my_dict={
  'id':[1,2,3,4,5,4,2],
  'name':['John','Max','Arnold','Krish','John','Krish','Max'],
  'class1':['Four','Three','Three','Four','Four','Four','Three'],
  'mark':[75,85,55,60,60,60,85],
  'gender':['female','male','male','female','female','female','male']
}
df = pd.DataFrame(data=my_dict)
print(df)
Output (the last two rows are duplicates: row 6 is a duplicate of row 1 and row 5 is a duplicate of row 3)
   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
5   4   Krish   Four    60  female
6   2     Max  Three    85    male

Display rows indicating duplicates

print(df.duplicated())
Output
0    False
1    False
2    False
3    False
4    False
5     True
6     True
dtype: bool
We can add a column that holds the above status.
df['status']=df.duplicated()
We can also fill the new column based on duplicate values in a single column.
df['status']=df['class1'].duplicated()
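As a small sketch of the two assignments above, we can keep both results side by side instead of overwriting status. The column names row_dup and class_dup are only illustrative and are not part of the original example. Since the values are boolean, sum() counts the rows marked True.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

# Hypothetical column names, used only for this illustration.
df['row_dup'] = df.duplicated()              # whole row repeats an earlier row
df['class_dup'] = df['class1'].duplicated()  # class1 value was seen in an earlier row

print(df[['id', 'class1', 'row_dup', 'class_dup']])
print(df['row_dup'].sum())  # number of rows flagged as full duplicates
```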
Syntax
DataFrame.duplicated(subset=None, keep='first')
keep : Optional.
  'first' (default) : all duplicates are marked True except the first one
  'last' : all duplicates are marked True except the last one
  False (boolean, not the string 'False') : all duplicates are marked True
subset : Optional. Columns to be considered for identifying duplicates, default is all columns.
Returns a boolean Series indicating duplicates.
DataFrame.duplicated() : indicates duplicate rows (optionally based on some column values only).
Series.duplicated() : indicates duplicate values in a Series.
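The effect of the three keep values is easiest to see on the sample data. This is a minimal sketch; the variable names first, last and every are chosen only for this illustration.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

first = df.duplicated(keep='first')  # default: later copies are marked True
last = df.duplicated(keep='last')    # earlier copies are marked True
every = df.duplicated(keep=False)    # every copy is marked True

# Index labels of the rows flagged by each option.
print(list(df.index[first]))  # [5, 6]
print(list(df.index[last]))   # [1, 3]
print(list(df.index[every]))  # [1, 3, 5, 6]
```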

Display duplicate rows only

print(df[df.duplicated()])
Output
   id   name class1  mark  gender
5   4  Krish   Four    60  female
6   2    Max  Three    85    male
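With keep='last' we get the same row values, but this time the earlier copies (index 1 and 3) are displayed, because the last occurrence is the one not marked as duplicate.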
print(df[df.duplicated(keep='last')])

Display unique rows

print(df[~df.duplicated()])
Output (without rows 5 and 6)
   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
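For comparison, drop_duplicates() with its default settings should return the same rows as the inverted mask. The short check below is only a sketch; the variable names unique_by_mask and unique_by_drop are illustrative.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

unique_by_mask = df[~df.duplicated()]  # keep rows that are not marked as duplicates
unique_by_drop = df.drop_duplicates()  # default keep='first'

# Both keep the original index labels 0 to 4, so the frames compare equal.
print(unique_by_mask.equals(unique_by_drop))
```

The mask form is handy when the boolean Series is needed for something else as well, such as a status column.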

Display based on unique value of column

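Using subset. With keep='last', every row except the last occurrence of each class1 value is marked as duplicate, so rows 5 and 6 are left out.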
print(df[df.duplicated(keep='last',subset=['class1'])])
Output
   id    name class1  mark  gender
0   1    John   Four    75  female
1   2     Max  Three    85    male
2   3  Arnold  Three    55    male
3   4   Krish   Four    60  female
4   5    John   Four    60  female
We can also use more than one column.
print(df[df.duplicated(keep='last',subset=['class1','gender'])])
Without using subset
In the class1 column we identify the first occurrence of each value and then display those rows.
print(df[~df['class1'].duplicated(keep='first')])
Output
   id  name class1  mark  gender
0   1  John   Four    75  female
1   2   Max  Three    85    male

Duplicate rows considering more than one column values

Find all the duplicate rows having the same class and the same mark. Here a row is considered a duplicate based on the values of two columns, class1 and mark.
print(df[df[['class1','mark']].duplicated()])
Output
   id   name class1  mark  gender
4   5   John   Four    60  female
5   4  Krish   Four    60  female
6   2    Max  Three    85    male
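Selecting the two columns first and passing them through subset should give the same boolean result. The sketch below checks this and also uses keep=False to list every member of each duplicate group, not only the later copies. The variable names are illustrative only.

```python
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 4, 2],
    'name': ['John', 'Max', 'Arnold', 'Krish', 'John', 'Krish', 'Max'],
    'class1': ['Four', 'Three', 'Three', 'Four', 'Four', 'Four', 'Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
    'gender': ['female', 'male', 'male', 'female', 'female', 'female', 'male'],
})

by_selection = df[['class1', 'mark']].duplicated()    # select columns, then check
by_subset = df.duplicated(subset=['class1', 'mark'])  # check with subset
print(by_selection.equals(by_subset))

# keep=False marks the first copy too, so whole groups are shown.
groups = df[df.duplicated(subset=['class1', 'mark'], keep=False)]
print(groups)
```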
