How can I loop through the columns in each row using Python? - python

Hi all, I have a DataFrame like this:
import pandas as pd
empoyees = [('jack', 34, 'Sydney', 800),
            ('Riti', 31, 'Delhi', 800),
            ('Aadi', 16, 'New York', 800),
            ('Mohit', 32, 'Delhi', 1500),
            ]
empDfObj = pd.DataFrame(empoyees, columns=['Name', 'Age', 'City', 'Salary'], index=['a', 'b', 'c', 'd'])
How can I loop through the columns in each row and get a result like this using pandas? Maybe collect each row's values into a small list.
a Name jack Age 34 City Sydney Salary 800
b Name Riti Age 31 City Delhi Salary 800
c Name Aadi Age 16 City New York Salary 800
d Name Mohit Age 32 City Delhi Salary 1500

You could use DataFrame.to_dict with orient set to 'index'.
The resulting dict has the form:
{ idx1 : {col1:val1, col2:val2 ... coln:valn},
  idx2 : {col1:val1, col2:val2 ... coln:valn},
  ...
}
Loop through the dict and build a list of strings if you would like to store them as a list (df below is the DataFrame from the question).
[
f'{idx} {" ".join([str(v) for t in vals.items() for v in t])}'
for idx, vals in df.to_dict("index").items()
]
# output
# ['a Name jack Age 34 City Sydney Salary 800',
# 'b Name Riti Age 31 City Delhi Salary 800',
# 'c Name Aadi Age 16 City New York Salary 800',
# 'd Name Mohit Age 32 City Delhi Salary 1500']
If you only want to print them, you don't need to build a list of strings. You could do:
for idx, vals in df.to_dict('index').items():
    print(idx, *[v for t in vals.items() for v in t], sep=" ")
#output
# a Name jack Age 34 City Sydney Salary 800
# b Name Riti Age 31 City Delhi Salary 800
# c Name Aadi Age 16 City New York Salary 800
# d Name Mohit Age 32 City Delhi Salary 1500
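The question also mentions collecting each row into a small list. A minimal sketch of that variant, assuming the same data as in the question (the names df and rows are only illustrative):

```python
import pandas as pd

# Same data as in the question, with 'Salary' quoted as a column name.
employees = [
    ("jack", 34, "Sydney", 800),
    ("Riti", 31, "Delhi", 800),
    ("Aadi", 16, "New York", 800),
    ("Mohit", 32, "Delhi", 1500),
]
df = pd.DataFrame(
    employees,
    columns=["Name", "Age", "City", "Salary"],
    index=["a", "b", "c", "d"],
)

# One flat list per row: [index, col1, val1, col2, val2, ...]
rows = [
    [idx, *(item for pair in vals.items() for item in pair)]
    for idx, vals in df.to_dict("index").items()
]

for row in rows:
    print(*row)
```

Each inner list keeps the original value types (for example, Age stays a number), so convert with str() only when you print or join.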

I kept it simple:
for index, row in df.iterrows():
    # start each line with the row label, then append "column value" pairs
    s = str(index)
    for key, value in row.items():
        s += " " + key + " " + str(value)
    print(s)
Output:
a Name jack Age 34 City Sydney Salary 800
b Name Riti Age 31 City Delhi Salary 800
c Name Aadi Age 16 City New York Salary 800
d Name Mohit Age 32 City Delhi Salary 1500

Related

How to convert pandas DataFrame to multiple DataFrame?

My DataFrame:
df = pandas.DataFrame({
    "City": ["Chennai","Banglore","Mumbai","Delhi","Chennai","Banglore","Mumbai","Delhi"],
    "Name": ["Praveen","Dhansekar","Naveen","Kumar","SelvaRani","Nithya","Suji","Konsy"],
    "Gender": ["M","M","M","M","F","F","F","F"]})
When printed, df looks like this:
City      Name       Gender
Chennai   Praveen    M
Banglore  Dhansekar  M
Mumbai    Naveen     M
Delhi     Kumar      M
Chennai   SelvaRani  F
Banglore  Nithya     F
Mumbai    Suji       F
Delhi     Konsy      F
I want to save the data in separate DataFrames as follows:
Chennai=
City      Name       Gender
Chennai   Praveen    M
Chennai   SelvaRani  F
Banglore=
City      Name       Gender
Banglore  Dhansekar  M
Banglore  Nithya     F
Mumbai=
City      Name       Gender
Mumbai    Naveen     M
Mumbai    Suji       F
Delhi=
City      Name       Gender
Delhi     Kumar      M
Delhi     Konsy      F
My code is:
D_name = sorted(df['City'].unique())
for i in D_name:
    f"{i}" = df[df['City'] == i]  # SyntaxError: cannot assign to an f-string
The dataset has more than 100 cities. How do I write a for loop in Python to get multiple DataFrames as output?
You can groupby and create a dictionary like so:
dict_dfs = dict(iter(df.groupby("City")))
Then you can directly access individual cities:
Delhi = dict_dfs["Delhi"]
print(Delhi)
# result:
    City   Name Gender
3  Delhi  Kumar      M
7  Delhi  Konsy      F
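The same idea can be written as a dict comprehension, which also leaves room to tidy each piece. A minimal sketch, assuming the data from the question (city_frames is a hypothetical name, and the reset_index call is optional):

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Chennai", "Banglore", "Mumbai", "Delhi",
             "Chennai", "Banglore", "Mumbai", "Delhi"],
    "Name": ["Praveen", "Dhansekar", "Naveen", "Kumar",
             "SelvaRani", "Nithya", "Suji", "Konsy"],
    "Gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
})

# Map each city name to its own DataFrame; reset_index is optional and
# only renumbers the rows of each piece from 0.
city_frames = {
    city: group.reset_index(drop=True)
    for city, group in df.groupby("City")
}

for city, frame in city_frames.items():
    print(f"{city}: {len(frame)} rows")
```

With more than 100 cities, a dictionary keyed by city name is usually easier to work with than 100 separate variables, since you can loop over it or look up a city by its string name.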
You could also use get_group (the key must match the spelling in the data, which is 'Banglore' here):
groups = df.groupby(by='City')
Banglore = groups.get_group('Banglore')

Combine text using delimiter for duplicate column values

What I'm trying to achieve is to combine the Name values into one comma-delimited value whenever the Country column is duplicated, and to sum the values in the Salary column.
Current input:
pd.DataFrame({'Name': {0: 'John',1: 'Steven',2: 'Ibrahim',3: 'George',4: 'Nancy',5: 'Mo',6: 'Khalil'},
'Country': {0: 'USA',1: 'UK',2: 'UK',3: 'France',4: 'Ireland',5: 'Ireland',6: 'Ireland'},
'Salary': {0: 100, 1: 200, 2: 200, 3: 100, 4: 50, 5: 100, 6: 10}})
   Name     Country  Salary
0  John     USA      100
1  Steven   UK       200
2  Ibrahim  UK       200
3  George   France   100
4  Nancy    Ireland  50
5  Mo       Ireland  100
6  Khalil   Ireland  10
Expected output:
Rows 1 & 2 (in the input) are grouped into one since the Country column is duplicated, and the Salary column is summed up.
The same goes for rows 4, 5 & 6.
   Name               Country  Salary
0  John               USA      100
1  Steven, Ibrahim    UK       400
2  George             France   100
3  Nancy, Mo, Khalil  Ireland  160
What I have tried so far (I'm not sure how to combine the text in the Name column):
df.groupby(['Country'],as_index=False)['Salary'].sum()
[Out:]
   Country  Salary
0  France   100
1  Ireland  160
2  UK       400
3  USA      100
Use groupby() and agg():
out=df.groupby('Country',as_index=False).agg({'Name':', '.join,'Salary':'sum'})
If you need only the unique values of the 'Name' column, use:
out=(df.groupby('Country',as_index=False)
       .agg({'Name':lambda x:', '.join(set(x)),'Salary':'sum'}))
Note: use pd.unique() in place of set() if the order of the unique values is important.
Output of out:
   Country  Name               Salary
0  France   George             100
1  Ireland  Nancy, Mo, Khalil  160
2  UK       Steven, Ibrahim    400
3  USA      John               100
Use agg:
df.groupby(['Country'], as_index=False).agg({'Name': ', '.join, 'Salary':'sum'})
To restore the original column order, append [df.columns]. By default groupby sorts the group keys, so also pass sort=False to keep the countries in order of first appearance, as in the expected output:
df.groupby(['Country'], as_index=False, sort=False).agg({'Name': ', '.join, 'Salary':'sum'})[df.columns]
   Name               Country  Salary
0  John               USA      100
1  Steven, Ibrahim    UK       400
2  George             France   100
3  Nancy, Mo, Khalil  Ireland  160
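For reference, here is a self-contained sketch of the same aggregation written with pandas named aggregation (keyword arguments to agg) instead of a dict. It assumes the input frame from the question; the variable name out is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Steven", "Ibrahim", "George", "Nancy", "Mo", "Khalil"],
    "Country": ["USA", "UK", "UK", "France", "Ireland", "Ireland", "Ireland"],
    "Salary": [100, 200, 200, 100, 50, 100, 10],
})

out = (
    # sort=False keeps the countries in order of first appearance
    df.groupby("Country", as_index=False, sort=False)
      # output_column=(input_column, aggregation)
      .agg(Name=("Name", ", ".join), Salary=("Salary", "sum"))
      # restore the original column order
      [df.columns]
)
print(out)
```

Named aggregation makes the output column names explicit, which helps if you later want the joined names in a differently named column.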

Add a column from an existing dataframe into another between every other column

I'll try my best to explain this as I had trouble phrasing the title. I have two dataframes. What I would like to do is add a column from df1 into df2 between every other column.
For example, df1 looks like this:
   Age  City
0  34   Sydney
1  30   Toronto
2  31   Mumbai
3  32   Richmond
And after adding in df2 it looks like this:
   Name   Age  Clicks  City      Country
0  Ali    34   10      Sydney    Australia
1  Lori   30   20      Toronto   Canada
2  Asher  31   45      Mumbai    United States
3  Lylah  32   33      Richmond  United States
In terms of code, I wasn't quite sure where to even start.
'''Concatenating the dataframes'''
for i in range(len(df2)):
    pos = i+1
    df3 = df2.insert  # incomplete: insert is never actually called
#df2 = pd.concat([df1, df2], axis=1).sort_index(axis=1)
#df2.columns = np.arange(len(df2.columns))
#print (df2)
I was originally going to run it through a loop, but I wasn't quite sure how to do it. Any help would be appreciated!
You can use itertools.zip_longest to interleave the column names of the two frames. For example:
from itertools import zip_longest
new_columns = [
    v
    for v in (c for a in zip_longest(df2.columns, df1.columns) for c in a)
    if v is not None
]
df_out = pd.concat([df1, df2], axis=1)[new_columns]
print(df_out)
Prints:
   Name   Age  Clicks  City      Country
0  Ali    34   10      Sydney    Australia
1  Lori   30   20      Toronto   Canada
2  Asher  31   45      Mumbai    United States
3  Lylah  32   33      Richmond  United States
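Since the question started from a loop with DataFrame.insert, here is a rough sketch of that route as well. It assumes df2 initially holds the Name, Clicks and Country columns, that both frames share the same index, and that df1 has no more columns than df2; the sample frames are rebuilt from the tables in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Age": [34, 30, 31, 32],
    "City": ["Sydney", "Toronto", "Mumbai", "Richmond"],
})
# Assumed starting shape of df2 (before the df1 columns are added).
df2 = pd.DataFrame({
    "Name": ["Ali", "Lori", "Asher", "Lylah"],
    "Clicks": [10, 20, 45, 33],
    "Country": ["Australia", "Canada", "United States", "United States"],
})

df_out = df2.copy()
for i, col in enumerate(df1.columns):
    # After i earlier insertions, position 2*i + 1 falls right after
    # the i-th original df2 column.
    df_out.insert(2 * i + 1, col, df1[col])

print(df_out)
```

DataFrame.insert modifies the frame in place and returns None, which is why the loop works on a copy and does not assign the result.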

Overlay of two dataframe using python

Problem:
I have two DataFrames, 'infm' and 'ufl'. In 'ufl', the Age and Salary columns for Ben and Creg are updated. I want to update the corresponding rows in 'infm' too.
Approach Taken:
I am iterating through each row of 'infm' and using the 'Name' column to match the two DataFrames. If the names match, I update the Age column of 'infm' with the value from 'ufl'.
Input:
NAME AGE SALARY COUNTRY
Adam 24 25000 x
Ben 25 30000 y
Creg 23 22000 x
Dawood 25 30000 w
Update on two rows of Input:
NAME AGE SALARY COUNTRY
Ben 36 90000 y
Creg 34 92000 x
Expected Output:
NAME AGE SALARY COUNTRY
Adam 24 25000 x
Ben 36 90000 y
Creg 34 92000 x
Dawood 25 30000 w
Actual output:
NAME AGE SALARY COUNTRY
Adam 24 25000 x
Ben 25 30000 y
Creg 23 22000 x
Dawood 25 30000 w
Code used:
import pandas as pd
infm=pd.read_excel('D:/data/test.xls')
ufl=pd.read_excel('D:/data/test1.xls')
for row in infm.iterrows():
    a=row[1]['Name']
    b=ufl['Name'].unique().tolist()
    for i in b:
        if i==a:
            row[1]['Age']=(ufl['Age'][ufl['Name']==a]).tolist()[0]
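One likely reason the loop has no effect is that iterrows hands back a copy of each row, so assigning into row[1] does not write back to 'infm'. A possible alternative, sketched here under assumptions, is to align both frames on the name column and call DataFrame.update. The frames below are rebuilt by hand from the tables above instead of being read from the Excel files, and the upper-case column names are taken from those tables, not from the original code.

```python
import pandas as pd

# Hand-built stand-ins for the two Excel sheets (assumed layout).
infm = pd.DataFrame({
    "NAME": ["Adam", "Ben", "Creg", "Dawood"],
    "AGE": [24, 25, 23, 25],
    "SALARY": [25000, 30000, 22000, 30000],
    "COUNTRY": ["x", "y", "x", "w"],
})
ufl = pd.DataFrame({
    "NAME": ["Ben", "Creg"],
    "AGE": [36, 34],
    "SALARY": [90000, 92000],
    "COUNTRY": ["y", "x"],
})

# Align both frames on NAME, overwrite matching cells in place, then
# restore NAME as an ordinary column.
infm = infm.set_index("NAME")
infm.update(ufl.set_index("NAME"))
infm = infm.reset_index()

print(infm)
```

update only overwrites cells where 'ufl' has a non-missing value, so rows that are absent from 'ufl' (Adam and Dawood here) stay as they were. Depending on the pandas version, the updated numeric columns may come back as floats, so cast them with astype if you need integers.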

Convert nested dictionary to appended dataframe

I have a dictionary as such:
{1:{'name':'john', 'age':26,'salary':50000},11:{'name':'peter', 'age':34, 'salary':70000},14:{'name': 'david', 'age': 21, 'salary': 15000}}
I would like to convert it to a dataframe like this:
name age salary
john 26 50000
peter 34 70000
david 21 15000
Use from_dict with orient='index':
pd.DataFrame.from_dict(d, orient='index')
name age salary
1 john 26 50000
11 peter 34 70000
14 david 21 15000
You can load the dictionary directly into a dataframe and then transpose it:
d = {1:{'name':'john', 'age':26,'salary':50000},11:{'name':'peter', 'age':34, 'salary':70000},14:{'name': 'david', 'age': 21, 'salary': 15000}}
df = pd.DataFrame(d).T
age name salary
1 26 john 50000
11 34 peter 70000
14 21 david 15000
Construct the dataframe out of your dict's values.
>>> d = {1:{'name':'john', 'age':26,'salary':50000},11:{'name':'peter', 'age':34, 'salary':70000},14:{'name': 'david', 'age': 21, 'salary': 15000}}
>>> pd.DataFrame(list(d.values()))
age name salary
0 26 john 50000
1 34 peter 70000
2 21 david 15000
With rearranged columns:
>>> pd.DataFrame(list(d.values()), columns=['name', 'age', 'salary'])
name age salary
0 john 26 50000
1 peter 34 70000
2 david 21 15000
Do this:
pd.DataFrame(list(d.values()))
If you're using Python 2, you can call pd.DataFrame directly with d.values(), like this:
pd.DataFrame(d.values())
This is because dict.values() returns a view object rather than a list in Python 3.
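If the outer keys (1, 11, 14) should be kept as data instead of being discarded, one option is to turn them into a regular column. A minimal sketch, assuming the dictionary from the question; the column name id is an invented label for illustration:

```python
import pandas as pd

d = {
    1: {"name": "john", "age": 26, "salary": 50000},
    11: {"name": "peter", "age": 34, "salary": 70000},
    14: {"name": "david", "age": 21, "salary": 15000},
}

df = (
    pd.DataFrame.from_dict(d, orient="index")
      .rename_axis("id")   # name the index so reset_index yields an 'id' column
      .reset_index()
      # select the columns explicitly so the order does not depend on
      # the pandas version
      [["id", "name", "age", "salary"]]
)
print(df)
```

Dropping the id column afterwards (or calling reset_index(drop=True) instead) gives the plain name/age/salary layout the question asks for.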
