Here is the code:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# print dataframe.
df
Here is the output. I want to convert this dataframe to text with a certain format, like:
" Here is the name and age Data:
Tom 10
Nick 15
Juli 14
Thank you for your request"
Use Series.str.capitalize and join it with the values of Age converted to strings:
df['new'] = df['Name'].str.capitalize() + ' ' + df['Age'].astype(str)
print (df)
Name Age new
0 tom 10 Tom 10
1 nick 15 Nick 15
2 juli 14 Juli 14
Or:
df['new'] = df['Name'].str.capitalize().str.cat(df['Age'].astype(str), sep=' ')
If need loop by values:
s = df['Name'].str.capitalize() + ' ' + df['Age'].astype(str)
for v in s:
    print(v)
Tom 10
Nick 15
Juli 14
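If the full message from the question (header line, rows, and closing line) is needed, here is a minimal sketch building on the same joined Series; the header and footer wording is copied from the question:
s = df['Name'].str.capitalize() + ' ' + df['Age'].astype(str)
# join the rows with newlines and wrap them in the requested header and footer
text = 'Here is the name and age Data:\n' + '\n'.join(s) + '\nThank you for your request'
print(text)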
In the example dataframe created below:
Name Age
0 tom 10
1 nick 15
2 juli 14
I want to add another column 'Checks' whose values are 1 or 0 depending on whether the Name appears in the list check = ['nick'].
I have tried the below code:
import numpy as np
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
check = ['nick']
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
df['Checks'] = np.where(df['Name'] == check[0], 1, 0)  # only compares against the first element of check
#print dataframe.
print(df)
print(check)
You can use str.contains:
phrase = ['tom', 'nick']
df['check'] = df['Name'].str.contains('|'.join(phrase))
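str.contains returns booleans; if 0/1 integers are needed as in the question, the result can be cast with astype (a small addition to the line above):
df['check'] = df['Name'].str.contains('|'.join(phrase)).astype(int)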
You can use pandas.Series.isin:
check = ['nick']
df['check'] = df['Name'].isin(check).astype(int)
Output:
Name Age check
0 tom 10 0
1 nick 15 1
2 juli 14 0
I have multiple pandas dataframes (more than 70), each having the same columns. Let's say there are only 10 rows in each dataframe. I want to count the occurrences of each value in a column (here 'Name') across all of the dataframes and list them. Example:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
data = [['sam', 12], ['nick', 15], ['juli', 14]]
df2 = pd.DataFrame(data, columns = ['Name', 'Age'])
I am expecting the output as
Name Count
tom 1
sam 1
nick 2
juli 2
You can do the following:
from collections import Counter
d={'df1':df1, 'df2':df2, ..., 'df70':df70}
l=[list(d[i]['Name']) for i in d]
m=sum(l, [])
result=Counter(m)
print(result)
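Counter returns a dict-like object; if a dataframe like the expected output is wanted, it can be converted back with pandas (a minimal sketch, using 'Count' as an assumed column name):
out = pd.Series(result).rename_axis('Name').reset_index(name='Count')
print(out)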
Do you want value counts of the Name column across all dataframes?
main = pd.concat([df,df2])
main["Name"].value_counts()
juli 2
nick 2
sam 1
tom 1
Name: Name, dtype: int64
This can work if your data frames are not costly to concat:
pd.concat([x['Name'] for x in [df,df2]]).value_counts()
nick 2
juli 2
tom 1
sam 1
You can try this:
df = pd.concat([df, df2]).groupby('Name', as_index=False).count()
df.rename(columns={'Age': 'Count'}, inplace=True)
print(df)
Name Count
0 juli 2
1 nick 2
2 sam 1
3 tom 1
You can try this:
df = pd.concat([df, df2])
df = df.groupby(['Name'])['Age'].count().to_frame().reset_index()
df = df.rename(columns={"Age": "Count"})
print(df)
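All of the concat-based answers generalize to the 70 frames by collecting them in a list first; a minimal sketch, where frames is a hypothetical list holding every dataframe:
frames = [df, df2]  # extend with the remaining dataframes
counts = pd.concat(f['Name'] for f in frames).value_counts()
print(counts)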
I am looking for partial matches between two string columns. Below is the code for the dataframe:
import pandas as pd
data = [['tom', 10,'aaaaa','aaa'], ['nick', 15,'vvvvv','vv'], ['juli', 14,'sssssss','kk']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age','random','partial'])
df
I am expecting the output shown below:
Name Age random partial Matches
0 tom 10 aaaaa aaa True
1 nick 15 vvvvv vv True
2 juli 14 sssssss kk False
You can use df.apply in combination with a lambda function that checks whether the partial string is contained in the other string using the in operator.
Then we can assign this to a new column in the dataframe:
>>> import pandas as pd
>>> data = [['tom', 10,'aaaaa','aaa'], ['nick', 15,'vvvvv','vv'], ['juli', 14,'sssssss','kk']]
>>> df = pd.DataFrame(data, columns = ['Name', 'Age','random','partial'])
>>> df['matching'] = df.apply(lambda x : x.partial in x.random, axis=1)
>>> print(df)
Name Age random partial matching
0 tom 10 aaaaa aaa True
1 nick 15 vvvvv vv True
2 juli 14 sssssss kk False
One important thing to be aware of when using df.apply is the axis argument, as it here allows us to access all columns of a given row at once.
df['Matches'] = df.apply(lambda row: row['partial'] in row['random'], axis = 'columns')
df
gives
Name Age random partial Matches
0 tom 10 aaaaa aaa True
1 nick 15 vvvvv vv True
2 juli 14 sssssss kk False
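An apply-free alternative is to zip the two columns directly, which produces the same boolean column (a minimal sketch, not part of the answers above):
df['Matches'] = [p in r for p, r in zip(df['partial'], df['random'])]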
import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
Let's say I have a dataframe that looks like this. I am trying to figure out how to check the Name column for the value 'Tom' and, the first time I find it, replace it with 'FirstTom'; the second time it appears, replace it with 'SecondTom'. How do you accomplish this? I've used the replace method before, but only for replacing all Toms with a single value. I don't want to just add a 1 to the end of the value, but completely change the string to something else.
Edit:
If the df looked more like the one below, how would we check for Tom in the first column and the second column and then replace the first instance with FirstTom and the second instance with SecondTom?
data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName':['Tom', 'John', 'Bob', 'Steve']}
Just adding to the existing solutions, you can use inflect to create a dynamic mapping:
import inflect
p = inflect.engine()
df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)
Name Age
0 Tom_1st 20
1 Tom_2nd 21
2 Jack_1st 19
3 Terry_1st 18
We can do it with cumcount:
df.Name=df.Name+df.groupby('Name').cumcount().astype(str)
df
Name Age
0 Tom0 20
1 Tom1 21
2 Jack0 19
3 Terry0 18
Update
suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=df.groupby('Name')
df.Name=df.Name.radd(g.cumcount().add(1).map(suf).mask(g.Name.transform('count')==1,''))
df
Name Age
0 1stTom 20
1 2ndTom 21
2 Jack 19
3 Terry 18
Update 2 for multiple columns:
suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
s = df[['Name','OtherName']].stack()  # reshape the two name columns into one MultiIndex Series
g = s.groupby([s.index.get_level_values(0), s])
s = s.radd(g.cumcount().add(1).map(suf).mask(g.transform('count')==1, ''))
s = s.unstack()
Name OtherName
0 1stTom 2ndTom
1 Jerry John
2 Jack Bob
3 Terry Steve
EDIT: To count duplicates per row, use:
df = pd.DataFrame(data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'],
'OtherName':['Tom', 'John', 'Bob','Steve'],
'Age':[20, 21, 19, 18]})
print (df)
Name OtherName Age
0 Tom Tom 20
1 Jerry John 21
2 Jack Bob 19
3 Terry Steve 18
import inflect
p = inflect.engine()
#map by function for dynamic counter
f = lambda i: p.number_to_words(p.ordinal(i))
#columns filled by names
cols = ['Name','OtherName']
#reshaped to MultiIndex Series
s = df[cols].stack()
#counter per groups
count = s.groupby([s.index.get_level_values(0),s]).cumcount().add(1)
#mask for filter duplicates
mask = s.reset_index().duplicated(['level_0',0], keep=False).values
#filter only duplicates and map, reshape back and add to original data
df[cols] = count[mask].map(f).unstack().add(df[cols], fill_value='')
print (df)
Name OtherName Age
0 firstTom secondTom 20
1 Jerry John 21
2 Jack Bob 19
3 Terry Steve 18
Use GroupBy.cumcount with Series.map, but only for duplicated values by Series.duplicated:
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
nth = {
0: "First",
1: "Second",
2: "Third",
3: "Fourth"
}
mask = df.Name.duplicated(keep=False)
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().map(nth) + df.loc[mask, 'Name']
print (df)
Name Age
0 FirstTom 20
1 SecondTom 21
2 Jack 19
3 Terry 18
The dynamic-dictionary version looks like this:
import inflect
p = inflect.engine()
mask = df.Name.duplicated(keep=False)
f = lambda i: p.number_to_words(p.ordinal(i))
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().add(1).map(f) + df.loc[mask, 'Name']
print (df)
Name Age
0 firstTom 20
1 secondTom 21
2 Jack 19
3 Terry 18
transform
nth = ['First', 'Second', 'Third', 'Fourth']
def prefix(d):
    n = len(d)
    if n > 1:
        return d.radd([nth[i] for i in range(n)])
    else:
        return d
df.assign(Name=df.groupby('Name').Name.transform(prefix))
Name Age
0 FirstTom 20
1 SecondTom 21
2 Jack 19
3 Terry 18
4 FirstSteve 17
5 SecondSteve 16
6 ThirdSteve 15
I've tried researching but didn't get any leads, so I'm posting a question.
I have a df and I want each character of the string column's values to be shifted up by 3 based on its ASCII value.
data = [['Tom', 10], ['Nick', 15], ['Juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
Name Age
0 Tom 10
1 Nick 15
2 Juli 14
The final answer should look like this, with each Name shifted by 3 ASCII code points:
Name Age
0 Wrp 10
1 Qlfn 15
2 Mxol 14
This has to be carried out on a df with 32,000 rows. Please suggest how to achieve this result.
Here's one way using Python's built-in chr and ord:
df['Name'] = [''.join(chr(ord(s)+3) for s in i) for i in df.Name]
print(df)
Name Age
0 Wrp 10
1 Qlfn 15
2 Mxol 14
Try the code below:
data = [['Tom', 10], ['Nick', 15], ['Juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
def fn(inp_str):
    return ''.join([chr(ord(i) + 3) for i in inp_str])
df['Name'] = df['Name'].apply(fn)
df
Output is
Name Age
0 Wrp 10
1 Qlfn 15
2 Mxol 14
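Since the question mentions 32,000 rows, a vectorized alternative via Series.str.translate may be faster than a per-character Python loop; a minimal sketch, assuming the names contain only ASCII letters:
import string
# translation table shifting every ASCII letter up by 3 code points
table = {ord(c): ord(c) + 3 for c in string.ascii_letters}
df['Name'] = df['Name'].str.translate(table)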