Making a string out of pandas DataFrame - python

I have pandas DataFrame which looks like this:
Name Number Description
car 5 red
And I need to make a string out of it which looks like this:
"""Name: car
Number: 5
Description: red"""
I'm a beginner and I really don't get how to do it. I'll probably need to apply this to some similar DataFrames later.

You can use iterrows to iterate over your dataframe rows; for each row you can then get the columns and format the result the way you want. For example:
import pandas as pd

dtf = pd.DataFrame({
    "Name": ["car", "other"],
    "Number": [5, 6],
    "Description": ["red", "green"]
})

def stringify_dataframe(dtf):
    text = ""
    for i, row in dtf.iterrows():
        for col in dtf.columns.values:
            text += f"{col}: {row[col]}\n"
        text += "\n"
    return text

s = stringify_dataframe(dtf)
Now s contains the following:
>>> print(s)
Name: car
Number: 5
Description: red

Name: other
Number: 6
Description: green
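Applied to the single-row frame from the question, the same function gives exactly the requested string (a quick sanity check, reusing stringify_dataframe from above):
single = pd.DataFrame({"Name": ["car"], "Number": [5], "Description": ["red"]})
print(stringify_dataframe(single))
# Name: car
# Number: 5
# Description: red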

Iterating over a DataFrame is usually faster when using apply.
import pandas as pd

df = pd.DataFrame({
    "Name": ["car", "other"],
    "Number": [5, 6],
    "Description": ["red", "green"]
})

s = '\n'.join(
    df.apply(
        # Series.iteritems() was removed in pandas 2.0; items() is the current spelling
        lambda row: '\n'.join(f'{head}: {val}' for head, val in row.items()),
        axis=1))
Of course, for this tiny data set the plain for loop is faster, but on my machine apply already won on a data set with just 10 rows.
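If you want to check the crossover point on your own data, a rough timing sketch could look like this (the frame size and repeat count are arbitrary choices, and with_iterrows/with_apply are just illustrative helper names):
import timeit

# reuses the small df defined above, stacked a few times to get a larger frame
big = pd.concat([df] * 50, ignore_index=True)

def with_iterrows(frame):
    text = ""
    for _, row in frame.iterrows():
        for col in frame.columns:
            text += f"{col}: {row[col]}\n"
        text += "\n"
    return text

def with_apply(frame):
    return '\n'.join(
        frame.apply(lambda row: '\n'.join(f'{h}: {v}' for h, v in row.items()),
                    axis=1))

print(timeit.timeit(lambda: with_iterrows(big), number=100))
print(timeit.timeit(lambda: with_apply(big), number=100))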

Another approach:
import pandas as pd

dtf = pd.DataFrame({
    "Name": ["car", "other"],
    "Number": [5, 6],
    "Description": ["red", "green"]
})

for row_index in range(len(dtf)):
    for col in dtf.columns:
        print(f"{col}: {dtf.loc[row_index, col]}")
Name: car
Number: 5
Description: red
Name: other
Number: 6
Description: green
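Note that range(len(dtf)) together with .loc assumes the default 0..n-1 RangeIndex; if your dataframe has a different index, positional access with .iloc is the safer variant (a small sketch):
for row_index in range(len(dtf)):
    for col in dtf.columns:
        print(f"{col}: {dtf[col].iloc[row_index]}")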

Related

Create numpy array from pandas dataframe inside a for loop

Let's say that I have the following dataframe:
data = {"Names": ["Ray", "John", "Mole", "Smith", "Jay", "Marc", "Tom", "Rick"],
        "Sports": ["Soccer", "Judo", "Tenis", "Judo", "Tenis", "Soccer", "Judo", "Tenis"]}
I want a for loop such that for each unique Sport I can retrieve a numpy array containing the Names of the people playing that sport. In pseudocode that can be explained as:
for unique sport in sports:
    nArray = numpy array of names of people practicing sport
    ---------
    Do something with nArray
    -------
Use GroupBy.apply with np.array:
import numpy as np
import pandas as pd

df = pd.DataFrame(data)
s = df.groupby('Sports')['Names'].apply(np.array)
print(s)
Sports
Judo [John, Smith, Tom]
Soccer [Ray, Marc]
Tenis [Mole, Jay, Rick]
Name: Names, dtype: object
for sport, name in s.items():
    print(name)
['John' 'Smith' 'Tom']
['Ray' 'Marc']
['Mole' 'Jay' 'Rick']
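Since s is just a Series indexed by sport, you can also pull out a single group directly or convert the whole thing to a dict (a small usage sketch):
s['Judo']                     # numpy array of the Judo players' names
names_by_sport = s.to_dict()  # {'Judo': array(...), 'Soccer': array(...), 'Tenis': array(...)}
names_by_sport['Soccer']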
One way to go:
df = pd.DataFrame(data)
for sport in df.Sports.unique():
    list_of_names = list(df[df.Sports == sport].Names)
    nArray = np.array(list_of_names)  # do something with nArray here
You can use the pandas library to get an array of the names of the people for each sport.
import numpy as np
import pandas as pd

data = {"Names": ["Ray", "John", "Mole", "Smith", "Jay", "Marc", "Tom", "Rick"],
        "Sports": ["Soccer", "Judo", "Tenis", "Judo", "Tenis", "Soccer", "Judo", "Tenis"]}
df = pd.DataFrame(data)
unique_sports = df['Sports'].unique()
for sport in unique_sports:
    uniqueNames = np.array(df[df['Sports'] == sport]['Names'])
    print(uniqueNames)
Result:
['Ray' 'Marc']
['John' 'Smith' 'Tom']
['Mole' 'Jay' 'Rick']

Is there a way to transform all unique values into a new dataframe using a loop and at the same time create additional columns? [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 8 months ago.
My problem is that I have a dataframe like this:
## for demonstration
import pandas as pd

example = {
    "ID": [1, 1, 2, 2, 2, 3],
    "place": ["Maryland", "Maryland", "Washington", "Washington", "Washington", "Los Angeles"],
    "type": ["condition", "symptom", "condition", "condition", "sky", "condition"],
    "name": ["depression", "cough", "fatigue", "depression", "blue", "fever"]
}
# load into df:
example = pd.DataFrame(example)
print(example)
And I want to reorganize it by unique ID so that it looks like this:
# for demonstration
import pandas as pd

result = {
    "ID": [1, 2, 3],
    "place": ["Maryland", "Washington", "Los Angeles"],
    "condition": ["depression", "fatigue", "fever"],
    "condition1": ["no", "depression", "no"],
    "symptom": ["cough", "no", "no"],
    "sky": ["no", "blue", "no"]
}
# load into df:
result = pd.DataFrame(result)
print(result)
I tried to sort it like this:
example.nunique()
df_names = dict()
for k, v in example.groupby('ID'):
    df_names[k] = v
However, this gives me back a dictionary, and it is not organized the way it should be.
Is there a way to do it with a loop, so that for each unique ID a new column is created if there is a condition, sky, or another type? If there are a couple of conditions, the next one should become condition1. Could you please help me if you know a way to do this?
This should give you the answer you need. It is a combination of cumcount() and pivot().
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3],
    "place": ["Maryland", "Maryland", "Washington", "Washington", "Washington", "Los Angeles"],
    "type": ["condition", "symptom", "condition", "condition", "sky", "condition"],
    "name": ["depression", "cough", "fatigue", "depression", "blue", "fever"]
})
# number repeated types within each place so the pivoted columns stay unique
df['type'] = df['type'].astype(str) + '_' + df.groupby(['place', 'type']).cumcount().astype(str)
df = df.pivot(index=['ID', 'place'], columns='type', values='name').reset_index()
df = df.fillna('no')
df.columns = df.columns.str.replace('_0', '')
df = df[['ID', 'place', 'condition', 'condition_1', 'symptom', 'sky']]
df
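If you want the header to read condition1 exactly as in the desired output (rather than condition_1), one extra rename line would do it (a minor optional tweak):
df.columns = df.columns.str.replace('_', '', regex=False)  # condition_1 -> condition1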

How to access and then compare the values of individual cells in a pandas dataframe?

I have two dataframes, df and df1. Both have 2 columns: one with an object and another with a numerical value. If the objects in certain cells are the same, I want to add the two numbers together and print them. I tried this but I am getting a syntax error on the first line.
if(df.at[0,'obj'] == df1.at[0,'obj'])
print((df.at[0,'num'] + df1.at[0,'num']))
The data frames are:
import pandas as pd

dict0 = {"obj": ["AB", "BC", "CD", "AF", "GD"],
         "num": [20, 15, 10, 12, 8]}
dict1 = {"obj": ["AB", "BD", "CZ", "AF", "GD"],
         "num": [12, 33, 15, 7, 11]}
df = pd.DataFrame(dict0)
df1 = pd.DataFrame(dict1)
I would really appreciate it if you could help me with this.
With the samples you've shown, could you please try the following.
df = df.set_index('obj')
df1 = df1.set_index('obj')
(df + df1).dropna()
Or, in a single shot without overwriting the df's:
(df.set_index('obj') + df1.set_index('obj')).dropna()
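For the sample frames above (using the original df and df1 from the question, before any set_index), only the obj values present in both frames survive the dropna, so the sums come out as AB -> 32, AF -> 19, GD -> 19 (as floats, since the unmatched rows introduce NaN before being dropped). A quick way to see it:
result = (df.set_index('obj') + df1.set_index('obj')).dropna()
print(result)  # num column: AB -> 32.0, AF -> 19.0, GD -> 19.0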

Sort list using other list with different length

I am attempting to re-sort a list based on the order in which its items appear in a dataframe, despite the dataframe column being of a greater length.
   enrolNo    Surname
0        1      Jones
1        2      Smith
2        3  Henderson
3        4       Kilm
4        5      Henry
5        6     Joseph
late = ['Kilm', 'Henry', 'Smith']
Desired output:
sorted_late = ['Smith', 'Kilm', 'Henry']
My initial attempt was to add a new column to the existing dataframe and then extract it as a list, but this seems like a very long way around. Furthermore, I discovered that my attempt was not going to work due to the different lengths, as stated by the error message after trying this to start with:
df_register['late_arrivals'] = np.where((df_register['Surname'] == late),
                                        late, '')
Should I be using a 'for' loop instead?
Pluck out the matching values from the dataframe itself; no need to sort the list:
sorted_late = df[df.Surname.isin(late)].Surname.to_list()
If it were a plain list you could be clever with that too:
sorted_late = [master_late for master_late in master_list if master_late in late]
Why not use the .isin() function?
df['Surname'].isin(late)
This gives a boolean mask which you can use to filter the Surname column in dataframe order and get the desired output.
You can specify a custom key for the sorted function:
import pandas
df = pandas.DataFrame([
{"enrolNo": 1, "Surname": "Jones"},
{"enrolNo": 2, "Surname": "Smith"},
{"enrolNo": 3, "Surname": "Henderson"},
{"enrolNo": 4, "Surname": "Kilm"},
{"enrolNo": 5, "Surname": "Henry"},
{"enrolNo": 6, "Surname": "Joseph"},
])
# set Surname as index so we can access enrolNo by it
df = df.set_index('Surname')
# now you can access enrolNo by Surname
assert df.loc['Kilm']['enrolNo'] == 4
# define the list to be sorted
late = ['Kilm', 'Henry', 'Smith']
# Sort late by enrolNo as listed in the dataframe
late_sorted = sorted(late, key=lambda n: df.loc[n]['enrolNo'])
# ['Smith', 'Kilm', 'Henry']
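One thing to watch for (an assumption about your data, not something stated in the question): if late ever contains a surname that is not in the dataframe, df.loc[n] raises a KeyError. You could filter those names out before sorting:
late_sorted = sorted(
    (n for n in late if n in df.index),
    key=lambda n: df.loc[n, 'enrolNo'])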

extracting values by keywords in a pandas column

I have a column that is a list of dictionaries. I extracted only the values for the name key and saved them to a list. Since I need to run the column through a TfidfVectorizer, I need the column to be a string of words. My code is as follows.
import json

import pandas as pd

def transform(s, to_extract):
    return [obj[to_extract] for obj in json.loads(s)]

cols = ['genres', 'keywords']
for col in cols:
    lst = df[col]
    df[col] = list(map(lambda x: transform(x, to_extract='name'), lst))
    df[col] = [', '.join(x) for x in df[col]]
For testing, here are 2 rows:
data = {'genres': [[{"id": 851, "name": "dual identity"}, {"id": 2038, "name": "love of one's life"}],
                   [{"id": 5983, "name": "pizza boy"}, {"id": 8828, "name": "marvel comic"}]],
        'keywords': [[{"id": 9663, "name": "sequel"}, {"id": 9715, "name": "superhero"}],
                     [{"id": 14991, "name": "tentacle"}, {"id": 34079, "name": "death", "id": 163074, "name": "super villain"}]]}
df = pd.DataFrame(data)
I'm able to extract the necessary data and save it accordingly. However, I find the code too verbose, and I would like to know if there's a more pythonic way to achieve the same outcome.
The desired output of one row should be a string, delimited only by a comma, e.g. 'dual identity,love of one's life'.
Is this what you need ?
df.applymap(lambda x : pd.DataFrame(x).name.tolist())
Out[278]:
genres keywords
0 [dual identity, love of one's life] [sequel, superhero]
1 [pizza boy, marvel comic] [tentacle, super villain]
Update
df.applymap(lambda x : pd.DataFrame(x).name.str.cat(sep=','))
Out[280]:
genres keywords
0 dual identity,love of one's life sequel,superhero
1 pizza boy,marvel comic tentacle,super villain
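Note that DataFrame.applymap is deprecated on recent pandas (2.1+) in favour of DataFrame.map, so on a newer install the same one-liner would be spelled (same logic, just the newer method name):
df.map(lambda x: pd.DataFrame(x).name.str.cat(sep=','))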
