This question already has answers here:
How do I create variable variables?
(17 answers)
Closed 4 months ago.
I have a df in Python with different cities.
I am trying to create a df for each city.
So I wrote this code in Python and it works; it does what I need. But I was wondering if there is any other way to create the name of each df rather than using
globals()["df_" + str(ciudad)] = new_grouped_by
If I try this:
"df\_"+str(ciudad) = new_grouped_by
Give me this error: SyntaxError: can't assign to operator
Any tips/suggestions would be more than welcome!
def get_city():
    for ciudad in df["Ciudad"].unique():
        # print(ciudad)
        grouped_by = df.groupby('Ciudad')
        new_grouped_by = [grouped_by.get_group(ciudad) for i in grouped_by.groups]
        globals()["df_" + str(ciudad)] = new_grouped_by
get_city()
A simple way would be to store the dataframes in a dictionary with the city names as keys:
import pandas as pd
data = zip(['Amsterdam', 'Amsterdam', 'Barcelona'],[1,22,333])
df = pd.DataFrame(data, columns=['Ciudad', 'data'])
new_dfs = dict(list(df.groupby('Ciudad')))
Calling new_dfs['Amsterdam'] will then give you the dataframe:
      Ciudad  data
0  Amsterdam     1
1  Amsterdam    22
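Since the result is a plain dict, iterating over all cities is also straightforward; a small self-contained sketch re-using the toy data above:

```python
import pandas as pd

data = zip(['Amsterdam', 'Amsterdam', 'Barcelona'], [1, 22, 333])
df = pd.DataFrame(data, columns=['Ciudad', 'data'])

# groupby yields (key, sub-frame) pairs, so dict(...) maps city -> dataframe
new_dfs = dict(list(df.groupby('Ciudad')))

# each value is an ordinary DataFrame keyed by the city name
for ciudad, sub_df in new_dfs.items():
    print(ciudad, len(sub_df))
```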
This question already has answers here:
Split / Explode a column of dictionaries into separate columns with pandas
(13 answers)
Closed 9 months ago.
So here's my simple example (the json field in my actual dataset is very nested so I'm unpacking things one level at a time). I need to keep certain columns on the dataset post json_normalize().
https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
(Screenshots omitted: the starting data, an Excel mockup of the expected result, and the actual output.)
import json
import pandas as pd

d = {'report_id': [100, 101, 102],
     'start_date': ["2021-03-12", "2021-04-22", "2021-05-02"],
     'report_json': ['{"name":"John", "age":30, "disease":"A-Pox"}',
                     '{"name":"Mary", "age":22, "disease":"B-Pox"}',
                     '{"name":"Karen", "age":42, "disease":"C-Pox"}']}
df = pd.DataFrame(data=d)
display(df)

df = pd.json_normalize(df['report_json'].apply(json.loads), max_level=0, meta=['report_id', 'start_date'])
display(df)
Looking at the documentation for json_normalize(), I think the meta parameter is what I need to keep report_id and start_date, but it doesn't seem to be working: the fields I expect to keep do not appear in the final dataset.
Does anyone have advice? Thank you.
As you're dealing with a pretty simple JSON along a structured index, you can just normalize your frame and then use .join to join along your axis.
from ast import literal_eval
df.join(
pd.json_normalize(df['report_json'].map(literal_eval))
).drop('report_json',axis=1)
report_id start_date name age disease
0 100 2021-03-12 John 30 A-Pox
1 101 2021-04-22 Mary 22 B-Pox
2 102 2021-05-02 Karen 42 C-Pox
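For reference, the same join works with json.loads from the question instead of literal_eval; a self-contained sketch with a shortened version of the question's data:

```python
import json
import pandas as pd

d = {'report_id': [100, 101],
     'start_date': ["2021-03-12", "2021-04-22"],
     'report_json': ['{"name":"John", "age":30, "disease":"A-Pox"}',
                     '{"name":"Mary", "age":22, "disease":"B-Pox"}']}
df = pd.DataFrame(data=d)

# parse each JSON string, normalize into columns, then join back on the
# shared RangeIndex and drop the raw JSON column
out = df.join(pd.json_normalize(df['report_json'].map(json.loads))).drop('report_json', axis=1)
print(out.columns.tolist())  # ['report_id', 'start_date', 'name', 'age', 'disease']
```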
This question already has answers here:
Splitting dataframe into multiple dataframes
(13 answers)
Closed 1 year ago.
I have a dataset (see here) with data for multiple countries over a period of time whose starting year is unknown (the starting point differs per country), but we know the last year is 2016. I need to split this dataset into multiple datasets based on the "year" column, so that I get one dataset per year containing data for all countries.
I have tried this:
efyear = dict(tuple(eef.groupby('year')))
y = 2016
for y in eef['year']:
    try:
        exec(f'ef{y} = efyear{y}')
        y -= 1
    except:
        print('Not Available')
but it doesn't work and just prints 'Not Available' many times. I need to produce a different name for each dataset (or for the variable that holds it), which is why I used string formatting.
Thank you in advance.
You can see the dataset here.
Try:
out = {}
for year, g in df.groupby("year"):
    out["ef{}".format(year)] = g
print(out)
This will create a dictionary with keys ef2013, ef2014, etc., whose values are the dataframes for each year.
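A minimal runnable version of that loop, with made-up data for illustration:

```python
import pandas as pd

# toy stand-in for the question's dataset
df = pd.DataFrame({"year": [2015, 2015, 2016], "val": [1, 2, 3]})

out = {}
for year, g in df.groupby("year"):
    out["ef{}".format(year)] = g

# look a year up by its key instead of relying on a generated variable name
print(out["ef2016"])
```

Accessing `out["ef2016"]` avoids `exec` entirely, which is generally safer and easier to debug.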
I found my answer :))
efyear = dict(tuple(eef.groupby('year')))
y = 2016
for y in eef['year']:
    exec(f'ef{y} = efyear[{y}]')
    y -= 1
:))
This question already has answers here:
Apply function to each row of pandas dataframe to create two new columns
(5 answers)
How to add multiple columns to pandas dataframe in one assignment?
(13 answers)
Closed 3 years ago.
I am trying to create multiple new dataframe columns using a function. When I run the simple code below, however, I get the error KeyError: "['AdjTime1' 'AdjTime2'] not in index".
How can I correct this to add the two new columns ('AdjTime1' and 'AdjTime2') to my dataframe?
Thanks!
import pandas as pd

df = pd.DataFrame({'Runner': ['Wade', 'Brian', 'Jason'], 'Time': [80, 75, 98]})

def adj_speed(row):
    adjusted_speed1 = row['Time'] * 1.5
    adjusted_speed2 = row['Time'] * 2.0
    return adjusted_speed1, adjusted_speed2

df[['AdjTime1', 'AdjTime2']] = df.apply(adj_speed, axis=1)
Just do something like this (assuming you have a list of values you want to multiply Time by):

l = [1.5, 2.0]
for e, i in enumerate(l):
    df['AdjTime' + str(e + 1)] = df.Time * i
print(df)
Runner Time AdjTime1 AdjTime2
0 Wade 80 120.0 160.0
1 Brian 75 112.5 150.0
2 Jason 98 147.0 196.0
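Alternatively, the original apply approach can be made to work by asking pandas to expand the returned tuple into columns; a sketch using result_type='expand':

```python
import pandas as pd

df = pd.DataFrame({'Runner': ['Wade', 'Brian', 'Jason'], 'Time': [80, 75, 98]})

def adj_speed(row):
    return row['Time'] * 1.5, row['Time'] * 2.0

# result_type='expand' turns each returned tuple into separate columns,
# so the two-column assignment lines up
df[['AdjTime1', 'AdjTime2']] = df.apply(adj_speed, axis=1, result_type='expand')
print(df)
```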
This question already has answers here:
How to reverse the order of first and last name in a Pandas Series
(4 answers)
Closed 4 years ago.
I need to swap the names in one column of a dataframe, which are in the format FirstName LastName, using Python.
Below is the sample format:
~Adam Smith
The above need to change into
~Smith Adam
Is there any single-line function available in Python? Could anyone help with this?
Using apply:
import pandas as pd
df = pd.DataFrame({"names": ["Adam Smith", "Greg Rogers"]})
df["names"] = df["names"].apply(lambda x: " ".join(reversed(x.split())))
print(df)
Output:
names
0 Smith Adam
1 Rogers Greg
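A vectorized alternative with the .str accessor, which also works element-wise on the lists produced by split; a sketch that should give the same result as apply here:

```python
import pandas as pd

df = pd.DataFrame({"names": ["Adam Smith", "Greg Rogers"]})

# split into word lists, reverse each list with .str[::-1], then join back
df["names"] = df["names"].str.split().str[::-1].str.join(" ")
print(df["names"].tolist())
```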
This question already has answers here:
How to access pandas groupby dataframe by key
(6 answers)
Closed 8 years ago.
I want to group a dataframe by a column, called 'A', and inspect a particular group.
grouped = df.groupby('A', sort=False)
However, I don't know how to access a group, for example, I expect that
grouped.first()
would give me the first group
Or
grouped['foo']
would give me the group where A=='foo'.
However, Pandas doesn't work like that.
I couldn't find a similar example online.
Try grouped.get_group('foo'); that is what you need.
from io import StringIO  # Python 2: from StringIO import StringIO
import pandas
data = pandas.read_csv(StringIO("""\
area,core,stratum,conc,qual
A,1,a,8.40,=
A,1,b,3.65,=
A,2,a,10.00,=
A,2,b,4.00,ND
A,3,a,6.64,=
A,3,b,4.96,=
"""), index_col=[0,1,2])
groups = data.groupby(level=['area', 'stratum'])
groups.get_group(('A', 'a')) # make sure it's a tuple
conc qual
area core stratum
A 1 a 8.40 =
2 a 10.00 =
3 a 6.64 =
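For the single-column grouping in the question, no tuple key is needed; a minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo'], 'B': [1, 2, 3]})
grouped = df.groupby('A', sort=False)

# get_group returns the sub-frame for one key
foo = grouped.get_group('foo')
print(foo)
```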