This question already has answers here:
Apply function to each row of pandas dataframe to create two new columns
(5 answers)
How to add multiple columns to pandas dataframe in one assignment?
(13 answers)
Closed 3 years ago.
I am trying to create multiple new dataframe columns using a function. When I run the simple code below, however, I get the error, "KeyError: "['AdjTime1' 'AdjTime2'] not in index."
How can I correct this to add the two new columns ('AdjTime1' & 'AdjTime2') to my dataframe?
Thanks!
import pandas as pd
df = pd.DataFrame({'Runner':['Wade','Brian','Jason'],'Time':[80,75,98]})
def adj_speed(row):
    adjusted_speed1 = row['Time']*1.5
    adjusted_speed2 = row['Time']*2.0
    return adjusted_speed1, adjusted_speed2

df[['AdjTime1','AdjTime2']] = df.apply(adj_speed,axis=1)
Just do something like this (assuming you have a list of values you want to multiply Time by):
l = [1.5, 2.0]
for e, i in enumerate(l):
    df['AdjTime' + str(e+1)] = df.Time * i
print(df)
Runner Time AdjTime1 AdjTime2
0 Wade 80 120.0 160.0
1 Brian 75 112.5 150.0
2 Jason 98 147.0 196.0
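As a side note, the apply-based attempt from the question can also be made to work: passing result_type='expand' tells pandas to spread the returned tuple across columns, which can then be assigned to the two new column names at once. A minimal sketch using the same toy data:

```python
import pandas as pd

df = pd.DataFrame({'Runner': ['Wade', 'Brian', 'Jason'], 'Time': [80, 75, 98]})

def adj_speed(row):
    return row['Time'] * 1.5, row['Time'] * 2.0

# result_type='expand' turns the returned tuple into a DataFrame with one
# column per element, so the two-column assignment succeeds.
df[['AdjTime1', 'AdjTime2']] = df.apply(adj_speed, axis=1, result_type='expand')
print(df)
```

This keeps the function-based approach from the question intact; the loop above is simpler when the new columns are just scaled copies of one existing column.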
This question already has answers here:
Split / Explode a column of dictionaries into separate columns with pandas
(13 answers)
Closed 9 months ago.
So here's my simple example (the json field in my actual dataset is very nested so I'm unpacking things one level at a time). I need to keep certain columns on the dataset post json_normalize().
https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
import json
import pandas as pd

d = {'report_id': [100, 101, 102],
     'start_date': ["2021-03-12", "2021-04-22", "2021-05-02"],
     'report_json': ['{"name":"John", "age":30, "disease":"A-Pox"}',
                     '{"name":"Mary", "age":22, "disease":"B-Pox"}',
                     '{"name":"Karen", "age":42, "disease":"C-Pox"}']}
df = pd.DataFrame(data=d)
display(df)
df = pd.json_normalize(df['report_json'].apply(json.loads), max_level=0, meta=['report_id', 'start_date'])
display(df)
Looking at the documentation on json_normalize(), I think the meta parameter is what I need to keep report_id and start_date, but it doesn't seem to be working: the fields I expect to keep are not appearing in the final dataset.
Does anyone have advice? Thank you.
Since you're dealing with fairly simple JSON and a structured index, you can just normalize the JSON column and then use .join to attach the result back along your axis. (Note that meta only pulls fields out of the JSON records themselves, and only when used together with record_path; it cannot carry over columns from the outer DataFrame, which is why it had no effect here.)
from ast import literal_eval

df.join(
    pd.json_normalize(df['report_json'].map(literal_eval))
).drop('report_json', axis=1)
report_id start_date name age disease
0 100 2021-03-12 John 30 A-Pox
1 101 2021-04-22 Mary 22 B-Pox
2 102 2021-05-02 Karen 42 C-Pox
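Since the strings here are valid JSON, json.loads works just as well as literal_eval; an equivalent sketch using pd.concat instead of .join, in case you prefer keeping everything in one expression:

```python
import json
import pandas as pd

d = {'report_id': [100, 101, 102],
     'start_date': ["2021-03-12", "2021-04-22", "2021-05-02"],
     'report_json': ['{"name":"John", "age":30, "disease":"A-Pox"}',
                     '{"name":"Mary", "age":22, "disease":"B-Pox"}',
                     '{"name":"Karen", "age":42, "disease":"C-Pox"}']}
df = pd.DataFrame(d)

# Normalize the JSON column, then concatenate the result side by side
# with the columns we want to keep (both share the same 0..n-1 index).
out = pd.concat(
    [df.drop(columns='report_json'),
     pd.json_normalize(df['report_json'].map(json.loads))],
    axis=1,
)
print(out)
```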
This question already has answers here:
How to filter a pandas dataframe based on the length of a entry
(2 answers)
Closed 1 year ago.
I have a pandas DataFrame like this:
id subjects
1 [math, history]
2 [English, Dutch, Physics]
3 [Music]
How to filter this dataframe based on the length of the column subjects?
So for example, if I only want to have rows where len(subjects) >= 2?
I tried using
df[len(df["subjects"]) >= 2]
But this gives
KeyError: True
Also, using loc does not help, that gives me the same error.
Thanks in advance!
Use the string accessor to work with lists:
df[df['subjects'].str.len() >= 2]
Output:
id subjects
0 1 [math, history]
1 2 [English, Dutch, Physics]
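If you find the .str accessor on a non-string column surprising, an equivalent option is to map Python's len over the column explicitly; a sketch with the same toy data:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'subjects': [['math', 'history'],
                                ['English', 'Dutch', 'Physics'],
                                ['Music']]})

# apply(len) calls len() on each list, building the same boolean mask
# that df['subjects'].str.len() >= 2 would produce.
mask = df['subjects'].apply(len) >= 2
print(df[mask])
```

Both approaches avoid the original KeyError, which came from calling len() on the whole Series (giving the scalar row count) instead of on each element.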
This question already has answers here:
Preparing an aggregate dataframe for publication
(2 answers)
Closed 2 years ago.
I would like to group this dataframe by the unique values of the priority and Alias columns to create a LaTeX report:
Alias Number Duration(h) priority
A 23834 8111.130497 120
B 16453 6773.243598 120
C 15988 8347.042753 120
A 19 113.475702 139
B 16 113.476042 139
So I tried:
df = df.groupby(['priority', 'Alias'])
df
The terminal return:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002377285CA00>
The expected result:
priority Alias Number Duration(h)
120 A 23834 8111.130497
B 16453 6773.243598
C 15988 8347.042753
139 A 19 113.475702
B 16 113.476042
I don't understand why the terminal returns this... Thanks for your time!
Your data are already grouped by priority and Alias, because every combination of values for these two columns is unique in your dataset. It's just a matter of visualizing it better, and set_index() does exactly that.
You can also move the priority column in front of Alias.
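Since each (priority, Alias) pair occurs only once, set_index alone produces the layout in the expected output; a sketch using the data from the question:

```python
import pandas as pd

df = pd.DataFrame({'Alias': ['A', 'B', 'C', 'A', 'B'],
                   'Number': [23834, 16453, 15988, 19, 16],
                   'Duration(h)': [8111.130497, 6773.243598, 8347.042753,
                                   113.475702, 113.476042],
                   'priority': [120, 120, 120, 139, 139]})

# Indexing on priority first, then Alias, gives the grouped-looking
# display with priority shown once per block; to_latex() on this
# MultiIndex frame then renders it for the report.
out = df.set_index(['priority', 'Alias'])
print(out)
```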
This question already has answers here:
How to access pandas groupby dataframe by key
(6 answers)
Closed 8 years ago.
I want to group a dataframe by a column, called 'A', and inspect a particular group.
grouped = df.groupby('A', sort=False)
However, I don't know how to access a group, for example, I expect that
grouped.first()
would give me the first group
Or
grouped['foo']
would give me the group where A=='foo'.
However, Pandas doesn't work like that.
I couldn't find a similar example online.
Try grouped.get_group('foo'); that is what you need.
from io import StringIO  # on Python 2.x: from StringIO import StringIO
import pandas
data = pandas.read_csv(StringIO("""\
area,core,stratum,conc,qual
A,1,a,8.40,=
A,1,b,3.65,=
A,2,a,10.00,=
A,2,b,4.00,ND
A,3,a,6.64,=
A,3,b,4.96,=
"""), index_col=[0,1,2])
groups = data.groupby(level=['area', 'stratum'])
groups.get_group(('A', 'a')) # make sure it's a tuple
conc qual
area core stratum
A 1 a 8.40 =
2 a 10.00 =
3 a 6.64 =
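Besides get_group, you can also iterate over the GroupBy object itself, which yields (key, sub-DataFrame) pairs; a small sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo'], 'value': [1, 2, 3]})
grouped = df.groupby('A', sort=False)

# Iterating yields each group key together with the matching rows.
for name, group in grouped:
    print(name)
    print(group)

# get_group retrieves a single group's rows by key.
foo = grouped.get_group('foo')
print(foo)
```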
This question already has answers here:
How can I display full (non-truncated) dataframe information in HTML when converting from Pandas dataframe to HTML?
(10 answers)
How do I expand the output display to see more columns of a Pandas DataFrame?
(22 answers)
Closed 7 months ago.
When I create the following Pandas Series:
pandas.Series(['a', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'])
I get this as a result:
0 a
1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
2 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
3 aaaaaaaaaaaaaaaa
4 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...
How can I instead get a Series without the ellipsis that looks like this:
0 a
1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3 aaaaaaaaaaaaaaaa
4 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
pandas is truncating the output; you can change this:
In [4]:
data = pd.Series(['a', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'])
pd.set_option('display.max_colwidth',1000)
data
Out[4]:
0 a
1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3 aaaaaaaaaaaaaaaa
4 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
dtype: object
also see related: Output data from all columns in a dataframe in pandas
By the way if you are using IPython then if you do a docstring lookup (by pressing tab) then you will see the current values and the default values (the default is 50 characters).
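If you only want the untruncated output for a single display, pd.option_context applies the setting temporarily and restores the previous value afterwards; a sketch with shortened strings:

```python
import pandas as pd

data = pd.Series(['a', 'a' * 70, 'a' * 65])

# The option is only in effect inside the with-block; outside it the
# previous display.max_colwidth is restored automatically.
with pd.option_context('display.max_colwidth', 1000):
    print(data)
```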
For pandas versions older than 0.10, use
pd.set_printoptions(max_colwidth=1000)
See related: Python pandas, how to widen output display to see more columns?