I have a df that compares new and old data. Is there a way to plot every two columns with the date on the x axis? Or to plot all columns that share the same root name against the date, so there is one line graph per fruit.
df
        date  apple_old  apple_new  banana_old  banana_new
0 2015-01-01          5          6           4           2
...
I tried:
for col in df.columns:
    if col.endswith("_old") and col.endswith("_new"):
        x = x.plot(kind="line", x=date, y=(f"{col}_old", f"{col}_new"))
Use:
df1 = df.set_index('date')
df1.columns = df1.columns.str.split('_', expand=True)
for lev in df1.columns.levels[0]:
    print(df1[lev].plot())
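In a plain script the figures also need to be created and shown explicitly; here is a minimal, self-contained sketch of the same idea, using made-up numbers in the shape of the sample df and assuming matplotlib is available:

import matplotlib.pyplot as plt
import pandas as pd

# made-up data in the same shape as the sample df
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "apple_old": [5, 3, 4], "apple_new": [6, 4, 5],
    "banana_old": [4, 2, 3], "banana_new": [2, 1, 2],
})

df1 = df.set_index("date")
df1.columns = df1.columns.str.split("_", expand=True)

# one figure per fruit, with an 'old' and a 'new' line in each
for fruit in df1.columns.levels[0]:
    fig, ax = plt.subplots()
    df1[fruit].plot(ax=ax, title=fruit)
plt.show()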
Try this set comprehension:
l = list({i.split('_')[0] for i in df.columns[1:]})

for col in l:
    df.plot(kind="line", x="date", y=[f"{col}_old", f"{col}_new"])
I would like to loop over some variable names and, for each, the equivalent column with an added "_plus" suffix.
#original dataset
raw_data = {'time': [2,1,4,2],
            'zone': [5,1,3,0],
            'time_plus': [5,6,2,3],
            'zone_plus': [0,9,6,5]}
df = pd.DataFrame(raw_data, columns=['time','zone','time_plus','zone_plus'])
df
#desired dataset
df['time']=df['time']*df['time_plus']
df['zone']=df['zone']*df['zone_plus']
df
I would like to do the multiplication in a more elegant way, through a loop, since I have many variables with this pattern: original name * transformed variable with the "_plus" suffix.
Something similar to this, or better:
my_list = ['time','zone']
for i in my_list:
    df[i] = df[i]*df[i+"_plus"]
Try:
for c in df.filter(regex=r".*(?<!_plus)$", axis=1):
    df[c] *= df[c + "_plus"]

print(df)
Prints:
   time  zone  time_plus  zone_plus
0    10     0          5          0
1     6     9          6          9
2     8    18          2          6
3     6     0          3          5
Or:
for c in df.columns:
    if not c.endswith("_plus"):
        df[c] *= df[c + "_plus"]
raw_data = {'time': [2,1,4,2],
            'zone': [5,1,3,0],
            'time_plus': [5,6,2,3],
            'zone_plus': [0,9,6,5]}
df = pd.DataFrame(raw_data, columns=['time','zone','time_plus','zone_plus'])

# Take every column that doesn't have a "_plus" suffix
cols = [i for i in df.columns if "_plus" not in i]

# Calculate new columns
for col in cols:
    df[col + "_2"] = df[col]*df[col + "_plus"]
I decided to create the new columns with a "_2" suffix; this way we don't mess up the original data.
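For the sample raw_data, that adds these two columns (values worked out by hand from the data above):

print(df)
   time  zone  time_plus  zone_plus  time_2  zone_2
0     2     5          5          0      10       0
1     1     1          6          9       6       9
2     4     3          2          6       8      18
3     2     0          3          5       6       0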
for c in df.columns:
    if f"{c}_plus" in df.columns:
        df[c] *= df[f"{c}_plus"]
import pandas as pd
import numpy as np

column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(3):
    new_row.append(i)
df = df.append(new_row, ignore_index=True)

print(df)
Output:
     1    2    3    0
0  NaN  NaN  NaN  0.0
1  NaN  NaN  NaN  1.0
2  NaN  NaN  NaN  2.0
Is there a way to apply the loop to column 1, column 2, and column 3? I think this should be possible with simple code, but I have been thinking about it for a while and cannot figure it out. I also tried .loc, but I could not get the loop to fill the columns row by row.
As a supplementary explanation: column_names = [str(x) for x in range(1,4)] creates columns 1 to 3, and I want the loop applied to each of those columns, so that the for loop inserts 0 through 2 into column 1 (and likewise into columns 2 and 3), i.e. 0, 1, 2 go down each column. The result I want is shown below.
You can add the following code after all of your code above:
for col in df:
    df[col] = new_row
Result:
If you run all of your code from above:
column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(3):
    new_row.append(i)
df = df.append(new_row, ignore_index=True)
Then run this:
for col in df:
    df[col] = new_row
You should get:
print(df)
   1  2  3  0
0  0  0  0  0
1  1  1  1  1
2  2  2  2  2
I know it's weird but you can use .loc to do that:
df.loc[len(df.index)+1] = new_row
>>> df
   1  2  3
1  0  1  2
You can use the column names, for example:
for col in column_names:
    df[col] = new_row
Assign the new row to the next index position in the dataframe using .loc.
import pandas as pd
import numpy as np

column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(3):
    new_row.append(i)

df.loc[len(df)] = new_row
If you have multiple rows to add in a loop, using len(df) in the .loc statement ensures they are always appended to the end.
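For example, appending several rows in a loop might look like this (a minimal sketch reusing the same column setup):

import pandas as pd

column_names = [str(x) for x in range(1, 4)]
df = pd.DataFrame(columns=column_names)

for i in range(3):
    # each new row repeats i across the three columns
    df.loc[len(df)] = [i] * len(column_names)

print(df)
   1  2  3
0  0  0  0
1  1  1  1
2  2  2  2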
Not 100% sure what you are trying to do - can you rephrase?
import pandas as pd

column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(len(df.columns)):
    new_row.append(i)
df = df.append(new_row, ignore_index=True)

for i in df:
    df[i] = new_row

print(df)
I have such a list:
l = ['A','B']
And a dataframe df:
Name  x  y
A     1  2
B     2  1
C     2  2
I now want to get a new dataframe that keeps only the columns Name and x, and only the rows whose Name is in l.
new_df should look like this:
Name  x
A     1
B     2
I was playing around with isin but could not solve the problem.
Use DataFrame.loc with Series.isin:
new_df = df.loc[df.Name.isin(l), ["Name", "x"]]
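A quick check with the data from the question (assuming Name is a regular column, as the sample suggests):

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'x': [1, 2, 2], 'y': [2, 1, 2]})
l = ['A', 'B']

new_df = df.loc[df.Name.isin(l), ["Name", "x"]]
print(new_df)
  Name  x
0    A  1
1    B  2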
This should do it:
# assuming Name is the index
new_df = df[df.index.isin(l)]
# if you only want column x
new_df = df.loc[df.index.isin(l), "x"]
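If Name is actually a regular column (as in the sample frame), it needs to be moved to the index first for this variant; a small sketch with the question's data:

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'x': [1, 2, 2], 'y': [2, 1, 2]}).set_index('Name')
l = ['A', 'B']

new_df = df.loc[df.index.isin(l), ['x']]
print(new_df)
      x
Name
A     1
B     2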
Simple as that:
l = ['A','B']

def make_empty(row):
    # blank out any value that is not in l
    for idx, value in enumerate(row):
        row.iloc[idx] = value if value in l else ''
    return row

df_new = df[df['Name'].isin(l) | df['x'].isin(l)][['Name','x']]
df_new = df_new.apply(lambda row: make_empty(row), axis=1)
Output:
  Name x
0    A
1    B
How can I count, with a loop, how many '2-up' and '2-dn' values occur in a column for the same index date in a pandas dataframe?
df1 = pd.DataFrame()
index = ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08']
df1 = pd.DataFrame(index = index)
bars = ['1-inside','2-up','2-dn','2-up','2-up','1-inside']
df1['Strat'] = bars
df1
Result should be:
2020-01-01 2-up = 1, 2-dn = 1
2020-01-08 2-up = 2, 2-dn = 0
Afterwards I would like to plot the results with matplotlib.
Use SeriesGroupBy.value_counts to count, reshape with Series.unstack and then plot with DataFrame.plot.bar:
need = ['2-up','2-dn']
df1 = df1['Strat'].groupby(level=0).value_counts().unstack(fill_value=0)[need]
print(df1)
Strat       2-up  2-dn
2020-01-01     1     1
2020-01-08     2     0
Or you can filter before counting with Series.isin and boolean indexing:
need = ['2-up','2-dn']
df1 = (df1.loc[df1['Strat'].isin(need), 'Strat']
          .groupby(level=0)
          .value_counts()
          .unstack(fill_value=0))
df1.plot.bar()
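Outside a notebook the chart still has to be displayed or saved explicitly; a minimal follow-up, assuming matplotlib is installed:

import matplotlib.pyplot as plt

ax = df1.plot.bar()
ax.set_xlabel("date")
ax.set_ylabel("count")
plt.tight_layout()
plt.show()  # or plt.savefig("strat_counts.png")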
I have the dataframe below:
clm1, clm2, clm3
10, a, clm4=1|clm5=5
11, b, clm4=2
My desired result is:
clm1, clm2, clm4, clm5
10, a, 1, 5
11, b, 2, NaN
I have tried the method below:
rows = list(df.index)
dictlist = []
for index in rows:  # loop through each row to convert clm3 to a dict
    i = df.at[index, "clm3"]
    mydict = dict(map(lambda x: x.split('='), [x for x in i.split('|') if '=' in x]))
    dictlist.append(mydict)

l = json_normalize(dictlist)  # convert the dict column into a flat dataframe
resultdf = df.join(l).drop('clm3', axis=1)
This gives me the desired result, but I am looking for a more efficient way to convert clm3 that does not involve looping through each row.
Two steps: the idea is to do a double split, then group by the index and unstack the values as columns (the .sum() simply collapses the single string value in each group).
s = (
    df["clm3"]
    .str.split("|", expand=True)
    .stack()
    .str.split("=", expand=True)
    .reset_index(level=1, drop=True)
)

final = pd.concat([df, s.groupby([s.index, s[0]])[1].sum().unstack()], axis=1).drop(
    "clm3", axis=1
)

print(final)
   clm1 clm2 clm4 clm5
0    10    a    1    5
1    11    b    2  NaN
Use str.extractall to get the values and unstack to pivot them into one column per match, then str.get_dummies to recover a column name for each unique clm:
values = (
    df['clm3'].str.extractall(r'(=\d)')[0]
    .str.replace('=', '')
    .unstack()
    .rename_axis(None, axis=1)
)

columns = df['clm3'].str.replace(r'=\d', '', regex=True).str.get_dummies(sep='|').columns
values.columns = columns

dfnew = pd.concat([df[['clm1', 'clm2']], values], axis=1)
print(dfnew)
   clm1 clm2 clm4 clm5
0    10    a    1    5
1    11    b    2  NaN
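An alternative not shown in the answers above: capture both the key and the value with str.extractall and pivot, which also avoids the Python-level row loop. A sketch, assuming every pair looks like name=value and pairs are separated by |:

import pandas as pd

df = pd.DataFrame({'clm1': [10, 11],
                   'clm2': ['a', 'b'],
                   'clm3': ['clm4=1|clm5=5', 'clm4=2']})

# one row per key=value pair, keyed by the original row number
pairs = df['clm3'].str.extractall(r'(?P<key>[^=|]+)=(?P<val>[^|]+)')

# pivot the pairs back to one column per key
wide = (pairs.reset_index(level='match', drop=True)
             .pivot(columns='key', values='val'))
wide.columns.name = None

result = pd.concat([df.drop(columns='clm3'), wide], axis=1)
print(result)
   clm1 clm2 clm4 clm5
0    10    a    1    5
1    11    b    2  NaN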