I have a df that compares new and old data. Is there a way to plot every two columns with the date on the x axis? Or to plot all columns that share the same root name against the date, so there is one line graph per fruit.
df
        date  apple_old  apple_new  banana_old  banana_new
0 2015-01-01          5          6           4           2
...
I tried:
for col in df.columns:
    if col.endswith("_old") and col.endswith("_new"):
        x = x.plot(kind="line", x=date, y=(f"{col}_old", f"{col}_new"))
Use:
df1 = df.set_index('date')
df1.columns = df1.columns.str.split('_', expand=True)
for lev in df1.columns.levels[0]:
    print(df1[lev].plot())
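In a plain script the figures also need to be created and shown explicitly; here is a minimal, self-contained sketch of the same idea, using made-up numbers in the shape of the sample df and assuming matplotlib is available:

import matplotlib.pyplot as plt
import pandas as pd

# made-up data in the same shape as the sample df
df = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "apple_old": [5, 3, 4], "apple_new": [6, 4, 5],
    "banana_old": [4, 2, 3], "banana_new": [2, 1, 2],
})

df1 = df.set_index("date")
df1.columns = df1.columns.str.split("_", expand=True)

# one figure per fruit, with an 'old' and a 'new' line in each
for fruit in df1.columns.levels[0]:
    fig, ax = plt.subplots()
    df1[fruit].plot(ax=ax, title=fruit)
plt.show()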
Try this set comprehension:
l = list({i.split('_')[0] for i in df.columns[1:]})

for col in l:
    df.plot(kind="line", x="date", y=[f"{col}_old", f"{col}_new"])
I would like to loop over some variable names and, for each, the equivalent column with an added "_plus" suffix.
#original dataset
raw_data = {'time': [2,1,4,2],
            'zone': [5,1,3,0],
            'time_plus': [5,6,2,3],
            'zone_plus': [0,9,6,5]}
df = pd.DataFrame(raw_data, columns=['time','zone','time_plus','zone_plus'])
df
#desired dataset
df['time']=df['time']*df['time_plus']
df['zone']=df['zone']*df['zone_plus']
df
I would like to do the multiplication in a more elegant way, through a loop, since I have many variables with this pattern: original name * transformed variable with the "_plus" suffix.
Something similar to this, or better:
my_list = ['time','zone']
for i in my_list:
    df[i] = df[i]*df[i+"_plus"]
Try:
for c in df.filter(regex=r".*(?<!_plus)$", axis=1):
    df[c] *= df[c + "_plus"]

print(df)
Prints:
   time  zone  time_plus  zone_plus
0    10     0          5          0
1     6     9          6          9
2     8    18          2          6
3     6     0          3          5
Or:
for c in df.columns:
    if not c.endswith("_plus"):
        df[c] *= df[c + "_plus"]
raw_data = {'time': [2,1,4,2],
            'zone': [5,1,3,0],
            'time_plus': [5,6,2,3],
            'zone_plus': [0,9,6,5]}
df = pd.DataFrame(raw_data, columns=['time','zone','time_plus','zone_plus'])

# Take every column that doesn't have a "_plus" suffix
cols = [i for i in df.columns if "_plus" not in i]

# Calculate new columns
for col in cols:
    df[col + "_2"] = df[col]*df[col + "_plus"]
I decided to create the new columns with a "_2" suffix; this way we don't mess up the original data.
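For the sample raw_data, that adds these two columns (values worked out by hand from the data above):

print(df)
   time  zone  time_plus  zone_plus  time_2  zone_2
0     2     5          5          0      10       0
1     1     1          6          9       6       9
2     4     3          2          6       8      18
3     2     0          3          5       6       0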
for c in df.columns:
    if f"{c}_plus" in df.columns:
        df[c] *= df[f"{c}_plus"]
import pandas as pd
import numpy as np

column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(3):
    new_row.append(i)
df = df.append(new_row, ignore_index=True)

print(df)
Output:
     1    2    3    0
0  NaN  NaN  NaN  0.0
1  NaN  NaN  NaN  1.0
2  NaN  NaN  NaN  2.0
Is there a way to apply the loop to column 1, column 2, and column 3? I think this should be possible with simple code, but I have been thinking about it for a while and cannot figure it out. I also tried .loc, but I could not get the loop to fill the columns row by row.
As a supplementary explanation: column_names = [str(x) for x in range(1,4)] creates columns 1 to 3, and I want the loop applied to each of those columns, so that the for loop inserts 0 through 2 into column 1 (and likewise into columns 2 and 3), i.e. 0, 1, 2 go down each column. The result I want is shown below.
You can add the following code after all of your code above:
for col in df:
    df[col] = new_row
Result:
If you run all of your code from above:
column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(3):
    new_row.append(i)
df = df.append(new_row, ignore_index=True)
Then run this:
for col in df:
    df[col] = new_row
You should get:
print(df)
   1  2  3  0
0  0  0  0  0
1  1  1  1  1
2  2  2  2  2
I know it's weird but you can use .loc to do that:
df.loc[len(df.index)+1] = new_row
>>> df
   1  2  3
1  0  1  2
You can use the column names, for example:
for col in column_names:
    df[col] = new_row
Assign the new row to the next index position in the dataframe using .loc.
import pandas as pd
import numpy as np

column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(3):
    new_row.append(i)

df.loc[len(df)] = new_row
If you have multiple rows to add in a loop, using len(df) in the .loc statement ensures they are always appended to the end.
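For example, appending several rows in a loop might look like this (a minimal sketch reusing the same column setup):

import pandas as pd

column_names = [str(x) for x in range(1, 4)]
df = pd.DataFrame(columns=column_names)

for i in range(3):
    # each new row repeats i across the three columns
    df.loc[len(df)] = [i] * len(column_names)

print(df)
   1  2  3
0  0  0  0
1  1  1  1
2  2  2  2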
Not 100% sure what you are trying to do - can you rephrase?
import pandas as pd

column_names = [str(x) for x in range(1,4)]
df = pd.DataFrame(columns=column_names)

new_row = []
for i in range(len(df.columns)):
    new_row.append(i)
df = df.append(new_row, ignore_index=True)

for i in df:
    df[i] = new_row

print(df)
I have such a list:
l = ['A','B']
And a dataframe df:
Name  x  y
A     1  2
B     2  1
C     2  2
I now want to get a new dataframe that keeps only the columns Name and x, and only the rows whose Name is in l.
new_df should look like this:
Name  x
A     1
B     2
I was playing around with isin but could not solve the problem.
Use DataFrame.loc with Series.isin:
new_df = df.loc[df.Name.isin(l), ["Name", "x"]]
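A quick check with the data from the question (assuming Name is a regular column, as the sample suggests):

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'x': [1, 2, 2], 'y': [2, 1, 2]})
l = ['A', 'B']

new_df = df.loc[df.Name.isin(l), ["Name", "x"]]
print(new_df)
  Name  x
0    A  1
1    B  2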
This should do it:
# assuming Name is the index
new_df = df[df.index.isin(l)]
# if you only want column x
new_df = df.loc[df.index.isin(l), "x"]
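If Name is actually a regular column (as in the sample frame), it needs to be moved to the index first for this variant; a small sketch with the question's data:

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'x': [1, 2, 2], 'y': [2, 1, 2]}).set_index('Name')
l = ['A', 'B']

new_df = df.loc[df.index.isin(l), ['x']]
print(new_df)
      x
Name
A     1
B     2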
Simple as that:
l = ['A','B']

def make_empty(row):
    # blank out any value that is not in l
    for idx, value in enumerate(row):
        row.iloc[idx] = value if value in l else ''
    return row

df_new = df[df['Name'].isin(l) | df['x'].isin(l)][['Name','x']]
df_new = df_new.apply(lambda row: make_empty(row), axis=1)
Output:
  Name x
0    A
1    B
How can I count, with a loop, how many '2-up' and '2-dn' values occur in a column for the same index date in a pandas dataframe?
df1 = pd.DataFrame()
index = ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08']
df1 = pd.DataFrame(index = index)
bars = ['1-inside','2-up','2-dn','2-up','2-up','1-inside']
df1['Strat'] = bars
df1
Result should be:
2020-01-01 2-up = 1, 2-dn = 1
2020-01-08 2-up = 2, 2-dn = 0
Afterwards I would like to plot the results with matplotlib.
Use SeriesGroupBy.value_counts to count, reshape with Series.unstack and then plot with DataFrame.plot.bar:
need = ['2-up','2-dn']
df1 = df1['Strat'].groupby(level=0).value_counts().unstack(fill_value=0)[need]
print(df1)
Strat       2-up  2-dn
2020-01-01     1     1
2020-01-08     2     0
Or you can filter before counting with Series.isin and boolean indexing:
need = ['2-up','2-dn']
df1 = (df1.loc[df1['Strat'].isin(need), 'Strat']
          .groupby(level=0)
          .value_counts()
          .unstack(fill_value=0))
df1.plot.bar()
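Outside a notebook the chart still has to be displayed or saved explicitly; a minimal follow-up, assuming matplotlib is installed:

import matplotlib.pyplot as plt

ax = df1.plot.bar()
ax.set_xlabel("date")
ax.set_ylabel("count")
plt.tight_layout()
plt.show()  # or plt.savefig("strat_counts.png")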
I have the dataframe below:
clm1, clm2, clm3
10, a, clm4=1|clm5=5
11, b, clm4=2
My desired result is:
clm1, clm2, clm4, clm5
10, a, 1, 5
11, b, 2, NaN
I have tried the method below:
rows = list(df.index)
dictlist = []
for index in rows:  # loop through each row to convert clm3 to a dict
    i = df.at[index, "clm3"]
    mydict = dict(map(lambda x: x.split('='), [x for x in i.split('|') if '=' in x]))
    dictlist.append(mydict)

l = json_normalize(dictlist)  # convert the dict column into a flat dataframe
resultdf = df.join(l).drop('clm3', axis=1)
This gives me the desired result, but I am looking for a more efficient way to convert clm3 that does not involve looping through each row.
Two steps: the idea is to do a double split, then group by the index and unstack the values as columns (the .sum() simply collapses the single string value in each group).
s = (
    df["clm3"]
    .str.split("|", expand=True)
    .stack()
    .str.split("=", expand=True)
    .reset_index(level=1, drop=True)
)

final = pd.concat([df, s.groupby([s.index, s[0]])[1].sum().unstack()], axis=1).drop(
    "clm3", axis=1
)

print(final)
   clm1 clm2 clm4 clm5
0    10    a    1    5
1    11    b    2  NaN
Use str.extractall to get the values and unstack to pivot them into one column per match, then str.get_dummies to recover a column name for each unique clm:
values = (
    df['clm3'].str.extractall(r'(=\d)')[0]
    .str.replace('=', '')
    .unstack()
    .rename_axis(None, axis=1)
)

columns = df['clm3'].str.replace(r'=\d', '', regex=True).str.get_dummies(sep='|').columns
values.columns = columns

dfnew = pd.concat([df[['clm1', 'clm2']], values], axis=1)
print(dfnew)
   clm1 clm2 clm4 clm5
0    10    a    1    5
1    11    b    2  NaN
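An alternative not shown in the answers above: capture both the key and the value with str.extractall and pivot, which also avoids the Python-level row loop. A sketch, assuming every pair looks like name=value and pairs are separated by |:

import pandas as pd

df = pd.DataFrame({'clm1': [10, 11],
                   'clm2': ['a', 'b'],
                   'clm3': ['clm4=1|clm5=5', 'clm4=2']})

# one row per key=value pair, keyed by the original row number
pairs = df['clm3'].str.extractall(r'(?P<key>[^=|]+)=(?P<val>[^|]+)')

# pivot the pairs back to one column per key
wide = (pairs.reset_index(level='match', drop=True)
             .pivot(columns='key', values='val'))
wide.columns.name = None

result = pd.concat([df.drop(columns='clm3'), wide], axis=1)
print(result)
   clm1 clm2 clm4 clm5
0    10    a    1    5
1    11    b    2  NaN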