How to plot pandas pivot with different scales? - python

My pivoted dataframe looks like this:
df=
DATA
Type P_A P_B
Time
11:38:56 500706.0 981098.0
11:39:46 501704.0 984751.0
11:40:26 501704.0 984737.0
11:43:18 502758.0 987173.0
I want to plot this dataframe. df.plot() works, but the two columns are on very different scales, so they need to be plotted on separate y-axes. How can I do that?

DataFrame.plot has a secondary_y option:
import pandas as pd
df = pd.DataFrame({'Time':['11:38:56', '11:39:46', '11:40:26', '11:43:18'],
'P_A': [500706., 501704., 501704., 502758.],
'P_B': [981098., 984751., 984737., 987173.]})
df.plot(x='Time', y=['P_A','P_B'], secondary_y=['P_B'])

Try using the secondary_y argument of the pandas plot method, as suggested here:
df['P_A'].plot()
df['P_B'].plot(secondary_y=True)
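If, as the printed header (DATA over Type) suggests, the pivot has two-level columns such as ("DATA", "P_A"), then df['P_A'] raises a KeyError because it looks only at the outer level. A minimal sketch, assuming the pivot was built from a long table with hypothetical columns Time, Type and DATA: drop the outer column level first, then give each series its own y-axis with matplotlib's twinx().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data that pivots into the table from the question
long_df = pd.DataFrame({
    "Time": ["11:38:56", "11:38:56", "11:39:46", "11:39:46",
             "11:40:26", "11:40:26", "11:43:18", "11:43:18"],
    "Type": ["P_A", "P_B"] * 4,
    "DATA": [500706.0, 981098.0, 501704.0, 984751.0,
             501704.0, 984737.0, 502758.0, 987173.0],
})

# values=["DATA"] (a list) gives two-level columns: ("DATA", "P_A"), ("DATA", "P_B")
df = long_df.pivot(index="Time", columns="Type", values=["DATA"])
# Drop the outer "DATA" level so the columns are simply P_A and P_B
df.columns = df.columns.droplevel(0)

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis
df["P_A"].plot(ax=ax_left, color="tab:blue")
df["P_B"].plot(ax=ax_right, color="tab:orange")
ax_left.set_ylabel("P_A")
ax_right.set_ylabel("P_B")
```

Once the columns are flat, the one-liner df.plot(secondary_y=['P_B']) from the first answer should also work on this frame; twinx() is shown here because it makes the two axes explicit.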

Related

Producing a heatmap from a pandas dataframe with rows of the form (x,y,z), where z is intended to be the heat value

Let's say I have a dataframe with this structure
and I intend to transform it into something like this (done laboriously and quite manually):
Is there a simple call to (say) a seaborn or plotly function that would do this? Something like
heatmap(df, x='Dose', y='Distance', z='Passrate')
or perhaps a simple way of restructuring the dataframe to facilitate using sns.heatmap or plotly's imshow, or similar? It seems strange to me that I cannot find a straightforward way of putting data formatted in this way into a high-level plotting function.
Use df.pivot_table to get your data in the correct shape first.
Setup: create some random data
import pandas as pd
import numpy as np
import seaborn as sns
p_rate = np.arange(0,100)/np.arange(0,100).sum()
data = {'Dose': np.repeat(np.arange(0,3.5,0.5), 10),
'Distance': np.tile(np.arange(0,3.5,0.5), 10),
'Passrate': np.random.choice(np.arange(0,100), size=70,
p=p_rate)}
df = pd.DataFrame(data)
Code: pivot and apply sns.heatmap
df_pivot = df.pivot_table(index='Distance',
columns='Dose',
values='Passrate',
aggfunc='mean').sort_index(ascending=False)
sns.heatmap(df_pivot, annot=True, cmap='coolwarm')
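The reshaping step is what makes the (x, y, z) rows usable. A minimal sketch with a small hand-made table (the numbers are illustrative, not the question's data) showing that pivot_table averages duplicate (Dose, Distance) pairs, and that the resulting grid can also go straight into matplotlib's imshow if you would rather not use seaborn:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Small illustrative (x, y, z) table; the pair Dose=0.5, Distance=1.0 appears twice
df = pd.DataFrame({
    "Dose":     [0.0, 0.0, 0.5, 0.5, 0.5],
    "Distance": [0.0, 1.0, 0.0, 1.0, 1.0],
    "Passrate": [10, 20, 30, 40, 60],
})

# Rows = Distance, columns = Dose, cells = mean Passrate (40 and 60 -> 50.0)
grid = df.pivot_table(index="Distance", columns="Dose",
                      values="Passrate", aggfunc="mean")

fig, ax = plt.subplots()
# origin="lower" puts the smallest Distance at the bottom, like the
# sort_index(ascending=False) trick in the seaborn version
im = ax.imshow(grid.values, origin="lower", cmap="coolwarm")
ax.set_xticks(range(len(grid.columns)))
ax.set_xticklabels(grid.columns)
ax.set_yticks(range(len(grid.index)))
ax.set_yticklabels(grid.index)
ax.set_xlabel("Dose")
ax.set_ylabel("Distance")
fig.colorbar(im, ax=ax)
```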

Plot data frame fast and with correct date format

I have the data shown in the screenshot in a dataframe, and I would like to plot the dataframe quickly and with the correct date format.
The following code is much faster than using, e.g., plt.plot(df["Date"], df["D30"]):
df.plot(marker='.', linestyle='none')
So I would like to keep using DataFrame.plot() directly, because it is much faster than plotting each column against the "Date" column separately. However, as the graph shows, the dates are wrong: my actual start date is 2006-01-10, but the figure starts at 70-01 (1970-01-01).
I find the official matplotlib DateFormatter documentation confusing and not very helpful. I searched for a simple, clear solution, but most answers are about plt.plot(x, y), where x is the date and y is the value. With that approach it is easy to adjust the date format in the figure, but it makes my plot very slow, since I am plotting 11 columns in total.
Any idea how I can plot the dataframe quickly and with the correct date format?
import os
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
date_format = mdates.DateFormatter('%y%m')
df_file = r"C:\Codes\df_file.csv"
df = pd.read_csv(df_file)
print(len(df), df.info(), df["Date"][0], type(df["Date"][0]))
df.head(2)
fig = plt.figure(figsize=(12.0, 8.0))
df.plot(marker='.', linestyle='none')
plt.title("data_frame_show date", fontsize=16)
plt.gca().xaxis.set_major_formatter(date_format)
plt.legend(loc=(1.04, 0))
plt.show()
Partial input:
Date,D10,D30,D60,D91,D122,D152,D182,D273,D365,D547,D730
2006-01-10,,0.1373444,0.1544265,0.1541397,0.1429375,0.1421464,0.1426055,0.1460771,0.1486266,0.1551848,0.1593932
2006-01-11,,0.135426,0.1411246,0.141093,0.1384091,0.1383636,0.1395791,0.1438944,0.1469191,0.1553112,0.1598582
2006-01-12,,0.1311339,0.1292621,0.1304292,0.1363482,0.1362213,0.1367843,0.1404174,0.1439877,0.152306,0.1568677
2006-01-13,,0.1594458,0.1355387,0.1367246,0.1434708,0.143745,0.1441349,0.1453056,0.1481918,0.157193,0.1607564
2006-01-16,,0.1374846,0.1182223,0.1272385,0.1415359,0.1418881,0.1430098,0.1468544,0.1496407,0.1547714,0.158936
2006-01-17,,0.1453834,0.1418838,0.143198,0.1437924,0.143473,0.1440987,0.1473208,0.1501543,0.1590842,0.1629096
2006-01-18,,0.1385479,0.141472,0.1481763,0.1515037,0.1511353,0.1511544,0.1535245,0.1554254,0.1626349,0.1663554
2006-01-19,,0.1639788,0.1462084,0.1483903,0.1486906,0.1483109,0.1492335,0.1539002,0.1563708,0.1611751,0.1644693
2006-01-20,,0.189771,0.178394,0.1638331,0.1565402,0.1559029,0.1553547,0.1526479,0.1516396,0.1614136,0.1646431
2006-01-23,,0.1420271,0.1570005,0.1614942,0.1607205,0.1605297,0.1630065,0.1653838,0.1642349,0.166809,0.1701779
2006-01-24,,0.1814291,0.1633585,0.1563364,0.1548823,0.15382,0.1545099,0.1590869,0.1609158,0.1653819,0.1681759
2006-01-25,,0.1272998,0.1445222,0.1487031,0.1522032,0.152714,0.1524364,0.1532192,0.1550062,0.1635665,0.1658293
2006-01-26,,0.1392162,0.1413034,0.1443807,0.1476261,0.1482458,0.1473548,0.1471019,0.1493254,0.1578586,0.160699
2006-01-27,,0.1360269,0.1374056,0.1387952,0.1426731,0.1441445,0.144917,0.1462428,0.1478979,0.1519537,0.1550311
2006-01-30,,0.1439245,0.1430108,0.1434628,0.1448731,0.1450397,0.1454756,0.1467621,0.1487521,0.1538424,0.1561802
2006-01-31,,0.1483135,0.1468713,0.1473837,0.1519043,0.1519379,0.1502139,0.1504632,0.1529254,0.1571567,0.1589795
2006-02-01,,0.1464208,0.1447363,0.1443483,0.1459808,0.1477726,0.1505124,0.1520256,0.1535773,0.1589145,0.1607383
2006-02-02,,0.1484249,0.1414394,0.1412338,0.1497531,0.1500731,0.1475751,0.147502,0.1512457,0.1571017,0.1606797
2006-02-03,,0.1496503,0.1485318,0.1502473,0.1565336,0.156727,0.1556335,0.1560396,0.1579241,0.1619183,0.1634751
2006-02-06,,0.149966,0.1457216,0.1475524,0.1539103,0.1546401,0.154973,0.1553681,0.1570598,0.161173,0.1630743
2006-02-08,,0.1463649,0.1436135,0.1454147,0.1498372,0.1507231,0.1520234,0.1538407,0.1563603,0.1617697,0.1639547
2006-02-09,,0.1401312,0.1432856,0.1437166,0.1443243,0.1463163,0.148681,0.1496198,0.1516376,0.1584639,0.1615756
2006-02-10,,0.1339916,0.1405194,0.1432779,0.1464605,0.1470921,0.1484831,0.1514307,0.1550715,0.1599564,0.1623171
2006-02-13,,0.1470304,0.1423007,0.1446087,0.1470668,0.1485171,0.1503383,0.1508497,0.1532987,0.1591155,0.1615874
2006-02-14,,0.1454322,0.1449017,0.1455735,0.1462286,0.1478059,0.1501469,0.1522522,0.1541999,0.157668,0.1601427
2006-02-15,,0.1429312,0.1455881,0.1464055,0.1471812,0.1489883,0.1514654,0.153837,0.1559375,0.16082,0.1631557
2006-02-16,,0.134637,0.1373471,0.140634,0.1432172,0.145788,0.14875,0.1507805,0.15325,0.1581015,0.1613797
2006-02-20,,0.1303785,0.1334454,0.139216,0.1423217,0.1454704,0.1477552,0.1487534,0.1509405,0.1554398,0.1588761
2006-02-21,,0.1359587,0.1370814,0.1416117,0.1418016,0.1441761,0.1468109,0.1476679,0.1496546,0.1561362,0.1607204
2006-02-22,,0.1302253,0.1337104,0.1415016,0.141451,0.1438881,0.1467031,0.1502449,0.1514018,0.1531452,0.1582335
2006-02-23,,0.1282022,0.1333902,0.1342376,0.1385976,0.1453201,0.1481733,0.1490296,0.1512885,0.1554035,0.1593463
2006-02-24,,0.1269229,0.1304391,0.1348061,0.1378378,0.1419301,0.1442134,0.1472283,0.1507224,0.1555662,0.1595938
2006-02-27,,0.1254707,0.128201,0.1334554,0.1374389,0.1427246,0.1446071,0.1465459,0.1496113,0.1541296,0.1578174
2006-02-28,,0.1346332,0.1361773,0.139586,0.1421924,0.1468084,0.1489651,0.1505661,0.1541479,0.1606205,0.1675438
2006-03-01,,0.1301198,0.1318495,0.1343342,0.1376886,0.1434328,0.1459977,0.1490832,0.1525961,0.1557153,0.1593923
2006-03-02,,0.1304425,0.1347556,0.1398592,0.1420431,0.1457691,0.1479747,0.1510143,0.1544964,0.1589201,0.1616325
2006-03-03,,0.1311674,0.1339681,0.138887,0.1418598,0.1451706,0.1472144,0.1495689,0.1536886,0.1599843,0.162247
2006-03-06,,0.1308081,0.1367775,0.1412145,0.1436582,0.1480171,0.1495588,0.1511633,0.1545973,0.1588486,0.1616268
2006-03-07,,0.1344355,0.1387528,0.143365,0.1459607,0.1482421,0.1491656,0.1512236,0.1550063,0.1593201,0.1615385
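When plotting a time series, pandas uses the index as the x-axis. In your code the index is the default integer index 0, 1, 2, ..., so the DateFormatter interprets those row positions as day counts from matplotlib's date epoch (1970-01-01 by default), which is why the axis starts in 1970.
I would suggest parsing the Date column and making it the index:
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")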
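Putting that together with the plotting code from the question, a minimal sketch that reads the first rows of the partial input (only three of the columns, inlined as a string instead of the CSV file). The x_compat=True argument is an addition not in the answer: it asks pandas to skip its own frequency-based date axis and use ordinary matplotlib dates, so a matplotlib DateFormatter labels the ticks as expected.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the partial input from the question (subset of the columns)
csv_text = """Date,D10,D30,D60
2006-01-10,,0.1373444,0.1544265
2006-01-11,,0.135426,0.1411246
2006-01-12,,0.1311339,0.1292621
2006-01-13,,0.1594458,0.1355387
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")
# Optional: D10 is empty in the sample, so drop columns that are entirely NaN
df = df.dropna(axis=1, how="all")

# One call plots every column against the DatetimeIndex
ax = df.plot(marker=".", linestyle="none", x_compat=True, figsize=(12.0, 8.0))
date_format = mdates.DateFormatter("%y-%m")
ax.xaxis.set_major_formatter(date_format)
ax.set_title("data_frame_show date", fontsize=16)
ax.legend(loc=(1.04, 0))
```

This keeps the single df.plot() call, so it should stay as fast as the original approach while labelling the axis from 2006 onwards.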

Plotting top 10 Values in Big Data

I need help plotting some categorical and numerical values in Python. The code is below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns
plt.figure(figsize=(20,6))
sns.countplot(x='Store', data=df)
plt.show()
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
However, the dataset is so large that I cannot produce a meaningful plot in Python. I just want to take the top 5 or top 10 values and plot those, as shown below:
To do that, I'm trying to turn the result of the code below into a dataframe and plot it, but I haven't managed to. Can anyone help me with this?
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
Below is a link to a sample dataset. It is only a representative sample; the original dataset, on which I'm doing the EDA, has around 3,000 unique stores and 60,000 rows. Please help! Thanks!
https://drive.google.com/drive/folders/1PdXaKXKiQXX0wrHYT3ZABjfT3QLIYzQ0?usp=sharing
You were pretty close.
import pandas as pd
import seaborn as sns
df = pd.read_csv('train_feature_store.csv')
sns.set(rc={'figure.figsize':(16,9)})
g = df.groupby('Store', as_index=False)['Size'].sum().sort_values(by='Size', ascending=False).head(10)
sns.barplot(data=g, x='Store', y='Size', hue='Store', dodge=False).set(xticklabels=[]);
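The sort-then-head step can also be written with Series.nlargest, and pandas can draw the bars directly. A minimal sketch, assuming only the Store and Size columns matter; the small frame below is an illustrative stand-in for train_feature_store.csv:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for train_feature_store.csv (only the two columns used)
df = pd.DataFrame({
    "Store": [1, 1, 2, 3, 3, 3, 4],
    "Size":  [10, 5, 30, 1, 1, 1, 7],
})

n = 3  # use 5 or 10 on the real data
# Total Size per store, keeping only the n largest totals, largest first
top = df.groupby("Store")["Size"].sum().nlargest(n)

# Bars follow the Series order, so they appear in descending order of Size
ax = top.plot.bar(figsize=(16, 9))
ax.set_xlabel("Store")
ax.set_ylabel("Total Size")
```

With only n stores left, the plot stays readable no matter how many unique stores the full dataset has.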
First of all, looking at the data, it seems to cover locations from Scotland to Kolkata.
Categorize the data by geography first and then visualize it.
Regards
Maitryee

Create X Axis from row in pandas dataframe

I'm trying to plot the x-axis from the top row of my dataframe, and the y-axis from another row in my dataframe.
My dataframe looks like this:
sector_data =
Time 13:00 13:15 13:30 13:45
Utilities 1235654 1456267 1354894 1423124
Transports 506245 554862 534685 524962
Telecomms 142653 153264 162357 154698
I've tried a lot of different things; this one seemed to make the most sense, but nothing works:
sector_data.plot(kind='line',x='Time',y='Utilities')
plt.show()
I keep getting:
KeyError: 'Time'
It should end up looking like this:
Expected Chart
Given the little information you provide, I believe this should help:
df = sector_data.T
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
This is the test case I used (the dataframe is already transposed):
import pandas as pd
import matplotlib.pyplot as plt
a = {'Time':['13:00','13:15','13:30','13:45'],'Utilities':[1235654,1456267,1354894,1423124],'Transports':[506245,554862,534685,524962],'Telecomms':[142653,153264,162357,154698]}
df = pd.DataFrame(a)
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
Let's take an example DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'ColA':['Time','Utilities','Transports','Telecomms'],'ColB':['13:00', 1235654, 506245, 142653],'ColC':['14:00', 1234654, 506145, 142650], 'ColD':['15:00', 4235654, 906245, 142053],'ColE':['16:00', 4205654, 906845, 742053]})
df = df.set_index('ColA') #set index for the column A or the values you want to plot for
Now you can plot it with matplotlib:
plt.plot(df.loc['Time'].values,df.loc['Utilities'].values)
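Both answers work by turning the Time row into x values. A minimal sketch combining them, assuming (as the KeyError suggests) that "Time" is a row label of sector_data rather than a column: transpose, make Time the index, and convert the remaining columns to numbers. After the transpose every column has object dtype, because each original column mixed the time strings with numbers, so the explicit pd.to_numeric step gives proper numeric columns before plotting.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Assumed layout: "Time" is a row label, not a column header
sector_data = pd.DataFrame(
    [["13:00", "13:15", "13:30", "13:45"],
     [1235654, 1456267, 1354894, 1423124],
     [506245, 554862, 534685, 524962],
     [142653, 153264, 162357, 154698]],
    index=["Time", "Utilities", "Transports", "Telecomms"],
)

# Rows become columns; the Time column then becomes the x-axis index
df = sector_data.T.set_index("Time")
# Convert the object-dtype columns to numeric dtypes
df = df.apply(pd.to_numeric)

ax = df["Utilities"].plot(kind="line")
ax.set_ylabel("Utilities")
```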

Inconsistent automatic pandas date labeling

I was wondering how exactly pandas formats dates on the x-axis. I am running the same script on a batch of results that all share the same DataFrame layout, yet pandas formats the dates differently for each one. How can I make this consistent?
Each df has a DatetimeIndex like this one, with dtype='datetime64[ns]':
>>> df.index
DatetimeIndex(['2014-10-02', '2014-10-03', '2014-10-04', '2014-10-05',
'2014-10-06', '2014-10-07', '2014-10-08', '2014-10-09',
'2014-10-10', '2014-10-11',
...
'2015-09-23', '2015-09-24', '2015-09-25', '2015-09-26',
'2015-09-27', '2015-09-28', '2015-09-29', '2015-09-30',
'2015-10-01', '2015-10-02'],
dtype='datetime64[ns]', name='Date', length=366, freq=None)
Finally, I plot with df.plot(), where df has two columns.
But the axes of the plots have different styles, like this:
I would like all plots to use the x-axis style of the first plot. I'd expect pandas to do this automatically, and I'd rather not start formatting xticks by hand, since I have a lot of data to plot. Could anyone explain what to do? Thanks!
EDIT:
I'm reading two csv-files from 2015. The first has the model results of about 200 stations, the second has the gauge measurements of the same stations. Later, I read another two csv-files from 2016 with the same format.
import pandas as pd
df_model = pd.read_csv(path_model, sep=';', index_col=0, parse_dates=True)
df_gauge = pd.read_csv(path_gauge, sep=';', index_col=0, parse_dates=True)
df = pd.DataFrame(columns=['model', 'gauge'], index=df_model.index)
df['model'] = df_model['station_1'].copy()
df['gauge'] = df_gauge['station_1'].copy()
df.plot()
I do this for each year, so the x-axis should look the same, right?
I do not think this is possible unless you modify the pandas library. I looked around for an option that can be set in pandas, but couldn't find one. Pandas tries to select the type of axis ticks intelligently, using logic implemented here (I think). So, in my opinion, it would be best to define your own plotting function and then override the tick formatting (although you said you would rather not).
There are many references around the internet which show how to do this. I used this one by "Simone Centellegher" and this stackoverflow answer to come up with a function that may work for you (tested in python 3.7.1 with matplotlib 3.0.2, pandas 0.23.4):
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

## pass df with columns you want to plot
def my_plotter(df, xaxis, y_cols):
    fig, ax = plt.subplots()
    ax.plot(xaxis, df[y_cols])
    ax.xaxis.set_minor_locator(mdates.MonthLocator())
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
    # Remove overlapping major and minor ticks
    majticklocs = ax.xaxis.get_majorticklocs()
    minticklocs = ax.xaxis.get_minorticklocs()
    minticks = ax.xaxis.get_minor_ticks()
    for i in range(len(minticks)):
        cur_mintickloc = minticklocs[i]
        if cur_mintickloc in majticklocs:
            minticks[i].set_visible(False)
    return fig, ax

df = pd.DataFrame({'values': np.random.randint(0, 1000, 36)},
                  index=pd.date_range(start='2014-01-01',
                                      end='2016-12-31', freq='M'))
fig, ax = my_plotter(df, df.index, ["values"])
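A lighter-weight variant of the same idea, if you would rather keep calling df.plot(): pass x_compat=True so pandas skips its own frequency-dependent tick selection and uses ordinary matplotlib dates, then set one fixed locator and formatter. This is a sketch, not a pandas setting; the function name plot_with_fixed_dates and the monthly ticks are illustrative choices, and the data is synthetic, shaped like the question's (366 daily rows, model and gauge columns).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_fixed_dates(df):
    """Plot all columns of df with the same date-tick style for every input."""
    # x_compat=True turns off pandas' own tick logic for time series
    ax = df.plot(x_compat=True)
    ax.xaxis.set_major_locator(mdates.MonthLocator())  # one tick per month
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))
    return ax


# Synthetic daily data: 2014-10-02 to 2015-10-02 inclusive is 366 days
idx = pd.date_range("2014-10-02", "2015-10-02", freq="D", name="Date")
rng = np.random.default_rng(0)
df = pd.DataFrame({"model": rng.random(len(idx)),
                   "gauge": rng.random(len(idx))}, index=idx)
ax = plot_with_fixed_dates(df)
```

Because the locator and formatter are set explicitly, each year's plot should get the same tick style, whatever pandas would have inferred from that year's index.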
