How to plot pandas pivot with different scales? - python
My pivoted DataFrame looks like this:
df=
DATA
Type P_A P_B
Time
11:38:56 500706.0 981098.0
11:39:46 501704.0 984751.0
11:40:26 501704.0 984737.0
11:43:18 502758.0 987173.0
I want to plot this DataFrame. df.plot() works, but the two columns are on very different scales, so they need to be plotted on separate y-axes. How can I do that?
DataFrame.plot has a secondary_y option:
import pandas as pd
df = pd.DataFrame({'Time':['11:38:56', '11:39:46', '11:40:26', '11:43:18'],
'P_A': [500706., 501704., 501704., 502758.],
'P_B': [981098., 984751., 984737., 987173.]})
df.plot(x='Time', y=['P_A','P_B'], secondary_y=['P_B'])
Try the secondary_y argument of pandas' plot, as suggested here.
df['P_A'].plot()
df['P_B'].plot(secondary_y=True)
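A minimal sketch of the same idea applied to a frame shaped like the pivot in the question. It assumes the outer column level is called 'DATA', as in the printed output. Selecting that level first leaves plain 'P_A' and 'P_B' columns. The Agg backend and the axis labels are choices made for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Rebuild the pivoted frame from the question (two-level column header).
times = pd.Index(["11:38:56", "11:39:46", "11:40:26", "11:43:18"], name="Time")
columns = pd.MultiIndex.from_product([["DATA"], ["P_A", "P_B"]], names=[None, "Type"])
df = pd.DataFrame(
    [[500706.0, 981098.0],
     [501704.0, 984751.0],
     [501704.0, 984737.0],
     [502758.0, 987173.0]],
    index=times,
    columns=columns,
)

flat = df["DATA"]                     # drop the outer 'DATA' level
ax = flat.plot(secondary_y=["P_B"])   # P_A on the left axis, P_B on the right
ax.set_ylabel("P_A")
ax.right_ax.set_ylabel("P_B")         # pandas attaches the twin axis as right_ax
```

When only some columns are listed in secondary_y, the returned axes is the left one and the right-hand axis is reachable as ax.right_ax, which is what makes labelling both sides possible.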
Related
Producing a heatmap from a pandas dataframe with rows of the form (x,y,z), where z is intended to be the heat value
Let's say I have a dataframe with this structure and I intend to transform it into something like this (done laboriously and quite manually): Is there a simple call to (say) a seaborn or plotly function that would do this? Something like heatmap(df, x='Dose', y='Distance', z='Passrate') or perhaps a simple way of restructuring the dataframe to facilitate using sns.heatmap or plotly's imshow, or similar? It seems strange to me that I cannot find a straightforward way of putting data formatted in this way into a high-level plotting function.
Use df.pivot_table to get your data into the correct shape first.
Setup: create some random data
import pandas as pd
import numpy as np
import seaborn as sns
p_rate = np.arange(0,100)/np.arange(0,100).sum()
data = {'Dose': np.repeat(np.arange(0,3.5,0.5), 10),
        'Distance': np.tile(np.arange(0,3.5,0.5), 10),
        'Passrate': np.random.choice(np.arange(0,100), size=70, p=p_rate)}
df = pd.DataFrame(data)
Code: pivot and apply sns.heatmap
df_pivot = df.pivot_table(index='Distance', columns='Dose', values='Passrate', aggfunc='mean').sort_index(ascending=False)
sns.heatmap(df_pivot, annot=True, cmap='coolwarm')
Plot data frame fast and with correct date format
I have the data shown in the screenshot as a DataFrame, and I would like to plot it quickly and with the correct date format. The following call is much faster than using e.g. plt.plot(df["Date"], df["D30"]):
df.plot(marker='.', linestyle='none')
So I would like to keep using DataFrame.plot() directly, because it is much faster than plotting each column against the "Date" column separately. However, as shown in the graph, the dates are wrong: my actual starting date is 2006-01-10, but the figure starts at 70-01 (1970-01-01). I find the official matplotlib DateFormatter documentation confusing and not very helpful. I searched for an easy and clear solution, but most answers concern plt.plot(x, y), where x is the date and y is the value; after that it is easy to adjust the date format in the figure. But that makes my plot very slow, since I am plotting 11 columns in total. Any idea how I can plot a DataFrame quickly and with the correct date format?
import os
import datetime as dt
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
date_format = mdates.DateFormatter('%y%m')
df_file = r"C:\Codes\df_file.csv"
df = pd.read_csv(df_file)
print(len(df), df.info(), df["Date"][0], type(df["Date"][0]))
df.head(2)
fig = plt.figure(figsize=(12.0, 8.0))
df.plot(marker='.', linestyle='none')
plt.title("data_frame_show date", fontsize=16)
plt.gca().xaxis.set_major_formatter(date_format)
plt.legend(loc=(1.04, 0))
plt.show()
partial input:
Date,D10,D30,D60,D91,D122,D152,D182,D273,D365,D547,D730
2006-01-10,,0.1373444,0.1544265,0.1541397,0.1429375,0.1421464,0.1426055,0.1460771,0.1486266,0.1551848,0.1593932
2006-01-11,,0.135426,0.1411246,0.141093,0.1384091,0.1383636,0.1395791,0.1438944,0.1469191,0.1553112,0.1598582
2006-01-12,,0.1311339,0.1292621,0.1304292,0.1363482,0.1362213,0.1367843,0.1404174,0.1439877,0.152306,0.1568677
2006-01-13,,0.1594458,0.1355387,0.1367246,0.1434708,0.143745,0.1441349,0.1453056,0.1481918,0.157193,0.1607564
2006-01-16,,0.1374846,0.1182223,0.1272385,0.1415359,0.1418881,0.1430098,0.1468544,0.1496407,0.1547714,0.158936
2006-01-17,,0.1453834,0.1418838,0.143198,0.1437924,0.143473,0.1440987,0.1473208,0.1501543,0.1590842,0.1629096
2006-01-18,,0.1385479,0.141472,0.1481763,0.1515037,0.1511353,0.1511544,0.1535245,0.1554254,0.1626349,0.1663554
2006-01-19,,0.1639788,0.1462084,0.1483903,0.1486906,0.1483109,0.1492335,0.1539002,0.1563708,0.1611751,0.1644693
2006-01-20,,0.189771,0.178394,0.1638331,0.1565402,0.1559029,0.1553547,0.1526479,0.1516396,0.1614136,0.1646431
2006-01-23,,0.1420271,0.1570005,0.1614942,0.1607205,0.1605297,0.1630065,0.1653838,0.1642349,0.166809,0.1701779
2006-01-24,,0.1814291,0.1633585,0.1563364,0.1548823,0.15382,0.1545099,0.1590869,0.1609158,0.1653819,0.1681759
2006-01-25,,0.1272998,0.1445222,0.1487031,0.1522032,0.152714,0.1524364,0.1532192,0.1550062,0.1635665,0.1658293
2006-01-26,,0.1392162,0.1413034,0.1443807,0.1476261,0.1482458,0.1473548,0.1471019,0.1493254,0.1578586,0.160699
2006-01-27,,0.1360269,0.1374056,0.1387952,0.1426731,0.1441445,0.144917,0.1462428,0.1478979,0.1519537,0.1550311
2006-01-30,,0.1439245,0.1430108,0.1434628,0.1448731,0.1450397,0.1454756,0.1467621,0.1487521,0.1538424,0.1561802
2006-01-31,,0.1483135,0.1468713,0.1473837,0.1519043,0.1519379,0.1502139,0.1504632,0.1529254,0.1571567,0.1589795
2006-02-01,,0.1464208,0.1447363,0.1443483,0.1459808,0.1477726,0.1505124,0.1520256,0.1535773,0.1589145,0.1607383
2006-02-02,,0.1484249,0.1414394,0.1412338,0.1497531,0.1500731,0.1475751,0.147502,0.1512457,0.1571017,0.1606797
2006-02-03,,0.1496503,0.1485318,0.1502473,0.1565336,0.156727,0.1556335,0.1560396,0.1579241,0.1619183,0.1634751
2006-02-06,,0.149966,0.1457216,0.1475524,0.1539103,0.1546401,0.154973,0.1553681,0.1570598,0.161173,0.1630743
2006-02-08,,0.1463649,0.1436135,0.1454147,0.1498372,0.1507231,0.1520234,0.1538407,0.1563603,0.1617697,0.1639547
2006-02-09,,0.1401312,0.1432856,0.1437166,0.1443243,0.1463163,0.148681,0.1496198,0.1516376,0.1584639,0.1615756
2006-02-10,,0.1339916,0.1405194,0.1432779,0.1464605,0.1470921,0.1484831,0.1514307,0.1550715,0.1599564,0.1623171
2006-02-13,,0.1470304,0.1423007,0.1446087,0.1470668,0.1485171,0.1503383,0.1508497,0.1532987,0.1591155,0.1615874
2006-02-14,,0.1454322,0.1449017,0.1455735,0.1462286,0.1478059,0.1501469,0.1522522,0.1541999,0.157668,0.1601427
2006-02-15,,0.1429312,0.1455881,0.1464055,0.1471812,0.1489883,0.1514654,0.153837,0.1559375,0.16082,0.1631557
2006-02-16,,0.134637,0.1373471,0.140634,0.1432172,0.145788,0.14875,0.1507805,0.15325,0.1581015,0.1613797
2006-02-20,,0.1303785,0.1334454,0.139216,0.1423217,0.1454704,0.1477552,0.1487534,0.1509405,0.1554398,0.1588761
2006-02-21,,0.1359587,0.1370814,0.1416117,0.1418016,0.1441761,0.1468109,0.1476679,0.1496546,0.1561362,0.1607204
2006-02-22,,0.1302253,0.1337104,0.1415016,0.141451,0.1438881,0.1467031,0.1502449,0.1514018,0.1531452,0.1582335
2006-02-23,,0.1282022,0.1333902,0.1342376,0.1385976,0.1453201,0.1481733,0.1490296,0.1512885,0.1554035,0.1593463
2006-02-24,,0.1269229,0.1304391,0.1348061,0.1378378,0.1419301,0.1442134,0.1472283,0.1507224,0.1555662,0.1595938
2006-02-27,,0.1254707,0.128201,0.1334554,0.1374389,0.1427246,0.1446071,0.1465459,0.1496113,0.1541296,0.1578174
2006-02-28,,0.1346332,0.1361773,0.139586,0.1421924,0.1468084,0.1489651,0.1505661,0.1541479,0.1606205,0.1675438
2006-03-01,,0.1301198,0.1318495,0.1343342,0.1376886,0.1434328,0.1459977,0.1490832,0.1525961,0.1557153,0.1593923
2006-03-02,,0.1304425,0.1347556,0.1398592,0.1420431,0.1457691,0.1479747,0.1510143,0.1544964,0.1589201,0.1616325
2006-03-03,,0.1311674,0.1339681,0.138887,0.1418598,0.1451706,0.1472144,0.1495689,0.1536886,0.1599843,0.162247
2006-03-06,,0.1308081,0.1367775,0.1412145,0.1436582,0.1480171,0.1495588,0.1511633,0.1545973,0.1588486,0.1616268
2006-03-07,,0.1344355,0.1387528,0.143365,0.1459607,0.1482421,0.1491656,0.1512236,0.1550063,0.1593201,0.1615385
When plotting a time series, pandas uses the index for the x-axis when you call plot. I would suggest converting the Date column to datetime and making it the index:
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")
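A minimal sketch of that suggestion on three rows copied from the question's input, reduced to two columns for brevity. The inline CSV string stands in for the real file, and the Agg backend is only there so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import pandas as pd

# Three rows from the question's data, two value columns kept for brevity.
csv_text = """Date,D30,D60
2006-01-10,0.1373444,0.1544265
2006-01-11,0.135426,0.1411246
2006-01-12,0.1311339,0.1292621
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings into timestamps and move them into the index,
# so DataFrame.plot() uses them for the x-axis.
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")

ax = df.plot(marker=".", linestyle="none")  # one call plots every column
```

Passing parse_dates=["Date"] and index_col="Date" to pd.read_csv should give the same index in a single step.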
Plotting top 10 Values in Big Data
I need help plotting some categorical and numerical values in Python. The code is given below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns
plt.figure(figsize=(20,6))
sns.countplot(x='Store', data=df)
plt.show()
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
However, the data is so large that I cannot produce a meaningful plot. Basically, I just want to take the top 5 or top 10 values and plot those. To do that, I am trying to put the result of the code below into a DataFrame and plot it, but I have not managed to. Can anyone help me out?
Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'],ascending=False).head(10)
Below is a link to a sample dataset. It is only representative: the original dataset on which I am doing the EDA has around 3 thousand unique stores and 60 thousand rows. Please help, thanks!
https://drive.google.com/drive/folders/1PdXaKXKiQXX0wrHYT3ZABjfT3QLIYzQ0?usp=sharing
You were pretty close.
import pandas as pd
import seaborn as sns
df = pd.read_csv('train_feature_store.csv')
sns.set(rc={'figure.figsize':(16,9)})
g = df.groupby('Store', as_index=False)['Size'].sum().sort_values(by='Size', ascending=False).head(10)
sns.barplot(data=g, x='Store', y='Size', hue='Store', dodge=False).set(xticklabels=[]);
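An alternative sketch of the same top-N selection, using Series.nlargest instead of sort_values followed by head. The store names and sizes below are made-up illustration data, not taken from the linked dataset, and the plain pandas bar plot stands in for the seaborn call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import pandas as pd

# Hypothetical toy data standing in for train_feature_store.csv.
df = pd.DataFrame({
    "Store": ["A", "B", "A", "C", "B", "D"],
    "Size": [10, 40, 15, 5, 2, 30],
})

# Total size per store, then keep only the largest N totals.
top = df.groupby("Store")["Size"].sum().nlargest(3)

ax = top.plot.bar()  # one bar per store, largest first
ax.set_ylabel("Size")
```

Because the aggregation reduces the data to one row per store before plotting, the plot stays readable no matter how many rows the raw file has.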
First of all, looking at the data, it seems to cover locations from Scotland to Kolkata. Categorize the data by geography first, and then visualize it.
Regards, Maitryee
Create X Axis from row in pandas dataframe
I'm trying to plot the x-axis from the top row of my dataframe, and the y-axis from another row in my dataframe. My dataframe looks like this:
sector_data =
Time        13:00    13:15    13:30    13:45
Utilities   1235654  1456267  1354894  1423124
Transports  506245   554862   534685   524962
Telecomms   142653   153264   162357   154698
I've tried a lot of different things, with this seeming to make the most sense. But nothing works:
sector_data.plot(kind='line',x='Time',y='Utilities')
plt.show()
I keep getting:
KeyError: 'Time'
It should end up looking like this: (image: Expected Chart)
Given the little information you provide, I believe this should help:
df = sector_data.T
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
This is how I made a case example (I have already transposed the dataframe):
import pandas as pd
import matplotlib.pyplot as plt
a = {'Time':['13:00','13:15','13:30','13:45'],
     'Utilities':[1235654,1456267,1354894,1423124],
     'Transports':[506245,554862,534685,524962],
     'Telecomms':[142653,153264,162357,154698]}
df = pd.DataFrame(a)
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
Let's take an example DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'ColA':['Time','Utilities','Transports','Telecomms'],
                   'ColB':['13:00', 1235654, 506245, 142653],
                   'ColC':['14:00', 1234654, 506145, 142650],
                   'ColD':['15:00', 4235654, 906245, 142053],
                   'ColE':['16:00', 4205654, 906845, 742053]})
df = df.set_index('ColA')  # use column A (the row labels) as the index
Now you can easily plot with matplotlib:
plt.plot(df.loc['Time'].values, df.loc['Utilities'].values)
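A sketch of one way the KeyError can arise and how transposing avoids it. It assumes, which the question does not confirm, that 'Time' is the name of the column axis rather than a column, with the sectors as row labels. Under that assumption the times become the index after .T and need no x= argument.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import pandas as pd

# Assumed layout: sectors as rows, times as column labels named 'Time'.
sector_data = pd.DataFrame(
    {
        "13:00": [1235654, 506245, 142653],
        "13:15": [1456267, 554862, 153264],
        "13:30": [1354894, 534685, 162357],
        "13:45": [1423124, 524962, 154698],
    },
    index=["Utilities", "Transports", "Telecomms"],
)
sector_data.columns.name = "Time"

# 'Time' is not a column here, which is why x='Time' raises KeyError.
by_time = sector_data.T          # times become the index, sectors the columns
ax = by_time["Utilities"].plot(kind="line")
ax.set_xlabel("Time")
```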
Inconsistent automatic pandas date labeling
I was wondering how exactly pandas formats the x-axis dates. I am using the same script on a bunch of data results, which all have the same pandas df format. However, pandas formats the dates of each df differently. How can this be made more consistent? Each df has a DatetimeIndex like this, with dtype='datetime64[ns]':
>>> df.index
DatetimeIndex(['2014-10-02', '2014-10-03', '2014-10-04', '2014-10-05',
               '2014-10-06', '2014-10-07', '2014-10-08', '2014-10-09',
               '2014-10-10', '2014-10-11',
               ...
               '2015-09-23', '2015-09-24', '2015-09-25', '2015-09-26',
               '2015-09-27', '2015-09-28', '2015-09-29', '2015-09-30',
               '2015-10-01', '2015-10-02'],
              dtype='datetime64[ns]', name='Date', length=366, freq=None)
Eventually, I plot with df.plot(), where the df has two columns. But the axes of the plots have different styles. I would like all plots to have the x-axis style of the first plot. pandas should do this automatically, so I would rather not start formatting xticks by hand, since I have quite a lot of data to plot. Could anyone explain what to do? Thanks!
EDIT: I'm reading two csv-files from 2015. The first has the model results of about 200 stations, the second has the gauge measurements of the same stations. Later, I read another two csv-files from 2016 with the same format.
import pandas as pd
df_model = pd.read_csv(path_model, sep=';', index_col=0, parse_dates=True)
df_gauge = pd.read_csv(path_gauge, sep=';', index_col=0, parse_dates=True)
df = pd.DataFrame(columns=['model', 'gauge'], index=df_model.index)
df['model'] = df_model['station_1'].copy()
df['gauge'] = df_gauge['station_1'].copy()
df.plot()
I do this for each year, so the x-axis should look the same, right?
I do not think this is possible unless you modify the pandas library. I looked around a bit for options that one may set in pandas, but couldn't find one. Pandas tries to intelligently select the type of axis ticks using logic implemented here (I think). So in my opinion, it would be best to define your own function to make the plots and then override the tick formatting (although you do not want to do that). There are many references around the internet which show how to do this. I used this one by "Simone Centellegher" and this stackoverflow answer to come up with a function that may work for you (tested in python 3.7.1 with matplotlib 3.0.2, pandas 0.23.4):
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

## pass df with columns you want to plot
def my_plotter(df, xaxis, y_cols):
    fig, ax = plt.subplots()
    plt.plot(xaxis,df[y_cols])
    ax.xaxis.set_minor_locator(mdates.MonthLocator())
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
    # Remove overlapping major and minor ticks
    majticklocs = ax.xaxis.get_majorticklocs()
    minticklocs = ax.xaxis.get_minorticklocs()
    minticks = ax.xaxis.get_minor_ticks()
    for i in range(len(minticks)):
        cur_mintickloc = minticklocs[i]
        if cur_mintickloc in majticklocs:
            minticks[i].set_visible(False)
    return fig, ax

df = pd.DataFrame({'values':np.random.randint(0,1000,36)}, \
                  index=pd.date_range(start='2014-01-01', \
                  end='2016-12-31',freq='M'))
fig, ax = my_plotter(df, df.index, ["values"])
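A different sketch from the function above, keeping df.plot() but asking pandas to skip its own tick-resolution adjustment via x_compat=True and then applying one fixed matplotlib locator and formatter to every plot. ConciseDateFormatter needs matplotlib 3.1 or newer. The helper name plot_with_fixed_dates and the synthetic model/gauge values are hypothetical, chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_fixed_dates(df):
    """Plot all columns and apply the same date tick style every time."""
    fig, ax = plt.subplots()
    # x_compat=True makes pandas use plain matplotlib date handling.
    df.plot(ax=ax, x_compat=True)
    locator = mdates.AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
    return fig, ax


# Synthetic stand-in for one year of daily model and gauge values.
index = pd.date_range("2014-10-02", "2015-10-02", freq="D", name="Date")
df = pd.DataFrame(
    {
        "model": np.linspace(0.0, 1.0, len(index)),
        "gauge": np.linspace(1.0, 0.0, len(index)),
    },
    index=index,
)
fig, ax = plot_with_fixed_dates(df)
```

Calling the same helper for each year's frame means every figure gets the same locator and formatter, rather than whatever pandas infers from each index.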