I am wondering how I can make the following type of plot in Python (preferably matplotlib):
I would like four categories along the y-axis, and then the dates along the x-axis just as in the figure.
I have a CSV file with two columns: [category], [date]. The date format is dd-mm-yyyy.
Extract:
category1,05-01-2020
category1,02-02-2020
category3,06-03-2020
category2,12-04-2020
etc...
Help will be appreciated!
You can simply plot the categories vs. the dates as is. For the color code, you need to convert the categories to individual numbers, which can be easily achieved using pandas Categorical data type.
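from io import StringIO
import matplotlib.pyplot as plt
import pandas as pd
d = """category1,05-01-2020
category1,02-02-2020
category3,06-03-2020
category2,12-04-2020"""
# the dates are dd-mm-yyyy, so parse them day-first
df = pd.read_csv(StringIO(d), sep=',', parse_dates=[1], dayfirst=True, header=None, names=['category','date'])
fig, ax = plt.subplots()
# colour each point by the integer code of its category
ax.scatter(df['date'],df['category'], marker='s', c=df['category'].astype('category').cat.codes, cmap='tab10')
plt.show()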
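To see what the colour argument actually receives, here is a minimal sketch of the category-to-number conversion on the same four sample rows. It also shows day-first parsing of a single date, which matters because the question states the format is dd-mm-yyyy. The variable names are only illustrative.

```python
import pandas as pd

# The four category values from the sample rows in the question
categories = pd.Series(["category1", "category1", "category3", "category2"])

# Converting to the categorical dtype sorts the distinct labels and
# numbers them from 0; .cat.codes returns that number for every row
codes = categories.astype("category").cat.codes
print(codes.tolist())  # [0, 0, 2, 1]

# dd-mm-yyyy strings need dayfirst=True, otherwise 05-01-2020 is read as May 1st
parsed = pd.to_datetime("05-01-2020", dayfirst=True)
print(parsed.date())  # 2020-01-05
```

These integer codes are what the colormap uses to pick one colour per category.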
I have a DataFrame (3440 rows x 2 columns) with two columns (int). I need to plot this DataFrame with strain on the y-axis (ylabel) and time on the x-axis (xlabel) so that it matches the expected plot (I will show this figure below as a link). There are several visual problems that I hope you can help me with, because I am very weak at visualization with Python.
Here is the datasource:
Here is the expected plot:
Here is result:
Here is my code:
df=pd.read_csv('https://www.gw-openscience.org/GW150914data/P150914/fig2-unfiltered-waveform-H.txt')
df= df['index'].str.split(' ', expand=True)
df.coulumns=['time (s)','strain (h1)']
x=df['time'][:200]
y=df['strain'][:200]
plt.figure(figsize=(14,8))
plt.scatter(x,y,c='blue')
plt.show()
Note: I have tried seaborn, but the result was the same. I also tried narrowing the data down to 200 rows, but the result still differs from the expected plot.
I would appreciate any help. Thank you very much!
The following works for me. I'm skipping the first row, because the column labels are not separated correctly. Furthermore, while loading the data I indicate that the columns are separated by a space.
I don't think that the file contains the data to plot the "reconstructed" line.
import pandas as pd
import matplotlib.pyplot as plt
# read the csv file, skip the first row, columns are separated by ' '
df=pd.read_csv('fig2-unfiltered-waveform-H.txt', skiprows=1, sep=' ')
# add proper column names
df.columns = ['index', 'strain (h1)']
# extract the index & strain variables
index=df['index']
strain=df['strain (h1)']
# plot the figure
plt.figure(figsize=(14,8))
plt.plot(index, strain, c='red', label='numerical relativity')
# label the y axis and show the legend
plt.ylabel('strain (h1)')
plt.legend(loc="upper left")
plt.show()
This is the resulting plot:
The same with seaborn, once you've imported the data with pandas:
import seaborn as sns
sns.lineplot(data = df, x="index", y="strain (h1)", color='red')
I'm trying to visualize a dataframe as a stacked bar chart, where x is the website, y is the frequency, and the segments of each bar are the different groups using the website.
This is the dataframe:
This is the plot created just by doing this:
web_data_roles.plot(kind='barh', stacked=True, figsize=(20,10))
As you can see, it's not what I want. I've tried changing the plot so the axes match up to the different columns of the dataframe, but it just says there is no numerical data to plot. I'm not sure how to go about this anymore, so all help is appreciated.
You need to organise your dataframe so that role is a column.
set_index() initial preparation
unstack() to move role out of index and make a column
droplevel() to clean up multi index columns
import io
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots(1,1, figsize=[10,5],
sharey=False, sharex=False, gridspec_kw={"hspace":0.3})
df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))
df.set_index(["website","role"]).unstack(1).droplevel(0,axis=1).plot(ax=ax, kind="barh", stacked=True)
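To make the three reshaping steps easier to follow, here is a small sketch that prints the intermediate wide table for the same sample data, without plotting. The name wide is just illustrative. As far as I know, DataFrame.pivot gives the same layout in one call, so it is shown as an alternative rather than as what the answer uses.

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO("""website,role,freq
www.bbc.co.uk,director,2000
www.bbc.co.uk,technical,500
www.twitter.com,director,4000
www.twitter.com,technical,1500
"""))

# set_index: (website, role) becomes a two-level row index
# unstack(1): the role level moves from the rows into the columns
# droplevel(0, axis=1): drops the leftover 'freq' level of the column index
wide = df.set_index(["website", "role"]).unstack(1).droplevel(0, axis=1)
print(wide)

# Alternative (not from the answer): pivot builds the same wide table directly
wide_pivot = df.pivot(index="website", columns="role", values="freq")
print(wide_pivot)
```

The result has one row per website and one numeric column per role, which is the shape that plot(kind="barh", stacked=True) needs.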
I am new to Python and I have two columns in a dataframe that I want to plot against date:
plt.scatter(thing.date,thing.loc[:,['numbers','more_numbers']])
My intuition is that the above should work (because MATLAB allows this kind of thing), but it doesn't, and I'm not sure why.
Is there a way around this?
I'm hoping to plot these columns for a sequence of 4 dataframes on the same axes, so I'd like to use a command like the above so that I can colour the columns from each dataframe distinctively.
Easiest is to do a loop:
fig, ax = plt.subplots()
for col in ['numbers', 'more_numbers']:
    ax.scatter(thing.date, thing[col], label=col)
    # or
    # thing.plot.scatter(x='date', y=col, label=col, ax=ax)
plt.show()
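The same loop extends to the four dataframes mentioned in the question. The sketch below is only an illustration: the frames dictionary, its keys and all the values in it are made-up stand-ins, since the real data is not shown. Each scatter call draws from the default colour cycle, and the marker distinguishes the two columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the four dataframes (values are made up)
dates = pd.date_range("2020-01-01", periods=3)
frames = {
    f"frame{i}": pd.DataFrame({
        "date": dates,
        "numbers": [i, i + 1, i + 2],
        "more_numbers": [2 * i, 2 * i + 1, 2 * i + 2],
    })
    for i in range(1, 5)
}

fig, ax = plt.subplots()
markers = {"numbers": "o", "more_numbers": "x"}  # one marker per column
for name, frame in frames.items():
    for col, marker in markers.items():
        # x and y must have the same length, so each column gets its own call
        ax.scatter(frame["date"], frame[col], marker=marker, label=f"{name}: {col}")
ax.legend()
```

The original one-liner fails because scatter needs x and y of equal size, while a two-column selection has twice as many values as the date column.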
I have a dataframe
fig, ax = plt.subplots(figsize=(10, 8))
ax.bar(df_temp.index, df_temp['Client $ Amt'].cumsum()/100000)
ax.set_xticks(ax.get_xticks()[::2])
The problem is that the graph shows the empty dates between the index values of my dataframe, and I do not want those shown.
I have already done
df_temp.plot(kind='bar')
But the dates in the x-axis are in long format and I am not able to add the line
ax.set_xticks(ax.get_xticks()[::2])
where I collapse the dates. I need to collapse the dates otherwise the x-axis ticks become unreadable when I add more dates to the dataframe.
I am looking for either one of two solutions: hide the in-between dates in the figure, or collapse the dates and show them in a shorter format in the df.plot() example.
Edit
Here is the json dump of the dataframe. You can load it with df_temp_new = pd.read_json(json_dict)
'{"Client $ Amt":{"1483401600000":20,"1483488000000":-20.4,"1483574400000":20.76,"1483920000000":79.5684759707,"1484006400000":20.123,"1484179200000":20.654,"1484265600000":-20.876,"1484611200000":203.1234,"1484697600000":20.654,"1484784000000":20.432,"1484870400000":204.432,"1485129600000":-20.543,"1485216000000":20.654,"1485388800000":108.106,"1485475200000":2151.18,"1485734400000":1515.12,"1485820800000":102.327,"1485907200000":573.41,"1486080000000":449.65,"1486339200000":48.9152,"1486684800000":268.7302,"1486944000000":415.744,"1487030400000":22.335167,"1487116800000":20.6546,"1487203200000":865.45,"1487635200000":43.23,"1487721600000":543.234,"1488153600000":154.476,"1488240000000":20,"1488326400000":20,"1488412800000":20,"1488499200000":280.17256}}'
You can pass in an axis to the pandas plot
fig, ax = plt.subplots(figsize=(10, 8))
df_temp.plot(kind='bar', ax=ax)
So you can use all the matplotlib sorcery afterwards
I stumbled upon a post explaining how to handle this: "Pandas bar plot changes date format".
The code that worked for me is
import matplotlib.ticker as ticker
ax = (df_temp.cumsum()).plot(kind='bar')
ticklabels = ['']*len(df_temp.index)
# Every 4th ticklabel shows the month and day
ticklabels[::4] = [item.strftime('%b %d') for item in df_temp.index[::4]]
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
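Here is a small sketch of how the label list is built, using only the first six entries of the JSON dump from the question and a step of 2 instead of 4 so the effect is visible on a short index. The step variable is my own name. A pandas bar plot places the bars at positions 0, 1, 2, ..., which is why a plain list of strings in row order lines up with the bars and why missing dates leave no gaps.

```python
import pandas as pd

# First six entries of the JSON dump in the question (timestamps in milliseconds)
index = pd.to_datetime(
    [1483401600000, 1483488000000, 1483574400000,
     1483920000000, 1484006400000, 1484179200000],
    unit="ms",
)
df_temp = pd.DataFrame(
    {"Client $ Amt": [20, -20.4, 20.76, 79.5684759707, 20.123, 20.654]},
    index=index,
)

step = 2  # label every 2nd bar; the answer above uses 4
ticklabels = [""] * len(df_temp.index)
# Fill in a short "month day" label at every step-th position, leave the rest empty
ticklabels[::step] = [d.strftime("%b %d") for d in df_temp.index[::step]]
print(ticklabels)
```

The resulting list can then be handed to ticker.FixedFormatter exactly as in the answer.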
I am trying to generate a grid of subplots based off of a Pandas groupby object. I would like each plot to be based off of two columns of data for one group of the groupby object. Fake data set:
C1,C2,C3,C4
1,12,125,25
2,13,25,25
3,15,98,25
4,12,77,25
5,15,889,25
6,13,56,25
7,12,256,25
8,12,158,25
9,13,158,25
10,15,1366,25
I have tried the following code:
import pandas as pd
import csv
import matplotlib as mpl
import matplotlib.pyplot as plt
import math
#Path to CSV File
path = "..\\fake_data.csv"
#Read CSV into pandas DataFrame
df = pd.read_csv(path)
#GroupBy C2
grouped = df.groupby('C2')
#Figure out number of rows needed for 2 column grid plot
#Also accounts for odd number of plots
nrows = int(math.ceil(len(grouped)/2.))
#Setup Subplots
fig, axs = plt.subplots(nrows,2)
for ax in axs.flatten():
    for i,j in grouped:
        j.plot(x='C1',y='C3', ax=ax)
plt.savefig("plot.png")
But it generates 4 identical subplots with all of the data plotted on each (see example output below):
I would like to do something like the following to fix this:
for i,j in grouped:
    j.plot(x='C1',y='C3',ax=axs)
    next(axs)
but I get this error
AttributeError: 'numpy.ndarray' object has no attribute 'get_figure'
I will have a dynamic number of groups in the groupby object I want to plot, and many more elements than the fake data I have provided. This is why I need an elegant, dynamic solution and each group data set plotted on a separate subplot.
Sounds like you want to iterate over the groups and the axes in parallel, so rather than having nested for loops (which iterates over all groups for each axis), you want something like this:
for (name, df), ax in zip(grouped, axs.flat):
    df.plot(x='C1',y='C3', ax=ax)
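You have the right idea in your second code snippet, but you're getting an error because axs is an array of axes, whereas plot expects a single axis. A numpy array is also not an iterator, so next(axs) would fail as well. It should also work to create an iterator first with axes = iter(axs.flat), replace next(axs) in your example with ax = next(axes) at the top of the loop body, and change the argument of plot to ax=ax.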
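For completeness, here is a self-contained sketch of the zip approach on the fake data set from the question. Hiding the leftover subplot when the number of groups is odd, the squeeze=False argument and the per-plot titles are my additions, not part of the answer.

```python
import io
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The fake data set from the question
df = pd.read_csv(io.StringIO("""C1,C2,C3,C4
1,12,125,25
2,13,25,25
3,15,98,25
4,12,77,25
5,15,889,25
6,13,56,25
7,12,256,25
8,12,158,25
9,13,158,25
10,15,1366,25
"""))

grouped = df.groupby("C2")
nrows = math.ceil(grouped.ngroups / 2)
# squeeze=False keeps axs two-dimensional even when nrows is 1
fig, axs = plt.subplots(nrows, 2, squeeze=False)

# zip pairs each group with exactly one axis
for (name, group), ax in zip(grouped, axs.flat):
    group.plot(x="C1", y="C3", ax=ax, title=f"C2 = {name}")

# With an odd number of groups one axis is left over; hide it
for ax in axs.flat[grouped.ngroups:]:
    ax.set_visible(False)
```

Because zip stops at the shorter of its inputs, this works for any number of groups as long as the grid has at least that many axes.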