I want to save my DataFrame as an image. Unfortunately, the picture shows only 2 of the 20 rows;
the table sits at the bottom of the figure, and there is a huge empty area above it. Could someone help me get all 20 rows of the DataFrame into the image?
My code is based on this question: How to save a pandas DataFrame table as a png
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table # EDIT: see deprecation warnings below
from pathlib import Path
PATH_DATA = "../data"
PATH_RETAILROCKET = Path(PATH_DATA,"retailrocket/retailrocket/events.csv")
# RetailRocket
print(PATH_RETAILROCKET)
df = pd.read_csv(Path(PATH_RETAILROCKET))
df = df.head(20)
ax = plt.subplot(111, frame_on=False) # no visible frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
table(ax, df) # where df is your data frame
plt.savefig('mytable.png')
Tables are, by default, placed below the area occupied by the axes. Here you have most of the figure area occupied by an (invisible) axes, leaving little room for the table.
There are several ways to fix the issue depending on your desired output.
Here, I'm using the bbox= argument of Table to override the position, and make the table occupy the entirety of the figure.
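import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['a', 'b', 'c'])
fig = plt.figure()
ax = plt.subplot(111)
ax.axis('off')  # hide the axes so only the table is drawn
table(ax, df, bbox=[0, 0, 1, 1])  # bbox = [x, y, width, height] in axes coordinates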
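For a 20-row frame like the one in the question, it may also help to scale the figure height with the number of rows so the cells do not get squashed. The following is only a sketch under assumptions: the data is a made-up stand-in for events.csv, and the 0.3 inch per row factor and the dpi are arbitrary choices, not values from the original answer.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import table

# Hypothetical stand-in for df.head(20): 20 rows, 4 columns
df = pd.DataFrame(np.arange(80).reshape(20, 4), columns=list("ABCD"))

# One extra row for the header; 0.3 inch per row is an assumption to tune
fig, ax = plt.subplots(figsize=(6, 0.3 * (len(df) + 1)))
ax.axis("off")

# bbox=[0, 0, 1, 1] stretches the table over the whole axes area
tbl = table(ax, df, bbox=[0, 0, 1, 1])

out_path = Path("mytable.png")
fig.savefig(out_path, bbox_inches="tight", dpi=150)
n_cells = len(tbl.get_celld())  # data cells plus header and index cells
```

Because the table fills the axes, changing figsize is what controls the final cell size.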
Given the following example:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
df.plot(linewidth=10)
The order of plotting puts the last column on top:
How can I keep the data and legend order but change the behaviour so that X is plotted on top of Y, which is plotted on top of Z?
(I know I can change the column order of the data and then edit the legend order, but I am hoping for a simpler method that leaves the data as is.)
UPDATE: final solution used:
(Thanks to r-beginners.) I used get_lines to modify the z-order of each line:
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot(ax=ax, linewidth=10)
lines = ax.get_lines()
for i, line in enumerate(lines, -len(lines)):
    line.set_zorder(abs(i))
fig
In a notebook produces:
Get the default zorder and sort it in the desired order.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
ax = df.plot(linewidth=10)
l = ax.get_children()
print(l)
l[0].set_zorder(3)
l[1].set_zorder(1)
l[2].set_zorder(2)
Before definition
After defining zorder
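The same idea can be written without hard-coding one zorder per line. This is a sketch, not the answerer's code: the helper name stack_first_on_top is hypothetical, and the choice to keep all values between 2 and 3 is an assumption made so the lines stay below the legend (Line2D defaults to zorder 2, the legend to 5).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def stack_first_on_top(ax):
    """Give earlier lines a higher zorder so the first column is drawn on top."""
    lines = ax.get_lines()
    n = len(lines)
    for rank, line in enumerate(lines):
        # Values fall in [2, 3): first line highest, last line at the default 2
        line.set_zorder(2 + (n - 1 - rank) / n)
    return lines


np.random.seed(2021)
df = pd.DataFrame(np.random.randint(1, 10, size=(8, 3)), columns=list("XYZ"))
ax = df.plot(linewidth=10)
lines = stack_first_on_top(ax)

labels = [line.get_label() for line in lines]
zorders = [line.get_zorder() for line in lines]
```

The legend order is untouched because only the drawing order changes, not the order of the lines in the axes.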
I will just put this answer here because it is a solution to the problem, but probably not the one you are looking for.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# generate data
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
# read columns in reverse order and plot them
# so normally, the legend will be inverted as well, but if we invert it again, you should get what you want
df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
Note that in this example, you don't change the order of your data, you just read it differently, so I don't really know if that's what you want.
You can also make it easier on the eyes by creating a corresponding method.
def plot_dataframe(df: pd.DataFrame) -> None:
    df[df.columns[::-1]].plot(linewidth=10, legend="reverse")
# then you just have to call this
df = pd.DataFrame(np.random.randint(1,10, size=(8,3)), columns=list('XYZ'))
plot_dataframe(df)
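To check that this trick does what is claimed, one can compare the order in which the lines are drawn with the order of the legend entries. A small sketch, assuming the same random XYZ frame as above; the variable names draw_order and legend_order are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, size=(8, 3)), columns=list("XYZ"))

# Plot the columns in reverse, then let pandas reverse the legend back
ax = df[df.columns[::-1]].plot(linewidth=10, legend="reverse")

# Lines are drawn in this order, so the last one ends up on top
draw_order = [line.get_label() for line in ax.get_lines()]
# Legend entries as they appear from top to bottom
legend_order = [text.get_text() for text in ax.get_legend().get_texts()]

print("drawn: ", draw_order)
print("legend:", legend_order)
```

If the approach works as described, the lines are drawn Z, Y, X (so X is on top) while the legend still reads X, Y, Z.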
Consider a dataframe
some_id timestamp
a 1.2.2019
b 2.2.2019
c 3.2.2019
a 4.2.2019
b 5.2.2019
There are 3 unique ids, and a and b are each associated with 2 timestamps. I want the ids on the x-axis and blocks of dates on the y-axis. How can this be done in Python, using matplotlib, seaborn, or any other visualization library? I would also appreciate suggestions for other meaningful ways to visualize the relationship between these two variables. I want the figure to look like the one below. Thank you for your patience.
Here is a way to visualize the data with the ids on the x-axis and the dates on the y-axis, assuming your dates are in the format day.month.year.
With ax.text you can put text inside the bars, either the date or another column of interest.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import pandas as pd
def timestr_to_num(timestr):
    # parse 'day.month.year' and convert to matplotlib's numeric date format
    return mdates.date2num(datetime.strptime(timestr, '%d.%m.%Y'))
rows = [['a', '1.2.2019'],
['b', '2.2.2019'],
['c', '3.2.2019'],
['a', '4.2.2019'],
['b', '5.2.2019']]
columns = ['some_id', 'timestamp']
df = pd.DataFrame(data=rows, columns=columns)
fig, ax = plt.subplots(figsize=(10, 5))
xs = list(df['some_id'].unique())
for row in df.itertuples():
    x = xs.index(row.some_id)
    y = timestr_to_num(row.timestamp)
    ax.barh(y, left=x-0.5, width=1, height=1)
    ax.text(x, y, row.timestamp, ha='center', va='center', color='white', fontsize=16)
ax.yaxis.set_major_formatter(mdates.DateFormatter('%d.%m.%Y'))
ax.yaxis.set_major_locator(mdates.DayLocator(interval=1))  # set a tick every day
ax.set_xlabel('some_id')
ax.set_ylabel('timestamp')
ax.set_xticks(range(len(xs)))
ax.set_xticklabels(xs)
plt.tight_layout()
plt.show()
Another idea could be:
df.sort_values(by=['some_id', 'timestamp']).groupby(['some_id', 'timestamp']).size().unstack().plot(kind='bar', stacked=True)
But then the dates are in a legend, which might not be suitable if the list is too long.
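A third option, sketched here as an alternative rather than as part of the original answer, is to let pandas parse the dates and draw one square marker per (id, date) pair. The marker size and figure size are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "some_id": ["a", "b", "c", "a", "b"],
    "timestamp": ["1.2.2019", "2.2.2019", "3.2.2019", "4.2.2019", "5.2.2019"],
})

# Parse day.month.year strings into real datetimes
df["date"] = pd.to_datetime(df["timestamp"], format="%d.%m.%Y")

# Map each id to an integer x position, in order of first appearance
codes, uniques = pd.factorize(df["some_id"])

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(codes, mdates.date2num(df["date"].to_numpy()), marker="s", s=600)

ax.set_xticks(range(len(uniques)))
ax.set_xticklabels(uniques)
ax.yaxis.set_major_locator(mdates.DayLocator(interval=1))  # one tick per day
ax.yaxis.set_major_formatter(mdates.DateFormatter("%d.%m.%Y"))
ax.set_xlabel("some_id")
ax.set_ylabel("timestamp")
fig.tight_layout()
```

Compared with the bar version, this avoids the per-row loop, at the cost of markers whose size is fixed in points rather than in data units.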
I have a pandas DataFrame with 5 different columns.
I want to plot them with a different color, label, and marker for each column.
I managed to get a different color and label for each column by passing a list of colors/labels. However, this does not work for the marker. Any idea how to do it?
Here is the code example:
ds # a pandas data frame with 3 columns
list_label=['A','B','C']
list_color=['tab:red','tab:green','tab:blue']
list_marker=['o','s','v']
ds.plot(color=list_color, label=list_label, marker=list_marker)
The last line produces an error like AttributeError: 'Line2D' object has no property 'list_marker'
Plot each Series separately on the same figure so you can specify the correct marker.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(1, 10, (10, 3)))
list_label=['A', 'B', 'C']
list_color=['tab:red', 'tab:green', 'tab:blue']
list_marker=['o', 's', 'v']
fig, ax = plt.subplots()
for i, col in enumerate(df):
    df[col].plot(color=list_color[i], marker=list_marker[i], label=list_label[i], ax=ax)
plt.legend()
plt.show()
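Another possibility, shown here as a sketch rather than as the answerer's method, is to plot the whole frame in one call and then set the marker on each resulting Line2D object. It assumes the labels can be used as column names, which is a change from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

list_label = ["A", "B", "C"]
list_color = ["tab:red", "tab:green", "tab:blue"]
list_marker = ["o", "s", "v"]

# Assumption: the labels double as column names, so no label= argument is needed
df = pd.DataFrame(np.random.randint(1, 10, (10, 3)), columns=list_label)

# A list of colors is accepted by DataFrame.plot, one per column
ax = df.plot(color=list_color)

# Markers are set afterwards, one per line
for line, marker in zip(ax.get_lines(), list_marker):
    line.set_marker(marker)

ax.legend()  # rebuild the legend so its handles show the markers
markers = [line.get_marker() for line in ax.get_lines()]
```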
My code-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.figure()
cols = ['hops','frequency']
data = [[-13,1],[-8,1],[-5,1],[0,2],[2,1],[4,1],[7,1]]
data = np.asarray(data)
indices = np.arange(0,len(data))
plot_data = pd.DataFrame(data, index=indices, columns=cols)
plt.bar(plot_data['hops'].tolist(),plot_data['frequency'].tolist(),width=0.8)
plt.xlim([-20,20])
plt.ylim([0,20])
plt.ylabel('Frequency')
plt.xlabel('Hops')
Output-
My requirements-
I want the graph to have the x-axis scale [-20, 20] and the y-axis scale [0, 18], and the bars should be labelled: for example, the 1st bar should be numbered 1, and so on.
From your comment above, I am assuming this is what you want. You just need to specify the positions at which you want the x-tick labels.
xtcks = [-20, 20]
plt.xticks(np.insert(xtcks, 1, data[:, 0]))
plt.yticks([0, 18])
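If "numbered" instead means writing 1, 2, 3, ... above the bars, one way is Axes.bar_label, which is available in Matplotlib 3.4 and later. This is a sketch under that reading of the question; the padding value is an arbitrary assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.asarray([[-13, 1], [-8, 1], [-5, 1], [0, 2], [2, 1], [4, 1], [7, 1]])
hops, frequency = data[:, 0], data[:, 1]

fig, ax = plt.subplots()
bars = ax.bar(hops, frequency, width=0.8)

# Label each bar with its position in the data: 1 for the first bar, and so on
numbers = [str(i) for i in range(1, len(hops) + 1)]
texts = ax.bar_label(bars, labels=numbers, padding=2)

ax.set_xlim(-20, 20)
ax.set_ylim(0, 18)
ax.set_xlabel("Hops")
ax.set_ylabel("Frequency")
```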
I have a DataFrame that I slice into three subsets, each with 3 to 4 rows of data. I then plot the subsets using Matplotlib.
The problem is that I cannot create a figure in which each subplot shows one row of the sliced DataFrame. For each group, only the last subplot is drawn; the remaining subplots in the group are empty. It looks like the 'r' value is not passed to 'r.plot' for all of the subplots.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
df.key1.iloc[0:3] = 1
df.key1.iloc[3:7] = 2
df.key1.iloc[7:] = 3
df_grouped = df.groupby('key1')
for group_name, group_value in df_grouped:
    rows, columns = group_value.shape
    fig, axes = plt.subplots(rows, 1, sharex=True, sharey=True, figsize=(15,20))
    for i,r in group_value.iterrows():
        r = r[0:columns-1]
        r.plot(kind='bar', fill=False, log=False)
I think you might want what I call df_subset to be summarized in some way, but here's a way to plot each group in its own panel.
# Your Code Setting Up the Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
# .loc avoids chained assignment; label slices include the end label
df.loc[0:2, 'key1'] = 1
df.loc[3:6, 'key1'] = 2
df.loc[7:, 'key1'] = 3
# My Code to Plot in Three Panels
distinct_keys = df['key1'].unique()
fig, axes = plt.subplots(len(distinct_keys), 1, sharex=True, figsize=(3,5))
for i, key in enumerate(distinct_keys):
    df_subset = df[df.key1==key]
    # {maybe insert a line here to summarize df_subset somehow interesting?}
    # plot into this group's panel; without ax=, pandas opens a new figure each time
    df_subset.plot(kind='bar', fill=False, log=False, ax=axes[i])
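The same layout can also be driven directly by groupby, pairing each group with its own Axes. This is a sketch under assumptions: the key column is built with np.repeat instead of the slicing above, and key1 is dropped before plotting so that only A to D appear as bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
# Rows 0-2 get key 1, rows 3-6 get key 2, rows 7-9 get key 3
df["key1"] = np.repeat([1, 2, 3], [3, 4, 3])

groups = df.groupby("key1")
fig, axes = plt.subplots(groups.ngroups, 1, sharex=True, figsize=(5, 8))

# zip pairs each Axes with one (key, sub-frame) tuple from the groupby
for ax, (key, group) in zip(axes, groups):
    group.drop(columns="key1").plot(kind="bar", fill=False, ax=ax,
                                    title=f"key1 = {key}")

fig.tight_layout()
bars_per_panel = [len(ax.patches) for ax in axes]  # rows in group x 4 columns
```

Passing ax= is the important part: it is what tells pandas to draw into an existing panel instead of creating a new figure.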