I have a dataframe of emails that has three columns: From, Message and Received (which is a date format).
I've written the below script to show how many messages there are per month in a bar plot.
But the plot doesn't show and I can't work out why. It's no doubt very simple. Any help understanding why is much appreciated!
Thanks!
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('XXX')
df = df[df['Message'].notna()]
df['Received'] = pd.to_datetime(df['Received'], format='%d/%m/%Y')
df['Received'].groupby(df['Received'].dt.month).count().plot
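A Matplotlib figure (created through pyplot, commonly imported as plt) is not shown until you call plt.show(). It is designed that way so you can create your plot and then modify it as needed before showing or saving.
Note also that the last line of your script only references .plot without calling it. Add parentheses, e.g. .plot(kind='bar'), so the plot is actually drawn.
Also check out plt.savefig().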
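Here is a minimal sketch of the whole flow, assuming a small made-up DataFrame in place of the CSV from the question (the sample rows and the `per_month` name are illustrative only). It writes the figure to an in-memory buffer so it runs without a display; in an interactive session you would call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the CSV in the question
df = pd.DataFrame({
    "From": ["a", "b", "c", "d"],
    "Message": ["hi", None, "hello", "hey"],
    "Received": ["05/01/2020", "17/01/2020", "03/02/2020", "21/02/2020"],
})
df = df[df["Message"].notna()].copy()
df["Received"] = pd.to_datetime(df["Received"], format="%d/%m/%Y")

# Count messages per calendar month
per_month = df["Received"].groupby(df["Received"].dt.month).count()

# Note the parentheses: .plot alone is just an accessor, .plot(...) draws
ax = per_month.plot(kind="bar")
ax.set_xlabel("Month")
ax.set_ylabel("Messages")

# Render the figure; use plt.show() instead in an interactive session
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
```

Grouping on dt.month merges the same month from different years; if that matters, grouping on df['Received'].dt.to_period('M') keeps them apart.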
I am trying to plot a DataFrame returned by get_data_yahoo from pandas_datareader.data, using matplotlib.pyplot in a Python IDE, and I am getting a KeyError for the x-coordinate in prices.plot no matter what I try. Please help!
I have tried this:
import matplotlib.pyplot as plt
from pandas import Series,DataFrame
import pandas_datareader.data as pdweb
import datetime
prices=pdweb.get_data_yahoo(['CVX','XOM','BP'],start=datetime.datetime(2020,2,24),
end=datetime.datetime(2020,3,20))['Adj Close']
prices.plot(x="Date",y=["CVX","XOM","BP"])
plt.imshow()
plt.show()
And I have tried this as well:
prices=DataFrame(prices.to_dict())
prices.plot(x="Timestamp",y=["CVX","XOM","BP"])
plt.imshow()
plt.show()
Please help!
P.S. I am also getting some kind of warning; please explain it if you can :)
The issue is that the Date column isn't an actual column when you import the data. It's an index. So just use:
prices = prices.reset_index()
Before plotting. This will convert the index into a column, and generate a new, integer-labelled index.
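To see the effect without downloading anything, here is a small sketch that assumes a hand-made DataFrame with a DatetimeIndex named Date standing in for the Yahoo data (the price values are invented for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import pandas as pd

# Hypothetical prices standing in for the Yahoo download
dates = pd.date_range("2020-02-24", periods=4, freq="D", name="Date")
prices = pd.DataFrame(
    {
        "CVX": [100.0, 98.5, 97.0, 95.2],
        "XOM": [55.0, 54.1, 53.3, 52.0],
        "BP": [33.0, 32.4, 31.9, 31.0],
    },
    index=dates,
)

# "Date" is the index here, not a column, hence the KeyError
date_is_column_before = "Date" in prices.columns

prices = prices.reset_index()

# After reset_index, "Date" is an ordinary column
date_is_column_after = "Date" in prices.columns

ax = prices.plot(x="Date", y=["CVX", "XOM", "BP"])
```

Alternatively, leaving the index in place and calling prices.plot() with no x argument also puts the dates on the x axis, since pandas plots against the index by default.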
Also, regarding the warnings: pandas emits a lot of them and they can be super annoying! You can turn them off with the warnings module from the Python standard library.
import warnings
warnings.filterwarnings('ignore')
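Ignoring everything also hides warnings you may care about. As a sketch of a narrower option, the same module can silence just one category; FutureWarning is used here purely as an example, since the question does not say which warning was shown.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from "report everything"
    # Silence only one category; others still get through
    warnings.filterwarnings("ignore", category=FutureWarning)

    warnings.warn("this behaviour will change", FutureWarning)
    warnings.warn("something looks odd", UserWarning)

# Only the UserWarning was recorded
categories = [w.category for w in caught]
```

The catch_warnings context manager also restores the previous filter settings on exit, so the change stays local to that block.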
I want the x-axis tick marks to be the different states, i.e. IDLE, Data=Addr, Hammer, etc., that are in column A of the CSV file.
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_csv("Output.csv", index_col = 0)
df1.plot(x = df1.index.values)
I have also tried
df1.plot(xticks = df1.index.values)
without any success.
CSV File
Plot
Thanks in advance!
You may want to try Seaborn, because this looks like a peripheral styling issue in your environment (everything blacked out) rather than a plotting issue.
Once you have installed Seaborn, add the following lines to your code.
import seaborn as sns
sns.set_style("whitegrid")
As a side note, if you want the number of ticks on the x axis to match the number of labels you have, replace your plotting part with the following:
df1.plot()
plt.xticks(range(df1.shape[0]), df1.index)
Hope this helps.
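A self-contained sketch of the tick-labelling approach above, assuming a small made-up DataFrame in place of Output.csv (the Count column and its values are invented):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for Output.csv, with the states as the index
df1 = pd.DataFrame(
    {"Count": [3, 7, 2]},
    index=["IDLE", "Data=Addr", "Hammer"],
)

ax = df1.plot()

# One tick per row, labelled with the index values
plt.xticks(range(df1.shape[0]), df1.index)
```

For categorical states like these, df1.plot(kind='bar') may read better than a line plot, and it labels each bar with its index value on its own.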
I am building a GUI in PySide where I have to keep redrawing pandas.DataFrame objects.
I found out in this simple snippet of code that plotting the pandas DataFrame object df takes much longer to plot than the numpy.array object, despite the fact that the plots are nearly identical. This is too slow for my GUI. Why is this so much slower?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = np.cumsum(np.random.randn(100, 10), axis=0)
df = pd.DataFrame(data)
df.plot() # Compare the speed of this line... (slow)
plt.plot(data) # to this line. (fast)
I like the way that the pandas.DataFrame plots look, especially because in my real example my x-axis is datetime data from pandas. I do not know how to format a matplotlib.pyplot x-axis to look good with datetime data.
How do I speed up pandas.DataFrame plotting?
What I am trying to do is fairly basic; however, I am very new to Python and am having trouble.
Goal: plot the yellow-highlighted row (I highlighted it myself; it will not be highlighted when I actually read the data) on the y-axis against the "Time" column on the x-axis.
Here is a photo of the data, followed by the code that I have tried along with its error.
Code
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
#Reading CSV and converting it to a df(Data_Frame)
df1 = pd.read_csv('Test_Sheet_1.csv', skiprows = 8)
#Creating a list from df1 and labeling it 'Time'
Time = df1['Time']
print(Time)
#Reading CSV and converting it to a df(Data_Frame)
df2 = pd.read_csv('Test_Sheet_1.csv').T
#From here I need to know how to skip 4 lines.
#I need to skip 4 lines AFTER the transposition and then we can plot DID and Time
DID = df2['Parameters']
print(DID)
Error
As you can see from the code, right now I am just trying to print the data so that I can see it; after that I would like to put it onto a graph.
I think I need to use the 'skiplines' function after the transposition so that Python knows where to read the "column" labeled Parameters (it is only a column after the transposition). However, I do not know how to skip lines after the transposition unless I transpose the data into a new Excel document, and that is not an option.
Any help is very much appreciated,
Thank you!
Update
This is the output I get when I add print(df2.columns.tolist())
I am pretty sure this particular problem must have been covered somewhere, but I cannot find it, so I am asking.
I have 66 files with data stored in one single column. I wish to plot all the data in a single plot. I'm used to doing this with bash, where reading and plotting data inside a loop is pretty trivial, but I can't figure out how to do it in Python.
Thanks a lot for your help.
NM
Something like this should do it, although it will depend on how your data files are named.
import matplotlib.pyplot as plt
import numpy as np
fig,ax = plt.subplots()
# Let's say your files are called data-00.txt, data-01.txt etc.
for i in range(66):
    data=np.genfromtxt('data-{:02d}.txt'.format(i))
    ax.plot(data)
fig.savefig('my_fig.png')
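If the file names do not follow a numbered pattern, one option is to collect them with glob instead. The sketch below assumes the files share a .txt extension and sit in one directory; it first writes three small sample files into a temporary directory (hypothetical data) so that it runs on its own.

```python
import glob
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Create a few single-column sample files (stand-ins for the 66 real ones)
tmpdir = tempfile.mkdtemp()
for i in range(3):
    np.savetxt(os.path.join(tmpdir, "data-{:02d}.txt".format(i)), np.arange(5) * (i + 1))

fig, ax = plt.subplots()

# Pick up every matching file, in a stable order
paths = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))
for path in paths:
    data = np.genfromtxt(path)
    ax.plot(data, label=os.path.basename(path))

ax.legend()
out_path = os.path.join(tmpdir, "my_fig.png")
fig.savefig(out_path)
```

With 66 curves the legend will get crowded, so you may prefer to drop the ax.legend() call.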