Plotting dataframes using matplotlib in Python IDE - python

I am trying to plot a dataframe which has been taken from get_data_yahoo attribute in pandas_datareader.data on python IDE using matplotlib.pyplot and I am getting an KeyError for the X-Co-ordinate in prices.plot no matter what I try. Please help!
I have tried this out :-
import matplotlib.pyplot as plt
from pandas import Series,DataFrame
import pandas_datareader.data as pdweb
import datetime
prices=pdweb.get_data_yahoo(['CVX','XOM','BP'],start=datetime.datetime(2020,2,24),
end=datetime.datetime(2020,3,20))['Adj Close']
prices.plot(x="Date",y=["CVX","XOM","BP"])
plt.imshow()
plt.show()
And I have tried this as well:-
prices=DataFrame(prices.to_dict())
prices.plot(x="Timestamp",y=["CVX","XOM","BP"])
plt.imshow()
plt.show()
Please Help...!!
P.S: I am also getting some kind of warning, please explain about it if you could :)

The issue is that the Date column isn't an actual column when you import the data. It's an index. So just use:
prices = prices.reset_index()
Before plotting. This will convert the index into a column, and generate a new, integer-labelled index.
Also, in regards to the warnings, Pandas is full of them and they are super annoying! You can turn them off with the standard python library warnings.
import warnings
warnings.filterwarnings('ignore')

Related

Plotly express doesn't load and refuse to connect

I have this simple program that should display a pie chart, but whenever I run the program, it opens a page on Chrome and just keeps loading without any display, and sometimes it refuses to connect. How do I solve this?
P.S: I would like to use it offline, and I'm running it using cmd on windows10
import pandas as pd
import numpy as np
from datetime import datetime
import plotly.express as px
def graph(dataframe):
figure0 = px.pie(dataframe,values=dataframe['POPULATION'],names=dataframe['CONTINENT'])
figure0.show()
df = pd.DataFrame({'POPULATION':[60,17,9,13,1],'CONTINENT':['Asia','Africa','Europe','Americas','Oceania']})
graph(df)
Disclaimer: I extracted this answer from the OPs question. Answers should not be contained in the question itself.
Answer provided by g_odim_3:
So instead of figure0.show(), I used figure0.write_html('first_figure.html', auto_open=True) and it worked:
import pandas as pd
import numpy as np
from datetime import datetime
import plotly.express as px
def graph(dataframe):
figure0 = px.pie(dataframe,values=dataframe['POPULATION'],names=dataframe['CONTINENT'],title='Global Population')
# figure0.show()
figure0.write_html('first_figure.html', auto_open=True)
df = pd.DataFrame({'POPULATION':[60,17,9,13,1],'CONTINENT':['Asia','Africa','Europe','Americas','Oceania']})
graph(df)
I'm 99% sure that this is a version issue. It's a long time since you needed an internet connection to build Plotly figures. Follow the instructions here on how to upgrade your system. I've tried your exact code on my end at it produces the following plot:
I'm on Plotly 5.2.2. Run import plotly and plotly.__version__ to check the version on your end.

How do I make the ploy show in my df analysis

I have a dataframe of emails that has three columns: From, Message and Received (which is a date format).
I've written the below script to show how many messages there are per month in a bar plot.
But the plot doesn't show and I can't work out why, it's no doubt very simple. Any help understanding why is much appreciated!
Thanks!
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('XXX')
df = df[df['Message'].notna()]
df['Received'] = pd.to_datetime(df['Received'], format='%d/%m/%Y')
df['Received'].groupby(df['Received'].dt.month).count().plot
A pyplot object (commonly plt) is not shown until you call plt.show(). It is designed that way so you can create your plot and then modify it as needed before showing or saving.
Also checkout plt.savefig().

Python - Matplotlib plots incorrect graph when using pandas dataframe

(My first ever StackOverflow question)
I'm trying to plot bitcoin's market-cap against the date using pandas and matplotlib in Python.
Here is my code:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#read in CSV file using Pandas built in method
df = pd.read_csv("btc.csv", index_col=0, parse_dates=True)
Here are some details about the data frame:
dataframe details
matplotlib code:
#Plot marketcap(usd)
plt.plot(df.index, df["marketcap(USD)"])
plt.show()
Result:
Incorrect result
The plot seems to be more like scribbles that seem to move backwards. How could I fix this?
You can plot your Pandas Series "marketcap(USD)" directly using:
df["marketcap(USD)"].plot()
See the Pandas documentation on Basic Plotting

Using Dask with Python causes issues when running Pandas code

I am trying to work with Dask because my dataframe has become large and that pandas by itself can't simply process it. I read my dataset in as follows and get the following result that looks odd, not sure why its not outputting the dataframe:
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
import dask.bag as db
import json
%matplotlib inline
Leads = db.read_text('Leads 6.4.18.txt')
Leads
This returns (instead of my pandas dataframe):
dask.bag<bag-fro..., npartitions=1>
Then when I try to rename a few columns:
Leads_updated = Leads.rename(columns={'Business Type':'Business_Type','Lender
Type':'Lender_Type'})
Leads_updated
I get:
AttributeError: 'Bag' object has no attribute 'rename'
Can someone please explain what I am not doing correctly. The ojective is to just use Dask for all these steps since it is too big for regular Python/Pandas. My understanding is the syntax used under Dask should be the same as Pandas.

Scatter_Matrix Will Not Display Using Pandas and

Working through following the Machine Learning Tutorial:
http://machinelearningmastery.com/machine-learning-in-python-step-by-step/
Specifically, Section 4.2. Unfortunately, my code is throwing an error
NameError: name 'scatter_matrix' is not defined
Here is my code:
import pandas
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
scatter_matrix(dataset)
plt.show()
There's at least one Stack Overflow question on scatter_matrix, but I haven't able to figure out what's missing.
Pandas scatter_matrix - plot categorical variables
You will have to import it like this:
from pandas.plotting import scatter_matrix
Cause you've imported the Pandas. You could use it like below:
pd.scatter_matrix(dataset)
However, pandas.scatter_matrix() is deprecated. use pandas.plotting.scatter_matrix() instead

Categories