same data produces different pandas plot - python

I created a graph using DOGE crypto data:
import pandas as pd
import matplotlib.pyplot as plt
df2 = pd.read_csv("https://raw.githubusercontent.com/peoplecure/pandoras-box/master/doge.csv")
plt.plot(df2['begins_at'], df2['open_price'])
plt.show()
The graph above looks fine. But when I create a graph another way from exactly the same data, it looks totally off:
from pandas import DataFrame
df = DataFrame (DOGE_data)
plt.plot(df['begins_at'], df['open_price'])
plt.show()
Regrettably, I don't have a way to share the data used in the second method. However, the data used in the first graph was created from df. Does anyone have any idea what may be going on here?

The messed-up y-axis could be the hint: with numerical data there are usually 4-12 ticks and labels on the y-axis, whereas with non-numerical data there is one tick for each "category".
Check the data type of y-data in the second dataset: df['open_price'].dtype
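To illustrate the check above, here is a minimal sketch. The DOGE_data records below are made up (the real data was not shared) and assume the prices arrive as strings, which is one common way the same values end up non-numeric; read_csv infers numbers on its own, which would explain why the CSV version plots fine.

```python
import pandas as pd

# Hypothetical stand-in for DOGE_data: prices stored as strings, as can
# happen when records come from an API instead of read_csv.
DOGE_data = [
    {"begins_at": "2021-05-01T00:00:00Z", "open_price": "0.3371"},
    {"begins_at": "2021-05-02T00:00:00Z", "open_price": "0.3908"},
    {"begins_at": "2021-05-03T00:00:00Z", "open_price": "0.3742"},
]
df = pd.DataFrame(DOGE_data)

# A non-numeric dtype here means matplotlib treats each value as a category.
dtype_before = df["open_price"].dtype

# Convert to numbers; errors="coerce" turns unparseable entries into NaN.
df["open_price"] = pd.to_numeric(df["open_price"], errors="coerce")
df["begins_at"] = pd.to_datetime(df["begins_at"])
dtype_after = df["open_price"].dtype

print(dtype_before, "->", dtype_after)
```

After the conversion, plt.plot(df['begins_at'], df['open_price']) should give a normal numeric y-axis.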

Related

Plotting histogram for all features with titles

I wrote the following code to plot histograms of all the features in my dataframe dff. My code snippet:
import matplotlib.pyplot as plt
dff.head()
for i in dff.columns:
    plt.figure()
    plt.hist(dff[i])
However, the histograms just get plotted without the feature name / column name. Is there a way I could also print the column name below each of the distribution charts, so that I can tell which distribution corresponds to which column?
This is the first time I've actually had the chance to answer a question.
If you add this line, it will print the column name as the title; you can add other text to it as well. This is a super useful little function:
plt.title(f'{i}')
Here it is in your code:
dff.head()
for i in dff.columns:
    plt.figure()
    plt.title(f'{i}')
    plt.hist(dff[i])
*edited this as I put the title in the wrong place originally
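Since the question asks for the name below each chart, a variation on the answer above is to also set the x-axis label. This is only a sketch: the dff here is a made-up stand-in with two numeric columns, and the figures dict exists just to keep a handle on each figure.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for dff with two numeric columns.
rng = np.random.default_rng(0)
dff = pd.DataFrame({
    "height": rng.normal(170, 10, 200),
    "weight": rng.normal(70, 8, 200),
})

figures = {}
for col in dff.columns:
    fig, ax = plt.subplots()   # one new figure per column
    ax.hist(dff[col])
    ax.set_title(col)          # column name above the chart
    ax.set_xlabel(col)         # column name below the chart
    figures[col] = fig
```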

How can I loop through a list of elements and create time series plots in Python

Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've used pandas to try grouping by well name and then plotting, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.
So, if I understand your question correctly, you want one plot for each combination of Well and Parameter. No subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that, since your data has just one date per well per parameter, the plots are just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({'WellName':['A','A','A','A','B','B','C','C','C'],
'SampleDate':['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
'Parameter':['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
'Value':[0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
})
for well in df.WellName.unique():
    temp1 = df[df.WellName==well]
    for param in temp1.Parameter.unique():
        fig = plt.figure()
        temp2 = temp1[temp1.Parameter==param]
        plt.scatter(temp2.SampleDate,temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well,param))
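The same loop can also be written with a single groupby over both columns. The sketch below is an alternative, not the answer's method: it uses made-up data with two dates per well and parameter (so lines are actually visible) and parses SampleDate into real datetimes so the x-axis is spaced by time.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: two sample dates per well/parameter combination.
df = pd.DataFrame({
    "WellName":   ["A", "A", "A", "A", "B", "B"],
    "SampleDate": ["2018-02-15", "2018-11-14", "2018-02-15",
                   "2018-11-14", "2018-03-31", "2018-11-14"],
    "Parameter":  ["Arsenic", "Arsenic", "Lead", "Lead", "Arsenic", "Arsenic"],
    "Value":      [0.2, 0.3, 1.6, 2.7, 0.3, 0.79],
})
df["SampleDate"] = pd.to_datetime(df["SampleDate"])

titles = []
# Grouping by both columns yields one (well, param) key per combination.
for (well, param), group in df.groupby(["WellName", "Parameter"]):
    group = group.sort_values("SampleDate")
    fig, ax = plt.subplots()               # a separate figure, no subplots
    ax.plot(group["SampleDate"], group["Value"], marker="o")
    ax.set_title(f"Well {well} and Parameter {param}")
    ax.set_xlabel("SampleDate")
    ax.set_ylabel("Value")
    fig.autofmt_xdate()                    # tilt date labels for readability
    titles.append(ax.get_title())
```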

how to make small multiple box plots with long data frame in python

I have a long data frame like the simplified sample below:
import pandas as pd
import numpy as np
data={'nm':['A','B']*12,'var':['vol','vol','ratio','ratio','price','price']*4,'value':np.random.randn(24)}
sample=pd.DataFrame(data)
sample
I wish to create small-multiple box plots using var as the facet, nm as the category and value as the value. How can I do so using matplotlib or seaborn? I've searched for similar code, but the examples looked complex.
Perhaps you can start with seaborn's catplot:
import seaborn as sns
sns.catplot(x='nm', y='value', col='var', kind='box', data=sample)
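If you would rather stay with plain matplotlib, roughly the same layout can be sketched by hand: one subplot per var, one box per nm. This is an assumption-laden sketch that rebuilds the sample frame from the question with a seeded random generator; the figure size and titles are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same shape as the question's sample, with a seeded generator.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "nm": ["A", "B"] * 12,
    "var": ["vol", "vol", "ratio", "ratio", "price", "price"] * 4,
    "value": rng.standard_normal(24),
})

facets = sorted(sample["var"].unique())
fig, axes = plt.subplots(1, len(facets), figsize=(9, 3), sharey=True)
for ax, facet in zip(axes, facets):
    subset = sample[sample["var"] == facet]
    names = sorted(subset["nm"].unique())
    # One array of values per category inside this facet.
    ax.boxplot([subset.loc[subset["nm"] == n, "value"].to_numpy() for n in names])
    # boxplot places the boxes at positions 1..n by default.
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_title(f"var = {facet}")
axes[0].set_ylabel("value")
fig.tight_layout()
```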

How do I "reset the index" for a matplotlib plot?

I have the following code:
fig, ax = plt.subplots(1, 1)
calls["2016-12-24"].resample("1h").sum().plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().plot(ax=ax)
which generates the following image:
How can I make this so the lines share the x-axis? In other words, how do I make them not switch days?
If you don't care about using the correct datetime as index, you could just reset the index as you suggested for all the series. This is going to overlap all the time series, if this is what you're trying to achieve.
# the below should
calls["2016-12-24"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
Otherwise, you could try to resample the three columns at the same time. Have a go with the below, but without knowing what your original dataframe looks like, I'm not sure this will fit your case. You should post some more information about the input dataframe.
# have a try with the below
calls[["2016-12-24","2016-12-25","2016-12-26"]].resample('1h').sum().plot()
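For the first approach (overlapping the days on one x-axis), here is a minimal sketch. It assumes calls is a Series of counts with a DatetimeIndex, which is a guess since the input was not posted; the data below is randomly generated. Instead of dropping the index entirely, it replaces it with the hour of day, so the x-axis stays meaningful.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for calls: one count every 10 minutes over three days.
idx = pd.date_range("2016-12-24", "2016-12-26 23:50", freq="10min")
rng = np.random.default_rng(0)
calls = pd.Series(rng.poisson(5, len(idx)), index=idx)

fig, ax = plt.subplots()
for day in ["2016-12-24", "2016-12-25", "2016-12-26"]:
    hourly = calls.loc[day].resample("1h").sum()
    # Swap the timestamps for the hour of day (0-23) so that every day
    # is drawn against the same x positions.
    hourly.index = hourly.index.hour
    hourly.plot(ax=ax, label=day)
ax.set_xlabel("hour of day")
ax.legend()
```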

How to output a large number of histograms in a pandas groupby

df is a dataframe with a days column. There are 100 days. I want to look at a histogram for my data column for each of the 100 days. The problem is that this code outputs everything on a single chart and all histograms are stacked together. Two questions:
Any advice to get one histogram for each day?
Any advice to save each histogram to an appropriately named file?
Note: When I replace hist in my code below with describe, it perfectly gives me 100 describe series. Also, the type of the grouper.get_group(days) object is pandas.series.
My simple code:
grouper = df.groupby('days')['data']
for days in grouper.groups.keys():
    print(grouper.get_group(days).hist())
One option would be to use inline plotting either in ipython qtconsole or ipython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
for days in grouper.groups.keys():
    grouper.get_group(days).hist()
    plt.show()
Actually, if you are using the IPython notebook, you can simply do:
df.groupby('days')['data'].hist()
Any method called on the groupby object is applied to every group; this is the strength of the groupby function.
No need to iterate.
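As for the second question (saving each histogram to an appropriately named file), iterating over the groupby directly gives both the day and its data. This is a sketch under assumptions: df is replaced by made-up data with three days, and the temporary output folder and the hist_day_<day>.png naming scheme are hypothetical choices.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for df: three days with 50 values each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "days": np.repeat([1, 2, 3], 50),
    "data": rng.standard_normal(150),
})

out_dir = tempfile.mkdtemp()   # hypothetical output folder
saved = []
# Iterating a grouped column yields (group key, Series) pairs.
for day, values in df.groupby("days")["data"]:
    fig, ax = plt.subplots()   # a fresh figure for each day
    values.hist(ax=ax)
    ax.set_title(f"day {day}")
    path = os.path.join(out_dir, f"hist_day_{day}.png")
    fig.savefig(path)
    plt.close(fig)             # release the figure; helps with 100 days
    saved.append(path)
```

Creating a new figure inside the loop is what keeps the histograms from stacking on a single chart.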
