I have data from two sensors that I want to visualize. Both sensors take only 0/1 values. How can I change the xaxis labels to show the time series and y axis should have 2 labels 0 and 1 representing the value of sensors along the time series.
import pandas as pd
import matplotlib.pyplot as plt
def drawgraph(inputFile):
df=pd.read_csv(inputFile)
fig=plt.figure()
ax=fig.add_subplot(111)
y = df[['sensor1']]
x=df.index
plt.plot(x,y)
plt.show()
You should have explained what you tried before asking a question for this to be meaningful. Anyway, below is the example.
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
trange = pd.date_range("11:00", "21:30", freq="30min")
df = pd.DataFrame({'time':trange,'sensor1':np.round(np.random.rand(len(trange))),\
'sensor2':np.round(np.random.rand(len(trange)))})
df = df.set_index('time')
df.plot(yticks=[0,1],ylim=[-0.1,1.1],style={'sensor1':'ro','sensor2':'bx'})
Related
I have a dummy dataset, df:
Demand WTP
0 13.0 111.3
1 443.9 152.9
2 419.6 98.2
3 295.9 625.5
4 150.2 210.4
I would like to plot this data as a step function in which the "WTP" are y-values and "Demand" are x-values.
The step curve should start with from the row with the lowest value in "WTP", and then increase gradually with the corresponding x-values from "Demand". However, I can't get the x-values to be cumulative, and instead, my plot becomes this:
I'm trying to get something that looks like this:
but instead of a proportion along the y-axis, I want the actual values from my dataset:
This is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
Demand_quantity = pd.Series([13, 443.9, 419.6, 295.9, 150.2])
Demand_WTP = [111.3, 152.9, 98.2, 625.5, 210.4]
demand_data = {'Demand':Demand_quantity, 'WTP':Demand_WTP}
Demand = pd.DataFrame(demand_data)
Demand.sort_values(by = 'WTP', axis = 0, inplace = True)
print(Demand)
# sns.ecdfplot(data = Demand_WTP, x = Demand_quantity, stat = 'count')
plt.step(Demand['Demand'], Demand['WTP'], label='pre (default)')
plt.legend(title='Parameter where:')
plt.title('plt.step(where=...)')
plt.show()
You can try:
import matplotlib.pyplot as plt
import pandas as pd
df=pd.DataFrame({"Demand":[13, 443.9, 419.6, 295.9, 150.2],"WTP":[111.3, 152.9, 98.2, 625.5, 210.4]})
df=df.sort_values(by=["Demand"])
plt.step(df.Demand,df.WTP)
But I am not really sure about what you want to do. If the x-values are the df.Demand, than the dataframe should be sorted according to this column.
If you want to cumulate the x-values, than try to use numpy.cumsum:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df=pd.DataFrame({"Demand":[13, 443.9, 419.6, 295.9, 150.2],"WTP":[111.3, 152.9, 98.2, 625.5, 210.4]})
df=df.sort_values(by=["WTP"])
plt.step(np.cumsum(df.Demand),df.WTP)
How to plot the Outliers with Box plot for the below data
no,store_id,revenue,profit,state,country
0,101,779183,281257,WD,India
1,101,144829,838451,WD,India
2,101,766465,757565,AL,Japan
Code is below, code is there till converting data to standardscalar any can choose minmaxscalar. After that How to define Quartile range to define outliers
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
df = pd.read_csv(r'anomaly.csv',index_col=False);
df1 = pd.get_dummies(data=df)
df2 = StandardScaler().fit_transform(df1)
Box and whisker plots display the 25th and 75th percentiles of the data by convention.
This is calculated automatically using the medians of the data you provided.
For example, for the following data:
no,store_id,revenue,profit,state,country
0,101,779183,281257,WD,India
1,101,144829,838451,WD,India
2,101,766465,757565,AL,Japan
2,101,1000000,757565,AL,Italy
You can display the boxplot as follows for the revenue column:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
df = pd.read_csv(r'anomaly.csv',index_col=False)
df1 = pd.get_dummies(data=df)
df2 = StandardScaler().fit_transform(df1)
green_diamond = dict(markerfacecolor='g', marker='D')
fig1, ax1 = plt.subplots()
ax1.set_title('Box plot')
ax1.boxplot(df['revenue'], flierprops=green_diamond)
plt.show()
The outlier is displayed:
This is a very straightforward question. I have and x axis of years and a y axis of numbers increasing linearly by 100. When plotting this with pandas and matplotlib I am given a graph that does not represent the data whatsoever. I need some help to figure this out because it is such a small amount of code:
The CSV is as follows:
A,B
2012,100
2013,200
2014,300
2015,400
2016,500
2017,600
2018,700
2012,800
2013,900
2014,1000
2015,1100
2016,1200
2017,1300
2018,1400
The Code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("CSV/DSNY.csv")
data.set_index("A", inplace=True)
data.plot()
plt.show()
The graph this yields is:
It is clearly very inconsistent with the data - any suggestions?
The default behaviour of matplotlib/pandas is to draw a line between successive data points, and not to mark each data point with a symbol.
Fix: change data.plot() to data.plot(style='o'), or df.plot(marker='o', linewidth=0).
Result:
All you need is sort A before plotting.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("CSV/DSNY.csv").reset_index()
data = data.sort_values('A')
data.set_index("A", inplace=True)
data.plot()
plt.show()
i want to convert that dataframe
into this dataframe and plot a matplotlib graph using date along x axis
changed dataframe
Use df.T.plot(kind='bar'):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame.from_csv('./housing_price_index_2010-11_100.csv')
df.T.plot(kind='bar')
plt.show()
you can also assign the transpose to a new variable and plot that (what you asked in the comment):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame.from_csv('./housing_price_index_2010-11_100.csv')
df_transposed = df.T
df_transposed.plot(kind='bar')
plt.show()
both result the same:
suppose I want to plot 2 histogram subplots on the same window in python, one below the next. The data from these histograms will be read from a file containing a table with attributes A and B.
In the same window, I need a plot of A vs the number of each A and a plot of B vs the number of each B - directly below the plot of A. so suppose the attributes were height and weight, then we'd have a graph of height and number of people with said height and below it a separate graph of weight and number of people with said weight.
import numpy as np; import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
frame = pd.read_csv('data.data', header=None)
subplot.hist(frame['A'], frame['A.count()'])
subplot.hist(frame['B'], frame['B.count()'])
Thanks for any help!
Using pandas you can make histograms like this:
import numpy as np; import pandas as pd
import matplotlib.pyplot as plt
frame = pd.read_csv('data.csv')
frame.hist(layout = (2,1))
plt.show()
I'm confused by the second part of the question. Do you want four separate subplots?
You can do this:
import numpy as np
import numpy.random
import pandas as pd
import matplotlib.pyplot as plt
#df = pd.read_csv('data.data', header=None)
df = pd.DataFrame({'A': numpy.random.random_integers(0,10,30),
'B': numpy.random.random_integers(0,10,30)})
print df['A']
ax1 = plt.subplot(211)
ax1.set_title('A')
ax1.set_ylabel('number of people')
ax1.set_xlabel('height')
ax2 = plt.subplot(212)
ax2.set_title('B')
ax2.set_ylabel('number of people')
ax2.set_xlabel('weight')
ax1.hist(df['A'])
ax2.hist(df['B'])
plt.tight_layout()
plt.show()