How to remove a series from box-plot data? - python

Hello, I'm trying to solve this exercise with Python and seaborn: Use "seaborn" to create box plots to represent the number of pieces per decade. We will not use the decade of the 40s because it only contains one year.
The decades run from 1940 to 2010, and I would like to know how to remove the first decade (1940) from my boxplot.
Here is what I did:
piecesDecade = sns.boxplot(x = "decade", y ="pieces" , data = lego)
but I don't know how to exclude the first decade!
Here is the output of lego:

You can just filter out the decades:
sns.boxplot(x = "decade", y ="pieces" , data = lego[lego['year'] > 1949])
# or data = lego[lego['decade'] != '1940s']
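The two filters above can be checked on a small frame before plotting. This is a minimal sketch: the sample rows and the way the `decade` label is built are assumptions for illustration, since the real `lego` data is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the `lego` DataFrame from the question.
lego = pd.DataFrame({
    "year":   [1949, 1955, 1958, 1962, 1967, 1971],
    "pieces": [12, 40, 55, 80, 120, 150],
})
# Assumed labelling scheme: 1955 -> "1950s".
lego["decade"] = (lego["year"] // 10 * 10).astype(str) + "s"

# Two equivalent ways to drop the 1940s before plotting.
by_year = lego[lego["year"] > 1949]
by_label = lego[lego["decade"] != "1940s"]

print(sorted(by_year["decade"].unique()))
# The filtered frame is what would be passed to seaborn:
# sns.boxplot(x="decade", y="pieces", data=by_year)
```

If `decade` is stored as a pandas categorical, the empty "1940s" category may still appear on the axis; calling `.cat.remove_unused_categories()` on the filtered column should take care of that.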

Related

How to limit the 'data' in seaborn/sns.lineplot according to values of certain rows like datetime or location?

I have a rainfall dataset covering 1970 to 2019, with Location, Y, M, D columns.
I can plot that with
ax = sns.lineplot(x="Month", y="Rainfall", hue="Year", data=df)
But I want the plot to be limited to certain years or locations, because if I plot all the years with hue, it becomes a mess.
Something like this:
ax = sns.lineplot(x="Month", y="Rainfall", hue="Dayofyear", data=df[(df.Station == 'Dhaka') & (df.Year == 1970)])
but when I do that, nothing happens. This is the output: https://i.stack.imgur.com/4KJmE.png
But when I set it like this (df.Year >= 1977), I get an output.
ax = sns.lineplot(x="Month", y="Rainfall", hue="Dayofyear", data=df[(df.Station == 'Dhaka') & (df.Year >= 1977)])
Like this : https://i.stack.imgur.com/kwWUU.png
Is there an easy way to specify the 'data' here? For example, I want to show a range of years such as 1970 < year < 1999.
IIUC use:
data=df[(df.Station == 'Dhaka') & (df.Year.between(1970, 1999, inclusive='neither'))]
# inclusive='neither' needs pandas 1.3 or later; older versions used inclusive=False
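As a rough illustration of the range filter, here is a small self-contained sketch. The sample rows are made up; only the column names `Station`, `Year` and `Rainfall` come from the question.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is not shown in the question.
df = pd.DataFrame({
    "Station": ["Dhaka", "Dhaka", "Dhaka", "Sylhet", "Dhaka"],
    "Year": [1970, 1971, 1998, 1980, 1999],
    "Rainfall": [10.0, 12.5, 8.0, 20.0, 15.0],
})

# Strictly between 1970 and 1999 (both end points excluded).
mask = (df["Station"] == "Dhaka") & df["Year"].between(1970, 1999, inclusive="neither")
subset = df[mask]

# Plain comparisons give the same rows on any pandas version.
same = df[(df["Station"] == "Dhaka") & (df["Year"] > 1970) & (df["Year"] < 1999)]

print(subset["Year"].tolist())
```

Either `subset` or `same` can then be passed as `data=` to `sns.lineplot`.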

filling in columns with info from other file based on condition

So there are 2 CSV files I'm working with:
file 1:
City KWR1 KWR2 KWR3
Killeen
Killeen
Houston
Whatever
file2:
location link reviews
Killeen www.example.com 300
Killeen www.differentexample.com 200
Killeen www.example3.com 100
Killeen www.extraexample.com 20
Here's what I'm trying to make this code do:
Look at the 'City' in file 1, take the top 3 links for it in file 2 (you can assume the cities won't get mixed up), and then put these top 3 into the KWR1, KWR2 and KWR3 columns for all rows with the same 'City' value.
So it gets the top 3 and then just copies them to the right of all the same 'City' values.
Even asking this question correctly is difficult for me; I hope I've provided enough information.
I know how to read the files in with pandas and all that, I just can't code this exact situation.
It is a slightly unusual requirement, but I think you need three steps:
1. Keep only the first three values you actually need.
df = df.sort_values(by='reviews',ascending=False).groupby('location').head(3).reset_index()
This should keep only the top three rows for every city.
2. Label your data. There might be better ways to do this, but here is one: assign a new column of running numbers and map it to labels with a user-defined function.
import numpy as np
df['nums'] = np.arange(len(df))
Now you have a column full of numbers (kind of like line numbers)
You create your function then that will label your data...
def my_func(index):
    if index % 3 == 0:
        x = 'KWR' + str(1)
    elif index % 3 == 1:
        x = 'KWR' + str(2)
    elif index % 3 == 2:
        x = 'KWR' + str(3)
    return x
You can then create the labels you need:
df['labels'] = df.nums.apply(my_func)
Then you can do:
my_df = pd.pivot_table(df, values='link', index=['location'], columns='labels', aggfunc='first').reset_index()
Which literally pulls out the labels (pivots) and puts the values in to the right places.
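A running counter over the whole frame only lines up with the labels when every city has exactly three rows and they sit next to each other. One possible variation, sketched below, numbers the rows within each location using `groupby(...).cumcount()` and then merges the result back onto file 1. The sample frames, the Houston rows and the names `cities`, `links`, `top3`, `wide` and `result` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for file 1 and file 2 from the question.
cities = pd.DataFrame({"City": ["Killeen", "Killeen", "Houston"]})
links = pd.DataFrame({
    "location": ["Killeen"] * 4 + ["Houston"] * 2,
    "link": ["www.example.com", "www.differentexample.com", "www.example3.com",
             "www.extraexample.com", "www.houston1.com", "www.houston2.com"],
    "reviews": [300, 200, 100, 20, 50, 70],
})

# Keep the three most-reviewed links per location.
top3 = (links.sort_values("reviews", ascending=False)
             .groupby("location")
             .head(3)
             .copy())

# Rank within each location: 1, 2, 3 -> KWR1, KWR2, KWR3.
rank = top3.groupby("location").cumcount() + 1
top3["label"] = "KWR" + rank.astype(str)

# One row per location, one column per label, holding the link text.
wide = top3.pivot(index="location", columns="label", values="link").reset_index()

# Copy the links next to every matching City row.
result = (cities.merge(wide, left_on="City", right_on="location", how="left")
                .drop(columns="location"))
print(result)
```

Cities with fewer than three links end up with missing values in the remaining KWR columns, and cities that do not appear in file 2 at all get missing values throughout.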

Why matplotlib draws me the new graphic superimposing the old one?

I'm working on a Django project and using the matplotlib library. I have created a filter where you can choose the day and the "node" that you want to graph, and with this information a Python script is executed that uses pandas and matplotlib to create a graph.
The values of "node" and "day" arrive correctly at the script, and it generates the graph well. The only problem is that instead of overwriting the old image (with the previous graph), it draws the new lines on top of it. Below is an image of how it looks.
As you can see, each line corresponds to a different day, because the different tests I have run have been overlapping. Can anyone tell me where I am going wrong?
The code is below:
def bateria2(node, day):
    csv_path = os.path.join(os.path.dirname(__file__), '..\\data\\csv\\dataframe.csv')
    df = pd.read_csv(csv_path)
    mes, anyo = 12, 2019
    new_df = df[(df['Dia'] == day) & (df['Mes'] == mes) & (df['Año'] == anyo) & (df['Node name'] == node)]
    if len(new_df) > 0:
        #os.remove('static\\img\\bateria2.png')
        x = new_df['Hora[UTC]'].tolist()
        y = new_df['Bateria'].tolist()
        title = 'Carga/Descarga de la batería día '+str(day)+'/'+str(mes)+'/'+str(anyo)+' de '+str(node)
        plt.title(title)
        plt.xlabel('Hora [UTC]')
        plt.ylabel('Batería')
        #plt.legend((y)(node))
        plt.plot(x,y)
        plt.xticks(x, rotation='vertical')
        plt.savefig('static\\img\\bateria2.png',transparent=True)
        return 1
    else:
        return 0
Basically, I access the .csv file that contains the data and filter it down to the rows I want. If the resulting dataframe has data, I create the graph and save it.
Regards, and thank you very much.
Try to clear the current figure, plt.clf() after your savefig command. This should keep your plots from stacking up on top of each other.
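The stacking happens because pyplot keeps drawing on the same current figure between calls in a long-running process such as a web server. Besides `plt.clf()`, another common pattern is to create a fresh figure on every call and close it after saving. The sketch below assumes a hypothetical helper `render` and writes to an in-memory buffer instead of the project's image path.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, as on a web server
import matplotlib.pyplot as plt


def render(x, y, title):
    """Draw one line chart and return (PNG bytes, number of lines drawn)."""
    fig, ax = plt.subplots()   # fresh figure for every call
    ax.plot(x, y)
    ax.set_title(title)
    n_lines = len(ax.lines)    # lines present on this figure only
    buf = io.BytesIO()
    fig.savefig(buf, format="png", transparent=True)
    plt.close(fig)             # release the figure so nothing carries over
    return buf.getvalue(), n_lines


png_a, lines_a = render([0, 1, 2], [90, 85, 80], "day 1")
png_b, lines_b = render([0, 1, 2], [70, 75, 72], "day 2")
print(lines_a, lines_b)
```

Each call reports a single line, so the second chart does not inherit the first one. Closing the figure also keeps memory from growing as requests accumulate.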

How to get actual data points from a graph using python?

I have made a graph of stock data using the fbprophet module in Python. My graph looks like this:
The code I'm using is this:
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=365) # forecasting for 1 year from now.
forecast = model.predict(future)
''' Plotting the forecast '''
figure = model.plot(forecast)
plt.plot = figure
figure.savefig('forecasting for 1 year.svg')
That code produced the graph. Then I extracted the data points from it using the mpld3 module:
import mpld3
# print(mpld3.fig_to_dict(figure))
print(mpld3.fig_to_dict(figure)['data'])
It gives me output like this:
{'data01': [[734094.0, 3.3773930153824794], [734095.0, 3.379438304627263], ........ 'data03': [[0.0, 0.0]]}
The problem is that in the above output the y values are correct, but the x values are not. The actual x values look like this:
"x": [
"2010-11-18 00:00:00",
"2010-11-19 00:00:00",
"2010-11-22 00:00:00" ... ]
but I'm getting x values like this: 734094.0, 734095.0, ...
So how can I get the actual data (the x and y values of the data points) from the graph?
Is there any other way to do it? I want to extract the data points from the graph and then send them from a Flask API to the UI (Angular 4).
Thanks in advance!
734094 / 365.25 = 2009.8398. That's a very suggestive number for a date that, from your example, I assume is 2010-11-18. It looks like your date information is expressed as a floating-point number in which a difference of 1.0 corresponds to one day, and the reference date is January 1 of the year 1 AD.
You could try to write a function that counts from 01-01-1, or maybe you could find one in a library. Alternately, you could look at the converted value for a date you know, and work from there.
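The standard library already has such a function: `datetime.date.fromordinal` counts days with day 1 being 0001-01-01. A minimal sketch, assuming the x values are ordinals of that kind (which is how older matplotlib versions numbered dates; newer versions default to a 1970 epoch, so this would not apply there). The helper name `ordinal_to_date` is hypothetical.

```python
from datetime import date


def ordinal_to_date(value):
    """Convert a day count where 1.0 is 0001-01-01 into a date.

    Any fractional part (time of day) is dropped.
    """
    return date.fromordinal(int(value))


xs = [734094.0, 734095.0]
print([ordinal_to_date(x).isoformat() for x in xs])
# → ['2010-11-18', '2010-11-19']
```

The first value maps to 2010-11-18, which matches the expected x values in the question.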

Matplotlib Spyder3.2.6 Plotting a lot of information and creating space

I am just starting out with Spyder and doing some simple data analysis. I have some census data that I have filtered. The data is pretty large (32k entries). As you will see, I have filtered the census data into age and hours per week. But when I went to plot it, the points were really scrunched together. I have been searching the internet for a way to separate the values, but I keep coming up short. Any help would be great! Thanks
Picture of Plot
Data information
df = pd.read_csv('adult.data.csv', header=None, delimiter=',')
native_country = np.array(df[13])
united_states = native_country[0]
native_country_us = df.loc[(df[13] == united_states)]
native_country_us_hours_per_week = np.array(native_country_us[12])
native_country_us_age = np.array(native_country_us[0])
plt.plot(native_country_us_hours_per_week, native_country_us_age, "go")
plt.xlabel('Hours per week')
plt.ylabel('Age of US Citizen')
plt.title('Hours Per Week US Citizen Works')
plt.show()
I don't see you plugging the Y values into the plot.
try:
plt.plot(native_country_us_hours_per_week, native_country_us_age , "go")
