I'm very new to these libraries and I'm having trouble plotting this:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import random
df5 = pd.read_csv('../../../../datos/tiempos-exacto-variando-n-m0.csv', sep=', ', engine='python')
print(df5)
df5['n'] = df5['n'].apply(lambda x: x**2)
sns.jointplot(df5['n'], df5['tiempoTotal'], kind="reg")
sns.plt.show()
And I'm getting this output:
n m tiempoTotal
0 1 0 2274
1 2 0 3370
2 3 0 5709
3 4 0 8959
4 5 0 13354
5 6 0 18503
6 7 0 26329
7 8 0 33859
8 9 0 41110
9 10 0 52710
10 11 0 64364
11 12 0 74142
12 13 0 81072
13 14 0 69332
14 15 0 71027
15 16 0 89721
16 17 0 85459
17 18 0 95217
18 19 0 119210
19 20 0 136888
20 21 0 131903
21 22 0 138395
22 23 0 151222
23 24 0 163542
24 25 0 177236
25 26 0 192475
26 27 0 240162
27 28 0 260701
28 29 0 235752
29 30 0 250835
.. ... .. ...
580 581 0 88306854
581 582 0 89276420
582 583 0 87457875
583 584 0 90807004
584 585 0 87790003
585 586 0 89821530
586 587 0 89486585
587 588 0 88496901
588 589 0 89090661
589 590 0 89110803
590 591 0 90397942
591 592 0 94029839
592 593 0 92749859
593 594 0 105991135
594 595 0 95383921
595 596 0 105155207
596 597 0 114193414
597 598 0 98108892
598 599 0 97888966
599 600 0 103802453
600 601 0 97249346
601 602 0 101917488
602 603 0 104943847
603 604 0 98966140
604 605 0 97924262
605 606 0 97379587
606 607 0 97518808
607 608 0 99839892
608 609 0 100046492
609 610 0 103857464
[610 rows x 3 columns]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-63146953b89d> in <module>()
9 df5['n'] = df5['n'].apply(lambda x: x**2)
10 sns.jointplot(df5['n'], df5['tiempoTotal'], kind="reg")
---> 11 sns.plt.show()
AttributeError: 'module' object has no attribute 'plt'
I'm running this in my Jupyter Notebook with Python 2.7.12. Any ideas?
sns.plt.show() works fine for me using seaborn 0.7.1, so this may be different in other versions. However, since you import matplotlib.pyplot as plt anyway, you may as well simply use plt.show(); sns.plt.show() only works because pyplot happens to be available inside the seaborn namespace.
I ran into this issue as well with Seaborn 0.8.1. It turns out that calling sns.plt.show() was bad practice, and the fact that it worked at all was a bug, which the developers fixed. Unfortunately, many tutorials still advise using sns.plt.show(). This is how I solved it:
1. Import plt directly: import matplotlib.pyplot as plt
2. Before you plot anything, set the default aesthetic parameters with sns.set(). This is important, because otherwise you won't get the Seaborn palettes.
3. Replace all calls to sns.plt with plt.
As of Seaborn 0.8.1, sns.plt.plot() raises the error module 'seaborn' has no attribute 'plt'.
sns.plot() also raises an error; these methods are not in Seaborn's API.
Dropping the “sns.” to leave “plt.plot()” (as other answers suggest) does work, and the result only looks like a Seaborn plot because we've called sns.set() earlier in the script. In other words, Seaborn is making an aesthetic change; Matplotlib is still the library that does the plotting, via its plt.plot() method.
This script shows sns.set() in action: if you follow the comments and move sns.set() to different locations in the script, the appearance of the subplots changes. They look like Seaborn plots, but Matplotlib is doing the plotting.
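The script referred to here is not included in the thread; the following is only a minimal sketch of the same idea, not the original author's script. It relies on Matplotlib reading style settings such as the axes background color when an Axes is created, so axes made before sns.set() keep the plain look and axes made after it get the Seaborn theme. The plt.rcdefaults() call is an addition that simply guarantees a plain starting point.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

plt.rcdefaults()  # start from plain Matplotlib defaults
x = np.linspace(0, 10, 100)

# Axes created BEFORE sns.set(): plain Matplotlib look
fig1, ax_before = plt.subplots()
ax_before.plot(x, np.sin(x))

sns.set()  # move this line above fig1 and both figures get the Seaborn look

# Axes created AFTER sns.set(): the same Matplotlib calls, now themed
fig2, ax_after = plt.subplots()
ax_after.plot(x, np.sin(x))

plt.show()
```

In both figures the drawing is done by Matplotlib's ax.plot(); only the rcParams that sns.set() changed differ.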
Seaborn does of course have many plotting functions of its own (such as sns.boxplot(), sns.violinplot(), etc.), but there is no longer an sns.plt.plot().
I can confirm that I got the same error using Jupyter inside Anaconda (Feb 2018). I got the code from here, but the error occurred. It turns out that I simply needed to add
import matplotlib.pyplot as plt
on top of
import seaborn as sns
and it works just fine using plt.show() instead of sns.plt.show().
Make sure you have updated your Python environment as well as IDEs and distributions such as Anaconda.
For example, I kept getting errors in Spyder (hosted under Anaconda) with relplot and catplot until I updated both Anaconda and seaborn (to 0.9.0). Those two functions were introduced in seaborn 0.9.0, so older versions do not have them.
Updating via the Anaconda command line should be straightforward, as it was in my case.
Related
I'm plotting a DataFrame of binned values using seaborn:
dist_to_next_melt = pd.melt(pd.DataFrame(dist_to_next))
dist_to_next_melt["bins"] = pd.qcut(dist_to_next_melt.index, 10)
print(dist_to_next_melt)
variable value bins
0 0 1 (-0.001, 91.7]
1 0 24 (-0.001, 91.7]
2 0 5 (-0.001, 91.7]
3 0 74 (-0.001, 91.7]
4 0 110 (-0.001, 91.7]
.. ... ... ...
913 0 290 (825.3, 917.0]
914 0 6 (825.3, 917.0]
915 0 15 (825.3, 917.0]
916 0 71 (825.3, 917.0]
917 0 0 (825.3, 917.0]
[918 rows x 3 columns]
(I can put the whole df in a pastebin if it seems relevant, but this doesn't look like it's an issue with the data)
I can get a basic plot of my data:
However, when I try to remove my error bars using sns.barplot(data=dist_to_next_melt, x="bins", y="value", color="pink", errorbar=None), as indicated in the docs, I get this error message: AttributeError: 'Rectangle' object has no property 'errorbar'. I was on seaborn 0.11.1; I have just updated seaborn to 0.12.2 and the problem persists.
I'm running this in a Jupyter Notebook, using Conda to manage my modules, if any of this is relevant.
Have I missed something obvious? Or am I using sns.barplot() incorrectly?
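For reference, the errorbar parameter was added to seaborn's barplot in 0.12; earlier versions use ci=None instead, and an older seaborn forwards the unknown errorbar keyword to Matplotlib, which produces exactly this 'Rectangle' message. If the error persists after upgrading, one plausible cause (an assumption, not confirmed in the thread) is that the notebook kernel is still importing the old seaborn, for example because it was not restarted or runs in a different Conda environment. A minimal sketch that prints the version actually imported and picks the matching keyword; the small frame and its two quantile bins are a hypothetical stand-in for dist_to_next_melt built from the first values shown above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Which seaborn is the *running kernel* using?
print(sns.__version__)
major, minor = (int(p) for p in sns.__version__.split(".")[:2])

# Hypothetical stand-in for dist_to_next_melt
df = pd.DataFrame({"value": [1, 24, 5, 74, 110, 290, 6, 15, 71, 0]})
df["bins"] = pd.qcut(df.index, 2)

# errorbar= exists from seaborn 0.12 on; older versions use ci=
no_err = {"errorbar": None} if (major, minor) >= (0, 12) else {"ci": None}
ax = sns.barplot(data=df, x="bins", y="value", color="pink", **no_err)
plt.show()
```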
I'm trying to analyze some open data, and I tried to plot a scatter figure, but I always get an error.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Read in the CSV text file
csv_file = ("../ff0002fiac-4.csv")
data = pd.read_csv(csv_file,names=['a','b','c','d','e','f'])
print(data.head(5))
#df=pd.DataFrame(data)
years=data['a']
people=data['b']
print(years)
print(people)
data.plot(kind='line',x=years,y=people)
plt.show()
I expect it to show a scatter figure, but I get an error instead.
Here is the data:
a b c d e f
0 100 3.56 120905 89608 72562 6686
1 101 3.43 118800 90229 73645 7858
2 102 3.47 116210 90236 73148 9170
3 103 3.17 105977 82889 68020 7949
4 104 3.36 121654 95517 77258 10049
and here is the error:
KeyError: '[100 101 102 103 104 105 106] not in index'
According to the pandas.DataFrame.plot documentation, the x and y parameters should be column labels or positions, not the column data itself. You probably meant to do this:
data.plot(kind='line',x='a',y='b')
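Since the question asks for a scatter figure, the same label-based call works with kind='scatter'. A minimal sketch using columns a and b of the five rows shown in the question (the Agg backend line is only there so it runs headless):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Columns a and b from the five rows shown in the question
data = pd.DataFrame({
    "a": [100, 101, 102, 103, 104],
    "b": [3.56, 3.43, 3.47, 3.17, 3.36],
})

# x and y are column *labels*, not Series
ax = data.plot(kind="scatter", x="a", y="b")
plt.show()
```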
I'm not getting my whole output, nor my column names, on screen.
import sqlite3
import pandas as pd
hello = sqlite3.connect(r"C:\Users\ravjo\Downloads\Chinook.sqlite")
rs = hello.execute("SELECT * FROM PlaylistTrack INNER JOIN Track on PlaylistTrack.TrackId = Track.TrackId WHERE Milliseconds < 250000")
df = pd.DataFrame(rs.fetchall())
hello.close()
print(df.head())
actual result:
0 1 2 3 4 ... 6 7 8 9 10
0 1 3390 3390 One and the Same 271 ... 23 None 217732 3559040 0.99
1 1 3392 3392 Until We Fall 271 ... 23 None 230758 3766605 0.99
2 1 3393 3393 Original Fire 271 ... 23 None 218916 3577821 0.99
3 1 3394 3394 Broken City 271 ... 23 None 228366 3728955 0.99
4 1 3395 3395 Somedays 271 ... 23 None 213831 3497176 0.99
[5 rows x 11 columns]
expected result:
PlaylistId TrackId TrackId Name AlbumId MediaTypeId \
0 1 3390 3390 One and the Same 271 2
1 1 3392 3392 Until We Fall 271 2
2 1 3393 3393 Original Fire 271 2
3 1 3394 3394 Broken City 271 2
4 1 3395 3395 Somedays 271 2
GenreId Composer Milliseconds Bytes UnitPrice
0 23 None 217732 3559040 0.99
1 23 None 230758 3766605 0.99
2 23 None 218916 3577821 0.99
3 23 None 228366 3728955 0.99
4 23 None 213831 3497176 0.99
The ... in the middle means that some of the columns have been omitted from the display. If you want to see all of the data, you need to change the pandas display options, which you can do with the pandas.set_option() method. Documentation here.
In your case, you should set display.max_columns to None so that pandas displays an unlimited number of columns. You will also have to read the column names from the database or set them manually. Refer here for how to read the column names from the database itself.
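One way to get the column names is the cursor's description attribute: under the Python DB-API (which sqlite3 follows), each entry's first item is the column name. Alternatively, pandas.read_sql_query runs the query and names the columns itself. A minimal self-contained sketch; the in-memory table is hypothetical and only borrows two rows from the output above in place of the Chinook database.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory table standing in for Chinook's Track table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (TrackId INTEGER, Name TEXT, Milliseconds INTEGER)")
conn.executemany(
    "INSERT INTO Track VALUES (?, ?, ?)",
    [(3390, "One and the Same", 217732), (3392, "Until We Fall", 230758)],
)

query = "SELECT * FROM Track WHERE Milliseconds < 250000"

# Option 1: take the names from cursor.description (first item of each entry)
cur = conn.execute(query)
columns = [desc[0] for desc in cur.description]
df = pd.DataFrame(cur.fetchall(), columns=columns)

# Option 2: let pandas run the query and name the columns
df2 = pd.read_sql_query(query, conn)
conn.close()

pd.set_option("display.max_columns", None)  # show every column
print(df)
```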
To display all the columns, use the following snippet:
pd.set_option("display.max_columns",None)
By default, pandas limits the number of rows it displays, but you can change that as needed. Here is a helper function I use whenever I need to print a full DataFrame:
def print_full(df):
    import pandas as pd
    pd.set_option('display.max_rows', len(df))
    print(df)
    pd.reset_option('display.max_rows')
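A variant of the same helper can use pandas.option_context, which lifts the limits only inside the with block and restores the previous settings on exit, even if printing raises. This sketch also lifts the column limit; the 100-row frame is only a hypothetical example.

```python
import pandas as pd

def print_full(df):
    # Temporarily show all rows and columns; pandas restores the
    # previous option values when the with block exits.
    with pd.option_context("display.max_rows", None, "display.max_columns", None):
        print(df)

df = pd.DataFrame({"x": range(100)})  # hypothetical example frame
before = pd.get_option("display.max_rows")
print_full(df)
```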
I have written code to show my data set as a bar chart. This is my code.
I read my data from a .csv file like this:
names = ["Clinic Number","Question Text","Answer Text","Answer Date","Class"]
data = pd.read_csv('ADLCI.csv', names = names)
And then
grouped = data.groupby(['Question Text','Answer Text']).size().reset_index(name='counts')
import matplotlib.pyplot as plt
plt.figure()
grouped.plot(kind='bar', title ="Functional Status Count", figsize=(15, 10), legend=True, fontsize=12)
plt.show()
This is the resulting DataFrame, which I want to show as a bar chart:
Question Text Answer Text counts
0 CI function No 513
1 CI function Yes 373
2 bathing? No 2827
3 bathing? Yes 408
4 dressing? No 2824
5 dressing? Yes 423
6 feeding No 2851
7 feeding Yes 160
8 housekeeping No 2803
9 housekeeping Yes 717
10 preparing food No 2604
11 preparing food Yes 593
12 responsibility for own medications No 2793
13 responsibility for own medications Yes 625
14 shopping No 35
15 shopping Yes 49
16 toileting No 2843
17 toileting Yes 239
18 transferring No 2834
19 transferring Yes 904
20 using transportation No 2816
21 using transportation Yes 483
The first column (the numbers) was added automatically; I don't actually have it in my data set.
Here is the bar chart created by this code.
As you can see in the bar chart, all the bars have the same color, and the x axis shows the index numbers I mentioned. That is not the shape I want.
What I want should look like the chart at this link:
I'm going to explain the changes I want to the picture I uploaded here.
Instead of 0, 1, ... on the x axis, it should show the Question Text column. In detail: as we see in the DataFrame, there are two CI function rows, one for Yes and one for No. Instead of 0 and 1, I want a single CI function group with two bars in different colors, one for the count of No (1596) and one for the count of Yes (1376).
The next group will be bathing?, again with one bar for 17965 and another for 702.
This way I should have about ten groups, each containing two bars side by side, like the chart in the link above.
I tried various approaches like the one in the link above, but my chart either doesn't look like that or I get an error.
Thanks :)
Update 1
When I applied your code:
import matplotlib.pyplot as plt
data.groupby(['Question Text','Answer Text']).sum().unstack().plot(kind='bar')
plt.show()
I got this error:
Traceback (most recent call last):
File "C:/Users/M193053/PycharmProjects/ADL-distribution/test.py", line 52, in <module>
data.groupby(['Question Text','Answer Text']).sum().unstack().plot(kind='bar')
File "C:\Users\M193053\Documents\Anaconda3\envs\conda3\lib\site-packages\pandas\plotting\_core.py", line 2941, in __call__
sort_columns=sort_columns, **kwds)
File "C:\Users\M193053\Documents\Anaconda3\envs\conda3\lib\site-packages\pandas\plotting\_core.py", line 1977, in plot_frame
**kwds)
File "C:\Users\M193053\Documents\Anaconda3\envs\conda3\lib\site-packages\pandas\plotting\_core.py", line 1804, in _plot
plot_obj.generate()
File "C:\Users\M193053\Documents\Anaconda3\envs\conda3\lib\site-packages\pandas\plotting\_core.py", line 258, in generate
self._compute_plot_data()
File "C:\Users\M193053\Documents\Anaconda3\envs\conda3\lib\site-packages\pandas\plotting\_core.py", line 373, in _compute_plot_data
'plot'.format(numeric_data.__class__.__name__))
TypeError: Empty 'DataFrame': no numeric data to plot
But when I use this code:
grouped = data.groupby(['Question Text','Answer Text']).size().reset_index(name='counts')
import matplotlib.pyplot as plt
grouped.groupby(['Question Text','Answer Text']).sum().unstack().plot(kind='bar')
plt.show()
it looks OK, like this:
But it does not seem logical to apply groupby twice, so I'm still not sure what I should do.
Thanks for taking the time :)
Update 2
This is my DataFrame, obtained with this code:
grouped = data.groupby(['Question Text','Answer Text']).size().reset_index(name='counts')
0 CI function No 513
1 CI function Yes 373
2 bathing? No 2827
3 bathing? Yes 408
4 dressing? No 2824
5 dressing? Yes 423
6 feeding No 2851
7 feeding Yes 160
8 housekeeping No 2803
9 housekeeping Yes 717
10 preparing food No 2604
11 preparing food Yes 593
12 responsibility for own medications No 2793
13 responsibility for own medications Yes 625
14 shopping No 35
15 shopping Yes 49
16 toileting No 2843
17 toileting Yes 239
18 transferring No 2834
19 transferring Yes 904
20 using transportation No 2816
21 using transportation Yes 483
And this is the DataFrame obtained by combining your code and mine:
grouped = data.groupby(['Question Text','Answer Text']).size().reset_index(name='counts')
print(grouped)
import matplotlib.pyplot as plt
final = grouped.groupby(['Question Text','Answer Text']).sum()
print(final)
Question Text Answer Text
CI function No 513
Yes 373
bathing? No 2827
Yes 408
dressing? No 2824
Yes 423
feeding No 2851
Yes 160
housekeeping No 2803
Yes 717
preparing food No 2604
Yes 593
responsibility for own medications No 2793
Yes 625
shopping No 35
Yes 49
toileting No 2843
Yes 239
transferring No 2834
Yes 904
using transportation No 2816
Yes 483
Update 3
The original DataFrame has 200000 rows like this:
1 bathing? No 3529933
2 dressing? No 3529933
3 feeding No 3529933
4 housekeeping No 3529933
5 responsibility for own medications No 3529933
6 using transportation No 3529933
7 toileting No 3529933
8 transferring No 3529933
10 preparing food No 3529933
11 bathing? NaN 2864155
12 dressing? NaN 2864155
13 feeding NaN 2864155
14 housekeeping NaN 2864155
15 responsibility for own medications NaN 2864155
16 toileting NaN 2864155
17 transferring NaN 2864155
19 preparing food NaN 2864155
20 using transportation Yes 2864155
21 bathing? NaN 2921299
22 dressing? NaN 2921299
You can do it like this (df is the DataFrame you posted):
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
df.groupby(['Question Text','Answer Text']).sum().unstack().plot(kind='bar')
plt.show()
Output:
You can also rotate the x tick labels like this:
plt.xticks(rotation=45)
but I suggest making the labels shorter so the chart is clearer.
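A single groupby is enough if you count rows with size() and then unstack the answers into columns, which avoids .sum() depending on which other columns happen to be numeric. Rows with NaN in either grouping column (as in Update 3) are dropped by groupby by default. A minimal sketch; the raw rows below are hypothetical stand-ins for ADLCI.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw rows standing in for the ADLCI.csv data
data = pd.DataFrame({
    "Question Text": ["bathing?", "bathing?", "bathing?", "feeding", "feeding"],
    "Answer Text": ["No", "Yes", "No", "No", "Yes"],
})

# Count rows per (question, answer) pair, then turn the answers into
# columns so each question gets one colored bar per answer.
counts = data.groupby(["Question Text", "Answer Text"]).size().unstack(fill_value=0)

ax = counts.plot(kind="bar", title="Functional Status Count",
                 figsize=(15, 10), fontsize=12, rot=45)
plt.tight_layout()
plt.show()
```

If you already have the grouped frame with a counts column, grouped.pivot(index='Question Text', columns='Answer Text', values='counts') produces the same table without a second groupby.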
I am trying to build a model for predicting energy production using an ARMA model.
The data I can use for training is as follows:
(https://github.com/soma11soma11/EnergyDataSimulationChallenge/blob/master/challenge1/data/training_dataset_500.csv)
ID Label House Year Month Temperature Daylight EnergyProduction
0 0 1 2011 7 26.2 178.9 740
1 1 1 2011 8 25.8 169.7 731
2 2 1 2011 9 22.8 170.2 694
3 3 1 2011 10 16.4 169.1 688
4 4 1 2011 11 11.4 169.1 650
5 5 1 2011 12 4.2 199.5 763
...............
11995 19 500 2013 2 4.2 201.8 638
11996 20 500 2013 3 11.2 234 778
11997 21 500 2013 4 13.6 237.1 758
11998 22 500 2013 5 19.2 258.4 838
11999 23 500 2013 6 22.7 122.9 586
As shown above, I can use data from July 2011 to May 2013 for training.
Using this training data, I want to predict energy production in June 2013 for each of the 500 houses.
The problem is that the time series is not stationary and has trend and seasonal components (I checked this as follows):
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data_train = pd.read_csv('../../data/training_dataset_500.csv')
rng=pd.date_range('7/1/2011', '6/1/2013', freq='M')
house1 = data_train[data_train.House==1][['EnergyProduction','Daylight','Temperature']].set_index(rng)
fig, axes = plt.subplots(nrows=1, ncols=3)
for i, column in enumerate(house1.columns):
    house1[column].plot(ax=axes[i], figsize=(14,3), title=column)
plt.show()
With this data, I cannot fit an ARMA model that gives good predictions, so I want to remove the trend and seasonal components and make the time series stationary. I tried, but I could not remove these components and make it stationary.
I would recommend the Hodrick-Prescott (HP) filter, which is widely used in macroeconometrics to separate the long-term trend component from short-term fluctuations. It is implemented as statsmodels.api.tsa.filters.hpfilter, which returns the cyclical component and the trend, in that order.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
df = pd.read_csv('/home/Jian/Downloads/data.csv', index_col=[0])
# get part of the data
x = df.loc[df.House==1, 'Daylight']
# hp-filter, set parameter lamb=129600 following the suggestions for monthly data
# hpfilter returns (cycle, trend); the cycle is x minus the trend, not a smoothed x
x_cycle, x_trend = sm.tsa.filters.hpfilter(x, lamb=129600)
fig, axes = plt.subplots(figsize=(12,4), ncols=3)
axes[0].plot(x)
axes[0].set_title('raw x')
axes[1].plot(x_trend)
axes[1].set_title('trend')
axes[2].plot(x_cycle)
axes[2].set_title('cycle (x - trend)')
plt.tight_layout()
plt.show()
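The HP filter separates a trend but does not by itself remove a yearly seasonal pattern. A common alternative step before ARMA-style modeling (a different technique from the HP filter) is differencing: a lag-12 difference removes a pattern that repeats every 12 months, and a further first difference removes any remaining constant drift. The sketch below uses a synthetic series (linear trend plus a yearly sine, purely illustrative, not the challenge data) so the effect is easy to verify.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (illustration only): linear trend + yearly cycle
idx = pd.date_range("2011-07-01", periods=24, freq="MS")
t = np.arange(24)
x = pd.Series(2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12), index=idx)

# Lag-12 (seasonal) difference: the sine cancels, and the linear trend
# becomes a constant of 2 * 12 = 24
x_seasonal_diff = x.diff(12).dropna()

# A further first difference removes that remaining constant drift
x_stationary = x_seasonal_diff.diff().dropna()

print(x_seasonal_diff.round(6).unique())
print(x_stationary.round(6).unique())
```

Note that with only about two years of monthly data per house, a lag-12 difference leaves roughly one year of observations to fit the model on.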