Suppose I have a df as below; how do I add a sum() row to the DataFrame below?
df.columns=['value_a','value_b','name','up_or_down','difference']
df
value_a value_b name up_or_down difference
project_name
# sum 27.56 25.04 sum down -1.31
2021-project11 0.43 0.48 2021-project11 up 0.05
2021-project1 0.62 0.56 2021-project1 down -0.06
2021-project2 0.51 0.47 2021-project2 down -0.04
2021-porject3 0.37 0.34 2021-porject3 down -0.03
2021-porject4 0.64 0.61 2021-porject4 down -0.03
2021-project5 0.32 0.25 2021-project5 down -0.07
2021-project6 0.75 0.81 2021-project6 up 0.06
2021-project7 0.60 0.60 2021-project7 down 0.00
2021-project8 0.85 0.74 2021-project8 down -0.11
2021-project10 0.67 0.67 2021-project10 down 0.00
2021-project9 0.73 0.73 2021-project9 down 0.00
2021-project11 0.54 0.54 2021-project11 down 0.00
2021-project12 0.40 0.40 2021-project12 down 0.00
2021-project13 0.76 0.77 2021-project13 up 0.01
2021-project14 1.16 1.28 2021-project14 up 0.12
2021-project15 1.01 0.94 2021-project15 down -0.07
2021-project16 1.23 1.24 2021-project16 up 0.01
2022-project17 0.40 0.36 2022-project17 down -0.04
2022-project_11 0.40 0.40 2022-project_11 down 0.00
2022-project4 1.01 0.80 2022-project4 down -0.21
2022-project1 0.65 0.67 2022-project1 up 0.02
2022-project2 0.75 0.57 2022-project2 down -0.18
2022-porject3 0.32 0.32 2022-porject3 down 0.00
2022-project18 0.91 0.56 2022-project18 down -0.35
2022-project5 0.84 0.89 2022-project5 up 0.05
2022-project19 0.61 0.48 2022-project19 down -0.13
2022-project6 0.77 0.80 2022-project6 up 0.03
2022-project20 0.63 0.54 2022-project20 down -0.09
2022-project8 0.59 0.55 2022-project8 down -0.04
2022-project21 0.58 0.54 2022-project21 down -0.04
2022-project10 0.76 0.76 2022-project10 down 0.00
2022-project9 0.70 0.71 2022-project9 up 0.01
2022-project22 0.62 0.56 2022-project22 down -0.06
2022-project23 2.03 1.74 2022-project23 down -0.29
2022-project12 0.39 0.39 2022-project12 down 0.00
2022-project24 1.35 1.55 2022-project24 up 0.20
project25 0.45 0.42 project25 down -0.03
project26 0.53 NaN project26 down NaN
project27 0.68 NaN project27 down NaN
I tried
df.sum().columns=['value_a_sun','value_b_sum','difference_sum']
And I would like to add the sum row below to the DataFrame above:
sum 27.56 25.04 sum down -1.31
But I got AttributeError: 'Series' object has no attribute 'column'. How do I fix this? Thanks so much for any advice.
Filter the column names in a subset with [] before calling sum, then assign the result to a new row via DataFrame.loc:
df.loc['sum'] = df[['value_a','value_b','difference']].sum()
To instead add the sums as the first row:
df1 = df[['value_a','value_b','difference']].sum().to_frame().T
df1.index = ['sum']
df = pd.concat([df1, df])   # ignore_index=True would discard the project_name labels
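For reference, here's a minimal, self-contained sketch of both approaches on a hypothetical two-row version of your frame (the values are illustrative):
import pandas as pd

df = pd.DataFrame(
    {'value_a': [0.43, 0.62], 'value_b': [0.48, 0.56],
     'name': ['2021-project11', '2021-project1'],
     'up_or_down': ['up', 'down'], 'difference': [0.05, -0.06]},
    index=pd.Index(['2021-project11', '2021-project1'], name='project_name'))

# append the sums as a new last row; columns not in the subset become NaN
appended = df.copy()
appended.loc['sum'] = df[['value_a', 'value_b', 'difference']].sum()

# prepend the sums as the first row, keeping a labelled index
row = df[['value_a', 'value_b', 'difference']].sum().to_frame().T
row.index = pd.Index(['sum'], name='project_name')
prepended = pd.concat([row, df])
print(prepended)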
How do I fix this code? Do I need to make features_train and features_test DataFrames?
Does anyone have an idea of how to fix this code? I really can't understand the problem...
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Normalizer
from sklearn.metrics import r2_score
admissions_data = pd.read_csv('admissions_data.csv')
labels = admissions_data.iloc[:, -1]
features = admissions_data.iloc[:, 1:8]
features_train, labels_train, features_test, labels_test = train_test_split(features, labels, test_size=0.2, random_state=13)
sc = StandardScaler()
features_train_scaled = sc.fit_transform(features_train)
features_test_scale = sc.transform(features_test)
features_train_scaled = pd.DataFrame(features_train_scaled)
features_test_scale = pd.DataFrame(features_test_scale)
The error is:
Traceback (most recent call last):
File "script.py", line 26, in <module>
features_test_scale = sc.transform(features_test)
File "/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/_data.py", line 794, in transform
force_all_finite='allow-nan')
File "/usr/local/lib/python3.6/dist-packages/sklearn/base.py", line 420, in _validate_data
X = check_array(X, **check_params)
File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 73, in inner_f
return f(**kwargs)
File "/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py", line 624, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[0.57 0.78 0.59 0.64 0.47 0.63 0.65 0.89 0.84 0.73 0.75 0.64 0.46 0.78
0.62 0.53 0.85 0.67 0.84 0.94 0.64 0.53 0.47 0.86 0.62 0.7 0.77 0.61
0.61 0.63 0.86 0.82 0.65 0.58 0.7 0.7 0.84 0.72 0.71 0.77 0.69 0.8
0.52 0.62 0.79 0.71 0.9 0.84 0.6 0.86 0.67 0.61 0.71 0.52 0.62 0.37
0.73 0.64 0.71 0.8 0.88 0.78 0.45 0.62 0.62 0.86 0.74 0.94 0.58 0.7
0.92 0.64 0.65 0.83 0.34 0.66 0.67 0.7 0.71 0.54 0.68 0.61 0.68 0.79
0.57 0.94 0.59 0.79 0.73 0.91 0.86 0.95 0.9 0.92 0.68 0.84 0.69 0.72
0.94 0.53 0.45 0.77 0.77 0.91 0.61 0.78 0.77 0.82 0.9 0.92 0.54 0.92
0.72 0.5 0.68 0.78 0.72 0.53 0.79 0.49 0.68 0.72 0.73 0.93 0.72 0.52
0.54 0.86 0.65 0.93 0.89 0.72 0.34 0.64 0.96 0.79 0.73 0.49 0.73 0.94
0.7 0.95 0.65 0.86 0.78 0.75 0.89 0.94 0.91 0.87 0.93 0.81 0.94 0.89
0.57 0.77 0.39 0.46 0.78 0.64 0.76 0.58 0.56 0.53 0.79 0.9 0.92 0.96
0.67 0.65 0.64 0.58 0.94 0.76 0.78 0.88 0.84 0.68 0.66 0.42 0.56 0.66
0.46 0.65 0.58 0.72 0.48 0.68 0.89 0.95 0.46 0.71 0.79 0.52 0.57 0.76
0.52 0.8 0.77 0.91 0.75 0.49 0.72 0.72 0.61 0.97 0.8 0.85 0.73 0.64
0.87 0.63 0.97 0.72 0.82 0.54 0.71 0.45 0.8 0.49 0.77 0.93 0.89 0.93
0.81 0.62 0.81 0.66 0.78 0.76 0.48 0.61 0.82 0.68 0.7 0.68 0.62 0.81
0.87 0.94 0.38 0.67 0.64 0.84 0.62 0.7 0.62 0.5 0.79 0.78 0.36 0.77
0.57 0.87 0.74 0.71 0.61 0.57 0.64 0.73 0.81 0.74 0.8 0.69 0.66 0.64
0.93 0.64 0.59 0.71 0.82 0.69 0.69 0.89 0.93 0.74 0.64 0.84 0.91 0.97
0.55 0.74 0.72 0.71 0.93 0.96 0.8 0.8 0.81 0.88 0.64 0.38 0.87 0.73
0.78 0.89 0.56 0.61 0.76 0.46 0.78 0.71 0.81 0.59 0.47 0.7 0.42 0.76
0.8 0.67 0.94 0.65 0.51 0.73 0.9 0.8 0.65 0.7 0.96 0.96 0.73 0.79
0.86 0.89 0.85 0.76 0.76 0.71 0.83 0.76 0.42 0.9 0.58 0.66 0.86 0.71
0.8 0.51 0.65 0.58 0.76 0.8 0.7 0.61 0.71 0.69 0.95 0.72 0.79 0.97
0.74 0.96 0.47 0.56 0.73 0.94 0.76 0.79 0.71 0.58 0.94 0.66 0.75 0.76
0.84 0.59 0.68 0.75 0.76 0.72 0.87 0.78 0.67 0.79 0.91 0.57 0.77 0.69
0.73 0.43 0.93 0.68 0.82 0.67 0.74 0.82 0.85 0.62 0.54 0.71 0.92 0.85
0.79 0.63 0.59 0.73 0.66 0.74 0.9 0.81].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
You have made a mistake when splitting the data: you assigned the 1-D labels to features_test by mistake, and since the transform function does not accept a 1-D array, it raises that error.
train_test_split() returns features_train, features_test, labels_train, labels_test, in that order.
So, change your code like this:
#features_train, labels_train, features_test, labels_test = train_test_split(features, labels, test_size=0.2, random_state=13)
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.2, random_state=13)
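To double-check the order (and that the scaler now sees 2-D input), here's a quick sanity check on hypothetical synthetic data of the same shape:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

features = pd.DataFrame(np.random.rand(100, 7))   # 2-D features, like iloc[:, 1:8]
labels = pd.Series(np.random.rand(100))           # 1-D labels, like iloc[:, -1]

features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.2, random_state=13)
print(features_train.shape, features_test.shape)  # (80, 7) (20, 7)
print(labels_train.shape, labels_test.shape)      # (80,) (20,)

sc = StandardScaler()
features_train_scaled = sc.fit_transform(features_train)  # fit on train only
features_test_scaled = sc.transform(features_test)        # 2-D input, no error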
I'm trying to make a contour map with Basemap. My lat, lon and eof1 arrays are all 1-D and 79 items long. When I run this code, I get an error saying:
IndexError: too many indices for array
Any suggestions? I'm guessing it needs a meshgrid or something, but none of the combinations I tried worked.
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
data = np.genfromtxt('/Volumes/NO_NAME/Classwork/Lab3PCAVarimax.txt',usecols=(1,2,3,4,5,6,7),skip_header=1)
eof1 = data[:,6]
locs = np.genfromtxt('/Volumes/NO_NAME/Classwork/OK_vic_grid.txt')
lat = locs[:,1]
lon = locs[:,2]
fig, ax = plt.subplots()
m = Basemap(projection='stere',lon_0=-95,lat_0=35.,lat_ts=40,\
llcrnrlat=33,urcrnrlat=38,\
llcrnrlon=-103.8,urcrnrlon=-94)
X,Y = m(lon,lat)
m.drawcoastlines()
m.drawstates()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.drawparallels(np.arange(0.,40.,2.),color='gray',dashes=[1,3],labels=[1,0,0,0])
m.drawmeridians(np.arange(0.,360.,2.),color='gray',dashes=[1,3],labels=[0,0,0,1])
m.fillcontinents(color='beige',lake_color='lightblue',zorder=0)
plt.title('Oklahoma PCA-Derived Soil Moisture Regions (Varimax)')
m.contour(X,Y,eof1)
lat and lon data:
1 33.75 -97.75
2 33.75 -97.25
3 33.75 -96.75
4 33.75 -96.25
5 33.75 -95.75
6 33.75 -95.25
7 33.75 -94.75
8 34.25 -99.75
9 34.25 -99.25
10 34.25 -98.75
11 34.25 -98.25
12 34.25 -97.75
13 34.25 -97.25
14 34.25 -96.75
15 34.25 -96.25
16 34.25 -95.75
17 34.25 -95.25
18 34.25 -94.75
19 34.75 -99.75
20 34.75 -99.25
21 34.75 -98.75
22 34.75 -98.25
23 34.75 -97.75
24 34.75 -97.25
25 34.75 -96.75
26 34.75 -96.25
27 34.75 -95.75
28 34.75 -95.25
29 34.75 -94.75
30 35.25 -99.75
31 35.25 -99.25
32 35.25 -98.75
33 35.25 -98.25
34 35.25 -97.75
35 35.25 -97.25
36 35.25 -96.75
37 35.25 -96.25
38 35.25 -95.75
39 35.25 -95.25
40 35.25 -94.75
41 35.75 -99.75
42 35.75 -99.25
43 35.75 -98.75
44 35.75 -98.25
45 35.75 -97.75
46 35.75 -97.25
47 35.75 -96.75
48 35.75 -96.25
49 35.75 -95.75
50 35.75 -95.25
51 35.75 -94.75
52 36.25 -99.75
53 36.25 -99.25
54 36.25 -98.75
55 36.25 -98.25
56 36.25 -97.75
57 36.25 -97.25
58 36.25 -96.75
59 36.25 -96.25
60 36.25 -95.75
61 36.25 -95.25
62 36.25 -94.75
63 36.75 -102.75
64 36.75 -102.25
65 36.75 -101.75
66 36.75 -101.25
67 36.75 -100.75
68 36.75 -100.25
69 36.75 -99.75
70 36.75 -99.25
71 36.75 -98.75
72 36.75 -98.25
73 36.75 -97.75
74 36.75 -97.25
75 36.75 -96.75
76 36.75 -96.25
77 36.75 -95.75
78 36.75 -95.25
79 36.75 -94.75
eof data:
PC5 PC3 PC2 PC6 PC7 PC4 PC1
1 0.21 0.14 0.33 0.39 0.73 0.13 0.03
2 0.19 0.17 0.42 0.24 0.78 0.1 0.04
3 0.17 0.18 0.51 0.18 0.71 0.01 0.1
4 0.18 0.2 0.58 0.19 0.67 0.07 0.11
5 0.15 0.17 0.76 0.2 0.43 0.11 0.13
6 0.12 0.16 0.82 0.17 0.34 0.12 0.15
7 0.1 0.2 0.84 0.14 0.28 0.14 0.13
8 0.16 0.09 0.2 0.73 0.29 0.25 0.1
9 0.18 0.14 0.18 0.68 0.36 0.24 0.14
10 0.23 0.22 0.18 0.63 0.53 0.21 0.14
11 0.19 0.23 0.23 0.52 0.62 0.19 0.14
12 0.2 0.18 0.23 0.43 0.74 0.15 0.11
13 0.21 0.19 0.43 0.24 0.77 0.11 0.11
14 0.15 0.21 0.51 0.15 0.72 0.1 0.15
15 0.14 0.23 0.58 0.19 0.66 0.1 0.12
16 0.13 0.22 0.74 0.19 0.49 0.13 0.12
17 0.08 0.24 0.85 0.19 0.28 0.15 0.1
18 0.1 0.29 0.86 0.15 0.18 0.16 0.07
19 0.26 0.11 0.17 0.77 0.1 0.24 0.06
20 0.36 0.16 0.14 0.74 0.24 0.23 0.12
21 0.32 0.27 0.14 0.65 0.42 0.14 0.14
22 0.39 0.29 0.21 0.58 0.47 0.09 0.21
23 0.3 0.3 0.29 0.47 0.48 0.09 0.33
24 0.25 0.35 0.35 0.42 0.45 0.09 0.45
25 0.25 0.33 0.43 0.29 0.52 0.11 0.46
26 0.24 0.36 0.48 0.26 0.53 0.09 0.4
27 0.18 0.35 0.62 0.24 0.48 0.11 0.28
28 0.13 0.4 0.83 0.12 0.15 0.12 0.06
29 0.13 0.42 0.81 0.1 0.14 0.08 0.05
30 0.45 0.14 0.14 0.7 0.05 0.2 0.04
31 0.52 0.19 0.13 0.68 0.25 0.18 0.06
32 0.53 0.2 0.16 0.66 0.32 0.09 0.08
33 0.48 0.26 0.2 0.56 0.37 0.06 0.21
34 0.41 0.34 0.28 0.44 0.35 0.06 0.43
35 0.37 0.4 0.28 0.37 0.32 0.06 0.54
36 0.24 0.41 0.39 0.27 0.33 0.11 0.56
37 0.29 0.47 0.37 0.28 0.32 0.11 0.54
38 0.3 0.61 0.36 0.25 0.26 0.13 0.47
39 0.21 0.6 0.66 0.13 0.18 0.1 0.12
40 0.13 0.48 0.75 0.1 0.13 0.07 0.06
41 0.55 0.15 0.14 0.63 0.07 0.25 0.1
42 0.55 0.19 0.17 0.65 0.13 0.2 0.11
43 0.6 0.19 0.15 0.62 0.27 0.04 0.06
44 0.63 0.18 0.16 0.53 0.25 0.04 0.16
45 0.69 0.27 0.22 0.36 0.23 -0.01 0.28
46 0.56 0.39 0.25 0.22 0.24 0.06 0.47
47 0.45 0.51 0.28 0.23 0.25 0.11 0.51
48 0.38 0.63 0.3 0.27 0.24 0.14 0.4
49 0.3 0.75 0.34 0.19 0.21 0.13 0.3
50 0.29 0.77 0.44 0.16 0.19 0.12 0.13
51 0.18 0.66 0.63 0.11 0.17 0.1 0.06
52 0.53 0.12 0.08 0.35 0.1 0.52 0.14
53 0.68 0.19 0.14 0.4 0.09 0.36 0.12
54 0.76 0.24 0.14 0.34 0.09 0.29 0.12
55 0.84 0.25 0.12 0.29 0.15 0.1 0.14
56 0.82 0.25 0.11 0.28 0.21 0.03 0.12
57 0.64 0.44 0.22 0.23 0.21 0.06 0.36
58 0.54 0.52 0.27 0.21 0.2 0.09 0.39
59 0.44 0.72 0.26 0.22 0.17 0.17 0.23
60 0.3 0.79 0.28 0.17 0.14 0.11 0.19
61 0.26 0.81 0.35 0.18 0.17 0.12 0.08
62 0.24 0.82 0.37 0.16 0.17 0.1 0.05
63 0.17 0.07 0.22 0.26 0.18 0.75 0.07
64 0.25 0.15 0.24 0.23 0.12 0.82 0.08
65 0.3 0.15 0.16 0.23 0.11 0.82 0.04
66 0.39 0.23 0.15 0.19 0.06 0.77 0.05
67 0.58 0.2 0.09 0.21 0.12 0.55 -0.1
68 0.68 0.17 0.04 0.21 0.11 0.48 -0.07
69 0.59 0.18 0.01 0.14 0.04 0.47 0.07
70 0.75 0.2 0.1 0.29 0.06 0.36 0.11
71 0.75 0.22 0.13 0.26 0.13 0.31 0.07
72 0.82 0.25 0.12 0.2 0.19 0.17 0.06
73 0.79 0.3 0.11 0.15 0.13 0.16 0.03
74 0.76 0.41 0.13 0.16 0.17 0.08 0.13
75 0.65 0.48 0.16 0.14 0.15 0.13 0.15
76 0.52 0.66 0.18 0.16 0.2 0.22 0.05
77 0.45 0.74 0.24 0.16 0.19 0.2 0.06
78 0.38 0.78 0.32 0.17 0.14 0.15 0.02
79 0.28 0.79 0.34 0.15 0.16 0.11 0
AFAICT the essence of your problem is that your x/y grid isn't strictly rectangular. The documentation for matplotlib.pyplot.contour says:
X and Y must both be 2-D with the same shape as Z, or they must both
be 1-D such that len(X) is the number of columns in Z and len(Y) is
the number of rows in Z
see http://matplotlib.org/api/pyplot_api.html
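Here's a toy example (hypothetical data, not yours) of the two shape combinations contour() accepts:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-100, -95, 11)    # 1-D, len(x) == number of columns of Z
y = np.linspace(34, 37, 6)        # 1-D, len(y) == number of rows of Z
X, Y = np.meshgrid(x, y)          # both 2-D with shape (6, 11)
Z = np.hypot(X + 97.5, Y - 35.5)  # 2-D values on the grid, shape (6, 11)

plt.contour(x, y, Z)              # 1-D, 1-D, 2-D form
plt.contour(X, Y, Z)              # 2-D, 2-D, 2-D form
plt.show()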
With your unmodified data you can get a quiver plot, e.g.:
# create vectors pointing up and slightly right
v = eof1
u = 0.5 * eof1  # eof1 is a NumPy array, so this scales elementwise
m.quiver(lon, lat, u, v, latlon=True)
plt.show()
So you will have to map your data to the 1-D,1-D,2-D or 2-D,2-D,2-D format required by contour().
It's fairly easy to make your data cover a smaller rectangular lat/lon area by deleting rows 1-7 and 63-68 (or, I suppose, you could pad it out with 0 values to cover your original area). But by the time the lon/lat are projected to your stere projection coordinates they aren't rectangular any more, which I think will also be a problem. How about using a merc projection, just to get things going?
Overall, however, I think you will need more data; in particular, to get contours right up to your Oklahoma boundary you need data up to the boundary. Use the latlon=True parameter in the contour call so it transforms the lon and lat correctly, even with the merc projection. I also tried adding the parameter tri=True, but that seems to place different requirements on the x/y/z data.
As another example, you can get a bubble plot using scatter():
s = eof1 * 500  # scale marker sizes from the data
m.scatter(lon, lat, s=s, latlon=True)
Addition:
Managed to get some contours!
The simplest solution was to hardcode your lat/lon/data for the rectangular region: meshgrid turns the 1-D lon and lat into a full 2-D grid in xx and yy, and the value points are 2-D. Here's the code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
data = np.genfromtxt('Lab3PCAVarimax.txt',usecols=(1,2,3,4,5,6,7),skip_header=1)
eof1 = data[:,6]
print(eof1)
eof11= [
[ 0.1 ,0.14 ,0.14 ,0.14 ,0.11 ,0.11 ,0.15 ,0.12 ,0.12 ,0.1 ,0.07]
,[ 0.06 ,0.12 ,0.14 ,0.21 ,0.33 ,0.45 ,0.46 ,0.4 ,0.28 ,0.06 ,0.05]
,[ 0.04 ,0.06 ,0.08 ,0.21 ,0.43 ,0.54 ,0.56 ,0.54 ,0.47 ,0.12 ,0.06]
,[ 0.1 ,0.11 ,0.06 ,0.16 ,0.28 ,0.47 ,0.51 ,0.4 ,0.3 ,0.13 ,0.06]
,[ 0.14 ,0.12 ,0.12 ,0.14 ,0.12 ,0.36 ,0.39 ,0.23 ,0.19 ,0.08 ,0.05]
,[ 0.07 ,0.11 ,0.07 ,0.06 ,0.03 ,0.13 ,0.15 ,0.05 ,0.06 ,0.02 ,0. ]
]
locs = np.genfromtxt('OK_vic_grid.txt')
lat = locs[:,1]
lon = locs[:,2]
lat1 = [34.25 ,34.75,35.25,35.75,36.25,36.75]
lon1 =[-99.75,-99.25, -98.75, -98.25, -97.75, -97.25, -96.75, -96.25, -95.75, -95.25, -94.75]
fig, ax = plt.subplots()
m = Basemap(projection='merc',lon_0=-95,lat_0=35.,lat_ts=40,\
llcrnrlat=33,urcrnrlat=38,\
llcrnrlon=-103.8,urcrnrlon=-94)
#X,Y = m(lon,lat)
m.drawcoastlines()
m.drawstates()
m.drawcountries()
m.drawmapboundary(fill_color='lightblue')
m.drawparallels(np.arange(0.,40.,2.),color='gray',dashes=[1,3],labels=[1,0,0,0])
m.drawmeridians(np.arange(0.,360.,2.),color='gray',dashes=[1,3],labels=[0,0,0,1])
m.fillcontinents(color='beige',lake_color='lightblue',zorder=0)
plt.title('Oklahoma PCA-Derived Soil Moisture Regions (Varimax)')
xx, yy = m(*np.meshgrid(lon1,lat1))
m.contourf(xx,yy,eof11)
plt.show()
Further addition: Actually this still works when the projection is stere :-)
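If you'd rather not hardcode eof11, here's a sketch that rebuilds the 2-D grid from the 1-D columns, assuming (as above) that dropping rows 1-7 and 63-68 leaves a complete regular grid with no missing cells:
import numpy as np

sel = (lat >= 34.0) & (lon >= -100.0)    # keep the rectangular window
lat1 = np.unique(lat[sel])               # 6 grid latitudes (rows)
lon1 = np.unique(lon[sel])               # 11 grid longitudes (columns)
eof11 = np.full((lat1.size, lon1.size), np.nan)
i = np.searchsorted(lat1, lat[sel])      # row index of each point
j = np.searchsorted(lon1, lon[sel])      # column index of each point
eof11[i, j] = eof1[sel]                  # scatter the 1-D values into the grid
xx, yy = m(*np.meshgrid(lon1, lat1))
m.contourf(xx, yy, eof11)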