How to solve Index out of bounds error? - python

I'm a bit confused on an error I keep running into. I didn't have it before, but at the same time my data was wrong so I had to re-write the code.
Running the following:
plt.figure(figsize=(20,10))
x = np.arange(1416, 1426, 0.009766)
gaverage = np.empty((21,1024), dtype = np.float64)
calibdata = open(pathc + 'calib_5m.dat').readlines()
#print(np.size(calibdata)) ||| Yields: 624
#print(np.size(calibdata)//16) ||| Yields: 39
calib = np.empty(shape=(np.size(calibdata)//16,1024), dtype=np.float64)
for i in range(0, np.size(calibdata)//4):
    calib[i] = calibdata[i*4+3].split()
caverage = np.average(calib[i] ,axis = 0)
Yields this:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-25-87f3f4739851> in <module>()
11 calib = np.empty(shape=(np.size(calibdata)//16,1024), dtype=np.float64)
12 for i in range(0, np.size(calibdata)//4):
---> 13 calib[i] = calibdata[i*4+3].split()
14 caverage = np.average(calib[i] ,axis = 0)
15
IndexError: index 39 is out of bounds for axis 0 with size 39
Now what I'm trying to do here is basically take every 4th line in the file read in calibdata and write it to a new array, calib[i]. If the indices are the same size how are they out of bounds? I think there's some fundamentally flawed logic here on my part so if anyone can point out where I'm falling short, that would be great.

calib is allocated with np.size(calibdata)//16 = 39 rows, but the loop runs np.size(calibdata)//4 = 156 times, so i goes well past the last valid row index (38):
In [243]: for i in range(np.size(calibdata)//4):
     ...:     print(i, i*4+3)
     ...:
0 3
1 7
2 11
3 15
4 19
5 23
6 27
7 31
8 35
....
147 591
148 595
149 599
150 603
151 607
152 611
153 615
154 619
155 623
In [244]: calib=np.zeros((np.size(calibdata)//16),int)
In [245]: calib.shape
Out[245]: (39,)
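One way to make the sizes agree is sketched below. It assumes the goal really is one row per 4th line, so calib needs len(calibdata)//4 rows. The synthetic calibdata (624 lines of 8 numbers instead of 1024) is a stand-in for the real file, not the asker's data.

```python
import numpy as np

# Synthetic stand-in for calibdata: 624 text lines of 8 numbers each
# (the real file has 1024 values per line; 8 keeps the sketch small).
n_lines, n_cols = 624, 8
calibdata = [" ".join(str(float(k)) for _ in range(n_cols)) for k in range(n_lines)]

# Option 1: keep the loop, but size calib to match the loop range.
n_rows = len(calibdata) // 4
calib = np.empty((n_rows, n_cols), dtype=np.float64)
for i in range(n_rows):
    calib[i] = np.array(calibdata[i * 4 + 3].split(), dtype=np.float64)

# Option 2: slice out every 4th line (starting at index 3) and convert once.
calib_sliced = np.array([line.split() for line in calibdata[3::4]], dtype=np.float64)

# Column-wise average over all selected lines.
caverage = calib.mean(axis=0)
print(calib.shape, np.array_equal(calib, calib_sliced))
```

Both options select lines 3, 7, 11, ..., 623; the slice version avoids keeping the index arithmetic and the array size in sync by hand.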

Related

How to fix IndexErrors in Python

I know there are tons of questions with similar titles, but none of them solved my particular problem.
So I have this code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# my_list contains 983 list items
df = pd.DataFrame(np.array(my_list), columns=list('ABCDEF'))
df contains 983 rows, built from a list of lists:
df.head()
A B C D E F
0 47 5 17 16 57 58
1 6 23 34 21 46 37
2 57 5 53 42 18 55
3 43 24 36 16 39 22
4 32 53 5 18 34 29
scaler = StandardScaler().fit(df.values)
transformed_dataset = scaler.transform(df.values)
transformed_df = pd.DataFrame(data=transformed_dataset, index=df.index)
number_of_rows = df.values.shape[0] # all our lists
window_length = 983 # amount of past number list we need to take in consideration for prediction
number_of_features = df.values.shape[1] # number count
train = np.empty([number_of_rows-window_length, window_length, number_of_features], dtype=float)
label = np.empty([number_of_rows-window_length, number_of_features], dtype=float)
window_length = 982
for i in range(0, number_of_rows-window_length):
    train[i]=transformed_df.iloc[i:i+window_length,0:number_of_features]
    label[i]=transformed_df.iloc[i:i+window_length:i+window_length+1,0:number_of_features]
train.shape
(0, 983, 6)
label.shape
(0, 6)
train[0] works fine, but when I try train[1] I get this error:
train[1]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-43-e73aed9430c6> in <module>
----> 1 train[1]
IndexError: index 1 is out of bounds for axis 0 with size 0
Likewise, label[0] is fine, but label[1] gives this error:
label[1]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-45-1e13a70afa10> in <module>
----> 1 label[1]
IndexError: index 1 is out of bounds for axis 0 with size 0
How do I fix these IndexErrors?
You're creating arrays whose first dimension has size 0, which is why you're getting these errors.
The first dimension is number_of_rows - window_length, and at the point where train and label are allocated both values are 983, so the result is 0. I guess that's not what you want.
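A minimal sketch of the sizing issue, assuming a window shorter than the data is what was intended. The window length of 7 and the random array standing in for transformed_df.values are illustrative choices, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
number_of_rows, number_of_features = 983, 6
# Stand-in for transformed_df.values
data = rng.normal(size=(number_of_rows, number_of_features))

# With window_length == number_of_rows the first dimension is 0.
empty_shape = np.empty((number_of_rows - 983, 983, number_of_features)).shape

window_length = 7  # assumed value; it must be smaller than number_of_rows
n_samples = number_of_rows - window_length

train = np.empty((n_samples, window_length, number_of_features), dtype=float)
label = np.empty((n_samples, number_of_features), dtype=float)
for i in range(n_samples):
    train[i] = data[i:i + window_length]  # the window of past rows
    label[i] = data[i + window_length]    # the row right after the window

print(empty_shape, train.shape, label.shape)
```

Note that window_length has to be set before the arrays are allocated; changing it afterwards, as in the question, does not resize them.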

How to append data efficiently into pandas dataframe using for loop

I am trying to get data from a Python script, store it in a list, and then create a DataFrame from it.
But it creates a separate DataFrame for each item in the for loop. How can I avoid that and create a single DataFrame?
Code:
from __future__ import (absolute_import, division, print_function)
import getpass
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim
from ssl import CERT_NONE, PROTOCOL_TLSv1_2, SSLContext
import pandas as pd
import numpy as np
from tabulate import tabulate
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('max_colwidth', None)
pd.set_option('expand_frame_repr', False)
s = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
s.verify_mode = ssl.CERT_NONE
userid = input("please enter your wbi userid (ex. abx#example.com):")
p = getpass.getpass("password:")
vcenter = ["vcenter-oracle.com","vcente.simplivity.com"]
for instance in vcenter:
    try:
        c = SmartConnect(host=instance, user=userid, pwd=p, sslContext=s)
    except Exception as e:
        print(e)
        continue
    content = c.content
    obj_ds = content.viewManager.CreateContainerView(content.rootFolder,[vim.Datastore],True)
    # Lists
    a_list = []
    b_list = []
    c_list = []
    d_list = []
    # for loop
    for z in obj_ds.view:
        a_list.append(instance)
        b_list.append(z)
        c_list.append(int(z.summary.capacity/(1024*1024*1024)))
        d_list.append(int(z.summary.freeSpace/(1024*1024*1024)))
    #dataframe
    df = pd.DataFrame({'A': a_list, 'B': b_list, 'C': c_list, 'D': d_list})
I also tried the following for the inner for loop section, but the result is the same:
raw_data = []
for z in obj_ds.view:
    vc_data = instance, z, int(z.summary.capacity/(1024*1024*1024)) ,int(z.summary.freeSpace/(1024*1024*1024))
    raw_data.append(vc_data)
df = pd.DataFrame(raw_data, columns=['Vcenter', 'DS', 'TDS', 'FDS'])
print(df)
Output:
A B C D
0 vcenter-oracle.com 'vim.Datastore:datastore-357' 439 430
1 vcenter-oracle.com 'vim.Datastore:datastore-311' 439 430
2 vcenter-oracle.com 'vim.Datastore:datastore-306' 439 430
3 vcenter-oracle.com 'vim.Datastore:datastore-262' 20480 7030
4 vcenter-oracle.com 'vim.Datastore:datastore-356' 439 430
5 vcenter-oracle.com 'vim.Datastore:datastore-465' 52 46
6 vcenter-oracle.com 'vim.Datastore:datastore-94' 5836 1850
7 vcenter-oracle.com 'vim.Datastore:datastore-122' 11646 3592
8 vcenter-oracle.com 'vim.Datastore:datastore-89' 52 46
9 vcenter-oracle.com 'vim.Datastore:datastore-83' 52 46
10 vcenter-oracle.com 'vim.Datastore:datastore-149' 52 46
A B C D
0 vcenter.simplivity.com 'vim.Datastore:datastore-143230' 1945 501
1 vcenter.simplivity.com 'vim.Datastore:datastore-52354' 5120 2096
2 vcenter.simplivity.com 'vim.Datastore:datastore-142927' 274 271
3 vcenter.simplivity.com 'vim.Datastore:datastore-143231' 2048 987
4 vcenter.simplivity.com 'vim.Datastore:datastore-878' 553 549
5 vcenter.simplivity.com 'vim.Datastore:datastore-877' 553 552
6 vcenter.simplivity.com 'vim.Datastore:datastore-74327' 1500 949
7 vcenter.simplivity.com 'vim.Datastore:datastore-142929' 274 271
8 vcenter.simplivity.com 'vim.Datastore:datastore-708' 4677 1933
Expected:
A B C D
0 vcenter-oracle.com 'vim.Datastore:datastore-357' 439 430
1 vcenter-oracle.com 'vim.Datastore:datastore-311' 439 430
2 vcenter-oracle.com 'vim.Datastore:datastore-306' 439 430
3 vcenter-oracle.com 'vim.Datastore:datastore-262' 20480 7030
4 vcenter-oracle.com 'vim.Datastore:datastore-356' 439 430
5 vcenter-oracle.com 'vim.Datastore:datastore-465' 52 46
6 vcenter-oracle.com 'vim.Datastore:datastore-94' 5836 1850
7 vcenter-oracle.com 'vim.Datastore:datastore-122' 11646 3592
8 vcenter-oracle.com 'vim.Datastore:datastore-89' 52 46
9 vcenter-oracle.com 'vim.Datastore:datastore-83' 52 46
10 vcenter-oracle.com 'vim.Datastore:datastore-149' 52 46
11 vcenter.simplivity.com 'vim.Datastore:datastore-143230' 1945 501
12 vcenter.simplivity.com 'vim.Datastore:datastore-52354' 5120 2096
13 vcenter.simplivity.com 'vim.Datastore:datastore-142927' 274 271
14 vcenter.simplivity.com 'vim.Datastore:datastore-143231' 2048 987
15 vcenter.simplivity.com 'vim.Datastore:datastore-878' 553 549
16 vcenter.simplivity.com 'vim.Datastore:datastore-877' 553 552
17 vcenter.simplivity.com 'vim.Datastore:datastore-74327' 1500 949
18 vcenter.simplivity.com 'vim.Datastore:datastore-142929' 274 271
19 vcenter.simplivity.com 'vim.Datastore:datastore-708' 4677 1933
Only a small tweak is needed, as mentioned in the previous answer: declare the list before the first loop. In your code, raw_data is reset to an empty list every time instance changes.
Try the code below; it should work for you.
s = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
s.verify_mode = ssl.CERT_NONE
userid = input("please enter your wbi userid (ex. abx#example.com):")
p = getpass.getpass("password:")
vcenter = ["vcenter-oracle.com","vcente.simplivity.com"]
# Place your list construct here
raw_data = []
# First loop
for instance in vcenter:
    try:
        c = SmartConnect(host=instance, user=userid, pwd=p, sslContext=s)
    except Exception as e:
        print(e)
        continue
    content = c.content
    obj_ds = content.viewManager.CreateContainerView(content.rootFolder,[vim.Datastore],True)
    # Second loop
    for z in obj_ds.view:
        vc_data = instance, z, int(z.summary.capacity/(1024*1024*1024)) ,int(z.summary.freeSpace/(1024*1024*1024))
        raw_data.append(vc_data)
# Create the DataFrame and process the columns.
# Keep the DataFrame outside the for loop.
df = pd.DataFrame(raw_data, columns=['Vcenter', 'DS', 'TDS', 'FDS'])
print(df)
# new df to concat every instance
final_df = pd.DataFrame()
for instance in vcenter:
    try:
        c = SmartConnect(host=instance, user=userid, pwd=p, sslContext=s)
    except Exception as e:
        print(e)
        continue
    content = c.content
    obj_ds = content.viewManager.CreateContainerView(content.rootFolder,[vim.Datastore],True)
    # Lists
    a_list = []
    b_list = []
    c_list = []
    d_list = []
    # for loop
    for z in obj_ds.view:
        a_list.append(instance)
        b_list.append(z)
        c_list.append(int(z.summary.capacity/(1024*1024*1024)))
        d_list.append(int(z.summary.freeSpace/(1024*1024*1024)))
    #dataframe
    df = pd.DataFrame({'A': a_list, 'B': b_list, 'C': c_list, 'D': d_list})
    # append in final_df
    final_df = pd.concat([final_df, df])
# reset index
final_df = final_df.reset_index(drop=True)
print(final_df)
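A variation on the concat approach, sketched with made-up data: collect the per-instance frames in a list and call pd.concat once at the end, which avoids re-copying final_df on every iteration. The per_host dictionary and its host and datastore names are hypothetical stand-ins for what the vCenter queries return.

```python
import pandas as pd

# Stand-in data; the host names and values are made up for illustration.
per_host = {
    "host-a": [("ds-1", 439, 430), ("ds-2", 52, 46)],
    "host-b": [("ds-3", 1945, 501)],
}

frames = []  # one DataFrame per instance
for host, datastores in per_host.items():
    df = pd.DataFrame(datastores, columns=["B", "C", "D"])
    df.insert(0, "A", host)  # add the instance name as the first column
    frames.append(df)

# A single concat; ignore_index=True renumbers the rows 0..n-1.
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```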
Your code has two for loops, and the DataFrame is created inside the first one, so a new DataFrame is generated for every instance. To get a single DataFrame, collect the rows in a list created before the first loop and build the DataFrame once, after that loop has finished.
Here is the corrected code:
df_rows = []
# first for loop
for instance in vcenter:
    try:
        c = SmartConnect(host=instance, user=userid, pwd=p, sslContext=s)
    except Exception as e:
        print(e)
        continue
    content = c.content
    obj_ds = content.viewManager.CreateContainerView(content.rootFolder,[vim.Datastore],True)
    # second for loop
    for z in obj_ds.view:
        # List for storing single row data; start a fresh one for each datastore
        row = []
        row.append(instance)
        row.append(z)
        row.append(int(z.summary.capacity/(1024*1024*1024)))
        row.append(int(z.summary.freeSpace/(1024*1024*1024)))
        df_rows.append(row)
# build the DataFrame once, after both loops
df = pd.DataFrame(df_rows, columns=['A', 'B', 'C', 'D'])
print(df)
Hope this has helped.
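The pattern shared by these answers can be tried without a vCenter connection. The sketch below is only an illustration: the inventory dictionary is a hypothetical stand-in for the SmartConnect and CreateContainerView calls, and the sizes are invented.

```python
import pandas as pd

GIB = 1024 ** 3

# Hypothetical stand-in for the vCenter inventory:
# host -> list of (datastore name, capacity in bytes, free space in bytes).
inventory = {
    "host-a": [("ds-1", 439 * GIB, 430 * GIB), ("ds-2", 52 * GIB, 46 * GIB)],
    "host-b": [("ds-3", 1945 * GIB, 501 * GIB)],
}

raw_data = []  # created once, before the outer loop
for host, datastores in inventory.items():
    for name, capacity, free in datastores:
        raw_data.append((host, name, capacity // GIB, free // GIB))

# The DataFrame is built once, after both loops have finished.
df = pd.DataFrame(raw_data, columns=["Vcenter", "DS", "TDS", "FDS"])
print(df)
```

Appending tuples to a plain list and constructing the DataFrame once is usually cheaper than growing a DataFrame inside the loop.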

RuntimeError: Factor is exactly singular

I am trying to find vertex similarities using a random walk approach, which relies on a transition matrix. Every time I run my Python implementation I get the error below. I have read similar questions but found no specific answer. Can you help me solve this problem? Your help is really appreciated.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-259-2639b08a8eb7> in <module>()
45
46
---> 47 tuple_steps_prob,b=similarities(training_graph,test_edge_list)
48 print(tuple_steps_prob)
49 # pre_list_=Precision(tuple_steps_prob, test_edge_list,test_num,b)
<ipython-input-237-e0348fd15773> in similarities(graph, test_edge_list)
16 prob_vec[0][k] = 1
17 #print(prob_vec)
---> 18 extracted,prob,y=RandomWalk(graph,nodes,adj,prob_vec)
19
20 j=0
<ipython-input-236-6b0298295e01> in RandomWalk(G, nodes, adj, prob_vec)
31 beta_=0.1
32
---> 33 TM = Transition_Matrix(adj,beta_)
34
35 extracted1=[]
~\Desktop\RW\RW\Transition_Probability_Matrix.py in Transition_Matrix(adj, beta_)
18
19 Iden=np.identity(len(TM))
---> 20
21
22 Transition=beta_/(1+beta_) * Iden + 1/(1+beta_) * TM
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\matfuncs.py in inv(A)
72 """
73 I = speye(A.shape[0], A.shape[1], dtype=A.dtype, format=A.format)
---> 74 Ainv = spsolve(A, I)
75 return Ainv
76
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py in spsolve(A, b, permc_spec, use_umfpack)
196 else:
197 # b is sparse
--> 198 Afactsolve = factorized(A)
199
200 if not isspmatrix_csc(b):
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py in factorized(A)
438 return solve
439 else:
--> 440 return splu(A).solve
441
442
~\Anaconda3\lib\site-packages\scipy\sparse\linalg\dsolve\linsolve.py in splu(A, permc_spec, diag_pivot_thresh, relax, panel_size, options)
307 _options.update(options)
308 return _superlu.gstrf(N, A.nnz, A.data, A.indices, A.indptr,
--> 309 ilu=False, options=_options)
310
311
RuntimeError: Factor is exactly singular
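The last frames of the traceback are inside scipy.sparse.linalg.inv, and the message means the LU factorization found the matrix to be exactly singular, so it has no inverse. What makes the matrix singular here cannot be read off the traceback. As a hedged illustration only: in random-walk code, inverting I - P for a row-stochastic P always fails, because the rows of P sum to 1. The small graph and the damping factor 0.9 below are assumptions for the sketch, not values from the question.

```python
import numpy as np

# Hypothetical 3-node graph; this adjacency matrix is an illustration only.
adj = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 1.0],
                [1.0, 1.0, 0.0]])
P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

n = len(P)
singular = np.eye(n) - P      # rows of P sum to 1, so (I - P) @ ones == 0
damped = np.eye(n) - 0.9 * P  # damping factor 0.9 is an assumed value

# A rank below n means the matrix cannot be inverted.
rank_singular = np.linalg.matrix_rank(singular)
rank_damped = np.linalg.matrix_rank(damped)
print(rank_singular, rank_damped)
```

Checking the rank (or looking for all-zero rows and columns) of the matrix just before the inv call is a cheap way to confirm whether the input itself is the problem.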

Matplotlib - How to draw a 3D plane

I want to draw a 3D plane using matplotlib. I do not understand why I get an error saying that x and y must be the same length.
In [134]: dat_vis
Out[134]:
param_C param_gamma mean_test_score x y
4 1 0.001 0.875129 0 1
5 1 0.0001 0.844759 0 0
6 10 0.001 0.903091 0.00900901 1
7 10 0.0001 0.875191 0.00900901 0
8 100 0.001 0.899622 0.0990991 1
9 100 0.0001 0.902420 0.0990991 0
10 1000 0.001 0.909187 1 1
11 1000 0.0001 0.896094 1 0
In [135]: ax.plot_trisurf(dat_vis.x, dat_vis.y, dat_vis.mean_test_score)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-135-1693be3ae757> in <module>()
----> 1 ax.plot_trisurf(dat_vis.x, dat_vis.y, dat_vis.mean_test_score)
~/anaconda3/lib/python3.6/site-packages/mpl_toolkits/mplot3d/axes3d.py in plot_trisurf(self, *args, **kwargs)
1976 lightsource = kwargs.pop('lightsource', None)
1977
-> 1978 tri, args, kwargs = Triangulation.get_from_args_and_kwargs(*args, **kwargs)
1979 if 'Z' in kwargs:
1980 z = np.asarray(kwargs.pop('Z'))
~/anaconda3/lib/python3.6/site-packages/matplotlib/tri/triangulation.py in get_from_args_and_kwargs(*args, **kwargs)
162 mask = kwargs.pop('mask', None)
163
--> 164 triangulation = Triangulation(x, y, triangles, mask)
165 return triangulation, args, kwargs
166
~/anaconda3/lib/python3.6/site-packages/matplotlib/tri/triangulation.py in __init__(self, x, y, triangles, mask)
53 # No triangulation specified, so use matplotlib._qhull to obtain
54 # Delaunay triangulation.
---> 55 self.triangles, self._neighbors = _qhull.delaunay(x, y)
56 self.is_delaunay = True
57 else:
ValueError: x and y must be 1D arrays of the same length
I got the DataFrame from sklearn.model_selection.GridSearchCV(), which returns columns with dtype object, so plotting those columns does not work properly. If you hit the same error and can't see where the problem is, go back and make sure the columns have the right (numeric) dtypes.
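A minimal sketch of the dtype check and conversion, assuming the columns hold plain numbers stored as object. The small frame below imitates dat_vis with invented values; it is not the asker's data.

```python
import pandas as pd

# Illustrative values in the spirit of dat_vis, stored as object dtype on purpose.
dat_vis = pd.DataFrame(
    {
        "x": [0, 0, 1, 1],
        "y": [1, 0, 1, 0],
        "mean_test_score": [0.875, 0.845, 0.909, 0.896],
    },
    dtype=object,
)
print(dat_vis.dtypes)  # every column is object

# Convert all columns to floats before handing them to matplotlib.
numeric = dat_vis.astype(float)
print(numeric.dtypes)

# The plotting call from the question would then receive numeric data:
# ax.plot_trisurf(numeric.x, numeric.y, numeric.mean_test_score)
```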

Sci-kit learn how to print labels for confusion matrix?

So I'm using scikit-learn to classify some data. I have 13 different class values/categories to classify the data into. I have been able to use cross-validation and print the confusion matrix. However, it only shows the TP, FP, etc. without the class labels, so I don't know which class is which. Below is my code and my output:
def classify_data(df, feature_cols, file):
    nbr_folds = 5
    RANDOM_STATE = 0
    attributes = df.loc[:, feature_cols] # Also known as x
    class_label = df['task'] # Class label, also known as y.
    file.write("\nFeatures used: ")
    for feature in feature_cols:
        file.write(feature + ",")
    print("Features used", feature_cols)
    sampler = RandomOverSampler(random_state=RANDOM_STATE)
    print("RandomForest")
    file.write("\nRandomForest")
    rfc = RandomForestClassifier(max_depth=2, random_state=RANDOM_STATE)
    pipeline = make_pipeline(sampler, rfc)
    class_label_predicted = cross_val_predict(pipeline, attributes, class_label, cv=nbr_folds)
    conf_mat = confusion_matrix(class_label, class_label_predicted)
    print(conf_mat)
    accuracy = accuracy_score(class_label, class_label_predicted)
    print("Rows classified: " + str(len(class_label_predicted)))
    print("Accuracy: {0:.3f}%\n".format(accuracy * 100))
    file.write("\nClassifier settings:" + str(pipeline) + "\n")
    file.write("\nRows classified: " + str(len(class_label_predicted)))
    file.write("\nAccuracy: {0:.3f}%\n".format(accuracy * 100))
    file.writelines('\t'.join(str(j) for j in i) + '\n' for i in conf_mat)
#Output
Rows classified: 23504
Accuracy: 17.925%
0 372 46 88 5 73 0 536 44 317 0 200 127
0 501 29 85 0 136 0 655 9 154 0 172 67
0 97 141 78 1 56 0 336 37 429 0 435 198
0 135 74 416 5 37 0 507 19 323 0 128 164
0 247 72 145 12 64 0 424 21 296 0 304 223
0 190 41 36 0 178 0 984 29 196 0 111 43
0 218 13 71 7 52 0 917 139 177 0 111 103
0 215 30 84 3 71 0 1175 11 55 0 102 62
0 257 55 156 1 13 0 322 184 463 0 197 160
0 188 36 104 2 34 0 313 99 827 0 69 136
0 281 80 111 22 16 0 494 19 261 0 313 211
0 207 66 87 18 58 0 489 23 157 0 464 239
0 113 114 44 6 51 0 389 30 408 0 338 315
As you can see, you can't really know what column is what and the print is also "misaligned" so it's difficult to understand.
Is there a way to print the labels as well?
From the docs, it seems there is no option to print the row and column labels of the confusion matrix. However, you can specify the label order using the labels=... argument.
Example:
from sklearn.metrics import confusion_matrix
y_true = ['yes','yes','yes','no','no','no']
y_pred = ['yes','no','no','no','no','no']
print(confusion_matrix(y_true, y_pred))
# Output:
# [[3 0]
# [2 1]]
print(confusion_matrix(y_true, y_pred, labels=['yes', 'no']))
# Output:
# [[1 2]
# [0 3]]
If you want to print the confusion matrix with labels, you may try pandas and set the index and columns of the DataFrame.
import pandas as pd
cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=['yes', 'no']),
    index=['true:yes', 'true:no'],
    columns=['pred:yes', 'pred:no']
)
print(cmtx)
# Output:
# pred:yes pred:no
# true:yes 1 2
# true:no 0 3
Or
import numpy as np
unique_label = np.unique([y_true, y_pred])
cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=unique_label),
    index=['true:{:}'.format(x) for x in unique_label],
    columns=['pred:{:}'.format(x) for x in unique_label]
)
print(cmtx)
# Output:
# pred:no pred:yes
# true:no 3 0
# true:yes 2 1
It is important to ensure that the way you label your confusion matrix rows and columns corresponds exactly to the way sklearn has coded the classes. The true order of the labels can be revealed using the .classes_ attribute of the classifier. You can use the code below to prepare a confusion matrix data frame.
labels = rfc.classes_  # available only on a fitted classifier
conf_df = pd.DataFrame(confusion_matrix(class_label, class_label_predicted, labels=labels), columns=labels, index=labels)
conf_df.index.name = 'True labels'
The second thing to note is that your classifier is not predicting labels well. The number of correctly predicted labels is shown on the main diagonal of the confusion matrix. You have non-zero values across the matrix, and some classes have not been predicted at all (the columns that are all zero). It might be a good idea to run the classifier with its default parameters and then try to optimise them.
Another, arguably better, way of doing this is the crosstab function in pandas (here y_true and y_pred are assumed to be NumPy arrays or pandas Series, and le a fitted LabelEncoder).
pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
or
pd.crosstab(le.inverse_transform(y_true),
            le.inverse_transform(y_pred),
            rownames=['True'],
            colnames=['Predicted'],
            margins=True)
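One caveat with crosstab is worth a small sketch: a class that is never predicted gets no column at all, which matters for data like the question's, where some columns are entirely zero. The labels and predictions below are invented for illustration; reindexing against the full label list restores the missing row or column.

```python
import pandas as pd

labels = ["a", "b", "c"]  # full, ordered list of classes (illustrative)
y_true = pd.Series(["a", "a", "b", "c"], name="True")
y_pred = pd.Series(["a", "b", "b", "b"], name="Predicted")

# Class "c" is never predicted, so the raw crosstab has no "c" column.
raw = pd.crosstab(y_true, y_pred)

# Reindex against the full label list so the table is square again.
ct = raw.reindex(index=labels, columns=labels, fill_value=0)
print(ct)
```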
Since the confusion matrix is just a NumPy array, it does not carry any row or column labels. What you can do is convert the matrix into a DataFrame and then print that DataFrame.
import pandas as pd
import numpy as np
def cm2df(cm, labels):
    rows = []
    # rows
    for i, row_label in enumerate(labels):
        rowdata={}
        # columns
        for j, col_label in enumerate(labels):
            rowdata[col_label]=cm[i,j]
        rows.append(pd.DataFrame.from_dict({row_label:rowdata}, orient='index'))
    # DataFrame.append was removed in pandas 2.0, so concatenate once instead
    return pd.concat(rows)[labels]
cm = np.arange(9).reshape((3, 3))
df = cm2df(cm, ["a", "b", "c"])
print(df)
Code snippet is adapted from https://gist.github.com/nickynicolson/202fe765c99af49acb20ea9f77b6255e
Output:
a b c
a 0 1 2
b 3 4 5
c 6 7 8
It appears your data has 13 different classes, which is why your confusion matrix has 13 rows and columns. Furthermore, your classes aren't labeled in any way, just integers from what I can see.
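If this isn't the case, and your training data has actual labels, you can pass a list of unique labels to confusion_matrix through the labels keyword (recent scikit-learn versions no longer accept it as a positional argument):
conf_mat = confusion_matrix(class_label, class_label_predicted, labels=df['task'].unique())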
