I know there are tons of questions with similar titles, but none of them solved my particular problem.
So I have this code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# my_list contains 983 list items
df = pd.DataFrame(np.array(my_list), columns=list('ABCDEF'))
df contains 983 rows, one for each list in my_list:
df.head()
A B C D E F
0 47 5 17 16 57 58
1 6 23 34 21 46 37
2 57 5 53 42 18 55
3 43 24 36 16 39 22
4 32 53 5 18 34 29
scaler = StandardScaler().fit(df.values)
transformed_dataset = scaler.transform(df.values)
transformed_df = pd.DataFrame(data=transformed_dataset, index=df.index)
number_of_rows = df.values.shape[0]  # total number of rows (lists)
window_length = 983  # how many past rows to take into consideration for prediction
number_of_features = df.values.shape[1]  # number of columns (features)
train = np.empty([number_of_rows-window_length, window_length, number_of_features], dtype=float)
label = np.empty([number_of_rows-window_length, number_of_features], dtype=float)
window_length = 982
for i in range(0, number_of_rows-window_length):
    train[i] = transformed_df.iloc[i:i+window_length, 0:number_of_features]
    label[i] = transformed_df.iloc[i+window_length:i+window_length+1, 0:number_of_features]
train.shape
(0, 983, 6)
label.shape
(0, 6)
train[0] works fine, but when I do train[1] I get this error:
train[1]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-43-e73aed9430c6> in <module>
----> 1 train[1]
IndexError: index 1 is out of bounds for axis 0 with size 0
Also, when I do label[0] it's fine, but when I do label[1] I get this error:
label[1]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-45-1e13a70afa10> in <module>
----> 1 label[1]
IndexError: index 1 is out of bounds for axis 0 with size 0
How do I fix these IndexErrors?
You're creating arrays whose first dimension has size 0; that's why you're getting these errors.
At the point where you allocate train and label, window_length is still 983, so number_of_rows - window_length is 0. Reassigning window_length to 982 afterwards doesn't change arrays that have already been allocated. I guess a size-0 first dimension is not what you want.
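The fix is to choose window_length before you allocate the arrays, so the first dimension is positive. Here is a minimal sketch of that idea; the window of 100 is just an assumed value to show it, since with 983 rows a 982-row window would leave you only one training sample:

window_length = 100  # assumed smaller window; any value < number_of_rows works
number_of_rows = df.values.shape[0]
number_of_features = df.values.shape[1]

train = np.empty([number_of_rows - window_length, window_length, number_of_features], dtype=float)
label = np.empty([number_of_rows - window_length, number_of_features], dtype=float)

for i in range(number_of_rows - window_length):
    # the window of past rows ...
    train[i] = transformed_df.iloc[i:i + window_length, 0:number_of_features]
    # ... and the single row right after the window, used as the label
    label[i] = transformed_df.iloc[i + window_length, 0:number_of_features]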
I currently use a Jupyter notebook to analyse company data. My first step is to clean and format the data. My code so far is:
%matplotlib inline
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd
# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set(style="dark", color_codes=True)
Users = pd.read_csv("Users.csv", delimiter = ';', engine = 'python') # create a pandas DataFrame per file
Users['ContractHours'].fillna(0, inplace = True)
Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
Afterwards I tried to replace the NaN values in the column ContractHours with zero and convert the column to a float. Replacing NaN with 0 was successful, but I receive this error:
ValueError Traceback (most recent call last)
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56156)()
ValueError: Unable to parse string "32,5"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-22-bcb66b8c06fb> in <module>()
20 #Users = Users['ContractHours'].replace(',', '.')
21 Users['ContractHours'].fillna(0, inplace = True)
---> 22 Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
23
24 #print(Customers.head(10))
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2353 else:
2354 values = self.asobject
-> 2355 mapped = lib.map_infer(values, f, convert=convert_dtype)
2356
2357 if len(mapped) and isinstance(mapped[0], Series):
pandas\_libs\src\inference.pyx in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66645)()
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
124 coerce_numeric = False if errors in ('ignore', 'raise') else True
125 values = lib.maybe_convert_numeric(values, set(),
--> 126 coerce_numeric=coerce_numeric)
127
128 except Exception:
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56638)()
ValueError: Unable to parse string "32,5" at position 0
How can I parse the string "32,5" as a float in the column 'ContractHours'?
I also tried replacing ',' with '.' beforehand, but that makes all the other columns disappear and the comma is still a comma:
Users = Users['ContractHours'].replace(',', '.')
The result is:
0 34
1 24
2 40
3 35
4 40
5 24
6 32
7 32
8 32
9 24
10 24
11 24
12 24
13 0
14 32
15 28
16 32
17 32
18 28
19 24
20 40
21 40
22 36
23 24
24 32,5
25 36
26 36
27 24
28 40
29 40
30 28
31 32
32 32
33 40
34 32
35 24
36 24
37 40
38 25
39 24
Name: ContractHours, dtype: object
All the other columns have disappeared, and 32,5 needs to become 32.5.
Use the decimal parameter of read_csv so the floats are parsed correctly:
Users = pd.read_csv("Users.csv", sep = ';', decimal=',')
Alternatively, your replace approach works if you pass regex=True so substrings are replaced, and assign the result back to the column rather than to the whole DataFrame:
Users['ContractHours'] = Users['ContractHours'].replace(',', '.', regex=True).astype(float)
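As a quick sanity check (a sketch, assuming the same Users.csv), the column should now come back as floats and the row that used to show "32,5" should read 32.5:

Users = pd.read_csv("Users.csv", sep=';', decimal=',')
Users['ContractHours'] = Users['ContractHours'].fillna(0)
print(Users['ContractHours'].dtype)     # expected: float64
print(Users['ContractHours'].iloc[24])  # expected: 32.5 instead of the string "32,5"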
I'm a bit confused by an error I keep running into. I didn't have it before, but my data was wrong at the time, so I had to rewrite the code.
Running the following:
plt.figure(figsize=(20,10))
x = np.arange(1416, 1426, 0.009766)
gaverage = np.empty((21,1024), dtype = np.float64)
calibdata = open(pathc + 'calib_5m.dat').readlines()
#print(np.size(calibdata)) ||| Yields: 624
#print(np.size(calibdata)//16) ||| Yields: 39
calib = np.empty(shape=(np.size(calibdata)//16,1024), dtype=np.float64)
for i in range(0, np.size(calibdata)//4):
    calib[i] = calibdata[i*4+3].split()
    caverage = np.average(calib[i], axis=0)
Yields this:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-25-87f3f4739851> in <module>()
11 calib = np.empty(shape=(np.size(calibdata)//16,1024), dtype=np.float64)
12 for i in range(0, np.size(calibdata)//4):
---> 13 calib[i] = calibdata[i*4+3].split()
14 caverage = np.average(calib[i] ,axis = 0)
15
IndexError: index 39 is out of bounds for axis 0 with size 39
Now, what I'm trying to do here is basically take every 4th line of the file read into calibdata and write it to a new array, calib[i]. If the indices are the same size, how are they out of bounds? I think there's some fundamentally flawed logic here on my part, so if anyone can point out where I'm falling short, that would be great.
calib is initialized with 39 rows (shape (39, 1024)), but the loop variable i goes well beyond that:
In [243]: for i in range(np.size(calibdata)//4):
...: print(i, i*4+3)
...:
0 3
1 7
2 11
3 15
4 19
5 23
6 27
7 31
8 35
....
147 591
148 595
149 599
150 603
151 607
152 611
153 615
154 619
155 623
In [244]: calib=np.zeros((np.size(calibdata)//16),int)
In [245]: calib.shape
Out[245]: (39,)
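In other words, you allocate np.size(calibdata)//16 rows but loop over np.size(calibdata)//4 of them. A minimal sketch of a consistent version, assuming every 4th line (offset 3) really is a row of 1024 numbers and that caverage is meant to be the average over all rows:

n_rows = np.size(calibdata) // 4                # one row per 4-line block
calib = np.empty((n_rows, 1024), dtype=np.float64)

for i in range(n_rows):
    calib[i] = calibdata[i*4 + 3].split()       # every 4th line, starting at index 3

caverage = np.average(calib, axis=0)            # average across rows, computed once, outside the loop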
I'm trying to find the convex hull of a series of points based on two columns of a pandas dataframe.
My current code is:
# Create column of point co-ordinates
df['xy'] = df.apply(lambda x: [x['col_1'], x['col_2']], axis=1)
# Return a numpy array of the point coordinates
point_list = df.xy.values
# pass the list to ConvexHull (imported using: from scipy.spatial import ConvexHull)
hull = ConvexHull(point_list)
I get this error when I run it:
Traceback (most recent call last):
File "<ipython-input-41-517201a29182>", line 1, in <module>
hull = ConvexHull(point_list)
File "qhull.pyx", line 2220, in scipy.spatial.qhull.ConvexHull.__init__ (scipy\spatial\qhull.c:19058)
File "C:\Users\****\AppData\Local\Continuum\Anaconda\lib\site- packages\numpy\core\numeric.py", line 550, in ascontiguousarray
return array(a, dtype, copy=False, order='C', ndmin=1)
ValueError: setting an array element with a sequence.
Any thoughts on this?
What you're doing looks overly complicated; you can pass the DataFrame columns directly to ConvexHull:
In [311]:
from scipy.spatial import ConvexHull
df = pd.DataFrame({'col_1':np.random.randn(30), 'col_2':np.random.randn(30), 'col3':0})
df
Out[311]:
col3 col_1 col_2
0 0 0.837349 1.526832
1 0 -0.282778 -0.150751
2 0 -0.331192 -0.382630
3 0 -0.933054 -0.234423
4 0 1.074336 -1.180293
5 0 0.296417 0.626924
6 0 0.806266 -0.501335
7 0 -1.192482 -1.793160
8 0 0.920646 1.377393
9 0 -1.255671 0.428256
10 0 -1.518031 0.888582
11 0 1.231974 0.566314
12 0 -0.717847 -0.236354
13 0 0.758947 -0.286670
14 0 -1.546001 1.774912
15 0 -0.707825 -0.529058
16 0 0.446111 0.406430
17 0 0.711017 0.774281
18 0 -2.616337 0.293725
19 0 -0.370344 -0.471336
20 0 -0.281950 -0.243941
21 0 -1.088772 -1.471154
22 0 -0.422274 -0.266592
23 0 0.423735 -0.341429
24 0 1.166969 -0.329791
25 0 0.689842 1.143460
26 0 0.462430 -0.843409
27 0 3.071030 1.615058
28 0 -0.812258 0.272436
29 0 0.707237 -1.717054
Then I can pass the columns directly:
hull = ConvexHull(df[['col_1','col_2']])
import matplotlib.pyplot as plt
plt.plot(df['col_1'], df['col_2'], 'o')
for simplex in hull.simplices:
    plt.plot(df['col_1'].iloc[simplex], df['col_2'].iloc[simplex], 'k-')
Which produces this plot:
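As for why the original call failed: df.xy.values is a 1-D object array whose elements are Python lists, and that is what triggers "setting an array element with a sequence". If you do want to keep the 'xy' column, a sketch of a workaround is to stack it into a proper (n, 2) float array first:

import numpy as np
from scipy.spatial import ConvexHull

points = np.vstack(df['xy'].values)  # (n, 2) float array instead of an object array of lists
hull = ConvexHull(points)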
I would like to plot a histogram with the value TP on the y axis and the method on the x axis. In particular, I would like to obtain a different figure for each value of the column 'data'.
In this case I want a first histogram with the values 2,1,6,9,8,1,0 and a second histogram with the values 10,10,16,...
The Python version of ggplot seems to be slightly different from the R one.
FN FP TN TP data method
method
SS0208 18 0 80 2 A p=100 n=100 SNR=0.5 SS0208
SS0408 19 0 80 1 A p=100 n=100 SNR=0.5 SS0408
SS0206 14 9 71 6 A p=100 n=100 SNR=0.5 SS0206
SS0406 11 6 74 9 A p=100 n=100 SNR=0.5 SS0406
SS0506 12 6 74 8 A p=100 n=100 SNR=0.5 SS0506
SS0508 19 0 80 1 A p=100 n=100 SNR=0.5 SS0508
LKSC 20 0 80 0 A p=100 n=100 SNR=0.5 LKSC
SS0208 10 1 79 10 A p=100 n=100 SNR=10 SS0208
SS0408 10 0 80 10 A p=100 n=100 SNR=10 SS0408
SS0206 4 5 75 16 A p=100 n=100 SNR=10 SS0206
As a first step I tried to plot only one histogram, and I received an error.
df = df[df.data == df.data.unique()[0]]
In [65]: ggplot() + geom_bar(df, aes(x='method', y='TP'), stat='identity')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-65-dd47b8d85375> in <module>()
----> 1 ggplot() + geom_bar(df, aes(x='method', y='TP'), stat='identity')
TypeError: __init__() missing 2 required positional arguments: 'aesthetics' and 'data'
I have tried different combinations of commands, but I could not solve it.
Once this first problem is solved, I would like the histograms grouped according to the value of 'data'. This could probably be done with 'facet_wrap'.
This is probably because you called ggplot() without an argument. (Not sure if that should even be possible; if you think it should, please add an issue on http://github.com/yhat/ggplot.)
Anyway, this should work:
ggplot(df, aes(x='method', y='TP')) + geom_bar(stat='identity')
Unfortunately, faceting with geom_bar doesn't work properly yet (only when all facets have all levels / x values!) -> bug report
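If that bug doesn't bite your data (i.e. every method appears in every 'data' group), the grouped version would presumably look like this, with 'data' as the faceting column:

ggplot(df, aes(x='method', y='TP')) + geom_bar(stat='identity') + facet_wrap('data')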