list index out of range reading files - python

I am a beginner of python and would need some help. I have run into a problem when trying to manipulating some dat-files.
I have created 159 dat.files (refitted_to_digit0_0.dat, refitted_to_digit0_1.dat, ...refitted_to_digit0_158.dat) containing time series data of two columns (timestep, value) of 2999 rows. In my python program I have created a list of these files with filelist_refit_0=glob.glob('refitted_to_digit0_*')
plist_refit_0=[]
I now try to load the second column of each 159 files into the plist_refit_0 so that each place in the list contains an array of 2999 values (second columns) that I will use for further manipulations. I have created a for-loop for this and use the len(filelist_refit_0) as the range for the loop. The length being 159 (number of files: 0-158).
However, when I run this I get an error message: list index out of range.
I have tried with a lower range for the for-loop and it seems to work up until range 66 but not above that. filelist_refit_0[66] refer to file refitted_to_digit0_158.dat and filelist_refit_0[67] refer to refitted_to_digit0_16.dat. filelist_refit_0[158] refer to refitted_to_digit0_99.dat. Instead of being sorted in ascending order based on the value 0->158 I think the plist_refit_0 have the files in ascending order based on the digits: refitted_to_digit0_0.dat first, then refitted_to_digit0_1.dat, then refitted_to_digit0_10.dat, then refitted_to_digit0_100.dat, then refitted_to_digit0_101.dat resulting in refitted_to_digit0_158.dat being on place 66 in the list. However, I still don't understand why the compiler interprets the index as being out of range above 66 when the length of the filelist_refit_0 being 159 and there really are 159 files, no matter the order. If anyone can explain this and have some advice how to solve this problem, I highly appreciate it! Thanks for your help.
I have tried the following to understand the sorting:
print len(filelist_refit_0) => 159
print filelist_refit_0[66] => refitted_to_digit0_158.dat
print filelist_refit_0[67] => refitted_to_digit0_16.dat
print filelist_refit_0[158] => refitted_to_digit0_99.dat
print filelist_refit_0[0] => refitted_to_digit0_0.dat
I have "manually" loaded the files and it seems to work for most index e.g.
t, p = loadtxt(filelist_refit_0[65], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
t, p = loadtxt(filelist_refit_0[67], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
print plist_refit_0[0]
print plist_refit_0[1]
BUT it does not work for index66!:
t, p = loadtxt(filelist_refit_0[66], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
Then I get error: list index out of range.
As can be seen above it refers to refitted_to_digit0_158.dat which is the last file. I have looked into the file and it looks exactly the same as all the other files, which the same number of columns and raw-elements (2999). Why is this entry different?
Python 2:
filelist_refit_0 = glob.glob('refitted_to_digit0_*')
plist_refit_0 = []
for i in range(len(filelist_refit_0)):
t, p = loadtxt(filelist_refit_0[i], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
Traceback (most recent call last):
File "test.py", line 107, in <module>
t,p=loadtxt(filelist_refit_0[i],usecols=(0,1),unpack=True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1092, in loadtxt
for x in read_data(_loadtxt_chunksize):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1012, in read_data
vals = [vals[j] for j in usecols]
IndexError: list index out of range

Related

Too many values to unpack? python-radar

I have problem which when i run the code it shows an error:
Traceback (most recent call last):
File "C:\Users\server\PycharmProjects\Publictest2\main.py", line 19, in <module>
Distance = radar.route.distance(Starts, End, modes='transit')
File "C:\Users\server\PycharmProjects\Publictest2\venv\lib\site-packages\radar\endpoints.py", line 612, in distance
(origin_lat, origin_lng) = origin
ValueError: too many values to unpack (expected 2)
My Code:
from radar import RadarClient
import pandas as pd
API_key = 'API'
radar = RadarClient(API_key)
file = pd.read_excel('files')
file['AntGeo'] = Sourced[['Ant_lat', 'Ant_long']].apply(','.join, axis=1)
file['BaseGeo'] = Sourced[['Base_lat', 'Base_long']].apply(','.join, axis=1)
antpoint = file['AntGeo']
basepoint = file['BaseGeo']
for antpoint in antpoint:
dist= radar.route.distance(antpoint , basepoint, modes='transit')
dist= dist['routes'][0]['distance']
dist= dist / 1000
Firstly, your error code does not match your given code sample correctly.
It is apparent you are working with the python library for the Radar API.
Your corresponding line 19 is dist= radar.route.distance(antpoint , basepoint, modes='transit')
From the radar-python 'pypi manual', your route should be referenced as:
## Routing
radar.route.distance(origin=[lat,lng], destination=[lat,lng], modes='car', units='metric')
Without having sight of your dataset, file, one can nonetheless deduce or expect the following:
Your antpoint and basepoint must be a two-item list (or tuple).
For instance, your antpoint ought to have a coordinate like [40.7041029, -73.98706]
See the radar-python manual
line 11 and 13 in your code
file['AntGeo'] = Sourced[['Ant_lat', 'Ant_long']].apply(','.join, axis=1)
file['BaseGeo'] = Sourced[['Base_lat', 'Base_long']].apply(','.join, axis=1)
Your error is occuring at this part:
Distance = radar.route.distance(Starts, End, modes='transit')
(origin_lat, origin_lng) = origin
First of all check the amount of variables that "origin" delivers to you, it's mismatched with the expectation I guess.

Issue appending info from networkx dict to a numpy array

Trying to take strings from my networkx dict and append them to a certain place within an array:
def sameSports(node1, node2, G):
A = [G.node[node1]['sport1'], G.node[node1]['sport2'], G.node[node1]['sport3']]
B = [G.node[node2]['sport1'], G.node[node2]['sport2'], G.node[node2]['sport3']]
sharedSports = np.intersect1d(A,B)
sharedSports = np.delete(sharedSports, np.where(sharedSports == 'N/A'))
return sharedSports
def meetingDay(sharedSports,rightAxis):
meetingDays = [[] for i in range (len(sharedSports))]
columns=['Monday','Tuesday','Wednesday','Thursday','Friday']
for i in range (0,5):
for t in sharedSports:
n=np.where(sharedSports==t)
meetingDays[n].append(t)
if t in rightAxis[i]:
meetingDays[n].append(columns[i])
return meetingDays
The G.node[n]['sport'] dicts return single letter strings 'A'-'W', and the rightAxis array is a 5-section array containing some of the same letters. Getting this error:
Traceback (most recent call last):
File ".\main.py", line 18, in
a,b=add_sport_to_edges(G , rightAxis)
File "C:\Users\rjtkr\OneDrive\Documents\Work\Summer 2020 Research\epidemic\Athletics Network\sportsnetwork.py", line 99, in add_sport_to_edges
temp = meetingDay(sharedSports, rightAxis)
File "C:\Users\rjtkr\OneDrive\Documents\Work\Summer 2020 Research\epidemic\Athletics Network\sportsnetwork.py", line 90, in meetingDay
meetingDays[n].append(t)
TypeError: list indices must be integers or slices, not tuple
Unsure if it's 'n' or 't' that it doesn't like, but fairly sure that neither is a tuple. Hopefully I included all relevant code. The other lines showing up in the error come after the attached code, and the error remains the same when they are removed.

ValueError in Python 3 code

I have this code that will allow me to count the number of missing rows of numbers within the csv for a script in Python 3.6. However, these are the following errors in the program:
Error:
Traceback (most recent call last):
File "C:\Users\GapReport.py", line 14, in <module>
EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
File "C:\Users\GapReport.py", line 14, in <genexpr>
EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
ValueError: invalid literal for int() with base 10: 'AC-SEC 000000001'
Code:
import csv
def out(*args):
print('{},{}'.format(*(str(i).rjust(4, "0") for i in args)))
prev = 0
data = csv.reader(open('Padded Numbers_export.csv'))
print(*next(data), sep=', ') # header
for line in data:
EndDoc_Padded, EndDoc_Padded = (int(s.strip()[2:]) for s in line)
if start != prev+1:
out(prev+1, start-1)
prev = end
out(start, end)
I'm stumped on how to fix these issues.Also, I think the csv many lines in it, so if there's a section that limits it to a few numbers, please feel free to update me on so.
CSV Snippet (Sorry if I wasn't clear before!):
The values you have in your CSV file are not numeric.
For example, FMAC-SEC 000000001 is not a number. So when you run int(s.strip()[2:]), it is not able to convert it to an int.
Some more comments on the code:
What is the utility of doing EndDoc_Padded, EndDoc_Padded = (...)? Currently you are assigning values to two variables with the same name. Either name one of them something else, or just have one variable there.
Are you trying to get the two different values from each column? In that case, you need to split line into two first. Are the contents of your file comma separated? If yes, then do for s in line.split(','), otherwise use the appropriate separator value in split().
You are running this inside a loop, so each time the values of the two variables would get updated to the values from the last line. If you're trying to obtain 2 lists of all the values, then this won't work.

IndexError: too many indices for array

I know there is a ton of these threads but all of them are for very simple cases like 3x3 matrices and things of that sort and the solutions do not even begin to apply to my situation. So I'm trying to graph G versus l1 (that's not an eleven, but an L1). The data is in the file that I loaded from an excel file. The excel file is 14x250 so there are 14 arguments, each with 250 data points. I had another user (shout out to Hugh Bothwell!) help me with an error in my code, but now another error has surfaced.
So here is the code in question:
# format for CSV file:
header = ['l1', 'l2', 'l3', 'l4', 'l5', 'EI',
'S', 'P_right', 'P1_0', 'P3_0',
'w_left', 'w_right', 'G_left', 'G_right']
def loadfile(filename, skip=None, *args):
skip = set(skip or [])
with open(filename, *args) as f:
cr = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
return np.array(row for i,row in enumerate(cr) if i not in skip)
#plot data
outputs_l1 = [loadfile('C:\\Users\\Chris\\Desktop\\Work\\Python Stuff\\BPCROOM - Shingles analysis\\ERR analysis\\l_1 analysis//BS(1) ERR analysis - l_1 - P_3 = {}.csv'.format(p)) for p in p3_arr]
col = {name:i for i,name in enumerate(header)}
fig = plt.figure()
for data,color in zip(outputs_l1, colors):
xs = data[:, col["l1" ]]
gl = data[:, col["G_left" ]] * 1000.0 # column 12
gr = data[:, col["G_right"]] * 1000.0 # column 13
plt.plot(xs, gl, color + "-", gr, color + "--")
for output, col in zip(outputs_l1, colors):
plt.plot(output[:,0], output[:,11]*1E3, col+'--')
plt.ticklabel_format(axis='both', style='plain', scilimits=(-1,1))
plt.xlabel('$l1 (m)$')
plt.ylabel('G $(J / m^2) * 10^{-3}$')
plt.xlim(xmin=.2)
plt.ylim(ymax=2, ymin=0)
plt.subplots_adjust(top=0.8, bottom=0.15, right=0.7)
After running the entire program, I recieve the error message:
Traceback (most recent call last):
File "C:/Users/Chris/Desktop/Work/Python Stuff/New Stuff from Brenday 8 26 2014/CD_ssa_plot(2).py", line 115, in <module>
xs = data[:, col["l1" ]]
IndexError: too many indices for array
and before I ran into that problem, I had another involving the line a few below the one the above error message refers to:
Traceback (most recent call last): File "FILE", line 119, in <module>
gl = data[:, col["G_left" ]] * 1000.0 # column 12
IndexError: index 12 is out of bounds for axis 1 with size 12
I understand the first error, but am just having problems fixing it. The second error is confusing for me though. My boss is really breathing down my neck so any help would be GREATLY appreciated!
I think the problem is given in the error message, although it is not very easy to spot:
IndexError: too many indices for array
xs = data[:, col["l1" ]]
'Too many indices' means you've given too many index values. You've given 2 values as you're expecting data to be a 2D array. Numpy is complaining because data is not 2D (it's either 1D or None).
This is a bit of a guess - I wonder if one of the filenames you pass to loadfile() points to an empty file, or a badly formatted one? If so, you might get an array returned that is either 1D, or even empty (np.array(None) does not throw an Error, so you would never know...). If you want to guard against this failure, you can insert some error checking into your loadfile function.
I highly recommend in your for loop inserting:
print(data)
This will work in Python 2.x or 3.x and might reveal the source of the issue. You might well find it is only one value of your outputs_l1 list (i.e. one file) that is giving the issue.
The message that you are getting is not for the default Exception of Python:
For a fresh python list, IndexError is thrown only on index not being in range (even docs say so).
>>> l = []
>>> l[1]
IndexError: list index out of range
If we try passing multiple items to list, or some other value, we get the TypeError:
>>> l[1, 2]
TypeError: list indices must be integers, not tuple
>>> l[float('NaN')]
TypeError: list indices must be integers, not float
However, here, you seem to be using matplotlib that internally uses numpy for handling arrays. On digging deeper through the codebase for numpy, we see:
static NPY_INLINE npy_intp
unpack_tuple(PyTupleObject *index, PyObject **result, npy_intp result_n)
{
npy_intp n, i;
n = PyTuple_GET_SIZE(index);
if (n > result_n) {
PyErr_SetString(PyExc_IndexError,
"too many indices for array");
return -1;
}
for (i = 0; i < n; i++) {
result[i] = PyTuple_GET_ITEM(index, i);
Py_INCREF(result[i]);
}
return n;
}
where, the unpack method will throw an error if it the size of the index is greater than that of the results.
So, Unlike Python which raises a TypeError on incorrect Indexes, Numpy raises the IndexError because it supports multidimensional arrays.
Before transforming the data into a list, I transformed the data into a list
data = list(data) data = np.array(data)

Tuples Vs List Vs Numpy Arrays for Plotting a Boxplot in Python

I am trying to plot a boxplot for a column in several csv files (without the header row of course), but running into some confusion around tuples, lists and arrays. Here is what I have so far
#!/usr/bin/env python
import csv
from numpy import *
import pylab as p
import matplotlib
#open one file, until boxplot-ing works
f = csv.reader (open('2-node.csv'))
#get all the columns in the file
timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,bytes,Latency = zip(*f)
#Make list out of elapsed to pop the 1st element -- the header
elapsed_list = list(elapsed)
elapsed_list.pop(0)
#Turn list back to a tuple
elapsed = tuple(elapsed_list)
#Turn list to an numpy array
elapsed_array = array(elapsed_list)
#Elapsed Column statically entered into an array
data_array = ([4631, 3641, 1902, 1937, 1745, 8937] )
print data_array #prints in this format: ([xx,xx,xx,xx]), .__class__ is list ... ?
print elapsed #prints in this format: ('xx','xx','xx','xx'), .__class__ is tuple
print elapsed_list # #print in this format: ['xx', 'xx', 'xx', 'xx', 'xx'], .__class__ is list
print elapsed_array #prints in this format: ['xx' 'xx' 'xx' 'xx' 'xx'] -- notice no commas, .__class__ is numpy.ndarray
p.boxplot (data_array) #works
p.boxplot (elapsed) # does not work, error below
p.boxplit (elapsed_list) #does not work
p.boxplot (elapsed_array) #does not work
p.show()
For boxplots, the 1st argument is an "an array or a sequence of vectors", so I would think elapsed_array would work ... ? But yet data_array, a "list," works ... but elapsed_list` a "list" does not ... ? Is there a better way to do this ... ?
I am fairly new to python, and I would like to understand the what about the differences among a tuple, list, and numpy-array prevents this boxplot from working.
Example error message is:
Traceback (most recent call last):
File "../pullcol.py", line 32, in <module>
p.boxplot (elapsed_list)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/pyplot.py", line 1962, in boxplot
ret = ax.boxplot(x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/axes.py", line 5383, in boxplot
q1, med, q3 = mlab.prctile(d,[25,50,75])
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 946, in prctile
return _interpolate(values[ai],values[bi],frac)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 920, in _interpolate
return a + (b - a)*fraction
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
elapsed contains strings. Matplotlib needs integers or floats to plot something. Try converting each value of elapsed to integer. You can do this like so
elapsed = tuple([int(i) for i in elapsed])
or as FredL commented below:
elapsed_list = array(elapsed_list, dtype=float)
I'm not familiar with numpy or matplotlib, but just from the description and what's working, it appears it is looking for a nested sequence of sequences. Which is why data_array works as it's a tuple containing a list, where as all your other input is only one layer deep.
As for the differences, a list is a mutable sequence of objects, a tuple is an immutable sequence of objects and an array is a mutable sequence of bytes, ints, chars (basically 1, 2, 4 or 8 byte values).
Here's a link to the Python docs about 5.6. Sequence Types, from there you can jump to more detailed info about lists, tuples, arrays or any of the other sequence types in Python.

Categories