Index out of bounds for a 2D Array? - python

I am having a 2D List and i am trying to retrieve a column with the index spcified as a parameter (type : IntEnum).
I get the index out of bounds error when trying to retrieve any column other then the one at index 0.
Enum:
class Column(IntEnum):
ROAD = 0
SECTION = 1
FROM = 2
TO = 3
TIMESTAMP = 4
VFLOW=5
class TrafficData:
data=[[]]
Below are member methods of TrafficData
Reading from file and storing the matrix:
def __init__(self,file):
self.data=[[word for word in line.split('\t')]for line in file.readlines()[1:]]
Retrieve-ing the desired column:
def getColumn(self,columnName):
return [line[columnName] for line in self.data]
Call:
)
column1 = traficdata.getColumn(columnName=Column.ROAD)
`column2 = traficdata.getColumn(columnName=Column.FROM)` //error
`column3 = traficdata.getColumn(columnName=Column.TO)` //error
I attached a picture with the data after __init__ processing:
[

I tested the code that you provided above, and didn't see any issues. That leads me to believe that there might be something wrong with the data that you have in the file. Could you paste the file data? (the tab delimited data)
UPDATE -
I found the issue - as suspected, it was a data issue (there is a minor code update involved too). Make the following changes -
1) When opening the file use the appropriate encoding, I used utf-16.
2) At the end of the data file that you shared, it contains the text - "(72413 row(s) affected)" along with a couple of new line characters. So, you have 2 options, either manually cleanup the data file, or update the code to ignore the "(72413 row(s) affected)" & "\n" characters.
Hope that helps.

Related

Indexing error after removing line from 2D array

I am facing an 'List Index out of range' error when trying to iterate a for-loop over a table I've created from a CSV extract, but cannot figure out why - even after trying many different methods.
Here is the step by step description of how the error happens :
I'm removing the first line of an imported CSV file, as this
line contains the columns' names but no data. The CSV has the following structure.
columnName1, columnName2, columnName3, columnName4
This, is, some, data
I, have, in, this
very, interesting, CSV, file
After storing the CSV in a first array called oldArray, I want to populate a newArray that will get all values from oldArray but not the first line, which is the column name line, as previously
mentioned. My newArray should then look like this.
This, is, some, data
I, have, in, this
very, interesting, CSV, file
To create this newArray, I'm using the following code with the append() function.
tempList = []
newArray = []
for i in range(len(oldArray)):
if i > 0: #my ugly way of skipping line 0...
for j in range(len(oldArray[0])):
tempList.append(oldArray[i][j])
newArray.append(tempList)
tempList = []
I also stored the columns in their own separate list.
i = 0
for i in range(len(oldArray[0])):
my_columnList[i] = oldArray[0][i]
And the error comes up next : I now want to populate a treeview table from this newArray, using a for-loop and insert (in a function). But I always get the 'Index List out of range error' and I cannot figure out why.
def populateTable(my_tree, newArray, my_columnList):
i = 0
for i in range(len(newArray)):
my_tree.insert('','end', text=newArray[i][0], values = (newArray[i][1:len(newArray[0]))
#(im using the text option to bypass treeview's column 0 problem)
return my_tree
Error message --> " File "(...my working directory...)", line 301, in populateTable
my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])]))
IndexError: list index out of range "
Using that same function with different datasets and columns worked fine, but not for this here newArray.
I'm fairy certain that the error comes strictly from this 'newArray' and is not linked to another parameter.
I've tested the validity of the columns list, of the CSV import in oldArray through some print() functions, and everything seems normal - values, row dimension, column dimension.
This is a great mystery to me...
Thank you all very much for your help and time.
You can find a problem from your error message: File "(...my working directory...)", line 301, in populateTable my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])])) IndexError: list index out of range
It means there is an index out of range in line 301: data[i][0] or data[i][1:len(data[0])]
(i is over len(data)) or (0 or 1 is over len(data[0]))
My guess is there is some empty list in data(maybe data[-1]?).
if data[i] is [] or [some_one_item], then data[i][1:len(data[0])] try to access to second item which not exists.
there is no problem in your "ugly" way to skip line 0 but I recommend having a look on this way
new_array = old_array.copy()
new_array.remove(new_array[0])
now for fixing your issue
looks like you have a problem in the indexing
when you use a for loop using the range of the length of an array you use normal indexing which starts from one while you identify your i variable to be zero
to make it simple
len(oldArray[0])
this is equal to 4 so when you use it in the for loop it's just like saying
for i in range(4):
to fix this you can either subtract 1 from the length of the old array or just identify the i variable to be 1 at the first
i = 1
for i in range(len(oldArray[0])):
my_columnList[i] = oldArray[0][i]
or
i = 0
for i in range(len(oldArray[0])-1):
my_columnList[i] = oldArray[0][i]
this mistake is also repeated in your populateTree function
so in the same way your code would be
def populateTree(my_tree, newArray, my_columnList):
i = 0
for i in range(len(newArray)-1):
my_tree.insert('','end', text=newArray[i][0], values = (newArray[i][1:len(newArray[0]))
#(im using the text option to bypass treeview's column 0 problem)
return my_tree

Is there a way to obtain the actual A1 range from sheetfu - get_data_range()

I am trying to obtain the actual A1 values using the Sheetfu library's get_data_range().
When I use the code below, it works perfectly, and I get what I would expect.
invoice_sheet = spreadsheet.get_sheet_by_name('Invoice')
invoice_data_range = invoice_sheet.get_data_range()
invoice_values = invoice_data_range.get_values()
print(invoice_data_range)
print(invoice_values)
From the print() statements I get:
<Range object Invoice!A1:Q42>
[['2019-001', '01/01/2019', 'Services']...] #cut for brevity
What is the best way to get that "A1:Q42" value? I really only want the end of the range (Q42), because I need to build the get_range_from_a1() argument "A4:Q14". My sheet has known headers (rows 1-3), and the get_values() includes 3 rows that I don't want in the get_values() list.
I guess I could do some string manipulation to pull out the text between the ":" and ">" in
<Range object Invoice!A1:Q42>
...but that seems a bit sloppy.
As a quick aside, it would be fantastic to be able to call get_data_range() like so:
invoice_sheet = spreadsheet.get_sheet_by_name('Invoice')
invoice_data_range = invoice_sheet.get_data_range(start="A4", end="")
invoice_values = invoice_data_range.get_values()
...but that's more like a feature request. (Which I'm happy to do BTW).
Author here. Alan answers it well.
I added some methods at Range level to the library, that are simply shortcuts to the coordinates properties.
from sheetfu import SpreadsheetApp
spreadsheet = SpreadsheetApp("....access_file.json").open_by_id('long_string_id')
sheet = spreadsheet.get_sheet_by_name('test')
data_range = sheet.get_data_range()
starting_row = data_range.get_row()
starting_column = data_range.get_column()
max_row = data_range.get_max_row()
max_column = data_range.get_max_column()
This will effectively tell you the max row and max column that contains data in your sheet.
If you use the get_data_range method, the first row and first column typically is 1.
I received a response from the owner of Sheetfu, and the following code provides the information that I'm looking for.
Example code:
from sheetfu import SpreadsheetApp
spreadsheet = SpreadsheetApp("....access_file.json").open_by_id('long_string_id')
sheet = spreadsheet.get_sheet_by_name('test')
data_range = sheet.get_data_range()
range_max_row = data_range.coordinates.row + data_range.coordinates.number_of_rows - 1
range_max_column = data_range.coordinates.column + data_range.coordinates.number_of_columns - 1
As of this writing, the .coordinates properties are not currently documented, but they are usable, and should be officially documented within the next couple of weeks.

How to use different fonts for two lines within the same cell in Excel?

I have an excel file with a table A6:E233. I had to concatenate columns A and B so that values from B are displayed in a new line. I have achieved that with the CONCATENATE function (and CHAR(10) for new line) that is built into Excel.
After concatenation the spreadsheets looks like this:
EXAMPLE1
Now i would also need different formatting for each line inside the cell, namely size 12, bold for the first line and size 8 for second line:
EXAMPLE2
How do achieve this? If it would be a short table, I would do it manually, but since I have a few files, totally well over 5000 rows, maybe an automated way would be better.
I have found answers that touch upon this problem, but since I dont know how to use VBA, I am more or less lost. I am also using a lot of python and have looked through openpyexl and csv, but have not found a way how to achieve this.
Thank you for your help!
With Excel VBA, you need to use the Characters property of the Range object. For example:
Sub Test()
Dim rngCell As Range
Dim lngPos As Long
'get cell
Set rngCell = Sheet1.Range("A1")
'find linebreak
lngPos = InStr(1, rngCell.Value, vbLf, vbBinaryCompare)
'format either side
rngCell.Characters(1, lngPos).Font.Bold = True
rngCell.Characters(lngPos + 1, Len(rngCell.Value) - lngPos).Font.Color = 1234
End Sub
Which will format like this:
Here, try this code. I built this according to your screenshot.
Sub partialFormatting()
Dim tws As Worksheet
Dim fr, lr As Integer
Dim pos As Integer
Set tws = ThisWorkbook.Worksheets("Sheet1")
fr = 7
lr = tws.Range("A1000000").End(xlUp).Row
For r = fr To lr
With tws.Range("A" & r)
pos = InStr(.Value, vbLf)
With .Characters(Start:=1, Length:=pos - 1).Font
.FontStyle = "Bold"
.Size = 12
End With
With .Characters(Start:=pos + 1, Length:=Len(.Value) - pos).Font
.FontStyle = "Normal"
.Size = 8
End With
End With
Next r
End Sub
Please let me know if you have any questions on how the code works!

Python - Reading a CSV, won't print the contents of the last column

I'm pretty new to Python, and put together a script to parse a csv and ultimately output its data into a repeated html table.
I got most of it working, but there's one weird problem I haven't been able to fix. My script will find the index of the last column, but won't print out the data in that column. If I add another column to the end, even an empty one, it'll print out the data in the formerly-last column - so it's not a problem with the contents of that column.
Abridged (but still grumpy) version of the code:
import os
os.chdir('C:\\Python34\\andrea')
import csv
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
if 'phone' in tableHeader:
phoneIndex = tableHeader.index('phone')
else:
phoneIndex = -1
for row in exampleReader:
row[-1] =''
print(phoneIndex)
print(row[phoneIndex])
csvOpen.close()
my.csv
stuff,phone
1,3235556177
1,3235556170
Output
1
1
Same script, small change to the CSV file:
my.csv
stuff,phone,more
1,3235556177,
1,3235556170,
Output
1
3235556177
1
3235556170
I'm using Python 3.4.3 via Idle 3.4.3
I've had the same problem with CSVs generated directly by mysql, ones that I've opened in Excel first then re-saved as CSVs, and ones I've edited in Notepad++ and re-saved as CSVs.
I tried adding several different modes to the open function (r, rU, b, etc.) and either it made no difference or gave me an error (for example, it didn't like 'b').
My workaround is just to add an extra column to the end, but since this is a frequently used script, it'd be much better if it just worked right.
Thank you in advance for your help.
row[-1] =''
The CSV reader returns to you a list representing the row from the file. On this line you set the last value in the list to an empty string. Then you print it afterwards. Delete this line if you don't want the last column to be set to an empty string.
If you know it is the last column, you can count them and then use that value minus 1. Likewise you can use your string comparison method if you know it will always be "phone". I recommend if you are using the string compare, convert the value from the csv to lower case so that you don't have to worry about capitalization.
In my code below I created functions that show how to use either method.
import os
import csv
os.chdir('C:\\temp')
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
phoneColIndex = None;#init to a value that can imply state
lastColIndex = None;#init to a value that can imply state
def getPhoneIndex(header):
for i, col in enumerate(header): #use this syntax to get index of item
if col.lower() == 'phone':
return i;
return -1; #send back invalid index
def findLastColIndex(header):
return len(tableHeader) - 1;
## methods to check for phone col. 1. by string comparison
#and 2. by assuming it's the last col.
if len(tableHeader) > 1:# if only one row or less, why go any further?
phoneColIndex = getPhoneIndex(tableHeader);
lastColIndex = findLastColIndex(tableHeader)
for row in exampleReader:
print(row[phoneColIndex])
print('----------')
print(row[lastColIndex])
print('----------')
csvOpen.close()

Unexpected whitespace in python generated strings

I am using Python to generate an ASCII file composed of very long lines. This is one example line (let's say line 100 in the file, '[...]' are added by me to shorten the line):
{6 1,14 1,[...],264 1,270 2,274 2,[...],478 1,479 8,485 1,[...]}
If I open the ASCII file that I generated with ipython:
f = open('myfile','r')
print repr(f.readlines()[99])
I do obtain the expected line printed correctly ('[...]' are added by me to shorten the line):
'{6 1,14 1,[...],264 1,270 2,274 2,[...],478 1,479 8,485 1,[...]}\n'
On the contrary, if I open this file with the program that is suppose to read it, it will generate an exception, complaining about an unexpected pair after 478 1.
So I tried to open the file with vim. Still vim shows no problem, but if I copy the line as printed by vim and paste it in another text editor (in my case TextMate), this is the line that I obtain ('[...]' are added by me to shorten the line):
{6 1,14 1,[...],264 1,270 2,274 2,[...],478 1,4 79 8,485 1,[...]}
This line indeed has a problem after the pair 478 1.
I tried to generate my lines in different ways (concatenating, with cStringIO, ...), but I always obtain this result. When using the cStringIO, for example, the lines are generated as in the following (even though I tried to change this, as well, with no luck):
def _construct_arff(self,attributes,header,data_rows):
"""Create the string representation of a Weka ARFF file.
*attributes* is a dictionary with attribute_name:attribute_type
(e.g., 'num_of_days':'NUMERIC')
*header* is a list of the attributes sorted
(e.g., ['age','name','num_of_days'])
*data_rows* is a list of lists with the values, sorted as in the header
(e.g., [ [88,'John',465],[77,'Bob',223]]"""
arff_str = cStringIO.StringIO()
arff_str.write('#relation %s\n' % self.relation_name)
for idx,att_name in enumerate(header):
try:
name = att_name.replace("\\","\\\\").replace("'","\\'")
arff_str.write("#attribute '%s' %s\n" % (name,attributes[att_name]))
except UnicodeEncodeError:
arff_str.write('#attribute unicode_err_%s %s\n'
% (idx,attributes[att_name]))
arff_str.write('#data\n')
for data_row in data_rows:
row = []
for att_idx,att_name in enumerate(header):
att_type = attributes[att_name]
value = data_row[att_idx]
# numeric attributes can be sparse: None and zeros are not written
if ((not att_type == constants.ARRF_NUMERIC)
or not ((value == None) or value == 0)):
row.append('%s %s' % (att_idx,value))
arff_str.write('{' + (','.join(row)) + '}\n')
return arff_str.getvalue()
UPDATE: As you can see from the code above, the function transforms a given set of data to a special arff file format. I noticed that one of the attributes I was creating contained numbers as strings (e.g., '1', instead of 1). By forcing these numbers into integers:
features[name] = int(value)
I recreated the arff file successfully. However I don't see how this, which is a value, can have an impact on the formatting of *att_idx*, which is always an integer, as also pointed out by #JohnMachin and #gnibbler (thanks for your answers, btw). So, even if my code runs now, I still don't see why this happens. How can the value, if not properly transformed into int, influence the formatting of something else?
This file contains the wrongly formatted version.
The built-in function repr is your friend. It will show you unambiguously what you have in your file.
Do this:
f = open('myfile','r')
print repr(f.readlines()[99])
and edit your question to show the result.
Update: As to how it got there, it is impossible to tell, because it cannot have been generated by the code that you showed. The value 37 should be a value of att_idx which comes from enumerate() and so must be an int. You are formatting this int with %s ... 37 can't become 3rubbish7. Also that should generate att_idx in order 0, 1, etc etc but you are missing many values and there is nothing conditional inside your loop.
Please show us the code that you actually ran.
Update:
And again, this code won't run:
for idx,att_name in enumerate(header):
arff_str.write("#attribute '%s' %s\n" % (name,attributes[att_name]))
because name is not defined; you probably mean att_name.
Perhaps we can short-circuit all this stuffing about: post a copy of your output file (zipped if it's huge) on the web somewhere so that we can see for ourselves what might be disturbing its consumers. Please do edit your question to say which line(s) exhibits(s) the problem.
By the way, you say some of the data is string rather than integer, and the problem goes away if you coerce the data to int by doing features[name] = int(value) ... what is 'features'?? What is 'name'??
Are any of those strings unicode instead of str?
Update 2 (after bad file posted on net)
No info supplied on which line(s) exhibits(s) the problem. As it turned out, no lines exhibited the described problem with attribute 479. I wrote this checking script:
import re, sys
# sample data line:
# {40 1,101 3,319 2,375 2,525 2,530 bug}
# Looks like all data lines end in ",530 bug}" or ",530 other}"
pattern1 = r"\{(?:\d+ \d+,)*\d+ \w+\}$"
matcher1 = re.compile(pattern1).match
pattern2 = r"\{(?:\d+ \d+,)*"
matcher2 = re.compile(pattern2).match
bad_atts = re.compile(r"\D\d+\s+\W").findall
got_data = False
for lino, line in enumerate(open(sys.argv[1], "r"), 1):
if not got_data:
got_data = line.startswith('#data')
continue
if not matcher1(line):
print
print lino, repr(line)
m = matcher2(line)
if m:
print "OK up to offset", m.end()
print bad_atts(line)
Sample output (wrapped at column 80):
581 '{2 1,7 1,9 1,12 1,13 1,14 1,15 1,16 1,17 1,18 1,21 1,22 1,24 1,25 1,26 1,27
1,29 1,32 1,33 1,36 1,39 1,40 1,44 1,48 1,49 1,50 1,54 1,57 1,58 1,60 1,67 1,68
1,69 1,71 1,74 1,75 1,76 1,77 1,80 1,88 1,93 1,101 ,103 6,104 2,109 20,110 3,11
2 2,114 1,119 17,120 4,124 39,128 5,137 1,138 1,139 1,162 1,168 1,172 18,175 1,1
76 6,179 1,180 1,181 2,185 2,187 9,188 8,190 1,193 1,195 2,196 4,197 1,199 3,201
3,202 4,203 5,206 1,207 2,208 1,210 2,211 1,212 5,213 1,215 2,216 3,218 2,220 2
,221 3,225 8,226 1,233 1,241 4,242 1,248 5,254 2,255 1,257 4,258 4,260 1,266 1,2
68 1,269 3,270 2,271 5,273 1,276 1,277 1,280 1,282 1,283 11,285 1,288 1,289 1,29
6 8,298 1,299 1,303 1,304 11,306 5,308 1,309 8,310 1,315 3,316 1,319 11,320 5,32
1 11,322 2,329 1,342 2,345 1,349 1,353 2,355 2,358 3,359 1,362 1,367 2,368 1,369
1,373 2,375 9,377 1,381 4,382 1,383 3,387 1,388 5,395 2,397 2,400 1,401 7,407 2
,412 1,416 1,419 2,421 2,422 1,425 2,427 1,431 1,433 7,434 1,435 1,436 2,440 1,4
49 1,454 2,455 1,460 3,461 1,463 1,467 1,470 1,471 2,472 7,477 2,478 11,479 31,4
82 6,485 7,487 1,490 2,492 16,494 2,495 1,497 1,499 1,501 1,502 1,503 1,504 11,5
06 3,510 2,515 1,516 2,517 3,518 1,522 4,523 2,524 1,525 4,527 2,528 7,529 3,530
bug}\n'
OK up to offset 203
[',101 ,']
709 '{101 ,124 2,184 1,188 1,333 1,492 3,500 4,530 bug}\n'
OK up to offset 1
['{101 ,']
So it looks like the attribute with att_idx == 101 can sometimes contain the empty string ''. You need to sort out how this attribute is to be treated. It would help your thinking if you unwound this Byzantine code:
if ((not att_type == constants.ARRF_NUMERIC)
or not ((value == None) or value == 0)):
Aside: that "expletive deleted" code won't run; it should be ARFF, not ARRF
into:
if value or att_type != constants.ARFF_NUMERIC:
or maybe just if value: which will filter out all of None, 0, and "". Note that att_idx == 101 corresponds to the attribute "priority" which is given a STRING type in the ARFF file header:
[line 103] #attribute 'priority' STRING
By the way, your statement about features[name] = int(value) "fixing" the problem is very suspicious; int("") raises an exception.
It may help you to read the warning at the end of this wiki section about sparse ARFF files.

Categories