I'm writing a short script (my first in Python) to filter a large table.
import sys
gwas_annot = open('gwascatalog.txt').read()
gwas_entry_list = gwas_annot.split('\n')[1:-1]
# paste line if has value
for lines in gwas_entry_list:
    entry_notes = lines.split('\t')
    source_name = entry_notes[7]
    if 'omega-6' in source_name:
        print(entry_notes)
Basically I want to take the 'gwascatalog' table, parse it into lines and columns, search column 7 for a string ('omega-6' in this case) and if it contains it, print the entire line.
Right now it prints all the rows to the console but won't let me paste it into another file. It also gives me the error:
Traceback (most recent call last):
  File "gwas_parse.py", line 9, in <module>
    source_name = entry_notes[7]
IndexError: list index out of range
Unsure why there is an error. Anything obvious to fix?
Edit: Adding snippet from data.
You can guard against this by checking the length of the list first.
if len(entry_notes) > 7:
    source_name = entry_notes[7]
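Putting the length check together with writing the matches to a file (the question also asked how to get the output into another file), a minimal sketch might look like this. It assumes, as in the question, a tab-separated file whose first line is a header; the function names and the output file name `omega6_rows.txt` are hypothetical.

```python
def filter_rows(lines, column=7, needle="omega-6"):
    """Return the tab-separated lines whose field at `column` contains `needle`."""
    matches = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Rows with too few fields (e.g. a blank last line) are skipped, not indexed
        if len(fields) > column and needle in fields[column]:
            matches.append(line.rstrip("\n"))
    return matches


def filter_file(in_path, out_path, column=7, needle="omega-6"):
    """Copy matching rows from in_path to out_path; return how many were written."""
    with open(in_path) as src:
        next(src, None)  # skip the header line, as the original [1:] slice did
        matches = filter_rows(src, column, needle)
    with open(out_path, "w") as dst:  # out_path is a hypothetical output file
        for line in matches:
            dst.write(line + "\n")
    return len(matches)
```

Usage would be something like `filter_file('gwascatalog.txt', 'omega6_rows.txt')`. Alternatively, keep the original print() calls and redirect the console output from the shell with `python gwas_parse.py > omega6_rows.txt`.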
The "list index out of range" error probably means you hit a row (line) with fewer than 8 columns.
# index 0 1 2 3 4 5 6 (... no 7)
columnsArray = ['one', 'two','three','four','five','six', 'seven']
So here, if you ask for columnsArray[7], you get a "list index out of range" error because the row the for loop is currently on only goes up to index 6.
The error tells you it happens at "line 9", which is where source_name = entry_notes[7] is. I would suggest printing out the number of columns for each row of the table. You might notice that somewhere you have 7 columns instead of 8. I also think you mean column 8, i.e. position (index) 7, since counting in Python starts at 0.
Maybe add another "if" to only look for lines that have a len() of 8 or more.
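To do the check suggested above (printing the number of columns per row), a small helper like the one below could report which lines are too short. It is a sketch that assumes tab-separated lines; the function names are hypothetical, and line numbers count the header as line 1.

```python
from collections import Counter


def column_counts(lines, sep="\t"):
    """Map each column count to the number of lines that have it."""
    return Counter(len(line.rstrip("\n").split(sep)) for line in lines)


def short_rows(lines, min_columns=8, sep="\t"):
    """Return (line_number, column_count) for every line with too few columns."""
    found = []
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(sep))
        if count < min_columns:
            found.append((number, count))
    return found
```

With a real file, `with open('gwascatalog.txt') as f: print(short_rows(f))` would list the offending lines; a blank line shows up with a count of 1.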
Related
I'm getting a 'list index out of range' error when iterating with a for loop over a table I've created from a CSV extract, and I can't figure out why, even after trying many different methods.
Here is a step-by-step description of how the error happens:
I'm removing the first line of an imported CSV file, as this line contains the column names but no data. The CSV has the following structure:
columnName1, columnName2, columnName3, columnName4
This, is, some, data
I, have, in, this
very, interesting, CSV, file
After storing the CSV in a first array called oldArray, I want to populate a newArray that gets all values from oldArray except the first line, which is the column-name line mentioned above. My newArray should then look like this:
This, is, some, data
I, have, in, this
very, interesting, CSV, file
To create this newArray, I'm using the following code with the append() function.
tempList = []
newArray = []
for i in range(len(oldArray)):
    if i > 0: #my ugly way of skipping line 0...
        for j in range(len(oldArray[0])):
            tempList.append(oldArray[i][j])
        newArray.append(tempList)
        tempList = []
I also stored the columns in their own separate list.
i = 0
for i in range(len(oldArray[0])):
    my_columnList[i] = oldArray[0][i]
And here is where the error comes up: I now want to populate a Treeview table from this newArray using a for loop and insert() (inside a function), but I always get a 'list index out of range' error and I can't figure out why.
def populateTable(my_tree, newArray, my_columnList):
    i = 0
    for i in range(len(newArray)):
        my_tree.insert('','end', text=newArray[i][0], values = (newArray[i][1:len(newArray[0])]))
        #(im using the text option to bypass treeview's column 0 problem)
    return my_tree
Error message:
  File "(...my working directory...)", line 301, in populateTable
    my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])]))
IndexError: list index out of range
The same function works fine with other datasets and columns, but not with this newArray.
I'm fairly certain that the error comes strictly from this newArray and is not linked to another parameter.
I've checked the columns list and the CSV import in oldArray with some print() calls, and everything looks normal: values, number of rows, number of columns.
This is a great mystery to me...
Thank you all very much for your help and time.
Your error message points to the problem:
  File "(...my working directory...)", line 301, in populateTable
    my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])]))
IndexError: list index out of range
It means an index on line 301 is out of range. Since i comes from range(len(data)), data[i] itself is fine, and a slice such as data[i][1:len(data[0])] never raises IndexError (it just returns a shorter list). That leaves data[i][0].
My guess is there is an empty list in data (maybe data[-1]?).
If data[i] is [], then data[i][0] tries to access a first item that doesn't exist.
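A sketch of how the empty rows could be skipped before they reach insert(). The helper name `tree_rows` and the sample data (including the trailing empty row) are made up for illustration; it only assumes rows are lists of strings like newArray.

```python
def tree_rows(rows):
    """Yield (text, values) pairs for Treeview.insert, skipping empty rows."""
    for row in rows:
        if not row:  # an empty list here is what makes row[0] raise IndexError
            continue
        yield row[0], tuple(row[1:])  # slicing never raises, even on short rows


data = [["This", "is", "some", "data"], ["I", "have", "in", "this"], []]
pairs = list(tree_rows(data))
```

In the function it would be used as `for text, values in tree_rows(newArray): my_tree.insert('', 'end', text=text, values=values)`.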
There is nothing wrong with your "ugly" way of skipping line 0, but I recommend having a look at this alternative:
new_array = old_array.copy()
new_array.remove(new_array[0])
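Slicing is another way to drop the header row, since `oldArray[1:]` returns a new list without row 0. If the rows still come from the CSV file, the csv module can do the skipping directly. The sketch below is an assumption about how the file might be read (the question does not show its import code); it uses the question's sample data and also drops blank lines, which csv.reader returns as empty lists.

```python
import csv
import io


def read_table(csv_text):
    """Split CSV text into (header, rows), dropping blank lines."""
    # skipinitialspace=True removes the space after each comma in the sample data
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)                    # the column-name line
    rows = [row for row in reader if row]    # csv.reader yields [] for blank lines
    return header, rows


sample = (
    "columnName1, columnName2, columnName3, columnName4\n"
    "This, is, some, data\n"
    "I, have, in, this\n"
    "very, interesting, CSV, file\n"
    "\n"
)
header, rows = read_table(sample)
```

For a real file, `with open(path, newline='') as f: header, rows = read_table(f.read())` would do the same.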
Now, for fixing your issue:
It looks like you have a problem with the indexing.
When you use a for loop over the range of the length of an array, you are using normal indexing, which starts from one, while you initialize your i variable to zero.
To make it simple:
len(oldArray[0])
This is equal to 4, so when you use it in the for loop it's just like saying
for i in range(4):
To fix this, you can either subtract 1 from the length of the old array or just initialize the i variable to 1 at the start:
i = 1
for i in range(len(oldArray[0])):
    my_columnList[i] = oldArray[0][i]
or
i = 0
for i in range(len(oldArray[0])-1):
    my_columnList[i] = oldArray[0][i]
The same mistake is repeated in your populateTable function, so in the same way your code would be:
def populateTable(my_tree, newArray, my_columnList):
    i = 0
    for i in range(len(newArray)-1):
        my_tree.insert('','end', text=newArray[i][0], values = (newArray[i][1:len(newArray[0])]))
        #(im using the text option to bypass treeview's column 0 problem)
    return my_tree
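Before applying either change, it may be worth checking what range() actually yields. This small sketch uses the question's four sample column names (the variable names are made up) and shows the indices produced with and without the `- 1`, and what happens to an `i` assigned before the loop.

```python
header = ["columnName1", "columnName2", "columnName3", "columnName4"]

# range(len(header)) yields 0, 1, 2, 3: exactly the valid indices of a 4-item list
all_indices = list(range(len(header)))

# range(len(header) - 1) yields 0, 1, 2, so the last column is never visited
shortened = list(range(len(header) - 1))

# A for loop rebinds i from range() on every pass, so a preceding i = 1 has no effect
i = 1
first_seen = None
for i in range(len(header)):
    if first_seen is None:
        first_seen = i

print(all_indices)   # [0, 1, 2, 3]
print(shortened)     # [0, 1, 2]
print(first_seen)    # 0
```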
I have two different Excel files, File 1 and File 2, and I have stored the values of their columns in two different lists. I need to search for each number from File 1 in File 2, and I want the output to look like File 3 (Expected Answer).
File 1:
File 2:
File 3/ Expected Answer:
I tried the code below for this requirement, but I don't know where I'm going wrong.
for j in range(len(terr_code)):
    g=terr_code[j]
    #print(g)
    for lists in Zip_code:
        Zip_code= lists.split(";")
        while('' in Zip_code):
            Zip_code.remove('')
        for i in range(len(Zip_code)):
            #print(i)
            h=Zip_code[i]
            print(g)
            if g in h:
                print(h)
                territory_code.append(str(terr_code[j]))
                print(territory_code[j])
                final_list.append(Zip_terr_Hem['NAME'][i])
                #print(final_list)
                s = ";"
                s= s.join(str(v) for v in final_list)
                #print(s)
                final_file['Territory Code'] = pd.Series(str(terr_code[j]))
                final_file['Territory Name'] = pd.Series(s)
final_file = pd.DataFrame(final_file )
final_file.to_csv('test file.csv', index=False)
The first for loop works fine. But when I print the numbers from the second for loop, the first number gets printed multiple times. And although both lists look correct, execution never enters the if block. Please tell me what I'm doing wrong here. Thanks
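Since the File 1/2/3 tables did not survive, the following is only a sketch under assumptions: `codes` plays the role of terr_code, each entry of `zip_lists` is a ';'-separated string like the Zip_code entries, and `names` is aligned with it by position (like Zip_terr_Hem['NAME']). It avoids two things in the posted loop that may cause trouble: `Zip_code = lists.split(";")` overwrites the very list being iterated, and values read from Excel can come back as numbers rather than text, so both sides are compared as stripped strings. It also uses an exact match (`in` on a list) instead of the substring test `g in h`. All names here are hypothetical.

```python
def match_codes(codes, zip_lists, names):
    """For each code, join the names whose ';'-separated zip list contains it."""
    result = {}
    for code in codes:
        wanted = str(code).strip()  # compare as text, e.g. 101 vs "101"
        matched = []
        for zips, name in zip(zip_lists, names):
            # a new name for the split pieces; the list being iterated is untouched
            parts = [p.strip() for p in str(zips).split(";") if p.strip()]
            if wanted in parts:
                matched.append(name)
        result[wanted] = ";".join(matched)
    return result


result = match_codes([101, "202"], ["101;303;", "202; 101"], ["North", "South"])
```

The dictionary could then be written out with pandas, e.g. `pd.DataFrame({'Territory Code': list(result), 'Territory Name': list(result.values())}).to_csv('test file.csv', index=False)`.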
I'm trying to split multiple variables that were created dynamically in a for loop, and then delete everything after the first space.
Some background: I'm using paramiko to SSH into a network switch to pull VLAN information. I'm trying to create a new variable for each VLAN name and then present all of them in a list for the user to select from.
#VLANLine## variables were split from VLANList on \r\n. Variables created from a for loop
VLANLine1 = 'GGGGGGGGG 5 5/7'
VLANLine2 = 'HHHH 66 22/23'
VLANLine3 = 'SSSSSSS 33 3/4'
#HHHH and SSSSSS are random names I put in place for this question. This is the data I need to keep.
#Length of VLANList = 14 in this demo
i = 0
while i < len(VLANList):
    VLANLine[i].split(" ")
    del VLAN[i][1:]
Error below:
Traceback (most recent call last):
  File "<pyshell#16>", line 2, in <module>
    VLANLine[i].split(" ")
IndexError: string index out of range
How can I dynamically split 'VLANLine##' and then delete everything after the space? I may be going about this all wrong, too. I just started working with Python a few weeks ago.
This may work for you.
VLAN_clean = [v[0:v.find(' ')] for v in VLANList if v.find(' ') != -1]
str.split does what you need cleanly:
VLANList = [
    'GGGGGGGGG 5 5/7',
    'HHHH 66 22/23',
    'SSSSSSS 33 3/4',
]
VLAN_Clean = [v.split()[0] for v in VLANList]
print(VLAN_Clean)
Output:
['GGGGGGGGG', 'HHHH', 'SSSSSSS']
split() with no arguments splits each string at every run of whitespace and returns a list of the pieces. If there is no whitespace, it simply returns a list of length 1 containing the entire string. So running split() on each item and then taking the first element of the resulting list gives you the right thing.
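One caveat: an empty string splits into an empty list (`''.split()` is `[]`), so `[0]` would raise IndexError if the switch output contains blank lines. A sketch that works on the raw text instead of pre-split variables, assuming `vlan_output` (a hypothetical name) holds the text read from the switch:

```python
def vlan_names(vlan_output):
    """Return the first whitespace-separated word of each non-blank line."""
    names = []
    for line in vlan_output.splitlines():  # handles \r\n as well as \n
        parts = line.split(maxsplit=1)     # [] for a blank line, else [name] or [name, rest]
        if parts:
            names.append(parts[0])
    return names


names = vlan_names("GGGGGGGGG 5 5/7\r\nHHHH 66 22/23\r\n\r\nSSSSSSS 33 3/4\r\n")
```

This also removes the need for separate VLANLine1, VLANLine2, ... variables: the list itself can be shown to the user.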
I'm pretty new to Python, and I put together a script to parse a CSV and ultimately output its data into a repeated HTML table.
I got most of it working, but there's one weird problem I haven't been able to fix. My script finds the index of the last column but won't print the data in that column. If I add another column to the end, even an empty one, it prints the data in the formerly last column, so it's not a problem with the contents of that column.
Abridged (but still grumpy) version of the code:
import os
os.chdir('C:\\Python34\\andrea')
import csv
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
if 'phone' in tableHeader:
    phoneIndex = tableHeader.index('phone')
else:
    phoneIndex = -1
for row in exampleReader:
    row[-1] =''
    print(phoneIndex)
    print(row[phoneIndex])
csvOpen.close()
my.csv
stuff,phone
1,3235556177
1,3235556170
Output
1
1
Same script, small change to the CSV file:
my.csv
stuff,phone,more
1,3235556177,
1,3235556170,
Output
1
3235556177
1
3235556170
I'm using Python 3.4.3 via IDLE 3.4.3.
I've had the same problem with CSVs generated directly by mysql, ones that I've opened in Excel first then re-saved as CSVs, and ones I've edited in Notepad++ and re-saved as CSVs.
I tried adding several different modes to the open function (r, rU, b, etc.) and either it made no difference or gave me an error (for example, it didn't like 'b').
My workaround is just to add an extra column to the end, but since this is a frequently used script, it'd be much better if it just worked right.
Thank you in advance for your help.
row[-1] =''
The CSV reader gives you a list representing the row from the file. On this line you set the last value in that list to an empty string, and then you print it. Delete this line if you don't want the last column to be blanked out.
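With that line removed, the lookup could be packaged roughly as follows. This is a sketch, not the asker's script: the function name `column_values` is hypothetical, it lower-cases the header names, and it returns an empty list when the column is missing, because the original fallback of -1 would silently select the last column.

```python
import csv


def column_values(lines, column="phone"):
    """Return every value in the named column, leaving the row lists untouched."""
    reader = csv.reader(lines)
    header = [name.strip().lower() for name in next(reader)]
    if column not in header:
        return []  # avoid falling back to -1, which silently means "last column"
    index = header.index(column)
    return [row[index] for row in reader if len(row) > index]
```

With the file itself: `with open('my.csv', newline='') as f: print(column_values(f))`. The csv module documentation recommends opening files with `newline=''`, which is the mode worth trying instead of 'rU' or 'b' in Python 3.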
If you know it is the last column, you can count the columns and use that count minus 1. Alternatively, you can use your string-comparison method if you know the header will always be "phone". If you use the string comparison, I recommend converting the value from the CSV to lower case so that you don't have to worry about capitalization.
In my code below I created functions that show how to use either method.
import os
import csv
os.chdir('C:\\temp')
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
phoneColIndex = None  # init to a value that can imply state
lastColIndex = None   # init to a value that can imply state

def getPhoneIndex(header):
    for i, col in enumerate(header):  # enumerate gives the index of each item
        if col.lower() == 'phone':
            return i
    return -1  # "not found" (careful: row[-1] is still a valid index in Python)

def findLastColIndex(header):
    return len(header) - 1

## methods to check for phone col. 1. by string comparison
# and 2. by assuming it's the last col.
if len(tableHeader) > 1:  # if only one column or less, why go any further?
    phoneColIndex = getPhoneIndex(tableHeader)
    lastColIndex = findLastColIndex(tableHeader)
    for row in exampleReader:
        if phoneColIndex != -1:
            print(row[phoneColIndex])
            print('----------')
        print(row[lastColIndex])
        print('----------')
csvOpen.close()
I started fiddling with Python for the first time about a week ago and have been trying to create a script that replaces every instance of a string in a file with a new string. Reading the file and creating a new file with the intended strings seems to work, but the error check at the end of the script prints output suggesting that something went wrong. I checked a few other threads but couldn't find a solution or alternative that fit what I was looking for or was at a level I'm comfortable with.
Apologies for the messy/odd code structure; I am very new to the language. The first four variables are example values.
editElement = "Testvalue"
newElement = "Testvalue2"
readFile = "/Users/Euan/Desktop/Testfile.csv"
writeFile = "/Users/Euan/Desktop/ModifiedFile.csv"
editelementCount1 = 0
newelementCount1 = 0
editelementCount2 = 0
newelementCount2 = 0
#Reading from file
print("Reading file...")
file1 = open(readFile,'r')
fileHolder = file1.readlines()
file1.close()
#Creating modified data
fileHolder_replaced = [row.replace(editElement, newElement) for row in fileHolder]
#Writing to file
file2 = open(writeFile,'w')
file2.writelines(fileHolder_replaced)
file2.close()
print("Modified file generated!")
#Error checking
for row in fileHolder:
    if editElement in row:
        editelementCount1 +=1
for row in fileHolder:
    if newElement in row:
        newelementCount1 +=1
for row in fileHolder_replaced:
    if editElement in row:
        editelementCount2 +=1
for row in fileHolder_replaced:
    if newElement in row:
        newelementCount2 +=1
print(editelementCount1 + newelementCount1)
print(editelementCount2 + newelementCount2)
I expected the last two print calls to display the same value; however:
The first print shows the value of A + B, as expected.
The second one shows only the value of B (from fileHolder), and from what I can see, A has indeed been converted to B (in fileHolder_replaced).
Edit:
For example, if the first two counts show A and B as 2029 and 1619 respectively (in fileHolder), the last two counts show A as 0 and B as 2029 (in fileHolder_replaced). Obviously this is missing the original value of B.
So, in a more extended version than in the comment:
If you look for "Testvalue" in the modified file, it will still find the string in lines that now contain "Testvalue2", because the original value is a substring of the modified value. Therefore a line can be counted under both values. More precisely, these counts are the number of lines in which the string occurs, not the number of occurrences.
If you query
if newElement in row
it checks whether the string newElement is contained anywhere in the string row, as a substring.
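A small sketch makes the overlap visible. The sample lines are made up and the helper name is hypothetical; it compares the line count produced by `in` with the occurrence count from str.count(), before and after the same replace() the question uses.

```python
def line_and_occurrence_counts(rows, needle):
    """Return (lines containing needle, total non-overlapping occurrences)."""
    lines = sum(1 for row in rows if needle in row)
    occurrences = sum(row.count(needle) for row in rows)
    return lines, occurrences


before = ["Testvalue a\n", "Testvalue2 b\n", "Testvalue Testvalue\n", "none\n"]
after = [row.replace("Testvalue", "Testvalue2") for row in before]

print(line_and_occurrence_counts(before, "Testvalue"))   # (3, 4)
print(line_and_occurrence_counts(before, "Testvalue2"))  # (1, 1)
print(line_and_occurrence_counts(after, "Testvalue"))    # (3, 4)
print(line_and_occurrence_counts(after, "Testvalue2"))   # (3, 4)
```

Note also that replace() turns an existing "Testvalue2" into "Testvalue22" (`after[1]` is "Testvalue22 b\n"), again because of the substring overlap, so with these example values the old and new strings cannot be told apart by `in` alone.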