Python row.replace issue - python

Started fiddling with Python for the first time a week or so ago and have been trying to create a script that will replace instances of a string in a file with a new string. The actual reading and creation of a new file with intended strings seems to be successful, but error checking at the end of the file displays output suggesting that there is an error. I checked a few other threads but couldn't find a solution or alternative that fit what I was looking for or was at a level I was comfortable working with.
Apologies for messy/odd code structure, I am very new to the language. Initial four variables are example values.
editElement = "Testvalue"
newElement = "Testvalue2"
readFile = "/Users/Euan/Desktop/Testfile.csv"
writeFile = "/Users/Euan/Desktop/ModifiedFile.csv"
editelementCount1 = 0
newelementCount1 = 0
editelementCount2 = 0
newelementCount2 = 0
#Reading from file
print("Reading file...")
file1 = open(readFile,'r')
fileHolder = file1.readlines()
file1.close()
#Creating modified data
fileHolder_replaced = [row.replace(editElement, newElement) for row in fileHolder]
#Writing to file
file2 = open(writeFile,'w')
file2.writelines(fileHolder_replaced)
file2.close()
print("Modified file generated!")
#Error checking
for row in fileHolder:
if editElement in row:
editelementCount1 +=1
for row in fileHolder:
if newElement in row:
newelementCount1 +=1
for row in fileHolder_replaced:
if editElement in row:
editelementCount2 +=1
for row in fileHolder_replaced:
if newElement in row:
newelementCount2 +=1
print(editelementCount1 + newelementCount1)
print(editelementCount2 +newelementCount2)
Expected output would be the last two instances of 'print' displaying the same value, however...
The first instance of print returns the value of A + B as expected.
The second line only returns the value of B (from fileHolder), and from what I can see, A has indeed been converted to B (In fileHolder_replaced).
Edit:
For example,
if the first two counts show A and B to be 2029 and 1619 respectively (fileHolder), the last two counts show A as 0 and B as 2029 (fileHolder_replace). Obviously this is missing the original value of B.

So in am more exdented version as in the comment.
If you look for "TestValue" in the modified file, it will find the string, even if you assume it is "TestValue2". Thats because the originalvalue is a substring of the modified value. Therefore it should find twice the number of occurences. Or more precise the number of lines in which the string occurs.
If you query
if newElement in row
It will have a look if the string newElement is contained in the string row

Related

Indexing error after removing line from 2D array

I am facing an 'List Index out of range' error when trying to iterate a for-loop over a table I've created from a CSV extract, but cannot figure out why - even after trying many different methods.
Here is the step by step description of how the error happens :
I'm removing the first line of an imported CSV file, as this
line contains the columns' names but no data. The CSV has the following structure.
columnName1, columnName2, columnName3, columnName4
This, is, some, data
I, have, in, this
very, interesting, CSV, file
After storing the CSV in a first array called oldArray, I want to populate a newArray that will get all values from oldArray but not the first line, which is the column name line, as previously
mentioned. My newArray should then look like this.
This, is, some, data
I, have, in, this
very, interesting, CSV, file
To create this newArray, I'm using the following code with the append() function.
tempList = []
newArray = []
for i in range(len(oldArray)):
if i > 0: #my ugly way of skipping line 0...
for j in range(len(oldArray[0])):
tempList.append(oldArray[i][j])
newArray.append(tempList)
tempList = []
I also stored the columns in their own separate list.
i = 0
for i in range(len(oldArray[0])):
my_columnList[i] = oldArray[0][i]
And the error comes up next : I now want to populate a treeview table from this newArray, using a for-loop and insert (in a function). But I always get the 'Index List out of range error' and I cannot figure out why.
def populateTable(my_tree, newArray, my_columnList):
i = 0
for i in range(len(newArray)):
my_tree.insert('','end', text=newArray[i][0], values = (newArray[i][1:len(newArray[0]))
#(im using the text option to bypass treeview's column 0 problem)
return my_tree
Error message --> " File "(...my working directory...)", line 301, in populateTable
my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])]))
IndexError: list index out of range "
Using that same function with different datasets and columns worked fine, but not for this here newArray.
I'm fairy certain that the error comes strictly from this 'newArray' and is not linked to another parameter.
I've tested the validity of the columns list, of the CSV import in oldArray through some print() functions, and everything seems normal - values, row dimension, column dimension.
This is a great mystery to me...
Thank you all very much for your help and time.
You can find a problem from your error message: File "(...my working directory...)", line 301, in populateTable my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])])) IndexError: list index out of range
It means there is an index out of range in line 301: data[i][0] or data[i][1:len(data[0])]
(i is over len(data)) or (0 or 1 is over len(data[0]))
My guess is there is some empty list in data(maybe data[-1]?).
if data[i] is [] or [some_one_item], then data[i][1:len(data[0])] try to access to second item which not exists.
there is no problem in your "ugly" way to skip line 0 but I recommend having a look on this way
new_array = old_array.copy()
new_array.remove(new_array[0])
now for fixing your issue
looks like you have a problem in the indexing
when you use a for loop using the range of the length of an array you use normal indexing which starts from one while you identify your i variable to be zero
to make it simple
len(oldArray[0])
this is equal to 4 so when you use it in the for loop it's just like saying
for i in range(4):
to fix this you can either subtract 1 from the length of the old array or just identify the i variable to be 1 at the first
i = 1
for i in range(len(oldArray[0])):
my_columnList[i] = oldArray[0][i]
or
i = 0
for i in range(len(oldArray[0])-1):
my_columnList[i] = oldArray[0][i]
this mistake is also repeated in your populateTree function
so in the same way your code would be
def populateTree(my_tree, newArray, my_columnList):
i = 0
for i in range(len(newArray)-1):
my_tree.insert('','end', text=newArray[i][0], values = (newArray[i][1:len(newArray[0]))
#(im using the text option to bypass treeview's column 0 problem)
return my_tree

Using an if statement to pass through variables ot further functions for python

I am a biologist that is just trying to use python to automate a ton of calculations, so I have very little experience.
I have a very large array that contains values that are formatted into two columns of observations. Sometimes the observations will be the same between the columns:
v1,v2
x,y
a,b
a,a
x,x
In order to save time and effort I wanted to make an if statement that just prints 0 if the two columns are the same and then moves on. If the values are the same there is no need to run those instances through the downstream analyses.
This is what I have so far just to test out the if statement. It has yet to recognize any instances where the columns are equivalen.
Script:
mylines=[]
with open('xxxx','r') as myfile:
for myline in myfile:
mylines.append(myline) ##reads the data into the two column format mentioned above
rang=len(open ('xxxxx,'r').readlines( )) ##returns the number or lines in the file
for x in range(1, rang):
li = mylines[x] ##selected row as defined by x and the number of lines in the file
spit = li.split(',',2) ##splits the selected values so they can be accessed seperately
print(spit[0]) ##first value
print(spit[1]) ##second value
if spit[0] == spit[1]:
print(0)
else:
print('Issue')
Output:
192Alhe52
192Alhe52
Issue ##should be 0
188Alhe48
192Alhe52
Issue
191Alhe51
192Alhe52
Issue
How do I get python to recgonize that certain observations are actually equal?
When you read the values and store them in the array, you can be storing '\n' as well, which is a break line character, so your array actually looks like this
print(mylist)
['x,y\n', 'a,b\n', 'a,a\n', 'x,x\n']
To work around this issue, you have to use strip(), which will remove this character and occasional blank spaces in the end of the string that would also affect the comparison
mylines.append(myline.strip())
You shouldn't use rang=len(open ('xxxxx,'r').readlines( )), because you are reading the file again
rang=len(mylines)
There is a more readable, pythonic way to replicate your for
for li in mylines[1:]:
spit = li.split(',')
if spit[0] == spit[1]:
print(0)
else:
print('Issue')
Or even
for spit.split(',') in mylines[1:]:
if spit[0] == spit[1]:
print(0)
else:
print('Issue')
will iterate on the array mylines, starting from the first element.
Also, if you're interested in python packages, you should have a look at pandas. Assuming you have a csv file:
import pandas as pd
df = pd.read_csv('xxxx')
for i, elements in df.iterrows():
if elements['v1'] == elements['v2']:
print('Equal')
else:
print('Different')
will do the trick. If you need to modify values and write another file
df.to_csv('nameYouWant')
For one, your issue with the equals test might be because iterating over lines like this also yields the newline character. There is a string function that can get rid of that, .strip(). Also, your argument to split is 2, which splits your row into three groups - but that probably doesn't show here. You can avoid having to parse it yourself when using the csv module, as your file presumably is that:
import csv
with open("yourfile.txt") as file:
reader = csv.reader(file)
next(reader) # skip header
for first, second in reader:
print(first)
print(second)
if first == second:
print(0)
else:
print("Issue")

duplicate values in new csv file

I'm working on a program and want to write my result into a comma separated file, like a CSV.
new_throughput =[]
self.t._interval = 2
self.f = open("output.%s.csv"%postfix, "w")
self.f.write("time, Byte_Count, Throughput \n")
cur_throughput = stat.byte_count
t_put.append(cur_throughput)
b_count = (cur_throughput/131072.0) #calculating bits
b_count_list.append(b_count)
L = [y-x for x,y in zip(b_count_list, b_count_list[1:])] #subtracting current value - previous, saves value into list
for i in L:
new_throughput.append(i/self.t._interval)
self.f.write("%s,%s,%s,%s \n"%(self.experiment, b_count, b_count_list,new_throughput)) #write to file
when running this code i get this in my CSV file
picture here.
It somehow prints out the previous value every time.
What I want is new row for each new line:
time , byte_count, throughput
20181117013759,0.0,0.0
20181117013759,14.3157348633,7.157867431640625
0181117013759,53.5484619141,, 19.616363525390625
I don't have a working minimal example, but your last line should refer to the last member of each list, not the whole list. Something like this:
self.f.write("%s,%s,%s,%s \n"%(self.experiment, b_count, b_count_list[-1],new_throughput[-1])) #write to file
Edit: ...although if you want this simple solution to work, then you should initialize the lists with one initial value, e.g. [0], otherwise you'd get a "list index out of range error" at the first iteration according to your output.

Python - Reading a CSV, won't print the contents of the last column

I'm pretty new to Python, and put together a script to parse a csv and ultimately output its data into a repeated html table.
I got most of it working, but there's one weird problem I haven't been able to fix. My script will find the index of the last column, but won't print out the data in that column. If I add another column to the end, even an empty one, it'll print out the data in the formerly-last column - so it's not a problem with the contents of that column.
Abridged (but still grumpy) version of the code:
import os
os.chdir('C:\\Python34\\andrea')
import csv
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
if 'phone' in tableHeader:
phoneIndex = tableHeader.index('phone')
else:
phoneIndex = -1
for row in exampleReader:
row[-1] =''
print(phoneIndex)
print(row[phoneIndex])
csvOpen.close()
my.csv
stuff,phone
1,3235556177
1,3235556170
Output
1
1
Same script, small change to the CSV file:
my.csv
stuff,phone,more
1,3235556177,
1,3235556170,
Output
1
3235556177
1
3235556170
I'm using Python 3.4.3 via Idle 3.4.3
I've had the same problem with CSVs generated directly by mysql, ones that I've opened in Excel first then re-saved as CSVs, and ones I've edited in Notepad++ and re-saved as CSVs.
I tried adding several different modes to the open function (r, rU, b, etc.) and either it made no difference or gave me an error (for example, it didn't like 'b').
My workaround is just to add an extra column to the end, but since this is a frequently used script, it'd be much better if it just worked right.
Thank you in advance for your help.
row[-1] =''
The CSV reader returns to you a list representing the row from the file. On this line you set the last value in the list to an empty string. Then you print it afterwards. Delete this line if you don't want the last column to be set to an empty string.
If you know it is the last column, you can count them and then use that value minus 1. Likewise you can use your string comparison method if you know it will always be "phone". I recommend if you are using the string compare, convert the value from the csv to lower case so that you don't have to worry about capitalization.
In my code below I created functions that show how to use either method.
import os
import csv
os.chdir('C:\\temp')
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
phoneColIndex = None;#init to a value that can imply state
lastColIndex = None;#init to a value that can imply state
def getPhoneIndex(header):
for i, col in enumerate(header): #use this syntax to get index of item
if col.lower() == 'phone':
return i;
return -1; #send back invalid index
def findLastColIndex(header):
return len(tableHeader) - 1;
## methods to check for phone col. 1. by string comparison
#and 2. by assuming it's the last col.
if len(tableHeader) > 1:# if only one row or less, why go any further?
phoneColIndex = getPhoneIndex(tableHeader);
lastColIndex = findLastColIndex(tableHeader)
for row in exampleReader:
print(row[phoneColIndex])
print('----------')
print(row[lastColIndex])
print('----------')
csvOpen.close()

xlwt writing excel file python problem

So I have to two separate lists of data, wordlist and digitlist. And both sets have related sets of data so wordlist[1] is related to digitlist[1]. The data sets are shown below:
seglist = ['aaa111', 'bbb222', 'ccc333']
wordlist = ['aaa', 'bbb', 'ccc']
digitlist = ['111', '222', '333']
I'm tryin to create a a new sheet for each set of data, and write the wordlist in the first column and digitlist in the second column of the sheet. Now the following code works when I only include one set of data:
for i in range(len(seglist)):
p = str(i)
ws = w.add_sheet(p)
for i, cell in enumerate(wordlist[i]):
ws.write(i,0,cell)
But when I try to add second data set it gives me and error. I think the problem is that the excel worksheets can't be written on twice. So does that mean I have to reopen the newly excel file, and then reformat it? Also, does anyone have any ideas on how to write simultaneously with both sets of data on the excel spreadsheet?
for i in range(len(seglist)):
p = str(i)
ws = w.add_sheet(p)
for i, cell in enumerate(wordlist[i]):
ws.write(i,0,cell)
for i, cell in enumerate(digitlist[i]):
ws.write(i,1,cell)
Exception: Attempt to overwrite cell: sheetname=u'0' rowx=0 colx=1
Also the number of items listed in each data set may not be the same.
Neither of your two pieces of code (as shown in your question) have any chance of working, because (1) the indentation is stuffed up (2) you use the variable i in every for loop.
Fix the indentation and replace the loop variables with meaningful names, like sheet_index, row_index, col_index` (or sheetx/rowx/colx if you prefer fewer keystrokes).
What is seglist?
"""Excel spreadsheets can't be written on twice""" -- not so. By default, xlwt prevents you from writing on the same cell twice. Empirical evidence is that 99.99% of the time, the cause is woolly logic. People who need to overwrite cells (not required in your above task) can override the overwrite check.
So, putting it all together, the following code implements what I understand out of """I'm tryin to create a a new sheet for each set of data, and write the wordlist in the first column and digitlist in the second column of the sheet."""
assert len(wordlist) == len(digitlist)
for sheetx in xrange(len(seglist)):
ws = w.add_sheet(str(sheeetx))
assert len(wordlist[sheetx]) == len(digitlist[sheetx])
for rowx, values in enumerate(zip(wordlist[sheetx], digitlist[sheetx]):
for colx, value in enumerate(values):
ws.write(rowx, colx, value)
If any assert triggers, you need to examine your data ...
Update in response to info about ragged data:
assert len(wordlist) == len(digitlist) == len(seglist))
for sheetx in xrange(len(seglist)):
ws = w.add_sheet(str(sheeetx))
for colx, values in enumerate((wordlist[sheetx], digitlist[sheetx])):
for rowx, value in enumerate(values):
ws.write(rowx, colx, value)
The issue is that in your second loop you are continually hitting the same cells over and over again.
That is to say assuming your data looks something like this data:
wordlist = [['cat','feline','kitty'], ['dog','canine','puppy']]
digitlist = [[3,7,9000],[17,8,4000]]
When you hit wordlist, you write:
Sheet0
Row0
-----
cat
feline
kitty
Sheet1
Row0
-----
dog
canine
puppy
When you add the iteration over digitlist into the mix, then you are running over all of the entries in digitlist[i] for every entry in wordlist[i].
So your program writes out "cat" into Row0, Cell0 [that is, (0,0)] and then writes out 3, 7 and 9000 into (0,1), (1,1) and (1,2). Then your program writes out "feline" into (1,0) ... and then tries to write out 3, 7 and 9000 into (0,1), (1,1) and (1,2) all over again. Also, the inner i shadows the outer i ... if it didn't, (if you used ii for you inner loop), then you would be trying to write out 17, 8 and 4000 into (0,1), (1,1) and (1,2) ... which may not be what you want at all.
If what you want to do is merge the word and digit lists and write them out next to each other then try zip instead.
for i in range(len(wordlist)):
merged = zip(wordlist[i], digitlist[i])
cells = range(len(merged[0])) # Assumes no ragged arrays
for row in range(len(merged)):
for col in cells:
ws.write(row, col, merged[row][col])

Categories