I have two separate lists of data, wordlist and digitlist, and they hold related data, so wordlist[1] is related to digitlist[1]. The data sets are shown below:
seglist = ['aaa111', 'bbb222', 'ccc333']
wordlist = ['aaa', 'bbb', 'ccc']
digitlist = ['111', '222', '333']
I'm trying to create a new sheet for each set of data, and write wordlist in the first column and digitlist in the second column of each sheet. The following code works when I only include one set of data:
for i in range(len(seglist)):
    p = str(i)
    ws = w.add_sheet(p)
    for i, cell in enumerate(wordlist[i]):
        ws.write(i, 0, cell)
But when I try to add the second data set it gives me an error. I think the problem is that the Excel worksheets can't be written on twice. So does that mean I have to reopen the newly created Excel file and then reformat it? Also, does anyone have any ideas on how to write both sets of data to the spreadsheet at the same time?
for i in range(len(seglist)):
    p = str(i)
    ws = w.add_sheet(p)
    for i, cell in enumerate(wordlist[i]):
        ws.write(i, 0, cell)
    for i, cell in enumerate(digitlist[i]):
        ws.write(i, 1, cell)
Exception: Attempt to overwrite cell: sheetname=u'0' rowx=0 colx=1
Also the number of items listed in each data set may not be the same.
Neither of your two pieces of code (as shown in your question) has any chance of working, because (1) the indentation is mangled and (2) you reuse the variable i in every for loop.
Fix the indentation and replace the loop variables with meaningful names, like sheet_index, row_index, col_index (or sheetx/rowx/colx if you prefer fewer keystrokes).
What is seglist?
"""Excel spreadsheets can't be written on twice""" -- not so. By default, xlwt prevents you from writing on the same cell twice. Empirical evidence is that 99.99% of the time, the cause is woolly logic. People who need to overwrite cells (not required in your above task) can override the overwrite check.
So, putting it all together, the following code implements what I understand out of """I'm trying to create a new sheet for each set of data, and write the wordlist in the first column and digitlist in the second column of the sheet."""
assert len(wordlist) == len(digitlist)
for sheetx in xrange(len(seglist)):
    ws = w.add_sheet(str(sheetx))
    assert len(wordlist[sheetx]) == len(digitlist[sheetx])
    for rowx, values in enumerate(zip(wordlist[sheetx], digitlist[sheetx])):
        for colx, value in enumerate(values):
            ws.write(rowx, colx, value)
If any assert triggers, you need to examine your data ...
Update in response to info about ragged data:
assert len(wordlist) == len(digitlist) == len(seglist)
for sheetx in xrange(len(seglist)):
    ws = w.add_sheet(str(sheetx))
    for colx, values in enumerate((wordlist[sheetx], digitlist[sheetx])):
        for rowx, value in enumerate(values):
            ws.write(rowx, colx, value)
The issue is that in your second loop you are continually hitting the same cells over and over again.
That is to say, assuming your data looks something like this:
wordlist = [['cat','feline','kitty'], ['dog','canine','puppy']]
digitlist = [[3,7,9000],[17,8,4000]]
When you hit wordlist, you write:
Sheet0
Col0
-----
cat
feline
kitty
Sheet1
Col0
-----
dog
canine
puppy
When you add the iteration over digitlist into the mix, then you are running over all of the entries in digitlist[i] for every entry in wordlist[i].
So your program writes out "cat" into Row0, Cell0 [that is, (0,0)] and then writes out 3, 7 and 9000 into (0,1), (1,1) and (2,1). Then your program writes out "feline" into (1,0) ... and then tries to write out 3, 7 and 9000 into (0,1), (1,1) and (2,1) all over again. Also, the inner i shadows the outer i ... if it didn't (if you used ii for your inner loop), then you would be trying to write out 17, 8 and 4000 into (0,1), (1,1) and (2,1) ... which may not be what you want at all.
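The double write can be demonstrated without xlwt at all. The sketch below is a hypothetical stand-in: a plain dict plays the part of the worksheet, and the write function mimics xlwt's default overwrite check. It reproduces the nested-loop logic and fails exactly like the traceback above:

```python
# A dict stands in for the worksheet; a second write to the same
# (row, col) raises, just as xlwt does by default.
def simulate(words, digits):
    cells = {}

    def write(rowx, colx, value):
        if (rowx, colx) in cells:
            raise Exception(
                "Attempt to overwrite cell: rowx=%d colx=%d" % (rowx, colx))
        cells[(rowx, colx)] = value

    for i, cell in enumerate(words):
        write(i, 0, cell)
        # The inner loop runs in full for every word, so its
        # column-1 writes are repeated on the second word.
        for i, d in enumerate(digits):
            write(i, 1, d)
    return cells

try:
    simulate(['cat', 'feline', 'kitty'], [3, 7, 9000])
except Exception as e:
    print(e)  # Attempt to overwrite cell: rowx=0 colx=1
```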
If what you want to do is merge the word and digit lists and write them out next to each other then try zip instead.
for i in range(len(wordlist)):
    merged = zip(wordlist[i], digitlist[i])
    cells = range(len(merged[0]))  # assumes no ragged arrays
    for row in range(len(merged)):
        for col in cells:
            ws.write(row, col, merged[row][col])
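To see what zip contributes here, run it on the sample data above (in Python 3 zip returns an iterator, so wrap it in list() before indexing):

```python
wordlist = [['cat', 'feline', 'kitty'], ['dog', 'canine', 'puppy']]
digitlist = [[3, 7, 9000], [17, 8, 4000]]

# list() is needed on Python 3, where zip returns an iterator
merged = list(zip(wordlist[0], digitlist[0]))
print(merged)  # [('cat', 3), ('feline', 7), ('kitty', 9000)]
```

Each tuple is one spreadsheet row: the word goes in column 0 and the digit in column 1.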
I am a biologist that is just trying to use python to automate a ton of calculations, so I have very little experience.
I have a very large array that contains values that are formatted into two columns of observations. Sometimes the observations will be the same between the columns:
v1,v2
x,y
a,b
a,a
x,x
In order to save time and effort I wanted to make an if statement that just prints 0 if the two columns are the same and then moves on. If the values are the same there is no need to run those instances through the downstream analyses.
This is what I have so far, just to test out the if statement. It has yet to recognize any instances where the columns are equivalent.
Script:
mylines = []
with open('xxxx', 'r') as myfile:
    for myline in myfile:
        mylines.append(myline)  # reads the data into the two-column format mentioned above
rang = len(open('xxxxx', 'r').readlines())  # returns the number of lines in the file
for x in range(1, rang):
    li = mylines[x]  # selected row as defined by x and the number of lines in the file
    spit = li.split(',', 2)  # splits the selected values so they can be accessed separately
    print(spit[0])  # first value
    print(spit[1])  # second value
    if spit[0] == spit[1]:
        print(0)
    else:
        print('Issue')
Output:
192Alhe52
192Alhe52
Issue ##should be 0
188Alhe48
192Alhe52
Issue
191Alhe51
192Alhe52
Issue
How do I get python to recognize that certain observations are actually equal?
When you read the values and store them in the array, you may be storing '\n' as well, which is the line break character, so your array actually looks like this:
print(mylines)
['x,y\n', 'a,b\n', 'a,a\n', 'x,x\n']
To work around this issue, you have to use strip(), which will remove this character as well as any blank spaces at the end of the string that would also affect the comparison:
mylines.append(myline.strip())
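A quick check shows why the comparison fails without strip(): the last field on each line keeps its trailing newline:

```python
line = 'a,a\n'
fields = line.split(',')
print(fields)                  # ['a', 'a\n']
print(fields[0] == fields[1])  # False, because 'a' != 'a\n'

fields = line.strip().split(',')
print(fields[0] == fields[1])  # True once the newline is stripped
```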
You shouldn't use rang = len(open('xxxxx', 'r').readlines()), because you are reading the file again:
rang=len(mylines)
There is a more readable, pythonic way to replicate your for
for li in mylines[1:]:
    spit = li.split(',')
    if spit[0] == spit[1]:
        print(0)
    else:
        print('Issue')
Or even
for spit in (li.split(',') for li in mylines[1:]):
    if spit[0] == spit[1]:
        print(0)
    else:
        print('Issue')
Either version iterates over mylines starting from the second element, so the header row is skipped.
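The slice mylines[1:] is what skips the header row; a tiny check:

```python
mylines = ['v1,v2', 'x,y', 'a,b', 'a,a', 'x,x']
print(mylines[1:])  # ['x,y', 'a,b', 'a,a', 'x,x'] - the header 'v1,v2' is gone
```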
Also, if you're interested in python packages, you should have a look at pandas. Assuming you have a csv file:
import pandas as pd

df = pd.read_csv('xxxx')
for i, elements in df.iterrows():
    if elements['v1'] == elements['v2']:
        print('Equal')
    else:
        print('Different')
will do the trick. If you need to modify values and write another file
df.to_csv('nameYouWant')
For one, your issue with the equals test might be that iterating over lines like this also yields the newline character. There is a string function that can get rid of it, .strip(). Also, your argument to split is 2, which splits your row into at most three groups - but that probably doesn't show here. You can avoid having to parse the lines yourself by using the csv module, since your file presumably is CSV:
import csv

with open("yourfile.txt") as file:
    reader = csv.reader(file)
    next(reader)  # skip header
    for first, second in reader:
        print(first)
        print(second)
        if first == second:
            print(0)
        else:
            print("Issue")
I'm working on a program and want to write my result into a comma separated file, like a CSV.
new_throughput = []
self.t._interval = 2
self.f = open("output.%s.csv" % postfix, "w")
self.f.write("time, Byte_Count, Throughput \n")

cur_throughput = stat.byte_count
t_put.append(cur_throughput)
b_count = (cur_throughput / 131072.0)  # calculating bits
b_count_list.append(b_count)
L = [y - x for x, y in zip(b_count_list, b_count_list[1:])]  # subtracting current value - previous, saves values into a list
for i in L:
    new_throughput.append(i / self.t._interval)
self.f.write("%s,%s,%s,%s \n" % (self.experiment, b_count, b_count_list, new_throughput))  # write to file
When running this code I get this in my CSV file:
[screenshot of the CSV output omitted]
It somehow prints out the previous values every time.
What I want is new row for each new line:
time,byte_count,throughput
20181117013759,0.0,0.0
20181117013759,14.3157348633,7.157867431640625
20181117013759,53.5484619141,19.616363525390625
I don't have a working minimal example, but your last line should refer to the last member of each list, not the whole list. Something like this:
self.f.write("%s,%s,%s,%s \n" % (self.experiment, b_count, b_count_list[-1], new_throughput[-1]))  # write to file
Edit: ...although if you want this simple solution to work, then you should initialize the lists with one initial value, e.g. [0], otherwise you'd get a "list index out of range" error at the first iteration, judging by your output.
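A different approach (a sketch only, since the surrounding class and the stat object aren't shown; the sample timestamps and byte counts are made up from the desired output above) is to let the csv module do the formatting and write exactly one row per measurement, so no list is ever re-written:

```python
import csv
import io

# io.StringIO stands in for the real output file here
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["time", "byte_count", "throughput"])

interval = 2
prev = None
for timestamp, b_count in [("20181117013759", 0.0),
                           ("20181117013759", 14.3157348633)]:
    # throughput = (current - previous) / interval; 0.0 for the first sample
    throughput = 0.0 if prev is None else (b_count - prev) / interval
    writer.writerow([timestamp, b_count, throughput])
    prev = b_count

print(out.getvalue())
```

Each writerow call appends exactly one line, so earlier values are never repeated.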
I'm pretty new to Python, and put together a script to parse a csv and ultimately output its data into a repeated html table.
I got most of it working, but there's one weird problem I haven't been able to fix. My script will find the index of the last column, but won't print out the data in that column. If I add another column to the end, even an empty one, it'll print out the data in the formerly-last column - so it's not a problem with the contents of that column.
Abridged (but still grumpy) version of the code:
import os
os.chdir('C:\\Python34\\andrea')
import csv

csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
if 'phone' in tableHeader:
    phoneIndex = tableHeader.index('phone')
else:
    phoneIndex = -1

for row in exampleReader:
    row[-1] = ''
    print(phoneIndex)
    print(row[phoneIndex])

csvOpen.close()
my.csv
stuff,phone
1,3235556177
1,3235556170
Output
1
1
Same script, small change to the CSV file:
my.csv
stuff,phone,more
1,3235556177,
1,3235556170,
Output
1
3235556177
1
3235556170
I'm using Python 3.4.3 via Idle 3.4.3
I've had the same problem with CSVs generated directly by mysql, ones that I've opened in Excel first then re-saved as CSVs, and ones I've edited in Notepad++ and re-saved as CSVs.
I tried adding several different modes to the open function (r, rU, b, etc.) and either it made no difference or gave me an error (for example, it didn't like 'b').
My workaround is just to add an extra column to the end, but since this is a frequently used script, it'd be much better if it just worked right.
Thank you in advance for your help.
row[-1] = ''
The CSV reader returns to you a list representing the row from the file. On this line you set the last value in the list to an empty string. Then you print it afterwards. Delete this line if you don't want the last column to be set to an empty string.
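The effect is easy to see on a plain list:

```python
row = ['1', '3235556177']
row[-1] = ''   # clears the last element in place
print(row)     # ['1', '']
```

So when phoneIndex happens to point at the last column, print(row[phoneIndex]) prints the empty string that was just assigned.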
If you know it is the last column, you can count the columns and use that value minus 1. Likewise, you can use your string-comparison method if you know the header will always be "phone". If you use the string compare, I recommend converting the value from the csv to lower case so that you don't have to worry about capitalization.
In my code below I created functions that show how to use either method.
import os
import csv

os.chdir('C:\\temp')
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
phoneColIndex = None  # init to a value that can imply state
lastColIndex = None  # init to a value that can imply state

def getPhoneIndex(header):
    for i, col in enumerate(header):  # use this syntax to get the index of an item
        if col.lower() == 'phone':
            return i
    return -1  # send back invalid index

def findLastColIndex(header):
    return len(header) - 1

# Methods to check for the phone col: 1. by string comparison
# and 2. by assuming it's the last col.
if len(tableHeader) > 1:  # if only one column or less, why go any further?
    phoneColIndex = getPhoneIndex(tableHeader)
    lastColIndex = findLastColIndex(tableHeader)
    for row in exampleReader:
        print(row[phoneColIndex])
        print('----------')
        print(row[lastColIndex])
        print('----------')
csvOpen.close()
Started fiddling with Python for the first time a week or so ago and have been trying to create a script that will replace instances of a string in a file with a new string. The actual reading and creation of a new file with intended strings seems to be successful, but error checking at the end of the file displays output suggesting that there is an error. I checked a few other threads but couldn't find a solution or alternative that fit what I was looking for or was at a level I was comfortable working with.
Apologies for messy/odd code structure, I am very new to the language. Initial four variables are example values.
editElement = "Testvalue"
newElement = "Testvalue2"
readFile = "/Users/Euan/Desktop/Testfile.csv"
writeFile = "/Users/Euan/Desktop/ModifiedFile.csv"

editelementCount1 = 0
newelementCount1 = 0
editelementCount2 = 0
newelementCount2 = 0

# Reading from file
print("Reading file...")
file1 = open(readFile, 'r')
fileHolder = file1.readlines()
file1.close()

# Creating modified data
fileHolder_replaced = [row.replace(editElement, newElement) for row in fileHolder]

# Writing to file
file2 = open(writeFile, 'w')
file2.writelines(fileHolder_replaced)
file2.close()
print("Modified file generated!")

# Error checking
for row in fileHolder:
    if editElement in row:
        editelementCount1 += 1
for row in fileHolder:
    if newElement in row:
        newelementCount1 += 1
for row in fileHolder_replaced:
    if editElement in row:
        editelementCount2 += 1
for row in fileHolder_replaced:
    if newElement in row:
        newelementCount2 += 1

print(editelementCount1 + newelementCount1)
print(editelementCount2 + newelementCount2)
Expected output would be the last two instances of 'print' displaying the same value, however...
The first instance of print returns the value of A + B as expected.
The second line only returns the value of B (from fileHolder), and from what I can see, A has indeed been converted to B (In fileHolder_replaced).
Edit:
For example,
if the first two counts show A and B to be 2029 and 1619 respectively (fileHolder), the last two counts show A as 0 and B as 2029 (fileHolder_replace). Obviously this is missing the original value of B.
To expand a bit on my comment:
If you look for "Testvalue" in the modified file, it will find the string even where it has become "Testvalue2", because the original value is a substring of the modified value. Therefore the check finds twice the number of occurrences, or more precisely, it counts every line in which the string occurs.
If you query
if newElement in row
it checks whether the string newElement is contained in the string row.
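Concretely, using the values from the question:

```python
editElement = "Testvalue"
newElement = "Testvalue2"

line = "some text Testvalue2 more text"
print(newElement in line)   # True
# The old string matches too, since it is a prefix of the new one:
print(editElement in line)  # True
```

So counting editElement in fileHolder_replaced also counts every line that contains newElement, which is why the two totals differ.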
I imported an excel spreadsheet and I am trying to clean up empty values with default values in all rows in my spreadsheet. I don't need to update the spreadsheet, I just need to set default values because I am using this information to insert into a local database. Whenever I try to do so, it never gets processed correctly. Here is my original iteration of the code:
for root, dirs, files in os.walk(path):
    xlsfiles = ['1128CNLOAD.xlsx']
    #xlsfiles = [_ for _ in files if _.endswith('CNLOAD.xlsx')]
    print(xlsfiles)
    for xlsfile in xlsfiles:
        book = xlrd.open_workbook(os.path.join(root, xlsfile))
        sheet = book.sheet_by_index(0)
        cell = sheet.cell(1, 1)
        print(sheet)
        sheet0 = book.sheet_by_index(0)
        #sheet1 = book.sheet_by_index(1)
        for rownum in range(sheet0.nrows):
            print(sheet0.row_values(rownum))
        values = ()
        print(sheet0.nrows)
        for row_index in range(1, sheet.nrows):
            if sheet.cell(row_index, 4).value == '':
                sheet.cell(row_index, 4).value = 0.0
            print(sheet.row(row_index))
The code returns no errors but nothing gets updated, and the cells I am trying to update are still empty.
I also tried to change the loop to just do a value replace for the list which is seen below:
for row_index in range(1, sheet.nrows):
    if sheet.row(1)[4] == "empty:''":
        sheet.row(1)[4] = "number:0.0"
When I print after running this update, the list has not changed.
print(sheet.row(1))
[text:u'FRFHF', text:u' ', number:0.15, number:0.15, empty:'', empty:'', number:2.5, number:2.5, empty:'', empty:'']
Thank you for any help and let me know if you have any questions.
xlrd isn't really set up to edit the spreadsheet once you have it in memory. You can do it, but you have to use the undocumented internal implementation.
On my version (0.7.1), cells are stored internally to the sheet in a couple of different two-dimensional arrays - sheet._cell_types and sheet._cell_values are the main two. The types are defined by a set of constants in biffh.py, which the xlrd module imports. When you call cell, it constructs a new Cell instance using the value and type looked up for the given row/column pair. You could update those directly, or you could use the put_cell method.
So it looks like this would work:
if sheet.cell_type(1, 4) == xlrd.XL_CELL_EMPTY:
    sheet._cell_types[1][4] = xlrd.XL_CELL_NUMBER
    sheet._cell_values[1][4] = 0.0
Alternately:
if sheet.cell_type(1, 4) == xlrd.XL_CELL_EMPTY:
    sheet.put_cell(1, 4, xlrd.XL_CELL_NUMBER, 0.0, sheet.cell_xf_index(1, 4))
You may need to review the code to make sure this didn't change if you're on a different version.