I have an interesting situation with Python's csv module. I have a function that takes specific lines from a text file and writes them to a CSV file:
import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rb") as text:
        for line in text:
            line = line.strip()
            with open(csvfile, "ab") as f:
                if line.startswith("# Online_Resource"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])
                if line.startswith("##"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])
Here is a sample of some strings from the original text file:
# Online_Resource: https://www.ncdc.noaa.gov/
## Corg% percent organic carbon,,,%,,paleoceanography,,,N
What is really bizarre is that the final csv file looks good, except that the characters in the first column only (those that originally had the #) partially "overwrite" each other when I try to manually delete some characters from the cell.
Oddly enough, there seems to be no pattern to how the characters get jumbled each time I delete some after running the script. I tried encoding the csv file as Unicode, to no avail.
Thanks.
You've selected the excel dialect but overridden it with odd parameters:
You're using TAB as both separator and line terminator, which produces a one-line CSV file. Close enough to "truncated" to me.
Also, quotechar shouldn't be a space.
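To see the one-line effect concretely, here's a minimal sketch reproducing those parameters against an in-memory buffer instead of a file:

```python
import csv
import io

buf = io.StringIO()
# Same problematic parameters: TAB as both delimiter and line terminator
w = csv.writer(buf, delimiter='\t', lineterminator='\t')
w.writerow(['row1a', 'row1b'])
w.writerow(['row2a', 'row2b'])

# Every "row" ends with a TAB instead of a newline, so everything lands
# on one physical line:
print(repr(buf.getvalue()))  # 'row1a\trow1b\trow2a\trow2b\t'
```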
This had a convenient side effect, as you noted: the csv module actually splits the lines at the commas!
The code is also inefficient and error-prone: you're opening the file in append mode inside the loop and creating a new csv writer each time. Both are better done once, outside the loop.
The comma splitting must now also be done by hand. So better still: use the csv module to read the file as well. My proposed fix for your routine:
import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rU") as text, open(csvfile, "wb") as f:
        write = csv.writer(f, dialect='excel',
                           delimiter='\t')
        reader = csv.reader(text, delimiter=",")
        for row in reader:
            if not row:
                continue  # skip possible empty rows
            if row[0].startswith("# Online_Resource"):
                write.writerow([row[0].lstrip("# ")])
            elif row[0].startswith("##"):
                # write the row, stripping the leading hashes from the first item
                write.writerow([row[0].lstrip("# ")] + row[1:])
Note that the file isn't displayed properly in Excel unless you remove delimiter='\t' (reverting to the default comma).
Also note that for Python 3 you need to replace open(csvfile, "wb") as f with open(csvfile, "w", newline='') as f.
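Putting the Python 3 changes together, a sketch of the same routine for Python 3 (assuming the input file is plain text in the default encoding):

```python
import csv

def csv_save_use(textfile, csvfile):
    # Python 3: the csv module wants text-mode files opened with newline=''
    with open(textfile, newline='') as text, \
         open(csvfile, 'w', newline='') as f:
        write = csv.writer(f, dialect='excel', delimiter='\t')
        reader = csv.reader(text, delimiter=',')
        for row in reader:
            if not row:
                continue  # skip possible empty rows
            if row[0].startswith('# Online_Resource'):
                write.writerow([row[0].lstrip('# ')])
            elif row[0].startswith('##'):
                # strip leading hashes/spaces from the first item, keep the rest
                write.writerow([row[0].lstrip('# ')] + row[1:])
```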
Here's how the output looks now (the empty cells appear because the source line has several commas in a row):
More problems:
line=line.strip(" ") removes only leading and trailing spaces; it doesn't remove \r or \n. Try line=line.strip(), which removes all leading and trailing whitespace.
You get your whole line, commas included, in one cell because you haven't split it up, e.g. with a csv.reader instance. See here:
https://docs.python.org/2/library/csv.html#csv.reader
str.lstrip's non-default argument is treated as a set of characters to be removed, so '## ' has the same effect as '# '. If guff.startswith('## '), then do guff = guff[3:] to get rid of the unwanted text.
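The set-of-characters behaviour is easy to see; since Python 3.9 there is also str.removeprefix, which removes an exact prefix instead of a character set:

```python
# lstrip treats its argument as a SET of characters, not a prefix
print('## heading'.lstrip('# '))   # 'heading'
print('#!# heading'.lstrip('# '))  # '!# heading' -- stops at the first char not in the set
print('### #x'.lstrip('# '))       # 'x' -- strips hashes AND spaces greedily

# Python 3.9+: remove an exact prefix instead
print('## heading'.removeprefix('## '))  # 'heading'
```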
It is not at all clear what the sentence containing "bizarre" means. We need to see exactly what is in the output csv file. Create a small test file with 3 records: (1) starting with '# Online_Resource', (2) starting with '## ', (3) neither. Run your code on it and show the output, like this:
print repr(open('testout.csv', 'rb').read())
I have seen similar posts to this, but they all seem to use print statements (viewing the cleaned data) rather than overwriting the original csv with the cleaned data, so I am stuck. When I tried to write back to the csv myself, it just deleted everything in the file. Here is the format of the csv:
30;"unemployed";"married";"primary";"no";1787;"no";"no";"cellular";19;"oct";79;1;-1;0;"unknown";"no"
33;"services";"married";"secondary";"no";4747;"yes";"cellular";11;"may";110;1;339;2;"failure";"no"
35;"management";"single";"tertiary";"no";1470;"yes";"no";"cellular";12;"apr"185;1;330;1;"failure";"no"
It is delimited by semicolons, which is fine, but all text is wrapped in quotes. I only want to remove the quotes and write back to the file. Here is the code I reverted to, which successfully reads the file, removes all quotes, and prints the results:
import csv

f = open("bank.csv", 'r')
try:
    for row in csv.reader(f, delimiter=';', skipinitialspace=True):
        print(' '.join(row))
finally:
    f.close()
Any help on properly writing back to the csv would be appreciated, thanks!
See here: Python CSV: Remove quotes from value
I've done this in basically two different ways, depending on the size of the csv.
You can read the entire csv into a Python object (a list), do some things, and then overwrite the existing file with the cleaned version.
As in the link above, you can use one reader and one writer: create a new file, write line by line as you clean the input from the csv reader, delete the original csv, and rename the new file to replace the old one.
In my opinion option #2 is vastly preferable, as it avoids the possibility of data loss if your script errors part way through writing. It will also have lower memory usage.
Finally: it may be possible to open a file as read/write and overwrite it line by line as you go, but that leaves you open to half of your file having quotes and half not if your script crashes part way through.
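A minimal sketch of option #2 for this case (the temp-file name and function name are illustrative): the csv reader strips the quotes on the way in, the default QUOTE_MINIMAL writer only re-adds them for fields that actually need them, and os.replace swaps the cleaned file into place only after it has been fully written.

```python
import csv
import os

def strip_quotes(path):
    """Rewrite a semicolon-delimited CSV in place, dropping the quoting."""
    tmp = path + '.tmp'  # illustrative temp-file name
    with open(path, newline='') as fin, open(tmp, 'w', newline='') as fout:
        reader = csv.reader(fin, delimiter=';', skipinitialspace=True)
        writer = csv.writer(fout, delimiter=';')
        for row in reader:
            # the reader already stripped the quotes; QUOTE_MINIMAL only
            # re-adds them for fields containing the delimiter
            writer.writerow(row)
    os.replace(tmp, path)  # swap the cleaned file into place
```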
You could do something like this: read it all in with csv.reader, which strips the quotes, then write it back out. The writer's default quoting only quotes fields that need it, so the quotes won't be re-added:
import csv

f = open("bank.csv", 'r')
inputCSV = []
try:
    for row in csv.reader(f, delimiter=';', skipinitialspace=True):
        inputCSV.append(row)
finally:
    f.close()

with open('bank.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=';')
    for row in inputCSV:
        csvwriter.writerow(row)
I'm trying to remove the " sign from the csv file I'm creating, but it either gives me the error _csv.Error: need to escape, but no escapechar set, or it puts the quotechar='"' character at the start and end of every line.
I'm running Python 3.7 and I tried the following variations:
passphrase_writer = csv.writer(file, lineterminator='\n', quoting=csv.QUOTE_NONE)
#passphrase_writer = csv.writer(file, delimiter=',', lineterminator='\n', quoting=csv.QUOTE_NONE)
#passphrase_writer = csv.writer(file, delimiter=',', lineterminator='\n', quoting=csv.QUOTE_NONE)
def print_dict(d, site_id):
    with open('passphrases.csv', mode='w', newline='') as file:
        passphrase_writer = csv.writer(file, lineterminator='\n', quoting=csv.QUOTE_NONE)
        #passphrase_writer = csv.writer(file, delimiter=',', lineterminator='\n', quoting=csv.QUOTE_NONE)
        #passphrase_writer = csv.writer(file, delimiter=',', lineterminator='\n', quotechar='|')
        for idx, val in enumerate(d['data']):
            x = (u'{},{},{},{},{},{},{},{}'.format(val['id'],
                                                   val['Name'],
                                                   val['domain'],
                                                   val['Version'],
                                                   val['lastLoggedIn'],
                                                   val['networkInterfaces'][0]['inet'][0],
                                                   val['id2'],
                                                   passphrase(val['id'], site_id)
                                                   ))
            print(x)
            passphrase_writer.writerow([x])
The printed results are good:
54356,tomer-a36,WORKGROUP,2.,tom,192.168.30.133,eafa2eb,DREAM
However, the csv file will have:
"54356,tomer-a36,WORKGROUP,2.,tom,192.168.30.133,eafa2eb,DREAM"
I wish to remove the extra " characters.
Note: when changing to quotechar='|', I get:
|54356,tomer-a36,WORKGROUP,2.,tom,192.168.30.133,eafa2eb,DREAM|
Trying to set quotechar='' gives an error.
You're overcomplicating it and mixing custom formatting with the built-in csv module.
You're passing a list containing one already-formatted string, with , as the separator. The csv module has to quote it to keep it a single "cell", because the default csv separator is also ,. This is a built-in safety mechanism to avoid losing or corrupting data: without quoting, one cell containing commas would be indistinguishable from several cells separated by commas.
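That safety mechanism is easy to observe with a plain writer:

```python
import csv
import io

buf = io.StringIO()
w = csv.writer(buf)  # default delimiter ',' and QUOTE_MINIMAL
w.writerow(['a,b', 'c'])     # first cell contains the delimiter -> gets quoted
w.writerow(['a', 'b', 'c'])  # three plain cells -> no quoting needed
print(buf.getvalue())
# "a,b",c
# a,b,c
```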
Instead, just write your data as a tuple and stop building x with str.format:
for val in d['data']:
    passphrase_writer.writerow((val['id'],
                                val['Name'],
                                val['domain'],
                                val['Version'],
                                val['lastLoggedIn'],
                                val['networkInterfaces'][0]['inet'][0],
                                val['id2'],
                                passphrase(val['id'], site_id)
                                ))
Since your data never contains , (are you sure of that? it seems that passphrase actually could), you don't have to worry about quotes being inserted.
And if some fields did contain commas (passphrase is a good candidate), csv.reader would still be able to parse the file and strip the quotes when deserializing your data (as Excel does too). Don't try to read the file back by manually splitting on ,; use the csv module again.
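A quick round trip illustrates this: the reader undoes the quoting transparently, even when a field contains commas (the sample values are made up):

```python
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(['54356', 'pass,with,commas', 'DREAM'])

buf.seek(0)
row = next(csv.reader(buf))
print(row)  # ['54356', 'pass,with,commas', 'DREAM'] -- quotes are gone
```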
I have a text file with many lines. From it I need to find two patterns of strings and save them to csv.
Example:
Text file contains:
NA: 2.0
slit uniformity at power: 3.6
integrated slit uniformity at power: 4.7
slit uniformity: 8.6
and this is the output I want in the csv:
[NA] [2.0]
[slit uniformity] [8.6]
In short, I want to save an exact string in one column and the number next to it in the next column.
If this data format happens to match a well-known format perfectly, you can parse it as that format.
In your sample data, the text field never includes any colons, or quotes, or backslash escapes, or anything "weird". Is that guaranteed to always be true?
If so, this is a valid CSV file, with colons for delimiters and optional whitespace heading the fields. So you can parse it that way. (Your output format is a little weird for CSV—normally you can't use separate "open" and "close" quoting characters like that. But you're not asking about the output part here, so I'll cheat a bit.)
import csv

with open(inpath) as fin, open(outpath, 'w') as fout:
    w = csv.writer(fout, delimiter=' ')
    for text, number in csv.reader(fin, delimiter=':', skipinitialspace=True):
        w.writerow((f'[{text}]', f'[{number.strip()}]'))
On the other hand, this may be simpler to do without treating either file as a weird CSV dialect, just parsing and generating the lines manually:
with open(inpath) as fin, open(outpath, 'w') as fout:
    for line in fin:
        text, _, number = line.rstrip().partition(': ')
        fout.write(f'[{text}] [{number}]\n')
Of course the error handling won't be as nice if you have lines that break the format, since you're spreading the format specification implicitly over a few lines rather than defining it explicitly as a CSV dialect, but that may not be a problem.
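If malformed lines do matter, here is a small sketch (the helper name and error message are illustrative) that validates each line while keeping the manual approach:

```python
def parse_lines(lines):
    # Yield (text, number) pairs; fail loudly on lines that break the format.
    for lineno, line in enumerate(lines, 1):
        text, sep, number = line.rstrip().partition(': ')
        if not sep:
            raise ValueError(f'line {lineno}: expected "text: number", got {line!r}')
        yield text, number
```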
prefixes = ['NA:', 'slit uniformity:']

with open('file.txt') as input, open('file.csv', 'w') as output:
    for line in input:
        for prefix in prefixes:
            if line.startswith(prefix):
                output.write('[%s] [%s]\n' % (prefix[:-1], line[len(prefix)+1:-1]))
I find this really weird: for some reason '\n' is added to the last entry in my list when I split a line from a .csv file.
Script
f = open("temp.csv")
lines = f.readlines()
headings = lines[0]
global heading_list
heading_list = headings.split(";")
print headings
I've printed out headings itself and it doesn't appear to have '\n' at the end; the '\n' seems to show up only when headings is split at the semicolon.
.csv file
timestamp;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%idle
10-20-39;6.53;0.00;4.02;0.00;0.00;0.00;0.00;0.00;89.45
10-20-41;0.50;0.00;1.51;0.00;0.00;0.00;0.00;0.00;97.99
10-20-43;1.98;0.00;1.98;5.45;0.00;0.50;0.00;0.00;90.10
10-20-45;0.50;0.00;1.51;0.00;0.00;0.00;0.00;0.00;97.99
10-20-47;0.50;0.00;1.50;0.00;0.00;0.00;0.00;0.00;98.00
10-20-49;0.50;0.00;1.01;3.02;0.00;0.00;0.00;0.00;95.48
Output from script
When you read a line in Python, the end-of-line character is not removed. You have to do this manually, for example with line.rstrip("\r\n"). It's not a problem with split, but with readlines.
Short answer - use the csv module. See below.
The newline character is present in the data that was read from the file. readlines() does not remove it, and in fact you will find that the newline character is present in headings:
>>> headings = lines[0]
>>> headings
'timestamp;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%idle\n'
A better way is to use splitlines() on the data read from the file. This will remove new lines, regardless of the type ('\n', '\r\n', '\r'):
>>> with open("temp.csv") as f:
...     lines = f.read().splitlines()
...
>>> headings = lines[0]
>>> headings
'timestamp;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%idle'
readlines() fails for Mac newlines ('\r'), so you should open the file with universal newline support by specifying 'rU' as the mode:
with open('temp.csv', 'rU') as f:
...
One other thing worth mentioning is that processing files this way can consume a lot of memory if the file is large because the whole file is read in one go. Instead it is more efficient to iterate over the file like this:
with open('temp.csv', 'rU') as f:
    heading_list = next(f).rstrip().split(';')  # headings are on the first line
    for line in f:
        process_data_row(line.rstrip().split(';'))
Finally, the real answer. You can avoid all of the mess above by using the csv module:
import csv

with open('temp.csv', 'rU') as csv_file:  # NB. 'rU' is important for handling Mac newlines
    csv_data = csv.reader(csv_file, delimiter=';')
    heading_list = next(csv_data)
    for row in csv_data:
        process_data_row(row)
import csv, Tkinter

with open('most_common_words.csv') as csv_file:  # open the file in a context manager so it's closed automatically
    csv_reader = csv.reader(csv_file)  # create a csv reader instance
    for row in csv_reader:  # read each line in the csv file into 'row' as a list
        print row[0]  # print the first item in the list
I'm trying to import this list of most common words using the csv module. It keeps raising the same error at this line:
for row in csv_reader: # Read each line in the csv file into 'row' as a list
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I've tried a couple different ways to do it as well, but they didn't work either. Any suggestions?
Also, where does this file need to be saved? Is it okay just being in the same folder as the program?
You should always open a CSV file in binary mode in Python 2 (in Python 3, open it in text mode with newline=''). Also, make sure that the delimiter and quote characters are , and ", or you'll need to specify otherwise:
with open('most_common_words.csv', 'rb') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';', quotechar='"')  # for EU-style CSV
You can save the file in the same folder as your program. If you don't, you can provide the correct path to open() as well. Be sure to use raw strings if you're on Windows, otherwise the backslashes may trick you: open(r"C:\Python27\data\table.csv")
It seems you have a file with one column as you say here:
It is a simple list of words. When I open it up, it opens into Excel
with one column and 500 rows of 500 different words.
If so, you don't need the csv module at all:
with open('most_common_words.csv') as f:
    rows = list(f)
Note in this case, each item of the list will have the newline appended to it, so if your file is:
apple
dog
cat
rows will be ['apple\n', 'dog\n', 'cat\n']
If you want to strip the end of line, then you can do this:
with open('most_common_words.csv') as f:
    rows = [i.rstrip() for i in f]