I find this really weird: for some reason '\n' is added to the last entry in my list when I split a line from a .csv file.
Script
f = open("temp.csv")
lines = f.readlines()
headings = lines[0]
global heading_list
heading_list = headings.split(";")
print headings
I've printed out headings on its own and it doesn't appear to have '\n' at the end; the '\n' only shows up after splitting at the semicolon.
.csv file
timestamp;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%idle
10-20-39;6.53;0.00;4.02;0.00;0.00;0.00;0.00;0.00;89.45
10-20-41;0.50;0.00;1.51;0.00;0.00;0.00;0.00;0.00;97.99
10-20-43;1.98;0.00;1.98;5.45;0.00;0.50;0.00;0.00;90.10
10-20-45;0.50;0.00;1.51;0.00;0.00;0.00;0.00;0.00;97.99
10-20-47;0.50;0.00;1.50;0.00;0.00;0.00;0.00;0.00;98.00
10-20-49;0.50;0.00;1.01;3.02;0.00;0.00;0.00;0.00;95.48
Output from script: heading_list comes out as ['timestamp', '%usr', '%nice', '%sys', '%iowait', '%steal', '%irq', '%soft', '%guest', '%idle\n'], with the '\n' on the last entry.
When you read a line in Python, the end-of-line character is not removed. You have to do this manually, for example with line.rstrip("\r\n"). It's not a problem with split, but with readlines.
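For example, with the header line from the question:
>>> line = 'timestamp;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%idle\n'
>>> line.split(";")[-1]
'%idle\n'
>>> line.rstrip("\r\n").split(";")[-1]
'%idle'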
Short answer - use the csv module. See below.
The newline character is present in the data that was read from the file. readlines() does not remove it, and in fact you will find that the newline character is present in headings:
>>> headings = lines[0]
>>> headings
'timestamp;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%idle\n'
A better way is to use splitlines() on the data read from the file. This will remove new lines, regardless of the type ('\n', '\r\n', '\r'):
>>> with open("temp.csv") as f:
...     lines = f.read().splitlines()
...
>>> headings = lines[0]
>>> headings
'timestamp;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%idle'
readlines() fails for Mac newlines ('\r'), so you should open the file with universal newline support by specifying 'rU' as the mode:
with open('temp.csv', 'rU') as f:
    ...
One other thing worth mentioning is that processing files this way can consume a lot of memory if the file is large because the whole file is read in one go. Instead it is more efficient to iterate over the file like this:
with open('temp.csv', 'rU') as f:
    heading_list = next(f).rstrip().split(';')  # headings on the first line
    for line in f:
        process_data_row(line.rstrip().split(';'))
Finally, the real answer. You can avoid all of the mess above by using the csv module:
import csv

with open('temp.csv', 'rU') as csv_file:  # NB. 'rU' is important for handling Mac newlines
    csv_data = csv.reader(csv_file, delimiter=';')
    heading_list = next(csv_data)
    for row in csv_data:
        process_data_row(row)
Related
I have a text file consisting of multiline strings (hundreds of lines, actually). Each of the strings starts with an '&' sign. I want to change my text file so that only the first 300 characters of each string remain in the new file. How can I do this using Python?
You can read a file and loop over the lines to do what you want. Strings are easily sliceable in Python to get the first 300 characters to write to another file.
in_file = open(path, "r")        # avoid shadowing the built-in 'file'
lines = in_file.readlines()
newFile = open(newPath, "w")
for line in lines:
    newLine = line[0:300]        # slicing keeps at most the first 300 characters
    newFile.write(newLine)
newFile.close()
in_file.close()
Hope this is what you meant
You could do something like this:
# Open the output file in append mode
with open('output.txt', 'a') as out_file:
    # Open the input file in read mode
    with open("input.txt", "r") as in_file:
        for line in in_file:
            # Take the first 300 characters from the line;
            # slicing is safe even when the line is shorter than 300 characters
            new_line = line[0:300]
            # Write the new line to the output
            # (you might need to add '\n' for new lines)
            out_file.write(new_line)
            print(new_line)
You can use the string method split to split the file contents at each '&', then use slicing to keep only the first 300 characters of each piece.
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
for line in old_file.read().split("&"):
new_file.write("&{}\n".format(line[:300]))
This version preserves the end-of-line characters (\n) within your strings.
If you want to remove the line endings inside each individual string, you can use replace:
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
for line in old_file.read().split("&"):
new_file.write("&{}\n".format(line.replace("\n", "")[:300]))
Note that your new file will end with an empty line.
One final note: depending on the size of your file, you may prefer a generator-based version, sketched below, since split results in the whole file content being loaded into memory as a list of strings.
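A minimal sketch of such a generator, assuming every record starts with '&' at the beginning of a line (iter_records is an illustrative helper, not a standard function):
def iter_records(file_obj, marker="&"):
    # illustrative helper: accumulates lines until the next '&' marker and
    # yields one complete record at a time, so only one record is in memory
    record = []
    for line in file_obj:
        if line.startswith(marker) and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)

with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for record in iter_records(old_file):
        # the record already carries its leading '&'
        new_file.write(record.replace("\n", "")[:300] + "\n")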
I need to read a CSV file, translate it to uppercase, and store it in another CSV file with Python.
I have this code:
import csv

with open('data_csv.csv', 'rb') as f:
    header = next(f).strip().split(',')
    reader = csv.DictReader((l.upper() for l in f), fieldnames=header)
    for line in reader:
        print line

with open('test.csv', 'r') as f:
    for line in f:
        print line
but I cannot get the right result.
You don't need to read the header separately: if you don't provide a fieldnames argument to DictReader(), the header row is read automatically. Next, don't just print your lines; by that point you have read the whole file and thrown all the rows away without writing anything.
Open both the input and output files in the same with statement; you can then write lines directly to the output. There is no need for the csv module here at all, because you don't need to parse the lines into rows only to join the rows back into lines again.
Just loop over the file, uppercase the lines, and write out the result:
with open('data_csv.csv', 'r') as input, open('test.csv', 'w') as output:
    output.writelines(line.upper() for line in input)
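For completeness, here is what DictReader's automatic header handling looks like when you do need parsed rows (a minimal sketch in the same Python 2 style as the question):
import csv

with open('data_csv.csv', 'rb') as f:
    reader = csv.DictReader(f)  # no fieldnames argument: the first row becomes the keys
    for row in reader:
        print row               # each row is a dict keyed by the header fields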
I have an interesting situation with Python's csv module. I have a function that takes specific lines from a text file and writes them to a CSV file:
import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rb") as text:
        for line in text:
            line = line.strip(" ")
            with open(csvfile, "ab") as f:
                if line.startswith("# Online_Resource"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator='\t',
                                       quotechar=' ')
                    write.writerow([line.lstrip("# ")])
                if line.startswith("##"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator='\t',
                                       quotechar=' ')
                    write.writerow([line.lstrip("# ")])
Here is a sample of some strings from the original text file:
# Online_Resource: https://www.ncdc.noaa.gov/
## Corg% percent organic carbon,,,%,,paleoceanography,,,N
What is really bizarre is that the final csv file looks good, except that the characters in the first column only (those with the # originally) partially "overwrite" each other when I try to manually delete some characters from the cell (screenshot omitted).
Oddly enough, too, there seems to be no pattern to how the characters get jumbled each time I try to delete some after running the script. I tried encoding the csv file as Unicode, to no avail.
Thanks.
You've selected the excel dialect, but you overrode it with weird parameters:
You're using TAB as both the separator and the line terminator, which creates a one-line CSV file. Close enough to "truncated" to me.
Also, quotechar shouldn't be a space.
This had a nice side-effect, as you noted: the csv module actually splits the lines according to commas!
The code is also inefficient and error-prone: you're opening the file in append mode inside the loop and creating a new csv writer each time; better to do both outside the loop.
Also, the comma split must now be done by hand. So, even better: use the csv module to read the file as well. My proposed fix for your routine:
import os
import csv

def csv_save_use(textfile, csvfile):
    with open(textfile, "rU") as text, open(csvfile, "wb") as f:
        write = csv.writer(f, dialect='excel', delimiter='\t')
        reader = csv.reader(text, delimiter=",")
        for row in reader:
            if not row:
                continue  # skip possible empty rows
            if row[0].startswith("# Online_Resource"):
                write.writerow([row[0].lstrip("# ")])
            elif row[0].startswith("##"):
                # write the row, stripping the hashes from the first item
                write.writerow([row[0].lstrip("# ")] + row[1:])
Note that the file isn't displayed properly in Excel unless you remove delimiter='\t' (reverting to the default comma).
Also note that you need to replace open(csvfile, "wb") as f with open(csvfile, "w", newline='') as f for Python 3.
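In full, the opening line in Python 3 would be (a sketch; 'rU' is unnecessary there since universal newlines are the default in text mode, and the csv docs recommend newline='' for files handed to reader and writer):
with open(textfile, "r", newline='') as text, open(csvfile, "w", newline='') as f:
    ...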
Here's how the output looks now (screenshot omitted; the empty cells appear because the input has several commas in a row).
More problems:
line=line.strip(" ") removes leading and trailing spaces only; it doesn't remove \r or \n. Try line=line.strip(), which removes leading and trailing whitespace.
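A quick REPL check with one of the sample lines shows the difference:
>>> line = '# Online_Resource: https://www.ncdc.noaa.gov/\r\n'
>>> line.strip(" ")
'# Online_Resource: https://www.ncdc.noaa.gov/\r\n'
>>> line.strip()
'# Online_Resource: https://www.ncdc.noaa.gov/'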
You get your whole line, commas included, in one cell because you haven't split it up, e.g. by using a csv.reader instance. See:
https://docs.python.org/2/library/csv.html#csv.reader
str.lstrip's non-default argument is treated as a set of characters to be removed, so lstrip('## ') has the same effect as lstrip('# '). If guff.startswith('## '), do guff = guff[3:] to get rid of the unwanted text.
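A quick REPL demonstration of the set behaviour, using the question's sample line:
>>> '## Corg% percent organic carbon'.lstrip('# ')
'Corg% percent organic carbon'
>>> '# #stray hash'.lstrip('# ')   # any leading run of '#' and ' ' is removed
'stray hash'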
It is not very clear what the sentence containing "bizarre" means; we need to see exactly what is in the output csv file. Create a small test file with 3 records: (1) one with '# Online_Resource', (2) one with '## ', and (3) one with neither. Run your code and show the output, like this:
print repr(open('testout.csv', 'rb').read())
I have three short JSON text files. I want to combine them with Python, and while it works and creates an output file with everything in the right place, the last line ends with a comma, which I would like to replace with }. I came up with this code:
def join_json_file(file_name_list, output_file_name):
    with open(output_file_name, "w") as file_out:
        file_out.write('{')
        for filename in file_name_list:
            with open(filename) as infile:
                file_out.write(infile.read()[1:-1] + ",")
    with open(output_file_name, "r") as file_out:
        lines = file_out.readlines()
        print lines[-1]
        lines[-1] = lines[-1].replace(",", "")
but it doesn't replace the last line. Could somebody help me? I am new to Python and I can't find the solution by myself.
You are writing all of the files and then loading the result back in to change the last line. That change, though, exists only in memory, not in the file itself. A better approach is to avoid writing the extra ',' in the first place. For example:
def join_json_file(file_name_list, output_file_name):
    with open(output_file_name, "w") as file_out:
        file_out.write('{')
        for filename in file_name_list[:-1]:
            with open(filename) as infile:
                file_out.write(infile.read()[1:-1] + ",")
        with open(file_name_list[-1]) as infile:
            file_out.write(infile.read()[1:-1])
        file_out.write('}')  # close the combined object, as you wanted
This first writes all but the last file with the extra comma, then writes the last file separately before closing the object with '}'. You might also want to check for the edge case of an empty file_name_list.
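For what it's worth, here is a sketch of an equivalent version using str.join, which sidesteps the trailing comma entirely (it assumes each input file holds a single JSON object wrapped in braces, as the [1:-1] slicing in your code implies):
def join_json_file(file_name_list, output_file_name):
    bodies = []
    for filename in file_name_list:
        with open(filename) as infile:
            bodies.append(infile.read().strip()[1:-1])  # drop the outer braces
    with open(output_file_name, "w") as file_out:
        file_out.write("{" + ",".join(bodies) + "}")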
I need a comma-separated txt file with a txt extension.
"a,b,c"
I used csv.writer to create a csv file and changed the extension. Another program would not use/process the data. I tried "wb" and "w".
F = open(Fn, 'w')
w = csv.writer(F)
w.writerow(sym)
F.close()
Opened with Notepad --- these are the complete files.
Their file (created using their GUI, with three symbols):
PDCO,ICUI,DVA
My file (created using Python):
PDCO,ICUI,DVA
Tested: opened their file - worked; opened my file - failed.
After a simple open, save, and close in Notepad, opening my file worked.
Works= 'PDCO,ICUI,DVA'
Fails= 'PDCO,ICUI,DVA\r\r\n'
Edit: writing the txt file without csv.writer:
sym = ['MHS', 'MRK', 'AIG']
with open(r'C:\filename.txt', 'wb') as F:  # also try 'w'
    for s in sym[:-1]:    # separate all but the last
        F.write(s + ',')  # symbols with commas
    F.write(sym[-1])      # end with the last symbol
To me, it looks like you don't know your third-party application's exact input format. If a .csv isn't recognized, it might be something else.
Did you try changing the delimiter from ';' to ','?
import csv
spamWriter = csv.writer(open('eggs.csv', 'wb'), delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
Take a look at the Python csv API documentation.
I think the problem is your file write mode; see "CSV file written with Python has blank lines between each row".
If you create your csv file like
csv.writer(open('myfile.csv', 'w'))
csv.writer ends its lines in '\r\n', and Python's text file handling (on Windows machines) then converts '\n' to '\r\n', resulting in lines ending in '\r\r\n'. Many programs will choke on this; Notepad recognizes it as a problem and strips the extra '\r' out.
If you use
csv.writer(open('myfile.csv', 'wb'))
it produces the expected '\r\n' line ending, which should work as desired.
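A sketch that makes the difference visible byte-for-byte (assuming Python 2 on Windows, as in the question; the file names are made up):
import csv

with open('text_mode.txt', 'w') as f:    # text mode: '\n' gets translated to '\r\n'
    csv.writer(f).writerow(['PDCO', 'ICUI', 'DVA'])
with open('bin_mode.txt', 'wb') as f:    # binary mode: bytes written through untouched
    csv.writer(f).writerow(['PDCO', 'ICUI', 'DVA'])

print repr(open('text_mode.txt', 'rb').read())  # 'PDCO,ICUI,DVA\r\r\n' -> fails
print repr(open('bin_mode.txt', 'rb').read())   # 'PDCO,ICUI,DVA\r\n'  -> works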
Edit: @senderle has a good point; try the following:
goodf = open('file_that_works.txt', 'rb')
print repr(goodf.read(100))
badf = open('file_that_fails.txt', 'rb')
print repr(badf.read(100))
Paste the results of that here, so we can see how the two compare byte-for-byte.
Try this:
import csv

with open('file_that_works.csv', 'rb') as testfile:
    # the file is closed automatically at the end of the with block
    d = csv.Sniffer().sniff(testfile.read(1024))

with open(Fn, 'wb') as F:  # also try 'w'
    w = csv.writer(F, dialect=d)
    w.writerow(sym)
To explain further: this looks at a sample of a working .csv file and deduces its format. Then it uses that format to write a new .csv file that, hopefully, will not have to be resaved in notepad.
Edit: if the program you're using doesn't accept multi-line input (?!) then don't use csv. Just do something like this:
syms = ['JAGHS', 'GJKDGJ', 'GJDFAJ']
with open('filename.txt', 'wb') as F:
    for s in syms[:-1]:    # separate all but the last
        F.write(s + ',')   # symbols with commas
    F.write(syms[-1])      # end with the last symbol
Or more tersely:
with open('filename.txt', 'wb') as F:
    F.write(','.join(syms))
Also, check different file extensions (e.g. .txt, .csv, etc.) to make sure that's not the problem. If this program chokes on a newline, then anything is possible.
So, I saved as a text file. Now, I create my own txt file with Python.
What are the exact differences between their file and your file? Exact.
I suspect that @Hugh's comment is correct that it's an encoding issue.
When you do a Save As in Notepad, what's selected in the Encoding dropdown? If you select different encodings, do some or all of those fail to be opened by the 3rd-party program?