CSV file with comma delimiter and quotes, but not on every line - python

I'm having an issue reading a csv file in which every field is delimited by commas and wrapped in quotes, except that the first field on each line has no quotes. Example:
Symbol,"Name","LastSale","MarketCap","IPOyear","Sector","industry","Summary Quote",
the code used to try and read this is as follows:
from ystockquote import *
import csv
with open('companylist.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in readCSV:
        print(row[0])
What I get is the following:
Symbol,"Name","LastSale","MarketCap","IPOyear","Sector","industry","Summary Quote",;
However, I just want to get all of the symbols from this list. Does anyone have an idea how to do this?
edit
More data:
Symbol,"Name","LastSale","MarketCap","IPOyear","Sector","industry","Summary Quote",;
PIH,"1347 Property Insurance Holdings, Inc.","7.505","$45.23M","2014","Finance","Property-Casualty Insurers","http://www.nasdaq.com/symbol/pih",;
FLWS,"1-800 FLOWERS.COM, Inc.","9.59","$623.46M","1999","Consumer Services","Other Specialty Stores","http://www.nasdaq.com/symbol/flws",;
So my expected output would be:
Symbol
PIH
FLWS
This would happen if csv.reader read each row of my file as a separate list, with each comma-delimited item in the row as a separate value (e.g. Symbol would be the value at [0], "Name" would be the value at [1], etc.).
I hope this clears up what I'm looking for

Found the easy way out: I replaced every " in my csv file with nothing, which let csv.reader read the csv file normally again.

If print(row[0]) is giving you a list, it might be because each row of your csv file is being read in as a list.
try print(row[0][0]) maybe?
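For reference, here is a minimal sketch of the expected behaviour, assuming the file contains exactly the lines shown in the question. The sample is held in memory with io.StringIO purely for illustration (an assumption; no real file is read). With this input csv.reader splits each line into fields, so row[0] is the bare symbol. If a real file still yields the whole line as row[0], it probably differs from the sample in some way, for example by having each entire line wrapped in quotes.

```python
import csv
import io

# Sample lines copied from the question; io.StringIO stands in for the real file.
sample = (
    'Symbol,"Name","LastSale","MarketCap","IPOyear","Sector","industry","Summary Quote",;\n'
    'PIH,"1347 Property Insurance Holdings, Inc.","7.505","$45.23M","2014","Finance",'
    '"Property-Casualty Insurers","http://www.nasdaq.com/symbol/pih",;\n'
    'FLWS,"1-800 FLOWERS.COM, Inc.","9.59","$623.46M","1999","Consumer Services",'
    '"Other Specialty Stores","http://www.nasdaq.com/symbol/flws",;\n'
)

reader = csv.reader(io.StringIO(sample), delimiter=",", quotechar='"')
symbols = [row[0] for row in reader if row]  # first field of each non-empty row

for symbol in symbols:
    print(symbol)
```

The quoted commas inside the company names do not split the field, because the reader treats everything between the quote characters as one value.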

Related

I need to edit a python script to remove quotes from a csv, then write back to that same csv file, quotes removed

I have seen similar posts to this but they all seem to be print statements (viewing the cleaned data) rather than overwriting the original csv with the cleaned data so I am stuck. When I tried to write back to the csv myself, it just deleted everything in the file. Here is the format of the csv:
30;"unemployed";"married";"primary";"no";1787;"no";"no";"cellular";19;"oct";79;1;-1;0;"unknown";"no"
33;"services";"married";"secondary";"no";4747;"yes";"cellular";11;"may";110;1;339;2;"failure";"no"
35;"management";"single";"tertiary";"no";1470;"yes";"no";"cellular";12;"apr";185;1;330;1;"failure";"no"
It is delimited by semicolons, which is fine, but all text is wrapped in quotes and I only want to remove the quotes and write back to the file. Here is the code I reverted back to that successfully reads the file, removes all quotes, and then prints the results:
import csv
f = open("bank.csv", 'r')
try:
    for row in csv.reader(f, delimiter=';', skipinitialspace=True):
        print(' '.join(row))
finally:
    f.close()
Any help on properly writing back to the csv would be appreciated, thanks!
See here: Python CSV: Remove quotes from value
I've done this in basically two different ways, depending on the size of the csv:
1. Read the entire csv into a Python object (a list), do your cleaning, and then overwrite the existing file with the cleaned version.
2. As in the link above, use one reader and one writer: create a new file, write to it line by line as you clean the input from the csv reader, then delete the original csv and rename the new file to replace it.
In my opinion option #2 is vastly preferable, as it avoids the possibility of data loss if your script hits an error partway through writing. It will also use less memory.
Finally: it may be possible to open a file as read/write and iterate line by line, overwriting as you go. But that leaves you open to half of your file having quotes and half not if your script crashes partway through.
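Option #2 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name strip_quotes_in_place is hypothetical, the semicolon delimiter is taken from the question, and os.replace is used in place of a separate delete-then-rename step.

```python
import csv
import os
import tempfile


def strip_quotes_in_place(path, delimiter=";"):
    """Rewrite the CSV at `path` without unneeded quotes, going through a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            reader = csv.reader(src, delimiter=delimiter)
            writer = csv.writer(dst, delimiter=delimiter)  # QUOTE_MINIMAL by default
            for row in reader:
                writer.writerow(row)  # one row in memory at a time
        # Swap in the cleaned file only after it has been written completely.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # leave the original untouched on failure
        raise
```

Calling strip_quotes_in_place("bank.csv") would then rewrite the file from the question. If the script fails partway through, the original file is still intact, which is the main reason to prefer this approach.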
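You could do something like this: read the file in, then write it back out. The writer's default quoting (csv.QUOTE_MINIMAL) only quotes fields that contain the delimiter or the quote character, so these values are written without quotes. (Passing quoting=csv.QUOTE_NONE also works, but it raises an error if a field contains the delimiter and no escapechar is set.)
import csv
# Read every row into memory first, so the file can be reopened for writing.
f = open("bank.csv", 'r', newline='')
inputCSV = []
try:
    for row in csv.reader(f, delimiter=';', skipinitialspace=True):
        inputCSV.append(row)
finally:
    f.close()
# Write the rows back; the default QUOTE_MINIMAL leaves these fields unquoted.
with open('bank.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=';')
    for row in inputCSV:
        csvwriter.writerow(row)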

Python place quote around csv cells containing comma

Python: v 3.6
Update:
I'm trying code where EVERYTHING is quoted, i.e. quoting=csv.QUOTE_ALL. For some reason even that is not working: the file is written, but WITHOUT quotes.
If this can be resolved, it may help with the remaining question.
Code
import csv
in_path = "eateries.csv"
with open(in_path,"r") as infile, open("out.csv","w", newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter=",", quoting=csv.QUOTE_ALL)
    writer.writerows(reader)
Original Question:
I am trying to write a Python script that reads a csv file and outputs a csv file. In the output, cells containing a comma (",") should have quotes.
Input:
Expected Output:
Actual Output:
Below is code, please assist
import csv
in_path = "eateries.csv"
with open(in_path,"r") as infile, open("out.csv","w", newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter=",", quotechar=",", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(reader)
quotechar doesn't mean "quote this character". It means "this is the character you use to quote things".
You do not want to use commas to quote things. Remove quotechar=",".
With quotechar corrected, your CSV will quote field values that have commas in them, but importing the CSV into Excel or some other spreadsheet application may not produce cell values with quotation marks. (Also, eateries.csv probably had quoting already.) It is quite likely that you don't actually need quotes in Excel or whatever your spreadsheet app is; the fact that the value is in a single cell instead of spread across multiple is the spreadsheet version of quoting.
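To see what the writer actually emits, independent of how a spreadsheet displays it, the raw text can be inspected directly. The sketch below uses made-up rows (the eatery names are hypothetical, since the question's input is not shown) and writes to an in-memory buffer instead of out.csv.

```python
import csv
import io

# Hypothetical rows; the second field of the first row contains a comma.
rows = [["Joes Diner", "Burgers, Fries"], ["Cafe Uno", "Coffee"]]


def dump(rows, quoting):
    """Return the CSV text produced for `rows` with the given quoting mode."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=",", quoting=quoting).writerows(rows)
    return buffer.getvalue()


minimal = dump(rows, csv.QUOTE_MINIMAL)  # quotes only the field with a comma
everything = dump(rows, csv.QUOTE_ALL)   # quotes every field

print(repr(minimal))
print(repr(everything))
```

Opening out.csv in a plain text editor rather than a spreadsheet shows the same thing: the quotes are in the file, and the spreadsheet consumes them when it splits the line into cells.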

CSV Writer truncates characters in sequence in Excel 2013

I have an interesting situation with Python's csv module. I have a function that takes specific lines from a text file and writes them to csv file:
import os
import csv
def csv_save_use(textfile, csvfile):
    with open(textfile, "rb") as text:
        for line in text:
            line=line.strip()
            with open(csvfile, "ab") as f:
                if line.startswith("# Online_Resource"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])
                if line.startswith("##"):
                    write = csv.writer(f, dialect='excel',
                                       delimiter='\t',
                                       lineterminator="\t",
                                       )
                    write.writerow([line.lstrip("# ")])
Here is a sample of some strings from the original text file:
# Online_Resource: https://www.ncdc.noaa.gov/
## Corg% percent organic carbon,,,%,,paleoceanography,,,N
What is really bizarre is the final csv file looks good, except the characters in the first column only (those with the # originally) partially "overwrite" each other when I try to manually delete some characters from the cell:
Oddly enough, too, there seems to be no formula to how the characters get jumbled each time I try to delete some after running the script. I tried encoding the csv file as unicode to no avail.
Thanks.
You've selected excel dialect but you overrode it with weird parameters:
You're using TAB as separator and line terminator, which creates a 1-line CSV file. Close enough to "truncated" to me
Also quotechar shouldn't be a space.
This conveyed a nice side-effect as you noted: the csv module actually splits the lines according to commas!
The code is inefficient and error-prone: you open the file in append mode inside the loop and create a new csv writer each time. Both are better done once, outside the loop.
Also, comma split must be done by hand now. So even better: use csv module to read the file as well. My fix proposal for your routine:
import os
import csv
def csv_save_use(textfile, csvfile):
    with open(textfile, "rU") as text, open(csvfile, "wb") as f:
        write = csv.writer(f, dialect='excel',
                           delimiter='\t')
        reader = csv.reader(text, delimiter=",")
        for row in reader:
            if not row:
                continue  # skip possible empty rows
            if row[0].startswith("# Online_Resource"):
                write.writerow([row[0].lstrip("# ")])
            elif row[0].startswith("##"):
                write.writerow([row[0].lstrip("# ")]+row[1:])  # write row, stripping the first item from hashes
Note that the file isn't displayed properly in Excel unless you remove delimiter='\t' (which reverts to the default comma).
Also note that you need to replace open(csvfile, "wb") as f by open(csvfile, "w",newline='') as f for Python 3.
here's how the output looks now (note that the empty cells are because there are several commas in a row)
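Putting the two notes above together, a Python 3 version of the routine might look like the sketch below. It assumes the default comma delimiter (so the result opens cleanly in Excel) and opens both files with newline='' as the csv module expects in Python 3; the behaviour is otherwise meant to match the Python 2 fix.

```python
import csv


def csv_save_use(textfile, csvfile):
    """Copy the '# Online_Resource' and '##' lines of textfile into csvfile."""
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(textfile, newline="") as text, open(csvfile, "w", newline="") as f:
        writer = csv.writer(f, dialect="excel")  # default comma delimiter
        for row in csv.reader(text, delimiter=","):
            if not row:
                continue  # skip empty lines
            if row[0].startswith("# Online_Resource"):
                writer.writerow([row[0].lstrip("# ")])
            elif row[0].startswith("##"):
                # strip the leading hashes from the first field, keep the rest
                writer.writerow([row[0].lstrip("# ")] + row[1:])
```

Lines that start with neither marker are skipped, as in the original function.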
More problems:
line=line.strip(" ") removes only leading and trailing spaces; it doesn't remove \r or \n. Try line=line.strip(), which removes all leading and trailing whitespace.
You get the whole line, commas included, in one cell because you never split it up, for example with a csv.reader instance. See here:
https://docs.python.org/2/library/csv.html#csv.reader
The non-default argument to str.lstrip is treated as a set of characters to remove, so '## ' has the same effect as '# '. To strip a fixed prefix instead, test if guff.startswith('## ') and then do guff = guff[3:] to get rid of the unwanted text.
It is not at all clear what the sentence containing "bizarre" means. We need to see exactly what is in the output csv file. Create a small test file with 3 records, (1) with '# Online_Resource', (2) with "## ", (3) none of the above, then run your code and show the output, like this:
print repr(open('testout.csv', 'rb').read())
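The point about str.lstrip taking a set of characters can be checked with a short sketch. The string tagged is a made-up example chosen to expose the pitfall; it is not from the question's data.

```python
line = "## Corg% percent organic carbon"

# The argument is a set of characters, so '# ' and '## ' strip identically.
assert line.lstrip("# ") == line.lstrip("## ") == "Corg% percent organic carbon"

# Pitfall: every leading '#' or ' ' is removed, even ones that belong to the data.
tagged = "## #1 sample"
stripped = tagged.lstrip("# ")  # also eats the '#' in front of '1'

# Removing a fixed prefix with a slice leaves the rest of the text untouched.
prefix = "## "
cleaned = tagged[len(prefix):] if tagged.startswith(prefix) else tagged

print(repr(stripped))
print(repr(cleaned))
```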

What does this line of code do?

I was just wondering what this line of code does:
writerow([recordlist[i][0], recordlist[i][1], recordlist[i][2]])
I know it's a parameter of some sort, but what does it actually do in all of this code:
recordlist=[["1","chinese", "male"],["2","indian", "female"]]
import math
import csv
file_name = 'info.txt'
ofile = open(file_name, 'a')
writer = csv.writer(ofile, delimiter=',', lineterminator='\n')
for i in range(0,len(recordlist)):
    writer.writerow([recordlist[i][0], recordlist[i][1], recordlist[i][2]])
ofile.close()
Thank you!
You've created a csv writer. It has a method writerow that takes a sequence (list, tuple, etc.) of values and writes them to the underlying file in delimited format, which in this case uses a comma as the delimiter. So it will create a row in the csv file for each row in the recordlist variable, as it iterates over it in the for loop. Each row will consist of the values defined on the first line of your code, separated by commas.
The real answer should be "run it and try it" to see what it does.
Then read the documentation of the csv module in Python here
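In the spirit of "run it and try it", here is a small sketch that writes to an in-memory buffer instead of info.txt so the result can be printed straight away. The buffer and the string values in recordlist are assumptions for illustration.

```python
import csv
import io

# Records mirroring the question; io.StringIO stands in for info.txt.
recordlist = [["1", "chinese", "male"], ["2", "indian", "female"]]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
for record in recordlist:
    # One writerow call turns one list into one comma-separated line.
    writer.writerow([record[0], record[1], record[2]])

output = buffer.getvalue()
print(output)
```

Each call to writerow appends exactly one line, so the buffer ends up with two lines, one per inner list.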

Trying to import a list of words using csv (Python 2.7)

import csv, Tkinter
with open('most_common_words.csv') as csv_file:  # Opens the file in a context manager so it is closed automatically when finished
    csv_reader = csv.reader(csv_file)  # Create a csv reader instance
    for row in csv_reader:  # Read each line in the csv file into 'row' as a list
        print row[0]  # Print the first item in the list
I'm trying to import this list of most common words using csv. It continues to give me the same error
for row in csv_reader: # Read each line in the csv file into 'row' as a list
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I've tried a couple different ways to do it as well, but they didn't work either. Any suggestions?
Also, where does this file need to be saved? Is it okay just being in the same folder as the program?
You should always open a CSV file in binary mode (Python 2) or in text mode with newline='' (Python 3). Also, make sure that the delimiters and quote characters are , and ", or you'll need to specify otherwise:
with open('most_common_words.csv', 'rb') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';', quotechar='"') # for EU CSV
You can save the file in the same folder as your program. If you don't, you can provide the correct path to open() as well. Be sure to use raw strings if you're on Windows, otherwise the backslashes may trick you: open(r"C:\Python27\data\table.csv")
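For Python 3, the equivalent open call uses text mode with newline=''. The sketch below assumes the word list was saved with bare carriage-return line endings, which is one common cause of this error in Python 2; that is a guess, since the question doesn't show the file's bytes. The temporary file is created only to make the example self-contained.

```python
import csv
import os
import tempfile

# Hypothetical word list saved with bare '\r' line endings (an assumption).
path = os.path.join(tempfile.mkdtemp(), "most_common_words.csv")
with open(path, "wb") as raw:
    raw.write(b"apple\rdog\rcat\r")

# Python 3: text mode with newline='' so the csv module sees the raw line endings.
with open(path, newline="") as csv_file:
    words = [row[0] for row in csv.reader(csv_file) if row]

print(words)
```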
It seems you have a file with one column as you say here:
It is a simple list of words. When I open it up, it opens into Excel
with one column and 500 rows of 500 different words.
If so, you don't need the csv module at all:
with open('most_common_words.csv') as f:
    rows = list(f)
Note in this case, each item of the list will have the newline appended to it, so if your file is:
apple
dog
cat
rows will be ['apple\n', 'dog\n', 'cat\n']
If you want to strip the end of line, then you can do this:
with open('most_common_words.csv') as f:
    rows = list(i.rstrip() for i in f)
