Why am I getting "_csv.Error: newline inside string"?

There is one answer to this question:
Getting "newline inside string" while reading the csv file in Python?
But the accepted answer there didn't work for me.

If the answer in the above link doesn't work and you have opened multiple files during the execution of your code, go back and make sure you have closed all your previous files when you were done with them.
I had a script that opened and processed multiple files. Then at the very end, it kept throwing a _csv.Error in the same manner that Amit Pal saw.
My script is about 500 lines long and has three stages that each process multiple files in succession. Here's the section of code that gave the error. As you can see, the code is plain vanilla:
import csv

f = open('file.csv')
fread = csv.reader(f)
for row in fread:
    ...  # do something with the row
And the error was:
for row in fread:
_csv.Error: newline inside string
So I told the script to print what the row contained... OK, that's not clear; here's what I did:
print row
f = open('file.csv')
fread = csv.reader(f)
for row in fread:
    ...  # do something
Interestingly, what printed was the LAST LINE from one of the previous files I had opened and processed.
What made this really weird was that I used different variable names, but apparently the data was stuck in a buffer or memory somewhere.
So I went back and made sure I closed all previously opened files and that solved my problem.
Hope this helps someone.
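For what it's worth, here is a minimal sketch of the fix (the stage file names are hypothetical): wrapping each file in a with block guarantees it is flushed and closed when the block ends, even if an exception is raised, so no stale handle or buffered data carries over to the next stage.
import csv

for name in ('stage1.csv', 'stage2.csv', 'stage3.csv'):  # hypothetical file names
    with open(name, newline='') as f:  # closed automatically at the end of the block
        fread = csv.reader(f)
        for row in fread:
            ...  # do something with the row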

Related

Attempting to save webscraping to csv file with no luck

The result is printed correctly, but the csv file stops at the first iteration and repeats itself.
Here is the code:
with open('stocknews.csv', 'w') as new_file:
    csv_writer = csv.writer(new_file, delimiter=' ')
    csv_reader = csv.reader('stocknews.csv')
    i = 0
    lenght = len(soup.find_all('div', {'class': 'eachStory'}))
    for i in range(lenght):
        print(i+1, ")")
        headlines += [soup.find_all('div', {'class': 'eachStory'})[i].find_all('a')[-1].text]
        descriptions += [soup.find_all('div', {'class': 'eachStory'})[i].find_all('p')[0].text]
        print(headlines[i])
        print(descriptions[i])
        i += 1
        print(i)
    for i in csv_reader:
        csv_writer.writerow(['headlines', 'descriptions'])
        csv_writer.writerow([headlines, descriptions])
I'm pretty sure the problem lies within the last few lines, i.e. csv_writer.writerow. I've tried many things but never managed to save to CSV correctly.
This isn't really an answer (there are a lot of things I don't quite understand about your code, and I can't test it without any data to work with).
However,
for i in csv_reader:
    csv_writer.writerow(['headlines', 'descriptions'])
    csv_writer.writerow([headlines, descriptions])
This is a loop over csv_reader, but what are you actually looping over? csv.reader was given the filename string 'stocknews.csv' rather than an open file object. Since csv.reader accepts any iterable of lines, it iterates over the characters of that string, one bogus "row" per character, which is why your header and data rows get written over and over. Normally you open the file and then loop over the reader, which yields one row per line of the file.
This would be more typical:
with open('somefile.csv', newline='') as f:  # hypothetical input file
    csv_reader = csv.reader(f)
    for row in csv_reader:
        pass  # do something with each row
I would suggest you break this down and step through each line carefully: is it doing what you think it is doing? Is each intermediate step correct? There isn't a lot of code here, but 1) is the reading of the file working? 2) are the headlines and descriptions being read as you expect? 3) once you have the headlines and descriptions, are they being written out correctly?
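For what it's worth, here is a rough sketch of how the writing part could look, assuming soup is an already-parsed BeautifulSoup page and each 'eachStory' div contains the links and paragraphs described above: write the header once, then one row per story, and don't try to read the file you are in the middle of writing.
import csv

# soup = BeautifulSoup(html, 'html.parser')  # assumed to exist already
stories = soup.find_all('div', {'class': 'eachStory'})
with open('stocknews.csv', 'w', newline='') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(['headline', 'description'])  # header row, written once
    for story in stories:
        headline = story.find_all('a')[-1].text
        description = story.find_all('p')[0].text
        csv_writer.writerow([headline, description])  # one row per story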

Python import csv file and replace blank values

I have just started a data quality class in which I got zero instruction on Python but am expected to create a script. There are three instructions for my Python script:
Create a script that loads an entire CSV file and replace all the blank values to NAN
Use genfromtxt function
Write the results set into a different file
I have been working on this for a few hours, but with no previous experience with Python, I am completely stuck! This is what I have so far:
import csv

file = open('quality.csv', 'r')  # the filename needs to be a string
csvreader = csv.reader(file)
header = next(csvreader)
print(header)
rows = []
for row in csvreader:
    rows.append(row)
print(rows)
My first problem is that when I tried using genfromtxt, it would not print the headers or the entire csv file; it would only print a few lines. If it matters, all of the values in the csv file are ints/floats, but the headers are strings.
[screenshot of the truncated genfromtxt output]
The next problem is that I have tried several different ways to replace the blank values, but I was not successful. All of the blank fields in this file are in the last column. When I print out the csv in full, this is what the line looks like (I've highlighted the empty value):
[screenshot of the row with the empty value]
Finally, I have no idea what instruction #3 means. I am completely new at this, with zero Python knowledge. I am unsure of the Python syntax and rules, which I will look into and learn, but I only had two days to complete this assignment and don't know anything yet. Thank you in advance.
What you did with genfromtxt seems correct already. With big data like this, the terminal only shows some rows from the beginning and the end, and the three dots in the middle indicate the records you're not seeing.
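For reference, here is a minimal sketch of all three instructions, assuming the input is quality.csv as in your code, every data value is numeric, and the output name quality_nan.csv is just a placeholder: genfromtxt already turns blank fields into nan for float data, and savetxt covers instruction #3.
import numpy as np

# Instructions 1 and 2: load the whole file; blank fields become nan.
data = np.genfromtxt('quality.csv', delimiter=',', skip_header=1, filling_values=np.nan)

# Instruction 3: write the result to a different file, reusing the original header line.
with open('quality.csv') as f:
    header = f.readline().strip()
np.savetxt('quality_nan.csv', data, delimiter=',', header=header, comments='', fmt='%g')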

Problems with File reader

This piece of code is meant to create a new file, file1.txt, write the block below into it, and close it.
Summer
Break
Now
However, I am having problems getting the text into the file. If I add commas to split the strings, it gives an error saying write() can only take one argument. But right now it is not printing anything.
f = open("file1.txt","w")
f.write("Summer\nBreak\nNow")
f.close()
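The three lines above are correct as written: after running them, file1.txt will contain the three words on separate lines. The "can only take one argument" error comes from passing several comma-separated strings to write(), which accepts exactly one string; and note that write() puts text into the file, not on the screen, so nothing is printed to the console. A small sketch of the two usual ways to write several lines:
lines = ["Summer", "Break", "Now"]
with open("file1.txt", "w") as f:
    f.write("\n".join(lines))  # join the pieces into a single string first
    # or, line by line:
    # f.writelines(line + "\n" for line in lines)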

Write Python List to file - works on friends computer but not mine

I've written a program in which I need 3 arrays (lists) to write to a txt file. When I run the code on my computer, the txt file is empty. I sent it to a friend, who ran it and the program populated the txt file on his computer.
I have next to no experience coding and need the txt file for a homework assignment.
Note: It did run earlier today on my computer, although one of the arrays did not write to the file. However, when I ran it again later (after adding additional print statements earlier in the code for error checking), it again wasn't writing to the txt file. What could cause this type of behavior? My code for writing to the txt file is as follows:
import csv
.....
MyFile = open('BattSim.txt', 'w')
wr = csv.writer(MyFile)
wr.writerow(RedArmy)
wr.writerow(BlueArmy)
wr.writerow(BattleTime)
MyFile.close
Did you run this in an interactive interpreter (or in a non-CPython interpreter or otherwise crash in some weird way)? If so, the problem is that you didn't actually flush/close the file; you referenced the close method without calling it. You wanted MyFile.close() (with parens to call).
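To make the distinction concrete: a bare MyFile.close just evaluates the bound method object and does nothing, which is easy to see from the file's closed attribute:
f = open('BattSim.txt', 'w')
f.write('test')
f.close            # evaluates the method object; the file stays open
print(f.closed)    # False: the data may still be sitting in a buffer
f.close()          # actually flushes and closes the file
print(f.closed)    # True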
Alternatively, use a with statement to get guaranteed close behavior (even if an exception is thrown midway):
import csv
.....
# Use a with statement to autoclose, and newline='' to follow csv module rules
with open('BattSim.txt', 'w', newline='') as MyFile:
    wr = csv.writer(MyFile)
    wr.writerow(RedArmy)
    wr.writerow(BlueArmy)
    wr.writerow(BattleTime)

python csv reader not reading all rows

So I've got about 5008 rows in a CSV file, a total of 5009 with the header. I'm creating and writing this file all within the same script. But when I read it at the end, with either pandas' pd.read_csv or Python 3's csv module, and print the len, it outputs 4967. I checked the file for any weird characters that may be confusing Python, but I don't see any. All the data is delimited by commas.
I also opened it in Sublime and it shows 5009 rows, not 4967.
I could try other methods from pandas like merge or concat, but if Python won't read the csv correctly, that's no use.
This is one method I tried:
df1=pd.read_csv('out.csv',quoting=csv.QUOTE_NONE, error_bad_lines=False)
df2=pd.read_excel(xlsfile)
print (len(df1))#4967
print (len(df2))#5008
df2['Location']=df1['Location']
df2['Sublocation']=df1['Sublocation']
df2['Zone']=df1['Zone']
df2['Subnet Type']=df1['Subnet Type']
df2['Description']=df1['Description']
newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df2.to_csv(newfile, index=False)
print('Done.')
target.close()
Another way I tried is
dfcsv = pd.read_csv('out.csv')
wb = xlrd.open_workbook(xlsfile)
ws = wb.sheet_by_index(0)
xlsdata = []
for rx in range(ws.nrows):
    xlsdata.append(ws.row_values(rx))
print (len(dfcsv))#4967
print (len(xlsdata))#5009
df1 = pd.DataFrame(data=dfcsv)
df2 = pd.DataFrame(data=xlsdata)
df3 = pd.concat([df2,df1], axis=1)
newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df3.to_csv(newfile, index=False)
print('Done.')
target.close()
But no matter what way I try, the CSV file is the actual issue: Python is writing it correctly but not reading it correctly.
Edit: The weirdest part is that I'm getting absolutely no encoding errors, or any errors at all, when running the code...
Edit 2: Tried testing it with the nrows param in the first code example; it works up to 4000 rows, but as soon as I specify 5000 rows, it reads only 4967.
Edit 3: I manually saved a csv file with my data instead of using the one written by the program, and it read 5008 rows. Why is Python not writing the csv file correctly?
I ran into this issue also. I realized that some of my lines had open-ended quotes, which was for some reason interfering with the reader.
So for example, some rows were written as:
GO:0000026 molecular_function "alpha-1
GO:0000027 biological_process ribosomal large subunit assembly
GO:0000033 molecular_function "alpha-1
and this led to rows being read incorrectly. (The unmatched quote makes the reader treat everything up to the next quote character, including newlines, as part of a single field, so several physical lines collapse into one logical row.)
I just removed the quotes and it worked out.
Edit: this works too, if you want to keep the quote characters as literal data: disable quote processing with
quoting=csv.QUOTE_NONE
My best guess without seeing the file is that you have some lines with too many or not enough commas, maybe due to values like foo,bar.
Please try setting error_bad_lines=True to see if it catches the lines with errors in them; my guess is that there will be 41 such lines. From the Pandas documentation (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html):
error_bad_lines : boolean, default True
Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned. (Only valid with C parser)
When writing, the csv.QUOTE_NONE option never quotes fields and instead escapes the delimiter with escapechar + delimiter (but you didn't paste your writing code); when reading, it tells the parser to perform no special processing of quote characters. https://docs.python.org/3/library/csv.html#csv.Dialect
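To see why the row count shrinks, here is a small self-contained demonstration (the sample strings are made up to mirror the GO lines above): one unmatched quote makes the reader pull the following newline into the same field, so two physical lines come back as one logical row, while quoting=csv.QUOTE_NONE leaves them as two.
import csv
import io

data = 'GO:0000026\tmolecular_function\t"alpha-1\nGO:0000027\tbiological_process\tassembly\n'

# Default quoting: the unmatched " swallows the newline -> 1 logical row.
print(len(list(csv.reader(io.StringIO(data), delimiter='\t'))))  # 1

# QUOTE_NONE: quote characters are treated as ordinary data -> 2 rows.
print(len(list(csv.reader(io.StringIO(data), delimiter='\t', quoting=csv.QUOTE_NONE))))  # 2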
