Error with urlopen: new-line character seen in unquoted field - python

I am using urllib.urlopen with Python 2.7 to read csv files located on an external webserver:
# Try & Except statements removed for clarity
import urllib
import csv
url = ...
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
    do_something()
All 100+ files can be read fine, except one that has been updated recently and that returns:
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
The file is accessible here. According to my text editor, its mode is Mac (CR), as opposed to Windows (CRLF) for the other files.
Based on this thread, I found that Python's urlopen should handle all newline formats correctly. The problem is therefore likely to come from somewhere else, but I have no clue where. The file opens fine in all my text editors and spreadsheet editors.
Does anyone have any idea how to diagnose the problem?
* EDIT *
The creator of the file informed me by email that I was not the only one to experience such issues, so he decided to regenerate it. The code above now works fine again. Unfortunately, using a new file also means that the issue can no longer be reproduced, nor the proposed solutions properly tested.
Before closing the question, I want to thank all the stackers who dedicated some of their time to figure out a solution and post it here.
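For future readers who hit the same error with a CR-only file, here is a minimal diagnostic and workaround sketch (url and do_something stand in for the question's own names):

import csv
import urllib

data = urllib.urlopen(url).read()

# count each newline convention to see which one the file actually uses
crlf = data.count('\r\n')
print 'CRLF:', crlf
print 'CR only:', data.count('\r') - crlf
print 'LF only:', data.count('\n') - crlf

# splitlines() recognizes CR, LF and CRLF alike, which is what
# universal-newline mode ('rU') would do for a file on disk
for row in csv.reader(data.splitlines()):
    do_something()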

Could it be a corrupt .csv file? Otherwise, this code runs perfectly:
#!/usr/bin/python
import urllib
import csv
url = "http://www.football-data.co.uk/mmz4281/1213/I1.csv"
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
    print row
Credits to J.F. Sebastian for the .csv file.
Although, you might want to consider sharing the specific .csv file with us, so we can try to re-create the error.

The following code runs without any error:
#!/usr/bin/env python
import csv
import urllib2
r = urllib2.urlopen('http://www.football-data.co.uk/mmz4281/1213/I1.csv')
for row in csv.reader(r):
    print row

I was having the same problem with a downloaded csv.
I know the fix would be to use open with 'rU'. But I would rather not have to save the file to disk just to open it back up into a variable. That seems unnecessary.
file = open(filepath,'rU')
mydata = csv.reader(file)
So if someone has a better solution, that would be nice. Stackoverflow links that got me this far:
CSV new-line character seen in unquoted field error
Open the file in universal-newline mode using the CSV Django module
I found what I actually wanted with StringIO, cStringIO, or io:
Using Python, how do I to read/write data in memory like I would with a file?
I ended up getting io working:
import csv
import urllib2
import io
# warning: it's a 20 MB csv
url = 'http://poweredgec.com/latest_poweredge-11g.csv'
urlRead = urllib2.urlopen(url).read()
# io.StringIO with newline=None gives an in-memory file object with
# universal-newline translation, so nothing has to touch the disk
# (the decode assumes the file is UTF-8/ASCII; adjust if it differs)
ramFile = io.StringIO(urlRead.decode('utf-8'), newline=None)
csvCurrent = csv.reader(ramFile)
csvTuple = map(tuple, csvCurrent)
print csvTuple
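For later readers on Python 3, the same in-memory approach gets simpler, since urllib and io handle the decoding and newline translation directly. A minimal sketch (UTF-8 assumed, same url as above):

import csv
import io
import urllib.request

url = 'http://poweredgec.com/latest_poweredge-11g.csv'
with urllib.request.urlopen(url) as resp:
    text = resp.read().decode('utf-8')  # adjust the encoding if the file differs

# newline='' leaves line endings intact so the csv module can handle them itself
reader = csv.reader(io.StringIO(text, newline=''))
rows = [tuple(row) for row in reader]
print(rows[:5])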

Related

Find & replace data in a CSV using Python on Zapier

I'm new to python, zapier and pretty much everything, so forgive me if this is easy or impossible...
I'm trying to import multiple csv's into zapier for an automated workflow; however, they contain bullet points that aren't encoded as UTF-8, which is all zapier can read.
It consistently errors with:
"'utf-8' codec can't decode byte 0x95 in position 829: invalid start byte"
After talking to zapier support, they've suggested using python to find and replace these bullet points with an asterisk or dash, then import the corrected csv into my zapier workflow.
This is what I have written so far as a Python action in Zapier (just trying to read the csv to start with), with no luck:
import csv
with open(input_data['file'], 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
Is this possible?
Thanks!
(Screenshots in the original post: Zapier trying to import the CSV with bullet points, and my current Python code, not working, attempting to find & replace the bullet points in the CSVs.)
This is possible, but it's a little tricky. Zapier is confusing when it comes to files. On your computer, files are a series of bytes. But in Zapier, a file is usually a url that points to the actual file. This is great for cross-app compatibility, but tricky to work with in code.
You're trying to open a url as a file in Python, which isn't working. Instead, make a request for that file, then read its content. Try this:
import csv
import io
import requests

file_data = requests.get(input_data['file'])
# byte 0x95 is a bullet in Windows-1252, so the file is most likely
# cp1252 rather than UTF-8; decode it accordingly
text = file_data.content.decode('cp1252')
reader = csv.reader(text.splitlines(), delimiter=',')
result = io.StringIO()  # an in-memory buffer to write the cleaned csv into
writer = csv.writer(result)
for row in reader:
    # rows are lists, so do the replacement cell by cell
    row = [cell.replace(u'\u2022', '*') for cell in row]
    writer.writerow(row)
return [{'data': result.getvalue()}]
The result buffer is there because you want to write out a string that you can then re-package as a CSV in your virtual filesystem of choice (gDrive, Dropbox, etc.).
You can also test this locally instead of in the Zapier editor (I find that's a bit easier to iterate with). Simply get the file url from the code step (it'll be something like https://zapier.com/engine/...) and make a local python file with:
input_data = {'file': 'https://zapier.com/engine/...'}
...
You'll also need to pip install requests if you don't have it.
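Putting that advice together, a hypothetical local harness could look like the sketch below (the engine url stays a placeholder you paste in from the code step, and the cp1252/bullet assumptions are the same as above):

import csv
import io
import requests

# paste the real file url from the Zapier code step here
input_data = {'file': 'https://zapier.com/engine/...'}

file_data = requests.get(input_data['file'])
reader = csv.reader(file_data.content.decode('cp1252').splitlines(), delimiter=',')
result = io.StringIO()
writer = csv.writer(result)
for row in reader:
    writer.writerow([cell.replace(u'\u2022', '*') for cell in row])

# eyeball the first part of the cleaned output instead of returning it
print(result.getvalue()[:500])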

How does one read a .dif file with Python

I am working on a project that requires me to read a file with a .dif extension. DIF stands for Data Interchange Format. The file opens nicely in OpenOffice Calc, and from there you can easily save it as a csv file; however, when I open it in Python, all I get are random characters that don't make sense. Here is the last code that I tried, just to see if I could read it at all:
txt = open(r'C:\myfile.dif', 'rb').read()
print txt
I would even be open to programmatically converting the file to csv first before opening it, if someone knows how to do that. As always, any help is much appreciated. Below is a partial screenshot of what I get when I run the code.
Hadn't heard of this file format. Went and got a sample here.
I tested your method and it works fine:
>>> content = open(r"E:\sample.dif", 'rb').read()
>>> print (content)
b'TABLE\r\n0,1\r\n"EXCEL"\r\nVECTORS\r\n0,8\r\n""\r\nTUPLES\r\n0,3\r\n""\r\nDATA\r\n0,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"Welcome to File Extension FYI Center!"\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n""\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n1,0\r\n"Description"\r\n-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n1,0\r\n"Active Server Pages"\r\n-1,0\r\nBOT\r\n0,2\r\nV\r\n1,0\r\n"JSP"\r\n1,0\r\n"JavaServer Pages"\r\n-1,0\r\nBOT\r\n0,3\r\nV\r\n1,0\r\n"PNG"\r\n1,0\r\n"Portable Network Graphics"\r\n-1,0\r\nBOT\r\n0,4\r\nV\r\n1,0\r\n"GIF"\r\n1,0\r\n"Graphics Interchange Format"\r\n-1,0\r\nBOT\r\n0,5\r\nV\r\n1,0\r\n"WMV"\r\n1,0\r\n"Windows Media Video"\r\n-1,0\r\nEOD\r\n'
>>>
The question is what is in the file and how you want to handle it. Personally I liked:
with open(r"E:\sample.dif", 'rb') as f:
for line in f:
print (line)
In the first code block, that long line prefixed with b'' (for bytes!) can be split on \r\n:
b'TABLE\r\n'
b'0,1\r\n'
b'"EXCEL"\r\n'
b'VECTORS\r\n'
b'0,8\r\n'
b'""\r\n'
b'TUPLES\r\n'
b'0,3\r\n'
b'""\r\n'
b'DATA\r\n'
b'0,0\r\n'
.
.
.
b'"Windows Media Video"\r\n'
b'-1,0\r\n'
b'EOD\r\n'
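Since the asker mentioned being open to converting the file to csv programmatically, here is a minimal sketch of a converter based only on the record layout visible in the dump above (it is not a complete DIF implementation, and the paths are placeholders):

import csv

def dif_to_csv(dif_path, csv_path):
    with open(dif_path) as f:
        lines = [line.rstrip('\r\n') for line in f]

    # data records start right after the DATA chunk header (DATA / 0,0 / "")
    i = lines.index('DATA') + 3
    rows, row = [], []
    while i + 1 < len(lines):
        header, value = lines[i], lines[i + 1]
        i += 2
        typ, num = header.split(',', 1)
        if typ == '-1':             # directive: BOT starts a row, EOD ends the data
            if value == 'BOT':
                row = []
                rows.append(row)
            elif value == 'EOD':
                break
        elif typ == '0':            # numeric cell: the value sits in the header line
            row.append(num)
        elif typ == '1':            # string cell: quoted on the following line
            row.append(value.strip('"'))

    with open(csv_path, 'w', newline='') as out:
        csv.writer(out).writerows(rows)

dif_to_csv(r'E:\sample.dif', r'E:\sample.csv')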

LOAD XML INFILE: save nested children as plain text

I did my research on the internet, and it seems that LOAD XML INFILE cannot import nested children with the same names, or even with different names.
imported XML sample here
But is there any option that could be used to keep the whole content of the parent as plain text? It is not a problem for me to parse that content line by line afterwards.
Please do not tell me I need to parse it with PHP; that fails in terms of speed, and I have many XMLs to load, so the terminal is the best solution for me.
So perhaps there is, for example, some kind of shell or python script (in case it is not possible to import it as plain text).
Thanks in advance
Thank you all for correcting my grammar mistakes; it's very useful, and you should earn another badge for helping the community.
Since nobody came up with a solution, I did the following, which worked for me:
1) create a file script.py with these contents:
#!/usr/bin/python3
# coding: utf-8
# note: the ' ' entry also strips every space character from the file
replacements = {'<Image>': '', '</Image>': ';', ' ': '', '\n': ''}
with open('/var/www/html/XX/data/xml/products.xml') as infile, open('/var/www/html/XXX/data/xml/products_clean.xml', 'w') as outfile:
    for line in infile:
        # .iteritems() is Python 2 only; .items() works on Python 3 as well
        for src, target in replacements.items():
            line = line.replace(src, target)
        outfile.write(line)
2) run it through terminal
python /var/www/html/script.py
3) then you LOAD XML INFILE that cleaned XML into your mysql as usual, or you can transform that column into JSON for better use

Python basics - request data from API and write to a file

I am trying to use "requests" package and retrieve info from Github, like the Requests doc page explains:
import requests
r = requests.get('https://api.github.com/events')
And this:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
I have to say I don't understand the second code block.
filename - in what form do I provide the path to the file? Where will it be saved if I don't give a path?
'wb' - what is this variable? (shouldn't the second parameter be the mode?)
The following two lines presumably iterate over the data retrieved with the request and write it to the file.
The Python docs explanation also doesn't help much.
EDIT: What I am trying to do:
use Requests to connect to an API (Github and later Facebook GraphAPI)
retrieve data into a variable
write this into a file (later, as I get more familiar with Python, into my local MySQL database)
Filename
When using open, the path is relative to your current working directory. So if you say open('file.txt','w'), it creates a new file named file.txt in whatever directory you ran the script from. You can also specify an absolute path, for example /home/user/file.txt on linux. If a file named file.txt already exists, its contents will be completely overwritten.
Mode
The 'wb' option is indeed the mode. The 'w' means write and the 'b' means bytes. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary files (rather than text files). It is actually a little odd to use 'b' in this case, as the content you are writing is text. Specifying 'w' alone would work just as well here. Read more on the modes in the docs for open.
The Loop
This part is using the iter_content method from requests, which is intended for use with large files that you may not want in memory all at once. This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
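To make the chunked pattern concrete, here is a minimal sketch for a genuinely large download (the output filename and chunk size are illustrative):

import requests

# stream=True defers the download so iter_content can pull it piece by piece
r = requests.get('https://api.github.com/events', stream=True)
with open('events.json', 'wb') as fd:
    for chunk in r.iter_content(chunk_size=8192):  # 8 KB at a time
        fd.write(chunk)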
Conclusion
The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to fit in memory. However, we can make your code more readable and easier to understand if you are only accessing small webpages containing text:
import requests
r = requests.get('https://api.github.com/events')
with open('events.txt','w') as fd:
    fd.write(r.text)
filename is a string with the path you want to save to. It accepts either a relative or an absolute path, so you can just have filename = 'example.html'
wb stands for WRITE & BYTES, learn more here
The for loop goes over the entire returned content (in chunks, in case it is too large for proper memory handling) and writes it out until there is no more. Useful for large files, but for a single webpage you could just do:
# just 'w' because we are not writing bytes anymore, just text
with open(filename, 'w') as fd:
    fd.write(r.text)  # r.text is the decoded string; r.content is raw bytes

Python 3 CSV not writing

When I open my csv file I see nothing. Is this the right way to build a csv file? I'm just trying to learn it all. Thanks for all your help.
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://shop.nordstrom.com/c/designer-handbags?dept=8000001&origin=topnav#category=b60133547&type=category&color=&price=&brand=&stores=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&page=1&partial=1&pagesize=100&contextualsortcategoryid=0")
nordHandbags = BeautifulSoup(html, "html.parser")  # name the parser explicitly
bagList = nordHandbags.findAll("a", {"class": "title"})
f = csv.writer(open("./nordstrom.csv", "w"))
f.writerow(["Product Title"])
for title in bagList:
    productTitles = title.contents[0]
    f.writerow([productTitles])
Really hard to see how you could fail to have at least a "Product Title" header in that file. Are you checking the file after you have terminated the Python interpreter? I ask because there is no explicit close of the file in that code, and until it is closed, its contents may be cached in memory.
More Pythonic, and avoiding this problem, is
with open("./nordstrom.csv", "w") as csvfile:
f = csv.writer( csvfile)
f.writerow(["Product Title"])
# etc.
pass # close the with block, csvfile is now closed.
Also (grasping at straws) are you opening the file with a text editor to check it, or just using the type command in Windows cmd.exe? Because, if the file doesn't end with an explicit LF, the C:\wherever> prompt may overwrite the header before you see it.
