Python 3 CSV not writing

When I open my csv file I see nothing. Is this the right way to build a csv file? Just trying to learn it all. Thanks for all your help.
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://shop.nordstrom.com/c/designer-handbags?dept=8000001&origin=topnav#category=b60133547&type=category&color=&price=&brand=&stores=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&page=1&partial=1&pagesize=100&contextualsortcategoryid=0")
nordHandbags = BeautifulSoup(html)
bagList = nordHandbags.findAll("a", {"class":"title"})
f = csv.writer(open("./nordstrom.csv", "w"))
f.writerow(["Product Title"])
for title in bagList:
    productTitles = title.contents[0]
    f.writerow([productTitles])

Really hard to see how you could fail to have at least a "Product Title" header in that file. Are you checking the file after you have terminated the Python interpreter? I ask because there is no explicit close of the file in that code, and until it is closed, its contents may be cached in memory.
More Pythonic, and avoiding this problem, is
with open("./nordstrom.csv", "w") as csvfile:
f = csv.writer( csvfile)
f.writerow(["Product Title"])
# etc.
pass # close the with block, csvfile is now closed.
Also (grasping at straws) are you opening the file with a text editor to check it, or just using the type command in Windows cmd.exe? Because, if the file doesn't contain an explicit LF, the C:\wherever> prompt may overwrite the header before you see it.
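Putting it together, a sketch of the full script built around the with block. The newline='' argument and the explicit "html.parser" parser are additions (recommended by the Python 3 csv docs and bs4 docs respectively), not part of the original code:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://shop.nordstrom.com/c/designer-handbags?dept=8000001&origin=topnav#category=b60133547&type=category&color=&price=&brand=&stores=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&page=1&partial=1&pagesize=100&contextualsortcategoryid=0")
nordHandbags = BeautifulSoup(html, "html.parser")  # explicit parser avoids a bs4 warning
bagList = nordHandbags.findAll("a", {"class": "title"})

# newline='' stops the csv module's line endings from being doubled on Windows
with open("./nordstrom.csv", "w", newline="") as csvfile:
    f = csv.writer(csvfile)
    f.writerow(["Product Title"])
    for title in bagList:
        f.writerow([title.contents[0]])
# csvfile is flushed and closed here, so the rows are safely on disk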


Python subprocess can't find the output of csv writer

I'm ripping some data from Mongo, sanitizing it via Python, and writing it to a text file to import into Vertica. Vertica can't parse the Python-written gzip (no idea why), so I'm trying to write the data to a csv and use bash to gzip the file instead.
csv_filename = '/home/deploy/tablecopy/{0}.csv'.format(vertica_table)
with open(csv_filename, 'wb') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',')
    for replacement in mongo_object.find():
        replacement_id = clean_value(replacement, "_id")
        csv_writer.writerow([replacement_id, booking_id, style, added_ts])

subprocess.call(['gzip', 'file', csv_filename])
When I run this code, I get "gzip: file: No such file or directory," despite the fact that 1) the file is getting created immediately beforehand and 2) there's already a copy of the csv in the directory prior to the run, since this is a script that gets run repeatedly.
These points make me think that python is tying up the file somehow and bash can't see/access it. Any ideas on how to get this conversion to run?
Thanks
Just pass csv_filename; gzip is looking for a file literally called "file", which does not exist, so it errors out on that rather than on the csv_filename file:
subprocess.call(['gzip', csv_filename])
There is no file argument for gzip; you simply need to pass the filename.
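One more wrinkle, since the script runs repeatedly: gzip refuses to overwrite an existing .gz file left over from a previous run unless you force it, so you may also want the -f flag:
subprocess.call(['gzip', '-f', csv_filename])  # -f overwrites any existing .csv.gz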
You've already got the correct answer to your problem, but alternatively you can use the gzip module to compress as you write, so there is no need to call the gzip program at all. This example assumes you use Python 3.x and you just have ASCII text.
import csv
import gzip

csv_filename = '/home/deploy/tablecopy/{0}.csv'.format(vertica_table)
with gzip.open(csv_filename + '.gz', 'wt', encoding='ascii', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',')
    for replacement in mongo_object.find():
        replacement_id = clean_value(replacement, "_id")
        csv_writer.writerow([replacement_id, booking_id, style, added_ts])

Python DictWriter \n

I am using the following code and it works well, except that the CSV file it spits out skips every other line when opened in Excel. I have googled the csv module documentation and other examples on stackoverflow.com, and I found that I need to use DictWriter with the lineterminator set to '\n'. My own attempts to write it into the code have been foiled.
So I am wondering, is there a way to apply this (the lineterminator) to the whole file so that I do not have any lines skipped? And if so, how?
Here is the code:
import urllib2
from BeautifulSoup import BeautifulSoup
import csv
page = urllib2.urlopen('http://finance.yahoo.com/q/ks?s=F%20Key%20Statistics').read()
f = csv.writer(open("pe_ratio.csv","w"))
f.writerow(["Name","PE"])
soup = BeautifulSoup(page)
all_data = soup.findAll('td', "yfnc_tabledata1")
f.writerow([all_data[2].getText()])
Thanks for your help in advance.
You need to open your file with the right options for the csv.writer class to work correctly. The csv module handles newlines itself, so you need to turn off Python's newline translation at the file level.
For Python 2, the docs say:
If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference.
For Python 3, they say:
If csvfile is a file object, it should be opened with newline=''.
Also, you should probably use a with statement to handle opening and closing your file, like this:
with open("pe_ratio.csv","wb") as f: # or open("pe_ratio.csv", "w", newline="") in Py3
writer = csv.writer(f)
# do other stuff here, staying indented until you're done writing to the file
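For reference, a sketch of the question's script with that fix folded in (Python 2, matching the question's imports; the "Ford" name column is an assumption, mirroring the suggestion below, since the page is for ticker F):
import csv
import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen('http://finance.yahoo.com/q/ks?s=F%20Key%20Statistics').read()
soup = BeautifulSoup(page)
all_data = soup.findAll('td', "yfnc_tabledata1")

# 'wb' stops Python 2 from translating newlines, which is what produced the blank lines
with open("pe_ratio.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "PE"])
    writer.writerow(["Ford", all_data[2].getText()])  # assumed label for ticker F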
First, since Yahoo provides an API that returns CSV files, maybe you can solve your problem that way? For example, this URL returns a CSV file containing prices, market cap, P/E and other metrics for all stocks in that industry. There is some more information in this Google Code project.
Your code only produces a two-row CSV because there are only two calls to f.writerow(). If the only piece of data you want from that page is the P/E ratio, this is almost certainly not the best way to do it, but you should pass to f.writerow() a tuple containing the value for each column. To be consistent with your header row, that would be something like:
f.writerow( ('Ford', all_data[2].getText()) )
Of course, that assumes that the P/E ratio will always be second in the list. If instead you wanted all the statistics provided on that page, you could try:
# scrape the html for the name and value of each metric
metrics = soup.findAll('td', 'yfnc_tablehead1')
values = soup.findAll('td', 'yfnc_tabledata1')
# create a list of tuples for the writerows method
def stripTag(tag): return tag.text
data = zip(map(stripTag, metrics), map(stripTag, values))
# write to csv file
f.writerows(data)

How to read a CSV file from a URL with Python?

When I curl an API call link http://example.com/passkey=wedsmdjsjmdd
curl 'http://example.com/passkey=wedsmdjsjmdd'
I get the employee output data in a csv file format, like:
"Steve","421","0","421","2","","","","","","","","","421","0","421","2"
How can I parse through this using Python?
I tried:
import csv
cr = csv.reader(open('http://example.com/passkey=wedsmdjsjmdd',"rb"))
for row in cr:
    print row
but it didn't work and I got an error:
No such file or directory: 'http://example.com/passkey=wedsmdjsjmdd'
Thanks!
Using pandas, it is very simple to read a csv file directly from a url:
import pandas as pd
data = pd.read_csv('https://example.com/passkey=wedsmdjsjmdd')
This will read your data in tabular format, which will be very easy to process
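One caveat: the sample output above has no header row, so by default pandas would consume the first record as column names. pandas' header=None handles that (a sketch using the question's placeholder URL):
import pandas as pd

# header=None keeps the first data record from being treated as column names
data = pd.read_csv('https://example.com/passkey=wedsmdjsjmdd', header=None)
print(data.head())  # first few rows, with numbered columns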
You need to replace open with urllib.urlopen or urllib2.urlopen.
e.g.
import csv
import urllib2
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)
for row in cr:
    print row
This would output the following
Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
...
The original question is tagged "python-2.x", but for a Python 3 implementation (which requires only minor changes) see below.
You could do it with the requests module as well:
import csv
import requests

url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)
text = r.iter_lines()
reader = csv.reader(text, delimiter=',')
To increase performance when downloading a large file, the following may work a bit more efficiently:
import csv
import requests
from contextlib import closing

url = "http://download-and-process-csv-efficiently/python.csv"
with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='"')
    for row in reader:
        # Handle each row here...
        print row
By setting stream=True in the GET request, when we pass r.iter_lines() to csv.reader(), we are passing a generator to csv.reader(). By doing so, we enable csv.reader() to lazily iterate over each line in the response with for row in reader.
This avoids loading the entire file into memory before we start processing it, drastically reducing memory overhead for large files.
This question is tagged python-2.x so it didn't seem right to tamper with the original question, or the accepted answer. However, Python 2 is now unsupported, and this question still has good google juice for "python csv urllib", so here's an updated Python 3 solution.
It's now necessary to decode urlopen's response (in bytes) into a valid local encoding, so the accepted answer has to be modified slightly:
import csv, urllib.request
url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
lines = [l.decode('utf-8') for l in response.readlines()]
cr = csv.reader(lines)
for row in cr:
    print(row)
Note the extra line beginning with lines =, the fact that urlopen is now in the urllib.request module, and print of course requires parentheses.
It's hardly advertised, but yes, csv.reader can read from a list of strings.
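If the file is large and you'd rather not hold every line in memory, io.TextIOWrapper can decode the byte stream lazily instead; a sketch, not part of the original answer:
import csv
import io
import urllib.request

url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)

# decode the raw byte stream line by line rather than reading it all up front;
# newline='' leaves newline handling to the csv module, per the csv docs
for row in csv.reader(io.TextIOWrapper(response, encoding='utf-8', newline='')):
    print(row)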
And since someone else mentioned pandas, here's a pandas rendition that displays the CSV in a console-friendly output:
python3 -c 'import pandas
df = pandas.read_csv("http://winterolympicsmedals.com/medals.csv")
print(df.to_string())'
Pandas is not a lightweight library, though. If you don't need the things that pandas provides, or if startup time is important (e.g. you're writing a command line utility or any other program that needs to load quickly), I'd advise that you stick with the standard library functions.
import pandas as pd

url = 'https://raw.githubusercontent.com/juliencohensolal/BankMarketing/master/rawData/bank-additional-full.csv'
data = pd.read_csv(url, sep=";")  # use sep="," for comma separation
data.describe()
I am also using this approach for csv files (Python 3.6.9):
import csv
import io
import requests

r = requests.get(url)  # `url` points at the csv resource, defined elsewhere
buff = io.StringIO(r.text)
dr = csv.DictReader(buff)
for row in dr:
    print(row)
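Note that DictReader treats the first line of the file as the field names. If your CSV has no header row (like the sample near the top of this question), pass the names explicitly; the ones below are hypothetical:
# hypothetical field names, since the sample data carries no header row
dr = csv.DictReader(buff, fieldnames=['name', 'total', 'pending'])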
What you were trying to do with the curl command was to download the file to your local hard drive (HD). However, you need to specify a path on the HD:
curl http://example.com/passkey=wedsmdjsjmdd -o ./example.csv
cr = csv.reader(open('./example.csv', "r"))
for row in cr:
    print row

Error with urlopen: new-line character seen in unquoted field

I am using urllib.urlopen with Python 2.7 to read csv files located on an external webserver:
# Try & Except statements removed for clarity
import urllib
import csv
url = ...
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
    do_something()
All 100+ files can be read fine, except one that has been updated recently and that returns:
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
The file is accessible here. According to my text editor, its mode is Mac (CR), as opposed to Windows (CRLF) for the other files.
Based on this thread, I found that Python's urlopen will correctly handle all formats of newlines. Therefore, the problem likely comes from somewhere else. I have no clue, though. The file opens fine with all my text editors and my spreadsheet editors.
Does anyone have any idea how to diagnose the problem?
* EDIT *
The creator of the file informed me by email that I was not the only one to experience such issues. Therefore, he decided to regenerate it. The code above now works fine again. Unfortunately, using a new file also means that the issue can no longer be reproduced, nor the proposed solutions properly tested.
Before closing the question, I want to thank all the stackers who dedicated some of their time to figure out a solution and post it here.
It might be a corrupt .csv file? Otherwise, this code runs perfectly.
#!/usr/bin/python
import urllib
import csv
url = "http://www.football-data.co.uk/mmz4281/1213/I1.csv"
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
    print row
Credits to J.F. Sebastian for the .csv file.
Although, you might want to consider sharing the specific .csv file with us, so we can try to re-create the error.
The following code runs without any error:
#!/usr/bin/env python
import csv
import urllib2
r = urllib2.urlopen('http://www.football-data.co.uk/mmz4281/1213/I1.csv')
for row in csv.reader(r):
    print row
I was having the same problem with a downloaded csv.
I know the fix would be to use open with 'rU'. But I would rather not have to save the file to disk just to open it back up into a variable. That seems unnecessary.
file = open(filepath,'rU')
mydata = csv.reader(file)
So if someone has a better solution that would be nice. Stackoverflow links that got me this far:
CSV new-line character seen in unquoted field error
Open the file in universal-newline mode using the CSV Django module
I found what I actually wanted with stringIO, or cStringIO, or io:
Using Python, how do I to read/write data in memory like I would with a file?
I ended up getting io working:
import csv
import urllib2
import io

# warning: it's a 20MB csv
url = 'http://poweredgec.com/latest_poweredge-11g.csv'
urlRead = urllib2.urlopen(url).read()

# normalize Mac (\r) and Windows (\r\n) line endings to \n, emulating
# universal-newline mode, then wrap the bytes in an in-memory file object
ramFile = io.BytesIO(urlRead.replace('\r\n', '\n').replace('\r', '\n'))
csvCurrent = csv.reader(ramFile)
csvTuple = map(tuple, csvCurrent)
print csvTuple

Create hash table from the contents of a file

How can I open a text file, read the contents of the file and create a hash table from this content? So far I have tried:
import json
json_data = open(/home/azoi/Downloads/yes/1.txt).read()
data = json.loads(json_data)
pprint(data)
I suggest this solution:
import json
from pprint import pprint

with open("/home/azoi/Downloads/yes/1.txt") as f:
    data = json.load(f)
pprint(data)
The with statement ensures that your file is automatically closed whatever happens, and that your program throws the correct exception if the open fails. The json.load function directly loads data from an open file handle.
Additionally, I strongly suggest reading and understanding the Python tutorial. It's essential reading and won't take too long.
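To tie this back to the question: json.load returns a Python dict, which is a hash table, so once the file is parsed you can do keyed lookups directly. A tiny sketch, assuming 1.txt holds a JSON object (the key below is hypothetical):
# assuming 1.txt contains something like {"name": "azoi", "id": 1}
print(data["name"])  # a keyed hash-table lookup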
To open a file you have to use the open statement correctly, something like:
json_data=open('/home/azoi/Downloads/yes/1.txt','r')
where the first string is the path to the file and the second is the mode: r = read, w = write, a = append
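Folding that into the original attempt, a minimal corrected version (path and pprint usage taken from the question):
import json
from pprint import pprint

json_data = open('/home/azoi/Downloads/yes/1.txt', 'r')
data = json.loads(json_data.read())
json_data.close()  # close the handle explicitly when not using a with block
pprint(data)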
