Get the size of a file from its file object - Python

I need to know the size of a file based on its file object.

import csv
import os

with open("test.csv", "rb") as infile:
    reader = csv.reader(infile)
    print reader
    filesize(reader)

def filesize(reader):
    os.getsize(reader)  # I need to keep working with the reader afterwards, so I have to pass the reader or file object
When I run this, the output is:

<_csv.reader object at 0x7f5644584980>

From this file object, how do I get the size of the file?
I also checked this question, size of an open file object, but those methods are not attributes of the csv reader.

EDIT: When I use those two built-in methods I get these errors:

AttributeError: '_csv.reader' object has no attribute 'seek'
AttributeError: '_csv.reader' object has no attribute 'tell'

You can use os.path.getsize or os.stat:

import os

os.path.getsize('test.csv')

or

os.stat('test.csv').st_size

Both return the size in bytes.

Adding this answer because it actually answers the question as asked, using the file object directly:

import csv
import os

with open("test.csv", "rb") as infile:
    reader = csv.reader(infile)
    print reader
    infile.seek(0, os.SEEK_END)  # move to the end of the underlying file
    filesize = infile.tell()     # offset at the end == size in bytes
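Note that after the seek to the end, the reader is positioned at EOF and won't yield any rows, so seek back to the start if you still need them. A minimal Python 3 sketch of the same idea (the csv module wants a text stream there, hence the io.TextIOWrapper; test.csv is assumed to exist):

import csv
import io
import os

with open("test.csv", "rb") as infile:
    infile.seek(0, os.SEEK_END)   # jump to the end of the raw byte stream
    filesize = infile.tell()      # offset at the end == size in bytes
    infile.seek(0)                # rewind so reading starts at the first row
    reader = csv.reader(io.TextIOWrapper(infile, encoding="utf-8", newline=""))
    for row in reader:
        print(row)

print(filesize)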

What's wrong with os.path.getsize? With your code:

import os
import csv

with open("test.csv", "rb") as infile:
    reader = csv.reader(infile)
    print os.path.getsize(infile.name)

The size is in bytes.

Related

Python TypeError: unhashable type: 'list' reading a CSV

I'm trying to learn Python to implement a user agent transformation script for our analytics database.
I imported the user_agents lib to do the conversion and show me the user data. When I run this script against a CSV file I extracted containing the user agents (the CSV has only one column), it returns this error:

TypeError: unhashable type: 'list'
Here is the script I am using:
import csv
from user_agents import parse

with open('UserAgent.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(parse(line))
The parse method takes a string as an argument. In your code, however, each line is a list, not a string. You can try this:

with open('UserAgent.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(parse(' '.join(line)))
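Since the CSV reportedly has only one column, another option (just a sketch based on that description) is to pass the first field of each row to parse:

import csv
from user_agents import parse

with open('UserAgent.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        if line:                   # skip any blank lines
            print(parse(line[0]))  # first (and only) column, already a string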

python: use CSV reader with single file extracted from tarfile

I am trying to use the Python CSV reader to read a CSV file that I extract from a .tar.gz file using Python's tarfile library.
I have this:
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    tarredCSV = tarFile.extractfile(file)
    reader = csv.reader(tarredCSV)
    next(reader)  # skip header
    for row in reader:
        if row[3] not in CSVRows.values():
            CSVRows[row[3]] = row
All the files in the tar file are CSVs.
I am getting an exception on the first file, at the first next(reader) line:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How do I open said file (without extracting the file then opening it)?
tarfile.extractfile returns an io.BufferedReader object, i.e. a bytes stream, whereas csv.reader expects a text stream. You can use io.TextIOWrapper to convert the bytes stream to a text stream:
import io
...
reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8'))
You need to provide a file-like object that yields text to csv.reader.
Probably the best solution, which avoids consuming a complete file at once, is this approach (thanks to blhsing and damon for suggesting it):

import csv
import io
import tarfile

tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.TextIOWrapper(tarFile.extractfile(file), encoding="utf-8")
    reader = csv.reader(csv_file)
    next(reader)  # skip header
    for row in reader:
        print(row)
Alternatively, a possible solution from Python3 working with csv files in tar files would be:

import csv
import io
import tarfile

tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    csv_file = io.StringIO(tarFile.extractfile(file).read().decode('utf-8'))
    reader = csv.reader(csv_file)
    next(reader)  # skip header
    for row in reader:
        print(row)

Here an io.StringIO object is used to make csv.reader happy. However, this might not scale well for larger files in the tar, as each file is read in a single step.

Open file from zip without extracting it in Python?

I am working on a script that fetches a zip file from a URL using the requests library. That zip file contains a CSV file. I'm trying to read that CSV file without saving it, but while parsing it I get this error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
import csv
import requests
from io import BytesIO, StringIO
from zipfile import ZipFile

response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()

with zip_file.open(files[0]) as csvfile:
    csvreader = csv.reader(csvfile)
    # _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
    for row in csvreader:
        print(row)
Try this:

import pandas as pd
import requests
from io import BytesIO, StringIO
from zipfile import ZipFile

response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()

with zip_file.open(files[0]) as csvfile:
    print(pd.read_csv(csvfile, encoding='utf8', sep=","))
As #Aran-Fey alluded to:

import zipfile
import csv
import io

with open('/path/to/archive.zip', 'rb') as f:  # the zip itself must be opened in binary mode
    with zipfile.ZipFile(f) as zf:
        csv_filename = zf.namelist()[0]  # see namelist() for the list of files in the archive
        with zf.open(csv_filename) as csv_f:
            csv_f_as_text = io.TextIOWrapper(csv_f)
            reader = csv.reader(csv_f_as_text)
csv.reader (and csv.DictReader) require a file-like object opened in text mode. Normally this is not a problem when simply open(...)ing a file in 'r' mode, since, as the Python 3 docs say, text mode is the default: "The default mode is 'r' (open for reading text, a synonym of 'rt')". But if you try 'rt' with ZipFile.open(), you'll see an error that it requires mode "r" or "w":
with zf.open(csv_filename, 'rt') as csv_f:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: open() requires mode "r" or "w"
That's what io.TextIOWrapper is for -- for wrapping byte streams to be readable as text, decoding them on the fly.
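Putting that together with the requests-based code from the question, a sketch (the URL is a placeholder, and the first archive member is assumed to be a UTF-8 encoded CSV):

import csv
import io
import requests
from zipfile import ZipFile

url = "https://example.com/archive.zip"  # placeholder, substitute the real URL

response = requests.get(url)
zip_file = ZipFile(io.BytesIO(response.content))
csv_name = zip_file.namelist()[0]        # assume the first member is the CSV

with zip_file.open(csv_name) as csv_bytes:
    # wrap the byte stream so csv.reader sees decoded text
    reader = csv.reader(io.TextIOWrapper(csv_bytes, encoding="utf-8"))
    for row in reader:
        print(row)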

Python json to CSV

I am trying to convert a JSON data set file into CSV. I am really new to Python, and have been looking on the forums and cannot seem to resolve my issue. I have attached the JSON data URL below, along with my code. Thanks in advance!
https://data.ny.gov/api/views/nqur-w4p7/rows.json?accessType=DOWNLOAD
import json
import csv

inputFile = ("rows.json?accessType=DOWNLOAD", "r")
data = json.load(inputFile)

with open("Data.csv", "wb") as csvfile:
    csv_writer = csv.DictWriter(csvfile, delimiter=",", fieldnames=[
        "data", "new_york_state_average_gal", "albany_average_gal",
        "binghamton_average_gal", "buffalo_average_gal", "nassau_average_gal",
        "new_york_city_average_gal", "rochester_average_gal", "utica_average_gal"])
    csv_writer.writerheader()
    csv_writer.writerows(data)
Here is the error I am getting:
File "ChangeDataType.py", line 5, in <module>
data = json.load(inputFile)
File "/usr/lib64/python3.4/json/__init__.py", line 265, in load
return loads(fp.read(),
AttributeError: 'tuple' object has no attribute 'read'
Your error happens because you made a tuple:
inputFile = ("rows.json?accessType=DOWNLOAD", "r")
And you're trying to use json.load on that tuple. Since json.load works only on file objects, you need to call the open function:
inputFile = open("rows.json?accessType=DOWNLOAD", "r")
The "r" part indicates you're opening the file for reading.

AttributeError when rewriting code so it works in Python 3.4

I am trying to alter the code below so that it works in Python 3.4. However, I get the error AttributeError: 'int' object has no attribute 'replace' on the line line.replace(",", "\t"). I am trying to understand how to rewrite this part of the code.
import os
import gzip
from io import BytesIO
import pandas as pd

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz"
outFilePath = filename.split('/')[1][:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = BytesIO()
compressedFile.write(response.read())
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read().decode("utf-8", errors="ignore"))

# Now have to deal with the tsv file
import csv

csvout = 'C:/Sidney/ECB.tsv'
outfile = open(csvout, "w")

with open(outFilePath, "rb") as f:
    for line in f.read():
        line.replace(",", "\t")
        outfile.write(line)

outfile.close()
Thank You
You're writing ASCII (by default) with the 'w' mode, but the file you're getting that content from is being read as bytes with the 'rb' mode. Open that file with 'r'.
And then, as Sebastian suggests, just iterate over the file object with for line in f:. Using f.read() will read the entire thing into a single string, so if you iterate over that, you'll be iterating over each character of the file. Strictly speaking, since all you're doing is replacing a single character, the end result will be identical, but iterating over the file object is preferred (uses less memory).
Let's make better use of the with construct and go from this:
outfile = open(csvout, "w")
with open(outFilePath, "rb") as f:
for line in f.read():
line.replace(",", "\t")
outfile.write(line)
outfile.close()
to this:
with open(outFilePath, "r") as f, open(csvout, 'w') as outfile:
for line in f:
outfile.write(line.replace(",", "\t"))
Also, I should note that this is much easier to do with find-and-replace in your text editor of choice (I like Notepad++).
Try rewriting it like this:

with open(outFilePath, "r") as f:
    for line in f:  # don't iterate over the entire file at once, go line by line
        line = line.replace(",", "\t")
        outfile.write(line)
Originally you were opening it in 'read binary' ('rb') mode, so f.read() returns bytes, and iterating over a bytes object yields integers, not the strings you were expecting. In Python, int objects do not have a .replace() method, while str objects do; that is the cause of your AttributeError. Opening the file in regular 'read' ('r') mode makes f.read() return a string, which does have .replace() available to call.
Related post on return type of .read() here and more information available from the docs here.
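A tiny illustration of the difference (a standalone sketch, not tied to the files above):

data_bytes = b"a,b\n"
data_text = "a,b\n"

for item in data_bytes:
    print(item)   # 97, 44, 98, 10 -- iterating over bytes yields ints
for item in data_text:
    print(item)   # 'a', ',', 'b', '\n' -- iterating over str yields one-char strings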
