How to gzip query results - Python/Airflow

I am trying to gzip my query results and write them to a location from Airflow, but whenever I run my code I get this error:
TypeError: memoryview: a bytes-like object is required, not 'str'
Check out the fp variable in my code:
def create_tunnel_postgres():
    try:
        tunnel = SSHTunnelForwarder((ssh_host, 22),
                                    ssh_username=ssh_username,
                                    ssh_private_key=pkf,
                                    remote_bind_address=(psql_host, 5432))
        # local_bind_address=('localhost', 6543)  # could be any available port
        # Start the tunnel
        tunnel.start()
    except:
        print('connection')
    else:
        conn = psycopg2.connect(database='my_db', user='user',
                                password='my_pwd',
                                host=tunnel.local_bind_host,
                                port=tunnel.local_bind_port)
        cur = conn.cursor()
        cur.execute("""
            select * from pricing.public.seller_tiers;
        """)
        result = cur.fetchall()
        # Getting field header names
        column_names = [i[0] for i in cur.description]
        fp = gzip.open(path, 'wb')
        myFile = csv.writer(fp, delimiter=',')
        myFile.writerow(column_names)
        myFile.writerows(result)
        fp.close()
        conn.close()
        tunnel.stop()
Any ideas or suggestions? I am new to Python/Airflow, so anything would help.

I think the error comes from the way you are writing content to the gzip file.
You open the gzip file in binary mode and then write strings to it, here: fp = gzip.open(path, 'wb').
As the Python documentation for gzip.open states, the mode argument can be:
'rt', 'at', 'wt', or 'xt' for text mode.
Change your code to use 'wt' (write text), or encode your strings to bytes:
import gzip
import csv

with gzip.open("sample.gz", "wt", newline="") as gz_fp:
    writer = csv.writer(gz_fp, delimiter=",")
    writer.writerow(['first_name', 'last_name'])  # header row
    writer.writerow(['Baked', 'Beans'])
    writer.writerow(['Lovely', 'Spam'])
    writer.writerow(['Wonderful', 'Spam'])
If you want to write bytes only, then:
with gzip.open('file.gz', 'wb') as f:
    f.write('Hello world!'.encode())
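Applied to the code in the question, a minimal sketch of the fix (assuming cur and path are set up exactly as above) would be:
import csv
import gzip

# 'wt' opens the gzip stream in text mode, so csv.writer can write strings to it
with gzip.open(path, 'wt', newline='') as fp:
    writer = csv.writer(fp, delimiter=',')
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur.fetchall())                      # data rows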

Related

How to create a CSV file virtually, without any physical file/path defined?

I need to create a CSV file and convert it into byte-like data to send as an EDI document. I am trying to achieve this without a physical file, because the location/path is unknown. Let me know if there is any way we could achieve this.
with open(
    "/home/some path/*.dat", "r+", newline="\n"
) as write_f:
    data_file = csv.writer(write_f, delimiter=';')
    header_vals = ["header values"]
    query = """data fetching query"""
    data_file.writerow(header_vals)
    self.env.cr.execute(query)
    data_vals = self.env.cr.fetchall()
    data_file.writerows(data_vals)
    po_data = write_f.read(1024)
return po_data
Attempt 1: instead of a path, I tried in-memory IO objects (BytesIO/StringIO):
data_file = BytesIO()
data_write = csv.writer(data_file, delimiter=';')
header_vals = ["header values"]
query = """data fetching query"""
data_write.writerow(header_vals)
self.env.cr.execute(query)
data_vals = self.env.cr.fetchall()
data_write.writerows(data_vals)
Received the error at writerow: TypeError: a bytes-like object is required, not 'str'
BytesIO behaves like a file in binary (!) mode, so you need to write bytes to it.
But a csv.writer cannot write bytes; it only writes strings. That's the error message you see. Use a StringIO instead:
from io import StringIO
buffer = StringIO()
writer = csv.writer(buffer, delimiter=';')
header_vals = ['column_1', 'column_2']
writer.writerow(header_vals)
print(buffer.getvalue())
# => 'column_1;column_2\r\n'
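Since the goal is byte-like data for the EDI document, you can encode the buffer's contents at the end (a sketch, assuming UTF-8 is acceptable for your EDI format):
po_data = buffer.getvalue().encode('utf-8')  # bytes, ready to return/send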

Write SQL data to CSV from a Python API

I am running a SQL query from a Python API and want to collect the data in structured CSV format (column-wise data under their own headers).
This is the code I have so far:
import pymysql.cursors
import csv
conn = pymysql.connect(host='159.XXX.XXX.XXX', user='proXXX', password='PXX',
                       db='pXX', charset='utf8mb4',
                       cursorclass=pymysql.cursors.DictCursor)
cursor = conn.cursor()
print(type(conn))
sql = "SELECT id, author FROM researches WHERE id < 20"
cursor.execute(sql)
data = cursor.fetchall()
print(data)
with open('metadata.csv', 'w', newline='') as f_handle:
    writer = csv.writer(f_handle, delimiter=',')
    header = ['id', 'author']
    writer.writerow(header)
    for row in data:
        writer.writerow(row)
Now the data is printed to the console but does not end up in the CSV file; the console output is all I get. What am I missing? Please help.
with open('metadata.csv', 'w', newline='') as f_handle:
    fieldnames = ['id', 'author']
    writer = csv.DictWriter(f_handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in data:
        writer.writerow(row)
So the thing is, because you use DictCursor your data is in the form of dictionaries, while csv.writer expects sequences. You should be using the DictWriter object instead.
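Alternatively, if you want to keep a plain csv.writer, you could pull the values out of each dictionary yourself; a sketch, assuming the same data from fetchall():
with open('metadata.csv', 'w', newline='') as f_handle:
    writer = csv.writer(f_handle, delimiter=',')
    writer.writerow(['id', 'author'])  # header
    for row in data:
        writer.writerow([row['id'], row['author']])  # values in header order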

Save MS Access tables as CSV

I am new to Python and request your kind assistance. I have five tables in an MS Access database, and I need to compile a CSV file for each of the tables. One of the tables is Perm_Reviews, which is part of the snippet. Fortunately, I am able to query the MS Access data, and it returns the rows and columns from the database. Can someone please provide assistance on how to store the tables as CSV files?
import pyodbc
import csv

conn_string = ("DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=T:\\DataDump\\7.18.2016 PCR etrakit.accdb")
save_csv = 'C:\\Desktop\\CSVFiles'
conn = pyodbc.connect(conn_string)
cursor = conn.cursor()
SQL = 'select * from Perm_Reviews;'
for row in cursor.execute(SQL):
    print(row)
cursor.close()
conn.close()
print('All done for now')
I think this is what you are looking for.
import pyodbc
import csv

conn_string = ("DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=T:\\DataDump\\7.18.2016 PCR etrakit.accdb")
conn = pyodbc.connect(conn_string)
cursor = conn.cursor()
cursor.execute('select * from Perm_Reviews;')
with open('Perms_Review.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([i[0] for i in cursor.description])  # header: column names
    writer.writerows(cursor)                             # data rows
cursor.close()
conn.close()
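Since the question mentions five tables, the same pattern can be wrapped in a loop. A sketch; every table name except Perm_Reviews is a placeholder for your actual table names:
import os

tables = ['Perm_Reviews', 'Table2', 'Table3', 'Table4', 'Table5']  # placeholder names
save_csv = 'C:\\Desktop\\CSVFiles'

for table in tables:
    cursor.execute('select * from [{}];'.format(table))
    out_path = os.path.join(save_csv, table + '.csv')
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)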
Python has a built-in csv module that you can use readily; below is a simple example of writing a CSV with headers:
import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})

python csv, writing headers only once

So I have a program that creates a CSV from JSON.
First I load the json file.
f = open('Data.json')
data = json.load(f)
f.close()
Then I go through it, looking for a specific keyword, if I find that keyword. I'll write everything related to that in a .csv file.
for item in data:
    if "light" in item:
        write_light_csv('light.csv', item)
This is my write_light_csv function:
def write_light_csv(filename, dic):
    with open(filename, 'a') as csvfile:
        headers = ['TimeStamp', 'light', 'Proximity']
        writer = csv.DictWriter(csvfile, delimiter=',', lineterminator='\n', fieldnames=headers)
        writer.writeheader()
        writer.writerow({'TimeStamp': dic['ts'], 'light': dic['light'], 'Proximity': dic['prox']})
I initially had wb+ as the mode, but that cleared everything each time the file was opened for writing. I replaced that with a, and now every time it writes, it adds a header. How do I make sure that the header is only written once?
You could check whether the file already exists and, if it does, skip calling writeheader(), since you are opening the file in append mode.
Something like that:
import os.path

file_exists = os.path.isfile(filename)

with open(filename, 'a') as csvfile:
    headers = ['TimeStamp', 'light', 'Proximity']
    writer = csv.DictWriter(csvfile, delimiter=',', lineterminator='\n', fieldnames=headers)
    if not file_exists:
        writer.writeheader()  # file doesn't exist yet, write a header
    writer.writerow({'TimeStamp': dic['ts'], 'light': dic['light'], 'Proximity': dic['prox']})
Just another way: file.tell() gives the current position in the file, and right after opening in append mode the position is the end of the file, so it is 0 only when the file is empty:
with open(file_path, 'a') as file:
    w = csv.DictWriter(file, my_dict.keys())
    if file.tell() == 0:
        w.writeheader()
    w.writerow(my_dict)
You can check if the file is empty:
import csv
import os

headers = ['head1', 'head2']
for row in iterator:
    with open('file.csv', 'a') as f:
        file_is_empty = os.stat('file.csv').st_size == 0
        writer = csv.writer(f, lineterminator='\n')
        if file_is_empty:
            writer.writerow(headers)
        writer.writerow(row)
I would use a flag and check it before writing the header, e.g.:
flag = 0

def get_data(lst):
    global flag
    for i in lst:  # say, a list of URLs
        response = requests.get(i)
        text = response.content.decode('utf-8').replace('\\', '')
        print(text)
        data = json.loads(text)
        fl = codecs.open(r"C:\Users\TEST\Desktop\data1.txt", 'a', encoding='utf-8')
        writer = csv.DictWriter(fl, data.keys())
        if flag == 0:
            writer.writeheader()
        writer.writerow(data)
        flag += 1
        print("You have written %d times" % flag)
        fl.close()

get_data(urls)
Can you change the structure of your code and export the whole file at once?
def write_light_csv(filename, data):
    with open(filename, 'w') as csvfile:
        headers = ['TimeStamp', 'light', 'Proximity']
        writer = csv.DictWriter(csvfile, delimiter=',', lineterminator='\n', fieldnames=headers)
        writer.writeheader()
        for item in data:
            if "light" in item:
                writer.writerow({'TimeStamp': item['ts'], 'light': item['light'], 'Proximity': item['prox']})

write_light_csv('light.csv', data)
You can use the csv.Sniffer class to check whether the file already starts with a header:
import csv

with open('my.csv', newline='') as csvfile:
    if csv.Sniffer().has_header(csvfile.read(1024)):
        pass  # header already present, skip writing it
When using pandas to store DataFrame data in a CSV file, just add this check on the header argument if you are using an index i to iterate over API calls that append data to the CSV file:
if i > 0:
    dataset.to_csv('file_name.csv', index=False, mode='a', header=False)
else:
    dataset.to_csv('file_name.csv', index=False, mode='a', header=True)
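For context, a sketch of the loop this check might live in; num_pages and fetch_page are placeholders for your own API-calling code:
import pandas as pd

for i in range(num_pages):  # hypothetical pagination over API calls
    dataset = pd.DataFrame(fetch_page(i))  # fetch_page is a placeholder
    if i > 0:
        dataset.to_csv('file_name.csv', index=False, mode='a', header=False)
    else:
        dataset.to_csv('file_name.csv', index=False, mode='a', header=True)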
Here's another example that only depends on Python's built-in csv package. This method checks that the header is what's expected, or it throws an error. It also handles the case where the file doesn't exist, or exists but is empty, by writing the header. Hope this helps:
import csv
import os

def append_to_csv(path, fieldnames, rows):
    is_write_header = not os.path.exists(path) or _is_empty_file(path)
    if not is_write_header:
        _assert_field_names_match(path, fieldnames)
    _append_to_csv(path, fieldnames, rows, is_write_header)

def _is_empty_file(path):
    return os.stat(path).st_size == 0

def _assert_field_names_match(path, fieldnames):
    with open(path, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)
    if header != fieldnames:
        raise ValueError(f'Incompatible header: expected {fieldnames}, '
                         f'but existing file has {header}')

def _append_to_csv(path, fieldnames, rows, is_write_header: bool):
    with open(path, 'a') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_write_header:
            writer.writeheader()
        writer.writerows(rows)
You can test this with the following code:
file_ = 'countries.csv'
fieldnames_ = ['name', 'area', 'country_code2', 'country_code3']
rows_ = [
    {'name': 'Albania', 'area': 28748, 'country_code2': 'AL', 'country_code3': 'ALB'},
    {'name': 'Algeria', 'area': 2381741, 'country_code2': 'DZ', 'country_code3': 'DZA'},
    {'name': 'American Samoa', 'area': 199, 'country_code2': 'AS', 'country_code3': 'ASM'}
]

append_to_csv(file_, fieldnames_, rows_)
If you run this once you get the following in countries.csv:
name,area,country_code2,country_code3
Albania,28748,AL,ALB
Algeria,2381741,DZ,DZA
American Samoa,199,AS,ASM
And if you run it twice you get the following (note, no second header):
name,area,country_code2,country_code3
Albania,28748,AL,ALB
Algeria,2381741,DZ,DZA
American Samoa,199,AS,ASM
Albania,28748,AL,ALB
Algeria,2381741,DZ,DZA
American Samoa,199,AS,ASM
If you then change the header in countries.csv and run the program again, you'll get a ValueError, like this:
ValueError: Incompatible header: expected ['name', 'area', 'country_code2', 'country_code3'], but existing file has ['not', 'right', 'fieldnames']

DictReader, No quotes, tabbed file

I have a csv file that looks like this:
Please note, there are no quotes, a tab (\t) is the delimiter, and there is a blank line between the header and the actual content.
Facility No Testing No Name Age
252 2351 Jackrabbit, Jazz 15
345 257 Aardvark, Ethel 41
I think I've tried nearly every possible combination of ideas and parameters:
f = open('/tmp/test', 'r')
csvFile = f.read()
reader = csv.DictReader(csvFile, delimiter='\t', quoting=csv.QUOTE_NONE)
print(reader.fieldnames)
the result of the print is:
['F']
How can I get this into something I can parse to put into a database?
Getting it into a dictionary would be helpful.
What is your csvFile? Is it a string representing your filename starting with 'F'?
csv.DictReader needs an opened file object, not a filename.
Try:
with open(csvFile, 'r', newline='') as f:
    reader = csv.DictReader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    print(reader.fieldnames)
EDIT
If your csvFile is a string containing the whole data, you will have to wrap it in a StringIO (because csv can only access file-like objects, not strings).
Try:
from io import StringIO

# csvFile = 'Facility No\tTesting No\tName\tAge\n\n252\t2351\tJackrabbit, Jazz\t15\n345\t257\tAardvark, Ethel\t41\n'
reader = csv.DictReader(StringIO(csvFile), delimiter='\t', quoting=csv.QUOTE_NONE)
print(reader.fieldnames)
Or, if your edited question opens and reads a file:
with open('/tmp/test', 'r', newline='') as f:
    reader = csv.DictReader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    print(reader.fieldnames)
This works for me.
This might work for you, at least as a start:
>>> import csv
>>> infile = open('/tmp/csvtemp.csv')
>>> csvin = csv.reader(infile, delimiter='\t')
>>> data = [row for row in csvin]
>>> header = data.pop(0)
>>> data.pop(0)  # skip the blank line
[]
>>> for row in data:
...     rowdict = dict(zip(header, row))
...     print(rowdict)
...
{'Age': '15', 'Testing No': '2351', 'Name': 'Jackrabbit, Jazz', 'Facility No': '252'}
{'Age': '41', 'Testing No': '257', 'Name': 'Aardvark, Ethel', 'Facility No': '345'}
From the comments I understand that you get your data via urllib. The response is a binary file-like object, so wrap it in a text wrapper (io.TextIOWrapper from the standard library) before handing it to csv.DictReader:
response = urllib.request.urlopen(URL)
reader = csv.DictReader(io.TextIOWrapper(response), dialect=csv.excel_tab)
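Putting the pieces together for the file in the question, a minimal end-to-end sketch (assuming the tab-separated sample above lives at /tmp/test):
import csv

with open('/tmp/test', newline='') as f:
    reader = csv.DictReader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:  # DictReader skips fully blank rows on its own
        print(row['Name'], row['Age'])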
