How to write row names while exporting django querysets to csv? - python

The function below exports to CSV correctly, but one thing is missing: a header row for the values.
How do I write a header for those values?
I have quite a large queryset, so I am using StreamingHttpResponse as suggested by the Django documentation, but I couldn't find a way to add a header row.
class Echo:
    def write(self, value):
        return value

def some_streaming_csv_view(request):
    rows = MyModel.objects.values_list("value1", "value2")
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    writer.writerow(["name1", "name2"])  # didn't work
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        status=200,
        content_type="text/csv",
    )
    response["Content-Disposition"] = 'attachment; filename=filname.csv'
    return response

To add the header, write it through the same csv.writer and yield the result before the data rows. Because Echo.write() returns its argument, writer.writerow() returns the formatted CSV line:
def iter_content(rows, headers):
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    # format the header as a CSV line and stream it first
    yield writer.writerow(headers)
    for row in rows:
        yield writer.writerow(row)
and then use it like this:
response = StreamingHttpResponse(
    iter_content(rows, headers),
    status=200,
    content_type="text/csv",
)
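The header-first generator can be tried outside Django. The sketch below is a minimal, framework-free version; the sample rows are made up to stand in for the values_list() queryset, and the name iter_csv is only illustrative.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv(rows, headers):
    """Yield the header line first, then one CSV-formatted line per row."""
    writer = csv.writer(Echo())
    # writerow() returns whatever Echo.write() returns: the formatted line.
    yield writer.writerow(headers)
    for row in rows:
        yield writer.writerow(row)


# Hypothetical sample data standing in for values_list("value1", "value2").
sample_rows = [(1, "a"), (2, "b, c")]
lines = list(iter_csv(sample_rows, ["name1", "name2"]))
for line in lines:
    print(repr(line))
```

Passing such a generator to StreamingHttpResponse sends the header as the first chunk, followed by the data rows.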

Related

Extracting case insensitive words from a list in django

I'm extracting values from a csv file and storing these in a list.
The problem I have is that unless there is an exact match the elements/strings don't get extracted. How would I go about a case insensitive list search in Django/Python?
def csv_upload_view(request):
    print('file is being uploaded')
    if request.method == 'POST':
        csv_file_name = request.FILES.get('file')
        csv_file = request.FILES.get('file')
        obj = CSV.objects.create(file_name=csv_file)
        result = []
        with open(obj.file_name.path, 'r') as f:
            f.readline()
            reader = csv.reader(f)
            #reader.__next__()
            for row in reader:
                data = str(row).strip().split(',')
                result.append(data)
                transaction_id = data[1]
                product = data[2]
                quantity = data[3]
                customer = data[4]
                date = parse_date(data[5])
                try:
                    product_obj = Product.objects.get(name__iexact=product)
                except Product.DoesNotExist:
                    product_obj = None
                print(product_obj)
    return HttpResponse()
Edit:
The original code, which for some reason doesn't work for me, contained the following iteration:
for row in reader:
    data = "".join(row)
    data = data.split(';')
    data.pop()
which makes it possible to work with the extracted string elements per row. The way I adapted the code, storing the elements in a list (result = []), makes it impossible to look up the elements through the Product model in Django.
The data extraction loop above was written on a MacBook, while I'm working on Windows 11 (WSL2, Ubuntu 22.04). Is this the reason that the Excel data needs to be treated differently?
Edit 2:
Ok, I just found this
If your export file is destined for use on a Macintosh, you should choose the second CSV option. This option results in a CSV file where each record (each line in the file) is terminated with a carriage return, as expected by the Mac
So I guess I need to create a csv file in Mac format to make the first iteration work. Is there a way to make both csv (Windows/Mac) be treated the same? Similar to the mentioned str(row).strip().lower().split(',') suggestion?
If all you're trying to do is a case-insensitive string search, then all you have to do is lower-case (or upper-case) both the value you search for and the values you search in.
Here's the revised code:
def csv_upload_view(request):
    print('file is being uploaded')
    if request.method == 'POST':
        csv_file_name = request.FILES.get('file')
        csv_file = request.FILES.get('file')
        obj = CSV.objects.create(file_name=csv_file)
        result = []
        with open(obj.file_name.path, 'r') as f:
            f.readline()
            reader = csv.reader(f)
            #reader.__next__()
            for row in reader:
                data = str(row).strip().lower().split(',')
                result.append(data)
                _, transaction_id, product, quantity, customer, date, *_ = data
                date = parse_date(date)
                try:
                    product_obj = Product.objects.get(name__iexact=product)
                except Product.DoesNotExist:
                    product_obj = None
                print(product_obj)
    return HttpResponse()
Then, when you store the data, make sure to store it in lowercase.
Also, do not split a CSV line on "," yourself. Let Python's csv module parse the file instead, since a field may itself contain a comma. When writing such files, set the quoting option (for example csv.QUOTE_ALL) so that every field is wrapped in double quotes.
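As a rough sketch of that advice, the function below lets the csv module do the splitting and normalises each cell afterwards. The function name read_rows and the sample text are made up for illustration. Reading with newline="" leaves line-ending handling to the csv module, so files with Mac-style "\r" and Windows-style "\r\n" endings parse the same way.

```python
import csv
import io


def read_rows(csv_text, delimiter=","):
    """Parse CSV text and return the data rows with stripped, lower-cased cells."""
    # newline="" passes \r, \n and \r\n through untouched so csv can interpret them.
    stream = io.StringIO(csv_text, newline="")
    reader = csv.reader(stream, delimiter=delimiter)
    next(reader, None)  # skip the header row
    return [[cell.strip().lower() for cell in row] for row in reader]


# Hypothetical sample exports with different line endings.
mac_style = "id,product\r1, Apple \r2,BANANA\r"
windows_style = "id,product\r\n1, Apple \r\n2,BANANA\r\n"

print(read_rows(mac_style))
print(read_rows(windows_style))
```

With a real upload, the same idea would apply to open(path, newline="") instead of io.StringIO. For the semicolon-separated file from the question's edit, delimiter=";" would be passed.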

passing value from panda dataframe to http request

I'm not sure how I should ask this question. I'm looping through a csv file using pandas (at least I think so). As I loop through the rows, I want to pass a value from a specific column to an HTTP request for each row.
Here is my code so far:
def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        df = pd.read_csv(f,)
        value = df[['ID']].to_string(index=False)
        print(value)
        response = requests.get(
            REQUEST_URL + value,
            headers={'accept': 'application/json', 'ClientToken': TOKEN}
        )
        json_response = response.json()
        print(json_response)
As you can see, I'm looping through the csv file to get the ID to pass it to my request url.
I'm not sure I understand the issue but looking at the console log it seems that print(value) is in the loop when the response request is not. In other words, in the console log I'm seeing all the ID printed but I'm seeing only one http request which is empty (probably because the ID is not correctly passed to it).
I'm running my script with cloud functions.
Actually, you can forgo the Pandas library and simply iterate through the file with the csv module:
import csv
def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    # open in text mode, since csv.reader expects strings rather than bytes
    with fs.open('gs://project.appspot.com/file.csv', 'r') as f:
        reader = csv.reader(f)
        next(reader, None)  # SKIP HEADERS
        for row in reader:  # LOOP THROUGH GENERATOR (NOT PANDAS SERIES)
            value = row[0]  # SELECT FIRST COLUMN (ASSUMED ID)
            response = requests.get(
                REQUEST_URL + value,
                headers={'accept': 'application/json', 'ClientToken': TOKEN }
            )
            json_response = response.json()
            print(json_response)
Give this a try instead:
def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        df = pd.read_csv(f)
    for value in df['ID']:
        response = requests.get(
            REQUEST_URL + str(value),  # str() in case the IDs were parsed as numbers
            headers = {'accept': 'application/json', 'ClientToken': TOKEN }
        )
        json_response = response.json()
        print(json_response)
As mentioned in my comment, you haven't iterated through the data. What you are seeing is just its string representation, with line breaks, which might be why you thought you were looping.
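Both answers come down to producing one value per row and making one request for each. The sketch below shows only the per-row part, without any network call, and picks the column by its header name using csv.DictReader. The base URL and the sample data are hypothetical, since the real REQUEST_URL and file contents are not shown in the question.

```python
import csv
import io

# Hypothetical base URL; the original REQUEST_URL is not shown in the question.
REQUEST_URL = "https://api.example.com/items/"


def build_request_urls(csv_file):
    """Return one URL per data row, using the column named "ID"."""
    reader = csv.DictReader(csv_file)
    # Each row is a dict keyed by the header names, so no column index is needed.
    return [REQUEST_URL + row["ID"].strip() for row in reader]


# Made-up sample standing in for the file stored in the bucket.
sample = io.StringIO("ID,name\n101,first\n102,second\n")
urls = build_request_urls(sample)
for url in urls:
    print(url)
```

Each URL in the list would then be passed to its own requests.get() call inside the loop.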

proper use of class (csv reader example)

I've done the following CSV reader class:
class CSVread(object):
    filtered = []

    def __init__(self, file):
        self.file = file

    def get_file(self):
        try:
            with open(self.file, "r") as f:
                self.reader = [row for row in csv.reader(f, delimiter = ";")]
                return self.reader
        except IOError as err:
            print("I/O error({0}): {1}".format(errno, strerror))
        return

    def get_num_rows(self):
        print(sum(1 for row in self.reader))
Which can be used with the following example:
datacsv = CSVread("data.csv") # ; separated file
for row in datacsv.get_file(): # prints all the rows
    print(row)
datacsv.get_num_rows() # number of rows in data.csv
My goal is to filter out the content of the csv file (data.csv) by filtering column 12 by the keyword "00GG". I can get it to work outside the class like this:
with open("data.csv") as csvfile:
    reader = csv.reader(csvfile, delimiter = ";")
    filtered = []
    filtered = filter((lambda row: row[12] in ("00GG")), list(reader))
Code below returns an empty list (filtered) when it's defined inside the class:
def filter_data(csv_file):
    filtered = filter((lambda row: row[12] in ("00GGL")), self.reader)
    return filtered
Feedback for the existing code is also appreciated.
Could it be that in the first filter example you are searching for 00GG whereas in the second one you are searching for 00GGL?
Regardless, if you want to define filter_data() within the class, you should write it as a method of the class. That means that it takes a self parameter, not a csv_file:
def filter_data(self):
    filtered = filter((lambda row: row[12] in ("00GGL")), self.reader)
    return filtered
Making it more general:
def filter_data(self, column, values):
    return filter((lambda row: row[column] in values), self.reader)
Now you can call it like this:
datacsv.filter_data(12, ('00GGL',))
which should work if the input data does indeed contain rows with 00GGL in column 12.
Note that filter_data() should only be called after get_file() otherwise there is no self.reader. Unless you have a good reason not to read in the data when the CSVread object is created (e.g. you are aiming for lazy evaluation), you should read it in then. Otherwise, set self.reader = [] which will prevent failure in other methods.
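Putting those suggestions together, one possible shape for the class is sketched below. It reads the file when the object is created and always defines the row list, so the other methods cannot fail on a missing attribute. The class name CSVRead, the attribute rows, and the tiny two-column sample file are assumptions made for this illustration, not the asker's actual data.

```python
import csv
import os
import tempfile


class CSVRead:
    """Read a delimited file eagerly and offer simple row filtering."""

    def __init__(self, path, delimiter=";"):
        self.path = path
        self.rows = []  # always defined, even if the file cannot be read
        try:
            with open(path, newline="") as f:
                self.rows = list(csv.reader(f, delimiter=delimiter))
        except OSError as err:
            print("I/O error({0}): {1}".format(err.errno, err.strerror))

    def num_rows(self):
        return len(self.rows)

    def filter_data(self, column, values):
        """Return the rows whose cell in `column` is one of `values`."""
        return [
            row for row in self.rows
            if len(row) > column and row[column] in values
        ]


# Made-up sample file; the keyword sits in column 1 here instead of column 12.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", newline="") as f:
        f.write("a;00GGL\nb;00GG\nc;00GGL\n")
    datacsv = CSVRead(path)
    matches = datacsv.filter_data(1, ("00GGL",))
    total = datacsv.num_rows()

print(matches, total)
```

Passing the values as a tuple such as ("00GGL",) makes the test an exact match against each value, whereas a bare string like ("00GGL") would turn it into a substring test.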

How to return one generated line from csv?

I'd like to yield results from a large array (coming from the database) to the browser (with Flask) using the method shared in their documentation :
@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield ','.join(row) + '\n'
    return Response(generate(), mimetype='text/csv')
With a twist: instead of generating the CSV myself (joining with ',' and adding a line break), I'd like to use the csv package.
So far, the only way I have found to return a single written line is the following:
@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            dest = io.StringIO()
            writer = csv.writer(dest)
            writer.writerow(row)
            yield dest.getvalue()
    return Response(generate(), mimetype='text/csv')
But creating a new io.StringIO and csv.writer for every row just does not seem right at all!
I took a look at the documentation of the package, but I wasn't able to find anything that returns only one line.
You can do it easily with a custom file object. If you create an object whose write method simply stores its input and pass it to a csv writer, you are done:
class keep_writer:
    def write(self, txt):
        self.txt = txt

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        kw = keep_writer()
        wr = csv.writer(kw)  # add optional configuration for the csv.writer
        for row in iter_all_rows():
            wr.writerow(row)  # just write the row
            yield kw.txt  # and yield the line build by the csv.writer
    return Response(generate(), mimetype='text/csv')
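To see why going through csv.writer is worth the extra object, the sketch below runs the same store-the-last-write idea without Flask and compares it with a plain ','.join on rows that contain commas and quotes. The sample rows and the names KeepWriter and generate are made up for the illustration.

```python
import csv


class KeepWriter:
    """File-like object that remembers only the most recent write."""

    def write(self, txt):
        self.txt = txt


def generate(rows):
    """Yield one properly quoted CSV line per row, reusing a single writer."""
    kw = KeepWriter()
    wr = csv.writer(kw)
    for row in rows:
        wr.writerow(row)  # csv.writer hands the finished line to kw.write()
        yield kw.txt


# Hypothetical rows; two of them contain characters that need quoting.
rows = [["1", "plain"], ["2", "has, comma"], ["3", 'has "quote"']]
naive = [",".join(row) + "\n" for row in rows]
quoted = list(generate(rows))

print(naive[1], end="")   # the embedded comma now looks like a third column
print(quoted[1], end="")  # the field is wrapped in double quotes
```

The naive join silently produces a row with an extra column, while the csv.writer output keeps the field intact by quoting it.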

Streaming a CSV file in Django

I am attempting to stream a csv file as an attachment download. The CSV files are getting to be 4MB in size or more, and I need a way for the user to actively download the files without waiting for all of the data to be created and committed to memory first.
I first used my own file wrapper based on Django's FileWrapper class. That failed. Then I saw a method here for using a generator to stream the response:
How to stream an HttpResponse with Django
When I raise an error within the generator, I can see that I am creating the proper data with the get_row_data() function, but when I try to return the response it comes back empty. I've also disabled the Django GZipMiddleware. Does anyone know what I'm doing wrong?
Edit: The issue I was having was with the ConditionalGetMiddleware. I had to replace it, the code is in an answer below.
Here is the view:
from django.views.decorators.http import condition

@condition(etag_func=None)
def csv_view(request, app_label, model_name):
    """ Based on the filters in the query, return a csv file for the given model """
    #Get the model
    model = models.get_model(app_label, model_name)
    #if there are filters in the query
    if request.method == 'GET':
        #if the query is not empty
        if request.META['QUERY_STRING'] != None:
            keyword_arg_dict = {}
            for key, value in request.GET.items():
                #get the query filters
                keyword_arg_dict[str(key)] = str(value)
            #generate a list of row objects, based on the filters
            objects_list = model.objects.filter(**keyword_arg_dict)
        else:
            #get all the model's objects
            objects_list = model.objects.all()
    else:
        #get all the model's objects
        objects_list = model.objects.all()
    #create the response object with a csv mimetype
    response = HttpResponse(
        stream_response_generator(model, objects_list),
        mimetype='text/plain',
    )
    response['Content-Disposition'] = "attachment; filename=foo.csv"
    return response
Here is the generator I use to stream the response:
def stream_response_generator(model, objects_list):
    """Streaming function to return data iteratively """
    for row_item in objects_list:
        yield get_row_data(model, row_item)
        time.sleep(1)
And here is how I create the csv row data:
def get_row_data(model, row):
    """Get a row of csv data from an object"""
    #Create a temporary csv handle
    csv_handle = cStringIO.StringIO()
    #create the csv output object
    csv_output = csv.writer(csv_handle)
    value_list = []
    for field in model._meta.fields:
        #if the field is a related field (ForeignKey, ManyToMany, OneToOne)
        if isinstance(field, RelatedField):
            #get the related model from the field object
            related_model = field.rel.to
            for key in row.__dict__.keys():
                #find the field in the row that matches the related field
                if key.startswith(field.name):
                    #Get the unicode version of the row in the related model, based on the id
                    try:
                        entry = related_model.objects.get(
                            id__exact=int(row.__dict__[key]),
                        )
                    except:
                        pass
                    else:
                        value = entry.__unicode__().encode("utf-8")
                        break
        #if it isn't a related field
        else:
            #get the value of the field
            if isinstance(row.__dict__[field.name], basestring):
                value = row.__dict__[field.name].encode("utf-8")
            else:
                value = row.__dict__[field.name]
        value_list.append(value)
    #add the row of csv values to the csv file
    csv_output.writerow(value_list)
    #Return the string value of the csv output
    return csv_handle.getvalue()
Here's some simple code that'll stream a CSV; you can probably go from this to whatever you need to do:
import cStringIO as StringIO
import csv

# named csv_view so that the view does not shadow the csv module
def csv_view(request):
    def data():
        for i in xrange(10):
            csvfile = StringIO.StringIO()
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow([i,"a","b","c"])
            yield csvfile.getvalue()
    response = HttpResponse(data(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
This simply writes each row to an in-memory file, reads the row and yields it.
This version is more efficient for generating bulk data, but be sure to understand the above before using it:
import cStringIO as StringIO
import csv

# named csv_view so that the view does not shadow the csv module
def csv_view(request):
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)

    def read_and_flush():
        # read everything written so far, then empty the buffer for reuse
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data

    def data():
        for i in xrange(10):
            csvwriter.writerow([i,"a","b","c"])
            data = read_and_flush()
            yield data

    response = HttpResponse(data(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
The middleware issue has been solved as of Django 1.5 and a StreamingHttpResponse has been introduced. The following should do:
import cStringIO as StringIO
import csv

def csv_view(request):
    ...
    # Assume `rows` is an iterator or lists
    def stream():
        buffer_ = StringIO.StringIO()
        writer = csv.writer(buffer_)
        for row in rows:
            writer.writerow(row)
            buffer_.seek(0)
            data = buffer_.read()
            buffer_.seek(0)
            buffer_.truncate()
            yield data
    response = StreamingHttpResponse(
        stream(), content_type='text/csv'
    )
    disposition = "attachment; filename=file.csv"
    response['Content-Disposition'] = disposition
    return response
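The cStringIO module used above exists only in Python 2. As a hedged sketch of the same single-buffer technique on Python 3, the generator below uses io.StringIO and can be run without Django; the function name stream_csv and the sample rows are assumptions for the illustration.

```python
import csv
import io


def stream_csv(rows):
    """Yield one CSV line per row while reusing a single in-memory buffer."""
    buffer_ = io.StringIO()
    writer = csv.writer(buffer_)
    for row in rows:
        writer.writerow(row)
        data = buffer_.getvalue()
        # Rewind and clear so the next row starts from an empty buffer.
        buffer_.seek(0)
        buffer_.truncate()
        yield data


# Made-up rows mirroring the [i, "a", "b", "c"] rows used in the answers.
chunks = list(stream_csv([[i, "a", "b", "c"] for i in range(3)]))
for chunk in chunks:
    print(repr(chunk))
```

In a Django view, such a generator would be passed to StreamingHttpResponse(stream_csv(rows), content_type='text/csv') in place of the stream() function.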
There's some documentation on how to output csv from Django but it doesn't take advantage of the StreamingHttpResponse so I went ahead and opened a ticket in order to track it.
The problem I was having was with the ConditionalGetMiddleware. I saw django-piston come up with a replacement middleware for the ConditionalGetMiddleware that allows streaming:
from django.middleware.http import ConditionalGetMiddleware

def compat_middleware_factory(klass):
    """
    Class wrapper that only executes `process_response`
    if `streaming` is not set on the `HttpResponse` object.
    Django has a bad habit of looking at the content,
    which will prematurely exhaust the data source if we're
    using generators or buffers.
    """
    class compatwrapper(klass):
        def process_response(self, req, resp):
            if not hasattr(resp, 'streaming'):
                return klass.process_response(self, req, resp)
            return resp
    return compatwrapper

ConditionalMiddlewareCompatProxy = compat_middleware_factory(ConditionalGetMiddleware)
So then you will replace ConditionalGetMiddleware with your ConditionalMiddlewareCompatProxy middleware, and in your view (borrowed code from a clever answer to this question):
def csv_view(request):
    def data():
        for i in xrange(10):
            csvfile = StringIO.StringIO()
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow([i,"a","b","c"])
            yield csvfile.getvalue()
    #create the response object with a csv mimetype
    response = HttpResponse(
        data(),
        mimetype='text/csv',
    )
    #Set the response as an attachment with a filename
    response['Content-Disposition'] = "attachment; filename=test.csv"
    response.streaming = True
    return response
