Taking *ANY csv column format into Django form - python

My code is currently hard coded to only accept csv files in the column format:
first_name,last_name,phone,email,address,company
However, I would like users to be able to upload csv files with *any* column order and naming scheme and still have them correctly populate our forms. For example:
Email,LastName,FirstName,Company,Phone,Address
would be a valid column format. How would I go about doing that? The relevant code is as follows:
dr = csv.reader(open(tmp_file, 'r'))
data_dict = {}
# skips over the header line in the csv
headers = next(dr)
print(headers)
for row in dr:
    data_dict["first_name"] = row[0]
    data_dict["last_name"] = row[1]
    data_dict["phone"] = row[2]
    data_dict["email"] = row[3]
    data_dict["address"] = row[4]
    data_dict["company"] = row[5]
    # adds to form
    try:
        form = AddContactForm(data_dict)
        if form.is_valid():
            obj = form.save(commit=False)
            obj.owner = request.user.username
            first_name = form.cleaned_data.get("first_name")
            last_name = form.cleaned_data.get("last_name")
            phone = form.cleaned_data.get("phone")
            email = form.cleaned_data.get("email")
            address = form.cleaned_data.get("address")
            company = form.cleaned_data.get("company")
            obj.save()
        else:
            logging.getLogger("error_logger").error(form.errors.as_json())
    except Exception as e:
        logging.getLogger("error_logger").error(repr(e))

headers = "first_name,last_name,email"
headers_array = headers.split(',')
headers_map = {}
for i, column_name in enumerate(headers_array):
headers_map[column_name] = i
#creates {'first_name': 0, 'last_name': 1, 'email': 2}
Now you can now use headers_map to get the row element
row[headers_map['first_name']]
Edit: for those who love one-liners:
headers_map = {column_name: i for i, column_name in enumerate(headers.split(','))}
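Putting that together with the question's loop, here's a minimal sketch (assuming the uploaded file's header row uses the same names as the form fields, in whatever order):
dr = csv.reader(open(tmp_file, 'r'))
headers = next(dr)
headers_map = {column_name: i for i, column_name in enumerate(headers)}
fields = ("first_name", "last_name", "phone", "email", "address", "company")
for row in dr:
    # look each field up by name instead of by a hard-coded position
    data_dict = {field: row[headers_map[field]] for field in fields}
    form = AddContactForm(data_dict)  # then validate and save as before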

There are a number of approaches to handling inconsistent header names in the file. The best approach is to prevent the problem by rejecting such files at upload time, obliging the uploader to correct them. Assuming this isn't possible, you could try to transform the provided headers into what you want:
import csv
import io
import re

with open(tmp_file, 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)

# Make a new header list with placeholders
fixed_headers = [None] * len(headers)
for i, value in enumerate(headers):
    fixed = re.sub(r'(\w+)(?<=[a-z])([A-Z]\w+)', r'\1_\2', value).lower()
    fixed_headers[i] = fixed
The regex finds a capital letter in the middle of the string and inserts an underscore before it; then str.lower() is called on the result (so a value like 'Email' is converted to 'email', and 'FirstName' to 'first_name').
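A quick check of that substitution against the question's headers:
import re

for h in ("Email", "LastName", "FirstName", "Company", "Phone", "Address"):
    print(re.sub(r'(\w+)(?<=[a-z])([A-Z]\w+)', r'\1_\2', h).lower())
# email, last_name, first_name, company, phone, address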
Now rewrite the csv with the fixed headers:
with open(tmp_file, 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the original header row
    new_file = io.StringIO()
    writer = csv.writer(new_file)
    writer.writerow(fixed_headers)
    for row in reader:
        writer.writerow(row)

# Rewind the file pointer
new_file.seek(0)
Use csv.DictReader to get rows as dictionaries of values mapped to headers.
dr = csv.DictReader(new_file)
for data_dict in dr:
    # adds to form
    try:
        form = AddContactForm(data_dict)
        if form.is_valid():
            obj = form.save(commit=False)
            obj.owner = request.user.username
            first_name = form.cleaned_data.get("first_name")
            last_name = form.cleaned_data.get("last_name")
            phone = form.cleaned_data.get("phone")
            email = form.cleaned_data.get("email")
            address = form.cleaned_data.get("address")
            company = form.cleaned_data.get("company")
            obj.save()
        else:
            logging.getLogger("error_logger").error(form.errors.as_json())
    except Exception as e:
        logging.getLogger("error_logger").error(repr(e))
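As an aside (not part of the answer above): csv.DictReader also accepts explicit fieldnames, so a sketch of a shortcut that skips the rewrite-to-StringIO step entirely could look like this:
import csv
import re

with open(tmp_file, 'r', newline='') as f:
    raw_headers = next(csv.reader(f))
    fixed_headers = [re.sub(r'(\w+)(?<=[a-z])([A-Z]\w+)', r'\1_\2', h).lower()
                     for h in raw_headers]
    # the header row has been consumed, so DictReader only sees data rows
    for data_dict in csv.DictReader(f, fieldnames=fixed_headers):
        form = AddContactForm(data_dict)  # validate and save as before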


Extracting case insensitive words from a list in django

I'm extracting values from a csv file and storing them in a list.
The problem I have is that unless there is an exact match, the elements/strings don't get extracted. How would I go about a case-insensitive list search in Django/Python?
def csv_upload_view(request):
    print('file is being uploaded')
    if request.method == 'POST':
        csv_file_name = request.FILES.get('file')
        csv_file = request.FILES.get('file')
        obj = CSV.objects.create(file_name=csv_file)
        result = []
        with open(obj.file_name.path, 'r') as f:
            f.readline()
            reader = csv.reader(f)
            #reader.__next__()
            for row in reader:
                data = str(row).strip().split(',')
                result.append(data)
                transaction_id = data[1]
                product = data[2]
                quantity = data[3]
                customer = data[4]
                date = parse_date(data[5])
                try:
                    product_obj = Product.objects.get(name__iexact=product)
                except Product.DoesNotExist:
                    product_obj = None
                print(product_obj)
    return HttpResponse()
Edit:
The original code, which for some reason doesn't work for me, contained the following iteration:
for row in reader:
    data = "".join(row)
    data = data.split(';')
    data.pop()
which allowed working with the extracted string elements per row. The way I adapted the code, storing the elements in a list (result=[]), makes it impossible to access the elements via the Product model in Django.
The above-mentioned data extraction iteration came from a MacBook, while I'm working on Windows 11 (WSL2, Ubuntu 22.04); is this the reason the Excel data needs to be treated differently?
Edit 2:
OK, I just found this:
If your export file is destined for use on a Macintosh, you should choose the second CSV option. This option results in a CSV file where each record (each line in the file) is terminated with a carriage return, as expected by the Mac
So I guess I need to create a csv file in Mac format to make the first iteration work. Is there a way to make both csv flavours (Windows/Mac) be treated the same, similar to the str(row).strip().lower().split(',') suggestion mentioned above?
If what you're trying to do is simply search for a string case-insensitively, then all you've got to do is lower the case of both your search term and your query (or upper both).
Here's the revised code:
def csv_upload_view(request):
    print('file is being uploaded')
    if request.method == 'POST':
        csv_file_name = request.FILES.get('file')
        csv_file = request.FILES.get('file')
        obj = CSV.objects.create(file_name=csv_file)
        result = []
        with open(obj.file_name.path, 'r') as f:
            f.readline()
            reader = csv.reader(f)
            #reader.__next__()
            for row in reader:
                data = str(row).strip().lower().split(',')
                result.append(data)
                _, transaction_id, product, quantity, customer, date, *_ = data
                date = parse_date(date)
                try:
                    product_obj = Product.objects.get(name__iexact=product)
                except Product.DoesNotExist:
                    product_obj = None
                print(product_obj)
    return HttpResponse()
Then when you're trying to store the data make sure to store it lowercase.
Also, do not split a csv file on , yourself. Use Python's csv library to read the file, since the data itself might contain ,. When writing, set quoting=csv.QUOTE_ALL so that every field is encapsulated with ".
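For example, a minimal sketch (the file name and values here are made up):
import csv

# QUOTE_ALL wraps every field in quotes, so commas inside values survive the round trip
with open('transactions.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerow(['1', 'Widget, large', '3', 'ACME'])

with open('transactions.csv', newline='') as f:
    print(next(csv.reader(f)))  # ['1', 'Widget, large', '3', 'ACME']
As for the Windows/Mac question: opening the file with newline='' and letting the csv module split the records, as above, should also treat \r, \n and \r\n line endings uniformly.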

How to print only the content of a cell in a specific row from a csv file in Python

I'm new to Python, so excuse me if my question is kind of dumb.
I send some data into a csv file (I'm making a password manager). So I send this to the file (in this order): the name of the site, the corresponding e-mail, and finally the password.
I would like to print all the site names already written in the csv file, but here is my problem: for the first row it prints the whole row, while for the following rows it works just fine.
Here is my code; I hope you can help me with this.
csv_file = csv.reader(open('mycsvfile.csv', 'r'), delimiter=';')
try:
    print("Here are all the sites you saved :")
    for row in csv_file:
        print(row[0])
except:
    print("Nothing already saved")
Maybe it can help, but here is how I wrote my data into the csv file:
# I encrypt the email and the password thanks to fernet and an already written key
# I also make sure that the email is valid
file = open('key.key', 'rb')
key = file.read()
file.close()
f = Fernet(key)
website = input("web site name : \n")
restart = True
while restart:
    mail = input("Mail:\n")
    a = isvalidEmail(mail)
    if a == True:
        print("e-mail validated")
        restart = False
    else:
        print("Wrong e-mail")
psw = input("password :\n")
psw_bytes = psw.encode()
mail_bytes = mail.encode()
psw_encrypted_in_bytes = f.encrypt(psw_bytes)
mail_encrypted_in_bytes = f.encrypt(mail_bytes)
mail_encrypted_str = mail_encrypted_in_bytes.decode()
psw_encrypted_str = psw_encrypted_in_bytes.decode()
f = open('a.csv', 'a', newline='')
tup1 = (website, mail_encrypted_str, psw_encrypted_str)
writer = csv.writer(f, delimiter=';')
writer.writerow(tup1)
print("Saved ;)")
f.close()
And here is my output (I have already saved data): first you see the name of the website together with the encrypted e-mail and password, then just the names, which is what I want.
I finally succeeded: instead of using csv.reader, I used csv.DictReader, and since all the names I'm looking for are in the same column, I just have to use the title of that column.
So here is the code:
with open('mycsv.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    print("Websites")
    print("---------------------------------")
    for row in data:
        print(row['The_title_of_my_column'])
Make a list from csv.reader():
rows = [row for row in csv_file]
Now you can get any element by its indices, using rows as a list of lists:
rows[id1][id2]
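For instance, with the question's file (a sketch; the delimiter matches the code that wrote it):
import csv

with open('mycsvfile.csv', newline='') as f:
    rows = [row for row in csv.reader(f, delimiter=';')]
print(rows[0][0])  # the website name of the first saved record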

replace('\n','') does not work for import to .txt format

The place where I put this code belongs to an import page. The data I want to import is in .txt format, but it contains the \n character.
if request.method == "POST":
    txt_file = request.FILES['file']
    if not txt_file.name.endswith('.txt'):
        messages.info(request, 'This is not a txt file')
    data_set = txt_file.read().decode('latin-1')
    io_string = io.StringIO(data_set)
    next(io_string)
    csv_reader = csv.reader(io_string, delimiter='\t', quotechar="|")
    for column in csv_reader:
        b = Module_Name(
            user=request.user,
            a=column[1],
            b=column[2],
            c=column[3],
            d=column[4],
            e=column[5],
            f=column[6],
            g=column[7],
            h=column[8],
        )
        b.save()
    messages.success(request, "Successfully Imported...")
    return redirect("return:return_import")
This can be called the full version of my code. To explain: the data that arrives here as column[1] contains a \n character. This file is a .txt file from another export, and in that export column[1] is:
This is
a value
and my Django localhost gives the warning new-line character seen in unquoted field - do you need to open the file in universal-newline mode? and aborts the import.
The csv reader iterates over rows, not columns. So if you want to append the data from a given column together, you must iterate over all the rows first. For example:
import csv
from io import StringIO

io_string = "this is , r0 c1\r\na value, r1 c2\r\n"
io_string = StringIO(io_string)
rows = csv.reader(io_string)
column_0_data = []
for row in rows:
    column_0_data.append(row[0])
print("".join(column_0_data))
The rest of your code looks iffy to me, but that is off topic.

Extracting records from a csv and filtering by date

I have a csv file where each record is a LinkedIn contact. I have to create another csv file containing only the contacts reached after a specific date (e.g. all the contacts connected to me after 1/04/2017).
So this is my implementation:
def import_from_csv(file):
    key_order = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")
    linkedin_contacts = []
    with open(file, encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',')
        for row in reader:
            single_person = {"FirstName": row["FirstName"], "LastName": row["LastName"],
                             "EmailAddress": row["EmailAddress"], "Company": row["Company"],
                             "ConnectedOn": parser.parse(row["ConnectedOn"])}
            od = OrderedDict((k, single_person[k]) for k in key_order)
            linkedin_contacts.append(od)
    return linkedin_contacts
This first function gives me a list of ordered dicts. I don't know if the way I used to achieve the correct order is good; also, looking at some examples (like here), I'm not using the od.update method, but I don't think I need it. Is that correct?
Now I wrote a second function to filter the list:
def filter_by_date(connections):
    filtered_list = []
    target_date = parser.parse("01/04/2017")
    for row in connections:
        if row["ConnectedOn"] > target_date:
            filtered_list.append(row)
    return filtered_list
Am I doing this correctly?
Is there a way to optimize the code? Thanks
First point: you don't need the OrderedDict at all; just use a csv.DictWriter to write the filtered csv:
fieldnames = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")
with open("/path/to/final.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames)
    writer.writeheader()
    writer.writerows(filtered_contacts)
Second point: you don't need to create a new dict from the one yielded by the csv reader; just update the ConnectedOn key in place:
def import_from_csv(file):
    linkedin_contacts = []
    with open(file, encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',')
        for row in reader:
            row["ConnectedOn"] = parser.parse(row["ConnectedOn"])
            linkedin_contacts.append(row)
    return linkedin_contacts
And finally, if all you have to do is take the source csv, filter out records on ConnectedOn, and write the result, you don't need to load the whole source into memory, create a filtered list (in memory again), and write out the filtered list; you can stream the whole operation:
def filter_csv(source_path, dest_path, date):
    fieldnames = ("FirstName", "LastName", "EmailAddress", "Company", "ConnectedOn")
    target = parser.parse(date)
    with open(source_path, newline="") as source, open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(source)
        writer = csv.DictWriter(dest, fieldnames)
        # if you want a header line with the fieldnames - else comment it out
        writer.writeheader()
        for row in reader:
            row_date = parser.parse(row["ConnectedOn"])
            if row_date > target:
                writer.writerow(row)
And here you are, plain and simple.
NB: I don't know what parser.parse() is, but as other answers mention, you'd probably be better off using the datetime module instead.
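For instance, a hypothetical call, assuming parser is python-dateutil's as in the question (the file names are made up):
from dateutil import parser

# NB: parser.parse('01/04/2017') is month-first by default (January 4th);
# pass dayfirst=True if April 1st is what's meant
filter_csv('connections.csv', 'filtered.csv', '01/04/2017')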
For filtering you could use the filter() function:
def filter_by_date(connections):
    target_date = datetime.strptime("01/04/2017", '%d/%m/%Y').date()
    return list(filter(lambda x: x["ConnectedOn"] > target_date, connections))
And instead of creating a plain dict and then copying its values into an OrderedDict, you could write the values directly to the OrderedDict:
for row in reader:
    od = OrderedDict()
    od["FirstName"] = row["FirstName"]
    od["LastName"] = row["LastName"]
    od["EmailAddress"] = row["EmailAddress"]
    od["Company"] = row["Company"]
    od["ConnectedOn"] = datetime.strptime(row["ConnectedOn"], '%d/%m/%Y').date()
    linkedin_contacts.append(od)
If you know the date format you don't need python-dateutil; you can use the built-in datetime.datetime.strptime() with the needed format.
That's because you haven't specified the format string. Use:
from datetime import datetime

format = '%d/%m/%Y'
date_text = '01/04/2017'
# the inverse operation is datetime.strftime(format)
datetime.strptime(date_text, format)
# ...
# with format as a global
for row in reader:
    od = OrderedDict()
    od["FirstName"] = row["FirstName"]
    od["LastName"] = row["LastName"]
    od["EmailAddress"] = row["EmailAddress"]
    od["Company"] = row["Company"]
    od["ConnectedOn"] = datetime.strptime(row["ConnectedOn"], format)
    linkedin_contacts.append(od)
Do:
def filter_by_date(connections, date_text):
    target_date = datetime.strptime(date_text, format)
    return [x for x in connections if x["ConnectedOn"] > target_date]

Removal of Dictionary from List

I have a CSV file I read in as a list of dictionaries, one for each line. I want to remove all entries in the list that have an EmailAddress of ''. I've tried:
#!/usr/bin/python
import csv

def import_users(location_of_file):
    with open(location_of_file, 'r', newline='', encoding='utf-8-sig') as openfile:
        reader = csv.DictReader(openfile)
        for row in reader:
            yield row

def save_csv(data, location):
    with open(location, 'w', newline='', encoding='utf-8-sig') as file:
        fieldnames = ['EmailAddress', 'GivenName', 'Surname', 'Company', 'Department']
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        for item in data:
            writer.writerow(item)

if __name__ == '__main__':
    users = list(import_users('C:\Temp\Example.csv'))
    for user in users:
        if user['EmailAddress'] == '':
            del user
        else:
            pass
    save_csv(users, 'C:\Temp\Output.csv')
But my results still have the entries with no email address. What am I doing wrong?
Iterating over a data structure that you are modifying is bad practice (and will lead to super annoying bugs); besides, del user here only deletes the loop variable name, it never removes anything from the list. So you should make another list containing only the items you want. You can do this with a loop:
users = list(import_users('C:\Temp\Example.csv'))
filtered_users = []
for user in users:
    if user['EmailAddress'] != '':
        filtered_users.append(user)
save_csv(filtered_users, 'C:\Temp\Output.csv')
Or using Python's filter function:
users = list(import_users('C:\Temp\Example.csv'))
filtered_users = filter(lambda user: user.get('EmailAddress') != '', users)
save_csv(filtered_users, 'C:\Temp\Output.csv')
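One caveat: in Python 3, filter() returns a lazy iterator rather than a list, so filtered_users can only be consumed once; that's fine here, since save_csv iterates it a single time.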
You don't need to ever create the list in memory. You can pass around generators and iterators instead:
if __name__ == '__main__':
    users = import_users('C:/Temp/Example.csv')
    save_csv((user for user in users if user['EmailAddress'] != ''),
             'C:/Temp/Output.csv')
Do not change list items while iterating over them.
Instead of
for user in users:
    if user['EmailAddress'] == '':
        del user
    else:
        pass
do
users = filter(lambda user: user['EmailAddress'] != '', users)
You would probably be better off making a new list rather than removing items:
users = [user for user in users if user['EmailAddress'] != '']
Here is a solution using pandas:
import pandas as pd

# Read csv data
df = pd.read_csv('data.csv')
# Get only the rows having an email address
dfo = df[pd.notnull(df['EmailAddress'])]
# Save to a file
dfo.to_csv('output.csv', index=False)
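This works because pd.read_csv reads empty fields as NaN by default, so pd.notnull() keeps exactly the rows that have an email address.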
