I have a CSV file that I read in as a list of dictionaries, one per row. I want to remove all entries in the list that have an EmailAddress of ''. I've tried:
#!/usr/bin/python
import csv

def import_users(location_of_file):
    with open(location_of_file, 'r', newline='', encoding='utf-8-sig') as openfile:
        reader = csv.DictReader(openfile)
        for row in reader:
            yield row

def save_csv(data, location):
    with open(location, 'w', newline='', encoding='utf-8-sig') as file:
        fieldnames = ['EmailAddress', 'GivenName', 'Surname', 'Company', 'Department']
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        for item in data:
            writer.writerow(item)

if __name__ == '__main__':
    users = list(import_users('C:\Temp\Example.csv'))
    for user in users:
        if user['EmailAddress'] == '':
            del user
        else:
            pass
    save_csv(users, 'C:\Temp\Output.csv')
But my results still have the entries with no email address. What am I doing wrong?
Iterating over a data structure while you are modifying it is bad practice (and will lead to super annoying bugs). Also, del user only deletes the loop variable's reference; the dictionary itself stays in the list. So you should make another list containing only the items you want. You can do this with a loop:
users = list(import_users('C:\Temp\Example.csv'))
filtered_users = []
for user in users:
    if user['EmailAddress'] != '':
        filtered_users.append(user)
save_csv(filtered_users, 'C:\Temp\Output.csv')
Or using Python's built-in filter function:
users = list(import_users('C:\Temp\Example.csv'))
filtered_users = filter(lambda user: user.get('EmailAddress') != '', users)
save_csv(filtered_users, 'C:\Temp\Output.csv')
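Note that in Python 3, filter returns a lazy iterator rather than a list; that is fine here because save_csv only iterates over it once, but materialize it with list() if you need to reuse the result. A minimal sketch with made-up rows:

```python
# Hypothetical sample data standing in for rows read from the CSV.
users = [
    {'EmailAddress': 'a@example.com', 'GivenName': 'Ann'},
    {'EmailAddress': '', 'GivenName': 'Bob'},
]

filtered = filter(lambda user: user['EmailAddress'] != '', users)

# filter() yields lazily; materialize it to inspect or reuse the result.
kept = list(filtered)
print([u['GivenName'] for u in kept])  # ['Ann']
```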
You don't ever need to create the list in memory. You can pass around generators and iterators instead:
if __name__ == '__main__':
    users = import_users('C:/Temp/Example.csv')
    save_csv((user for user in users if user['EmailAddress'] != ''),
             'C:/Temp/Output.csv')
Do not change a list's items while iterating over it.
Instead of:

for user in users:
    if user['EmailAddress'] == '':
        del user
    else:
        pass

do:
users = filter(lambda user: user['EmailAddress'] != '', users)
You would probably be better off making a new list rather than removing items:
users = [user for user in users if user['EmailAddress'] != '']
Here is a solution using pandas:
import pandas as pd

# Read the csv data; pandas parses empty EmailAddress cells as NaN
df = pd.read_csv('data.csv')

# Keep only the rows that have an email address
dfo = df[pd.notnull(df['EmailAddress'])]

# Save to a file
dfo.to_csv('output.csv', index=False)
I'm new to Python so excuse me if my question is kind of dumb.
I save some data into a csv file (I'm making a password manager). I write to this file, in this order: the name of the site, the corresponding e-mail, and finally the password.
I would like to print all the site names already written in the csv file, but here is my problem: for the first row it prints the whole row, while for the following rows it works just fine.
Here is my code, I hope you can help me with this.
csv_file = csv.reader(open('mycsvfile.csv', 'r'), delimiter=';')
try:
    print("Here are all the sites you saved :")
    for row in csv_file:
        print(row[0])
except:
    print("Nothing already saved")
Maybe it can help, but here is how I wrote my data into the csv file:
# I encrypt the email and the password thanks to fernet and an already written key
# I also make sure that the email is valid
file = open('key.key', 'rb')
key = file.read()
file.close()
f = Fernet(key)
website = input("web site name : \n")
restart = True
while restart:
    mail = input("Mail:\n")
    a = isvalidEmail(mail)
    if a == True:
        print("e-mail validated")
        restart = False
    else:
        print("Wrong e-mail")
psw = input("password :\n")
psw_bytes = psw.encode()
mail_bytes = mail.encode()
psw_encrypted_in_bytes = f.encrypt(psw_bytes)
mail_encrypted_in_bytes = f.encrypt(mail_bytes)
mail_encrypted_str = mail_encrypted_in_bytes.decode()
psw_encrypted_str = psw_encrypted_in_bytes.decode()
f = open('a.csv', 'a', newline='')
tup1 = (website, mail_encrypted_str, psw_encrypted_str)
writer = csv.writer(f, delimiter=';')
writer.writerow(tup1)
print("Saved ;)")
f.close()
return
And here is my output (I have already saved data):
Output: first you see the name of the website with the email and the password encrypted, then just the name, which is what I want.
I finally succeeded: instead of using csv.reader, I used csv.DictReader, and since all the names I'm looking for are in the same column, I just have to use the title of that column.
So here is the code :
with open('mycsv.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    print("Websites")
    print("---------------------------------")
    for row in data:
        print(row['The_title_of_my_column'])
Make a list from csv.reader():

rows = [row for row in csv_file]

Now you can get an element by row and column index, using rows as a list of lists:

rows[id1][id2]
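For example, with a small made-up file (io.StringIO standing in for an open file handle):

```python
import csv
import io

# io.StringIO stands in for an open file; the contents are hypothetical.
csv_file = csv.reader(io.StringIO("site;mail;psw\nexample.com;a@b.c;secret\n"),
                      delimiter=';')
rows = [row for row in csv_file]

# rows is a list of lists: rows[row_index][column_index]
print(rows[0])     # ['site', 'mail', 'psw']
print(rows[1][0])  # example.com
```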
How do I get the screen names from a list of Twitter IDs? I have 38194 IDs saved in a pandas dataframe that I wish to match to their screen names so I can do a network analysis. I am using Python, but I am quite new to coding, so I do not know if this is even possible. I have tried the following:
myIds = friend_list
if myIds:
    myIds = myIds.replace(' ', '')
    myIds = myIds.split(',')
    # Set a new list object
    myHandleList = []
    i = 0
    # Loop through the list of user IDs
    for idnumber in myIds:
        u = api.get_user(myIds[i])
        uid = u.screen_name
        myHandleList.append(uid)
        i = i + 1
    # Print the lists
    print('Twitter-Ids', myIds)
    print('Usernames', myHandleList)
    # Set a filename based on current time
    csvfilename = "csvoutput-" + time.strftime("%Y%m%d-%H%M%S") + ".csv"
    print('We also outputted a CSV-file named ' + csvfilename + ' to your parent directory')
    with open(csvfilename, 'w') as myfile:
        wr = csv.writer(myfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        wr.writerow(['username', 'twitter-id'])
        j = 0
        for handle in myHandleList:
            writeline = myHandleList[j], myIds[j]
            wr.writerow(writeline)
            j = j + 1
else:
    print('The input was empty')
Updating your loop, as I believe you are pretty close.
myHandleList = []
myIds = ['1031291359', '960442381']
for idnumber in myIds:
    u = api.get_user(idnumber)
    myHandleList.append(u.screen_name)
print(myHandleList)
Hey, I am using easygui and appending the user input to an Excel-readable csv file. However, the user input is continuously appended to the same line, not to the next line.
Here is my code:
# Adding a User
msg = 'Adding your information'
title = 'Uk Users'
box_names = ["Email", "Password"]
box_values = easygui.multpasswordbox(msg, title, box_names)
while box_values[0] == '' or box_values[1] == '':
    msg = 'Try again'
    title = 'you missed a box, please try again!'
    box_names_1 = ["Email", "Password"]
    box_values = str(easygui.multpasswordbox(msg, title, box_names_1))
    # How to make it repeat?
else:
    for i in range(len(box_values)):
        box_values = str(box_values)
        f = open('USERS.csv', 'a')  # 'a' is for appending
        f.write(box_values)  # How to add something to a new line?
You could use the csv library, defining a writer object to write to your file. You would need to replace the else statement with something like this:
else:
    with open('USERS.csv', 'a', newline='') as csvFile:
        csvWriter = csv.writer(csvFile, delimiter=',')
        csvWriter.writerow(box_values)
The writerow method automatically puts each new set of data into a new row. Also, you don't need to explicitly convert your data to a string.
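A minimal sketch (with made-up credentials, and io.StringIO standing in for the open file) showing that each writerow call terminates its row, so successive calls land on separate lines:

```python
import csv
import io

# io.StringIO stands in for the open CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',')
writer.writerow(['a@example.com', 'hunter2'])   # first row
writer.writerow(['b@example.com', 'passw0rd'])  # second row, on its own line

print(buffer.getvalue().splitlines())
# ['a@example.com,hunter2', 'b@example.com,passw0rd']
```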
My code is currently hard coded to only accept csv files in the column format:
first_name,last_name,phone,email,address,company
However, I would like users to be able to upload csv files whose columns may be in any order, with any naming scheme, and still correctly populate our forms. For example:
Email,LastName,FirstName,Company,Phone,Address
would be a valid column format. How would I go about doing that? Relevant code as follows:
dr = csv.reader(open(tmp_file, 'r'))
data_dict = {}
headers = next(dr)
print(headers)
# skips over first line in csv
iterlines = iter(dr)
next(iterlines)
for row in iterlines:
    data_dict["first_name"] = row[0]
    data_dict["last_name"] = row[1]
    data_dict["phone"] = row[2]
    data_dict["email"] = row[3]
    data_dict["address"] = row[4]
    data_dict["company"] = row[5]
    # adds to form
    try:
        form = AddContactForm(data_dict)
        if form.is_valid():
            obj = form.save(commit=False)
            obj.owner = request.user.username
            first_name = form.cleaned_data.get(data_dict["first_name"])
            last_name = form.cleaned_data.get(data_dict["last_name"])
            phone = form.cleaned_data.get(data_dict["phone"])
            email = form.cleaned_data.get(data_dict["email"])
            address = form.cleaned_data.get(data_dict["address"])
            company = form.cleaned_data.get(data_dict["company"])
            obj.save()
        else:
            logging.getLogger("error_logger").error(form.errors.as_json())
    except Exception as e:
        logging.getLogger("error_logger").error(repr(e))
        pass
You can build a map from header name to column index:

headers = "first_name,last_name,email"
headers_array = headers.split(',')
headers_map = {}
for i, column_name in enumerate(headers_array):
    headers_map[column_name] = i
# creates {'first_name': 0, 'last_name': 1, 'email': 2}
Now you can use headers_map to get a row element by column name:
row[headers_map['first_name']]
Edit: for those who love one-liners:
headers_map = {column_name: i for i, column_name in enumerate(headers.split(','))}
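Putting it together with a hypothetical header line and data row:

```python
# Hypothetical header line and data row from an uploaded file.
headers = "Email,LastName,FirstName"
headers_map = {column_name: i for i, column_name in enumerate(headers.split(','))}

row = ['jane@example.com', 'Doe', 'Jane']

# Look up columns by name, regardless of their position in the file.
print(row[headers_map['FirstName']])  # Jane
print(row[headers_map['Email']])      # jane@example.com
```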
There are a number of approaches to handling inconsistent header names in the file. The best approach is to prevent the problem by rejecting such files at upload time, obliging the uploader to correct them. Assuming this isn't possible, you could try to transform the provided headers into what you want:
import csv
import io
import re

with open(tmp_file, 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)
    # Make a new header list with placeholders
    fixed_headers = [None] * len(headers)
    for i, value in enumerate(headers):
        fixed = re.sub(r'(\w+)(?<=[a-z])([A-Z]\w+)', r'\1_\2', value).lower()
        fixed_headers[i] = fixed
The regex finds a capital letter in the middle of a string and inserts an underscore before it; then str.lower is called on the result (so a value like 'Email' becomes 'email', and 'LastName' becomes 'last_name').
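A quick sketch of how that substitution behaves on a few header styles (the normalize helper is just for illustration):

```python
import re

def normalize(header):
    # Insert an underscore before an interior capital, then lowercase everything.
    return re.sub(r'(\w+)(?<=[a-z])([A-Z]\w+)', r'\1_\2', header).lower()

print(normalize('LastName'))   # last_name
print(normalize('FirstName'))  # first_name
print(normalize('Email'))      # email
```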
Now rewrite the csv with the fixed headers:
with open(tmp_file, 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the original header row
    new_file = io.StringIO()
    writer = csv.writer(new_file)
    writer.writerow(fixed_headers)
    for row in reader:
        writer.writerow(row)

# Rewind the file pointer
new_file.seek(0)
Use csv.DictReader to get rows as dictionaries of values mapped to headers.
dr = csv.DictReader(new_file)
for data_dict in dr:
    # adds to form
    try:
        form = AddContactForm(data_dict)
        if form.is_valid():
            obj = form.save(commit=False)
            obj.owner = request.user.username
            first_name = form.cleaned_data.get("first_name")
            last_name = form.cleaned_data.get("last_name")
            phone = form.cleaned_data.get("phone")
            email = form.cleaned_data.get("email")
            address = form.cleaned_data.get("address")
            company = form.cleaned_data.get("company")
            obj.save()
        else:
            logging.getLogger("error_logger").error(form.errors.as_json())
    except Exception as e:
        logging.getLogger("error_logger").error(repr(e))
I'm using a for loop to add a dictionary to a list of dictionaries (which I'm calling data_file), using data_file.append()
It works fine :) But then later I try to add some more dictionaries to data_file in another for loop and I use data_file.append() again and it doesn't work. It doesn't add those dictionaries to data_file
Does anyone know what I'm doing wrong?
I don't get an error. It just produces a file that only has the dictionaries from massage_generators. It doesn't take on anything from travel_generators.
I've even tried commenting out the first for loop, the one for massage_generators, and in that case it does add in the travel_generators dictionaries. It's like I can't use .append() twice?
Any help would be much appreciated please!
I'm sorry it's not very elegant, I'm only just learning this coding stuff! :)
Thanks
import csv
import copy
import os
import sys

generators = list()
for filename in os.listdir(os.getcwd()):
    root, ext = os.path.splitext(filename)
    if root.startswith('Travel Allowance Auto Look up') and ext == '.csv':
        travel = filename

open_file = csv.DictReader(open(travel))
generators.append(open_file)
travel_generators = generators[:]
massage_generators = generators[:]

data_file = []  # result will be stored here (list of dicts)
travel_remap = {'FINAL_TRAVEL_ALL': 'AMOUNT'}
massage_remap = {'MASSAGE_BIK': 'AMOUNT'}

for generator in massage_generators:
    for dictionary in generator:
        dictionary['PAYMENT_TYPE_CODE'] = 'MASSAGE_BIK'
        dictionary['COMMENT'] = 'Massage BIK'
        dictionary['IS_GROSS'] = 'FALSE'
        dictionary['PAYMENT_TO_DATE'] = '01/01/2099'
        dictionary['PAID MANUALLY'] = 'FALSE'
        for old_key, new_key in massage_remap.iteritems():
            if old_key not in dictionary:
                continue
            dictionary['AMOUNT'] = dictionary['MASSAGE_BIK']
            del dictionary['MASSAGE_BIK']
        if (dictionary['AMOUNT'] != '0' and dictionary['AMOUNT'] != ''):
            data_file.append(dictionary)

for generator in travel_generators:
    for dictionary in generator:
        dictionary['PAYMENT_TYPE_CODE'] = 'TRANSPORTATION_ALLOWANCE'
        dictionary['COMMENT'] = 'Annual travel allowance paid in monthly installments'
        dictionary['IS_GROSS'] = 'TRUE'
        dictionary['PAYMENT_TO_DATE'] = '01/01/2099'
        dictionary['PAID MANUALLY'] = 'FALSE'
        for old_key, new_key in travel_remap.iteritems():
            if old_key not in dictionary:
                continue
            dictionary['AMOUNT'] = dictionary['FINAL_TRAVEL_ALL']
            del dictionary['FINAL_TRAVEL_ALL']
        if (dictionary['AMOUNT'] != 'Not Applicable' and dictionary['AMOUNT'] != '0'
                and dictionary['AMOUNT'] != '' and dictionary['AMOUNT'] != '#N/A'):
            data_file.append(dictionary)

keys = ['EMPID', 'Common Name', 'PAYMENT_TYPE_CODE', 'CURRENCY', 'AMOUNT', 'EFFECTIVE_DATE',
        'COMMENT', 'PAYMENT_FROM_DATE', 'PAYMENT_TO_DATE', 'IS_GROSS', 'HIDDEN_UNTIL',
        'PAID MANUALLY', 'PAYMENT_DATE']

bulk_upload = open('EMEA Bulk Upload.csv', 'wb')
dict_writer = csv.DictWriter(bulk_upload, keys, restval='', extrasaction='ignore')
dict_writer.writer.writerow(keys)
dict_writer.writerows(data_file)

print "Everything saved! Look up EMEA Bulk Upload.csv"
Here's your problem:
open_file = csv.DictReader(open(travel))
generators.append(open_file)
travel_generators = generators[:]
massage_generators = generators[:]
Your travel_generators and massage_generators lists are using the same open file to iterate over and get data. The problem is that once the file has been read through once by the csv.DictReader in massage_generators, it is exhausted, so you get no more data when you try to read it again via travel_generators.
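You can see the exhaustion directly with a small in-memory example (io.StringIO and made-up data standing in for the open file):

```python
import csv
import io

# io.StringIO stands in for the opened CSV file; the data is hypothetical.
f = io.StringIO("EMPID,AMOUNT\n1,100\n2,200\n")
reader = csv.DictReader(f)

first_pass = list(reader)   # consumes the underlying file
second_pass = list(reader)  # the file is exhausted, so nothing comes back

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```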
Try this instead:
generators.append(travel) # just the filename
travel_generators = generators[:]
massage_generators = generators[:]
and then in your loops:
for filename in massage_generators:
    for dictionary in csv.DictReader(open(filename)):
Also, in your first loop, where you are gathering filenames: be aware that the loop only saves the last filename that matches. If that is your intention, you can make it explicit by adding a break on the line after travel = filename.
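A hedged sketch of that filename-gathering loop with the break made explicit (the scratch directory and file names here are made up for the demonstration):

```python
import os
import tempfile

# Create a scratch directory with one matching file and one unrelated file.
workdir = tempfile.mkdtemp()
open(os.path.join(workdir, 'Travel Allowance Auto Look up 2020.csv'), 'w').close()
open(os.path.join(workdir, 'unrelated.csv'), 'w').close()

travel = None
for filename in os.listdir(workdir):
    root, ext = os.path.splitext(filename)
    if root.startswith('Travel Allowance Auto Look up') and ext == '.csv':
        travel = filename
        break  # first match wins; without break, the last match would win

print(travel)  # Travel Allowance Auto Look up 2020.csv
```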
In the example you provided, generators, travel_generators, and massage_generators are all identical, and each contains the same open_file instance. When you iterate massage_generators, you reach the end of the file; then, when you iterate travel_generators, there is nothing left to read from open_file.
At the very least, you could change the following line:
open_file = csv.DictReader(open(travel))
To:
open_file = list(csv.DictReader(open(travel)))
open_file is now a list rather than a generator, so it can be iterated and re-iterated.
However, I would recommend you iterate through the file once, and build your two dictionaries, and write to a DictWriter instance in each iteration.
open_file = csv.DictReader(open(travel))
...
dict_writer = csv.DictWriter(bulk_upload, keys, restval='', extrasaction='ignore')
dict_writer.writer.writerow(keys)

for line in open_file:
    massage_dictionary = {}
    massage_dictionary.update(line)
    ...  # manipulate massage_dictionary
    dict_writer.writerow(massage_dictionary)

    travel_dictionary = {}
    travel_dictionary.update(line)
    ...  # manipulate travel_dictionary
    dict_writer.writerow(travel_dictionary)