I need to read a CSV file, fill the empty/null values in the "Phone" and "Email" columns based on the person's address, and write the result to a new CSV file. For example, if a person "Jonas Kahnwald" doesn't have a phone number or an email address but has the same address as the person above or below, say "Hannah Kahnwald", then we should fill the empty/null values with that person's details.
I can't use pandas because the rest of the code is purely based on Python 2.7 (unfortunately), so I just need to write a function or some logic to capture this information.
Input format/table looks like below with empty cells (csv file):
FirstName,LastName,Phone,Email,Address
Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,,145han street
Jonas,Kahnwald,,,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,,,887neil ave
katharina,Nielsen,,,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,,,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,,,828alex street
Output format should be like below:
Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Micheal,Kahnwald,6231897383,hannkahn@gmail.com,145han street
Jonas,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Mikkel,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Magnus,Nielsen,4509213887,magnusneil@kyle.co,887neil ave
Ulrich,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
katharina,Nielsen,4509213887,mikneil@yahoo.com,887neil ave
Elisabeth,Doppler,5439001211,elsisop@amaz.com,211elis park
Peter,Doppler,5439001211,elsisop@amaz.com,211elis park
bartosz,Tiedmannn,6263172828,tiedman@skype.com,828alex street
Alexander,washington,,,321notsame street
claudia,Tiedamann,6263172828,tiedman@skype.com,828alex street
import csv, os

def get_info(file_path):
    data = []
    with open(file_path, 'rb') as fin:
        csv_reader = csv.reader(fin)
        next(csv_reader)  # skip the header row
        for each in csv_reader:
            FirstName = each[0]
            LN = each[1]
            Phone = "some function or logic"
            email = " some function or logic"
            Address = each[4]
            login = ""
            logout = ""
            data.append([FirstName, LN, Phone, email, Address, login, logout])
    return data
Here's a significantly updated version that attempts to fill in missing data from other entries in the file, but only if they have the same Address field. To make the searching faster, it builds a dictionary for internal use called addr_dict, which maps each address to the records at that address that have a phone number or an email. It also uses namedtuples internally to make the code a little more readable.
Note that when retrieving missing information, it uses the first entry stored in this internal dictionary at that Address that has the missing value. In addition, I don't think the sample data you provided contains every possible case, so you will need to do additional testing.
import csv
from collections import namedtuple

def get_info(file_path):
    # Read data from file and convert to list of namedtuples, also create address
    # dictionary to use to fill in missing information from others at same address.
    with open(file_path, 'rb') as fin:
        csv_reader = csv.reader(fin, skipinitialspace=True)
        header = next(csv_reader)
        Record = namedtuple('Record', header)
        newheader = header + ['Login', 'Logout']  # Add names of new columns.
        NewRecord = namedtuple('NewRecord', newheader)
        addr_dict = {}
        data = [newheader]
        for rec in (Record._make(row) for row in csv_reader):
            if rec.Email or rec.Phone:  # Worth saving?
                addr_dict.setdefault(rec.Address, []).append(rec)  # Remember it.
            login, logout = "", ""  # Values for new columns.
            data.append(NewRecord._make(rec + (login, logout)))

    # Try to fill in missing data from any other records with same Address.
    for i, row in enumerate(data[1:], 1):
        if not (row.Phone and row.Email):  # Info missing?
            # Try to copy it from others at same address.
            updated = False
            for other in addr_dict.get(row.Address, []):
                if not row.Phone and other.Phone:
                    row = row._replace(Phone=other.Phone)
                    updated = True
                if not row.Email and other.Email:
                    row = row._replace(Email=other.Email)
                    updated = True
                if row.Phone and row.Email:  # Info now filled in?
                    break
            if updated:
                data[i] = row
    return data

INPUT_FILE = 'null_cols.csv'
OUTPUT_FILE = 'fill_cols.csv'

data = get_info(INPUT_FILE)
with open(OUTPUT_FILE, 'wb') as fout:
    writer = csv.DictWriter(fout, data[0])  # First elem has column names.
    writer.writeheader()
    for row in data[1:]:
        writer.writerow(row._asdict())
print('Done')
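For readers who are on Python 3, the same fill-by-address idea can be sketched more compactly with csv.DictReader. This is only a sketch under stated assumptions, not a replacement for the Python 2.7 code above: the function name fill_by_address and the inline sample are made up for illustration, and it copies each missing value from the first record at the same address that has one.

```python
import csv
import io
from collections import defaultdict


def fill_by_address(rows, fields=("Phone", "Email"), key="Address"):
    """Return copies of rows with empty fields filled from same-address rows."""
    # For each address, remember the first non-empty value seen per field.
    first_seen = defaultdict(dict)
    for row in rows:
        for field in fields:
            if row[field]:
                first_seen[row[key]].setdefault(field, row[field])

    filled = []
    for row in rows:
        new_row = dict(row)
        for field in fields:
            if not new_row[field]:
                # Stays empty when nobody at this address has the value.
                new_row[field] = first_seen[row[key]].get(field, "")
        filled.append(new_row)
    return filled


# Hypothetical sample mirroring part of the question's input.
SAMPLE = """FirstName,LastName,Phone,Email,Address
Hannah,Kahnwald,1457871452,hannkahn@gmail.com,145han street
Jonas,Kahnwald,,,145han street
Alexander,washington,,,321notsame street
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = fill_by_address(rows)
for row in result:
    print(",".join(row[name] for name in rows[0]))
```

Here Jonas picks up Hannah's phone and email, while Alexander stays empty because no one else shares his address.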
I'm new to Python, so excuse me if my question is kind of dumb.
I'm making a password manager that saves some data to a CSV file. Each row holds, in this order, the name of the site, the corresponding e-mail and finally the password.
I would like to print all the site names already written in the CSV file, but here is my problem: for the first row it prints the whole row, while for the following rows it works just fine.
Here is my code; I hope you can help me with this.
csv_file = csv.reader(open('mycsvfile.csv', 'r'), delimiter=';')
try:
    print("Here are all the sites you saved :")
    for row in csv_file:
        print(row[0])
except:
    print("Nothing already saved")
Maybe it can help, but here is how I wrote my data into the csv file:
# I encrypt the email and the password with Fernet and an already written key
# I also make sure that the email is valid
file = open('key.key', 'rb')
key = file.read()
file.close()
f = Fernet(key)
website = input("web site name : \n")
restart = True
while restart:
    mail = input("Mail:\n")
    a = isvalidEmail(mail)
    if a == True:
        print("e-mail validated")
        restart = False
    else:
        print("Wrong e-mail")
psw = input("password :\n")
psw_bytes = psw.encode()
mail_bytes = mail.encode()
psw_encrypted_in_bytes = f.encrypt(psw_bytes)
mail_encrypted_in_bytes = f.encrypt(mail_bytes)
mail_encrypted_str = mail_encrypted_in_bytes.decode()
psw_encrypted_str = psw_encrypted_in_bytes.decode()
f = open('a.csv', 'a', newline='')
tup1 = (website, mail_encrypted_str, psw_encrypted_str)
writer = csv.writer(f, delimiter=';')
writer.writerow(tup1)
print("Saved ;)")
f.close()
return
And here is my output (I have already saved some data). The first line shows the website name together with the encrypted email and password; the following lines show just the name, which is what I want.
I finally succeeded: instead of using a csv.reader, I used a csv.DictReader, and as all the names I'm looking for are in the same column, I just have to use the title of that column.
So here is the code:
with open('mycsv.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    print("Websites")
    print("---------------------------------")
    for row in data:
        print(row['The_title_of_my_column'])
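Note that csv.DictReader treats the first row as column titles unless fieldnames is passed. As a hedged sketch for a file like the one in the question (no header row, `;` as the delimiter), the column names below are assumptions chosen for illustration, and the in-memory sample stands in for the real file:

```python
import csv
import io

# Hypothetical stand-in for the saved file: no header row, ';' delimiter.
sample = io.StringIO("github;enc-mail-1;enc-psw-1\nforum;enc-mail-2;enc-psw-2\n")

# Passing fieldnames means the first row is read as data, not as titles.
reader = csv.DictReader(sample, fieldnames=["site", "mail", "password"], delimiter=";")
sites = [row["site"] for row in reader]

print("Websites")
for site in sites:
    print(site)
```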
Make a list from csv.reader():
rows = [row for row in csv_file]
Now you can get any element by its row and column index, using rows as a list of lists:
rows[id1][id2]
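A short sketch of that indexing, using an in-memory sample in place of the real file (the sample contents are made up for illustration):

```python
import csv
import io

sample = io.StringIO("site1;mail1;psw1\nsite2;mail2;psw2\n")
csv_file = csv.reader(sample, delimiter=";")

rows = [row for row in csv_file]   # list of lists, one inner list per line

first_site = rows[0][0]            # row 0, column 0
second_mail = rows[1][1]           # row 1, column 1
site_names = [row[0] for row in rows if row]  # first column, skipping blank lines
print(site_names)
```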
I've been working this problem way too long, please explain to me why the header keeps repeating in my output csv.
I have an input csv with this data:
name,house
"Abbott, Hannah",Hufflepuff
"Bell, Katie",Gryffindor
"Bones, Susan",Hufflepuff
"Boot, Terry",Ravenclaw
The problem requires reversing the last and first names, separating the name into two columns, and writing a new header with 3 columns to the output CSV. Here's what I have:
while True:
    try:
        # open file
        with open(sys.argv[1]) as file:
            # make reader
            reader = csv.reader(file)
            # skip first line (header row)
            next(reader)
            # for each row
            for row in reader:
                # identify name
                name = row[0]
                # split at ,
                name = name.split(", ")
                # create var last and first, identify var house
                last = name[0]
                first = name[1]
                house = row[1]
                # writing the new csv
                with open(sys.argv[2], "a") as after:
                    writer = csv.DictWriter(after, fieldnames=["first", "last", "house"])
                    # HEADER ONLY NEEDS TO OCCUR ONCE
                    writer.writeheader()
                    writer.writerow({"first": first, "last": last, "house": house})
        sys.exit(0)
    except FileNotFoundError:
        # assumed handler; the except clause is missing from the posted code
        sys.exit("Could not read " + sys.argv[1])
my output csv:
first,last,house
Hannah,Abbott,Hufflepuff
first,last,house
Katie,Bell,Gryffindor
first,last,house
Susan,Bones,Hufflepuff
I've tried removing the while loop, unindenting and indenting, writing a row manually with the header names (which caused errors). Please help. Thanks!
You can add a variable that tracks whether the header has been written yet, e.g. write_header:
while True:
    try:
        write_header = True
        # open file
        with open(sys.argv[1]) as file:
            # make reader
            reader = csv.reader(file)
            # skip first line (header row)
            next(reader)
            # for each row
            for row in reader:
                # identify name
                name = row[0]
                # split at ,
                name = name.split(", ")
                # create var last and first, identify var house
                last = name[0]
                first = name[1]
                house = row[1]
                # writing the new csv
                with open(sys.argv[2], "a") as after:
                    writer = csv.DictWriter(after, fieldnames=["first", "last", "house"])
                    # HEADER ONLY NEEDS TO OCCUR ONCE
                    if write_header:
                        writer.writeheader()
                        write_header = False
                    writer.writerow({"first": first, "last": last, "house": house})
        sys.exit(0)
    except FileNotFoundError:
        # assumed handler; the except clause is missing from the posted code
        sys.exit("Could not read " + sys.argv[1])
See how I used write_header.
On another note, you can refactor your code to open the CSV writer before the for loop and write the header there; then write the rows as you do now, without reopening the file for every row.
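A minimal sketch of that refactor is below. The function name convert and the in-memory streams are assumptions used to keep the example self-contained; with real files you would pass the two open file objects instead.

```python
import csv
import io


def convert(src, dst):
    """Read 'name,house' rows from src and write 'first,last,house' rows to dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["first", "last", "house"])
    writer.writeheader()  # runs once, before the loop
    for row in reader:
        last, first = row["name"].split(", ")
        writer.writerow({"first": first, "last": last, "house": row["house"]})


# In-memory stand-ins for the files named by sys.argv[1] and sys.argv[2].
source = io.StringIO('name,house\n"Abbott, Hannah",Hufflepuff\n"Bell, Katie",Gryffindor\n')
target = io.StringIO()
convert(source, target)
print(target.getvalue())
```

With real files, opening the output in "w" mode rather than "a" also avoids piling up rows (and headers) from earlier runs.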
I am trying to collect information from a website's contact form (name, email, etc.) into a CSV file using Flask. The data is stored in a mega_data variable as a dictionary.
The dictionary looks like this: {'Name':'xyz','Email':'xyz','Subject':'xyz','Message':'xyz'}
Here's the code for that:
# Dictionary into CSV
def file_to_csv(data):
    with open('database.csv', 'a+', newline='') as database:
        full_name = data['Name']
        email = data['Email']
        subject = data['Subject']
        message = data['Message']
        writer = csv.writer(database, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
        writer.writerow([full_name, email, subject, message])

# Requesting DATA
@app.route('/submit_form', methods=['POST', 'GET'])
def submit_form():
    error = None
    if request.method == 'POST':
        mega_data = {}
        data = request.form.to_dict()
        for k, v in data.items():
            mega_data.update({k: v})
        file_to_csv(mega_data)
        return render_template('/thanks.html', name=data['Name'].split(' ')[0])
    return 'Something went wrong, Try again!'
But with this method I would already need to have a CSV file ready with the specified headers (the keys from the dictionary), separated by some delimiter.
What I basically want is to define the headers within the function itself, so that it generates a CSV file with the headers listed along with their related values.
The CSV file should look something like this:
NAME EMAIL SUBJECT MESSAGE
abc abc abc abc
xyz xyz xyz xyz
This is the contact form from which I am collecting the data and converting it into a dictionary; I want to put this dictionary data into a CSV file.
Any help would really be appreciated!!!
The code below should help:
import csv
import os

def header_present(file):
    # Treat a missing or empty file as having no header yet.
    return os.path.isfile(file) and os.path.getsize(file) > 0

def file_to_csv(file, record):
    with open(file, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(record.keys()))
        if not header_present(file):
            writer.writeheader()
        writer.writerow(record)
1. You can use the pandas library to create a CSV file with whatever columns you want:
import pandas as pd
2. Create an empty dictionary:
mega_data = {}
3. Then, using full_name as the key, store all the values you want under it:
mega_data[full_name] = [full_name, email, subject, message]
4. Now pandas lets you give the columns any names you want (one name per stored value):
mega_data_df = pd.DataFrame.from_dict(mega_data, orient='index',
                                      columns=['NAME', 'EMAIL', 'SUBJECT', 'MESSAGE'])
5. After that, save the data as a CSV file under any name you like; index=False keeps the dictionary keys from being written as an extra column:
mega_data_df.to_csv('mega_data.csv', index=False)
There are of course many other ways to use pandas, so do check out the pandas documentation.
I have a CSV file that contains an Address field. The addresses in the file are all in caps, as shown below, and I need your help applying title() to the value passed to append(row[1]). I have tried:
append.title(row[1]), but it does not work.
In the CSV File --------------Needs to be:
1234 PRESTON ROAD -------- 1234 Preston Road
1245 JACKSON STREET ------- 1245 Jackson Street
8547 RAINING COURT ------- 8547 Raining Court
with open('C:\\Users\\Jake\\Desktop\\My Files\\Python Files\\PermitData.csv', 'rb') as f:
    reader = csv.reader(f)
    next (reader)
    data = list(reader)
    PermitData = []
    for row in data:
        PermitData.append(row[0]),PermitData.append(row[1]),PermitData.append(row[2]),
        PermitData.append(row[3]),PermitData.append(row[4]),PermitData.append(row[5]),
        PermitData.append(row[6])
    results = PermitData
    for result in results:
        print result
f.close()
The reason I am iterating over every row in the CSV file is the need to save the edited CSV file as a temp file before replacing the original with the edited one. I am not that articulate with Python as I am learning by doing actual projects so please forgive any stupidity in the question and coding. Please provide your kind advice and assistance.
The following code will create a new file named output.csv with the output that you asked for:
import csv
with open('C:\\Users\\Jake\\Desktop\\My Files\\Python Files\\output.csv', 'w') as out:
    with open('C:\\Users\\Jake\\Desktop\\My Files\\Python Files\\PermitData.csv', 'rb') as f:
        reader = csv.reader(f)
        out.write(next(reader)[0].replace('\t', ' ') + '\n')
        data = list(reader)
        for item in data:
            item = item[0].split(' ')
            out.write(' '.join(
                [item[0],
                 item[1].title(),
                 item[2].title()]) + '\n')
If what you want is just to print the result, try as follows:
import csv
results = []
with open('permitData.csv', 'rb') as f:
    reader = csv.reader(f)
    next(reader)
    data = list(reader)
    for item in data:
        item = item[0].split(' ')
        results.append(' '.join(
            [item[0],
             item[1].title(),
             item[2].title()]))
Output:
>>> for result in results:
... print result
...
1234 Preston Road
1245 Jackson Street
8547 Raining Court
You'll need to call title() on the string you want to convert to Title Case. This should work:
append(row[1].title())
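As a quick illustration of what str.title() does to addresses like the ones in the question (the addresses list is just the question's sample values, and the last line is an added caveat rather than part of the question):

```python
addresses = ["1234 PRESTON ROAD", "1245 JACKSON STREET", "8547 RAINING COURT"]
titled = [address.title() for address in addresses]
print(titled)

# Caveat: a letter that follows a digit is also capitalized.
ordinal = "12TH STREET".title()
print(ordinal)
```

If the data contains ordinals such as "12TH", the result is "12Th Street", so those values may need separate handling.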
Try this:
PermitData = []
with open('C:\\Users\\Jake\\Desktop\\My Files\\Python Files\\PermitData.csv', 'rb') as f:
    reader = csv.reader(f)
    headers = next (reader)
    for row in reader:
        row = row[:1]+ [row[1].title()] + row[2:]  # Assuming row[1] is your address field you need in a title case.
        PermitData.append(row)
for result in PermitData:
    print result
You should also note that you don't need to call f.close() when you use the with syntax to open a file. The file is closed automatically once you exit the with block.
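This can be checked with the file object's closed attribute. The sketch below uses a throwaway temporary file (an assumption made so the example is self-contained) instead of the real PermitData.csv:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

with open(path, "w") as f:
    f.write("Id,Address\n1,1234 PRESTON ROAD\n")
    inside = f.closed      # False: still open inside the block

outside = f.closed         # True: closed as soon as the block exits
os.remove(path)
print(inside, outside)
```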
I am writing to a CSV file and it works well, except that some of the rows have commas in their names, and when I write to the CSV those commas throw the fields off. How do I write to a CSV so that the commas inside the fields are not treated as separators?
header = "Id, FACID, County, \n"
row = "{},{},{}\n".format(label2,facidcsv,County)
with open('example.csv', 'a') as wildcsv:
    if z==0:
        wildcsv.write(header)
        wildcsv.write(row)
    else:
        wildcsv.write(row)
Strip any comma from each field that you write to the row, eg:
label2 = ''.join(label2.split(','))
facidcsv = ''.join(facidcsv.split(','))
County = ''.join(County.split(','))
row = "{},{},{}\n".format(label2,facidcsv,County)
Generalized to format a row with any number of fields:
def format_row(*fields):
    row = ''
    for field in fields:
        if row:
            row = row + ', ' + ''.join(field.split(','))
        else:
            row = ''.join(field.split(','))
    return row
label2 = 'label2, label2'
facidcsv = 'facidcsv'
county = 'county, county'
print(format_row(label2, facidcsv, county))
wildcsv.write(format_row(label2, facidcsv, county))
Output
label2 label2, facidcsv, county county
As @TomaszPlaskota and @quapka allude to in the comments, Python's csv writers and readers by default write/read CSV fields that contain a delimiter with a surrounding '"'. Most applications that work with CSV files follow the same format. So the following is the preferred approach if you want to keep the commas in the output fields:
import csv
label2 = 'label2, label2'
facidcsv = 'facidccv'
county = 'county, county'
with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow((label2, facidcsv, county))
out.csv
"label2, label2",facidccv,"county, county"
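To see that the quoting round-trips, the sketch below writes a row to an in-memory buffer (used here instead of out.csv so the example is self-contained) and reads it back with csv.reader:

```python
import csv
import io

buffer = io.StringIO()
csv.writer(buffer).writerow(("label2, label2", "facidcsv", "county, county"))
written = buffer.getvalue()

# Reading it back: the quoted fields come out with their commas intact.
fields = next(csv.reader(io.StringIO(written)))
print(fields)
```

The reader returns three fields, not five, because the commas inside the quoted fields are treated as data.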