Finding farmers markets from zip codes or town names - python

I'm working on a small project where the user enters a zip code or town name and the program outputs all farmers markets in that zip code or town.
I have a function that turns markets.txt (a txt file which includes state, zipcode, town, city, and name of each farmers market) into 2 dictionaries: one that maps zip codes to farmers market tuples and another that maps towns to zip codes. My main program first checks whether the user input is a zip code or a town name. If the user gives a zip code, it takes the list of all zip codes, finds the matching farmers market tuple, and formats it for readability as output. If it is a town, it's largely the same, except the program first retrieves a zip code from the user-input town name and then gets the farmers market tuple and formats it.
Looking through markets.txt, however, I see there are multiple farmers markets for some town names (such as a town called Granville), but the program only prints one rather than all of them.
Thanks so much!
Here is the code I have so far:
d1 = {}
d2 = {}

def read_markets(filename):
    """
    Read in the farmers market data from the file named filename and return
    a tuple of two objects:
    1) A dictionary mapping zip codes to lists of farmers market tuples.
    2) A dictionary mapping towns to sets of zip codes.
    """
    with open(filename) as f:
        for line in f:
            s = line.strip().split('#')
            #print(type(s[4]))
            # s[4] are zipcodes, s[3] are towns, s[:4] is state, name, address, city
            d1[s[4]] = (s[:4])
            d2[s[3]] = (s[4])
            #print(s[:4])
    return d1, d2

def print_market(market):
    """
    Returns a string representing the farmers market tuple
    passed to the market parameter.
    """
    #input is market tuple
    name = market[1]
    address = market[2]
    city = market[3]
    state = market[0]
    zcode = (list(d1.keys())[list(d1.values()).index(market)])
    final = name + "\n" + address + "\n" + city + ", " + state + " " + zcode
    #print(final)
    return final

# print(b)

if __name__ == "__main__":
    # This main program first reads in markets.txt once (using the function read_markets)
    # and then asks the user repeatedly to enter a zip code or
    # a town name (in a while loop until the user types "quit").
    FILENAME = "markets.txt"
    c = 0
    try:
        zip_to_market, town_to_zips = read_markets(FILENAME)
        while c < 1:
            u_in = input("enter zip code or town name: ")
            if u_in == "quit":
                c = 1
            else:
                #check if it's a zip code
                if u_in.isdigit():
                    print("Ok, I will look for farmers markets matching that zipcode")
                    askzip = str(u_in)
                    #list of all zipcodes
                    mlist = d1.keys()
                    #look for corresponding zipcode in dictionary that maps zipcodes to market tuples
                    if askzip in mlist:
                        out1 = d1.get(askzip)
                        print(print_market(out1))
                    else:
                        print('No corresponding farmers markets exist for that zipcode')
                #user input is town name
                else:
                    print("Ok, I will look for farmers markets in that town")
                    asktown = str(u_in)
                    tlist = d2.keys()
                    if asktown in tlist:
                        outzip = d2.get(asktown)
                        #print(outzip)
                        #we got the zip from our dictionary mapping town names to zip codes, so now
                        mlist = d1.keys()
                        if outzip in mlist:
                            # print(outzip)
                            out1 = d1.get(outzip)
                            print(print_market(out1))
                    else:
                        print('No corresponding farmers markets exist for that town name')
    except (FileNotFoundError, IOError):
        print("Error reading {}".format(FILENAME))

#testing things
#read_markets("markets.txt")
#market = ['Wyoming', 'Wyoming Fresh Market', '121 W 15th Street', 'Cheyenne']
#print_market(market)
And here is a pastebin of the snippet of markets.txt that includes multiple farmers markets in one town (Granville):
https://pastebin.com/cFdb7HZ5

I fixed the problem! :D
The problem was that the dictionaries I was using didn't store multiple values for the same key. This could be solved by using a defaultdict(list) and appending to lists. Then, to navigate the nested lists, I made a separate method to search for values, similar to what Ironkey suggested. I also changed the way zip codes were retrieved from market tuples by using another defaultdict matching unique addresses to zip codes.
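To see the core difference in isolation, here is a minimal sketch (with made-up market names, not the actual markets.txt data): a plain dict keeps only the last value assigned to a key, while a defaultdict(list) lets you append every value.

from collections import defaultdict

# Plain dict: the second assignment overwrites the first.
plain = {}
plain["Granville"] = "Market A"
plain["Granville"] = "Market B"
print(plain)                 # {'Granville': 'Market B'} -- Market A is lost

# defaultdict(list): every market stored under the key is kept.
multi = defaultdict(list)
multi["Granville"].append("Market A")
multi["Granville"].append("Market B")
print(multi["Granville"])    # ['Market A', 'Market B']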
Final fixed code that accounts for multiple markets in 1 zip code/town name.
from collections import defaultdict

Dzcode_mtuples = defaultdict(list)
Dtowns_mtuples = defaultdict(list)
Dtowns_zcodes = defaultdict(list)
Daddress_zcodes = defaultdict(list)
a = set()

def read_markets(filename):
    """
    Read in the farmers market data from the file named filename and return
    a tuple of two objects:
    1) A dictionary mapping zip codes to lists of farmers market tuples.
    2) A dictionary mapping towns to sets of zip codes.
    """
    with open(filename) as f:
        for line in f:
            s = line.strip().split('#')
            #zip codes in a set
            a.add(s[4])
            #s[4] are zipcodes, s[3] are towns, s[:4] is state, name, address, city
            #dictionary mapping 1 zip code to multiple farmers market tuples
            #dictionary mapping 1 town name to multiple farmers market tuples
            #dictionary mapping towns to sets of zip codes
            #dictionary mapping unique addresses to zip codes
            mtuple = [s[:4]]
            Dzcode_mtuples[s[4]].append(mtuple)
            Dtowns_mtuples[s[3]].append(mtuple)
            Dtowns_zcodes[s[3]].append(a)
            Daddress_zcodes[s[2]].append(s[4])
            # print(market_name)
            #print(s[:4])
    return Dzcode_mtuples, Dtowns_mtuples

def search_nested(mylist, val):
    for i in range(len(mylist)):
        for j in range(len(mylist[i])):
            #print i,j
            #print(mylist)
            if mylist[i][j] == val:
                return mylist[i]
    return str(val) + ' not found'

def list2string(s):
    str1 = ""
    #for each element i in list s, append it to the string
    for i in s:
        str1 += i
    return str1

def print_market(market):
    """
    Returns a human-readable string representing the farmers market tuple
    passed to the market parameter.
    """
    #input is market tuple
    name = market[1]
    address = market[2]
    city = market[3]
    state = market[0]
    #zcode is the ['zipcode'] list from the dictionary that maps addresses to zip codes.
    #The address is unique to each farmers market, so it is a good search key.
    zcode = (list(Daddress_zcodes.values())[list(Daddress_zcodes.keys()).index(market[2])])
    #the list2string method makes zcode a clean string
    final = name + "\n" + address + "\n" + city + ", " + state + " " + list2string(zcode)
    #print(final)
    return final

# print(b)

if __name__ == "__main__":
    # This main program first reads in markets.txt once (using the function
    # from part (a)), and then asks the user repeatedly to enter a zip code or
    # a town name (in a while loop until the user types "quit").
    FILENAME = "markets.txt"
    c = 0
    try:
        zip_to_market, town_to_zips = read_markets(FILENAME)
        while c < 1:
            u_in = input("enter zip code or town name: ")
            if u_in == "quit":
                c = 1
            else:
                #check if it's a zip code
                if u_in.isdigit():
                    print("\nOk, I will look for farmers markets matching that zipcode \n")
                    askzip = str(u_in)
                    #list of all zipcodes
                    zlist = Dzcode_mtuples.keys()
                    #print(zlist)
                    #look for corresponding zipcode in dictionary that maps zipcodes to market tuples
                    if askzip in zlist:
                        out1 = Dzcode_mtuples.get(askzip)
                        for x in out1:
                            print(print_market(x[0]) + "\n")
                    else:
                        print('No corresponding farmers markets exist for that zipcode')
                #user input is town name
                else:
                    print("\nOk, I will look for farmers markets in that town \n")
                    asktown = str(u_in)
                    tlist = Dtowns_mtuples.keys()
                    if asktown in tlist:
                        #look for user input town in all the towns
                        out1 = Dtowns_mtuples.get(asktown)
                        #out1 is a list of mtuples that satisfy this
                        #print(out1[0][0])
                        for x in out1:
                            #print(Dzcode_mtuples.values())
                            #print(x[0])
                            print(print_market(x[0]) + "\n")
                    else:
                        print('No corresponding farmers markets exist for that town name')
    except (FileNotFoundError, IOError):
        print("Error reading {}".format(FILENAME))

#testing things
#read_markets("markets.txt")
#market = ['Wyoming', 'Wyoming Fresh Market', '121 W 15th Street', 'Cheyenne']
#print_market(market)

Related

Add gender column searching rows that contains info from 2 tables

I have a df that contains some emails:
Email
jonathat0420#email.com
12alexander#email.com
14abcdjoanna#email.com
maria44th#email.com
mikeasddf#email.com
I need to add a second column with the gender.
I will have 2 lists:
male_names = ['john', 'alex']
female_names = ['maria', 'joanna']
My output should look like that:
Email Gender
jonathat0420#email.com 1
12alexander#email.com 1
14abcdjoanna#email.com 2
maria44th#email.com 2
mikeasddf#email.com
I need to search the emails for the names from the lists and, if one matches, add a number: "1" for males, "2" for females, leaving it empty for emails with no match in either list.
Can anybody help me with this?
You could simply use a map, like this:
def isinlist(email, names):
    for name in names:
        if name in email:
            return True
    return False

df.loc[:, 'Gender'] = df.Email.map(lambda x: 1 if isinlist(x, male_names) else (2 if isinlist(x, female_names) else None))
However, there are going to be a lot of ambiguous cases that risk being classified erroneously - e.g., "alexandra#email.com" would be classified as male, since 'alex' is in the list of male names.
Maybe you could implement a slightly more complex "best match" logic like this?
def maxmatchlen(email, names):  # = length of longest name from list that is contained in the email
    return max([len(name) for name in names if name in email] + [0])  # append a 0 to avoid empty lists

def f(email, male_names=male_names, female_names=female_names):
    male_maxmatchlen = maxmatchlen(email, male_names)
    female_maxmatchlen = maxmatchlen(email, female_names)
    if male_maxmatchlen > female_maxmatchlen:
        return 1
    elif female_maxmatchlen > male_maxmatchlen:
        return 2
    else:  # ambiguous case
        return None

df.loc[:, 'Gender'] = df.Email.map(f)
It looks like you first must determine whether the email contains a name. You can loop through both the male and female lists to check whether each name is "in" the email, and then build a list or dictionary of the matches.
#!/usr/bin/env python3

def get_emails(filepath):
    """Open the data file and read the lines - return a list"""
    with open(filepath, "r") as f:
        email_list = f.readlines()
    for email in email_list:
        print(f'Email = {email}')
    print(f'The total number of emails = {len(email_list)}')
    return email_list

def find_names(email_list):
    """loop through the email list and see if each one contains a male or female name - return dictionary of tuples"""
    male_names = ['john', 'alex', 'mike', 'jonathat']
    female_names = ['maria', 'joanna']
    name_dict = {}
    for email in email_list:
        for name_f in female_names:
            if name_f in email:
                data = (name_f, 2)
                name_dict[email] = data
                print(f"{email} is for {name_f} and is female {data[1]}")
                continue
        for name_m in male_names:
            if name_m in email:
                data = (name_m, 1)
                name_dict[email] = data
                print(f"{email} is for {name_m} and is male {data[1]}")
                continue
    return name_dict

if __name__ == '__main__':
    your_Datafile = r"D:\Share\email.txt"
    email_list = get_emails(your_Datafile)
    my_dictionary = find_names(email_list)
    print(my_dictionary)
    for email, data in my_dictionary.items():
        print(data[0], data[1], email)

How to read a csv file and sum values based on user input?

Read a CSV file.
The user has to enter the mobile number.
The program should show the data usage, i.e. add the uplink and downlink to get the result (total data used).
Here is an example of the CSV file:
Time_stamp; Mobile_number; Download; Upload; Connection_start_time; Connection_end_time; location
1/2/2020 10:43:55;7777777;213455;2343;1/2/2020 10:43:55;1/2/2020 10:47:25;09443
1/3/2020 10:33:10;9999999;345656;3568;1/3/2020 10:33:10;1/3/2020 10:37:20;89442
1/4/2020 11:47:57;9123456;345789;7651;1/4/2020 11:11:10;1/4/2020 11:40:22;19441
1/5/2020 11:47:57;9123456;342467;4157;1/5/2020 11:44:10;1/5/2020 11:59:22;29856
1/6/2020 10:47:57;7777777;213455;2343;1/6/2020 10:43:55;1/6/2020 10:47:25;09443
With pandas
import pandas as pd
# read in data
df = pd.read_csv('test.csv', sep=';')
# if there are really spaces at the beginning of the column names, they should be removed
df.columns = [col.strip() for col in df.columns]
# sum Download & Upload for all occurrences of the given number
usage = df[['Download', 'Upload']][df.Mobile_number == 7777777].sum().sum()
print(usage)
>>> 431596
if you want Download and Upload separately
# only 1 sum()
usage = df[['Download', 'Upload']][df.Mobile_number == 7777777].sum()
print(usage)
Download 426910
Upload 4686
with user input
This assumes the Mobile_number column has been read into the dataframe as an int
input is a str so it must be converted to int to match the type in the dataframe
df.Mobile_number == 7777777 not df.Mobile_number == '7777777'
number = int(input('Please input a phone number (numbers only)'))
usage = df[['Download', 'Upload']][df.Mobile_number == number].sum().sum()
With no imported modules
# read file and create dict of phone numbers
phone_dict = dict()
with open('test.csv') as f:
    for i, l in enumerate(f.readlines()):
        l = l.strip().split(';')
        if i != 0:
            mobile = l[1]
            download = int(l[2])
            upload = int(l[3])
            if phone_dict.get(mobile) == None:
                phone_dict[mobile] = {'download': [download], 'upload': [upload]}
            else:
                phone_dict[mobile]['download'].append(download)
                phone_dict[mobile]['upload'].append(upload)

print(phone_dict)
{'7777777': {'download': [213455, 213455], 'upload': [2343, 2343]},
 '9999999': {'download': [345656], 'upload': [3568]},
 '9123456': {'download': [345789, 342467], 'upload': [7651, 4157]}}

# function to return usage
def return_usage(data: dict, number: str):
    download_usage = sum(data[number]['download'])
    upload_usage = sum(data[number]['upload'])
    return download_usage + upload_usage

# get user input to return usage
number = input('Please input a phone number')
usage = return_usage(phone_dict, number)
print(usage)

>>> Please input a phone number 7777777
>>> 431596
The csv is not very readable, but you could take a look at this library: https://pandas.pydata.org/
Once installed you could use:
import pandas

# ask for the mobile number here
mobile_number = input('phone number? ')
df = pandas.read_csv('data.csv')
# here you will get the data for that user phone
user_data = df[df['Mobile_number'] == mobile_number].copy()
# not pretty sure in this step
user_data['download'].sum()

More Idiomatic way of extracting column values and assigning it back to DF (Pandas)

I have an existing routine that is running just fine, but since I'm new to Python I find my code ugly and I'm looking for ways to improve it.
My program works like this: I have created a class that takes a complete address string which I need to process. This class has 4 attributes, namely address, state, city and zipcode.
This is the said class:
class Address:
    def __init__(self, fulladdress):
        self.fulladdress = fulladdress.split(",")
        self.address = self.get_address()
        self.city = self.get_city()
        stateandzip = str(self.fulladdress[-1]).strip()
        self.statezip = stateandzip.split(" ")
        self.state = self.get_state()
        self.zipcode = self.get_zipcode()

    def get_address(self):
        len_address = len(self.fulladdress)
        if len_address == 3:
            return self.fulladdress[0].strip()
        elif len_address == 4:
            return self.fulladdress[0].strip() + ", " + self.fulladdress[1].strip()
        elif len_address > 5:
            temp_address = self.fulladdress[0]
            for ad in self.fulladdress[0:-3]:
                temp_address = temp_address + ", " + ad.strip()
            return temp_address
        else:
            return ''

    def get_city(self):
        if len(self.fulladdress) > 0:
            address = self.fulladdress[-2]
            return address.strip()
        else:
            return ''

    def get_state(self):
        if len(self.fulladdress) > 0:
            return self.statezip[0]
        else:
            return ''

    def get_zipcode(self):
        if len(self.fulladdress) > 0:
            return self.statezip[1]
        else:
            return ''
Now my existing routine needs to append these results to my dataframe based on the address column. To parse the address data I use df.iterrows(), since I don't know how to use the Address class with the df.apply method.
Here is the routine:
import pandas as pd
import datahelper as dh
import address as ad

# Find the header name of the Address column
address_header = dh.findcolumn('Address', list(df.columns))
header_loc = df.columns.get_loc(address_header)

address = []
city = []
state = []
zipcode = []
for index, row in df.iterrows():
    if not row[address_header]:
        address.append('')
        city.append('')
        state.append('')
        zipcode.append('')
        continue
    # extract details from the address
    address_data = ad.Address(row[address_header])
    address.append(address_data.address)
    city.append(address_data.city)
    state.append(address_data.state)
    zipcode.append(address_data.zipcode)

df[address_header] = address
df.insert(header_loc + 1, 'City', city)
df.insert(header_loc + 2, 'State', state)
df.insert(header_loc + 3, 'Zip Code', zipcode)
I would really appreciate it if someone could point me in the right direction. Thank you in advance.
By the way, dh is a datahelper module where I put all my helper functions.
def findcolumn(searchstring, list):
    if searchstring in list:
        return searchstring
    else:
        try:
            return [i for i in list if searchstring in i][0]
        except ValueError:
            return None
        except IndexError:
            return None
And here is my desired output given the sample data for the Address column.
df = pd.DataFrame({'Address': ['Rubin Center Dr Ste, Fort Mill, SC 29708', 'Miami, FL 33169']})
Output should be:
Address             | City      | State | Zip Code
--------------------|-----------|-------|---------
Rubin Center Dr Ste | Fort Mill | SC    | 29708
                    | Miami     | FL    | 33169
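For what it's worth, the df.apply route mentioned in the question could look roughly like this - a minimal sketch that assumes the Address class exactly as defined above and a literal 'Address' column name (the dh.findcolumn lookup is left out for brevity):

import pandas as pd

def split_address(value):
    # Return the four parts as a labelled Series so apply() expands them into columns.
    if not value:
        return pd.Series({'Address': '', 'City': '', 'State': '', 'Zip Code': ''})
    parsed = Address(value)
    return pd.Series({'Address': parsed.address, 'City': parsed.city,
                      'State': parsed.state, 'Zip Code': parsed.zipcode})

parts = df['Address'].apply(split_address)    # one row of parsed parts per input row
df = df.drop(columns='Address').join(parts)   # swap the raw column for the four new ones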

Movie File parse file into a dictionary of the form

1.6. Recommend a Movie
Create a function that counts how many keywords are shared between movie reviews
and recommends the movie with the largest number of shared keywords.
The solution to this task will require the use of dictionaries.
The film reviews & keywords are in a file called film_reviews.txt, separated by commas.
The first term is the movie name, the remaining terms are the film’s keyword tags (i.e.,
“amazing", “poetic", “scary", etc.).
Function name: similar_movie()
Parameters/arguments: name of a movie
Returns: a list of movies similar to the movie passed as an argument
film_reviews.txt -
7 Days in Entebbe,fun,foreign,sad,boring,slow,romance
12 Strong,war,violence,foreign,sad,action,romance,bloody
A Fantastic Woman,fun,foreign,sad,romance
A Wrinkle in Time,book,witty,historical,boring,slow,romance
Acts of Violence,war,violence,historical,action
Annihilation,fun,war,violence,gore,action
Armed,foreign,sad,war,violence,cgi,fancy,action,bloody
Black '47,fun,clever,witty,boring,slow,action,bloody
Black Panther,war,violence,comicbook,expensive,action,bloody
I think this could work for you
film_data = {'films': {}}

with open('film_reviews.txt', 'r') as f:
    for line in f.readlines():
        data = line.split(',')
        data[-1] = data[-1].strip()  # removing new line character
        film_data['films'][data[0].lower()] = data[1:]

def get_similar_movie(name):
    if name.lower() in film_data['films'].keys():
        original_review = film_data['films'][name.lower()]
        similarities = dict()
        for key in film_data['films']:
            if key == name.lower():
                continue
            else:
                similar_movie_review = set(film_data['films'][key])
                overlap = set(original_review) & similar_movie_review
                universe = set(original_review) | similar_movie_review
                # % of overlap compared to the first movie = output1
                output1 = float(len(overlap)) / len(set(original_review)) * 100
                # % of overlap compared to the second movie = output2
                output2 = float(len(overlap)) / len(similar_movie_review) * 100
                # % of overlap compared to universe
                output3 = float(len(overlap)) / len(universe) * 100
                similarities[output1 + output2 + output3] = dict()
                similarities[output1 + output2 + output3]['reviews'] = film_data['films'][key]
                similarities[output1 + output2 + output3]['movie'] = key
        max_similarity = max(similarities.keys())
        movie2 = similarities[max_similarity]
        print(name, ' reviews ', film_data['films'][name.lower()])
        print('similar movie ', movie2)
        print('Similarity = {0:.2f}/100'.format(max_similarity / 3))
        return movie2['movie']
    return None
The get_similar_movie function will return the most similar movie from the film_data dict. The function takes a movie name as its argument.
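For example, with the film_reviews.txt sample above, a call could look like this (the exact recommendation depends on the full contents of the file):

recommendation = get_similar_movie('Annihilation')
print(recommendation)   # name of the most similar movie, or None if the title is unknown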

ways to speed up .txt to db in django

I have a text file like this:
AN AD Aixas
AN AD Aixirivall
AN AD Aixovall
AN AD Ansalonga
And I want to import this text file to the database.
I am doing it like this.
fips_codes = []
iso_codes = []
city_names = []

for line in city_file.readlines():
    cc_fips = line[:2]
    cc_iso = line[3:5]
    name = line[6:]
    fips_codes.append(cc_fips)
    iso_codes.append(cc_iso)
    city_names.append(name)

counter = 0
for item in fips_codes:
    country = Country.objects.get(cc_fips=fips_codes[counter], cc_iso=iso_codes[counter])
    city_object = City(country=country, name=city_names[counter])
    city_object.save()
    counter = counter + 1
Is there any way to speed up this process?
There is no need to loop twice:
for line in city_file.readlines():
    fips, iso, name = line.split()
    country = Country.objects.get(cc_fips=fips, cc_iso=iso)
    City.objects.create(country=country, name=name)
First of all, try to limit database calls. The number of countries is relatively small, so you can load all countries at once and make a map from fips/iso codes to country ids:
countries = {
    (c.cc_fips, c.cc_iso): c.id
    for c in Country.objects.all()
}
Then create cities in bulk:
cities = []
for line in city_file.readlines():
    fips, iso, name = line.split()
    cities.append(
        City(country_id=countries[(fips, iso)], name=name)
    )
City.objects.bulk_create(cities)
If the number of cities is very large, you can save the cities in chunks, e.g. every 100 cities, so that you do not have to hold all the City objects in memory at once.
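A minimal sketch of that chunked variant, assuming the same City model and countries map as above (the chunk size of 100 is arbitrary):

CHUNK_SIZE = 100

cities = []
for line in city_file.readlines():
    fips, iso, name = line.split()
    cities.append(City(country_id=countries[(fips, iso)], name=name))
    if len(cities) >= CHUNK_SIZE:
        City.objects.bulk_create(cities)  # flush this chunk to the database
        cities = []

if cities:  # save whatever is left over from the last partial chunk
    City.objects.bulk_create(cities)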
