How to use Python to append dictionary based on csv dataset


I am trying to write code based on a CSV dataset of the number of passengers arriving at stations in multiple cities. My code needs to find the city with the most arriving passengers, by summing the arrivals across all stations in that city, and output that number.
Currently my code outputs nothing for the city with the most arrivals and -1 for the number of arrivals in that city.
I'm not sure what my error is. Please help!
This is my code:
cities = {}
is_first_line = True
for row in open("Passengers_Analysis.csv"):
    if is_first_line:
        is_first_line = False
    else:
        values = row.split(",")
        city = values[3]
        if city not in cities:
            cities[city] = []
        cities[city].append(city)
passengers = {}
for key in passengers:
    passengers+=int(values[6])
max_city = ""
max_passengers = -1
for key in passengers:
    if passengers[key] > max_passengers:
        max_passengers = passengers[key]
        max_city = key
print("The most popular city:", max_city)
print("The number of passengers in the scheduled period:", max_passengers)

Just from looking at your code, this is exactly the behaviour the code should produce.
In line 12 you declare an empty dictionary called "passengers".
On lines 13 and 17 you create loops that iterate over all elements in this dictionary. As the dictionary is empty, neither loop body ever executes, and max_city and max_passengers keep their initial values.
At least the first loop is probably meant to run over your input data.
I recommend clarifying the flow of your program first and then fixing the loops.
And anyway, please provide a minimal, reproducible example.
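For illustration, here is a minimal sketch of the intended flow, assuming (as the question's code does) one header row, the city name in column index 3 and the arrival count in column index 6. It keeps one running total per city in a single dictionary while reading the rows, then picks the maximum. The function name `busiest_city` and the sample rows are made up for the example.

```python
import csv
import io


def busiest_city(csv_file, city_col=3, count_col=6):
    """Return (city, total_arrivals) for the city with the largest summed arrivals.

    Assumes a header row, the city in column `city_col` and the passenger
    count in column `count_col` (the indices used in the question's code).
    """
    totals = {}
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row instead of tracking is_first_line
    for row in reader:
        city = row[city_col]
        # add this station's arrivals to the running total for its city
        totals[city] = totals.get(city, 0) + int(row[count_col])
    # pick the city whose summed total is largest
    best = max(totals, key=totals.get)
    return best, totals[best]


# Made-up sample data in the assumed column layout
sample = io.StringIO(
    "id,station,line,city,day,time,passengers\n"
    "1,North,A,Oslo,Mon,08:00,10\n"
    "2,Harbour,B,Bergen,Mon,08:00,5\n"
    "3,Central,C,Oslo,Mon,08:00,7\n"
)
city, total = busiest_city(sample)
print("The most popular city:", city)
print("The number of passengers in the scheduled period:", total)
```

With the real file you would pass an open handle instead, e.g. `with open("Passengers_Analysis.csv", newline="") as f: city, total = busiest_city(f)`. Note that `max` raises `ValueError` if the file contains no data rows.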

Related

Returning a value based on matching one lists values to another list based on order

I have four lists:
user = [0,0,1,1]
names = ["jake","ryan","paul","david"]
disliked_index = [0,1]
ranked_names = ["paul","ryan","david","jake"]
List "user" holds a user's response to each name in list "names" (1 if they like it, 0 if they dislike it). disliked_index holds the positions where the user entered 0 in the user list. ranked_names holds the names ranked from most popular to least popular based on the data set (multiple students). What I am trying to achieve is to return the most popular name that the user said they didn't like, so that:
mostpopular_unlikedname = "ryan"
So far what I have done is:
placement = []
for i in disliked_index:
    a = names[i]
    placement.append(a)
Where I now have a list that holds the names the user did not like.
placement = ["jake","ryan"]
Here my logic is to run a loop to check which names in the placement list appear in the ranked_names list and get added to the top_name list in the order from most popular to least.
top_name =[]
for i in range(len(ranked_names)):
    if ranked_names[i] == placement:
        top_name.append[i]
Nothing ends up being added to the top_name list. I am stuck on this part and wanted to see if this is an alright direction to continue or if I should try something else.
Any guidance would be appreciated, thanks!

You don't really need disliked_index list for this. Just do something along these lines:
dis_pos = []
for name, sentiment in zip(names,user):
    if sentiment == 0:
        dis_pos.append(ranked_names.index(name))
mostpopular_unlikedname = ranked_names[min(dis_pos)]
print(mostpopular_unlikedname)
Output:
ryan
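The direction taken in the question can also be made to work. `ranked_names[i] == placement` compares a single string with the whole list, so it is never true; a membership test (`name in placement`) is what was intended, and `top_name.append[i]` would also need parentheses. A minimal sketch of that corrected approach, which walks `ranked_names` in order so the first disliked name found is the most popular one:

```python
names = ["jake", "ryan", "paul", "david"]
user = [0, 0, 1, 1]
disliked_index = [0, 1]
ranked_names = ["paul", "ryan", "david", "jake"]

# Names the user answered 0 for (same as the question's placement list)
placement = [names[i] for i in disliked_index]

# Walk the ranking from most to least popular and keep only disliked names;
# `in` tests membership, which is what `== placement` was meant to do
top_name = [name for name in ranked_names if name in placement]

# The first entry is the most popular disliked name (None if there is none)
mostpopular_unlikedname = top_name[0] if top_name else None
print(mostpopular_unlikedname)
```

For long lists, turning `placement` into a set makes each `in` check constant-time on average.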

Need help iterating over python dictionary values

I'm working on a program that searches through a dictionary's values and performs a method on the values that match a user input. I have to compare and sort these values.
This is the code I'm working with right now:
Code for value search and compare (very rough)
import nation
import pickle
KK = 1000000
pickle_in = open("nationsDict.dat","rb")
d = pickle.load(pickle_in)
k = raw_input("Enter a continent: ")
for value in d.values():
    if k in d.values()[0]:
        print d.values()[0]
Code for the Nations class
class Nations:
    KK = 1000000
    def __init__(self, ctry, cont, pop, area):
        self.country = ctry
        self.continent = cont
        self.population = float(pop)
        self.area = float(area)
    def popDensity(self):
        popDensity = (self.population*self.KK) / self.area
        popDensity = str(round(popDensity, 2))
        return popDensity
Code for creating pickle dictionary
import nation
import pickle
i=0
dictUN = {}
with open('UN.txt') as f:
    for line in f:
        """Data get from file"""
        elements = line.strip().split(",")
        n = nation.Nations(elements[0],elements[1],elements[2],elements[3])
        """Density"""
        n.popDensity()
        print "The density of", n.country, "is",n.popDensity(),"people per square mile."
        """Dictionary creation"""
        dictVal = (n.continent, n.population, n.area)
        dictUN.update({n.country: dictVal})
pickle_out = open("nationsDict.dat", "wb")
pickle.dump(dictUN, pickle_out)
pickle_out.close()
Here's a snippet from UN.txt:
Mauritania,Africa,3.5,397954
Mauritius,Africa,1.3,787
Mexico,North America,120.3,761606
Micronesia,Australia/Oceania,.11,271
Monaco,Europe,.031,0.76
Mongolia,Asia,3.0,603909
Montenegro,Europe,.65,5019
Morocco,Africa,33.0,172414
My problems at this point are mostly limited to the value search and comparison. Specifically, my program has to:
Allow the user to search a continent (first element in value list)
Perform the method, Nations.popDensity (contained in nation class) on all matching countries
Compare the nations and return the top 5 density values per continent
My one big question is how to search a dictionary by an element of its values. I've also considered making a temporary dictionary with the continent as the key, but I'm not sure that would make my life any easier, as I still have to perform the popDensity method on it.
Any help is appreciated, Thanks!

1. Initialize a pandas Series object.
2. Iterate through the dictionary.
3. If the continent matches:
a. calculate the population density.
b. if the Series holds fewer than 5 entries, or the value is larger than the smallest value in the Series:
i. if it already holds 5 entries, remove the last (smallest) entry
ii. append the value to the Series values and the country to the index
iii. sort the Series with ascending=False
If you're going to do this repeatedly, creating a continent->country dictionary will definitely save time.
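A sketch of the top-5 step that swaps the pandas Series for the standard library's `heapq.nlargest`, which keeps the n largest items without the manual remove/append/sort bookkeeping. It assumes the dictionary layout pickled in the question (`country -> (continent, population_in_millions, area)`) and uses the same formula as `Nations.popDensity`, but keeps the density as a number: `popDensity` returns a string, and strings compare alphabetically rather than numerically. `top_densities` and `continent_index` are illustrative names; the second is one way to build the continent->country dictionary suggested above.

```python
import heapq
from collections import defaultdict

KK = 1000000  # populations in UN.txt are in millions, as in the Nations class


def top_densities(d, continent, n=5):
    """Return up to n (country, density) pairs for `continent`, densest first."""
    densities = [
        (country, pop * KK / area)  # same formula as Nations.popDensity
        for country, (cont, pop, area) in d.items()
        if cont == continent
    ]
    return heapq.nlargest(n, densities, key=lambda pair: pair[1])


def continent_index(d):
    """Map each continent to its countries, so repeated searches skip a full scan."""
    index = defaultdict(list)
    for country, (cont, _pop, _area) in d.items():
        index[cont].append(country)
    return index


# A few rows from the UN.txt snippet, in the pickled dictionary's layout
d = {
    "Mauritania": ("Africa", 3.5, 397954.0),
    "Mauritius": ("Africa", 1.3, 787.0),
    "Morocco": ("Africa", 33.0, 172414.0),
    "Monaco": ("Europe", 0.031, 0.76),
    "Montenegro": ("Europe", 0.65, 5019.0),
}
for country, density in top_densities(d, "Africa"):
    print(country, round(density, 2))
```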

Glad it was helpful. I'll add it as an answer, so you can accept it, if you like.
Just as there are list comprehensions, there are dictionary comprehensions, which are pretty useful. d2 = {k: d[k] for k in d.keys() if <some_elem> in d[k]} gives you a dict holding the subset of the original dict that satisfies your condition. You would have to fill in the <some_elem> in d[k] part, because I haven't gone through all your code. You said this is your main question, so hopefully this gives you enough to solve it.
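For the dictionary built in the question, where each value is a `(continent, population, area)` tuple, the condition can compare the first element directly. A small sketch, with the `raw_input()` answer replaced by a fixed string and sample entries taken from the UN.txt snippet:

```python
d = {
    "Mexico": ("North America", 120.3, 761606.0),
    "Mongolia": ("Asia", 3.0, 603909.0),
    "Morocco": ("Africa", 33.0, 172414.0),
}
continent = "Africa"  # stands in for the user's input

# Keep only the countries whose value tuple starts with the chosen continent
d2 = {country: value for country, value in d.items() if value[0] == continent}
print(d2)
```

`value[0] == continent` is stricter than `continent in value`, which would also match if the string appeared at another position in the tuple.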

Organizing and printing information by a specific row in a csv file

I wrote code that takes in some data, and I end up with a CSV file that looks like this:
1,Steak,Martins
2,Fish,Martins
2,Steak,Johnsons
4,Veggie,Smiths
3,Chicken,Johnsons
1,Veggie,Johnsons
where the first column is a quantity, the second column is the type of item (in this case the meal), and the third column is an identifier (in this case it is family name). I need to print this information to a text file in a specific way:
Martins
1 Steak
2 Fish
Johnsons
2 Steak
3 Chicken
1 Veggie
Smiths
4 Veggie
So what I want is the family name followed by what that family ordered. I wrote the following code to accomplish this, but it doesn't quite work:
import csv
orders = "orders.txt"
messy_orders = "needs_sorting.csv"
with open(messy_orders, 'rb') as orders_for_sorting, open(orders, 'a') as final_orders_file:
    comp = []
    reader_sorting = csv.reader(orders_for_sorting)
    for row in reader_sorting:
        test_bit = [row[2]]
        if test_bit not in comp:
            comp.append(test_bit)
            final_orders_file.write(row[2])
            for row in reader_sorting:
                if [row[2]] == test_bit:
                    final_orders_file.write(row[0], row[1])
        else:
            print "already here"
            continue
What I end up with is the following:
Martins
2 Fish
Additionally, I never see it print "already here", though I think I should if it were working properly. I suspect the program goes through the second for loop and then exits without continuing the first loop. Unfortunately, I'm not sure how to make it go back to the original loop once it has identified and printed all instances of a given family name in the file. The reason I have it set up this way is so that I can get the family name written as a header; otherwise I would just sort the file by family name. Please note that after running the orders through my first program, I did manage to sort everything so that each row represents the complete quantity of that type of food for that family (no two rows contain both Steak and Martins).

This is a problem you can solve with a dictionary, which will accumulate your items by family name (the last column of your file).
The second thing you have to do is accumulate a total for each type of meal, keeping in mind that the data you read is a string, not an integer you can add, so you'll have to convert it.
To put all that together, try this snippet:
import csv
d = dict()
with open(r'd:/file.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        # if the family name doesn't
        # exist in our dictionary,
        # set it with a default value of a blank dictionary
        if row[2] not in d:
            d[row[2]] = dict()
        # If the meal type doesn't exist for this
        # family, set it up as a key in their dictionary
        # and set the value to int value of the count
        if row[1] not in d[row[2]]:
            d[row[2]][row[1]] = int(row[0])
        else:
            # Both the family and the meal already
            # exist in the dictionary, so just add the
            # count to the total
            d[row[2]][row[1]] += int(row[0])
Once you run through that loop, d looks like this:
{'Johnsons': {'Chicken': 3, 'Steak': 2, 'Veggie': 1},
'Martins': {'Fish': 2, 'Steak': 1},
'Smiths': {'Veggie': 4}}
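Now it's just a matter of printing it out:
for family, data in d.items():
    print('{}'.format(family))
    for meal, total in data.items():
        print('{} {}'.format(total, meal))
At the end of the loop you'll get output like the following (families and meals come out in the dictionary's iteration order, so the order may differ; Python 3.7+ keeps insertion order):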
Johnsons
3 Chicken
2 Steak
1 Veggie
Smiths
4 Veggie
Martins
2 Fish
1 Steak
You can later improve this snippet by using collections.defaultdict.
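A minimal sketch of that `defaultdict` variant: nested `collections.defaultdict` objects let missing families and meals start at 0, so the `if ... not in` checks disappear. It reads from any iterable of CSV lines; the sample is the data from the question. On Python 3.7+ dictionaries keep insertion order, so families and meals come out in the order they first appear, which for this data matches the layout the question asked for. `tally_orders` and `format_report` are illustrative names.

```python
import csv
import io
from collections import defaultdict


def tally_orders(csv_file):
    """Return {family: {meal: total_quantity}} from rows of quantity,meal,family."""
    totals = defaultdict(lambda: defaultdict(int))
    for quantity, meal, family in csv.reader(csv_file):
        totals[family][meal] += int(quantity)  # missing keys start at 0
    return totals


def format_report(totals):
    """Family name on its own line, followed by 'quantity meal' lines."""
    lines = []
    for family, meals in totals.items():
        lines.append(family)
        for meal, total in meals.items():
            lines.append(f"{total} {meal}")
    return "\n".join(lines)


sample = io.StringIO(
    "1,Steak,Martins\n"
    "2,Fish,Martins\n"
    "2,Steak,Johnsons\n"
    "4,Veggie,Smiths\n"
    "3,Chicken,Johnsons\n"
    "1,Veggie,Johnsons\n"
)
report = format_report(tally_orders(sample))
print(report)
```

To write to orders.txt instead of printing, pass `report` (plus a trailing newline) to the output file's `write` method.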

First-time replier, so here's a go. Have you considered keeping track of the orders first and then writing them to a file? I tried a dict-based approach and it seems to work fine. The idea is to index by family name and store a list of (quantity, meal) pairs.
You may also want to consider the readability of your code; it's hard to follow and debug. However, what I think is happening is that the line
for row in reader_sorting:
iterates through reader_sorting. You read the first row, extract the family name, and then iterate over reader_sorting again in the inner loop. This time you start at the second line, whose family name matches, and you write it successfully. The remaining lines don't match, but you still iterate through all of them. At that point reader_sorting is exhausted, so the outer loop finishes too, even though it has read only one line.
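The effect can be reproduced without a file: a `csv.reader`, like a file object, is a one-pass iterator, so a nested loop over the same iterator consumes everything the outer loop was going to see. A tiny sketch, with a plain list iterator standing in for `reader_sorting` (`outer_rows_seen` is a made-up helper):

```python
def outer_rows_seen(rows):
    """Return the rows the outer loop visits when an inner loop shares its iterator."""
    it = iter(rows)  # one-pass iterator, like csv.reader or a file object
    seen = []
    for row in it:
        seen.append(row)
        for _ in it:  # the inner loop drains every remaining row
            pass
    return seen


print(outer_rows_seen(["Martins", "Martins", "Johnsons", "Smiths"]))
```

Looping over the list itself twice (`for row in rows:` in both loops) would behave differently, because each `for` over a list gets a fresh iterator.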
One solution is to create a separate iterator in the outer for loop so you don't exhaust the one that loop is using. However, you would still need to deal with double counting or keep track of indices. Another way is to keep track of the orders by family as you iterate:
import csv
orders = {}
with open('needs_sorting.csv') as file:
    needs_sorting = csv.reader(file)
    for amount, meal, family in needs_sorting:
        if family not in orders:
            orders[family] = []
        orders[family].append((amount, meal))
with open('orders.txt', 'a') as file:
    for family in orders:
        file.write('%s\n' % family)
        for amount, meal in orders[family]:
            file.write('%s %s\n' % (amount, meal))

Django query / Iteration issue

I have a fairly noob question regarding iteration that I can't seem to get correct.
I have a table that houses a record for every monthly test a user completes, if they miss a month then there is no record in the table.
I want to pull the user's history from the table, then set a Y or N for each of the 12 months according to whether that month was completed.
Here is my code:
def getSafetyHistory(self, id):
    results = []
    safety_courses = UserMonthlySafetyCurriculums.objects.filter(users_id=id).order_by('month_assigned')
    for i in range(1, 13):
        for s in safety_courses:
            if s.month_assigned == i:
                results.append('Y')
            else:
                results.append('N')
    return results
So my ideal result would be a list with 12 entries, either Y or N
e.g. results = ['N', 'N', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N', 'N', 'Y']
The query above returns 2 records for the user, which is correct, but my iteration keeps producing 24 entries, obviously because of the outer and inner loops (12 months times 2 records). I am not sure of the "pythonic" way to do this without a ton of nested loops.

There are probably lots of ways to do this. Here is one idea.
It looks like you are only going to get records for courses that have been completed. So you could pre-build a list of 12 results, all set to no. Then after you query the database, you flip the ones to yes that correspond to the results you got.
results = ['N'] * 12  # prebuild results to all no
safety_courses = UserMonthlySafetyCurriculums.objects.filter(
    users_id=id).order_by('month_assigned')
for course in safety_courses:
    results[course.month_assigned - 1] = 'Y'
This assumes month_assigned is an integer between 1 and 12, as your code hints at.
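An alternative sketch that avoids indexing by month: collect the completed months into a set and build the 12 flags with a comprehension. `monthly_flags` is a made-up helper; in Django the months could be fetched with `safety_courses.values_list('month_assigned', flat=True)`. Unlike `results[course.month_assigned - 1]`, a value outside 1 to 12 is simply ignored here instead of raising `IndexError` (for 13) or silently marking December through index -1 (for 0).

```python
def monthly_flags(completed_months):
    """Return 12 'Y'/'N' flags, one per month 1..12, from the months completed."""
    done = set(completed_months)  # fast membership checks; duplicates collapse
    return ['Y' if month in done else 'N' for month in range(1, 13)]


# Months matching the example result in the question
print(monthly_flags([3, 6, 7, 8, 12]))
```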

'Splitting' List into several Arrays

I'm trying to complete a project that will show total annual sales from a specific list contained in a .txt file.
The list is formatted this way:
- lastname, firstname (string)
- 45.7 (float)
- 456.4 (float)
- 345.5 (float)
- lastname2, firstname2 (string)
- 3354.7 (float)
- 54.6 (float)
- 56.2 (float)
- lastname3, firstname3 (string)
- 76.6 (float)
- 34.2 (float)
- 48.2 (float)
And so on. In the real file there are 7 different "employees", each followed by 12 "numbers" (months of the year), but that example should suffice to give an idea of what I'm trying to do.
I need to output this specific information for every "employee":
- Name of employee
- Total sum (sum of the 12 numbers in the list)
So my logic is taking me to this conclusion, but I don't know where to start:
Create 7 different arrays to store each "employee" data.
With this logic, I need to split the main list into independent arrays so I can work with them.
How can this be achieved? And also, if I don't have a predefined number of employees (but a defined format :: "Name" followed by 12 months of numbers)...how can I achieve this?
I'm sure I can figure it out once I get an idea of how to "split" a list into different sections. Every 13 lines?

Yes, every thirteenth line starts a new employee's record.
However, instead of using seven different lists, you can use a dictionary of lists, so that you don't have to worry about the number of employees.
You can also make the number of lines per employee a parameter instead of hard-coding it.
You could do the following:
employee = dict()
with open("file.txt") as infile:
    name = infile.readline().strip()
    while name:
        employee[name] = list()
        # read the 12 monthly values that follow each name
        for i in range(12):
            val = float(infile.readline().strip())
            employee[name].append(val)
        name = infile.readline().strip()
Some ways to access dictionary entries:
for name, months in employee.items():
    print(name)
    print(months)
for name in employee.keys():
    print(name)
    print(employee[name])
for months in employee.values():
    print(months)
for name, months in zip(employee.keys(), employee.values()):
    print(name)
    print(months)
The entire process goes as follows:
employee = dict()
with open("file.txt") as infile:
    name = infile.readline().strip()
    while name:
        # sum the 12 monthly values that follow each name
        val = 0.0
        for i in range(12):
            val += float(infile.readline().strip())
        employee[name] = val
        print(">>> Employee:", name, " -- total sales:", employee[name])
        name = infile.readline().strip()
Sorry for beating around the bush (:

Here is one option. It isn't elegant, but it is a working brute-force approach:
summed = 0
with open("file.txt", "rt") as f:
    print(f.readline().strip())  # print the first line (first person)
    for line in f:
        # assume every following line is a float
        try:
            # convert to float
            value = float(line.strip())
            # add to the running sum
            summed += value
        # if it does not convert, it is the next person's name
        except ValueError:
            # print the sum for the previous person
            print(summed)
            # print the new name
            print(line.strip())
            # reset the sum
            summed = 0
    # at end of file no ValueError is raised, so print the last result here
    print(summed)
Since you need more flexibility, here is another option:
data = {}  # dict: list of all values for each person, keyed by name
with open("file.txt", "rt") as f:
    data_key = f.readline().strip()  # remember the first line (first person)
    data[data_key] = []  # empty list of values
    for line in f:
        # assume every following line is a float
        try:
            # convert to float
            value = float(line.strip())
            # add to data
            data[data_key].append(value)
        # if it does not convert, it is the next person
        except ValueError:
            # next person's name
            data_key = line.strip()
            # new list
            data[data_key] = []
Q: let's say that I want to print a '2% bonus' note for employees who made more than 7000 in total sales (12 months).
for employee, stats in data.items():
    if sum(stats) > 7000:
        print(employee + " made more than 7000 in total sales! Needs a 2% bonus")

I would not create 7 different arrays. I would create some sort of data structure to hold all the relevant information for one employee in one data type (this is Python, but surely you can create data structures in Python as well).
Then, as you process the data for each employee, all you have to do is iterate over one array of employee data elements. That way, it's much easier to keep track of the indices of the data (or maybe even eliminates the need to!).
This is especially helpful if you want to sort the data somehow. That way, you'd only have to sort one array instead of 7.
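A minimal sketch of that idea, assuming the file format from the question (a name line followed by a fixed number of monthly values). A small `dataclass` holds one employee, the flat lines are sliced into records of `months + 1` lines, and sorting or a bonus check then works on a single list. `Employee` and `parse_employees` are illustrative names; the sample uses the three-value excerpt from the question, hence `months=3`.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str
    monthly_sales: list = field(default_factory=list)

    @property
    def total(self):
        return sum(self.monthly_sales)


def parse_employees(lines, months=12):
    """Group a flat list of lines into Employee records of 1 name + `months` values."""
    cleaned = [line.strip() for line in lines if line.strip()]  # drop blank lines
    record = months + 1
    employees = []
    for start in range(0, len(cleaned), record):
        name, *values = cleaned[start:start + record]
        employees.append(Employee(name, [float(v) for v in values]))
    return employees


sample = [
    "lastname, firstname", "45.7", "456.4", "345.5",
    "lastname2, firstname2", "3354.7", "54.6", "56.2",
    "lastname3, firstname3", "76.6", "34.2", "48.2",
]
staff = parse_employees(sample, months=3)
for emp in sorted(staff, key=lambda e: e.total, reverse=True):
    print(emp.name, round(emp.total, 2))
```

With the real file, `with open("file.txt") as f: staff = parse_employees(f)` uses the default of 12 months, and a single `sorted(staff, key=...)` call replaces sorting seven separate arrays.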
