Append recursively each record from dictionary to obtain values - python

Following a question I submitted a few days ago, I have a defaultdict in which each entry holds the ticket-sale records for one deviceID, that is, one passenger buying bus tickets. The whole devicedict contains all the tickets sold in a given year, around 1 million. The defaultdict is keyed by deviceID.
I need to know the average delay between the purchase date and the actual date of departure for each ticket purchase. My problem is that I can't seem to extract each record from the dictionary.
For each key, devicedict[key] holds a list of records, each with over 60 different characteristics: date_departure, date_arrival, etc. On each pass of the loop I want to process something like devicedict[deviceID][field of interest], do something with it, and, for example, compute the average delay between purchase and departure.
I've tried using append and nested arrays, but that doesn't return each individual record by itself.
valoresDias is the sum of the delays for each ticket (the difference between the purchase and departure dates) in seconds, divided by 86400 (the number of seconds in a day), and valoresTotalesDias is just a counter. The overall average delay should be valoresDias/valoresTotalesDias over all the records.
with open('Salida1.csv',newline='', mode='r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    #rows1 = list(csv_reader)
    #print(len(rows1))
    line_count = 0
    count=0
    for row in csv_reader:
        key = row[20]
        devicedict[key].append(row)
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            #print(f'\t{row[0]} works in the {row[20]} department, and was born in {row[2]}.')
            #print(row['id'], row['idapp'])
            #print(len(row))
            #print(list(row))
            mydict5ordenado.append(list(row))
            line_count += 1
print(len(devicedict.keys()))
f = "%Y-%m-%d %H:%M:%S"
p = devicedict.keys()
for i in range(0,len(devicedict)):
    mydict.append(devicedict[list(p)[i]])
    print(mydict[i])
    print("Los campos temporales:")
    #print(mydict[i][4])
    #print(mydict[i][3])
    out1=datetime.datetime.strptime(mydict[i][4], f)
    out2=datetime.datetime.strptime(mydict[i][3], f)
    out3=out1-out2
    valoresTotalesDias+=1
    valoresDias+=out3.seconds/86400
#This is what i am trying to obtain for each record without hardcoding
#I want to access each field in the above loop
count1=len(devicedict['4ff70ad8e2e74f49'])
for i in range(0,count1):
    mydict5.append(devicedict['4ff70ad8e2e74f49'][i])
print(len(mydict5))
for i in range (0,len(mydict5)):
    print(mydict5[i][7])
    print("Tipo de Bus:")
    print(mydict5[i][16])
    print(mydict5[i][14])
    if (mydict5[i][16]=='P'):
        preferente+=1
mydict[i] should contain only one line of the record, that is, one sale for each passenger, not the whole list of records.
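One possible way to reach each individual record is a nested loop: the outer loop walks the device IDs, the inner loop walks the list of sales stored under each ID. The sketch below assumes that column 3 is the purchase timestamp and column 4 the departure timestamp (the question does not say which is which), and the function name mean_delay_days and the sample rows are made up for illustration. It uses total_seconds() because timedelta.seconds leaves out whole days.

```python
import datetime
from collections import defaultdict

FMT = "%Y-%m-%d %H:%M:%S"


def mean_delay_days(devicedict, purchase_idx=3, departure_idx=4):
    """Average purchase-to-departure delay, in days, over every ticket.

    The column positions are assumptions; adjust them to the real CSV layout.
    """
    total_days = 0.0
    tickets = 0
    for device_id, records in devicedict.items():
        for record in records:  # one record == one ticket sale
            purchase = datetime.datetime.strptime(record[purchase_idx], FMT)
            departure = datetime.datetime.strptime(record[departure_idx], FMT)
            # total_seconds() includes whole days; .seconds would drop them
            total_days += (departure - purchase).total_seconds() / 86400
            tickets += 1
    return total_days / tickets if tickets else 0.0


# Hypothetical sample: two devices, three tickets (delays of 2, 1 and 3 days)
sample = defaultdict(list)
sample["dev-a"].append(["", "", "", "2019-01-01 00:00:00", "2019-01-03 00:00:00"])
sample["dev-a"].append(["", "", "", "2019-01-01 12:00:00", "2019-01-02 12:00:00"])
sample["dev-b"].append(["", "", "", "2019-02-01 00:00:00", "2019-02-04 00:00:00"])

print(mean_delay_days(sample))
```

If the header row was also appended to devicedict, as in the loading loop above, it would have to be skipped before parsing dates.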

Related

ValueError error in Python code when reading from CSV file

Hello, I am supposed to do the steps below. I have finished, but I am getting this error:
File "C:/Users/User/Desktop/question2.py", line 37, in
jobtype_salary[li['job']] = int(li['salary'])
ValueError: invalid literal for int() with base 10: 'SECRETARY
a. Read the file into a list of lists (14 rows, 5 columns)
b. Transform each row of the list into a dictionary. The keys are : ename, job, salary, comm, dno. Call the resulting list of dictionaries dict_of_emp
c. Display the table dict_of_emp, one row per line
d. Perform the following computations on dict_of_emp:
D1. Compute and print the incomes of Richard and Mary (add salary and comm)
D2. Compute and display the sum of salaries paid to each type of job (i.e. salary paid to analysts is 3500 + 3500 = 7000)
D3. Add 5000 to the salaries of employees in department 30. Display the new table
import csv
#Open the file in read mode
f = open("employeeData.csv",'r')
reader = csv.reader(f)
#To read the file into list of lists we use list() method
emps = list(reader)
#print(emps)
#Transform each row into a dictionary.
dict_of_emp = [] #list of dictionaries
for row in emps:
    d={}
    d['ename'] = row[0]
    d['job'] = row[1]
    d['salary']=row[2]
    d['comm']=row[3]
    d['dno']=row[4]
    dict_of_emp.append(d)
print("*************************************************")
#display the table dict_of_emp, one row per line.
for li in dict_of_emp:
    print(li)
print("*************************************************")
#Incomes of Richard and Mary, to add salary and commision, first we need to cast them to integers.
d1 = ['RICHARD','MARY']
for li in dict_of_emp:
    if li['ename'] in d1:
        print('income of ', li['ename']," is ",int(li['salary']+li['comm']))
print("*************************************************")
#Sum of salaries based on type of job, dictionary is used so the job type is key
#and sum of salary is value
jobtype_salary = {}
for li in dict_of_emp:
    if li['job'] in jobtype_salary.keys():
        jobtype_salary[li['job']] += int(li['salary'])
    else:
        jobtype_salary[li['job']] = int(li['salary'])
print(jobtype_salary)
print("*************************************************")
#Add 5000 to salaries of employees in department 30.
for li in dict_of_emp:
    if li['dno']=='30':
        li['salary']=int(li['salary'])+5000
for li in dict_of_emp:
    print(li)
Here is the csv as an image:
I think the indexing of your columns is slightly off. You do d['salary'] = row[2], which, according to the CSV, corresponds to the third column, i.e. the position of the person (SECRETARY, SALESPERSON). If you then try to convert this string to an integer, you get the error.
Does it run with this instead?
for row in emps:
    d={}
    d['ename'] = row[1]
    d['job'] = row[2]
    d['salary']=row[3]
    d['comm']=row[4]
    d['dno']=row[5]
    dict_of_emp.append(d)
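Another way to avoid miscounting positions is to name every column up front and let csv.DictReader do the mapping. This is only a sketch: the sample rows are invented, and the leading "empno" column is an assumption about what shifts the fields by one, since the real CSV is not shown.

```python
import csv
import io

# Hypothetical data: a leading employee-number column shifts every field by one.
SAMPLE = """7369,SMITH,CLERK,800,0,20
7499,ALLEN,SALESPERSON,1600,300,30
7788,SCOTT,ANALYST,3500,0,20
7902,FORD,ANALYST,3500,0,20
"""

# "empno" is an assumed name for the extra first column.
FIELDS = ["empno", "ename", "job", "salary", "comm", "dno"]

reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDS)
dict_of_emp = [dict(row) for row in reader]

# Sum of salaries per job type; .get() supplies 0 the first time a job is seen.
jobtype_salary = {}
for emp in dict_of_emp:
    job = emp["job"]
    jobtype_salary[job] = jobtype_salary.get(job, 0) + int(emp["salary"])

print(jobtype_salary)
```

With a real file, io.StringIO(SAMPLE) would be replaced by the opened file object, and a header row, if present, would need to be skipped or used in place of FIELDS.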

KeyError When Assigning Dictionary Keys and Values

I've recently begun learning Python and I wanted to write a script to extract the day of the month from a CSV column (formatted as YYYY/DD/MM) then compare website users to days of the month (and eventually weeks of the month) as a challenge/learning exercise. The gist is that it extracts the CSV info, formats it/converts to integers, and smushes it back together as a dictionary comparing days 1-31 with the number of site visitors.
My code is below. The error I'm receiving is 'KeyError: 1' on line 29, at result[days] = users. I think I understand what is happening (kind of: I'm guessing it's not happy with the way I'm trying to assign values to an empty dictionary? It seems to be looking for the integer 1 as a key but not finding it?) but I can't figure out what to do next. I'm about 2 weeks into learning Python so I hope this isn't too stupid a question. What am I doing wrong? How can I make the columns at index [0] and [1] of users_by_day the key and value in my dictionary?
Note: I am learning and using Python 3.
import csv
result = {}
with open('analytics.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    users_by_day = list(csv_reader)
    for row in users_by_day: #iterate through data
        day = row[0].split('/') #split date to extract day of month
        try: #skip unsplit cells
            day = day[1]
        except Exception as e:
            pass
        row[0] = day #set list column to extracted day value
    users_by_day = users_by_day[1:-1] #strip headers
    for row in users_by_day:
        days = None
        users = None
        days = int(row[0]) #set values to int for math
        users = int(row[1])
        if days is not None:
            if days in result: #trying to check for days in result
                result[days] = users #where key error occurs
            else:
                result[days] += users
print(result)
The setdefault() call on dictionaries is great for this kind of thing, and is preferable to the if {thing} in {dict} construct.
So the following code:
if days in result: # trying to check for days in result
    result[days] = users # where key error occurs
else:
    result[days] += users
Could become:
result.setdefault(days, 0)
result[days] += users
In the else branch, days is not in result, so the statement is certain to raise an error, because it uses a key that doesn't exist:
result[days] = result[days] + users
Did you really mean something like this?
if days is not None:
    if days not in result: #if result doesn't have that day yet
        result[days] = users #store the day with its first value
    else: #if result already has the day
        result[days] += users #add to the running total
Options aside from dicty.setdefault(key, val) and if key in dicty include:
Try / Except
try:
    dicty[key] += value
except KeyError:
    dicty[key] = value
collections.defaultdict()
from collections import defaultdict
dicty = defaultdict(int)
dicty[key] += value
The last line has the effect of dicty[key] = value or dicty[key] += value, as appropriate. What it really does is, if the key is not found, run dicty[key] = int() before running dicty[key] += value, so you'd have to be careful using this for *=.
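To compare the three options side by side, here is a small sketch that runs each of them over the same made-up (day, users) pairs. The rows list is hypothetical stand-in data, not the real analytics.csv.

```python
from collections import defaultdict

# Hypothetical (day, users) pairs standing in for the parsed CSV rows
rows = [(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]

# Option 1: setdefault creates the key with 0 if it is missing
with_setdefault = {}
for day, users in rows:
    with_setdefault.setdefault(day, 0)
    with_setdefault[day] += users

# Option 2: try the update, fall back to assignment on KeyError
with_try = {}
for day, users in rows:
    try:
        with_try[day] += users
    except KeyError:
        with_try[day] = users

# Option 3: defaultdict(int) supplies int() == 0 for missing keys
with_defaultdict = defaultdict(int)
for day, users in rows:
    with_defaultdict[day] += users

print(with_setdefault)
print(with_try)
print(dict(with_defaultdict))
```

All three build the same totals; which one reads best is mostly a matter of taste.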

How do I skip a file line if data already exists in the previous line?

Here's a function (read_file()) that receives an opened csv file. I use a for loop to go through every line. For every state, it should create a dictionary key for the state name, with its city and date as values in a list. However, if the city AND date are the same as in the previous line, that line should be skipped.
Here's the csv file:
State City Date
Michigan Detroit 3/31/00
Michigan Detroit 3/31/00
Michigan Detroit 3/31/00
Michigan Detroit 4/1/00
Michigan Detroit 4/2/00
Here's my code so far:
def read_file(fp):
    reader = csv.reader(fp)
    state = {}
    for line in reader:
        city = ''
        date = ''
        if line[1] != city:
            city = line[1]
        if line[2] != date:
            date = line[2]
    print(state)
The correct output:
state = {'Michigan': [['Detroit', '3/31/2000'],
['Detroit', '4/1/2000'],['Detroit', '4/2/2000']]}
Ingest your input the other way: make the dict key city + '|' + date. This will automatically eliminate any duplicates. You don't skip the line; you just let it overwrite the (identical) existing dict entry.
Then reverse the dictionary (switch value and key).
When you perform this reversal, split the new value at the vertical bar.
If memory is not a limiting issue, create a set to keep track of the elements that have already occurred. This does not prevent you from keeping the order of the elements, because you don't change the way you add to the lists.
When the element is already in the set (if element in set:), simply continue.
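The set-based suggestion could look roughly like the sketch below. It assumes the file is comma-separated with a header row (the sample in the question is shown without delimiters), and it keeps the dates exactly as they appear in the file rather than expanding them to four-digit years.

```python
import csv
import io


def read_file(fp):
    """Group [city, date] pairs by state, skipping repeated (state, city, date) rows."""
    reader = csv.reader(fp)
    next(reader, None)  # assumed header row: State,City,Date
    state = {}
    seen = set()  # every (state, city, date) combination already stored
    for line in reader:
        if len(line) < 3:
            continue  # ignore blank or short lines
        key = (line[0], line[1], line[2])
        if key in seen:
            continue  # duplicate: skip this line
        seen.add(key)
        state.setdefault(line[0], []).append([line[1], line[2]])
    return state


# Sample data from the question, written as CSV (the delimiter is an assumption)
SAMPLE = """State,City,Date
Michigan,Detroit,3/31/00
Michigan,Detroit,3/31/00
Michigan,Detroit,3/31/00
Michigan,Detroit,4/1/00
Michigan,Detroit,4/2/00
"""

result = read_file(io.StringIO(SAMPLE))
print(result)
```

Because lines are still appended in file order, the lists keep the original ordering; the set only decides whether a line is added at all.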

How to compare these data sets from a csv? Python 2.7

I have a project where I'm trying to create a program that will take a csv data set from www.transtats.gov which is a data set for airline flights in the US. My goal is to find the flight from one airport to another that had the worst delays overall, meaning it is the "worst flight". So far I have this:
import csv
with open('826766072_T_ONTIME.csv') as csv_infile: #import and open CSV
    reader = csv.DictReader(csv_infile)
    total_delay = 0
    flight_count = 0
    flight_numbers = []
    delay_totals = []
    dest_list = [] #create empty list of destinations
    for row in reader:
        if row['ORIGIN'] == 'BOS': #only take flights leaving BOS
            if row['FL_NUM'] not in flight_numbers:
                flight_numbers.append(row['FL_NUM'])
            if row['DEST'] not in dest_list: #if the dest is not already in the list
                dest_list.append(row['DEST']) #append the dest to dest_list
    for number in flight_numbers:
        for row in reader:
            if row['ORIGIN'] == 'BOS': #for flights leaving BOS
                if row['FL_NUM'] == number:
                    if float(row['CANCELLED']) < 1: #if the flight is not cancelled
                        if float(row['DEP_DELAY']) >= 0: #and the delay is greater or equal to 0 (some flights had negative delay?)
                            total_delay += float(row['DEP_DELAY']) #add time of delay to total delay
                            flight_count += 1 #add the flight to total flight count
    for row in reader:
        for number in flight_numbers:
            delay_totals.append(sum(row['DEP_DELAY']))
I was thinking that I could create a list of flight numbers and a list of the total delays from those flight numbers and compare the two and see which flight had the highest delay total. What is the best way to go about comparing the two lists?
I'm not sure if I understand you correctly, but I think you should use dict for this purpose, where key is a 'FL_NUM' and value is total delay.
In general I want to eliminate loops in Python code. For files that aren't massive I'll typically read through a data file once and build up some dicts that I can analyze at the end. The below code isn't tested because I don't have the original data but follows the general pattern I would use.
Since a flight is identified by the origin, destination, and flight number I would capture them as a tuple and use that as the key in my dict.
from collections import defaultdict
flight_delays = defaultdict(list) # look this up if you aren't familiar
for row in reader: # reader is the csv.DictReader from the question
    if row['ORIGIN'] == 'BOS': #only take flights leaving BOS
        if float(row['CANCELLED']) < 1: #skip cancelled flights
            flight = (row['ORIGIN'], row['DEST'], row['FL_NUM'])
            flight_delays[flight].append(float(row['DEP_DELAY']))
# Finished reading through data, now I want to calculate average delays
worst_flight = ""
worst_delay = 0
for flight, delays in flight_delays.items():
    average_delay = sum(delays) / len(delays)
    if average_delay > worst_delay:
        worst_flight = flight[0] + " to " + flight[1] + " on FL#" + flight[2]
        worst_delay = average_delay
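For anyone who wants to try the pattern without the original data, here is a self-contained sketch written for Python 3. The sample rows are invented; only the column names (ORIGIN, DEST, FL_NUM, CANCELLED, DEP_DELAY) come from the question. It picks the worst flight with max() instead of a manual comparison loop.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the column names from the question
SAMPLE = """ORIGIN,DEST,FL_NUM,CANCELLED,DEP_DELAY
BOS,JFK,100,0.00,10.00
BOS,JFK,100,0.00,30.00
BOS,ORD,200,0.00,5.00
BOS,ORD,200,1.00,
JFK,BOS,300,0.00,90.00
"""

flight_delays = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    if row["ORIGIN"] != "BOS":
        continue  # only flights leaving BOS
    if float(row["CANCELLED"]) >= 1:
        continue  # cancelled flights have no departure delay to count
    flight = (row["ORIGIN"], row["DEST"], row["FL_NUM"])
    flight_delays[flight].append(float(row["DEP_DELAY"]))

# Average delay per flight, then the flight with the largest average
averages = {flight: sum(d) / len(d) for flight, d in flight_delays.items()}
worst_flight = max(averages, key=averages.get)

print(worst_flight, averages[worst_flight])
```

Whether "worst" should mean the highest average or the highest total delay is a judgment call; swapping sum(d) / len(d) for sum(d) gives the total instead.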
A very simple solution would be to add two new variables:
max_delay = 0
delay_flight = 0
# Replace: if float(row['DEP_DELAY']) >= 0: with:
if float(row['DEP_DELAY']) > max_delay:
    max_delay = float(row['DEP_DELAY'])
    delay_flight = row['FL_NUM'] #save the flight number (or the row number) for reference

Searching next item in list if object isn't in the list

I'm attempting to learn how to search csv files. In this example, I've worked out how to search a specific column (date of birth) and how to search indexes within that column to get the year of birth.
I can search for greater than a specific year - e.g. typing in 45 will give me everyone born in or after 1945, but the bit I'm stuck on is if I type in a year not specifically in the csv/list I will get an error saying the year isn't in the list (which it isn't).
What I'd like to do is iterate through the years in the column until the next year that is in the list is found and print anything greater than that.
I've tried a few bits with iteration, but my brain has finally ground to a halt. Here is my code so far...
import csv
data=[]
with open("users.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
print(data)
lookup = input("Please enter a year of birth to start at (eg 67): ")
#lookupint = int(lookup)
#searching column 3 eg [3]
#but also searching index 6-8 in column 3
#eg [6:8] being the year of birth within the DOB field
col3 = [x[3][6:8] for x in data]
#just to check if col3 is showing the right data
print(col3)
print ("test3")
#looks in column 3 for 'lookup' which is a string
#in the table
if lookup in col3: #can get rid of this
    output = col3.index(lookup)
    print (col3.index(lookup))
    print("test2")
    for k in range (0, len(col3)):
        #looks for data that is equal or greater than YOB
        if col3[k] >= lookup:
            print(data[k])
Thanks in advance!
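One possible approach, sketched under assumptions: if the two-digit years are compared as integers, the entered year does not have to exist in the list at all, so there is nothing to look up with index(). The sample rows and the function name born_in_or_after are invented, the DOB is assumed to be DD/MM/YY in column 3, and the file is assumed to have no header row.

```python
# Hypothetical rows: DOB in column 3 as DD/MM/YY, so [6:8] is the two-digit year
data = [
    ["1", "Ann", "Lee", "12/03/44"],
    ["2", "Bob", "Ray", "01/07/52"],
    ["3", "Cat", "Fox", "23/11/67"],
    ["4", "Dan", "Kim", "05/05/70"],
]


def born_in_or_after(rows, year):
    """Return the rows whose two-digit birth year is >= year."""
    return [row for row in rows if int(row[3][6:8]) >= year]


lookup = 60  # not present in the data; with input() this would be int(input(...))
matches = born_in_or_after(data, lookup)

# The next year that actually occurs in the data, or None if there is none
next_year = min((int(row[3][6:8]) for row in matches), default=None)

for row in matches:
    print(row)
print(next_year)
```

Two-digit years only compare sensibly within a single century, so this sketch would need four-digit years if the data spans 19xx and 20xx.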
