Count multiple occurrences in imported .csv file - python

Starting from a large imported data set, I am trying to identify and print each line corresponding to a city that has at least 2 unique colleges/universities there.
So far (the relevant code):
for line in file:
    fields = line.split(",")
    ID, name, city = fields[0], fields[1], fields[3]
    count = line.count()
    if line.count(city) >= 2:
        if line.count(ID) < 2:
            print "ID:", ID, "Name: ", name, "City: ", city
In other words, I want to be able to eliminate 1) any duplicate school listings (by ID - this file has many institutions appearing repeatedly), 2) any cities that do not have two or more institutions there.
Thank you!

Dicts come in handy when you want to organize data by some key. In your case, nested dicts that first index by city and then by ID should do the trick.
# will hold cities[city][ID] = fields (the full record for that school)
cities = {}
for line in file:
    fields = line.split(",")
    ID, name, city = fields[0], fields[1], fields[3]
    cities.setdefault(city, {})[ID] = fields

# each value of 'cities' maps ID -> record for that city;
# keep the records for cities that have at least 2 distinct IDs
multi_schooled_cities = [list(ids_by_city.values())
                         for ids_by_city in cities.values()
                         if len(ids_by_city) >= 2]
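To actually print each qualifying record in the format you described, you can then walk over multi_schooled_cities afterwards. A minimal follow-up sketch in the same Python 2 style as your snippet, assuming the same field positions (ID, name, city at indexes 0, 1 and 3):
# print every unique school in every city that has at least two of them
for records in multi_schooled_cities:
    for fields in records:
        ID, name, city = fields[0], fields[1], fields[3]
        print "ID:", ID, "Name:", name, "City:", city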

Related

ValueError error in Python code when reading from CSV file

Hello, I am supposed to do the steps below. I have finished, but I am getting this error:
File "C:/Users/User/Desktop/question2.py", line 37, in <module>
    jobtype_salary[li['job']] = int(li['salary'])
ValueError: invalid literal for int() with base 10: 'SECRETARY'
a. Read the file into a list of lists (14 rows, 5 columns)
b. Transform each row of the list into a dictionary. The keys are : ename, job, salary, comm, dno. Call the resulting list of dictionaries dict_of_emp
c. Display the table dict_of_emp, one row per line
d. Perform the following computations on dict_of_emp:
D1. Compute and print the incomes of Richard and Mary (add salary and comm)
D2. Compute and display the sum of salaries paid to each type of job (i.e. salary paid to analysts is 3500 + 3500 = 7000)
D3. Add 5000 to the salaries of employees in department 30. Display the new table
import csv
# Open the file in read mode
f = open("employeeData.csv", 'r')
reader = csv.reader(f)
# To read the file into a list of lists we use the list() method
emps = list(reader)
#print(emps)
# Transform each row into a dictionary.
dict_of_emp = []  # list of dictionaries
for row in emps:
    d = {}
    d['ename'] = row[0]
    d['job'] = row[1]
    d['salary'] = row[2]
    d['comm'] = row[3]
    d['dno'] = row[4]
    dict_of_emp.append(d)
print("*************************************************")
#display the table dict_of_emp, one row per line.
for li in dict_of_emp:
print(li)
print("*************************************************")
#Incomes of Richard and Mary, to add salary and commision, first we need to cast them to integers.
d1 = ['RICHARD','MARY']
for li in dict_of_emp:
if li['ename'] in d1:
print('income of ', li['ename']," is ",int(li['salary']+li['comm']))
print("*************************************************")
#Sum of salaries based on type of job, dictionary is used so the job type is key
#and sum of salary is value
jobtype_salary = {}
for li in dict_of_emp:
if li['job'] in jobtype_salary.keys():
jobtype_salary[li['job']] += int(li['salary'])
else:
jobtype_salary[li['job']] = int(li['salary'])
print(jobtype_salary)
print("*************************************************")
#Add 5000 to salaries of employees in department 30.
for li in dict_of_emp:
if li['dno']=='30':
li['salary']=int(li['salary'])+5000
for li in dict_of_emp:
print(li)
Here is the csv as an image:
I think the indexing of your columns is slightly off. You do d['salary'] = row[2], which, according to the CSV, corresponds to the third column, i.e. the job of the person (SECRETARY, SALESPERSON). If you then try to convert this string to an integer, you get the error.
Does it run with this instead?
for row in emps:
    d = {}
    d['ename'] = row[1]
    d['job'] = row[2]
    d['salary'] = row[3]
    d['comm'] = row[4]
    d['dno'] = row[5]
    dict_of_emp.append(d)
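If that fixes it, you might also consider csv.DictReader, which builds the dictionaries for you when you hand it the field names. A sketch of the idea, assuming the file has no header row and an extra leading column (e.g. an employee number, named 'empno' here purely as a guess) before ename, as the shifted indices above suggest; adjust fieldnames to match your actual file:
import csv

# assumed layout: one extra leading column, then ename, job, salary, comm, dno
fieldnames = ['empno', 'ename', 'job', 'salary', 'comm', 'dno']
with open("employeeData.csv", newline='') as f:
    reader = csv.DictReader(f, fieldnames=fieldnames)
    dict_of_emp = [dict(row) for row in reader]

for li in dict_of_emp:
    print(li)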

How to output the number of restaurants in each category (such as: Italian, Japanese, Chinese)

Using Python and pandas, how can I output the number of restaurants in each category? I have a dataset of restaurants and a column called "categories" which contains categories such as Italian, Chinese, etc.
I want to list the TOP 10 categories by number of restaurants, from the most popular to the least.
The data is stored in a variable "filename", which holds the contents of a CSV file.
My Approach:
def myrest(filename, city):
    restaurants = filename[filename['categories'].str.contains('Restaurants')]
    restaurants.loc[restaurants.categories.str.contains('Italian'), 'category'] = 'Italian'
    restaurants.loc[restaurants.categories.str.contains('Japanese'), 'category'] = 'Japanese'
    print(restaurants.category[:10])
Output should be like :
Italian: 350 (350 signifies the number of Italian restaurants in the city),
Japanese: 250,
Korean: 140,
Turkish: 77
....
I am getting only the names of the restaurants, but not the count of how many there are, for example, in "Toronto".
If you want to count the values in the category column:
restaurants.categories.value_counts()
# or
restaurants.groupby('categories').count()
You will get a table of each restaurant type and the number of times it appears in the column.
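Since value_counts() returns the counts already sorted from most to least frequent, taking the top 10 is just a matter of slicing; a small sketch using the single-label category column created in your function:
# ten most frequent categories, most popular first
top10 = restaurants['category'].value_counts().head(10)
print(top10)
Note that rows whose category was never set (neither Italian nor Japanese in your example) hold NaN and are ignored by value_counts().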

How to dynamically create dictionaries of lists in python using a variables string data

I am looping through a CSV file and want to group all common rows, based on a given column, into a dictionary populated with the rest of that row's values.
example:
Date, Name, Gender, Address
01/02/2019, John Doe, Male, 1 example street
21/12/2018, Mary Robinson, Female, 2 Lane St.
05/06/2017, John Doe, Male, 1 example street
dates = []
names = []
genders = []
addresses = []
for row in readFile:
    date = row[0]
    name = row[1]
    gender = row[2]
    address = row[3]
    names.append(name)
    dates.append(date)
    genders.append(gender)
    addresses.append(address)
    if name not in names:
        # Create a dictionary named after the value of name. (exec?)
        # Then populate the dictionary with the rest of the row.
So I should end up with two dictionaries: one called John Doe and one called Mary Robinson.
eg:
MaryRobinson = {
    "date": "21/12/2018",
    "name": "Mary Robinson",
    "gender": "Female",
    "Address": "2 Lane St."
}
Perhaps I would be better off using a list, as I want to keep the option of storing more than one address.
I do not understand how to dynamically create a list from a variable value.
I have read it's bad practice.
Note: College assignment.
You could do it with exec if you wanted to, but I don't think that would be really useful in your case (you'd then have to evaluate the name strings again every time you want to access them). I'd suggest you build one "master" dictionary where the names are the keys and the per-person dictionaries are the values. It would be something along the lines of this (I have not actually run it):
data = {}
names = []
for row in readFile:
    date = row[0]
    name = row[1]
    gender = row[2]
    address = row[3]
    if name not in names:
        names.append(name)
        data[name] = {"date": date, "gender": gender, "address": address}
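If you also want to keep more than one address per person, as you mentioned, make the address entry a list and append to it instead of overwriting. A sketch along the same lines (again, not run against your exact file):
data = {}
for row in readFile:
    date, name, gender, address = row[0], row[1], row[2], row[3]
    if name not in data:
        # first time we see this name: create the record with empty lists
        data[name] = {"name": name, "gender": gender, "dates": [], "addresses": []}
    data[name]["dates"].append(date)
    if address not in data[name]["addresses"]:
        data[name]["addresses"].append(address)

# look a person up by name instead of creating a variable per person
print(data.get("Mary Robinson"))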

Fetching data from related tables

I have 3 tables: Continent, Country and Story.
Country has ForeignKey(Continent) and Story has ManyToManyField(Country, blank=True) field.
What I need is to get a list of countries which have at least one story belonging to them, and I need these countries grouped by continent.
How can I achieve that?
One way to do it is:
countries = {}
country_list = Country.objects.all()
for c in country_list:
    # get the stories that reference this country
    stories = Story.objects.filter(country_set__name__exact=c.name)
    # only if the country has stories
    if stories.count() > 0:
        # initialize the list for this continent if needed
        if c.continent.name not in countries:
            countries[c.continent.name] = []
        # finally we add the country
        countries[c.continent.name].append(c)
That will do the work.
Bye
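If you would rather let the ORM do the filtering, the same result can be built with a reverse lookup from Country to Story. A sketch, assuming the ManyToManyField on Story was declared without a related_name (so the default reverse name from Country is story) and that the ForeignKey field on Country is called continent:
from collections import defaultdict

# countries that appear in at least one story, with their continent fetched in the same query
countries_with_stories = (Country.objects
                          .filter(story__isnull=False)
                          .distinct()
                          .select_related('continent'))

by_continent = defaultdict(list)
for country in countries_with_stories:
    by_continent[country.continent.name].append(country)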

Python CSV search for specific values after reading into dictionary

I need a little help reading specific values into a dictionary using Python. I have a CSV file with user numbers, so user 1, 2, 3... Each user is within a specific department 1, 2, 3..., and each department is in a specific building 1, 2, 3... I need to know how I can list all the users in department 1 in building 1, then department 2 in building 1, and so on. I have been trying, and have read everything into a massive dictionary using csv.DictReader, but this would only work if I could search through which entries I read into each dictionary of dictionaries. Any ideas for how to sort through this file? The CSV has over 150,000 entries for users. Each row is a new user and it lists 3 attributes: user_name, department number, department building. There are 100 departments, 100 buildings and 150,000 users. Any ideas on a short script to sort them all out? Thanks for your help in advance.
A brute-force approach would look like
import csv
csvFile = csv.reader(open('myfile.csv'))
data = list(csvFile)
data.sort(key=lambda x: (x[2], x[1], x[0]))
It could then be extended to
import csv
import collections

csvFile = csv.reader(open('myfile.csv'))
data = collections.defaultdict(lambda: collections.defaultdict(list))
for name, dept, building in csvFile:
    data[building][dept].append(name)

buildings = data.keys()
buildings.sort()
for building in buildings:
    print "Building {0}".format(building)
    depts = data[building].keys()
    depts.sort()
    for dept in depts:
        print "  Dept {0}".format(dept)
        names = data[building][dept]
        names.sort()
        for name in names:
            print "   ", name
