I am trying to write a program in Python that adds up numbers across categories and sub-categories. The program is about a farmer's annual sales of produce from his farm. The text file we have to read from has 4 columns. The first column is the type of product, for example Vegetables, Fruits, Condiments. The second column is the specific product, for example Potatoes, Apples, Hot Sauce. The third column is the sales in 2014 and the fourth column is the sales in 2015. In this program, we only have to calculate the totals from the 2015 numbers. The 2014 numbers are present in the text file but are irrelevant.
Here is what the text file looks like:
PRODUCT,CATEGORY,2014 Sales,2015 Sales
Vegetables,Potatoes,4455,5644
Vegetables,Tomatoes,5544,6547
Vegetables,Peas,987,1236
Vegetables,Carrots,7877,8766
Vegetables,Broccoli,5564,3498
Fruits,Apples,398,4233
Fruits,Grapes,1099,1234
Fruits,Pear,2342,3219
Fruits,Bananas,998,1235
Fruits,Peaches,1678,1875
Condiments,Peanut Butter,3500,3902
Condiments,Hot Sauce,1234,1560
Condiments,Jelly,346,544
Condiments,Spread,2334,5644
Condiments,Ketchup,3321,3655
Condiments,Olive Oil,3211,2344
What we are looking to do is to add the sales for 2015 by products and then the total sales for everything in 2015.
The output should look something like this in the written text file:
Total sales for Vegetables in 2015 : {Insert total number here}
Total sales for Fruits in 2015 : {Insert total number here}
Total sales for Condiments in 2015 : {Insert total number here}
Total sales for the farmer in 2015: {Insert total for all the
products sold in 2015}
Along with that, it should also print the grand total on the Python run screen in the IDE along with the text file:
Total sales for the farmer in 2015: {Insert total for all the
products sold in 2015}
Here is my code. I am new to Python and reading and writing files so I can't really say if I am on the right track.
PRODUCT_FILE = "products.txt"
REPORT_FILE = "report.txt"

def main():
    #open the file
    productFile = open(PRODUCT_FILE, "r")
    reportFile = open(REPORT_FILE, "w")
    # reading the file
    proData = extractDataRecord(productFile)
    product = proData[0]
    category = proData[1]
    salesLastYear = prodata[2]
    salesThisYear = proData[3]
    #computing
    product = 0.0
    product = salesThisYear
    productFile.close()
    reportFile.close()

def extractDataRecord(infile) :
    line = infile.readline()
    if line == "" :
        return []
    else :
        parts = line.rsplit(",", 1)
        parts[1] = int(parts[1])
        return parts

# Start the program.
main()
You have a CSV file, so you should probably use Python's built-in csv module to parse it. The DictReader class turns every line into a dictionary keyed by the column headers. If your CSV file were called product_sales.csv, the following code would work.
import csv

product_dict = {}
cat_dict = {}
with open('product_sales.csv', 'r') as f:
    for line in csv.DictReader(f):
        # In this file the CATEGORY column holds the individual item
        # (Potatoes, Apples, ...) and PRODUCT holds the product type
        cat = line['CATEGORY']
        product = line['PRODUCT']
        sales_15 = int(line['2015 Sales'])
        if cat in cat_dict:
            cat_dict[cat] += sales_15
        else:
            cat_dict[cat] = sales_15
        if product in product_dict:
            product_dict[product] += sales_15
        else:
            product_dict[product] = sales_15
Total = sum(cat_dict.values())
print(product_dict)
print(cat_dict)
print(Total)
Your code looks like a good start; here are a few pointers:
It is a good idea to use a with statement when opening files because it guarantees they will be closed properly. Instead of
productFile = open(PRODUCT_FILE, "r")
# do something with the file
productFile.close()
you should do
with open(PRODUCT_FILE, "r") as product_file:
    # do something with the file
# the file has been closed!
You only call proData = extractDataRecord(productFile) once (i.e. you get the header line but none of the data). You could put it in a while loop, but it is much more idiomatic to iterate directly over the file, i.e.
next(product_file)  # skip the header line, which int() could not parse
for line in product_file:
    product, category, _, sales = line.split(',')
    sales = int(sales)
    # now do something with the values!
(using _ as a variable name is shorthand for "I don't care about this value")
you can use a dict to keep track of products and total sales for each,
product_sales = {}
then in the for loop,
product_sales[product] = product_sales.get(product, 0) + sales
If you are allowed to use from collections import defaultdict, this becomes even simpler:
product_sales = defaultdict(int)
product_sales[product] += sales
once you have processed the entire file, you need to report on the results like
all_sales = 0
for product, sales in product_sales.items():
    # write the sales for this product
    all_sales += sales
# print all_sales
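Putting these pointers together, a minimal sketch might look like the following. It assumes the file starts with the header line shown in the question and that the first column holds the product type. The function names summarize_sales and write_report and the in-memory sample are made up for illustration and are not part of the original code.

```python
import io
from collections import defaultdict


def summarize_sales(lines):
    """Return ({product_type: 2015 total}, grand_total) from an iterable of CSV lines."""
    product_sales = defaultdict(int)
    rows = iter(lines)
    next(rows, None)  # skip the header line (assumed to be present)
    for line in rows:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        product, _category, _sales_2014, sales_2015 = line.split(",")
        product_sales[product] += int(sales_2015)
    return dict(product_sales), sum(product_sales.values())


def write_report(totals, grand_total, out):
    """Write one line per product type, then the grand total, to a file-like object."""
    for product, sales in totals.items():
        print("Total sales for {} in 2015 : {}".format(product, sales), file=out)
    print("Total sales for the farmer in 2015: {}".format(grand_total), file=out)


# Small in-memory sample standing in for products.txt (a subset of the question's data)
sample = io.StringIO(
    "PRODUCT,CATEGORY,2014 Sales,2015 Sales\n"
    "Vegetables,Potatoes,4455,5644\n"
    "Vegetables,Peas,987,1236\n"
    "Fruits,Apples,398,4233\n"
)
totals, grand_total = summarize_sales(sample)
report = io.StringIO()  # stands in for the report file
write_report(totals, grand_total, report)
print("Total sales for the farmer in 2015: {}".format(grand_total))
```

With real files you would pass the objects from with open(PRODUCT_FILE) as product_file, open(REPORT_FILE, "w") as report_file: instead of the StringIO stand-ins.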
Related
I've been given a homework task to get data from a csv file without using Pandas. The info in the csv file contains headers such as...
work year:
experience level: EN Entry-level / Junior, MI Mid-level / Intermediate, SE Senior-level / Expert, EX Executive-level / Director
employment type: PT Part-time, FT Full-time, CT Contract, FL Freelance
job title:
salary:
salary currency:
salary_in_usd: The salary in USD
employee residence: Employee’s primary country of residence
remote ratio:
One of the questions is:
For each experience level, compute the average salary (over 3 years (2020/21/22)) for each job title?
The only way I've managed to do this is to iterate through the csv and add a load of 'if' statements according to the experience level and job title, but this is taking me forever.
Any ideas of how to tackle this differently? Not using any libraries/modules.
Example of my code:
with open('/Users/xxx/Desktop/ds_salaries.csv', 'r') as f:
    csv_reader = f.readlines()
    for row in csv_reader[1:]:
        new_row = row.split(',')
        experience_level = new_row[2]
        job_title = new_row[4]
        salary_in_usd = new_row[7]
        if experience_level == 'EN' and job_title == 'AI Scientist':
            en_ai_scientist += int(salary_in_usd)
            count_en_ai_scientist += 1
    avg_en_ai_scientist = en_ai_scientist / count_en_ai_scientist
    print(avg_en_ai_scientist)
When working out an example like this, I find it helpful to ask, "What data structure would make this question easy to answer?"
For example, the question asks
For each experience level, compute the average salary (over 3 years (2020/21/22)) for each job title?
To me, this implies that I want a dictionary keyed by a tuple of experience level and job title, with the salaries of every person who matches. Something like this:
data = {
("EN", "AI Scientist"): [1000, 2000, 3000],
("SE", "AI Scientist"): [2000, 3000, 4000],
}
The next question is: how do I get my data into that format? I would read the data in with csv.DictReader, and add each salary number into the structure.
import csv

data = {}
with open('input.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # column names assumed to match the CSV header
        experience_level = row['experience_level']
        job_title = row['job_title']
        key = experience_level, job_title
        if key not in data:
            # provide default value if no key exists
            # look at collections.defaultdict if you want to see a better way to do this
            data[key] = []
        # convert to int so the salaries can be summed later
        data[key].append(int(row['salary_in_usd']))
Now that you have your data organized, you can compute average salaries:
for (experience_level, job_title), salary_data in data.items():
    print(experience_level, job_title, sum(salary_data)/len(salary_data))
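Since the question asks for no libraries or modules at all, the same grouping idea can also be sketched with nothing but str.split and a plain dict. This is only a sketch under assumptions: the header names experience_level, job_title and salary_in_usd are guessed from the question, the sample rows are invented, and a plain split on commas will break if any field contains a quoted comma.

```python
def average_salaries(lines):
    """Map (experience_level, job_title) -> average salary in USD."""
    rows = iter(lines)
    header = next(rows).strip().split(",")
    # Look columns up by name instead of hard-coding positions
    level_col = header.index("experience_level")
    title_col = header.index("job_title")
    salary_col = header.index("salary_in_usd")

    salaries = {}
    for line in rows:
        fields = line.strip().split(",")
        if len(fields) < len(header):
            continue  # skip blank or short lines
        key = (fields[level_col], fields[title_col])
        # setdefault creates the empty list the first time a key is seen
        salaries.setdefault(key, []).append(int(fields[salary_col]))
    return {key: sum(values) / len(values) for key, values in salaries.items()}


# Hypothetical rows, not taken from the real data set
sample = [
    "work_year,experience_level,job_title,salary_in_usd\n",
    "2020,EN,AI Scientist,1000\n",
    "2021,EN,AI Scientist,3000\n",
    "2022,SE,AI Scientist,4000\n",
]
averages = average_salaries(sample)
for (level, title), avg in averages.items():
    print(level, title, avg)
```

An open file object can be passed in place of the sample list, because iterating over a file also yields one line at a time.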
I am new to Python and am trying to read a CSV file and generate a sales report. For example, I have a dataset like the one below
Category | State | Sales| Profits|
Clothes | California| 2389 | 608
Stationery| Georgia | 687 | 54
Gadgets | Washington| 324 | 90
How can I get the sum of the profit based on the state and category without using pandas? Meaning I need to sum the values of "Sales" and "Profit" when category is "Clothes".
I am currently using the code below, which requires a lot of manual effort.
with open(path,"r") as store_file:
    reader = csv.DictReader(superstore_file)
    total_sales_clothes = 0
    total_sales_stationery = 0
    total_sales_gadgets = 0
    for row in reader:
        category = row.get("Category")
        if category=="Clothes":
            sales_clothes = float(row.get("Sales"))
            total_sales_clothes += sales_clothes
        elif category=="Stationery":
            sales_stationery = float(row.get("Sales"))
            total_sales_stationery += sales_stationery
        elif category=="Gadgets":
            sales_office_gadgets = float(row.get("Sales"))
            total_sales_gadgets += sales_gadgets
print("Total Sales for Clothes is: {0}".format(total_sales_clothes))
print("Total Sales for Stationery is {0}".format(total_sales_stationery))
print("Total Sales for Gadgets is {0}".format(total_sales_gadgets))
You can use a Python dict. This way, you don't have to hardcode the categories.
A Python dictionary is a key-value data structure, which is extremely useful here.
In your case, the keys would be the categories and the values would be the total sales.
E.g.
{ "Clothes" : 2389, "Stationery" : 0, "Gadgets" : 0, }
Edit : Note that you should check if the category exists. If it does, just add the sales value, else just assign the value.
import csv

with open(path, "r") as store_file:
    reader = csv.DictReader(store_file)
    total_sales = {}
    for row in reader:
        category = row.get("Category")
        # read the Sales column, not a column named after the category
        sales = float(row.get("Sales"))
        if category in total_sales:
            total_sales[category] += sales
        else:
            total_sales[category] = sales
print("Total Sales for Clothes is: {0}".format(total_sales['Clothes']))
If you want to traverse through the dict and print sales for all the categories use
for key in total_sales:
    # str() is needed because a float cannot be concatenated to a string
    print("Total Sales for " + key + " is : " + str(total_sales[key]))
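The question also asks for both Sales and Profits, grouped by state and category. The same dictionary idea extends to that by keying on a (state, category) tuple. The sketch below assumes the real file is comma-separated with the headers Category, State, Sales and Profits; the function name and the third sample row are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def totals_by_state_and_category(csv_file):
    """Return {(state, category): {"Sales": total, "Profits": total}}."""
    # Each new key starts with both running totals at zero
    totals = defaultdict(lambda: {"Sales": 0.0, "Profits": 0.0})
    for row in csv.DictReader(csv_file):
        key = (row["State"], row["Category"])
        totals[key]["Sales"] += float(row["Sales"])
        totals[key]["Profits"] += float(row["Profits"])
    return dict(totals)


# In-memory stand-in for the real file (assumed comma-separated)
sample = io.StringIO(
    "Category,State,Sales,Profits\n"
    "Clothes,California,2389,608\n"
    "Stationery,Georgia,687,54\n"
    "Clothes,California,100,10\n"  # made-up extra row to show accumulation
)
result = totals_by_state_and_category(sample)
for (state, category), sums in result.items():
    print(state, category, sums["Sales"], sums["Profits"])
```

To group by category only, use row["Category"] alone as the key.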
And, though you mentioned you don't want to use Pandas, I would suggest looking into it if you are going to work with this kind of CSV dataset for a long period. Once you start, you'll find how much easier it makes your job. You will save more time than you spend learning Pandas.
I am trying to write code that will handle my input file of numbers and then perform various operations on them. For example, the first column is a name, the second is an hourly rate, and the third is hours. The file looks like this:
John 15 8
Sam 10 4
Mike 16 10
John 19 15
I want to go through the file and, if a name is a duplicate (John in the example), average the 2nd number (hourly rate), sum the 3rd number (hours), and delete the duplicate, leaving 1 John with the average wage and total hours. If a name is not a duplicate, it should just output the original entry.
I cannot figure out how to keep track of the duplicate, and then move on to the next line in the row. Is there any way to do this without using line.split()?
This problem is easier if you break it up into parts.
First, you want to read through the file and parse each line into three variables, the name, the hourly rate, and the hours.
Second, you need to handle the matching on the first value (the name). You need some kind of data structure to store values in; a dict is probably the right thing here.
Thirdly, you need to compute the average at the end (you can't compute it along the way because you need the count of values).
Putting it together, I would do something like this:
class PersonRecord:
    def __init__(self, name):
        self.name = name
        self.hourly_rates = []
        self.total_hours = 0

    def add_record(self, hourly_rate, hours):
        self.hourly_rates.append(hourly_rate)
        self.total_hours += hours

    def get_average_hourly_rate(self):
        return sum(self.hourly_rates) / len(self.hourly_rates)

def compute_person_records(data_file_path):
    person_records = {}
    with open(data_file_path, 'r') as data_file:
        for line in data_file:
            # split() with no argument also copes with extra whitespace
            parts = line.split()
            name = parts[0]
            hourly_rate = int(parts[1])
            hours = int(parts[2])
            person_record = person_records.get(name)
            if person_record is None:
                person_record = PersonRecord(name)
                person_records[name] = person_record
            person_record.add_record(hourly_rate, hours)
    return person_records

def main():
    # 'data.txt' is a placeholder; use the path to your input file
    person_records = compute_person_records('data.txt')
    for person_name, person_record in person_records.items():
        print('{name} {average_hourly_rate} {total_hours}'.format(
            name=person_name,
            average_hourly_rate=person_record.get_average_hourly_rate(),
            total_hours=person_record.total_hours))

if __name__ == '__main__':
    main()
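As for doing this without line.split(): one option is to let the csv module do the splitting by telling it that the delimiter is a space. The following is only a sketch of that idea, not a claim that it is better than split(). The function name summarize_people is made up, and the sample text stands in for the real file.

```python
import csv
import io


def summarize_people(data_file):
    """Return {name: (average hourly rate, total hours)}."""
    rates = {}
    hours = {}
    # skipinitialspace tolerates repeated spaces between columns
    for row in csv.reader(data_file, delimiter=" ", skipinitialspace=True):
        if not row:
            continue  # blank line
        name, rate, worked = row[0], int(row[1]), int(row[2])
        rates.setdefault(name, []).append(rate)
        hours[name] = hours.get(name, 0) + worked
    # The average needs the count, so it is computed once all rows are read
    return {name: (sum(r) / len(r), hours[name]) for name, r in rates.items()}


# In-memory stand-in for the input file from the question
sample = io.StringIO("John 15 8\nSam 10 4\nMike 16 10\nJohn 19 15\n")
summary = summarize_people(sample)
for name, (avg_rate, total_hours) in summary.items():
    print(name, avg_rate, total_hours)
```

Keeping track of duplicates falls out of the dict: the first time a name is seen it gets a new entry, and every later line with the same name updates that entry.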
Here we go. Just groupby the name and aggregate on the rate and hours taking the mean and sum as shown below.
#assume d is the name of your DataFrame.
d.groupby(by =['name']).agg({'rate': "mean", 'hours':'sum'})
Here's a version that's not particularly efficient. I wouldn't run it on lots of data, but it's easy to read and returns your data to its original form, which is apparently what you want...
from statistics import mean
# named text rather than input so the built-in input() is not shadowed
text = '''John 15 8
Sam 10 4
Mike 16 10
John 19 15'''
lines = text.splitlines()
data = [line.split(' ') for line in lines]
names = set([item[0] for item in data])
processed = [(name, str(mean([int(i[1]) for i in data if i[0] == name])), str(sum([int(i[2]) for i in data if i[0] == name]))) for name in names]
joined = [' '.join(p) for p in processed]
line_joined = '\n'.join(joined)
print(line_joined)
a = []  # list to store all the rows
while True:  # keep reading lines until the input ends
    try:
        l = input().split()
        a.append(l)
    except EOFError:
        break
for i in a:
    m = [i]  # temporary list holding this row and its duplicates
    # walk backwards so that popping does not shift indices still to be visited
    for j in range(len(a) - 1, a.index(i), -1):
        if i[0] == a[j][0]:
            m.append(a[j])  # collect the duplicate
            a.pop(j)  # remove the duplicate from the main list
    if len(m) > 1:
        hr = 0  # running total of hourly rates
        hrs = 0  # running total of hours
        for k in m:
            hr += int(k[1])
            hrs += int(k[2])
        i[1] = hr / len(m)  # average hourly rate
        i[2] = hrs  # total hours (a sum, not an average)
for i in a:
    print(i[0], i[1], i[2])  # print the final list
See the comments in the code for an explanation.
You can do:
from collections import defaultdict

with open('file_name') as fd:
    data = fd.read().splitlines()

line_elems = []
for line in data:
    line_elems.append(line.split())

a_dict = defaultdict(list)
for e in line_elems:
    a_dict[e[0]].append((e[1], e[2]))

final_dict = {}
for key in a_dict:
    hour_rates = [float(x[0]) for x in a_dict[key]]
    hours = [float(x[1]) for x in a_dict[key]]
    # a single entry averages to itself, so no special case is needed
    ave_rate = sum(hour_rates) / len(hour_rates)
    total_hours = sum(hours)
    final_dict[key] = (ave_rate, total_hours)
print(final_dict)
# write to file or do whatever
I am trying to run the code below to read the CSV file so that I can search by month and get that month's utility charges. If I put a print statement above "try", it prints every line. However, when I try to print (or reference) the bills defaultdict that it should create, it is empty. Any ideas what I might be doing wrong here? Thanks in advance for any help on this!
import csv
from collections import defaultdict, namedtuple

source = 'C:/Users/George/PycharmProjects/Day6-Monthly-bills/Monthly Bills record - Sheet3 (1).csv'
Bills = namedtuple('Utilities', 'Month Gas Electric Water Total')

def lookup_bills_by_month(data=source):
    bills = defaultdict(list)
    with open(data, encoding='utf-8') as f:
        for line in csv.DictReader(f):
            try:
                month = line['Month']
                electric = int(line['Electric'])
                gas = int(line['Gas'])
                water = int(line['Water'])
                total = int(line['Total'])
            except ValueError:
                continue
            b = Bills(Gas=gas, Electric=electric, Water=water, Total=total)
            bills[month].append(b)
    print(bills)
    return bills

bills = lookup_bills_by_month()
print(bills)
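I can't tell from the post what the CSV actually contains, so this is only a guess at how to narrow it down. Because except ValueError: continue silently drops every row where int() fails, an empty result is what you would see if all rows fail (for example, if the amounts carry a currency symbol or decimals). The sketch below records the skipped rows and the reason instead of hiding them. The sample data is made up, and the sketch also passes Month to the namedtuple, which the call in the question omits.

```python
import csv
import io
from collections import defaultdict, namedtuple

Bills = namedtuple('Utilities', 'Month Gas Electric Water Total')


def lookup_bills_by_month(csv_file):
    """Return (bills, skipped), where skipped lists the rows that failed to parse."""
    bills = defaultdict(list)
    skipped = []
    for line in csv.DictReader(csv_file):
        try:
            b = Bills(
                Month=line['Month'],
                Gas=int(line['Gas']),
                Electric=int(line['Electric']),
                Water=int(line['Water']),
                Total=int(line['Total']),
            )
        except (ValueError, KeyError) as exc:
            # keep the reason instead of hiding it
            skipped.append((line, repr(exc)))
            continue
        bills[b.Month].append(b)
    return bills, skipped


# Made-up sample: the second row has a currency symbol, so int() rejects it
sample = io.StringIO(
    "Month,Gas,Electric,Water,Total\n"
    "January,40,60,20,120\n"
    "February,$35,55,20,110\n"
)
bills, skipped = lookup_bills_by_month(sample)
print(dict(bills))
for row, reason in skipped:
    print("skipped:", row, reason)
```

If every row of the real file ends up in skipped, the printed reasons should show which column is not a plain integer.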
I am working on a Python project where a food processing company is trying to calculate its total sales for the year. Python has to read from a text file that is divided into four columns separated by commas. The first column is the type of product, which can be cereal, chocolate, candy, etc. produced by the company. The second column is the brand of said product, for example Kaptain Krunch for cereal or Coco Jam for chocolate. The third column is the sales for the last fiscal year (2014) and the last column is the sales for this fiscal year (2015). Note that only sales for fiscal year 2015 are to be calculated. The 2014 numbers have no use in this program, but they are there. Here is what the text file looks like. Its name is product.txt
Cereal,Magic Balls,2200,2344
Cereal,Kaptain Krunch,3300,3123
Cereal,Coco Bongo,1800,2100
Cereal,Sugar Munch,4355,6500
Cereal,Oats n Barley,3299,5400
Sugar Candy,Pop Rocks,546,982
Sugar Candy,Lollipop,1233,1544
Sugar Candy,Gingerbud,2344,2211
Sugar Candy,Respur,1245,2211
Chocolate,Coco Jam,3322,4300
Chocolate,Larkspur,1600,2200
Chocolate,Mighty Milk,1234,2235
Chocolate,Almond Berry,998,1233
Condiments,Peanut Butter,3500,3902
Condiments,Hot Sauce,1234,1560
Condiments,Jelly,346,544
Condiments,Spread,2334,5644
What we are looking to do is to add up the sales for fiscal year 2015 by product type and then compute the total sales for everything in 2015.
The output should look something like this in the written text file:
Total sales for cereal in 2015 : {Insert total number here}
Total sales for Sugar Candy in 2015 : {Insert total number here}
Total sales for Chocolate in 2015 : {Insert total number here}
Total sales for Condiments in 2015 : {Insert total number here}
Total sales for the company in 2015: {Insert total for all the
products sold in 2015}
Along with that, it should also print the grand total on the Python run screen in the IDE along with the text file.
Total sales for the company in 2015: {Insert total for all the
products sold in 2015}
Here is my code. I am new to Python and reading and writing files so I can't really say if I am on the right track.
PRODUCT_FILE = "products.txt"
REPORT_FILE = "report.txt"

def main():
    #open the file
    productFile = open(PRODUCT_FILE, "r")
    reportFile = open(REPORT_FILE, "w")
    # reading the file
    proData = extractDataRecord(productFile)
    product = proData[0]
    category = proData[1]
    salesLastYear = prodata[2]
    salesThisYear = proData[3]
    #computing
    product = 0.0
    product = salesThisYear
    productFile.close()
    reportFile.close()

def extractDataRecord(infile) :
    line = infile.readline()
    if line == "" :
        return []
    else :
        parts = line.rsplit(",", 1)
        parts[1] = int(parts[1])
        return parts

# Start the program.
main()
The short version here is that you're doing this wrong. Never roll your own parsing code if you can help it. I'd suggest taking a look at the built-in csv module and using it to "contract out" the CSV parsing, letting you focus on the rest of the logic.
Simple rewrite and completed code with csv:
import collections
import csv

PRODUCT_FILE = "products.txt"
REPORT_FILE = "report.txt"

def main():
    # Easy way to get a dictionary where lookup defaults to 0
    categorycounts = collections.defaultdict(int)
    #open the files using with statements to ensure they're closed properly
    # without the need for an explicit call to close, even on exceptions
    with open(PRODUCT_FILE, newline='') as productfile,\
         open(REPORT_FILE, "w") as reportfile:
        pcsv = csv.reader(productfile)
        # Sum sales by product type letting csv parse
        # Filter removes empty rows for us; assume all other rows complete
        for category, brand, sales_lastyear, sales_thisyear in filter(None, pcsv):
            categorycounts[category] += int(sales_thisyear)
        # Print categories in sorted order with their total sales
        for category, sales in sorted(categorycounts.items()):
            print('Total sales for', category, 'in 2015:', sales, file=reportfile)
        print('-'*80, file=reportfile) # Separator line between categories and total
        # Sum and print total sales to both file and screen
        totalsales = sum(categorycounts.values())
        print("Total sales for the company in 2015:", totalsales, file=reportfile)
        print("Total sales for the company in 2015:", totalsales)

if __name__ == '__main__':
    main()