reading and writing files for a food processing company

I am working on a Python project where a food processing company is trying to calculate its total sales for the year. Python has to read from a text file that is divided into four categories separated by commas. The first category is the type of product, which can be cereal, chocolate, candy, etc. produced by the company. The second category is the brand of that product, for example, Kaptain Krunch for cereal or Coco Jam for chocolate. The third category is the sales for the last fiscal year (2014) and the last category is the sales for this fiscal year (2015). Note that only sales for fiscal year 2015 are to be calculated. The 2014 figures have no use in this program, but they are there. Here is what the text file looks like. Its name is product.txt
Cereal,Magic Balls,2200,2344
Cereal,Kaptain Krunch,3300,3123
Cereal,Coco Bongo,1800,2100
Cereal,Sugar Munch,4355,6500
Cereal,Oats n Barley,3299,5400
Sugar Candy,Pop Rocks,546,982
Sugar Candy,Lollipop,1233,1544
Sugar Candy,Gingerbud,2344,2211
Sugar Candy,Respur,1245,2211
Chocolate,Coco Jam,3322,4300
Chocolate,Larkspur,1600,2200
Chocolate,Mighty Milk,1234,2235
Chocolate,Almond Berry,998,1233
Condiments,Peanut Butter,3500,3902
Condiments,Hot Sauce,1234,1560
Condiments,Jelly,346,544
Condiments,Spread,2334,5644
What we are looking to do is to add up the sales for fiscal year 2015 by product type and then the total sales for everything in 2015.
The output in the written text file should look something like this:
Total sales for cereal in 2015 : {Insert total number here}
Total sales for Sugar Candy in 2015 : {Insert total number here}
Total sales for Chocolate in 2015 : {Insert total number here}
Total sales for Condiments in 2015 : {Insert total number here}
Total sales for the company in 2015: {Insert total for all the
products sold in 2015}
Along with that, it should also print the grand total on the Python run screen in the IDE along with the text file.
Total sales for the company in 2015: {Insert total for all the
products sold in 2015}
Here is my code. I am new to Python and reading and writing files so I can't really say if I am on the right track.
PRODUCT_FILE = "products.txt"
REPORT_FILE = "report.txt"
def main():
    #open the file
    productFile = open(PRODUCT_FILE, "r")
    reportFile = open(REPORT_FILE, "w")
    # reading the file
    proData = extractDataRecord(productFile)
    product = proData[0]
    category = proData[1]
    salesLastYear = prodata[2]
    salesThisYear = proData[3]
    #computing
    product = 0.0
    product = salesThisYear
    productFile.close()
    reportFile.close()
def extractDataRecord(infile) :
    line = infile.readline()
    if line == "" :
        return []
    else :
        parts = line.rsplit(",", 1)
        parts[1] = int(parts[1])
        return parts
# Start the program.
main()

The short version here is that you're doing this wrong. Never roll your own parsing code if you can help it. I'd suggest taking a look at the built-in csv module and using it to "contract out" the CSV parsing, letting you focus on the rest of the logic.
A simple rewrite with completed code using csv:
import collections
import csv
PRODUCT_FILE = "products.txt"
REPORT_FILE = "report.txt"
def main():
    # Easy way to get a dictionary where lookup defaults to 0
    categorycounts = collections.defaultdict(int)
    # Open the files using with statements to ensure they're closed properly
    # without the need for an explicit call to close, even on exceptions
    with open(PRODUCT_FILE, newline='') as productfile,\
         open(REPORT_FILE, "w") as reportfile:
        pcsv = csv.reader(productfile)
        # Sum sales by product type letting csv parse
        # Filter removes empty rows for us; assume all other rows complete
        for category, brand, sales_lastyear, sales_thisyear in filter(None, pcsv):
            categorycounts[category] += int(sales_thisyear)
        # Print categories in sorted order with their total sales
        for category, sales in sorted(categorycounts.items()):
            print('Total sales for', category, 'in 2015:', sales, file=reportfile)
        print('-'*80, file=reportfile) # Separator line between categories and total
        # Sum and print total sales to both file and screen
        totalsales = sum(categorycounts.values())
        print("Total sales for the company in 2015:", totalsales, file=reportfile)
        print("Total sales for the company in 2015:", totalsales)
if __name__ == '__main__':
    main()
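As a small illustration of why handing the parsing to csv pays off, the sketch below compares a plain str.split with csv.reader on a line whose brand contains a comma. The sample row is made up for this example and is not part of products.txt.

```python
import csv

# Hypothetical row: the brand itself contains a comma, so it is quoted.
sample = 'Cereal,"Nuts, Bolts & Oats",1200,1500'

# Naive splitting cuts the quoted brand in two.
naive_fields = sample.split(",")

# csv.reader accepts any iterable of lines and honours the quoting.
csv_fields = next(csv.reader([sample]))

print(len(naive_fields), naive_fields)
print(len(csv_fields), csv_fields)
```

The hand-split version ends up with five fields instead of four, so unpacking it into category, brand and the two sales figures would fail, while the csv version keeps the brand in one piece.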

Related

compare two data in a columns using one csv file in python

I am trying to compare two people's data in one csv file, and I cannot use pandas.
What I am trying to get is the total units sold by each of the two people across all the years, and then compare who sold more based on those totals. I also want to get the year in which each of them sold the least.
For example, my .csv is setup like this:
John Smith, 343, 2020
John Smith, 522, 2019
John Smith, 248, 2018
Sherwin Cooper, 412, 2020
Sherwin Cooper, 367, 2019
Sherwin Cooper, 97, 2018
Dorothy Lee, 612, 2020
Dorothy Lee, 687, 2019
Dorothy Lee, 591, 2018
I want to compare John's and Dorothy's units sold and see who sold more. So the output should be:
Dorothy Lee sold more units than John smith. A total of 1890 to 1113.
Dorothy Lee sold less in 2018, for only 591.
John Smith sold less in 2018, for only 248.
My code so far is:
import csv
def compare1(employee1):
    with open("employeedata.csv") as file:
        rows = list(csv.DictReader(file, fieldnames = ['c1', 'c2', 'c3']))
        res = {}
        for row in rows:
            if row['c1'] == employee1:
                res[employee1] = res.get(employee1, 0) + int(row['c2'])
        print(res)
def compare2(employee2):
    with open("employee2.csv") as file:
        rows = list(csv.DictReader(file, fieldnames = ['c1', 'c2', 'c3']))
        res = {}
        for row in rows:
            if row['c1'] == employee2:
                res[employee2] = res.get(employee2, 0) + int(row['c2'])
        print(res)
employee1 = input("Enter the first name: ")
employee2 = input("Enter the first name: ")
compare1(employee1)
compare2(employee2)
I don't know the rest. I am stuck. I am a beginner and I can't use pandas. The output I need should look like this:
Dorothy Lee sold more units than John smith. A total of 1890 to 1113.
Dorothy Lee sold less in 2018, for only 591.
John Smith sold less in 2018, for only 248.
Right now I get the output:
{'John Smith': 1113}
{'Dorothy Lee': 1890}

Suppose my.csv has columns name, sales, year:
import pandas as pd
emp_df = pd.read_csv("my.csv")
# Total sales per person, as a Series indexed by name
emp_totals = emp_df.groupby("name")["sales"].sum()
def compare(saler1, saler2):
    if saler1 in emp_totals.index and saler2 in emp_totals.index:
        saler1_tol = emp_totals[saler1]
        saler2_tol = emp_totals[saler2]
        if saler1_tol > saler2_tol:
            print(f"{saler1} sold more units than {saler2}. A total of {saler1_tol} to {saler2_tol}.")
        else:
            print(f"{saler2} sold more units than {saler1}. A total of {saler2_tol} to {saler1_tol}.")
        # For each person, keep the row with the lowest sales
        emp_min = emp_df.loc[emp_df.groupby("name")["sales"].idxmin()].set_index("name")
        for saler in (saler1, saler2):
            print(f"{saler} sold less in {emp_min.loc[saler, 'year']}, for only {emp_min.loc[saler, 'sales']}.")
    else:
        print("names of salers are not in the table")

Instead of creating a function for each result you want to get, first create a database (a dict is OK) that aggregates the sum of units sold for each name and for each year. Then it is easier to answer all kinds of comparisons without having to repeat code. You can start with something like this:
import csv
from collections import defaultdict
db=defaultdict(lambda: defaultdict(int))
with open('teste.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        db[row['name']][int(row['year'])]+=int(row['units'])
print(db['Dorothy Lee'][2019]) #Units sold by Dorothy Lee in 2019
print(sum(db['Dorothy Lee'].values())) #Total Units sold by Dorothy Lee
Don't be afraid of defaultdict (a class in the collections module). Check the docs; it is really handy in this kind of scenario. A defaultdict is a dictionary that supplies a default for every missing key. In this case, the default value of the first defaultdict is another defaultdict, this time with a default value of 0 (the result of calling int()), since we want to compute a sum of units sold (therefore an integer).
With this approach, you don't need to check whether the key already exists; defaultdict takes care of that for you.
PS: the lambda in the first defaultdict is needed to nest a second defaultdict. If you are not familiar with lambda either, it is worth looking up in the Python documentation.
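Applied to the data from the question, the idea could be sketched as follows. This assumes the file has no header row and a space after each comma, as in the sample shown in the question; the helper names load_sales and compare are made up for illustration.

```python
import csv
from collections import defaultdict

def load_sales(lines):
    """Build {name: {year: units}} from rows shaped like 'name, units, year'."""
    db = defaultdict(lambda: defaultdict(int))
    # skipinitialspace drops the blank that follows each comma in the sample.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:  # ignore empty lines
            continue
        name, units, year = row
        db[name][int(year)] += int(units)
    return db

def compare(db, first, second):
    """Return report lines for two people who are already present in db."""
    totals = {name: sum(db[name].values()) for name in (first, second)}
    winner, loser = sorted((first, second), key=totals.get, reverse=True)
    report = [f"{winner} sold more units than {loser}. "
              f"A total of {totals[winner]} to {totals[loser]}."]
    for name in (winner, loser):
        # Year in which this person sold the fewest units.
        year = min(db[name], key=db[name].get)
        report.append(f"{name} sold less in {year}, for only {db[name][year]}.")
    return report

# Rows copied from the question; any iterable of lines works, including a file.
sample = [
    "John Smith, 343, 2020", "John Smith, 522, 2019", "John Smith, 248, 2018",
    "Sherwin Cooper, 412, 2020", "Sherwin Cooper, 367, 2019", "Sherwin Cooper, 97, 2018",
    "Dorothy Lee, 612, 2020", "Dorothy Lee, 687, 2019", "Dorothy Lee, 591, 2018",
]
db = load_sales(sample)
print("\n".join(compare(db, "John Smith", "Dorothy Lee")))
```

With a real file, the same function can be fed an open file object, for example inside with open("employeedata.csv", newline="") as f: followed by db = load_sales(f).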

Python: Getting values of one column based on another column name

I am new to Python and am trying to read a csv file and generate a sales report. For example, I have a dataset like the one below:
Category   | State      | Sales | Profits
Clothes    | California | 2389  | 608
Stationery | Georgia    | 687   | 54
Gadgets    | Washington | 324   | 90
How can I get the sum of the profit based on the state and category without using pandas? That is, I need to sum the values of "Sales" and "Profits" when the category is "Clothes".
I am currently using the code below, which requires a lot of manual effort.
import csv
with open(path,"r") as store_file:
    reader = csv.DictReader(store_file)
    total_sales_clothes = 0
    total_sales_stationery = 0
    total_sales_gadgets = 0
    for row in reader:
        category = row.get("Category")
        if category=="Clothes":
            sales_clothes = float(row.get("Sales"))
            total_sales_clothes += sales_clothes
        elif category=="Stationery":
            sales_stationery = float(row.get("Sales"))
            total_sales_stationery += sales_stationery
        elif category=="Gadgets":
            sales_gadgets = float(row.get("Sales"))
            total_sales_gadgets += sales_gadgets
print("Total Sales for Clothes is: {0}".format(total_sales_clothes))
print("Total Sales for Stationery is {0}".format(total_sales_stationery))
print("Total Sales for Gadgets is {0}".format(total_sales_gadgets))

You can use a Python dict. This way, you don't have to hardcode the categories.
The Python dictionary is a key-value data structure, which is extremely useful.
In your case, the keys would be the categories and the values would be the total sales.
E.g.
{ "Clothes" : 2389, "Stationery" : 0, "Gadgets" : 0, }
Edit: Note that you should check whether the category already exists. If it does, just add the sales value; otherwise, assign the value.
with open(path,"r") as store_file:
    reader = csv.DictReader(store_file)
    total_sales = {}
    for row in reader:
        category = row.get("Category")
        # Look up the "Sales" column; the category is only the dict key
        if category in total_sales:
            total_sales[category] += float(row.get("Sales"))
        else:
            total_sales[category] = float(row.get("Sales"))
print("Total Sales for Clothes is: {0}".format(total_sales['Clothes']))
If you want to traverse the dict and print the sales for all the categories, use
for key in total_sales:
    print("Total Sales for " + key + " is : " + str(total_sales[key]))
And, though you mentioned you don't want to use pandas, I would suggest looking into it if you are going to work with these kinds of CSV datasets for a long time. Once you start, you'll find how easy it makes your job. You will save more time than you spend learning pandas.
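Since the question asks for both "Sales" and "Profits", the same dictionary idea can be sketched with collections.defaultdict, which removes the need for the existence check. This is only a sketch: it assumes a comma-separated file with the header names Category, State, Sales and Profits, the function name totals_by_category is hypothetical, and the last sample row is invented so that one category appears twice.

```python
import csv
from collections import defaultdict

def totals_by_category(lines, columns=("Sales", "Profits")):
    """Sum the given numeric columns per category without pandas."""
    # Every new category starts with all requested columns at 0.0.
    totals = defaultdict(lambda: dict.fromkeys(columns, 0.0))
    for row in csv.DictReader(lines):
        for column in columns:
            totals[row["Category"]][column] += float(row[column])
    return totals

# Any iterable of lines works here, including an open file object.
sample = [
    "Category,State,Sales,Profits",
    "Clothes,California,2389,608",
    "Stationery,Georgia,687,54",
    "Gadgets,Washington,324,90",
    "Clothes,Georgia,111,12",  # invented row, only to show accumulation
]
for category, sums in totals_by_category(sample).items():
    print(f"Total Sales for {category} is: {sums['Sales']} (profits: {sums['Profits']})")
```

To group by state and category together, the dictionary key could be the tuple (row["State"], row["Category"]) instead of the category alone.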

My dictionary is empty after try loop

I am trying to run the code below to read a csv file so that I can search by month to get that month's utility charges. If I put a print statement above "try", it prints every line. However, when I try to print (or reference) the bills defaultdict that it should create, it is empty. Any ideas what I might be doing wrong here? Thanks in advance for any help on this!
import csv
from collections import defaultdict, namedtuple
source = 'C:/Users/George/PycharmProjects/Day6-Monthly-bills/Monthly Bills record - Sheet3 (1).csv'
Bills = namedtuple('Utilities', 'Month Gas Electric Water Total')
def lookup_bills_by_month(data=source):
    bills = defaultdict(list)
    with open(data, encoding='utf-8') as f:
        for line in csv.DictReader(f):
            try:
                month = line['Month']
                electric = int(line['Electric'])
                gas = int(line['Gas'])
                water = int(line['Water'])
                total = int(line['Total'])
            except ValueError:
                continue
            b = Bills(Gas=gas, Electric=electric, Water=water, Total=total)
            bills[month].append(b)
    print(bills)
    return bills
bills = lookup_bills_by_month()
print(bills)

Calculate totals by reading from a text file in Python

I am trying to write a program in Python in which we have to add up the numbers from different categories and sub-categories. The program is about a farmer's annual sale of produce from his farm. The text file we have to read from has 4 categories. The first category is the type of product, for example Vegetables, Fruits, Condiments. The second category tells us which specific product we have, for example Potatoes, Apples, Hot Sauce. The third category tells us about the sales in 2014 and the fourth category tells us about the sales in 2015. In this program, we only have to calculate the totals from the 2015 numbers. The 2014 numbers are present in the text file but are irrelevant.
Here is what the text file looks like:
PRODUCT,CATEGORY,2014 Sales,2015 Sales
Vegetables,Potatoes,4455,5644
Vegetables,Tomatoes,5544,6547
Vegetables,Peas,987,1236
Vegetables,Carrots,7877,8766
Vegetables,Broccoli,5564,3498
Fruits,Apples,398,4233
Fruits,Grapes,1099,1234
Fruits,Pear,2342,3219
Fruits,Bananas,998,1235
Fruits,Peaches,1678,1875
Condiments,Peanut Butter,3500,3902
Condiments,Hot Sauce,1234,1560
Condiments,Jelly,346,544
Condiments,Spread,2334,5644
Condiments,Ketchup,3321,3655
Condiments,Olive Oil,3211,2344
What we are looking to do is to add the sales for 2015 by products and then the total sales for everything in 2015.
The output should look something like this in the written text file:
Total sales for Vegetables in 2015 : {Insert total number here}
Total sales for Fruits in 2015 : {Insert total number here}
Total sales for Condiments in 2015 : {Insert total number here}
Total sales for the farmer in 2015: {Insert total for all the
products sold in 2015}
Along with that, it should also print the grand total on the Python run screen in the IDE along with the text file:
Total sales for the farmer in 2015: {Insert total for all the
products sold in 2015}
Here is my code. I am new to Python and reading and writing files so I can't really say if I am on the right track.
PRODUCT_FILE = "products.txt"
REPORT_FILE = "report.txt"
def main():
    #open the file
    productFile = open(PRODUCT_FILE, "r")
    reportFile = open(REPORT_FILE, "w")
    # reading the file
    proData = extractDataRecord(productFile)
    product = proData[0]
    category = proData[1]
    salesLastYear = prodata[2]
    salesThisYear = proData[3]
    #computing
    product = 0.0
    product = salesThisYear
    productFile.close()
    reportFile.close()
def extractDataRecord(infile) :
    line = infile.readline()
    if line == "" :
        return []
    else :
        parts = line.rsplit(",", 1)
        parts[1] = int(parts[1])
        return parts
# Start the program.
main()

You have a csv file, so you should probably use Python's built-in csv module to parse it. The DictReader class turns every line into a dictionary whose keys are the column headers. If your csv file was called product_sales.csv, the following code would work.
import csv
product_dict = {}
cat_dict = {}
with open('product_sales.csv', 'r') as f:
    for line in csv.DictReader(f):
        cat = line['CATEGORY']
        product = line['PRODUCT']
        sales_15 = int(line['2015 Sales'])
        if cat in cat_dict:
            cat_dict[cat] += sales_15
        else:
            cat_dict[cat] = sales_15
        if product in product_dict:
            product_dict[product] += sales_15
        else:
            product_dict[product] = sales_15
Total = sum(cat_dict.values())
# print() calls so the code runs on Python 3
print(product_dict)
print(cat_dict)
print(Total)

Your code looks like a good start; here are a few pointers:
It is a good idea to use with when opening files because it guarantees they will get closed properly. Instead of
productFile = open(PRODUCT_FILE, "r")
# do something with the file
productFile.close()
you should do
with open(PRODUCT_FILE, "r") as product_file:
    # do something with the file
# the file has been closed!
You only call proData = extractDataRecord(productFile) once (i.e. you get the header line but none of the data). You could put it in a while loop, but it is much more idiomatic to iterate directly on the file, i.e.
next(product_file)  # skip the header line so int() only sees numbers
for line in product_file:
    product, category, _, sales = line.split(',')
    sales = int(sales)
    # now do something with the values!
(using _ as a variable name is shorthand for "I don't care about this value")
You can use a dict to keep track of products and the total sales for each,
product_sales = {}
then in the for loop,
product_sales[product] = product_sales.get(product, 0) + sales
If you can use from collections import defaultdict, this becomes even simpler:
product_sales = defaultdict(int)
product_sales[product] += sales
Once you have processed the entire file, you need to report on the results, like
all_sales = 0
for product, sales in product_sales.items():
    # write the sales for this product
    all_sales += sales
# print all_sales
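Putting these pointers together, a complete version might look like the sketch below. It assumes the first line of the input is the header shown in the question and that every other non-empty line has exactly four comma-separated fields; the function name build_report is made up for the example.

```python
from collections import defaultdict

def build_report(product_path, report_path):
    """Sum the 2015 sales per product type, write the report, return the total."""
    product_sales = defaultdict(int)
    with open(product_path, "r") as product_file:
        next(product_file, None)  # skip the header line
        for line in product_file:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            product, _, _, sales = line.split(",")
            product_sales[product] += int(sales)

    all_sales = sum(product_sales.values())
    total_line = f"Total sales for the farmer in 2015: {all_sales}"
    with open(report_path, "w") as report_file:
        for product, sales in product_sales.items():
            report_file.write(f"Total sales for {product} in 2015 : {sales}\n")
        report_file.write(total_line + "\n")
    print(total_line)  # the grand total also goes to the screen
    return all_sales
```

Calling build_report("products.txt", "report.txt") would then produce the report file and print the grand total. A plain split(",") is enough only as long as no field contains a comma; otherwise the csv module is the safer choice.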

How to continue writing the loop to the file and how to reset the initial value to 0 after each condition met?

How do I write the file without repeating in the loop?
def main():
    infile=open("sales.txt","w")
    infilelist=[]
    #call function
    totalsale_each,salesperson=sales(infile,infilelist)
    for salesamount in infilelist:
        infile.write(salesamount)
    infile.close()
def sales(infile, infilelist):
    #set initial
    t_sales=0
    n_sales=int(input("Number of sales"))
    while t_sales<n_sales:
        #increasement
        t_sales+=1
        #for loop
        for s in range(1,n_sales+1):
            totalsale_each=0 #set the acc for total sales by each sales person
            sales_person=input("sales person name:")
            print("sales for sales no. "+str(s)+"by"+sales_person+":")
            infilelist.append(sales_person)
            for count in range (1,5): #assuming one sales person can sell max and min of 5item per cust
                sales=float(input('sales#' +str(count)+ ':'))
                if sales<=300:
                    t_sales=t_sales+sales
                    totalsale_each=totalsale_each+sales
                if sales>300: #if sales>300, need to change the sales person
                    #that sales person needs to finish serving the rest of that cust's items
                    t_sales=t_sales+sales
                    sales_person=input("another sales pesron")
                    infilelist.append(sales_person)
                infilelist.append(str(totalsale_each)) #to write total sales for each person
    return totalsale_each,sales_person
main()
This is my python test:
Number of sales: 2
sales person name:A
sales for sales no.1 by A:
sales#1:100
sales#2:150
sales#3:350
another sales pesronB
sales#4:200
sales person name:C
sales for sales no.2 by C:
sales#1:200
sales#2:500
another sales pesronD
sales#3:500
another sales pesronE
sales#4:200
The file I am getting after running is
A
100.0
250.0
B
250.0
450.0
C
200.0
D
200.0
E
200.0
400.0
but what I want to get is like below:
A
600
B
200
C
700
D
500
E
200
How can I correct it?
I can't figure out where to put infilelist.append(sales_person) and infilelist.append(totalsale_each) after asking for the next sales person when sales exceed 300.
Thank you

If I understand correctly, when the sales go over 300, you want to write the total sales of that person, and ask for the next one.
So, what you need to do is, add the total of sales for each person, after calculating the total sales, which means after the loop in range(5).
Also, you need to stop calculating sales after you find that a sales person has sold over 300. So, you need to break the loop at that moment, and continue normally.
The problem, here, is to go to the next sales person after that. Either you continue the loop and it will prompt as it would for any other name, OR you should ask for it in another form, and insert it in the array, and make sure it doesn't ask for another name like it would normally. The first option is simpler, if you need the second say so in the comment.
As a result, I would suggest these modifications to the sales function:
def sales(infile, infilelist):
    #set initial
    t_sales=0
    n_sales=int(input("Number of sales"))
    while t_sales<n_sales:
        #increasement
        t_sales+=1
        #for loop
        for s in range(1,n_sales+1):
            totalsale_each=0 #set the acc for total sales by each sales person
            # EDIT: this seems useless, so remove it
            #sales_no=s*1
            sales_person=input("sales person name:")
            print("sales for sales no."+str(s)+"by"+sales_person+":")
            infilelist.append(sales_person)
            for count in range (5): #assuming one sales person can sell only 5item per cust
                sales=float(input('sales#' +str(count)+ ':'))
                if sales<=300:
                    t_sales=t_sales+sales
                    totalsale_each=totalsale_each+sales
                if sales>300: #if sales>300, need to change the sales person
                    t_sales=t_sales+sales
                    # write the last sales_person's total, and ask for another one
                    infilelist.append(str(totalsale_each))
                    sales_person=input("another sales pesron")
                    # add it to the list, after that, everything counts for him
                    infilelist.append(sales_person)
                    totalsale_each=sales
            # EDIT: decrease indentation so that the total is written once, after the loop
            infilelist.append(str(totalsale_each)) #to write total sales for each person
    return totalsale_each,sales_person
EDIT: I edited the code: removed the break, write the last one's total, ask for another sales person, reset the total counter, and write everything in his name.
Also, a note about the return statement (maybe you already know that): it will return the last sales person's total sales and name.
However, I would suggest rewriting the whole thing so that you save the totals in a dictionary, with the key being the sales person's name or index (sales_no). After that, you write them all to the list.
EDIT 2: If you notice, after adding the new sales_person (infilelist.append(sales_person)), I am adding the sales to totalsale_each. It seems what you want is the opposite, so you should do this before adding the new sales person, which gives you:
if sales>300:
    t_sales=t_sales+sales
    totalsale_each += sales # the last sale counts for the last person
    infilelist.append(str(totalsale_each))
    sales_person=input("another sales pesron")
    infilelist.append(sales_person)
    totalsale_each = 0 # reset the sales for the next person
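The dictionary-based rewrite suggested above could be sketched as follows. This is one possible shape rather than the original author's code: the names tally_sales and format_totals, the ask parameter, and the four items per customer (taken from range(1,5) in the question) are assumptions made for illustration.

```python
def tally_sales(ask=input, items_per_customer=4, limit=300):
    """Return {sales person: total}, in the order the names were entered.

    ask is any callable that behaves like input(); passing a scripted one
    makes the function easy to try without typing.
    """
    totals = {}
    n_sales = int(ask("Number of sales: "))
    for s in range(1, n_sales + 1):
        person = ask("sales person name: ")
        totals.setdefault(person, 0.0)
        for count in range(1, items_per_customer + 1):
            amount = float(ask(f"sales#{count}: "))
            # The sale always counts for whoever is currently serving.
            totals[person] += amount
            if amount > limit:
                # Hand over: the remaining items count for the next person.
                person = ask("another sales person: ")
                totals.setdefault(person, 0.0)
    return totals


def format_totals(totals):
    """One name per line followed by that person's total."""
    return "".join(f"{name}\n{total:g}\n" for name, total in totals.items())


# Scripted replay of the test session from the question.
answers = iter("2 A 100 150 350 B 200 C 200 500 D 500 E 200".split())
totals = tally_sales(ask=lambda prompt: next(answers))
print(format_totals(totals), end="")
```

Because the totals live in a dictionary keyed by name, nothing has to be reset by hand when the sales person changes, and a name entered twice simply keeps accumulating. Writing the file then reduces to a single write of format_totals(totals) inside a with open("sales.txt", "w") block.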
