How do I manipulate data in a list that has been read in from a file using Python 2.x?


I am trying to create a program that will tally the cost of ingredients within a recipe and return a total cost for said recipe. I am teaching myself Python and have set this as a personal, but practical, challenge. However, I have hit a wall. Hard.
My idea was to read a file into a list. Multiply the ingredient within the list by the comma separated numeral. Add it all together, and return a single float for the overall cost.
#Phase 1 - MASTER INGREDIENTS LIST
flour_5lb = 2.5
sugar_4lb = 2.0
butter_lb = 3.0
eggs_doz = 3.0
#PHASE 2 - COST PER UNIT CONVERSION
flour_cup = flour_5lb*(1.0/20)
sugar_cup = sugar_4lb*(1.0/8)
butter_Tbsp = butter_lb*(1.0/32)
eggs_each = eggs_doz*(1.0/12)
#PHASE THREE - RECIPE ASSESSMENT
def main():
    fileObject = open("filname.txt", "r")
    fileLines = fileObject.readlines()
    fileObject.close()
    for line in fileLines:
        print line
    print "\n"

if __name__ == "__main__":
    main()
The for line in fileLines: statement prints the following:
flour_cup, .5
milk_cup, .4
eggs_each, 3
butter_Tbsp, 3
Press any key to continue . . .

If I understand you correctly, you have to parse your file.
For this you need to know the format in which the ingredients are being stored. Since this program is for your personal use you may just choose the simplest one.
So let's assume you have your ingredients in a simple whitespace-separated format, one ingredient per line:
sugar 10g
flour 20g
...
Then you can use Python's built-in string method split and iteration to obtain a list of lists [['sugar', '10g'], ['flour', '20g'], ...].
Getting the amounts into Python floats is a little tricky, since we have to concern ourselves with the units.
Again - choose a fixed set of units to make your life a little easier.
Then use the in operator or the built-in string method which checks if a given string has a certain suffix. (I will leave it to you to find this method.)
Then the hard part is done. Hope I could help without giving too much away.
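For readers who want to compare against their own attempt, here is one minimal sketch of the approach described above. It assumes whitespace-separated lines and a single unit (grams); the names `UNIT` and `parse_ingredients` are hypothetical and do not come from the question.

```python
UNIT = "g"  # assumption: every amount is given in grams


def parse_ingredients(lines):
    """Turn lines like 'sugar 10g' into [('sugar', 10.0), ...]."""
    parsed = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, amount = parts
        if not amount.endswith(UNIT):
            raise ValueError("unexpected unit in %r" % line)
        # drop the unit suffix before converting to a float
        parsed.append((name, float(amount[:-len(UNIT)])))
    return parsed


amounts = parse_ingredients(["sugar 10g", "flour 20g", ""])
```

Supporting more units would probably mean replacing the single `UNIT` constant with a table of conversion factors.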

Part of your difficulty is knowing how to split your input on the comma -- use split(). Another problem is converting the string to a float -- use float().
Your last problem is mapping input strings to values. You could write a function that maps strings to costs:
def get_cost(item):
    if item == "milk_cup":
        return milk_cup
    if item == "flour_cup":
        return flour_cup
    ...
...but the better way (DRY) to do it is to use a dictionary.
In my sample below I've used dict() to make the dictionary as then I don't have to quote every string.
Here's a sample:
#!/usr/bin/python
pricelist = dict(
    flour_cup=1.0,
    milk_cup=0.4,
)
input = ["flour_cup, 0.5", "milk_cup, 0.4"]
total = 0
for line in input:
    item, qty = line.split(",")
    item = item.strip()
    qty = float(qty)
    if item in pricelist:
        cost = qty * pricelist[item]
        print "%s: %.02f\n" % (item, cost)
        total += cost
    else:
        print "I don't know what '%s' is" % item
print "Total: %.02f" % total

Related

Python - For every value in text file?

With the code I am writing, I have split a text file up using commas, and now I want to turn each value in it into an integer. I have tried splitting the text file and then turning it into an integer, but that did not work. Is there any way of saying "for all values in a file, do a certain thing"? Also, the number of values isn't fixed; it depends on the user of the programme (it is a 'shopping list' programme).
My current code:
TotalCOSTS=open("TotalCOSTS.txt","r")
Prices=TotalCOSTS.read()
print(Prices)
Prices.strip().split(",")
IntPrices=int(NewPrices)
print(len(IntPrices))
if len(IntPrices)==1:
    print("Your total cost is: "+IntPrices +" Pounds")
elif len(IntPrices)>1:
    FinalTotal = sum([int(num) for num in IntPrices.split(",")])
    print("Your total cost is: "+ FinalTotal +" Pounds")
Prices is the file that the values are contained in, so I've stripped it of whitespace and then split it. That is where I need to continue on from.
Thank you xx

results = [int(i) for i in results]
In Python 3 you can also do:
results = list(map(int, results))

NewPrices isn't defined in your code example.
split() returns a list; it does not change Prices in place, so you have to assign its result to a variable.
The most straight-forward way to accomplish what you are trying to do is the following:
total = sum([int(x) for x in TotalCOSTS.read().split(',') if x.strip().isdigit()])
But this makes some super-simplifying assumptions which won't be accurate all of the time. For example, if something costs $2.99, isdigit() returns False for '2.99', so that price is silently skipped (and calling int('2.99') directly would raise a ValueError). Overall, you want to consider the price in terms of cents (I don't know which currency you are using, but in USD, 100 cents = 1 dollar) so that $2.99 = 299 cents.
So really, you want something like this:
total = sum([int(round(float(x) * 100)) for x in TotalCOSTS.read().split(',') if x.strip()]) / 100.0
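The idea of summing in whole cents (or pence) can be sketched as a small function. This is only an illustration: `total_in_pounds` is a hypothetical name, and the sketch assumes every non-empty field is a valid number (anything else makes `float()` raise a `ValueError`).

```python
def total_in_pounds(text):
    """Sum comma-separated prices such as '2.99, 3, 0.50' in whole pence."""
    total_pence = 0
    for field in text.split(","):
        field = field.strip()
        if not field:
            continue  # ignore empty fields, e.g. after a trailing comma
        # round before converting so 2.99 * 100 becomes 299, not 298
        total_pence += int(round(float(field) * 100))
    return total_pence / 100.0


total = total_in_pounds("2.99, 3, 0.50,\n")
```

Keeping the running total as an integer avoids accumulating floating-point error across many prices; only the final division produces a float.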

Update: Python average income reading and writing files

I was writing a code to find the average household income, and how many families are below poverty line.
this is my code so far
def povertyLevel():
    inFile = open('program10.txt', 'r')
    outFile = open('program10-out.txt', 'w')
    outFile.write(str("%12s %12s %15s\n" % ("Account #", "Income", "Members")))
    lineRead = inFile.readline() # Read first record
    while lineRead != '': # While there are more records
        words = lineRead.split() # Split the records into substrings
        acctNum = int(words[0]) # Convert first substring to integer
        annualIncome = float(words[1]) # Convert second substring to float
        members = int(words[2]) # Convert third substring to integer
        outFile.write(str("%10d %15.2f %10d\n" % (acctNum, annualIncome, members)))
        lineRead = inFile.readline() # Read next record
    # Close the file.
    inFile.close() # Close file

# Call the main function.
povertyLevel()
I am trying to find the average of annualIncome, and what I tried to do was
avgIncome = (sum(annualIncome)/len(annualIncome))
outFile.write(avgIncome)
I did this inside the while lineRead loop. However, it gave me an error saying
avgIncome = (sum(annualIncome)/len(annualIncome))
TypeError: 'float' object is not iterable
Currently I am trying to find which households exceed the average income.

sum() and len() expect a sequence (such as a list) (Thanks for the correction, Magenta Nova.), but their argument annualIncome is a float:
annualIncome = float(words[1])
It seems to me you want to build up a list:
allIncomes = []
while lineRead != '':
    ...
    allIncomes.append(annualIncome)
avgIncome = sum(allIncomes) / len(allIncomes)
(Note that the average is computed one indentation level out, after the loop has finished.)
Also, once you get this working, I highly recommend a trip over to https://codereview.stackexchange.com/. You could get a lot of feedback on ways to improve this.
Edit:
In light of your edits, my advice still stands. You need to first compute the average before you can do comparisons. Once you have the average, you will need to loop over the data again to compare each income. Note: I advise saving the data somehow for the second loop, instead of reparsing the file. (You may even wish to separate reading the data from computing the average entirely.) That might best be accomplished with a new object or a namedtuple or a dict.
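A minimal sketch of that two-pass idea, separating parsing from the computation and keeping each record in a namedtuple. The names `Household`, `parse_households` and `above_average`, and the sample records, are made up for illustration; the three-column layout is taken from the question's code.

```python
from collections import namedtuple

# hypothetical record type: one entry per line of the input file
Household = namedtuple("Household", ["acct_num", "income", "members"])


def parse_households(lines):
    """Convert 'acct income members' lines into Household records."""
    households = []
    for line in lines:
        words = line.split()
        if len(words) != 3:
            continue  # skip blank or malformed lines
        households.append(
            Household(int(words[0]), float(words[1]), int(words[2])))
    return households


def above_average(households):
    """Return the average income and the households that exceed it."""
    average = sum(h.income for h in households) / len(households)
    return average, [h for h in households if h.income > average]


# made-up sample data standing in for the lines of the input file
sample = ["1001 20000.0 4", "1002 50000.0 2", "1003 35000.0 3"]
average, richer = above_average(parse_households(sample))
```

Because the records are kept in memory, the second pass loops over the list instead of reparsing the file.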

sum() takes an iterable as its argument, and len() takes a sized container such as a list. Read the Python documentation for more on iterables. You are passing a float into them as an argument. What would it mean to get the sum, or the length, of a floating point number? Even thinking outside the world of coding, it's hard to make sense of that.
It seems like you need to review the basics of Python types.

Python: creating a dictionary that writes high scores to a file

First: you don't have to code this for me, unless you're a super awesome nice guy. But since you're all great at programming and understand it so much better than me and all, it might just be easier (since it's probably not too many lines of code) than writing paragraph after paragraph trying to make me understand it.
So - I need to make a list of high scores that updates itself upon new entries. So here it goes:
First step - done
I have player-entered input, which has been taken as a data for a few calculations:
import time
import datetime
print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
time1 = datetime.datetime.now()
a = raw_input("Enter weight: ")
b = raw_input("Enter height: ")
c = a/b
Second step - making high score list
Here, I would need some sort of a dictionary or a thing that would read the previous entries and check if the score (c) is (at least) better than the score of the last one in "high scores", and if it is, it would prompt you to enter your name.
After you entered your name, it would post your name, your a, b, c, and time in a high score list.
This is what I came up with, and it definitely doesn't work:
list = [("CPU", 200, 100, 2, time1)]
player = "CPU"
a = 200
b = 100
c = 2
time1 = "20.12.2012, 21:38"
list.append((player, a, b, c, time1))
list.sort()
import pickle
scores = open("scores", "w")
pickle.dump(list[-5:], scores)
scores.close()
scores = open("scores", "r")
oldscores = pickle.load(scores)
scores.close()
print oldscores()
I know I did something terribly stupid, but anyways, thanks for reading this and I hope you can help me out with this one. :-)

First, don't use list as a variable name. It shadows the built-in list object. Second, avoid using just plain date strings, since it is much easier to work with datetime objects, which support proper comparisons and easy conversions.
Here is a full example of your code, with individual functions to help divide up the steps. I am trying not to use any more advanced modules or functionality, since you are obviously just learning:
import os
import datetime
import cPickle

# a constant we can use to define our score file location
SCORES_FILE = "scores.pickle"

def get_user_data():
    time1 = datetime.datetime.now()
    print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
    a = None
    while True:
        a = raw_input("Enter weight: ")
        try:
            a = float(a)
        except ValueError:
            # not a number: ask again
            continue
        else:
            break
    b = None
    while True:
        b = raw_input("Enter height: ")
        try:
            b = float(b)
        except ValueError:
            # not a number: ask again
            continue
        else:
            break
    c = a/b
    return ['', a, b, c, time1]

def read_high_scores():
    # initialize an empty score file if it does
    # not exist already, and return an empty list
    if not os.path.isfile(SCORES_FILE):
        write_high_scores([])
        return []
    # pickle data is binary, so open the file in binary mode
    with open(SCORES_FILE, 'rb') as f:
        scores = cPickle.load(f)
    return scores

def write_high_scores(scores):
    with open(SCORES_FILE, 'wb') as f:
        cPickle.dump(scores, f)

def update_scores(newScore, highScores):
    # reuse an anonymous function for looking
    # up the `c` (4th item) score from the object
    key = lambda item: item[3]
    # make a local copy of the scores
    highScores = highScores[:]
    lowest = None
    if highScores:
        lowest = min(highScores, key=key)
    # only add the new score if the high scores
    # are empty, or it beats the lowest one
    if lowest is None or (newScore[3] > lowest[3]):
        newScore[0] = raw_input("Enter name: ")
        highScores.append(newScore)
    # take only the highest 5 scores and return them
    highScores.sort(key=key, reverse=True)
    return highScores[:5]

def print_high_scores(scores):
    # loop over scores using enumerate to also
    # get an int counter for printing
    for i, score in enumerate(scores):
        name, a, b, c, time1 = score
        # #1 50.0 jdi (20.12.2012, 15:02)
        print "#%d\t%s\t%s\t(%s)" % \
            (i+1, c, name, time1.strftime("%d.%m.%Y, %H:%M"))

def main():
    score = get_user_data()
    highScores = read_high_scores()
    highScores = update_scores(score, highScores)
    write_high_scores(highScores)
    print_high_scores(highScores)

if __name__ == "__main__":
    main()
What it does now is add a new score only if there are no high scores yet or if it beats the lowest one. You could modify it to always add a new score when there are fewer than 5 previous scores, instead of requiring it to beat the lowest one, and perform the lowest-score check only once the list holds 5 or more entries.
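That modification might look roughly like the sketch below. To keep it short it works on plain numbers rather than the five-element score records used above, and the names `MAX_SCORES`, `qualifies` and `add_score` are hypothetical.

```python
MAX_SCORES = 5  # assumption: same table size as in the answer above


def qualifies(new_value, high_values, limit=MAX_SCORES):
    """Return True if new_value belongs in a table of `limit` scores."""
    if len(high_values) < limit:
        return True  # table not full yet: always accept
    return new_value > min(high_values)


def add_score(new_value, high_values, limit=MAX_SCORES):
    """Return a new table with new_value added if it qualifies."""
    if not qualifies(new_value, high_values, limit):
        return list(high_values)
    # sort from highest to lowest and keep only the top `limit` entries
    return sorted(list(high_values) + [new_value], reverse=True)[:limit]
```

With real score records, the comparison and the sort would use the score field (index 3) through a key function, as `update_scores` already does.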

The first thing I noticed is that you did not tell list.sort() that the sorting should be based on the score. By default, list.sort() will use Python's default ordering for tuples, which compares entries by their first element (i.e. the name), then moves on to the second element, the third element and so on. So, you have to tell list.sort() which item to use for sorting:
from operator import itemgetter
[...]
list.sort(key=itemgetter(3))
This will sort entries based on the item with index 3 in each tuple, i.e. the fourth item.
Also, print oldscores() will definitely not work since oldscores is not a function, hence you cannot call it with the () operator. print oldscores is probably better.

Here are the things I notice.
These lines seem to be in the wrong order:
print "Current time:", time1.strftime("%d.%m.%Y, %H:%M")
time1 = datetime.datetime.now()
When the user enters the height and weight, they are going to be read in as strings, not integers, so you will get a TypeError on this line:
c = a/b
You could solve this by casting a and b to float like so:
a = float(raw_input("Enter weight: "))
But you'll probably need to wrap this in a try/except block, in case the user puts in garbage, basically anything that can't be cast to a float. Put the whole thing in a while block until they get it right.
So, something like this:
b = None
while b is None:
    try:
        b = float(raw_input("Enter height: "))
    except ValueError:
        print "Height should be entered using only digits, like '187'"
So, on to the second part: you shouldn't use list as a variable name, since it's a built-in. I'll use high_scores.
# Add one default entry to the list
high_scores = [("CPU", 200, 100, 2, "20.12.2012, 4:20")]
You say you want to check the player score against the high score, to see if it's best, but if that's the case, why a list? Why not just a single entry? Anyhow, that's confusing me; I'm not sure if you really want a high score list, or just one high score.
So, let's just add the score, no matter what:
# Assume you've gotten their name into the name variable.
high_scores.append((name, a, b, c, time1))
Then apply the other answer from @Tamás.

You definitely don't want a dictionary here. The whole point of a dictionary is to be able to map keys to values, without any sorting. What you want is a sorted list. And you've already got that.
Well, as Tamás points out, you've actually got a list sorted by the player name, not the score. On top of that, you want to sort in descending order, not ascending. You could use the decorate-sort-undecorate pattern, or a key function, or whatever, but you need to do something. Also, you've put it in a variable named list, which is a very bad idea, because that's already the name of the list type.
Anyway, you can find out whether to add something into a sorted list, and where to insert it if so, using the bisect module in the standard library. But it's probably simpler to just use something like SortedCollection or blist.
Here's an example:
highscores = SortedCollection(scores, key=lambda x: -x[3])
Now, when you finish the game:
highscores.insert_right((player, a, b, newscore, time1))
del highscores[-1]
That's it. If you were actually not in the top 10, you'll be added at #11, then removed. If you were in the top 10, you'll be added, and the old #10 will now be #11 and be removed.
If you don't want to prepopulate the list with 10 fake scores the way old arcade games used to, just change it to this:
highscores.insert_right((player, a, b, newscore, time1))
del highscores[10:]
Now, if there were already 10 scores, when you get added, #11 will get deleted, but if there were only 3, nothing gets deleted, and now there are 4.
Meanwhile, I'm not sure why you're writing the new scores out to a pickle file, and then reading the same thing back in. You probably want to do the reading before adding the highscore to the list, and then do the writing after adding it.
You also asked how to "beautify the list". Well, there are three sides to that.
First of all, in the code, (player, a, b, c, time1) isn't very meaningful. Giving the variables better names would help, of course, but ultimately you still come down to the fact that when accessing list, you have to do entry[3] to get the score or entry[4] to get the time.
There are at least three ways to solve this:
Store a list (or SortedCollection) of dicts instead of tuples. The code gets a bit more verbose, but a lot more readable. You write {'player': player, 'weight': a, 'height': b, 'score': c, 'time': time1}, and then when accessing the list, you do entry['score'] instead of entry[3].
Use a collection of namedtuples. Now you can actually just insert ScoreEntry(player, a, b, c, time1), or you can insert ScoreEntry(player=player, weight=a, height=b, score=c, time=time1), whichever is more readable in a given case, and they both work the same way. And you can access the score as entry.score or as entry[3], again using whichever is more readable.
Write an explicit class for score entries. This is pretty similar to the previous one, but there's more code to write, and you can't do indexed access anymore, but on the plus side you don't have to understand namedtuple.
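As an illustration of the namedtuple option, here is a small sketch. The field order follows the question's tuple (player, weight, height, score, time); the entries themselves are made-up sample data, and a plain list stands in for SortedCollection.

```python
from collections import namedtuple

ScoreEntry = namedtuple(
    "ScoreEntry", ["player", "weight", "height", "score", "time"])

# made-up sample entries, created positionally and by keyword
entries = [
    ScoreEntry("CPU", 200.0, 100.0, 2.0, "20.12.2012, 21:38"),
    ScoreEntry(player="jdi", weight=150.0, height=50.0,
               score=3.0, time="21.12.2012, 09:15"),
]

# sort by the named field instead of a bare index
entries.sort(key=lambda entry: entry.score, reverse=True)
best = entries[0]
```

Both access styles remain available: `best.score` and `best[3]` refer to the same value.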
Second, if you just print the entries, they look like a mess. The way to deal with that is string formatting. Instead of print scores, you do something like this:
print '\n'.join("{}: weight {}, height {}, score {} at {}".format(*entry)
                for entry in highscores)
If you're using a class or namedtuple instead of just a tuple, you can even format by name instead of by position, making the code much more readable.
Finally, the highscore file itself is an unreadable mess, because pickle is not meant for human consumption. If you want it to be human-readable, you have to pick a format, and write the code to serialize that format. Fortunately, the CSV format is pretty human-readable, and most of the code is already written for you in the csv module. (You may want to look at the DictReader and DictWriter classes, especially if you want to write a header line. Again, there's the tradeoff of a bit more code for a lot more readability.)

'Splitting' List into several Arrays

I'm trying to complete a project that will show total annual sales from a specific list contained in a .txt file.
The list is formatted this way:
-lastname, firstname (string)
-45.7 (float)
-456.4 (float)
-345.5 (float)
-lastname2, firstname2 (string)
-3354.7 (float)
-54.6 (float)
-56.2 (float)
-lastname3, firstname3 (string)
-76.6 (float)
-34.2 (float)
-48.2 (float)
And so on.... Actually, 7 different "employees" followed by 12 set of "numbers" (months of the year)....but that example should suffice to give an idea of what I'm trying to do.
I need to output this specific information of every "employee"
-Name of employee
-Total Sum (sum of the 12 numbers in the list)
So my logic is taking me to this conclusion, but I don't know where to start:
Create 7 different arrays to store each "employee" data.
With this logic, I need to split the main list into independent arrays so I can work with them.
How can this be achieved? And also, if I don't have a predefined number of employees (but a defined format :: "Name" followed by 12 months of numbers)...how can I achieve this?
I'm sure I can figure once I get an idea how to "split" a list in different sections -- Every 13 lines?

Yes, at every thirteenth line you'd have the information of an employee.
However, instead of using seven different lists, you can use a dictionary of lists, so that you don't have to worry about the number of employees.
You can also use a parameter for the number of lines that belong to each employee.
You could do the following:
infile = open("file.txt", "rt")
employee = dict()
name = infile.readline().strip()
while name:
    employee[name] = list()
    # read the 12 monthly values that follow the name
    for i in xrange(12):
        val = float(infile.readline().strip())
        employee[name].append(val)
    name = infile.readline().strip()
infile.close()
Some ways to access dictionary entries:
for name, months in employee.items():
    print name
    print months
for name in employee.keys():
    print name
    print employee[name]
for months in employee.values():
    print months
for name, months in zip(employee.keys(), employee.values()):
    print name
    print months
The entire process goes as follows:
infile = open("file.txt", "rt")
employee = dict()
name = infile.readline().strip()
while name:
    val = 0.0
    # add up the 12 monthly values that follow the name
    for i in xrange(12):
        val += float(infile.readline().strip())
    employee[name] = val
    print ">>> Employee:", name, " -- total sales:", str(employee[name])
    name = infile.readline().strip()
infile.close()
Sorry for beating around the bush (:
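The "every 13 lines" idea from the question can also be sketched by slicing a list of lines into fixed-size records. This assumes each name is followed by exactly the same number of values; `MONTHS` and `yearly_totals` are hypothetical names, and the sample uses three values per employee only to stay short.

```python
MONTHS = 12  # assumption: each name is followed by exactly 12 numbers


def yearly_totals(lines, months=MONTHS):
    """Map each employee name to the sum of the `months` values after it."""
    cleaned = [line.strip() for line in lines if line.strip()]
    record_size = months + 1  # one name line plus the monthly values
    totals = {}
    for start in range(0, len(cleaned), record_size):
        record = cleaned[start:start + record_size]
        name, values = record[0], record[1:]
        totals[name] = sum(float(value) for value in values)
    return totals


# shortened sample built from the question's numbers, 3 values per name
sample = ["Smith, John", "45.7", "456.4", "345.5",
          "Doe, Jane", "3354.7", "54.6", "56.2"]
totals = yearly_totals(sample, months=3)
```

Because nothing depends on the number of employees, the same function handles 7 employees or 700, as long as the record size is fixed.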

Here is one option.
It is not elegant, but it is a brute-force approach that works.
summed = 0
with open("file.txt", "rt") as f:
    print f.readline() # We print first line (first man)
    for line in f:
        # then we suppose every line is float.
        try:
            # convert to float
            value = float(line.strip())
            # add to sum
            summed += value
        # If it does not convert, then it is next person
        except ValueError:
            # print sum for previous person
            print summed
            # print new name
            print line
            # reset sum
            summed = 0
    # at the end of the file there are no errors, so we print the last result
    print summed
Since you need more flexibility, there is another option:
data = {} # dict: list of all values for person by person name
with open("file.txt", "rt") as f:
    data_key = f.readline().strip() # We remember first line (first man)
    data[data_key] = [] # empty list of values
    for line in f:
        # then we suppose every line is float.
        try:
            # convert to float
            value = float(line.strip())
            # add to data
            data[data_key].append(value)
        # If it does not convert, then it is next person
        except ValueError:
            # next person's name
            data_key = line.strip()
            # new list
            data[data_key] = []
Q: let's say that I want to print a '2% bonus' to employees that made more than 7000 in total sales (12 months)
for employee, stats in data.iteritems():
    if sum(stats) > 7000:
        print employee + " made more than 7000 in total sales and should get a 2% bonus"

I would not create 7 different arrays. I would create some sort of data structure to hold all the relevant information for one employee in one data type (Python lets you define such structures, for example with a class or a dict).
Then, as you process the data for each employee, all you have to do is iterate over one array of employee data elements. That way, it's much easier to keep track of the indices of the data (or maybe even eliminates the need to!).
This is especially helpful if you want to sort the data somehow. That way, you'd only have to sort one array instead of 7.

Python - Converting a Number to a Letter without an if statement

I am making a program for my own purposes (a naming program) that completely generates a random name. The problem is I cannot assign a number to a letter, so as a being 1 and z being 26, or a being 0 and z being 25. It gives me a SyntaxError. I need to assign this because the random integer (1,26) triggers a letter (if the random integer is 1, select A) and prints the name.
EDIT:
I have implemented your advice, and it works. I am grateful for this, but I would like my program to create readable names, or to be more procedural. Here is an example of a name after I tweaked my program: ddjau. That doesn't look like a name, so I want my program to work as if it were creating REAL names, like Samuel or other common names. Thanks!
EDIT (2):
Thanks, Adam, but I need a sort of 'seed' for the user to enter for the start of the name is. (Seed = A, Name = Adam. Seed = G, Name = George.) Should I do this by searching the file line by line, at the very beginning? If so, how do I do this?

Short Answer
Look into Python dictionaries to allow the 1 = 'a' type assignments. Below I have a working example that generates a random name based on gender and one or more starting letters.
Disclaimer
I do not fully understand (via the code) what you're trying to accomplish with char/ord and a random letter. Also note having absolutely no idea of your design goals or requirements, I have made the example more complex than it may need to be for instructional purposes.
Additional Resources
* Python Docs for dictionary
* Using Python dictionary relationship to search both ways
In response to the last edit
If you are looking to build random 'real' names, I think your best bet will be to use a large list of names and just pick a random one. If I were you I'd look into something linking to the census results: males and females. Note that male_names.txt and female_names.txt are a copy of the list found at the census website. As a disclaimer, I'm sure there is a more efficient way to load / read the file. Just use this example as a proof of concept.
Update
Here's a quick and dirty way to seed the random values. Again I am not sure that this is the most pythonic way or most efficient way, but it works.
Example
import random

def get_random_name(gender, seed):
    if gender == 'male':
        filename = 'male_names.txt'
    elif gender == 'female':
        filename = 'female_names.txt'
    else:
        raise ValueError("gender must be 'male' or 'female'")
    # collect every name that starts with the seed letters
    names = []
    with open(filename, 'r') as fid:
        for line in fid:
            if line.lower().startswith(seed):
                names.append(line.strip())
    if not names:
        return None # no name starts with the seed
    return random.choice(names)

if __name__ == "__main__":
    print 'Welcome to Name Database 2.2\n'
    print '1. Boy'
    print '2. Girl'
    bog = raw_input('\nGender: ')
    print 'What should the name start with?'
    print 'A, Ab, Abc, B, Ba, Br, etc...'
    print ''
    l = raw_input('Letter(s): ').lower()
    if bog == '1': # Boy
        print get_random_name('male', l)
    elif bog == '2': # Girl
        print get_random_name('female', l)
Output
Welcome to Name Database 2.2
1. Boy
2. Girl
Gender: 2
What should the name start with?
A, Ab, Abc, B, Ba, Br, etc...
Letter(s): br
BRITTA

chr and ord are the two functions you're looking for (though you already seem to know about the latter).
The first gives you a one-character string based on the integer; the second does the reverse operation (technically, it handles Unicode as well, which chr doesn't, though you have unichr for that if you need it).
You can base your code on the following:
ch = "E"
print ord(ch) - ord("A") + 1 # should give 5 for the fifth letter
val = 7
print chr(val + ord("A") - 1) # should give G, the seventh letter

I'm not entirely sure what you're trying to do, but you can convert a number into a letter with the chr() function. chr() takes an ASCII code, so if you want to use the range [0, 25] instead you can adapt it like so:
chr(25 + ord('a')) # 'z'
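Putting the pieces together, a number-to-letter helper for the question's 1 to 26 range might look like the sketch below. The names `number_to_letter`, `random_letters` and `LETTER_BY_NUMBER` are hypothetical, and the output is a string of random letters, not a readable name.

```python
import random
import string


def number_to_letter(number):
    """Map 1..26 to 'a'..'z' using chr and ord."""
    if not 1 <= number <= 26:
        raise ValueError("number must be between 1 and 26")
    return chr(number - 1 + ord("a"))


def random_letters(length):
    """Build a string of `length` random lowercase letters."""
    return "".join(number_to_letter(random.randint(1, 26))
                   for _ in range(length))


# the same mapping as a dictionary: {1: 'a', 2: 'b', ..., 26: 'z'}
LETTER_BY_NUMBER = dict(enumerate(string.ascii_lowercase, 1))
```

Note that `random.randint(1, 26)` includes both endpoints, so every letter from 'a' to 'z' can be produced.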
