I started learning Python and I'm taking a Google course on Coursera about automation and IT using it. In the Practice Quiz: Reading & Writing CSV Files, the first question is:
We're working with a list of flowers and some information about each one. The create_file function writes this information to a CSV file. The contents_of_file function reads this file into records and returns the information in a nicely formatted block. Fill in the gaps of the contents_of_file function to turn the data in the CSV file into a dictionary using DictReader.
After giving an answer I receive "Incorrect. Something went wrong! Contact Coursera Support about this question!". I found a page here and copied that code, but the result is always the same. So I contacted Coursera, but they say there's no problem on their end. This is the code I provided:
import os
import csv
# Create a file with data in it
def create_file(filename):
    with open(filename, "w") as file:
        file.write("name,color,type\n")
        file.write("carnation,pink,annual\n")
        file.write("daffodil,yellow,perennial\n")
        file.write("iris,blue,perennial\n")
        file.write("poinsettia,red,perennial\n")
        file.write("sunflower,yellow,annual\n")
# Read the file contents and format the information about each row
def contents_of_file(filename):
    return_string = ""
    # Call the function to create the file
    create_file(filename)
    # Open the file
    with open(filename) as f:
        # Read the rows of the file into a dictionary
        x = csv.DictReader(f)
        # Process each item of the dictionary
        for row in x:
            return_string += "a {} {} is {}\n".format(row["color"], row["name"], row["type"])
    return return_string
#Call the function
print(contents_of_file("flowers.csv"))
Has anyone encountered the same issues? Or can you explain to me why it doesn't work?
Adding the console log of the browser here. Tried with Firefox, Chrome and now on Opera.
Console Log
As it is an online evaluation platform, it might prohibit things like import os for security reasons. Besides, that import isn't doing anything in your code. Did you try removing that line?
It seems you left out some options: newline='' when opening the file and delimiter in the reader call. Here is the working code:
import os
import csv
# Create a file with data in it
def create_file(filename):
    with open(filename, "w") as file:
        file.write("name,color,type\n")
        file.write("carnation,pink,annual\n")
        file.write("daffodil,yellow,perennial\n")
        file.write("iris,blue,perennial\n")
        file.write("poinsettia,red,perennial\n")
        file.write("sunflower,yellow,annual\n")
# Read the file contents and format the information about each row
def contents_of_file(filename):
    return_string = ""
    # Call the function to create the file
    create_file(filename)
    # Open the file
    with open(filename, "r", newline='') as f:
        # Read the rows of the file into a dictionary
        reader = csv.DictReader(f, delimiter=",")
        # Process each item of the dictionary
        for row in reader:
            return_string += "a {} {} is {}\n".format(row["color"], row["name"], row["type"])
    return return_string
#Call the function
print(contents_of_file("flowers.csv"))
and result is:
a pink carnation is annual
a yellow daffodil is perennial
a blue iris is perennial
a red poinsettia is perennial
a yellow sunflower is annual
Keep in mind that newline='' is the Python 3 way of opening a file for the csv module. The delimiter already defaults to a comma, so passing delimiter="," only makes that choice explicit.
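One way to check which options actually matter is to parse the same text with and without an explicit delimiter. The sketch below is only an illustration: it uses an in-memory io.StringIO buffer instead of a real file, and the helper name read_rows and the shortened sample data are assumptions, not part of the quiz.

```python
import csv
import io

# Shortened sample in the same format as the flowers file
SAMPLE = "name,color,type\ncarnation,pink,annual\niris,blue,perennial\n"

def read_rows(text, **reader_options):
    """Parse CSV text into a list of plain dicts using csv.DictReader."""
    reader = csv.DictReader(io.StringIO(text), **reader_options)
    return [dict(row) for row in reader]

default_rows = read_rows(SAMPLE)                  # delimiter left at its default
explicit_rows = read_rows(SAMPLE, delimiter=",")  # delimiter passed explicitly

print(default_rows == explicit_rows)
print(default_rows[0])
```

If both calls give the same rows, the explicit delimiter is not what changes the result for comma-separated data.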
This issue still persists. I reported it to Coursera today. There has to be an error on their side. Well, at least it's not a graded assessment, just a practice quiz. But frustrating nevertheless.
I have the following data in a file called data.txt and would like to be able to add to the numbers at the end and replace them in the file without creating a new one:
Alfreda,art,2015,35
brook,biology,2015,3
charlie,chemistry,2015,140
dolly,Design,2015,120
Emilia,English,2015,150
Fiona,french,2015,40
Grace,Greek,2015,12
Hanna,history,2015,15
Here is the code I currently have:
with open("data.txt", "r") as f:
    newline=[]
    for word in f.line():
        newline.append(word.replace(35,str(New))
with open("data.txt", "w") as f:
    for line in newline :
        f.writelines(line)
If you just want to add a string to each line and then update the file, this code can solve your problem, although it is not optimal.
with open("data.txt", "r") as myFile:
    newline = []
    # Use the readlines method to get all the lines
    for line in myFile.readlines():
        # Remove the \n character with the rstrip method
        line = line.rstrip('\n')
        newline.append(line + ",35\n")  # Don't forget to add \n
# Test
print(newline)
# No explicit close() is needed: the with block closes the file
with open("data.txt", "w") as myFile:
    for line in newline:
        myFile.write(line)
If this is not your problem, try using the pickle module and working with objects; it will be easier.
I'm going to have to make some of your question up. If you have a file and you want to update it, the updates have to come from somewhere. The code in the question has a New variable but there is no indication of how New is supposed to get a value, or how the program is supposed to know which row to update.
I'm going to assume you have a file of updates called updates.txt that looks like this (and it is deliberately not in alphabetical order):
Emilia,45
Alfreda,35
So after your program runs the resulting file will have two rows different:
Alfreda,art,2015,70 ...this one
brook,biology,2015,3
charlie,chemistry,2015,140
dolly,Design,2015,120
Emilia,English,2015,195 ...and this one
Fiona,french,2015,40
Grace,Greek,2015,12
Hanna,history,2015,15
But the rest the same.
Since your sample data file is a .csv file I am using the Python csv module, rather than picking the data apart by hand. It doesn't matter much with simple data like this but it's a good module to know about.
import csv
marks = {}
# Read in existing data into a dictionary:
# key is name, value is a list [subject, year, score]
# like this: {"Alfreda": ["art",2015,35], ... }
# This is to make it easy to do random updates based on name
with open("data.txt", "r") as f:
    for row in csv.reader(f):
        name,subject,year,score = row
        marks[name] = [subject,int(year),int(score)]
# Read in updates and apply each line to the corresponding entry in marks
with open("updates.txt", "r") as f:
    for row in csv.reader(f):
        name,added_score = row
        try:
            marks[name][2] += int(added_score) # for example marks["Alfreda"][2] += int("35")
        except KeyError:
            print(f"Name {name} not found to update, nothing done")
# Write out updated dictionary:
with open("data.txt", "w") as f:
    writer = csv.writer(f,lineterminator="\n")
    for name in sorted(marks.keys(), key=lambda n: n.lower()):
        row=[name]+marks[name] # for example ["Alfreda"] + ["art",2015,70]
        writer.writerow(row)
This line:
for name in sorted(marks.keys(), key=lambda n: n.lower()):
looks complicated but it is needed because you obviously expect the names Alfreda brook charlie dolly Emilia Fiona Grace Hanna to be in that order. But just doing the obvious
for name in sorted(marks.keys()):
will put them in the order Alfreda Emilia Fiona Grace Hanna brook charlie dolly.
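The difference between the two orderings can be checked in isolation. This is a small sketch using only the names from the sample data; it uses key=str.lower, which is equivalent to the lambda in the program.

```python
names = ["Alfreda", "brook", "charlie", "dolly", "Emilia", "Fiona", "Grace", "Hanna"]

# The default sort compares code points, so every uppercase
# initial sorts before any lowercase one
plain = sorted(names)

# Lowercasing each name for the comparison gives a case-insensitive order
folded = sorted(names, key=str.lower)

print(plain)
print(folded)
```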
In the interests of keeping the code simple and as close to your original as possible, it does no validity checks, so if this line
charlie,chemistry,2015,140
was wrongly entered as
charlie,chemistry,2015,14O
(with the letter O instead of a zero), the program will just fail. Ditto if the update file is missing a comma somewhere.
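If you did want a validity check, one minimal approach is to catch the ValueError that int() raises. This is only a sketch; the helper name parse_score is hypothetical and not part of the program above.

```python
def parse_score(text):
    """Return the score as an int, or None if the text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(parse_score("140"))  # a well-formed score
print(parse_score("14O"))  # letter O instead of zero
```

The caller can then decide whether to skip the row, report it, or stop.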
This works and will do what I think you want. But...
There are issues with the design. Your program reads in the data from data.txt, then overwrites it with new data. But suppose your program fails just after this line:
with open("data.txt", "w") as f:
Then you won't have your original data (because the call to open() truncated it), and you won't have the new data either (because you haven't written it out yet). Or suppose you accidentally run the program twice. There will be no way to tell you have done that.
You can provide some insurance against this sort of mishap by using the fileinput module, like this:
import fileinput
# Read in existing data
with fileinput.input("data.txt", inplace=True, backup=".bkp") as f:
    for row in csv.reader(f):
        name,subject,year,score = row
        marks[name] = [subject,int(year),int(score)]
With this change, your updates will be in data.txt as before, but your original data will still be around, in a file called data.txt.bkp.
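A comparable safety net is possible without fileinput, by copying the file before overwriting it. The following is a sketch under assumptions: the helper name rewrite_with_backup is hypothetical, and the demonstration runs in a temporary directory rather than on your real data.txt.

```python
import os
import shutil
import tempfile

def rewrite_with_backup(path, new_lines, suffix=".bkp"):
    """Copy path to path + suffix, then overwrite path with new_lines."""
    backup_path = path + suffix
    shutil.copy2(path, backup_path)  # keep the original contents around
    with open(path, "w") as f:
        f.writelines(new_lines)
    return backup_path

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    with open(data_path, "w") as f:
        f.write("Alfreda,art,2015,35\n")

    backup = rewrite_with_backup(data_path, ["Alfreda,art,2015,70\n"])

    with open(data_path) as f:
        current = f.read()
    with open(backup) as f:
        original = f.read()

print(current, original, sep="")
```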
But that is just a fix. It avoids the real issue, which is that you really have a database application and you are trying to implement it using textfiles. The code above is all very well for an exercise, but it's not robust and it won't scale.
I am trying to create a basic mathematical quiz and need to be able to store the name of the user next to their score. To ensure that I could edit the data dynamically regardless of the length of the user's name or the number of digits in their score, I decided to separate the name and score with a comma and use the split function. I'm new to file handling in Python, so I don't know if I am using the wrong mode ("r+"), but when I complete the quiz my score is not recorded at all; nothing is added to the file. Here is my code:
for line in class_results.read():
    if student_full_name in line:
        student = line.split(",")
        student[1] = correct
        line.replace(line, "{},{}".format(student_full_name, student[1]))
    else:
        class_results.write("{},{}".format(student_full_name, correct))
Please let me know how I can get this system to work. Thank you in advance.
Yes, r+ opens the file for both reading and writing. To summarize the other modes:
r when the file will only be read
w for only writing (an existing file with the same name will be erased)
a opens the file for appending; any data written to the file is automatically added to the end.
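The modes above can be tried side by side. This is a small sketch that works in a temporary directory; the file name scores.txt and its contents are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.txt")

    with open(path, "w") as f:    # "w" creates the file or erases an existing one
        f.write("alice,1\n")

    with open(path, "a") as f:    # "a" always adds to the end
        f.write("bob,2\n")

    with open(path, "r+") as f:   # "r+" reads and writes; the file must already exist
        first_line = f.readline()
        f.seek(0)                 # go back to the start before writing
        f.write("ALICE")          # overwrites characters in place, does not insert

    with open(path) as f:         # default mode is "r"
        final = f.read()

print(final)
```

Note that writing through r+ replaces existing characters at the current position, which is why it is awkward for records whose length can change.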
Instead of comma separation, I would recommend using JSON or YAML syntax; it fits better in this case.
scores.json:
{
"student1": 12,
"student2": 798
}
The solution:
import json
with open(filename, "r+") as data:
    scores_dict = json.loads(data.read())
    scores_dict[student_full_name] = correct # if it already exists it will be updated, otherwise it will be added
    data.seek(0)
    data.write(json.dumps(scores_dict))
    data.truncate()
scores.yml will look as follows:
student1: 45
student2: 7986
Solution:
import yaml
with open(filename, "r+") as data:
    scores_dict = yaml.safe_load(data) # PyYAML has no loads(); safe_load parses the open file
    scores_dict[student_full_name] = correct # if it already exists it will be updated, otherwise it will be added
    data.seek(0)
    data.write(yaml.dump(scores_dict, default_flow_style=False))
    data.truncate()
To install the YAML package for Python: pip install pyyaml
Modifying a file in place is generally a poor way to do this. It risks errors causing the resulting file to be half new data, half old, with the split point being corrupted. The usual pattern is to write to a new file, then atomically replace the old file with the new file, so either you have the entire original old file and a partial new file, or the new file, not a mish-mash of both.
Given your example code, here is how you would fix it up to do that:
import csv
import os
from tempfile import NamedTemporaryFile
origfile = '...'
origdir = os.path.dirname(origfile)
# Open original file for read, and tempfile in same directory for write.
# delete=False keeps the temp file after it is closed so it can replace the original;
# assigning outf.delete afterwards does not stop current Python 3 versions from deleting it.
with open(origfile, newline='') as inf, NamedTemporaryFile('w', dir=origdir, newline='', delete=False) as outf:
    try:
        old_results = csv.reader(inf)
        new_results = csv.writer(outf)
        for name, oldscore in old_results:
            if name == student_full_name:
                # Found our student, replace their score
                new_results.writerow((name, correct))
                # Then write out the rest of the lines unchanged
                new_results.writerows(old_results)
                # and we're done
                break
            else:
                new_results.writerow((name, oldscore))
        else:
            # else block on for loop executes if loop ran without break-ing
            new_results.writerow((student_full_name, correct))
    except BaseException:
        # Something failed: discard the partial temp file and keep the original untouched
        outf.close()
        os.remove(outf.name)
        raise
# If we got here, no exceptions, so atomically replace the original file
# with the temp file holding the updated data
os.replace(outf.name, origfile)
I am trying to remove duplicates from a 3-column tab-delimited txt file. As long as the first two columns are duplicates, the row should be removed, even if the two rows have different third columns.
from operator import itemgetter
import sys
input = sys.argv[1]
output = sys.argv[2]
#Pass any column number you want, note that indexing starts at 0
ig = itemgetter(0,1)
seen = set()
data = []
for line in input.splitlines():
    key = ig(line.split())
    if key not in seen:
        data.append(line)
        seen.add(key)
    file = open(output, "w")
    file.write(data)
    file.close()
First, I get error
key = ig(line.split())
IndexError: list index out of range
Also, I can't see how to save the result to output.txt
People say saving to output.txt is a really basic matter. But no tutorial helped.
I tried methods that use codec, those that use with, those that use file.write(data) and all didn't help.
I could learn MatLab quite easily. The online tutorial was fantastic and a series of Googling always helped a lot.
But I can't find a helpful Python tutorial yet. This is obviously because I am a complete novice. For complete novices like me, what would be the best tutorial with 1) comprehensiveness, 2) lots of examples, and 3) line-by-line explanation that doesn't leave any line unexplained?
And why is the above code causing error and not saving result?
I'm assuming that, since you assign input to the first command line argument with input = sys.argv[1] and output to the second, you intend those to be your input and output file names. But you're never opening any file for the input data, so you're calling .splitlines() on a file name, not on file contents.
Next, splitlines() is the wrong approach here anyway. To iterate over a file line by line, simply use for line in f, where f is an open file. Those lines will include the newline at the end of the line, so it needs to be stripped if it's not supposed to be part of the third column's data.
Then you're opening and closing the file inside your loop, which means you'll try to write the entire contents of data to the file every iteration, effectively overwriting any data written to the file before. Therefore I moved that block out of the loop.
It's good practice to use the with statement for opening files. with open(out_fn, "w") as outfile will open the file named out_fn and assign the open file to outfile, and close it for you as soon as you exit that indented block.
input is a builtin function in Python. I therefore renamed your variables so no builtin names get shadowed.
You're trying to write data directly to the output file. This won't work since data is a list of lines. You need to join those lines first in order to turn them into a single string again before writing it to a file.
So here's your code with all those issues addressed:
from operator import itemgetter
import sys
in_fn = sys.argv[1]
out_fn = sys.argv[2]
getkey = itemgetter(0, 1)
seen = set()
data = []
with open(in_fn, 'r') as infile:
    for line in infile:
        line = line.strip()
        key = getkey(line.split())
        if key not in seen:
            data.append(line)
            seen.add(key)
with open(out_fn, "w") as outfile:
    outfile.write('\n'.join(data))
Why is the above code causing error?
Because you haven't opened the file, you are working with the string input.txt rather than with the file's contents. When you then try to access your items, you get a list index out of range error because line.split() returns ['input.txt'].
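The failure can be reproduced without any file at all. This is a small sketch; the sample line "a\tb\tc" is made up for the demonstration.

```python
from operator import itemgetter

getkey = itemgetter(0, 1)

# Splitting the file *name* yields a single element, so index 1 does not exist
try:
    getkey("input.txt".split())
    outcome = "no error"
except IndexError:
    outcome = "IndexError"

# Splitting an actual tab-delimited line yields the columns the key needs
key = getkey("a\tb\tc".split())

print(outcome, key)
```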
How to fix that: open the file and then work with it, not with its name.
For example, you can do (I tried to stay as close to your code as possible)
input = sys.argv[1]
infile = open(input, 'r')
(...)
lines = infile.readlines()
infile.close()
for line in lines:
(...)
Why is this not saving result?
Because you are opening and closing the file inside the loop. What you need to do is write the data once you're out of the loop. Also, you cannot write a list directly to a file. Hence, you need to do something like this (outside of your loop):
outfile = open(output, "w")
for item in data:
    outfile.write(item)
outfile.close()
All together
There are other ways of reading and writing files, and they are pretty well documented on the internet, but I tried to stay close to your code so that you would better understand what was wrong with it.
from operator import itemgetter
import sys
input = sys.argv[1]
infile = open(input, 'r')
output = sys.argv[2]
#Pass any column number you want, note that indexing starts at 0
ig = itemgetter(0,1)
seen = set()
data = []
lines = infile.readlines()
infile.close()
for line in lines:
    print(line)
    key = ig(line.split())
    if key not in seen:
        data.append(line)
        seen.add(key)
print(data)
outfile = open(output, "w")
for item in data:
    outfile.write(item)
outfile.close()
PS: it seems to produce the result that you needed in your other question, "Python to remove duplicates using only some, not all, columns".
I am making a creature simulator. At the end of every day, every creature should dump a JSON form of its information to a single file. Then in the morning, the simulator should be able to pull all of the creatures' information from that single file and reinstantiate them as they were before.
So is there a way to have:
newDailyFile = path+day
with open(newDailyFile, "a") as file:
    for i in creatures:
        dump({'name':name, 'numbers':n, 'strings':s, 'x':x, 'y':y}, file, indent=4)
        #The only thing that is guaranteed to be unique is the name
then
with open("text") as file:
    result = load(file)
    for something in result:
        creature = Creature(result)
The issue is in the second part: I don't know how to read each creature individually. How can I do that?
I just had to change the way I was writing the strings and reading them back.
newDailyFile = path+day
with open(newDailyFile, "a") as file:
    for i in creatures:
        dump({'name':name, 'numbers':n, 'strings':s, 'x':x, 'y':y}, file) #removed indent so each one is on one line
        file.write("\n") #or else they will all be on the same line
then
with open(newDailyFile) as file:
    lines = file.readlines() #Gets each one line by line, then I can load them
    for i in lines:
        result = loads(i)
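For reference, the same one-object-per-line idea as a self-contained round trip. This is a sketch under assumptions: the creature dictionaries and the file name day1.jsonl are invented for the demonstration, and the file lives in a temporary directory.

```python
import json
import os
import tempfile

# Stand-in data; real creatures would carry their own fields
creatures = [
    {"name": "a", "x": 1, "y": 2},
    {"name": "b", "x": 3, "y": 4},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "day1.jsonl")

    # Write one compact JSON object per line
    with open(path, "a") as f:
        for creature in creatures:
            f.write(json.dumps(creature) + "\n")

    # Read back line by line, skipping any blank lines
    with open(path) as f:
        restored = [json.loads(line) for line in f if line.strip()]

print(restored == creatures)
```

Each restored dictionary can then be passed to whatever constructor rebuilds a creature.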
Ok, so I'm learning Python. But for my studies I have to do rather complicated stuff already. I'm trying to run a script to analyse data in excel files. This is how it looks:
#!/usr/bin/python
import sys
#lots of functions, not relevant
resultsdir = /home/blah
filename1=sys.argv[1]
filename2=sys.argv[2]
out = open(sys.argv[3],"w")
#filename1,filename2="CNVB_reads.403476","CNVB_reads.403447"
file1=open(resultsdir+"/"+filename1+".csv")
file2=open(resultsdir+"/"+filename2+".csv")
for line in file1:
    start.p,end.p,type,nexons,start,end,cnvlength,chromosome,id,BF,rest=line.split("\t",10)
    CNVs1[chr].append([int(start),int(end),float(BF)])
for line in file2:
    start.p,end.p,type,nexons,start,end,cnvlength,chromosome,id,BF,rest=line.split("\t",10)
    CNVs2[chr].append([int(start),int(end),float(BF)])
These are the titles of the columns of the data in the excel files and I want to split them, I'm not even sure if that is necessary when using data from excel files.
#more irrelevant stuff
out.write(filename1+","+filename2+","+str(chromosome)+","+str(type)+","+str(shared)+"\n")
This is what it should write in my output, 'shared' is what I have calculated, the rest is already in the files.
Ok, now my question, finally, when I call the script like that:
python script.py CNVB_reads.403476 CNVB_reads.403447 script.csv in my shell
I get the following error message:
start.p,end.p,type,nexons,start,end,cnvlength,chromosome,id,BF,rest=line.split("\t",10)
ValueError: need more than 1 value to unpack
I have no idea what is meant by that in relation to the data... Any ideas?
The line.split('\t', 10) call did not return eleven elements. Perhaps it is empty?
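A quick way to see which lines are the problem is to check the field count before unpacking. This is only a sketch; the helper name split_fields and the sample lines are hypothetical.

```python
def split_fields(line, expected=11):
    """Split a tab-separated line, returning None when fields are missing."""
    fields = line.rstrip("\n").split("\t", expected - 1)
    if len(fields) != expected:
        return None
    return fields

good = "\t".join(str(i) for i in range(11)) + "\n"

print(split_fields(good) is not None)  # a complete line
print(split_fields("\n"))              # an empty line
print(split_fields("a,b,c\n"))         # comma-separated, so no tabs to split on
```

An empty line and a line that uses commas instead of tabs both produce a single field, which matches the "1 value" in the error message.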
You probably want to use the csv module instead to parse these files.
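import csv
import os
for filename, target in ((filename1, CNVs1), (filename2, CNVs2)):
    # 'rb' suits the Python 2 csv module; on Python 3 use open(..., newline='') instead
    with open(os.path.join(resultsdir, filename + ".csv"), 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter='\t')
        for row in reader:
            if len(row) < 10:
                continue  # skip blank or incomplete lines
            # Column order: start.p, end.p, type, nexons, start, end, cnvlength, chromosome, id, BF
            start, end = row[4:6]
            chromosome = row[7]
            BF = float(row[9])
            target[chromosome].append([int(start), int(end), BF])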