python 3 csv file - attribute error - python

I am trying to read in the following csv file:
https://github.com/eljefe6a/nfldata/blob/master/stadiums.csv
I copied and pasted its contents into Excel and saved it as a CSV file, because the original is in a Unix format.
When I run the code below, I get the following AttributeError.
Any help appreciated. Thank you.
import sys
import csv
with open('stadium.csv', newline='') as csvfile:
    readCSV = csv.reader(csvfile,delimiter=',')
    for line in readCSV:
        line = line.strip()
        unpacked = line.split(",")
        stadium, capacity, expanded, location, surface, turf, team, opened, weather, roof, elevation = line.split(",")
        results = [turf, "1"]
        print("\t".join(results))
Error:
Traceback (most recent call last):
  File "C:/Python34/mapper.py", line 31, in <module>
    line = line.strip()
AttributeError: 'list' object has no attribute 'strip'

Calling .strip() on line doesn't work because line is a list object, and strip is a method that only exists on strings. If I'm correct about what you're trying to do, the correct way to unpack the variables would be:
stadium, capacity, expanded, location, surface, turf, team, opened, weather, roof, elevation = line[0], line[1], line[2], line[3], line[4], line[5], line[6], line[7], line[8], line[9], line[10]
The above works because each index in the brackets selects the value at that position in the list (line), and the values are unpacked into their respective variables.
Then you can call string methods on the individual values, for example:
stadium.split()

When you use the csv module with ',' as the delimiter and iterate with
for line in readCSV:
each line is one row of the CSV, already parsed into a list of that row's fields (split on ','). You do not need to strip or split them again.
You can try:
import sys
import csv
with open('stadium.csv', newline='') as csvfile:
    readCSV = csv.reader(csvfile,delimiter=',')
    for line in readCSV:
        stadium, capacity, expanded, location, surface, turf, team, opened, weather, roof, elevation = line
        results = [turf, "1"]
        print("\t".join(results))
Make sure every row in the CSV actually contains all eleven fields you are unpacking; otherwise the assignment raises a ValueError.

The CSV reader already separates all the fields for you. That is, your line variable is already a list, not a string, so there's nothing to strip and nothing to split. Use line the way you intended to use unpacked.
That's why you're using the csv package in the first place, remember.
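A minimal sketch of the point the answers above make, assuming two made-up rows with the same eleven fields the question unpacks. The sample values and the helper name `turf_pairs` are hypothetical and not taken from stadiums.csv. It reads from an in-memory string so it runs without the file:

```python
import csv
import io

# Hypothetical sample data: two rows with the eleven fields from the question.
SAMPLE = (
    "Stadium A,70000,2010,City A,Grass,FALSE,Team A,1990,60,None,100\n"
    "Stadium B,65000,2005,City B,FieldTurf,TRUE,Team B,2002,55,Dome,200\n"
)

def turf_pairs(csvfile):
    """Return one 'turf<TAB>1' string per row that has eleven fields."""
    pairs = []
    for row in csv.reader(csvfile):
        # Each row is already a list of strings; no strip() or split() needed.
        if len(row) != 11:
            continue  # skip rows that cannot be unpacked into eleven names
        (stadium, capacity, expanded, location, surface, turf,
         team, opened, weather, roof, elevation) = row
        pairs.append("\t".join([turf, "1"]))
    return pairs

print("\n".join(turf_pairs(io.StringIO(SAMPLE))))
```

Skipping rows with the wrong field count is only one possible choice; raising an error instead may be preferable if malformed rows should not pass silently.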

Related

Why do I get a KeyError when using writerow() but not when using print()?

I'm writing a program that takes a CSV file in the form last name, first name, Harry Potter house and writes it to another CSV file in the form first name, last name, house. It works fine when I print the values to my terminal (using the commented-out line), but throws the following error when I run it as is.
Traceback (most recent call last):
  File "/workspaces/107595329/scourgify/scourgify.py", line 26, in <module>
    writer.writerow(row[first], row[last], row[house])
KeyError: ' Hannah'
Hannah is the first of the first names in the file. How can I rewrite the writer.writerow() line (line 26) to write first, last, and house to my after.csv file?
Code here:
import csv
import sys
students = []
try:
    if len(sys.argv) < 3:
        print("Too few command line arguments")
        sys.exit(1)
    if len(sys.argv) > 3:
        print("Too many command line arguments")
        sys.exit(1)
    if not sys.argv[1].endswith(".csv") or not sys.argv[2].endswith(".csv"):
        print("Not a CSV file")
        sys.exit(1)
    elif sys.argv[1].endswith(".csv") and sys.argv[2].endswith(".csv"):
        with open(sys.argv[1]) as file:
            reader = csv.DictReader(file)
            for row in reader:
                students.append({"name": row["name"], "house": row["house"]})
        with open(sys.argv[2], "w") as after:
            writer = csv.DictWriter(after, fieldnames = ["first, last, house"])
            for row in students:
                house = row["house"]
                last, first = row["name"].split(",")
                writer.writerow(row[first], row[last], row[house])
                #print(first, last, house)
except FileNotFoundError:
    print("Could not read " + sys.argv[1])
    sys.exit(1)
Some notes about the OP code:
with open(sys.argv[1]) as file:
    reader = csv.DictReader(file)
    for row in reader:
        students.append({"name": row["name"], "house": row["house"]})
        # "row" is already the dictionary being formed here, so
        # students.append(row) is all that is needed
    # Could also do the following and skip the for loop
    # students = list(reader)
with open(sys.argv[2], "w") as after:
    # fieldnames is wrong.
    # fieldnames = ['first','last','house'] is what is needed
    writer = csv.DictWriter(after, fieldnames = ["first, last, house"])
    for row in students:
        house = row["house"]
        last, first = row["name"].split(",")
        # DictWriter.writerow takes a single dictionary as a parameter.
        # correct would be:
        # writer.writerow({'first':first,'last':last,'house':house})
        writer.writerow(row[first], row[last], row[house])
Frankly, DictReader is overkill for this task. Open both files at once and rewrite each line as it is read, using the standard reader/writer objects.
An input file wasn't provided, so given the following from the code description:
input.csv
name,house
"Potter, Harry",Gryffindor
"Weasley, Ron",Gryffindor
"Malfoy, Draco",Slytherin
"Lovegood, Luna",Ravenclaw
This code will process as needed:
test.py
import csv
# Note: newline='' is a requirement for the csv module when opening files.
# Without it, the writer ends up producing \r\r\n line endings on Windows.
# Parenthesized context managers need Python 3.10 or later.
with (open('input.csv', newline='') as file,
      open('output.csv', 'w', newline='') as after):
    reader = csv.reader(file)
    next(reader)  # skip header
    writer = csv.writer(after)
    writer.writerow(['first', 'last', 'house'])  # write new header
    for name, house in reader:
        last, first = name.split(', ')
        writer.writerow([first, last, house])
output.csv
first,last,house
Harry,Potter,Gryffindor
Ron,Weasley,Gryffindor
Draco,Malfoy,Slytherin
Luna,Lovegood,Ravenclaw
Is it possible that you have a "Hannah" entry, while you actually are looking for a " Hannah" entry (with a whitespace as first character)?
That would make sense, since the print statement uses your "first", "last" and "house" variables, while the writerow command reads entries from a dictionary first. The split command in
last, first = row["name"].split(",")
will then probably output ["lastname", " firstname"] as the data probably is stored as "lastname, firstname", right?
I'd recommend using str.strip() or something similar so that the split values carry no stray whitespace.
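To illustrate the whitespace issue described above, here is a small sketch. The surname "Abbott" and the helper name `split_name` are hypothetical; the question only shows that the first name is ' Hannah' with a leading space.

```python
def split_name(name):
    """Split 'Last, First' into (last, first), trimming stray spaces."""
    last, first = (part.strip() for part in name.split(","))
    return last, first

# The raw split keeps the space that follows the comma.
print("Abbott, Hannah".split(","))
# Stripping each part removes it.
print(split_name("Abbott, Hannah"))
```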
This is the line of code in question:
writer.writerow(row[first], row[last], row[house])
Per the error message you're getting, one of first, last or house (first, I'm guessing) has the value " Hannah", and the error is due to an attempt to find a key with that value in row. The error is saying that row has no key named " Hannah".
So what is row in this context? Looking at your code, on each pass through the surrounding for loop row is one of the values in the list students. So then what is in students? Values are added to students via the following line of code:
students.append({"name": row["name"], "house": row["house"]})
So students is a list of dicts, each of which contains two keys, "name" and "house". So it seems that any particular element in students is never going to be a dict containing the key "Hannah". There's your problem.
The two lines of code just above this line have it right:
house = row["house"]
last, first = row["name"].split(",")
Here, the two keys we know are in row are being properly extracted. At this point you are done with row: it contains nothing beyond what you've already read from it.
...and what do you know! Your commented-out print() statement used the right variables all along. Since DictWriter.writerow() takes a single dict keyed by the fieldnames (which need to be three separate names, ["first", "last", "house"]), what you really want here is:
writer.writerow({"first": first, "last": last, "house": house})
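As a hedged sketch of how csv.DictWriter expects to be called, the following writes to an in-memory buffer rather than to after.csv. The helper name `rewrite` is hypothetical, and the sample row is borrowed from the input.csv shown earlier in this thread.

```python
import csv
import io

def rewrite(students):
    """Return CSV text with first,last,house built from 'name' and 'house'."""
    out = io.StringIO(newline="")
    # fieldnames must be three separate strings, not one comma-joined string.
    writer = csv.DictWriter(out, fieldnames=["first", "last", "house"])
    writer.writeheader()
    for row in students:
        last, first = (part.strip() for part in row["name"].split(","))
        # DictWriter.writerow takes one dict whose keys match fieldnames.
        writer.writerow({"first": first, "last": last, "house": row["house"]})
    return out.getvalue()

# One row in the shape the question's DictReader produces.
print(rewrite([{"name": "Potter, Harry", "house": "Gryffindor"}]))
```

With a real file, the same writer would be built on `open(sys.argv[2], "w", newline="")` instead of the StringIO buffer.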

trim leading and trailing whitespaces along with commas in csv file with python

This is how my data looks when opened in Microsoft Excel.
As can be seen, all the cell contents except 218 (aligned to the right of the cell) are parsed as strings (aligned to the left of the cell). That is because they start with white space (it is " 4,610" instead of "4610").
I would like to remove the white space at the beginning and also remove those commas (not the ones that make CSVs CSVs), because with the comma present 4 and 610 may be read into different cells.
Here's what I tried, with inspiration from this Stack Overflow answer:
import csv
import string
with open("old_dirty_file.csv") as bad_file:
    reader = csv.reader(bad_file, delimiter=",")
    with open("new_clean_file.csv", "w", newline="") as clean_file:
        writer = csv.writer(clean_file)
        for rec in reader:
            writer.writerow(map(str.replace(__old=',', __new='').strip, rec))
But, I get this error:
Traceback (most recent call last):
  File "C:/..,,../clean_files.py", line 9, in <module>
    writer.writerow(map(str.replace(__old=',', __new='').strip, rec))
TypeError: descriptor 'replace' of 'str' object needs an argument
How do I clean those files?
You just need to separate the replacement from the stripping, because str.replace called on the class has no string to operate on. Note also that str.replace() takes the old and new substrings as positional arguments only, so pass ',' and '' without keywords:
for rec in reader:
    rec = (i.replace(',', '') for i in rec)
    writer.writerow(map(str.strip, rec))
or combine them into a single function:
repstr = lambda string, old=',', new='': string.replace(old, new).strip()
for rec in reader:
    writer.writerow(map(repstr, rec))
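The approach above can be sketched end to end on in-memory data. The helper names `clean_cell` and `clean_csv` and the value " 1,200" are hypothetical; " 4,610" and 218 come from the question. This assumes the comma-containing cells are quoted in the file, which they must be for the csv module to read them as single cells.

```python
import csv
import io

def clean_cell(cell):
    """Drop thousands separators and surrounding whitespace from one cell."""
    return cell.replace(",", "").strip()

def clean_csv(src, dst):
    """Copy CSV rows from src to dst, cleaning every cell on the way."""
    writer = csv.writer(dst)
    for rec in csv.reader(src):
        writer.writerow([clean_cell(cell) for cell in rec])

# Quoted cells with a leading space and an embedded comma, plus a clean one.
dirty = io.StringIO('" 4,610"," 1,200",218\n')
clean = io.StringIO(newline="")
clean_csv(dirty, clean)
print(clean.getvalue())
```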

Outputting the contents of a file in Python

I'm trying to do something simple but having issues.
I want to read in a file and export each word to different columns in an excel spreadsheet. I have the spreadsheet portion, just having a hard time on what should be the simple part.
What I have happening is each character is placed on a new line.
I have a file called server_list. That file has contents as shown below.
Linux RHEL64 35
Linux RHEL78 24
Linux RHEL76 40
I want to read each line in the file and assign each word a variable so I can output it to the spreadsheet.
File = open("server_list", "r")
FileContent = File.readline()
for Ser, Ver, Up in FileContent:
    worksheet.write(row, col, Ser)
    worksheet.write(row, col +1, Ver)
    worksheet.write(row, col +1, Up)
    row += 1
I'm getting the following error for this example
Traceback (most recent call last):
  File "excel.py", line 47, in <module>
    for Files, Ver, Uptime in FileContent:
ValueError: not enough values to unpack (expected 3, got 1)
FileContent is a string object that contains a single line of your file:
Out[4]: 'Linux RHEL64 35\n'
What you want to do with this string is strip the trailing newline \n and then split it into single words. Only then can you do the unpacking assignment that currently raises the ValueError in your for statement.
In python this means:
ser, ver, up = line.strip().split() # line is what you called FileContent, I'm allergic to caps in variable names
Now note that this is just one single line we are talking about. Probably you want to do this for all lines in the file, right?
So best is to iterate over the lines:
myfile = "server_list"
with open(myfile, 'r') as fobj:
    for row, line in enumerate(fobj):
        ser, ver, up = line.strip().split()
        # do stuff with row, ser, ver, up
Note that you do not need to track the row yourself; enumerate does that for you.
Also note, and this is crucial: the with statement I used here makes sure that you do not leave the file open. Using the with-clause whenever you are working with files is a good habit!
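A small sketch of the same idea that also tolerates blank lines, assuming each valid line has exactly three whitespace-separated words. The helper name `parse_servers` is hypothetical, and the data is read from an in-memory string instead of server_list:

```python
import io

def parse_servers(lines):
    """Return a (ser, ver, up) tuple for every line with exactly three words."""
    parsed = []
    for line in lines:
        words = line.split()  # split() with no argument also drops the newline
        if len(words) == 3:
            parsed.append(tuple(words))
    return parsed

sample = io.StringIO("Linux RHEL64 35\nLinux RHEL78 24\n\nLinux RHEL76 40\n")
for row, (ser, ver, up) in enumerate(parse_servers(sample)):
    # In the question, this is where worksheet.write(row, col, ...) would go.
    print(row, ser, ver, up)
```

Enumerating after the filtering step keeps the row numbers contiguous even when blank lines are skipped.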

How to apply regex sub to a csv file in python

I have a csv file I wish to apply a regex replacement to with python.
So far I have the following
reader = csv.reader(open('ffrk_inventory_relics.csv', 'r'))
writer = csv.writer(open('outfile.csv','w'))
for row in reader:
    reader = re.sub(r'\+','z',reader)
Which is giving me the following error:
Script error: Traceback (most recent call last):
  File "ffrk_inventory_tracker_v1.6.py", line 22, in response
    getRelics(data['equipments'], 'ffrk_inventory_relics')
  File "ffrk_inventory_tracker_v1.6.py", line 72, in getRelics
    reader = re.sub(r'\+','z',reader)
  File "c:\users\baconcatbug\appdata\local\programs\python\python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
After googling without much luck, I would like to ask the community here how to open the csv file correctly so I can use re.sub on it and then write the altered csv file back to the same filename.
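csv.reader(open('ffrk_inventory_relics.csv', 'r')) yields each row as a list of strings, while re.sub expects a single string. Your loop passes the reader object itself to re.sub (and a row would not work either, since it is a list), hence the TypeError. Apply re.sub to each cell instead, read all rows before reopening the same file, and open it in 'w' mode for writing. Try this:
import re
import csv
# Read every row first, so the same file can be reopened for writing afterwards.
with open('ffrk_inventory_relics.csv', 'r', newline='') as infile:
    final_data = [[re.sub(r'\+', 'z', b) for b in i] for i in csv.reader(infile)]
with open('ffrk_inventory_relics.csv', 'w', newline='') as outfile:
    csv.writer(outfile).writerows(final_data)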
If you don't need csv you can use replace with regular open:
with open('ffrk_inventory_relics.csv', 'r') as reader, open('outfile.csv','w') as writer:
    for row in reader:
        writer.write(row.replace('+','z'))
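The cell-by-cell substitution can be sketched on in-memory data as follows. The helper name `sub_cells` and the sample rows are hypothetical, since the real file's columns are not shown in the question.

```python
import csv
import io
import re

def sub_cells(rows, pattern, repl):
    """Apply a regex substitution to every cell of every row."""
    regex = re.compile(pattern)
    return [[regex.sub(repl, cell) for cell in row] for row in rows]

# Made-up rows containing literal plus signs.
source = io.StringIO("Sword+,5\nShield++,3\n")
rows = sub_cells(csv.reader(source), r"\+", "z")
print(rows)
```

Compiling the pattern once avoids repeating the lookup for every cell, although for a small file the difference is unlikely to matter.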

Trying to remove rows based in csv file based off column value

I'm trying to remove duplicated rows in a csv file based on if a column has a unique value. My code looks like this:
seen = set()
for line in fileinput.FileInput('DBA.csv', inplace=1):
    if line[2] in seen:
        continue # skip duplicated line
    seen.add(line[2])
    print(line, end='')
I'm trying to get the value of the 2 index column in every row and check if it's unique. But for some reason my seen set looks like this:
{'b', '"', 't', '/', 'k'}
Any advice on where my logic is flawed?
You're reading your file line by line, so each line is a plain string and line[2] picks the third character of that line, not the third column.
If you want to capture the value of the third column (index 2) for each row, you need to parse your CSV first, something like:
import csv
seen = set()
# Python 3: open in text mode with newline="" so the csv module handles line endings
with open("DBA.csv", newline="") as f:
    reader = csv.reader(f)
    for line in reader:
        if line[2] in seen:
            continue
        seen.add(line[2])
        print(line)  # this will NOT print valid CSV, it will print Python list
If you want to edit your CSV in place I'm afraid it will be a bit more complicated than that. If your CSV is not huge, you can load it in memory, truncate it and then write down your lines:
import csv
seen = set()
# "r+" allows reading the file and then rewriting it through the same handle
with open("DBA.csv", "r+", newline="") as f:
    handler = csv.reader(f)
    data = list(handler)
    f.seek(0)
    f.truncate()
    handler = csv.writer(f)
    for line in data:
        if line[2] in seen:
            continue
        seen.add(line[2])
        handler.writerow(line)
Otherwise you'll have to read your file line by line and use a buffer that you'll pass to csv.reader() to parse it, check the value of its third column and if not seen write the line to the live-editing file. If seen, you'll have to seek back to the previous line beginning before writing the next line etc.
Of course, you don't need to use the csv module if you know your line structures well which can simplify the things (you won't need to deal with passing buffers left and right), but for a universal solution it's highly advisable to let the csv module do your bidding.
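The de-duplication logic itself can be separated from the file handling, as in this sketch. The generator name `unique_rows` and the sample data are hypothetical; skipping rows that are too short to have the column is an assumption, not something the question specifies.

```python
import csv
import io

def unique_rows(rows, column):
    """Yield only rows whose value in `column` has not been seen before."""
    seen = set()
    for row in rows:
        # Assumption: rows too short to have the column are skipped.
        if len(row) <= column or row[column] in seen:
            continue
        seen.add(row[column])
        yield row

# Made-up data: the third column repeats the value "x".
sample = io.StringIO("1,a,x\n2,b,y\n3,c,x\n")
kept = list(unique_rows(csv.reader(sample), 2))
print(kept)
```

Because it is a generator, the same function could be fed a csv.reader over a real file and its output passed straight to csv.writer.writerows.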
