def read_dict(file_name):
    f = open(file_name, 'r')
    dict_rap = {}
    for key, val in csv.reader(f):
        dict_rap[key] = str(val)
    f.close()
    return dict_rap
test_dict = {'wassup':['Hi','Hello'],'get up through':['to leave','to exit'],
'its on with you':['good bye','have a nice day'],'bet':['ok','alright'],'ight':['ok','yes'],
'whip':['car','vechile'],'lit':['fun','festive'],'guap':['money','currency'],'finesse':['to get desired results by anymeans','to trick someone'],
'jugg':['how you makemoney','modern term for hustle'],'1111':['www'] }
Traceback (most recent call last):
  File "C:\Users\C2C\Desktop\rosetta_stone.py", line 97, in <module>
    reformed_dict = read_dict(file_name)#,test_dict)
  File "C:\Users\C2C\Desktop\rosetta_stone.py", line 63, in read_dict
    for key, val in csv.reader(f):
ValueError: too many values to unpack (expected 2)
From csv documentation ...
In [2]: csv.reader??
Docstring:
    csv_reader = reader(iterable [, dialect='excel']
                        [optional keyword args])
    for row in csv_reader:
        process(row)
......
......
The returned object is an iterator. Each iteration returns a row
I guess it's pretty self-explanatory...
Each row that csv.reader yields is a list of the fields on one line, not the key/value pair you're expecting. So your dict processing code should go inside a loop that iterates over the rows returned by csv.reader.
I'm afraid that csv.reader(f) does not return what you are expecting it to return. I don't know exactly what your .csv file looks like, but I doubt that each row holds exactly the two values that you are trying to put into the dictionary.
Assuming that the first 3 lines of your .csv look something like this:
wassup,hi,hello
get up through,to leave,to exit
its on you,good bye,have a nice day
a better way to read the .csv and iterate over each line might be:
...
my_csv = csv.reader(f)
for row in my_csv:
    # row is a list with all the values you have in one line in the .csv
    if len(row) > 1:
        key = row[0]  # for the 1st line the value is the string: 'wassup'
        values = row[1:]  # here for the first line you get the list: ['hi', 'hello']
        # ... and so on
It is saying that csv.reader(f) is yielding rows with more than two items, while you are trying to unpack each row into exactly two things (key and val).
Presuming that you are using the standard csv module, each row is a list with one item per delimiter-separated field. If you expect the input to have two items per line, then perhaps you need to specify a different delimiter. For example, if your input separates fields with semicolons instead of commas:
csv.reader(f, delimiter=";")
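Putting the suggestions above together, here is a minimal sketch of a read_dict that keeps every value after the key as a list. It assumes each line looks like key,value1,value2,... and uses an in-memory io.StringIO with made-up sample lines in place of the real file, so the data is hypothetical:

```python
import csv
import io


def read_dict(lines):
    """Map the first field of each CSV row to the list of remaining fields."""
    result = {}
    for row in csv.reader(lines):
        if len(row) > 1:  # skip blank lines and lines that only hold a key
            result[row[0]] = row[1:]
    return result


# Hypothetical sample data standing in for the real .csv file.
sample = io.StringIO(
    "wassup,Hi,Hello\n"
    "bet,ok,alright\n"
)
slang = read_dict(sample)
print(slang["wassup"])
```

With a real file you would pass the open file object instead, for example the result of open(file_name, newline='') in Python 3.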
Related
I'm trying to remove duplicated rows in a csv file based on if a column has a unique value. My code looks like this:
seen = set()
for line in fileinput.FileInput('DBA.csv', inplace=1):
    if line[2] in seen:
        continue  # skip duplicated line
    seen.add(line[2])
    print(line, end='')
I'm trying to get the value of the 2 index column in every row and check if it's unique. But for some reason my seen set looks like this:
{'b', '"', 't', '/', 'k'}
Any advice on where my logic is flawed?
You're reading your file line by line, so each line is a plain string, and when you pick line[2] you're actually picking the third character of that line.
If you want to capture the value of the third column (index 2) for each row, you need to parse your CSV first, something like:
import csv
seen = set()
with open("DBA.csv", newline="") as f:
    reader = csv.reader(f)
    for line in reader:
        if line[2] in seen:
            continue
        seen.add(line[2])
        print(line)  # this will NOT print valid CSV, it will print a Python list
If you want to edit your CSV in place I'm afraid it will be a bit more complicated than that. If your CSV is not huge, you can load it in memory, truncate the file and then write your lines back:
import csv
seen = set()
with open("DBA.csv", "r+", newline="") as f:
    handler = csv.reader(f)
    data = list(handler)  # read every row before truncating
    f.seek(0)
    f.truncate()
    handler = csv.writer(f)
    for line in data:
        if line[2] in seen:
            continue
        seen.add(line[2])
        handler.writerow(line)
Otherwise you'll have to read your file line by line and use a buffer that you'll pass to csv.reader() to parse it, check the value of its third column and if not seen write the line to the live-editing file. If seen, you'll have to seek back to the previous line beginning before writing the next line etc.
Of course, you don't need to use the csv module if you know your line structures well which can simplify the things (you won't need to deal with passing buffers left and right), but for a universal solution it's highly advisable to let the csv module do your bidding.
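If the result can be written to a separate output instead of being edited live, the filtering step can be pulled into a small generator. This is only a sketch: unique_rows is a hypothetical helper name, the sample rows are made up, and io.StringIO objects stand in for the real input and output files.

```python
import csv
import io


def unique_rows(rows, column):
    """Yield only the first row seen for each distinct value in `column`."""
    seen = set()
    for row in rows:
        if len(row) <= column:
            continue  # assumption: rows too short to have the column are dropped
        key = row[column]
        if key not in seen:
            seen.add(key)
            yield row


# Hypothetical data: rows 1 and 3 share the same value in column index 2.
source = io.StringIO("1,a,x\n2,b,y\n3,c,x\n")
target = io.StringIO()
writer = csv.writer(target, lineterminator="\n")
writer.writerows(unique_rows(csv.reader(source), 2))
deduped = target.getvalue()
print(deduped)
```

Because the rows are streamed one at a time, only the set of seen values has to fit in memory, not the whole file.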
I want to make a list of lists in python.
My code is below.
import csv
f = open('agGDPpct.csv','r')
inputfile = csv.DictReader(f)
list = []
next(f) ##Skip first line (column headers)
for line in f:
    array = line.rstrip().split(",")
    list.append(array[1])
    list.append(array[0])
    list.append(array[53])
    list.append(array[54])
    list.append(array[55])
    list.append(array[56])
    list.append(array[57])
print list
I'm pulling only select columns from every row. My code pops this all into one list, as such:
['ABW', 'Aruba', '0.506252445', '0.498384331', '0.512418427', '', '', 'AND', 'Andorra', '', '', '', '', '', 'AFG', 'Afghanistan', '30.20560247', '27.09154001', '24.50744042', '24.60324707', '23.96716227'...]
But what I want is a list in which each row is its own list: [[a,b,c],[d,e,f],[g,h,i],...] Any tips?
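You are almost there. Collect the desired columns of each row into a list before appending. Since you index the columns by position, use csv.reader (csv.DictReader yields dicts keyed by the header names, so integer indexes would fail) and skip the header row with next(). Try this:
import csv
with open('agGDPpct.csv','r') as f:
    inputfile = csv.reader(f)
    next(inputfile)  # skip the header row
    list = []
    for line in inputfile:
        list.append([line[1], line[0], line[53], line[54], line[55], line[56], line[57]])
print list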
To end up with a list of lists, you have to make the inner lists with the columns from each row that you want, and then append that list to the outer one. Something like:
for line in f:
    array = line.rstrip().split(",")
    inner = []
    inner.append(array[1])
    # ...
    inner.append(array[57])
    list.append(inner)
Note that it's also not a good practice to use the name of the type ("list") as a variable name -- this is called "shadowing", and it means that if you later try to call list(...) to convert something to a list, you'll get an error because you're trying to call a particular instance of a list, not the list built-in.
To build on the csv module's capabilities, I'd do:
import csv
f = csv.reader(open('your.csv'))
next(f)
list_of_lists = [items[1::-1]+items[53:58] for items in f]
Note that
items is a list of items, thanks to the intervention of a csv.reader() object;
using slice addressing returns sublists taken from items, so that the + operator in this context means concatenation of lists;
the first slice expression 1::-1 means from 1 go to the beginning moving backwards, or [items[1], items[0]].
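The slice arithmetic above is easy to check on a stand-in row. The sketch below uses a made-up list of 60 labelled cells (c0 to c59) instead of a real CSV row, so the names are purely illustrative:

```python
# A stand-in for one parsed row: 60 cells labelled by their index.
items = ["c%d" % i for i in range(60)]

first_two_reversed = items[1::-1]  # start at index 1 and walk back to index 0
tail = items[53:58]                # indices 53 to 57; the stop index is excluded

selected = first_two_reversed + tail  # + concatenates the two sublists
print(selected)
```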
Referring to https://docs.python.org/2/library/csv.html#csv.DictReader
Instead of
for line in f:
Write
for line in inputfile:
And also use list.append([line[1],line[0],line[53],..]) to append a list to a list. Indexing by position like this requires csv.reader rather than csv.DictReader, because DictReader returns each row as a dict keyed by the column headers.
One more thing, referring to https://docs.python.org/2/library/stdtypes.html#iterator.next , use inputfile.next() instead of next(f) so that the header row is skipped on the reader itself.
After these changes, you get:
import csv
f = open('agGDPpct.csv','r')
inputfile = csv.reader(f)
list = []
inputfile.next() ##Skip first line (column headers)
for line in inputfile:
    list.append([line[1],line[0],line[53],line[54],line[55],line[56],line[57]])
print list
In addition to that, it is not a good practice to use list as a variable name: it is not a reserved word, but it is the name of the built-in type, and reusing it hides that built-in. Rename that too.
You can further improve the above code using a with statement. I will leave that to you.
Try and see if it works.
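If you would rather keep csv.DictReader, the columns have to be picked by header name instead of by position. A minimal sketch, assuming made-up header names (the real file's headers are not shown in the question) and an in-memory sample in place of agGDPpct.csv:

```python
import csv
import io

# Hypothetical header names and values; the real file has many more columns.
sample = io.StringIO(
    "Country Name,Country Code,2013,2014\n"
    "Aruba,ABW,0.51,\n"
    "Afghanistan,AFG,24.60,23.97\n"
)

wanted = ["Country Code", "Country Name", "2013", "2014"]

# DictReader consumes the header line itself, so there is nothing to skip.
rows = [[record[name] for name in wanted] for record in csv.DictReader(sample)]
print(rows)
```

Selecting by name keeps working if columns are reordered in the file, which positional indexes like 53 to 57 do not.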
I have an input file that contains lines of:
key \t value1 \t value2 .....
I'd like to read this file into a dictionary where the key is the first token of the line and the value is the list of the remaining values.
I think something like this would do it, but python gives me an error that name l is not defined. How do I write a comprehension that has two levels of "for" statements like this?
f = open("input.txt")
datamap = {tokens[0]:tokens[1:] for tokens in l.split("\t") for l in enumerate(f)}
Use the csv module and insert each row into a dictionary:
import csv
with open('input.txt') as tsvfile:
    reader = csv.reader(tsvfile, delimiter='\t')
    datamap = {row[0]: row[1:] for row in reader}
This sidesteps the issue altogether.
You can put a str.split() result into a tuple to create a 'loop variable':
datamap = {row[0]: row[1:] for l in f for row in (l.strip().split("\t"),)}
Here row is bound to the one str.split() result from the tuple, effectively creating a row = l.strip().split('\t') 'assignment'.
Martijn's got you covered for improving the process, but just to directly address the issues you were seeing with your code:
First, enumerate is not doing what you think it's doing (although I'm not entirely sure what you think it's doing). You can just get rid of it.
Second, Python is trying to resolve this:
tokens[0]:tokens[1:] for tokens in l.split("\t")
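before l is bound, because the for clauses of a comprehension are evaluated left to right, like nested loops. You can wrap the second for clause in parentheses, turning it into a generator expression, so that it evaluates as you intended: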
datamap = {tokens[0]:tokens[1:] for tokens in (l.split("\t") for l in f)}
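To make the clause order concrete, here is a small sketch that writes the comprehension with the outer loop first and compares it with the equivalent explicit loops. The two sample lines are made up and stand in for the lines of input.txt:

```python
# Hypothetical tab-separated lines standing in for the input file.
lines = ["k1\tv1\tv2\n", "k2\tv3\n"]

# The for clauses read left to right like nested loops, so `line` must be
# bound by the first clause before the second clause can use it.
datamap = {
    tokens[0]: tokens[1:]
    for line in lines
    for tokens in [line.rstrip("\n").split("\t")]
}

# The same logic written as explicit nested loops.
expected = {}
for line in lines:
    for tokens in [line.rstrip("\n").split("\t")]:
        expected[tokens[0]] = tokens[1:]

print(datamap)
```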
I have a csv file with the following structure:
1234,5678,"text1"
983453,2141235,"text2"
I need to convert each line to a tuple and create a list. Here is what I did
with open('myfile.csv') as f1:
    mytuples = [tuple(line.strip().split(',')) for line in f1.readlines()]
However, I want the first 2 columns to be integers, not strings. I was not able to figure out how to continue with this, except by reading the file line by line once again and parsing it. Can I add something to the code above so that I transform str to int as I convert the file to list of tuples?
This is a csv file. Treat it as such.
import csv
with open("test.csv") as csvfile:
    reader = csv.reader(csvfile)
    result = [(int(a), int(b), c) for a,b,c in reader]
If there's a chance your input may not be what you think it is:
import csv
with open('test.csv') as csvfile:
    reader = csv.reader(csvfile)
    result = []
    for line in reader:
        this_line = []
        for col in line:
            try:
                col = int(col)
            except ValueError:
                pass
            this_line.append(col)
        result.append(tuple(this_line))
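One practical reason to treat the file as CSV is the quoted third column. The sketch below compares a plain split(',') with csv.reader on two sample lines; the second line is a made-up variant with a comma inside the quotes, which the question's data does not actually contain:

```python
import csv
import io

# First line is from the question; the second is a hypothetical variant.
raw = '1234,5678,"text1"\n983453,2141235,"text, with comma"\n'

# Plain splitting keeps the quote characters and breaks on the inner comma.
naive = [tuple(line.strip().split(",")) for line in io.StringIO(raw)]

# csv.reader strips the quotes and keeps the quoted field in one piece.
parsed = [(int(a), int(b), c) for a, b, c in csv.reader(io.StringIO(raw))]

print(naive)
print(parsed)
```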
Instead of trying to cram all of the logic in a single line, just spread it out so that it is readable.
with open('myfile.csv') as f1:
    mytuples = []
    for line in f1:
        tokens = line.strip().split(',')
        mytuples.append( (int(tokens[0]), int(tokens[1]), tokens[2]) )
Real python programmers aren't afraid of using multiple lines.
You can use isdigit() to check whether every character of an element in the row is a digit and, if so, convert that element to int. So replace the following:
tuple(line.strip().split(','))
with:
tuple(int(i) if i.isdigit() else i for i in line.strip().split(','))
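One caveat with the isdigit() approach is that it only matches strings made up entirely of digits, so signed or decimal values stay as strings. A quick sketch with made-up tokens (to_int_if_digits is a hypothetical helper name):

```python
def to_int_if_digits(token):
    """Convert to int only when the token consists solely of digits."""
    return int(token) if token.isdigit() else token


# Hypothetical tokens: only the first one passes the isdigit() check.
converted = tuple(to_int_if_digits(t) for t in "42,-7,3.5,abc".split(","))
print(converted)
```

For the data in the question (non-negative integers followed by text) this makes no difference, but it is worth knowing before reusing the one-liner elsewhere.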
You can cram this all into one line if you really want, but god help me I don't know why you'd want to. Try giving yourself room to breathe:
def get_tuple(token_list):
    return (int(token_list[0]), int(token_list[1]), token_list[2])

mytuples = []
with open('myfile.csv') as f1:
    for line in f1.readlines():
        token_list = line.strip().split(',')
        mytuples.append(get_tuple(token_list))
Isn't that way easier to read? I like list comprehension as much as the next guy, but I also like knowing what a block of code does when I sit down three weeks later and start reading it!
I have a tab delimited file with lines of data as such:
8600tab8661tab000000000003148415tab10037-434tabXEOL
8600tab8662tab000000000003076447tab6134505tabEOL
8600tab8661tab000000000003426726tab470005-063tabXEOL
There should be 5 fields with the possibility of the last field having a value 'X' or being empty as shown above.
I am trying to parse this file in Python (2.7) using the csv reader module as such:
file = open(fname)
reader = csv.reader(file, delimiter='\t', quoting=csv.QUOTE_NONE)
for row in reader:
    for i in range(5):  # there are 5 fields
        print row[i]  # this fails if there is no 'X' in the last column
                      # index out of bounds error
If the last column is empty the row structure will end up looking like:
list: ['8600', '8662', '000000000003076447', '6134505']
So when row[4] is called, the error follows..
I was hoping for something like this:
list: ['8600', '8662', '000000000003076447', '6134505', '']
This problem only seems to occur if the very last column is empty. I have been looking through the reader arguments and dialect options to see if there is a simple option to pass into csv.reader to fix the way it handles an empty field at the end of the line. So far no luck.
Any help will be much appreciated!
The easiest option would be to check the length of the row beforehand. If the length is 4, append an empty string to your list.
for row in reader:
    if len(row) == 4:
        row.append('')
    for i in range(5):
        print row[i]
There was a minor PEBCAK on my part. I was going back and forth between editing the file in Notepad++ and Gvim. At some point I lost my last tab on the end. I fixed the file and it parsed as expected.
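For files where the trailing tab may or may not survive editing, padding each row to a fixed width is a slightly more general version of the length check above. A minimal sketch, with pad_row and FIELD_COUNT as hypothetical names and two lines modelled on the sample data (the second one has lost its trailing tab):

```python
import csv
import io

FIELD_COUNT = 5  # assumption: every record should have five fields


def pad_row(row, width=FIELD_COUNT):
    """Return the row extended with empty strings up to `width` fields."""
    return row + [""] * (width - len(row))


data = (
    "8600\t8661\t000000000003148415\t10037-434\tX\n"
    "8600\t8662\t000000000003076447\t6134505\n"  # trailing tab missing
)
reader = csv.reader(io.StringIO(data), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = [pad_row(row) for row in reader]

# When the trailing tab is present, csv.reader already yields the empty field.
with_tab = list(csv.reader(io.StringIO("a\tb\t\n"), delimiter="\t"))
print(rows)
print(with_tab)
```

Rows that already have five or more fields pass through unchanged, because multiplying a list by a non-positive number gives an empty list.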