dictionary conversion from reading files in python

What is the difference between line number 5 and line number 6?
Line number 5 throws an error but line number 6 works. I am unable to understand the difference.
with open("file1.txt") as f:
    lines = f.readlines()
    for line in lines:
        print(line)
        print(dict(line.split('=', 1)))  # Line 5 throwing ValueError: dictionary update sequence element #0 has length 1; 2 is required
    print(dict(line.split('=', 1) for line in lines))  # Line 6 works fine

Suppose that we have values like the following for lines:
lines = ["foo=bar", "baz=ola"]
In this code:
print(dict(line.split('=', 1) for line in lines))
the argument to dict is an iterable of lists of two elements, similar to:
>>> dict([["foo", "bar"], ["baz", "ola"]])
{'foo': 'bar', 'baz': 'ola'}
In this code:
for line in lines:
    print(dict(line.split('=', 1)))
we call dict() multiple times, each time with a list of two elements, similar to:
>>> dict(["foo", "bar"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #0 has length 3; 2 is required
"Dictionary update sequence element #0" is "foo"; each "sequence element" is a key, value pair, so the error is happening because dict() is trying to interpret "foo" as if it were a pair. Since your error message indicates "length 1", I'm guessing your file has a one-character name in it that is similarly being interpreted as a pair.
If we used two-character strings, it would "work", but it would work by using the first character as the key and the second character as the value:
>>> line = "ab=xy"
>>> dict(line.split("="))
{'a': 'b', 'x': 'y'}
The key takeaway is that the argument to dict must be all the key, value pairs, not a single key, value pair -- i.e. it must be a generator, list, or other iterable that contains other iterables with exactly two elements each. Even if you only have a single pair, you must put that pair inside an iterable (e.g. a list), because dict() will always treat its argument as an iterable of pairs.
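To make that last point concrete, here is a minimal sketch of wrapping a single pair in a list before handing it to dict(). The helper name parse_pair is made up for illustration and is not part of the question's code.

```python
def parse_pair(line):
    """Turn one 'key=value' string into a one-entry dict."""
    pair = line.strip().split("=", 1)  # e.g. ['foo', 'bar']
    # dict() expects an iterable of pairs, so the single pair goes inside a list.
    return dict([pair])


print(parse_pair("foo=bar\n"))
```

A line without an "=" would still raise the ValueError discussed above, because the split then produces a one-element list rather than a pair.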

Here is the answer, from the Python documentation for dict():
If no positional argument is given, an empty dictionary is created. If
a positional argument is given and it is a mapping object, a
dictionary is created with the same key-value pairs as the mapping
object. Otherwise, the positional argument must be an iterable object.
Each item in the iterable must itself be an iterable with exactly two
objects. The first object of each item becomes a key in the new
dictionary, and the second object the corresponding value.
In line 6 you pass an iterator whose items are each an iterable with two objects. In line 5 you pass just a list of two strings, so dict() tries to treat each string as a key-value pair.

For a file like:
x=1
y=2
z=3
Use something like the following:
with open('f.txt') as f:
    lines = f.read().splitlines()  # strips newlines
new_dict = dict([line.split('=') for line in lines])
The list comprehension inside the call to dict produces a list in which each element holds the split components of one line. In this case:
[['x', '1'], ['y', '2'], ['z', '3']]
The dict function consumes iterables like this (with elements consisting of 2 elements) as key-value pairs.
If you inspect new_dict you'd see:
print(new_dict)
{'x': '1', 'y': '2', 'z': '3'}
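Real files often contain blank lines or lines without an "=", which would trigger the ValueError from the question. The following is only a sketch of one way to tolerate them; the function name load_pairs and the choice to skip malformed lines silently are assumptions, not something the answer above prescribes. An in-memory io.StringIO stands in for the file.

```python
import io


def load_pairs(lines):
    """Build a dict from 'key=value' lines, ignoring lines without '='."""
    result = {}
    for raw in lines:
        line = raw.strip()
        if "=" not in line:
            continue  # assumption: blank or malformed lines are simply skipped
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result


sample = io.StringIO("x=1\n\ny = 2\nnot a pair\nz=3\n")
print(load_pairs(sample))
```

Splitting with a maximum of one split keeps any further "=" characters inside the value.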

Related

Shortcut to turn a ONE element dict into ONE tuple in Python

There are dozens of questions on turning a Python dict with some number of elements into a list of tuples. I am looking for a shortcut to turn a one element dict into a tuple.
Ideally, it would:
Be idiomatic and simple (i.e., not a subclass of dict, not a function, etc).
Throw an error if there is more than one element in the dict.
Support multiple assignment by tuple unpacking.
Not destroy the dict.
I am looking for a shortcut to do this (that is not destructive to the dict):
k,v=unpack_this({'unknown_key':'some value'})
* does not work here.
I have come up with these that work:
k,v=next(iter(di.items())) # have to call 'iter' since 'dict_items' is not an iterator
Or:
k,v=(next(((k,v) for k,v in di.items())))
Or even:
k,v=next(zip(di.keys(), di.values()))
Finally, the best I can come up with:
k,v=list(di.items())[0] # probably the best...
Which can be wrapped into a function if I want a length check:
def f(di):
    if (len(di)==1): return list(di.items())[0]
    raise ValueError(f'Too many items to unpack. Expected 2, got {len(di)*2}')
These methods seem super clumsy and none throw an error if there is more than one element.
Is there an idiomatic shortcut that I am missing?

Why not:
next(iter(d.items())) if len(d)==1 else (None,None)

>>> d = {'a': 1}
>>> d.popitem()
('a', 1)
This will return the pair associated with the last key added, with no error if the dict has multiple keys, but the same length check you've made in your function f can be used. Note that popitem() also removes the pair from the dict.
def f(di):
    if len(di) == 1:
        return di.popitem()
    raise ValueError('Dict has multiple keys')

@chepner has the right approach with .popitem():
>>> d = {'a': 1}
>>> d.popitem()
('a', 1) # d is now {}
This will return the pair associated with the last key added, with no error if the dict has multiple keys. It will also destroy d.
You can keep d intact by using this idiom to create a new dict and pop off the last item from the new dict created with {**d}:
>>> {**d}.popitem()
('a', 1)
>>> d
{'a': 1}
As far as a 1 line unpackable tuple that throws an error if the dict has more than one item, you can use a Python conditional expression with an alternate that throws the error desired:
# success: k=='a' and v==1
>>> d={'a':1}
>>> k,v={**d}.popitem() if len(d)==1 else 'wrong number of values'
# len(d)!=1 - pass a string to k,v which is too long
>>> d={'a':1,'b':2}
>>> k,v={**d}.popitem() if len(d)==1 else 'wrong number of values'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
If you object to different types being passed, you could do:
>>> k,v={**d}.popitem() if len(d)==1 else {}.fromkeys('too many keys').items()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
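Another option, not taken from the answers above, is to lean on iterable unpacking itself: unpacking d.items() into a one-element target fails unless the dict holds exactly one item, and it leaves the dict untouched. The sketch below wraps this in a hypothetical helper called only_item; the same idea also works inline as (k, v), = d.items().

```python
def only_item(d):
    """Return the single (key, value) pair of d without modifying d."""
    (pair,) = d.items()  # raises ValueError unless there is exactly one item
    return pair


di = {"unknown_key": "some value"}
k, v = only_item(di)
print(k, v)
print(di)  # the dict is unchanged
```

An empty dict raises ValueError as well, which the length-check versions above would need to handle separately.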

Why is the updated[value] = [] giving me TypeError: unhashable type: 'list'? [duplicate]

This question already has answers here:
Understanding slicing
(38 answers)
Why can't I use a list as a dict key in python?
(11 answers)
Closed 4 months ago.
I'm trying to take a file that looks like this:
AAA x 111
AAB x 111
AAA x 112
AAC x 123
...
And use a dictionary so that the output looks like this:
{AAA: ['111', '112'], AAB: ['111'], AAC: [123], ...}
This is what I've tried:
file = open("filename.txt", "r")
readline = file.readline().rstrip()
while readline!= "":
    list = []
    list = readline.split(" ")
    j = list.index("x")
    k = list[0:j]
    v = list[j + 1:]
    d = {}
    if k not in d == False:
        d[k] = []
    d[k].append(v)
    readline = file.readline().rstrip()
I keep getting a TypeError: unhashable type: 'list'. I know that keys in a dictionary can't be lists but I'm trying to make my value into a list not the key. I'm wondering if I made a mistake somewhere.

As indicated by the other answers, the error is due to k = list[0:j], which makes your key a list. One thing you could try is reworking your code to take advantage of the split function:
# Using with ensures that the file is properly closed when you're done
with open('filename.txt', 'r') as f:
    d = {}
    # Here we use readlines() to split the file into a list where each element is a line
    for line in f.readlines():
        # Now we split the line on `x`, since the part before the x will be
        # the key and the part after the value
        line = line.split('x')
        # Take the line parts and strip out the spaces, assigning them to the variables
        # Once you get a bit more comfortable, this works as well:
        # key, value = [x.strip() for x in line]
        key = line[0].strip()
        value = line[1].strip()
        # Now we check if the dictionary contains the key; if so, append the new value,
        # and if not, make a new list that contains the current value
        # (For future reference, this is a great place for a defaultdict :)
        if key in d:
            d[key].append(value)
        else:
            d[key] = [value]
print(d)
# {'AAA': ['111', '112'], 'AAB': ['111'], 'AAC': ['123']}
Note that this version is written for Python 3, where the file is opened in text mode with 'r'. If you open the file in binary mode with 'rb' instead, you'll need to use line = line.split(b'x') so that the separator is a bytes object like the data being split. On Python 2 you can write print d and open the file with 'rb' or 'rU'; the 'U' flag was removed in Python 3.11.

Note:
This answer does not explicitly answer the question asked; the other answers do that. Since the question is specific to one scenario and the raised exception is general, this answer covers the general case.
Hash values are just integers that are used to compare dictionary keys quickly during a dictionary lookup.
Internally, hash() calls the object's __hash__() method. Most objects define it by default, but mutable built-in containers such as list, dict and set do not, which is why they are unhashable.
Converting a nested list to a set
>>> a = [1,2,3,4,[5,6,7],8,9]
>>> set(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
This happens because the outer list contains another list, and lists cannot be hashed. It can be solved by converting the nested lists to tuples:
>>> set([1, 2, 3, 4, (5, 6, 7), 8, 9])
set([1, 2, 3, 4, 8, 9, (5, 6, 7)])
Explicitly hashing a nested list
>>> hash([1, 2, 3, [4, 5,], 6, 7])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> hash(tuple([1, 2, 3, [4, 5,], 6, 7]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> hash(tuple([1, 2, 3, tuple([4, 5,]), 6, 7]))
-7943504827826258506
The solution to avoid this error is to restructure the list to have nested tuples instead of lists.
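Doing that conversion by hand gets tedious for deeper nesting. As a rough sketch, a small recursive helper can do it; the name to_hashable is invented here, and the helper only handles lists, which is an assumption about the data (dicts or sets inside would need their own handling).

```python
def to_hashable(value):
    """Recursively replace lists with tuples so the result can be hashed."""
    if isinstance(value, list):
        return tuple(to_hashable(item) for item in value)
    return value


nested = [1, 2, 3, [4, 5], 6, 7]
frozen = to_hashable(nested)
print(frozen)

# The converted value can now be used as a set member or a dict key.
lookup = {frozen: "found"}
print(lookup[to_hashable([1, 2, 3, [4, 5], 6, 7])])
```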

You're trying to use k (which is a list) as a key for d. Lists are mutable and can't be used as dict keys.
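Also, you're never initializing the lists in the dictionary, because of this line:
if k not in d == False:
Python chains comparison operators, so this is evaluated as (k not in d) and (d == False), which is never true for a dictionary. Appending == True instead would fail for the same reason. It should simply be:
if k not in d: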
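The original test never succeeds because Python chains comparison operators, and not in counts as one of them. The short sketch below compares the three spellings on an empty dict; the variable names are only for illustration.

```python
d = {}
k = "AAA"

chained = k not in d == False                # what the question wrote
spelled_out = (k not in d) and (d == False)  # how Python reads the chain
intended = k not in d                        # what was meant

print(chained, spelled_out, intended)
```

The first two values are always equal, and both are false here even though the key is missing, because a dict never compares equal to False.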

The reason you're getting the unhashable type: 'list' exception is because k = list[0:j] sets k to be a "slice" of the list, which is logically another, often shorter, list. What you need is to get just the first item in list, written like so k = list[0]. The same for v = list[j + 1:] which should just be v = list[2] for the third element of the list returned from the call to readline.split(" ").
I noticed several other likely problems with the code, of which I'll mention a few. A big one is that you don't want to (re)initialize d with d = {} for each line read in the loop. Another is that it's generally not a good idea to give variables the same names as built-in types, because it prevents you from accessing the built-in if you need it, and it's confusing to others who are used to those names designating the standard items. For that reason, you ought to rename your list variable to something different.
Here's a working version of your code with these changes in it. I also replaced the if statement you used to check whether the key was already in the dictionary with the dictionary's setdefault() method, which accomplishes the same thing a little more succinctly.
d = {}
with open("nameerror.txt", "r") as file:
    line = file.readline().rstrip()
    while line:
        lst = line.split()  # Split into sequence like ['AAA', 'x', '111'].
        k, _, v = lst[:3]  # Get first and third items.
        d.setdefault(k, []).append(v)
        line = file.readline().rstrip()
print('d: {}'.format(d))
Output:
d: {'AAA': ['111', '112'], 'AAC': ['123'], 'AAB': ['111']}

The reason behind this is that the list contains lists as its elements, for example:
a = [[1,2],[1,2],[3,4]]
Lists are unhashable, so this won't work:
list(set(a))
To fix this, you can turn the inner lists into tuples:
a = [(1,2),(1,2),(3,4)]
Now list(set(a)) works.

The TypeError is happening because k is a list, since it is created using a slice from another list with the line k = list[0:j]. This should probably be something like k = ' '.join(list[0:j]), so you have a string instead.
In addition to this, your if statement is incorrect as noted by Jesse's answer, which should read if k not in d or if not k in d (I prefer the latter).
You are also clearing your dictionary on each iteration, since you have d = {} inside your loop.
Note that you should also not be using list or file as variable names, since you will be masking builtins.
Here is how I would rewrite your code:
d = {}
with open("filename.txt", "r") as input_file:
    for line in input_file:
        fields = line.split()
        j = fields.index("x")
        k = " ".join(fields[:j])
        d.setdefault(k, []).append(" ".join(fields[j+1:]))
The dict.setdefault() method above replaces the if k not in d logic from your code.
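An earlier answer mentions that this is a good place for a defaultdict. Here is a minimal sketch of that variant using collections.defaultdict; it reads from an in-memory io.StringIO holding the sample data instead of a real file, and it assumes every useful line has the form "KEY x VALUE" with single-word keys and values.

```python
import io
from collections import defaultdict

# Stand-in for the input file, using the sample rows from the question.
sample = io.StringIO("AAA x 111\nAAB x 111\nAAA x 112\nAAC x 123\n")

d = defaultdict(list)  # missing keys start out as an empty list
for line in sample:
    fields = line.split()
    if len(fields) < 3:
        continue  # assumption: skip blank or incomplete lines
    key, _, value = fields[:3]
    d[key].append(value)

print(dict(d))
```

With defaultdict there is no need for either the membership test or setdefault(), since looking up a missing key creates the list automatically.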

Python 3.2:
with open("d://test.txt") as f:
    k=(((i.split("\n"))[0].rstrip()).split() for i in f.readlines())
    d={}
    for i,_,v in k:
        d.setdefault(i,[]).append(v)

Behaviour of Python when attempting to iterate over characters in a string

I am attempting to iterate over the following string, using a for loop:
>>> for a,b,c in "cat":
...     print(a,b,c)
I intended this to print each character of the string individually on one physical line; instead I receive an error. I am aware that this is very easily resolved by enclosing the string in the list operator []:
>>> for a,b,c in ["cat"]:
...     print(a,b,c)
c a t
But could someone explain why this is the case?

You are telling for to expand each iteration value to assign to three separate variables:
for a,b,c in "cat":
# ^^^^^ the target for the loop variable, 3 different names
However, iteration over a string produces a string with a single character on each step, and you can't assign a single character to three variables:
>>> loopiterable = 'cat'
>>> loopiterable[0] # first element
'c'
>>> a, b, c = loopiterable[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 3, got 1)
The error message tells you why this didn't work; you can't take three values out of a string of length 1.
When you put the string into a list, you changed what you loop over. You now have a list with one element, so the loop iterates just once, and the value for the single iteration is the string 'cat'. That string just happens to have 3 characters, so can be assigned to three variables:
>>> loopiterable = ['cat']
>>> loopiterable[0] # first element
'cat'
>>> a, b, c = loopiterable[0]
>>> a
'c'
>>> b
'a'
>>> c
't'
This still would fail if the contained string has a different number of characters:
>>> for a, b, c in ['cat', 'hamster']:
... print(a, b, c)
...
c a t
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 3)
'hamster' is 7 characters, not 3; that's 4 too many.
The correct solution is to just use one variable for the loop target, not 3:
for character in 'cat':
    print(character)
Now you are printing each character separately:
>>> for character in 'cat':
... print(character)
...
c
a
t
Now, if you wanted to pass all characters of a string to print() as separate arguments, just use * to expand the string to separate arguments to the call:
>>> my_pet = 'cat'
>>> print(*my_pet)
c a t

The for loop goes through each element in the iterable you supply and tries to perform an assignment.
In the first case, since your iterable is 'cat' you're essentially unpacking:
a, b, c = 'c'
during the first iteration and getting the appropriate error message:
ValueError: not enough values to unpack (expected 3, got 1)
because iterating over the string 'cat' yields one single-character string at a time, and a single character cannot fill three names.
In the second case, you're unpacking "cat" as expected, because the list has a single element (i.e 'cat') which is retrieved and unpacked into a, b and c.
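To see both cases side by side, here is a small sketch that mimics what for a, b, c in ... does on each iteration, but records the failure instead of stopping. The helper try_unpack is hypothetical and exists only for this demonstration.

```python
def try_unpack(items):
    """Mimic `for a, b, c in items`, recording failures instead of raising."""
    results = []
    for item in items:
        try:
            a, b, c = item  # the same unpacking the for loop performs
            results.append((a, b, c))
        except ValueError as exc:
            results.append(str(exc))
    return results


print(try_unpack("cat"))    # three items, each a single character
print(try_unpack(["cat"]))  # one item, the whole string
```

The first call collects three error messages, one per character, while the second collects a single successfully unpacked triple.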

You are iterating different objects in two blocks of code.
In the former, you iterate over the string itself, which yields its characters one at a time.
In the latter, you iterate over a list that contains only one object, 'cat'.
You may change the first one to below:
for ch in 'cat':
    print(ch)
Feel free to ask more about this.

Python dictionary creation error

I am trying to create a Python dictionary from a stored list. This first method works
>>> myList = []
>>> myList.append('Prop1')
>>> myList.append('Prop2')
>>> myDict = dict([myList])
However, the following method does not work
>>> myList2 = ['Prop1','Prop2','Prop3','Prop4']
>>> myDict2 = dict([myList2])
ValueError: dictionary update sequence element #0 has length 4; 2 is required
So I am wondering why the first method using append works but the second method doesn't work? Is there a difference between myList and myList2?
Edit
Checked again myList2 actually has more than two elements. Updated second example to reflect this.

You're doing it wrong.
The dict() constructor doesn't take a list of items (much less a list containing a single list of items), it takes an iterable of 2-element iterables. So if you changed your code to be:
myList = []
myList.append(["mykey1", "myvalue1"])
myList.append(["mykey2", "myvalue2"])
myDict = dict(myList)
Then you would get what you expect:
>>> myDict
{'mykey2': 'myvalue2', 'mykey1': 'myvalue1'}
The reason that this works:
myDict = dict([['prop1', 'prop2']])
{'prop1': 'prop2'}
Is because it's interpreting it as a list which contains one element which is a list which contains two elements.
Essentially, the dict constructor takes its first argument and executes code similar to this:
for key, value in myList:
    print(key, "=", value)
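The question does not say what dictionary was expected from the four-element list. If the intent was alternating keys and values, which is purely an assumption here, a sketch along these lines would build the pairs that dict() needs. The function name pairs_to_dict is invented for illustration.

```python
def pairs_to_dict(flat):
    """Interpret [k1, v1, k2, v2, ...] as alternating keys and values."""
    if len(flat) % 2:
        raise ValueError("expected an even number of elements")
    # Even positions are keys, odd positions are values; zip pairs them up.
    return dict(zip(flat[::2], flat[1::2]))


print(pairs_to_dict(["Prop1", "Prop2", "Prop3", "Prop4"]))
```

Here zip() produces exactly the iterable of 2-element items described above, so dict() accepts it directly.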
