Converting Dictionary to Dataframe with tuple as key - python

I have a dictionary like this
df_dict = {(7, 'hello'): {1}, (1, 'fox'): {2}}
I want to transform it into a dataframe where the first part of the tuple is the row header, and the second part of the tuple is the column header. I tried this:
doc_df = pd.DataFrame(df_dict, index=[df_dict.keys()[0]], columns = [df_dict.keys()[1]])
But I got the error TypeError: 'dict_keys' object does not support indexing
I want my dataframe to look like:
_ | fox | hello
1 | 2 | null
7 | null | 1
How do I index into the keys?

The reason you're getting the TypeError is that in Python 3, df_dict.keys() returns a dict_keys view object rather than a list. You can iterate over a view, take its len(), and test membership, and iterating it yields (7, 'hello') and (1, 'fox'), but it doesn't support access by index number, so df_dict.keys()[0] fails.
Now, you can use the itertools.islice function to access a given-numbered element from an iterable, but it involves throwing away everything that comes beforehand. So that's not what you want.
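For completeness, here is a minimal sketch of the islice approach just mentioned. The helper name nth_key is made up for illustration, and the result relies on dicts keeping insertion order (Python 3.7+):

```python
from itertools import islice

df_dict = {(7, 'hello'): {1}, (1, 'fox'): {2}}

def nth_key(d, n):
    """Return the n-th key of d (hypothetical helper), or None if out of range."""
    # islice steps past the first n keys, then yields exactly one.
    return next(islice(d, n, n + 1), None)

first = nth_key(df_dict, 0)
second = nth_key(df_dict, 1)
print(first, second)
```

Every call walks the keys from the start again, which is why this is rarely worth it compared with building a list once.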
The answer to the question you're asking, which is how you index into the keys, is to convert them into a list first:
l = list(df_dict.keys())
and then you can use l[0] and l[1] and so on.
But even that isn't what you're actually going to need for your application. The resulting list, in your example, would be
[(7, 'hello'), (1, 'fox')]
so l[0] will be (7, 'hello') and l[1] will be (1, 'fox') (dicts keep insertion order from Python 3.7 on; in older versions the order is arbitrary). What you actually want are the first elements, 7 and 1, and the second elements, 'hello' and 'fox', for which you can use a list comprehension:
[x[0] for x in l] # [7, 1]
[x[1] for x in l] # ['hello', 'fox']
or you could convert it to a NumPy array and transpose that. Be aware that a NumPy array holds a single dtype, so the integers become strings:
import numpy
npl = numpy.array(l) # rows: ['7', 'hello'] and ['1', 'fox']
nplT = npl.T # rows: ['7', '1'] and ['hello', 'fox']
Now you can use nplT[0] and so on.
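To get all the way to the table in the question, one possible sketch is to split each key into its row and column part and collect the values per row. This uses only plain Python; the names table, rows, cols and grid are illustrative, and it assumes every value is a one-element set as in the example:

```python
df_dict = {(7, 'hello'): {1}, (1, 'fox'): {2}}

# Split each (row, column) key and collect the values row by row.
table = {}
for (row, col), values in df_dict.items():
    # Assumption: each value is a one-element set; unpack its only member.
    (value,) = values
    table.setdefault(row, {})[col] = value

rows = sorted(table)                       # row headers
cols = sorted({col for _, col in df_dict})  # column headers
# Missing (row, column) combinations become None.
grid = [[table[r].get(c) for c in cols] for r in rows]

print(cols)
for r, line in zip(rows, grid):
    print(r, line)
```

If pandas is available, a nested dict shaped like table should be usable with pd.DataFrame.from_dict(table, orient='index'); that step is not exercised here, so treat it as a starting point rather than a tested recipe.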

Related

Enumerate does not work with 2d arrays yet range(len()) does?

I heard somewhere that we should all use enumerate to iterate through arrays but
for i in enumerate(array):
    for j in enumerate(array[i]):
        print(board[i][j])
doesn't work, yet when using range(len())
for i in range(len(array)):
    for j in range(len(array[i])):
        print(board[i][j])
it works as intended
Use it like this:
for idxI, arrayI in enumerate(array):
    for idxJ, arrayJ in enumerate(arrayI):
        print(board[idxI][idxJ])
enumerate adds a counter to each element, effectively turning your list of elements into a sequence of (index, element) tuples.
Example
array = ['a', 'b','c','d']
print(list(enumerate(array)))
gives you this:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
So in your case, all you need to do is unpack the extra element when iterating:
for i, item1 in enumerate(array):
    for j, item2 in enumerate(array[i]):
        print(board[i][j])
The issue in your case is
for i in enumerate(array):
Here i is not an integer but a tuple, such as (0, 'a') in my example, and you can't index a list with a tuple.
When one uses for i in enumerate(array):, each i is an (index, obj) tuple, whereas a range-based loop yields just the integers in the specified range.
>>> arr = [1,2,3]
>>> enumerate(arr)
<enumerate object at 0x105413140>
>>> list(enumerate(arr))
[(0, 1), (1, 2), (2, 3)]
>>> for i in list(enumerate(arr)):
...     print(i)
...
(0, 1)
(1, 2)
(2, 3)
>>>
One has to access the first element of the tuple to get the index in order to further index.
>>> board = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> for idx1, lst in enumerate(board):
...     for idx2, lst_ele in enumerate(lst):  # could use enumerate(board[idx1])
...         print(lst_ele, end=" ")
...
1 2 3 4 5 6 7 8 9
>>>
Sometimes you do not need both the index and the element, so I do not think it's always better to use enumerate. That being said, there are plenty of situations where it's easier to use enumerate, since you get the element directly without having to write element = array[idx].
See range() vs enumerate()
"Both are valid. The first solution [range-based] looks more similar to the problem description, while the second solution [enum-based] has a slight optimization where you don’t mutate the list potentially three times per iteration." - James Uejio

Reorganizing a list of tuples into lists of floats

Say I have in python a list of tuples
list1 = [(1,1,1), (2,2,2), (3,3,3)]
If I want to separate them into a list of all the 1 position values, 2 position values and 3 position values I would do:
ones = [tuple[0] for tuple in list1]
twos = [tuple[1] for tuple in list1]
threes = [tuple[2] for tuple in list1]
This sort of way can become very cumbersome the more elements each tuple in that list will have. Is there a cleaner way to do this possibly using the zip method or a reverse of it?
You can use zip for this:
list(zip(*list1))
output:
[(1, 2, 3), (1, 2, 3), (1, 2, 3)]
As @paoloaq noted, you can unpack these into separate variables:
ones, twos, threes = zip(*list1)
or if you want lists instead of tuples:
ones, twos, threes = map(list, zip(*list1))
Sidenote: try avoiding variable names like list and tuple.
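On the "reverse of zip" part of the question: zip(*...) is its own inverse, so applying it twice gets you back where you started. A small sketch with made-up float data (the names points, columns and restored are just for illustration):

```python
points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

# Transpose: one tuple per position instead of one tuple per point.
columns = list(zip(*points))

# Transposing again restores the original grouping.
restored = list(zip(*columns))

print(columns)
print(restored)
```

This works for any number of elements per tuple, as long as all tuples have the same length; zip stops at the shortest one otherwise.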

Using zip on the results of itertools.groupby unexpectedly gives empty lists

I've encountered some unexpected empty lists when using zip to transpose the results of itertools.groupby. In reality my data is a bunch of objects, but for simplicity let's say my starting data is this list:
a = [1, 1, 1, 2, 1, 3, 3, 2, 1]
I want to group the duplicates, so I use itertools.groupby (sorting first, because otherwise groupby only groups consecutive duplicates):
from itertools import groupby
duplicates = groupby(sorted(a))
This gives an itertools.groupby object which when converted to a list gives
[(1, <itertools._grouper object at 0x7fb3fdd86850>), (2, <itertools._grouper object at 0x7fb3fdd91700>), (3, <itertools._grouper object at 0x7fb3fdce7430>)]
So far, so good. But now I want to transpose the results so I have a list of the unique values, [1, 2, 3], and a list of the items in each duplicate group, [<itertools._grouper object ...>, ...]. For this I used the solution in this answer on using zip to "unzip":
>>> keys, values = zip(*duplicates)
>>> print(keys)
(1, 2, 3)
>>> print(values)
(<itertools._grouper object at 0x7fb3fdd37940>, <itertools._grouper object at 0x7fb3fddfb040>, <itertools._grouper object at 0x7fb3fddfb250>)
But when I try to read the itertools._grouper objects, I get a bunch of empty lists:
>>> for value in values:
... print(list(value))
...
[]
[]
[]
What's going on? Shouldn't each value contain the duplicates in the original list, i.e. (1, 1, 1, 1, 1), (2, 2) and (3, 3)?
Ah. The beauty of multiple iterators all sharing the same underlying object.
The documentation of groupby addresses this very issue:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:
groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # Store group iterator as a list
    uniquekeys.append(k)
So what ends up happening is that all your itertools._grouper objects are consumed before you ever unpack them. You see a similar effect if you try reusing any other iterator more than once. If you want to understand better, look at the next paragraph in the docs, which shows how the internals of groupby actually work.
Part of what helped me understand this is to work examples with a more obviously non-reusable iterator, like a file object. It helps to dissociate from the idea of an underlying buffer you can just keep track of.
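Here is a small sketch of that idea, using io.StringIO as an in-memory stand-in for a real file so it runs anywhere (the variable names are illustrative):

```python
import io

# An in-memory stand-in for an open file; iterating yields lines.
stream = io.StringIO("first\nsecond\nthird\n")

first_pass = [line.strip() for line in stream]
# The stream is now at its end, so a second pass finds nothing.
second_pass = [line.strip() for line in stream]

print(first_pass)   # ['first', 'second', 'third']
print(second_pass)  # []
```

The group iterators from groupby behave in the same spirit: once the shared source has moved on, there is nothing left for them to yield.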
A simple fix is to consume the objects yourself, as the documentation recommends:
# groupby returns a one-shot iterator over (key, group) pairs:
duplicates = groupby(sorted(a))
# If you convert duplicates to a list, you consume it.
# Don't store _grouper objects: consume them yourself:
keys, values = zip(*((key, list(value)) for key, value in duplicates))
As the other answer suggests, you don't need an O(N log N) solution that involves sorting, since you can do this in O(N) time in a single pass. Rather than use a Counter, though, I'd recommend a defaultdict to help store the lists:
from collections import defaultdict
result = defaultdict(list)
for item in a:
    result[item].append(item)
For more complex objects, you'd index with key(item) instead of item.
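A sketch of what that could look like with a key function. The helper name group_by_key and the sample data are invented for illustration:

```python
from collections import defaultdict

def group_by_key(items, key):
    """Group items into lists by key(item) in one pass (hypothetical helper)."""
    result = defaultdict(list)
    for item in items:
        result[key(item)].append(item)
    return dict(result)

words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_initial = group_by_key(words, key=lambda w: w[0])
print(by_initial)
```

Unlike groupby, this does not need the input to be sorted, and the groups are real lists that you can read as often as you like.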
To process each group of duplicates, consume every group inside the loop, before groupby advances to the next key:
import itertools
a = [1, 1, 1, 2, 1, 3, 3, 2, 1]
g1 = itertools.groupby(sorted(a))
for k, v in g1:
    print(f"Key {k} has", end=" ")
    for e in v:
        print(e, end=" ")
    print()
# Key 1 has 1 1 1 1 1
# Key 2 has 2 2
# Key 3 has 3 3
If you only need to count how many times each value occurs, you can skip the sort and add up the lengths of the consecutive runs:
import itertools
import collections
a = [1, 1, 1, 2, 1, 3, 3, 2, 1]
g1 = itertools.groupby(a)
c1 = collections.Counter()
for k, v in g1:
    l = len(tuple(v))
    c1[k] += l
for k, v in c1.items():
    print(f"Element {k} repeated {v} times")
# Element 1 repeated 5 times
# Element 2 repeated 2 times
# Element 3 repeated 2 times

good practice for string.partition in python

Sometimes I write code like this:
a,temp,b = s.partition('-')
I just need to pick the first and 3rd elements. temp would never be used. Is there a better way to do this?
In other terms, is there a better way to pick distinct elements to make a new list?
For example, I want to make a new list using the elements 0,1,3,7 from the old list. The code would be like this:
newlist = [oldlist[0],oldlist[1],oldlist[3],oldlist[7]]
It's pretty ugly, isn't it?
Be careful using
a, _, b = s.partition('-')
Sometimes _ is used for internationalization (gettext), so you wouldn't want to accidentally overwrite it.
Usually I would do this for partition rather than creating a variable I don't need
a, b = s.partition('-')[::2]
and this in the general case
from operator import itemgetter
ig0137 = itemgetter(0, 1, 3, 7)
newlist = ig0137(oldlist)
The itemgetter is more efficient than a list comprehension if you are using it in a loop. Note that it returns a tuple, so wrap the call in list() if you need an actual list.
For the first there's also this alternative:
a, b = s.partition('-')[::2]
For the latter, since the indices don't follow a regular interval, there is no really clean way to do it. But this might suit your needs:
newlist = [oldlist[k] for k in (0, 1, 3, 7)]
You can use Python's extended slicing feature to access a list periodically:
>>> a = list(range(10))
>>> # Pick every other element in a starting from a[1]
>>> b = a[1::2]
>>> print(b)
[1, 3, 5, 7, 9]
Negative indexing works as you'd expect:
>>> c = a[-1::-2]
>>> print(c)
[9, 7, 5, 3, 1]
For your case,
>>> a, b = s.partition('-')[::2]
The common practice in Python to pick the 1st and 3rd values is:
a, _, b = s.partition('-')
And to pick specified elements in a list you can do:
newlist = [oldlist[k] for k in (0, 1, 3, 7)]
If you don't need to retain the middle field you can use split (and similarly rsplit) with the optional maxsplit parameter to limit the splits to the first (or last) match of the separator:
a, b = s.split('-', 1)
This avoids a throwaway temporary or additional slicing.
The only caveat is that with split, unlike partition, a one-element list containing the original string is returned if the separator is not found, so the attempt to unpack into two names will fail with a ValueError. The partition method always returns a 3-tuple.
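A short sketch of the difference when the separator is missing. The helper split_pair and the sample strings are made up for illustration:

```python
def split_pair(s, sep="-"):
    """Split s at the first sep; the tail is '' when sep is absent (hypothetical helper)."""
    head, _sep, tail = s.partition(sep)
    return head, tail

with_sep = split_pair("key-value-extra")
without_sep = split_pair("novalue")

# split returns a one-element list here, so two-name unpacking raises.
try:
    a, b = "novalue".split("-", 1)
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(with_sep, without_sep, unpack_failed)
```

So if the input may lack the separator, partition is the safer choice; split with maxsplit is fine when you know it is there.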

What exactly are tuples in Python?

I'm following a couple of Python exercises and I'm stumped at this one.
# C. sort_last
# Given a list of non-empty tuples, return a list sorted in increasing
# order by the last element in each tuple.
# e.g. [(1, 7), (1, 3), (3, 4, 5), (2, 2)] yields
# [(2, 2), (1, 3), (3, 4, 5), (1, 7)]
# Hint: use a custom key= function to extract the last element from each tuple.
def sort_last(tuples):
    # +++your code here+++
    return
What is a Tuple? Do they mean a List of Lists?
The tuple is the simplest of Python's sequence types. You can think about it as an immutable (read-only) list:
>>> t = (1, 2, 3)
>>> print(t[0])
1
>>> t[0] = 2
TypeError: 'tuple' object does not support item assignment
Tuples can be turned into new lists by just passing them to list() (like any iterable), and any iterable can be turned into a new tuple by passing it to tuple():
>>> list(t)
[1, 2, 3]
>>> tuple(["hello", []])
('hello', [])
Hope this helps. Also see what the tutorial has to say about tuples.
Why are there separate tuple and list data types? (Python FAQ)
Python Tuples are Not Just Constant Lists
Understanding tuples vs. lists in Python
A tuple and a list are very similar. The main difference (as a user) is that a tuple is immutable (it can't be modified).
In your example:
[(2, 2), (1, 3), (3, 4, 5), (1, 7)]
This is a list of tuples
[...] is the list
(2,2) is a tuple
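Coming back to the exercise itself, one way to fill it in, following its own hint about key=, might look like this (a sketch, not necessarily the solution the course intends):

```python
def sort_last(tuples):
    """Return the tuples sorted in increasing order of their last element."""
    # t[-1] is the last element, whatever the tuple's length.
    return sorted(tuples, key=lambda t: t[-1])

result = sort_last([(1, 7), (1, 3), (3, 4, 5), (2, 2)])
print(result)  # [(2, 2), (1, 3), (3, 4, 5), (1, 7)]
```

sorted returns a new list and leaves the input untouched, which matches the "return a list" wording of the exercise.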
A tuple is an immutable sequence of Python objects. Tuples are sequences, just like lists. The differences between tuples and lists are that tuples cannot be changed once created, unlike lists, and that tuples are written with parentheses, whereas lists use square brackets.
Creating a tuple is as simple as writing comma-separated values. Optionally, you can put these comma-separated values between parentheses. For example:
tup1 = ('Dog', 'Cat', 2222, 555555)
tup2 = (10, 20, 30, 40, 50)
tup3 = "a", "b", "c", "d"
The empty tuple is written as two parentheses containing nothing:
tup1 = ()
To write a tuple containing a single value you have to include a comma, even though there is only one value:
tup1 = (45,)
Like string indices, tuple indices start at 0, and they can be sliced, concatenated, and so on.
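A quick sketch of those three operations; the variable names are just for illustration:

```python
tup = ('Dog', 'Cat', 2222, 555555)

first = tup[0]            # indexing starts at 0
middle = tup[1:3]         # slicing gives a new tuple
combined = tup + (10, 20)  # concatenation also builds a new tuple

print(first)
print(middle)
print(combined)
```

None of these change tup itself; each one produces a new object, which is consistent with tuples being immutable.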
The best summary of the differences between lists and tuples I have read is this:
One common summary of these more interesting, if subtle, differences is that tuples are heterogeneous and lists are homogeneous. In other words: Tuples (generally) are sequences of different kinds of stuff, and you deal with the tuple as a coherent unit. Lists (generally) are sequences of the same kind of stuff, and you deal with the items individually.
From: http://news.e-scribe.com/397
Tuples are used to group related variables together. It's often more convenient to use a tuple rather than writing yet another single-use class. Granted, accessing their content by index is more obscure than a named member variable, but it's always possible to use 'tuple unpacking':
def returnTuple(a, b):
    return (a, b)
a, b = returnTuple(1, 2)
In Python programming, a tuple is similar to a list. The difference between them is that we cannot change the elements of a tuple once it is assigned whereas in a list, elements can be changed.
data-types-in-python
