map a data frame to a nested list - python

I have a data frame with two columns,
and a nested list containing the elements of the first column.
For each element of the nested list, I want to add its row index and the matching value from the second column.
|Column A | Column B |
| -------- | -------- |
|100.2 | A |
|101.5 | B |
|103.6 | C |
|104.6 | D |
|105.7 | E |
the nested list looks like
[[100.2,101.5],[103.6,104.6],[105.7]]
The desired output:
[[[0,100.2,'A'],[1,101.5,'B']],[[2,103.6,'C'],[3,104.6,'D']],[[4,105.7,'E']]]

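You can solve this simply with pandas by grouping on the first column. Note that the sample data below repeats 103.6 so that one group holds two rows; groups are formed by equal values in Column A, not by the nested list.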
import pandas as pd
df = pd.DataFrame({'Column A': [100.2, 101.5, 103.6, 103.6, 105.7],
                   'Column B': ['A', 'B', 'C', 'D', 'E']})
grouped = df.groupby('Column A')
result = [[[i, row['Column A'], row['Column B']] for i, row in group.iterrows()]
          for _, group in grouped]
print(result)
Result:
[[[0, 100.2, 'A']], [[1, 101.5, 'B']], [[2, 103.6, 'C'], [3, 103.6, 'D']], [[4, 105.7, 'E']]]

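We can use pandas' reset_index() and tolist() to turn every row into an [index, value, label] list. Note that this gives a flat list of rows, without the grouping of the nested list: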
import pandas as pd
df = pd.DataFrame({'Column A': [100.2,101.5,103.6,103.6,105.7],
'Column B': ['A', 'B', 'C', 'D', 'E']})
df = df.reset_index(level=0)
print(df.values.tolist())
output:
[[0, 100.2, 'A'],
[1, 101.5, 'B'],
[2, 103.6, 'C'],
[3, 103.6, 'D'],
[4, 105.7, 'E']]

import pandas as pd
df = pd.DataFrame({'A': [100.2,101.5,103.6,104.6,105.7],
'B': ['A', 'B', 'C', 'D', 'E']})
a = [[100.2,101.5],[103.6,104.6],[105.7]]
df1 = df.reset_index().set_index('A', drop=False)
[[df1.loc[v].tolist() for v in row] for row in a]
produces
[[[0, 100.2, 'A'], [1, 101.5, 'B']],
[[2, 103.6, 'C'], [3, 104.6, 'D']],
[[4, 105.7, 'E']]]
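If the values in the first column are unique, the same mapping can also be sketched with a plain dictionary instead of a float-valued index. This is only an illustration: the names `nested` and `lookup` are made up here, and the column names follow the answer above.

```python
import pandas as pd

df = pd.DataFrame({'A': [100.2, 101.5, 103.6, 104.6, 105.7],
                   'B': ['A', 'B', 'C', 'D', 'E']})
nested = [[100.2, 101.5], [103.6, 104.6], [105.7]]

# Map each value of column A to [index, value, label].
# Assumption: values in column A are unique, otherwise later rows overwrite earlier ones.
lookup = {a: [i, a, b] for i, a, b in zip(df.index, df['A'], df['B'])}

# Walk the nested list and swap every value for its [index, value, label] triple.
result = [[lookup[v] for v in group] for group in nested]
print(result)
```

A value in `nested` that does not occur in column A raises a `KeyError`, which may or may not be the behaviour you want.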

Related

Changing 2-dimensional list to standard matrix form

org = [['A', 'a', 1],
['A', 'b', 2],
['A', 'c', 3],
['B', 'a', 4],
['B', 'b', 5],
['B', 'c', 6],
['C', 'a', 7],
['C', 'b', 8],
['C', 'c', 9]]
I want to change the 'org' to the standard matrix form like below.
transform = [['\t','A', 'B', 'C'],
['a', 1, 4, 7],
['b', 2, 5, 8],
['c', 3, 6, 9]]
I made a small function that converts this.
The code I wrote is below:
import numpy as np
def matrix(li):
    column = ['\t']
    row = []
    result = []
    rest = []
    for i in li:
        if i[0] not in column:
            column.append(i[0])
        if i[1] not in row:
            row.append(i[1])
    result.append(column)
    for i in li:
        for r in row:
            if r == i[1]:
                rest.append([i[2]])
    rest = np.array(rest).reshape((len(row),len(column)-1)).tolist()
    for i in range(len(rest)):
        rest[i] = [row[i]]+rest[i]
    result += rest
    for i in result:
        print(i)
matrix(org)
Run on my real data (not the sample above), the result was this:
>>>['\t', 'school', 'kids', 'really']
[72, 0.008962252017017516, 0.04770759762717251, 0.08993156334317577]
[224, 0.004180594204995023, 0.04450803342634945, 0.04195010047081213]
[385, 0.0021807662921382335, 0.023217182598008267, 0.06564858527712682]
I don't think this is efficient since I use so many for loops.
Is there a more efficient way to do this?
Since you are already using third-party libraries, this is a task well suited to pandas.
Getting the index and column labels into the output takes some messy, but not inefficient, extra work.
org = [['A', 'a', 1],
['A', 'b', 2],
['A', 'c', 3],
['B', 'a', 4],
['B', 'b', 5],
['B', 'c', 6],
['C', 'a', 7],
['C', 'b', 8],
['C', 'c', 9]]
import pandas as pd
df = pd.DataFrame(org)
pvt = df.pivot_table(index=0, columns=1, values=2)
cols = ['\t'] + pvt.columns.tolist()
res = pvt.values.T.tolist()
res.insert(0, pvt.index.tolist())
res = [[i]+j for i, j in zip(cols, res)]
print(res)
[['\t', 'A', 'B', 'C'],
['a', 1, 4, 7],
['b', 2, 5, 8],
['c', 3, 6, 9]]
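One caveat: `pivot_table` aggregates with the mean by default, so on recent pandas versions the values may come back as floats (1.0 rather than 1). If every (column, row) pair occurs only once, as in `org`, a sketch with `DataFrame.pivot` avoids the aggregation. The column names `col`, `row` and `val` are made up for this example.

```python
import pandas as pd

org = [['A', 'a', 1], ['A', 'b', 2], ['A', 'c', 3],
       ['B', 'a', 4], ['B', 'b', 5], ['B', 'c', 6],
       ['C', 'a', 7], ['C', 'b', 8], ['C', 'c', 9]]

# Hypothetical column names, chosen only for readability.
df = pd.DataFrame(org, columns=['col', 'row', 'val'])

# pivot() reshapes without aggregating; it raises if a (row, col) pair is duplicated.
pvt = df.pivot(index='row', columns='col', values='val')

# Header line first, then one line per row label.
res = [['\t'] + pvt.columns.tolist()]
res += [[label] + values for label, values in zip(pvt.index.tolist(), pvt.values.tolist())]
print(res)
```

If duplicates are possible, `pivot_table` with an explicit `aggfunc` is the safer choice.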
Here's another "manual" way using only numpy:
org_arr = np.array(org)
key1 = np.unique(org_arr[:,0])
key2 = np.unique(org_arr[:,1])
values = org_arr[:,2].reshape((len(key1),len(key2))).transpose()
np.block([
    ["\t",         key1  ],
    [key2[:,None], values]
])
""" # alternatively, for numpy < 1.13.0
np.vstack((
    np.hstack(("\t", key1)),
    np.hstack((key2[:, None], values))
))
"""
For simplicity, it requires the input matrix to be strictly ordered (first col is major and ascending ...).
Output:
Out[58]:
array([['\t', 'A', 'B', 'C'],
['a', '1', '4', '7'],
['b', '2', '5', '8'],
['c', '3', '6', '9']],
dtype='<U1')
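If the input is not guaranteed to be sorted, or the values should keep their original types instead of becoming strings, a dictionary-based sketch in plain Python may be enough. The names `col_keys`, `row_keys` and `cells` are illustrative, and missing combinations are filled with `None` as an assumption.

```python
org = [['A', 'a', 1], ['A', 'b', 2], ['A', 'c', 3],
       ['B', 'a', 4], ['B', 'b', 5], ['B', 'c', 6],
       ['C', 'a', 7], ['C', 'b', 8], ['C', 'c', 9]]

# Unique labels in order of first appearance (dicts keep insertion order in Python 3.7+).
col_keys = list(dict.fromkeys(item[0] for item in org))
row_keys = list(dict.fromkeys(item[1] for item in org))

# (row label, column label) -> value, built in a single pass over the input.
cells = {(row, col): val for col, row, val in org}

# Header line, then one line per row label; absent pairs become None.
transform = [['\t'] + col_keys]
transform += [[r] + [cells.get((r, c)) for c in col_keys] for r in row_keys]

for line in transform:
    print(line)
```

This makes one pass over the data plus one pass over the output grid, and it does not depend on the order of the input rows.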

How to create a list of column names, row names, and values from a data frame in pandas?

I am very new to python. I have a data frame that looks like this:
   A  B
E  0  1
F  2  3
I want to convert this data frame to a list that looks like this:
[[E, A, 0], [E, B, 1], [F, A, 2], [F, B, 3]]
Any idea?
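Use stack():
In [197]: df.stack().reset_index().values.tolist()
Out[197]: [['E', 'A', 0L], ['E', 'B', 1L], ['F', 'A', 2L], ['F', 'B', 3L]]
(The L suffix is Python 2's long-integer notation; Python 3 prints plain 0, 1, 2, 3.)
Alternatively, use melt(), which orders the result by column rather than by row:
In [1423]: df.reset_index().melt('index').values.tolist()
Out[1423]: [['E', 'A', 0], ['F', 'A', 2], ['E', 'B', 1], ['F', 'B', 3]]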
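For reference, here is a self-contained sketch that builds the example frame and produces the row-major list. The construction of `df` is an assumption based on the table in the question, and the name `triples` is illustrative.

```python
import pandas as pd

# Frame from the question: rows E and F, columns A and B.
df = pd.DataFrame([[0, 1], [2, 3]], index=['E', 'F'], columns=['A', 'B'])

# stack() moves the column labels into a second index level,
# reset_index() turns both levels into ordinary columns.
triples = df.stack().reset_index().values.tolist()
print(triples)

# The same result with an explicit comprehension, for comparison.
manual = [[r, c, df.at[r, c]] for r in df.index for c in df.columns]
```

Note that `stack()` may drop missing values depending on the pandas version and arguments, while the comprehension keeps every cell.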

Groupby, pivot and concatenate in Pandas, breaking by date?

I have a dataframe which looks like this:
df = pd.DataFrame([
[123, 'abc', '121'],
[123, 'abc', '121'],
[456, 'def', '121'],
[123, 'abc', '122'],
[123, 'abc', '122'],
[456, 'def', '145'],
[456, 'def', '145'],
[456, 'def', '121'],
], columns=['userid', 'name', 'dt'])
From this question, I have managed to transpose it.
So, the desired df would be:
userid1_date1 name_1 name_2 ... name_n
userid1_date2 name_1 name_2 ... name_n
userid2 name_1 name_2 ... name_n
userid3_date1 name_1 name_2 ... name_n
However, I want to separate the rows by date. For example, if user 123 has data on two days, there should be a separate row for each day's API events.
I won't need the userid after the transformation, so you can use it in any way you like.
My plan was:
1. Group the df by the dt column.
2. Pivot each group so that it looks like this:
userid1_date1 name_1 name_2 ... name_n
3. Concatenate the pivoted groups.
But I have no clue how to do this in pandas!
Try:
def tweak(df):
    return df.reset_index().name
df.set_index('userid').groupby(level=0).apply(tweak)
Demonstration
df = pd.DataFrame([[1, 'a'], [1, 'c'], [1, 'c'], [1, 'd'], [1, 'e'],
                   [1, 'a'], [1, 'c'], [1, 'c'], [1, 'd'], [1, 'e'],
                   [2, 'a'], [2, 'a'], [2, 'c'], [2, 'd'], [2, 'e'],
                   [2, 'a'], [2, 'a'], [2, 'c'], [2, 'd'], [2, 'e'],
                   ], columns=['userid', 'name'])
def tweak(df):
    return df.reset_index().name
df.set_index('userid').groupby(level=0).apply(tweak)
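The code above groups by userid only. To get one row per user and date, as the question asks, one possible sketch is to number the events inside each (userid, dt) pair and unstack on that counter. The helper column `n` and the result name `wide` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame([
    [123, 'abc', '121'],
    [123, 'abc', '121'],
    [456, 'def', '121'],
    [123, 'abc', '122'],
    [123, 'abc', '122'],
    [456, 'def', '145'],
    [456, 'def', '145'],
    [456, 'def', '121'],
], columns=['userid', 'name', 'dt'])

# Position of each event within its (userid, dt) group: 0, 1, 2, ...
df['n'] = df.groupby(['userid', 'dt']).cumcount()

# One row per (userid, dt); the counter becomes the column axis.
wide = df.set_index(['userid', 'dt', 'n'])['name'].unstack()
print(wide)
```

Groups with fewer events than the longest one are padded with NaN, and `wide.reset_index(drop=True)` would discard the userid and date if they are no longer needed.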

Pandas way of getting intersection between two rows in a python Pandas dataframe

Say I have some data that looks like below. I want to get the count of ids that have two tags at the same time.
tag id
a A
b B
a B
b A
c A
The result I want:
tag1 tag2 count
a b 2
a c 1
b c 1
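In plain Python I could write something like:
from collections import defaultdict
from itertools import combinations
d = defaultdict(set)
for tag, id_ in rows:  # rows: the (tag, id) pairs from the table above
    d[tag].add(id_)
for tag1, tag2 in combinations(sorted(d), 2):
    print(tag1, tag2, len(d[tag1] & d[tag2]))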
It is not the most efficient way, but it should work. However, I already have the data stored in a pandas DataFrame. Is there a more pandas-like way to achieve the same result?
Here is my attempt:
from itertools import combinations
import pandas as pd
import numpy as np
In [123]: df
Out[123]:
tag id
0 a A
1 b B
2 a B
3 b A
4 c A
In [124]: a = np.asarray(list(combinations(df.tag, 2)))
In [125]: a
Out[125]:
array([['a', 'b'],
['a', 'a'],
['a', 'b'],
['a', 'c'],
['b', 'a'],
['b', 'b'],
['b', 'c'],
['a', 'b'],
['a', 'c'],
['b', 'c']],
dtype='<U1')
In [126]: a = a[a[:,0] != a[:,1]]
In [127]: a
Out[127]:
array([['a', 'b'],
['a', 'b'],
['a', 'c'],
['b', 'a'],
['b', 'c'],
['a', 'b'],
['a', 'c'],
['b', 'c']],
dtype='<U1')
In [129]: np.ndarray.sort(a)
In [130]: pd.DataFrame(a).groupby([0,1]).size()
Out[130]:
0  1
a  b    4
   c    2
b  c    2
dtype: int64
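Note that the counts above (4, 2, 2) differ from the requested ones (2, 1, 1), because the combinations are taken over all rows without checking that the two rows share an id. A sketch that does take the id into account builds a tag-by-id indicator matrix with `pd.crosstab` and multiplies it by its transpose. The names `m`, `co` and `result` are illustrative.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'tag': ['a', 'b', 'a', 'b', 'c'],
                   'id':  ['A', 'B', 'B', 'A', 'A']})

# Indicator matrix: 1 if the tag occurs with the id, else 0.
# clip() guards against duplicated (tag, id) rows being counted twice.
m = pd.crosstab(df['tag'], df['id']).clip(upper=1)

# Entry (t1, t2) of the product is the number of ids that carry both tags.
co = m.dot(m.T)

# Keep each unordered pair of distinct tags once.
rows = [(t1, t2, int(co.loc[t1, t2])) for t1, t2 in combinations(co.index, 2)]
result = pd.DataFrame(rows, columns=['tag1', 'tag2', 'count'])
print(result)
```

The matrix product grows with the square of the number of tags, so for very many tags a self-merge on id followed by a groupby may scale better.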

extracting a range of elements from a csv / 2d array

I want to extract a range of elements from a specific column of a CSV file.
I've simplified the problem to this:
data = [['a',1,'A',100],['b',2,'B',200],['c',3,'C',300],['d',4,'D',400]]
print(data[0:2][:],'\nROWS 0&1')
print(data[:][0:2],'\nCOLS 1&1')
I thought that meant
'show me all columns for just row 0 and 1'
'show me all the rows for just column 0 and 1'
But the output is always just showing me rows 0 and 1, never the columns,
[['a', 1, 'A', 100], ['b', 2, 'B', 200]]
ROWS 0&1
[['a', 1, 'A', 100], ['b', 2, 'B', 200]]
COLS 1&1
when I want to see this:
['a', 1, 'A', 100,'b', 2, 'B', 200] # ... i.e. ROWS 0 and 1
['a','b','c','d',1,2,3,4]
Is there a nice way to do this?
Your problem here is that data[:] is just a copy of data:
>>> data
[['a', 1, 'A', 100], ['b', 2, 'B', 200], ['c', 3, 'C', 300], ['d', 4, 'D', 400]]
>>> data[:]
[['a', 1, 'A', 100], ['b', 2, 'B', 200], ['c', 3, 'C', 300], ['d', 4, 'D', 400]]
... so both your attempts at slicing are giving you the same result as data[0:2].
You can get just columns 0 and 1 with a list comprehension:
>>> [x[0:2] for x in data]
[['a', 1], ['b', 2], ['c', 3], ['d', 4]]
... which can be rearranged to the order you want with zip():
>>> list(zip(*(x[0:2] for x in data)))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
To get a single list rather than a list of 2 tuples, use itertools.chain.from_iterable():
>>> from itertools import chain
>>> list(chain.from_iterable(zip(*(x[0:2] for x in data))))
['a', 'b', 'c', 'd', 1, 2, 3, 4]
... which can also be used to collapse data[0:2]:
>>> list(chain.from_iterable(data[0:2]))
['a', 1, 'A', 100, 'b', 2, 'B', 200]
