I have a data frame like this:
2pair counts
'A','B','C','D' 5
'A','B','K','D' 3
'A','B','P','R' 2
'O','Y','C','D' 1
'O','Y','CL','lD' 4
I want to build a nested list grouped by the first two elements. In each group, the first item is the first two letters, and each remaining item holds the other two letters together with the value from the counts column. For example, for the data above the result should be:
[
    [
        ['A','B'],
        [['C','D'],5],
        [['K','D'],3],
        [['P','R'],2]
    ],
    [
        ['O','Y'],
        [['C','D'],1],
        [['CL','lD'],4]
    ]
]
The following code does exactly what I want, but it is too slow. How can I make it faster?
pairs=[]
trans=[]
for i in range(df3.shape[0]):
    if df3['2pair'].values[i].split(',')[:2] not in trans:
        trans.append(df3['2pair'].values[i].split(',')[:2])
        sub=[]
        sub.append(df3['2pair'].values[i].split(',')[:2])
        for j in range(df3.shape[0]):
            if df3['2pair'].values[i].split(',')[:2]==df3['2pair'].values[j].split(',')[:2]:
                sub.append([df3['2pair'].values[j].split(',')[2:],df3['counts'].values[j]])
        pairs.append(sub)
Here's one way: use str.split to split the strings in the 2pair column, then use groupby.apply and to_dict to build the lists:
df[['head', 'tail']] = [[(*x[:2],), x[2:]] for x in df['2pair'].str.split(',')]
out = [[[*k]] + v for k,v in (df.groupby('head')[['tail','counts']]
                              .apply(lambda x: x.to_numpy().tolist()).to_dict()
                              .items())]
Output:
[[['A', 'B'], [['C', 'D'], 5], [['K', 'D'], 3], [['P', 'R'], 2]],
[['O', 'Y'], [['C', 'D'], 1], [['CL', 'lD'], 4]]]
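The same grouping can also be sketched in plain Python with a single pass over the rows, which avoids the nested loop from the question. This is only an illustration under assumptions: the names rows, groups and out are made up for the example, and each 2pair value is assumed to be a comma-separated string without quote characters. With a DataFrame, the rows could come from zip(df['2pair'], df['counts']).

```python
# Hypothetical sample rows: (2pair string, counts value).
rows = [
    ("A,B,C,D", 5),
    ("A,B,K,D", 3),
    ("A,B,P,R", 2),
    ("O,Y,C,D", 1),
    ("O,Y,CL,lD", 4),
]

# Group by the first two letters; a tuple is used because dict keys must be hashable.
groups = {}
for pair, count in rows:
    parts = pair.split(",")
    head = tuple(parts[:2])
    groups.setdefault(head, []).append([parts[2:], count])

# Rebuild the nested-list layout: [head, [tail, count], [tail, count], ...]
out = [[list(head), *items] for head, items in groups.items()]
print(out)
```

Each row is visited once, so the work grows linearly with the number of rows rather than quadratically.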
a = ['a', 'b', 'c']
b = [10, 20, 30]
The output should look like [[a:10], [b:20], [c:30]].
I know how to use zip to interleave the two lists:
l = []
for x,y in zip(a,b):
    l.append([x,y])
The output is [['a', 10], ['b', 20], ['c', 30]]
instead of [[a:10], [b:20], [c:30]].
How can I get the version with ':'?
Thanks
Assuming that you mean to create a list of singleton dicts, you can zip the two lists, wrap each resulting pair in a one-element tuple with another zip, and map the resulting sequence to the dict constructor:
list(map(dict, zip(zip(a, b))))
Or use a list comprehension:
[{i: j} for i, j in zip(a, b)]
Both return:
[{'a': 10}, {'b': 20}, {'c': 30}]
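If a single mapping is acceptable instead of a list of one-key dicts, dict(zip(a, b)) may be simpler. A minimal sketch comparing the two shapes follows; the names singletons and combined are made up for the example, and the single-dict form assumes the keys in a are unique.

```python
a = ['a', 'b', 'c']
b = [10, 20, 30]

# One single-key dict per pair, as in the answer above.
singletons = [{key: value} for key, value in zip(a, b)]

# Alternative: one dict holding every pair (assumes the keys in `a` are unique).
combined = dict(zip(a, b))

print(singletons)
print(combined)
```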
I am trying to transform this Dataframe.
To look like the following:
Here is the code to create the sample df
df = pd.DataFrame(data = [[1, 'A', 0, '2021-07-01'],
                          [1, 'B', 1, '2021-07-02'],
                          [2, 'D', 3, '2021-07-02'],
                          [2, 'C', 2, '2021-07-02'],
                          [2, 'E', 4, '2021-07-02']
                          ], columns = ['id', 'symbol', 'value', 'date'])
symbol_list = [['A', 'B', ''], ['C','D','E']]
The resulting DataFrame is grouped by the id field, and the symbol column is spread across multiple columns, with the symbol order following the user-supplied list.
I was using the .apply() method to construct each row of the DataFrame above, but it takes a very long time for 10,000+ data points.
I am looking for a more efficient way to transform the DataFrame. I think I will need the pivot function to unstack the DataFrame, combined with resetting the index (to turn the category values into columns). I'd appreciate any help with this!
Use GroupBy.cumcount with DataFrame.unstack to reshape, then extract the date columns with DataFrame.pop and take the maximum per row, flatten the column names, and finally add the new date column with DataFrame.assign:
df = pd.DataFrame(data = [[1, 'A', 0, '2021-07-01'],
                          [1, 'B', 1, '2021-07-02'],
                          [2, 'D', 3, '2021-07-02'],
                          [2, 'C', 2, '2021-07-02'],
                          [2, 'E', 4, '2021-07-02']
                          ], columns = ['id', 'symbol', 'value', 'date'])
#IMPORTANT all values from symbol_list are in column symbol (without empty strings)
symbol_list = [['A', 'B', ''], ['C','D','E']]
order = [y for x in symbol_list for y in x if y]
print (order)
['A', 'B', 'C', 'D', 'E']
#convert symbol to an ordered Categorical, using the flattened list as the category order
df['symbol'] = pd.Categorical(df['symbol'], ordered=True, categories=order)
df['date'] = pd.to_datetime(df['date'])
#sorting by id and symbol
df = df.sort_values(['id','symbol'])
df1 = df.set_index(['id',df.groupby('id').cumcount()]).unstack()
date_max = df1.pop('date').max(axis=1)
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')
df1 = df1.assign(date = date_max)
print (df1)
   symbol_0 symbol_1 symbol_2  value_0  value_1  value_2       date
id
1         A        B      NaN      0.0      1.0      NaN 2021-07-02
2         C        D        E      2.0      3.0      4.0 2021-07-02
I'd like to merge a list of dictionaries with lists as values. Given
arr[0] = {'number':[1,2,3,4], 'alphabet':['a','b','c']}
arr[1] = {'number':[3,4], 'alphabet':['d','e']}
arr[2] = {'number':[6,7], 'alphabet':['e','f']}
the result I want would be
merge_arr = {'number':[1,2,3,4,3,4,6,7,], 'alphabet':['a','b','c','d','e','e','f']}
Could you recommend any compact code?
If you know these are the only keys in the dict, you can hard code it. If it isn't so simple, show a complicated example.
from pprint import pprint
arr = [
    {
        'number':[1,2,3,4],
        'alphabet':['a','b','c']
    },
    {
        'number':[3,4],
        'alphabet':['d','e']
    },
    {
        'number':[6,7],
        'alphabet':['e','f']
    }
]
merged_arr = {
    'number': [],
    'alphabet': []
}
for d in arr:
    merged_arr['number'].extend(d['number'])
    merged_arr['alphabet'].extend(d['alphabet'])
pprint(merged_arr)
Output:
{'alphabet': ['a', 'b', 'c', 'd', 'e', 'e', 'f'],
'number': [1, 2, 3, 4, 3, 4, 6, 7]}
arr = [{'number':[1,2,3,4], 'alphabet':['a','b','c']},{'number':[3,4], 'alphabet':['d','e']},{'number':[6,7], 'alphabet':['e','f']}]
# avoid naming a variable `dict`, which would shadow the built-in type
merged = {}
for k in arr[0].keys():
    merged[k] = sum([d[k] for d in arr], [])
print(merged)
output:
{'number': [1, 2, 3, 4, 3, 4, 6, 7], 'alphabet': ['a', 'b', 'c', 'd', 'e', 'e', 'f']}
Here is code that uses defaultdict to more easily collect the items. You could leave the result as a defaultdict but this version converts that to a regular dictionary. This code will work with any keys, and the keys in the various dictionaries can differ, as long as the values are lists. Therefore this answer is more general than the other answers given so far.
from collections import defaultdict
arr = [{'number':[1,2,3,4], 'alphabet':['a','b','c']},
{'number':[3,4], 'alphabet':['d','e']},
{'number':[6,7], 'alphabet':['e','f']},
]
merge_arr_default = defaultdict(list)
for adict in arr:
    for key, value in adict.items():
        merge_arr_default[key].extend(value)
merge_arr = dict(merge_arr_default)
print(merge_arr)
The printed result is
{'number': [1, 2, 3, 4, 3, 4, 6, 7], 'alphabet': ['a', 'b', 'c', 'd', 'e', 'e', 'f']}
EDIT: As noted by @pault, the solution below has quadratic complexity and is therefore not recommended for large lists. There are more efficient ways to do this.
However if you’re looking for compactness and relative simplicity, keep reading.
If you want a more functional form, this two-liner will do:
from functools import reduce  # reduce is not a built-in in Python 3
arr = [{'number':[1,2,3,4], 'alphabet':['a','b','c']},{'number':[3,4], 'alphabet':['d','e']},{'number':[6,7], 'alphabet':['e','f']}]
keys = ['number', 'alphabet']
merge_arr = {key: reduce(list.__add__, [d[key] for d in arr]) for key in keys}
print(merge_arr)
Outputs:
{'number': [1, 2, 3, 4, 3, 4, 6, 7], 'alphabet': ['a', 'b', 'c', 'd', 'e', 'e', 'f']}
This won't merge recursively.
If you want it to work with arbitrary keys that are not present in every dict, use:
keys = {k for d in arr for k in d}
merge_arr = {key: reduce(list.__add__, [d.get(key, []) for d in arr]) for key in keys}
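One way to keep the dict-comprehension shape while avoiding the quadratic cost of repeated list addition is itertools.chain.from_iterable, which walks each list once. This is a sketch, not a benchmark, and it swaps reduce for chain. The third dict is deliberately missing the 'alphabet' key to show the .get(key, []) fallback.

```python
from itertools import chain

arr = [
    {'number': [1, 2, 3, 4], 'alphabet': ['a', 'b', 'c']},
    {'number': [3, 4], 'alphabet': ['d', 'e']},
    {'number': [6, 7]},  # no 'alphabet' key here, on purpose
]

# Every key that appears in any dict, in first-seen order.
keys = list(dict.fromkeys(k for d in arr for k in d))

# chain.from_iterable concatenates the lists lazily, so each element is copied once.
merge_arr = {
    key: list(chain.from_iterable(d.get(key, []) for d in arr))
    for key in keys
}
print(merge_arr)
```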
My problem is that I have a nested list
l = [
    ['a','apple',1],
    ['b', 'banana', 0],
    ['a', 'artichoke', 'antenna'],
    ['b', 'brocolli', 'baton'],
    ['c', None, 22]
]
and I want to merge the lists that share the same first element, without sorting the resulting list.
My preferred output:
[
    ['a','apple', 1, 'artichoke', 'antenna'],
    ['b', 'banana', 0, 'brocolli', 'baton'],
    ['c', None, 22]
]
I found the solution from here and here
But the output I'm getting comes back reordered. My current output:
[['c', None, 22], [1, 'antenna', 'apple', 'artichoke', 'a'], [0, 'b', 'banana', 'brocolli', 'baton']]
My code goes:
len_l = len(l)
i = 0
while i < (len_l - 1):
    for j in range(i + 1, len_l):
        # i,j iterate over all pairs of l's elements including new
        # elements from merged pairs. We use len_l because len(l)
        # may change as we iterate
        i_set = set(l[i])
        j_set = set(l[j])
        if len(i_set.intersection(j_set)) > 0:
            # Remove these two from list
            l.pop(j)
            l.pop(i)
            # Merge them and append to the orig. list
            ij_union = list(i_set.union(j_set))
            l.append(ij_union)
            # len(l) has changed
            len_l -= 1
            # adjust 'i' because elements shifted
            i -= 1
            # abort inner loop, continue with next l[i]
            break
    i += 1
print(l)
I would appreciate any help, and I'm also open to suggestions for an easier way to do this, because honestly I haven't used the union() or intersection() methods before.
Thanks
You can use a dictionary with the first element of each list as the key, and extend that key's list each time the key is encountered in the list of lists, e.g.:
data = [
    ['a','apple',1],
    ['b', 'banana', 0],
    ['a', 'artichoke', 'antenna'],
    ['b', 'brocolli', 'baton'],
    ['c', None, 22]
]
Then we:
d = {}
for k, *vals in data:
    d.setdefault(k, []).extend(vals)
Optionally you can use d = collections.OrderedDict() here if it's completely necessary to guarantee the order of the keys is as seen in the list.
Which gives you a d of:
{'a': ['apple', 1, 'artichoke', 'antenna'],
'b': ['banana', 0, 'brocolli', 'baton'],
'c': [None, 22]}
If you then want to unpack back to a list of lists (although it's probably more useful as a dict), you can do:
new_data = [[k, *v] for k, v in d.items()]
To get:
[['a', 'apple', 1, 'artichoke', 'antenna'],
['b', 'banana', 0, 'brocolli', 'baton'],
['c', None, 22]]
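Since Python 3.7 the built-in dict is guaranteed to preserve insertion order, so collections.OrderedDict is only needed on older versions. The steps above can be wrapped in a small helper; merge_by_first_item is a hypothetical name chosen for this sketch.

```python
def merge_by_first_item(rows):
    """Merge rows that share the same first element, keeping first-seen order."""
    merged = {}
    for key, *values in rows:
        # setdefault creates the list on first sight of a key, then extend appends to it
        merged.setdefault(key, []).extend(values)
    return [[key, *values] for key, values in merged.items()]


data = [
    ['a', 'apple', 1],
    ['b', 'banana', 0],
    ['a', 'artichoke', 'antenna'],
    ['b', 'brocolli', 'baton'],
    ['c', None, 22],
]
print(merge_by_first_item(data))
```

Unlike the set-based approach in the question, lists and dicts keep their elements in insertion order, so nothing gets reshuffled.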
I have this data.
CITY1 CITY2
A B
A C
A D
B C
B D
C D
How can I create a dictionary like this from the data above?
x={A:[B,C,D],
   B:[A,C,D],
   C:[A,B,D],
   D:[A,B,C]
}
Thanks
Is it in a csv? It looks like, from the data you provide, you are doing an undirected graph. Assuming that the data is in some kind of "row" type format that you can loop through, (i.e. row[0] is the city1 value, and row[1] is the city2 value):
from collections import defaultdict
def make_graph(data):
    graph = defaultdict(set)
    for a, b in data:
        graph[a].add(b)
        graph[b].add(a) # delete this line if you want a directed graph
    return graph
data = [
    ['A','B'],
    ['C','D'],
    ['A','C']
]
print(make_graph(data))
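To get exactly the shape asked for in the question (plain lists, alphabetical order), the sets can be converted afterwards. A minimal sketch using all six rows from the question follows; rows, neighbours and x are illustrative names, and the sorting step is an assumption about the desired ordering.

```python
from collections import defaultdict

# The CITY1/CITY2 pairs from the question.
rows = [
    ('A', 'B'), ('A', 'C'), ('A', 'D'),
    ('B', 'C'), ('B', 'D'),
    ('C', 'D'),
]

neighbours = defaultdict(set)
for city1, city2 in rows:
    # Record the link in both directions (undirected graph).
    neighbours[city1].add(city2)
    neighbours[city2].add(city1)

# Convert to a regular dict of sorted lists.
x = {city: sorted(adjacent) for city, adjacent in sorted(neighbours.items())}
print(x)
```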
I was trying to do it without importing any library.
I made a simple dictionary first.
x={'A':['B','C','D'],'B':['C','D'],'C':['D']}
# iterate over a snapshot, because new keys are added to x inside the loop
for i,j in list(x.items()):
    for p in j:
        if p not in x.keys():
            x[p]=[]
        if p in x[i] and i not in x[p]:
            x[p].append(i)
print(x)
{'A': ['B', 'C', 'D'], 'B': ['C', 'D', 'A'], 'C': ['D', 'A', 'B'], 'D': ['A', 'B', 'C']}