How can I take a list in the form of:
hostnames = ['Pam: 5, 10, 4', 'Pet: 3, 2', 'Sara: 44, 40, 50']
and convert it into a form like:
hostnames = [['Pam', 5, 10, 4], ['Pet', 3, 2], ['Sara', 44, 40, 50]]
The end goal is for the code to print something like:
[['Pam', 5, 10, 4], ['Pet', 3, 2], ['Sara', 44, 40, 50]]
Pam has scores of 5, 10, 4
Pet has scores of 3, 2
Sara has scores of 44, 40, 50
My current attempt is this:
pos = 0
for var in hostnames:
    newlist = hostnames[pos]
    newlist = [newlist]
    pos = pos + 1
    names = newlist[0]
    print(newlist)
I know this code puts the changes into a new list, but I want the change made to the original hostnames list, without using the .split() method.
IIUC, you need the str.split method to split the strings: first split on ':' to separate the "key", then split again on ',' to separate the integers:
out = []
for var in hostnames:
    k, v = var.split(':')
    out.append([k, *map(int, v.split(','))])
Output:
[['Pam', 5, 10, 4], ['Pet', 3, 2], ['Sara', 44, 40, 50]]
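The question asks for the change to be made to the original list rather than a new one. A minimal sketch of that, reusing the split approach above: slice assignment (hostnames[:] = ...) replaces the contents of the existing list object, so any other name bound to it (the hypothetical alias below) sees the change as well. It then prints the "has scores of" lines the question wants.

```python
hostnames = ['Pam: 5, 10, 4', 'Pet: 3, 2', 'Sara: 44, 40, 50']
alias = hostnames  # hypothetical second reference to the same list object

# The right-hand side is built completely before the slice assignment runs,
# so reading hostnames while building it is safe.
hostnames[:] = [
    [name, *map(int, scores.split(','))]  # int() ignores the leading spaces
    for name, scores in (entry.split(':') for entry in hostnames)
]

print(hostnames)
for name, *scores in hostnames:
    # join the integers back into "5, 10, 4" form for display
    print(f"{name} has scores of {', '.join(map(str, scores))}")
```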
Perhaps you're looking for something along the lines of:
out = []
for var in hostnames:
    i = var.find(':')
    out.append([var[:i], var[i+2:]])
print(out)
for i, *rest in out:
    print("{} has scores of {}".format(i, *rest))
Output:
[['Pam', '5, 10, 4'], ['Pet', '3, 2'], ['Sara', '44, 40, 50']]
Pam has scores of 5, 10, 4
Pet has scores of 3, 2
Sara has scores of 44, 40, 50
If you want to separate integers as well, you could use a nested loop:
out = []
for var in hostnames:
    i = var.find(':')
    k, nums = var[:i], var[i+2:]
    tmp = [[]]
    for x in nums:
        if x == ',':
            tmp[-1] = int(''.join(tmp[-1]))
            tmp.append([])
        else:
            tmp[-1].append(x)
    tmp[-1] = int(''.join(tmp[-1]))
    out.append([k, *tmp])
    print("{} has scores of {}".format(k, nums))
print(out)
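Another way to stay within the "no .split()" constraint is a sketch that swaps in two different tools: str.partition to cut off the name at the first ':', and re.findall to collect the numbers. It assumes every score is a non-negative integer (a minus sign would be ignored by the \d+ pattern).

```python
import re

hostnames = ['Pam: 5, 10, 4', 'Pet: 3, 2', 'Sara: 44, 40, 50']

out = []
for entry in hostnames:
    # partition() cuts at the first ':' and returns (before, separator, after)
    name, _, rest = entry.partition(':')
    # findall() returns every run of digits in the remainder, e.g. ['5', '10', '4']
    scores = [int(n) for n in re.findall(r'\d+', rest)]
    out.append([name, *scores])
    print(f"{name} has scores of {rest.strip()}")

print(out)
```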
An alternative solution:
hostnames = ['Pam: 5, 10, 4', 'Pet: 3, 2', 'Sara: 44, 40, 50']
out = [entry.split(": ") for entry in hostnames]
print(out)
for item in out:
    print(f'{item[0]} has scores of {item[1]}')
Prints:
[['Pam', '5, 10, 4'], ['Pet', '3, 2'], ['Sara', '44, 40, 50']]
Pam has scores of 5, 10, 4
Pet has scores of 3, 2
Sara has scores of 44, 40, 50
Here is another solution (tested and verified):
hostnames = ['Pam: 5, 10, 4', 'Pet: 3, 2', 'Sara: 44, 40, 50']
hostnames = [i.split(':') for i in hostnames]
hostnames = [[j.strip() for j in i] for i in hostnames]
hostnames = [[int(j) if j.isdigit() else j for j in i] for i in hostnames]
print(hostnames)
Output:
[['Pam', '5, 10, 4'], ['Pet', '3, 2'], ['Sara', '44, 40, 50']]
If you want to have separate integers, you can use this code with a slight modification:
hostnames = ['Pam: 5, 10, 4', 'Pet: 3, 2', 'Sara: 44, 40, 50']
hostnames = [i.replace(':', ',').split(',') for i in hostnames]
hostnames = [[j.strip() for j in i] for i in hostnames]
hostnames = [[int(j) if j.isdigit() else j for j in i] for i in hostnames]
print(hostnames)
Output:
[['Pam', 5, 10, 4], ['Pet', 3, 2], ['Sara', 44, 40, 50]]
I have a 3D dataframe, and I want to get all values at one x, y position across the z axis, where the z axis moves between the original 2D dataframes. The way I imagine it (forgive me if I'm mistaken; it's a little hard to visualize), the vector at x=0, y=0 would be [1, 5, 3].
So my result would be a dataframe where df_2d.iloc[0, 0] is the string "1, 5, 3", and so on for all the values in the 3D dataframe.
Is there any way I can achieve this without looping through each cell index and accessing the values explicitly?
The data frame is defined as:
import pandas as pd
columns = ['A', 'B']
index = [1, 2, 3]
df_1 = pd.DataFrame(data=[[1, 2], [99, 57], [57, 20]], index=index, columns=columns)
df_2 = pd.DataFrame(data=[[5, 6], [78, 47], [21, 11]], index=index, columns=columns)
df_3 = pd.DataFrame(data=[[3, 4], [66, 37], [33, 17]], index=index, columns=columns)
df_3d = pd.concat([df_1, df_2, df_3], keys=['1', '2', '3'])
And then to get the original data out I do:
print(df_3d.xs('1'))
print(df_3d.xs('2'))
print(df_3d.xs('3'))
A B
1 1 2
2 99 57
3 57 20
A B
1 5 6
2 78 47
3 21 11
A B
1 3 4
2 66 37
3 33 17
Again, to clarify: looking at this printout, I would like a combined dataframe that looks like:
A B
1 '1, 5, 3' '2, 6, 4'
2 '99, 78, 66' '57, 47, 37'
3 '57, 21, 33' '20, 11, 17'
Use .xs to get the dataframe for each level, then functools.reduce to combine them all into one:
from functools import reduce
# Get each level values
dfs = [df_3d.xs(i) for i in df_3d.index.levels[0]]
df = reduce(lambda left,right: left.astype(str) + ", " + right.astype(str), dfs)
df
A B
1 1, 5, 3 2, 6, 4
2 99, 78, 66 57, 47, 37
3 57, 21, 33 20, 11, 17
If you also want the surrounding ' characters, you can use applymap to apply a function to every element (pandas 2.1 and later deprecate applymap in favor of DataFrame.map):
df.applymap(lambda x: "'" + x + "'")
A B
1 '1, 5, 3' '2, 6, 4'
2 '99, 78, 66' '57, 47, 37'
3 '57, 21, 33' '20, 11, 17'
Or df = "'" + df + "'"
df
A B
1 '1, 5, 3' '2, 6, 4'
2 '99, 78, 66' '57, 47, 37'
3 '57, 21, 33' '20, 11, 17'
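If you would rather not build the per-level list at all, a groupby on the inner index level should give the same combination in one call. A sketch, assuming df_3d is built exactly as in the question: level 1 of the MultiIndex holds the original row labels, and groupby keeps rows in their original order within each group, so the values come out in the order the frames were concatenated.

```python
import pandas as pd

columns = ['A', 'B']
index = [1, 2, 3]
df_1 = pd.DataFrame(data=[[1, 2], [99, 57], [57, 20]], index=index, columns=columns)
df_2 = pd.DataFrame(data=[[5, 6], [78, 47], [21, 11]], index=index, columns=columns)
df_3 = pd.DataFrame(data=[[3, 4], [66, 37], [33, 17]], index=index, columns=columns)
df_3d = pd.concat([df_1, df_2, df_3], keys=['1', '2', '3'])

# Group on the original row label (index level 1) and join each column's
# values, e.g. the three A values for row 1 become "1, 5, 3".
combined = df_3d.groupby(level=1).agg(lambda s: ', '.join(s.astype(str)))
print(combined)
```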
Hi, I am fairly new to Python/programming and am having trouble unpacking a nested column in my dataframe.
The column I am trying to unpack looks like this (in JSON format):
df['id_data'] = [{u'metrics': {u'app_clicks': [6, 28, 13, 28, 43, 45],
u'card_engagements': [6, 28, 13, 28, 43, 45],
u'carousel_swipes': None,
u'clicks': [18, 33, 32, 48, 70, 95],
u'engagements': [25, 68, 46, 79, 119, 152],
u'follows': [0, 4, 1, 1, 1, 5],
u'impressions': [1697, 5887, 3174, 6383, 10250, 12301],
u'likes': [3, 4, 6, 9, 12, 15],
u'poll_card_vote': None,
u'qualified_impressions': None,
u'replies': [0, 0, 0, 0, 0, 1],
u'retweets': [1, 3, 0, 2, 5, 6],
u'tweets_send': None,
u'url_clicks': None},
u'segment': None}]
As you can see, there is a lot going on in this column: a list -> dictionary -> dictionary -> potentially another list. I would like each individual metric (app_clicks, card_engagements, carousel_swipes, etc.) to be its own column. I've tried the following code with no progress:
df['app_clicks'] = df.apply(lambda row: u['app_clicks'] for y in row['id_data'] if y['metricdata'] = 'list', axis=1)
Any thoughts on how I could tackle this?
You should be able to pass the dictionary directly to the dataframe constructor:
foo = pd.DataFrame(df['id_data'][0]['metrics'])
foo.iloc[:3, :4]
app_clicks card_engagements carousel_swipes clicks
0 6 6 None 18
1 28 28 None 33
2 13 13 None 32
Hopefully I am understanding your question correctly and this gets you what you need
You can use to_json and json.loads to flatten one level at a time:
import json
import pandas as pd
df1 = pd.DataFrame(json.loads(df["id_data"].to_json(orient="records")))
df2 = pd.DataFrame(json.loads(df1["metrics"].to_json(orient="records")))
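For a column with many rows, another option that skips the JSON round trip is to pull the 'metrics' dict out of each cell with Series.map and hand the resulting list of dicts to the DataFrame constructor, which makes one column per key. A sketch, using a small hypothetical stand-in for the real df (two rows, three metrics); the list values stay as lists inside the cells.

```python
import pandas as pd

# Hypothetical stand-in: each 'id_data' cell holds a dict shaped like the question's.
df = pd.DataFrame({'id_data': [
    {'metrics': {'app_clicks': [6, 28, 13], 'clicks': [18, 33, 32], 'url_clicks': None},
     'segment': None},
    {'metrics': {'app_clicks': [1, 2], 'clicks': [3, 4], 'url_clicks': None},
     'segment': None},
]})

# One dict per row -> one column per metric, aligned on the original index.
metrics = pd.DataFrame(df['id_data'].map(lambda d: d['metrics']).tolist(), index=df.index)

# Attach the new metric columns next to the original column.
df = df.join(metrics)
print(df[['app_clicks', 'clicks', 'url_clicks']])
```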
I have two lists:
a = [0, 0, 0, 1, 1, 1, 1, 1, .... 99999]
b = [24, 53, 88, 32, 45, 24, 88, 53, ...... 1]
I want to merge those two lists into a dictionary like:
{
0: [24, 53, 88],
1: [32, 45, 24, 88, 53],
......
99999: [1]
}
One solution is a for loop, which does not look elegant:
d = {}
for i in range(len(list_a)):
    if list_a[i] in d:
        d[list_a[i]].append(list_b[i])
    else:
        d[list_a[i]] = [list_b[i]]
Although this works, it is inefficient and takes too long when the lists are extremely large. Is there a more elegant way to construct such a dictionary?
Thanks in advance!
You can use a defaultdict:
from collections import defaultdict
d = defaultdict(list)
list_a = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
list_b = [24, 53, 88, 32, 45, 24, 88, 53, 1]
for a, b in zip(list_a, list_b):
    d[a].append(b)
print(dict(d))
Output:
{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}
Alternative itertools.groupby() solution:
import itertools
a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]
result = { k: [i[1] for i in g]
for k,g in itertools.groupby(sorted(zip(a, b)), key=lambda x:x[0]) }
print(result)
The output (note that sorting the (key, value) pairs also sorts the values within each group):
{0: [24, 53, 88], 1: [24, 32, 45, 53, 88], 2: [11, 22, 33, 44, 55, 66], 3: [77]}
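If the original order of the values matters, a variant of the groupby answer sorts on the key alone. Python's sorted() is stable, so values with the same key keep their input order; the sort is still needed because itertools.groupby only groups consecutive equal keys (it can be skipped if a is already sorted, as in the question).

```python
import itertools
from operator import itemgetter

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]

# Stable sort on the key only, so each group's values stay in input order.
pairs = sorted(zip(a, b), key=itemgetter(0))
result = {k: [v for _, v in g] for k, g in itertools.groupby(pairs, key=itemgetter(0))}
print(result)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 2: [11, 22, 33, 44, 55, 66], 3: [77]}
```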
No fancy structures, just a plain ol' dictionary.
d = {}
for x, y in zip(a, b):
    d.setdefault(x, []).append(y)
You can pre-build the keys with a dict comprehension:
list_a = [0, 0, 0, 1, 1, 1, 1, 1]
list_b = [24, 53, 88, 32, 45, 24, 88, 53]
my_dict = {key: [] for key in set(list_a)}  # my_dict = {0: [], 1: []}
for key, value in zip(list_a, list_b):
    my_dict[key].append(value)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53]}
Note that this does not work with dict.fromkeys(set(list_a), []): fromkeys assigns the same list object to every key, so appending to one key's list appends to all of them:
my_dict = dict.fromkeys(set(list_a), []) # my_dict = {0: [], 1: []}
my_dict[0].append(1) # my_dict = {0: [1], 1: [1]}
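A quick check of the difference: the identity test (is) shows that fromkeys hands every key the one list it was given, while the comprehension evaluates [] afresh for each key.

```python
list_a = [0, 0, 0, 1, 1, 1, 1, 1]

shared = dict.fromkeys(set(list_a), [])
print(shared[0] is shared[1])  # True: both keys hold the same list object

separate = {key: [] for key in set(list_a)}
print(separate[0] is separate[1])  # False: one new list per key
separate[0].append(1)
print(separate)  # {0: [1], 1: []}
```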
A pandas solution:
Setup:
import numpy as np
import pandas as pd
a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4]
b = np.random.randint(0, 100, len(a)).tolist()
>>> b
Out[]: [28, 68, 71, 25, 25, 79, 30, 50, 17, 1, 35, 23, 52, 87, 21]
df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b))) # Create a dataframe
>>> df
Out[]:
Group Value
0 0 28
1 0 68
2 0 71
3 1 25
4 1 25
5 1 79
6 1 30
7 1 50
8 2 17
9 2 1
10 2 35
11 3 23
12 4 52
13 4 87
14 4 21
Solution:
>>> df.groupby('Group').Value.apply(list).to_dict()
Out[]:
{0: [28, 68, 71],
1: [25, 25, 79, 30, 50],
2: [17, 1, 35],
3: [23],
4: [52, 87, 21]}
Walkthrough:
create a pd.DataFrame from the input lists, with a as the Group column and b as the Value column
df.groupby('Group') groups the rows by the values of a
.Value.apply(list) collects the values of each group into a list
.to_dict() converts the resulting Series into a dict
Timing:
To get an idea of timings for a test set of 1,000,000 values in 100,000 groups:
a = sorted(np.random.randint(0, 100000, 1000000).tolist())
b = np.random.randint(0, 100, len(a)).tolist()
df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))
>>> df.shape
Out[]: (1000000, 2)
%timeit df.groupby('Group').Value.apply(list).to_dict()
4.13 s ± 9.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But to be honest, this is likely less efficient than the itertools.groupby approach suggested by @RomanPerekhrest or the defaultdict approach suggested by @Ajax1234.
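To check this kind of claim on your own machine, a sketch along these lines times two of the plain-Python answers with timeit. The data is synthetic and smaller than the 1,000,000-row test above to keep it quick; no timings are shown here because they depend on the machine and Python version.

```python
import random
import timeit
from collections import defaultdict


def with_defaultdict(keys, values):
    d = defaultdict(list)
    for k, v in zip(keys, values):
        d[k].append(v)
    return dict(d)


def with_setdefault(keys, values):
    d = {}
    for k, v in zip(keys, values):
        d.setdefault(k, []).append(v)
    return d


# Synthetic data: 200,000 values spread over 20,000 groups (scale up as needed).
random.seed(0)
a = sorted(random.randrange(20_000) for _ in range(200_000))
b = [random.randrange(100) for _ in a]

for fn in (with_defaultdict, with_setdefault):
    seconds = timeit.timeit(lambda: fn(a, b), number=5) / 5
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```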
Maybe I'm missing the point, but I will try to help. If you have two lists and want to put them in a dict, do the following:
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
lists = [a, b]  # or directly -> lists = [[1, 2, 3, 4], [5, 6, 7, 8]]
new_dict = {}
for idx, sublist in enumerate([a, b]):  # or enumerate(lists)
    new_dict[idx] = sublist
Hope it helps.
Or build the dictionary with a comprehension beforehand. Since every key then already exists with an empty list as its value, you can iterate through the zip of the two lists and append each value from the second list to the key given by the first list's value. Because of the upfront comprehension, there is no need for a try/except clause (or if statements) to check whether the key exists:
d = {k: [] for k in list_a}
for x, y in zip(list_a, list_b):
    d[x].append(y)
Now:
print(d)
Is:
{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}
I have code that produces a data structure that looks like this:
{'AttributeId': '4192',
'AttributeList': '',
'ClassId': '1014 (AP)',
'InstanceId': '0',
'MessageType': '81 (GetAttributesResponse)',
'ObjectInstance': '',
'Protocol': 'BSMIS Rx',
'RDN': '',
'TransactionId': '66',
'Sequences': [[],
[1,'2013-02-26T15:01:11Z'],
[],
[10564,13,388,0,-321,83,'272','05',67,67,708,896,31,128,-12,-109,0,-20,-111,-1,-1,0],
[10564,13,108,0,-11,83,'272','05',67,67,708,1796,31,128,-12,-109,0,-20,-111,-1,-1,0],
[10589,16,388,0,-15,79,'272','05',67,67,708,8680,31,125,-16,-110,0,-20,-111,-1,-1,0],
[10589,15,108,0,-16,81,'272','05',67,67,708,8105,31,126,-14,-109,0,-20,-111,-1,-1,0],
[10637,40,233,0,-11,89,'272','03',30052,1,5,54013,33,103,-6,-76,1,-20,-111,-1,-1,0],
[10662,46,234,0,-15,85,'272','03',30052,1,5,54016,33,97,-10,-74,1,-20,-111,-1,-1,0],
[10712,51,12,0,-24,91,'272','01',4013,254,200,2973,3,62,-4,-63,0,-20,-111,-1,-1,0],
[10737,15,224,0,-16,82,'272','01',3020,21,21,40770,33,128,-13,-108,0,-20,-111,-1,-1,0],
[10762,14,450,0,-7,78,'272','01',3020,21,21,53215,29,125,-17,-113,0,-20,-111,-1,-1,0],
[10762,15,224,0,-7,85,'272','01',3020,21,21,50770,33,128,-10,-105,0,-20,-111,-1,-1,0],
[10762,14,124,0,-7,78,'272','01',3020,10,10,56880,32,128,-17,-113,0,-20,-111,-1,-1,0],
[10812,11,135,0,-14,81,'272','02',36002,1,11,43159,31,130,-14,-113,1,-20,-111,-1,-1,0],
[10837,42,23,0,-9,89,'272','02',36002,1,11,53529,31,99,-6,-74,1,-20,-111,-1,-1,0,54],
[13,'2013-02-26T15:02:09Z'],
[],
[2,12,7,0,9,70,'272','02',20003,0,0,15535,0,0,0,0,1,100,100,-1,-1,0],
[5,15,44,0,-205,77,'272','02',20003,0,0,15632,0,0,0,0,1,100,100,-1,-1,0],
[7,25,9,0,0,84,'272','02',20002,0,0,50883,0,0,0,0,1,100,100,-1,-1,0]]
}
I then filtered this down to a list of relevant values: I only wanted the first 2 elements of each sequence whose length is >= 22. I did this as follows:
len22seqs = list(filter(lambda s: len(s) >= 22, data['Sequences']))
UARFCNRSSI = []
for i in range(len(len22seqs)):
    UARFCNRSSI.append([len22seqs[i][0], len22seqs[i][1]])
An example of the filtered list is:
[[10564, 15], [10564, 13], [10589, 18], [10637, 39], [10662, 38], [10712, 50], [10737, 15], [10762, 14], [10787, 9], [10812, 12], [10837, 45], [3, 17], [7, 21], [46, 26], [48, 12], [49, 24], [64, 14], [66, 17], [976, 27], [981, 22], [982, 22], [983, 17], [985, 13], [517, 9], [521, 15], [525, 11], [526, 13], [528, 14], [698, 14], [788, 24], [792, 19]]
However I now note that I need a third element in each of these sub-lists.
That is this:
[1,'2013-02-26T15:01:11Z'],
I need the first element of every list of length 2 to be appended as a third element to the filtered entries that follow it. When another list of length 2 appears, its first element should be used for the subsequent entries instead.
So my final list could look like this (note the change to 13 in the third element once another list of length 2 is found):
[[10564, 15, 1], [10564, 13, 1], [10589, 18, 1], [10637, 39, 1], [10662, 38, 1], [10837, 45, 1], [3, 17, 13], [7, 21, 13], [46, 26, 13], etc]
How do I do this? Do I have to filter twice, once with len >= 22 or len == 2, and then a separate filter for just len >= 22? I wouldn't want to append element 0 or 1 of the length-2 lists to my final list.
I would try to make it readable:
UARFCNRSSI = []
x = None  # future "third element"; please choose a better name
for item in data["Sequences"]:
    if len(item) == 2:
        x = item[0]
    elif len(item) >= 22:
        UARFCNRSSI.append([item[0], item[1], x])
I'd go with a generator to filter your data:
def filterdata(sequences):
    add = []
    for item in sequences:
        if len(item) == 2:
            add = [item[0]]
        elif len(item) >= 22:
            yield [item[0], item[1]] + add
You can access it like data = list(filterdata(data['Sequences']))
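To see the marker being carried forward, here is the generator above run on a trimmed, hypothetical stand-in for data['Sequences']: the long rows keep their first two values from the question and are padded with zeros to 22 elements. One difference from the loop answer: rows that appear before any length-2 list get no third element here (add starts as []), whereas the loop would append None.

```python
def filterdata(sequences):
    add = []  # holds the most recent length-2 marker, once one is seen
    for item in sequences:
        if len(item) == 2:
            add = [item[0]]
        elif len(item) >= 22:
            yield [item[0], item[1]] + add


# Hypothetical stand-in for data['Sequences'] (padding is illustrative only).
sequences = [
    [],
    [1, '2013-02-26T15:01:11Z'],
    [],
    [10564, 13] + [0] * 20,
    [10589, 16] + [0] * 20,
    [13, '2013-02-26T15:02:09Z'],
    [],
    [2, 12] + [0] * 20,
    [5, 15] + [0] * 20,
]

result = list(filterdata(sequences))
print(result)  # [[10564, 13, 1], [10589, 16, 1], [2, 12, 13], [5, 15, 13]]
```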