Fast way to iterate over three dictionaries?

I am dealing with three very large dictionaries that look like this:
dict_a = { ( 't','e' ) : [0.5,0.1,0.6], ( 'a','b' ) : [0.2,0.3,0.9] }
dict_b = { ( 'a','b' ) : [0.1,0.5,0.3] , ( 't','e' ) : [0.6,0.1,0.6] }
dict_c = { ( 'a','b' ) : [0.1,0.5,0.3] , ( 't','e' ) : [0.6,0.5,0.6] }
I am looking for output like this:
name first_value second_value third_value
0 (t, e) [0.5, 0.1, 0.6] [0.6, 0.1, 0.6] [0.6, 0.5, 0.6]
1 (a, b) [0.2, 0.3, 0.9] [0.1, 0.5, 0.3] [0.1, 0.5, 0.3]
What I've tried is:
final_dict = {'name': [], 'first_value': [], 'second_value': [], 'third_value': []}
for a, b in dict_a.items():
    for c, d in dict_b.items():
        for e, f in dict_c.items():
            if a == c == e:
                final_dict['name'].append(a)
                final_dict['first_value'].append(b)
                final_dict['second_value'].append(d)
                final_dict['third_value'].append(f)
This is really not an efficient way to do the task. I was thinking of using pandas.
How can I do this with minimal time complexity?
Thank you !

Because these are dictionaries, you only need to iterate over one. You can use the key to get the corresponding value from the others.
Example:
for key, value in dict_a.items():
    final_dict['name'].append(key)
    final_dict['first_value'].append(value)
    final_dict['second_value'].append(dict_b[key])
    final_dict['third_value'].append(dict_c[key])
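The lookup approach above assumes every key of dict_a is also present in dict_b and dict_c; a missing key would raise KeyError. Here is a minimal sketch that skips such keys instead. The helper name merge_by_key is made up for illustration and is not part of the original answer.

```python
dict_a = {('t', 'e'): [0.5, 0.1, 0.6], ('a', 'b'): [0.2, 0.3, 0.9]}
dict_b = {('a', 'b'): [0.1, 0.5, 0.3], ('t', 'e'): [0.6, 0.1, 0.6]}
dict_c = {('a', 'b'): [0.1, 0.5, 0.3], ('t', 'e'): [0.6, 0.5, 0.6]}


def merge_by_key(first, second, third):
    """Collect the values that share a key in all three dicts (hypothetical helper)."""
    merged = {'name': [], 'first_value': [], 'second_value': [], 'third_value': []}
    for key, value in first.items():
        # Dict membership tests are O(1) on average, so this single pass is
        # roughly linear in len(first) rather than cubic like three nested loops.
        if key in second and key in third:
            merged['name'].append(key)
            merged['first_value'].append(value)
            merged['second_value'].append(second[key])
            merged['third_value'].append(third[key])
    return merged


final_dict = merge_by_key(dict_a, dict_b, dict_c)
print(final_dict['name'])
```

If pandas is wanted afterwards, a dict of equal-length lists like final_dict can be passed directly to pd.DataFrame.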

Try it this way:
import pandas as pd
df = pd.DataFrame([dict_a, dict_b, dict_c], index = ['first_value',
'second_value', 'third_value']).T
df['names'] = df.index
df.index = range(len(df))  # replace the key index with 0, 1, ...
print(df)
Output:
first_value second_value third_value names
0 [0.2, 0.3, 0.9] [0.1, 0.5, 0.3] [0.1, 0.5, 0.3] (a, b)
1 [0.5, 0.1, 0.6] [0.6, 0.1, 0.6] [0.6, 0.5, 0.6] (t, e)

How about:
pd.DataFrame({i:d for i,d in enumerate([dict_a,dict_b,dict_c])} )
Output:
0 1 2
a b [0.2, 0.3, 0.9] [0.1, 0.5, 0.3] [0.1, 0.5, 0.3]
t e [0.5, 0.1, 0.6] [0.6, 0.1, 0.6] [0.6, 0.5, 0.6]

Here is one way
pd.concat([pd.Series(x) for x in [dict_a,dict_b,dict_c]],axis=1)
Out[332]:
0 1 2
a b [0.2, 0.3, 0.9] [0.1, 0.5, 0.3] [0.1, 0.5, 0.3]
t e [0.5, 0.1, 0.6] [0.6, 0.1, 0.6] [0.6, 0.5, 0.6]

Related

How to expand the output of a truncated list to view more values of Pandas dataframe

Is there a specific way to display all the truncated data values of a list? The displayed values are as follows:
v w
Row1 [0.1, 0.2, 0.3 .....1.0] [0.1, 0.2, 0.3 .....1.0]
Here are the options I tried:
Option 1
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_seq_items', None)
z = pd.read_csv('a.csv')
Output:
it is still truncated
Option 2
for i, row in z.iterrows():
    for j in row['w']:
        print(j)
Output:
it is still truncated
Any help on how to display the full list, including the truncated values, would be appreciated.

You can print the DataFrame after converting it with to_string:
print(df.to_string())
output:
v w
Row1 [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Used input:
L = list(np.arange(0, 1.1, 0.1).round(2))
df = pd.DataFrame({'v': [L], 'w': [L]}, index=['Row1'])
default print:
v \
Row1 [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, ...
w
Row1 [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, ...

Get indices greater than value and keep value

I have a 2D array that looks like this:
[[0.1, 0.2, 0.4, 0.6, 0.9]
[0.3, 0.7, 0.8, 0.3, 0.9]
[0.7, 0.9, 0.4, 0.6, 0.9]
[0.1, 0.2, 0.6, 0.6, 0.9]]
And I want to save the indices where the array is higher than 0.6 but I also want to keep the value of that position, so the output would be:
[0, 3, 0.6]
[0, 4, 0.9]
[1, 2, 0.7]
and so on.
To get the indices I did this:
x = np.where(PPCF> 0.6)
high_pc = np.asarray(x).T.tolist()
but how do I keep the value in a third position?

Simple, no loops:
x = np.where(PPCF > 0.6) # condition to screen values
vals = PPCF[x] # find values by indices
np.concatenate((np.array(x).T, vals.reshape(vals.size, 1)), axis = 1) # resulting array
Feel free to convert it to a list.

This should work:
x = np.where(PPCF > 0.6)
high_pc = np.asarray(x).T.tolist()
for i in high_pc:
    i.append(float(PPCF[i[0], i[1]]))

You could just run a loop along the columns and rows and check if each element is greater than the threshold and save them in a list.
a = [[0.1, 0.2, 0.4, 0.6, 0.9],
     [0.3, 0.7, 0.8, 0.3, 0.9],
     [0.7, 0.9, 0.4, 0.6, 0.9],
     [0.1, 0.2, 0.6, 0.6, 0.9]]
def find_ix(a, threshold = 0.6):
    res_list = []
    for i in range(len(a)):
        for j in range(len(a[i])):
            if a[i][j] >= threshold:
                res_list.append([i, j, a[i][j]])
    return res_list
print("Resulting list = \n ", find_ix(a))

import numpy as np
arr = np.array([[0.1, 0.2, 0.4, 0.6, 0.9],
[0.3, 0.7, 0.8, 0.3, 0.9],
[0.7, 0.9, 0.4, 0.6, 0.9],
[0.1, 0.2, 0.6, 0.6, 0.9]])
rows, cols = np.where(arr > 0.6) # Get rows and columns where arr > 0.6
values = arr[rows, cols] # Get all values > 0.6 in arr
result = np.column_stack((rows, cols, values)) # Stack three columns to create final array
"""
Result -
[[ 0. 4. 0.9]
[ 1. 1. 0.7]
[ 1. 2. 0.8]
[ 1. 4. 0.9]
[ 2. 0. 0.7]
[ 2. 1. 0.9]
[ 2. 4. 0.9]
[ 3. 4. 0.9]]
"""
You can convert result into a list.
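One detail when converting: np.column_stack promotes all three columns to a common float dtype, which is why the indices above print as 0., 4. and so on. If integer indices are wanted in the final list, a small sketch (the variable name high_pc is borrowed from the question) is to build the rows from the separate arrays:

```python
import numpy as np

arr = np.array([[0.1, 0.2, 0.4, 0.6, 0.9],
                [0.3, 0.7, 0.8, 0.3, 0.9],
                [0.7, 0.9, 0.4, 0.6, 0.9],
                [0.1, 0.2, 0.6, 0.6, 0.9]])

rows, cols = np.where(arr > 0.6)   # index arrays of the matching cells
values = arr[rows, cols]           # the matching values themselves

# tolist() converts NumPy scalars to plain Python int/float objects,
# so each entry ends up as [int, int, float].
high_pc = [[r, c, v] for r, c, v in zip(rows.tolist(), cols.tolist(), values.tolist())]
print(high_pc)
```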

Why does random.shuffle fail on numpy lists?

I have an array of row vectors, upon which I run random.shuffle:
#!/usr/bin/env python
import random
import numpy as np
zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1. ]])
iterations = 100000
f = 0
for _ in range(iterations):
    random.shuffle(zzz)
    if np.array_equal(zzz[0], zzz[1]):
        print(zzz)
        f += 1
print(float(f)/float(iterations))
Between 99.6 and 100% of the time, random.shuffle leaves zzz with two identical rows, e.g.:
$ ./test.py
...
[[ 0.1 0.2 0.3 0.4 0.5]
[ 0.1 0.2 0.3 0.4 0.5]]
0.996
Using numpy.random.shuffle appears to pass this test and shuffle row vectors correctly. I'm curious to know why random.shuffle fails.

If you look at the code of random.shuffle it performs swaps in the following way:
x[i], x[j] = x[j], x[i]
which for a numpy.array would fail, without raising any error. Example:
>>> zzz[1], zzz[0] = zzz[0], zzz[1]
>>> zzz
array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.1, 0.2, 0.3, 0.4, 0.5]])
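The reason is that Python evaluates the right-hand side completely before making any assignment (which is why a single-line swap works on lists), but for a numpy array zzz[0] and zzz[1] are views onto the array's data, not copies. The first assignment overwrites the row that the other view still refers to, so the second assignment copies the already-modified data.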
numpy
>>> arr = np.array([[1],[1]])
>>> arr[0], arr[1] = arr[0]+1, arr[0]
>>> arr
array([[2],
[2]])
Python
>>> l = [1,1]
>>> l[0], l[1] = l[0]+1, l[0]
>>> l
[2, 1]
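The view behaviour can be checked directly. The following is a small sketch, not taken from the random module's source: it confirms with np.shares_memory that a row shares its buffer with the parent array, then shows that copying the rows on the right-hand side makes the tuple swap behave as it does for lists.

```python
import numpy as np

zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1.0]])

# Indexing a row of a 2-D array returns a view onto the same buffer.
row0 = zzz[0]
shares = np.shares_memory(row0, zzz)
print(shares)

# Copying the rows first detaches the right-hand side from zzz,
# so the first assignment can no longer corrupt the second one.
zzz[0], zzz[1] = zzz[1].copy(), zzz[0].copy()
print(zzz)
```

The copies cost extra memory per swap, which is one reason to prefer np.random.shuffle for whole-array shuffling.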

Try it like this :
#!/usr/bin/env python
import random
import numpy as np
zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1. ]])
iterations = 100000
f = 0
for _ in range(iterations):
    random.shuffle(zzz[0])
    random.shuffle(zzz[1])
    if np.array_equal(zzz[0], zzz[1]):
        print(zzz)
        f += 1
print(float(f)/float(iterations))

In [200]: zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
...: [0.6, 0.7, 0.8, 0.9, 1. ]])
...:
In [201]: zl = zzz.tolist()
In [202]: zl
Out[202]: [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]
random.shuffle is probably using an in-place assignment like:
In [203]: zzz[0],zzz[1]=zzz[1],zzz[0]
In [204]: zzz
Out[204]:
array([[0.6, 0.7, 0.8, 0.9, 1. ],
[0.6, 0.7, 0.8, 0.9, 1. ]])
Note the replication.
But applied to a list of lists:
In [205]: zl[0],zl[1]=zl[1],zl[0]
In [206]: zl
Out[206]: [[0.6, 0.7, 0.8, 0.9, 1.0], [0.1, 0.2, 0.3, 0.4, 0.5]]
In [207]: zl[0],zl[1]=zl[1],zl[0]
In [208]: zl
Out[208]: [[0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]]
I tested zl = list(zzz) and still got the array behavior. That zl is a list of views of zzz. tolist makes a list of lists that's totally independent of zzz.
In short, random.shuffle cannot handle in-place modifications of an ndarray correctly. np.random.shuffle is designed to work with the 1st dim of an array, so it gets it right.
The correct assignment for an ndarray is:
In [211]: zzz = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
...: [0.6, 0.7, 0.8, 0.9, 1. ]])
...:
In [212]: zzz[[0,1]] = zzz[[1,0]]
In [213]: zzz
Out[213]:
array([[0.6, 0.7, 0.8, 0.9, 1. ],
[0.1, 0.2, 0.3, 0.4, 0.5]])
In [214]: zzz[[0,1]] = zzz[[1,0]]
In [215]: zzz
Out[215]:
array([[0.1, 0.2, 0.3, 0.4, 0.5],
[0.6, 0.7, 0.8, 0.9, 1. ]])

numpy arange implementation on pandas dataframe

I have a dataframe, like so,
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [0, 0.5, 0.2],
'b': [1,1,0.3]})
print (df)
a b
0 0.0 1.0
1 0.5 1.0
2 0.2 0.3
I want to generate a Series that looks like
pd.Series ([np.arange ( start = 0, stop = 1, step = 0.1),
np.arange ( start = 0.5, stop = 1, step = 0.1),
np.arange ( start = 0.2, stop = 0.3, step = 0.1)])
0 [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, ...
1 [0.5, 0.6, 0.7, 0.8, 0.9]
2 [0.2]
dtype: object
I am trying to do this with a lambda function and getting an error, like so
foo = lambda x: np.arange(start = x.a, stop = x.b, step = 0.1)
print (df.apply(foo, axis =1))
ValueError: Shape of passed values is (3, 10), indices imply (3, 2)
I am not sure what this means. Is there a better/correct way to do this?

I'd use a comprehension
pd.Series([np.arange(a, b, .1) for a, b in zip(df.a, df.b)], df.index)
0 [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, ...
1 [0.5, 0.6, 0.7, 0.8, 0.9]
2 [0.2]
dtype: object

Use itertuples with Series constructor:
s = pd.Series([np.arange(x.a, x.b, .1) for x in df.itertuples()], index=df.index)
print (s)
0 [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, ...
1 [0.5, 0.6, 0.7, 0.8, 0.9]
2 [0.2]
dtype: object
s = pd.Series([np.arange(x.a, x.b, .1) for i, x in df.iterrows()], index=df.index)
print (s)
0 [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, ...
1 [0.5, 0.6, 0.7, 0.8, 0.9]
2 [0.2]
dtype: object
With apply, it works only after converting the result to a tuple:
foo = lambda x: tuple(np.arange(start = x.a, stop = x.b, step = 0.1))
print (df.apply(foo, axis = 1))
0 (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, ...
1 (0.5, 0.6, 0.7, 0.8, 0.9)
2 (0.2,)
dtype: object

Get items from multidimensional list Python

I have a list with the following appearance:
[0] = [ [0.0, 100.0], [0.1, 93.08], [0.3, 92.85], [0.5, 92.62], [0.7, 91.12], [0.9, 90.89] ]
[1] = [ [0.0, 100.0], [0.1, 2.79], [0.3, 2.62], [0.5, 2.21], [0.7, 1.83], [0.9, 1.83] ]
and I'd like to obtain vectors to plot the info as follows:
[0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
[100.0, 93.08, 92.85, 92.62, 91.12, 90.89]
and the same with all entries in the list.
I was trying something like:
x = mylist[0][:][0]
Any ideas? I appreciate the help!

Use zip:
>>> mylist = [
...     [0.0, 100.0], [0.1, 93.08], [0.3, 92.85], [0.5, 92.62],
...     [0.7, 91.12], [0.9, 90.89] ]
>>> a, b = zip(*mylist)
>>> a
(0.0, 0.1, 0.3, 0.5, 0.7, 0.9)
>>> b
(100.0, 93.08, 92.85, 92.62, 91.12, 90.89)
>>> list(a)
[0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
>>> list(b)
[100.0, 93.08, 92.85, 92.62, 91.12, 90.89]
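The question also asks for the same split on every entry of the outer list. A short sketch of that, assuming the outer list is called mylist and holds the two entries shown in the question (the name series is made up for illustration):

```python
mylist = [
    [[0.0, 100.0], [0.1, 93.08], [0.3, 92.85], [0.5, 92.62], [0.7, 91.12], [0.9, 90.89]],
    [[0.0, 100.0], [0.1, 2.79], [0.3, 2.62], [0.5, 2.21], [0.7, 1.83], [0.9, 1.83]],
]

# zip(*entry) transposes one entry from [[x, y], ...] into (xs, ys);
# each column is converted to a list so it is easy to modify or plot.
series = [tuple(list(column) for column in zip(*entry)) for entry in mylist]

for xs, ys in series:
    print(xs, ys)
```

Each (xs, ys) pair can then be handed to a plotting call in turn.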

With pure Python you can use list comprehensions:
data = [ [0.0, 100.0], [0.1, 93.08], [0.3, 92.85], [0.5, 92.62], [0.7, 91.12], [0.9, 90.89] ]
listx = [item[0] for item in data ]
listy = [item[1] for item in data ]
>>> listx
[0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
>>> listy
[100.0, 93.08, 92.85, 92.62, 91.12, 90.89]
I think it's a bit better than zip because it is easier to read and you do not have to cast the tuples.

numpy solution:
import numpy as np
data = [ [0.0, 100.0], [0.1, 93.08], [0.3, 92.85], [0.5, 92.62], [0.7, 91.12], [0.9, 90.89] ]
data_np = np.array(data)
a = data_np[:,0]
b = data_np[:,1]
In [126]: a
Out[126]: array([ 0. , 0.1, 0.3, 0.5, 0.7, 0.9])
In [127]: b
Out[127]: array([ 100. , 93.08, 92.85, 92.62, 91.12, 90.89])

One can use map too (in Python 3, map returns an iterator, so wrap it in list() if you need a list):
a = map(lambda x: x[0], your_list)
b = map(lambda x: x[1], your_list)
