how to separate columns of itertools.product to matrices/arrays - python

I have a dataset of three items, e.g. [1,2,3].
I want to compute its Cartesian product with 3 repeats and then separate the result into 3 datasets like this (they should actually be vertical):
[1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3]
[1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3]
[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]
I noticed that in Python I can use itertools.product to compute products:
data_prod=itertools.product(data,repeat=3)
Now my question is: how can I convert each column of the result (whose type is itertools.product) into 3 new datasets, as shown in the example above?

Use zip(*..) to turn columns into rows:
dataset1, dataset2, dataset3 = zip(*itertools.product(data,repeat=3))
Demo:
>>> zip(*itertools.product(data,repeat=3))
[(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3), (1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3), (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)]
>>> dataset1, dataset2, dataset3 = zip(*itertools.product(data,repeat=3))
>>> dataset1
(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
>>> dataset2
(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3)
>>> dataset3
(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
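The demo output above is what Python 2 prints. On Python 3, zip returns a lazy iterator, so the bare call would display a zip object until it is wrapped in list(). A minimal sketch, assuming Python 3 and assuming lists rather than tuples are wanted (the name `columns` is only illustrative):

```python
import itertools

data = [1, 2, 3]

# zip(*...) transposes the 27 three-item tuples into 3 columns of 27 items.
# map(list, ...) turns each column tuple into a list; list(...) forces the iterator.
columns = list(map(list, zip(*itertools.product(data, repeat=3))))
dataset1, dataset2, dataset3 = columns

print(dataset1)  # first column: each value repeated 9 times in a row
print(dataset3)  # last column: cycles through the values fastest
```

Unpacking directly into three names, as in the answer, works unchanged on Python 3 because unpacking consumes the iterator.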

An alternate way, for display purposes, still using itertools.product:
import itertools
import pandas as pd

cols = ['series1', 'series2', 'series3']
originDataset = [1, 2, 3]
# Materialise the product so pandas can build one row per combination
data_prod = lambda x: list(itertools.product(x, repeat=3))
df1 = pd.DataFrame(originDataset, columns=['OriginalDataSet'])
df2 = pd.DataFrame(data_prod(originDataset), columns=cols)
print(df1)
print('-' * 80)
print(df2)
print('-' * 80)
# Transposing turns each column into one row of the underlying array
series1, series2, series3 = df2.T.values
print(series1)
print(series2)
print(series3)
Output:
   OriginalDataSet
0                1
1                2
2                3
--------------------------------------------------------------------------------
    series1  series2  series3
0         1        1        1
1         1        1        2
2         1        1        3
3         1        2        1
4         1        2        2
5         1        2        3
6         1        3        1
7         1        3        2
8         1        3        3
9         2        1        1
10        2        1        2
11        2        1        3
12        2        2        1
13        2        2        2
14        2        2        3
15        2        3        1
16        2        3        2
17        2        3        3
18        3        1        1
19        3        1        2
20        3        1        3
21        3        2        1
22        3        2        2
23        3        2        3
24        3        3        1
25        3        3        2
26        3        3        3
--------------------------------------------------------------------------------
[1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3]
[1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3]
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
I hope this also helps you learn how to use pandas.

Related

Compare 3 columns of a 2-D List and Replace based on conditions

I have a 2-D List as follows:
[
[6 4 4 2 5 5 4 5 4 1 3 5]
[4 3 6 5 4 4 5 1 5 5 2 4]
[2 5 2 0 4 5 4 4 2 3 2 6]
[5 5 4 3 5 4 6 7 3 4 4 4]
[3 5 6 5 6 5 3 5 3 4 7 4]
[4 5 5 4 5 4 7 5 3 5 4 1]
[2 5 3 3 5 3 4 4 3 3 1 3]
[2 5 5 2 5 4 6 2 5 6 2 5]
]
Conditions:
Compare columns 1, 5 and 9 (in steps of 4) row-wise and process them in the following order:
Step 1: If one of them is zero, do nothing. Otherwise go to Step 2. Example: in (6,5,4) none of them is zero, so go to Step 2.
Step 2: If they are all equal, change all of them to zero. If not, go to Step 3.
Step 3: Take the lowest of the three and subtract this minimum from each of them.
Repeat this with the next three elements (2,6,10), and so on until (4,8,12).
How can I do this efficiently in Python using pandas, NumPy, or even plain list operations?
Any help appreciated. Thanks!
You could write a custom function and then apply it to every row of the array. Here arr is the 2-D list from the question; note that the function modifies each row in place.
def check_conditions(x):
    # i selects the triple (i, i+4, i+8), i.e. columns (1,5,9) up to (4,8,12)
    for i in range(4):
        if x[i] == 0 or x[i+4] == 0 or x[i+8] == 0:
            continue
        elif x[i] == x[i+4] == x[i+8]:
            x[i] = 0
            x[i+4] = 0
            x[i+8] = 0
        else:
            min_val = min(x[i], x[i+4], x[i+8])
            x[i] -= min_val
            x[i+4] -= min_val
            x[i+8] -= min_val
    return x

new_arr = [check_conditions(x) for x in arr]
This gives the following result:
print(new_arr)
[[2, 3, 1, 0, 1, 4, 1, 3, 0, 0, 0, 3],
[0, 0, 4, 4, 0, 1, 3, 0, 1, 2, 0, 3],
[0, 2, 0, 0, 2, 2, 2, 4, 0, 0, 0, 6],
[2, 1, 0, 0, 2, 0, 2, 4, 0, 0, 0, 1],
[0, 1, 3, 1, 3, 1, 0, 1, 0, 0, 4, 0],
[1, 1, 1, 3, 2, 0, 3, 4, 0, 1, 0, 0],
[0, 2, 2, 0, 3, 0, 3, 1, 1, 0, 0, 0],
[0, 1, 3, 0, 3, 0, 4, 0, 3, 2, 0, 3]]
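Since the question also asks about NumPy, the same three steps can be sketched without an explicit Python loop. This is only one possible approach, assuming every row has a length divisible by 3 and that the members of a triple sit a third of the row apart (4 columns in the question's data). The function name `reduce_triples` is made up for this illustration, and unlike the loop version it returns a new array instead of modifying the input.

```python
import numpy as np

def reduce_triples(arr):
    """Apply the zero / all-equal / subtract-minimum rules to each triple."""
    a = np.asarray(arr)
    rows, cols = a.shape
    step = cols // 3  # 4 for the 12-column rows in the question
    # After reshaping, blocks[r, :, i] holds columns i, i+step and i+2*step of row r.
    blocks = a.reshape(rows, 3, step)
    mins = blocks.min(axis=1)             # minimum of every triple
    has_zero = (blocks == 0).any(axis=1)  # triples that must stay untouched
    mins = np.where(has_zero, 0, mins)    # subtract nothing from those
    # All-equal triples need no special case: value - min is already 0.
    return (blocks - mins[:, None, :]).reshape(rows, cols)

sample = [[6, 4, 4, 2, 5, 5, 4, 5, 4, 1, 3, 5],
          [2, 5, 2, 0, 4, 5, 4, 4, 2, 3, 2, 6]]
result = reduce_triples(sample)
print(result)
```

The all-equal rule falls out of the subtraction, so only the zero check needs a mask.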

Python dataframe repeat column data in each cell as a list

I am trying to repeat the whole data of a column in each cell of that column.
My code:
df3=pd.DataFrame({
'x':[1,2,3,4,5],
'y':[10,20,30,20,10],
'z':[5,4,3,2,1]
})
df3 =
   x   y  z
0  1  10  5
1  2  20  4
2  3  30  3
3  4  20  2
4  5  10  1
df3['z'] = df['z'].agg(lambda x: list(x))
Present output:
KeyError: 'z'
Expected output:
df=
   x   y                z
0  1  10  [5, 4, 3, 2, 1]
1  2  20  [5, 4, 3, 2, 1]
2  3  30  [5, 4, 3, 2, 1]
3  4  20  [5, 4, 3, 2, 1]
4  5  10  [5, 4, 3, 2, 1]
Another way is to use list(df.column.values):
df3.assign(z=[list(df3.z.values)]*len(df3))
   x   y                z
0  1  10  [5, 4, 3, 2, 1]
1  2  20  [5, 4, 3, 2, 1]
2  3  30  [5, 4, 3, 2, 1]
3  4  20  [5, 4, 3, 2, 1]
4  5  10  [5, 4, 3, 2, 1]
Check with
df3['new_z']=[df3.z.tolist()]*len(df3)
Safer, because each row gets its own list object:
df3['new_z']=[df3.z.tolist() for x in df3.index]
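The difference between the two forms is about object identity, not about pandas. Multiplying a one-element list repeats a reference to the same inner list, while a comprehension builds a fresh list on every iteration. A small plain-Python sketch of that behaviour (the names `shared` and `separate` are only illustrative):

```python
values = [5, 4, 3, 2, 1]

# Multiplication copies the reference: all three slots are the same object.
shared = [values] * 3
# A comprehension creates a new list per slot.
separate = [list(values) for _ in range(3)]

shared[0].append(99)    # shows up in every slot of `shared` (and in `values`)
separate[0].append(99)  # only changes the first slot of `separate`

print(shared[1])    # [5, 4, 3, 2, 1, 99]
print(separate[1])  # [5, 4, 3, 2, 1]
```

If the lists stored in the column are never mutated in place, both forms give the same visible result.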

How to select pairs from two lists?

I have two lists from which I want to select pairs in such a way that each item in one set is paired with another item in the other set only when they are not the same. This is the code I tried so far.
start1 = [1, 4, 0, 2, 0, 3, 3, 3, 3, 1]
end1 = [0, 0, 0, 2, 1, 2, 2, 4, 1, 4]
for x in start1:
    for y in end1:
        if x != y:
            print(x,y)
The above code gives me results that look like this...
1 0
1 0
1 0
1 2
1 2
1 2
1 4
1 4
4 0
4 0
4 0
4 2
4 1
4 2
4 2
4 1
.
.
.
However, I am trying to get results like this...
1 0
4 0
0 1
3 2
3 2
3 4
3 1
1 4
As I am new to python, I am having difficulties with this problem. Can someone kindly guide me to achieve my goal?
Regards.
Zip the lists together, filtering the results.
start1 = [1, 4, 0, 2, 0, 3, 3, 3, 3, 1]
end1 = [0, 0, 0, 2, 1, 2, 2, 4, 1, 4]
for x, y in zip(start1, end1):
    if x != y:
        print(x,y)
Or, as a list comprehension:
[item for item in zip(start1, end1) if item[0] != item[1]]
>> [(1, 0), (4, 0), (0, 1), (3, 2), (3, 2), (3, 4), (3, 1), (1, 4)]

changing a list of strings in python

I am trying to change this list
['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5',
'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2',
'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
to something that looks like this
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
I have tried multiple ways and can't figure it out.
You just need to split the elements in each string, take the first element and set it as key of the dictionary, and convert the rest of the elements to integers, and store as values:
list_ = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5',
'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2',
'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
dict_ = {}
for string in list_:
    alpha, *numbers = string.split()
    dict_[alpha] = [*map(int,numbers)]
for alpha, numbers in dict_.items():
    print(f"{alpha} -- {numbers}")
Output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
If you want to go fancy:
generator_ = (f"{alpha} -- {[*map(int,numbers)]}" for alpha, *numbers in [l.split() for l in list_])
print(*generator_, sep = '\n')
If you want to reproduce what you asked in the question:
x = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
for elem in x:
    split = elem.split(" ")
    print("{} -- {}".format(split[0],[int(i) for i in split[1:]]))
This:
Loops through the list x
Splits its items into a separate list split
Separates first element from rest with a "--" when printing
Or using a dictionary:
x = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
# Create dictionary following above logic
d = dict()
for elem in x:
    split = elem.split(" ")
    d.update({split[0] : [int(i) for i in split[1:]]})
# Loop through its keys and values and print as needed
for k, v in d.items():
    print("{} -- {}".format(k, v))
Output:
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
inputlist=['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
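for item in inputlist:
    item_to_list=item.split(" ")
    # Slice from index 1 so the first number is kept; filter(None, ...) drops
    # any empty strings produced by repeated spaces
    temp_list=[int(i) for i in filter(None, item_to_list[1:])]
    print("{0} -- {1}".format(item_to_list[0],str(temp_list)))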
Output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
Try this:
name=["AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5",
"BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2",
"K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1"]
for item in name:
    # Everything after the first token is a number
    thelist = list(map(int, item.split(' ')[1:]))
    print(f"{item.split(' ')[0]} -- {thelist}")
Output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
x = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
for i in x:
    i = i.split(' ')
    tmp = {i[0]:[int(items) for items in i[1:]]}
    for i, j in tmp.items():
        print(f"{i} - {j}")
Output:
AAAAA - [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB - [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K - [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
You can make a dict, and also use that dict to get your specific output if that's what you want.
Assuming your list is called full_list:
lists = [sub.split() for sub in full_list]
keys = [l[0] for l in lists]
vals = [list(map(int,l[1:])) for l in lists]
d = {k:v for k,v in zip(keys,vals)}
If you want that specific output:
for k,v in d.items():
    print(f'{k} -- {v}')
output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]

pandas equivalent to R series of multiple repeated numbers

I want to create a simple vector of many repeated values. This is easy in R:
> numbers <- c(rep(1,5), rep(2,4), rep(3,3))
> numbers
[1] 1 1 1 1 1 2 2 2 2 3 3 3
However, if I try to do this in Python using pandas and numpy, I don't quite get the same thing:
numbers = pd.Series([np.repeat(1,5), np.repeat(2,4), np.repeat(3,3)])
numbers
0 [1, 1, 1, 1, 1]
1 [2, 2, 2, 2]
2 [3, 3, 3]
dtype: object
What's the R equivalent in Python?
Just adjust how you use np.repeat
np.repeat([1, 2, 3], [5, 4, 3])
array([1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
Or with pd.Series
pd.Series(np.repeat([1, 2, 3], [5, 4, 3]))
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 2
9 3
10 3
11 3
dtype: int64
That said, the purest form to replicate what you've done in R is to use np.concatenate in conjunction with np.repeat. It just isn't what I'd recommend doing.
np.concatenate([np.repeat(1,5), np.repeat(2,4), np.repeat(3,3)])
array([1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
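For comparison, the R expression also has a close counterpart in plain Python lists, which may be enough when no array operations are needed afterwards. A minimal sketch using only the standard library (the names `pairs` and `numbers_from_pairs` are illustrative, not part of any answer above):

```python
from itertools import chain, repeat

# List repetition plus concatenation mirrors c(rep(1,5), rep(2,4), rep(3,3)).
numbers = [1] * 5 + [2] * 4 + [3] * 3

# The same idea driven by (value, count) pairs, convenient for many values.
pairs = [(1, 5), (2, 4), (3, 3)]
numbers_from_pairs = list(chain.from_iterable(repeat(v, n) for v, n in pairs))

print(numbers)  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
```

Either list can be passed to pd.Series or np.array if a pandas or NumPy object is needed later.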
With the datar package you can now use the same syntax in Python:
>>> from datar.base import c, rep
>>>
>>> numbers = c(rep(1,5), rep(2,4), rep(3,3))
>>> print(numbers)
[1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
I am the author of the datar package. Feel free to submit issues if you have any questions.
