Changing a list of strings in Python

I am trying to change this list
['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5',
'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2',
'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
to something that looks like this
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
I have tried multiple ways and can't figure it out.

You just need to split each string, use the first element as the dictionary key, convert the remaining elements to integers, and store them as the value:
list_ = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5',
         'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2',
         'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
dict_ = {}
for string in list_:
    alpha, *numbers = string.split()
    dict_[alpha] = [*map(int, numbers)]
for alpha, numbers in dict_.items():
    print(f"{alpha} -- {numbers}")
Output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
If you want to go fancy:
generator_ = (f"{alpha} -- {[*map(int,numbers)]}" for alpha, *numbers in [l.split() for l in list_])
print(*generator_, sep = '\n')
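The same idea can be wrapped in a small reusable function. This is only a sketch: the name parse_rows is hypothetical, the shortened input is made up for illustration, and it assumes every string is a label followed by whitespace-separated integers.

```python
def parse_rows(rows):
    """Map each leading label to the list of integers that follows it."""
    parsed = {}
    for row in rows:
        # split() with no argument splits on any run of whitespace
        label, *numbers = row.split()
        parsed[label] = [int(n) for n in numbers]
    return parsed


rows = ['AAAAA 4 2 1', 'BBB 5 2 1', 'K 4 1 2']
for label, numbers in parse_rows(rows).items():
    print(f"{label} -- {numbers}")
```

Returning a dictionary keeps the parsing separate from the printing, so the parsed values can be reused elsewhere.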

If you want to reproduce what you asked in the question:
x = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
for elem in x:
    split = elem.split(" ")
    print("{} -- {}".format(split[0], [int(i) for i in split[1:]]))
This:
Loops through the list x
Splits its items into a separate list split
Separates first element from rest with a "--" when printing
Or using a dictionary:
x = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
# Create dictionary following above logic
d = dict()
for elem in x:
    split = elem.split(" ")
    d.update({split[0]: [int(i) for i in split[1:]]})
# Loop through its keys and values and print as needed
for k, v in d.items():
    print("{} -- {}".format(k, v))
Output:
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]

inputlist=['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
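for item in inputlist:
    item_to_list = item.split(" ")
    # Index 0 is the label, so the numbers start at index 1;
    # filter(None, ...) drops empty strings left by repeated spaces
    temp_list = [int(i) for i in filter(None, item_to_list[1:])]
    print("{0} -- {1}".format(item_to_list[0], temp_list))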
Output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]
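The filter(None, ...) call only matters when the input contains repeated spaces. A short sketch of the difference between split(" ") and split(); the input line with extra spaces is made up for illustration.

```python
line = 'AAAAA  4 2   1'   # note the repeated spaces

# Splitting on a single space keeps an empty string for every extra space
print(line.split(' '))    # ['AAAAA', '', '4', '2', '', '', '1']

# split() with no argument splits on any run of whitespace
print(line.split())       # ['AAAAA', '4', '2', '1']

label, *numbers = line.split()
values = [int(n) for n in numbers]
print(label, values)      # AAAAA [4, 2, 1]
```

With split() there are no empty strings to remove, so the filter step is not needed.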

Try this:
name = ["AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5",
        "BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2",
        "K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1"]
for item in name:
    thelist = list(map(int, ','.join(item.split(' ')[1:]).split(',')))
    print(f"{item.split(' ')[0]} -- {thelist}")
Output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]

x = ['AAAAA 4 2 1 2 4 2 4 4 5 2 2 1 5 2 4 3 1 1 3 3 5', 'BBB 5 2 1 2 4 5 4 4 1 2 2 2 4 4 4 3 1 2 3 3 2', 'K 4 1 2 1 2 1 2 5 1 1 1 1 4 2 2 1 5 1 3 4 1']
for i in x:
    i = i.split(' ')
    tmp = {i[0]: [int(items) for items in i[1:]]}
    for i, j in tmp.items():
        print(f"{i} - {j}")
Output:
AAAAA - [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB - [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K - [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]

You can build a dict and then use it to produce that specific output, if that's what you want.
Assuming your list is called full_list:
lists = [sub.split() for sub in full_list]
keys = [l[0] for l in lists]
vals = [list(map(int, l[1:])) for l in lists]
d = {k: v for k, v in zip(keys, vals)}
To get that specific output:
for k, v in d.items():
    print(f'{k} -- {v}')
Output:
AAAAA -- [4, 2, 1, 2, 4, 2, 4, 4, 5, 2, 2, 1, 5, 2, 4, 3, 1, 1, 3, 3, 5]
BBB -- [5, 2, 1, 2, 4, 5, 4, 4, 1, 2, 2, 2, 4, 4, 4, 3, 1, 2, 3, 3, 2]
K -- [4, 1, 2, 1, 2, 1, 2, 5, 1, 1, 1, 1, 4, 2, 2, 1, 5, 1, 3, 4, 1]

Related

How to replace repeated items in the row of my array with zeros

I am trying to write code that replaces every run of three or more consecutive equal values in a row with zeros, so the three threes in the first row should become zeros. I wrote this code, which I think should work, but when I run it I seem to be stuck in an infinite loop.
import numpy as np
A = np.array([[1, 2, 3, 3, 3, 4],
              [1, 3, 2, 4, 2, 4],
              [1, 2, 4, 2, 4, 4],
              [1, 2, 3, 5, 5, 5],
              [1, 2, 1, 3, 4, 4]])
row_nmbr,column_nmbr = (A.shape)
row = 0
column = 0
while column < column_nmbr:
    next_col = column + 1
    next_col2 = next_col + 1
    if A[row][column] == A[row][next_col] and A[row][next_col] == A[row][next_col2]:
        A[row][column] = 0
        column =+ 1
print(A)
Avoid long if-else chains; they get messy quickly. Here's an approach that does not rely on them.
1. Iterate over each row and find its unique elements and their counts.
2. Keep the elements that occur three or more times.
3. Iterate over each of these filtered elements (val).
4. Find the indices of val in the given row.
5. Do a groupby on the indices from step 4 to find blocks of contiguous indices.
6. Check whether a block contains three or more indices.
7. If so, do the replacement.
The following sample code is scalable and handles several contiguous blocks in the same row.
import numpy as np
from functools import partial
from itertools import groupby
from operator import itemgetter
A = np.array([[3, 3, 5, 3, 3, 3, 5, 5, 5, 6, 6, 5, 5, 5],
              [1, 8, 8, 4, 7, 4, 7, 7, 7, 7, 1, 2, 3, 9],
              [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5],
              [1, 2, 3, 3, 3, 3, 3, 2, 1, 1, 1, 2, 2, 2],
              [1, 2, 1, 3, 4, 4, 9, 8, 8, 8, 8, 9, 9, 8]])
def func1d(row, replacement):
    # find and filter elements which occur three or more times
    vals, count = np.unique(row, return_counts=True)
    vals = vals[count >= 3]
    # iterate over each filtered element (val)
    for val in vals:
        # get indices of val from row
        indices = (row == val).nonzero()[0]
        # find contiguous indices: index minus position is constant within a block
        for k, g in groupby(enumerate(indices), lambda t: t[1] - t[0]):
            l = list(map(itemgetter(1), g))
            # if grouped indices are three or more, do replacement
            if len(l) >= 3:
                row[l] = replacement
    return row
wrapper = partial(func1d, replacement=0)
np.apply_along_axis(wrapper, 1, A)
Output, when compared with A:
# original array
[[3 3 5 3 3 3 5 5 5 6 6 5 5 5]
[1 8 8 4 7 4 7 7 7 7 1 2 3 9]
[1 1 1 2 2 2 3 3 3 4 4 4 4 5]
[1 2 3 3 3 3 3 2 1 1 1 2 2 2]
[1 2 1 3 4 4 9 8 8 8 8 9 9 8]]
# array with replaced values
[[3 3 5 0 0 0 0 0 0 6 6 0 0 0]
[1 8 8 4 7 4 0 0 0 0 1 2 3 9]
[0 0 0 0 0 0 0 0 0 0 0 0 0 5]
[1 2 0 0 0 0 0 2 0 0 0 0 0 0]
[1 2 1 3 4 4 9 0 0 0 0 9 9 8]]
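For comparison, the same rule can be expressed by grouping the values themselves rather than their indices. This is a minimal sketch for a single row given as a plain list; the function name zero_runs and its min_len parameter are hypothetical, not part of the answer above.

```python
from itertools import groupby


def zero_runs(row, min_len=3, replacement=0):
    """Return a copy of row with every run of >= min_len equal neighbours replaced."""
    out = []
    # groupby yields one group per run of consecutive equal values
    for _, group in groupby(row):
        run = list(group)
        if len(run) >= min_len:
            out.extend([replacement] * len(run))
        else:
            out.extend(run)
    return out


print(zero_runs([1, 2, 3, 3, 3, 4]))   # [1, 2, 0, 0, 0, 4]
print(zero_runs([1, 2, 4, 2, 4, 4]))   # [1, 2, 4, 2, 4, 4]
```

Because groupby only merges adjacent equal items, values that repeat three times without being neighbours are left alone.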
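Your loop is infinite because column never reaches column_nmbr. Note that column =+ 1 is parsed as column = +1: it assigns 1 rather than incrementing, so even when that line runs, column gets no further than 1. The increment should be column += 1, executed on every pass.
Here is a version that works for your array. Note that it zeroes every value that occurs three or more times in a row, whether or not the occurrences are adjacent:
for i in range(row_nmbr):
    m, k = np.unique(A[i], return_inverse=True)
    val = m[np.bincount(k) > 2]
    if len(val) > 0:
        aaa = A[i]
        # np.isin also handles rows where several values occur three or more times
        aaa[np.isin(A[i], val)] = 0
print(A)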
Output:
[[1 2 0 0 0 4]
[1 3 2 4 2 4]
[1 2 0 2 0 0]
[1 2 3 0 0 0]
[1 2 1 3 4 4]]
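The question's loop uses column =+ 1, which is easy to misread as an increment. A small sketch of the difference between =+ and +=; the variable names are made up for illustration.

```python
assigned = 0
for _ in range(3):
    assigned =+ 1      # parsed as assigned = (+1): a plain assignment

incremented = 0
for _ in range(3):
    incremented += 1   # augmented assignment: adds 1 on every pass

print(assigned)        # 1
print(incremented)     # 3
```

A counter updated with =+ 1 can never get past 1, so a condition such as column < column_nmbr stays true whenever column_nmbr is greater than 1.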

Compare 3 columns of a 2-D List and Replace based on conditions

I have a 2-D List as follows:
[
[6 4 4 2 5 5 4 5 4 1 3 5]
[4 3 6 5 4 4 5 1 5 5 2 4]
[2 5 2 0 4 5 4 4 2 3 2 6]
[5 5 4 3 5 4 6 7 3 4 4 4]
[3 5 6 5 6 5 3 5 3 4 7 4]
[4 5 5 4 5 4 7 5 3 5 4 1]
[2 5 3 3 5 3 4 4 3 3 1 3]
[2 5 5 2 5 4 6 2 5 6 2 5]
]
Conditions:
Compare columns 1, 5 and 9 (in steps of 4) row-wise and process them in the following order:
1. If one of them is zero, do nothing. Otherwise go to step 2.
   (6, 5, 4): none of them is zero, so go to step 2.
2. If they are all equal, change all of them to zero. If not, go to step 3.
3. Take the lowest of the three and subtract it from each of them.
Repeat this with the next three elements (2, 6, 10), and so on until (4, 8, 12).
How can I do this efficiently in Python using pandas, NumPy, or even plain list operations?
Any help appreciated. Thanks!
You could write a custom function and then apply that function to every row of the array.
def check_conditions(x):
    # i selects the column triple (i, i+4, i+8), zero-based
    for i in range(4):
        if x[i] == 0 or x[i+4] == 0 or x[i+8] == 0:
            continue
        elif x[i] == x[i+4] == x[i+8]:
            x[i] = 0
            x[i+4] = 0
            x[i+8] = 0
        else:
            min_val = min(x[i], x[i+4], x[i+8])
            x[i] -= min_val
            x[i+4] -= min_val
            x[i+8] -= min_val
    return x
new_arr = [check_conditions(x) for x in arr]
This gives the following result:
print(new_arr)
[[2, 3, 1, 0, 1, 4, 1, 3, 0, 0, 0, 3],
 [0, 0, 4, 4, 0, 1, 3, 0, 1, 2, 0, 3],
 [0, 2, 0, 0, 2, 2, 2, 4, 0, 0, 0, 6],
 [2, 1, 0, 0, 2, 0, 2, 4, 0, 0, 0, 1],
 [0, 1, 3, 1, 3, 1, 0, 1, 0, 0, 4, 0],
 [1, 1, 1, 3, 2, 0, 3, 4, 0, 1, 0, 0],
 [0, 2, 2, 0, 3, 0, 3, 1, 1, 0, 0, 0],
 [0, 1, 3, 0, 3, 0, 4, 0, 3, 2, 0, 3]]
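Since the question also asks about NumPy, the same rules can be applied without a Python-level loop. This is a minimal sketch, assuming the data converts to an integer array with exactly 12 columns; the name check_conditions_np is hypothetical. It relies on the fact that subtracting the minimum from three equal values already gives zeros, so steps 2 and 3 collapse into a single subtraction.

```python
import numpy as np


def check_conditions_np(arr):
    a = np.asarray(arr)
    # View each row as 3 blocks of 4 columns: g[r, j, i] == a[r, 4*j + i],
    # so the triple (i, i+4, i+8) lies along axis 1.
    g = a.reshape(len(a), 3, 4)
    has_zero = (g == 0).any(axis=1, keepdims=True)
    mins = g.min(axis=1, keepdims=True)
    # Triples containing a zero stay untouched; all others lose their minimum.
    return np.where(has_zero, g, g - mins).reshape(a.shape)


arr = [[6, 4, 4, 2, 5, 5, 4, 5, 4, 1, 3, 5],
       [2, 5, 2, 0, 4, 5, 4, 4, 2, 3, 2, 6]]
result = check_conditions_np(arr)
print(result)
```

Unlike the list-based function, this version returns a new array and leaves its input unchanged.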

Python dataframe repeat column data in each cell as a list

I am trying to repeat the whole data of a column in each cell of that column.
My code:
df3 = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [10, 20, 30, 20, 10],
    'z': [5, 4, 3, 2, 1]
})
df3 =
   x   y  z
0  1  10  5
1  2  20  4
2  3  30  3
3  4  20  2
4  5  10  1
df3['z'] = df['z'].agg(lambda x: list(x))
Present output:
KeyError: 'z'
Expected output:
df=
   x   y                z
0  1  10  [5, 4, 3, 2, 1]
1  2  20  [5, 4, 3, 2, 1]
2  3  30  [5, 4, 3, 2, 1]
3  4  20  [5, 4, 3, 2, 1]
4  5  10  [5, 4, 3, 2, 1]
Another way is to use list(df3.z.values):
df3.assign(z=[list(df3.z.values)]*len(df3))
   x   y                z
0  1  10  [5, 4, 3, 2, 1]
1  2  20  [5, 4, 3, 2, 1]
2  3  30  [5, 4, 3, 2, 1]
3  4  20  [5, 4, 3, 2, 1]
4  5  10  [5, 4, 3, 2, 1]
You can also try:
df3['new_z']=[df3.z.tolist()]*len(df3)
Safer, because every row gets its own list object instead of a shared reference:
df3['new_z']=[df3.z.tolist() for x in df3.index]
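The difference between the two forms shows up when a list is later modified in place: multiplying a one-element list repeats a reference to the same inner list, while a comprehension builds a new list each time. A small sketch with plain Python lists (no pandas needed); the variable names are made up for illustration.

```python
values = [5, 4, 3, 2, 1]

shared = [values] * 3                            # three references to ONE list
independent = [list(values) for _ in range(3)]   # three separate copies

shared[0].append(99)        # changes every entry of shared
independent[0].append(99)   # changes only the first copy

print(shared[1])            # [5, 4, 3, 2, 1, 99]
print(independent[1])       # [5, 4, 3, 2, 1]
```

If the lists in the column are only ever read, both forms behave the same.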

Translate reshape from Matlab to Python

I'm using NumPy and I don't know how to translate this MATLAB code to Python:
C = reshape(A(B.',:).', 6, []).';
I think the only part I got right is:
temp=A[B.transpose(),:]
but I don't know how to translate the rest of it.
Example matrices:
A =
1 2
1 3
1 4
1 5
1 6
2 3
2 4
2 5
2 6
B =
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 2 8
1 2 9
C =
1 2 1 3 1 4
1 2 1 3 1 5
1 2 1 3 1 6
1 2 1 3 2 3
1 2 1 3 2 4
1 2 1 3 2 5
1 2 1 3 2 6
This looks like an indexing plus reshaping operation. One thing to keep in mind is that NumPy is zero-indexed, while MATLAB is one-indexed. That means you need to index A with B - 1, and then reshape the result as desired. For example:
import numpy as np
A = np.array([[1, 2],
              [1, 3],
              [1, 4],
              [1, 5],
              [1, 6],
              [2, 3],
              [2, 4],
              [2, 5],
              [2, 6]])
B = np.array([[1, 2, 3],
              [1, 2, 4],
              [1, 2, 5],
              [1, 2, 6],
              [1, 2, 7],
              [1, 2, 8],
              [1, 2, 9]])
C = A[B - 1].reshape(B.shape[0], -1)
The result is:
>>> C
array([[1, 2, 1, 3, 1, 4],
       [1, 2, 1, 3, 1, 5],
       [1, 2, 1, 3, 1, 6],
       [1, 2, 1, 3, 2, 3],
       [1, 2, 1, 3, 2, 4],
       [1, 2, 1, 3, 2, 5],
       [1, 2, 1, 3, 2, 6]])
One potentially confusing piece: the -1 in the reshape method is a marker that indicates numpy should calculate the appropriate dimension to preserve the size of the array.
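To see why the short form is equivalent, the MATLAB expression can also be translated one step at a time. This is a sketch only; it assumes that order="F" is used wherever MATLAB relies on its column-major layout, and the intermediate names idx, step1 and step2 are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2], [1, 3], [1, 4], [1, 5], [1, 6],
              [2, 3], [2, 4], [2, 5], [2, 6]])
B = np.array([[1, 2, 3], [1, 2, 4], [1, 2, 5], [1, 2, 6],
              [1, 2, 7], [1, 2, 8], [1, 2, 9]])

# MATLAB walks the index matrix B.' column by column (column-major order)
idx = B.T.ravel(order="F") - 1            # shift to zero-based indices
step1 = A[idx, :].T                       # A(B.',:).'         -> shape (2, 21)
step2 = step1.reshape(6, -1, order="F")   # reshape(..., 6, []) -> shape (6, 7)
C_literal = step2.T                       # trailing .'        -> shape (7, 6)

# The short form gives the same result
C_short = A[B - 1].reshape(B.shape[0], -1)
print(np.array_equal(C_literal, C_short))  # True
```

The transposes in the MATLAB code exist mainly to work around column-major ordering, which is why they disappear in the row-major NumPy version.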

how to separate columns of itertools.product to matrices/arrays

I have a dataset of about 3 items, e.g. [1,2,3].
I want to find its product with 3 repeats and then separate the result into 3 datasets like this (they should actually be vertical):
[1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3]
[1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3]
[1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]
I noticed that in Python I can use itertools.product to compute the product:
data_prod=itertools.product(data,repeat=3)
Now my question is: how can I convert each column of the result (whose type is itertools.product) into 3 new datasets, as shown in the example above?
Use zip(*..) to turn columns into rows:
dataset1, dataset2, dataset3 = zip(*itertools.product(data,repeat=3))
Demo:
>>> list(zip(*itertools.product(data,repeat=3)))
[(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3), (1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3), (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)]
>>> dataset1, dataset2, dataset3 = zip(*itertools.product(data,repeat=3))
>>> dataset1
(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
>>> dataset2
(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3)
>>> dataset3
(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
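The zip(*...) idiom is easier to follow on a smaller input. A minimal sketch with two items and two repeats; the variable names are made up for illustration. In Python 3, zip returns an iterator, so each column is converted to a list here.

```python
import itertools

data = [1, 2]
rows = list(itertools.product(data, repeat=2))
print(rows)      # [(1, 1), (1, 2), (2, 1), (2, 2)]

# Unpacking the rows into zip pairs up the first items, then the second items
first, second = (list(col) for col in zip(*rows))
print(first)     # [1, 1, 2, 2]
print(second)    # [1, 2, 1, 2]
```

Each output list corresponds to one column of the product, which is what the three datasets in the question are.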
An alternate way, for display purposes, still using itertools.product:
import itertools
import pandas as pd
cols=['series1', 'series2', 'series3']
originDataset = [1,2,3]
data_prod = lambda x: list(itertools.product(x, repeat=3))
df1 = pd.DataFrame(originDataset, columns=['OriginalDataSet'])
df2 = pd.DataFrame(data_prod(originDataset), columns=cols)
print(df1)
print('-'*80)
print(df2)
print('-'*80)
series1, series2, series3 = df2.T.values
print(series1)
print(series2)
print(series3)
Output:
   OriginalDataSet
0                1
1                2
2                3
--------------------------------------------------------------------------------
    series1  series2  series3
0         1        1        1
1         1        1        2
2         1        1        3
3         1        2        1
4         1        2        2
5         1        2        3
6         1        3        1
7         1        3        2
8         1        3        3
9         2        1        1
10        2        1        2
11        2        1        3
12        2        2        1
13        2        2        2
14        2        2        3
15        2        3        1
16        2        3        2
17        2        3        3
18        3        1        1
19        3        1        2
20        3        1        3
21        3        2        1
22        3        2        2
23        3        2        3
24        3        3        1
25        3        3        2
26        3        3        3
--------------------------------------------------------------------------------
[1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3]
[1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3]
[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
I hope this also helps you learn how to use pandas.
