Turning numpy array into list of lists without zip - python

I want to turn my array, which consists of 2 lists, into a ranked list.
Currently my code produces :
[['txt1.txt' 'txt2.txt' 'txt3.txt' 'txt4.txt' 'txt5.txt' 'txt6.txt'
'txt7.txt' 'txt8.txt']
['0.13794219565502694' '0.024652340886571225' '0.09806335128916213'
'0.07663118536707426' '0.09118273488073968' '0.06278926571143634'
'0.05114729750522118' '0.02961812647701087']]
I want to make it so that txt1.txt goes with the first value, txt2 goes with the second value etc.
So something like this
[['txt1.txt', '0.13794219565502694'], ['txt2.txt', '0.024652340886571225']... etc ]]
I do not want it to become tuples by using zip.
My current code:
def rankedmatrix():
    matrix = numpy.array([names,x])
    ranked_matrix = sorted(matrix.tolist(), key=lambda score: score[1], reverse=True)
    print(ranked_matrix)
Names being :
names = ['txt1.txt', 'txt2.txt', 'txt3.txt', 'txt4.txt', 'txt5.txt', 'txt6.txt', 'txt7.txt', 'txt8.txt']
x being:
x = [0.1379422 0.01540234 0.09806335 0.07663119 0.09118273 0.06278927
0.0511473 0.02961813]
Any help is appreciated.

You can get the list of lists with zip as well:
x = [['txt1.txt', 'txt2.txt', 'txt3.txt', 'txt4.txt', 'txt5.txt', 'txt6.txt',
'txt7.txt', 'txt8.txt'], ['0.13794219565502694', '0.024652340886571225', '0.09806335128916213',
'0.07663118536707426', '0.09118273488073968', '0.06278926571143634',
'0.05114729750522118', '0.02961812647701087']]
res = [[e1, e2] for e1, e2 in zip(x[0], x[1])]
print(res)
Output:
[['txt1.txt', '0.13794219565502694'], ['txt2.txt', '0.024652340886571225'], ['txt3.txt', '0.09806335128916213'], ['txt4.txt', '0.07663118536707426'], ['txt5.txt', '0.09118273488073968'], ['txt6.txt', '0.06278926571143634'], ['txt7.txt', '0.05114729750522118'], ['txt8.txt', '0.02961812647701087']]

You can use map to convert each tuple produced by zip into a list.
list(map(list, zip(names, x)))
[['txt1.txt', 0.1379422],
['txt2.txt', 0.01540234],
['txt3.txt', 0.09806335],
['txt4.txt', 0.07663119],
['txt5.txt', 0.09118273],
['txt6.txt', 0.06278927],
['txt7.txt', 0.0511473],
['txt8.txt', 0.02961813]]
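Neither answer above produces the ranked list the question asks for. One thing worth knowing: numpy.array([names, x]) mixes strings and floats, so NumPy stores both rows as strings, and a sort on that matrix would compare text rather than numbers. A minimal sketch that pairs and ranks in plain Python, assuming x is available as a plain list of floats (the helper name ranked_pairs is made up for illustration):

```python
# Sample data from the question. There x is printed as a NumPy array;
# a plain list of floats is assumed here to keep the sketch dependency-free.
names = ['txt1.txt', 'txt2.txt', 'txt3.txt', 'txt4.txt',
         'txt5.txt', 'txt6.txt', 'txt7.txt', 'txt8.txt']
x = [0.1379422, 0.01540234, 0.09806335, 0.07663119,
     0.09118273, 0.06278927, 0.0511473, 0.02961813]


def ranked_pairs(names, scores):
    """Pair each name with its score and sort by score, highest first."""
    # float() keeps the scores numeric, so the sort is not alphabetical
    pairs = [[name, float(score)] for name, score in zip(names, scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = ranked_pairs(names, x)
print(ranked[:3])
# [['txt1.txt', 0.1379422], ['txt3.txt', 0.09806335], ['txt5.txt', 0.09118273]]
```

The inner lists are built directly in the comprehension, so no tuples survive in the result.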

Related

Extracting Unique RGB Values from List

I created a list of RGB values for an image (let's say it's 3D_image, composed of 3D_image_slice). I want to extract the unique RGB values from it, but I'm running into problems.
rgb_values_unique = []
for 3D_image_slice in 3D_image:
    for y in range(3D_image_slice.shape[0]):
        for x in range(3D_image_slice.shape[1]):
            if 3D_image_slice[y, x] not in rgb_values_unique:
                rgb_values_unique.append(3D_image_slice[y, x])
I was thinking of using np.unique, but that doesn't apply to lists. Is there another way to find unique values within a list?
You have a couple of easy options; one is to build a set of unique strings (I don't love this since it changes the datatype, but it would look something like this):
rgb_values_unique = set()
# image_3d / image_slice stand in for 3D_image / 3D_image_slice,
# because a Python identifier cannot start with a digit
for image_slice in image_3d:
    for y in range(image_slice.shape[0]):
        for x in range(image_slice.shape[1]):
            # join only accepts strings, so convert each channel value first
            rgb_values_unique.add("-".join(map(str, image_slice[y, x])))
print(rgb_values_unique)
# {"r0-g0-b0", "r1-g1-b1", ...}
# which you could convert back into numbers like this:
result = [[int(j) for j in i.split("-")] for i in rgb_values_unique]
What I'd probably do is leverage the uniqueness of dictionary keys:
rgb_values_unique = {}
# image_3d / image_slice stand in for 3D_image / 3D_image_slice,
# because a Python identifier cannot start with a digit
for image_slice in image_3d:
    for y in range(image_slice.shape[0]):
        for x in range(image_slice.shape[1]):
            r, g, b = image_slice[y, x]
            # a set at the innermost level keeps repeated b values from piling up
            rgb_values_unique.setdefault(r, {}).setdefault(g, set()).add(b)
print(rgb_values_unique)
# {r0: {g0: {b0, b1, b2}, g1: ...
Which you can then turn into a unique listing as follows:
result = [(r,g,b) for r,v in rgb_values_unique.items() for g, b_values in v.items() for b in b_values]
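You can use a set to keep only the unique values:
rgb_values_unique = set()
# image_3d / image_slice stand in for 3D_image / 3D_image_slice,
# because a Python identifier cannot start with a digit
for image_slice in image_3d:
    for y in range(image_slice.shape[0]):
        for x in range(image_slice.shape[1]):
            # arrays are not hashable, so store each pixel as a tuple
            rgb_values_unique.add(tuple(image_slice[y, x]))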
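Since the question mentions np.unique, it may help to see that it can deduplicate whole rows once the pixels are laid out as an (N, 3) array. This is only a sketch under the assumption that all slices have the same shape and can be stacked into one array; the sample data and the name image_3d are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the question's image stack:
# 2 slices of 2x2 pixels with 3 channels each.
image_3d = np.array([
    [[[255, 0, 0], [0, 255, 0]],
     [[255, 0, 0], [0, 0, 255]]],
    [[[0, 255, 0], [0, 0, 255]],
     [[10, 20, 30], [255, 0, 0]]],
], dtype=np.uint8)

# Flatten every pixel into one row of (r, g, b) ...
pixels = image_3d.reshape(-1, 3)
# ... then let NumPy deduplicate whole rows instead of single values.
rgb_values_unique = np.unique(pixels, axis=0)
print(rgb_values_unique.tolist())
```

If the slices are kept in a Python list, np.asarray(list_of_slices) would produce the stacked array first, again assuming equal shapes.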

How to sort a list of strings by frequency?

I have a list of files
example_list = ['7.gif', '8.gif', '123.html']
There are over 700k elements and I need to sort them by frequency to see the most accessed file and least accessed file.
for i in resl:
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)
When I run this it never finishes. I have tried other methods, but with no results.
Your algorithm is unnecessarily quadratic in time. The following is linear:
from collections import Counter
resl2 = [k for k,v in Counter(resl).items() if v > 500]
If you need them sorted, then do something like
resl2 = [(k,v) for k,v in Counter(resl).items() if v > 500]
resl2.sort(key=lambda kv: kv[1])
resl2 = [k for k,v in resl2]
From your comment:
I just need to find out which file occurs the most.
So:
import statistics
statistics.mode(example_list)
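Note that i represents an element of the list and not an integer index, so resl.count(i) runs once for every element, duplicates included:
for i in resl:
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)
Change it to loop over the unique elements, so each file is counted and appended only once:
resl2 = []
for i in set(resl):
    if resl.count(i) > 500:
        resl2.append(i)
print(resl2)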
You can do this trick using a set ;)
Here is a minimal example for a list of files, keeping the ones that appear 2 or more times:
files = ['10.gif', '8.gif', '0.gif', '0.doc', '0.gif', '0.gif', '0.tmp', '0.doc', '0.gif']
file_set = set(files)
files_freq = [0]*len(file_set)
for n,file in enumerate(file_set):
    files_freq[n] = files.count(file)
sorted_list = [f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]
print(sorted_list)
and the output will be: ['0.gif', '0.doc']
The set reduces the list to one occurrence of each file, and the loop counts how often each file appears.
After that, the spooky list comprehension is the trick!
[f for n,f in sorted(zip(files_freq, file_set), key=lambda x: x[0], reverse=True) if n >= 2]
This keeps only the files that appeared 2 or more times. The key argument makes sorted use the first element of each pair from zip(files_freq, file_set), which is the frequency, and reverse=True sorts in descending order, so the highest frequencies come first.
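For comparison, here is a minimal sketch of the same ranking with collections.Counter, whose most_common() method returns (item, count) pairs sorted from most to least frequent. It reuses the sample list from the answer above; the variable names are chosen for illustration only.

```python
from collections import Counter

files = ['10.gif', '8.gif', '0.gif', '0.doc', '0.gif',
         '0.gif', '0.tmp', '0.doc', '0.gif']

counts = Counter(files)          # a single pass over the list
ranked = counts.most_common()    # (file, count) pairs, most frequent first

most_accessed = ranked[0]        # pair with the highest count
least_accessed = ranked[-1]      # one of the files with the lowest count
frequent = [name for name, n in ranked if n >= 2]

print(most_accessed, frequent)
# ('0.gif', 4) ['0.gif', '0.doc']
```

Because the counting happens in one pass, this avoids calling files.count once per unique file, which matters with 700k elements.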

Taking two values from two list (Random Order) of tuples and multiplying

I have two lists and they are lists of tuples.
For example
List1 = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
List2 = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
If the items were in the same order I could use the following code to multiply the two values:
val = [(t1, v1*v2) for (t1, v1), (t2, v2) in zip(tf,idf)]
But my issue is that the order of one of the lists is random, so the code doesn't work. So essentially I need to check whether the word in one list matches the word in the other and then multiply, to get an output in the same form as the list of tuples.
This question excellently demonstrates the advantages of the dictionary data structure and how your problem could benefit from it. So first, we convert your list of tuples to dictionaries (dict-calls) and then you "combine" the two dicts as per your requirement to get the desired result.
lst1 = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
lst2 = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
dct1 = dict(lst1)
dct2 = dict(lst2)
res = {k: v * dct2.get(k, 1) for k, v in dct1.items()}.items()
which produces:
dict_items([('zaidan', 1.8410476297432288e-06), ('zimmerman', 1.8410476297432288e-06), ('ypa', 1.656942866768906e-05)])
And if the dict_item data type is confusing, you can always cast it to a vanilla-list.
res = list(res)
print(res)
# [('zaidan', 1.8410476297432288e-06), ('zimmerman', 1.8410476297432288e-06), ('ypa', 1.656942866768906e-05)]
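Note that dct2.get(k, 1) leaves a value unchanged when its word is missing from the second list. If unmatched words should be dropped instead, a sketch along these lines may be closer to what is wanted; the function name multiply_matching and the sample numbers are made up for illustration, while tf and idf follow the names used in the question.

```python
# Hypothetical sample data: 'extra' has no partner in idf.
tf = [('zaidan', 0.5), ('zimmerman', 0.25), ('ypa', 2.0), ('extra', 9.0)]
idf = [('zimmerman', 4.0), ('ypa', 0.5), ('zaidan', 2.0)]


def multiply_matching(pairs_a, pairs_b):
    """Multiply the values of words present in both lists, in pairs_a order."""
    lookup = dict(pairs_b)  # word -> value, so order in pairs_b is irrelevant
    return [(word, value * lookup[word])
            for word, value in pairs_a
            if word in lookup]  # words without a partner are skipped


val = multiply_matching(tf, idf)
print(val)
# [('zaidan', 1.0), ('zimmerman', 1.0), ('ypa', 1.0)]
```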
The easiest solution, if both lists contain the same words, is to sort them first:
ls1 = sorted(ls1, key=lambda tup: tup[0])
ls2 = sorted(ls2, key=lambda tup: tup[0])
val = [(t1, v1*v2) for (t1, v1), (t2, v2) in zip(ls1,ls2)]
If, for any reason, you do not want to use a dictionary (although it is the superior solution) and want to do this with lists and tuples, what you are looking for is looping through both lists and checking for equality:
x = [('zaidan', 0.0013568521031207597),('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279)]
y = [('zimmerman', 0.0013568521031207597), ('ypa', 0.004070556309362279), ('zaidan', 0.0013568521031207597)]
z = []
for item in x:
    for _item in y:
        if item[0] == _item[0]:
            z.append((item[0], item[1]*_item[1]))
At the end, z will be a list of tuples with the original string at the 0 index and the result of multiplication at the 1 index.

coupling str elements from a list to a tuple list

I have the following list:
lines
['line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North','line_Mid_North' ]
I would like to couple them in a tuple list as follows, with respect to their names:
tuple_list
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
I thought maybe I could do a string search in the elements of lines, but it won't be efficient. Is there a better way to order the elements of lines so that the result looks like tuple_list?
Pairing criteria:
Both elements have the same area names ('North', 'Mid', 'South'), in either order.
E.g.: 'line_North_Mid' should be coupled with 'line_Mid_North'
Try this:
from itertools import combinations
tuple_list = [i for i in combinations(lines,2) if i[0].split('_')[1] == i[1].split('_')[2] and i[0].split('_')[2] == i[1].split('_')[1]]
or I think this is better:
[i for i in combinations(lines,2) if i[0].split('_')[1:] == i[1].split('_')[1:][::-1]]
An order-agnostic O(n) solution is possible using collections.defaultdict. The idea is to use as our dictionary keys the last 2 components of your strings delimited by '_', appending values from your input list. Then extract values and convert to a list of tuples.
from collections import defaultdict
L = ['line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North', 'line_Mid_North']
dd = defaultdict(list)
for item in L:
    dd[frozenset(item.rsplit('_', maxsplit=2)[1:])].append(item)
res = list(map(tuple, dd.values()))
# [('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North')]
You can use the following list comprehension:
lines = ['line_Mid_North', 'line_North_Mid',
'line_North_South', 'line_South_North',
'line_Mid_South', 'line_South_Mid']
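# each pair is found twice, once per partner; partners are adjacent in lines,
# so [::2] keeps one copy of each pair
[(i, j) for i in lines for j in lines if j not in i
 if set(j.split('_')[1:]) < set(i.split('_'))][::2]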
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
I suggest you write a function that returns the same key for strings that are supposed to be together (a grouping key).
def key(s):
    # ignore first part and sort other 2 parts, so they will always be in same order
    _, part_1, part_2 = s.split('_')
    return tuple(sorted([part_1, part_2]))
Then you have to use some grouping method; I used defaultdict, for example:
import collections
lines = [
'line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North','line_Mid_North',
]
dd = collections.defaultdict(list)
for s in lines:
    dd[key(s)].append(s) # those with same key get grouped
print(list(tuple(v) for v in dd.values()))
# [
# ('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North'),
# ]

I want to convert the categorical variable to numerical in Python

I have a dataframe having categorical variables. I want to convert them to the numerical using the following logic:
I have 2 lists: one contains the distinct categorical values in the column, and the second contains the value for each category. Now I need to map these values in place of the categorical values.
For example:
List_A = ['A','B','C','D','E']
List_B = [3,2,1,1,2]
I need to replace A with 3, B with 2, C and D with 1 and E with 2.
Is there any way to do this in Python?
I can do this by applying multiple for loops but I am looking for some easier way or some direct function if there is any.
Any help is very much appreciated, Thanks in Advance.
Create a mapping dict
List_A = ['A','B','C','D','E',]
List_B = [3,2,1,1,2]
d=dict(zip(List_A, List_B))
new_list=['A','B','C','D','E','A','B']
new_mapped_list=[d[v] for v in new_list if v in d]
new_mapped_list
Or define a function and use map
List_A = ['A','B','C','D','E',]
List_B = [3,2,1,1,2]
d=dict(zip(List_A, List_B))
def mapper(value):
    if value in d:
        return d[value]
    return None
new_list=['A','B','C','D','E','A','B']
list(map(mapper, new_list))  # map returns a lazy iterator in Python 3, so wrap it in list
Suppose df is your data frame and "Category" is the name of the column holding your categories:
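# .loc with the column name changes only the Category column, not the whole row
df.loc[df.Category == "A", "Category"] = 3
df.loc[(df.Category == "B") | (df.Category == "E"), "Category"] = 2
df.loc[(df.Category == "C") | (df.Category == "D"), "Category"] = 1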
If you only need to replace the values in one list with the values of the other, and the structure is as you describe (two lists of the same length, matched by position), then you only need this:
list_a = []
list_a = list_b
A more convoluted solution would be like this, with a function that will create a dictionary that you can use on other lists:
# we make a function
def convert_list(ls_a,ls_b):
    dic_new = {}
    for letter,number in zip(ls_a,ls_b):
        dic_new[letter] = number
    return dic_new
This builds a dictionary with the combinations you need. You pass in the two lists, and then you can use that dictionary on other lists:
List_A = ['A','B','C','D','E']
List_B = [3,2,1,1,2]
dic_new = convert_list(List_A, List_B)
other_list = ['a','b','c','d']
for _ in other_list:
    print(dic_new[_.upper()])
# prints
# 3
# 2
# 1
# 1
cheers
You could use an encoder from the scikit-learn machine learning library:
OneHotEncoder
LabelEncoder
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
The pandas "hard" way:
https://stackoverflow.com/a/29330853/9799449
