Compare 3 arrays to find patterns and format output - python

I have 3 arrays that contain the same customer names in different orders. What I am trying to do is the following:
1 - Compare the arrays by customer name and match the rows that share a name, thus solving the problem of the random order;
2 - After obtaining this comparison, the output should be as follows:
Result: CLIENT1, '2', '3', '2', '3', '2', '3'
...
The output should look like this: client name, values contained in array1 for this client, values contained in array2 for this client, and values contained in array3 for this client.
Problem: I can only perform this operation with 2 arrays, not 3. Besides, I am having difficulty formatting the output in the established pattern.
ARRAYS
######################################################################################
# Create arrays
######################################################################################
array1 = [['CLIENT1', '2', '3'],['CLIENT2', '3', '4'],['CLIENT3', '4', '5']]
array2 = [['CLIENT3', '2', '3'],['CLIENT2', '3', '4'],['CLIENT1', '4', '5']]
array3 = [['CLIENT2', '2', '3'],['CLIENT1', '3', '4'],['CLIENT3', '4', '5']]
SEARCH
######################################################################################
# Check and align
######################################################################################
for line1 in array1:
    for line2 in array2:
        if line1[0].upper().__contains__(line2[0].upper()):
            # print of results

I suggest using an intermediate dictionary to store the values of every processed client. Using this structure you can store any information from the clients (with repetitions or without) and you can easily parse the output.
Here is the code:
# Initialize values
array1 = [['CLIENT1', '2', '3'], ['CLIENT2', '3', '4'], ['CLIENT3', '4', '5']]
array2 = [['CLIENT3', '2', '3'], ['CLIENT2', '3', '4'], ['CLIENT1', '4', '5']]
array3 = [['CLIENT2', '2', '3'], ['CLIENT1', '3', '4'], ['CLIENT3', '4', '5']]
# Initialize a dictionary with key = client name, value = list of client entries
result = {}
# Add values from array1
for client_info in array1:
    # Parse current entry
    client_name = client_info[0]
    client_values = client_info[1:]
    # Add previous values if they exist
    if client_name in result:
        client_values.extend(result[client_name])
    # Update clients dictionary
    result[client_name] = client_values
# Add values from array2
for client_info in array2:
    # Parse current entry
    client_name = client_info[0]
    client_values = client_info[1:]
    # Add previous values if they exist
    if client_name in result:
        client_values.extend(result[client_name])
    # Update clients dictionary
    result[client_name] = client_values
# Add values from array3
for client_info in array3:
    # Parse current entry
    client_name = client_info[0]
    client_values = client_info[1:]
    # Add previous values if they exist
    if client_name in result:
        client_values.extend(result[client_name])
    # Update clients dictionary
    result[client_name] = client_values
# Print result information
for client_name, client_values in result.items():
    print("Result: " + str(client_name) + ", " + str(client_values))
And the obtained output:
Result: CLIENT1, ['3', '4', '4', '5', '2', '3']
Result: CLIENT2, ['2', '3', '3', '4', '3', '4']
Result: CLIENT3, ['4', '5', '2', '3', '4', '5']
If you want to display ONLY the clients that appear in all 3 lists, skip the dictionary update in the array2 and array3 loops when the client name is not already a key of result.
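A minimal sketch of that idea, generalised to any number of arrays. It appends values in the order array1, array2, array3 (the order the question asked for) and keeps only the clients found in every array. The helper name merge_clients is hypothetical and is not part of the answer above.

```python
def merge_clients(*arrays):
    """Merge rows that share a client name, keeping the order of the arrays.

    merge_clients is a hypothetical helper name used for illustration.
    """
    merged = {}   # client name -> values collected so far
    seen_in = {}  # client name -> number of arrays that contain it
    for array in arrays:
        names_in_array = set()
        for name, *values in array:
            merged.setdefault(name, []).extend(values)
            names_in_array.add(name)
        for name in names_in_array:
            seen_in[name] = seen_in.get(name, 0) + 1
    # Keep only the clients that appear in every array
    return {name: values for name, values in merged.items()
            if seen_in[name] == len(arrays)}


array1 = [['CLIENT1', '2', '3'], ['CLIENT2', '3', '4'], ['CLIENT3', '4', '5']]
array2 = [['CLIENT3', '2', '3'], ['CLIENT2', '3', '4'], ['CLIENT1', '4', '5']]
array3 = [['CLIENT2', '2', '3'], ['CLIENT1', '3', '4'], ['CLIENT3', '4', '5']]

merged = merge_clients(array1, array2, array3)
for name, values in merged.items():
    # Join name and values into one comma-separated line
    print("Result: " + ", ".join([name] + values))
```

Joining the name and the values with ", ".join avoids printing the list brackets that str(client_values) produces.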

array1 = [['CLIENT1', '2', '3'],['CLIENT2', '3', '4'],['CLIENT3', '4', '5']]
array2 = [['CLIENT3', '2', '3'],['CLIENT2', '3', '4'],['CLIENT1', '4', '5']]
array3 = [['CLIENT2', '2', '3'],['CLIENT1', '3', '4'],['CLIENT3', '4', '5']]
def search(key, arrays):
    result = []
    for array in arrays:
        for lst in array:
            if lst[0] == key:
                result.extend(lst[1:])
    return 'Result: {key}, {values}'.format(key=key, values=', '.join(result))
print(search('CLIENT1', (array1, array2, array3)))
Output:
Result: CLIENT1, 2, 3, 4, 5, 3, 4

Related

Converting string to 2D list

def convert_to_list(VertexList):
    VerticesList = []
    items = VertexList.split(';')
    for item in items:
        i = item.split(',')
        SubList = []
        for item in i:
            SubList.append(item)
        VerticesList.append(SubList)
    return VerticesList
This code converts a string in the format below to a 2D list. However, I am sure it can be optimized.
Input -> '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
Output -> [['1', '2', '4', '5', '6', '7'], ['2', '3', '4', '5', '6', '7', '8'], ['1', '2', '4', '5', '6', '8']]
Use a comprehension.
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
print([s.split(',') for s in inp.split(';')])
Results in
[['1', '2', '4', '5', '6', '7'], ['2', '3', '4', '5', '6', '7', '8'], ['1', '2', '4', '5', '6', '8']]
This is shorter, easier-to-read code, which is part of the optimization I expect you were looking for. It doesn't loop through things any fewer times, but it executes fewer assignments, uses fewer temporary variables, and makes fewer explicit function calls (i.e. append()). Some of those calls may still happen behind the scenes in the comprehension, but you benefit from whatever optimizations Python applies to comprehensions.
--update--
Check out this answer for a performance analysis of the OP and this answer.
-- update 2 --
To convert all strings to int, you can use map or another comprehension.
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
print([list(map(int, s.split(','))) for s in inp.split(';')])
or
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
print([[int(c) for c in s.split(',')] for s in inp.split(';')])
This is not a solution, only a comparison of the actual performance of the two versions above:
from timeit import Timer
code1 = """\
def convert_to_list(VertexList):
    VerticesList = []
    items = VertexList.split(';')
    for item in items:
        i = item.split(',')
        SubList = []
        for item in i:
            SubList.append(item)
        VerticesList.append(SubList)
    return VerticesList
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
convert_to_list(inp)
"""
code2 = """\
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
out = [s.split(',') for s in inp.split(';')]
"""
t = Timer(stmt=code1)
time1 = t.timeit() # 1000000 iteration by default
print(f"Original time:{round(time1, 6)} sec.")
t = Timer(stmt=code2)
time2 = t.timeit() # 1000000 iteration by default
print(f"New time: {round(time2, 6)} sec.")
print(f'New solution faster in = {round(time1 / time2, 1)} times')
Output:
Original time:1.812856 sec.
New time: 0.741987 sec.
New solution faster in = 2.4 times

Can't modify list of lists to have a customized length

I've written a script to make the length of every list at least 3, whatever its current length.
Currently the list of lists I have:
item_list = [['1','2'],['3','4','5'],['2','4','5'],['1']]
I've tried with:
item_list = [['1','2'],['3','4','5'],['2','4','5'],['1']]
for item in item_list:
    if len(item)<3:
        item.extend([""])
    elif len(item)<2:
        item.extend([""]*2)
print(item_list)
Output I'm getting:
[['1', '2', ''], ['3', '4', '5'], ['2', '4', '5'], ['1', '']]
Desired output:
[['1', '2', ''], ['3', '4', '5'], ['2', '4', '5'], ['1', '','']]
How can I make the length of all the lists at least 3 irrespective of their current length?
for item in item_list:
    item += ['']*(3-len(item))
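The one-liner works because multiplying a list by zero or by a negative number yields an empty list, so rows that already have three or more items are left unchanged. A small sketch wrapping it in a function; the name pad_rows and the min_len and fill parameters are hypothetical:

```python
def pad_rows(rows, min_len=3, fill=''):
    """Pad each row in place so that it has at least min_len items."""
    for row in rows:
        # [fill] * n is [] when n <= 0, so long rows are untouched
        row += [fill] * (min_len - len(row))
    return rows


item_list = [['1', '2'], ['3', '4', '5'], ['2', '4', '5'], ['1'], ['6', '7', '8', '9']]
print(pad_rows(item_list))
```

The extra four-item row is only there to show that longer rows are not truncated.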
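You have written the conditions in the wrong order: a list with fewer than 2 items also has fewer than 3, so the first branch always catches it and the elif never runs. Test the stricter condition first: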
item_list = [['1','2'],['3','4','5'],['2','4','5'],['1']]
for item in item_list:
    if len(item)<2:
        item.extend([""]*2)
    elif len(item)<3:
        item.extend([""])
print(item_list)

python slice set in list

I would like to slice a set within a list, but every time I do so, I get an empty list in return.
What I am trying to accomplish (maybe there is an easier way):
I have a list of sets
each set has 5 items
I would like to compare a new set against the list (to check whether the set already exists in the list)
the first and the last item in the set are irrelevant for the comparison, so only positions 2-4 are used when searching for already existing sets
Here is my code:
result_set = ['1', '2', '3', '4', '5']
result_matrix = []
result_matrix.append(result_set)
slicing the set is no problem:
print result_set[1:4]
['2', '3', '4']
print result_matrix[:][1:4]
[]
i would expect:
[['2', '3', '4']]
I think this is what you want to do:
>>> target_set = ['2', '3', '4']
>>> any([l for l in result_matrix if target_set == l[1:-1]])
True
>>> target_set = ['1', '2', '3']
>>> any([l for l in result_matrix if target_set == l[1:-1]])
False
Generalising and making that a function:
def is_set_in_matrix(target_set, matrix):
    return any(True for l in matrix if list(target_set) == l[1:-1])
>>> result_matrix = [['1', '2', '3', '4', '5']]
>>> is_set_in_matrix(['1', '2', '3'], result_matrix)
False
>>> is_set_in_matrix(['2', '3', '4'], result_matrix)
True
# a quirk - it also works with strings...
>>> s = '234'
>>> is_set_in_matrix(s, result_matrix)
True
Note that I have used l[1:-1] to ignore the first and last elements of the "set" in the comparison. This is more flexible should you ever need sets of different lengths.
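If the matrix grows large, scanning every row for each new "set" gets slow. One possible variation, sketched here in Python 3 under the assumption that the items are hashable, keeps the middle slices as tuples in a real set so the membership test is a single lookup. The names middle_key and seen_keys are hypothetical.

```python
def middle_key(row):
    """Return the comparison key: every item except the first and the last."""
    return tuple(row[1:-1])


result_matrix = [['1', '2', '3', '4', '5'], ['9', '6', '7', '8', '0']]
# Build the lookup set once from the rows already stored
seen_keys = {middle_key(row) for row in result_matrix}

new_row = ['x', '2', '3', '4', 'y']
already_present = middle_key(new_row) in seen_keys
print(already_present)
```

When a new row is appended to result_matrix, its key has to be added to seen_keys as well, otherwise the two structures drift apart.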
>>> result_set = ['1', '2', '3', '4', '5']
>>> print result_set[1:4]
['2', '3', '4']
>>> result_matrix.append(result_set[1:4])
>>> result_matrix
[['2', '3', '4']]
result_matrix[:] returns a copy of the whole outer list, so result_matrix[:][1:4] slices the outer list, which holds only one element, and yields []. Index the inner list first, then slice it:
>>> result_matrix = []
>>> result_matrix.append(result_set)
>>> result_matrix[:]
[['1', '2', '3', '4', '5']]
>>> result_matrix[:][0]
['1', '2', '3', '4', '5']
>>> result_matrix[0][1:4]
['2', '3', '4']
Also, as pointed out by falsetru:
>>> result_matrix = []
>>> result_matrix.extend(result_set)
>>> result_matrix
['1', '2', '3', '4', '5']
>>> result_matrix[1:4]
['2', '3', '4']

Randomly extract x items from a list using python

Starting with two lists such as:
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted. For example say I wanted 50% the output would be
newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']
I have achieved this using the following code:
from random import randrange
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
LengthOfList = len(lstOne)
print LengthOfList
PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []
HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake
for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)
print RangeOfListIndices
newlstOne = []
newlstTwo = []
for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])
print newlstOne
print newlstTwo
But I was wondering if there was a more efficient way of doing this, in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
Thank you
Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.
A. The most straight-forward approach directly matches your specification:
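# Python 2 (raw_input, xrange); list1 and list2 are the two input lists
import random
percentage = float(raw_input('What percentage? '))
k = int(len(list1) * percentage // 100)  # sample size must be an int
indices = random.sample(xrange(len(list1)), k)
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]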
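For readers on Python 3, the same approach can be sketched as a small function. The name sample_parallel and its parameters are hypothetical, and the percentage is passed in directly rather than read from input. random.sample draws without replacement, so no index is picked twice.

```python
import random


def sample_parallel(list1, list2, percentage):
    """Pick the same random positions from two equally long lists."""
    if len(list1) != len(list2):
        raise ValueError("lists must have the same length")
    k = int(len(list1) * percentage / 100)  # number of items to extract
    # Distinct indices, shared by both lists
    indices = random.sample(range(len(list1)), k)
    return [list1[i] for i in indices], [list2[i] for i in indices]


lst_one = [str(n) for n in range(1, 11)]
lst_two = [str(n) for n in range(1, 11)]
new_one, new_two = sample_parallel(lst_one, lst_two, 50)
print(new_one)
print(new_two)
```

Sampling from range(len(list1)) does not build a list of all indices first, which matters little at 145,000 items but costs nothing.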
Q. in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
A. In Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method that makes multiple random choices until a bias-free result is found).
In Python 2, the random.sample() function is slightly biased but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.
Just zip your two lists together, use random.sample to do your sampling, then zip again to transpose back into two lists.
import random
_zips = random.sample(zip(lstOne,lstTwo), 5)
new_list_1, new_list_2 = zip(*_zips)
demo:
list_1 = range(1,11)
list_2 = list('abcdefghij')
_zips = random.sample(zip(list_1, list_2), 5)
new_list_1, new_list_2 = zip(*_zips)
new_list_1
Out[33]: (3, 1, 9, 8, 10)
new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
The way you are doing it looks mostly okay to me.
If you want to avoid sampling the same object several times, you could proceed as follows:
import random
k = int(HowManyIndicesToMake)      # number of items to extract, from your code
choose_from = range(len(lstOne))   # Python 2: a list of every valid index
random.shuffle(choose_from)        # shuffle the indices in place
newlstOne = []
newlstTwo = []
for i in choose_from[:k]:          # the first k shuffled indices are all distinct
    newlstOne.append(lstOne[i])    # take the same random positions
    newlstTwo.append(lstTwo[i])    # from both lists

List circulation in Python for Project Euler 37

So, I was doing Project Euler 37
I need to circulate a list
input: 2345 # converted to list inside function
expected output: [[3,4,5,2],[4,5,2,3],[5,2,3,4],[2,3,4,5]]
Here is my function for that
def circulate(n): #2345
    lst=list(str(n)) #[2,3,4,5]
    res=[]
    for i in range(len(lst)):
        temp=lst.pop(0)
        lst.append(temp)
        print lst #print expected list
        res.append(lst) #but doesn't append as expected
    return res
print circulate(2345)
My output is:
['3', '4', '5', '2']
['4', '5', '2', '3']
['5', '2', '3', '4']
['2', '3', '4', '5']
[['2', '3', '4', '5'], ['2', '3', '4', '5'], ['2', '3', '4', '5'], ['2', '3', '4', '5']]
The function prints lst correctly every time, but doesn't append as expected.
What am I doing wrong?
You need to append copies of your list to res:
res.append(lst[:])
You were appending a reference to the list being altered instead; all references reflect the changes made to the one object.
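A short illustration of the difference, independent of the circulate function (the variable names are made up for the example):

```python
row = ['2', '3']
aliased = []
copied = []

aliased.append(row)     # stores a reference to the same list object
copied.append(row[:])   # stores an independent shallow copy

row.append('4')         # mutate the original afterwards

print(aliased)  # [['2', '3', '4']] - the stored reference sees the change
print(copied)   # [['2', '3']] - the copy does not
```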
You may want to look at collections.deque() instead; this double-ended list object supports efficient rotation with a .rotate() method:
from collections import deque
def circulate(n):
    lst = deque(str(n))
    res = []
    for i in range(len(lst)):
        lst.rotate(-1)  # rotate left by one, matching the pop(0)/append order
        res.append(list(lst))
    return res
