Python Splitting String and Sorting Alphabetically - python

Can somebody please help me to create a python program whereby the unsorted list is split up into groups of 2, arranged alphabetically within their groups of two. The program should then create a new list in alphabetical order by taking the next greatest letter from the correct pair. Please don't tell me to do this in a different way as my method must take place as is written above. Thanks :)
unsorted = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']
n = 4
num = float(len(unsorted))/n
l = [ unsorted [i:i + int(num)] for i in range(0, (n-1)*int(num), int(num))]
l.append(unsorted[(n-1)*int(num):])
print(l)
complete = unsorted.split()
print(complete)

If I understand correctly, you are trying to turn unsorted into the following list:
['D', 'G', 'F', 'H']
If that is the case, I have modified your code so that it produces the correct output.
unsorted = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']
n = 4
num = float(len(unsorted))/n
l = [ unsorted [i:i + int(num)] for i in range(0, (n-1)*int(num), int(num))]
l.append(unsorted[(n-1)*int(num):])
# This part has been added in. It sorts each sublist,
# then takes the second element (the character further along the alphabet)
for i in range(len(l)):
    l[i] = sorted(l[i])[1]
print(l)

Assuming you are looking for ['D', 'G', 'F', 'H'], I think the following code is similar to yours but a little clearer :
a = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']
Iterate over the pairs a[i:i+2] with a list comprehension, sorting each pair in reverse alphabetical order with sorted(pair, reverse=True):
groups = [sorted(a[i:i+2], reverse=True) for i in range(0, len(a), 2)]
Then iterate over the sorted groups, selecting the first item of each pair, which is the letter further along the alphabet:
result = [i[0] for i in groups]
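If the goal is instead the full alphabetical list, built by repeatedly taking the next letter from whichever sorted pair holds it, that amounts to a k-way merge. The following is a minimal sketch, assuming the standard-library heapq.merge is an acceptable way to perform the "take the next letter from the correct pair" step. The names pairs and merged are illustrative and do not come from the posts above.

```python
import heapq

unsorted = ['B', 'D', 'A', 'G', 'F', 'E', 'H', 'C']

# Split into groups of two and sort each group alphabetically.
pairs = [sorted(unsorted[i:i + 2]) for i in range(0, len(unsorted), 2)]

# Merge the sorted groups: heapq.merge always yields the smallest
# remaining head element among its (already sorted) inputs.
merged = list(heapq.merge(*pairs))

print(pairs)   # [['B', 'D'], ['A', 'G'], ['E', 'F'], ['C', 'H']]
print(merged)  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```

This keeps the two stages the question asks for (sort within pairs, then pick the next letter across pairs) rather than sorting the whole list in one call.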

Related

Python rearrange list based on another list

I want to rearrange a list based on another list which have common elements between them.
my_list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
The lists above have 'a', 'b' and 'c' as common elements. The expected outcome is as below:
my_result = ['a','b','c','q','s','f','l','x']
Thanks in Advance
Sky
my_list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
res1=[x for x in base_list if x in my_list] # common elements, in base_list order
res2=[x for x in my_list if x not in res1] # remaining elements, in their original order
res3=res1+res2
Output :
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
Create a custom key function for sorted, as shown in this document. Give the letters that don't appear in base_list an arbitrarily high value so that they end up at the back. Because sorted is stable, the letters that aren't in base_list keep their original relative order.
l = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
def custom_key(letter):
    try:
        return base_list.index(letter)
    except ValueError:
        return 1_000
sorted(l, key=custom_key)
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
A (probably non optimal) way:
>>> sorted(my_list, key=lambda x: base_list.index(x) if x in base_list else len(base_list)+1)
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
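Both key functions above call base_list.index for every element, which scans base_list each time. As a rough sketch of one alternative, assuming the elements are hashable, the positions can be precomputed in a dictionary. The name position is illustrative only.

```python
my_list = ['q', 's', 'b', 'f', 'l', 'c', 'x', 'a']
base_list = ['z', 'a', 'b', 'c']

# Map each base_list element to its index once, up front.
position = {letter: i for i, letter in enumerate(base_list)}

# Elements missing from base_list get a key past the last index, so they
# sort to the back; sorted() is stable, so they keep their original order.
result = sorted(my_list, key=lambda x: position.get(x, len(base_list)))

print(result)  # ['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
```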

Python inserting element to a list varying with each iteration

I am trying to insert an element in the list at multiple instances. But by doing this, the length of the list is constantly changing. So, it is not reaching the last element.
my_list = ['a', 'b', 'c', 'd', 'e', 'a']
aq = len(my_list)
for i in range(aq):
    if my_list[i] == 'a':
        my_list.insert(i+1, 'g')
        aq = aq+1
print(my_list)
The output I am getting is -
['a', 'g', 'b', 'c', 'd', 'e', 'a']
The output I am trying to get is -
['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
How can I get that?
Changing aq in the loop does not change the range. range(aq) is evaluated once, when you enter the loop, and the resulting range object does not change afterwards. There are two ways to do this. The easy way is to build a new list:
newlist = []
for c in my_list:
    newlist.append(c)
    if c == 'a':
        newlist.append('g')
The trickier way is to use .index() to find the next instance of 'a', insert a 'g' after it, and then keep searching for the next one.
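The in-place search-and-insert approach can be sketched as follows. This is only one possible way to write it, assuming list.index with its optional start argument is used for the search (Python lists have no find method). The variable pos is illustrative.

```python
my_list = ['a', 'b', 'c', 'd', 'e', 'a']

pos = 0
while True:
    try:
        # Search for the next 'a' at or after position pos.
        pos = my_list.index('a', pos)
    except ValueError:
        break  # no further 'a' in the list
    my_list.insert(pos + 1, 'g')
    pos += 2  # step past the 'a' and the 'g' just inserted

print(my_list)  # ['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
```

Because the loop re-checks the list on every pass instead of relying on a range computed up front, the growing length is not a problem.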
Here is a nice way to write it using the built-in itertools.chain.from_iterable:
from itertools import chain
my_list = ['a', 'b', 'c', 'd', 'e', 'a']
my_list = list(chain.from_iterable((x, "g") if x == "a" else (x,) for x in my_list))
# ['a', 'g', 'b', 'c', 'd', 'e', 'a', 'g']
Here, every occurrence of "a" is replaced with "a", "g" in the list; all other elements are left alone.

Python array value unexpectedly changes after function call

I have a small function which uses one list to populate another. For some reason, the source list gets modified. I don't have a single line that manipulates the source list arr. I am probably missing the way Python deals with scope of variables, lists. My expected output is for the list arr to remain the same after the function call.
numTestRows = 5
m = 2
def getTestData():
    data['test'] = []
    size_c = len(arr)
    for i in range(numTestRows):
        data['test'].append(arr[i%size_c])
        for j in range(m):
            data['test'][i].append('xyz')
# just a 2x5 str matrix
arr = [['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
print('Array before: ')
print( arr)
data = {}
getTestData()
print('Array after: ')
print( arr)
Output
Array before:
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'j']]
Array after:
[['a', 'b', 'c', 'd', 'e', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz', 'xyz'], ['f', 'g', 'h', 'i', 'j', 'xyz', 'xyz', 'xyz', 'xyz']]
You've mis-handled the references in your list of lists (not a matrix). Perhaps if we break this down a little more, you can see what's happening. Start your main program with the two char lists as separate variables:
left = ['a', 'b', 'c', 'd', 'e']
right = ['f', 'g', 'h', 'i', 'j']
arr = [left, right]
Now, look at what happens within your function at the critical lines. On this first iteration, size_c is 2, i is 0 ...
data['test'].append(arr[i%size_c])
This will append arr[0] to data['test'], which started as an empty list. Now for the critical part: arr[0] is not a new list; rather, it's a reference to the list we now know as left in the main program. There is only one copy of this list.
Now, when we get into the next loop, we hit the statement:
data['test'][i].append('xyz')
data['test'][i] is a reference to the same list as left ... and this explains the appending to the original list.
You can easily copy a list with the slice suffix [:], which makes a new (shallow) copy of the entire list. For instance:
data['test'].append(arr[i%size_c][:])
... and this should solve your reference problem.
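The difference between storing a reference and storing a copy can be seen in a small standalone sketch. The names aliased, copied and after_copy_append are illustrative and not part of the original program.

```python
arr = [['a', 'b', 'c'], ['f', 'g', 'h']]

aliased = []
aliased.append(arr[0])       # stores a reference to the same inner list

copied = []
copied.append(arr[0][:])     # stores a new list holding the same items

copied[0].append('xyz')      # only the copy grows
after_copy_append = list(arr[0])

aliased[0].append('xyz')     # the original inner list grows as well

print(after_copy_append)     # ['a', 'b', 'c']
print(arr[0])                # ['a', 'b', 'c', 'xyz']
print(aliased[0] is arr[0], copied[0] is arr[0])  # True False
```

Note that [:] copies only one level. If the inner lists themselves contained lists, copy.deepcopy from the standard library would be needed to detach those as well.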

Removing a string from a list inside a list of lists

What I'm trying to achieve here in a small local test is to iterate over an array of strings, which are basically arrays of strings inside a parent array.
I'm trying to achieve the following...
1) Get the first array in the parent array
2) Get the rest of the list without the one taken
3) Iterate through the taken array, so I take each of the strings
4) Look for each string taken in all of the rest of the arrays
5) If found, remove it from the array
So far I've tried the following, but I'm struggling with an error that I don't know where it does come from...
lines = map(lambda l: str.replace(l, "\n", ""),
            list(open("PATH", 'r')))
splitLines = map(lambda l: l.split(','), lines)
for line in splitLines:
    for keyword in line:
        print(list(splitLines).remove(keyword))
But I'm getting the following error...
ValueError: list.remove(x): x not in list
Which isn't true as 'x' isn't a string included in any of the given test arrays.
SAMPLE INPUT (Comma separated lines in a text file, so I get an array of strings per line):
[['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]
SAMPLE OUTPUT:
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
You can keep track of previously seen strings in a set for fast lookups, and use a simple list comprehension to keep only the elements not yet in that set.
x = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]  # the sample input
prev = set()
final = []
for i in x:
    final.append([j for j in i if j not in prev])
    prev = prev.union(set(i))
print(final)
Output:
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
inputlist = [['a', 'b', 'c'], ['e', 'f', 'g'], ['b', 'q', 'a']]
scanned=[]  # strings seen so far
res=[]
for i in inputlist:
    temp=[]
    for j in i:
        if j in scanned:
            pass
        else:
            scanned.append(j)
            temp.append(j)
    res.append(temp)
print(res)
[['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
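To connect these answers back to the file-reading code in the question, here is a hedged end-to-end sketch. It assumes the file holds one comma-separated line per array, and it uses io.StringIO as a stand-in for the real file, whose path and contents are not given. The names sample_file, split_lines, seen and result are illustrative.

```python
import io

# Stand-in for open("PATH"); the contents mirror the sample input.
sample_file = io.StringIO("a,b,c\ne,f,g\nb,q,a\n")

# Build real lists rather than lazy map objects, so the data can be
# traversed more than once and holds lists of strings.
split_lines = [line.rstrip("\n").split(",") for line in sample_file]

seen = set()
result = []
for line in split_lines:
    # Keep only the strings that no earlier line contained.
    result.append([word for word in line if word not in seen])
    seen.update(line)

print(result)  # [['a', 'b', 'c'], ['e', 'f', 'g'], ['q']]
```

In the question's version, list(splitLines) is a list of lists, so calling remove with a single string looks for that string among the inner lists and fails. Filtering into a new list avoids removing items during iteration altogether.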

What's the most efficient way of identifying repeated pattern in array of objects using Python

I have two arrays of 5 objects
a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
b = ['a', 'b', 'd', 'f', 'e', 'f']
I would like to identify the repeated patterns of more than one object and their occurrences like
['a', 'b']: 2
['e', 'f']: 3
['f', 'e', 'f']: 2
The first sequence ['a', 'b'] appeared once in a and once in b, so total count 2. The 2nd sequence ['e', 'f'] appeared twice in a, once in b, so total 3. The 3rd sequence ['f', 'e', 'f'] appeared once in a, and once in b, so total 2.
Is there a good way to do this in Python?
Also the universe of objects is limited. Was wondering if there's an efficient solution that utilizes hash table?
If the approach is only for two lists, the following approach should work. I am not sure if this is the most efficient solution though.
A nice description of finding n-grams is given in this blog post.
This approach sets a minimum sequence length and, for each list, collects every contiguous sequence from that length up to the full length of the list.
The sequences from both lists are then combined, and a counter records how many times each sequence occurs.
Finally we return a dictionary of all the sequences that occur more than once.
def find_repeating(list_a, list_b):
    min_len = 2
    def find_ngrams(input_list, n):
        return zip(*[input_list[i:] for i in range(n)])
    seq_list_a = []
    for seq_len in range(min_len, len(list_a) + 1):
        seq_list_a += [val for val in find_ngrams(list_a, seq_len)]
    seq_list_b = []
    for seq_len in range(min_len, len(list_b) + 1):
        seq_list_b += [val for val in find_ngrams(list_b, seq_len)]
    all_sequences = seq_list_a + seq_list_b
    counter = {}
    for seq in all_sequences:
        counter[seq] = counter.get(seq, 0) + 1
    filtered_counter = {k: v for k, v in counter.items() if v > 1}
    return filtered_counter
Do let me know if you are unsure about anything.
>>> list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
>>> list_b = ['a', 'b', 'd', 'f', 'e', 'f']
>>> print(find_repeating(list_a, list_b))
{('a', 'b'): 2, ('e', 'f'): 3, ('f', 'e'): 2, ('f', 'e', 'f'): 2}
When you mentioned that you were looking for an efficient solution, my first thought was of the approaches to solving the longest common subsequence problem. But in your case, we actually do need to enumerate all common subsequences so that we can count them, so a dynamic programming solution will not do. Here's my solution. It's certainly shorter than SSSINISTER's solution (mostly because I use the collections.Counter class).
#!/usr/bin/env python3
def find_repeating(sequence_a, sequence_b, min_len=2):
    from collections import Counter
    # Find all subsequences
    subseq_a = [tuple(sequence_a[start:stop]) for start in range(len(sequence_a)-min_len+1)
                for stop in range(start+min_len,len(sequence_a)+1)]
    subseq_b = [tuple(sequence_b[start:stop]) for start in range(len(sequence_b)-min_len+1)
                for stop in range(start+min_len,len(sequence_b)+1)]
    # Find common subsequences
    common = set(tup for tup in subseq_a if tup in subseq_b)
    # Count common subsequences
    return Counter(tup for tup in (subseq_a + subseq_b) if tup in common)
Resulting in ...
>>> list_a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
>>> list_b = ['a', 'b', 'd', 'f', 'e', 'f']
>>> print(find_repeating(list_a, list_b))
Counter({('e', 'f'): 3, ('f', 'e'): 2, ('a', 'b'): 2, ('f', 'e', 'f'): 2})
The advantage to using collections.Counter is that not only do you not need to produce the actual code to iterate and count, you get access to all of the dict methods as well as a few specialized methods for using those counts.
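On the question of a hash-table-based solution: collections.Counter is a dict subclass, so counting n-gram tuples with it is already a hash-table approach. The sketch below combines the zip-based n-gram idea with Counter. It is one possible arrangement rather than a tuned implementation, and the names ngrams, counts and repeated are illustrative. Unlike the version above, it keeps any sequence that occurs more than once overall, even if both occurrences are in the same list.

```python
from collections import Counter

def ngrams(seq, n):
    """Yield every contiguous run of n items from seq as a tuple."""
    return zip(*(seq[i:] for i in range(n)))

a = ['a', 'b', 'c', 'd', 'e', 'f', 'e', 'f']
b = ['a', 'b', 'd', 'f', 'e', 'f']

counts = Counter()
for seq in (a, b):
    # Consider every run length from 2 up to the whole list.
    for n in range(2, len(seq) + 1):
        counts.update(ngrams(seq, n))

# Keep only the sequences seen more than once across both lists.
repeated = {k: v for k, v in counts.items() if v > 1}

print(repeated)
print(counts.most_common(1))  # [(('e', 'f'), 3)]
```

The number of n-grams grows quadratically with list length, so for long inputs it may be worth capping the maximum run length.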
