Find and replace some elements in a list using Python

I have to search all elements in a list and replace all occurrences of one element with another. What is the best way to do this?
For example, suppose my list has the following elements:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt',
'=>',
'text/CommitTest.txt',
'0',
'README.md',
'=>',
'text/README.md',
'0']
and I need to replace every occurrence of the element '=>' with the concatenation of the element before it, the '=>' itself, and the element after it, removing those two neighbours, so the output I need is:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt=>text/CommitTest.txt',
'0',
'README.md=>text/README.md',
'0']
This is the code I have written so far:
ind = data.index("=>")
item_to_replace = data[ind]
combine = data[ind-1]+data[ind]+data[ind+1]
replacement_value = combine
indices_to_replace = [i for i,x in enumerate(data) if x==item_to_replace]
for i in indices_to_replace:
    data[i] = replacement_value
data
However, it produces this unwanted output:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt',
'CommitTest.txt=>text/CommitTest.txt',
'text/CommitTest.txt',
'0',
'README.md',
'CommitTest.txt=>text/CommitTest.txt',
'text/README.md',
'0']
Is there a better way?

Your general algorithm is correct.
However, data.index("=>") only finds the index of the first occurrence of "=>".
You need to find all occurrences of "=>", store their indices in a list, and then combine and replace the elements at each of those indices.
To find the indices of all occurrences of "=>", you can use:
indices = [i for i, x in enumerate(data) if x == "=>"]

As @alpha_989 suggested, first find the indices of the "=>" elements, then merge and delete around each occurrence. I hope this helps:
>>> indices = [i for i, x in enumerate(data) if x == "=>"]
>>> for i in indices:  # merge the neighbours of each "=>" into the element before it
...     data[i-1] = data[i-1] + data[i] + data[i+1]
...
>>> while "=>" in data:  # then drop each "=>" and the element right after it
...     i = data.index("=>")
...     del data[i:i+2]
...
>>> data
['a34b3f8b22783cf748d8ec99b651ddf35204d40c', 'baa6cb4298d90db1c375c63ee28733eb144b7266', 'CommitTest.txt=>text/CommitTest.txt', '0', 'README.md=>text/README.md', '0']

It was correctly pointed out to you that data.index only returns the index of the first occurrence of an element. Furthermore, your code does not remove the entries before and after the "=>".
For a solution that mutates your list, you could use del, but I recommend using Python's slice assignment instead. Iterating over the indices in reverse keeps the earlier indices valid while the list shrinks.
indices = [i for i, val in enumerate(data) if val == '=>']
for i in reversed(indices):
    data[i-1: i+2] = [data[i-1] + data[i] + data[i+1]]
I also suggest you try an implementation that builds a new list in a single pass. Mutating a list in place can surprise other code that holds a reference to it, and here it has no real advantage over creating a new list like so:
new_data = []
i = 0
while i < len(data):
    # Merge only when both neighbours of "=>" exist.
    if i + 2 < len(data) and data[i + 1] == "=>":
        new_data.append(data[i] + data[i+1] + data[i+2])
        i += 3
    else:
        new_data.append(data[i])
        i += 1
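For reuse, the single-pass idea can be wrapped in a function. This is only a sketch: the name merge_around and its sep parameter are made up for illustration, and the hash in the sample list is shortened.

```python
def merge_around(items, sep="=>"):
    """Return a new list where each 'left, sep, right' triple is joined into one string."""
    merged = []
    i = 0
    while i < len(items):
        # Merge only when the separator has an element on both sides.
        if i + 2 < len(items) and items[i + 1] == sep:
            merged.append(items[i] + items[i + 1] + items[i + 2])
            i += 3
        else:
            merged.append(items[i])
            i += 1
    return merged


sample = ["abc123", "CommitTest.txt", "=>", "text/CommitTest.txt", "0",
          "README.md", "=>", "text/README.md", "0"]
result = merge_around(sample)
print(result)
```

The input list is left untouched, so the caller can still compare the old and new versions.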

Below is my small experiment, wrapped in a function that you can call:
data = ['a34b3f8b22783cf748d8ec99b651ddf35204d40c',
'baa6cb4298d90db1c375c63ee28733eb144b7266',
'CommitTest.txt',
'=>',
'text/CommitTest.txt',
'0',
'README.md',
'=>',
'text/README.md',
'0']
def convert_list():
    # Positions of every "=>" separator.
    ind = [i for i, x in enumerate(data) if x == "=>"]
    # A separator at either end has no neighbour to merge with.
    if 0 in ind or len(data) - 1 in ind:
        print("Invalid element location")
        return
    new_data = []
    index_start = 0
    while index_start < len(data):
        if index_start + 1 in ind:
            # Merge "left => right" into one element and skip past all three.
            new_data.append(data[index_start] + data[index_start + 1] + data[index_start + 2])
            index_start += 3
        else:
            new_data.append(data[index_start])
            index_start += 1
    return new_data
print(convert_list())

The indices that need to be deleted are saved first, and the elements at those indices are then left out of the new list.
delete_index=[]
for i,d in enumerate(data):
    if d=="=>":
        data[i]=data[i-1]+data[i]+data[i+1]
        delete_index.append(i-1)
        delete_index.append(i+1)
new_data=[]
for i,d in enumerate(data):
    if i not in delete_index:
        new_data.append(d)
print(new_data)

Related

Filling an array with changing string value

I want to fill an array with word suffixes while making a dictionary with their indexes.
In a loop I do the following:
for i in range(len(s)):
    suf = s[:j]
    suff_dict.update({suf: i})
    suff_arr[i][0] = suf
    suff_arr[i][1] = 0
    j -= 1
The dictionary is filled right, however, the array is filled only with the 1st letter.
[['H', 0], ['H', 0], ['H', 0], ['H', 0], ['H', 0], ['H', 0]]
{'HELLO': 1, 'HELL': 2, 'HEL': 3, 'HE': 4, 'H': 5}
Could you help me find the problem?
I think maybe this is what you are looking for.
s='HELLO'
suff_arr=[]
suff_dict={}
for i in range(len(s)):
    suf = s[i:]
    suff_dict.update({suf: i})
    suff_arr.append(suf)
print(suff_arr, suff_dict)
I do not really understand why you would need nested lists with a zero in them, but if you want that, you could do it like this:
s='HELLO'
suff_arr=[]
suff_dict={}
for i in range(len(s)):
    suf = s[i:]
    suff_dict.update({suf: i})
    suff_arr.append([suf,0])
print(suff_arr, suff_dict)
Also, you said you wanted the word suffixes, not prefixes, so I changed that too. If you want the prefixes, simply replace s[i:] with s[:i+1].
Since the data in this question is unclear, I can't tell exactly what you are trying to do, but from what I understand this might help you.
s = 'HELLO'
suff_dict = {}
j=len(s)
suff_arr = []
for i in range(len(s)):
    suf = s[:j]
    suff_dict.update({suf: i})
    suff_arr.append([suf,0])
    j -= 1
First off, as the previous answers have indicated, this is a case for building your list with "append()". The unexpected results you were seeing have to do with how Python stores and refers to objects in memory.
Copy and run the code below; it is my attempt to show why you were getting unexpected results in your list. The "id()" function in Python returns an identifier that is unique to an object for as long as that object exists, and I use it to show which list items refer to the same object.
print('**** Values stored directly in list. ****')
arr = [0] * 3
print('All the items in the list refer to the same memory address.')
c = 0
for item in arr:
    print(f'arr[{c}] id = ', id(item))
    c += 1
print('\n')
for i in range(len(arr)):
    arr[i] = i
print('As values are updated, new objects are created in memory:')
c = 0
for item in arr:
    print(f'arr[{c}] id = ', id(item))
    c += 1
print('And we see the results we expect:')
print(arr)
print('\n')
print('**** Values stored in sub list. ****')
arr = [[0]] * 3
print('All the items in the list refer to the same memory location.')
c = 0
for item in arr:
    print(f'arr[{c}] id = ', id(item))
    c += 1
print('\n')
for i in range(len(arr)):
    arr[i][0] = i
print('The same memory address is repeatedly overwritten.')
c = 0
for item in arr:
    print(f'arr[{c}] id = ', id(item))
    c += 1
print('\n')
print('And we say "Wut??"')
print(arr)
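If a pre-sized nested list is really wanted, a list comprehension avoids the shared rows. The question does not show how suff_arr was created, so the [["", 0]] * rows line below is only a guess at it, and the names shared, independent and rows are illustrative.

```python
rows = 3

# Repeating the outer list copies the reference, so every row is the same object.
shared = [["", 0]] * rows
shared[0][0] = "H"

# A list comprehension evaluates the inner expression once per row,
# so each row is a distinct list.
independent = [["", 0] for _ in range(rows)]
independent[0][0] = "H"

print(shared)
print(independent)
```

Only the first row of independent changes, while every row of shared shows the new value.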

Find duplicates in a list of strings differing only in upper and lower case writing

I have a list of strings that contains 'literal duplicates' and 'pseudo-duplicates' which differ only in lower- and uppercase writing. I am looking for a function that treats all literal duplicates as one group, returns their indices, and finds all pseudo-duplicates for these elements, again returning their indices.
Here's an example list:
a = ['bar','bar','foo','Bar','Foo','Foo']
And this is the output I am looking for (a list of lists of lists):
dupe_list = [[[0,1],[3]],[[2],[4,5]]]
Explanation: 'bar' appears twice at the indexes 0 and 1 and there is one pseudo-duplicate 'Bar' at index 3. 'foo' appears once at index 2 and there are two pseudo-duplicates 'Foo' at indexes 4 and 5.
Here is one solution (you didn't clarify how the groups should be ordered, so I assumed you want one group per lowercase form, in the order the items first appear from left to right in the list; let me know if it must be different):
# dicts keep insertion order (Python 3.7+), so groups follow first appearance
d = {k.lower(): [[], []] for k in a}
for i in range(len(a)):
    if a[i] in d.keys():
        d[a[i]][0].append(i)
    else:
        d[a[i].lower()][1].append(i)
result=list(d.values())
Output:
>>> print(result)
[[[0, 1], [3]], [[2], [4, 5]]]
Here's how I would achieve it. But you should consider using a dictionary rather than a list of lists of lists. Dictionaries are excellent data structures for problems like this.
# input list
a = ['bar','bar','foo','Bar','Foo','Foo']
# initialize a dictionary keyed by the lowercase form of each item
a_dict = {}
for i in a:
    a_dict[i.lower()] = None
# loop through the keys of the dictionary (the lowercase forms of the items in a)
# loop through the items of list a
# if the item is an exact match to the key, add its index to the list of exacts
# if the item matches the key only when lowercased, add its index to the list of similars
# update the dictionary key's value
for k in a_dict:
    index_exact = []
    index_similar = []
    for i in range(len(a)):
        if a[i] == k:
            index_exact.append(i)
        elif a[i].lower() == k:
            index_similar.append(i)
    a_dict[k] = [index_exact, index_similar]
# print out the dictionary items to check the answer
print(a_dict.items())
# collect the values from the dictionary into their own list
dup_list = []
for v in a_dict.values():
    dup_list.append(v)
print(dup_list)
Here is a solution that also handles the cases where only pseudo-duplicates or only literal duplicates are present.
a = ['bar', 'bar', 'foo', 'Bar', 'Foo', 'Foo', 'ka']
# Dictionaries to store the positions of words
literal_duplicates = dict()
pseudo_duplicates = dict()
for index, item in enumerate(a):
    # Treat a word as a literal duplicate if it is all lowercase
    if item.islower():
        if item in literal_duplicates:
            literal_duplicates[item].append(index)
        else:
            literal_duplicates[item] = [index]
        # Handle the case where only literal duplicates are present
        if item not in pseudo_duplicates:
            pseudo_duplicates[item] = []
    # Treat a word as a pseudo-duplicate if it is not all lowercase
    else:
        item_lower = item.lower()
        if item_lower in pseudo_duplicates:
            pseudo_duplicates[item_lower].append(index)
        else:
            pseudo_duplicates[item_lower] = [index]
        # Handle the case where only pseudo-duplicates are present
        if item_lower not in literal_duplicates:
            literal_duplicates[item_lower] = []
# Form the final list from the dictionaries
dupe_list = [[v, pseudo_duplicates[k]] for k, v in literal_duplicates.items()]
Here is a simple and easy-to-understand answer. Note that it overwrites the items of a with the placeholder 'Null' as it goes.
a = ['bar','bar','foo','Bar','Foo','Foo']
dupe_list = []
for i in range(len(a)):
    if a[i] != 'Null':
        ilist = [i]
        ilist2 = []
        for j in range(i+1,len(a)):
            if a[i] == a[j]:
                # literal duplicate
                ilist.append(j)
                a[j] = 'Null'
            elif a[i] == a[j].casefold():
                # pseudo-duplicate that differs only in case
                ilist2.append(j)
                a[j] = 'Null'
        dupe_list.append([ilist,ilist2])
        a[i]='Null'
print(dupe_list)
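The grouping can also be done in one pass without modifying the input. This is a sketch under one assumption that differs from the answers above: the "literal" spelling of a word is taken to be the first spelling encountered, not necessarily the lowercase one. The function name group_case_duplicates is made up for illustration.

```python
def group_case_duplicates(items):
    """Group indices by case-folded value.

    Each group is [literal, pseudo]: indices whose spelling equals the first
    spelling seen for that word, then indices that differ only in case.
    """
    groups = {}  # casefolded word -> (first spelling, literal indices, pseudo indices)
    for index, item in enumerate(items):
        key = item.casefold()
        if key not in groups:
            groups[key] = (item, [], [])
        first_spelling, literal, pseudo = groups[key]
        if item == first_spelling:
            literal.append(index)
        else:
            pseudo.append(index)
    return [[literal, pseudo] for _, literal, pseudo in groups.values()]


a = ['bar', 'bar', 'foo', 'Bar', 'Foo', 'Foo']
dupe_list = group_case_duplicates(a)
print(dupe_list)
```

Because dictionaries preserve insertion order in Python 3.7+, the groups come out in the order the words first appear.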

Adding a number at the beginning of a string

I am trying to read strings from a line of a file, add a number at the beginning of each string, and add each result to an array. However, my code adds a number to EACH character of the string.
infile = open("milkin.txt","r").readlines()
outfile = open("milkout.txt","w")
number = infile[0]
arrayLoc = infile[1].split( )
array = infile[2].split( )
for i in infile[2]:
    counter = 1
    countered = str(counter)
    i = countered + i
    array.append(i)
output:
['2234567', '3222222', '4333333', '5444444', '6555555', '11', '12', '13', '14', '15', '16', '17', '1 ', '12' .... etc
intended output:
['12234567', '23222222', '34333333', '45444444', '56555555']
infile:
5
1 3 4 5 2
2234567 3222222 4333333 5444444 6555555
You need to loop over the array that you read from your file, and since it looks like you want to add sequential numbers to each element, you can use enumerate(array) to get the index of each element as you loop. You can add an argument to enumerate to tell it what number to start at (default is 0):
new_arr = []
for i, a in enumerate(array, 1):
    # 'i' will go from 1, 2, ... n, where 'n' is the number of elements in 'array'
    # 'a' will be the element of 'array' at position i (counting from 1)
    new_arr.append(str(i) + a)
print(new_arr)
['12234567', '23222222', '34333333', '45444444', '56555555']
As pointed out in a comment, this can be done much more concisely using a list comprehension, which is the more pythonic way to loop:
new_arr = [str(i) + a for i, a in enumerate(array, 1)]
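To see why the original loop numbered single characters, compare iterating over the raw line with iterating over its split tokens. This is a small sketch; the line variable is a stand-in for infile[2] from the question.

```python
line = "2234567 3222222 4333333 5444444 6555555\n"  # stand-in for infile[2]

# Iterating over the string itself yields single characters, including spaces.
chars = [c for c in line][:3]

# Splitting first yields whole tokens, which can then be numbered from 1.
numbered = [str(i) + token for i, token in enumerate(line.split(), 1)]

print(chars)
print(numbered)
```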

Converting strings within a list into floats

I have a list of numerical values that are of type "string" right now. Some of the elements in this list have more than one value, e.g.:
AF=['0.056', '0.024, 0.0235', '0.724', '0.932, 0.226, 0.634']
The other thing is that some of the elements might be a lone '.' instead of a number.
With that being said, I've been trying to convert the elements of this list into floats (while still keeping the values grouped when an element holds more than one value), but I keep getting the following error:
ValueError: could not convert string to float: .
I've tried a LOT of things to solve this, with the latest one being:
for x in AF:
    if "," in x: #if there are multiple values for one AF
        elements= x.split(",")
        for k in elements: #each element of the sub-list
            if k != '.':
                k= map(float, k)
                print(k) #check to see if there are still "."
            else:
                pass
But when I run that, I still get the same error. So I printed k from the above loop and sure enough, there were still . in the list, despite me stating NOT to include those in the string-to-float conversion.
This is my desired output:
AF=[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
def convert(l):
    new = []
    for line in l:
        if ',' in line:
            new.append([float(j) for j in line.split(',')])
        else:
            try:
                new.append(float(line))
            except ValueError:
                pass
    return new
>>> convert(AF)
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
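One likely reason the loop in the question still fails: map(float, k) is called on a single string k, so float is applied to each character, and the '.' inside a value such as '0.024' raises the error once the map is evaluated. Below is a sketch that calls float on whole tokens and also skips '.' placeholders inside a multi-value entry; the name parse_values and the extended AF list are illustrative assumptions.

```python
def parse_values(entry):
    """Convert one comma-separated entry to a float, a list of floats, or None."""
    # Split into tokens, strip whitespace, and drop the '.' placeholders.
    tokens = [t.strip() for t in entry.split(",")]
    numbers = [float(t) for t in tokens if t and t != "."]
    if not numbers:
        return None
    return numbers[0] if len(numbers) == 1 else numbers


AF = ['0.056', '0.024, 0.0235', '.', '0.724', '0.932, 0.226, 0.634']
parsed = [v for v in (parse_values(x) for x in AF) if v is not None]
print(parsed)

# float() is applied to each character when map() is given a string:
try:
    list(map(float, "0.5"))
    map_failed = False
except ValueError as exc:
    map_failed = True
    print("map over characters fails:", exc)
```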
If you try this:
result = []
for item in AF:
    if item != '.':
        values = list(map(float, item.split(', ')))
        result.append(values)
You get:
[[0.056], [0.024, 0.0235], [0.724], [0.932, 0.226, 0.634]]
You can simplify this using a list comprehension:
result = [list(map(float, item.split(', ')))
          for item in AF
          if item != '.']
With the re.findall() function (on an extended input list):
import re
AF = ['0.056', '0.024, 0.0235, .', '.', '0.724', '0.932, 0.226, 0.634', '.']
result = []
for s in AF:
    items = re.findall(r'\b\d+\.\d+\b', s)
    if items:
        result.append(float(items[0]) if len(items) == 1 else list(map(float, items)))
print(result)
The output:
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]

Merge nested list items based on a repeating value

Although poorly written, this code:
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
for i in range(len(marker_array)):
    if marker_array[i-1][1] != marker_array[i][1]:
        marker_array_DS.append(marker_array[i])
print(marker_array_DS)
Returns:
[['hard', '2', 'soft'], ['fast', '3'], ['turtle', '4', 'wet']]
It accomplishes part of the task which is to create a new list containing all nested lists except those that have duplicate values in index [1]. But what I really need is to concatenate the matching index values from the removed lists creating a list like this:
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
The values in index [1] must not be concatenated. I kind of managed to do the concatenation part using a tip from another post:
newlist = [i + n for i, n in zip(list_a, list_b)]
But I am struggling with figuring out the way to produce the desired result. The "marker_array" list will be already sorted in ascending order before being passed to this code. All like-values in index [1] position will be contiguous. Some nested lists may not have any values beyond [0] and [1] as illustrated above.
Quick stab at it... use itertools.groupby to do the grouping for you, but do it over a generator that pads each 2-element list to 3 elements.
from itertools import groupby
from operator import itemgetter
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
def my_group(iterable):
    # pad 2-element lists with '' so every entry has 3 elements
    temp = ((el + [''])[:3] for el in iterable)
    for k, g in groupby(temp, key=itemgetter(1)):
        fst, snd = map(' '.join, zip(*map(itemgetter(0, 2), g)))
        # drop the empty third element, if any
        yield list(filter(None, [fst, k, snd]))
print(list(my_group(marker_array)))
from collections import defaultdict
d1 = defaultdict(list)
d2 = defaultdict(list)
for pxa in marker_array:
    d1[pxa[1]].extend(pxa[:1])
    d2[pxa[1]].extend(pxa[2:])
res = [[' '.join(d1[x]), x, ' '.join(d2[x])] for x in sorted(d1)]
If you really need 2-element lists where the third value is missing (which I think is unlikely):
for p in res:
    if not p[-1]:
        p.pop()
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
marker_array_hit = []
for i in range(len(marker_array)):
    if marker_array[i][1] not in marker_array_hit:
        marker_array_hit.append(marker_array[i][1])
for i in marker_array_hit:
    lists = [item for item in marker_array if item[1] == i]
    temp = []
    first_part = ' '.join([str(item[0]) for item in lists])
    temp.append(first_part)
    temp.append(i)
    second_part = ' '.join([str(item[2]) for item in lists if len(item) > 2])
    if second_part != '':
        temp.append(second_part)
    marker_array_DS.append(temp)
print(marker_array_DS)
I learned Python just to answer this:
marker_array = [
['hard','2','soft'],
['heavy','2','light'],
['rock','2','feather'],
['fast','3'],
['turtle','4','wet'],
]
data = {}
for arr in marker_array:
    if len(arr) == 2:
        arr.append('')
    (first, index, last) = arr
    firsts, lasts = data.setdefault(index, [[],[]])
    firsts.append(first)
    lasts.append(last)
results = []
for key in sorted(data.keys()):
    current = [
        " ".join(data[key][0]),
        key,
        " ".join(data[key][1])
    ]
    if current[-1] == '':
        current = current[:-1]
    results.append(current)
print(results)
--output:--
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
A different solution based on itertools.groupby:
from itertools import groupby
# normalizes the list of markers so all markers have 3 elements
def normalized(markers):
    for marker in markers:
        yield marker + [""] * (3 - len(marker))
def concatenated(markers):
    # use groupby to iterate over lists of markers sharing the same key
    for key, markers_in_category in groupby(normalized(markers), lambda m: m[1]):
        # get separate tuples of left and right words
        lefts, rights = zip(*[(m[0],m[2]) for m in markers_in_category])
        # remove empty strings from both
        lefts, rights = filter(bool, lefts), filter(bool, rights)
        # yield the concatenated entry for this key (also removing the empty string at the end, if necessary)
        yield list(filter(bool, [" ".join(lefts), key, " ".join(rights)]))
The generator concatenated(markers) yields the results. This code correctly handles the ['fast', '3'] case and does not return an extra third element in such cases.
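For Python 3, the same groupby idea can be written without padding the short rows. This is a minimal sketch, assuming (as the question states) that rows with equal values at index [1] are contiguous; the function name merge_by_key is made up for illustration.

```python
from itertools import groupby


def merge_by_key(rows):
    """Merge contiguous rows sharing rows[i][1]; join the other columns with spaces."""
    merged = []
    for key, group in groupby(rows, key=lambda row: row[1]):
        group = list(group)
        lefts = " ".join(row[0] for row in group)
        # Rows may lack a third element, so collect only those that have one.
        rights = " ".join(row[2] for row in group if len(row) > 2)
        entry = [lefts, key]
        if rights:
            entry.append(rights)
        merged.append(entry)
    return merged


marker_array = [['hard', '2', 'soft'], ['heavy', '2', 'light'],
                ['rock', '2', 'feather'], ['fast', '3'], ['turtle', '4', 'wet']]
result = merge_by_key(marker_array)
print(result)
```

Unlike the setdefault answer above, this version does not append anything to the input rows.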
