Remove duplicates of set of characters in string - Python

I have a string '1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
and I want to remove all of the duplicates of these characters: 3e, 4o, 2r etc.
How can I do that in Python?

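str_ = '1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
seen = set()
result = []
n = 2  # length of each chunk
for i in range(0, len(str_), n):
    item = str_[i:i+n]
    if item not in seen:
        # first time this chunk appears: remember it and keep it
        seen.add(item)
        result.append(item)
print(''.join(result))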

This is a pretty crude way of doing it, but it seems to do the job without being too complicated.
It also assumes that you know in advance which character combinations need to be removed. You didn't say that you need to remove all duplicates, only a set of known ones.
x = '1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
for y in ['3e', '4o', '2r']:
    x = x[:x.find(y)+len(y)] + x[x.find(y)+len(y):].replace(y, '')
print(x)
This finds the first occurrence of the target (3e, for instance), keeps the string up to and including it, and appends the rest of the original string with every further occurrence of the target replaced by an empty string.
This is a bit slow, but again, it gets the job done.
There is no error handling here though, so be wary of find() returning -1 when a target is missing.
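The missing error handling mentioned above could be added roughly as follows. This is only a sketch: the function name keep_first and the shortened sample string are made up for illustration, and, like the original, it matches the token anywhere in the string rather than only on two-character boundaries.

```python
def keep_first(text, tokens):
    """Keep only the first occurrence of each token in text."""
    for token in tokens:
        pos = text.find(token)
        if pos == -1:
            # Token is absent, so there is nothing to remove.
            continue
        end = pos + len(token)
        # Keep everything up to and including the first hit,
        # then drop every later occurrence of the token.
        text = text[:end] + text[end:].replace(token, "")
    return text


x = "1a3e3e3e1f4o4o2r2r1s"  # shortened, hypothetical sample
result = keep_first(x, ["3e", "4o", "2r", "9z"])  # "9z" is not present
print(result)
```

Calling find() once per token and skipping on -1 avoids the silent slicing surprises that a negative position would cause.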

You can use a list comprehension and a set to do this in the following way:
s = '1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
s = [s[i:i+2] for i in range(0, len(s) - 1, 2)]
s = set(s)
Note that a set does not preserve the original order of the chunks.
Hope it helps
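If the order of the chunks matters, one possible variation on the set approach is dict.fromkeys, which drops duplicates while keeping first-seen order (dictionaries preserve insertion order in Python 3.7+). The sample string here is a shortened, made-up one.

```python
s = '1a1b3e3e3e1f2h2h1a'  # shortened, hypothetical sample

# Split into two-character chunks.
chunks = [s[i:i + 2] for i in range(0, len(s), 2)]

# dict keys are unique and keep insertion order, so this removes
# duplicates without shuffling the remaining chunks.
unique_chunks = list(dict.fromkeys(chunks))

deduped = ''.join(unique_chunks)
print(deduped)
```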

Related

Python: Compare first n characters of item in list to first n characters of all other items in same list

I need to compare the first n characters of items in a list to the first n characters of other items in the same list, then remove or keep one of those items.
In the example list below, “AB2222_100” and “AB2222_P100” would be considered duplicates (even though they're technically unique) because the first 6 characters match. When comparing the two values, if x[-4:] == "P100", then that value would be kept in the list and the value without the “P” would be removed. The other items in the list would be kept since there isn’t a duplicate, regardless of whether they have the “P100” or “100” suffix at the end of the string. For this case, there will never be more than one duplicate (either a “P” or not).
AB1111_100
AB2222_100
AB2222_P100
AB3333_P100
AB4444_100
AB5555_P100
I understand slicing and comparing, but everything is assuming unique values. I was hoping to use list comprehension instead of a long for loop, but also want to understand what I'm seeing. I've gotten lost trying to figure out collections, sets, zip, etc. for this non-unique scenario.
Slicing and comparing isn't going to retain the required suffix that needs to be maintained in the final list.
newList = [x[:6] for x in myList]
This is how it should start and end.
myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
newList = ['ABC1111_P100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
As stated in your comments you can't do this in a one liner. You can do this in O(n) time but it will take some extra space:
myList = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100', 'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
seen = dict()
print(myList)
for x in myList:
    # grab the start and end of the string
    start, end = x.split('_')
    if start in seen:  # If we have seen this value before
        if seen[start] != 'P100':  # Did that ending have a P value?
            seen[start] = end  # If not swap out the P value
    else:
        # If we have not seen this before then add it to our dict.
        seen[start] = end
final_list = ["{}_{}".format(key, value) for key, value in seen.items()]
print(final_list)
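The same idea can be wrapped in a reusable function. This is a sketch under a few assumptions: the name prefer_p_suffix is invented here, every item is assumed to contain the separator exactly once, and "preferred" is taken to mean any suffix starting with "P" rather than the literal 'P100'.

```python
def prefer_p_suffix(items, sep="_"):
    """Keep one item per prefix, preferring a suffix that starts with 'P'."""
    chosen = {}
    for item in items:
        prefix, _, suffix = item.partition(sep)
        current = chosen.get(prefix)
        # Store the suffix if the prefix is new, or if this suffix has
        # the 'P' marker and the stored one does not.
        if current is None or (suffix.startswith("P") and not current.startswith("P")):
            chosen[prefix] = suffix
    return [f"{prefix}{sep}{suffix}" for prefix, suffix in chosen.items()]


my_list = ['ABC1111_P100', 'ABC2222_100', 'ABC2222_P100',
           'ABC3333_P100', 'ABC4444_100', 'ABC5555_P100']
new_list = prefer_p_suffix(my_list)
print(new_list)
```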

How to delete the very last character from every string in a list of strings

I have the strings '80010', '80030', '80050' in a list, as in
test = ['80010','80030','80050']
How can I delete the very last character (in this case the very last digit of each string which is a 0), so that I can end up with another list containing only the first four digits/characters from each string? So end up with something like
newtest = ['8001', '8003', '8005']
I am very new to Python but I have tried with if-else statements, appending, using indexing [:-1], etc. but nothing seems to work unless I end up deleting all my other zeros. Thank you so much!
test = ["80010","80030","80050"]
newtest = [x[:-1] for x in test]
New test will contain the result ["8001","8003","8005"].
[x[:-1] for x in test] creates a new list (using list comprehension) by looping over each item in test and putting a modified version into newtest. The x[:-1] means to take everything in the string value x up to but not including the last element.
You are not so far off. Using the slice notation [:-1] is the right approach. Just combine it with a list comprehension:
>>> test = ['80010','80030','80050']
>>> [x[:-1] for x in test]
['8001', '8003', '8005']
somestring[:-1] gives you everything from the character at position 0 (inclusive) to the last character (exclusive).
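To see why slicing is the safer tool here, it may help to compare it with two string methods that are easy to reach for instead. The value '80000' is added to the sample purely for illustration; whether the asker tried these exact methods is a guess.

```python
test = ['80010', '80030', '80000']  # '80000' is a hypothetical extra value

# Slicing removes exactly one character from the end.
sliced = [x[:-1] for x in test]

# rstrip removes every trailing '0', not just the last one.
stripped = [x.rstrip('0') for x in test]

# replace removes every '0' anywhere in the string.
replaced = [x.replace('0', '') for x in test]

print(sliced)
print(stripped)
print(replaced)
```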
Just to show a slightly different solution than a comprehension: since the other answers already explain slicing, I'll only go through the method.
With the map function:
test = ['80010','80030','80050']
print(list(map(lambda x: x[:-1], test)))
# ['8001', '8003', '8005']
For more information about this solution, please read the brief explanation I did in another similar question.
Convert a list into a sequence of string triples
In Python, @Matthew's solution is perfect. But if you are a beginner at coding in general, I recommend this version: it is less elegant for sure, but it is the only way in many other scenarios:
# variable declarations
test = ['80010','80030','80050']
length = len(test)  # number of strings in test
newtest = [None] * length  # newtest = [None, None, None], a pre-sized placeholder list
str_len = 0  # temporary storage
# add every element of test to newtest, minus its last character
for i in range(0, length):  # for loop
    s = test[i]  # get the i-th element of test
    str_len = len(s)  # length of the string that will be sliced
    newtest[i] = s[0:str_len - 1]  # i-th element of newtest is the sliced i-th element of test
# show the results
print(newtest)  # ['8001', '8003', '8005']
P.S.: this script, although not the best, works in Python! Good luck to any newcomer to programming.
I had a similar problem in Java, and here is the solution.
List<String> timeInDays = new ArrayList<>();
timeInDays.add("2d");
timeInDays.add("3d");
timeInDays.add("4d");
I needed to remove the last letter of every string in order to compare them. The solution below worked for me.
List<String> trimmedList = new ArrayList<>();
for (int i = 0; i < timeInDays.size(); i++)
{
    String value = timeInDays.get(i);
    String trimmedString = value.substring(0, value.length() - 1);
    trimmedList.add(trimmedString);
}
System.out.println("List after trimming every string is " + trimmedList);

Python Replace Character In List

I have a Python list that looks like the below:
list = ['|wwwwwwwwadawwwwwwwwi', '|oooooooocFcooooooooi']
I access the letter in the index I want by doing this:
list[y][x]
For example, list[1][10] returns F.
I would like to replace F with a value. Thus changing the string in the list.
I have tried list[y][x] = 'o' but it throws the error:
self.map[y][x] = 'o'
TypeError: 'str' object does not support item assignment
Can anybody help me out? Thanks.
As @Marcin says, Python strings are immutable. If you have a specific character/substring you want to replace, there is str.replace. You can also use lists of characters instead of strings, as described here, if you want to be able to change one particular character.
If you want something like str.replace, but for an index rather than a substring, you can do something like:
def replaceOneCharInString(string, index, newString):
    return string[:index] + newString + string[index+len(newString):]
You would need to do some length checking, though.
Edit: forgot string before the brackets on string[index+len(newString):]. Woops.
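The length checking mentioned above might look like the following sketch. The name replace_at and the choice to raise IndexError are assumptions made for illustration, not part of the original answer.

```python
def replace_at(s, index, new):
    """Return a copy of s with len(new) characters replaced starting at index."""
    # The replacement must start inside the string and must not run past its end.
    if not 0 <= index <= len(s) - len(new):
        raise IndexError("replacement does not fit inside the string")
    return s[:index] + new + s[index + len(new):]


rows = ['|wwwwwwwwadawwwwwwwwi', '|oooooooocFcooooooooi']
# Strings are immutable, so assign the new string back into the list.
rows[1] = replace_at(rows[1], 10, 'o')
print(rows[1])
```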
Since python strings are immutable, they cannot be modified. You need to make new ones. One way is as follows:
tmp_list = list(a_list[1])
tmp_list[10] = 'o' # simulates: list[1][10]='o'
new_str = ''.join(tmp_list)
#Gives |oooooooococooooooooi
# substitute the string in your list
a_list[1] = new_str
As Marcin says, strings are immutable in Python, so you cannot assign to individual characters in an existing string. The reason you can index them is that they are sequences. Thus
for c in "ABCDEF":
    print(c)
will work, and print each character of the string on a separate line.
To achieve what you want, you need to build a new string. For example, here is a brute-force approach to replacing a single character of a string:
def replace_1(s, index, c):
    return s[:index] + c + s[index+1:]
Which you can use thus:
self.map[y] = replace_1(self.map[y], x, 'o')
This will work because self.map is a list, which is mutable by design.
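If many cells are going to change, another option hinted at in the answers is to keep each row as a list of characters and only join the rows back into strings when needed. A minimal sketch, with grid and rendered as made-up names:

```python
rows = ['|wwwwwwwwadawwwwwwwwi', '|oooooooocFcooooooooi']

# Lists are mutable, so each cell can be assigned directly.
grid = [list(row) for row in rows]
grid[1][10] = 'o'

# Join each row back into a string for display or storage.
rendered = [''.join(row) for row in grid]
print(rendered)
```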
Let's use L as the name of the list, since list is a built-in type in Python:
L= ['|wwwwwwwwadawwwwwwwwi', '|oooooooocFcooooooooi']
L[1]='|oooooooococooooooooi'
print(L)
Unfortunately, changing a character inside a string object is not supported. The proper way is to replace the whole string in the list with a new string object.
Output
['|wwwwwwwwadawwwwwwwwi', '|oooooooococooooooooi']

Python: removing specific lines from an object

I have a bit of a weird question here.
I am using iperf to test performance between a device and a server. I get the results of this test over SSH, which I then want to parse into values using a parser that has already been made. However, there are several lines at the top of the results (which I read into an object of lines) that I don't want to go into the parser. I know exactly how many lines I need to remove from the top each time though. Is there any way to drop specific entries out of a list? Something like this in pseudo-Python:
print list
["line1","line2","line3","line4"]
list = list.drop([0 - 1])
print list
["line3","line4"]
If anyone knows anything I could use I would really appreciate you helping me out. The only thing I can think of is writing a loop to iterate through and make a new list, only putting in what I need. Anyway, thanks!
Michael
Slices:
l = ["line1","line2","line3","line4"]
print(l[2:])  # print from index 2 (the third element, included) onwards
['line3', 'line4']
Slice syntax is [from(included):to(excluded):step]. Each part is optional, so you can write [:] to get a copy of the whole list (or of any other sequence, such as a string or a tuple among the built-ins). You can also use negative indexes, so [:-2] means from the beginning up to, but not including, the second-to-last element. You can also step backwards: [::-1] gets everything, but in reversed order.
Also, don't use list as a variable name. It overrides the built-in list class.
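The slice forms described above can be tried out directly. The variable names below are illustrative only.

```python
lines = ["line1", "line2", "line3", "line4"]

tail = lines[2:]         # from index 2 to the end
head = lines[:-2]        # everything except the last two elements
backwards = lines[::-1]  # all elements, reversed
every_other = lines[::2] # step of 2, starting at index 0
copy = lines[:]          # shallow copy of the whole list

print(tail)
print(head)
print(backwards)
print(every_other)
print(copy is lines)
```

Each slice builds a new list, so lines itself is left unchanged.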
This is what the slice operator is for:
>>> before = [1,2,3,4]
>>> after = before[2:]
>>> print(after)
[3, 4]
In this instance, before[2:] says 'give me the elements of the list before, starting at element 2 and all the way until the end.'
(Also, don't use built-in names like list or dict as variable names. They are not reserved words, so Python allows it, but shadowing them can lead to confusing bugs.)
You can use slices for that:
>>> l = ["line1","line2","line3","line4"] # don't use "list" as variable name, it's a built-in.
>>> print(l[2:]) # to discard items up to some point, specify a starting index and no stop point.
['line3', 'line4']
>>> print(l[:1] + l[3:]) # to drop items "in the middle", join two slices.
['line1', 'line4']
Why not use a basic list slice? Something like:
list = list[3:]  # everything from index 3 to the end
You want del for that
del list[:2]
You can use the "del" statement to remove specific entries:
del list[0]    # remove entry 0
del list[0:2]  # remove entries 0 and 1

What is the best way to create a string array in python?

I'm relatively new to Python and its libraries and I was wondering how I might create a string array with a preset size. It's easy in Java, but I was wondering how I might do this in Python.
So far all I can think of is
strs = ['']*size
Somehow, when I try to call string methods on it, the debugger gives me an error: X operation does not exist in object tuple.
And if it was in java this is what I would want to do.
String[] ar = new String[size];
Arrays.fill(ar,"");
Please help.
Error code
strs[sum-1] = strs[sum-1].strip('\(\)')
AttributeError: 'tuple' object has no attribute 'strip'
Question: How might I do what I can normally do in Java in Python while still keeping the code clean.
In python, you wouldn't normally do what you are trying to do. But, the below code will do it:
strs = ["" for x in range(size)]
In Python, the tendency is usually to use a list without a fixed size (that is to say, items can be appended to it or removed from it dynamically). If you follow this, there is no need to allocate a fixed-size collection ahead of time and fill it with empty values. Rather, as you get or create strings, you simply add them to the list. When it comes time to remove values, you simply remove the appropriate value from the list. I would imagine you can probably use this technique here. For example (in Python 2.x syntax):
>>> temp_list = []
>>> print temp_list
[]
>>>
>>> temp_list.append("one")
>>> temp_list.append("two")
>>> print temp_list
['one', 'two']
>>>
>>> temp_list.append("three")
>>> print temp_list
['one', 'two', 'three']
>>>
Of course, some situations might call for something more specific. In your case, a good idea may be to use a deque. Check out the post here: Python, forcing a list to a fixed size. With this, you can create a deque which has a fixed size. If a new value is appended to the end, the first element (head of the deque) is removed and the new item is appended onto the deque. This may work for what you need, but I don't believe this is considered the "norm" for Python.
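The fixed-size behaviour described above is what collections.deque provides through its maxlen argument. A minimal sketch (the name recent and the sample words are illustrative):

```python
from collections import deque

# A deque that never holds more than three items.
recent = deque(maxlen=3)

for word in ["one", "two", "three", "four"]:
    # Once the deque is full, appending on the right
    # silently discards the item at the left end.
    recent.append(word)

print(list(recent))
```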
The simple answer is, "You don't." At the point where you need something to be of fixed length, you're either stuck on old habits or writing for a very specific problem with its own unique set of constraints.
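Another option for creating a string array in Python is the NumPy library.
Example:
import numpy as np
arr = np.chararray((rows, columns))
This creates an array of single-character entries. The entries are not guaranteed to start out as empty strings, so initialize the array using either indexing or slicing before reading from it. The NumPy documentation also describes chararray as existing for backwards compatibility and does not recommend it for new code.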
Are you trying to do something like this?
>>> strs = [s.strip('\(\)') for s in ['some\\', '(list)', 'of', 'strings']]
>>> strs
['some', 'list', 'of', 'strings']
But what is the reason for using a fixed size? There is no real need for fixed-size arrays (lists) in Python: you can always grow a list with append or extend, shrink it with pop, or at least use slicing.
x = ['' for x in range(10)]
strlist =[{}]*10
strlist[0] = set()
strlist[0].add("Beef")
strlist[0].add("Fish")
strlist[1] = {"Apple", "Banana"}
strlist[1].add("Cherry")
print(strlist[0])
print(strlist[1])
print(strlist[2])
print("Array size:", len(strlist))
print(strlist)
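A detail worth knowing when pre-filling a list with the * operator is that every slot refers to the same object. That is harmless for immutable values such as strings, but surprising for mutable ones. A short sketch with illustrative names:

```python
size = 3

# Strings are immutable: "changing" one slot rebinds only that slot.
strs = [''] * size
strs[0] += 'a'

# The same inner list is repeated three times, so one append shows up everywhere.
shared = [[]] * size
shared[0].append('x')

# A comprehension creates a separate inner list for every slot.
independent = [[] for _ in range(size)]
independent[0].append('x')

print(strs)
print(shared)
print(independent)
```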
The error message says it all: strs[sum-1] is a tuple, not a string. If you show more of your code someone will probably be able to help you. Without that we can only guess.
Sometimes I need an empty char array. You cannot use "np.empty(size)" for this, because an error is reported when you fill in characters later. So I usually do something quite clumsy, but it is still one way to do it:
# Suppose you want a size N char array
charlist = [' ']*N # other preset character is fine as well, like 'x'
chararray = np.array(charlist)
# Then you change the content of the array
chararray[somecondition1] = 'a'
chararray[somecondition2] = 'b'
The bad part of this is that your array has default values (if you forget to change them).
import re
def _remove_regex(input_text, regex_pattern):
    # find every match first, then delete each matched word from the text
    findregs = re.finditer(regex_pattern, input_text)
    for i in findregs:
        input_text = re.sub(i.group().strip(), '', input_text)
    return input_text
regex_pattern = r"\buntil\b|\bcan\b|\bboat\b"
_remove_regex("row and row and row your boat until you can row no more", regex_pattern)
\w means that it matches word characters, a|b means match either a or b, \b represents a word boundary
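For this particular pattern, a single re.sub call should give the same removals without the finditer loop. The sketch below also collapses the leftover double spaces, which is an extra step the original function does not perform; the name remove_words is made up.

```python
import re


def remove_words(text, pattern):
    """Delete every match of pattern, then tidy up the whitespace."""
    without = re.sub(pattern, "", text)
    # split() with no arguments drops runs of whitespace left by the removals.
    return " ".join(without.split())


# Same three words as above, grouped so the word boundaries are written once.
pattern = r"\b(?:until|can|boat)\b"
cleaned = remove_words("row and row and row your boat until you can row no more", pattern)
print(cleaned)
```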
If you want to take the strings as input from the user, here is the code.
If each string is given on a new line:
strs = [input() for i in range(size)]
If the strings are separated by spaces:
strs = input().split()
