I have a list that contains repeated elements, and from it I want to generate a list that has no repeated elements AND also preserves the order in which they first appear.
I tried set(['1','1','2','3','4','4','5','2','2','3','3','6']) and got the output as set(['1', '3', '2', '5', '4', '6'])
But I want the output as set(['1', '2', '3', '4', '5', '6']) i.e. maintain the relative order of the elements already present.
How can I do this? Thanks in advance.
One way to do this:
In [9]: x = ['1','1','2','3','4','4','5','2','2','3','3','6']
In [10]: s = set()
In [11]: y = []
In [12]: for i in x:
    ...:     if i not in s:
    ...:         y.append(i)
    ...:         s.add(i)
    ...:
In [13]: y
Out[13]: ['1', '2', '3', '4', '5', '6']
As noted by Martijn, a set is unordered by definition, so you need a list to store the result. See also this old question.
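A shorter variant of the same idea, as a minimal sketch: it assumes Python 3.7 or later, where dict keys are guaranteed to keep insertion order, and that the elements are hashable. The helper name unique_in_order is made up for this illustration.

```python
def unique_in_order(items):
    """Return the distinct elements of items in first-seen order."""
    # dict keys are unique and (Python 3.7+) remember insertion order,
    # so the first occurrence of each element fixes its position.
    return list(dict.fromkeys(items))


x = ['1', '1', '2', '3', '4', '4', '5', '2', '2', '3', '3', '6']
print(unique_in_order(x))  # ['1', '2', '3', '4', '5', '6']
```

On older interpreters the explicit set-plus-list loop above remains the safer choice.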
def convert_to_list(VertexList):
    VerticesList = []
    items = VertexList.split(';')
    for item in items:
        i = item.split(',')
        SubList = []
        for item in i:
            SubList.append(item)
        VerticesList.append(SubList)
    return VerticesList
This code converts a string in the format shown below into a 2D list. However, I am sure it can be optimized.
Input -> '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
Output -> [['1', '2', '4', '5', '6', '7'], ['2', '3', '4', '5', '6', '7', '8'], ['1', '2', '4', '5', '6', '8']]
Use a comprehension.
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
print([s.split(',') for s in inp.split(';')])
Results in
[['1', '2', '4', '5', '6', '7'], ['2', '3', '4', '5', '6', '7', '8'], ['1', '2', '4', '5', '6', '8']]
This is shorter, easier-to-read code, which is part of the optimization I expect you were looking for. It doesn't loop through things any fewer times, but it executes fewer assignments, uses fewer temporary variables, and makes fewer explicit function calls (i.e. append()). Some of those calls may still happen behind the scenes in the comprehension, but you should take advantage of whatever optimizations Python applies to comprehensions.
--update--
Check out this answer for a performance analysis of the OP and this answer.
-- update 2 --
To convert all strings to int, you can use map or another comprehension.
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
print([list(map(int, s.split(','))) for s in inp.split(';')])
or
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
print([[int(c) for c in s.split(',')] for s in inp.split(';')])
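If both the string and the int variants are needed, the two comprehensions can be folded into one helper. This is only a sketch: the name parse_vertices and its convert parameter are invented for this example and are not part of the original code.

```python
def parse_vertices(text, convert=str):
    """Split 'a,b;c,d' into [[a, b], [c, d]], applying convert to each field."""
    # Outer split on ';' yields the rows, inner split on ',' yields the fields.
    return [[convert(field) for field in row.split(',')]
            for row in text.split(';')]


inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
print(parse_vertices(inp))        # lists of strings
print(parse_vertices(inp, int))   # lists of ints
```

Note that int() raises ValueError on an empty field, so input such as '1,2;' would need extra handling.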
This is not a solution, only a comparison of the actual performance of the two versions above:
from timeit import Timer
code1 = """\
def convert_to_list(VertexList):
    VerticesList = []
    items = VertexList.split(';')
    for item in items:
        i = item.split(',')
        SubList = []
        for item in i:
            SubList.append(item)
        VerticesList.append(SubList)
    return VerticesList
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
convert_to_list(inp)
"""
code2 = """\
inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'
out = [s.split(',') for s in inp.split(';')]
"""
t = Timer(stmt=code1)
time1 = t.timeit() # 1000000 iteration by default
print(f"Original time:{round(time1, 6)} sec.")
t = Timer(stmt=code2)
time2 = t.timeit() # 1000000 iteration by default
print(f"New time: {round(time2, 6)} sec.")
print(f'New solution faster in = {round(time1 / time2, 1)} times')
Output:
Original time:1.812856 sec.
New time: 0.741987 sec.
New solution faster in = 2.4 times
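The timed statement for the original version includes the def itself, so the function is re-created on every iteration. A possible variant, sketched here under the assumption that only the conversion work should be measured, defines both functions once and passes callables to timeit. The function names and the reduced iteration count are choices made for this example; absolute timings will differ by machine.

```python
from timeit import timeit


def convert_loop(text):
    """Nested-loop version, equivalent to the original convert_to_list."""
    result = []
    for row in text.split(';'):
        sub = []
        for field in row.split(','):
            sub.append(field)
        result.append(sub)
    return result


def convert_comprehension(text):
    """List-comprehension version."""
    return [row.split(',') for row in text.split(';')]


inp = '1,2,4,5,6,7;2,3,4,5,6,7,8;1,2,4,5,6,8'

# timeit accepts a zero-argument callable; the defs above are not re-run.
loop_time = timeit(lambda: convert_loop(inp), number=100_000)
comp_time = timeit(lambda: convert_comprehension(inp), number=100_000)
print(f"loop: {loop_time:.3f} s, comprehension: {comp_time:.3f} s")
```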
I tried to sort a list of strings that are actually integers, but I do not get the right order. How do I sort it according to the integer value of each string?
a = ['10', '1', '3', '2', '5', '4']
print(sorted(a))
Output:
['1', '10', '2', '3', '4', '5']
Output wanted:
['1', '2', '3', '4', '5', '10']
Pass a lambda as the key so that each string is converted to an int before sorted compares the elements.
sorted(a,key=lambda i: int(i))
Output :
['1', '2', '3', '4', '5', '10']
A shorter way: sorted(a, key=int). Thanks to @Mark for commenting.
One way to approach this problem is to convert the list to a list of integers, sort it, and then convert the result back to a list of strings.
You could convert the strings to integers, sort them, and then convert back to strings. Example using list comprehensions:
sorted_a = [str(x) for x in sorted(int(y) for y in a)]
More verbose version:
int_a = [int(x) for x in a] # Convert all elements of a to ints
sorted_int_a = sorted(int_a) # Sort the int list
sorted_str_a = [str(x) for x in sorted_int_a] # Convert all elements of int list to str
print(sorted_str_a)
Note: @tedd's solution is the preferred one; I would definitely recommend it over this.
Whenever you have a list of elements and you want to sort using some property of the elements, use the key argument (see the docs).
Here's what it looks like:
>>> a = ['10', '1', '3', '2', '5', '4']
>>> print(sorted(a))
['1', '10', '2', '3', '4', '5']
>>> print(sorted(a, key=lambda el: int(el)))
['1', '2', '3', '4', '5', '10']
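Since int is itself a callable that takes one argument, it can be passed as the key directly, without a lambda. A small sketch that also shows descending order and in-place sorting:

```python
a = ['10', '1', '3', '2', '5', '4']

# key=int compares the elements by integer value; the strings are kept as-is.
print(sorted(a, key=int))                # ['1', '2', '3', '4', '5', '10']
print(sorted(a, key=int, reverse=True))  # ['10', '5', '4', '3', '2', '1']

a.sort(key=int)  # list.sort takes the same key argument and sorts in place
print(a)         # ['1', '2', '3', '4', '5', '10']
```

If any element is not a valid integer literal, int() raises ValueError during the sort.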
I have a list of strings, foo, and another list of integers, bar, which keeps track of important indices in foo.
For example:
foo = ['{}'.format(i) for i in range(1, 11)] # not necessarily of this format
bar = [0, 3, 5]
I would like to create a recipe for creating a list of lists, each list obtained by splitting foo based on indices in bar.
Expected output for the above example:
[['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
For achieving this, I have created the following function which works fine:
# here a is foo and b is bar
result = []
for index, value in enumerate(b):
    if index == len(b) - 1:
        result.append(a[value:])
    elif index == 0 and value != 0:
        result.append(a[0: value])
    else:
        result.append(a[value: b[index + 1]])
However, I find this code highly Non-Pythonic, thanks to my C-Java background.
I would like to know a better solution to this problem (maybe we can use itertools somehow).
You could do as follows:
In [3]: [foo[a:b] for a, b in zip(bar, bar[1:]+[None])]
Out[3]: [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
Here is one way using a list comprehension:
In [107]: bar = bar + [len(foo)] if bar[-1] < len(foo) else bar
In [110]: [foo[i:j] for i, j in zip(bar, bar[1:])]
Out[110]: [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
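Both answers pair each index with the one that follows it. Wrapped as a reusable function, the idea might look like the sketch below; the name split_at is made up here, and it assumes the indices are sorted in ascending order. Anything before the first index is dropped, so include 0 if the head of the list is wanted.

```python
def split_at(items, indices):
    """Split items into consecutive slices that start at each index."""
    # Append len(items) so the last slice runs to the end of the list.
    bounds = list(indices) + [len(items)]
    return [items[start:stop] for start, stop in zip(bounds, bounds[1:])]


foo = [str(i) for i in range(1, 11)]
bar = [0, 3, 5]
print(split_at(foo, bar))
# [['1', '2', '3'], ['4', '5'], ['6', '7', '8', '9', '10']]
```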
this is my code:
positions = []
for i in lines[2]:
    if i not in positions:
        positions.append(i)
print (positions)
print (lines[1])
print (lines[2])
the output is:
['1', '2', '3', '4', '5']
['is', 'the', 'time', 'this', 'ends']
['1', '2', '3', '4', '1', '5']
I want the variable "positions" to be ['2','3','4','1','5'] instead.
That is, rather than dropping the second occurrence of a duplicate in "lines[2]", it should drop the first occurrence and keep the last one.
You can reverse your list, build positions, and then reverse the result, as mentioned by @tobias_k in the comments:
lst = ['1', '2', '3', '4', '1', '5']
positions = []
for i in reversed(lst):
    if i not in positions:
        positions.append(i)
list(reversed(positions))
# ['2', '3', '4', '1', '5']
You'll need to first detect which values are duplicated before you can build positions. Use a collections.Counter() object to track how many occurrences of each value are still to come:
from collections import Counter
counts = Counter(lines[2])
positions = []
for i in lines[2]:
    counts[i] -= 1
    if counts[i] == 0:
        # only add if this is the 'last' value
        positions.append(i)
This'll work for any number of repetitions of values; only the last value to appear is ever used.
You could also reverse the list, and track what you have already seen with a set, which is faster than testing against the list:
positions = []
seen = set()
for i in reversed(lines[2]):
    if i not in seen:
        # only add if this is the first time we see the value
        positions.append(i)
        seen.add(i)
positions = positions[::-1] # reverse the output list
Both approaches require two iterations: in the first approach, one pass to create the counts mapping; in the second, one pass to reverse the output list. Which is faster will depend on the size of lines[2] and the number of duplicates in it, and whether or not you are using Python 3 (where Counter performance was significantly improved).
You can use a dictionary to save the last position of each element and then build a new list from that information:
>>> data=['1', '2', '3', '4', '1', '5']
>>> temp={ e:i for i,e in enumerate(data) }
>>> sorted(temp, key=lambda x:temp[x])
['2', '3', '4', '1', '5']
>>>
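The reverse-and-dedupe idea can also be written without an explicit loop. A minimal sketch, assuming Python 3.7+ (ordered dicts) and hashable elements; the name unique_keep_last is invented for this example:

```python
def unique_keep_last(items):
    """Drop duplicates from items, keeping the last occurrence of each value."""
    # Walking the list backwards makes the last occurrence the first one seen.
    deduped = list(dict.fromkeys(reversed(items)))
    deduped.reverse()  # restore the original left-to-right order
    return deduped


data = ['1', '2', '3', '4', '1', '5']
print(unique_keep_last(data))  # ['2', '3', '4', '1', '5']
```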
Starting with two lists such as:
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
I want the user to input how many items to extract, as a percentage of the overall list length, and then extract the items at the same randomly chosen indices from each list. For example, if I wanted 50%, the output would be
newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']
I have achieved this using the following code:
from random import randrange
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
LengthOfList = len(lstOne)
print LengthOfList
PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []
HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake
for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)
print RangeOfListIndices
newlstOne = []
newlstTwo = []
for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])
print newlstOne
print newlstTwo
But I was wondering if there is a more efficient way of doing this. In my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
Thank you
Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.
A. The most straightforward approach directly matches your specification:
import random
percentage = float(raw_input('What percentage? '))
k = int(len(list1) * percentage // 100)  # random.sample needs an integer count
indices = random.sample(xrange(len(list1)), k)
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]
Q. in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
A. In Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method that makes multiple random choices until a bias-free result is found).
In Python 2, the random.sample() function is slightly biased but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.
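For Python 3, where raw_input and xrange no longer exist, the same approach could be sketched as below. The function name paired_sample and its rng parameter are assumptions made for this example; rng can be any object with a sample() method, such as the random module or a seeded random.Random instance.

```python
import random


def paired_sample(list1, list2, percentage, rng=random):
    """Pick the same random positions from two equal-length lists."""
    if len(list1) != len(list2):
        raise ValueError("lists must have the same length")
    k = int(len(list1) * percentage / 100)      # number of items to keep
    indices = rng.sample(range(len(list1)), k)  # k distinct positions
    return [list1[i] for i in indices], [list2[i] for i in indices]


lst_one = [str(n) for n in range(1, 11)]
lst_two = list('abcdefghij')
new_one, new_two = paired_sample(lst_one, lst_two, 50)
print(new_one)
print(new_two)
```

Because random.sample draws without replacement, no position is picked twice, unlike repeated randrange calls.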
Just zip your two lists together, use random.sample to do your sampling, then zip again to transpose back into two lists.
import random
_zips = random.sample(list(zip(lstOne, lstTwo)), 5)  # list() is needed on Python 3, where zip is lazy
new_list_1, new_list_2 = zip(*_zips)
demo:
list_1 = range(1,11)
list_2 = list('abcdefghij')
_zips = random.sample(list(zip(list_1, list_2)), 5)
new_list_1, new_list_2 = zip(*_zips)
new_list_1
Out[33]: (3, 1, 9, 8, 10)
new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
The way you are doing it looks mostly okay to me.
If you want to avoid sampling the same object several times, you could proceed as follows:
import random
a = len(lstOne)
k = a // 2  # desired number of items, here 50% of the list
choose_from = list(range(a))  # list of all indices 0 .. len(lstOne) - 1
random.shuffle(choose_from)
newlstOne, newlstTwo = [], []
# take the first k shuffled indices: the same random, non-repeating
# locations are used for both original lists
for i in choose_from[:k]:
    newlstOne.append(lstOne[i])
    newlstTwo.append(lstTwo[i])