Use index of first and second repeated index in list - python

There are lots of similar posts out there, but I could not find something that directly matched, or resulted in a solution to, the issue I am dealing with.
I want to use the second instance of a repeated index contained in a list as the index of another list. When the function is executed, I want all numbers from the start of the list up to the first '*' to print after Code1, all numbers between the first '*' and the second '*' to print after Code2, and then all numbers following the second '*' until the end of the list to print after Code3. Example data for digit would be "['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']".
In other words, assuming those digits exist, I want the code below to print User Code: 12345, Pass Code: 6, Pin Code: 789101, all on one line.
print_string += 'User Code: {} '.format(''.join(str(dig) for dig in digit[:digit.index('*')])) + \
'Pass Code: {} '.format(''.join(str(dig) for dig in digit[digit.index('*'):digit.index('*')])) + \
'Pin Code: {} '.format(''.join(str(dig) for dig in digit[digit.index('*'):]))
print(print_string)
Essentially, I would like to call the first asterisk as the right index for User Code, the first asterisk as the left index and the second asterisk as the right index for Pass Code, and the second asterisk as the left index for Pin Code.
I just cannot figure out how to make it look for sequential asterisks. If there is a simpler way to do this, please let me know!

Given
L = ['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']
then
''.join(L)
will form the string
'12345*6*789101'
which you can split into the three parts
parts = ''.join(L).split('*')
and then pull out what you need:
user_code = parts[0]
pass_code = parts[1]
pin = parts[2]
If what you actually have is the digits in a list-like shape inside a string,
"['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']"
it might be worth just having them as a real list; then you can use the join/split method above.
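For the approach the question originally asked about (finding the second asterisk by index), here is a minimal sketch. It relies on the optional start argument of list.index to search past the first separator. The helper name split_codes is an assumption made for illustration, and the sketch assumes the list contains at least two separators.

```python
# Hypothetical helper (name chosen for illustration, not from the original post).
def split_codes(digit, sep='*'):
    """Split a list of strings into three joined codes around two separators."""
    first = digit.index(sep)              # position of the first '*'
    second = digit.index(sep, first + 1)  # search resumes after the first '*'
    user = ''.join(digit[:first])
    passcode = ''.join(digit[first + 1:second])
    pin = ''.join(digit[second + 1:])
    return user, passcode, pin


digit = ['1', '2', '3', '4', '5', '*', '6', '*', '7', '8', '9', '10', '1']
user_code, pass_code, pin_code = split_codes(digit)
print('User Code: {} Pass Code: {} Pin Code: {}'.format(
    user_code, pass_code, pin_code))
```

list.index raises ValueError if a separator is missing, so input that may lack asterisks would need a try/except around the call.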

Related

Parse a list into a list of lists in python

I am trying to figure out how to parse a list into a list of lists.
tileElements = browser.find_element(By.CLASS_NAME, 'tile-container')
tileHTML = (str(tileElements.get_attribute('innerHTML')))
tileNUMS = re.findall(r'\d+', tileHTML)
NumTiles = int(len(tileNUMS)/4)
#parse out list, each 4 list items are one tile
print(str(tileNUMS))
print(str(NumTiles))
TileList = [[i+j for i in range(len(tileNUMS))]for j in range (NumTiles)]
print(str(TileList))
The first part of this code works fine and gives me a list of Tile Numbers:
['2', '3', '1', '2', '2', '4', '4', '2']
However, what I need is a list of lists made out of this and that is where I am getting stuck.
The list of lists should be 4 elements long and look like this:
[['2', '3', '1', '2'] , ['2', '4', '4', '2']]
It should be able to do this for as many tiles as there are in the game (up to 19 I believe). It would be really nice if when the middle numbers are repeated that the two outside numbers are replaced with the latest value from the source list.
You can use a list comprehension to get slices from the list like so.
elements = ['2', '3', '1', '2', '2', '4', '4', '2']
size = 4
result = [elements[i:i+size] for i in range(0, len(elements), size)]
(By the way, there's no need to cast things into str to print them, and tileHTML is probably already a string, too.)
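If the slicing is needed in more than one place, it could be wrapped in a small function. This is only a sketch; the name chunk is an assumption, and the last sub-list is simply shorter when the input length is not a multiple of size.

```python
# Hypothetical helper wrapping the slicing comprehension shown above.
def chunk(items, size):
    """Return consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError('size must be positive')
    return [items[i:i + size] for i in range(0, len(items), size)]


tile_nums = ['2', '3', '1', '2', '2', '4', '4', '2']
tile_list = chunk(tile_nums, 4)
print(tile_list)
```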

How can I use regex to match only one character in Python?

I am trying to process a list of files
file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']
the goal is to find the files whose name has only one character.
I tried this code
r = re.compile('[0-9]')
result_list = list(filter(r.match, file_list))
result_list
and got
['9', '7', '6', '8', '01', '4', '3', '2', '5']
where '01' should not be included.
I made a workaround
tmp = []
for i in file_list:
    if len(i) == 1:
        tmp.append(i)
tmp
and I got
['9', '7', '6', '8', '4', '3', '2', '5']
This is exactly what I want, although the method is ugly.
How can I use regex in Python to accomplish this?
r = re.compile('^[0-9]$')
The ^ matches the beginning of the string and $ matches the end (by default, $ also matches just before a trailing newline).
And if you really want it to match any character, not just numbers, it should be
r = re.compile('^.$')
The . in the regex is a single-character wildcard.
Match a string if it's simply any single character appearing at the beginning of the string (^.) right before the end of the string ($):
^.$
Your Python then becomes:
r = re.compile('^.$')
result_list = list(filter(r.match, file_list))
Your workaround is equivalent to
[i for i in file_list if len(i) == 1]
and this approach handles every case in which a file's name has exactly one character.
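As an alternative to writing the anchors by hand, a sketch using re.fullmatch (available since Python 3.4) is shown below. fullmatch succeeds only when the whole string matches the pattern, so '01' is rejected without ^ or $. The variable names are illustrative.

```python
import re

file_list = ['.DS_Store', '9', '7', '6', '8', '01', '4', '3', '2', '5']

# fullmatch requires the entire name to match, so no anchors are needed.
single_digit = re.compile(r'[0-9]')
result_list = [name for name in file_list if single_digit.fullmatch(name)]
print(result_list)
```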

How to remove the first occurrence of an integer in a list

this is my code:
positions = []
for i in lines[2]:
    if i not in positions:
        positions.append(i)
print (positions)
print (lines[1])
print (lines[2])
the output is:
['1', '2', '3', '4', '5']
['is', 'the', 'time', 'this', 'ends']
['1', '2', '3', '4', '1', '5']
I want the variable "positions" to be ['2','3','4','1','5'],
so instead of removing the second duplicate from the variable "lines[2]" it should remove the first duplicate.
You can reverse your list, create the positions and then reverse it back, as mentioned by @tobias_k in the comments:
lst = ['1', '2', '3', '4', '1', '5']
positions = []
for i in reversed(lst):
    if i not in positions:
        positions.append(i)
list(reversed(positions))
# ['2', '3', '4', '1', '5']
You'll need to first detect what values are duplicated before you can build positions. Use a collections.Counter() object to test if a value has been seen more than once:
from collections import Counter
counts = Counter(lines[2])
positions = []
for i in lines[2]:
    counts[i] -= 1
    if counts[i] == 0:
        # only add if this is the 'last' value
        positions.append(i)
This'll work for any number of repetitions of values; only the last value to appear is ever used.
You could also reverse the list, and track what you have already seen with a set, which is faster than testing against the list:
positions = []
seen = set()
for i in reversed(lines[2]):
    if i not in seen:
        # only add if this is the first time we see the value
        positions.append(i)
        seen.add(i)
positions = positions[::-1]  # reverse the output list
Both approaches require two iterations; the first to create the counts mapping, the second to reverse the output list. Which is faster will depend on the size of lines[2] and the number of duplicates in it, and whether or not you are using Python 3 (where Counter performance was significantly improved).
You can use a dictionary to save the last position of each element and then build a new list from that information:
>>> data=['1', '2', '3', '4', '1', '5']
>>> temp={ e:i for i,e in enumerate(data) }
>>> sorted(temp, key=lambda x:temp[x])
['2', '3', '4', '1', '5']
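A more compact variant of the reverse-and-deduplicate idea is sketched below. It assumes Python 3.7 or later, where dict is guaranteed to preserve insertion order; the function name keep_last_occurrence is made up for illustration.

```python
# Hypothetical helper; relies on dict preserving insertion order (Python 3.7+).
def keep_last_occurrence(items):
    """Drop duplicates, keeping only the last occurrence of each value."""
    # Walking the list backwards makes the last occurrence the first one
    # seen; dict.fromkeys keeps the first key only, then we restore order.
    return list(dict.fromkeys(reversed(items)))[::-1]


data = ['1', '2', '3', '4', '1', '5']
positions = keep_last_occurrence(data)
print(positions)
```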

Getting rid of Characters in CSV file to get mean of columns

I asked for help a while ago and thought I had what I was looking for, but unfortunately I ran into another problem. In my CSV file I have ?'s in place of missing data in some rows of the 13 columns. I have an idea of how to fix it but have yet to succeed in implementing it. My current idea is to use ord and chr to change the ? to 0, but I am not sure how to apply that to a list. This is the error I get:
File "C:\Users\David\Documents\Python\asdf.py", line 46, in <module>
iList_sum[i] += float(ill_data[i])
ValueError: could not convert string to float: '?'
Just so you know, I cannot use NumPy or pandas. I am also trying to refrain from using mapping, since I am trying to keep the code very simple.
import csv
# turn csv files into a list of lists
with open('train.csv','rU') as csvfile:
    reader = csv.reader(csvfile)
    csv_data = list(reader)
# Create two lists to handle the patients
# And two more lists to collect the 'sum' of the columns
# The one that needs to hold the sum 'must' have 0 so we
# can work with them more easily
iList = []
iList_sum = [0,0,0,0,0,0,0,0,0,0,0,0,0]
hList = []
hList_sum = [0,0,0,0,0,0,0,0,0,0,0,0,0]
# Only use one loop to make the process mega faster
for row in csv_data:
    # If row 13 is greater than 0, then place them as unhealthy
    if (row and int(row[13]) > 0):
        # This appends the whole 'line'/'row' for storing :)
        # That's what you want (instead of saving only one cell at a time)
        iList.append(row)
    # If it failed the initial condition (greater than 0), then row 13
    # is either less than or equal to 0. That's simply the logical outcome
    else:
        hList.append(row)
# Use these to verify the data and make sure we collected the right thing
# print iList
# [['67', '1', '4', '160', '286', '0', '2', '108', '1', '1.5', '2', '3', '3', '2'], ['67', '1', '4', '120', '229', '0', '2', '129', '1', '2.6', '2', '2', '7', '1']]
# print hList
# [['63', '1', '1', '145', '233', '1', '2', '150', '0', '2.3', '3', '0', '6', '0'], ['37', '1', '3', '130', '250', '0', '0', '187', '0', '3.5', '3', '0', '3', '0']]
# We can use list comprehension, but since this is a beginner task, let's go with basics:
# Loop through all the 'rows' of the ill patient
for ill_data in iList:
    # Loop through the data within each row, and sum them up
    for i in range(0,len(ill_data) - 1):
        iList_sum[i] += float(ill_data[i])
# Now repeat the process for healthy patient
# Loop through all the 'rows' of the healthy patient
for healthy_data in hList:
    # Loop through the data within each row, and sum them up
    for i in range(0,len(healthy_data) - 1):
        # use healthy_data here (the original line read ill_data by mistake)
        hList_sum[i] += float(healthy_data[i])
# Using list comprehension, I basically go through each number
# In ill list (sum of all columns), and divide it by the length of iList that
# I found from the csv file. So, if there are 22 ill patients, then len(iList) will
# be 22. You can see that the whole thing is wrapped in brackets, so it would show
# as a python list
ill_avg = [ ill / len(iList) for ill in iList_sum]
hlt_avg = [ hlt / len(hList) for hlt in hList_sum]
Here is a screenshot of the CSV file.
Simply check the value you get from the list:
# Loop through the data within each row, and sum them up
qmark_counter = 0
for i in range(0, len(ill_data) - 1):
    if ill_data[i] == '?':
        val = 0
        qmark_counter += 1
    else:
        val = ill_data[i]
    iList_sum[i] += float(val)
And so on for the other ones. There are many other improvements that could be done; for instance, I would put the snippet of code in a function so that it does not have to be repeated multiple times.
EDIT: added the counter for question marks. If you want to keep track of question marks separately for each list, you may want to use a dictionary.
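The suggestion to move the snippet into a function could look roughly like the sketch below. Note that this is a variant, not the code above: it skips '?' cells and divides each column by the number of values actually present, rather than counting a missing cell as 0. The function name column_means, its parameters, and the sample rows are all assumptions made for illustration.

```python
# Hypothetical helper; name, parameters and sample data are illustrative only.
def column_means(rows, n_cols, missing='?'):
    """Mean of the first n_cols columns, ignoring cells equal to `missing`."""
    sums = [0.0] * n_cols
    counts = [0] * n_cols   # number of usable values seen per column
    for row in rows:
        for i in range(n_cols):
            if row[i] != missing:
                sums[i] += float(row[i])
                counts[i] += 1
    # A column with no usable values has no mean; report None for it.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up rows standing in for the patient data.
rows = [['67', '?', '4'], ['63', '1', '?'], ['37', '1', '2']]
means = column_means(rows, 3)
print(means)
```

The same function can then be called once for iList and once for hList, which avoids repeating the loop and the copy-paste mistakes that come with it.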

Randomly extract x items from a list using python

Starting with two lists such as:
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted. For example say I wanted 50% the output would be
newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']
I have achieved this using the following code:
from random import randrange
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
LengthOfList = len(lstOne)
print LengthOfList
PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []
HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake
for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)
print RangeOfListIndices
newlstOne = []
newlstTwo = []
for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])
print newlstOne
print newlstTwo
But I was wondering if there was a more efficient way of doing this; in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
Thank you
Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.
A. The most straightforward approach directly matches your specification:
import random
percentage = float(raw_input('What percentage? '))  # Python 2; use input() on Python 3
k = int(len(list1) * percentage // 100)  # random.sample() needs an integer count
indices = random.sample(xrange(len(list1)), k)  # Python 2; use range() on Python 3
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]
Q. in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
A. In Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method that makes multiple random choices until a bias-free result is found).
In Python 2, the random.sample() function is slightly biased but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.
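For readers on Python 3, the same idea can be sketched as a small function. The name sample_parallel and its parameters are assumptions for illustration; random.sample draws without replacement, so no index is picked twice.

```python
import random


# Hypothetical helper; the name and signature are illustrative only.
def sample_parallel(list_one, list_two, percentage):
    """Pick the same random positions from two equally long lists."""
    if len(list_one) != len(list_two):
        raise ValueError('lists must have the same length')
    k = int(len(list_one) * percentage / 100)  # number of items to draw
    # sample() draws k distinct indices, so no item is chosen twice.
    indices = random.sample(range(len(list_one)), k)
    return ([list_one[i] for i in indices],
            [list_two[i] for i in indices])


lst_one = [str(n) for n in range(1, 11)]
lst_two = list('abcdefghij')
new_one, new_two = sample_parallel(lst_one, lst_two, 50)
print(new_one)
print(new_two)
```

Sampling indices from range(len(list_one)) does not build a list of 145,000 integers first, because range is a lazy sequence in Python 3.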
Just zip your two lists together, use random.sample to do your sampling, then zip again to transpose back into two lists. (On Python 3, wrap the first zip in list(), since random.sample needs a sequence.)
import random
_zips = random.sample(zip(lstOne,lstTwo), 5)
new_list_1, new_list_2 = zip(*_zips)
demo:
list_1 = range(1,11)
list_2 = list('abcdefghij')
_zips = random.sample(zip(list_1, list_2), 5)
new_list_1, new_list_2 = zip(*_zips)
new_list_1
Out[33]: (3, 1, 9, 8, 10)
new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
The way you are doing it looks mostly okay to me.
If you want to avoid sampling the same object several times, you could proceed as follows:
a = len(lstOne)
choose_from = range(a)  # a list of ints of size len(lstOne) (Python 2; use list(range(a)) on Python 3)
random.shuffle(choose_from)
# take the desired number of shuffled indices and append the items at those
# positions from both original lists to the two new lists, in sequence
for i in choose_from[:int(HowManyIndicesToMake)]:
    newlstOne.append(lstOne[i])
    newlstTwo.append(lstTwo[i])
