How do I extract certain digits from raw input in Python? - python

Let's say I ask a user for some random letters and numbers, and they give me 1254jf4h. How would I take the letters jfh and put them into a separate variable, and then take the numbers 12544 and put them into another variable?

>>> s="1254jf4h"
>>> num=[]
>>> alpah=[]
>>> for n,i in enumerate(s):
...     if i.isdigit():
...         num.append(i)
...     else:
...         alpah.append(i)
...
>>> alpah
['j', 'f', 'h']
>>> num
['1', '2', '5', '4', '4']
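If the goal is two plain variables rather than two lists, the collected characters can be joined back together. This is only a minimal sketch: it assumes the digits should end up as a single int, and the names letters and number are made up for illustration.

```python
s = "1254jf4h"

digits = [ch for ch in s if ch.isdigit()]   # ['1', '2', '5', '4', '4']
alphas = [ch for ch in s if ch.isalpha()]   # ['j', 'f', 'h']

letters = "".join(alphas)                   # 'jfh'
# int() raises ValueError on an empty string, so this assumes at least one digit
number = int("".join(digits))               # 12544
print(letters, number)
```

Keep the digits as a string instead of calling int() if leading zeros matter.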

A for loop is simple enough. Personally, I would use filter().
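s = "1254jf4h"
# In Python 3, filter() returns an iterator, so join the result back into a string
nums = ''.join(filter(lambda x: x.isdigit(), s))
chars = ''.join(filter(lambda x: x.isalpha(), s))
print(nums)   # 12544
print(chars)  # jfh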

edit: oh well, you already got your answer. Ignore.
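NUMBERS = "0123456789"
LETTERS = "abcdefghijklmnopqrstuvwxyz"
def numalpha(string):
    # Python 3 form: a table built with str.maketrans('', '', chars) deletes chars
    return (string.translate(str.maketrans('', '', NUMBERS)),
            string.translate(str.maketrans('', '', LETTERS)))
print(numalpha("14asdf129h53"))  # ('asdfh', '1412953')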
The function numalpha returns a 2-tuple of strings: the first is the original string with the digits removed (the letters), the second is the original string with the lower-case letters removed (the numbers).
Note that this is inefficient: it traverses the string twice, and it doesn't take advantage of the fact that digits and letters have consecutive ASCII codes. It does have the advantage of being easy to adapt to other encodings.
Note also that I only removed lower-case letters. Yeah, it's not the best code snippet I've ever written xD. Hope it helps anyway.
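For comparison, here is a rough single-pass sketch that also handles upper-case letters. The function name split_digits_letters is made up for this example, and it assumes that anything which is neither a letter nor a digit should simply be dropped.

```python
def split_digits_letters(text):
    """Return (letters, digits) from text in one pass; other characters are dropped."""
    letters, digits = [], []
    for ch in text:
        if ch.isdigit():
            digits.append(ch)
        elif ch.isalpha():      # str.isalpha() accepts upper- and lower-case letters
            letters.append(ch)
    return "".join(letters), "".join(digits)

print(split_digits_letters("14asDF129h53"))   # ('asDFh', '1412953')
```

Note that str.isdigit() and str.isalpha() also accept non-ASCII digits and letters, which may or may not be what you want.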

Related

How can I check a string for two letters or more?

I am pulling data from a table that changes often using Python - and the method I am using is not ideal. What I would like to have is a method to pull all strings that contain only one letter and leave out anything that is 2 or more.
An example of data I might get:
115
19A6
HYS8
568
In this example, I would like to pull 115, 19A6, and 568.
Currently I am using the isdigit() method to determine if it is a digit and this filters out all numbers with one letter, which works for some purposes, but is less than ideal.
Try this:
string_list = ["115", "19A6", "HYS8", "568"]
output_list = []
for item in string_list:  # go through the list of strings
    letter_counter = 0
    for letter in item:  # go through the characters of one string
        if not letter.isdigit():  # count every character that is not a digit
            letter_counter += 1
    if letter_counter < 2:  # strings with more than one letter are left out
        output_list.append(item)
print(output_list)
Output:
['115', '19A6', '568']
Here is a one-liner with a regular expression:
import re
data = ["115", "19A6", "HYS8", "568"]
out = [string for string in data if len(re.sub(r"\d", "", string)) < 2]
print(out)
Output:
['115', '19A6', '568']
This is an excellent case for regular expressions (regex), which are available through the built-in re module.
The code below follows this logic:
1. Define the dataset. Two examples have been added to show that a string containing two alphabetic characters is rejected.
2. Compile a character pattern to be matched. In this case, zero or more digits, followed by zero or one upper-case letter, followed by zero or more digits.
3. Use the filter function to detect matches in the data list and output them as a list.
For example:
import re
data = ['115', '19A6', 'HYS8', '568', 'H', 'HI']
rexp = re.compile(r'^\d*[A-Z]{0,1}\d*$')
result = list(filter(rexp.match, data))
print(result)
Output:
['115', '19A6', '568', 'H']
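Two things to keep in mind with that pattern: every part of it is optional, so it also matches an empty string, and [A-Z] only covers upper-case letters. A possible variant is sketched below. It assumes lower-case letters should count as letters too and that empty strings should be skipped, neither of which is stated in the question.

```python
import re

# Assumption: lower-case letters count too, and empty strings are not wanted.
pattern = re.compile(r'\d*[A-Za-z]?\d*')

data = ['115', '19A6', 'HYS8', '568', 'h', 'HI', '']
# fullmatch() requires the whole string to match, so no ^ or $ is needed
result = [s for s in data if s and pattern.fullmatch(s)]
print(result)   # ['115', '19A6', '568', 'h']
```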
Another solution, without re using str.maketrans/str.translate:
lst = ["115", "19A6", "HYS8", "568"]
d = str.maketrans(dict.fromkeys(map(str, range(10)), ""))
out = [i for i in lst if len(i.translate(d)) < 2]
print(out)
Prints:
['115', '19A6', '568']
a = "19A6"  # sample value, assumed here because the snippet never defines a
z = False
a = str(a)
for I in range(len(a)):
    if a[I].isdigit():
        z = True
        break
    else:
        z = "no digit"
print(z)

masking alphanumeric strings to similar format

I'm trying to impose a format on an alphanumeric string where the digits all become 9s and the letters become As.
e.g. N43563 == A99999
e.g. 2: dhfgb85fb == AAAAA99AA
Something along these lines, in Python. I have tried regex, but it was a bit confusing for me, which is why I'm now asking for assistance.
>>> import re
>>> result1 = re.sub('[a-zA-Z]', 'A', 'dhfgb85fb')
>>> result2 = re.sub('[0-9]', '9', result1)
>>> result2
'AAAAA99AA'
You don't need re for that. If you only want to replace each digit with a 9 and each letter with an A, you can do this:
sample = ['N43563', 'dhfgb85fb']
for s in sample:
    new_s = ''.join(
        '9' if letter.isdigit() else 'A' for letter in s
    )
    print(new_s)
Output:
A99999
AAAAA99AA
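One caveat with the join version: every character that is not a digit becomes an A, including spaces and punctuation. If those should be left alone, a translation table is one option. This is just a sketch, assuming only ASCII letters and digits need masking; MASK and mask are names made up for the example.

```python
import string

# Translation table: every ASCII letter -> 'A', every ASCII digit -> '9'.
MASK = str.maketrans(
    string.ascii_letters + string.digits,
    'A' * len(string.ascii_letters) + '9' * len(string.digits),
)

def mask(text):
    """Mask letters and digits; any other character is left unchanged."""
    return text.translate(MASK)

print(mask('N43563'))       # A99999
print(mask('dhfgb85fb'))    # AAAAA99AA
print(mask('AB-12 x'))      # AA-99 A
```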

How to understand the result of list comprehension of nested lists when the order is reversed?

I'm trying to extract numbers that are mixed in sentences. I am doing this by splitting the sentence into elements of a list, and then I will iterate through each character of each element to find the numbers. For example:
String = "is2 Thi1s T4est 3a"
LP = String.split()
result = []
for e in LP:
    for i in e:
        if i in ('123456789'):
            result += i
This gives me the result I want, which is ['2', '1', '4', '3']. Now I want to write this as a list comprehension. After reading the "List comprehension on a nested list?" post, I understood that the correct code is:
[i for e in LP for i in e if i in ('123456789') ]
My original code for the list comprehension approach was wrong, but I'm trying to wrap my head around the result I get from it.
My original incorrect code, which reversed the order:
[i for i in e for e in LP if i in ('123456789') ]
The result I get from that is:
['3', '3', '3', '3']
Could anyone explain the process that leads to this result please?
Just apply the process from the other post in reverse: write the comprehension out as nested for statements, in the same order as the for clauses appear:
for i in e:
    for e in LP:
        if i in ('123456789'):
            print(i)
The code requires both e and LP to be set beforehand, so the outcome you see depends entirely on other code run before your list comprehension.
If we presume that e was set to '3a' (the last element in LP, left over from your code that ran the full loops), then for i in e will run twice, first with i set to '3'. We then get a nested loop, for e in LP, and given your output, LP is 4 elements long. So that iterates 4 times, and on each iteration i == '3', so the if test passes and '3' is added to the output. The next iteration of for i in e: sets i = 'a'; the inner loop runs 4 times again, but now the if test fails.
However, we can't know for certain, because we don't know what code was run last in your environment that set e and LP to begin with.
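The scenario described above can be reproduced with a short sketch. It assumes the for loop from the question was the last thing to run before the comprehension, which is a guess about your session. In Python 3, the iterable of the first for clause is evaluated in the enclosing scope, which is why the leftover e gets picked up.

```python
String = "is2 Thi1s T4est 3a"
LP = String.split()

# After a plain for loop, the loop variable keeps its last value.
for e in LP:
    pass
print(repr(e))          # '3a'

# "for i in e" reads that leftover e once; "for e in LP" then runs
# 4 times for each character of '3a'.
reversed_order = [i for i in e for e in LP if i in '123456789']
print(reversed_order)   # ['3', '3', '3', '3']

# With the clauses in loop order, e is bound before it is used.
correct_order = [i for e in LP for i in e if i in '123456789']
print(correct_order)    # ['2', '1', '4', '3']
```

If e had never been assigned before the reversed comprehension, it would raise a NameError instead.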
I'm not sure why your original code uses str.split(), then iterates over all the characters of each word. Whitespace would never pass your if filter anyway, so you could just loop directly over the full String value. The if test can be replaced with a str.isdigit() test:
digits = [char for char in String if char.isdigit()]
or even a regular expression:
digits = re.findall(r'\d', String)
and finally, if this is a reordering puzzle, you'd want to split out your strings into a number (for ordering) and the remainder (for joining); sort the words on the extracted number, and extract the remainder after sorting:
# to sort on numbers, extract the digits and turn to an integer
sortkey = lambda w: int(re.search(r'\d+', w).group())
# 'is2' -> 2, 'Thi1s' -> 1, etc.
# sort the words by sort key
reordered = sorted(String.split(), key=sortkey)
# -> ['Thi1s', 'is2', '3a', 'T4est']
# replace digits in the words and join again
rejoined = ' '.join(re.sub(r'\d+', '', w) for w in reordered)
# -> 'This is a Test'
From the question you asked in a comment ("how would you proceed to reorder the words using the list that we got as index?"):
We can use custom sorting to accomplish this. (Note that regex is not required, but makes it slightly simpler. Use any method to extract the number out of the string.)
import re
test_string = 'is2 Thi1s T4est 3a'
words = test_string.split()
words.sort(key=lambda s: int(re.search(r'\d+', s).group()))
print(words) # ['Thi1s', 'is2', '3a', 'T4est']
To remove the numbers:
words = [re.sub(r'\d', '', w) for w in words]
Final output is:
['This', 'is', 'a', 'Test']

Python find element in list that ends with number

I have a list of strings, and I want to find all the strings that end with _1234, where 1234 can be any 4-digit number. Ideally I would find all the matching elements and what the digits actually are, or at least return the first matching element and what its 4 digits are.
For example, I have
['A', 'BB_1024', 'CQ_2', 'x_0510', 'y_98765']
I want to get
['1024', '0510']
So far I have _\d{4}$, which matches _1234 and returns a match object; match_object.group(0) is the actual matched string. But is there a better way to look for _\d{4}$ but only return the \d{4} part, without the _?
Use re.search():
import re
lst = ['A', 'BB_1024', 'CQ_2', 'x_0510']
newlst = []
for item in lst:
    match = re.search(r'_(\d{4})\Z', item)
    if match:
        newlst.append(match.group(1))
print(newlst) # ['1024', '0510']
As for the regex, the pattern matches an underscore and exactly 4 digits at the end of the string, capturing only the digits (note the parens). The captured group is then accessible via match.group(1) (remember that group(0) is the entire match).
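To make the difference between group(0) and group(1) concrete, here is a small sketch. The second half shows a lookbehind as one possible alternative: it requires the underscore without making it part of the match, so group(0) is already just the digits. Whether that counts as "better" is a matter of taste.

```python
import re

m = re.search(r'_(\d{4})\Z', 'BB_1024')
print(m.group(0))   # _1024 (the whole match, underscore included)
print(m.group(1))   # 1024  (only the parenthesised group)

# Alternative: (?<=_) asserts that an underscore comes right before the digits
# but does not consume it.
data = ['A', 'BB_1024', 'CQ_2', 'x_0510', 'y_98765']
digits = [
    hit.group(0)
    for hit in (re.search(r'(?<=_)\d{4}\Z', s) for s in data)
    if hit
]
print(digits)       # ['1024', '0510']
```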
import re
src = ['A', 'BB_1024', 'CQ_2', 'x_0510', 'y_98765', 'AB2421', 'D3&1345']
res = []
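p = re.compile(r'.*\D(\d{4})$')
for s in src:
    m = p.match(s)
    if m:
        res.append(m.group(1))
print(res)  # ['1024', '0510', '2421', '1345']
This works: \D matches any character that is not a digit, so the pattern also accepts strings such as 'AB2421' and 'D3&1345', where the four digits follow something other than an underscore.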
Please show some code next time you ask a question here, even if it doesn't work at all. It makes it easier for people to help you.
If you're interested in a solution without any regex, here's a way with list comprehensions:
>>> data = ['A', 'BB_1024', 'CQ_2', 'x_0510', 'y_98765']
>>> endings = [text.split('_')[-1] for text in data]
>>> endings
['A', '1024', '2', '0510', '98765']
>>> [x for x in endings if x.isdigit() and len(x)==4]
['1024', '0510']
Try this:
[s[-4:] for s in lst if s[-4:].isdigit() and len(s) > 4]
Just check whether the last four characters are all digits.
I added the len(s) > 4 check to correct the mistake Joran pointed out.
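Note that checking only the last four characters also accepts 'y_98765' (as '8765'), which the question wants to exclude. A possible tweak, sketched here on the assumption that exactly four digits must follow an underscore, is to look at the fifth character from the end as well:

```python
lst = ['A', 'BB_1024', 'CQ_2', 'x_0510', 'y_98765', '_123']

found = [
    s[-4:]
    for s in lst
    # the length check keeps s[-5] from raising IndexError on short strings
    if len(s) >= 5 and s[-5] == '_' and s[-4:].isdigit()
]
print(found)   # ['1024', '0510']
```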
Try this code:
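import re
mylist = ['A', 'BB_1024', 'CQ_2', 'x_0510', 'y_98765']  # sample data, assumed from the question
r = re.compile(r".*?([0-9]+)$")
newlist = list(filter(r.match, mylist))  # keeps every string that ends in digits
print(newlist)  # ['BB_1024', 'CQ_2', 'x_0510', 'y_98765']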

Python issue with list and join function

How do I append two-digit integers to a list using a for loop without splitting them into separate digits? For example, I give the computer 10,14,13,15 and I get something like 1,0,1,4,1,3,1,5. I tried to work around this, but I ended up with a new issue: TypeError: sequence item 0: expected string, int found
def GetNumbers(List):
    q=[]
    Numberlist = []
    for i in List:
        if i.isdigit():
            q.append(int(i))
        else:
            Numberlist.append(''.join(q[:]))
            del q[:]
    return Numberlist
The ideal way is to use the str.split() function:
>>> my_num_string = "10,14,13,15"
>>> my_num_string.split(',')
['10', '14', '13', '15']
But since you mentioned you cannot use split(), you can use a regular expression to extract the numbers from the string:
>>> import re
>>> re.findall(r'\d+', my_num_string)
['10', '14', '13', '15']
Otherwise, if you do not want to use anything fancy, you can do it with a simple for loop:
num_str, num_list = '', []
#  ^ Needed for storing the state of the number while iterating over
#    the string character by character
for c in my_num_string:
    if c.isdigit():
        num_str += c
    else:
        num_list.append(num_str)
        num_str = ''
if num_str:  # append the last number, which has no separator after it
    num_list.append(num_str)
The numbers in num_list will be strings. To convert them to int, you can do so explicitly:
num_list = [int(i) for i in num_list] # OR, list(map(int, num_list))
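Another way to group consecutive digits without split() or re is itertools.groupby, which handles the last number without a special case. This is only a sketch, assuming ASCII digits; get_numbers is a name made up for the example.

```python
from itertools import groupby

def get_numbers(text):
    """Return the runs of consecutive digits in text as a list of ints."""
    return [
        int(''.join(run))
        # groupby() yields (key, run) pairs; key is True for a run of digits
        for is_digit, run in groupby(text, key=str.isdigit)
        if is_digit
    ]

print(get_numbers("10,14,13,15"))   # [10, 14, 13, 15]
```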
