My program takes a user input and prints it as a word, for example:
>>> x = input()
1
>>> print(x)
one
My actual code:
numbers = ['0','1','2','3','4','5','6','7','8','9']
wordNumbers = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
myDict = dict(zip(numbers, wordNumbers))
myVar = input("Enter a number to be translated: ")  # raw_input() on Python 2
for translate in myVar.split():
    print(myDict[translate])
The problem is that I need the user to input 123 and my program to output one two three, but it doesn't work (it raises a KeyError instead).
I think that if I could somehow put spaces between the digits of 123, like 1 2 3, it would work.
You simply need to use:
for translate in myVar:
Instead of:
for translate in myVar.split():
Iterating over a string gives you its characters one by one, which is what you need.
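A minimal Python 3 sketch of the corrected loop, wrapped in a hypothetical helper named translate_digits (the name is illustrative, not from the question). It reuses the question's dictionary and iterates over the characters directly:

```python
# Map each digit character to its English word, as in the question
numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
wordNumbers = ['zero', 'one', 'two', 'three', 'four',
               'five', 'six', 'seven', 'eight', 'nine']
myDict = dict(zip(numbers, wordNumbers))


def translate_digits(text):
    # Iterating over a string yields one character at a time,
    # so '123' is processed as '1', '2', '3' without any split()
    return ' '.join(myDict[ch] for ch in text)


print(translate_digits('123'))  # one two three
```

Non-digit characters would still raise KeyError here; myDict.get(ch, ch) is one way to pass them through unchanged.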
If you do want to convert '123' to '1 2 3' (which isn't needed here because you don't need to use split), you can use:
' '.join(myVar)
A search on SO for just [regex] gave me 249,446 hits and a search for [regex] inclusion exclusion gave me 47 hits, but I guess none of the latter (maybe some of the former?) fits my case.
I am also aware of resources such as this regex page: https://www.regular-expressions.info/refquick.html
but I suspect there is a regex concept I am not yet familiar with,
and I would be grateful for hints.
Here is a minimal example of what I am trying to do with a given list of strings.
Find all items which:
have a fixed number of characters, i.e. a given length
must include all characters from a certain list (regardless of position or how often they occur)
must NOT include any characters from a certain list
Constructs like [ei^no]{4}, ((?![no])[ei]){4} and many other, more complex attempts didn't give the desired results.
Hence, I currently implement this as a three-step process: checking the length, then the included characters, then the excluded characters. This looks pretty cumbersome and inefficient to me.
Is there a more efficient way to do this?
Script:
import re
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve']
count = 4
mustContain = 'ei' # all of these characters at least once
mustNotContain = 'no' # none of those chars
hits1 = []
for item in items:
    if len(item) == count:
        hits1.append(item)
print("Hits1:", hits1)
hits2 = []
for hit in hits1:
    # every character of mustContain must occur at least once
    if all(re.search(re.escape(c), hit) for c in mustContain):
        hits2.append(hit)
print("Hits2:", hits2)
hits3 = []
regex = '[{}]'.format(re.escape(mustNotContain))
for hit in hits2:
    # keep only items that contain none of the excluded characters
    if not re.search(regex, hit):
        hits3.append(hit)
print("Hits3:", hits3)
Result:
Hits1: ['four', 'five', 'nine']
Hits2: ['five', 'nine']
Hits3: ['five']
If you are interested in a regex approach, you can create a single dynamic pattern that looks like:
^(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$
Explanation
^ Start of string
(?=.{4}$) Assert 4 characters
(?![^no\n]*[no]) Assert no occurrence of n or o to the right using a leading negated character class
(?=[^e\n]*e) Assert an e char to the right
[^i\n]*i Match any char except i and then match i
.* Match the rest of the line
$ End of string
Example
import re
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
hits = [item for item in items if re.match(r"(?=.{4}$)(?![^no\n]*[no])(?=[^e\n]*e)[^i\n]*i.*$", item)]
print(hits)
Output
['five']
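The answer mentions that the pattern can be created dynamically. A minimal sketch of that, assuming each entry is a single character; build_pattern is a hypothetical helper name. One small variation from the pattern above: every required character gets its own lookahead, and .*$ consumes the line at the end, so the last required character is not matched outside a lookahead.

```python
import re


def build_pattern(count, must_contain, must_not_contain):
    # Hypothetical helper: generates a pattern in the same style as above
    parts = [r"^(?=.{%d}$)" % count]                      # exact length
    if must_not_contain:                                   # skip if nothing is excluded
        bad = re.escape(must_not_contain)
        parts.append(r"(?![^%s\n]*[%s])" % (bad, bad))    # no excluded char anywhere
    for c in must_contain:
        good = re.escape(c)
        parts.append(r"(?=[^%s\n]*%s)" % (good, good))    # each required char at least once
    parts.append(r".*$")                                   # consume the rest of the line
    return "".join(parts)


items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven',
         'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
pattern = build_pattern(4, 'ei', 'no')
hits = [item for item in items if re.match(pattern, item)]
print(pattern)
print(hits)  # ['five']
```

re.escape() keeps characters such as ] or ^ literal inside the generated character classes.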
Using a variation of all and a list comprehension:
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
count = 4
mustContain = ["e", "i"] # all of these characters at least once
mustNotContain = ["n", "o"] # none of those chars
hits = [
    item for item in items if
    len(item) == count and
    all(c in item for c in mustContain) and
    all(c not in item for c in mustNotContain)
]
print(hits)
Output
['five']
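The same three conditions can also be written with set operations. This is a swapped-in variation, not part of the answer above: a subset test for the required characters and isdisjoint() for the excluded ones.

```python
items = ['one', 'two', 'three', 'four', 'five', 'six', 'seven',
         'eight', 'nine', 'ten', 'eleven', 'twelve', 'tree']
count = 4
mustContain = set('ei')     # all of these characters at least once
mustNotContain = set('no')  # none of these characters

hits = [
    item for item in items
    if len(item) == count
    and mustContain <= set(item)          # subset test: every required char present
    and mustNotContain.isdisjoint(item)   # no excluded char present
]
print(hits)  # ['five']
```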
Apparently, the "trick" I was missing was the positive lookahead (?=regex).
I think the regex in The fourth bird's solution can be shortened,
unless I overlooked something and somebody proves me wrong.
The regex for the included characters can be generated dynamically.
The regex for the original minimal example of the question would be:
^(?=.{4}$)(?!.*[no])(?=.*e)(?=.*i)
Script: (dynamically generated regex)
import re
items = ['one', 'two', 'three', 'four', 'five', 'six',
'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve',
'tree', 'mean', 'mine', 'fine', 'dime', 'eire']
count = 4
mustContain = 'ei' # all of these characters at least once
mustNotContain = 'no' # none of those chars
hits = []
regex1 = '^(?=.{' + str(count) + '}$)' # limit number of chars
regex2 = '(?!.*[' + re.escape(mustNotContain) + '])' if mustNotContain else '' # excluded chars
regex3 = ''.join(['(?=.*{})'.format(re.escape(c)) for c in mustContain]) # included chars
regex = regex1 + regex2 + regex3
for item in items:
    if re.match(regex, item, re.IGNORECASE):
        hits.append(item)
print("Hits:", hits)
Result:
Hits: ['five', 'dime', 'eire']
I have two lists, a short one and a longer one.
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
I need to search the long list for every word in the short list. If a match is found, stop searching and do something; if not, do something else. The actual list can be quite long, so once a match is found I don't want it to keep looking. The only part I can't figure out is getting it to stop once found. Maybe my search terms are wrong. How do I get it to stop searching once a match is found, and return None if nothing is found? What's the most efficient or Pythonic way of doing this? Here is what I have (the fuzzy search is part of something else):
for name in list1:
    for dict in reversed(list2):
        if fuzz.WRatio(name, dict['Number']) > 90:
            ...  # do something when found
I know I can add what to do when found and then break, but then I'm not sure what to do if it isn't found, except adding another if, and that is starting to seem kludgy.
The pattern you described is often implemented as a function of the form def find(content, pattern) -> offset.
You iterate over the candidates and return the first one matching the pattern, which in your case means checking whether it matches any string in the second list.
When no match is found, this kind of function often returns -1; for example, Python's str.find method returns -1 when nothing is found.
So in your case you may create a function like the following:
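from fuzzywuzzy import fuzz  # assumed source of fuzz.WRatio (thefuzz offers the same API)

def find(candidates, patterns):
    for i, name in enumerate(candidates):
        for entry in reversed(patterns):
            if fuzz.WRatio(name, entry['Number']) > 90:
                return i  # index of the first name that matches a pattern
    return -1  # no name matched any pattern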
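fuzz.WRatio is not in the standard library, so this runnable sketch swaps in difflib.SequenceMatcher.ratio() as a stand-in similarity score (a different algorithm from WRatio, scaled 0 to 1 instead of 0 to 100). It also assumes the patterns are plain strings, as in the example lists, rather than dicts with a 'Number' key.

```python
from difflib import SequenceMatcher


def find(candidates, patterns, threshold=0.9):
    # Return the index of the first candidate that is similar enough to any
    # pattern, or -1 if none is. SequenceMatcher.ratio() stands in for fuzz.WRatio.
    for i, name in enumerate(candidates):
        for pattern in reversed(patterns):
            if SequenceMatcher(None, name, pattern).ratio() > threshold:
                return i  # returning exits both loops immediately
    return -1


list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
print(find(list1, list2))    # 0
print(find(['zzz'], list2))  # -1
```

The early return is what stops the search on the first hit, so no extra flag or second if is needed.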
If I understand correctly, code like this may be what you want:
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
list1_count = 0
for name1 in list1:
    for name2 in list2:
        if name1 == name2:
            list1_count = list1_count + 1
            break
if list1_count == len(list1):
    print("found")
else:
    print("not found")
The lines from list1_count = 0 to break can be replaced (perhaps more Pythonically) with:
list1_count = 0
for name1 in list1:
    if name1 in list2:
        list1_count = list1_count + 1
I'm not sure I understand what you're looking for, but here is something that finds the first value and stops:
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
for l in list1:
    a = list2.index(l)
    break
print(a)
If you want a to be None when nothing is found, try:
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']
a = None
for l in list1:
    try:
        a = list2.index(l)
        break  # stop at the first word that is found
    except ValueError:  # list.index raises ValueError when l is not in list2
        continue
print(a)
The following will tell you if all of the values from list1 are in list2.
all_in = all(val in list2 for val in list1)
If all of the values from list1 are in list2, all_in will be True; otherwise it will be False.
If you wanted, you could use this line directly to control your if-else logic.
if all(val in list2 for val in list1):
    pass  # do thing if match
else:
    pass  # do thing if no match
Edit
If you were looking for the first match of any word in the first list, this might be closer to what you were looking for.
This will give you a True value if there is any match from the first list in the second. Again you can use this for an if statement.
any_in = any(val in list2 for val in list1)
If you need the value of the first match, or a None value if no match is found, this should work.
first_match = next((val for val in list1 if val in list2), None)
That will make use of Python's generators to stop on the very first matching case of any of the words in the first list.
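Since the real list can be quite long, a sketch of one common refinement (an addition, not part of the answer): convert list2 to a set once, so each membership test is an average O(1) hash lookup instead of a linear scan.

```python
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']

# Build the set once; "val in lookup" is then a hash lookup, not a scan of list2
lookup = set(list2)

first_match = next((val for val in list1 if val in lookup), None)
print(first_match)  # one

no_match = next((val for val in ['eleven', 'twelve'] if val in lookup), None)
print(no_match)  # None
```

This only helps for exact matches; fuzzy comparisons still need to look at every candidate.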
Edit 2
I'm fairly sure that the behavior you were trying to describe was nesting the loops.
for val in list1:
    if val in list2:
        pass  # do something
    else:
        pass  # do something else
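For the "stop once found, otherwise do something else" case from the question, Python's for ... else clause avoids the extra if: the else block runs only when the loop finishes without hitting break. A minimal sketch; first_found is a hypothetical name used for illustration.

```python
list1 = ['one', 'two']
list2 = ['ten', 'seven', 'three', 'one', 'eight', 'six', 'nine', 'two', 'four', 'five']


def first_found(words, haystack):
    for val in words:
        if val in haystack:
            result = val  # do something on the first match...
            break         # ...and stop searching
    else:
        # Runs only when the loop finished without hitting break
        result = None
    return result


print(first_found(list1, list2))       # one
print(first_found(['eleven'], list2))  # None
```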
I have a string: string = "12345678"
I want to replace each character of this string with its word.
I have already built the dictionary (it's a requirement). However, I don't know how to replace all of the characters.
Since the dictionary is built, all I have to do is loop over the string and replace each character with its value in the dictionary.
text = [string.replace(x, dictionary[x]) for x in string]
My current output:
it replaces one character at a time and creates 8 different elements in the list, each with only one character replaced.
Example (sorry, I can't show much):
text = ['one2345678', '1two345678', '12three45678', ...'1234567eight']
I don't know why.
My expected output:
text = 'onetwothreefourfivesixseveneight'
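The issue is that you are using a list comprehension: each string.replace() call starts from the original string, so you get one new string per character. Instead, try:
string = "12345678"
text = ''
for x in string:
    text += dictionary[x]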
or
text = "".join(dictionary[x] for x in string)
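A self-contained version of the join approach, assuming a dictionary that maps the digits 1 to 8 to their words (the question's actual dictionary is not shown). It also reproduces the list comprehension from the question to show where the 8 elements come from.

```python
# Assumed digit-to-word dictionary for the digits in the question
dictionary = {'1': 'one', '2': 'two', '3': 'three', '4': 'four',
              '5': 'five', '6': 'six', '7': 'seven', '8': 'eight'}
string = '12345678'

# Look up each character once and concatenate the words in order
text = ''.join(dictionary[x] for x in string)
print(text)  # onetwothreefourfivesixseveneight

# The question's version: each replace() starts again from the original
# string, so every list element has exactly one digit replaced
wrong = [string.replace(x, dictionary[x]) for x in string]
print(len(wrong))  # 8
print(wrong[0])    # one2345678
```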
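import re
s = '12345678'
d = {
    '1': 'one', '2': 'two', '3': 'three', '4': 'four',
    '5': 'five', '6': 'six', '7': 'seven', '8': 'eight',
}
# Replace every digit with its word; digits missing from d are left as-is
print(re.sub(r'\d', lambda m: d.get(m.group(), m.group()), s))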
The regular expression route that SmartManoj recommended is perfect if the thing you want to replace is more than one character, but if you're mapping single characters to arbitrary-length strings, then it's waaaaaay overkill.
You can instead use str.translate alongside str.maketrans:
dictionary = {'1': 'one', '2': 'two', ... }
mapping = str.maketrans(dictionary)
string = '12345678'
text = string.translate(mapping)
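A runnable version of the snippet above with the dictionary written out for the digits 1 to 8 (the answer elides the remaining entries). With a single dict argument, str.maketrans requires single-character keys, while the values may be strings of any length.

```python
dictionary = {'1': 'one', '2': 'two', '3': 'three', '4': 'four',
              '5': 'five', '6': 'six', '7': 'seven', '8': 'eight'}

# Keys must be single characters; values can be strings of any length
mapping = str.maketrans(dictionary)

string = '12345678'
text = string.translate(mapping)
print(text)  # onetwothreefourfivesixseveneight

# Characters without an entry in the table are left unchanged
print('1 and 9'.translate(mapping))  # one and 9
```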
Generator expression solution:
text_in = '12345678 and the rest is not in dic'
replace_dic = {
'1': 'one',
'2': 'two',
'3': 'three',
'4': 'four',
'5': 'five',
'6': 'six',
'7': 'seven',
'8': 'eight',
}
text_out = ''.join(replace_dic.get(c, c) for c in text_in)
print(text_out) # 'onetwothreefourfivesixseveneight and the rest is not in dic'
EDIT: code edited per comment, and print converted to Python3.
I have a list in string form, coming from bash, which looks like this:
inp = "["one","two","three","four","five"]"
The input is coming from a bash script.
In my Python script I would like to convert this to a normal Python list in this format:
["one","two","three","four","five"]
where every element is a string and the whole thing is a list.
I tried list(inp), but it does not work: it splits the string into single characters.
Any suggestions?
Try this code:
import ast
inp = '["one","two","three","four","five"]'
ast.literal_eval(inp)  # returns ['one', 'two', 'three', 'four', 'five']
Have a look at ast.literal_eval:
>>> import ast
>>> inp = '["one","two","three","four","five"]'
>>> converted_inp = ast.literal_eval(inp)
>>> type(converted_inp)
<class 'list'>
>>> print(converted_inp)
['one', 'two', 'three', 'four', 'five']
Note that your original input is not a valid Python string literal, since the string ends right after "[":
>>> inp = "["one","two","three","four","five"]"
SyntaxError: invalid syntax
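Because the items in the bash output are double-quoted, the string is also valid JSON, so the standard json module is another option (an alternative to the answers here, not taken from them). Unlike ast.literal_eval, json.loads rejects single-quoted items.

```python
import json

# Double-quoted items make this a valid JSON array
inp = '["one","two","three","four","five"]'
converted_inp = json.loads(inp)
print(converted_inp)  # ['one', 'two', 'three', 'four', 'five']

# Single-quoted items are not valid JSON
try:
    json.loads("['one','two']")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)  # False
```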
The solution using re.sub() and str.split() functions:
import re
inp = '["one","two","three","four","five"]'
l = re.sub(r'["\]\[]', '', inp).split(',')
print(l)
The output:
['one', 'two', 'three', 'four', 'five']
You can use replace and split as follows:
>>> inp
"['one','two','three','four','five']"
>>> inp.replace('[','').replace(']','').replace('\'','').split(',')
['one', 'two', 'three', 'four', 'five']
I'm writing a Python function to split text into words, ignoring specified punctuation. Here is some working code. I'm not convinced that constructing strings out of lists (buf = [] in the code) is efficient though. Does anyone have a suggestion for a better way to do this?
def getwords(text, splitchars=' \t|!?.;:"'):
    """
    Generator to get words in text by splitting text along specified splitchars
    and stripping out the splitchars::
        >>> list(getwords('this is some text.'))
        ['this', 'is', 'some', 'text']
        >>> list(getwords('and/or', '/'))
        ['and', 'or']
        >>> list(getwords('one||two'))
        ['one', 'two']
        >>> list(getwords('hola único!'))
        ['hola', 'único']
    """
    splitchars = set(splitchars)
    buf = []
    for char in text:
        if char not in splitchars:
            buf.append(char)
        else:
            if buf:
                yield ''.join(buf)
                buf = []
    # All done. Yield last word.
    if buf:
        yield ''.join(buf)
http://www.skymind.com/~ocrow/python_string/ talks about several ways of concatenating strings in Python and assesses their performance as well.
You don't want to use re.split?
import re
re.split("[,; ]+", "coucou1 , coucou2;coucou3")
You can use re.split:
re.split(r'[\s|!?.;:"]', text)
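One difference from the question's generator: splitting on single characters leaves empty strings wherever separators are adjacent (such as "..." or "||"), while getwords skips them. A small sketch of one way to match its output, adding + to the class and filtering out empty strings; the sample text is borrowed from the re.finditer answer.

```python
import re

text = "a:b cc? def...a||"

# Splitting on single characters leaves empty strings between adjacent separators
print(re.split(r'[\s|!?.;:"]', text))

# + merges runs of separators; the filter drops the trailing empty string
words = [w for w in re.split(r'[\s|!?.;:"]+', text) if w]
print(words)  # ['a', 'b', 'cc', 'def', 'a']
```

Note that \s covers all whitespace (including newlines), which is slightly broader than the space and tab in the question's splitchars.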
However, if the text is very large, the resulting list may consume too much memory. In that case, you may consider re.finditer:
import re
def getwords(text, splitchars=' \t|!?.;:"'):
    # Match runs of characters that are not split characters;
    # re.escape() keeps special characters such as ] or ^ literal
    words_iter = re.finditer("[^%s]+" % re.escape(splitchars), text)
    for word in words_iter:
        yield word.group()
# a quick test
s = "a:b cc? def...a||"
words = [x for x in getwords(s)]
assert ["a", "b", "cc", "def", "a"] == words, words
You can split the input using re.split():
>>> splitchars=' \t|!?.;:"'
>>> re.split("[%s]" % splitchars, "one\ttwo|three?four")
['one', 'two', 'three', 'four']
EDIT: If your splitchars may contain special characters like ] or ^, you can use re.escape():
>>> re.escape(splitchars)
'\\ \\\t\\|\\!\\?\\.\\;\\:\\"'
>>> re.split("[%s]" % re.escape(splitchars), "one\ttwo|three?four")
['one', 'two', 'three', 'four']
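A quick check of why escaping matters, using a hypothetical set of split characters: unescaped, ] would close the character class early and a leading ^ would negate it.

```python
import re

# ']' would end the class early and a leading '^' would negate it,
# so the split characters are escaped before building the class
splitchars = '^]-'
pattern = "[%s]" % re.escape(splitchars)
parts = re.split(pattern, "one^two]three-four")
print(parts)  # ['one', 'two', 'three', 'four']
```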