Capitalizing letters using for in range loop [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 27 days ago.
I'm trying to solve a problem where a for in range loop goes over a sentence and capitalises the first letter following every ".", "!" and "?", as well as the first letter of the sentence. For example,
welcome! how are you? all the best!
would become
Welcome! How are you? All the best!
I've tried using the len function, but I'm struggling with how to go from finding the position of the punctuation mark to capitalising the letter that follows it.

I would do this with a regex substitution using a callable. There are two cases for the substitution.
The string starts with a lowercase letter: ^[a-z]
There is a ., ! or ? followed by whitespace and a lowercase letter: [.!?]\s+[a-z]
In both cases you can just uppercase the contents of the match. Here's an example:
import re

capitalize_re = re.compile(r"(^[a-z])|([.!?]\s+[a-z])")

def upper_match(m):
    # Uppercase the whole match: either the first letter, or "mark + whitespace + letter"
    return m.group(0).upper()

def capitalize(text):
    return capitalize_re.sub(upper_match, text)
This results in:
>>> capitalize("welcome! how are you? all the best!")
'Welcome! How are you? All the best!'
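Because the question asks specifically for a for-in-range loop, here is a minimal sketch of that approach for comparison. It is not taken from either answer, and the name capitalize_sentences is hypothetical. The loop walks the string by index and keeps a flag recording that the last non-space character was a sentence-ending mark. It only roughly mirrors the regex. It does not require whitespace between the mark and the next letter, and it also capitalizes a first letter that comes after leading spaces.

```python
def capitalize_sentences(text):
    """Capitalize the first letter of the text and the first letter after '.', '!' or '?'."""
    chars = list(text)        # strings are immutable, so work on a list of characters
    capitalize_next = True    # the first letter of the text should be capitalized
    for i in range(len(chars)):
        ch = chars[i]
        if ch in ".!?":
            capitalize_next = True           # a new sentence starts after this mark
        elif capitalize_next and ch.isalpha():
            chars[i] = ch.upper()            # capitalize only this first letter
            capitalize_next = False
        elif not ch.isspace():
            capitalize_next = False          # e.g. the "1" in "3.14": not a sentence end
    return "".join(chars)


print(capitalize_sentences("welcome! how are you? all the best!"))
# Welcome! How are you? All the best!
```

The final elif keeps a decimal point such as the one in "3.14" from capitalizing a later word.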

Welcome to Stack Overflow! So as not to spoil the solution, I will give you two hints instead:
>>> from itertools import tee
>>> def pairwise(iterable):
... "s -> (s0,s1), (s1,s2), (s2, s3), ..."
... a, b = tee(iterable)
... next(b, None)
... return zip(a, b)
...
>>> list(pairwise([1,2,3,4,5,6]))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
>>> list(enumerate("hello"))
[(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o')]
These two functions will greatly help you in solving this problem.

Related

Find indexes of unquoted words in a string using `re.finditer()` method

I'm trying to find the position (index) of unquoted words in a string, but all my tests have been unsuccessful.
For the string: string='foo "bar" baz' I'd like to get
(0, 3) # This for foo
(10, 13) # This for baz
# I'd like to skip the quoted "bar"
However, every regular expression I try includes the quoted 'bar' or parts of it:
string='foo "bar" baz'
_RE_UNQUOTED_VALUES = re.compile(r"([^\"']\w+[^\"'])")
print([m.span() for m in _RE_UNQUOTED_VALUES.finditer(string)])
outputs: [(0, 4), (5, 8), (9, 13)]
Or using:
_RE_UNQUOTED_VALUES = re.compile(r"(?!(\"|'))\w+(?!(\"|'))")
# Outputs [(0, 3), (5, 7), (10, 13)]
Is this not doable with regular expressions? Am I misunderstanding how finditer() works?
You can use
import re
string="foo 'bar' baz"
ms = re.finditer(r"""\b(?<!['"])\w+\b(?!['"])""", string)
print([(x.start(), x.end()) for x in ms])
# => [(0, 3), (10, 13)]
The \b(?<!['"])\w+\b(?!['"]) regex first matches a word boundary. The (?<!['"]) negative lookbehind then fails the match if a ' or " char sits immediately on the left. Next the regex matches one or more word chars and checks the word boundary position again, and the (?!['"]) negative lookahead fails the match if a ' or " char sits immediately on the right.
You can also use:
import re
string='foo "bar" baz'
_RE_UNQUOTED_VALUES = re.compile(r"(?<!['\"\w])\w+(?![\"'\w])")
print([m.span() for m in _RE_UNQUOTED_VALUES.finditer(string)])
Output:
[(0, 3), (10, 13)]
In this case using \w as well as the quote characters in the negative lookarounds forces the engine to match only whole words that are not surrounded by quotes.
How about doing it with .index instead of a regex? This way it's much more readable and extensible. For example:
strstr=lambda s, sub: (s.index(sub), s.index(sub)+len(sub))
strstr('foo "bar" baz', 'foo')
# (0, 3)
strstr('foo "bar" baz', 'baz')
# (10, 13)
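However, if you don't know the input in advance and need to use a regex, @Wiktor's answer is better. Also note that str.index only finds the first occurrence of sub and raises ValueError if it is absent.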
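The lookaround patterns above only inspect the character immediately beside each word. As a result, in 'foo "bar qux quux" baz' they also report the middle word qux, because that word has spaces on both sides. A common alternative technique, not from the answers above, is to match a whole quoted string or a bare word and capture only the bare word. Quoted spans are then consumed and skipped. This is a minimal sketch that assumes there are no escaped quotes and no apostrophes inside words (don't would open a single-quoted span). The names TOKEN_RE and unquoted_word_spans are hypothetical.

```python
import re

# Either a complete quoted string (double or single quotes), or a bare word.
# Only the bare word is captured (group 1), so quoted text is matched and ignored.
TOKEN_RE = re.compile(r"\"[^\"]*\"|'[^']*'|(\w+)")


def unquoted_word_spans(text):
    """Return (start, end) spans of words that are not inside quotes."""
    return [m.span(1) for m in TOKEN_RE.finditer(text) if m.group(1) is not None]


print(unquoted_word_spans('foo "bar" baz'))           # [(0, 3), (10, 13)]
print(unquoted_word_spans('foo "bar qux quux" baz'))  # [(0, 3), (19, 22)]
```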

Why do tuples in a list comprehension need parentheses? [duplicate]

This question already has answers here:
Why does creating a list of tuples using list comprehension requires parentheses?
(2 answers)
Closed 2 years ago.
It is well known that tuples are defined not by parentheses but by commas. Quote from the documentation:
A tuple consists of a number of values separated by commas
Therefore:
myVar1 = 'a', 'b', 'c'
type(myVar1)
# Result:
<type 'tuple'>
Another striking example is this:
myVar2 = ('a')
type(myVar2)
# Result:
<type 'str'>
myVar3 = ('a',)
type(myVar3)
# Result:
<type 'tuple'>
Even a single-element tuple needs a comma, and parentheses are used only to avoid confusion. My question is: why can't we omit the parentheses around the tuples in a list comprehension? For example:
myList1 = ['a', 'b']
myList2 = ['c', 'd']
print([(v1,v2) for v1 in myList1 for v2 in myList2])
# Works, result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
print([v1,v2 for v1 in myList1 for v2 in myList2])
# Does not work, result:
SyntaxError: invalid syntax
Isn't the second list comprehension just syntactic sugar for the following loop, which does work?
myTuples = []
for v1 in myList1:
    for v2 in myList2:
        myTuple = v1,v2
        myTuples.append(myTuple)
print(myTuples)
# Result:
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
Python's grammar was LL(1), meaning that the parser only looks ahead one symbol. CPython switched to a PEG parser in Python 3.9.
[(v1, v2) for v1 in myList1 for v2 in myList2]
Here, the parser sees something like this.
[ # An opening bracket; must be some kind of list
[( # Okay, so a list containing some value in parentheses
[(v1
[(v1,
[(v1, v2
[(v1, v2)
[(v1, v2) for # Alright, list comprehension
However, without the parentheses, it has to make a decision earlier on.
[v1, v2 for v1 in myList1 for v2 in myList2]
[ # List-ish thing
[v1 # List containing a value; alright
[v1, # List containing at least two values
[v1, v2 # Here's the second value
[v1, v2 for # Wait, what?
A backtracking parser can be very slow, so LL(1) parsers do not backtrack. Thus, the ambiguous syntax is forbidden.
As I felt "because the grammar forbids it" to be a little too snarky, I came up with a reason.
The parser begins parsing the expression as a list/set/tuple and expects a ",", but instead it encounters a "for" token.
For example:
$ python3.6 test.py
File "test.py", line 1
[a, b for a, b in c]
^
SyntaxError: invalid syntax
tokenizes as follows:
$ python3.6 -m tokenize test.py
0,0-0,0: ENCODING 'utf-8'
1,0-1,1: OP '['
1,1-1,2: NAME 'a'
1,2-1,3: OP ','
1,4-1,5: NAME 'b'
1,6-1,9: NAME 'for'
1,10-1,11: NAME 'a'
1,11-1,12: OP ','
1,13-1,14: NAME 'b'
1,15-1,17: NAME 'in'
1,18-1,19: NAME 'c'
1,19-1,20: OP ']'
1,20-1,21: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
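The standard tokenize module can produce the same token stream from Python code. This makes it easy to check that "for" arrives exactly where the parser expected "," or "]". The sketch below assumes Python 3. generate_tokens works on str source, and unlike the command-line run above it emits no ENCODING token.

```python
import io
import tokenize

source = "[a, b for a, b in c]\n"

# generate_tokens takes a readline callable that returns str lines
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for name, text in tokens:
    print(name, repr(text))
```

Tokenizing succeeds even though the line is not valid syntax. The error only appears later, at the parsing stage.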
There was no parser issue that motivated this restriction. Contrary to Silvio Mayolo's answer, an LL(1) parser could have parsed the no-parentheses syntax just fine. The parentheses were optional in early versions of the original list comprehension patch; they were only made mandatory to make the meaning clearer.
Quoting Guido van Rossum back in 2000, in a response to someone worried that [x, y for ...] would cause parser issues,
Don't worry. Greg Ewing had no problem expressing this in Python's
own grammar, which is about as restricted as parsers come. (It's
LL(1), which is equivalent to pure recursive descent with one
lookahead token, i.e. no backtracking.)
Here's Greg's grammar:
atom: ... | '[' [testlist [list_iter]] ']' | ...
list_iter: list_for | list_if
list_for: 'for' exprlist 'in' testlist [list_iter]
list_if: 'if' test [list_iter]
Note that before, the list syntax was '[' [testlist] ']'. Let me
explain it in different terms:
The parser parses a series comma-separated expressions. Previously,
it was expecting ']' as the sole possible token following this.
After the change, 'for' is another possible following token. This
is no problem at all for any parser that knows how to parse matching
parentheses!
If you'd rather not support [x, y for ...] because it's ambiguous
(to the human reader, not to the parser!), we can change the grammar
to something like:
'[' test [',' testlist | list_iter] ']'
(Note that | binds less than concatenation, and [...] means an
optional part.)
Also see the next response in the thread, where Greg Ewing runs
>>> seq = [1,2,3,4,5]
>>> [x, x*2 for x in seq]
[(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
on an early version of the list comprehension patch, and it works just fine.
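Current Python 3 releases still enforce the stricter form that Guido offered as the alternative. The sketch below confirms this with the built-in compile(). Compiling only checks syntax, so the undefined names a and b are not a problem. The sketch also uses itertools.product, which builds the same list of tuples as the nested loops in the question. The helper name compiles is hypothetical.

```python
from itertools import product

my_list1 = ['a', 'b']
my_list2 = ['c', 'd']


def compiles(source):
    """Return True if `source` is a syntactically valid expression here."""
    try:
        compile(source, "<check>", "eval")
    except SyntaxError:
        return False
    return True


print(compiles("[(v1, v2) for v1 in a for v2 in b]"))  # True
print(compiles("[v1, v2 for v1 in a for v2 in b]"))    # False: parentheses required

# product() yields the (v1, v2) tuples without writing the nested loops
print(list(product(my_list1, my_list2)))
# [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
```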

Elegant / Efficient way of getting one element and the following in an iterable [duplicate]

This question already has answers here:
Iterate over all pairs of consecutive items in a list [duplicate]
(7 answers)
Closed 6 years ago.
So I have a question: I have an iterable (a string or a list here) like string = "ABCDEFG", and I want to output something like
A-B
B-C
C-D
...
F-G
So I know this works, but it's pretty ugly (the range has to stop one short so that myString[i+1] doesn't raise an IndexError):
for i in range(len(myString) - 1):
    element1 = myString[i]
    element2 = myString[i+1]
    print(element1 + "-" + element2)
Is there a more elegant/Pythonic way of doing that? I think itertools could be a solution, but I don't know how it works.
By the way, I need myString only for this loop, so maybe generators would help (I don't know how to use those either; I'm still learning).
Thanks :)
itertools.izip might be useful here if you zip your sequence with its own tail, which you can get using itertools.islice. This is Python 2 code. In Python 3 izip no longer exists, and the built-in zip is lazy in the same way:
>>> from itertools import islice, izip
>>> iterable = "abcdefg"
>>> for x in izip(iterable, islice(iterable, 1, None)): print x
...
('a', 'b')
('b', 'c')
('c', 'd')
('d', 'e')
('e', 'f')
('f', 'g')
itertools has the following recipe that does exactly what you need (since Python 3.10 it ships as itertools.pairwise):
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
Here is one option, but Frenrich's answer is definitely the best.
I only mention it because it may be helpful in a similar sort of situation.
sequence = 'ABCDEFG'
iter_1 = iter(sequence)
iter_2 = iter(sequence)
next(iter_2) # throw away the first value
for a, b in zip(iter_1, iter_2):
    print(a, b)
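The sketch below produces the A-B output the question asks for. It uses itertools.pairwise where that exists and falls back to the recipe above otherwise. The helper name hyphen_pairs is hypothetical. pairwise consumes its input in a single pass, so this also works when the characters come from a generator rather than a stored string.

```python
try:
    from itertools import pairwise  # built into itertools on Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # Fallback for older Pythons: the itertools documentation recipe
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)


def hyphen_pairs(sequence):
    """Return an 'X-Y' string for each pair of consecutive items."""
    return [f"{a}-{b}" for a, b in pairwise(sequence)]


for line in hyphen_pairs("ABCDEFG"):
    print(line)
```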

Python deep zip

I am trying to write a function like zip. I am not good at explaining what I mean, so I will just show 'code' of what I'm trying to do.
a = [1,2,3,[4,5]]
b = a[:]
zip(a, b) == [(1,1), (2,2), (3,3), ([4,5],[4,5])]
myzip(a, b) == [(1,1), (2,2), (3,3), [(4,4), (5,5)]]
I am so stuck on this it's not even funny. I am trying to write it in a simple functional way with recursive lambdas, to make my code prettier. I want myzip to work like this because I want to use its output with another function I wrote, which maps a function over a tree:
def tree_map(func, tree):
    return map(lambda x: func(x) if not isinstance(x, list) else tree_map(func, x),
               tree)
I have been trying to do something similar to this with zip, but I can't seem to wrap my head around it. Does anyone have any ideas on how i could write myzip?
Edit: Look at tree_map! Isn't that pretty? I think so, at least, but my mother tongue is Scheme :P
Also, I want myzip to go as deep as it needs to. Basically, I want myzip to retain the structure of the trees I pass it. myzip will only handle trees that are the same shape.
I think the following should work:
import collections.abc

def myzip(*args):
    # Recurse while every argument is iterable; otherwise return the leaves as a tuple
    if all(isinstance(arg, collections.abc.Iterable) for arg in args):
        return [myzip(*vals) for vals in zip(*args)]
    return args
Result:
>>> a = [1,2,3,[4,[5,6]]]
>>> b = [1,2,3,[4,[5,6]]]
>>> myzip(a, b)
[(1, 1), (2, 2), (3, 3), [(4, 4), [(5, 5), (6, 6)]]]
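Note that I check for Iterable instead of list, so that the behavior is more like zip() with tuples and other iterables. The class was collections.Iterable in older Pythons. It has lived in collections.abc since Python 3.3, and the old alias was removed in 3.10.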
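One caveat with the Iterable check: strings are iterable too, and iterating a one-character string yields that same string again. If the leaves are strings, myzip therefore recurses until it hits RecursionError. The sketch below treats str and bytes as leaves to avoid this. It also includes a list-returning tree_map for Python 3, where the built-in map is lazy, and shows the two functions combined as the question intends. Both functions are adapted for illustration and are not the original posters' code.

```python
from collections.abc import Iterable


def myzip(*args):
    # Recurse only while every argument is a non-string iterable; strings are
    # leaves, otherwise 'a' would be zipped into ('a', ...) forever.
    if all(isinstance(arg, Iterable) and not isinstance(arg, (str, bytes)) for arg in args):
        return [myzip(*vals) for vals in zip(*args)]
    return args


def tree_map(func, tree):
    # List-returning variant of the question's tree_map (Python 3 map() is lazy).
    # Tuples produced by myzip are not lists, so they are treated as leaves.
    return [tree_map(func, x) if isinstance(x, list) else func(x) for x in tree]


a = [1, 2, 3, [4, [5, 6]]]
b = [10, 20, 30, [40, [50, 60]]]
pairs = myzip(a, b)
print(pairs)                                   # [(1, 10), (2, 20), (3, 30), [(4, 40), [(5, 50), (6, 60)]]]
print(tree_map(lambda p: p[0] + p[1], pairs))  # [11, 22, 33, [44, [55, 66]]]
print(myzip(["ab", "cd"], ["ef", "gh"]))       # [('ab', 'ef'), ('cd', 'gh')]
```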

Python: Analyzing complex statements during execution

I am wondering if there is any way to get some meta information about the interpretation of a python statement during execution.
Let's assume this is a complex statement of some single statements joined with or (A, B, ... are boolean functions)
if A or B and ((C or D and E) or F) or G and H:
and I want to know which part of the statement is causing the statement to evaluate to True so I can do something with this knowledge. In the example, there would be 3 possible candidates:
A
B and ((C or D and E) or F)
G and H
And in the second case, I would like to know if it was (C or D and E) or F that evaluated to True and so on...
Is there any way without parsing the statement? Can I hook up to the interpreter in some way or utilize the inspect module in a way that I haven't found yet? I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.
Edit - further information: The type of application that I want to use this in is a categorizing algorithm that inputs an object and outputs a certain category for this object, based on its attributes. I need to know which attributes were decisive for the category.
As you might guess, the complex statement from above comes from the categorization algorithm. The code for this algorithm is generated from a formal pseudo-code and contains about 3,000 nested if-elif-statements that determine the category in a hierarchical way like
if obj.attr1 < 23 and (is_something(obj.attr10) or eats_spam_for_breakfast(obj)):
return 'Category1'
elif obj.attr3 == 'Welcome Home' or count_something(obj) >= 2:
return 'Category2a'
elif ...
So aside from the category itself, I need to flag the attributes that were decisive for that category, so that if I deleted all other attributes, the object would still be assigned to the same category (due to the ors within the statements). The statements can be really long, up to 1,000 chars, and deeply nested. Every object can have up to 200 attributes.
Thanks a lot for your help!
Edit 2: Haven't found time in the last two weeks. Thanks for providing this solution, it works!
Could you recode your original code:
if A or B and ((C or D and E) or F) or G and H:
as, say:
e = Evaluator()
if e('A or B and ((C or D and E) or F) or G and H'):
...? If so, there's hope!-). The Evaluator class, upon __call__, would compile its string argument, then eval the result with (an empty real dict for globals, and) a pseudo-dict for locals. That pseudo-dict delegates the value lookups to the locals and globals of its caller (it just takes a little black magic, but not too bad;-) and also takes note of which names it has looked up. Given the short-circuiting behavior of Python's and and or, you can infer from the set of names actually looked up which one determined the truth value of the expression (or of each subexpression). In an X or Y or Z, the first true value (if any) will be the last one looked up, and in an X and Y and Z, the first false one will.
Would this help? If yes, and if you need help with the coding, I'll be happy to expand on this, but first I'd like some confirmation that getting the code for Evaluator would indeed be solving whatever problem it is that you're trying to address!-)
Edit: so here's code implementing Evaluator and exemplifying its use:
import inspect
import random

class TracingDict(object):
    # Pseudo-dict used as eval()'s locals: records every name looked up, in order
    def __init__(self, loc, glob):
        self.loc = loc
        self.glob = glob
        self.vars = []
    def __getitem__(self, name):
        try: v = self.loc[name]
        except KeyError: v = self.glob[name]
        self.vars.append((name, v))
        return v

class Evaluator(object):
    def __init__(self):
        # Grab the caller's frame so lookups see its locals and globals
        f = inspect.currentframe()
        f = inspect.getouterframes(f)[1][0]
        self.d = TracingDict(f.f_locals, f.f_globals)
    def __call__(self, expr):
        return eval(expr, {}, self.d)

def f(A, B, C, D, E):
    e = Evaluator()
    res = e('A or B and ((C or D and E) or F) or G and H')
    print('R=%r from %s' % (res, e.d.vars))

for x in range(20):
    # A..E are passed as locals of f(); F, G, H are module globals
    A, B, C, D, E, F, G, H = [random.randrange(2) for x in range(8)]
    f(A, B, C, D, E)
and here's output from a sample run:
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 0), ('G', 1), ('H', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
You can see that often (about 50% of the time) A is true, which short-circuits everything. When A is false, B evaluates -- when B is also false, then G is next, when B is true, then C.
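Evaluator traces plain names, but the question's generated rules read attributes such as obj.attr1. The sketch below applies the same short-circuit idea to attributes. It assumes the rules only touch the object through ordinary attribute reads. The object is wrapped in a proxy that logs each read, so attributes that short-circuiting skipped never appear in the log. AttributeRecorder and categorize are hypothetical names, and the rules are toy stand-ins for the generated code. Note that the log is every attribute read, not only the decisive ones. Here attr1 is read even though its branch failed. As described above for X or Y or Z, the decisive attribute of a winning or-chain is the last one read.

```python
from types import SimpleNamespace


class AttributeRecorder:
    """Hypothetical proxy that logs every attribute read on the wrapped object."""

    def __init__(self, obj):
        self._obj = obj
        self.reads = []  # (attribute name, value) in the order they were read

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself (i.e. not _obj/reads)
        value = getattr(self._obj, name)
        self.reads.append((name, value))
        return value


def categorize(obj):
    # Toy rules in the style of the generated if/elif chain
    if obj.attr1 < 23 and obj.attr2 == 'x':
        return 'Category1'
    elif obj.attr3 == 'Welcome Home' or obj.attr4 >= 2:
        return 'Category2a'
    return 'Other'


rec = AttributeRecorder(SimpleNamespace(attr1=30, attr2='x', attr3='Welcome Home', attr4=5))
print(categorize(rec))  # Category2a
print(rec.reads)        # [('attr1', 30), ('attr3', 'Welcome Home')]
```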
As far as I remember, Python's boolean operators do not return True or False per se:
Important exception: the Boolean
operations or and and always return
one of their operands.
The Python Standard Library - Truth Value Testing
Therefore, following is valid:
A = 1
B = 0
result = B or A # result == 1
The Python interpreter doesn't give you a way to introspect the evaluation of an expression at runtime. The sys.settrace() function lets you register a callback that is invoked for every line of source code, but that's too coarse-grained for what you want to do.
That said, I've experimented with a crazy hack to have the function invoked for every bytecode executed: Python bytecode tracing.
But even then, I don't know how to find the execution state, for example, the values on the interpreter stack.
I think the only way to get at what you want is to modify the code algorithmically. You could either transform your source (though you said you didn't want to parse the code), or you could transform the compiled bytecode. Neither is a simple undertaking, and I'm sure there are a dozen difficult hurdles to overcome if you try it.
Sorry to be discouraging...
BTW: What application do you have for this sort of technology?
I would just put something like this before the big statement (assuming the statement is in a class):
for i in ("A","B","C","D","E","F","G","H"):
    print(i, self.__dict__[i])
"""I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.""": you might need to explain what the difference is between "debug" and "knowing which part".
Do you mean that you, the observer, need to be told at runtime what is going on (why??) so that you can do something different, or do you mean that the code needs to "know" so that it can do something different?
In any case, assuming that your A, B, C etc don't have side effects, why can't you simply split up your or-chain and test the components:
part1 = A
part2 = B and ((C or D and E) or F)
part3 = G and H
whodunit = "1" if part1 else "2" if part2 else "3" if part3 else "nobody"
print("Perp is", whodunit)
if part1 or part2 or part3:
    do_something()
??
Update:
"""The difference between debug and 'knowing which part' is that I need to assign a flag for the variables that were used in the statement that first evaluated to True (at runtime)"""
So you are saying that, given the condition "A or B", if A is True and B is True, A gets all the glory (or all the blame)? I find it very hard to believe that categorisation software such as you describe relies on "or" having short-circuit evaluation. Are you sure there's an intent behind the code being "A or B" and not "B or A"? Could the order be random, or influenced by the order in which the variables were originally input?
In any case, generating Python code automatically and then reverse-engineering it seems a long way around the problem. Why not just generate code in the part1 = yadda; part2 = blah; etc. style?
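Generated code in that style could label each part of an or-chain and report the first one that succeeds. The sketch below generalizes the whodunit line for that purpose. first_matching and which_part are hypothetical names. The sketch assumes each part is wrapped in a zero-argument callable, so that later parts are skipped once one succeeds, as with `or`.

```python
def first_matching(parts):
    """Return the label of the first part whose condition is truthy, or None.

    `parts` is a sequence of (label, zero-argument callable) pairs. The callables
    are evaluated in order and evaluation stops at the first truthy one,
    mirroring how `or` short-circuits.
    """
    return next((label for label, condition in parts if condition()), None)


def which_part(A, B, C, D, E, F, G, H):
    # The three parts of the original or-chain, each kept separately labelled
    parts = [
        ("part1", lambda: A),
        ("part2", lambda: B and ((C or D and E) or F)),
        ("part3", lambda: G and H),
    ]
    return first_matching(parts)


print(which_part(0, 1, 1, 0, 0, 0, 1, 1))  # part2
print(which_part(0, 0, 0, 0, 0, 0, 0, 1))  # None
```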
