nextLine().split("\\s+") converted to python - python

Can someone explain to me what
nextLine().split("\\s+")
does, and how I would convert it to Python?
I wanted to use it, but it's in Java.
Thanks

split takes a delimiter, which Java treats as a regular expression, and splits the string it is called on wherever that delimiter matches. Here the regex is simply \s+ (the extra backslash escapes the backslash inside the Java string literal), where \s denotes any whitespace character and + means "one or more". So for the string "Hello world ! ." the output is ["Hello", "world", "!", "."].
In Python, you need to use the re library for this functionality:
re.split(r"\s+", input_str)
Or, just for this specific case (as @Kurt pointed out), input_str.split() will do the trick.
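One difference is worth knowing before swapping one call for the other: the two Python versions disagree when the line has leading or trailing whitespace. A minimal sketch, with a made-up sample string:

```python
import re

line = "  hello   world  "  # hypothetical input with leading/trailing spaces

# re.split keeps the empty strings produced at both ends of the line
regex_parts = re.split(r"\s+", line)

# str.split() with no argument ignores the ends and never yields empty strings
plain_parts = line.split()

# stripping first makes the regex version agree with str.split()
stripped_parts = re.split(r"\s+", line.strip())

print(regex_parts)
print(plain_parts)
print(stripped_parts)
```

As far as I know, Java's split sits in between: it drops trailing empty strings but keeps a leading one, so none of the three is an exact match for every input.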

nextLine() reads the next line of input from the Scanner, and split("\\s+") splits that line into elements around a delimiter, which in this case is the regex \\s+.
The equivalent in Python uses the re module:
import re
s = input()
sub_s = re.split(r"\s+", s)
# hello and welcome everyone
# ['hello', 'and', 'welcome', 'everyone']

code in java
import java.util.*;
public class MyClass {
    public static void main(String args[]) {
        String s = "Hello my Wonderful\nWorld!";
        // nextLine()
        Scanner scanner = new Scanner(s);
        System.out.println("'" + scanner.nextLine() + "'");
        System.out.println("'" + scanner.nextLine() + "'");
        scanner.close();
        // nextLine().split("\\s+")
        scanner = new Scanner(s);
        String str[] = scanner.nextLine().split("\\s+");
        System.out.println("*" + str[2] + "*");
        scanner.close();
    }
}
python
s = "Hello my Wonderful\nWorld!"
o = s.split("\n")
print("'" + o[0] + "'")
print("'" + o[1] + "'")
'''
alternatively, using find:
i = s.find('\n')
print(s[:i])
print(s[i+1:])
e.g. as a generator:
def get_lines(str):
    start = 0
    end = 0
    sub = '\n'
    while True:
        end = str.find(sub, start)
        if end == -1:
            yield str[start:]
            return
        else:
            yield str[start:end]
            start = end + 1
i = iter(get_lines(s))
print("'" + next(i) + "'")
print("'" + next(i) + "'")
'''
o = s.split()
print("*" + o[2] + "*")
output
'Hello my Wonderful'
'World!'
*Wonderful*

Related

How to add a space in hangman game

I'm trying to make a game, where a song name is picked from a file, and the title is replaced with underscores (apart from the first letter)
However I'm not sure how to add a space into it, as some songs are more than one word, this is what I have currently:
def QuizStart():
    line = random.choice(open('songnamefile.txt').readlines())
    line.split('-')
    songname, artist = line.split('-')
    underscoresong = songname
    i=0
    song_name = range(1,len(songname))
    for i in song_name:
        if ' ' in song_name:
            i=i+1
        else:
            underscoresong = underscoresong.replace(songname[i],"_")
            i=i+1
    print(underscoresong, ' - ', artist)
It would be good to include the expected output for a given example input.
You can just multiply a list containing the placeholder character n times, e.g.:
songname = 'My blue submarine'
underscoresong = ''.join([songname[0]] + ['_'] * (len(songname) - 1))
print(underscoresong)
Output:
M________________
That keeps the first character and then adds one underscore for every remaining character of the song name. The join converts the list into a string.
Or if you want to preserve spaces:
underscoresong = ''.join(
[songname[0]] + ['_' if c != ' ' else ' ' for c in songname[1:]]
)
print(underscoresong)
Output:
M_ ____ _________
Or if you want to also preserve the single quote:
songname = "God's Plan-Drake"
underscoresong = ''.join(
[songname[0]] +
['_' if c not in {' ', "'"} else c for c in songname[1:]]
)
print(underscoresong)
Output:
G__'_ __________
You could also use regular expressions:
import re
songname = "God's Plan-Drake"
underscoresong = songname[0] + re.sub(r"[^ ']", '_', songname[1:])
print(underscoresong)
Output:
G__'_ __________
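To tie this back to the original function, here is a minimal sketch that splits off the artist first and masks only the song name. It assumes each line of the file has the form "Song-Artist" with no hyphen inside the song name; the helper name mask_title is made up for illustration.

```python
import re

def mask_title(line):
    """Mask a 'Song-Artist' line, keeping the first letter, spaces and apostrophes."""
    # split only on the first hyphen so the artist part stays intact
    songname, artist = line.strip().split("-", 1)
    # replace everything after the first letter except spaces and apostrophes
    masked = songname[0] + re.sub(r"[^ ']", "_", songname[1:])
    return masked, artist

masked, artist = mask_title("God's Plan-Drake\n")
print(masked, "-", artist)
```

The strip() call removes the newline that readlines() leaves at the end of each line, which would otherwise end up in the artist name.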

String Operation on captured group in re Python

I have a string:
str1 = "abc = def"
I want to convert it to:
str2 = "abc = #Abc#"
I am trying this:
re.sub("(\w+) = (\w+)",r"\1 = %s" % ("#"+str(r"\1").title()+"#"),str1)
but it returns: (without the string operation done)
"abc = #abc#"
What is the possible reason .title() is not working?
How can I apply a string operation to the captured group in Python?
You can see what's going on with the help of a little function:
import re
str1 = "abc = def"
def fun(m):
    print("In fun(): " + m)
    return m
str2 = re.sub(r"(\w+) = (\w+)",
              r"\1 = %s" % ("#" + fun(r"\1") + "#"),
              #                   ^^^^^^^^^^
              str1)
Which yields
In fun(): \1
So what you are actually doing is title-casing the literal string \1 (not the text it will be substituted with!), which obviously remains \1. The \1 is replaced with the captured content only later, after your call to str.title() has already run.
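The order of evaluation can also be made visible by building the replacement template on its own line, before re.sub is involved at all. This is only a small sketch; the variable names template and result are mine:

```python
import re

str1 = "abc = def"

# The template is fully built before re.sub is ever called.
# The string \1 contains no letters, so .title() returns it unchanged.
template = r"\1 = %s" % ("#" + r"\1".title() + "#")
print(template)

# re.sub only sees the finished template and fills in group 1 afterwards.
result = re.sub(r"(\w+) = (\w+)", template, str1)
print(result)
```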
Go with a lambda function as proposed by @Rakesh.
Try using lambda.
Ex:
import re
str1 = "abc = def"
print(re.sub(r"(?P<one>\w+) = (\w+)", lambda match: '{0} = #{1}#'.format(match.group('one'), match.group('one').title()), str1))
Output:
abc = #Abc#

preserving text structure information - pyparsing

Using pyparsing, is there a way to extract the context you are in during recursive descent? Let me explain what I mean. I have the following code:
import pyparsing as pp
openBrace = pp.Suppress(pp.Literal("{"))
closeBrace = pp.Suppress(pp.Literal("}"))
ident = pp.Word(pp.alphanums + "_" + ".")
comment = pp.Literal("//") + pp.restOfLine
messageName = ident
messageKw = pp.Suppress(pp.Keyword("msg"))
text = pp.Word(pp.alphanums + "_" + "." + "-" + "+")
otherText = ~messageKw + pp.Suppress(text)
messageExpr = pp.Forward()
messageExpr << (messageKw + messageName + openBrace +
pp.ZeroOrMore(otherText) + pp.ZeroOrMore(messageExpr) +
pp.ZeroOrMore(otherText) + closeBrace).ignore(comment)
testStr = "msg msgName1 { some text msg msgName2 { some text } some text }"
print(messageExpr.parseString(testStr))
which produces this output: ['msgName1', 'msgName2']
In the output, I would like to keep track of the structure of embedded matches. For example, with the test string above I would like the following output: ['msgName1', 'msgName1.msgName2'], to keep track of the hierarchy in the text. However, I am new to pyparsing and have yet to find a way to extract the fact that "msgName2" is embedded in the structure of "msgName1".
Is there a way to use the setParseAction() method of ParserElement to do this, or maybe using naming of results?
Helpful advice would be appreciated.
Thanks to Paul McGuire for his sage advice. Here are the additions/changes I made, which solved the problem:
msgNameStack = []
def pushMsgName(str, loc, tokens):
    msgNameStack.append(tokens[0])
    tokens[0] = '.'.join(msgNameStack)
def popMsgName(str, loc, tokens):
    msgNameStack.pop()
closeBrace = pp.Suppress(pp.Literal("}")).setParseAction(popMsgName)
messageName = ident.setParseAction(pushMsgName)
And here is the complete code:
import pyparsing as pp
msgNameStack = []
def pushMsgName(str, loc, tokens):
    # entering a message: record its name and emit the dotted path
    msgNameStack.append(tokens[0])
    tokens[0] = '.'.join(msgNameStack)
def popMsgName(str, loc, tokens):
    # leaving a message: drop its name from the path
    msgNameStack.pop()
openBrace = pp.Suppress(pp.Literal("{"))
closeBrace = pp.Suppress(pp.Literal("}")).setParseAction(popMsgName)
ident = pp.Word(pp.alphanums + "_" + ".")
comment = pp.Literal("//") + pp.restOfLine
messageName = ident.setParseAction(pushMsgName)
messageKw = pp.Suppress(pp.Keyword("msg"))
text = pp.Word(pp.alphanums + "_" + "." + "-" + "+")
otherText = ~messageKw + pp.Suppress(text)
messageExpr = pp.Forward()
messageExpr << (messageKw + messageName + openBrace +
                pp.ZeroOrMore(otherText) + pp.ZeroOrMore(messageExpr) +
                pp.ZeroOrMore(otherText) + closeBrace).ignore(comment)
testStr = "msg msgName1 { some text msg msgName2 { some text } some text }"
print(messageExpr.parseString(testStr))
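The push/pop idea does not depend on pyparsing itself. As a rough illustration only, here is the same bookkeeping with a plain whitespace tokenizer instead of a grammar. It assumes every token is separated by spaces and every "}" closes a msg block, which holds for the test string but not in general; the function name qualified_names is made up.

```python
def qualified_names(text):
    """Return dotted names for nested 'msg NAME { ... }' blocks."""
    stack = []   # names of the msg blocks currently open
    found = []   # dotted names, in the order they were seen
    tokens = text.split()
    # pair each token with the one before it to spot names following 'msg'
    for prev, tok in zip([None] + tokens, tokens):
        if prev == "msg":
            stack.append(tok)              # entering a block: push its name
            found.append(".".join(stack))
        elif tok == "}":
            stack.pop()                    # leaving a block: pop its name
    return found

test_str = "msg msgName1 { some text msg msgName2 { some text } some text }"
print(qualified_names(test_str))
```

The two branches correspond to the pushMsgName and popMsgName parse actions: one fires when a message name is seen, the other when its closing brace is seen.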

Python/Pyparsing - Multiline quotes

I'm trying to use pyparsing to match a multiline string that can continue in a similar fashion to those of python:
Test = "This is a long " \
"string"
I can't find a way to make pyparsing recognize this. Here is what I've tried so far:
import pyparsing as pp
src1 = '''
Test("This is a long string")
'''
src2 = '''
Test("This is a long " \
"string")
'''
_lp = pp.Suppress('(')
_rp = pp.Suppress(')')
_str = pp.QuotedString('"', multiline=True, unquoteResults=False)
func = pp.Word(pp.alphas)
function = func + _lp + _str + _rp
print(src1)
print(function.parseString(src1))
print('-------------------------')
print(src2)
print(function.parseString(src2))
The problem is that having a multi-line quoted string doesn't do what you think. A multiline quoted string is literally that -- a string with newlines inside:
import pyparsing as pp
src0 = '''
"Hello
 World
 Goodbye and go"
'''
pat = pp.QuotedString('"', multiline=True)
print(pat.parseString(src0))
The output of parsing this string would be ['Hello\n World\n Goodbye and go'].
As far as I know, if you want a string that's similar to how Python's strings behave, you have to define it yourself:
import pyparsing as pp
src1 = '''
Test("This is a long string")
'''
src2 = '''
Test("This is a long"
"string")
'''
src3 = '''
Test("This is a long" \\
"string")
'''
_lp = pp.Suppress('(')
_rp = pp.Suppress(')')
_str = pp.QuotedString('"')
_slash = pp.Suppress(pp.Optional("\\"))
_multiline_str = pp.Combine(pp.OneOrMore(_str + _slash), adjacent=False)
func = pp.Word(pp.alphas)
function = func + _lp + _multiline_str + _rp
print(src1)
print(function.parseString(src1))
print('-------------------------')
print(src2)
print(function.parseString(src2))
print('-------------------------')
print(src3)
print(function.parseString(src3))
This produces the following output:
Test("This is a long string")
['Test', 'This is a long string']
-------------------------
Test("This is a long"
"string")
['Test', 'This is a longstring']
-------------------------
Test("This is a long" \
"string")
['Test', 'This is a longstring']
Note: The Combine class merges the various quoted strings into a single unit so that they appear as a single string in the output list. The backslash is suppressed so that it isn't combined into the output string.

Search Patterns replacement using lambda

I need to write the text before and after each search-and-replace pattern to a file. I have written the code below. I used a function to write to the output file and it worked fine. But I have around 20 such replacement patterns, and I feel this is not good code because I would need to create a function for each of those replacements. Is there another way to implement this?
import re
Report_file = open("report.txt", "w")
st = '''<TimeLog>
<InTime='10Azx'>1056789</InTime>
<OutTime='14crg'>1056867</OutTime>
<PsTime='32lxn'>1056935</PsTime>
<ClrTime='09zvf'>1057689</ClrTime>
</TimeLog>'''
def tcnv(str):
    Report_file.write("Previous TS: " + str + "\n\n")
    v1 = re.search(r"(?i)<clrtime='(\d+\w+)'>", str)
    val1 = v1.group(1)
    v2 = re.search(r"(?i)(<clrtime='(\d+\w+)'>(.*?)</clrtime>)", str)
    val2 = v2.group(3)
    soutval = "<Clzone><clnvl='" + val1 + "'>" + val2 + "</clnvl></Clzone>"
    Report_file.write("New TS: " + soutval + "\n")
    return soutval
st = re.sub(r"(?i)(<clrtime='(\d+\w+)'>(.*?)</clrtime>)", lambda m: tcnv(m.group(1)), st)
st = re.sub(r"(?i)<intime='(\d+\w+)'>(.*?)</intime>", "<Izone><Invl='\\1'>\\2</Invl></Izone>", st)
st = re.sub(r"(?i)<outtime='(\d+\w+)'>(.*?)</outtime>", "<Ozone><onvl='\\1'>\\2</onnvl></Ozone>", st)
st = re.sub(r"(?i)<pstime='(\d+\w+)'>(.*?)</pstime>", "<Pszone><psnvl='\\1'>\\2</psnvl
I don't see why you set the re.IGNORECASE flag in the inline form (?i), so I don't use it in the following solution; the pattern is written with uppercase letters where your sample requires them.
Note that you should use the with statement to open files; it is far better:
with open('filename.txt','rb') as f:
    ch = f.read()
The answer
import re
st = '''<InTime='10Azx'>1056789</InTime>
<OutTime='14crg'>1056867</OutTime>
<PsTime='32lxn'>1056935</PsTime>
<ClrTime='09zvf'>1057689</ClrTime>
'''
# map each source tag to its (opening, closing) replacement fragments
d = dict(zip(('InTime','OutTime','PsTime','ClrTime'),
             (('Izone><Invl','/Invl></Izone'),
              ('Ozone><onvl','/onnvl></Ozone'),
              ('Pszone><psnvl','/psnvl></Pszone'),
              ('Clzone><clnvl','/clnvl></Clzone'))
             )
         )
def ripl(ma,d=d):
    return "<{}='{}'>{}<{}>".format(d[ma.group(1)][0],
                                    ma.group(2),
                                    ma.group(3),
                                    d[ma.group(1)][1])
st2 = re.sub(r"<(InTime|OutTime|PsTime|ClrTime)='(\d+\w+)'>(.*?)</\1>",
             ripl, st)
print('%s\n\n%s\n' % (st,st2))
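The question also asked for the text before and after each replacement to be written to a report. A single callback can do that for every pattern at once. The following is only a sketch under a few assumptions: the names TAGS, PATTERN and convert are mine, io.StringIO stands in for the real report file, and the closing tags are written to match the opening ones (so onvl rather than onnvl).

```python
import io
import re

# Hypothetical mapping: source tag -> (outer tag, inner tag).
TAGS = {
    "InTime": ("Izone", "Invl"),
    "OutTime": ("Ozone", "onvl"),
    "PsTime": ("Pszone", "psnvl"),
    "ClrTime": ("Clzone", "clnvl"),
}

# One pattern for all tags; \1 forces the closing tag to match the opening one.
PATTERN = re.compile(r"<(%s)='(\d+\w+)'>(.*?)</\1>" % "|".join(TAGS))

def convert(text, report):
    """Rewrite every known tag in text, logging before/after to report."""
    def replace(m):
        outer, inner = TAGS[m.group(1)]
        new = "<{0}><{1}='{2}'>{3}</{1}></{0}>".format(
            outer, inner, m.group(2), m.group(3))
        report.write("Previous TS: %s\nNew TS: %s\n\n" % (m.group(0), new))
        return new
    return PATTERN.sub(replace, text)

report = io.StringIO()  # stands in for open("report.txt", "w")
st = "<InTime='10Azx'>1056789</InTime>\n<ClrTime='09zvf'>1057689</ClrTime>\n"
st2 = convert(st, report)
print(st2)
print(report.getvalue())
```

Adding a twenty-first replacement then means adding one entry to the mapping rather than writing another function.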
