Store output after finding matching string using regex and pexpect

I'm writing a Python script and I am having some trouble figuring out how to get the output of a command I send and store it in a variable. I don't want to store the entire output of that command, though: I only want the rest of one specific line after a certain word.
To illustrate, say I have a command that outputs hundreds of lines that each represent certain details of a specific product.
Color: Maroon Red
Height: 187cm
Number Of Seats: 6
Number Of Wheels: 4
Material: Aluminum
Brand: Toyota
#and hundreds of more lines...
I want to parse the entire output of the command I sent, which prints the details above, and store only the material of the product in a variable.
Right now I have something like:
child.sendline('some command that lists details')
variable = child.expect(["Material: .*"])
print(variable)
child.expect(prompt)
The sendline and expect prompt parts list the details correctly and all, but I'm having trouble figuring out how to parse the output of that command, look for a part that says "Material: " and only store the Aluminum string in a variable.
So instead of variable being 0 (which is what currently gets printed), it should hold and print the word "Aluminum".
Is there a way to do this using regex? I'm trying to get used to using regex expressions so I would prefer a solution using that but if not, I'd still appreciate any help! I'm also editing my code in vim and using linux if that helps.

You only need to look for the substring Material: . To do this, you can place the pattern you want to match (I am using a dot character, which means "match any character except a newline") between a positive lookbehind for Material: and a positive lookahead for \r or \n:
(?<=Material:\s).*(?=[\r\n])
You can find a good explanation for this regex here.
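A quick way to check this in Python. One caveat, assuming the text comes from a pexpect session: a pty usually ends lines with \r\n, and since . also matches \r, the greedy .* in the pattern above would return 'Aluminum\r'. The sketch below uses the lazy .*? so the match stops at the first \r or \n; the sample output is made up to mirror the question.

```python
import re

# Output as a pty would typically deliver it: every line ends in "\r\n".
output = ("Color: Maroon Red\r\n"
          "Height: 187cm\r\n"
          "Material: Aluminum\r\n"
          "Brand: Toyota\r\n")

# Lazy .*? stops at the first \r or \n.
match = re.search(r"(?<=Material:\s).*?(?=[\r\n])", output)
material = match.group(0) if match else None
print(repr(material))  # 'Aluminum'

# The greedy version also swallows the "\r", because . matches it.
greedy = re.search(r"(?<=Material:\s).*(?=[\r\n])", output).group(0)
print(repr(greedy))  # 'Aluminum\r'
```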

As you are using Python, you can use a capture group and store the value in a variable, for example my_var in the example code.
^Material:\s*(.+)
The pattern matches:
^ Start of a line (because re.MULTILINE is used)
Material:\s* Match Material: and optional whitespace chars
(.+) Capture group 1, matching 1+ times any char except a newline
See a regex demo and a Python demo.
For example
import re
regex = r"^Material:\s*(.+)"
s = ("Color: Maroon Red\n"
     "Height: 187cm\n"
     "Number Of Seats: 6\n"
     "Number Of Wheels: 4\n"
     "Material: Aluminum\n"
     "Brand: Toyota \n"
     "#and hundreds of more lines...")
match = re.search(regex, s, re.MULTILINE)
if match:
    my_var = match.group(1)
    print(my_var)
Output
Aluminum
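If you need several fields, the same idea can be wrapped in a small function. This is a sketch under assumptions: extract_field is a hypothetical helper name, and output is assumed to be a str holding the command's output (roughly what pexpect leaves in child.before after child.expect(prompt), either decoded from bytes or read from a spawn created with encoding='utf-8'). [^\r\n]+ is used instead of .+ so a pty's \r line endings are not captured.

```python
import re


def extract_field(output, field):
    """Hypothetical helper: return the text after "<field>:" on its line, or None."""
    # re.escape keeps multi-word field names literal; [ \t]* skips spaces without
    # crossing a line break, and [^\r\n]+ stops before "\r" as well as "\n".
    pattern = rf"^{re.escape(field)}:[ \t]*([^\r\n]+)"
    match = re.search(pattern, output, re.MULTILINE)
    return match.group(1).rstrip() if match else None


# Sample output with pty-style "\r\n" line endings.
output = ("Color: Maroon Red\r\n"
          "Height: 187cm\r\n"
          "Number Of Seats: 6\r\n"
          "Material: Aluminum\r\n"
          "Brand: Toyota \r\n")

print(extract_field(output, "Material"))         # Aluminum
print(extract_field(output, "Number Of Seats"))  # 6
print(extract_field(output, "Weight"))           # None
```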

Related

REGEX: how do I get the name before the character ":"

I'm using Python to extract some info.
I want to get the words/names before the character :
but the problem is that everything is tied together.
From here
Morgan Stanley.Erik Woodring:
I just want to extract "Erik Woodring:"
or from here
market.Operator:
I just want to extract "Operator:"
Sometimes there are questions like this
to acquire?Tim Cook:
I just want to extract "Tim Cook:"
This is what I tried:
\w*(?=.*:)
This is not getting what I wanted; it's returning a lot of words.

This could be the regex you're looking for:
\b[\w\s]+(?=:)
\b word boundary;
[\w\s]+ matches one or more word characters or whitespace characters;
(?=:) positive lookahead that requires the match to be followed by a colon;
https://regex101.com/r/w86oWv/1
If you want to get the ":" too, you can simply remove the lookahead:
\b[\w\s]+:
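A short check in Python on the three samples from the question. search() returns the first match in each string, and the pattern used is the variant that keeps the colon.

```python
import re

samples = [
    "Morgan Stanley.Erik Woodring:",
    "market.Operator:",
    "to acquire?Tim Cook:",
]

# \b anchors at a word boundary, [\w\s]+ takes word characters and spaces
# (but not "." or "?"), and the final ":" is kept in the match.
pattern = re.compile(r"\b[\w\s]+:")

speakers = [pattern.search(s).group(0) for s in samples]
print(speakers)  # ['Erik Woodring:', 'Operator:', 'Tim Cook:']
```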

Going in reverse with RegEx

I'm writing a Python script and I need to extract two pieces of information from the following text:
The user XXXXXXXX (XXXXXXX#XXXXXX.com) was involved in an impossible travel incident. The user connected from two countries within 102 minutes, from these IP addresses: Country1 (111.111.111.111) and Country2 (222.222.222.222). Another irrelevant stuff...
I need "Country1" and "Country2". I already extracted the IPs so I can look for them in my expression.
With this regex: (?> )(.*)(?= \(111\.111\.111\.111)
I take all this:
The user XXXXXXXX (XXXXXXX#XXXXXX.com) was involved in an impossible travel incident. The user connected from two countries within 102 minutes, from these IP addresses: Country1
Is there a way to go backward from the IP and stop at the first space, so that I get just "Country1"?
Or does anyone know a better way to extract "Country1" and "Country2" with a regex or directly with Python?

You can use
\S+(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\))
See the regex demo.
Details:
\S+ - one or more non-whitespace chars
(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\)) - a positive lookahead that requires the following pattern to appear immediately at the right of the current location:
\s* - zero or more whitespaces
\( - a ( char
\d{1,3}(?:\.\d{1,3}){3} - one to three digits and then three repetitions of . and one to three digits
\) - a ) char.
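A sketch applying the lookahead pattern with re.findall to the message from the question (shortened at the end). The second pattern is an addition, not part of the answer: it wraps the country and the IP in capturing groups so findall returns (country, IP) pairs, which can be handy since the IPs are already known.

```python
import re

text = ("The user XXXXXXXX (XXXXXXX#XXXXXX.com) was involved in an impossible travel "
        "incident. The user connected from two countries within 102 minutes, from these "
        "IP addresses: Country1 (111.111.111.111) and Country2 (222.222.222.222).")

# The answer's pattern: a run of non-spaces followed by "(IPv4-like address)".
country_re = re.compile(r"\S+(?=\s*\(\d{1,3}(?:\.\d{1,3}){3}\))")
print(country_re.findall(text))  # ['Country1', 'Country2']

# Variant with two capturing groups, so findall returns (country, ip) tuples.
pair_re = re.compile(r"(\S+)\s*\((\d{1,3}(?:\.\d{1,3}){3})\)")
print(dict(pair_re.findall(text)))
# {'Country1': '111.111.111.111', 'Country2': '222.222.222.222'}
```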

If your message pattern is always the same you can get the countries like this using Python:
your_string = 'The user XXXXXXXX (XXXXXXX#XXXXXX.com) ...'
# In the full message, the text after ': ' is "Country1 (...) and Country2 (...). ..."
parts = your_string.split(': ')[1].split(' and ')
first_country = parts[0].split(' (')[0]
second_country = parts[1].split(' (')[0]

With your shown samples, please try the following regex, written and tested in Python 3. I am using Python 3's re library and its findall function here.
import re
var="""...""" ##Place your value here.
re.findall(r'(\S+)\s\((?:\d{1,3}\.){3}\d{1,3}\)',var)
['Country1', 'Country2']
Here is the online demo for the regex used above.

Why does Regex finditer only return the first result

My string is a transcript. I want to capture the speaker, specifically their second name (which should only match when it is fully capitalised).
Additionally, I want to match their speech until the next speaker begins. Eventually I want to loop this process over a huge text file.
The problem is that the match only returns one match object, even though there are two different speakers. Also, I have tried an online regex tester with the Python flavor; however, it returns very different results (not sure why).
str = 'Senator BACK\n (Western Australia) (21:15): This evening I had the pleasure (...) Senator DAY\n (South Australia) (21:34): Well, what a week it h(...) '
pattern = re.compile("(:?(Senator|Mr|Dr)\s+([A-Z]{2,})\s*(\(.+?\))\s+(\(\d{2}:\d{2}\):)(.*))(?=Senator)")
for match in re.finditer(pattern, str):
    print(match)
I want 2 match objects, both having a group for their surname and their speech. It's also important to note that I have used online regex debuggers, but the Python flavor gives different results from Python in my terminal.

Just replace the regex with:
(:?(Senator|Mr|Dr)\s+([A-Z]{2,})\s*(\(.+?\))\s+(\(\d{2}:\d{2}\):)(.*))(?=Senator|$)
demo: https://regex101.com/r/gJDaWM/1/
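With your current regex, the positive lookahead forces every match to be followed by Senator, so the last speech in the string (which has no speaker after it) can never match. Adding |$ lets a match also end at the end of the string.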
You might actually have to change the positive lookahead into:
(?=Senator|Mr|Dr|$)
if you want to take into account Mr and Dr on top of Senator.
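A runnable version of the fix, assuming the transcript fragment from the question. The pattern is written as a raw string so the backslashes reach the regex engine unchanged. Note that (:? in the original pattern is a capturing group that begins with an optional colon, not the non-capturing (?:, so the surname and the speech end up in groups 3 and 6.

```python
import re

transcript = ('Senator BACK\n (Western Australia) (21:15): This evening I had the pleasure (...) '
              'Senator DAY\n (South Australia) (21:34): Well, what a week it h(...) ')

# The answer's pattern with the widened lookahead (Senator, Mr, Dr or end of string).
pattern = re.compile(
    r"(:?(Senator|Mr|Dr)\s+([A-Z]{2,})\s*(\(.+?\))\s+(\(\d{2}:\d{2}\):)(.*))"
    r"(?=Senator|Mr|Dr|$)"
)

# Group 3 is the surname, group 6 the speech (strip the surrounding spaces).
speeches = [(m.group(3), m.group(6).strip()) for m in pattern.finditer(transcript)]
for surname, speech in speeches:
    print(surname, "->", speech)
# BACK -> This evening I had the pleasure (...)
# DAY -> Well, what a week it h(...)
```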

How to make regex that matches a number with commas for every three digits?

I am a beginner in Python and regular expressions, and I am now trying to solve an exercise that goes like this:
How would you write a regex that matches a number with commas for
every three digits? It must match the following:
'42'
'1,234'
'6,368,745'
but not the following:
'12,34,567' (which has only two digits between the commas)
'1234' (which lacks commas)
I thought it would be easy, but I've already spent several hours and still don't have the right answer. Even the answer given in the book for this exercise doesn't work at all (the pattern in the book is ^\d{1,3}(,\d{3})*$).
Thank you in advance!

The answer in your book seems correct to me. It also works on the test cases you have given.
(^\d{1,3}(,\d{3})*$)
The ^ symbol anchors the match at the start of the line. \d{1,3} says there must be at least one digit but no more than three, so
1234,123
will not match.
(,\d{3})*$
This part says that the rest of the line, up to the end, consists of zero or more groups of a comma followed by exactly three digits.
Maybe the answer you are looking for is this:
(^\d+(,\d{3})*$)
which matches a number with commas for every three digits without limiting the part before the first comma to three digits. Note that it therefore also matches '1234', which the exercise says must be rejected.

You can go with this (which is a slightly improved version of what the book specifies):
^\d{1,3}(?:,\d{3})*$
Demo on Regex101
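To check a pattern against the exercise's cases, re.fullmatch (Python 3.4+) is a convenient harness: it requires the whole string to match, which has the same effect as the ^ and $ anchors. A minimal sketch using the pattern above:

```python
import re

# fullmatch() makes ^...$ implicit, so the whole string must be a well-formed number.
number_re = re.compile(r"\d{1,3}(?:,\d{3})*")

should_match = ["42", "1,234", "6,368,745"]
should_not_match = ["12,34,567", "1234"]

for s in should_match + should_not_match:
    print(repr(s), bool(number_re.fullmatch(s)))
# '42' True, '1,234' True, '6,368,745' True, '12,34,567' False, '1234' False
```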

I got it to work by putting everything between the caret and the dollar sign in parentheses, like so: re.compile(r'^(\d{1,3}(,\d{3})*)$')
but I find this regex pretty useless, because you can't use it to find these numbers inside a document: ^ and $ force the whole string (or, with re.MULTILINE, the whole line) to be exactly the number.

# This program validates the regular expression for this scenario.
# Any properly formatted number (with commas) will match.
# Parsing through a document for this regex is beyond my capability at this time.
import re
print('Type a number with commas')
sentence = input()
pattern = re.compile(r'\d{1,3}(,\d{3})*')
matches = pattern.match(sentence)
# Fail if nothing matched at all, or if only a prefix of the input matched.
if matches is None or matches.group(0) != sentence:
    print('Does Not Match the Regular Expression!')
else:
    # The whole input matched, so state verification.
    print(matches.group(0) + ' matches the pattern.')

The simple answer is:
^\d{1,2}(,\d{3})*$
^\d{1,2} - should start with a number and matches 1 or 2 digits.
(,\d{3})*$ - once a ',' is passed, it requires 3 digits.
Works for all the scenarios listed in the book (though it rejects a three-digit leading group, so '123' and '123,456' do not match).
Test your scenarios on https://pythex.org/

I also went down the rabbit hole trying to write a regex that solves the question in the book. The question does not assume that each line is such a number; that is, there might be multiple such numbers on the same line, and there might be some kind of quotation marks around the number (as in the question text). The solution provided in the book, on the other hand, makes those assumptions: (^\d{1,3}(,\d{3})*$)
I tried to use the question text as input and ended up with the following pattern (written for re.VERBOSE, since it spans several lines), which is way too complicated:
r'''(
(?:(?<=\s)|(?<=[\'"])|(?<=^))
\d{1,3}
(?:,\d{3})*
(?:(?=\s)|(?=[\'"])|(?=$))
)'''
(?:(?<=\s)|(?<=[\'"])|(?<=^)) is a non-capturing group that allows
the number to start after \s characters, ', ", or the start of the text.
(?:,\d{3})* is a non-capturing group to avoid capturing, for example, 123 in 12,123.
(?:(?=\s)|(?=[\'"])|(?=$)) is a non-capturing group that allows
the number to end before \s characters, ', ", or the end of the text (no newline case).
Obviously you could extend the list of allowed characters around the number.
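Because the pattern spans several lines with indentation, it has to be compiled with re.VERBOSE, which ignores the layout whitespace. A sketch that runs it over the exercise text from the question with re.findall; since the outer parentheses form the only capturing group, findall returns the whole numbers.

```python
import re

# The answer's pattern, unchanged; re.VERBOSE ignores the newlines and indentation.
number_re = re.compile(r'''(
    (?:(?<=\s)|(?<=[\'"])|(?<=^))
    \d{1,3}
    (?:,\d{3})*
    (?:(?=\s)|(?=[\'"])|(?=$))
)''', re.VERBOSE)

question = ("It must match the following: '42' '1,234' '6,368,745' "
            "but not the following: '12,34,567' (which has only two digits "
            "between the commas) '1234' (which lacks commas)")

print(number_re.findall(question))  # ['42', '1,234', '6,368,745']
```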

How do I use regular expressions in Python with placeholder text?

I am doing a project in Python where I require a user to input text. If the text matches a format supported by the program, it will output a response that includes a user's key word (it is a simple chat bot). The format is stored in a text file as a user input format and an answer format.
For example, the text file looks like this, with user input on the left and output on the right:
my name is <-name> | Hi there, <-name>
So if the user writes my name is johnny, I want the program to know that johnny is the <-name> variable, and then to print the response Hi there, johnny.
Some prodding me in the right direction would be great! I have never used regular expressions before and I read an article on how to use them, but unfortunately it didn't really help me since it mainly went over how to match specific words.

Here's an example:
import re
io = [
    (r'my name is (?P<name>\w+)', 'Hi there, {name}'),
]
string = input('> ')
for regex, output in io:
    match = re.match(regex, string)
    if match:
        print(output.format(**match.groupdict()))
        break
I'll take you through it:
'my name is (?P<name>\w+)'
(?P<name>...) stores the following part (\w+) under the name name in the match object which we're going to use later on.
match = re.match(regex, string)
This looks for the regex in the input given. Note that re.match only matches at the beginning of the input; if you don't want that restriction, use re.search here instead.
If it matches:
output.format(**match.groupdict())
match.groupdict returns a dictionary of keys defined by (?P<name>...) and their associated matched values. ** passes those key/values to .format, in this case Python will translate it to output.format(name='matchedname').
To construct the io list from a file, do something like this:
io = []
with open('input.txt') as file_:
    for line in file_:
        # Drop the trailing newline, then split on the last ' | '.
        key, value = line.rstrip('\n').rsplit(' | ', 1)
        io.append((key, value))
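The question's file uses <-name> placeholders rather than regex syntax. A minimal sketch of bridging the two, assuming each line has the form "pattern | answer" and that a placeholder may match any text; compile_rule and reply are hypothetical helper names, not part of any library. Literal text is escaped with re.escape, and each <-name> becomes a named group.

```python
import re

# Matches placeholders like <-name> in the rule file (format from the question).
PLACEHOLDER = re.compile(r"<-(\w+)>")


def compile_rule(line):
    """Hypothetical helper: 'my name is <-name> | Hi there, <-name>'
    -> (compiled regex, answer template)."""
    pattern_text, answer_text = line.rstrip("\n").rsplit(" | ", 1)
    # re.split with one capturing group alternates literal text and placeholder names.
    pieces = PLACEHOLDER.split(pattern_text)
    regex = "".join(
        re.escape(piece) if i % 2 == 0 else f"(?P<{piece}>.+?)"
        for i, piece in enumerate(pieces)
    )
    return re.compile(regex), answer_text


def reply(rules, user_text):
    """Hypothetical helper: answer with the first rule that matches the whole input."""
    for regex, answer_text in rules:
        match = regex.fullmatch(user_text.strip())
        if match:
            # Replace each <-name> in the answer with the captured value.
            return PLACEHOLDER.sub(lambda m: match.group(m.group(1)), answer_text)
    return None


rules = [compile_rule("my name is <-name> | Hi there, <-name>")]
print(reply(rules, "my name is johnny"))  # Hi there, johnny
print(reply(rules, "good morning"))       # None
```

A placeholder used twice on the input side would create a duplicate group name, which re rejects, so this sketch assumes each name appears once per pattern.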

You are going to want to do a group match and then pull out the search groups.
First you would want to import re - re is the python regex module.
Let's say that user_input is the variable holding the input string.
You then want to use the re.sub function to match your string and substitute something for it.
output = re.sub(input_regex, output_regex, user_input)
So the regex, first you can put the absolute stuff you want:
input_regex = 'my name is '
If you want it to match explicitly from the start of the line, you should precede it with the caret:
input_regex = '^my name is '
You then want a group to match any string .+ (. is anything, + is 1 or more of the preceding item) until the end of the line '$'.
input_regex = '^my name is .+$'
Now you'll want to put that into a named group. Named groups take the form "(?P<name>regex)"; note that those angle brackets are literal.
input_regex = '^my name is (?P<name>.+)$'
You now have a regex that will match and give a match group named "name" with the user's name in it. The output string needs to reference the match group with "\g<name>" (a raw string avoids Python's invalid-escape warning for \g):
output_regex = r'Hi there, \g<name>'
Putting it all together you can do it in a one liner (and the import):
import re
output = re.sub(r'^my name is (?P<name>.+)$', r'Hi there, \g<name>', user_input)
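If the input doesn't match, re.sub simply returns it unchanged, so the caller can't tell whether the rule applied. One way to detect that, sketched with re.subn (which also returns the number of substitutions made); greet is a hypothetical function name:

```python
import re


def greet(user_input):
    """Hypothetical helper: return the reply, or None if the rule did not apply."""
    # re.subn returns (new_string, number_of_substitutions).
    output, count = re.subn(r'^my name is (?P<name>.+)$', r'Hi there, \g<name>', user_input)
    return output if count else None


print(greet('my name is johnny'))  # Hi there, johnny
print(greet('hello'))              # None
```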

Asking for REGEXP inevitably leads to answers like the ones you're getting right now: demonstrations of basic REGEXP operations: how to split sentences, search for some term combination like 'my' + 'name' + 'is' within, etc.
In fact, you could learn all this from reading existing documentation and open source programs. REGEXP is not exactly easy. Still, you'll need to understand a bit on your own to really know what's going on if you want to change and extend your program. Don't just copy the recipes here.
But you may even want something more comprehensive. Because you mentioned building a "chat bot", you may want to see how others approach that task, which goes way beyond REGEXP.
So if the user writes 'my name is johnny', I want the program to know that 'johnny' is the '<-name>' variable, ...
From your question, it's unclear how complex this program should become. What if the user types
'Johnny is my name.'
or
'Hey, my name is John X., but call me johnny.'
?

Take a look at the re module and pay attention to capturing groups.
For example, you can assume that name will be a word, so it matches \w+. Then you have to construct a regular expression with \w+ capturing group where the name should be (capturing groups are delimited by parentheses):
r'my name is (\w+)'
and then match it against the input (hint: look for match in the re module docs).
Once you get the match, you have to get the contents of capturing group (in this case at index 1, index 0 is reserved for the whole match) and use it to construct your response.
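Following those hints, a minimal sketch; respond is a hypothetical helper name, and the reply text comes from the question:

```python
import re


def respond(user_input):
    """Hypothetical helper: build a reply from capturing group 1, or return None."""
    match = re.match(r'my name is (\w+)', user_input)
    if match:
        # group(0) is the whole match; group(1) is the first (...) group.
        return 'Hi there, ' + match.group(1)
    return None


print(respond('my name is johnny'))  # Hi there, johnny
print(respond('hello'))              # None
```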
