Pattern for re not retrieving any results - python

I'm trying to create a re pattern in python to extract this pattern of text.
contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08'
contentId: 'a887526b-ff19-4409-91ff-e1679e418922'
The content ID is 36 characters long and is a mix of lowercase letters and digits, with dashes at zero-based positions 8, 13, 18 and 23.
Any help with this would be much appreciated, as I can't get any results right now.
r1 = re.findall(r'^[a-zA-Z0-9~##$^*()_+=[\]{}|\\,.?: -]*{36}$',f.read())
print(r1)
Below is the file I'm trying to pull from
Object.defineProperty(e, '__esModule', { value: !0 }), e.default = void 0;
var t = r(d[0])(r(d[1])), n = r(d[0])(r(d[2])), o = r(d[0])(r(d[3])), c = r(d[0])(r(d[4])), l = r(d[0])(r(d[5])), u = function (t) {
return [
{
contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
prettyId: 'super',
style: { height: 0.5 * t }
},
{
contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
prettyId: 'zap',
style: { height: t }
}
];
},

Is there a typo in the regex in your question? *{36} after the bracket ] that closes the character group causes an error: multiple repeat. Did you mean r'^[a-zA-Z0-9~##$^*()_+=[\]{}|\\,.?: -]{36}$'?
Fixing that, you still get no results, because without the re.MULTILINE flag ^ anchors the match to the start of the string and $ to its end, so you'd only get a result if the entire input were a single 36-character string.
Removing these anchors, we get lots of matches, because the pattern matches any run of 36 of those characters:
r1 = re.findall(r'[a-zA-Z0-9~##$^*()_+=[\]{}|\\,.?: -]{36}',t)
r1: ['var t = r(d[0])(r(d[1])), n = r(d[0]',
')(r(d[2])), o = r(d[0])(r(d[3])), c ',
'= r(d[0])(r(d[4])), l = r(d[0])(r(d[',
'2301ae56-3b9c-4653-963b-2ad84d06ba08',
' style: { height: 0.5',
'a887526b-ff19-4409-91ff-e1679e418922',
' style: { height: t }']
To only match your ids, only look for alphanumeric characters or dashes.
r1 = re.findall(r'[a-zA-Z0-9\-]{36}',t)
r1: ['2301ae56-3b9c-4653-963b-2ad84d06ba08',
'a887526b-ff19-4409-91ff-e1679e418922']
To make it even more specific, you could specify the positions of the dashes:
r1 = re.findall(r'[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}', t, re.IGNORECASE)
r1: ['2301ae56-3b9c-4653-963b-2ad84d06ba08',
'a887526b-ff19-4409-91ff-e1679e418922']
Specifying the re.IGNORECASE flag removes the need to look for both upper- and lower-case characters.
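To make the effect of the anchors concrete, here is a minimal sketch. The sample text is an abridged copy of the file from the question, and the variable names (UUID, anchored, unanchored) are only for illustration:

```python
import re

# Abridged sample of the file contents from the question.
text = """contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
prettyId: 'super',
contentId: 'a887526b-ff19-4409-91ff-e1679e418922',"""

UUID = r"[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}"

# With ^ and $ and no re.MULTILINE, the whole string would have to be one id.
anchored = re.findall(r"^" + UUID + r"$", text, re.IGNORECASE)

# Without anchors, ids are found anywhere in the text.
unanchored = re.findall(UUID, text, re.IGNORECASE)

print(anchored)
print(unanchored)
```

The anchored search finds nothing, while the unanchored one returns both ids.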
Note:
You should read the file into a variable and use that variable if you're going to use its contents more than once, since a second f.read() returns an empty string unless you call f.seek(0) first.
To avoid creating a new file on disk with those contents, I just defined
t = """Object.defineProperty(e, '__esModule', { value: !0 }), e.default = void 0;
var t = r(d[0])(r(d[1])), n = r(d[0])(r(d[2])), o = r(d[0])(r(d[3])), c = r(d[0])(r(d[4])), l = r(d[0])(r(d[5])), u = function (t) {
return [
{
contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08',
prettyId: 'super',
style: { height: 0.5 * t }
},
{
contentId: 'a887526b-ff19-4409-91ff-e1679e418922',
prettyId: 'zap',
style: { height: t }
}
];
},"""
and used t in place of f.read() from your question.
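The note about f.read() can be checked with a small sketch. It uses io.StringIO as a stand-in for the open file (an assumption made so that nothing has to be written to disk); a real file object opened in text mode behaves the same way here:

```python
import io

# Stand-in for the open file object f from the question.
f = io.StringIO("contentId: '2301ae56-3b9c-4653-963b-2ad84d06ba08'\n")

first = f.read()    # reads everything and leaves the position at the end
second = f.read()   # nothing left to read, so this is an empty string
f.seek(0)           # rewind to the beginning
third = f.read()    # the full contents again

print(repr(second))
print(first == third)
```

Reading once into a variable and reusing that variable avoids the need for seek(0) altogether.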

Related

Unable to print string concatenated with list

I have a script that scrapes a list of IPs from a router. The final output should look like this:
if net ~ [
12.5.161.0/24,
12.9.242.0/24,
12.11.215.0/24,
12.17.239.0/24,
.... etc etc
216.248.237.0/24,
216.248.238.0/24,
216.248.239.0/24,
216.251.224.0/19,
216.253.79.0/24
] then {
accept;
} else {
reject;
}
I've gotten to the point where I can get the list of IPs in the right format, i.e.
12.5.161.0/24,
12.9.242.0/24,
12.11.215.0/24,
12.17.239.0/24,
.... etc etc
216.248.237.0/24,
216.248.238.0/24,
216.248.239.0/24,
216.251.224.0/19,
216.253.79.0/24
The problem I run into is stringing together the string at the beginning, all of my IPs as one batch in the middle, and the four-line string at the end.
So far I have:
routes = get_bird_routes(args.s)
prefixes = parse_routes(routes, args.p)
dropped = drop_prefixes(prefixes, args.d)
for p in dropped:
    lines = [ "if net ~ [", str(p), "] then {", " accept;", "} else {", " reject;", "}\n" ]
    print "\n".join(lines)
but this gives me
if net ~ [
199.89.247.0/24
] then {
accept;
} else {
reject;
}
if net ~ [
192.149.228.0/24
] then {
accept;
} else {
reject;
}
if net ~ [
206.180.165.0/24
] then {
accept;
} else {
reject;
}
instead of all my IPs together, with the strings at the beginning and at the end appearing only once. I checked type(p) (before I wrapped it in str(p)) and it came back as unicode. Looking at the documentation, I didn't get a clear understanding of what I was doing wrong. I'm still newish to Python; any help is appreciated!
You should be joining dropped, not looping over it.
dropped_lines = ",\n".join(dropped)
lines = [ "if net ~ [", dropped_lines, "] then {", " accept;", "} else {", " reject;", "}\n" ]
print "\n".join(lines)
Try converting your list to a string with str() and join(). Then merge the one string into your logic statement(string) with .format().
Viz:
dropped_as_strs = map(str, dropped) # "a", "b", "c"
dropped_str = ',\n'.join(dropped_as_strs) # "a,\nb,\nc"
logic = "if net ~ [\n{}\n] then {{\n accept;\n}} ..."
result = logic.format(dropped_str)
(Note the need to double '{' and '}' in str.format() calls.)
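Putting the join and the format step together, a minimal Python 3 sketch could look like this. The three prefixes are placeholders standing in for the real dropped list, and the two-space indentation is just a guess at the desired layout:

```python
# Placeholder data standing in for the list returned by drop_prefixes().
dropped = ["12.5.161.0/24", "12.9.242.0/24", "216.253.79.0/24"]

# Literal braces are doubled so that str.format() leaves them alone.
template = "if net ~ [\n{}\n] then {{\n  accept;\n}} else {{\n  reject;\n}}\n"

# Join once, outside of any loop, so the header and footer appear only once.
body = ",\n".join("  " + str(p) for p in dropped)
result = template.format(body)

print(result)
```

The key point is that the join happens once over the whole list, so there is a single "if net ~ [" header no matter how many prefixes there are.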

Scraping element <script> for strings in Python

I'm trying to check the stock of size Small on this page (which is 0), specifically by retrieving the inventory of size Small from this data:
<script>
(function($) {
var variantImages = {},
thumbnails,
variant,
variantImage;
variant = {"id":18116649221,"title":"XS","option1":"XS","option2":null,"option3":null,"sku":"BGT16073100","requires_shipping":true,"taxable":true,"featured_image":null,"available":true,"name":"Iron Lords T-Shirt - XS","public_title":"XS","options":["XS"],"price":2499,"weight":136,"compare_at_price":null,"inventory_quantity":16,"inventory_management":"shopify","inventory_policy":"deny","barcode":""};
if ( typeof variant.featured_image !== 'undefined' && variant.featured_image !== null ) {
variantImage = variant.featured_image.src.split('?')[0].replace('http:','');
variantImages[variantImage] = variantImages[variantImage] || {};
if (typeof variantImages[variantImage]["option-0"] === 'undefined') {
variantImages[variantImage]["option-0"] = "XS";
}
else {
var oldValue = variantImages[variantImage]["option-0"];
if ( oldValue !== null && oldValue !== "XS" ) {
variantImages[variantImage]["option-0"] = null;
}
}
}
variant = {"id":18116649285,"title":"Small","option1":"Small","option2":null,"option3":null,"sku":"BGT16073110","requires_shipping":true,"taxable":true,"featured_image":null,"available":false,"name":"Iron Lords T-Shirt - Small","public_title":"Small","options":["Small"],"price":2499,"weight":159,"compare_at_price":null,"inventory_quantity":0,"inventory_management":"shopify","inventory_policy":"deny","barcode":""};
if ( typeof variant.featured_image !== 'undefined' && variant.featured_image !== null ) {
variantImage = variant.featured_image.src.split('?')[0].replace('http:','');
variantImages[variantImage] = variantImages[variantImage] || {};
if (typeof variantImages[variantImage]["option-0"] === 'undefined') {
variantImages[variantImage]["option-0"] = "Small";
}
else {
var oldValue = variantImages[variantImage]["option-0"];
if ( oldValue !== null && oldValue !== "Small" ) {
variantImages[variantImage]["option-0"] = null;
}
}
}
How can I tell python to locate the <script> tag and then the specific "inventory_quantity":0 to return the stock of the product for a size Small?
You can find it using a regex:
import re
s = 'some sample text in which "inventory_quantity":0 appears'
occurrences = re.findall(r'"inventory_quantity":(\d+)', s)
print(occurrences[0])
0
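Edit:
I suppose you can get the whole content of <script>...</script> into a variable t (using lxml, xml.etree, BeautifulSoup or simply re).
Before we start, let's define some variables so that the JavaScript literals evaluate as Python:
true = True
false = False
null = None
Then use a regex to find the object literal as text and convert it to a dict via eval (only do this with content you trust, since eval executes arbitrary code):
r = re.findall(r'variant = (\{.*?\});', t)
if r:
    variant = eval(r[0])  # findall returns a list; take the first match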
This is what you get:
>>> variant
{'available': True,
'barcode': '',
'compare_at_price': None,
'featured_image': None,
'id': 18116649221,
'inventory_management': 'shopify',
'inventory_policy': 'deny',
'inventory_quantity': 16,
'name': 'Iron Lords T-Shirt - XS',
'option1': 'XS',
'option2': None,
'option3': None,
'options': ['XS'],
'price': 2499,
'public_title': 'XS',
'requires_shipping': True,
'sku': 'BGT16073100',
'taxable': True,
'title': 'XS',
'weight': 136}
Now you can easily get any information you need.
Neither of the current answers addresses the problem of locating the inventory_quantity for the desired size, which is not straightforward at first glance.
The idea is to not dive into string parsing too much, but to extract the complete sca_product_info JS array into a Python list via json.loads(), then filter the list by the desired size. Of course, we first have to locate the desired JS object. For this we'll use a regular expression. Remember, this is not HTML parsing at this point, so doing it with a regular expression is pretty much okay; this famous answer does not apply in this case.
Complete implementation:
import json
import re
import requests
DESIRED_SIZE = "XS"
pattern = re.compile(r"freegifts_product_json\s*\((.*?)\);", re.MULTILINE | re.DOTALL)
url = "http://bungiestore.com/collections/featured/products/iron-lords-t-shirt-men"
response = requests.get(url)
match = pattern.search(response.text)
# load the extracted string representing the "sca_product_info" JS array into a Python list
product_info = json.loads(match.group(1))
# look up the desired size in a list of product variants
for variant in product_info["variants"]:
    if variant["title"] == DESIRED_SIZE:
        print(variant["inventory_quantity"])
        break
Prints 16 at the moment.
By the way, we could have also used a JavaScript parser, like slimit - here is a sample working solution:
Extracting text from script tag using BeautifulSoup in Python
Assuming you can get the block of code into a string format, and assuming the format of the code doesn't change too much, you could do something like this:
before = ('"inventory_quantity":')
after = (',"inventory_management"')
start = mystr.index(before) + len(before)
end = mystr.index(after)
print(mystr[start:end])
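If only the inline script from the question is available, another option is to treat each variant = {...}; literal as JSON and look the size up by title. This is only a sketch: the script string below is an abridged, hand-copied version of the data in the question, and it assumes that each literal sits on one line and contains no nested braces:

```python
import json
import re

# Abridged copy of the <script> contents from the question.
script = '''
variant = {"id":18116649221,"title":"XS","available":true,"featured_image":null,"inventory_quantity":16};
variant = {"id":18116649285,"title":"Small","available":false,"featured_image":null,"inventory_quantity":0};
'''

# Each object literal is valid JSON, so json.loads handles true/false/null.
variants = [json.loads(m) for m in re.findall(r'variant = (\{.*?\});', script)]

# Map each size title to its stock level.
stock = {v["title"]: v["inventory_quantity"] for v in variants}

print(stock["Small"])
```

Using json.loads avoids both eval and the hand-defined true/null variables, and the dictionary makes it easy to ask for any size.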

hex/binary string conversion in Swift

Python has two very useful library methods (binascii.a2b_hex(keyStr) and binascii.hexlify(keyBytes)) whose equivalents I have been struggling to find in Swift. Is there anything readily available in Swift? If not, how would one implement it, given that all the bounds and other checks (like an even-length key) are already done?
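For reference, this is the Python behavior being asked about, as a short sketch. The sample key is arbitrary and the variable names are only illustrative:

```python
import binascii

key_str = "0002468A13579BFF"

key_bytes = binascii.a2b_hex(key_str)      # hex string -> raw bytes
round_trip = binascii.hexlify(key_bytes)   # raw bytes -> lowercase hex (as bytes)

print(key_bytes)
print(round_trip.decode("ascii"))

# An odd-length hex string is rejected.
try:
    binascii.a2b_hex("abc")
    odd_length_rejected = False
except binascii.Error:
    odd_length_rejected = True

print(odd_length_rejected)
```

So a Swift equivalent needs one function from a hex string to bytes that can fail on bad input, and one from bytes back to a hex string.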
Data from Swift 3 has no "built-in" method to print its contents as
a hex string, or to create a Data value from a hex string.
"Data to hex string" methods can be found e.g. at How to convert Data to hex string in swift or How can I print the content of a variable of type Data using Swift? or converting String to Data in swift 3.0. Here is an implementation from the first link:
extension Data {
func hexEncodedString() -> String {
return map { String(format: "%02hhx", $0) }.joined()
}
}
Here is a possible implementation of the reverse "hex string to Data"
conversion (taken from Hex String to Bytes (NSData) on Code Review, translated to Swift 3 and improved)
as a failable initializer:
extension Data {
init?(fromHexEncodedString string: String) {
// Convert 0 ... 9, a ... f, A ...F to their decimal value,
// return nil for all other input characters
func decodeNibble(u: UInt8) -> UInt8? {
switch(u) {
case 0x30 ... 0x39:
return u - 0x30
case 0x41 ... 0x46:
return u - 0x41 + 10
case 0x61 ... 0x66:
return u - 0x61 + 10
default:
return nil
}
}
self.init(capacity: string.utf8.count/2)
var iter = string.utf8.makeIterator()
while let c1 = iter.next() {
guard
let val1 = decodeNibble(u: c1),
let c2 = iter.next(),
let val2 = decodeNibble(u: c2)
else { return nil }
self.append(val1 << 4 + val2)
}
}
}
Example:
// Hex string to Data:
if let data = Data(fromHexEncodedString: "0002468A13579BFF") {
let idata = Data(data.map { 255 - $0 })
// Data to hex string:
print(idata.hexEncodedString()) // fffdb975eca86400
} else {
print("invalid hex string")
}
I'm not really familiar with Python and the checks it performs when converting the numbers, but you can expand the function below:
func convert(_ str: String, fromRadix r1: Int, toRadix r2: Int) -> String? {
if let num = Int(str, radix: r1) {
return String(num, radix: r2)
} else {
return nil
}
}
convert("11111111", fromRadix: 2, toRadix: 16)
convert("ff", fromRadix: 16, toRadix: 2)
Swift 2
extension NSData {
class func dataFromHexString(hex: String) -> NSData? {
let regex = try! NSRegularExpression(pattern: "^[0-9a-fA-F]*$", options: .CaseInsensitive)
let validate = regex.firstMatchInString(hex, options: NSMatchingOptions.init(rawValue: 0), range: NSRange(location: 0, length: hex.characters.count))
if validate == nil || hex.characters.count % 2 != 0 {
return nil
}
let data = NSMutableData()
for i in 0..<hex.characters.count/2 {
let hexStr = hex.substring(i * 2, length: 2)
var ch: UInt32 = 0
NSScanner(string: hexStr).scanHexInt(&ch)
data.appendBytes(&ch, length: 1)
}
return data
}
}
let a = 0xabcd1234
print(String(format: "%x", a)) // Hex to String
NSData.dataFromHexString("abcd1234") // String to hex

nested text in regular expressions

I am struggling with regular expressions. I'm having problems getting my head wrapped around similar text nested within larger text. Perhaps you can help me unclutter my thinking.
Here is an example test string:
message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }
I want to pull out each term (e.g., msgName, mn2) and what follows it until the next message, to get a list like this:
msgName
{ stuff { innerStuff } more stuff }
mn2
{ junk }
I am having trouble with too greedily or non-greedily matching to retain the inner brackets but split apart the higher level messages.
Here is one program:
import re
text = 'message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }'
messagePattern = re.compile('message (.*?) {(.*)}', re.DOTALL)
messageList = messagePattern.findall(text)
print "messages:\n"
count = 0
for message, msgDef in messageList:
    count = count + 1
    print str(count)
    print message
    print msgDef
It produces:
messages:
1
msgName
stuff { innerStuff } more stuff }
message mn2 { junk
Here is my next attempt, which makes the inner part non-greedy:
import re
text = 'message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }'
messagePattern = re.compile('message (.*?) {(.*?)}', re.DOTALL)
messageList = messagePattern.findall(text)
print "messages:\n"
count = 0
for message, msgDef in messageList:
    count = count + 1
    print str(count)
    print message
    print msgDef
It produces:
messages:
1
msgName
stuff { innerStuff
2
mn2
junk
So, I lose } more stuff }
I've really run into a mental block on this. Could someone point me in the right direction? I'm failing to deal with text in nested brackets. A suggestion as to a working regular expression, or a simpler example of dealing with nested, similar text, would be helpful.
If you can use the PyPI regex module, you can leverage its subroutine call support:
>>> import regex
>>> reg = regex.compile(r"(\w+)\s*({(?>[^{}]++|(?2))*})")
>>> s = "message msgName { stuff { innerStuff } } \n message mn2 { junk }"
>>> print(reg.findall(s))
[('msgName', '{ stuff { innerStuff } }'), ('mn2', '{ junk }')]
The regex - (\w+)\s*({(?>[^{}]++|(?2))*}) - matches:
(\w+) - Group 1 matching 1 or more alphanumeric / underscore characters
\s* - 0+ whitespace(s)
({(?>[^{}]++|(?2))*}) - Group 2 matching a {, followed with non-{} or another balanced {...} due to the (?2) subroutine call (recurses the whole Group 2 subpattern), 0 or more times, and then matches a closing }.
If there is only one nesting level, re can be used, too, with
(\w+)\s*{[^{}]*(?:{[^{}]*}[^{}]*)*}
See this regex demo
(\w+) - Group 1 matching word characters
\s* - 0+ whitespaces
{ - opening brace
[^{}]* - 0+ characters other than { and }
(?:{[^{}]*}[^{}]*)* - 0+ sequences of:
{- opening brace
[^{}]* - 0+ characters other than { and }
} - closing brace
[^{}]* - 0+ characters other than { and }
} - closing brace
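The one-nesting-level pattern can be tried with the standard re module as follows. This is a sketch with two changes that are not in the pattern above: the braces are escaped for readability, and the braced part is wrapped in a second capturing group so that findall returns the body as well as the name. The leading "message" keyword is also an addition, taken from the question's input:

```python
import re

text = 'message msgName { stuff { innerStuff } more stuff } \n message mn2 { junk }'

# Group 1: the name. Group 2: a {...} block with at most one level of nesting.
pattern = re.compile(r'message\s+(\w+)\s*(\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\})')

pairs = pattern.findall(text)

for name, body in pairs:
    print(name)
    print(body)
```

For deeper nesting this pattern stops working, which is where the recursive (?2) call of the regex module comes in.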

python parse bind configuration with nested curly braces

I'm trying to automatically parse an existing bind configuration, consisting of several of these zone definitions:
zone "domain.com" {
type slave;
file "sec/domain.com";
masters {
11.22.33.44;
55.66.77.88;
};
allow-transfer {
"acl1";
"acl2";
};
};
Note that the number of elements in masters and in allow-transfer may differ. I tried to split this using re.split() and failed horribly because of the nested curly braces.
My goal is a dictionary for each of these entries.
Thanks in advance for any help!
This should do the trick, where 'st' is a string of all your zone definitions:
import re
# The third positional argument of re.split is maxsplit, not flags, so no flag is passed here
zone_def = re.split('zone', st)
big_dict = {}
for zone in zone_def:
    if len(zone) > 0:
        zone_name = re.search(r'(".*?")', zone)
        # Blocks such as masters { ... } become string values
        sub_dicts = re.finditer(r'([\w]+) ({.*?})', zone, re.DOTALL)
        big_dict[zone_name.group(1)] = {}
        for sub_dict in sub_dicts:
            big_dict[zone_name.group(1)][sub_dict.group(1)] = sub_dict.group(2).replace(' ', '')
        # Simple "key value;" statements such as type slave;
        sub_types = re.finditer(r'([\w]+) (.*?);', zone)
        for sub_type in sub_types:
            big_dict[zone_name.group(1)][sub_type.group(1)] = sub_type.group(2)
big_dict will then hold a dictionary of zone definitions. Each zone definition has the domain/URL as its key. Every key and value in the zone definition is a string.
This is the output for the above example:
{'"domain.com"': {'transfer': '{\n"acl1";\n"acl2";\n}', 'masters': '{\n11.22.33.44;\n55.66.77.88;\n}', 'type': 'slave', 'file': '"sec/domain.com"'}}
And this is the output if you were to have a second identical zone, with key "sssss.com".
{'"sssss.com"': {'transfer': '{\n"acl1";\n"acl2";\n}', 'masters': '{\n11.22.33.44;\n55.66.77.88;\n}', 'type': 'slave', 'file': '"sec/domain.com"'},'"domain.com"': {'transfer': '{\n"acl1";\n"acl2";\n}', 'masters': '{\n11.22.33.44;\n55.66.77.88;\n}', 'type': 'slave', 'file': '"sec/domain.com"'}}
You will have to do some further stripping to make it more readable.
One way is to install and use the regex module instead of the re module. The problem is that the re module cannot deal with an arbitrary depth of nested brackets:
#!/usr/bin/python
import regex
data = '''zone "domain.com" {
type slave;
file "sec/domain.com";
masters {
11.22.33.44; { toto { pouet } glups };
55.66.77.88;
};
allow-transfer {
"acl1";
"acl2";
};
}; '''
pattern = r'''(?V1xi)
(?:
\G(?<!^)
|
zone \s (?<zone> "[^"]+" ) \s* {
) \s*
(?<key> \S+ ) \s+
(?<value> (?: ({ (?> [^{}]++ | (?4) )* }) | "[^"]+" | \w+ ) ; )
'''
matches = regex.finditer(pattern, data)
for m in matches:
    if m.group("zone"):
        print "\n" + m.group("zone")
    print m.group("key") + "\t" + m.group("value")
You can find more information about this module by following this link: https://pypi.python.org/pypi/regex
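If installing a third-party module is not an option, arbitrary nesting can also be handled without regular expressions by counting brace depth. The following is only a sketch of that idea, not a bind parser: split_top_level_blocks is a hypothetical helper, the config string is a shortened made-up example, and it assumes balanced braces with no braces inside quoted strings:

```python
def split_top_level_blocks(text):
    """Return (header, body) pairs for each top-level {...} block."""
    blocks = []
    depth = 0
    header_start = 0
    header = ""
    body_start = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                # Text since the previous block is the header, e.g. zone "a.com"
                header = text[header_start:i].strip(" \t\r\n;")
                body_start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Back at top level: everything in between is the body.
                blocks.append((header, text[body_start:i].strip()))
                header_start = i + 1
    return blocks


config = 'zone "a.com" { type slave; masters { 1.1.1.1; }; };\nzone "b.com" { type master; };'
blocks = split_top_level_blocks(config)

for header, body in blocks:
    print(header, "->", body)
```

The same function can be applied again to each body to split out inner blocks such as masters { ... }, which gives one dictionary level per nesting level.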
