regex matching and get into a python list - python

I have the following saved as a string in a variable:
window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]
How do I extract the values of each element?
Goal would be to have something like this:
articleCondition = 'New'
categoryNr = '12345'
...

In Python there are many ways to get a value out of a string: you can use a regex, Python's eval() function, and probably other approaches I don't know about.
Method 1
value = 'window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'
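value = value.split('=', 1)[1]  # keep only the text after the first '='
data = eval(value)[0]  # eval() runs arbitrary code: only use it on input you trust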
articleCondition = data['articleCondition']
Method 2
Using a regex:
import re
re.findall(r'"articleCondition":"(\w*)"', value)  # raw string avoids the invalid-escape warning for \w
With a regex you can get more creative and write a more general pattern.
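A sketch of one such general pattern, assuming every value is a double-quoted string with no escaped quotes inside: capture each "key":"value" pair and build a dictionary from the matches.

```python
import re

value = 'window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'

# Each match is a (key, value) tuple from the two capture groups.
# Assumes values are plain double-quoted strings without escaped quotes.
pairs = re.findall(r'"(\w+)":"([^"]*)"', value)
data = dict(pairs)

print(data['articleCondition'])  # New
print(data['categoryNr'])        # 12345
```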

You have a list of dictionaries (once the string has been parsed into a Python object, for example with json.loads()). Use a dictionary key to get each value.
Ex:
dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]
print(dataLayer[0]["articleCondition"])
print(dataLayer[0]["categoryNr"])
Output:
New
12345

Use json. Your string is:
>>> s = 'window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'
You can get the right-hand side of the = with a split:
>>> s.split('=')[1]
'[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'
Then parse it with the json module:
>>> import json
>>> t = json.loads(s.split('=')[1])
>>> t[0]['articleCondition']
'New'
Note that this works only because the right-hand side uses double quotes; single-quoted strings are not valid JSON.
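A sketch that parses the right-hand side once and pulls out individual values. It uses str.partition('=') instead of split('='), so only the first = is treated as the separator (split('=')[1] would cut the JSON short if a value ever contained an =).

```python
import json

s = 'window.dataLayer=[{"articleCondition":"New","categoryNr":"12345","sellerCustomerNr":"88888888","articleStatus":"Open"}]'

# partition() splits at the first '=' only and returns (before, '=', after)
_, _, rhs = s.partition('=')
data = json.loads(rhs)[0]  # the JSON is a list holding one dictionary

articleCondition = data['articleCondition']
categoryNr = data['categoryNr']
print(articleCondition, categoryNr)  # New 12345
```

If you don't want to name each variable, iterating over data.items() gives every key/value pair.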

Related

input format data in json with double curly braces

I get JSON from another server, like this:
{"field":"zxczxczcx_{{name}}_qweqweqwe"}
So the question is how to format that value?
I've tried
d = {"field":"zxczxczcx_{{name}}_qweqweqwe"}
d['field'].format('any_string')
But it just removes one pair of curly braces and outputs:
"zxczxczcx_{name}_qweqweqwe"
Maybe you can use the replace method?
d = {"field":"zxczxczcx_{{name}}_qweqweqwe"}
d['field'] = d['field'].replace('{{name}}','any_string')
print(d)
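Based on your comments, here is a version that uses the re module (regular expressions) to find every {{x}} pattern:
import re
tokens_to_replace = re.findall(r'\{\{.*?\}\}', d['field'])  # non-greedy, so each {{...}} is matched separately
for token in tokens_to_replace:
    # token[2:-2] strips the braces; assumes d also has a key for each placeholder, e.g. d['name']
    d['field'] = d['field'].replace(token, d[token[2:-2]])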
tokens_to_replace will have value: ['{{name}}']
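A single-pass alternative is re.sub() with a replacement function. This is a sketch that assumes the substitution values live in a separate dictionary (the name values below is hypothetical) and that placeholders contain only word characters.

```python
import re

d = {"field": "zxczxczcx_{{name}}_qweqweqwe"}
values = {"name": "any_string"}  # hypothetical: wherever your replacement values come from

def fill(match):
    # match.group(1) is the text between {{ and }}; unknown names are left untouched
    return values.get(match.group(1), match.group(0))

d['field'] = re.sub(r'\{\{(\w+)\}\}', fill, d['field'])
print(d['field'])  # zxczxczcx_any_string_qweqweqwe
```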

What is the RE to match the list?

I want to know how to construct a regular expression to extract the list.
Here is my string:
audit = "{## audit_filter = ['hostname.*','service.*'] ##}"
Here is my expression:
AUDIT_FILTER_RE = r'([.*])'
And here is my search statement:
audit_filter = re.search(AUDIT_FILTER_RE, audit).group(1)
I want to extract everything inside the square brackets including the brackets. '[...]'
Expected Output:
['hostname.*','service.*']
import re
audit = "{## audit_filter = ['hostname.*','service.*'] ##}"
print(eval(re.findall(r"\[.*\]", audit)[0]))  # ['hostname.*', 'service.*']
findall returns a list of string matches. In your case, there should only be one, so I'm retrieving the string at index 0, which is a string representation of a list. Then, I use eval(...) to convert that string representation of a list to an actual list. Just beware:
If there are no matches, re.findall(...)[0] will raise IndexError: list index out of range.
Don't use eval() if you ever expect input from another source (i.e. input that is not yours), because that would be a security issue; ast.literal_eval() is a safer way to parse a list literal.
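A safer sketch of the same idea. (Your original r'([.*])' fails because [.*] is a character class matching a single . or * character; the brackets must be escaped as \[ and \] to match them literally.) Here re.search() returns None instead of raising when nothing matches, and ast.literal_eval() only accepts Python literals, so it cannot execute arbitrary code the way eval() can.

```python
import ast
import re

audit = "{## audit_filter = ['hostname.*','service.*'] ##}"

match = re.search(r"\[.*\]", audit)  # greedy: from the first '[' to the last ']'
if match is None:
    audit_filter = []  # no list found in the string
else:
    audit_filter = ast.literal_eval(match.group(0))

print(audit_filter)  # ['hostname.*', 'service.*']
```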
Use r"\[(.*?)\]" to capture what is between the brackets (the result is a list of strings, not a parsed list):
Ex:
import re
audit = "{## audit_filter = ['hostname.*'] ##}"
print(re.findall(r"\[(.*?)\]", audit))
Output:
["'hostname.*'"]

How to swap comma and dot in a string

I am fetching a price from a site in the format 10.990,00, which does not make sense as such. I need it to be 10,990.00. I tried the following, but it replaces everything:
price = "10.990,00"
price = price.replace(',','.',1)
price = price.replace('.',',',1)
What am I doing wrong?
You are replacing the first dot with a comma, after first replacing the first comma with a dot. The dot the first str.replace() inserted is not exempt from being replaced by the second str.replace() call.
Use the str.translate() method instead:
try:
    from string import maketrans  # Python 2
except ImportError:
    maketrans = str.maketrans  # Python 3
price = price.translate(maketrans(',.', '.,'))
This swaps commas for dots and vice versa in a single pass over the string, so no character is replaced twice, and it is very fast to boot.
I made the code compatible with both Python 2 and 3; in Python 3, string.maketrans() was replaced by the static str.maketrans() method.
The exception here is Python 2 unicode: its translate() works the same as str.translate() in Python 3, but there is no maketrans factory to create the mapping for you. You can use a dictionary instead; the keys must be character ordinals, not characters:
unicode_price = unicode_price.translate({ord(u'.'): u',', ord(u','): u'.'})
Demo:
>>> try:
...     from string import maketrans  # Python 2
... except ImportError:
...     maketrans = str.maketrans  # Python 3
...
>>> price = "10.990,00"
>>> price.translate(maketrans(',.', '.,'))
'10,990.00'
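On Python 3 alone, the same translate() approach can be wrapped in a small helper (the name swap_separators is just illustrative). Because str.translate() maps each character exactly once, it also handles numbers with several thousands separators.

```python
# Build the translation table once: ',' -> '.' and '.' -> ','
SWAP_TABLE = str.maketrans(',.', '.,')

def swap_separators(price):
    """Swap dots and commas in a single pass (hypothetical helper name)."""
    return price.translate(SWAP_TABLE)

print(swap_separators("10.990,00"))     # 10,990.00
print(swap_separators("1.200.000,30"))  # 1,200,000.30
```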
Martijn has given the best answer. You can also iterate over the price and swap each character:
swap = {'.': ',', ',': '.'}
def switchDotsAndCommas(text):
    # swap.get(k, k) returns the swapped character, or k itself if it is neither '.' nor ','
    text = ''.join(swap.get(k, k) for k in text)
    print(text)
switchDotsAndCommas('10.990,00')
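The reason your code doesn't work is that you first convert 10.990,00 to 10.990.00 and then replace dots with commas, and that second call cannot tell the original dots apart from the one you just inserted.
Instead, you can convert each comma to a placeholder, then convert dots to commas, and finally convert the placeholder back to a dot: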
price = "10.990,00"
price = price.replace(',','COMMA')
price = price.replace('.',',')
price = price.replace('COMMA','.')
print(price)
Or, as suggested by georg, chain the calls:
price = price.replace(',','COMMA').replace('.',',').replace('COMMA','.')
Note that I removed the optional count argument of replace(), since numbers like 1.200.000,30 would not convert as expected with it.
Maybe this is a long answer, but it is simple to follow and doesn't use replace() or translate():
l = input("enter the input:")
m = []
for i in l:
    if i == '.':
        m.append(',')
    elif i == ',':
        m.append('.')
    else:
        m.append(i)
print(''.join(m))
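A shorter but positional approach, which only works when the input has exactly this layout (one thousands separator after the first two digits):
price = "10.990,00"
price = price.replace(',', '.')  # "10.990.00"
price1 = price[0:3].replace('.', ',')  # "10,": only the first separator becomes a comma
print(price1 + price[3:])  # 10,990.00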

Regex Expression not matching correctly

I'm tackling a python challenge problem to find a block of text in the format xXXXxXXXx (lower vs upper case, not all X's) in a chunk like this:
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
I have tested the following regex on this site (http://www.regexr.com/) and found that it correctly matches what I am looking for:
'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])'
However, when I try to match this expression to the block of text, it just returns the entire string:
In [1]: import re
In [2]: example = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
In [3]: expression = re.compile(r'([a-z])([A-Z]){3}([a-z])([A-Z]){3}([a-z])')
In [4]: found = expression.search(example)
In [5]: print(found.string)
jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn
Any ideas? Is my expression incorrect? Also, if there is a simpler way to represent that expression, feel free to let me know. I'm fairly new to RegEx.
You need to return the match group instead of the string attribute.
>>> import re
>>> s = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
>>> rgx = re.compile(r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]')
>>> found = rgx.search(s).group()
>>> print(found)
nJDKoJIWh
The string attribute always returns the string passed as input to the match. This is clearly documented:
string
The string passed to match() or search().
The problem has nothing to do with the matching, you're just grabbing the wrong thing from the match object. Use match.group(0) (or match.group()).
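A short sketch contrasting the match object's attributes; re.findall() is shown as well in case you need every occurrence rather than only the first.

```python
import re

s = 'jdskvSJNDfbSJneSfnJDKoJIWhsjnfakjn'
rgx = re.compile(r'[a-z][A-Z]{3}[a-z][A-Z]{3}[a-z]')

found = rgx.search(s)
print(found.string == s)  # True: .string is always the whole input
print(found.group())      # nJDKoJIWh: the text that actually matched
print(found.span())       # start and end indices of the match within s

# findall() returns every non-overlapping match as a list of strings
print(rgx.findall(s))     # ['nJDKoJIWh']
```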
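Based on xXXXxXXXx, if you want runs of three uppercase letters separated by single lowercase letters, you can use the following pattern. Note that the + allows one or more XXXx groups, so it also matches shorter runs such as aBCDe:
([a-z])(([A-Z]){3}([a-z]))+
Also, get the matched text from your search with group():
print(expression.search(example).group(0))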

Regular Expression to extract parts of Twitter query

I have the following string from which I want to extract the q and geocode values.
?since_id=261042755432763393&q=salvia&geocode=39.862712%2C-75.33958%2C10mi
I've tried the following regular expression.
expr = re.compile('\[\=\](.*?)\[\&\]')
vals = expr.match(str)
However, vals is None. I'm also not sure how to find something before, say, q= versus =.
No need for a regex (using Python 3):
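>>> from urllib.parse import parse_qs
>>> s = '?since_id=261042755432763393&q=salvia&geocode=39.862712%2C-75.33958%2C10mi'
>>> query = parse_qs(s[1:])
>>> query
{'since_id': ['261042755432763393'], 'q': ['salvia'], 'geocode': ['39.862712,-75.33958,10mi']}
>>> query['q']
['salvia']
>>> query['geocode']
['39.862712,-75.33958,10mi']
Here s holds your input (avoid naming a variable str, because that shadows the built-in str type).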
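Since parse_qs() returns a list for every key (a key may appear more than once in a query string), you will usually take element [0]. A Python 3 sketch that also splits the decoded geocode into its parts; the variable names lat, lon and radius are illustrative.

```python
from urllib.parse import parse_qs

s = '?since_id=261042755432763393&q=salvia&geocode=39.862712%2C-75.33958%2C10mi'

query = parse_qs(s.lstrip('?'))  # parse_qs also decodes %2C to ','
q = query['q'][0]
lat, lon, radius = query['geocode'][0].split(',')

print(q)                 # salvia
print(lat, lon, radius)  # 39.862712 -75.33958 10mi
```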
Since (according to your tag) you are using Python 2.7, change the import statement to:
from urlparse import parse_qs
and if you were using a Python version before 2.6, the import statement is:
from cgi import parse_qs
I think this can be easily done without regex:
string = '?since_id=261042755432763393&q=salvia&geocode=39.862712%2C-75.33958%2C10mi'
parts = string[1:].split('&')  # the [1:] leaves out the '?'
pairs = {}
for part in parts:
    try:
        key, value = part.split('=')
        pairs[key] = value
    except ValueError:  # the part does not contain exactly one '='
        pass
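pairs now contains all the key-value pairs of the string. Note that, unlike parse_qs(), this does not decode percent-escapes, so pairs['geocode'] is still '39.862712%2C-75.33958%2C10mi'.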
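If you stay with the manual split, urllib.parse.unquote_plus() (Python 3) can decode the percent-escapes (and turn + into a space, as query strings use it). A sketch that also uses str.partition(), so a part without = simply gets an empty value instead of needing try/except:

```python
from urllib.parse import unquote_plus

string = '?since_id=261042755432763393&q=salvia&geocode=39.862712%2C-75.33958%2C10mi'

pairs = {}
for part in string.lstrip('?').split('&'):
    # partition() splits at the first '=' only; a part without '=' yields value ''
    key, _, value = part.partition('=')
    pairs[unquote_plus(key)] = unquote_plus(value)

print(pairs['geocode'])  # 39.862712,-75.33958,10mi
```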
