Exact sub-string match inside a string in Python

I have a string as given below
dit ='{p_d: {a:3, what:3.6864e-05, s:lion, sst:{c:-20, b:6, p:panther}}}'
I also have a list of elements that I want to find in the above string and wrap in double quotes:
['', 'p_d', '', '', 'a', '3', '', 'what', '3.6864e-05', '', 's', 'lion', '', 'sst', '', 'c', '-20', '', 'b', '6', '', 'p', 'panther', '', '', '']
A simple search and replace with .replace doesn't work as expected, which I understand:
import yaml
import ast
import json
import re
rep = {":": " ", "'":" ", "{":" ", "}":" ", ",": " "}
quot = "\""
dit = '{p_d: {a:3, what:3.6864e-05, s:lion, sst:{c:-20, b:6, p:panther}}}'
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    print("replace_all: text {}".format(text))
    return text
element_list_temp = replace_all(dit, rep)
element_list = element_list_temp.split(" ")
for z in element_list:
    if z != "" and z in dit:
        dit = dit.replace(z, quot+z+quot)
print(dit)
Output:
{""p"_d": {"a":"3", wh"a"t:"3"."6"8"6"4e-05, "s":"lion", "s""s"t:{"c":"-20", "b":"6", "p":"p""a"nther}}}
Desired Output:
'{"p_d": {"a":"3", "what":"3.6864e-05", "s":"lion", "sst":{"c":"-20", "b":"6", "p":"panther"}}}'
How can I match each string in the list exactly, one at a time, and wrap it in double quotes?
Update (different input):
import yaml
import ast
import json
import re
rep = {":": " ", "'":" ", "{":" ", "}":" ", ",": " "}
quot = "\""
# dit = '{p_d: {a:3, what:3.6864e-05, s:lion, sst:{c:-20, b:6, p:panther}}}'
dit = "'{p_d: '{a:3, what:3.6864e-05, s:lion, vec_mode:'{2.5, -2.9, 3.4, 5.6, -8.9, -5.67, 2, 2, 2, 2, 5.4, 2, 2, 6.545, 2, 2}, sst:'{c:-20, b:6, p:panther}}}"
seps = ":'{}, "
val_strings = re.findall(f"[^{seps}]+", dit)
print("val_strings: {}".format(val_strings))
sep_strings = re.findall(f"[{seps}]+", dit)
print("sep_strings: {}".format(sep_strings))
seq = [f'{b}"{v}"' for b, v in zip(sep_strings, val_strings)] + sep_strings[-1:]
print("sep: {}".format(seq))
dit = "".join(seq)
print(dit)
Dict = json.loads(dit)
print(Dict)
result = yaml.dump(Dict)
print(result)
print(result.replace("'",""))
Output from the above code:
I think it's failing because of the key:value pairs of the dictionary. I'm also checking on my end whether there's a way to print values that have no keys, such as vec_mode, as arrays.
val_strings: ['p_d', 'a', '3', 'what', '3.6864e-05', 's', 'lion', 'vec_mode', '2.5', '-2.9', '3.4', '5.6', '-8.9', '-5.67', '2', '2', '2', '2', '5.4', '2', '2', '6.545', '2', '2', 'sst', 'c', '-20', 'b', '6', 'p', 'panther']
sep_strings: ["'{", ": '{", ':', ', ', ':', ', ', ':', ', ', ":'{", ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', ', '}, ', ":'{", ':', ', ', ':', ', ', ':', '}}}']
sep: ['\'{"p_d"', ': \'{"a"', ':"3"', ', "what"', ':"3.6864e-05"', ', "s"', ':"lion"', ', "vec_mode"', ':\'{"2.5"', ', "-2.9"', ', "3.4"', ', "5.6"', ', "-8.9"', ', "-5.67"', ', "2"', ', "2"', ', "2"', ', "2"', ', "5.4"', ', "2"', ', "2"', ', "6.545"', ', "2"', ', "2"', '}, "sst"', ':\'{"c"', ':"-20"', ', "b"', ':"6"', ', "p"', ':"panther"', '}}}']
'{"p_d": '{"a":"3", "what":"3.6864e-05", "s":"lion", "vec_mode":'{"2.5", "-2.9", "3.4", "5.6", "-8.9", "-5.67", "2", "2", "2", "2", "5.4", "2", "2", "6.545", "2", "2"}, "sst":'{"c":"-20", "b":"6", "p":"panther"}}}
Traceback (most recent call last):
  File "./ditoyaml_new.py", line 36, in <module>
    Dict = json.loads(dit)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Expected output after json.loads and yaml.dump: nested dictionaries where there are key: value pairs, and a list (array) where there aren't, as for vec_mode. I'm still checking on my end as well.
p_d:
  a: 3
  s: lion
  sst:
    b: 6
    c: -20
    p: panther
  vec_mode:
    [-8.9,
    -5.67,
    -2.9,
    2,
    2.5,
    3.4,
    5.4,
    5.6,
    6.545]
  what: 3.6864e-05
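One way to get from the updated input to that structure is to parse the braces directly instead of quoting tokens and calling json.loads. The sketch below makes several assumptions. First, the apostrophe in front of each { carries no meaning and can be dropped. Second, a brace group becomes a dict when its entries are key:value pairs and a list otherwise; mixed groups are not handled. Third, any token that parses as int or float becomes a number. The helper names tokenize, to_scalar, parse_value and parse_group are hypothetical:

```python
import re

# Punctuation tokens, or maximal runs of anything else that isn't whitespace.
TOKEN = re.compile(r"[{}:,]|[^{}:,\s]+")


def tokenize(text):
    # Assumption: the apostrophe before each "{" carries no structure, so drop it.
    return TOKEN.findall(text.replace("'{", "{"))


def to_scalar(token):
    # Try int, then float; anything else (e.g. "lion") stays a string.
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_value(tokens, pos):
    """Parse a scalar or a brace group at tokens[pos]; return (value, next_pos)."""
    if tokens[pos] == "{":
        return parse_group(tokens, pos)
    return to_scalar(tokens[pos]), pos + 1


def parse_group(tokens, pos):
    """Parse '{...}' starting at tokens[pos]: a dict if it holds key:value
    entries, otherwise a list of values."""
    pos += 1  # skip "{"
    pairs, values = {}, []
    while tokens[pos] != "}":
        if tokens[pos + 1] == ":":
            key = tokens[pos]
            value, pos = parse_value(tokens, pos + 2)
            pairs[key] = value
        else:
            value, pos = parse_value(tokens, pos)
            values.append(value)
        if tokens[pos] == ",":
            pos += 1
    return (pairs if pairs else values), pos + 1  # skip "}"


dit = ("'{p_d: '{a:3, what:3.6864e-05, s:lion, vec_mode:'{2.5, -2.9, 3.4, 5.6, -8.9, "
       "-5.67, 2, 2, 2, 2, 5.4, 2, 2, 6.545, 2, 2}, sst:'{c:-20, b:6, p:panther}}}")
result, _ = parse_group(tokenize(dit), 0)
print(result)
```

The resulting nested dict can then be passed to yaml.dump as in the question. vec_mode keeps the input's order and duplicates. If you want the sorted, de-duplicated list shown in the expected output, apply sorted(set(...)) to it first.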

Here is one way, using regular expressions:
import re
dit = '{p_d: {a:3, what:3.6864e-05, s:lion, sst:{c:-20, b:6, p:panther}}}'
seps = ":'{}, "
val_strings = re.findall(fr"[^{seps}]+", dit)
sep_strings = re.findall(fr"[{seps}]+", dit)
seq = [f'{b}"{v}"' for b, v in zip(sep_strings, val_strings)] + sep_strings[-1:]
dit = "".join(seq)
print(dit)
Output:
{"p_d": {"a":"3", "what":"3.6864e-05", "s":"lion", "sst":{"c":"-20", "b":"6", "p":"panther"}}}
JSON test:
import json
print(json.loads(dit))
Output:
{'p_d': {'a': '3', 'what': '3.6864e-05', 's': 'lion', 'sst': {'c': '-20', 'b': '6', 'p': 'panther'}}}
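The same idea also fits in a single re.sub call. The sketch below assumes the same separator characters as seps above. The pattern matches whole runs of non-separator characters, so a short token such as a can never match inside what or panther. That is exactly what went wrong with the repeated .replace calls in the question.

```python
import json
import re

dit = '{p_d: {a:3, what:3.6864e-05, s:lion, sst:{c:-20, b:6, p:panther}}}'

# Each maximal run of non-separator characters (separators: : ' { } , space)
# is one token; re.sub wraps every token in double quotes in a single pass.
quoted = re.sub(r"[^:'{}, ]+", lambda m: f'"{m.group(0)}"', dit)
print(quoted)
print(json.loads(quoted))
```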

Related

How to write a conditional statement based on combination of two columns and a dictionary, using the dictionary for a mapping in a new column?

I am working with a pandas dataframe (the dataframe is called market_info_df):
And I have the following Python code:
market_info_df['is_and_mp'] = market_info_df['issue_status'] + market_info_df['market_phase']
no_collision_issue_status = ['000', '200', '203', '204', '300']
MARKET_STATES_DICT = {
    ('000', ' '): MARKET_STATES.CLOSED,
    ('100', ' ', 'F'): MARKET_STATES.OPENING_AUCTION,
    ('200', ' '): MARKET_STATES.CONTINUOUS_TRADING,
    ('203', ' '): MARKET_STATES.UNSCHEDULED_AUCTION,
    ('204', ' '): MARKET_STATES.UNSCHEDULED_AUCTION,
    ('100', 'B0'): MARKET_STATES.UNSCHEDULED_AUCTION,
    ('200', 'B1'): MARKET_STATES.CONTINUOUS_TRADING,
    ('400', 'C0'): MARKET_STATES.HALTED,
    ('400', 'C1'): MARKET_STATES.CONTINUOUS_TRADING,
    ('400', 'D0'): MARKET_STATES.HALTED,
    ('400', 'D1'): MARKET_STATES.POST_TRADE}
I am trying to write a condition such that, if is_and_mp is in the no_collision_issue_status list OR trading_status is not ' ' (a single space), MARKET_STATES_DICT is used to map a new column called market_state.
Here is what I have written, but I get an error TypeError: unhashable type: 'Series':
market_info_df.loc[(market_info_df['is_and_mp'] in no_collision_issue_status) | (~market_info_df['trading_state'] == ' '),
                   'market_state'] = MARKET_STATES_DICT[(market_info_df['issue_status'], market_info_df['trading_state'])]
I understand what is wrong and why I am getting the error, but I am not sure how to fix it!

Use the DataFrame's apply method with axis=1, so the condition is checked one row at a time, exactly as you have written it. If it holds, return the value from the dict; otherwise return None:
market_info_df["market_state"] = market_info_df.apply(
    lambda row: MARKET_STATES_DICT[(row["is_and_mp"], row["trading_status"])]
    if row["is_and_mp"] in no_collision_issue_status or row["trading_status"] != " "
    else None,
    axis=1,
)
Full example with dummy data:
import pandas as pd
market_info_df = pd.DataFrame(data=[["10","0","B0"],["20","0"," "],["40","0","D1"]], columns=["issue_status", "market_phase", "trading_status"])
market_info_df['is_and_mp'] = market_info_df['issue_status'] + market_info_df['market_phase']
no_collision_issue_status = ['000', '200', '203', '204', '300']
MARKET_STATES_DICT = {
    ('000', ' '): "CLOSED",
    ('100', ' ', 'F'): "OPENING_AUCTION",
    ('200', ' '): "CONTINUOUS_TRADING",
    ('203', ' '): "UNSCHEDULED_AUCTION",
    ('204', ' '): "UNSCHEDULED_AUCTION",
    ('100', 'B0'): "UNSCHEDULED_AUCTION",
    ('200', 'B1'): "CONTINUOUS_TRADING",
    ('400', 'C0'): "HALTED",
    ('400', 'C1'): "CONTINUOUS_TRADING",
    ('400', 'D0'): "HALTED",
    ('400', 'D1'): "POST_TRADE"}
# Row-wise: look up the (is_and_mp, trading_status) pair only when the condition holds
market_info_df["market_state"] = market_info_df.apply(
    lambda row: MARKET_STATES_DICT[(row["is_and_mp"], row["trading_status"])]
    if row["is_and_mp"] in no_collision_issue_status or row["trading_status"] != " "
    else None,
    axis=1,
)
print(market_info_df)
[Out]:
issue_status market_phase trading_status is_and_mp market_state
0 10 0 B0 100 UNSCHEDULED_AUCTION
1 20 0 200 CONTINUOUS_TRADING
2 40 0 D1 400 POST_TRADE
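The TypeError in the question has two sources. A tuple of two whole Series is used as a dict key, but a dict lookup needs one hashable key per row. Python's `in` also cannot test a whole Series against a list. A vectorized alternative to apply is sketched below, reusing the dummy data, column names and dict from the answer above. Series.isin, Series.ne and Series.where are standard pandas methods; the names mask, keys and states are hypothetical:

```python
import pandas as pd

# Dummy data and dict copied from the answer above.
market_info_df = pd.DataFrame(
    data=[["10", "0", "B0"], ["20", "0", " "], ["40", "0", "D1"]],
    columns=["issue_status", "market_phase", "trading_status"],
)
market_info_df["is_and_mp"] = market_info_df["issue_status"] + market_info_df["market_phase"]
no_collision_issue_status = ["000", "200", "203", "204", "300"]
MARKET_STATES_DICT = {
    ("000", " "): "CLOSED",
    ("100", " ", "F"): "OPENING_AUCTION",
    ("200", " "): "CONTINUOUS_TRADING",
    ("203", " "): "UNSCHEDULED_AUCTION",
    ("204", " "): "UNSCHEDULED_AUCTION",
    ("100", "B0"): "UNSCHEDULED_AUCTION",
    ("200", "B1"): "CONTINUOUS_TRADING",
    ("400", "C0"): "HALTED",
    ("400", "C1"): "CONTINUOUS_TRADING",
    ("400", "D0"): "HALTED",
    ("400", "D1"): "POST_TRADE",
}

# Element-wise condition: Series.isin replaces Python's `in`,
# and Series.ne(" ") is the element-wise "!= ' '".
mask = (
    market_info_df["is_and_mp"].isin(no_collision_issue_status)
    | market_info_df["trading_status"].ne(" ")
)

# One hashable (is_and_mp, trading_status) tuple per row; dict.get returns
# None for a missing key instead of raising KeyError.
keys = zip(market_info_df["is_and_mp"], market_info_df["trading_status"])
states = pd.Series([MARKET_STATES_DICT.get(k) for k in keys], index=market_info_df.index)

# Keep the looked-up state only where the condition holds (NaN elsewhere).
market_info_df["market_state"] = states.where(mask)
print(market_info_df)
```

This version differs from the apply version in two ways. A pair missing from the dict yields None instead of raising KeyError. Rows that fail the condition get NaN rather than None.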

Program outputs incorrect string format with join method

I am trying to print the letters that two strings have in common in the format (a, b and c), but what I currently get is (a,b,and,c).
def task10(str1,str2):
    str1=str1.lower()
    str2=str2.lower()
    ch1=""
    ch2=""
    for i in str1:
        if i.isalpha():
            ch1+=i
    for k in str2:
        if k.isalpha():
            ch2+=k
    common = list(set([c for c in ch1 if c in ch2]))
    s=len(common)
    if s==0:
        common=['no common letters']
    common.sort()
    if s>1:
        common.insert(-1," and ")
    common= ', '.join(common)
    common = common.replace(', and ,', 'and')
    print(f'{common}')
task10("i like big cups","i cannot lie")

Join the letters with plain commas, then replace ",and," with " and " to get output in the form shown below:
def task10(str1,str2):
    str1=str1.lower()
    str2=str2.lower()
    badchars = ['$','#','%',';',':','!',"*"," ",'1',"2","3","4","5","6","7","8","9","0",'^','&','#','~','?','[','{',']',"+",'=','-','_','-',",",'"',"'",'`',"|","\\",'(',')']
    for i in badchars:
        str1=str1.replace(i,"")
        str2=str2.replace(i,"")
    common = list(set([c for c in str1 if c in str2]))
    i=len(common)
    if i==0:
        common=['no common letters']
    common.sort()
    if i>1:
        common.insert(-1,"and")
    common= ','.join(common)
    common = common.replace(',and,', ' and ')
    print(f'{common}')
task10('icl!','i cannot lie!')
Output:
c,i and l

You can do this efficiently with set operations. Note that badchars is a set here, not a list as in the original question, and that the match statement requires Python 3.10 or later:
badchars = {'$', '#', '%', ';', ':', '!', "*", " ", '1', "2", "3", "4", "5", "6", "7", "8", "9", "0", '^',
            '&', '#', '~', '?', '[', '{', ']', "+", '=', '-', '_', '-', ",", '"', "'", '`', "|", "\\", '(', ')', '£'}
def task10(str1, str2):
    match len(rv := (set(str1) & set(str2)) - badchars):
        case 0:
            return ''
        case 1:
            return rv.pop()
        case _:
            # Sort so the output order is deterministic (set order is arbitrary)
            rv = sorted(rv)
            return ', '.join(rv[:-1]) + ' and ' + rv[-1]
print(task10('icl!', 'i cannot lie!'))
Output:
c, i and l
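On Python versions before 3.10, where match is unavailable, the formatting step can live in a small helper. This is a sketch with hypothetical names join_with_and and common_letters. Letters are filtered with str.isalpha(), as in the question's code, rather than with a badchars set. No serial comma is used before "and". An empty result gives an empty string rather than 'no common letters'.

```python
def join_with_and(items):
    """Join items as 'a, b and c' (no comma before 'and')."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def common_letters(str1, str2):
    # Keep only alphabetic characters, case-insensitively, then intersect the sets.
    letters1 = {c for c in str1.lower() if c.isalpha()}
    letters2 = {c for c in str2.lower() if c.isalpha()}
    return sorted(letters1 & letters2)


print(join_with_and(common_letters("i like big cups", "i cannot lie")))
```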

Convert a list of tab prefixed strings to a dictionary

I'm attempting some text mining, and I would like to turn the following:
a=['Colors.of.the universe:\n',
   ' Black: 111\n',
   ' Grey: 222\n',
   ' White: 11\n',
   'Movies of the week:\n',
   ' Mission Impossible: 121\n',
   ' Die_Hard: 123\n',
   ' Jurassic Park: 33\n',
   'Lands.categories.said:\n',
   ' Desert: 33212\n',
   ' forest: 4532\n',
   ' grassland : 431\n',
   ' tundra : 243451\n']
to this:
{'Colors.of.the universe': {'Black': 111, 'Grey': 222, 'White': 11},
 'Movies of the week': {'Mission Impossible': 121, 'Die_Hard': 123, 'Jurassic Park': 33},
 'Lands.categories.said': {'Desert': 33212, 'forest': 4532, 'grassland': 431, 'tundra': 243451}}
I tried the code below, but it didn't work:
{words[1]:words[1:] for words in a}
which gives
{'o': 'olors.of.the universe:\n',
' ': ' tundra : 243451\n',
'a': 'ands.categories.said:\n'}
It only takes a single character (the second one) of each string as the key, which is not what's needed.
A dict comprehension is an interesting approach.

a = ['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n',
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
result = dict()
current_key = None
for w in a:
    # If it starts with indentation (a space here), it's an item under the current category
    if w.startswith(' '):
        # Split the item, e.g. ' Desert: 33212\n' -> [' Desert', ' 33212\n']
        splitted = w.split(':')
        # Set the key and the value of the item:
        # remove redundant spaces and '\n', and convert the value to a number
        k, v = splitted[0].strip(), int(splitted[1].replace('\n', ''))
        result[current_key][k] = v
    # Otherwise, it's a category
    else:
        # Remove ':' and '\n' from the category name
        current_key = w.replace(':', '').replace('\n', '')
        # If the category doesn't exist yet, create a dictionary for it
        if current_key not in result:
            result[current_key] = {}
# {'Colors.of.the universe': {'Black': 111, 'Grey': 222, 'White': 11}, 'Movies of the week': {'Mission Impossible': 121, 'Die_Hard': 123, 'Jurassic Park': 33}, 'Lands.categories.said': {'Desert': 33212, 'forest': 4532, 'grassland': 431, 'tundra': 243451}}
print(result)
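Below is a variant of the loop above, sketched with str.rpartition and dict.setdefault. It makes three assumptions. Item lines are those that start with any whitespace (a tab or spaces). Headers are unindented and end with ':'. The first line is a header. parse_sections is a hypothetical name.

```python
def parse_sections(lines):
    """Group indented 'label: number' lines under the preceding unindented header."""
    result = {}
    current = None
    for line in lines:
        # Split on the last ':' so labels containing dots or spaces stay intact.
        label, _, value = line.rpartition(":")
        label, value = label.strip(), value.strip()
        if line[:1].isspace():
            # Indented line (tab or space): an item under the current header.
            result[current][label] = int(value)
        else:
            # Unindented line: start a new category.
            current = label
            result.setdefault(current, {})
    return result


sample = ['Colors.of.the universe:\n', ' Black: 111\n', '\tGrey: 222\n',
          'Lands.categories.said:\n', ' grassland : 431\n']
print(parse_sections(sample))
```

Checking line[:1].isspace() instead of startswith(' ') means a real tab prefix, as in the question's title, is treated the same as spaces.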

That's already very close to valid YAML. You could just quote the property labels and parse the result. Parsing a known format is much better than dealing with, or inventing, your own. Even if you're just exploring base Python, learning good practices is just as important (probably more so).
import re
import yaml
raw = ['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n',
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
# Fix spaces in property names
fixed = []
for line in raw:
    match = re.match(r'^( *)(\S.*?): ?(\S*)\s*', line)
    if match:
        fixed.append('{indent}{safe_label}:{value}'.format(
            indent=match.group(1),
            safe_label="'{}'".format(match.group(2)),
            value=' ' + match.group(3) if match.group(3) else ''
        ))
    else:
        raise Exception("regex failed")
parsed = yaml.load('\n'.join(fixed), Loader=yaml.FullLoader)
print(parsed)

Change the value for multiple items in list python

I have a nested list:
Table=[['','','','',''],
['','','','',''],
['','','','',''],
['','','','',''],
['','','','',''],
['','','','','']]
I have randomly placed some values in Table and now I want to place other things in the 2D neighbours of those values. E.g.:
Table=[['','','','',''],
['','','','',''],
['','','','',''],
['','','value','',''],
['','','','',''],
['','','','','']]
Then I want to add:
Table=[['','','','',''],
['','','','',''],
['','','1','',''],
['','1','value','1',''],
['','','1','',''],
['','','','','']]
Below is all my code. I don't know why, but the site wouldn't accept it in any other format, sorry :/
def add_nukes():
    pos=j.index('nuke')
    if "nuke" not in j[0]:j[pos+1]='1'
    if "nuke" not in j[-1]:
        j[pos-1] = "1"
        board[pos][i-1]="1"
        board[i+1][pos]="1"
import random
size=150
if size%2==1:
    size+=1
board = [[" "]*size for i in range(size)]
bombs = 25
all_cells = ["nuke"] * bombs + [" "] * (size - bombs)
random.shuffle(all_cells)
board = [all_cells[i:i+10] for i in range(0, size, 10)]
count=0
for j in board:
    for i in range(len(j)):
        count+=1
        if "nuke" in j[i]:
            add_nukes()
        elif "nuke" in j[i]:
            add_nukes()
for item in board:
    print item

Any value in Table is identified uniquely by its x and y coordinates, i.e. the element in the 2nd column (x == 1 because 0-indexed) and 3rd row (y == 2) is Table[y][x] == Table[2][1].
The four immediate neighbours of any cell A are the cells whose x or y coordinate (but not both) differs from A's by exactly one. If A is Table[y][x], then its neighbours are [Table[y - 1][x], Table[y + 1][x], Table[y][x - 1], Table[y][x + 1]].
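The Table[y][x] convention translates directly into code. The sketch below uses a hypothetical neighbours helper that yields only in-bounds (row, column) pairs, and then marks the neighbours of the 'value' cell from the question:

```python
# Offsets (dy, dx) of the four orthogonal neighbours.
NEIGHBOUR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def neighbours(table, y, x):
    """Yield (row, col) of each in-bounds orthogonal neighbour of table[y][x]."""
    for dy, dx in NEIGHBOUR_OFFSETS:
        ny, nx = y + dy, x + dx
        # Explicit bounds check: a negative index would silently wrap around
        # to the other side of the list instead of raising IndexError.
        if 0 <= ny < len(table) and 0 <= nx < len(table[ny]):
            yield ny, nx


table = [[''] * 5 for _ in range(6)]
table[3][2] = 'value'
for ny, nx in neighbours(table, 3, 2):
    table[ny][nx] = '1'
for row in table:
    print(row)
```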

Just like @Aurel Bílý mentioned, there are four neighbouring coordinates where you need to add the value: [Table[y - 1][x], Table[y + 1][x], Table[y][x - 1], Table[y][x + 1]].
To do that, you must first make sure these coordinates are valid: an index past the end raises an IndexError, and a negative index silently wraps around to the other side of the table. Once you know a coordinate is valid, you can safely set it in your table.
The code below demonstrates this:
Table=[['','','','',''],
['','','','',''],
['','','','',''],
['','','value','',''],
['','','','',''],
['','','','','']]
def isInBounds(Table,x,y):
    return 0 <= x < len(Table) and 0 <= y < len(Table[0])
def addValue(Table,x,y,value):
    if isInBounds(Table,x,y):
        Table[x][y] = value
def addValuesAround(Table,x,y,value):
    addValue(Table,x-1,y,value)
    addValue(Table,x,y-1,value)
    addValue(Table,x+1,y,value)
    addValue(Table,x,y+1,value)
addValuesAround(Table,3,2,1)
for elem in Table:
    print(elem)
This will return:
['', '', '', '', '']
['', '', '', '', '']
['', '', 1, '', '']
['', 1, 'value', 1, '']
['', '', 1, '', '']
['', '', '', '', '']
EDIT:
I think I got it, by combining both of our code samples. Just be sure to adjust the print syntax, since you're using Python 2.7 and I'm using Python 3.6:
import random
def isInBounds(Table,x,y):
    return 0 <= x < len(Table) and 0 <= y < len(Table[0])
def addValue(Table,x,y,value):
    # Don't overwrite a neighbouring nuke with a marker
    if isInBounds(Table,x,y) and Table[x][y] != 'nuke':
        Table[x][y] = value
def addValuesAround(Table,x,y,value):
    addValue(Table,x-1,y,value)
    addValue(Table,x,y-1,value)
    addValue(Table,x+1,y,value)
    addValue(Table,x,y+1,value)
size=150
if size%2==1:
    size+=1
board = [[" " for i in range(size)] for i in range(size)]
bombs = 25
all_cells = ["nuke"] * bombs + [" "] * (size - bombs)
random.shuffle(all_cells)
board = [all_cells[i:i+10] for i in range(0, size, 10)]
count=0
for i in range(len(board)):
    for j in range(len(board[i])):
        if board[i][j] == 'nuke':
            addValuesAround(board,i,j,"1")
for item in board:
    print(item)
This produces a random board such as:
[' ', ' ', ' ', ' ', '1', ' ', '1', ' ', '1', ' ']
[' ', ' ', ' ', '1', 'nuke', '1', 'nuke', '1', 'nuke', '1']
['1', ' ', ' ', ' ', '1', ' ', '1', ' ', '1', '1']
['nuke', '1', '1', '1', 'nuke', '1', ' ', ' ', '1', 'nuke']
['1', '1', 'nuke', '1', '1', ' ', '1', ' ', ' ', '1']
[' ', ' ', '1', ' ', ' ', '1', 'nuke', '1', ' ', ' ']
[' ', ' ', '1', ' ', ' ', '1', '1', ' ', ' ', ' ']
[' ', '1', 'nuke', '1', '1', 'nuke', '1', ' ', ' ', ' ']
['1', 'nuke', '1', ' ', '1', '1', '1', ' ', '1', ' ']
[' ', '1', 'nuke', '1', 'nuke', '1', 'nuke', '1', 'nuke', '1']
['1', 'nuke', '1', ' ', '1', ' ', '1', ' ', '1', ' ']
[' ', '1', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', '1', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', '1', 'nuke', '1', ' ', '1', ' ', '1', ' ', ' ']
[' ', ' ', '1', ' ', '1', 'nuke', '1', 'nuke', '1', ' ']

How can I sort these words to make this sentence in Python?

I have to arrange these words into a sentence.
This is my list, together with the list of their positions:
sms=['Szentírás ', 'bölcs ', 'a ', 'már ', 'szükséges ', 'mondhat ', 'biztosak ', 'a ',
'feladata, ', 'Mivel ', 'ellent ', 'a ', 'érzéki ', 'azokkal ', 'következtetésekkel, ',
'a ', 'vagyunk ', 'a ', 'tapasztalataink ', 'szöveg ', 'azon ', 'igazság ', 'sose ',
'ami ', 'hogy ', 'melyekben ', 'kísérletek ', 'megtalálják ', 'által.', 'két ', 'fizikai ',
'egymásnak, ', 'egyezik ', 'és ', 'értelmezőinek ', 'értelmezését, ']
sorrend=[8,9,15,26,33,4,27,11,12,0,5,32,29,21,24,22,28,7,30,16,17,2,3,19,13,25,34,14,35,1,23,6,20,31,10,18]
and I have to get this sentence:
Mivel két igazság sose mondhat ellent egymásnak, a Szentírás bölcs értelmezőinek a feladata, hogy megtalálják a szöveg azon értelmezését, ami egyezik azokkal a fizikai következtetésekkel, melyekben már biztosak vagyunk érzéki tapasztalataink és a szükséges kísérletek által.
How can I sort it like that?
Thanks.

You could try something like this (each word in your list already ends with a space, so join with an empty string):
''.join(dict(sorted(zip(sorrend,sms))).values())
For a shortened example:
>>> sms=['love', 'I', 'much', 'so', 'you']
>>> sorrend=[2,1,5,4,3]
>>> ' '.join(dict(sorted(zip(sorrend,sms))).values())
'I love you so much'

Finally I've solved this riddle:
words = {}
for i in range(len(sorrend)):
    words[sorrend[i]] = sms[i]
for i in range(len(sorrend)):
    # Each word already ends with a space, so don't add another one
    print(words[i], end='')
print()
Output:
Mivel két igazság sose mondhat ellent egymásnak, a Szentírás bölcs értelmezőinek a feladata, hogy megtalálják a szöveg azon értelmezését, ami egyezik azokkal a fizikai következtetésekkel, melyekben már biztosak vagyunk érzéki tapasztalataink és a szükséges kísérletek által.
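Because sorrend is a permutation of 0..n-1 (sorrend[i] is the position of sms[i]), each word can also be placed directly at its index without sorting. The sketch below uses a hypothetical arrange function. It assumes 0-based positions, as in the real sorrend, so the shortened example's positions are shifted down by one:

```python
def arrange(words, positions):
    """Place words[i] at index positions[i]; positions must be a permutation of 0..n-1."""
    if sorted(positions) != list(range(len(words))):
        raise ValueError("positions must be a permutation of 0..len(words)-1")
    ordered = [None] * len(words)
    for word, pos in zip(words, positions):
        ordered[pos] = word
    return ordered


sms = ['love', 'I', 'much', 'so', 'you']
sorrend = [1, 0, 4, 3, 2]
print(' '.join(arrange(sms, sorrend)))
```

For the full Hungarian list, whose words already carry a trailing space, use ''.join(arrange(sms, sorrend)) instead.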
