How to remove brackets from python string? - python

I know from the title you might think that this is a duplicate but it's not.
for id,row in enumerate(rows):
    columns = row.findall("td")
    teamName = columns[0].find("a").text, # Lag
    playedGames = columns[1].text, # S
    wins = columns[2].text,
    draw = columns[3].text,
    lost = columns[4].text,
    dif = columns[6].text, # GM-IM
    points = columns[7].text, # P - last column
    dict[divisionName].update({id :{"teamName":teamName, "playedGames":playedGames, "wins":wins, "draw":draw, "lost":lost, "dif":dif, "points":points }})
This is what my Python code looks like. Most of the code is removed, but essentially I am extracting some information from a website and saving it as a dictionary. When I print the dictionary, every value has brackets around it, e.g. ["blbal"], which causes trouble in my iPhone application. I know that I can convert the variables to strings, but I want to know if there is a way to get the information DIRECTLY as a string.

That looks like you have a string inside a list:
["blbal"]
To get the string, just index the list: with l = ["blbal"], print(l[0]) prints blbal.
If it is a string, use str.strip, '["blbal"]'.strip("[]"), or slicing, '["blbal"]'[1:-1], if the brackets are always present.
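Judging by the code in the question, one likely cause is the trailing comma after each assignment: in Python, `x = value,` builds a one-element tuple, and a tuple is written out as a JSON array. A minimal sketch, assuming the dictionary reaches the app as JSON (the question does not say how) and using a made-up sample value:

```python
import json

text = "blbal"          # stand-in for columns[0].find("a").text

with_comma = text,      # trailing comma: this is the tuple ("blbal",)
without_comma = text    # no comma: this is the plain string

as_json_tuple = json.dumps({"teamName": with_comma})
as_json_string = json.dumps({"teamName": without_comma})

print(as_json_tuple)    # {"teamName": ["blbal"]}
print(as_json_string)   # {"teamName": "blbal"}
```

If that is what is happening, dropping the trailing commas gives plain strings directly, with no stripping needed afterwards.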

You can also use replace to swap the text/symbols that you don't want for the empty string.
text = ["blbal","test"]
strippedText = str(text).replace('[','').replace(']','').replace('\'','').replace('\"','')
print(strippedText)

import re
text = "some (string) [another string] in brackets"
re.sub(r"\(.*?\)", "", text)
# 'some  [another string] in brackets'
# works for () and will work for [] if you replace \( and \) with \[ and \].
The \(.*?\) pattern matches a pair of parentheses with text of any length inside them, and the \[.*?\] pattern does the same for square brackets.
The output will contain neither the brackets nor the text inside them.
To match only square brackets, replace the escaped parentheses in the pattern with escaped square brackets, and vice versa.
To match both () and [] in one go, use (\(.*?\)|\[.*?\]), joining the two patterns with the | character.
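As a rough sketch of the combined pattern, the snippet below removes both kinds of bracketed text in one pass and then collapses the leftover runs of spaces. The whitespace cleanup step and the variable names are additions for illustration, not part of the answer above.

```python
import re

text = "some (string) [another string] in brackets"

# Either a (...) group or a [...] group, matched non-greedily.
BRACKETED = re.compile(r"\(.*?\)|\[.*?\]")

removed = BRACKETED.sub("", text)

# Removing the groups leaves several spaces in a row; collapse them.
cleaned = " ".join(removed.split())

print(cleaned)  # some in brackets
```

Note that this non-greedy pattern does not handle nested brackets.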

Related

Extract strings between brackets and nested brackets

So I have a file of text and titles, (titles indicated with the starting ";")
;star/stellar_(class(ification))_(chart)
Hertz-sprussels classification of stars is shows us . . .
What I want to do is have it where it's split by "_" into
['star/stellar','(class(ification))','(chart)'], iterating through them and extracting what's in the brackets, e.g. '(class(ification))' to {'class':'ification'} and (chart) to just ['chart'].
All I've done so far is the splitting part
for ln in open(file,"r").read().split("\n"):
    if ln.startswith(";"):
        keys=ln[1:].split("_")
I have ways to extract bits in brackets, but I have had trouble finding a way that supports nested brackets in order.
I've tried things like re.findall('\(([^)]+)',ln) but that returns ['star/stellar', '(class', 'chart']. Any ideas?
You can do this with splits. If you separate the string using '_(' instead of only '_', every part from the second onward will be an enclosed keyword. You can strip the closing parentheses and split those parts on '(' to get either one component (if there was no nested parenthesis) or two components. You then form either a one-element list or a dictionary, depending on the number of components.
line = ";star/stellar_(class(ification))_(chart)"
if line.startswith(";"):
    parts = [ part.rstrip(")") for part in line.split("_(")[1:]]
    parts = [ part.split("(",1) for part in parts ]
    parts = [ part if len(part)==1 else dict([part]) for part in parts ]
    print(parts)
[{'class': 'ification'}, ['chart']]
Note that I assumed that the first part of the string is never included in the process and that there can only be one nested group at the end of the parts. If that is not the case, please update your question with relevant examples and expected output.
You can split (again) on the parentheses then do some cleaning:
x = ['star/stellar','(class(ification))','(chart)']
for v in x:
    y = v.split('(')
    y = [a.replace(')','') for a in y if a != '']
    if len(y) > 1:
        print(dict([y]))
    else:
        print(y)
Gives:
['star/stellar']
{'class': 'ification'}
['chart']
If all of the title lines have the same format, that is they all have these three parts ;some/title_(some(thing))_(something), then you can catch the different parts to separate variables:
first, second, third = ln.split("_")
From there, you know that:
for the first item you need to drop the ;:
first = first[1:]
for the second item, you want to extract the stuff in the parentheses and then merge it into a dict:
k, v = filter(bool, re.split('[()]', second))
second = {k:v}
for the third item, you want to drop the surrounding parentheses
third = third[1:-1]
Then you just need to put them all together again:
[first, second, third]
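The answers above all assume at most one level of nesting. If deeper nesting can occur, one option is to count bracket depth while scanning the string. This is only a sketch: the helper name top_level_groups is made up, and it returns the raw text of each outermost group rather than the list/dict structure asked for in the question.

```python
def top_level_groups(text):
    """Return the contents of each outermost (...) group, in order."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1      # first character inside the group
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:         # the outermost group just closed
                groups.append(text[start:i])
    return groups


line = ";star/stellar_(class(ification))_(chart)"
print(top_level_groups(line))  # ['class(ification)', 'chart']
```

Calling the function again on each returned group peels off one more level of nesting.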

How to get rid of whitespace in element of dictionary?

I am trying to extract only the string without whitespace of an element in a dictionary. This is how my element looks currently:
u'GYM-7874 '
I only want to extract the value without the whitespace so "GYM-7874" only so that I can compare it with another list of strings. I tried using the .strip() method to get rid of whitespaces but unfortunately that only works with string. How do I get rid of the white space and only extract the characters?
# json_ob is the string array of dictionary
title = json_ob[index]["title"]
title.strip()
strip() doesn't modify the string in place (strings are immutable in Python). You need to assign the result back to the variable.
title = title.strip()
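Since the goal is to compare the titles with another list of strings, the same fix can be applied to every entry at once. A small sketch, assuming json_ob is a list of dictionaries with a "title" key as in the question; the sample data and the name wanted are invented for illustration:

```python
# Invented sample data shaped like the question's json_ob.
json_ob = [{"title": u"GYM-7874 "}, {"title": u" GYM-1001"}]
wanted = ["GYM-7874", "GYM-9999"]

# strip() returns a new string, so collect the results in a new list.
titles = [entry["title"].strip() for entry in json_ob]

matches = [t for t in titles if t in wanted]
print(titles)   # ['GYM-7874', 'GYM-1001']
print(matches)  # ['GYM-7874']
```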

How to split a list item based on digits in item

I am currently parsing this huge rpt file. In each item there is a value in parentheses. For example, "item_number_one(3.14)". How could I extract that 3.14 using the split function in python? Or is there another way to do that?
#Splits all items by comma
items = line.split(',')
#splits items within comma, just gives name
name_only = [i.split('_')[0] for i in items]
# print(name_only)
#splits items within comma, just gives full name
full_name= [i.split('(')[0] for i in items]
# print(full_Name)
#splits items within comma, just gives value in parentheses
parenth_value = [i.split('0-9')[0] for i in items]
# parenth_value = [int(s) for s in items.split() if s.isdigit()]
print(parenth_value)
parenth_value = [i.split('0-9')[0] for i in items]
For a more general way of extracting numbers from strings, you should read about regular expressions.
For this very specific case, you can split by ( and then by ) to get the value between them, like this:
line = "item_number_one(3.14)"
num = line.split('(')[1].split(')')[0]
print(num)
You could simply find starting index of parentheses and ending parentheses, and get the area between them:
start_paren = line.index('(')
end_paren = line.index(')')
item = line[start_paren + 1:end_paren]
# item = '3.14'
Alternatively, you could use regex, which offers an arguably more elegant solution:
import re
...
# anything can come before the parentheses, anything can come afterwards.
# We have to escape the parentheses and put a group inside them
# (this is notated with its own parentheses inside the pair that is escaped)
item = re.match(r'.*\(([0-9.-]*)\).*', line).group(1)
# item = '3.14'
You can use a regex and do something like this:
import re
sentence = "item_number_one(3.14)"
re.findall(r'\d+\.\d+', sentence)
You could get the decimal value by using the following regular expression:
import re
text = 'item_number_one(3.14)'
re.findall(r'\d+\.\d+', text)
Output: ['3.14']
Explanation:
"\d" - Matches any decimal digit; this is equivalent to the class [0-9].
"+" - Matches one or more repetitions of the preceding token, here one or more digits.
"\." - Matches a literal dot; an unescaped "." would match any character.
In the same way you can parse the rpt file, split the lines, and fetch the value in the parentheses.
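For a whole comma-separated line, the same idea can pull out every parenthesized value and convert it to a number. This is a sketch only: the sample line is invented, and it assumes each value is a plain integer or decimal, optionally negative.

```python
import re

# Invented sample line in the style of the question's items.
line = "item_number_one(3.14),item_number_two(-2.5),item_three(7)"

# A number inside parentheses: optional minus sign, digits,
# and an optional fractional part.
VALUE = re.compile(r"\((-?\d+(?:\.\d+)?)\)")

values = [float(v) for v in VALUE.findall(line)]
print(values)  # [3.14, -2.5, 7.0]
```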

Regex to match strings within braces

I am trying to write a regex for a string that has the following format
12740(34,12) [abc (a1b2c3) (a2b3c4)......] myId123
Currently, I have something like this
\((?P<expression>\S+)\)
But with this, I can capture only the strings within square brackets.
Is there any way I can capture the integers before the square brackets, and the id at the end, along with the strings within the square brackets?
The number of strings enclosed within small brackets will not be the same. I could also have a string that looks like this
10(3,2) [abc (a1b2c3)] myId1
I know that I can write a simple regex for the above expression using brute force. But could anyone please help me write one when the number of strings within the square bracket keeps changing.
Thanks in advance
You can capture the information by using ^ and $, which mean start and end respectively:
((?P<front>^\d+)|\((?P<expression>\S+)\)|(?P<id>[a-zA-Z0-9]+)$)
Regex101:
https://regex101.com/r/PoA5k4/1
To make the result more usable, I'd turn it into a dictionary:
import re
myStr = "12740(34,12) [abc (a1b2c3) (a2b3c4)......] myId123"
di = {}
for find in re.findall(r"((?P<front>^\d+)|\((?P<expression>\S+)\)|(?P<id>[a-zA-Z0-9]+)$)",myStr):
    if find[1] != "":
        di["starter"] = find[1]
    elif find[3] != "":
        di["id"] = find[3]
    else:
        di.setdefault("expression",[]).append(find[2])
print(di)
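An alternative sketch is to match the fixed outer structure first and then search only the bracketed part for the variable number of groups. The function name parse and the group names args and body are invented here, and the pattern assumes every line follows the two example formats from the question.

```python
import re

# Outer structure: digits, (args), [body], id.
LINE = re.compile(
    r"^(?P<front>\d+)\((?P<args>[^)]*)\)\s*\[(?P<body>.*)\]\s*(?P<id>\w+)$"
)


def parse(s):
    m = LINE.match(s)
    if m is None:
        return None  # the line does not follow the assumed format
    return {
        "front": m.group("front"),
        "args": m.group("args").split(","),
        # Any number of (...) groups inside the square brackets.
        "expressions": re.findall(r"\(([^()]*)\)", m.group("body")),
        "id": m.group("id"),
    }


print(parse("12740(34,12) [abc (a1b2c3) (a2b3c4)......] myId123"))
print(parse("10(3,2) [abc (a1b2c3)] myId1"))
```

Splitting the work this way keeps the numbers before the square brackets out of the list of expressions.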

From list of strings, extract only characters within brackets

I have a list of strings that have variable construction but have a character sequence enclosed in square brackets. I want to extract only the sequence enclosed by the square brackets. There is only one instance of square brackets per string, which simplifies the process.
I am struggling to do so in an elegant manner, and this is clearly a simple problem with Python's large string library.
What is a simple expression to do this?
Check out the regular expression module, re.
Something like this should do the trick:
import re
s = "hello_from_adele[this_is_the_string_i_am_looking_for]this_is_not_it"
match = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print(match.group(1))
If you provide an example, we can be more specific.
You don't even need re to do this:
In [11]: strng = "This is some text [that has brackets] followed by more text"
In [12]: strng[strng.index("[")+1:strng.index("]")]
Out[12]: 'that has brackets'
This uses string slicing to return the characters inside the brackets. index() returns the 0-based position of its argument. Since we don't want to include the [ at the beginning, we add 1. The second argument of the slice is the stop position, but it is not included in the returned substring, so we don't need to add anything to it.
If you prefer not to use regex for whatever reason, it should be easy to do with string splitting since you're guaranteed to have one and only one instance of [ and ].
s = "some[string]to check"
_, midright = s.split("[")
target, _ = midright.split("]")
or
target = s.split("[")[1].split("]")[0] # ewww
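Because the question is about a list of strings, any of these approaches can be wrapped in a loop. A minimal sketch using re.search; the function name bracket_contents and the sample strings are made up, and None is returned for a string with no brackets (a case the question says should not occur):

```python
import re

# Text between the first [ and the next ].
BRACKETED = re.compile(r"\[([^\]]*)\]")


def bracket_contents(strings):
    result = []
    for s in strings:
        m = BRACKETED.search(s)
        result.append(m.group(1) if m else None)
    return result


samples = [
    "hello_from_adele[this_is_the_string_i_am_looking_for]this_is_not_it",
    "This is some text [that has brackets] followed by more text",
    "no brackets here",
]
print(bracket_contents(samples))
```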
