How to remove curly braces from a text file? - python

I am trying to remove curly braces from a text file. This is my text file:
( . || . )
. =
(){
= . ? . (" ") : . .
. . = ( )+8+" "
( && _ (" ")!="")
()
This is my code, but it is not working:
import re
symbols =re.compile(r'{{.*?}}',flags=re.UNICODE)
result = symbols.sub(" ",result)
Any suggestions?
I got a solution without using re. Note that str.replace returns a new string, so the result has to be assigned back:
text = text.replace('{', '')
text = text.replace('}', '')

text = text.replace('{', '')
text = text.replace('}', '')
should work fine, but I like
text = 'abc{def}ghi'
text.translate(None, '{}')
or
unitext = u'abc{def}ghi'
unitext.translate({ord('{'):None, ord('}'):None})
It's probably even faster if you do a lot of replacing.
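If you are on Python 3, the two-argument form of str.translate no longer exists; here is a minimal sketch using str.maketrans (assuming Python 3):
# Python 3: build a translation table with str.maketrans; the third
# argument lists the characters to delete, then apply it with str.translate.
text = 'abc{def}ghi'
table = str.maketrans('', '', '{}')
print(text.translate(table))  # abcdefghi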

Your pattern, {{.*?}}, will change a string like foo{{bar}}baz to foo baz. But since nothing like {{bar}} appears in your file, I don't think that's really what you want to do.
If you want to remove { and } characters, try this:
symbols = re.compile(r'[{}]',flags=re.UNICODE)
Also note that symbols.sub(" ",result) will replace them with spaces. If you want to just remove them, use symbols.sub("",result).
And of course, for something this simple, regular expressions are probably overkill. Basic string manipulation functions will probably suffice.
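For instance, a minimal sketch using only str.replace (assuming the file contents are already in result, as in the question):
# str.replace returns a new string, so chain the calls and reassign.
result = result.replace("{", "").replace("}", "")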

with open('output_file', 'w') as f_out:
    with open('input_file') as f_in:
        for line in f_in:
            for ch in ['{', '}']:
                line = line.replace(ch, '')
            f_out.write(line)

Regular expressions are slow for this; I suggest using a simple replace:
text = text.replace('{', '')
text = text.replace('}', '')

Something like the following would remove all curly braces from mystring.
import re
mystring = 'my stuff with {curly} braces'
result = re.sub(r'[{}]', '', mystring)

Related

Python - remove spaces and indents from string

I have a SQL query string:
query_for_update = f'''
    update {db_name}.{schema_name}.{self.table_name}
    set {self.updated_field} = {self.updated_field}
    where {self.key_field} in ({ids});
'''
But when I write this query to a file with f.write(query_for_update) I get the following result:
update store_1.dbo.[my_table]
set [Trusted Plan] = [Trusted Plan]
where [Entry No_] in (1472371,
1472375,
1472377,
1472379,
1472373,
);
Code that creates the string:
ids_string = ',\n'.join(["'" + str(item) + "'" for item in result.id])
query_for_update = mssql_table.get_update_query('dbo', mssql_db_name, ids_string).strip()
with open(mssql_server_name + '.sql', 'a') as f:
    f.write(query_for_update)
How can I remove the indents from the strings in this case?
You can use textwrap.dedent (standard library):
import textwrap
query = textwrap.dedent(f"""\
    update {db_name}.{schema_name}.{self.table_name}
    set {self.updated_field} = {self.updated_field}
    where {self.key_field} in ({ids});
""")
print(query)
This will remove all leading whitespace that is common to every line. Useful for triple-quoted strings.
You can use the str.strip() function with a for loop to fix it.
for x in list:
    if x.strip():
        list2.append(x)
then you can use list2 as your new usable list
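For example, a minimal sketch of that idea applied to the query string from the question (assuming query_for_update already holds the indented SQL):
# Strip leading/trailing whitespace from every line, drop blank lines,
# and rebuild the query as a single string.
cleaned_lines = []
for line in query_for_update.splitlines():
    if line.strip():
        cleaned_lines.append(line.strip())
cleaned_query = '\n'.join(cleaned_lines)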
You can use the str.strip method: https://www.w3schools.com/python/ref_string_strip.asp
For indentation and line breaks, consider that you might need to use \n. There is also a dedent method in the textwrap library that could be interesting for you: https://docs.python.org/3/library/textwrap.html
I hope this helps :)

How to extract text part from file using Python & Regular Expressions

Using Python I want to read a text file, search for a string and print all lines between this matching string and another one.
The text file looks like the following:
Text=variables.Job_SalesDispatch.CaptionNew
Tab=0
TabAlign=0
}
}
}
[UserVariables]
User1=#StJid;IF(fields.Fieldtype="Artikel.Gerät" , STR$(fields.id,0,0) , #StJid)
[Parameters]
[#Parameters]
{
[Parameters]
{
LL.ProjectDescription=? (default)
LL.SortOrderID=
}
}
[PageLayouts]
[#PageLayouts]
{
[PageLayouts]
{
[PageLayout]
{
DisplayName=
Condition=Page() = 1
SourceTray=0
Now I want to print all "UserVariables", so only the lines between [UserVariables] and the next line starting with a square bracket. In this example this would be [Parameters].
What I have done so far is:
with open("path/testfile.lst", encoding="utf8", errors="ignore") as file:
for line in file:
uservars = re.findall('\b(\w*UserVariables\w*)\b', line)
print (uservars)
which gives me only [].
If using regular expressions is not a mandatory requirement for you, you can go with something like this:
with open("path/testfile.lst", encoding="utf8", errors="ignore") as file:
inside_uservars = False
for line in file:
if inside_uservars:
if line.strip().startswith('['):
inside_uservars = False
else:
print(line)
if line.strip() == '[UserVariables]':
inside_uservars = True
We can try using re.findall with the following regex pattern:
\[UserVariables\]\n((?:(?!\[.*?\]).)*)
This says to match a [UserVariables] tag, followed by a slightly complicated looking expression:
((?:(?!\[.*?\]).)*)
This expression is a tempered dot trick which matches any character, one at a time, so long as what lies immediately ahead is not another tag contained in square brackets.
matches = re.findall(r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)', input, re.DOTALL)
print(matches)
[' User1=#StJid;IF(fields.Fieldtype="Artikel.Ger\xc3\xa4t" , STR$(fields.id,0,0) , #StJid)\n']
Edit:
My answer assumes that the entire file content sits in memory, in a single Python string. You may read the entire file using:
with open('Path/to/your/file.txt', 'r') as content_file:
    input = content_file.read()
matches = re.findall(r'\[UserVariables\]\n((?:(?!\[.*?\]).)*)', input, re.DOTALL)
print(matches)

regex.sub unexpectedly modifying the substituting string with some kind of encoding?

I have a path string "...\\JustStuff\\2017GrainHarvest_GQimagesTestStand\\..." that I am inserting into an existing text file in place of another string. I compile a regex pattern and find bounding text to get the location to insert, and then use regex.sub to replace it. I'm doing something like this...
with open(imextXML, 'r') as file:
    filedata = file.read()

redirpath = re.compile("(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
filedatatemp = redirpath.sub(newdir, filedata)
The inserted text is messed up though, with "\\201" being replaced with "\x81" and "\\" replaced with "\" (a single backslash),
i.e.
"...\\JustStuff\\2017GrainHarvest_GQimagesTestStand\\..." becomes
"...\\JustStuff\x817GrainHarvest_GQimagesTestStand\..."
What simple thing am I missing here to fix it?
Update:
To break this down even further, here is a copy-and-paste example that reproduces the issue:
t2 = r'\JustStuff\2017GrainHarvest_GQimagesTestStand\te'
redirpath = re.compile("(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
temp = r"<directoryPath>aasdfgsdagewweags</directoryPath>"
redirpath.sub(t2,temp)
produces...
'<directoryPath>\\JustStuff\x817GrainHarvest_GQimagesTestStand\te</directoryPath>'
When you define the string that you want to insert, prefix it with an r to indicate that it is a raw string literal:
>>> rex = re.compile('a')
>>> s = 'path\\with\\2017'
>>> sr = r'path\\with\\2017'
>>> rex.sub(s, 'ab')
'path\\with\x817b'
>>> rex.sub(sr, 'ab')
'path\\with\\2017b'
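Another option is to pass a callable as the replacement; re.sub inserts whatever the function returns verbatim and does no backslash processing at all, so nothing has to be escaped. A minimal sketch, reusing the strings from the question's update:
import re

# A function replacement is inserted verbatim, so backslashes in newdir
# are never parsed as group references or octal escapes.
newdir = r'\JustStuff\2017GrainHarvest_GQimagesTestStand\te'
redirpath = re.compile("(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
result = redirpath.sub(lambda m: newdir, "<directoryPath>old</directoryPath>")
print(result)  # <directoryPath>\JustStuff\2017GrainHarvest_GQimagesTestStand\te</directoryPath>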

simple way convert python string to quoted string

I'm new to Python and I have a select-statement string like the following: help_category_id, name. What is the most effective way to convert this string to this:
'help_category_id', 'name'
I've currently done this, which works fine, but is there a nicer and cleaner way to do the same:
test_string = 'help_category_id, name'
column_sort_list = []
if test_string is not None:
    for col in test_string.split(','):
        column = "'{column}'".format(column=col)
        column_sort_list.append(column)
column_sort = ','.join(column_sort_list)
print(column_sort)
Simple one-liner using looping constructs:
result = ", ".join(["'" + i.strip() + "'" for i in myString.split(",")])
What we are doing here is we are creating a list that contains all substrings of your original string, with the quotes added. Then, using join, we make that list into a comma delimited string.
Deconstructed, the looping construct looks like this:
resultList = []
for i in myString.split(","):
    resultList.append("'" + i.strip() + "'")
Note the call to i.strip(), which removes extraneous spaces around each substring.
Note: You can use format syntax to make this code even cleaner:
New syntax:
result = ", ".join(["'{}'".format(i.strip()) for i in myString.split(",")])
Old syntax:
result = ", ".join(["'%s'" % i.strip() for i in myString.split(",")])
It can also be achieved like this:
','.join("'{}'".format(value) for value in map(lambda text: text.strip(), test_string.split(",")))

Python - finding specific string in files

I am trying to read a specific string from files. Basically the file looks like this:
S0M6A36A108A180A252A324A36|1|48|89|36|Single|
S0M6A36A108A180A252A324A36|2|43|83|108|Single|
S0M6A36A108A180A252A324A36|3|37|85|180|Single|
S0M6A36A108A180A252A324A36|4|37|93|252|Single|
S0M6A36A108A180A252A324A36|5|43|95|324|Single|
S0M6A36A108A180A252A324A36|6|42|89|36|Single|
[META DATA]
01/10/2015|14:50:27|USA|UWI_N2C34_2|MMS1|FORD35|Bednarek|true|6|0|false|
[QUALITY CAMERA CHECK]
1|1|0|
2|1|0|
3|1|0|
4|1|0|
5|1|0|
6|1|0|
[PRESET]
S0M6A36A108A180A252A324A36|TA|
What I need is to read, from the line
01/10/2015|14:50:27|USA|UWI_N2C34_2|MMS1|FORD35|Bednarek|true|6|0|false|
the country name between the | separators, |USA| in this case. To do that I tried to use the group function, which is part of the regular expression module. I deduced that I need to read from the specific line which holds this string, so I wrote this small code:
import os
import string
import re
import sys
import glob
import fileinput
country_pattern = 'MYS','IDN','ZAF', 'THA','TWN','SGP', 'NWZ', 'AUS','ALB','AUT','BEL', 'BGR', 'BIH', 'CHE','CZE', 'DEU', 'DNK', 'ESP','EST','SRB','MDK','MNE','BIH', 'BIH','MNE','FIN', 'FRA', 'GBR','GRC', 'HRV', 'HUN', 'IRL', 'ITA', 'LIE', 'LTU', 'LUX', 'LVA', 'MDA', 'SMR','CYP','NLD','NOR','POL','PRT','ROU','SCG', 'SVK','SVN','SWE','TUR','BRA','CAN','USA','MEX','CHL','ARG','RUS'
pattern = r'(\d+)/(\d+)/(\d+)|(\d+):(\d+):(\d+)|(\S+)|(\S+)|(\S+)|(\S+)|(\S+)|(\S+)|(\d+)|(\d+)|(\S+)|'
src = raw_input("Enter source disk location: ")
src = os.path.dirname(src)
for dir,_,_ in os.walk(src):
    file_path = glob.glob(os.path.join(dir,"*.txt"))
    for file in file_path:
        f = open(file, 'r')
        object_name = f.readlines()
        f.close()
        for line_name_tmp in object_name:
            line_name = line_name_tmp.replace('\n','')
            if line_name == '':
                line_name.split()
                continue
            else:
                try:
                    searchObj = re.search(pattern, line_name)
                    m = searchObj.group(7)
                    if m in country_pattern:
                        print "searchObj.group(7) : ", searchObj.group(7)
                    else:
                        print 'did not find any match'
                except:
                    print line_name
                    pass
But it always prints 'did not find any match'. Did I miss something?
Thanks for any advice.
Your regex is the problem. Try this one:
pattern = r'(\d+)/(\d+)/(\d+)\|(\d+):(\d+):(\d+)\|(\S+)\|(\S+)\|(\S+)\|(\S+)\|(\S+)\|(\S+)\|(\d+)\|(\d+)\|(\S+)\|'
In regular expressions, the character | separates alternatives. So if you define a regex like this,
(\d+)/(\d+)/(\d+)|(\d+):(\d+):(\d+)
it will match a string of the form digits/digits/digits or a string of the form digits:digits:digits. Not both.
Accordingly, when you take your pattern regex and search the line
01/10/2015|14:50:27|USA|UWI_N2C34_2|MMS1|FORD35|Bednarek|true|6|0|false|
for a match, the regex winds up matching only the part 01/10/2015, because that part is matched by the first alternative ((\d+)/(\d+)/(\d+)). The seventh capturing group in the regex is not within the part that matched, so m.group(7) returns None, and of course None is not one of the elements in country_pattern.
The easy - or one might say lazy - way to fix this is to escape the pipe characters in the definition of the regex: use \| instead of |. But since you have fields separated by | in the file, I think you might have a better designed program if you were to use line_name.split('|') and then pick out the third field, instead of using a regular expression.
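For example, a minimal sketch of that split-based approach (reusing line_name and country_pattern from the question's code, written in Python 2 to match it):
# Split on the | separator; the third field holds the country code
# in the metadata line.
fields = line_name.split('|')
if len(fields) > 2 and fields[2] in country_pattern:
    print "country code:", fields[2]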
If you just need to find the country abbreviation in the text, this will do it:
data = '''
01/10/2015|14:50:27|USA|UWI_N2C34_2|MMS1|FORD35|Bednarek|true|6|0|false|
'''
country_pattern = 'MYS','IDN','ZAF', 'THA','TWN','SGP', 'NWZ', 'AUS','ALB','AUT','BEL', 'BGR', 'BIH', 'CHE','CZE', 'DEU', 'DNK', 'ESP','EST','SRB','MDK','MNE','BIH', 'BIH','MNE','FIN', 'FRA', 'GBR','GRC', 'HRV', 'HUN', 'IRL', 'ITA', 'LIE', 'LTU', 'LUX', 'LVA', 'MDA', 'SMR','CYP','NLD','NOR','POL','PRT','ROU','SCG', 'SVK','SVN','SWE','TUR','BRA','CAN','USA','MEX','CHL','ARG','RUS'
mo = re.search(r'\|[A-Z]{3}\|',data)
if mo:
    print(mo.group(0))
|USA|
