Text Substitution Using Python Dictionary in re.sub - python

I have an input file in the below format:
<ftnt>
<p><su>1</su> aaaaaaaaaaa </p>
</ftnt>
...........
...........
...........
... the <su>1</su> is available in the .........
I need to convert this to the format below by replacing each reference with its value and deleting all the data inside the ftnt tags:
"""...
...
... the aaaaaaaaaaa is available in the ..........."""
Please find below the code I have written. Initially I saved the keys and values in a dictionary and then tried to replace each value based on its key using grouping.
import re
dict = {}
in_file = open("in.txt", "r")
outfile = open("out.txt", "w")
File1 = in_file.read()
infile1 = File1.replace("\n", " ")
for mo in re.finditer(r'<p><su>(\d+)</su>(.*?)</p>',infile1):
    dict[mo.group(1)] = mo.group(2)
subval = re.sub(r'<p><su>(\d+)</su>(.*?)</p>','',infile1)
subval = re.sub('<su>(\d+)</su>',dict[\\1], subval)
outfile.write(subval)
I tried to use the dictionary in re.sub, but I am getting a KeyError. I don't know why this happens; could you please tell me how to use it correctly? I'd appreciate any help here.

Try using a lambda for the second argument to re.sub, rather than a string with backreferences:
subval = re.sub(r'<su>(\d+)</su>', lambda m: dict[m.group(1)], subval)
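Putting that together with the rest of the script from the question, the whole flow might look roughly like this (a minimal sketch; it assumes the same in.txt/out.txt file names and the markup shown above, and footnotes is just the dictionary renamed so the built-in dict isn't shadowed):
import re

# read the input and flatten newlines, as in the original script
with open("in.txt", "r") as in_file:
    text = in_file.read().replace("\n", " ")

# collect footnote number -> footnote text
footnotes = {}
for mo in re.finditer(r'<p><su>(\d+)</su>(.*?)</p>', text):
    footnotes[mo.group(1)] = mo.group(2)

# drop the footnote definitions and the surrounding <ftnt> tags
text = re.sub(r'<p><su>(\d+)</su>(.*?)</p>', '', text)
text = re.sub(r'</?ftnt>', '', text)

# replace each remaining <su>n</su> reference through the lambda
text = re.sub(r'<su>(\d+)</su>', lambda m: footnotes[m.group(1)], text)

with open("out.txt", "w") as out_file:
    out_file.write(text)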

First off, don't name a dictionary dict or you'll shadow the built-in dict function. Second, \\1 doesn't work outside of a replacement string, hence the syntax error. I think the best bet is to take advantage of str.format:
import re
# store the substitutions
subs = {}
# read the data
in_file = open("in.txt", "r")
contents = in_file.read().replace("\n", " ")
in_file.close()
# save some regexes for later
ftnt_tag = re.compile(r'<ftnt>.*</ftnt>')
var_tag = re.compile(r'<p><su>(\d+)</su>(.*?)</p>')
# pull the ftnt tag out
ftnt = ftnt_tag.findall(contents)[0]
contents = ftnt_tag.sub('', contents)
# pull the su
for match in var_tag.finditer(ftnt):
    # added s so they aren't numbers, useful for format
    subs["s" + match.group(1)] = match.group(2)
# replace <su>1</su> with {s1}
contents = re.sub(r"<su>(\d+)</su>", r"{s\1}", contents)
# now that the <su> are the keys, we can just use str.format
out_file = open("out.txt", "w")
out_file.write( contents.format(**subs) )
out_file.close()
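For reference, the final str.format step works because ** unpacks the subs dictionary into keyword arguments that fill the {s1}-style placeholders. A tiny illustration with made-up values:
>>> "... the {s1} is available in the ...".format(**{"s1": "aaaaaaaaaaa"})
'... the aaaaaaaaaaa is available in the ...'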

Related

Find and Remove function call with brackets

I'm working on a Python script that will allow me to remove some attributes from a function call in a Java class. The problem is I can't find the right regex to match both the name of the attribute and the brackets.
The string I'm looking to remove is, as an example, 'withContentDescription("random text")'.
What is the correct way to match the () brackets and whatever content is inside them in my code?
import re
filein = '/path/file.java'
fileout = '/path/newfile.java'
f = open(filein,'r')
filedata = f.read()
f.close()
print("Removing Content Descriptor")
newdata = filedata.strip("withContentDescription\)")
f = open(fileout,'w')
f.write(newdata)
print("--- Done")
f.close()
I'd like to obtain something like
old string: allOf(withId(someinfo), withContentDescription("Text"))
new string: allOf(withId(someinfo))
Using re.sub
Ex:
import re
s = 'allOf(withId(someinfo), withContentDescription("Text"))'
print(re.sub(r",\s*(withContentDescription\(.*?\))", "", s))
Output:
allOf(withId(someinfo))
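To apply this to the whole Java file, as in the question's script, something along these lines should work (a minimal sketch; the paths are placeholders from the question, and it assumes the removed call never contains a nested closing parenthesis):
import re

filein = '/path/file.java'
fileout = '/path/newfile.java'

with open(filein, 'r') as f:
    filedata = f.read()

print("Removing Content Descriptor")
# drop the preceding comma plus the whole withContentDescription(...) call
newdata = re.sub(r",\s*withContentDescription\(.*?\)", "", filedata)

with open(fileout, 'w') as f:
    f.write(newdata)
print("--- Done")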

Python compare two files using a list

I'm trying to compare two files via regex strings and print the output. I seem to have an issue with my loop, as only the last line gets printed out. What am I missing?
import re
delist = [r'"age":.*",', r'"average":.*",', r'"class":.*",']
with open('test1.txt', 'r') as bolo:
    boloman = bolo.read()
for dabo in delist:
    venga = re.findall(dabo, boloman)
    for vaga in venga:
        with open('test.txt', 'r') as f:
            content = f.read()
        venga2 = re.findall(dabo, content)
        for vaga2 in venga2:
            mboa = content.replace(vaga2, vaga, 1)
print(mboa)
The first problem I see is that you keep overwriting mboa, so it only ever holds the last result. I think what you really want to do is create a list and append each result to it.
import re
mboa = []
delist = [r'"age":.*",', r'"average":.*",', r'"class":.*",']
with open('test1.txt', 'r') as bolo:
    boloman = bolo.read()
for dabo in delist:
    venga = re.findall(dabo, boloman)
    for vaga in venga:
        with open('test.txt', 'r') as f:
            content = f.read()
        venga2 = re.findall(dabo, content)
        for vaga2 in venga2:
            mboa.append(content.replace(vaga2, vaga, 1))
print(mboa)
Does that solve the issue? If it doesn't, add a comment to this question and I'll try to fix it. ;)

Python: replace a string in a CSV file

I am a beginner and I have an issue with a short piece of code. I want to replace a string in a CSV file with another string, and write out a new CSV file with a new name. The strings are separated with commas.
My code is a catastrophe:
import csv
f = open('C:\\User\\Desktop\\Replace_Test\\Testreplace.csv')
csv_f = csv.reader(f)
g = open('C:\\Users\\Desktop\\Replace_Test\\Testreplace.csv')
csv_g = csv.writer(g)
findlist = ['The String, that should replaced']
replacelist = ['The string that should replace the old striong']
#the function ?:
def findReplace(find,replace):
    s = f.read()
    for item, replacement in zip(findlist,replacelist):
        s = s.replace(item,replacement)
    g.write(s)
for row in csv_f:
    print(row)
f.close()
g.close()
You can do this with the regex package re. Also, if you use with you don't have to remember to close your files, which helps me.
EDIT: Keep in mind that this matches the exact string, meaning it's also case-sensitive. If you don't want that then you probably need to use an actual regex to find the strings that need replacing. You would do this by replacing find_str in the re.sub() call with r'your_regex_here'.
import re
# open your csv and read as a text string
with open(my_csv_path, 'r') as f:
    my_csv_text = f.read()
find_str = 'The String, that should replaced'
replace_str = 'The string that should replace the old striong'
# substitute
new_csv_str = re.sub(find_str, replace_str, my_csv_text)
# open new file and save
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(new_csv_str)
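One extra caveat, since re.sub treats find_str as a regular expression: if the text you want to replace ever contains regex metacharacters such as dots or parentheses, it's safer to escape it first. A small sketch with made-up strings:
import re

find_str = 'price (USD).'
# re.escape makes the parentheses and the dot match literally
pattern = re.escape(find_str)
print(re.sub(pattern, 'price (EUR).', 'unit price (USD). per item'))
# -> unit price (EUR). per item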

Import dictionary from txt file

I have a function CalcPearson that needs 2 dictionaries as input. The dictionaries are in txt files in the following format:
(22, 0.4271125909116274)
(14, 0.4212051728881959)
(3, 0.4144765342960289)
(26, 0.41114433561925906)
(39, 0.41043882384484764)
.....
How can I import the data from the files as dictionaries? Do I need to modify them or there is a simple function for this?
I tried with this code:
inf = open('Route_cc.txt','r')
inf2 = open('Route_h2.txt','r')
d1 = eval(inf.read())
d2 = eval(inf2.read())
print(calcPearson(d1,d2))
inf.close()
But I got an invalid syntax error at the second row of the first file that the code opened, so I think I need a particular syntax in the file.
If you're certain that you are looking for a dictionary, you can use something like this:
inf = open('Route_cc.txt', 'r')
content = inf.read().splitlines()
for line in range(len(content)):
    content[line] = content[line].strip('(').strip(')')
    content[line] = content[line].split(', ')
inf_dict = dict(content)
Or more condensed:
inf = open('Route_cc.txt', 'r')
content = inf.read().splitlines()
inf_dict = dict(i.strip('(').strip(')').split(', ') for i in content)
Another option:
import re
inf = open('Route_cc.txt', 'r')
content = inf.read()
inf_dict = dict(i.split(', ') for i in re.findall("[^()\n-]+", content))
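With any of these splitting approaches, note that the keys and values come out as strings ('22', '0.4271...'); if calcPearson expects numbers, a small conversion sketch (assuming content holds the splitlines() list from the first two snippets) could be:
inf_dict = {int(k): float(v) for k, v in (line.strip('()').split(', ') for line in content)}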
Note: Your original use of eval is unsafe and a poor practice.
Since you've mentioned that your dictionaries are in txt files, you'll have to tokenize your input by splitting into key/value pairs.
Read the file, line by line.
Remove the leading and trailing parentheses.
Split the stripped line using a comma as a delimiter.
Add each line to your dictionary.
I've written this code, and tested it for the sample input you have given. Have a look.
import collections
def addToDictionary(dict, key, value):
    if key in dict:
        print("Key already exists")
    else:
        dict[key] = value

def displayDictionary(dict):
    dict = collections.OrderedDict(sorted(dict.items()))
    for k, v in dict.items():
        print(k, v)

filename = "dictionary1.txt"
f = open(filename, 'r')
dict1 = {}
for line in f:
    line = line.lstrip('(')
    line = line.rstrip(')\n')
    tokenizedLine = line.split(', ')
    addToDictionary(dict1, tokenizedLine[0], tokenizedLine[1])
displayDictionary(dict1)
Don't use eval; it is dangerous (see the dangers of eval). Instead, use ast.literal_eval.
You can't create a dictionary directly from input in the form you have given. You have to go through the lines one by one, convert each one into a tuple, and add it to a dictionary.
This process is shown below.
Code:
import ast
inf = open('Route_cc.txt','r')
d1 = {}
for line in inf:
    zipi = ast.literal_eval(line)
    d1[zipi[0]] = zipi[1]
inf2 = open('Route_h2.txt','r')
d2 = {}
for line1 in inf2:
    zipi1 = ast.literal_eval(line1)
    d2[zipi1[0]] = zipi1[1]
print(calcPearson(d1, d2))
inf.close()
inf2.close()

Writing a dict to txt file and reading it back?

I am trying to write a dictionary to a txt file, then read the dict values back by typing the keys with raw_input. I feel like I am just missing one step, but I have been looking for a while now.
I get this error
File "name.py", line 24, in reading
print whip[name]
TypeError: string indices must be integers, not str
My code:
#!/usr/bin/env python
from sys import exit

class Person(object):
    def __init__(self):
        self.name = ""
        self.address = ""
        self.phone = ""
        self.age = ""
        self.whip = {}

    def writing(self):
        self.whip[p.name] = p.age, p.address, p.phone
        target = open('deed.txt', 'a')
        target.write(str(self.whip))
        print self.whip

    def reading(self):
        self.whip = open('deed.txt', 'r').read()
        name = raw_input("> ")
        if name in self.whip:
            print self.whip[name]

p = Person()
while True:
    print "Type:\n\t*read to read data base\n\t*write to write to data base\n\t*exit to exit"
    action = raw_input("\n> ")
    if "write" in action:
        p.name = raw_input("Name?\n> ")
        p.phone = raw_input("Phone Number?\n> ")
        p.age = raw_input("Age?\n> ")
        p.address = raw_input("Address?\n>")
        p.writing()
    elif "read" in action:
        p.reading()
    elif "exit" in action:
        exit(0)
Have you tried the json module? The JSON format is very similar to a Python dictionary, and it's human readable/writable:
>>> import json
>>> d = {"one":1, "two":2}
>>> json.dump(d, open("text.txt",'w'))
This code dumps to a text file
$ cat text.txt
{"two": 2, "one": 1}
Also you can load from a JSON file:
>>> d2 = json.load(open("text.txt"))
>>> print d2
{u'two': 2, u'one': 1}
Your code is almost right! You are right, you are just missing one step. When you read in the file, you are reading it as a string; but you want to turn the string back into a dictionary.
The error message you saw was because self.whip was a string, not a dictionary. So you need to convert the string to a dictionary.
Example
Here is the simplest way: feed the string into eval(). Like so:
def reading(self):
    s = open('deed.txt', 'r').read()
    self.whip = eval(s)
You can do it in one line, but I think it looks messy this way:
def reading(self):
    self.whip = eval(open('deed.txt', 'r').read())
But eval() is sometimes not recommended. The problem is that eval() will evaluate any string, and if someone tricked you into running a really tricky string, something bad might happen. In this case, you are just running eval() on your own file, so it should be okay.
But because eval() is useful, someone made an alternative to it that is safer. This is called literal_eval and you get it from a Python module called ast.
import ast
def reading(self):
    s = open('deed.txt', 'r').read()
    self.whip = ast.literal_eval(s)
ast.literal_eval() will only evaluate strings that turn into the basic Python types, so there is no way that a tricky string can do something bad on your computer.
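As a quick illustration of the difference (the exact error message varies between Python versions): literal_eval happily parses plain data, but refuses anything that would call code:
>>> import ast
>>> ast.literal_eval("{'age': 30}")
{'age': 30}
>>> ast.literal_eval("__import__('os').remove('deed.txt')")
ValueError: malformed node or string: ...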
EDIT
Actually, best practice in Python is to use a with statement to make sure the file gets properly closed. Rewriting the above to use a with statement:
import ast
def reading(self):
    with open('deed.txt', 'r') as f:
        s = f.read()
    self.whip = ast.literal_eval(s)
In the most popular Python, known as "CPython", you usually don't need the with statement as the built-in "garbage collection" features will figure out that you are done with the file and will close it for you. But other Python implementations, like "Jython" (Python for the Java VM) or "PyPy" (a really cool experimental system with just-in-time code optimization) might not figure out to close the file for you. It's good to get in the habit of using with, and I think it makes the code pretty easy to understand.
To store Python objects in files, use the pickle module:
import pickle
a = {
    'a': 1,
    'b': 2
}

with open('file.txt', 'wb') as handle:
    pickle.dump(a, handle)

with open('file.txt', 'rb') as handle:
    b = pickle.loads(handle.read())

print a == b # True
Notice that I never set b = a, but instead pickled a to a file and then unpickled it into b.
As for your error:
self.whip = open('deed.txt', 'r').read()
self.whip started out as a dictionary object, but deed.txt contains text, so when you load the contents of deed.txt into self.whip, it becomes a string (the text representation of the dictionary) rather than a dictionary.
You'd probably want to evaluate the string back into a Python object:
self.whip = eval(open('deed.txt', 'r').read())
Notice how eval sounds like evil. That's intentional. Use the pickle module instead.
Hi, there is a way to write the dictionary to a file and read it back: you can turn your dictionary into JSON format and read and write it quickly. Just do this:
To write your data:
import json
your_dictionary = {"some_date" : "date"}
f = open('destFile.txt', 'w+')
f.write(json.dumps(your_dictionary))
and to read your data:
import json
f = open('destFile.txt', 'r')
your_dictionary = json.loads(f.read())
I created my own functions which work really nicely:
def writeDict(dict, filename, sep):
    with open(filename, "a") as f:
        for i in dict.keys():
            f.write(i + " " + sep.join([str(x) for x in dict[i]]) + "\n")
It will store the key name first, followed by all the values. Note that in this case my dict contains integers, which is why readDict converts each value back with int(); this is most likely the part you need to change for your situation.
def readDict(filename, sep):
    with open(filename, "r") as f:
        dict = {}
        for line in f:
            values = line.split(sep)
            dict[values[0]] = {int(x) for x in values[1:len(values)]}
        return(dict)
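A quick usage sketch, under the assumption that sep is a single space (the same separator writeDict already puts between the key and the values) and that all values are numeric; note that readDict hands the values back as a set of ints:
scores = {"alice": [1, 2, 3], "bob": [4, 5]}
writeDict(scores, "scores.txt", " ")   # appends one line per key, e.g. "alice 1 2 3"
print(readDict("scores.txt", " "))     # roughly {'alice': {1, 2, 3}, 'bob': {4, 5}}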
You can iterate through the key-value pairs and write them to the file:
pair = {'name': name,'location': location}
with open('F:\\twitter.json', 'a') as f:
    f.writelines('{}:{}'.format(k,v) for k, v in pair.items())
    f.write('\n')
