Python re.sub specific syntax - python

I am writing a script in Python that replaces specific lines in files on Linux. Say I have a file called hi in the /home directory that contains:
hi 873840
Here is my script:
#! /usr/bin/env python
import re
fp = open("/home/hi","w")
re.sub(r"hi+", "hi 90", fp)
My desired outcome is:
hi 90
However, when I run it I get this error and the hi file ends up being blank:
Traceback (most recent call last):
File "./script.py", line 6, in <module>
re.sub(r"hi+", "hi 90", fp)
File "/usr/lib/python2.7/re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
Is there something wrong with my syntax?
Thanks

Use "r" mode to read the file; "w" mode creates an empty file for writing, which is why your file ended up blank. .readline() returns the first line as a string, which you can then pass to re.sub(). The pattern r" .*" matches everything from the first space to the end of the line, which is the part you want to replace. I assume 'hi 873840' is the only text in your file and that your desired output is just 'hi 90'.
echo "hi 873840" > hi.txt
python3.6
import re
fp = open("hi.txt", "r")
print(re.sub(r" .*", " 90", fp.readline()))

You should open the file in read mode. re.sub expects three arguments, pattern, repl, string. The problem is the third argument you are passing is a file pointer.
Eg:
import re
with open('/home/hi', 'r', encoding='utf-8') as infile:
    for line in infile:
        # "hi.*" matches the rest of the line, so the old number is replaced too
        print(re.sub(r"hi.*", "hi 90", line.strip()))
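The answers above only print the result. For the file itself to end up containing hi 90, the new text has to be written back. A minimal sketch, assuming the file is small enough to read in one go; the helper name replace_in_file and the temporary path are illustrative and not from the question:

```python
import os
import re
import tempfile


def replace_in_file(path, pattern, repl):
    """Read the file, apply re.sub to its text, and write the result back."""
    # Read first: opening with "w" before reading would truncate the file.
    with open(path, "r", encoding="utf-8") as infile:
        text = infile.read()
    # re.sub needs a string, not the file object.
    new_text = re.sub(pattern, repl, text, flags=re.MULTILINE)
    with open(path, "w", encoding="utf-8") as outfile:
        outfile.write(new_text)
    return new_text


# Demo on a throwaway file standing in for /home/hi (hypothetical path).
demo_path = os.path.join(tempfile.mkdtemp(), "hi")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("hi 873840\n")

result = replace_in_file(demo_path, r"^hi .*$", "hi 90")
print(result)
```

Reading and writing in two separate with blocks keeps the "w" truncation from happening before the old contents have been read.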

Related

I get an error whenever I try to read a file in Python, how can I fix this?

My code:
String = open(r"C:\Users\chloe\OneDrive\Documents\Python\Python code\Python text files\Story\VerbJust.txt", "r").read()
print(String)
I have the file stored in the exact folder, but I got an error:
Traceback (most recent call last):
File "C:\Users\chloe\OneDrive\Documents\Python\Python code\StoryClasses.py", line 47, in <module>
VerbTo = ReadFile("VerbTo")
File "C:\Users\chloe\OneDrive\Documents\Python\Python code\StoryClasses.py", line 41, in ReadFile
string = open(w[variable][0], "r").read()
FileNotFoundError: [Errno 2] No such file or directory: 'C'
Why is this? Can Python not access OneDrive?
In this line:
string = open(w[variable][0], "r").read()
it appears that w[variable] contains the filename. Adding [0] to that uses just the first character of the filename. Get rid of that.
string = open(w[variable], "r").read()
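The 'C' in the error message is the giveaway. A small sketch of what indexing a string with [0] does, assuming w maps names to path strings; the dictionary contents below are made up for illustration:

```python
# Hypothetical stand-in for the asker's w; the real contents are not shown.
w = {"VerbTo": r"C:\Users\example\VerbTo.txt"}
variable = "VerbTo"

full_path = w[variable]       # the whole path string
first_char = w[variable][0]   # indexing a string yields a single character

print(repr(full_path))
print(repr(first_char))
```

open() then looks for a file literally named C in the current directory, which is why it reports that no such file exists.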
This error occurs because the quotation marks are formatted incorrectly.
Also, I suspect the variable name you chose, "String", may cause some issues.
Try:
string = open(r"filepath", "r").read()
print(string)

Python Code to Remove Spaces from Chinese Characters in multiple UTF8 text files

I am trying to write Python code (Python 3.7.2) to remove the spaces between Chinese characters in multiple UTF-8 text files in the same directory.
The code I have currently handles only one file:
import re
with open("transcript 0623.txt") as text:
    new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
with open("transcript 0623_out.txt", "w") as result:
    result.write(new_text)
I get the following error:
Traceback (most recent call last):
File "C:\Users\Admin\Desktop\Wave.3\test.py", line 4, in <module>
new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python37-32\Lib\re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
Can you advise me on what is wrong and suggest improvements to the code? Thank you.
open() returns a file object (Source: https://docs.python.org/3/library/functions.html#open)
If you want to perform regex operations on the file's contents, you have to call the .read() method on the file object to get the text.
For example,
with open("transcript 0623.txt", encoding="utf-8") as f:
    text = f.read()
new_text = re.sub("(?<![ -~]) (?![ -~])", "", text)
# The files are UTF-8, so pass the encoding explicitly when reading and writing
with open("transcript 0623_out.txt", "w", encoding="utf-8") as result:
    result.write(new_text)
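The question also asks about handling every file in a directory. One way to extend the fix, sketched under the assumption that all inputs are .txt files encoded as UTF-8 and that outputs get an _out suffix as in the question; the function names and the sample file are hypothetical:

```python
import re
import tempfile
from pathlib import Path

# Same pattern as in the question: a space whose neighbours on both sides
# are outside the printable ASCII range.
SPACE_BETWEEN_NON_ASCII = re.compile(r"(?<![ -~]) (?![ -~])")


def remove_spaces(text):
    """Remove spaces that sit between non-ASCII (e.g. Chinese) characters."""
    return SPACE_BETWEEN_NON_ASCII.sub("", text)


def process_directory(directory):
    """Write a cleaned *_out.txt copy next to every .txt file in directory."""
    written = []
    for path in sorted(Path(directory).glob("*.txt")):
        if path.stem.endswith("_out"):
            continue  # skip files produced by an earlier run
        text = path.read_text(encoding="utf-8")
        out_path = path.with_name(path.stem + "_out.txt")
        out_path.write_text(remove_spaces(text), encoding="utf-8")
        written.append(out_path)
    return written


# Demo in a temporary directory with one made-up transcript.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "sample.txt").write_text("你 好 hello world 世 界", encoding="utf-8")
outputs = process_directory(demo_dir)
cleaned = outputs[0].read_text(encoding="utf-8")
print(cleaned)
```

Spaces next to ASCII words are left alone, so mixed Chinese and English text keeps its word boundaries.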

Creating text files, appending them to zip, then delete them

I am trying to get the code below to read the file raw.txt, split it by lines and save every individual line as a .txt file. I then want to append every text file to splits.zip, and delete them after appending so that the only thing remaining when the process is done is the splits.zip, which can then be moved elsewhere to be unzipped. With the current code, I get the following error:
Traceback (most recent call last):
File "/Users/Simon/PycharmProjects/text-tools/file-splitter-txt.py", line 13, in <module>
at stonehenge summoning the all father.
z.write(new_file)
File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 1123, in write
st = os.stat(filename)
TypeError: coercing to Unicode: need string or buffer, file found
My code:
import zipfile
import os
z = zipfile.ZipFile("splits.zip", "w")
count = 0
with open('raw.txt','r') as infile:
    for line in infile:
        print line
        count +=1
        with open(str(count) + '.txt','w') as new_file:
            new_file.write(str(line))
        z.write(new_file)
        os.remove(new_file)
You could simply use writestr to write a string directly into the zip file, with no intermediate text files. For example:
zf.writestr(str(count) + '.txt', str(line), compress_type=...)
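A fuller sketch of the writestr approach, written for Python 3 (the question uses Python 2.7) and assuming one archive member per line; the helper name split_lines_into_zip and the sample text are illustrative:

```python
import os
import tempfile
import zipfile


def split_lines_into_zip(raw_path, zip_path):
    """Store each line of raw_path as its own numbered .txt entry in zip_path."""
    names = []
    with open(raw_path, "r", encoding="utf-8") as infile, \
            zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for count, line in enumerate(infile, start=1):
            name = str(count) + ".txt"
            # writestr takes the archive member name and the data directly,
            # so no temporary file has to be created or deleted.
            zf.writestr(name, line)
            names.append(name)
    return names


# Demo with a made-up raw.txt in a temporary directory.
work_dir = tempfile.mkdtemp()
raw_path = os.path.join(work_dir, "raw.txt")
zip_path = os.path.join(work_dir, "splits.zip")
with open(raw_path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

names = split_lines_into_zip(raw_path, zip_path)
with zipfile.ZipFile(zip_path) as zf:
    members = zf.namelist()
    first = zf.read("1.txt").decode("utf-8")
print(members)
```

Because nothing is written to disk besides the archive, the cleanup step with os.remove is no longer needed.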
Pass the file name, as below. Both z.write() and os.remove() expect a path string, but you passed the file object (new_file) instead.
z.write(str(count) + '.txt')
os.remove(str(count) + '.txt')

TypeError - What does this error mean?

So, I've been writing this program that takes an HTML file, replaces some text and puts the result back into a different file in a different directory.
This error happened.
Traceback (most recent call last):
File "/Users/Glenn/jack/HTML_Task/src/HTML Rewriter.py", line 19, in <module>
with open (os.path.join("/Users/Glenn/jack/HTML_Task/src", out_file)):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/posixpath.py", line 89, in join
genericpath._check_arg_types('join', a, *p)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/genericpath.py", line 143, in _check_arg_types
(funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'TextIOWrapper'
Below is my code. Has anyone got any solutions I could implement, or should I kill it with fire?
import re
import os
os.mkdir ("dest")
file = open("2016-06-06_UK_BackToSchool.html").read()
text_filtered = re.sub(r'http://', '/', file)
print (text_filtered)
with open ("2016-06-06_UK_BackToSchool.html", "wt") as out_file:
    print ("testtesttest")
    with open (os.path.join("/Users/Glenn/jack/HTML_Task/src", out_file)):
        out_file.write(text_filtered)
os.rename("/Users/Glenn/jack/HTML_Task/src/2016-06-06_UK_BackToSchool.html", "/Users/Glenn/jack/HTML_Task/src/dest/2016-06-06_UK_BackToSchool.html")
with open (os.path.join("/Users/Glenn/jack/HTML_Task/src", out_file)):
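Here out_file is a TextIOWrapper (a file object), not a string.
os.path.join takes strings as arguments.
Avoid reusing built-in names for variables: file is not a keyword, but it was a built-in type in Python 2.
Do not put a space between a function name and its opening parenthesis, as in os.mkdir ("dest"); write os.mkdir("dest") instead.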
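Putting those points together, here is one possible shape for the script. It is a sketch rather than a drop-in replacement: it writes the filtered copy straight into the dest directory and leaves the original file untouched, which differs from the write-then-rename approach in the question. The function name rewrite_html and the sample page are hypothetical:

```python
import os
import re
import tempfile


def rewrite_html(src_dir, filename, dest_name="dest"):
    """Replace 'http://' with '/' and save the result under src_dir/dest_name."""
    src_path = os.path.join(src_dir, filename)   # every argument is a string
    dest_dir = os.path.join(src_dir, dest_name)
    os.makedirs(dest_dir, exist_ok=True)          # no error if it already exists

    with open(src_path, "r", encoding="utf-8") as in_file:
        html = in_file.read()
    filtered = re.sub(r"http://", "/", html)

    # Join the file *name*, not the open file object.
    dest_path = os.path.join(dest_dir, filename)
    with open(dest_path, "w", encoding="utf-8") as out_file:
        out_file.write(filtered)
    return dest_path


# Demo with a made-up page in a temporary directory.
src_dir = tempfile.mkdtemp()
with open(os.path.join(src_dir, "page.html"), "w", encoding="utf-8") as f:
    f.write('<a href="http://example.com/">link</a>')

dest_path = rewrite_html(src_dir, "page.html")
with open(dest_path, encoding="utf-8") as f:
    rewritten = f.read()
print(rewritten)
```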
Try changing this:
with open ("2016-06-06_UK_BackToSchool.html", "wt") as out_file:
to this:
with open ("2016-06-06_UK_BackToSchool.html", "w") as out_file:
or this:
with open ("2016-06-06_UK_BackToSchool.html", "wb") as out_file:

Converting python webcrawler to 3.4 from 2.7

For this code I am converting a working python webcrawler from 2.7 to 3.4. I've made some modifications but I still get errors when running it:
Traceback (most recent call last):
File "Z:\testCrawler.py", line 11, in <module>
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(myurl).read(), re.I):
File "C:\Python34\lib\re.py", line 206, in findall
return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
This is the code itself, please tell me if you see what the syntax errors are.
#! C:\python34
import re
import urllib.request
textfile = open('depth_1.txt','wt')
print ("Enter the URL you wish to crawl..")
print ('Usage - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = input("#> ")
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(myurl).read(), re.I):
    print (i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(i).read(), re.I):
        print (ee)
        textfile.write(ee+'\n')
textfile.close()
Change
urllib.request.urlopen(myurl).read()
to for example
urllib.request.urlopen(myurl).read().decode('utf-8')
In Python 3, .read() returns bytes rather than str as it did in Python 2.7, so the result has to be decoded with a suitable encoding before a str pattern can be applied to it.
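The difference can be reproduced without a network connection. A minimal sketch, assuming the page is encoded as UTF-8 (a real page may declare a different encoding); the bytes literal below is made up to stand in for the response body:

```python
import re

HREF_PATTERN = r'''href=["'](.[^"']+)["']'''

# Made-up bytes standing in for what urlopen(...).read() returns in Python 3.
raw = b'<a href="http://example.com/a">A</a> <a HREF=\'/b\'>B</a>'

try:
    re.findall(HREF_PATTERN, raw, re.I)
    mixed_ok = True
except TypeError:
    mixed_ok = False  # a str pattern cannot be applied to a bytes object

# Decoding first gives re.findall a str to work on.
links = re.findall(HREF_PATTERN, raw.decode("utf-8"), re.I)
print(mixed_ok, links)
```

An alternative is to keep the data as bytes and use a bytes pattern, but then the matches are bytes as well and would still need decoding before being written to a text file.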
