Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
I have the regular expression \n([\d]) that matches the following text:
Then I want to replace the matched text with the first group, $1, in Visual Studio Code. This is the result:
I want to do the same thing in Python, and I have already written this code:
import re
file = "out FCE.txt"
pattern = re.compile(".+")
for i, line in enumerate(open(file)):
    for match in re.finditer(pattern, line):
        print(re.sub(r"\n([\d])", r"\1", match.group()))
But that code does nothing: the result is still the same as in the first picture. The newlines before the lines that start with a digit are not removed. I already read this answer, which says that Python uses \1 rather than $1. And yes, I want to keep the whitespace in between (\t\t\t) so that the output stays neat.
Sorry if my explanation is confusing; my English is not very good.
The problem here is that you are reading the file line by line. In each loop of for i, line in enumerate(open(file)):, re.sub accesses only one line, and therefore it cannot see whether the next line starts with a digit.
Try instead:
import re
file = "out FCE.txt"
with open(file, 'r') as f:
    text = f.read()
new_text = re.sub(r"\n([\d])", r"\1", text)
print(new_text)
In this code the file is read as a whole (into the variable text) so that re.sub now sees whether the subsequent line starts with a digit.
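To make the difference concrete, here is a minimal sketch that runs the same substitution both ways on an in-memory string. The sample text and the two function names are made up for illustration; they stand in for the contents of the file.

```python
import re

# Hypothetical sample standing in for the contents of the file.
sample = "alpha\n1 beta\ngamma\n2 delta\n"


def join_per_line(text):
    # Each chunk ends with its own "\n" and nothing follows it,
    # so "\n" followed by a digit can never match inside one chunk.
    chunks = text.splitlines(keepends=True)
    return "".join(re.sub(r"\n([\d])", r"\1", chunk) for chunk in chunks)


def join_whole_text(text):
    # The whole text is visible at once, so a newline followed by
    # a digit is found and replaced by the digit alone.
    return re.sub(r"\n([\d])", r"\1", text)


print(repr(join_per_line(sample)))
print(repr(join_whole_text(sample)))
```

The per-line version returns the text unchanged, while the whole-text version glues each line that starts with a digit onto the line before it.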
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 1 year ago.
Write a Python program that will search for lines that start with 'F', followed by 2 characters, followed by 'm:' using the mbox-short.txt text file.
Write a Python program that will search for lines that start with From and have an # sign
My code:
import re
file_hand = open("mbox-short.txt")
for line in file_hand:
    line = line.rstrip()
    if re.search('From:', line):
        print(line)
Your code seems to lack the regular expression that would find the lines you are looking for. If I understand correctly, your aim is to find lines starting with F, followed by ANY two characters, and to print each such line to the terminal. Let me guide you:
import re
file_hand = open("mbox-short.txt")
for line in file_hand:  # NB: the body of the loop must be indented
    result = re.search("^f..", line)  # arguments: pattern, string to search
    # ^ anchors the match at the start of the line
    # . matches exactly one occurrence of any character (except a newline)
    if result:  # re.search returns None when there is no match
        print(line)
I trust you can follow this. If you need capital F, I will leave it as a homework exercise for you to find out how to do the capital F.
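For reference, one possible pattern for the first exercise as stated in the question ('F', two characters, then 'm:') is sketched below. The sample lines and addresses are invented so that the snippet runs without mbox-short.txt; the name matching_lines is likewise only illustrative.

```python
import re

# Made-up lines in the style of an mbox file.
sample_lines = [
    "From: alice@example.com",
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "Subject: hello",
    "X-From: bob@example.com",
]

# ^ anchors at the start of the line, F is a literal capital F,
# each . stands for exactly one character, and m: is literal.
pattern = re.compile(r"^F..m:")


def matching_lines(lines):
    return [line for line in lines if pattern.search(line)]


for line in matching_lines(sample_lines):
    print(line)
```

Only the first sample line is printed: the second has a space where the colon is expected, and the last one does not start with F.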
You can practice with regexp here:
https://regexr.com/
Or read more about it here:
https://www.youtube.com/watch?v=rhzKDrUiJVk
I think you didn't ask your question clearly enough for everybody to understand. Also, format your code as a code block ('Code Sample') for better readability. I already did that with your code, so you can have a look at that.
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
Python3 is telling me I have an error in my indentation. I've tried about a million different indentations and I am lost. It's not pointing to the error directly, and just pointing to a parenthesis, leaving me to figure it out on my own.
import os
for root, dirs, files in os.walk('C:\\Users\\Tom\\Desktop'):
    for file in files:
        if file.endswith('.txt'):
            f = open("test.txt", "r+")
            f.seek(0)
            for line in f:
                a = f.read()
            f.seek(0)
            for char in a:
                o = ord(char)
                f.write(str(o))
            f.truncate()
Apologies, I forgot to include the error message.
  File "C:\Users\Tom\Desktop\Search.py", line 6
    f = open("test.txt", "r+")
                             ^
TabError: inconsistent use of tabs and spaces in indentation
I loaded the text from your question into a text editor (vim) and showed invisible characters, which renders this.
Here, spaces show as space, and tab shows as ^I. As you can see, your second for and first if lines are indented with spaces, and the rest of the file is indented with tabs.
In the general sense, this creates a real mess in Python, where indentation is syntactically significant to the program structure.
In Python 3 specifically, mixing tabs and spaces as indentation is a fatal compile error. That is what you've encountered (TabError).
See PEP 8, which suggests using spaces only, never tabs, and using a 4-space indent.
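If no editor that shows invisible characters is at hand, the same check can be done from Python. The following is a rough sketch, not a full linter: the helper names and the two sample sources are made up, and the first helper simply reports every line whose leading whitespace contains a tab.

```python
def tab_indented_lines(source):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    numbers = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            numbers.append(number)
    return numbers


def raises_tab_error(source):
    """Compile the source without running it and report whether TabError occurs."""
    try:
        compile(source, "<sample>", "exec")
    except TabError:
        return True
    return False


# Line 2 is indented with spaces, line 3 with tabs (hypothetical sample).
mixed = "for x in [1]:\n    if x:\n\t\ty = x\n"
spaces_only = "for x in [1]:\n    if x:\n        y = x\n"

print(tab_indented_lines(mixed), raises_tab_error(mixed))
print(tab_indented_lines(spaces_only), raises_tab_error(spaces_only))
```

Once the offending lines are known, re-indenting them with four spaces per level, as PEP 8 recommends, removes the error.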
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
I have a text file that looks like this:
This is one sentence()This is another sentence()This is a full sentence at all)Maybe this too)This is the last sentence()
I need to split the parts that the text looks like this:
This is one sentence()
This is another sentence()
This is a full sentence at all)
Maybe this too)
This is the last sentence()
I tried it with regex and the help of https://regex101.com/r/sH8aR8/5#python but I can't find any solution. Any ideas?
You don't need a regex; just use str.replace to replace every closing paren with a closing paren followed by a newline:
s="This is one sentence()This is another sentence()This is a full sentence at all)Maybe this too)This is the last sentence()"
print(s.replace(")",")\n"))
Output:
This is one sentence()
This is another sentence()
This is a full sentence at all)
Maybe this too)
This is the last sentence()
You can search using this lookbehind regex:
r'(?<=\))'
and replace by "\n"
RegEx Demo
Code:
import re
text = "This is one sentence()This is another sentence()This is a full sentence at all)Maybe this too)This is the last sentence()"
result = re.sub(r'(?<=\))', "\n", text)
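As a quick cross-check, the sketch below runs both suggestions from this thread on the sample string from the question and compares them. The variable names are only illustrative.

```python
import re

s = ("This is one sentence()This is another sentence()"
     "This is a full sentence at all)Maybe this too)This is the last sentence()")

# Plain string replacement: every ")" becomes ")" plus a newline.
with_replace = s.replace(")", ")\n")

# Lookbehind regex: insert a newline at every position preceded by ")".
with_regex = re.sub(r"(?<=\))", "\n", s)

# splitlines() turns the result into a list without the trailing empty piece.
parts = with_replace.splitlines()

print(with_replace == with_regex)
for part in parts:
    print(part)
```

Both approaches give the same text here, so the plain str.replace version is the simpler choice unless the split condition becomes more complicated.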
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
How do I read an entire text file as a single chunk of data or string?
I do not want to read the file line by line; instead I want to read the entire file as text and count the occurrences of certain words. How can I do that?
You can use the file object's read() method, "which reads some quantity of data and returns it as a string".
Docs are here.
As for the second question, you might want to use a regex with word boundary anchors:
import re
with open("myfile.txt") as infile:
    text = infile.read()
regex = re.compile(r"\bsearchword\b", re.I)  # case-insensitive
count = len(regex.findall(text))
Use with and open.read together:
with open("/path/to/file") as file:
    text = file.read()
with is a context manager that will auto-close the file for you when done.
You can read it line by line, count the words you are interested in on each line, add the results to the subtotal, and print the total when you are done. Handy if the file you are processing is big enough to cause swapping.
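A minimal sketch of that line-by-line approach follows. The function name count_word and the sample text are assumptions made for illustration; io.StringIO stands in for a real file so that the snippet is self-contained, and an open file object can be passed in the same way.

```python
import io
import re


def count_word(lines, word):
    """Count whole-word, case-insensitive occurrences of word across lines."""
    regex = re.compile(r"\b" + re.escape(word) + r"\b", re.I)
    total = 0
    for line in lines:  # only one line is held in memory at a time
        total += len(regex.findall(line))
    return total


# Stand-in for open("myfile.txt"); iterating over it yields lines.
sample_file = io.StringIO("The cat sat.\nA Cat and a caterpillar.\nNo match here.\n")
total = count_word(sample_file, "cat")
print(total)
```

The word boundaries keep "caterpillar" from being counted, so the sample yields two matches.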
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have a string like
x = '''
Anrede:*
Herr
*Name:*
Tobias
*Firma:*
*Strasse/Nr:*
feringerweg
*PLZ/Ort:*
72531
*Mail:*
tovoe#gmeex.de [1]
'''
The string contains a zip code under PLZ/Ort:. I want to extract that zip code from the whole string. A regex seems like the way to do it, but I don't know regex.
Assuming the input in your example is a file with multiple lines, you can try something like this:
import re
match_pattern = re.compile(r"^(\d{5})$")
for line in open(filename, 'r'):
    match = match_pattern.match(line)
    if match:  # skip lines that are not exactly five digits
        print(match.group(0))  # the whole match
If this is just one long string, you can use the same pattern without the ^ (line begin) and $ (line end) indicators --> (\d{5}), together with re.search instead of re.match, since re.match only matches at the start of the string.
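For the single-string case, a small sketch might look like this. The string below is a shortened copy of the example from the question, and find_zip is a hypothetical helper name.

```python
import re

# Shortened version of the string from the question.
x = '''
*Strasse/Nr:*
feringerweg
*PLZ/Ort:*
72531
*Mail:*
'''


def find_zip(text):
    # re.search scans the whole string for the first run of five digits.
    match = re.search(r"(\d{5})", text)
    return match.group(1) if match else None


print(find_zip(x))
# re.match only tries position 0, which is a newline here, so it finds nothing.
print(re.match(r"(\d{5})", x))
```

Note that this picks up the first five-digit run anywhere in the text, so it is only reliable when no other five-digit number appears before the zip code.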
I'm assuming that the Postleitzahl always comes right after the *PLZ/Ort:* line, possibly separated from it by a blank line, and that it's the only text on its line. If that's the case, then you can use something like:
import re
m = re.search(r'^\*PLZ/Ort:\*\n\s*(\d{5})', x, re.M)
if m:
    print(m.group(1))
You can try this regex:
(?<=PLZ\/Ort)[\s\S]+?([a-zA-Z0-9\- ]{3,9})
It will support alphanumeric postal codes as well. You can see postal code lengths and formats here.
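To see what this regex captures, here is a short sketch that applies it with Python's re module. The two input strings are illustrative: the first is a shortened copy of the example from the question, the second uses a made-up alphanumeric code, and extract_code is a hypothetical helper name.

```python
import re

# The regex from this answer, written as a raw string for Python.
pattern = re.compile(r"(?<=PLZ\/Ort)[\s\S]+?([a-zA-Z0-9\- ]{3,9})")


def extract_code(text):
    match = pattern.search(text)
    return match.group(1) if match else None


numeric = "*Strasse/Nr:*\nferingerweg\n*PLZ/Ort:*\n72531\n*Mail:*\n"
alphanumeric = "*PLZ/Ort:*\nSW1A 1AA\n*Mail:*\n"

print(extract_code(numeric))
print(extract_code(alphanumeric))
```

The lazy [\s\S]+? skips the ":*" and the newline after the label, and the capture group then takes the first run of 3 to 9 allowed characters, which is the postal code in both samples.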