Read whole file as text and not line wise in Python? [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
How can I read an entire text file as one chunk of data or a single string?
I do not want to read the file line by line; instead, I want to read the entire file as text and count the occurrences of certain words. How can I do that?

You can use the file read() function "which reads some quantity of data and returns it as a string".
Docs are here.

As for the second question, you might want to use a regex with word boundary anchors:
import re
with open("myfile.txt") as infile:
    text = infile.read()
regex = re.compile(r"\bsearchword\b", re.I) # case-insensitive
count = len(regex.findall(text))

Use the with statement together with open() and read():
with open("/path/to/file") as file:
    text = file.read()
The with statement uses the file object as a context manager, so the file is closed automatically when the block exits.

You can read it line by line, count the words you are interested in on each line, add the results to the subtotal, and print the total when you are done. Handy if the file you are processing is big enough to cause swapping.
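The line-by-line approach described above could look roughly like the sketch below. The function name `count_word_in_lines` is made up for illustration, and the whole-word, case-insensitive matching is an assumption borrowed from the regex answer above.

```python
import re


def count_word_in_lines(lines, word):
    """Count whole-word, case-insensitive occurrences of `word`.

    `lines` can be any iterable of strings, including an open file,
    so only one line needs to be in memory at a time.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    total = 0
    for line in lines:
        total += len(pattern.findall(line))  # add this line's count to the subtotal
    return total


def count_word_in_file(path, word):
    """Open `path` and count `word` without reading the whole file at once."""
    with open(path) as infile:
        return count_word_in_lines(infile, word)
```

For example, `count_word_in_file("myfile.txt", "searchword")` would return the total for the file used in the earlier answer. Note that a match spanning two lines cannot be found this way, which does not matter for single words.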


is it correct? i hope someone can help me hehe [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 1 year ago.
Write a Python program that will search for lines that start with 'F', followed by 2 characters, followed by 'm:' using the mbox-short.txt text file.
Write a Python program that will search for lines that start with From and have an # sign
My code:
import re
file_hand = open("mbox-short.txt")
for line in file_hand:
    line = line.rstrip()
    if re.search('From:', line):
        print(line)
Your code seems to lack the actual regular expression that will find the result you are looking for. If I understand correctly, you want to find lines starting with F, followed by ANY two characters, and print each such line to the terminal. Let me guide you:
import re
file_hand = open("mbox-short.txt")
for line in file_hand:  # NB: after a new scope is entered, use indentation
    result = re.search("^f..", line)  # arguments: pattern, string to search
    # ^ anchors the match at the start of the line
    # . matches one occurrence of any character (except a newline)
    if result is not None:  # re.search returns None when nothing matches
        print(line)
I trust you can follow this. If you need capital F, I will leave it as a homework exercise for you to find out how to do the capital F.
You can practice with regexp here:
https://regexr.com/
Or read more about it here:
https://www.youtube.com/watch?v=rhzKDrUiJVk
I think you didn't ask your question clearly enough for everybody to understand. Also, format your code as a code sample for better readability. I already did that with your code, so you can have a look at that.
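For the first exercise in the question (lines that start with 'F', followed by two characters, followed by 'm:'), a minimal sketch might look like this. The helper name `matching_lines` and the sample lines are hypothetical; the real input would be mbox-short.txt.

```python
import re

# '^' anchors at the start of the line, each '.' matches any one character,
# and 'm:' must follow literally.
START_F_M = re.compile(r"^F..m:")


def matching_lines(lines, pattern):
    """Return the lines (trailing whitespace stripped) that match `pattern`."""
    return [line.rstrip() for line in lines if pattern.search(line)]


# Hypothetical sample lines standing in for the contents of mbox-short.txt
sample = [
    "From: someone@example.com\n",
    "From someone@example.com Sat Jan  5\n",
    "X-From: ignored\n",
]
for line in matching_lines(sample, START_F_M):
    print(line)
```

With a real file, an open file object can be passed instead of the list, for example `with open("mbox-short.txt") as fh: matching_lines(fh, START_F_M)`.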

Printing Output Causes Split Lines Half of The Time [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 1 year ago.
I am trying to automate a task which requires using a specific URL which changes depending on the site location. The site locations are already loaded into a .txt file with no spaces at the beginning nor end of each line. The script runs down the list and changes the variable in the URL to match the line it is currently on then saves it to a file to be used later.
The issue I am having is that the script seems to split the outputted lines nearly every time which breaks my ability to read the lines in the next program.
Sample output:
https://picklepickle.com/api/11/networks/12312313154564654
/fickle/toast/3
https://picklepickle.com/api/11/networks/12312313154564655
/fickle/toast/3
https://picklepickle.com/api/11/networks/12312313154564656/fickle/toast/3
https://picklepickle.com/api/11/networks/12312313154564657/fickle/toast/3
This is a small snippet as the original file has nearly 100 lines in it.
Why does the code output the lines in such a weird way? How do I fix it so that it outputs each URL into one neat line?
raw = open("NetIDs.txt")
networks = raw.readlines()
for line in networks:
    for i in line:
        f = open("Checker.txt", "a+")
        f.write('https://picklepickle.com/api/11/networks/{}/fickle/toast/3\n'.format(line))
        f.close()
Try stripping each line: readlines() keeps the trailing \n (newline character) on every line, so the newline ends up in the middle of the formatted URL.
...
f.write('https://picklepickle.com/api/11/networks/{}/fickle/toast/3\n'.format(line.strip()))
...
The .strip() call removes leading and trailing whitespace such as \n, \t, and spaces.
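Putting the fix into a complete script could look roughly like this. It is a sketch, not the asker's final code: the function names are made up, blank lines are skipped as an extra assumption, and the per-character inner loop from the question is dropped because it is not needed to build one URL per line.

```python
URL_TEMPLATE = "https://picklepickle.com/api/11/networks/{}/fickle/toast/3"


def build_urls(lines):
    """Return one URL per non-empty line, with surrounding whitespace removed."""
    urls = []
    for line in lines:
        network_id = line.strip()  # drops the trailing '\n' kept by file iteration
        if network_id:  # assumption: blank lines should be ignored
            urls.append(URL_TEMPLATE.format(network_id))
    return urls


def write_urls(source_path, target_path):
    """Read IDs from `source_path` and append the URLs to `target_path`."""
    with open(source_path) as raw, open(target_path, "a") as out:
        for url in build_urls(raw):
            out.write(url + "\n")
```

Calling `write_urls("NetIDs.txt", "Checker.txt")` opens each file once instead of reopening the output file on every iteration.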

Regex Python: cannot replace newlines with "$1" [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
I have the regular expression \n([\d]), which matches the following text:
Then I want to replace the matched text with the first group, $1, in Visual Studio Code. This is the result:
I want to do the same thing in Python, and I have already written this code:
import re
file = "out FCE.txt"
pattern = re.compile(".+")
for i, line in enumerate(open(file)):
    for match in re.finditer(pattern, line):
        print(re.sub(r"\n([\d])", r"\1", match.group()))
But that code does nothing, which means the result is still the same as in the first picture: the newlines before lines that start with a digit are not removed. I have already read this answer, which says that Python uses \1 rather than $1. And yes, I want to keep the whitespace in between (\t\t\t) so that the output stays neat.
Sorry if my explanation is confusing or my English is bad.
The problem here is that you are reading the file line by line. In each loop of for i, line in enumerate(open(file)):, re.sub accesses only one line, and therefore it cannot see whether the next line starts with a digit.
Try instead:
import re
file = "out FCE.txt"
with open(file, 'r') as f:
    text = f.read()
new_text = re.sub(r"\n([\d])", r"\1", text)
print(new_text)
In this code the file is read as a whole (into the variable text) so that re.sub now sees whether the subsequent line starts with a digit.
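The difference between the two approaches can be seen on a small in-memory string. This is only an illustration: the sample text and the function name `join_digit_lines` are made up, and `\d` is used as a shorter equivalent of `[\d]`.

```python
import re


def join_digit_lines(text):
    """Remove every newline that is directly followed by a digit."""
    return re.sub(r"\n(\d)", r"\1", text)


sample = "alpha\n1 beta\ngamma\n"

# Whole text at once: the newline before '1' is removed.
whole = join_digit_lines(sample)
print(repr(whole))  # 'alpha1 beta\ngamma\n'

# Line by line: each piece ends with '\n' and nothing follows it,
# so the pattern never matches and the text is unchanged.
per_line = "".join(join_digit_lines(line) for line in sample.splitlines(keepends=True))
print(per_line == sample)  # True
```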

Not able to successfully parse python regex from reading the contents of a file [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I've got an issue where my regex isn't parsing the output of a file I created:
#!/usr/bin/env python3
import wget, re
url=''
filename=wget.download(url)
with open ('Output.txt', "r") as f:
    readlines=f.read()
ret=re.sub("^.*\^", "", readlines)
print(ret)
According to this site, the regex I'm using "^.*\^" is valid for my output. Sample output I'm feeding it is something like this:
1212-2010^readthispart
It uses a caret as a delimiter. I tried double and single quotes to no avail, and I'm not sure whether the issue is elsewhere in my code, but the printout does not match what I'm looking for. Ideas?
If I'm reading your question and edits right, you're looking to return 'readthispart', correct? If so, look into using a look-behind in combination with search. See https://docs.python.org/2/library/re.html. re.search(r"(?<=\^).*", myinput)
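You need to enable multiline mode. Without it, ^ matches only at the very start of the string, so only the first line of the file is affected:
re.sub(r'^.*\^', '', readlines, flags=re.MULTILINE)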
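A small sketch of what the flag changes, using an in-memory string instead of Output.txt. The second sample line and the function name `strip_prefix` are invented for the demonstration.

```python
import re


def strip_prefix(text, multiline=True):
    """Remove everything up to and including the last caret on a line."""
    flags = re.MULTILINE if multiline else 0
    return re.sub(r"^.*\^", "", text, flags=flags)


sample = "1212-2010^readthispart\n1313-2011^andthispart\n"

# With re.MULTILINE, '^' matches at the start of every line.
print(repr(strip_prefix(sample)))
# Without it, '^' matches only at the start of the string,
# so only the first line is changed.
print(repr(strip_prefix(sample, multiline=False)))
```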

Count characters in each line of a file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Any tips on how to count the number of characters in each line of a text file and then compare the counts using Python?
It would be helpful to have an idea of what the end goal of your code is. What information do you want to gain from comparing the number of characters on a line? I would have written this as a comment, but it's not yet an option for me since I just joined.
If you're completely lost and don't know where to begin, here are some general bits of code to get you started (this is using Python 3.x):
file = open("YourFileName.txt", "r")
stringList = file.readlines()
The first line will open (read, hence the "r") the file in question. The second line of code goes through each line in the file and assigns them to a variable I called stringList. stringList is now a list, in which each element is a string corresponding to one line of your text file.
So,
print(stringList)
might be expected to print
['line0', 'line1', 'line2', 'line3', etc...]
but readlines() keeps the line terminator on each line, so the list will actually look like
['line0\n', 'line1\n', 'line2\n', 'line3\n', etc...]
(the last element has no '\n' if the file does not end with a newline). In case you didn't know, '\n' is a newline character, equivalent to hitting enter on the keyboard.
From there you can create another list to hold the length of each line, and then loop through each element of stringList to store the lengths.
lengthList = []
for line in stringList:
    lengthList.append(len(line))
len(line) returns the number of characters in the string as an int. Your lengthList will then contain how many characters are on each line. Because each line usually ends with '\n', which counts as one character, you may want to use len(line.rstrip('\n')) instead, depending on what you want to do with the lengths.
I hope this is helpful; I can't help with the comparisons until you provide some code and explain more specifically what you want to accomplish.
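Combined into one piece, the steps above might look like the following sketch. The function names are invented, and dropping the trailing newline before counting is an assumption about what "characters in a line" should mean here.

```python
def line_lengths(lines):
    """Return the length of each line, not counting the trailing newline."""
    return [len(line.rstrip("\n")) for line in lines]


def line_lengths_from_file(path):
    """Open `path` and return the per-line character counts."""
    with open(path) as infile:  # the file is closed automatically
        return line_lengths(infile)


# One possible comparison: find the longest line (0-based index).
lengths = line_lengths(["line0\n", "a longer line\n", "x"])
longest_index = max(range(len(lengths)), key=lengths.__getitem__)
print(lengths, longest_index)
```

With a real file, `line_lengths_from_file("YourFileName.txt")` returns the same kind of list.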
