Converting a multi-line text file to a python string - python

I have a text file, 100 lines long, containing sequences of the four characters a, b, c and d, which I want to convert to a single text string.
Some lines in the file contain an asterisk, and I want to skip those lines entirely.
Here is an example of what the file can look like. Note that the third row contains an asterisk, so I want to skip the entire row:
abcddbabcbbbdccbbdbaaabcbdbab
bacbdbbccdcbdabaabbbdcbababdb
bccddb*bacddcccbabababbdbdbcb
Below is how I'm trying to do this.
s = ''
with open("letters.txt", "r") as letr:
    for line in letr:
        if '*' not in line:
            s.join(line)

You can read the lines with the readlines() function.
This is an example; adapt it to your needs.
s = ''
with open("letters.txt", "r") as letr:
    result = letr.readlines()
    print(result)
    for line in result:
        if '*' not in line:
            s += line
            print(line)
print(s)
After looking at the other answers, I realized I was mistaken: readlines() is not required. Your original code works once s.join(line) is replaced with s += line.

s = ''
with open("letters.txt", "r") as letr:
    for line in letr:
        if '*' not in line:
            s += line
The built-in str.join method returns a new string that is the concatenation of the strings in the iterable; it does not modify s, and your code discards the result. Use s += line to concatenate the strings one by one.
Iterating over the text file is not the problem.
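As a sketch of an alternative, str.join can still do the job if it is called once on an empty separator with all the kept lines as its argument. The helper name join_clean_lines is made up for illustration, and stripping the line ending from each line is an assumption about the result you want (one continuous string).

```python
import io

def join_clean_lines(lines):
    """Concatenate the lines that contain no '*', dropping line endings."""
    return ''.join(line.rstrip('\n') for line in lines if '*' not in line)

# Demo with an in-memory file; with a real file, pass the object
# returned by open("letters.txt") instead.
sample = io.StringIO("abcd\nba*c\ndcba\n")
print(join_clean_lines(sample))  # abcddcba
```

Building the string with a single join avoids repeated concatenation, which may matter for much larger files than the 100 lines in the question.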

Related

I'm trying to solve this Python exercise but I have no idea of how to do it: get first character of a line from a file + length of the line

I am learning Python on an app called SoloLearn and got this exercise to solve. I cannot see the solution or the comments. I don't need to solve it to continue, but I'd like to know how to do it.
Book Titles: You have been asked to make a special book categorization program, which assigns each book a special code based on its title.
The code is equal to the first letter of the book, followed by the number of characters in the title.
For example, for the book "Harry Potter", the code would be: H12, as it contains 12 characters (including the space).
You are provided a books.txt file, which includes the book titles, each one written on a separate line.
Read the title one by one and output the code for each book on a separate line.
For example, if the books.txt file contains:
Some book
Another book
Your program should output:
S9
A12
Recall the readlines() method, which returns a list containing the lines of the file.
Also, remember that all lines, except the last one, contain a \n at the end, which should not be included in the character count.
I tried:
file = open("books.txt","r")
for line in file:
    for i in range(len(file.readlines())):
        title = line[0]+str(len(line)-1)
        print(titulo)
    title = line[0]+str(len(line)-1)
    print(title)
file.close
I also tried with range() and readlines(), but I don't know how to solve it.
This uses readlines():
with open('books.txt') as f:  # Open file
    for line in f.readlines():  # Iterate through lines
        if line[-1] == '\n':  # Check if there is '\n' at end of line
            line = line[:-1]  # If there is, ignore it
        print(line[0], len(line), sep='')  # Output first character and length
But I think splitlines() is easier, as it doesn't have the trailing '\n':
with open('books.txt') as f:  # Open file
    for line in f.read().splitlines():  # Iterate through lines
        # No need to check for trailing '\n'
        print(line[0], len(line), sep='')  # Output first character and length
You can use "with" to handle opening and closing the file.
Use rstrip() to get rid of the '\n'.
with open('books.txt') as f:
    lines = f.readlines()
for line in lines:
    print(line[0] + str(len(line.rstrip())))
This is the same:
file = open('books.txt')
lines = file.readlines()
for line in lines:
    print(line[0] + str(len(line.rstrip())))
file.close()
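The same idea can be wrapped in a small generator so it is easy to check without a real file. This is only a sketch: book_codes is a hypothetical name, skipping blank lines is an assumption (the exercise does not mention them), and rstrip('\n') is used so that only the line ending is dropped from the count.

```python
import io

def book_codes(lines):
    """Yield first letter + title length for each non-empty title."""
    for line in lines:
        title = line.rstrip('\n')   # drop only the line ending
        if title:                   # skip blank lines (assumption)
            yield title[0] + str(len(title))

# Demo with an in-memory file; pass open('books.txt') for a real one.
sample = io.StringIO("Some book\nAnother book")
for code in book_codes(sample):
    print(code)
# S9
# A12
```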

How to open a file in python, read the comments ("#"), find a word after the comments and select the word after it?

I have a function that loops through a file that looks like this:
"#" XDI/1.0 XDAC/1.4 Athena/0.9.25
"#" Column.4: pre_edge
Content
That is to say, everything after the "#" is a comment. My function aims to read each line and, if the line starts with a specific word, select what comes after the ":".
For example, given these two lines, I would like to read through them and, if a line starts with "#" and contains the word "Column.4", store the word "pre_edge".
An example of my current approach follows:
with open(file, "r") as f:
    for line in f:
        if line.startswith ('#'):
            word = line.split(" Column.4:")[1]
        else:
            print("n")
I think my trouble is specifically this: after finding a line that starts with "#", how can I parse or search through it and save its content if it contains the desired word?
If the # comment contains the string Column.4: as stated above, you could parse it this way:
with open(filepath) as f:
    for line in f:
        if line.startswith('#'):
            # Process comment lines here
            if 'Column.4' in line:
                first, remainder = line.split('Column.4: ')
                # remainder contains everything after '# Column.4: '
                # To get the first word:
                word = remainder.split()[0]
        else:
            # Process lines that are not comments here
            pass
Note
It is also good practice to use for line in f: instead of f.readlines() (as mentioned in other answers), because that way you don't load all lines into memory but process them one by one.
You should start by reading the file into a list and then work through that instead:
file = 'test.txt' #<- call file whatever you want
with open(file, "r") as f:
    txt = f.readlines()
for line in txt:
    if line.startswith ('"#"'):
        word = line.split(" Column.4: ")
        try:
            print(word[1])
        except IndexError:
            print(word)
    else:
        print("n")
Output:
>>> ['"#" XDI/1.0 XDAC/1.4 Athena/0.9.25\n']
>>> pre_edge
I used try/except because the first line also starts with "#" and can't be split with your current logic.
Also, as a side note, the lines in the file in your question start with "#" including the quotation marks, so the startswith() argument was changed accordingly.
with open('stuff.txt', 'r+') as f:
    data = f.readlines()
for line in data:
    words = line.split()
    if words and ('#' in words[0]) and ("Column.4:" in words):
        print(words[-1])
# pre_edge
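Another way to sketch this is with str.partition, which never raises when the key is missing, so no try/except is needed. The function name header_value is hypothetical, and the sketch assumes the comment lines start with a bare # (without the quotation marks shown in the question).

```python
def header_value(lines, key):
    """Return the text after 'key:' in the first matching comment line, else None."""
    for line in lines:
        if not line.startswith('#'):
            continue                      # not a comment line
        before, sep, after = line.partition(key + ':')
        if sep:                           # sep is empty when the key is absent
            return after.strip()
    return None

sample = [
    "# XDI/1.0 XDAC/1.4 Athena/0.9.25\n",
    "# Column.4: pre_edge\n",
    "Content\n",
]
print(header_value(sample, "Column.4"))  # pre_edge
```

Because the function accepts any iterable of lines, an open file object can be passed directly and is read one line at a time.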

How do I print specific strings from text files?

file_contents = x.read()
#print (file_contents)
for line in file_contents:
    if "ase" in line:
        print (line)
I'm looking for all the sentences that contain the phrase "ase" in the file. When I run it, nothing is printed.
Since file_contents is the result of x.read(), it's a single string, not a list of strings.
So you're iterating over each character.
Do this instead:
file_contents = x.readlines()
Now you can search within your lines.
Or, if you're not planning to reuse file_contents, iterate over the file handle directly:
for line in x:
That way you don't have to call readlines() and store the whole file in memory (if it's big, it can make a difference).
read() returns the whole content of the file (not line by line) as a single string. So when you iterate over it, you iterate over the individual characters:
file_contents = """There is a ase."""
for char in file_contents:
    print(char)
You can simply iterate over the file object instead (which yields it line by line):
for line in x:
    if "ase" in line:
        print(line)
Note that if you actually look for sentences instead of lines where 'ase' is contained it will be a bit more complicated. For example you could read the complete file and split at .:
for sentence in x.read().split('.'):
    if "ase" in sentence:
        print(sentence)
However that would fail if there are .s that don't represent the end of a sentence (like abbreviations).
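A slightly more careful sketch splits after '.', '!' or '?' only when whitespace follows, using the re module. The function name sentences_containing is made up for illustration, and this still breaks on abbreviations such as "e.g. this"; it is not a real sentence tokenizer.

```python
import re

def sentences_containing(text, needle):
    """Return the sentences of text that contain needle."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [s for s in sentences if needle in s]

text = "There is a base. Nothing here. Lipase is an enzyme!"
print(sentences_containing(text, "ase"))
# ['There is a base.', 'Lipase is an enzyme!']
```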

extract the dimensions from the head lines of text file

Please see following attached image showing the format of the text file. I need to extract the dimensions of data matrix indicated by the first line in the file, here 49 * 70 * 1 for the case shown by the image. Note that the length of name "gd_fac" can be varying. How can I extract these numbers as integers? I am using Python 3.6.
The specification is not very clear. I am assuming that the information you want will always be in the first line and always in parentheses. Given that:
with open(filename) as infile:
    line = infile.readline()
string = line[line.find('(')+1:line.find(')')]
lst = [int(n) for n in string.split('x')]
This will create the list lst = [49, 70, 1].
What is happening here:
First I open the file (you will need to replace filename with the name of your file, as a string). The with ... as ... structure ensures that the file is closed after use. Then I read the first line. After that, I select only the part of that line that falls after the open paren ( and before the close paren ). Finally, I break the string into parts, with the character x as the separator, and convert each part to an integer. This creates a list of the values in the first line of the file that fall between the parentheses and are separated by x.
Since you mentioned that the length of 'gd_fac' can vary, a good solution is to use a regular expression.
import re
with open("a.txt") as fh:
    for line in fh:
        if '(' in line and ')' in line:
            dimension = re.findall(r'.*\((.*)\)',line)[0]
            break
print(dimension)
Output:
49x70x1
This looks for a line containing "gd_fac"; if it finds one, it strips out everything that isn't needed and leaves just the dimensions, with x replaced by *.
with open('test.txt', 'r') as infile:
    for line in infile:
        if("gd_fac" in line):
            line = line.replace("gd_fac", "")
            line = line.replace("x", "*")
            line = line.replace("(","")
            line = line.replace(")","")
            print (line.strip())
            break
OUTPUT: 49*70*1
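Since the question asks for integers, here is a minimal sketch that returns them directly. It assumes the header looks like "gd_fac (49x70x1)", which is a guess because the image is not available; parse_dimensions is a hypothetical name, and the pattern accepts either x or * between the numbers.

```python
import re

def parse_dimensions(header_line):
    """Return the integers inside the first '(...)' group, e.g. (49x70x1)."""
    match = re.search(r'\((\d+(?:\s*[x*]\s*\d+)*)\)', header_line)
    if match is None:
        raise ValueError("no dimensions found in: %r" % header_line)
    return [int(n) for n in re.findall(r'\d+', match.group(1))]

print(parse_dimensions("gd_fac (49x70x1)"))  # [49, 70, 1]
```

The name before the parentheses is never inspected, so its length does not matter.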

Remove linebreak at specific position in textfile

I have a large textfile, which has linebreaks at column 80 due to console width. Many of the lines in the textfile are not 80 characters long, and are not affected by the linebreak. In pseudocode, this is what I want:
Iterate through lines in file
If line matches this regex pattern: ^(.{80})\n(.+)
Replace this line with a new string consisting of match.group(1) and match.group(2). Just remove the linebreak from this line.
If line doesn't match the regex, skip!
Maybe I don't need regex to do this?
f=open("file")
for line in f:
    if len(line)==81:            # 80 characters plus '\n'
        n=next(f, '')            # the continuation line, if any
        line=line.rstrip()+n
    print(line.rstrip())
f.close()
Here's some code which should do the trick:
def remove_linebreaks(textfile, position=81):
    """
    textfile : a file opened in 'r' mode
    position : the length of a folded line, including its trailing \n
    return a string with the \n of each folded line removed
    """
    fixed_lines = []
    for line in textfile:
        if len(line) == position:
            line = line[:position - 1]  # drop the trailing '\n'
        fixed_lines.append(line)
    return ''.join(fixed_lines)
Note that compared to your pseudo code, this will merge any number of consecutive folded lines.
Consider this.
def merge_lines( line_iter ):
    buffer = ''
    for line in line_iter:
        if len(line) <= 80:
            yield buffer + line
            buffer= ''
        else:
            buffer += line[:-1] # remove '\n'
    if buffer:
        yield buffer # flush a folded line left at the end of the input
with open('myFile','r') as source:
    with open('copy of myFile','w') as destination:
        for line in merge_lines( source ):
            destination.write(line)
I find that an explicit generator function makes it much easier to test and debug the essential logic of the script without having to create mock filesystems or do lots of fancy setup and teardown for testing.
Here is an example of how to use regular expressions to achieve this. Regular expressions aren't the best solution everywhere, though, and in this case I think not using them is more efficient. Anyway, here is the solution (re.MULTILINE is needed so that ^ matches at the start of every line, not just at the start of the text):
import re
text = re.sub(r'(?<=^.{80})\n', '', text, flags=re.MULTILINE)
You can also use your own regular expression by calling re.sub with a callable:
text = re.sub(r'^(.{80})\n(.+)', lambda m: m.group(1)+m.group(2), text, flags=re.MULTILINE)
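To make the lookbehind variant easy to try out, here is a small self-contained sketch. The function name unfold and the width parameter are introduced only for illustration; the sketch assumes the whole file has been read into one string.

```python
import re

def unfold(text, width=80):
    """Remove each newline that comes after exactly `width` characters on a line."""
    # The lookbehind requires the line start (^) followed by `width`
    # characters immediately before the newline. re.MULTILINE lets ^
    # match at the start of every line.
    pattern = re.compile(r'(?<=^.{%d})\n' % width, flags=re.MULTILINE)
    return pattern.sub('', text)

folded = "a" * 80 + "\n" + "bbb\n" + "short line\n"
print(unfold(folded) == "a" * 80 + "bbb\nshort line\n")  # True
```

Unlike the two-group pattern, this version also joins several consecutive folded lines, because every newline is checked against its own line.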
