Read a multi-line string as one line in Python - python

I am writing a program that analyzes a large directory text file line-by-line. In doing so, I am trying to extract different parts of the file and categorize them as 'Name', 'Address', etc. However, due to the format of the file, I am running into a problem. Some of the text i have is split into two lines, such as:
'123 ABCDEF ST
APT 456'
How can I make it so that even through line-by-line analysis, Python returns this as a single-line string in the form of
'123 ABCDEF ST APT 456'?

if you want to remove newlines:
"".join( my_string.splitlines())

Assuming you are using windows if you do a print of the file to your screen you will see
'123 ABCDEF ST\nAPT 456\n'
the \n represent the line breaks.
so there are a number of ways to get rid of the new lines in the file. One easy way is to split the string on the newline characters and then rejoin the items from the list that will be created when you do the split
myList = [item for item in myFile.split('\n')]
newString = ' '.join(myList)

To replace the newlines with a space:
address = '123 ABCDEF ST\nAPT 456\n'
address.replace("\n", " ")

import re
def mergeline(c, l):
if c: return c.rstrip() + " " + l
else: return l
def getline(fname):
qstart = re.compile(r'^\'[^\']*$')
qend = re.compile(r'.*\'$')
with open(fname) as f:
linecache, halfline = ("", False)
for line in f:
if not halfline: linecache = ""
linecache = mergeline(linecache, line)
if halfline: halfline = not re.match(qend, line)
else: halfline = re.match(qstart, line)
if not halfline:
yield linecache
if halfline:
yield linecache
for line in getline('input'):
print line.rstrip()

Assuming you're iterating through your file with something like this:
with open('myfile.txt') as fh:
for line in fh:
# Code here
And also assuming strings in your text file are delimited with single quotes, I would do this:
while not line.endswith("'"):
line += next(fh)
That's a lot of assuming though.

i think i might have found a easy solution just put .replace('\n', " ") to whatever string u want to convert
Example u have
my_string = "hi i am an programmer\nand i like to code in python"
like anything and if u want to convert it u can just do
my_string.replace('\n', " ")
hope it helps

Related

Send keylogger log files to e-mail [duplicate]

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
data = file.read().replace('\n', '')
Or if the file content is guaranteed to be one-line
with open('data.txt', 'r') as file:
data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
str = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file when it exits as part of the garbage collection.
But other python implementations won't. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
To join all lines into a string and remove new lines, I normally use :
with open('t.txt') as f:
s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
data="".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open ("data.txt", "r") as myfile:
data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open ("data.txt", "r") as myfile:
data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and have prefer to use use read in combination with rstrip. Without rstrip("\n"), Python adds a newline to the end of the string, which in most cases is not very useful.
with open("myfile.txt") as f:
file_content = f.read().rstrip("\n")
print(file_content)
Here are four codes for you to choose one:
with open("my_text_file.txt", "r") as file:
data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
data = "".join([line for line in file])
you can compress this into one into two lines of code!!!
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
python output
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
data = data + line.strip();
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while 1:
line = f.readline()
if not line:break
string += line
f.close()
print(string)
python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
lines = [ line.strip('\n') for line in list(f) ]
Oneliner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
List is faster than generator but heavier on memory. Generators are slower than lists and is lighter for memory like iterating over lines. In case of "".join(), I think both should work well. .join() function should be removed to get list or generator respectively.
Note: close() / closing of file descriptor probably not needed
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '' you ended up creating a list. If you have a variable of x and print it out just by
x
or print(x)
or str(x)
You will see the entire list with the brackets. If you call each element of the (array of sorts)
x[0]
then it omits the brackets. If you use the str() function you will see just the data and not the '' either.
str(x[0])
Maybe you could try this? I use this in my programs.
Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print (l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
Is the best way to get all the lines of a file, the '\n' are already stripped by the splitlines() (which smartly recognize win/mac/unix lines types).
But if nonetheless you want to strip each lines:
line_lst = [line.strip() for line in txt = Path("to/the/file.txt").read_text().splitlines()]
strip() was just a useful exemple, but you can process your line as you please.
At the end, you just want concatenated text ?
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
print word # Prints each word in file on a different line
Or:
print words[0] + ",", words[1] # Note that the "+" symbol indicates no spaces
#The comma not in parentheses indicates a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
data=myfile.readline()
list=data.split(" ")
word=list[0]
This code will help you to read the first line and then using the list and split option you can convert the first line word separated by space to be stored in a list.
Than you can easily access any word, or even store it in a string.
You can also do the same thing with using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
str = '' #string declaration
for i in range(len(lines)):
str += lines[i].rstrip('\n') + ' '
print str
Try the following:
with open('data.txt', 'r') as myfile:
data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n

reading in file python says its a string [duplicate]

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
data = file.read().replace('\n', '')
Or if the file content is guaranteed to be one-line
with open('data.txt', 'r') as file:
data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
str = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file when it exits as part of the garbage collection.
But other python implementations won't. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
To join all lines into a string and remove new lines, I normally use :
with open('t.txt') as f:
s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
data="".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open ("data.txt", "r") as myfile:
data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open ("data.txt", "r") as myfile:
data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and have prefer to use use read in combination with rstrip. Without rstrip("\n"), Python adds a newline to the end of the string, which in most cases is not very useful.
with open("myfile.txt") as f:
file_content = f.read().rstrip("\n")
print(file_content)
Here are four codes for you to choose one:
with open("my_text_file.txt", "r") as file:
data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
data = "".join([line for line in file])
you can compress this into one into two lines of code!!!
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
python output
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
data = data + line.strip();
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while 1:
line = f.readline()
if not line:break
string += line
f.close()
print(string)
python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
lines = [ line.strip('\n') for line in list(f) ]
Oneliner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
List is faster than generator but heavier on memory. Generators are slower than lists and is lighter for memory like iterating over lines. In case of "".join(), I think both should work well. .join() function should be removed to get list or generator respectively.
Note: close() / closing of file descriptor probably not needed
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '' you ended up creating a list. If you have a variable of x and print it out just by
x
or print(x)
or str(x)
You will see the entire list with the brackets. If you call each element of the (array of sorts)
x[0]
then it omits the brackets. If you use the str() function you will see just the data and not the '' either.
str(x[0])
Maybe you could try this? I use this in my programs.
Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print (l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
Is the best way to get all the lines of a file, the '\n' are already stripped by the splitlines() (which smartly recognize win/mac/unix lines types).
But if nonetheless you want to strip each lines:
line_lst = [line.strip() for line in txt = Path("to/the/file.txt").read_text().splitlines()]
strip() was just a useful exemple, but you can process your line as you please.
At the end, you just want concatenated text ?
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
print word # Prints each word in file on a different line
Or:
print words[0] + ",", words[1] # Note that the "+" symbol indicates no spaces
#The comma not in parentheses indicates a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
data=myfile.readline()
list=data.split(" ")
word=list[0]
This code will help you to read the first line and then using the list and split option you can convert the first line word separated by space to be stored in a list.
Than you can easily access any word, or even store it in a string.
You can also do the same thing with using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
str = '' #string declaration
for i in range(len(lines)):
str += lines[i].rstrip('\n') + ' '
print str
Try the following:
with open('data.txt', 'r') as myfile:
data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n

Replace words with word-substitutions from another file

The words from my text file (mytext.txt) needs to be replaced by some other word provided in another text file (replace.txt)
cat mytext.txt
this is here. and it should be there.
me is this will become you is that.
cat replace.txt
this that
here there
me you
The following code does not work as expected.
with open('mytext.txt', 'r') as myf:
with open('replace.txt' , 'r') as myr:
for line in myf.readlines():
for l2 in myr.readlines():
original, replace = l2.split()
print line.replace(original, replace)
Expected output:
that is there. and it should be there.
you is that will become you is that.
Edit: I stand corrected, the OP is asking for word by word replacement rather than simple string replace ('become' -> 'become' rather than 'becoyou'). I guess a dict version might look like this, using the regex split method found on the comments of the accepted answer to Splitting a string into words and punctuation:
import re
def clean_split(string_input):
"""
Split a string into its component tokens and return as list
Treat spaces and punctuations, including in-word apostrophes as separate tokens
>>> clean_split("it's a good day today!")
["it", "'", "s", " ", "a", " ", "good", " ", "day", " ", "today", "!"]
"""
return re.findall(r"[\w]+|[^\w]", string_input)
with open('replace.txt' , 'r') as myr:
replacements = dict(tuple(line.split()) for line in myr)
with open('mytext.txt', 'r') as myf:
for line in myf:
print ''.join(replacements.get(word, word) for word in clean_split(line)),
I am not competent to reason well about re efficiency, if someone points out glaring inefficiencies I would be most grateful.
Edit 2: OK I was inserting spaces between words and punctuation, now that's fixed by treating spaces as tokens and doing a ''.join() instead of a ' '.join()
It looks like you want your inner loop to read the contents of 'replace.txt' for each line of 'mytext.txt'. That's very inefficient, and it won't actually work as written because once you've read all the lines of 'replace.txt' the file pointer is left at the end of the file, so when you try to process the 2nd line of 'mytext.txt' there won't be any lines left to read in 'replace.txt'.
You could send the myr file pointer back to the start of the file using myr.seek(0), but as I said, that's not very efficient. A much better strategy is to read 'replace.txt' into an appropriate data structure, and then use that data to do your replacements on each line of 'mytext.txt'.
A good data structure to use for this would be a dict. Eg,
replacements = {'this': 'that', 'here': 'there', 'me': 'you'}
Can you figure out how to build such a dict from 'replace.txt'?
I see that gman and 7stud have covered the issue of saving the results of your replacements so that they accumulate, so I won't bother discussing that. :)
here you go using re.sub:
>>> with open('mytext.txt') as f1, open('replace.txt') as f2:
... my_text = f1.read()
... for x in f2:
... x=x.strip().split()
... my_text = re.sub(r"\b%s\b" % x[0],x[1],my_text)
... print my_text
...
that is there. and it should be there.
you is that will become you is that.
\b%s\b defines the word boundaries
The following will solve your problem. The problem with your code is that you are printing after each replacement.
The optimal solution will be:
myr=open("replace.txt")
replacement=dict()
for i in myr.readlines():
original,replace=i.split()
replacement[original]=replace
myf=open("mytext.txt")
for i in myf.readlines():
for j in i.split():
if(j in replacement.keys()):
i=i.replace(j,replacement[j])
print i
As an alternative, we may use string's template to achieve this, it works but VERY ugly and inefficient though:
from string import Template
with open('replace.txt', 'r') as myr:
# read the replacement first and build a dictionary from it
d = {str(k): v for k,v in [line.strip().split(" ") for line in myr]}
d
{'here': 'there', 'me': 'you', 'this': 'that'}
with open('mytext.txt', 'r') as myf:
for line in myf:
print Template('$'+' $'.join(line.strip().replace('$', '_____').\
split(' '))).safe_substitute(**d).\
replace('$', '').replace('_____', '')
Results:
that is there. and it should be there.
you is that will become you is that.
You are printing the line after one replacement, then printing the line again after the next replacement. You want to print the line after doing all the replacements.
str.replace(old, new[, count])
Return a copy of the string...
You are discarding the copy every time because you don't save it in a variable. In other words, replace() does not change line.
Next, the word there contains the substring here(which is replaced by there), so the result ends up being tthere.
You can fix those problems like this:
import re
with open('replace.txt' , 'r') as f:
repl_dict = {}
for line in f:
key, val = line.split()
repl_dict[key] = val
with open('mytext.txt', 'r') as f:
for line in f:
for key, val in repl_dict.items():
line = re.sub(r"\b" + key + r"\b", val, line, flags=re.X)
print line.rstrip()
--output:--
that is there. and it should be there.
you is that will become you is that.
Or, like this:
import re
#Create a dict that returns the key itself
# if the key is not found in the dict:
class ReplacementDict(dict):
def __missing__(self, key):
self[key] = key
return key
#Create a replacement dict:
with open('replace.txt') as f:
repl_dict = ReplacementDict()
for line in f:
key, val = line.split()
repl_dict[key] = val
#Create the necessary inputs for re.sub():
def repl_func(match_obj):
return repl_dict[match_obj.group(0)]
pattern = r"""
\w+ #Match a 'word' character, one or more times
"""
regex = re.compile(pattern, flags=re.X)
#Replace the words in each line with the
#entries in the replacement dict:
with open('mytext.txt') as f:
for line in f:
line = re.sub(regex, repl_func, line)
print line.rstrip())
With replace.txt like this:
this that
here there
me you
there dog
...the output is:
that is there. and it should be dog.
you is that will become you is that.

.split() creating a blank line in python3

I am trying to convert a 'fastq' file in to a tab-delimited file using python3.
Here is the input: (line 1-4 is one record that i require to print as tab separated format). Here, I am trying to read in each record in to a list object:
#SEQ_ID
GATTTGGGGTT
+
!''*((((***
#SEQ_ID
GATTTGGGGTT
+
!''*((((***
using this:
data = open('sample3.fq')
fq_record = data.read().replace('#', ',#').split(',')
for item in fq_record:
print(item.replace('\n', '\t').split('\t'))
Output is:
['']
['#SEQ_ID', 'GATTTGGGGTT', '+', "!''*((((***", '']
['#SEQ_ID', 'GATTTGGGGTT', '+', "!''*((((***", '', '']
I am geting a blank line at the begining of the output, which I do not understand why ??
I am aware that this can be done in so many other ways but I need to figure out the reason as I am learning python.
Thanks
When you replace # with ,#, you put a comma at the beginning of the string (since it starts with #). Then when you split on commas, there is nothing before the first comma, so this gives you an empty string in the split. What happens is basically like this:
>>> print ',x'.split(',')
['', 'x']
If you know your data always begins with #, you can just skip the empty record in your loop. Just do for item in fq_record[1:].
You can also go line-by-line without all the replacing:
fobj = io.StringIO("""#SEQ_ID
GATTTGGGGTT
+
!''*((((***
#SEQ_ID
GATTTGGGGTT
+
!''*((((***""")
data = []
entry = []
for raw_line in fobj:
line = raw_line.strip()
if line.startswith('#'):
if entry:
data.append(entry)
entry = []
entry.append(line)
data.append(entry)
data looks like this:
[['#SEQ_ID', 'GATTTGGGGTTy', '+', "!''*((((***"],
['#SEQ_ID', 'GATTTGGGGTTx', '+', "!''*((((***"]]
Thank you all for your answers. As a beginner, my main problem was the occurrence of a blank line upon .split(',') which I have now understood conceptually. So my first useful program in python is here:
# this script converts a .fastq file in to .fasta format
import sys
# Usage statement:
print('\nUsage: fq2fasta.py input-file output-file\n=========================================\n\n')
# define a function for fasta formating
def format_fasta(name, sequence):
fasta_string = '>' + name + "\n" + sequence + '\n'
return fasta_string
# open the file for reading
data = open(sys.argv[1])
# open the file for writing
fasta = open(sys.argv[2], 'wt')
# feed all fastq records in to a list
fq_records = data.read().replace('#', ',#').split(',')
# iterate through list objects
for item in fq_records[1:]: # this is to avoid the first line which is created as blank by .split() function
line = item.replace('\n', '\t').split('\t')
name = line[0]
sequence = line[1]
fasta.write(format_fasta(name, sequence))
fasta.close()
Other things suggested in the answers would be more clear to me as I learn more.
Thanks again.

Splitting lines in python based on some character

Input:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
'!' is the starting character and +0013 should be the ending of each line (if present).
Problem which I am getting:
Output is like :
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
Any help would be highly appreciated...!!!
My code:
file_open= open('sample.txt','r')
file_read= file_open.read()
file_open2= open('output.txt','w+')
counter =0
for i in file_read:
if '!' in i:
if counter == 1:
file_open2.write('\n')
counter= counter -1
counter= counter +1
file_open2.write(i)
You can try something like this:
with open("abc.txt") as f:
data=f.read().replace("\r\n","") #replace the newlines with ""
#the newline can be "\n" in your system instead of "\r\n"
ans=filter(None,data.split("!")) #split the data at '!', then filter out empty lines
for x in ans:
print "!"+x #or write to some other file
.....:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Could you just use str.split?
lines = file_read.split('!')
Now lines is a list which holds the split data. This is almost the lines you want to write -- The only difference is that they don't have trailing newlines and they don't have '!' at the start. We can put those in easily with string formatting -- e.g. '!{0}\n'.format(line). Then we can put that whole thing in a generator expression which we'll pass to file.writelines to put the data in a new file:
file_open2.writelines('!{0}\n'.format(line) for line in lines)
You might need:
file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
if you find that you're getting more newlines than you wanted in the output.
A few other points, when opening files, it's nice to use a context manager -- This makes sure that the file is closed properly:
with open('inputfile') as fin:
lines = fin.read()
with open('outputfile','w') as fout:
fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
Another option, using replace instead of split, since you know the starting and ending characters of each line:
In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '')
In [15]: print data.replace('+0013!', "+0013\n!")
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Just for some variance, here is a regular expression answer:
import re
outputFile = open('output.txt', 'w+')
with open('sample.txt', 'r') as f:
for line in re.findall("!.+?(?=!|$)", f.read(), re.DOTALL):
outputFile.write(line.replace("\n", "") + '\n')
outputFile.close()
It will open the output file, get the contents of the input file, and loop through all the matches using the regular expression !.+?(?=!|$) with the re.DOTALL flag. The regular expression explanation & what it matches can be found here: http://regex101.com/r/aK6aV4
After we have a match, we strip out the new lines from the match, and write it to the file.
Let's try to add a \n before every "!"; then let python splitlines :-) :
file_read.replace("!", "!\n").splitlines()
I will actually implement as a generator so that you can work on the data stream rather than the entire content of the file. This will be quite memory friendly if working with huge files
>>> def split_on_stream(it,sep="!"):
prev = ""
for line in it:
line = (prev + line.strip()).split(sep)
for parts in line[:-1]:
yield parts
prev = line[-1]
yield prev
>>> with open("test.txt") as fin:
for parts in split_on_stream(fin):
print parts
,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Categories