Writing files in python the correct way - python

I have a function that writes the contents of a list to a text file. It writes each element of the list on its own line.
def write_file(filename):
    name_file = filename
    filename = open(name_file, 'w')
    for line in list:
        if line == len(list)-1:
            filename.write(line)
        else:
            filename.write(line+'\n')
    filename.close()
I tend to notice a mistake where an empty line is generated at the end of the text file, and I'm wondering whether I am writing the file correctly.
Let's say my list contains [1,2,3,4] and writing it to the text file would give me
1
2
3
4
#in some cases, an empty newline is printed here at the end
I have no idea how to check whether the write function is generating an extra line at the end due to the '\n', so I'd appreciate any feedback.

Instead of writing to the buffer so many times, do a .join, and write the result once:
with open(filename, 'w') as fp:
    fp.write('\n'.join(your_list))
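Note that str.join only accepts strings, so a list of integers such as [1, 2, 3, 4] has to be converted first. The following is a minimal sketch, assuming every item can be turned into text with str(); the name write_lines and the temporary file are illustrative only. Reading the file back is one way to check whether a trailing newline was written.

```python
import os
import tempfile

def write_lines(filename, items):
    """Write one item per line, with no newline after the last item."""
    with open(filename, 'w') as fp:
        # join() needs strings, so convert each item first
        fp.write('\n'.join(str(item) for item in items))

# Write to a throwaway file, then read it back to inspect the ending.
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')
write_lines(path, [1, 2, 3, 4])
with open(path) as fp:
    content = fp.read()
print(repr(content))           # repr makes every '\n' visible
print(content.endswith('\n'))  # False: no trailing newline
```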

Update:
@John Coleman has pointed out a misunderstanding. It seems that the last line should not have any newline character. This can be corrected by using enumerate() to provide a line count, checking whether it's the last line when printing, and varying the line end character accordingly:
def write_file(filename, data):
    with open(filename, 'w') as f:
        for line_no, item in enumerate(data, 1):
            print(item, file=f, end='\n' if line_no < len(data) else '')
This is not as elegant as using '\n'.join(data) but it is memory efficient for large lists.
An alternative to join() is:
def write_file(filename, data):
    with open(filename, 'w') as f:
        print(*data, file=f, sep='\n', end='')
Original answer:
Why not simply use print() and specify the output file?
def write_file(filename, data):
    with open(filename, 'w') as f:
        for item in data:
            print(item, file=f)
Or more succinctly:
def write_file(filename, data):
    with open(filename, 'w') as f:
        print(*data, file=f, sep='\n')
The former is preferred if you have a large list because the latter needs to unpack the list to pass its contents as arguments to print().
Both options will automatically take care of the new line characters for you.
Opening the file in a with statement will also take care of closing the file for you.
You could also use '\n'.join() to join the items in the list. Again, this is feasible for smallish lists. Also, your example shows a list of integers - print() does not require that its arguments first be converted to strings, as does join().

Try
def write_file(filename):
    name_file = filename
    filename = open(name_file, 'w')
    for line in list:
        if line == list[-1]:
            filename.write(line)
        else:
            filename.write(line+'\n')
    filename.close()
In your example, line == len(list)-1 compares each item with an int (the length of the list minus one) instead of with the last item in the list.
This is still not perfect: you could run into issues if the list contains repeated items, such as [1,2,3,5,2]. In that case it would be best to use a join or an index-based for i loop.
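The index-based approach can be sketched with enumerate(), which compares positions rather than values and so is not confused by repeated items. This is only an illustration: write_items is a made-up name, and io.StringIO stands in for a real file so the result can be inspected.

```python
import io

def write_items(out, items):
    """Write items one per line; no newline after the final item."""
    last_index = len(items) - 1
    for i, item in enumerate(items):
        out.write(str(item))
        if i != last_index:   # compare positions, not values
            out.write('\n')

buffer = io.StringIO()         # in-memory stand-in for a real file
write_items(buffer, [1, 2, 3, 5, 2])
result = buffer.getvalue()
print(repr(result))
```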

If you want to write to a file from list of strings, you can use the following snippet:
def write_file(filename):
    with open(filename, 'w') as f:
        f.write('\n'.join(lines))
lines = ["hi", "hello"]
write_file('test.txt')

You shouldn't use for line in list here. list shouldn't be used as a variable name because it is the name of Python's built-in list type. It is not a reserved keyword, so Python will not stop you, but rebinding it shadows the built-in: a call such as myLst = list("abcd"), which normally gives myLst=["a", "b", "c", "d"], stops working.
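A quick way to see the difference between a keyword and a built-in name is the standard keyword module. The sketch below also shows what goes wrong once list is rebound; shadow_demo is a hypothetical helper used only to keep the shadowing local.

```python
import builtins
import keyword

print(keyword.iskeyword('list'))   # False: list is not a reserved word
print(keyword.iskeyword('for'))    # True: for really is a keyword
print(hasattr(builtins, 'list'))   # True: list is a built-in name

def shadow_demo():
    list = [1, 2, 3]               # shadows the built-in inside this function
    try:
        list('abcd')               # now calls the list object, not the type
    except TypeError as exc:
        return str(exc)

message = shadow_demo()
print(message)
```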
As for the solution to your problem, I recommend the with statement in case you forget to close your file. That way you don't have to close it yourself; leaving the indented block does the work. Here is how I have solved your problem:
#I built the list with a generator expression to avoid writing it out manually.
myLst=list("List number {}".format(x) for x in range(15))
#Here is where you open the file
with open ('testfile.txt','w') as file:
    for each in myLst:
        file.write(str(each))
        if each!=myLst[len(myLst)-1]:
            file.write('\n')
        else:
            #"continue" skips the rest of the current iteration and moves on to the next one.
            #It is not strictly needed here because nothing follows it in the loop body.
            continue
I hope I was helpful.

I'd use a loop:
thefile = open('test.txt', 'w')
for item in thelist:
    thefile.write("%s\n" % item)
thefile.close()

Related

How do I sort a text file after the last instance of a character?

Goal: Sort the text file alphabetically based on the characters that appear AFTER the final slash. Note that there are random numbers right before the final slash.
Contents of the text file:
https://www.website.com/1939332/delta.html
https://www.website.com/2237243/alpha.html
https://www.website.com/1242174/zeta.html
https://www.website.com/1839352/charlie.html
Desired output:
https://www.website.com/2237243/alpha.html
https://www.website.com/1839352/charlie.html
https://www.website.com/1939332/delta.html
https://www.website.com/1242174/zeta.html
Code Attempt:
i = 0
for line in open("test.txt").readlines(): #reading text file
    List = line.rsplit('/', 1) #splits by final slash and gives me 4 lists
    dct = {list[i]:list[i+1]} #tried to use a dictionary
    sorted_dict=sorted(dct.items()) #sort the dictionary
textfile = open("test.txt", "w")
for element in sorted_dict:
    textfile.write(element + "\n")
textfile.close()
Code does not work.
I would pass a different key function to the sorted function. For example:
with open('test.txt', 'r') as f:
    lines = f.readlines()
lines = sorted(lines, key=lambda line: line.split('/')[-1])
with open('test.txt', 'w') as f:
    f.writelines(lines)
See here for a more detailed explanation of key functions.
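Before you run this, make sure test.txt ends with a newline. Otherwise the last line has no '\n', and once sorting moves it to another position it is written directly in front of the next line. That is what causes "combining the second and third lines".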
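If you cannot guarantee the trailing newline, one option is to normalise the line endings before sorting. This is a sketch on an in-memory list rather than a file; sort_by_last_segment is an illustrative name, not something from the question.

```python
def sort_by_last_segment(lines):
    """Return lines sorted by the text after the final '/'."""
    # Strip line endings first so a missing final newline cannot
    # glue two lines together after sorting; blank lines are dropped.
    cleaned = [line.rstrip('\n') for line in lines if line.strip()]
    cleaned.sort(key=lambda line: line.rsplit('/', 1)[-1])
    return [line + '\n' for line in cleaned]

urls = [
    'https://www.website.com/1939332/delta.html\n',
    'https://www.website.com/2237243/alpha.html\n',
    'https://www.website.com/1242174/zeta.html\n',
    'https://www.website.com/1839352/charlie.html',   # no trailing newline
]
sorted_urls = sort_by_last_segment(urls)
print(''.join(sorted_urls), end='')
```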
If you really want to use a dictionary:
dct = {}
i=0
with open("test.txt") as textfile:
    for line in textfile.readlines():
        mylist = line.rsplit('/',1)
        dct[mylist[i]] = mylist[i+1]
sorted_dict=sorted(dct.items(), key=lambda item: item[1])
with open("test.txt", "w") as textfile:
    for element in sorted_dict:
        textfile.write(element[i] + '/' +element[i+1])
What you did wrong
In the first line, you name your variable List, and in the second you access it using list.
List = line.rsplit('/', 1)
dct = {list[i]:list[i+1]}
Variable names are case sensitive, so you need to use the same capitalisation each time. Furthermore, Python already has a built-in list class. It can be overridden, but I would not recommend naming your variables list, dict, etc.
( list[i] will actually just generate a types.GenericAlias object, which is a type hint, something completely different from a list, and not what you want at all.)
You also wrote
dct = {list[i]:list[i+1]}
which repeatedly creates a new dictionary in each loop iteration, overwriting whatever was stored in dct previously. You should instead create an empty dictionary before the loop, and assign values to its keys every time you want to update it, as I have done.
You're calling sorted in each iteration of the loop; you should call it only once, after the loop is done. After all, you only want to sort your dictionary once.
You also open the file twice, and although you close it at the end, I would suggest using a context manager and the with statement as I have done, so that file closing is automatically handled.
My code
sorted(dct.items(), key=lambda item: item[1])
means that the sorted() function uses the second element in the item tuple (the dictionary item) as the 'metric' by which to sort.
`textfile.write(element[i] + '/' +element[i+1])`
is necessary, since, when you did rsplit('/',1), you removed the /s in your data; you need to add them back and reconstruct the string from the element tuple before you write it.
You don't need + \n in textfile.write since readlines() preserves the \n. That's why you should end text files with a newline: so that you don't have to treat the last line differently.
def sortFiles(item):
    return item.split("/")[-1]
FILENAME = "test.txt"
contents = [line for line in open(FILENAME, "r").readlines() if line.strip()]
contents.sort(key=sortFiles)
with open(FILENAME, "w") as outfile:
    outfile.writelines(contents)

Open and Read a CSV File without libraries

I have the following problem. I am supposed to open a CSV file (it's an Excel table) and read it without using any library.
I have already tried a lot, and I now have the first row in a tuple and that tuple in a list. But it is only the first line, the header, and no other row.
This is what I have so far.
with open(path, 'r+') as file:
    results=[]
    text = file.readline()
    while text != '':
        for line in text.split('\n'):
            a=line.split(',')
            b=tuple(a)
            results.append(b)
        return results
The output should be: every line in a tuple, and all the tuples in a list.
My question now is: how can I read the other lines in Python?
I am really sorry, I am new to programming altogether, so I have a really hard time finding my mistake.
Thank you very much in advance for helping me out!
This question has come up many times on Stack Overflow, so you should be able to find working code.
But it is much better to use the csv module for this.
Your indentation is wrong, and return results runs after the first line is read, so the function exits and never tries to read the other lines.
Even after fixing that, there are other problems, so it still will not read the next lines.
You use readline(), which reads only the first line, so your loop keeps working on the same line; it may never end, because text is never set to ''.
You should use read() to get all the text and then split it into lines with split("\n"), or use readlines() to get all lines as a list, in which case you don't need split(). Or you can use for line in file:. In none of these cases do you need while.
def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        text = file.read()
        for line in text.split('\n'):
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results
def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        lines = file.readlines()
        for line in lines:
            line = line.rstrip('\n') # remove `\n` at the end of line
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results
def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        for line in file:
            line = line.rstrip('\n') # remove `\n` at the end of line
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results
None of these versions will work correctly if an item contains a '\n' or a , that shouldn't be treated as the end of a row or as a separator between items. Such items are wrapped in " ", and removing those quotes is a further problem. The standard csv module resolves all of these problems.
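For comparison, here is roughly what the csv-module route looks like when libraries are allowed. The sample data is invented, and io.StringIO stands in for an opened file; the second row contains a quoted comma that a plain split(',') would break apart.

```python
import csv
import io

# In-memory stand-in for a CSV file; the second row has a quoted comma.
sample = io.StringIO('name,city\n"Doe, Jane",Berlin\n')

# csv.reader yields each row as a list of strings with quotes removed.
rows = [tuple(row) for row in csv.reader(sample)]
print(rows)
```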
Your code is pretty good and you are close to the goal:
with open(path, 'r+') as file:
    results=[]
    text = file.read()
    #while text != '':
    for line in text.split('\n'):
        a=line.split(',')
        b=tuple(a)
        results.append(b)
    return results
Your Code:
with open(path, 'r+') as file:
    results=[]
    text = file.readline()
    while text != '':
        for line in text.split('\n'):
            a=line.split(',')
            b=tuple(a)
            results.append(b)
        return results
So enjoy learning :)
One caveat: if the CSV ends with a blank line (or a trailing newline), splitting on '\n' leaves an empty string at the end, which results in an ugly tuple at the end of the list like ('',) (which looks like a smiley).
To prevent this you have to check for empty lines: if line != '': after the for will do the trick.
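Put together, the empty-line check might look like the sketch below. It works on a string instead of a file so it is easy to try out; parse_csv_text is an illustrative name only.

```python
def parse_csv_text(text):
    """Split CSV text into a list of tuples, ignoring empty lines."""
    results = []
    for line in text.split('\n'):
        if line != '':   # skip the empty piece left after a final '\n'
            results.append(tuple(line.split(',')))
    return results

rows = parse_csv_text('a,b\n1,2\n')
print(rows)
```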

filtering a weird text file in python

I have a text file in which each ID line starts with > and the next line(s) are a sequence of characters. The line after the sequence is another ID line starting with >. In some records, instead of a sequence, I have “Sequence unavailable”. The sequence after the ID line can span one or more lines.
like this example:
>ENSG00000173153|ENST00000000442|64073050;64074640|64073208;64074651
AAGCAGCCGGCGGCGCCGCCGAGTGAGGGGACGCGGCGCGGTGGGGCGGCGCGGCCCGAGGAGGCGGCGGAGGAGGGGCCGCCCGCGGCCCCCGGCTCACTCCGGCACTCCGGGCCGCTC
>ENSG00000004139|ENST00000003834
Sequence unavailable
I want to filter out those IDs with “Sequence unavailable”. The output should look like this:
output:
>ENSG00000173153|ENST00000000442|64073050;64074640|64073208;64074651
AAGCAGCCGGCGGCGCCGCCGAGTGAGGGGACGCGGCGCGGTGGGGCGGCGCGGCCCGAGGAGGCGGCGGAGGAGGGGCCGCCCGCGGCCCCCGGCTCACTCCGGCACTCCGGGCCGCTC
Do you know how to do that in Python?
Unlike the other answers, I’d strongly recommend against parsing the FASTA format manually. It’s not too hard but there are pitfalls, and it’s completely unnecessary since efficient, well-tested implementations exist:
Use Bio.SeqIO from BioPython; for example:
from Bio import SeqIO
for record in SeqIO.parse(filename, 'fasta'):
    if record.seq != 'Sequenceunavailable':
        SeqIO.write(record, outfile, 'fasta')
Note the missing space in 'Sequenceunavailable': reading the sequences in FASTA format will omit spaces.
How about this:
with open(filename, 'r+') as f:
    data = f.read()
    data = data.split('>')
    result = ['>{}'.format(item) for item in data if item and 'Sequence unavailable' not in item]
    f.seek(0)
    for line in result:
        f.write(line)
    f.truncate()  # drop leftover text, since the filtered output is shorter than the original
def main():
    with open('text.txt') as infile:
        filterFile(infile)
def filterFile(SequenceFile):
    with open('outfile', 'w') as outfile:
        for line in SequenceFile:
            if line.startswith('>'):
                # the line after an ID line is taken as the (single-line) sequence
                sequence = next(SequenceFile, '')
                if not sequence.startswith('Sequence unavailable'):
                    # both lines still carry their own '\n'
                    outfile.write(line + sequence)
main()
I unfortunately can't test this code right now but I made this out of the top of my head! Please test it and let me know what the outcome is so I can adjust the code :-)
I don't know exactly how large these files will get, so just in case, I'm doing it without loading the whole file into memory:
with open(filename) as fh:
    with open(filename+'.new', 'w+') as fh_new:
        for idline, geneseq in zip(*[iter(fh)] * 2):
            if geneseq.strip() != 'Sequence unavailable':
                fh_new.write(idline)
                fh_new.write(geneseq)
It works by creating a new file. zip(*[iter(fh)] * 2) passes the same file iterator to zip twice, so each step reads two consecutive lines: idline is the first and geneseq the second. This assumes every record is exactly two lines long.
This solution should be relatively cheap in computing power but will create an extra output file.
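Since the question says a sequence can span more than one line, here is a plain-Python sketch that groups lines into records before filtering. It is still streaming, so only one record is held in memory at a time. The names filter_fasta and is_available and the sample records are made up for illustration, and this is not a full FASTA parser.

```python
def is_available(record):
    """True if no sequence line of the record says 'Sequence unavailable'."""
    return not any(l.strip() == 'Sequence unavailable' for l in record[1:])

def filter_fasta(lines):
    """Yield the lines of every record whose sequence is available."""
    record = []
    for line in lines:
        if line.startswith('>') and record:
            # a new ID line closes the previous record
            if is_available(record):
                yield from record
            record = []
        record.append(line)
    # flush the final record
    if record and is_available(record):
        yield from record

sample = [
    '>id1|a\n', 'AAGC\n', 'GGTT\n',          # two-line sequence
    '>id2|b\n', 'Sequence unavailable\n',
    '>id3\n', 'CCCC\n',
]
kept = list(filter_fasta(sample))
print(''.join(kept), end='')
```

With a real file, the list could be replaced by the open file object, and the yielded lines passed to writelines() on the output file.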

generating list by reading from file

I want to generate a list of server addresses and credentials by reading from a file, as a single list split on the newlines in the file.
The file is in this format
login:username
pass:password
destPath:/directory/subdir/
ip:10.95.64.211
ip:10.95.64.215
ip:10.95.64.212
ip:10.95.64.219
ip:10.95.64.213
The output I want looks like this
[['login:username', 'pass:password', 'destPath:/directory/subdirectory', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
I tried this
with open('file') as f:
    credentials = [x.strip().split('\n') for x in f.readlines()]
and this returns lists within a list
[['login:username'], ['pass:password'], ['destPath:/directory/subdir/'], ['ip:10.95.64.211'], ['ip:10.95.64.215'], ['ip:10.95.64.212'], ['ip:10.95.64.219'], ['ip:10.95.64.213']]
I am new to Python. How can I split by the newline character and create a single list? Thank you in advance.
You could do it like this
with open('servers.dat') as f:
    L = [[line.strip() for line in f]]
print(L)
Output
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211', 'ip:10.95.64.215', 'ip:10.95.64.212', 'ip:10.95.64.219', 'ip:10.95.64.213']]
Just use a list comprehension to read the lines. You don't need to split on \n as the regular file iterator reads line by line. The double list is a bit unconventional, just remove the outer [] if you decide you don't want it.
I just noticed you wanted the list of IP addresses joined in one string. That was not obvious, as it's off the screen in the question and your own code makes no attempt to do it.
To do that read the first three lines individually using next then just join up the remaining lines using ; as your delimiter.
def reader(f):
    yield next(f)
    yield next(f)
    yield next(f)
    yield ';'.join(ip.strip() for ip in f)
with open('servers.dat') as f:
    L2 = [[line.strip() for line in reader(f)]]
For which the output is
[['login:username', 'pass:password', 'destPath:/directory/subdir/', 'ip:10.95.64.211;ip:10.95.64.215;ip:10.95.64.212;ip:10.95.64.219;ip:10.95.64.213']]
It does not match your expected output exactly as there is a typo 'destPath:/directory/subdirectory' instead of 'destPath:/directory/subdir' from the data.
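If the number of header lines is not always three, a variant that selects lines by their ip: prefix may be more robust. This is only a sketch under that assumption; read_credentials is an invented name and the sample is shortened to two addresses.

```python
def read_credentials(lines):
    """Keep non-ip lines as they are and join all ip lines with ';'."""
    stripped = [line.strip() for line in lines if line.strip()]
    others = [line for line in stripped if not line.startswith('ip:')]
    ips = [line for line in stripped if line.startswith('ip:')]
    return [others + [';'.join(ips)]]

sample = [
    'login:username\n',
    'pass:password\n',
    'destPath:/directory/subdir/\n',
    'ip:10.95.64.211\n',
    'ip:10.95.64.215\n',
]
credentials = read_credentials(sample)
print(credentials)
```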
This should work
arr = []
with open('file') as f:
    for line in f:
        arr.append(line.strip())
result = [arr]
You could just treat the file as a list and iterate through it with a for loop:
arr = []
with open('file', 'r') as f:
    for line in f:
        arr.append(line.strip('\n'))

Parsing a text file in python and outputting to a CSV

Preface - I'm pretty new to Python, having had more experience in another language.
I have a text file with single column list of strings in the generic (but slightly varying) format "./abc123a1/type/1ab2_x_data_type.file.type"
I need to extract the abc123a1 and the 1ab2 portions from all several hundred of the rows and put them under two columns (column a and b) in a csv. Sometimes there may be a "1ab2_a" and a "1ab2_b", but I only want one 1ab2. So I'd want to grab "1ab2_a" and ignore all others.
I have the regex which I THINK will work:
tmp = list()
if re.findall(re.compile(r'^([a-zA-Z0-9]{4})_'), x):
    tmp = re.findall(re.compile(r'^([a-zA-Z0-9]{4})_'), x)
elif re.findall(re.compile(r'_([a-zA-Z0-9]{4})_'), x):
    tmp = re.findall(re.compile(r'_([a-zA-Z0-9]{4})_'), x)
if len(tmp) == 0:
    return None
elif len(tmp) > 1:
    print "ERROR found multiple matches"
    return "ERROR"
else:
    return tmp[0].upper()
I am trying to make this script step by step and testing things to make sure it works, but it's just not.
import sys
import csv
listOfData = []
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
    for line in f:
        listOfData.append([line])
print listOfData
with open('extracted.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('column a', 'column b'))
    writer.writerows(listOfData)
print listOfData
Still failing to get anything in the csv other than column headers, much less a parsed version!
Does anyone have any better ideas or formats I could do this in? A friend mentioned looking into glob.glob, but I haven't had luck getting that to work either.
IMHO, you were not far from making it work. The problem is that you read once the whole file just to print the lines, and then (once at end of file) you try to put them into a list... and get an empty list !
You should read the file only once:
import sys
import csv
listOfData = []
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
        listOfData.append([line])
print listOfData
with open('extracted.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('column a', 'column b'))
    writer.writerows(listOfData)
print listOfData
Once it works, you still have to use the regex to get the relevant data to put into the csv file.
I am not sure about your regex (it will most probably not work) , but the reason why your current (non-regex , simple) code does not work is because -
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
    for line in f:
        listOfData.append([line])
As you can see you are first iterating over each line in file and printing it, it should be fine, but after the loop ends, the file pointer is at the end of file, so trying to iterate over it again , would not produce any result. You should only iterate over it once, and do both printing and appending to list in it. Example -
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
        listOfData.append([line])
I think at least part of the problem is the two for loops in the following:
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
    for line in f:
        listOfData.append([line])
The first one prints all the lines of f, so there's nothing left for the second one to iterate over unless you first f.seek(0) and rewind the file.
An alternative would be simply to do this:
with open(sys.argv[1]) as f:
    print "yes"
    for line in f:
        print line
        listOfData.append([line])
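The exhaustion of the file iterator, and the f.seek(0) rewind mentioned above, can be seen in a few lines. This sketch is written for Python 3 and uses io.StringIO as an in-memory stand-in for the opened file; the variable names are illustrative.

```python
import io

f = io.StringIO('first\nsecond\n')   # in-memory stand-in for an open file
first_pass = [line for line in f]     # consumes every line
second_pass = [line for line in f]    # nothing left to read
f.seek(0)                             # rewind to the start
third_pass = [line for line in f]
print(len(first_pass), len(second_pass), len(third_pass))
```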
It's hard to tell if your regexes are OK without more than one line of sample input data.
Are you sure you need all of the regular expressions? You seem to be parsing a list of paths and filenames. The path could be split up using a split command, for example:
print "./abc123a1/type/1ab2_a_data_type.file.type".split("/")
Would give:
['.', 'abc123a1', 'type', '1ab2_a_data_type.file.type']
You could then create a tuple consisting of the second entry and the part of the fourth entry up to the first '_', e.g.
('abc123a1', '1ab2')
This could then be used to print only the first entry from each:
pairs = set()
with open(sys.argv[1], 'r') as in_file, open('extracted.csv', 'wb') as out_file:
    writer = csv.writer(out_file)
    for row in in_file:
        folders = row.split("/")
        col_a = folders[1]
        col_b = folders[3].split("_")[0]
        if (col_a, col_b) not in pairs:
            pairs.add((col_a, col_b))
            writer.writerow([col_a, col_b])
So for an input looking like this:
./abc123a1/type/1ab2_a_data_type.file.type
./abc123a1/type/1ab2_b_data_type.file.type
./abc123a2/type/1ab2_a_data_type.file.type
./abc123a3/type/1ab2_a_data_type.file.type
You would get a CSV file looking like:
abc123a1,1ab2
abc123a2,1ab2
abc123a3,1ab2
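The code in this thread is Python 2 (print statements, 'wb' for the CSV file). A rough Python 3 equivalent of the same split-and-deduplicate idea is sketched below. extract_pairs is a hypothetical helper, the header row is taken from the question, and io.StringIO replaces the output file so the result can be printed; with a real file in Python 3 you would open it with open('extracted.csv', 'w', newline='').

```python
import csv
import io

def extract_pairs(lines):
    """Yield unique (folder, prefix) pairs in first-seen order."""
    seen = set()
    for line in lines:
        parts = line.strip().split('/')
        if len(parts) < 4:          # skip lines that do not fit the pattern
            continue
        pair = (parts[1], parts[3].split('_')[0])
        if pair not in seen:
            seen.add(pair)
            yield pair

paths = [
    './abc123a1/type/1ab2_a_data_type.file.type\n',
    './abc123a1/type/1ab2_b_data_type.file.type\n',
    './abc123a2/type/1ab2_a_data_type.file.type\n',
    './abc123a3/type/1ab2_a_data_type.file.type\n',
]
pairs = list(extract_pairs(paths))

out = io.StringIO()                 # stand-in for the output CSV file
writer = csv.writer(out)
writer.writerow(['column a', 'column b'])
writer.writerows(pairs)
print(out.getvalue())
```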
