Python increment pulling from an array in a loop

I am trying to create something that would
Read a file
Separate everything into an array
Use the first string in the array and do something
When done, go back to the array, take the second string, and do something
Keep repeating this process until the array is done.
So far I have
users = open('users.txt', "r")
userL = users.read().splitlines()
What I want it to do exactly is open the text file, which is already separated one line per string, then have the Python part put that into an array, get the first string, and set that into a variable. From there the variable will be used in a URL for xbox.com.
After it checks it, I will have some JSON read the page and see if the gamertag in my list is being used; if it is being used, it will go back to the array, go to the second string, and check. This needs to be a constant loop of checking gamertags. If it does find a gamertag in the array (from the text file) that isn't used, it will save that to another text file entitled "available gamertags" and keep moving on.
What I want it to do (requested in comments)
Open Program
Have it read a text file of usernames I have created
Have the program test each gamertag at the end of the gamertag viewer link for Xbox
JSON reads the page, and if it contains info that the name is taken, it goes back to the list and uses the next gamertag in it.
Keeps doing this
Logs all the gamertags that worked and saves to a text file.
Exits
The problem in doing that is that I don't know how to go back to the file, access the line after the one it just tested, and continue this pattern until the file is completely read.

Use a for loop:
with open("users.txt") as f:
for line in f:
# do whatever with the line
For example, to achieve your goal here, you might do something like this:
# import our required modules
import json
import urllib2

# declare some initial variables
input_file = "users.txt"
output_file = "available_tags.txt"
available_tags = []  # an empty list to hold our available gamertags

# open the file
with open(input_file) as input_f:
    # loop through each line
    for line in input_f:
        # strip any new line character from the line
        tag = line.strip()
        # set up our URL and open a connection to the API
        url = "http://360api.chary.us/?gamertag=%s" % tag
        u = urllib2.urlopen(url)
        # load the returned data as a JSON object
        data = json.loads(u.read())
        # check if the gamertag is available
        if not data['GamertagExists']:
            # print it and add it to our list of available tags if so
            print "Tag %s is available." % tag
            available_tags.append(tag)
        else:
            print "Tag %s is not available." % tag  # otherwise

# check that we have at least one valid tag to store
if len(available_tags) > 0:
    # open our output file
    with open(output_file, "w") as output_f:
        # loop through our available tags
        for tag in available_tags:
            # write each one to the file
            output_f.write("%s\n" % tag)
else:
    print "No valid tags to be written to output file."

To get you started, the following code will read a whole file from start to finish in that order, and print each line individually:
with open(r"path/to.file.txt") as fin:
for line in fin.readlines():
print(line) # Python 2.7: Use 'print line' instead
If you need to remove the trailing new lines from each string, use .strip().
To write data to a file, use something like the following:
with open(r"path/to/out/file.txt", "w") as fout:
fout.writelines(data_to_write)

Python open() doesn't create a file when in w+ mode

I'm creating an application that looks into a website's text and then checks if the input string appears in it. The way I'm doing it is:
Replace the spaces (' ') in the given string (because URLs can't have spaces, duh)
use requests to get the text of the website url
Create a new file and write every string you find in the website in the file.
Read the file line by line and if one line has the string in it, open it in a webbrowser.
I hope I explained it well. Here is my code:
def getGame():
    game = gameEntry.get()
    gameClean = game.replace(' ', '_')
    print(gameClean)
    gameCheck1 = requests.get('INSERT LINK HERE')
    game2 = gameCheck1.text
    with open('Links.txt', 'w+') as f:
        f.write(game2)
        readLinks = f.readlines()
        for link in readLinks:
            if game in link:
                print(f'Found working link: {link}')
Thanks in advance.
When you write to the file, the file pointer ends up at the end of the file; a subsequent read begins at the end of the file and finds nothing. To fix, call f.seek(0) after the write call to move the file pointer back to the beginning of the file.
Also, just as a side-note, there's no reason to call .readlines(); just delete the readlines line entirely and change the loop to:
for link in f:
and you'll read the lines on demand (instead of creating a whole list of them up front when you only need a line at a time).
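Putting both fixes together, a minimal sketch of the corrected block, keeping the question's variable names (game and game2 are assumed to be set as in the question):
with open('Links.txt', 'w+') as f:
    f.write(game2)
    f.seek(0)  # rewind: the write left the file pointer at the end
    for link in f:  # reads lines on demand, no readlines() needed
        if game in link:
            print(f'Found working link: {link}')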

Using a for loop to add a new line to a table

I am trying to create a .bed file after searching through DNA sequences for two regular expressions. Ideally, I'd like to generate a tab-separated file which contains the sequence description, the start location of the first regex and the end location of the second regex. I know that the regex section works, it's just creating the \t separated file I am struggling with.
I was hoping that I could open/create a file and simply print a new line for each iteration of the for loop that contains this information, like so:
with open("Mimp_hits.bed", "a+") as file_object:
for line in file_object:
print(f'{sequence.description}\t{h.start()}\t{h_rc.end()}')
file_object.close()
But this doesn't seem to work (creates empty file). I have also tried to use file_object.write, but again this creates an empty file too.
This is all of the code I have including searching for the regexes:
import re, sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

infile = sys.argv[1]

for sequence in SeqIO.parse(infile, "fasta"):
    hit = re.finditer(r"CAGTGGG..GCAA[TA]AA", str(sequence.seq))
    mimp_length = 400
    for h in hit:
        h_start = h.start()
        hit_rc = re.finditer(r"TT[TA]TTGC..CCCACTG", str(sequence.seq))
        for h_rc in hit_rc:
            h_rc_end = h_rc.end()
            length = h_rc_end - h_start
            if length > 0:
                if length < mimp_length:
                    with open("Mimp_hits.bed", "a+") as file_object:
                        for line in file_object:
                            print(sequence.description, h.start(), h_rc.end())
                        file_object.close()
This is the desired output:
Focub_II5_mimp_1__contig_1.16(656599:656809) 2 208
Focub_II5_mimp_2__contig_1.47(41315:41540) 2 223
Focub_II5_mimp_3__contig_1.65(13656:13882) 2 224
Focub_II5_mimp_4__contig_1.70(61591:61809) 2 216
This is example input:
>Focub_II5_mimp_1__contig_1.16(656599:656809)
TACAGTGGGATGCAAAAAGTATTCGCAGGTGTGTAGAGAGATTTGTTGCTCGGAAGCTAGTTAGGTGTAGCTTGTCAGGTTCTCAGTACCCTATATTACACCGAGATCAGCGGGATAATCTAGTCTCGAGTACATAAGCTAAGTTAAGCTACTAACTAGCGCAGCTGACACAACTTACACACCTGCAAATACTTTTTGCATCCCACTGTA
>Focub_II5_mimp_2__contig_1.47(41315:41540)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTCTGCCGCTAGCCCATTTTAACAGCTAGAGTGTGTATATTAACCTCACACATAGCTATCTCTTATACTAATTGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTGTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_3__contig_1.65(13656:13882)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTTCTGCCGCTAGCCTATTTTAATAGTTAGAGTGTGCATATTAACCTCACACATAGCTATCTTATATACTAATCGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTCTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_4__contig_1.70(61591:61809)
TACAGTGGGATGCAATAAGTTTGAATGCAGGCTGAAGTACCAGCTGTTGTAATCTAGCTCCTGTATACAACGCTTTAGCTTGATAAAGTAAGCGCTAAGCTGTATCAGGCAAAAGGCTATCCCGATTGGGGTATTGCTACGTAGGGAACTGGTCTTACCTTGGTTAGTCAGTGAATGTGTACTTGAGTTTGGATTCAAACTTATTGCATCCCACTGTA
Is anybody able to help?
Thank you :)
To write a line to a file you would do something like this:
with open("file.txt", "a") as f:
    print("new line", file=f)
If you want it tab-separated you can also add sep="\t"; this is why Python 3 made print a function, so you can use the sep, end, file, and flush keyword arguments. :)
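For example, a sketch using the question's own fields (this assumes it runs inside the question's loop, where sequence, h, and h_rc are defined):
with open("Mimp_hits.bed", "a") as f:
    # sep="\t" joins the values with tabs; file=f sends the line to the file
    print(sequence.description, h.start(), h_rc.end(), sep="\t", file=f)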
Opening a file for appending means the file pointer starts at the end of the file, so writing to it doesn't overwrite any data (it gets appended to the end), and iterating over it (or otherwise reading from it) yields nothing, as if you had already reached the end of the file.
So instead of iterating over the lines of the file you would just write the single line to it:
with open("Mimp_hits.bed", "a") as file_object:
print(sequence.description, h.start(), h_rc.end(), file=file_object)
You can also consider opening the file near the beginning of the loop, since opening it once and writing multiple times is more efficient than opening it multiple times. The with block also closes the file automatically, so there is no need to do that explicitly.
You are trying to open the file in "a+" mode, and loop over lines from it (which will not find anything because the file is positioned at the end when you do that). In any case, if this is an output file only, then you would open it in "a" mode to append to it.
Probably you just want to open the file once for appending, and inside the with statement, do your main loop, using file_object.write(...) when you want to actually append strings to the file. Note that there is no need for file_object.close() when using this with construct.
with open("Mimp_hits.bed", "a") as file_object:
for sequence in SeqIO.parse(infile, "fasta"):
# ... etc per original code ...
if length < mimp_length:
file_object.write("{}\t{}\t{}\n".format(
sequence.description, h.start(), h_rc.end()))

python search for string in file return entire line + next line into new text file

I have a very large text file (50,000+ lines) that should always be in the same sequence. In Python I want to search the text file for each of the $INGGA lines and join each one with the subsequent $INHDT line to create a new text file. I need to do this without reading the whole file into memory, as that causes a crash every time. I can find and return the $INGGA line, but I'm not sure of the best way to then get the next line and join the two into a new string in a memory-efficient way.
Thanks
Phil
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2016.05.06 09:11:34 =~=~=~=~=~=~=~=~=~=~=~= > $PRDID,2.15,-0.10,31.87*6E
$INGGA,091124.00,5249.8336,N,00120.9619,W,1,20,0.6,95.0,M,49.4,M,,*50
$INHDT,31.9,T*1E $INZDA,091124.0055,06,05,2016,,*7F
$INVTG,22.0,T,,M,4.4,N,8.1,K,A*24 $PRDID,2.13,-0.06,34.09*6C
$INGGA,091124.20,5249.8338,N,00120.9618,W,1,20,0.6,95.0,M,49.4,M,,*5D
$INHDT,34.1,T*13 $INZDA,091124.2055,06,05,2016,,*7D
$INVTG,24.9,T,,M,4.4,N,8.1,K,A*2B $PRDID,2.16,-0.03,36.24*61
$INGGA,091124.40,5249.8340,N,00120.9616,W,1,20,0.6,95.0,M,49.4,M,,*5A
$INHDT,36.3,T*13 $INZDA,091124.4055,06,05,2016,,*7B
$INVTG,27.3,T,,M,4.4,N,8.1,K,A*22 $PRDID,2.11,-0.05,38.33*68
$INGGA,091124.60,5249.8343,N,00120.9614,W,1,20,0.6,95.1,M,49.4,M,,*58
$INHDT,38.4,T*1A $INZDA,091124.6055,06,05,2016,,*79
$INVTG,29.5,T,,M,4.4,N,8.1,K,A*2A $PRDID,2.09,-0.02,40.37*6D
$INGGA,091124.80,5249.8345,N,00120.9612,W,1,20,0.6,95.1,M,49.4,M,,*56
$INHDT,40.4,T*15 $INZDA,091124.8055,06,05,2016,,*77
$INVTG,31.7,T,,M,4.4,N,8.1,K,A*21 $PRDID,2.09,0.02,42.42*40
$INGGA,091125.00,5249.8347,N,00120.9610,W,1,20,0.6,95.1,M,49.4,M,,*5F
$INHDT,42.4,T*17
You can just read the file line by line and write the matching lines to another new file.
Like this:
import re

# open new file with append
nf = open('newfile', 'at')
# open file with read
with open('file', 'rt') as f:
    found_ingga = False
    for line in f:
        if re.match(r'\$INGGA', line):
            nf.write(line)
            found_ingga = True
        elif found_ingga and re.match(r'\$INHDT', line):
            # write the $INHDT line that follows the matched $INGGA
            nf.write(line)
            found_ingga = False
nf.close()
Use 'at' mode to append and 'rt' mode to read line by line.
I have a 150,000-line file, and it runs well!
I suggest using a simple regex that will parse and capture the parts you care about. Here is an example that will capture the piece you care about:
(\$INGGA.*\n\$INHDT.*\n)
https://regex101.com/r/tK1hF0/3
As in my above link, you'll notice that I used the "global" g setting on the regex, telling it to capture all groups that match. Otherwise, it'll stop after the first match.
I also had trouble determining where the actual line breaks exist in your above example file, so you can tweak the above to match exactly where the breaks occur.
Here is some starter python example code:
import re
test_str = # load your file here
p = re.compile(ur'(\$INGGA.*\n\$INHDT.*\n)')
matches = re.findall(p, test_str)
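One possible way to finish that starter code, as a sketch; the file names here are placeholders, and note that this reads the whole file into memory, which the question was trying to avoid:
import re

# load the whole log as one string (memory-heavy for very large files)
with open('large_file.txt') as f:
    test_str = f.read()

p = re.compile(r'(\$INGGA.*\n\$INHDT.*\n)')
matches = re.findall(p, test_str)

# write each captured $INGGA/$INHDT pair to a new file
with open('pairs.txt', 'w') as out:
    for m in matches:
        out.write(m)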
In the example PuTTY log you give, it's all on one line, separated with spaces.
So in this case you can use this to replace the spaces with new lines and get a new file -
cat large_file | sed 's/ /\n/g' > new_large_file
To iterate over the file separated with new line, run this -
cat new_large_file | python your_script.py
Your script gets the input line by line, so your computer should not crash.
your_script.py -
import sys

INGGA_line = ""
for line in sys.stdin:
    line_striped = line.strip()
    if line_striped.startswith("$INGGA"):
        INGGA_line = line_striped
    elif line_striped.startswith("$INHDT"):
        # print the stored $INGGA line together with its $INHDT line
        print INGGA_line, line_striped
    else:
        print line_striped
This answer is aimed at python 3.
According to this other answer (and the docs), you can iterate your file line-by-line memory-efficiently:
with open(filename, 'r') as f:
    for line in f:
        ...process...
An example of how you could fulfill your above criteria could be
# Target file write-only, source file read-only
with open(targetfile, 'w') as tf, open(sourcefile, 'r') as sf:
    # Flag for whether we are looking for 1st or 2nd part
    look_for_ingga = True
    for line in sf:
        if look_for_ingga:
            if line.startswith('$INGGA,'):
                tf.write(line)
                look_for_ingga = False
        elif line.startswith('$INHDT,'):
            tf.write(line)
            look_for_ingga = True
In the case where you have multiple '$INGGA,' prior to the '$INHDT,', this grabs the first one and disregards the rest. In case you want to take only the last '$INGGA,' before the '$INHDT,', store the last '$INGGA,' in a variable instead of writing it to disk. Then, when you find your '$INHDT,', store both.
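A minimal sketch of that last-'$INGGA,' variant, reusing tf and sf from the with statement above:
last_ingga = None
for line in sf:
    if line.startswith('$INGGA,'):
        last_ingga = line  # overwrite: only the most recent $INGGA is kept
    elif line.startswith('$INHDT,') and last_ingga is not None:
        tf.write(last_ingga)  # store both parts of the pair together
        tf.write(line)
        last_ingga = None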
In case you meant that you want to write to a separate new file for each INGGA-INHDT pair, the target file with-statement should be nested inside for line in sf instead, or the results should be buffered in a list for later storage.
Refer to the docs for introductions to with-statements and file reading/writing.

How to take input of a directory

What I'm trying to do is trawl through a directory of log files which begin like this: "filename001.log". There can be hundreds of files in a directory.
The code I want to run against each file is to check to make sure that the 8th position of the log always contains a number. I have a suspicion that a non-digit is throwing off our parser. Here's some simple code I'm trying to check this with:
# import re
from urlparse import urlparse

a = '/folderA/filename*.log'  # << currently this only does 1 file
b = '/folderB/'  # << I'd like it to write the same file name as it read

with open(b, 'w') as newfile, open(a, 'r') as oldfile:
    data = oldfile.readlines()
    for line in data:
        parts = line.split()
        status = parts[8]  # value of 8th position in the log file
        isDigit = status.isdigit()
        if isDigit == False:
            print " Not A Number :", status
            newfile.write(status)
My problem is:
How do I tell it to read all the files in a directory? (The above really only works for 1 file at a time)
If I find something that is not a number I would like to write that character into a file in a different folder but with the same name as the log file. For example, if I find that filename002.log has a "*" in one of the log lines, I would like folderB/filename002.log to be made and the non-digit character written to it.
Sounds simple enough; I'm just not very good at coding.
To read files in one directory matching a given pattern and write to another, use the glob module and the os.path functions to construct the output files:
import glob
import os

srcpat = '/folderA/filename*.log'
dstdir = '/folderB'

for srcfile in glob.iglob(srcpat):
    if not os.path.isfile(srcfile):
        continue
    dstfile = os.path.join(dstdir, os.path.basename(srcfile))
    with open(srcfile) as src, open(dstfile, 'w') as dst:
        for line in src:
            parts = line.split()
            status = parts[8]  # value of 8th position in the log file
            if not status.isdigit():
                print " Not A Number :", status
                dst.write(status)  # Or print >>dst, status if you want a newline
This will create empty files even if no bad entries are found. You can either wait until you're finished processing each file (when the with block has closed), check the output file's size, and delete it if it's empty; or you can move to a lazy approach: unconditionally delete the output file before beginning iteration, but don't open it. Only if you get a bad value do you open the file (for append instead of write, to keep earlier loops' output from being discarded), write to it, and allow it to close.
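A rough sketch of that lazy approach, reusing the names from the code above:
import glob
import os

srcpat = '/folderA/filename*.log'
dstdir = '/folderB'

for srcfile in glob.iglob(srcpat):
    if not os.path.isfile(srcfile):
        continue
    dstfile = os.path.join(dstdir, os.path.basename(srcfile))
    # delete any stale output up front; it is recreated only on demand
    if os.path.exists(dstfile):
        os.remove(dstfile)
    with open(srcfile) as src:
        for line in src:
            status = line.split()[8]
            if not status.isdigit():
                # open for append only when a bad value appears, so clean
                # inputs never produce an output file at all
                with open(dstfile, 'a') as dst:
                    dst.write(status + '\n')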
Import os and use: for filename in os.listdir('path'):. This will list every entry in the directory, including the names of any subdirectories.
Simply open a second file with the correct path. Since you already have the file name from iterating with the above method, you only have to replace the directory; you can use os.path.join for that.
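A short sketch of that listdir approach, assuming the question's folder names:
import os

srcdir = '/folderA'
dstdir = '/folderB'

for filename in os.listdir(srcdir):
    srcfile = os.path.join(srcdir, filename)
    # skip subdirectory entries and anything that isn't a .log file
    if not os.path.isfile(srcfile) or not filename.endswith('.log'):
        continue
    # same file name, different directory
    dstfile = os.path.join(dstdir, filename)
    # ... open srcfile and dstfile and check the 8th field as before ...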

Python unable to read lines from text file after read()

I have a problem whereby I am trying to first check a text file for the existence of a known string, and based on this, loop over the file and insert a different line.
For some reason, after calling file.read() to check for the test string, the for loop appears not to work. I have tried calling file.seek(0) to get back to the start of the file, but this has not helped.
My current code is as follows:
try:
    f_old = open(text_file)
    f_new = open(text_file + '.new', 'w')
except:
    print 'Unable to open text file!'
    logger.info('Unable to open text file, exiting')
    sys.exit()

wroteOut = False
# first check if file contains a test string
if '<dir>' in f_old.read():
    #f_old.seek(0) # <-- do we need to do this??
    for line in f_old: # loop thru file
        print line
        if '<test string>' in line:
            line = ' <found the test string!>'
        if '<test string2>' in line:
            line = ' <found test string2!>'
        f_new.write(line) # write out the line
    wroteOut = True # set flag so we know it worked
f_new.close()
f_old.close()
You already know the answer:
#f_old.seek(0) # <-- do we need to do this??
Yes, you need to seek back to the start of the file before you can read the contents again.
All file operations work with the current file position. Using file.read() reads all of the file, leaving the current position set to the end of the file. If you wanted to re-read data from the start of the file, a file.seek(0) call is required. The alternatives are to:
Not read the file again: you just read all of the data, so use that information instead. File operations are slow; using the same data from memory is much, much faster:
contents = f_old.read()
if '<dir>' in contents:
    for line in contents.splitlines():
        # ....
Re-open the file. Opening a file in read mode puts the current file position back at the start.
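A minimal sketch of that re-open alternative, keeping the question's names:
# read once to check for the test string
f_old = open(text_file)
found = '<dir>' in f_old.read()
f_old.close()

if found:
    # a freshly opened handle starts at position 0 again
    f_old = open(text_file)
    for line in f_old:
        print line  # ... replace with the real processing ...
    f_old.close()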
