Python script finding text in a file and doing checks on it - python

I have a file called parks. I read it and save its contents in parks_f. Now I want to find a string in that text, print everything after it up to the next ',', and run some checks on the result.
parks_f = open(parks).read()
find_str = "text_to_be_found=" in parks_f
if (find_str == (global_var_P) or find_str == (string_const))
# do this
else
#do this
How can I do this in python?

with open('input.txt', 'r') as f:
    x = f.read()
x = x.split(',')
print(x)
The code above opens a file and reads it. In Python, a file's read() returns the contents as one string. The next line splits that string on ',', so x is now a list. You can do anything you like with that list, the same as with any other Python list.
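To get only the value that follows a marker such as text_to_be_found= up to the next comma, str.find can locate both positions. This is a minimal sketch, assuming the file contents are already in a string; the helper name value_after, the sample text, and the values it is compared against are made up for illustration:

```python
def value_after(text, marker, stop=','):
    """Return the text between `marker` and the next `stop`, or None if `marker` is absent."""
    start = text.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = text.find(stop, start)
    if end == -1:          # no stop character: take the rest of the text
        end = len(text)
    return text[start:end]

# Hypothetical file contents and comparison values.
parks_f = "name=central,text_to_be_found=oak,area=12"
global_var_P = "oak"
string_const = "pine"

found = value_after(parks_f, "text_to_be_found=")
print(found)
if found in (global_var_P, string_const):
    print("matched")
else:
    print("no match")
```

Returning None when the marker is missing keeps "not found" separate from "found but empty".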

Related

Taking information by line from a file to tuple

I'm writing a Python decrypting program for a school project.
First of all, I have a function that takes a file name as its argument. I must read the file line by line and return a tuple.
The file contains 3 things: a number (whatever it is), the decrypted text, and the encrypted text.
import sys
fileName = sys.argv[-1]
def load_data(fileName):
    tuple = ()
    data = open(fileName, 'r')
    content = data.readlines()
    for i in content:
        tuple += (i,)
    return tuple #does nothing why?
    print(tuple)
load_data(fileName)
Output:
('13\n', 'mecanisme chiffres substituer\n', "'dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr'")
Output needed:
(13,'mecanisme chiffres substituer','dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr')
The tuple needs to look like (count, word_list, crypted), with 13 as count and so on.
If someone can help me, that would be great.
Sorry if I'm asking my question the wrong way.
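You could try this to strip the '\n' characters and the quotes from the ends of each line:
import sys
fileName = sys.argv[-1]
def load_data(fileName):
    result = ()  # avoid naming a variable `tuple`, which hides the built-in
    with open(fileName, 'r') as data:  # the file is closed when the block ends
        content = data.readlines()
    for i in content:
        # remove spaces, newlines and quote characters from both ends
        result += (i.strip(''' \n'"'''),)
    return result
print(load_data(fileName))
Note that a function ends as soon as it reaches a return statement. If you want to print the value of the tuple, either do it before the return statement or print the returned value.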
I am a little confused about what the file in question looks like, but from what I could infer from the output you got the file appears to be something like this:
some number
decrypted text
encrypted text
If so, the most straightforward way to do this would be
with open('lines.txt','r') as f:
    all_the_text = f.read()
list_of_text = all_the_text.split('\n')
tuple_of_text = tuple(list_of_text)
print(tuple_of_text)
Explanation:
The open built-in function creates an object that allows you to interact with the file. We use open with the argument 'r' to let it know we only want to read from the file. Doing this within a with statement ensures that the file gets closed properly when you are done with it. The as keyword followed by f tells us that we want the file object to be placed into the variable f. f.read() reads in all of the text in the file. String objects in python contain a split method that will place strings separated by some delimiter into a list without placing the delimiter into the separated strings. The split method will return the results in a list. To put it into a tuple, simply pass the list into tuple.
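The output the question asks for also needs the first item as an integer and the quotes around the last item removed. This is a rough sketch of one way to get there, assuming the file always has exactly the three lines described; the temporary file and its shortened contents are made up for the demonstration:

```python
import os
import tempfile

def load_data(file_name):
    """Return (count, word_list, crypted) from a three-line file."""
    with open(file_name, 'r') as f:
        # splitlines() drops the line endings and leaves no trailing empty item
        lines = f.read().splitlines()
    count = int(lines[0])                   # first line: the number
    word_list = lines[1].strip()            # second line: decrypted words
    crypted = lines[2].strip().strip("'")   # third line: remove surrounding quotes
    return count, word_list, crypted

# Small made-up file for demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("13\nmecanisme chiffres substituer\n'dmnucmnn gmnu'\n")
result = load_data(tmp.name)
os.remove(tmp.name)
print(result)
```

Unlike split('\n'), splitlines() does not produce an empty last item when the file ends with a newline.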

Same value in list keeps getting repeated when writing to text file

I'm a total noob to Python and need some help with my code.
The code is meant to take Input.txt [http://pastebin.com/bMdjrqFE], split it into separate Pokemon (in a list), and then split each of those into separate values, which I use to reformat the data and write it to Output.txt.
However, when I run the program, only the last Pokemon gets output, 386 times. [http://pastebin.com/wkHzvvgE]
Here's my code:
f = open("Input.txt", "r")#opens the file (input.txt)
nf = open("Output.txt", "w")#opens the file (output.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
num = 0
tab = """ """
newl = """NEWL
"""
slash = "/"
while num != 386:
    current = pokeData
    current.append(line)
    print current[num]
    for tab in current:
        words = tab.split()
        print words
    for newl in words:
        nf.write('%s:{num:%s,species:"%s",types:["%s","%s"],baseStats:{hp:%s,atk:%s,def:%s,spa:%s,spd:%s,spe:%s},abilities:{0:"%s"},{1:"%s"},heightm:%s,weightkg:%s,color:"Who cares",eggGroups:["%s"],["%s"]},\n' % (str(words[2]).lower(),str(words[1]),str(words[2]),str(words[3]),str(words[4]),str(words[5]),str(words[6]),str(words[7]),str(words[8]),str(words[9]),str(words[10]),str(words[12]).replace("_"," "),str(words[12]),str(words[14]),str(words[15]),str(words[16]),str(words[16])))
    num = num + 1
nf.close()
f.close()
There are quite a few problems with your program, starting with the file reading.
To read the lines of a file into a list you can use file.readlines().
So instead of
f = open("Input.txt", "r")#opens the file (input.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
You can just do this
pokeData = open("Input.txt", "r").readlines() # This returns a list with one string per line.
Next, you are misunderstanding the uses of for and while.
A for loop in Python is designed to iterate through a list or other sequence, as shown below. I don't know what you were trying to do with for newl in words: a for loop creates a new variable (replacing any existing variable of that name, such as your newl and tab strings) and sets it to each item of the sequence in turn. Refer below.
array = ["one", "two", "three"]
for i in array: # i is created
    print (i)
The output will be:
one
two
three
So to fix a lot of this code you can replace the whole while loop with something like this.
(The code below assumes your input file is formatted so that the values on each line are separated by tabs.)
for line in pokeData:
    words = line.rstrip('\n').split('\t') # Drop the line ending, then split the line by tabs
    nf.write('your very long and complicated string')
Other helpers
The formatted string that you write to the output file looks very similar to JSON. Python has a built-in module called json that can convert a native Python dict to a JSON string. This will probably make things a lot easier for you, but either way works.
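To show the json suggestion in practice, here is a small sketch that builds one entry as a dict and lets json.dumps do the quoting and punctuation. The sample line is made up, and the column positions are only an assumption copied from the indexes used in the question's format string; the real Input.txt may differ:

```python
import json

# Hypothetical whitespace-separated line; the real Input.txt may differ.
line = "x 1 Bulbasaur Grass Poison 45 49 49 65 65 45"
words = line.split()

entry = {
    "num": int(words[1]),
    "species": words[2],
    "types": [words[3], words[4]],
    "baseStats": dict(zip(["hp", "atk", "def", "spa", "spd", "spe"],
                          map(int, words[5:11]))),
}
# json.dumps handles the quoting, brackets and commas.
text = json.dumps({words[2].lower(): entry})
print(text)
```

Note that json.dumps quotes every key, so the result is valid JSON rather than the unquoted-key style in the question.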
Hope this helps

Appending lines to a file, then reading them

I want to append or write multiple lines to a file. I believe the following code appends one line:
with open(file_path,'a') as file:
    file.write('1')
My first question: if I do this:
with open(file_path,'a') as file:
    file.write('1')
    file.write('2')
    file.write('3')
Will it create a file with the following content?
1
2
3
Second question—if I later do:
with open(file_path,'r') as file:
    first = file.read()
    second = file.read()
    third = file.read()
Will that read the content to the variables so that first will be 1, second will be 2 etc? If not, how do I do it?
Question 1: No.
file.write simply writes whatever you pass to it at the current position in the file. file.write("Hello "); file.write("World!") will produce a file with the contents "Hello World!"
You can write a whole line either by appending a newline character ("\n") to each string to be written, or by using the print function's file keyword argument (which I find to be a bit cleaner):
with open(file_path, 'a') as f:
    print('1', file=f)
    print('2', file=f)
    print('3', file=f)
N.B. Writing to a file doesn't add a newline by itself; it is print that appends one by default. print('1', file=f, end='') is identical to f.write('1').
Question 2: No.
file.read() reads the whole file, not one line at a time. With the file written by the print calls above, you'll get
first == "1\n2\n3\n"
second == ""
third == ""
This is because after the first call to file.read(), the pointer is set to the end of the file. Subsequent calls try to read from the pointer to the end of the file. Since they're in the same spot, you get an empty string. A better way to do this would be:
with open(file_path, 'r') as f: # `file` is a bad variable name since it shadows the class
    lines = f.readlines()
first = lines[0]
second = lines[1]
third = lines[2]
Or:
with open(file_path, 'r') as f:
    first, second, third = f.readlines() # fails if there aren't exactly 3 lines
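The pointer behaviour described above can be checked without touching the disk. This is a small sketch that uses io.StringIO as a stand-in for a real file (an assumption made only for illustration); seek(0) moves the position back to the start:

```python
import io

# io.StringIO stands in for a real file opened with open(file_path, 'r').
f = io.StringIO("1\n2\n3\n")

first = f.read()      # everything; the position is now at the end
second = f.read()     # nothing left to read
f.seek(0)             # move back to the start
third = f.readline()  # a single line, including its newline

print(repr(first), repr(second), repr(third))
```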
The answer to the first question is no. You're writing individual characters. You would have to read them out individually.
Also, note that file.read() returns the full contents of the file.
If you wrote individual characters and you want to read individual characters, process the result of file.read() as a string.
text = open(file_path).read()
first = text[0]
second = text[1]
third = text[2]
As for the second question, you should write newline characters, '\n', to terminate each line that you write to the file.
with open(file_path, 'w') as out_file:
    out_file.write('1\n')
    out_file.write('2\n')
    out_file.write('3\n')
To read the lines, you can use file.readlines().
lines = open(file_path).readlines()
first = lines[0] # -> '1\n'
second = lines[1] # -> '2\n'
third = lines[2] # -> '3\n'
If you want to get rid of the newline character at the end of each line, use strip(), which discards all whitespace before and after a string. For example:
first = lines[0].strip() # -> '1'
Better yet, you can use map to apply strip() to every line.
lines = list(map(str.strip, open(file_path).readlines()))
first = lines[0] # -> '1'
second = lines[1] # -> '2'
third = lines[2] # -> '3'
Writing multiple lines to a file
This will depend on how the data is stored. For writing individual values, your current example is:
with open(file_path,'a') as file:
    file.write('1')
    file.write('2')
    file.write('3')
The file will contain the following:
123
It will also contain whatever contents it had previously, since it was opened in append mode. To write newlines, you must add them explicitly; writelines() accepts an iterable of strings but does not add line endings for you.
Also, I don't recommend using file as a variable name since it shadows the built-in file type in Python 2, so I will use f from here on out.
For instance, here is an example where you have a list of values that you write using write() and explicit newline characters:
my_values = ['1', '2', '3']
with open(file_path,'a') as f:
    for value in my_values:
        f.write(value + '\n')
But a better way would be to use writelines(). Since it does not add newlines itself, you can append them with a list comprehension:
my_values = ['1', '2', '3']
with open(file_path,'a') as f:
    f.writelines([value + '\n' for value in my_values])
If you are looking for printing a range of numbers, you could use a for loop with range (or xrange if using Python 2.x and printing a lot of numbers).
Reading individual lines from a file
To read individual lines from a file, you can also use a for loop:
my_list = []
with open(file_path,'r') as f:
    for line in f:
        my_list.append(line.strip()) # strip out newline characters
This way you can iterate through the lines of the file returned with a for loop (or just process them as you read them, particularly if it's a large file).
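Putting the two halves together, here is a short round-trip sketch: it writes the values with writelines() and reads them back by iterating over the file. The temporary path created with tempfile is only a stand-in for file_path:

```python
import os
import tempfile

my_values = ['1', '2', '3']

# A temporary path stands in for file_path.
fd, file_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(file_path, 'a') as f:
    # writelines() adds no line endings, so each value gets its own '\n'
    f.writelines(value + '\n' for value in my_values)

with open(file_path, 'r') as f:
    my_list = [line.strip() for line in f]

os.remove(file_path)
print(my_list)
```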

Find a string and insert text after it in Python

I am still a learner in Python. I have not been able to find a specific string in a file and insert multiple strings after it. I want to search for the line in the file and insert the content passed to the write call below.
I have tried the following, which inserts at the end of the file instead.
line = '<abc hij kdkd>'
dataFile = open('C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml', 'a')
dataFile.write('<!--Delivery Date: 02/15/2013-->\n<!--XML Script: 1.0.0.1-->\n')
dataFile.close()
You can use fileinput to modify the file in place and re to search for a particular pattern:
import fileinput
import re
import sys

def modify_file(file_name, pattern, value=""):
    # with inplace=True, whatever is written to stdout replaces the file's contents
    fh = fileinput.input(file_name, inplace=True)
    for line in fh:
        # keep the matched text and append value right after it
        line = re.sub(pattern, lambda m: m.group(0) + value, line)
        sys.stdout.write(line)
    fh.close()
You can call this function something like this:
modify_file("C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml",
            "abc..",
            "!--Delivery Date:")
Python strings are immutable, which means that you wouldn't actually modify the input string; you would create a new one which has the first part of the input string, then the text you want to insert, then the rest of the input string.
You can use the find method on Python strings to locate the text you're looking for:
def insertAfter(haystack, needle, newText):
    """ Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    if i == -1:  # needle not found: return the string unchanged
        return haystack
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
You could use it like
print(insertAfter("Hello World", "lo", " beautiful")) # prints 'Hello beautiful World'
Here is a suggestion for dealing with files. I assume the pattern you are searching for is a whole line (there is nothing else on the line, and the pattern fits on one line).
line = ... # What to match
input_filepath = ... # input full path
output_filepath = ... # output full path (must be different than input)
encoding = ... # encoding of both files
all_data_to_insert = ... # text to write after the matching line
with open(input_filepath, "r", encoding=encoding) as fin, \
     open(output_filepath, "w", encoding=encoding) as fout:
    pattern_found = False
    for theline in fin:
        # Write input to output unmodified
        fout.write(theline)
        # if you want to get rid of spaces
        theline = theline.strip()
        # Find the matching pattern
        if not pattern_found and theline == line:
            # Insert extra data in output file
            fout.write(all_data_to_insert)
            pattern_found = True
    # Final check
    if not pattern_found:
        raise RuntimeError("No data was inserted because line was not found")
This code is for Python 3; some modifications may be needed for Python 2, especially the with statement (see contextlib.nested). If your pattern fits on one line but is not the entire line, you may use "line in theline" instead of "theline == line". If your pattern can spread over more than one line, you need a stronger algorithm. :)
To write to the same file, you can write to another file and then move the output file over the input file. I didn't plan to release this code, but I was in the same situation some days ago. So here is a class that inserts content in a file between two tags and supports writing to the input file: https://gist.github.com/Cilyan/8053594
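The "write to another file, then move it over the input" idea can be sketched with the standard library alone. This is not the code from the linked gist; the function name insert_after_line and the demo file contents are made up, and it assumes the target is a whole line in a file whose lines end with a newline:

```python
import os
import tempfile

def insert_after_line(path, target, new_text):
    """Insert `new_text` after the first line equal to `target`; return True if inserted."""
    directory = os.path.dirname(os.path.abspath(path))
    inserted = False
    # The temporary file lives next to the original so os.replace stays on one filesystem.
    with open(path, 'r') as fin, \
         tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as fout:
        for current in fin:
            fout.write(current)
            if not inserted and current.strip() == target:
                fout.write(new_text)
                inserted = True
    if inserted:
        os.replace(fout.name, path)  # swap the new file in
    else:
        os.remove(fout.name)         # nothing changed, discard the copy
    return inserted

# Made-up demo file.
fd, demo_path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'w') as demo:
    demo.write("<root>\n<abc hij kdkd>\n</root>\n")

ok = insert_after_line(demo_path, '<abc hij kdkd>', '<!--XML Script: 1.0.0.1-->\n')
with open(demo_path) as check:
    result = check.read()
os.remove(demo_path)
print(result)
```

The original file is only replaced once the copy has been written completely, so an error part-way through leaves it untouched.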
Frerich Raabe...it worked perfectly for me...good one...thanks!!!
import shutil

def insertAfter(haystack, needle, newText):
    #""" Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    if i == -1:  # lines that do not contain needle are returned unchanged
        return haystack
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
# sddraft holds the path of the file to read
with open(sddraft) as f1:
    tf = open("<path to your file>", 'a+')
    # Read Lines in the file and replace the required content
    for line in f1.readlines():
        build = insertAfter(line, "<string to find in your file>", "<new value to be inserted after the string is found in your file>") # inserts value
        tf.write(build)
    tf.close()
# f1 is closed automatically at the end of the with block
shutil.copy("<path to the source file --> tf>", "<path to the destination where tf needs to be copied with the file name>")
Hope this helps someone:)

.split() creating a blank line in python3

I am trying to convert a 'fastq' file into a tab-delimited file using python3.
Here is the input (lines 1-4 are one record, which I need to print in tab-separated format). I am trying to read each record into a list object:
#SEQ_ID
GATTTGGGGTT
+
!''*((((***
#SEQ_ID
GATTTGGGGTT
+
!''*((((***
using this:
data = open('sample3.fq')
fq_record = data.read().replace('#', ',#').split(',')
for item in fq_record:
    print(item.replace('\n', '\t').split('\t'))
Output is:
['']
['#SEQ_ID', 'GATTTGGGGTT', '+', "!''*((((***", '']
['#SEQ_ID', 'GATTTGGGGTT', '+', "!''*((((***", '', '']
I am getting a blank line at the beginning of the output, and I do not understand why.
I am aware that this can be done in many other ways, but I need to figure out the reason as I am learning Python.
Thanks
When you replace # with ,#, you put a comma at the beginning of the string (since it starts with #). Then when you split on commas, there is nothing before the first comma, so this gives you an empty string in the split. What happens is basically like this:
>>> print(',x'.split(','))
['', 'x']
If you know your data always begins with #, you can just skip the empty record in your loop. Just do for item in fq_record[1:].
You can also go line-by-line without all the replacing:
import io
fobj = io.StringIO("""#SEQ_ID
GATTTGGGGTT
+
!''*((((***
#SEQ_ID
GATTTGGGGTT
+
!''*((((***""")
data = []
entry = []
for raw_line in fobj:
    line = raw_line.strip()
    if line.startswith('#'):
        # a new record starts: store the previous one, if any
        if entry:
            data.append(entry)
        entry = []
    entry.append(line)
data.append(entry)
data looks like this:
[['#SEQ_ID', 'GATTTGGGGTT', '+', "!''*((((***"],
['#SEQ_ID', 'GATTTGGGGTT', '+', "!''*((((***"]]
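Since the goal in the question is a tab-delimited file, another option is to group the lines in fours and join each group with tabs. This is a sketch only: it assumes every record is exactly four lines long, and io.StringIO stands in for open('sample3.fq'):

```python
import io

# io.StringIO stands in for open('sample3.fq'); records are assumed to be four lines each.
fobj = io.StringIO("#SEQ_ID\nGATTTGGGGTT\n+\n!''*((((***\n"
                   "#SEQ_ID\nGATTTGGGGTT\n+\n!''*((((***\n")

lines = [line.strip() for line in fobj if line.strip()]  # drop blank lines
records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
rows = ['\t'.join(record) for record in records]

for row in rows:
    print(row)
```

Because nothing is split on a delimiter placed at the start of the text, no empty first item appears.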
Thank you all for your answers. As a beginner, my main problem was the occurrence of a blank line upon .split(',') which I have now understood conceptually. So my first useful program in python is here:
# this script converts a .fastq file in to .fasta format
import sys
# Usage statement:
print('\nUsage: fq2fasta.py input-file output-file\n=========================================\n\n')
# define a function for fasta formating
def format_fasta(name, sequence):
    fasta_string = '>' + name + "\n" + sequence + '\n'
    return fasta_string
# open the file for reading
data = open(sys.argv[1])
# open the file for writing
fasta = open(sys.argv[2], 'wt')
# feed all fastq records in to a list
fq_records = data.read().replace('#', ',#').split(',')
# iterate through list objects
for item in fq_records[1:]: # skip the first item, which .split() produces as an empty string
    line = item.replace('\n', '\t').split('\t')
    name = line[0]
    sequence = line[1]
    fasta.write(format_fasta(name, sequence))
fasta.close()
data.close()
Other things suggested in the answers would be more clear to me as I learn more.
Thanks again.
