Python compare and output to new file

I have a Python script that does the following, in this order:
1. Takes an argument (in this case a filename) and removes all characters other than a-z, A-Z, 0-9 and the period '.'.
2. Strips out all information from the new file except the IP addresses, which are later compared to a watchlist.
3. Cleans up the file and saves it as a new file to be compared to the watchlist.
4. Finally, compares this cleaned-up file (ip_list_clean) to the watchlist file and outputs matching lines to a new file (malicious_ips).
It is step 4 I am struggling with. The following code works up until step 4, which stops the rest from working:
#!/usr/bin/python
import re
import sys
import cgi
# Compare the cleaned up list of IPs against the botwatch
# list and output the results to a new file.
new_list = set()
outfile = open("final_downloads/malicious_ips", "w")
for line in open("final_downloads/ip_list_clean", "r")
    if line in open("/var/www/botwatch.txt", "r")
        outfile.write(line)
        new_list.add(line)
outfile.close()
Any ideas as to why the last section does not work? In fact, it stops the whole thing from working.

The for and if lines in the last section are missing their trailing colons. That is a syntax error, so Python refuses to run any part of the script. Try this:
new_list = set()
outfile = open("final_downloads/malicious_ips", "w")
for line in open("final_downloads/ip_list_clean", "r"):
    if line in open("/var/www/botwatch.txt", "r"):
        outfile.write(line)
        new_list.add(line)
outfile.close()
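The loop above reopens and rescans /var/www/botwatch.txt for every line of the cleaned list. A minimal sketch of an alternative, assuming both files hold one IP address per line: read the watchlist into a set once and test membership against it. The function name find_matches and its parameters are hypothetical and not part of the original script. Unlike the original loop, this version also writes each matching address only once.

```python
def find_matches(clean_path, watchlist_path, out_path):
    """Write every IP in clean_path that also appears in watchlist_path.

    Hypothetical helper: assumes one IP address per line in both files.
    """
    # Load the watchlist once; strip() drops newlines and stray spaces
    with open(watchlist_path) as watch:
        watchlist = {line.strip() for line in watch if line.strip()}

    matches = set()
    with open(clean_path) as clean, open(out_path, "w") as out:
        for line in clean:
            ip = line.strip()
            # Write each matching address only the first time it is seen
            if ip in watchlist and ip not in matches:
                out.write(ip + "\n")
                matches.add(ip)
    return matches
```

With the paths from the question, it could be called as find_matches("final_downloads/ip_list_clean", "/var/www/botwatch.txt", "final_downloads/malicious_ips").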

Related

Python adding a string leaves extra characters

If you need any more info, just let me know.
I have a Python script that appends a string to each line of a CSV file. The line file_lines = [''.join([x.strip(), string_to_add, '\n']) for x in f.readlines()] is the troublemaker: for each line of the file it appends the string and then a newline.
Here is the script:
#Adding .JPG string to the end of each line for the Part Numbers
string_to_add = ".JPG"
#Open the file and join the .JPG to the current lines
with open("PartNums.csv", 'r') as f:
    file_lines = [''.join([x.strip(), string_to_add, '\n']) for x in f.readlines()]
#Writes to the file until its done
with open("PartNums.csv", 'w') as f:
    f.writelines(file_lines)
The script works and does what it is supposed to, however my issue is later on in this larger script. This script outputs into a CSV file and it looks like this:
X00TB0001.JPG
X01BJ0003.JPG
X01BJ0004.JPG
X01BJ0005.JPG
X01BJ0006.JPG
X01BJ0007.JPG
X01BJ0008.JPG
X01BJ0026.JPG
X01BJ0038.JPG
X01BJ0039.JPG
X01BJ0040.JPG
X01BJ0041.JPG
...
X01BJ0050.JPG
X01BJ0058.JPG
X01BJ0059.JPG
X01BJ0060.JPG
X01BJ0061.JPG
X01BJ0170.JPG
X01BJ0178.JPG
Without the \n in that line, i.e. file_lines = [''.join([x.strip(), string_to_add]) for x in f.readlines()], the CSV output looks like this:
X00TB0001.JPGX01BJ0003.JPGX01BJ0004.JPGX01BJ0005.JPGX01BJ0006.JPG
The issue arises when I read this file later and move files with it using this script:
#If the string matches a file name move it to a new directory
dst = r"xxx"
with open('PicsWeHave.txt') as my_file:
    for filename in my_file:
        src = os.path.join("XXX") # .strip() to avoid un-wanted white spaces
        #shutil.copy(src, os.path.join(dst, filename.strip()))
        shutil.copy(os.path.join(src, filename), os.path.join(dst, filename))
When I run the whole script, it works until it has to move the files; then I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'XXX\\X15SL0447.JPG\n'
I know the file exists; however, the '\n' should not be there. How can I keep every name on its own line in the file, yet not have '\n' after each name when I move the files, so that the strings match?
Thank you for your help!
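As already suggested, you should use .strip():
shutil.copy(os.path.join(src, filename.strip()), os.path.join(dst, filename.strip()))
This keeps one name per line in the file, and .strip() removes the trailing newline (and any other leading or trailing whitespace) from each name when you read it back, so the path matches the real file name.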
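As a rough sketch of the same idea, the name can be stripped once per line and reused for both paths. The function name copy_listed_files and its parameters are hypothetical; the sketch assumes the list file holds one file name per line and that both directories already exist.

```python
import os
import shutil


def copy_listed_files(list_path, src_dir, dst_dir):
    """Copy each file named in list_path from src_dir to dst_dir.

    Hypothetical helper: assumes one file name per line in list_path.
    """
    copied = []
    with open(list_path) as names:
        for raw in names:
            name = raw.strip()  # remove the trailing '\n' and stray spaces
            if not name:
                continue  # skip blank lines
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
            copied.append(name)
    return copied
```

Stripping once and storing the result avoids calling .strip() twice and makes it harder to forget it on one of the two paths.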

How to extract only lines with specific word from text file and write a new one?

What is the way to extract only the lines containing a specific word from an online text file (fetched with requests) and write them to a new text file? I am stuck here...
This is my code:
r = requests.get('http://website.com/file.txt'.format(x))
with open('data.txt', 'a') as f:
    if 'word' in line:
        f.write('\n')
        f.writelines(str(r.text))
        f.write('\n')
If I remove if 'word' in line:, it works, but for all lines, so it just copies every line from one file to the other.
Any idea how to extract (filter) only the lines containing a specific word?
Update: This is working, but if that word exists anywhere in the requested file, it starts copying ALL lines. I need to copy only the lines containing 'SOME WORD'.
I have added this code:
for line in r.text.split('\n'):
    if 'SOME WORD' in line:
Thank you all for the answers, and sorry if I didn't make myself clear.
Perhaps this will help.
Whenever you invoke POST/GET or whatever, always check the HTTP response code.
Now let's assume that the lines within the response text are delimited with newline ('\n') and that you want to write a new file (change the mode to 'a' if you want to append). Then:
import requests
# Raise an exception if the HTTP status code indicates an error
(r := requests.get('SOME URL')).raise_for_status()
with open('SOME FILENAME', 'w') as outfile:
    for line in r.text.split('\n'):
        if 'SOME WORD' in line:
            print(line, file=outfile)
            break  # stops after the first matching line; remove this to keep every match
Note:
You will need Python 3.8+ to use the walrus operator (:=) in this code.
I would suggest these steps for handling the file properly:
Step 1: Stream the downloaded file to a temporary file
Step 2: Read lines from the temporary file
Step 3: Generate the main file based on your filter
Step 4: Delete the temporary file
Below is code that performs these steps:
import requests
import os
def read_lines(file_name):
    with open(file_name,'r') as fp:
        for line in fp:
            yield line
if __name__=="__main__":
    word='ipsum'
    temp_file='temp_file.txt'
    main_file='main_file.txt'
    url = 'https://filesamples.com/samples/document/txt/sample3.txt'
    with open (temp_file,'wb') as out_file:
        content = requests.get(url, stream=True).content
        out_file.write(content)
    with open(main_file,'w') as mf:
        out=filter(lambda x: word in x,read_lines(temp_file))
        for i in out:
            mf.write(i)
    os.remove(temp_file)
Well, your code is missing a line: you need a loop that produces each line so that the if statement has something to check.
import requests
r = requests.get('http://website.com/file.txt').text
with open('data.txt', 'a') as f:
    for line in r.splitlines(): # this is your loop where you get hold of each line
        if 'word' in line: # so that you can check for your 'word'
            f.write(line + '\n') # splitlines() drops the line ending, so add it back
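The filtering step can also be separated from the download so it can be checked without network access. This is only a sketch: the names matching_lines and save_lines are hypothetical, and the text argument stands in for r.text from the answers above.

```python
def matching_lines(text, word):
    """Return the lines of text that contain word, without line endings."""
    return [line for line in text.splitlines() if word in line]


def save_lines(lines, path):
    """Write each line to path, one per line, replacing any existing file."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")
```

For example, matching_lines("alpha\nsome word here\nbeta word\ngamma", "word") returns ['some word here', 'beta word'], and only those two lines would be written by save_lines.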

How to amend an existing python file

I am trying to amend a group of files in a folder by adding F to the 4th line (which is index 3 in Python, if I'm correct). With the code below, the script just keeps running and never makes the amendments. Has anyone got any ideas?
import os
from glob import glob
list_of_files = glob('*.gjf') # get list of all .gjf files
for file in list_of_files:
    # read file:
    with open(file, 'r+') as f:
        lines=f.readlines()
        for line in lines:
            lines.insert(3,'F')
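Your loop never ends because it inserts into lines while iterating over it, so the list keeps growing. Read the lines, insert once, and write the result back:
for file in list_of_files:  # the glob('*.gjf') result from the question
    # read your file's lines
    with open(file, 'r') as f:
        lines = f.readlines()
    # add the value to insert in the list of lines at index 3
    # don't forget line-break (\n)
    lines.insert(3, 'F\n')
    # join lines to create a string
    text = ''.join(lines)
    # don't forget to write the string back in your file
    with open(file, 'w') as f:
        f.write(text)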
You cannot insert text into the middle of a file in place this way. One approach is to write the modified contents to a temporary file and then rename the temporary file over the original. The core logic is to read through the original file and add an F to the end of the 4th line, copying every other line unchanged. This can be done with a function like this:
import os
def add_f_4th_line(filename):
    with open(filename, 'r') as f_in:
        with open('temp', 'w') as f_out:
            for line_number, line_contents in enumerate(f_in):
                if line_number == 3:
                    f_out.write(line_contents.replace('\n', 'F\n'))
                else:
                    f_out.write(line_contents)
    # Replace the original with the temporary file; nothing is left to remove afterwards
    os.replace('temp', filename)
enumerate yields pairs of the line number and the contents of the line, so you iterate over each line and find the 4th line with line_number == 3. That line then has \n replaced with F\n. \n is the newline character, so I'm assuming the line ends with it. After adding this function, you just need to call it for each file returned by your glob call.
import os
from glob import glob
list_of_files = glob('*.gjf') # get list of all .gjf files
for file in list_of_files:
    add_f_4th_line(file)
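A variation on the temporary-file approach, offered only as a sketch: tempfile.mkstemp picks a unique temporary name in the same directory as the target, and os.replace swaps it in once writing has finished. The function name append_to_line and its parameters are hypothetical.

```python
import os
import tempfile


def append_to_line(filename, line_index, suffix):
    """Append suffix to the line at line_index (0-based) of filename.

    Hypothetical helper: writes to a uniquely named temporary file in the
    same directory, then replaces the original with it.
    """
    directory = os.path.dirname(os.path.abspath(filename))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f_out, open(filename) as f_in:
            for number, line in enumerate(f_in):
                if number == line_index:
                    line = line.rstrip("\n") + suffix + "\n"
                f_out.write(line)
        os.replace(temp_path, filename)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise
        os.remove(temp_path)
        raise
```

For the files in the question this would be called as append_to_line(file, 3, 'F') inside the glob loop. Keeping the temporary file in the same directory means the final replace does not have to move data across file systems.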

search replace the string from number of .txt files in python

There are multiple files in a directory with extensions .txt, .dox, .qcr, etc.
I need to list the .txt files, and search and replace text in the .txt files only.
I need to search for $$\d, where \d stands for a number: 1, 2, 3, ... 100.
It should be replaced with xxx.
Please let me know the Python script for this.
Thanks in advance.
-Shrinivas
I created the following script. It works for a single .txt file, but not when there is more than one .txt file in the directory.
-----
def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)
# The following code is not working. I expect it to list the files matching "um_*.txt", open each file and replace "$$\d" using the replaceAll function.
for um_file in glob.glob('*.txt'):
    t = open(um_file, 'r')
    replaceAll("t.read","$$\d","xxx")
    t.close()
fileinput.input(...) is designed to process a sequence of files, and should be followed by a matching fileinput.close(). So you can process all the files in one single call:
import fileinput
import glob
import sys
def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=True):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        dummy = sys.stdout.write(line) # to avoid a possible output of the size
    fileinput.close() # to close everything in an orderly way
replaceAll(glob.glob('*.txt'), r"$$\d", "xxx")
Alternatively, call fileinput.close() after processing each file, but that bypasses the main feature of fileinput, which is handling several files in one call.
Try this out.
import glob
import re
import sys
def replaceAll(file,searchExp,replaceExp):
    for line in file.readlines():
        try:
            line = line.replace(re.findall(searchExp,line)[0],replaceExp)
        except IndexError:
            # no match on this line; leave it unchanged
            pass
        sys.stdout.write(line)
# Open each .txt file and pass the open file object to replaceAll.
for um_file in glob.glob('*.txt'):
    t = open(um_file, 'r')
    replaceAll(t, r"\d+", "xxx")
    t.close()
Here we pass the open file object to the replaceAll function rather than a string. Note that this prints the modified lines to standard output; it does not write them back to the file.
You can try this:
import os
import re
folder = "foldername"
# os.listdir returns bare names, so join them with the folder before opening
the_files = [os.path.join(folder, i) for i in os.listdir(folder) if i.endswith(".txt")]
for file in the_files:
    with open(file) as f:
        new_data = re.sub(r"\d+", "xxx", f.read())
    with open(file, 'w') as final_file:
        final_file.write(new_data)
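One point the snippets above leave open: str.replace and the in operator treat "$$\d" as literal characters, while in a regular expression $ is a special character that must be escaped. A minimal sketch of a pattern for two dollar signs followed by one or more digits, assuming that is what the question's $$\d is meant to match; the names PLACEHOLDER and mask_placeholders are hypothetical.

```python
import re

# Two literal dollar signs followed by one or more digits, e.g. $$1 or $$100
PLACEHOLDER = re.compile(r"\$\$\d+")


def mask_placeholders(text, replacement="xxx"):
    """Replace every $$<number> occurrence in text with replacement."""
    return PLACEHOLDER.sub(replacement, text)
```

For example, mask_placeholders("a $$1 b $$100 c $5") returns "a xxx b xxx c $5": the single-dollar $5 is left alone because the pattern requires both dollar signs.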

python printing a blank line on the first line when writing to a file

I'm stuck on why my code is printing a blank line before writing text to a file. What I am doing is reading two files from a zipped folder and writing the text to a new text file. I am getting the expected results in the file, except for the fact that there is a blank line on the first line of the file.
def test():
    if zipfile.is_zipfile(r'C:\Users\test\Desktop\Zip_file.zip'):
        zf = zipfile.ZipFile(r'C:\Users\test\Desktop\Zip_file.zip')
        for filename in zf.namelist():
            with zf.open(filename, 'r') as f:
                words = io.TextIOWrapper(f)
                new_file = io.open(r'C:\Users\test\Desktop\new_file.txt', 'a')
                for line in words:
                    new_file.write(line)
                new_file.write('\n')
    else:
        pass
    zf.close()
    words.close()
    f.close()
    new_file.close()
Output in new_file (there is a blank line before the first "This is a test line...")
This is a test line...
This is a test line...
this is test #2
this is test #2
Any ideas?
Thanks!
My guess is that the first file in zf.namelist() doesn't contain anything, so you skip the for line in words loop for that file and just do new_file.write('\n'). It's difficult to tell without seeing the files that you're looping over; perhaps add some debug statements that print out the files' names and some info, e.g. their size.
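One way to gather that debug information, sketched with the standard zipfile module: list each archive member together with its uncompressed size, so an empty first entry (or a directory entry) is easy to spot. The function name describe_members is hypothetical.

```python
import zipfile


def describe_members(zip_path):
    """Return (name, uncompressed size in bytes) for every archive member."""
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Printing the result for the archive in the question, for example with for name, size in describe_members(r'C:\Users\test\Desktop\Zip_file.zip'): print(name, size), would show whether the first member has a size of 0.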
