Python3 convert result of DB query to individual strings - python

I have 2 functions in a python script.
The first one gets the data from a database with a WHERE clause but the second function uses this data and iterates through the results to download a file.
I can print the results, but they come back as a list of tuples:
[('mmpc',), ('vmware',), ('centos',), ('redhat',), ('postgresql',), ('drupal',)]
But I need to iterate through each element as a string so the download function can append it to the URL for the response variable.
Here is the code for the download script which contains the functions:-
import requests
import eventlet
import os
import sqlite3
# declare the global variable
active_vuln_type = None
# Get the active vulnerability sets
def GetActiveVulnSets() :
    # make the variable global
    global active_vuln_type
    active_vuln_type = con = sqlite3.connect('data/vuln_sets.db')
    cur = con.cursor()
    cur.execute('''SELECT vulntype FROM vuln_sets WHERE active=1''')
    active_vuln_type = cur.fetchall()
    print(active_vuln_type)
    return(active_vuln_type)
    # return str(active_vuln_type)
def ExportList():
    vulnlist = list(active_vuln_type)
    activevulnlist = ""
    for i in vulnlist:
        activevulnlist = str(i)
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        response = requests.get('https://vulners.com/api/v3/archive/collection/?type=' + activevulnlist)
        with open(filepath + '/vuln_files/' + activevulnlist + '.zip', 'wb') as f:
            f.write(response.content)
            f.close()
        return activevulnlist + " - " + str(os.path.getsize(filepath + '/vuln_files/' + activevulnlist + '.zip'))
Currently it creates a corrupt .zip named ('mmpc',).zip instead of the expected mmpc.zip. It also does not seem to iterate through the list: it only creates the zip file for the first result from the DB and none of the others, although a print(i) returns [('mmpc',), ('vmware',), ('centos',), ('redhat',), ('postgresql',), ('drupal',)]
There is no traceback as the script thinks it is working.

The following fixes two issues: 1. converting the query output to an iterable of strings and 2. replacing the return statement with a print function so that the for-loop does not end prematurely.
I have also taken the liberty of removing some redundancies such as closing a file inside a with statement and pointlessly converting a list into a list. I am also calling the GetActiveVulnSets inside the ExportList function. This should eliminate the need to call GetActiveVulnSets outside of function definitions.
import requests
import eventlet
import os
import sqlite3
# declare the global variable
active_vuln_type = None
# Get the active vulnerability sets
def GetActiveVulnSets() :
    # make the variable global
    global active_vuln_type
    con = sqlite3.connect('data/vuln_sets.db')
    cur = con.cursor()
    cur.execute('''SELECT vulntype FROM vuln_sets WHERE active=1''')
    # each row is a 1-tuple such as ('mmpc',); keep only the string inside it
    active_vuln_type = [x[0] for x in cur]
    print(active_vuln_type)
    return(active_vuln_type)
    # return str(active_vuln_type)
def ExportList():
    GetActiveVulnSets()
    activevulnlist = ""
    for i in active_vuln_type:
        activevulnlist = str(i)
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        response = requests.get('https://vulners.com/api/v3/archive/collection/?type=' + activevulnlist)
        with open(filepath + '/vuln_files/' + activevulnlist + '.zip', 'wb') as f:
            f.write(response.content)
        # print instead of return, so the loop continues with the next item
        print(activevulnlist + " - " + str(os.path.getsize(filepath + '/vuln_files/' + activevulnlist + '.zip')))
While this may solve the problem you are encountering, I would recommend that you write functions with parameters. This way, you know what each function is supposed to take in as an argument and what it spits out as output. In essence, avoid the usage of global variables if you can. They are hard to debug and quite frankly unnecessary in many use cases.
I hope this helps.
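To illustrate the point about parameters, here is a minimal sketch rather than a drop-in replacement. The function names get_active_vuln_types and archive_url are made up for this example, and the demo uses an in-memory database with invented rows instead of data/vuln_sets.db, so nothing is downloaded:

```python
import sqlite3


def get_active_vuln_types(con):
    """Return the active vulntype values as a flat list of strings."""
    cur = con.execute("SELECT vulntype FROM vuln_sets WHERE active=1")
    # Each row is a 1-tuple such as ('mmpc',); take its only column.
    return [row[0] for row in cur]


def archive_url(vuln_type):
    """Build the download URL for one vulnerability type."""
    return "https://vulners.com/api/v3/archive/collection/?type=" + vuln_type


# Demo against an in-memory database (hypothetical rows).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vuln_sets (vulntype TEXT, active INTEGER)")
con.executemany(
    "INSERT INTO vuln_sets VALUES (?, ?)",
    [("mmpc", 1), ("vmware", 1), ("nessus", 0)],
)
active = get_active_vuln_types(con)
urls = [archive_url(v) for v in active]
print(urls)
```

Because each function receives its input as an argument and returns its result, you can test the query and the URL building separately from the actual download, and no global variable is needed.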

Related

How to add strings in the file at the end of all lines

I am trying to download files using python and then add lines at the end of the downloaded files, but it returns an error:
f.write(data + """<auth-user-pass>
TypeError: can't concat str to bytes
Edit: Thanks, it works now when I do this b"""< auth-user-pass >""", but I only want to add the string at the end of the file. When I run the code, it adds the string for every line.
I also tried something like this but it also did not work: f.write(str(data) + "< auth-user-pass >")
here is my full code:
import requests
from multiprocessing.pool import ThreadPool
def download_url(url):
    print("downloading: ", url)
    # assumes that the last segment after the / represents the file name
    # if url is abc/xyz/file.txt, the file name will be file.txt
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]
    save_path = 'ovpns/'
    complete_path = os.path.join(save_path, file_name)
    print(complete_path)
    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(complete_path, 'wb') as f:
            for data in r:
                f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""")
    return url
servers = [
    "us-ca72.nordvpn.com",
    "us-ca73.nordvpn.com"
]
urls = []
for server in servers:
    urls.append("https://downloads.nordcdn.com/configs/files/ovpn_legacy/servers/" + server + ".udp1194.ovpn")
# Run 5 multiple threads. Each call will take the next element in urls list
results = ThreadPool(5).imap_unordered(download_url, urls)
for r in results:
    print(r)
Try this:
import os
import requests
from multiprocessing.pool import ThreadPool
def download_url(url):
    print("downloading: ", url)
    # assumes that the last segment after the / represents the file name
    # if url is abc/xyz/file.txt, the file name will be file.txt
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]
    save_path = 'ovpns/'
    complete_path = os.path.join(save_path, file_name)
    print(complete_path)
    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(complete_path, 'wb') as f:
            for data in r:
                f.write(data)
        # reopen in append mode and add the block once, after the whole file is written
        with open(complete_path, 'ab') as f:
            f.write(b"""<auth-user-pass>
username
password
</auth-user-pass>""")
    return url
servers = [
    "us-ca72.nordvpn.com",
    "us-ca73.nordvpn.com"
]
urls = []
for server in servers:
    urls.append("https://downloads.nordcdn.com/configs/files/ovpn_legacy/servers/" + server + ".udp1194.ovpn")
# Run 5 multiple threads. Each call will take the next element in urls list
results = ThreadPool(5).imap_unordered(download_url, urls)
for r in results:
    print(r)
You are writing in binary mode, so encode your string before concatenating it. That is, replace
for data in r:
    f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""")
with
for data in r:
    f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""".encode())
You open the file for writing in binary mode ('wb').
Because of that you can't write normal strings, as the comment from @user56700 said.
You either need to convert the string to bytes or open the file another way (e.g. 'a' for appending in text mode).
Also note that opening a file with 'wb' truncates it, deleting any existing data. If you reopen the file later to add the block, use 'ab' rather than 'wb'; 'rwb' is not a valid mode.
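To make the two points from this thread concrete (bytes in binary mode, and writing the extra block only once), here is a small sketch. The helper name save_with_footer and the fake chunks are made up for the example; in the real script the chunks would come from iterating over the streamed response:

```python
import os
import tempfile

FOOTER = b"<auth-user-pass>\nusername\npassword\n</auth-user-pass>\n"


def save_with_footer(chunks, path, footer=FOOTER):
    """Write every chunk to path, then append footer exactly once."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)   # body: one write per downloaded chunk
        f.write(footer)      # footer: outside the loop, so written once


# Stand-in for the chunks of a streamed download (hypothetical content).
fake_chunks = [b"client\n", b"dev tun\n"]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ovpn")
save_with_footer(fake_chunks, demo_path)
with open(demo_path, "rb") as f:
    saved = f.read()
```

The footer is a bytes literal, so no encoding step is needed, and because the final write sits after the for loop it runs once per file rather than once per chunk.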

Exporting results to CSV file

I'm pretty new to coding and I'm stuck on this problem. Written in python.
import logging
import os
import sys
import json
import pymysql
import requests
import csv
## set up logger to pass information to Cloudwatch ##
#logger = logging.getLogger()
#logger.setLevel(logging.INFO)
## define RDS variables ##
rds_host = 'host'
db_username = 'username'
db_password = 'password'
db_name = 'name'
## connect to rds database ##
try:
    conn = pymysql.connect(host=rds_host, user=db_username, password=db_password, db=db_name, port=1234,
                           connect_timeout=10)
except Exception as e:
    print("ERROR: Could not connect to MySql instance.")
    print(e)
    sys.exit()
print("SUCCESS: Connection to RDS mysql instance succeeded")
def main():
    with conn.cursor() as cur:
        cur.execute("SELECT Domain FROM domain_reg")
        domains = cur.fetchall()
        # logger.info(domains)
    conn.close()
    new_domains = []
    for x in domains:
        a = "http://" + x[0] + ("/orange/health")
        new_domains.append(a)
    print(new_domains)
    for y in new_domains:
        try:
            response = requests.get(y)
            if response.status_code == 200:
                print("Domain " + y + " exists")
            else:
                print("Domain " + y + " does not exist; Status code = " + str(response.status_code))
        except Exception as e:
            print("Exception: With domain " + y)
    with open("new_orangeZ.csv", "w", newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in new_domains:
            writer.writerow([new_domains])
if __name__ == "__main__":
    main()
This code does create a CSV file, but it's not exactly exporting what I want it to export. It only lists the "y" values, and I understand that is because I'm passing "new_domains" to writer.writerow. I'm trying to figure out how to also export the output of the print calls from the if/else statement into the CSV, if that makes sense. Sorry if this sounds like gibberish; like I said, I'm super new to coding. I was hoping to post a picture of what I get in the CSV file vs. what I wanted, but I'm new to Stack Overflow too, so it doesn't allow me to post pictures.
Thank you!!!
print() only displays the strings on the screen.
You need to remember them somewhere, like in a new list:
result = []  # somewhere at the beginning
...
print("Domain " + y + " exists")
result.append([y, "Domain " + y + " exists"])  # after each print
and save both in the CSV file with something like:
for domain, status in result:
    writer.writerow([domain, status])
Storing the domain next to its message keeps each pair together, so you do not have to match two separate lists up again later.
By the way, with "for line in new_domains:" I guess you meant to write "line" to the CSV instead of "new_domains".
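Putting that together, a minimal sketch could look like the following. The helper names describe and write_results, the header row, and the example URLs and status codes are all assumptions for illustration; in the real script the status code would come from requests.get(y).status_code, and the rows would be written to new_orangeZ.csv instead of an in-memory buffer:

```python
import csv
import io


def describe(domain, status_code):
    """Return the same message the script prints for one domain."""
    if status_code == 200:
        return "Domain " + domain + " exists"
    return "Domain " + domain + " does not exist; Status code = " + str(status_code)


def write_results(rows, csv_file):
    """Write (domain, message) pairs as one CSV row each."""
    writer = csv.writer(csv_file, delimiter=",")
    writer.writerow(["domain", "status"])
    for domain, message in rows:
        writer.writerow([domain, message])


# Hypothetical status codes standing in for real HTTP responses.
checked = [("http://a.example/orange/health", 200),
           ("http://b.example/orange/health", 404)]
result = [(d, describe(d, code)) for d, code in checked]

buffer = io.StringIO(newline="")
write_results(result, buffer)
output = buffer.getvalue()
```

Building the message in one function means the text you print and the text you save cannot drift apart.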
Question: I'm trying to figure out how to also export the print function that matches with the if else statement into the csv
If you want to print into a file, you have to pass the file object to print(..., file=<my file object>). In your example, move the for ... inside of with ....
Note: It's not a good idea to use csv.writer(...) for non-CSV data.
with open("test", "w", newline='') as my_file_object:
    for y in new_domains:
        ...  # same try/except as before, passing file=my_file_object to each print()
From Python Documentation - Built-in Functions
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
Print objects to the text stream file, separated by sep and followed by end.
sep, end, file and flush, if present, must be given as keyword arguments.
The file argument must be an object with a write(string) method; if it is not present or None, sys.stdout will be used.
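As a rough sketch of the print(..., file=...) approach: the URLs, status codes, and the report.txt file name below are invented for the example, and the real loop would get each status code from requests.get(y).status_code:

```python
import os
import tempfile

# Hypothetical (url, status_code) pairs standing in for real responses.
checked = [("http://a.example/orange/health", 200),
           ("http://b.example/orange/health", 503)]

report_path = os.path.join(tempfile.mkdtemp(), "report.txt")
with open(report_path, "w") as report:
    for url, status_code in checked:
        if status_code == 200:
            # file=report sends the line to the file instead of the screen
            print("Domain " + url + " exists", file=report)
        else:
            print("Domain " + url + " does not exist; Status code = "
                  + str(status_code), file=report)

with open(report_path) as report:
    lines = report.read().splitlines()
```

This produces a plain text report, one message per line, which fits this answer's note that the messages are not really CSV data.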

Python pass variable between functions

I have 2 functions in one script that are called from another file. I want to pass the variable 'active_vuln_type' and its contents to the second function 'Download'.
The file with the scripts is:-
projectfolder/vuln_backend/download.py
import requests
import eventlet
import os
import sqlite3
#Get the active vulnerability sets
def GetActiveVulnSets() :
    active_vuln_type = con = sqlite3.connect('data/vuln_sets.db')
    cur = con.cursor()
    cur.execute('''SELECT vulntype FROM vuln_sets WHERE active=1''')
    active_vuln_type = cur.fetchall()
    print(active_vuln_type)
    return active_vuln_type
#Download the relevant collections
def Download(active_vuln_type) :
    response = requests.get('https://vulners.com/api/v3/archive/collection/?type=' + active_vuln_type)
    with open('vuln_files/' + active_vuln_type + '.zip' , 'wb') as f:
        f.write(response.content)
        f.close()
    return active_vuln_type + " - " + str(os.path.getsize('vuln_files/' + active_vuln_type + '.zip'))
The main file in /
projectfolder/vuln_backend.py:-
from vuln_backend import vuln_sets, download, test
test.update_vuln_sets()
#vuln_sets.update_vuln_sets()
download.GetActiveVulnSets()
download.Download()
I am adapting the following script:-
import requests
import json
import eventlet
import os
response = requests.get('https://vulners.com/api/v3/search/stats/')
objects = json.loads(response.text)
object_names = set()
for name in objects['data']['type_results']:
    object_names.add(name)
def download(name):
    response = requests.get('https://vulners.com/api/v3/archive/collection/?type=' + name)
    with open('vulners_collections/' + name + '.zip' , 'wb') as f:
        f.write(response.content)
        f.close()
    return name + " - " + str(os.path.getsize('vulners_collections/' + name + '.zip'))
pool = eventlet.GreenPool()
for name in pool.imap(download, object_names):
    print(name)
So far, I have got the values from ['data']['type_results'] into a SQLite DB, and some of these are marked with a '1' in the 'active' column. The first function then returns only the ones marked as active.
It is the download part I am having issues getting to work correctly.
You can also use a global variable here.
import requests
import eventlet
import os
import sqlite3
#declare the global variable
active_vuln_type = None
#Get the active vulnerability sets
def GetActiveVulnSets() :
    #make the variable global
    global active_vuln_type
    con = sqlite3.connect('data/vuln_sets.db')
    cur = con.cursor()
    cur.execute('''SELECT vulntype FROM vuln_sets WHERE active=1''')
    active_vuln_type = cur.fetchall()
    print(active_vuln_type)
    return active_vuln_type
#Download the relevant collections
def Download(vuln_type=None) :
    #fall back to the global at call time; a default of active_vuln_type would be
    #evaluated once, when the function is defined, while it is still None
    if vuln_type is None:
        vuln_type = active_vuln_type
    response = requests.get('https://vulners.com/api/v3/archive/collection/?type=' + vuln_type)
    with open('vuln_files/' + vuln_type + '.zip' , 'wb') as f:
        f.write(response.content)
    return vuln_type + " - " + str(os.path.getsize('vuln_files/' + vuln_type + '.zip'))
I think this is what you're looking for:
active_vuln_type = download.GetActiveVulnSets()
download.Download(active_vuln_type)
from vuln_backend import vuln_sets, download, test
test.update_vuln_sets()
#vuln_sets.update_vuln_sets()
active_vuln_sets = download.GetActiveVulnSets()
download.Download(active_vuln_sets)
Do this
from vuln_backend import vuln_sets, download, test
test.update_vuln_sets()
#vuln_sets.update_vuln_sets()
active_vuln_type = download.GetActiveVulnSets()
download.Download(active_vuln_type)
You probably need to brush up on (or just learn) how functions work in Python (and most other languages). In that spirit, please don't just take this code and use it directly; try to understand it (especially if this is homework).
Specifically, you need to actually use the value that return gives, which is the result of the function:
my_active_vuln_type = download.GetActiveVulnSets()
download.Download(my_active_vuln_type)
or just
download.Download(download.GetActiveVulnSets())
However, it seems that download.GetActiveVulnSets() actually returns a list, so it seems like a loop is required:
active_vuln_type_list = download.GetActiveVulnSets()
for my_active_vuln_type in active_vuln_type_list:
    download.Download(my_active_vuln_type)
However, you now have a similar problem: what do you want to do with the result of download.Download?
So really you probably want something like:
active_vuln_type_list = download.GetActiveVulnSets()
download_results = []
for my_active_vuln_type in active_vuln_type_list:
    single_download_result = download.Download(my_active_vuln_type)
    download_results.append(single_download_result)
Alternately, you can use a list comprehension:
active_vuln_type_list = download.GetActiveVulnSets()
download_results = [download.Download(mavt) for mavt in active_vuln_type_list]
Either way, you then have the list download_results available if you need to do something with the results.

Python: Creating an empty file object

I am attempting to make a logging module for Python that does not work because it fails on creation of the file object.
debug.py:
import os
import datetime
import globals
global fil
fil = None
def init(fname):
    fil = open(fname, 'w+')
    fil.write("# PyIDE Log for" + str(datetime.datetime.now()))
def log(strn):
    currentTime = datetime.datetime.now()
    fil.write(str(currentTime) + ' ' + str(os.getpid()) + ' ' + strn)
    print str(currentTime) + ' ' + str(os.getpid()) + ' ' + strn
def halt():
    fil.close()
fil will not work as None as I get an AttributeError. I also tried creating a dummy object:
fil = open("dummy.tmp","w+")
but the dummy.tmp file is written to instead, even though init() is called before log() is. Obviously you cannot open a new file over an already opened file. I attempted to close fil before init(), but Python said it could not perform write() on a closed file.
This is the code that is accessing debug.py
if os.path.exists(temp):
    os.rename(temp, os.path.join("logs","archived","log-" + str(os.path.getctime(temp)) + ".txt"))
debug.init(globals.logPath)
debug.log("Logger initialized!")
I would like to have logging in my program and I cannot find a workaround for this.
Your problem is that you don't assign to the global fil:
def init(fname):
    fil = open(fname, 'w+')
This creates a new local variable called fil.
If you want to assign to the global variable fil, you need to declare it as global inside the function:
def init(fname):
    global fil
    fil = open(fname, 'w+')
If you want to MAKE your own logging module, then you may want to turn what you already have into a class, so you can import it as a module.
#LoggerThingie.py
import os
import datetime
class LoggerThingie(object):
    def __init__(self,fname):
        self.fil = open(fname, 'w+')
        self.fil.write("# PyIDE Log for" + str(datetime.datetime.now()))
    def log(self,strn):
        currentTime = datetime.datetime.now()
        self.fil.write(str(currentTime) + ' ' + str(os.getpid()) + ' ' + strn)
        # a single parenthesised argument prints the same under Python 2 and 3
        print(str(currentTime) + ' ' + str(os.getpid()) + ' ' + strn)
    def halt(self):
        self.fil.close()
If you did this as a class, you would not have to keep track of globals in the first place (globals are generally considered bad practice; see "Why are global variables evil?").
Since it is now a module on its own, when you want to use it in another python program you would do this:
from LoggerThingie import LoggerThingie
#because module filename is LoggerThingie.py and ClassName is LoggerThingie
and then use it wherever you want, for example:
x = LoggerThingie('filename.txt') #create LoggerThingie object named x
and every-time you want to insert logs into it:
x.log('log this to the file')
and when you are finally done:
x.halt() # when you're done
If you don't want to start with an empty file, you could use StringIO to keep the messages in memory and write them to disk at the end. Be careful, though: if something goes wrong before you have written the messages, they will be lost.
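For what the StringIO idea might look like, here is a rough sketch written for Python 3. The class name BufferedLogger is made up for this example, and the log file goes into a temporary directory just for the demo:

```python
import datetime
import io
import os
import tempfile


class BufferedLogger(object):
    """Collect log lines in memory; write them to disk only on halt()."""

    def __init__(self, fname):
        self.fname = fname
        self.buffer = io.StringIO()

    def log(self, message):
        stamp = datetime.datetime.now()
        self.buffer.write("%s %s %s\n" % (stamp, os.getpid(), message))

    def halt(self):
        # Nothing reaches the file until this point.
        with open(self.fname, "w") as f:
            f.write(self.buffer.getvalue())
        self.buffer.close()


log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
logger = BufferedLogger(log_path)
logger.log("Logger initialized!")
exists_before_halt = os.path.exists(log_path)
logger.halt()
with open(log_path) as f:
    contents = f.read()
```

The trade-off is the one mentioned above: the file is only created in halt(), so a crash before that call loses every buffered message.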

Naming multiple files in python and scrapy

I'm trying to save files to a directory after scraping them from the web using scrapy. I'm extracting a date from the file and using that as the file name. The problem I'm running into, however, is that some files have the same date, i.e. there are two files that would take the name "June 2, 2009". So, what I'm looking to do is somehow check whether there is already a file with the same name, and if so, name it something like "June 2, 2009.1" or some such.
The code I'm using is as follows:
def parse_item(self, response):
    self.log('Hi, this is an item page! %s' % response.url)
    response = response.replace(body=response.body.replace('<br />', '\n'))
    hxs = HtmlXPathSelector(response)
    date = hxs.select("//div[@id='content']").extract()[0]
    dateStrip = re.search(r"([A-Z]*|[A-z][a-z]+)\s\d*\d,\s[0-9]+", date)
    newDate = dateStrip.group()
    content = hxs.select("//div[@id='content']")
    content = content.select('string()').extract()[0]
    filename = ("/path/to/a/folder/ %s.txt") % (newDate)
    with codecs.open(filename, 'w', encoding='utf-8') as output:
        output.write(content)
You can use os.listdir to get a list of existing files and allocate a filename that will not cause conflict.
import os
def get_file_store_name(path, fname):
    count = 0
    for f in os.listdir(path):
        if fname in f:
            count += 1
    return os.path.join(path, fname+str(count))
# This is example to use
print get_file_store_name(".", "README")+".txt"
The usual way to check for existence of a file in the C library is with a function called stat(). Python offers a thin wrapper around this function in the form of os.stat(). I suggest you use that.
http://docs.python.org/library/stat.html
import os
import stat
def file_exists(fname):
    try:
        stat_info = os.stat(fname)
        # S_ISREG lives in the stat module and takes the st_mode field
        if stat.S_ISREG(stat_info.st_mode): # true for regular file
            return True
    except Exception:
        pass
    return False
Another solution is to append the time to the date when naming the file, like:
from datetime import datetime
filename = ("/path/to/a/folder/ %s_%s.txt") % (newDate,datetime.now().strftime("%H%M%S"))
The other answer pointed me in the correct direction by checking into the os tools in python, but I think the way I found is perhaps more straightforward. Reference here How do I check whether a file exists using Python? for more.
The following is the code I came up with:
existence = os.path.isfile(filename)
if existence == False:
    with codecs.open(filename, 'w', encoding='utf-8') as output:
        output.write(content)
else:
    newFilename = ("/path/.../.../- " + '%s' ".1.txt") % (newDate)
    with codecs.open(newFilename, 'w', encoding='utf-8') as output:
        output.write(content)
Edited to Add:
I didn't like this solution too much, and thought the other answer's solution was probably better but didn't quite work. The main part I didn't like about my solution was that it only worked with 2 files of the same name; if three or four files had the same name the initial problem would occur. The following is what I came up with:
filename = ("/Users/path/" + " " + "title " + '%s' + " " + "-1.txt") % (date)
filename = str(filename)
while True:
    os.path.isfile(filename)
    # str.replace takes (old, new[, count]); passing filename as a third argument raises a TypeError
    newName = filename.replace(".txt", "")
    newName = str.split(newName)
    newName[-1] = str(int(newName[-1]) + 1)
    filename = " ".join(newName) + ".txt"
    if os.path.isfile(filename) == False:
        with codecs.open(filename, 'w', encoding='utf-8') as output:
            output.write(texts)
        break
It probably isn't the most elegant and might be kind of a hackish approach, but it has worked so far and seems to have addressed my problem.
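For comparison, the same idea (keep trying names until one is free) can be sketched as a small helper. The name unique_path and the "stem.N.txt" naming scheme are assumptions for this example, following the "June 2, 2009.1" style from the question:

```python
import os
import tempfile


def unique_path(directory, stem, ext=".txt"):
    """Return a path in directory that does not exist yet.

    Tries 'stem.txt' first, then 'stem.1.txt', 'stem.2.txt', and so on.
    """
    candidate = os.path.join(directory, stem + ext)
    counter = 0
    while os.path.exists(candidate):
        counter += 1
        candidate = os.path.join(directory, "%s.%d%s" % (stem, counter, ext))
    return candidate


# Demo: save three files that all share the same date.
demo_dir = tempfile.mkdtemp()
names = []
for _ in range(3):
    path = unique_path(demo_dir, "June 2, 2009")
    with open(path, "w") as f:
        f.write("content")
    names.append(os.path.basename(path))
print(names)
```

Note that there is a small window between the existence check and opening the file, so this sketch is only suitable when a single process writes to the folder.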
