Exporting results to CSV file - python

I'm pretty new to coding and I'm stuck on this problem. Written in python.
import logging
import os
import sys
import json
import pymysql
import requests
import csv

## set up logger to pass information to CloudWatch ##
#logger = logging.getLogger()
#logger.setLevel(logging.INFO)

## define RDS variables ##
rds_host = 'host'
db_username = 'username'
db_password = 'password'
db_name = 'name'

## connect to rds database ##
try:
    conn = pymysql.connect(host=rds_host, user=db_username, password=db_password, db=db_name, port=1234,
                           connect_timeout=10)
except Exception as e:
    print("ERROR: Could not connect to MySql instance.")
    print(e)
    sys.exit()
print("SUCCESS: Connection to RDS mysql instance succeeded")

def main():
    with conn.cursor() as cur:
        cur.execute("SELECT Domain FROM domain_reg")
        domains = cur.fetchall()
        # logger.info(domains)
    conn.close()
    new_domains = []
    for x in domains:
        a = "http://" + x[0] + "/orange/health"
        new_domains.append(a)
    print(new_domains)
    for y in new_domains:
        try:
            response = requests.get(y)
            if response.status_code == 200:
                print("Domain " + y + " exists")
            else:
                print("Domain " + y + " does not exist; Status code = " + str(response.status_code))
        except Exception as e:
            print("Exception: With domain " + y)
    with open("new_orangeZ.csv", "w", newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in new_domains:
            writer.writerow([new_domains])

if __name__ == "__main__":
    main()
This code does create a CSV file, but it's not exactly exporting what I want it to export. It only creates a CSV file listing the "y" values, and I understand that's because I'm passing "new_domains" to writer.writerow. I'm trying to figure out how to also export the message from the print call that matches the if/else statement into the CSV, if that makes sense. Sorry if this sounds like gibberish; like I said, I'm super new to coding. I was hoping to post a picture of what I get in the CSV file vs. what I wanted, but I'm new to Stack Overflow too, so it doesn't let me post pictures haha.
Thank you!!!

print() only displays the strings on the screen.
You need to remember them somewhere, like in a new list:
result = []  # somewhere at the beginning
...
print("Domain " + y + " exists")
result.append([y, "Domain " + y + " exists"])  # after each print
and save both in the CSV file with something like:
for domain, status in result:
    writer.writerow([domain, status])
It's easier to save the domain again next to its message, as the for / in may not keep their order.
By the way, with "for line in new_domains:" I guess you should have written "line" in the CSV instead of "new_domains"...
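Putting the pieces together, the middle of main() could look like this (a sketch based on the code above; result and message are just illustrative names):
result = []
for y in new_domains:
    try:
        response = requests.get(y)
        if response.status_code == 200:
            message = "Domain " + y + " exists"
        else:
            message = "Domain " + y + " does not exist; Status code = " + str(response.status_code)
    except Exception:
        message = "Exception: With domain " + y
    print(message)
    result.append([y, message])  # remember each message next to its domain
with open("new_orangeZ.csv", "w", newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    for domain, status in result:
        writer.writerow([domain, status])  # one row per domain: domain, status message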

Question: "I'm trying to figure out how to also export the print function that matches with the if else statement into the csv"
If you want to print into a file, you have to give the file object to print(..., file=<my file object>). In your example, move the for ... loop inside of the with ... block.
Note: it's not a good idea to use csv.writer(...) for non-CSV data.
with open("test", "w", newline='') as my_file_object:
for y in new_domains:
From Python Documentation - Built-in Functions
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
Print objects to the text stream file, separated by sep and followed by end.
sep, end, file and flush, if present, must be given as keyword arguments.
The file argument must be an object with a write(string) method; if it is not present or None, sys.stdout will be used.
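As a tiny self-contained demo of the file argument (status.txt is just an illustrative filename):
with open("status.txt", "w") as out:
    # print writes into the file instead of onto the screen
    print("Domain " + "http://example.com/orange/health" + " exists", file=out)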

Related

Write output data to csv

I'm writing a short piece of code in Python to check the status code of a list of URLs. The steps are:
1. Read the URLs from a CSV file.
2. Check the request status code.
3. Write the status code into the CSV next to the checked URL.
The first two steps I've managed to do, but I'm stuck with writing the output of the requests into the same CSV, next to the URLs. Please help.
import urllib.request
import urllib.error
from multiprocessing import Pool

file = open('innovators.csv', 'r', encoding="ISO-8859-1")
urls = file.readlines()

def checkurl(url):
    try:
        conn = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        print('HTTPError: {}'.format(e.code) + ', ' + url)
    except urllib.error.URLError as e:
        print('URLError: {}'.format(e.reason) + ', ' + url)
    else:
        print('200' + ', ' + url)

if __name__ == "__main__":
    p = Pool(processes=1)
    result = p.map(checkurl, urls)
    with open('innovators.csv', 'w') as f:
        for line in file:
            url = ''.join(line)
            checkurl(urls + "," + checkurl)
The .readlines() operation leaves the file object at end-of-file. When you then attempt to loop through the lines of file again without first rewinding it (file.seek(0)) or closing and reopening it, there are no lines remaining. It's always recommended to use the with open(...) as file construct to ensure the file is closed when the operation is finished.
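A minimal illustration of the rewind (assuming innovators.csv exists):
file = open('innovators.csv', 'r', encoding="ISO-8859-1")
urls = file.readlines()   # file position is now at end-of-file
file.seek(0)              # rewind so the lines can be read again
urls_again = file.readlines()
file.close()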
Additionally, there appears to be an error in your input to checkurl. You have added a list (urls) to a string (",") to a function (checkurl).
You probably meant for this section to read
with open('innovators.csv', 'w') as f:
    for line in urls:
        url = ''.join(line.replace('\n', ''))  # readlines leaves a linefeed character at the end of each line
        f.write(url + "," + checkurl(url) + "\n")  # add a newline so each URL gets its own row
The checkurl function should return what you are intending to place into the csv file. You are simply printing to standard output (screen). Thus, replace your checkurl with
def checkurl(url):
    try:
        conn = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        ret = 'HTTPError: {}'.format(e.code)
    except urllib.error.URLError as e:
        ret = 'URLError: {}'.format(e.reason)
    else:
        ret = '200'
    return ret
or something equivalent to your needs.
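Putting the two fixes together, the bottom of the script might look like this (a sketch; output.csv is an illustrative name, chosen so the input file isn't overwritten while its URLs are being checked):
if __name__ == "__main__":
    with open('innovators.csv', 'r', encoding="ISO-8859-1") as file:
        urls = [line.strip() for line in file]  # strip the trailing linefeeds up front
    with open('output.csv', 'w') as f:
        for url in urls:
            f.write(url + "," + checkurl(url) + "\n")  # one line per URL: url,status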
Save the status in a dict and convert it to a DataFrame, then simply send it to a CSV file. str(code.getcode()) will return 200 if the URL is connecting; otherwise urlopen raises an exception, for which I assigned the status '000'. So your CSV file will contain url,200 if the URL is connecting and url,000 if it is not.
import urllib.request
import pandas as pd

status_dict = {}
for line in lines:
    try:
        code = urllib.request.urlopen(line)
        status = str(code.getcode())
        status_dict[line] = status
    except:
        status = "000"
        status_dict[line] = status

df = pd.DataFrame(list(status_dict.items()), columns=['url', 'status'])
df.to_csv('filename.csv', index=False)  # one url,status pair per row
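If pandas feels heavy for this, the standard-library csv module used elsewhere on this page does the same job (a sketch over the status_dict built above):
import csv
with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for url, status in status_dict.items():
        writer.writerow([url, status])  # one row per URL: url,status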

SQL query returns blank output when running inside Python script

I have a Python script that is supposed to loop through a text file and gather the domain as an argument from each line in the text file. Then it is supposed to use the domain as an argument in a SQL query. The issue is that when I pass in domain_name as an argument, the JSON output the script produces is blank. If I set the domain_name argument directly inside the SQL query, the script outputs perfect JSON. As you can see at the top of my script, right below def connect_to_db(), I start to loop through the text file. I'm not sure where in my code the error is occurring, but any assistance would be greatly appreciated!
Code
from __future__ import print_function
try:
    import psycopg2
except ImportError:
    raise ImportError('\n\033[33mpsycopg2 library missing. pip install psycopg2\033[1;m\n')
    sys.exit(1)
import re
import sys
import json
import pprint

DB_HOST = 'crt.sh'
DB_NAME = 'certwatch'
DB_USER = 'guest'

def connect_to_db():
    filepath = 'test.txt'
    with open(filepath) as fp:
        for cnt, domain_name in enumerate(fp):
            print("Line {}: {}".format(cnt, domain_name))
            print(domain_name)
            domain_name = domain_name.rstrip()
            conn = psycopg2.connect("dbname={0} user={1} host={2}".format(DB_NAME, DB_USER, DB_HOST))
            cursor = conn.cursor()
            cursor.execute(
                "SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate) FROM certificate c, certificate_identity ci WHERE c.id = ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) = lower('%s') AND x509_notAfter(c.certificate) > statement_timestamp();".format(
                    domain_name))
            unique_domains = cursor.fetchall()
            # print out the records using pretty print
            # note that the NAMES of the columns are not shown, instead just indexes.
            # for most people this isn't very useful so we'll show you how to return
            # columns as a dictionary (hash) in the next example.
            pprint.pprint(unique_domains)
            outfilepath = domain_name + ".json"
            with open(outfilepath, 'a') as outfile:
                outfile.write(json.dumps(unique_domains, sort_keys=True, indent=4))

if __name__ == "__main__":
    connect_to_db()
Don't use format to create your SQL statement. Use %s placeholders (psycopg2's parameter style) and pass a tuple of the values as the second argument to execute, so the driver does the quoting:
cursor.execute('''SELECT c.id, x509_commonName(c.certificate),
                  x509_issuerName(c.certificate) FROM certificate c, certificate_identity ci WHERE
                  c.id = ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) =
                  lower(%s) AND x509_notAfter(c.certificate) > statement_timestamp()''', (domain_name,))
More generically:
cursor.execute('''SELECT columnX FROM tableA WHERE columnY = %s AND columnZ = %s''',
               (desired_columnY_value, desired_columnZ_value))

Python script doesn't work when passing in an argument from a text file

I have a Python script that is supposed to loop through a text file and gather the domain as an argument from each line in the text file. Then it is supposed to use the domain as an argument in a SQL query. The issue is that when I pass in the domain as an argument, the JSON output the script produces is blank. If I set the domain_name argument directly inside the SQL query, the script outputs perfect JSON. As you can see at the top of my script, right below def connect_to_db(), I start to loop through the text file. I'm not sure where in my code the error is occurring, but any assistance would be greatly appreciated!
Code
from __future__ import print_function
try:
    import psycopg2
except ImportError:
    raise ImportError('\n\033[33mpsycopg2 library missing. pip install psycopg2\033[1;m\n')
    sys.exit(1)
import re
import sys
import json
import pprint

DB_HOST = 'crt.sh'
DB_NAME = 'certwatch'
DB_USER = 'guest'

def connect_to_db():
    filepath = 'test.txt'
    with open(filepath) as fp:
        for cnt, domain_name in enumerate(fp):
            print("Line {}: {}".format(cnt, domain_name))
            print(domain_name)
            domain_name = domain_name.rstrip()
            conn = psycopg2.connect("dbname={0} user={1} host={2}".format(DB_NAME, DB_USER, DB_HOST))
            cursor = conn.cursor()
            cursor.execute(
                "SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate) FROM certificate c, certificate_identity ci WHERE c.id = ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) = lower('**domain_name**') AND x509_notAfter(c.certificate) > statement_timestamp();")
            unique_domains = cursor.fetchall()
            # print out the records using pretty print
            # note that the NAMES of the columns are not shown, instead just indexes.
            # for most people this isn't very useful so we'll show you how to return
            # columns as a dictionary (hash) in the next example.
            pprint.pprint(unique_domains)
            outfilepath = domain_name + ".json"
            with open(outfilepath, 'a') as outfile:
                outfile.write(json.dumps(unique_domains, sort_keys=True, indent=4))

if __name__ == "__main__":
    # filepath = 'test.txt'
    # with open(filepath) as fp:
    #     for cnt, domain_name in enumerate(fp):
    #         print("Line {}: {}".format(cnt, domain_name))
    #         print(domain_name)
    #         domain_name = domain_name.rstrip()
    connect_to_db()
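The fix is the same as in the previous question: the literal string '**domain_name**' inside the SQL is never replaced with the variable's value, so the query matches nothing and the JSON output stays blank. A sketch of the parameterized version, using psycopg2's %s placeholder:
cursor.execute(
    "SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate) "
    "FROM certificate c, certificate_identity ci "
    "WHERE c.id = ci.certificate_id AND ci.name_type = 'dNSName' "
    "AND lower(ci.name_value) = lower(%s) "
    "AND x509_notAfter(c.certificate) > statement_timestamp();",
    (domain_name,))  # psycopg2 quotes and substitutes the value safely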

Python loop overwrites previous text written to json file [duplicate]

This question already has answers here:
Difference between modes a, a+, w, w+, and r+ in built-in open function?
(9 answers)
Closed last month.
I have a Python script that performs an SQL query and writes the output of the query to a .json file. However, every time it writes to the JSON file it overwrites the previously written text. I want each SQL query's output to be written to a new and separate .json file. Below is my code that is not working. Any help would be greatly appreciated!
from __future__ import print_function
try:
    import psycopg2
except ImportError:
    raise ImportError('\n\033[33mpsycopg2 library missing. pip install psycopg2\033[1;m\n')
    sys.exit(1)
import re
import sys
import json

DB_HOST = 'crt.sh'
DB_NAME = 'certwatch'
DB_USER = 'guest'
OUTPUT_DIR = "output/"

def connect_to_db(domain_name):
    try:
        conn = psycopg2.connect("dbname={0} user={1} host={2}".format(DB_NAME, DB_USER, DB_HOST))
        cursor = conn.cursor()
        cursor.execute("SELECT ci.NAME_VALUE NAME_VALUE FROM certificate_identity ci WHERE ci.NAME_TYPE = 'dNSName' AND reverse(lower(ci.NAME_VALUE)) LIKE reverse(lower('%{}'));".format(domain_name))
    except:
        print("\n\033[1;31m[!] Unable to connect to the database\n\033[1;m")
    return cursor

def get_unique_emails(cursor, domain_name):
    unique_emails = []
    for result in cursor.fetchall():
        matches = re.findall(r"\'(.+?)\'", str(result))
        for email in matches:
            if email not in unique_emails:
                if "{}".format(domain_name) in email:
                    unique_emails.append(email)
    return unique_emails

def print_unique_emails(unique_emails):
    print("\033[1;32m[+] Total unique emails found: {}\033[1;m".format(len(unique_emails)))
    for unique_email in sorted(unique_emails):
        print(unique_email)

if __name__ == '__main__':
    filepath = 'test.txt'
    with open(filepath) as fp:
        for cnt, domain_name in enumerate(fp):
            print("Line {}: {}".format(cnt, domain_name))
            print(domain_name)
            domain_name = domain_name.rstrip()
            cursor = connect_to_db(domain_name)
            unique_emails = get_unique_emails(cursor, domain_name)
            print_unique_emails(unique_emails)
            outfilepath = OUTPUT_DIR + unique_emails + ".json"
            with open(outfilepath, 'w') as outfile:
                outfile.write(json.dumps(unique_emails, sort_keys=True, indent=4))
You are currently opening the file to write. You want to append to the file. You can do this by changing w to a
with open(outfilepath, 'a') as outfile:
    outfile.write(json.dumps(unique_emails, sort_keys=True, indent=4))
You can read the documentation on open() here.
I think it's because you're not looping when you're writing the JSON file; you have a single write, so it just writes to the one file. So you need to do something like what you did with ... enumerate(fp): make another for loop, looping over each domain, and change your OUTPUT_DIR + unique_emails + ".json" to OUTPUT_DIR + domain_name + ".json".
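Combining both answers, the write at the bottom of the loop could look like this (a sketch; it assumes you want one output file per domain, named after domain_name):
outfilepath = OUTPUT_DIR + domain_name + ".json"  # domain_name is a string; unique_emails is a list
with open(outfilepath, 'a') as outfile:           # 'a' appends in case the same domain appears twice
    outfile.write(json.dumps(unique_emails, sort_keys=True, indent=4))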

Python3 convert result of DB query to individual strings

I have 2 functions in a Python script.
The first one gets the data from a database with a WHERE clause, and the second function uses this data and iterates through the results to download a file.
I can get it to print the results as a list of tuples:
[('mmpc',), ('vmware',), ('centos',), ('redhat',), ('postgresql',), ('drupal',)]
But I need to iterate through each element as a string so the download function can append it onto the URL for the response variable.
Here is the code for the download script which contains the functions:
import requests
import eventlet
import os
import sqlite3

# declare the global variable
active_vuln_type = None

# Get the active vulnerability sets
def GetActiveVulnSets():
    # make the variable global
    global active_vuln_type
    active_vuln_type = con = sqlite3.connect('data/vuln_sets.db')
    cur = con.cursor()
    cur.execute('''SELECT vulntype FROM vuln_sets WHERE active=1''')
    active_vuln_type = cur.fetchall()
    print(active_vuln_type)
    return(active_vuln_type)
    # return str(active_vuln_type)

def ExportList():
    vulnlist = list(active_vuln_type)
    activevulnlist = ""
    for i in vulnlist:
        activevulnlist = str(i)
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        response = requests.get('https://vulners.com/api/v3/archive/collection/?type=' + activevulnlist)
        with open(filepath + '/vuln_files/' + activevulnlist + '.zip', 'wb') as f:
            f.write(response.content)
            f.close()
        return activevulnlist + " - " + str(os.path.getsize(filepath + '/vuln_files/' + activevulnlist + '.zip'))
Currently it creates a corrupt .zip named ('mmpc',).zip, so it is not the actual file, which would be mmpc.zip for the first one. It does not seem to be iterating through the list either, as it only creates the zip file for the first result from the DB, not any of the others, yet print(i) returns [('mmpc',), ('vmware',), ('centos',), ('redhat',), ('postgresql',), ('drupal',)].
There is no traceback, as the script thinks it is working.
The following fixes two issues: 1. converting the query output to an iterable of strings and 2. replacing the return statement with a print function so that the for-loop does not end prematurely.
I have also taken the liberty of removing some redundancies such as closing a file inside a with statement and pointlessly converting a list into a list. I am also calling the GetActiveVulnSets inside the ExportList function. This should eliminate the need to call GetActiveVulnSets outside of function definitions.
import requests
import eventlet
import os
import sqlite3

# declare the global variable
active_vuln_type = None

# Get the active vulnerability sets
def GetActiveVulnSets():
    # make the variable global
    global active_vuln_type
    con = sqlite3.connect('data/vuln_sets.db')
    cur = con.cursor()
    cur.execute('''SELECT vulntype FROM vuln_sets WHERE active=1''')
    active_vuln_type = [x[0] for x in cur]
    print(active_vuln_type)
    return active_vuln_type
    # return str(active_vuln_type)

def ExportList():
    GetActiveVulnSets()
    for i in active_vuln_type:
        activevulnlist = str(i)
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, ".."))
        response = requests.get('https://vulners.com/api/v3/archive/collection/?type=' + activevulnlist)
        with open(filepath + '/vuln_files/' + activevulnlist + '.zip', 'wb') as f:
            f.write(response.content)
        print(activevulnlist + " - " + str(os.path.getsize(filepath + '/vuln_files/' + activevulnlist + '.zip')))
While this may solve the problem you are encountering, I would recommend that you write functions with parameters. This way, you know what each function is supposed to take in as an argument and what it spits out as output. In essence, avoid the usage of global variables if you can. They are hard to debug and quite frankly unnecessary in many use cases.
I hope this helps.
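For what it's worth, here is the tuple-to-string conversion in isolation (hypothetical rows, matching the shape printed in the question):
rows = [('mmpc',), ('vmware',), ('centos',)]  # what cur.fetchall() returns
names = [x[0] for x in rows]                  # ['mmpc', 'vmware', 'centos']
print(names)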
