New to Python/Boto3, so this is a little confusing. I am trying to write AWS Security Hub findings to a csv using csv.writer, but only certain items from the response. I can get the correct columns and rows written to the csv; however, when I loop through the writer it just repeats the same row instead of the other data from the response. I feel like I'm overlooking something simple; any help is appreciated.
def getSecurityHubFindings():
    hub = boto3.client('securityhub')
    findingsList = []
    for key in paginate(hub.get_findings, Filters=filters, PaginationConfig={'MaxItems': MAX_ITEMS}):
        scantype = key['Types']
        str1 = ''.join(scantype)
        port = key['ProductFields']['attributes:2/value']
        vgw = key['ProductFields']['attributes:3/value']
        scantype = key['Types']
        str1 = ''.join(scantype)
        findingAccountId = key['AwsAccountId']
        findingLastObservedAt = key['LastObservedAt']
        findingFirstObservedAt = key['FirstObservedAt']
        findingCreatedAt = key['CreatedAt']
        findingrecommendation = key['Remediation']['Recommendation']
        findingTypes = key['Types']
        InstanceId = key['Resources'][0]['Id']
        findingInstanceId = str(InstanceId)
        findingAppCode = key['Resources'][0]['Tags']['AppCode']
        findingGeneratorId = key['GeneratorId']
        findingProductArn = key['ProductArn']
        findingTitle = key['Title']
        findingsList.append(key)
        if (str1 == 'Software and Configuration Checks/AWS Security Best Practices/Network Reachability - Recognized port reachable from a Peered VPC'):
            vgw = ''
            port = key['ProductFields']['attributes:4/value']
            peeredvpc = key['ProductFields']['attributes:2/value']
        if (str1 == 'Software and Configuration Checks/AWS Security Best Practices/Network Reachability - Recognized port reachable from a Virtual Private Gateway'):
            peeredvpc = ''
        sev = key['Severity']['Product']
        if (sev == 3):
            findingSeverity = 'LOW'
        elif (sev == 6):
            findingSeverity = 'MEDIUM'
        elif (sev == 9):
            findingSeverity = 'HIGH'
        rows = [findingAccountId, findingGeneratorId, findingTitle, findingProductArn, findingSeverity, findingAppCode, findingFirstObservedAt, findingLastObservedAt, findingCreatedAt, findingrecommendation, findingTypes, port, vgw, peeredvpc, findingInstanceId]
        columns = ('Account ID', 'Generator ID', 'Title', 'Product ARN', 'Severity', 'AppCode', 'First Observed At', 'Last Observed At', 'Created At', 'Recommendation', 'Types', 'Port', 'VGW', 'Peered VPC', 'Instance #ID')
        with open(FILE_NAME, mode='w', newline='') as writefile:
            writefile_writer = csv.writer(writefile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
            writefile_writer.writerow(columns)
            i = 0
            while i < MAX_ITEMS:
                writefile_writer.writerow(rows)
                i += 1
    return(findingsList)
The general flow should be:
def getSecurityHubFindings():
    ...
    # Open output file and write header
    columns = ('Account ID', 'Generator ID', 'Title', 'Product ARN', 'Severity', 'AppCode', 'First Observed At', 'Last Observed At', 'Created At', 'Recommendation', 'Types', 'Port', 'VGW', 'Peered VPC', 'Instance #ID')
    with open(FILE_NAME, mode='w', newline='') as writefile:
        writefile_writer = csv.writer(writefile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
        writefile_writer.writerow(columns)
        ## Loop through response
        for key in paginate(...):
            ...
            # (get data here)
            ...
            # Write output
            row = [findingAccountId, findingGeneratorId, findingTitle, findingProductArn, findingSeverity, findingAppCode, findingFirstObservedAt, findingLastObservedAt, findingCreatedAt, findingrecommendation, findingTypes, port, vgw, peeredvpc, findingInstanceId]
            writefile_writer.writerow(row)
You are opening your file inside the for loop on every iteration with the 'w' option, which truncates the file [1] and writes from the beginning, so you're overwriting your csv each time.
The block
while i < MAX_ITEMS:
    writefile_writer.writerow(rows)
    i += 1
also seems wrong: it just writes the same row (even though it's called rows) MAX_ITEMS times. You probably want to open your csv file and write the header names outside of the for loop, and then write a single row on each iteration of the for loop.
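A self-contained sketch of that corrected shape, using a small dummy findings list in place of the real paginated Security Hub response (the field names and file name here are just illustrative):

```python
import csv

# Stand-in for the paginated Security Hub response.
dummy_findings = [
    {'AwsAccountId': '111111111111', 'Title': 'Port 22 reachable'},
    {'AwsAccountId': '222222222222', 'Title': 'Port 3389 reachable'},
]

def write_findings(findings, file_name):
    columns = ('Account ID', 'Title')
    # Open the file ONCE, before the loop, so 'w' truncates only once.
    with open(file_name, mode='w', newline='') as writefile:
        writer = csv.writer(writefile, quoting=csv.QUOTE_ALL)
        writer.writerow(columns)
        # One writerow per finding -- no inner while loop needed.
        for finding in findings:
            writer.writerow([finding['AwsAccountId'], finding['Title']])

write_findings(dummy_findings, 'findings.csv')
```

The key point is that open() and the header writerow() happen exactly once, while each finding produces exactly one data row.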
I am doing a university assignment, and since we are excluded from using any web scraping libraries, I am limited to regex. I have the current code written (excuse the poor formatting, I am still very new):
from re import findall
from urllib.request import urlopen

def print_ticket():
    if event.get() == 1:
        web_page = urlopen(url1)
        html_code = web_page.read().decode("UTF-8")
        web_page.close()
        event_title = findall('<h6.*>(.+)</h6>', html_code)[0]
        event_image = findall('<img.* src="([^"]+)".*>', html_code)[4]
        event_url = 'https://suncorpstadium.com.au/what-s-on.aspx'
        event_details = findall('<h7.*>(.+)</h7>', html_code)[1]
        filename = event_title.replace(' ', '_') + '_Ticket.html'
        html_file = open(filename, 'w')
        html_file.write(ticket_template.replace('EVENT TITLE', event_title + ' Ticket'))
        html_file.write(ticket_template.replace('IMAGE', event_image))
        html_file.write(ticket_template.replace('DATE TIME', event_details))
My issue is, every time I run that event in my GUI, my web document prints 3 separate copies of my template, with each .write applying only one of the replacements. Is there a way to make multiple .replaces at once without it printing multiple copies of my template?
The problem is that you are calling write 3 times, and you need to call it just once. Here is what you could do:
ticket_template = ticket_template.replace('EVENT TITLE', event_title + ' Ticket')
ticket_template = ticket_template.replace('IMAGE', event_image)
ticket_template = ticket_template.replace('DATE TIME', event_details)
html_file.write(ticket_template)
That way it will work, and you will only write the final output of ticket_template. You can also reduce this to a one-liner, but it won't be as legible:
html_file.write(ticket_template.replace('EVENT TITLE', event_title + ' Ticket').replace('IMAGE', event_image).replace('DATE TIME', event_details))
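If the number of placeholders grows, a small loop over a dict of replacements keeps this readable; a minimal sketch (the template text and values here are made up):

```python
template = 'EVENT TITLE | IMAGE | DATE TIME'

replacements = {
    'EVENT TITLE': 'Concert Ticket',
    'IMAGE': 'poster.jpg',
    'DATE TIME': '29/05 4:00 PM',
}

# Apply every replacement to ONE copy of the template,
# then write that single result once.
filled = template
for placeholder, value in replacements.items():
    filled = filled.replace(placeholder, value)

print(filled)  # -> Concert Ticket | poster.jpg | 29/05 4:00 PM
```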
You can do it using an "f-string" or Formatted string literal, which was introduced in Python 3.6. To defer its evaluation until the values are available, it can be wrapped in a lambda function whose result is returned when called, as shown in the sample code below.
Note that the variable names used do not have to be ALL_CAPS as shown — I only did it that way to make it easier to spot where they're being used.
ticket_template = lambda: f'''\
Congratulations! Your ticket to {EVENT_TITLE} has been booked!
{IMAGE}
{DATE} {TIME}
'''

filename = 'whatever.html'
with open(filename, 'w') as html_file:
    EVENT_TITLE = 'Some event title'
    IMAGE = 'Picture of event'
    DATE, TIME = '29/05', '4:00 PM'
    filled_in_ticket = ticket_template()  # *Call* the lambda function.
    html_file.write(filled_in_ticket)
print('fini')
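An alternative worth noting: the standard library's string.Template gives the same deferred fill-in without a lambda (a sketch with made-up values, building a string rather than writing a file):

```python
from string import Template

ticket_template = Template(
    'Congratulations! Your ticket to $event_title has been booked!\n'
    '$image\n'
    '$date $time\n'
)

# substitute() fills in the placeholders at call time.
filled_in_ticket = ticket_template.substitute(
    event_title='Some event title',
    image='Picture of event',
    date='29/05',
    time='4:00 PM',
)
print(filled_in_ticket)
```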
I was playing around with the code provided here: https://www.geeksforgeeks.org/update-column-value-of-csv-in-python/ and couldn't figure out how to change the value in a specific column of a row without it raising an error.
Say I wanted to change the status of the row belonging to the name Molly Singh, how would I go about it? I've tried the code below, only to get an error and the CSV file turning out empty. I'd also prefer the solution be without the use of pandas, tysm.
For example the row in the csv file will originally be
Sno Registration Number Name RollNo Status
1 11913907 Molly Singh RK19TSA01 P
What I want the outcome to be
Sno Registration Number Name RollNo Status
1 11913907 Molly Singh RK19TSA01 N
One more question: if I were to alter the value in the Sno column by doing addition/subtraction etc., how would I go about that as well? Thanks!
The error I get, as you can see, is that the name column is changed to True, then False, etc.
import csv

op = open("AllDetails.csv", "r")
dt = csv.DictReader(op)
print(dt)
up_dt = []
for r in dt:
    print(r)
    row = {'Sno': r['Sno'],
           'Registration Number': r['Registration Number'],
           'Name' == "Molly Singh": r['Name'],
           'RollNo': r['RollNo'],
           'Status': 'P'}
    up_dt.append(row)
print(up_dt)
op.close()
op = open("AllDetails.csv", "w", newline='')
headers = ['Sno', 'Registration Number', 'Name', 'RollNo', 'Status']
data = csv.DictWriter(op, delimiter=',', fieldnames=headers)
data.writerow(dict((heads, heads) for heads in headers))
data.writerows(up_dt)
op.close()
Issues
Your error is because the field name in the input file is misspelled as Regristation rather than Registration.
The correction is to just read the names from the input file and propagate them to the output file, as below.
Alternatively, you can change your code to:
headers = ['Sno', 'Regristation Number', 'Name', 'RollNo', 'Status']
"One more question if I were to alter the value in column snow by doing addition/substraction etc how would I go about that as well"
I'm not sure what is meant by this. In the code below you would just have:
r['Sno'] = (some computed value)
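Worth remembering here: csv hands every field back as a string, so arithmetic on Sno needs a conversion first. A tiny illustration (the offset of 100 is arbitrary):

```python
# CSV fields are read as strings; convert before doing math.
r = {'Sno': '1', 'Name': 'Molly Singh'}
r['Sno'] = str(int(r['Sno']) + 100)  # e.g. shift every serial number by 100
print(r['Sno'])  # -> 101
```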
Code
import csv

with open("AllDetails.csv", "r") as op:
    dt = csv.DictReader(op)
    headers = None
    up_dt = []
    for r in dt:
        # get header of input file
        if headers is None:
            headers = r
        # Change status of 'Molly Singh' record
        if r['Name'] == 'Molly Singh':
            r['Status'] = 'N'
        up_dt.append(r)

with open("AllDetails.csv", "w", newline='') as op:
    # Use headers from input file above
    data = csv.DictWriter(op, delimiter=',', fieldnames=headers)
    data.writerow(dict((heads, heads) for heads in headers))
    data.writerows(up_dt)
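A small variation on the same idea: DictReader exposes the input header via .fieldnames, and DictWriter.writeheader() replaces the hand-built dict((heads, heads) ...) row. A sketch using in-memory files instead of AllDetails.csv:

```python
import csv
import io

# In-memory stand-in for AllDetails.csv.
src = io.StringIO('Sno,Name,Status\n1,Molly Singh,P\n2,Jane Doe,P\n')

reader = csv.DictReader(src)
headers = reader.fieldnames          # header row straight from the file
rows = []
for r in reader:
    if r['Name'] == 'Molly Singh':
        r['Status'] = 'N'
    rows.append(r)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=headers)
writer.writeheader()                 # no hand-built header dict needed
writer.writerows(rows)
print(out.getvalue())
```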
I'm new to Python and I have been given a task for the company's HRIS. I have written code that re-writes a raw .csv file, filtering out all other data and making sure that only the first instance of IN and OUT for a person is listed. How can I insert or append the time-in of a person on the very same row of the list?
employeeInfo2 = {'Name': employeeName, 'Date': employeeDate, 'Status': employeeStatus}
if employeeInfo2 not in employeeProfile:
    employeeProfile.append(employeeInfo2)
When I tried putting this line of code just below the code above, the time is displayed on a new line in the csv file, i.e. it is written as a new row:
employeeProfile.append({'Time': employeeTime})
import csv
import string
import datetime
from dateutil.parser import parse
from collections import OrderedDict

employeeProfile = []
with open('newDTR.csv', 'r') as csv_file:
    employee = csv.DictReader(csv_file, delimiter=",")
    for employeeList in employee:
        stringDate = employeeList['Date/Time']
        employeeName = employeeList['Name']
        employeeStatus = employeeList['Status']
        dateTimeObject = datetime.datetime.strptime(stringDate, '%d/%m/%Y %I:%M:%S %p')
        employeeDate = dateTimeObject.strftime("%d/%m/%Y")
        employeeTime = dateTimeObject.strftime("%H:%M:%S")
        parsedTimeOut = parse(employeeTime)
        expected = parsedTimeOut + datetime.timedelta(hours=9)
        timeOut = expected.time()
        employeeInfo2 = {'Name': employeeName, 'Date': employeeDate, 'Status': employeeStatus}
        if employeeInfo2 not in employeeProfile:
            employeeProfile.append(employeeInfo2)

with open('fixedDTR.csv', mode='w', newline='', encoding='utf8') as new_csv_file:
    fieldnames = ['Name', 'Date', 'Status', 'Time', 'Expected Time Out']
    csv_writer = csv.DictWriter(new_csv_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    for b in employeeProfile:
        print(b)
        csv_writer.writerow(b)
I was expecting employeeTime to be aligned with each line of data, but it is not, probably because employeeProfile.append({'Time': employeeTime}) appends a separate element. What would be the best approach?
Well, looking at your code, there isn't an insert for Time, as you are writing b straight from employeeProfile.
Simply put, each csv_writer.writerow(b) call automatically moves down one row in the csv file. You could append your Time key to the dict stored in employeeProfile:
employeeInfo2 = {'Name': employeeName, 'Date': employeeDate, 'Status': employeeStatus}
if employeeInfo2 not in employeeProfile:
    employeeInfo2["Time"] = employeeTime  # or whatever you wanted in the field
    employeeProfile.append(employeeInfo2)
This would add the Time column to your dict, which would then be written nicely by csv_writer.writerow.
Based on your output I am guessing you are writing the time after like this:
csv_writer.writerow(b)
csv_writer.writerow(times)
where times is your dict for the times, which causes the offset, since each writerow call adds a new line to your csv.
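To make the offset concrete, here is a minimal side-by-side: two separate writerow calls produce two lines, while merging into one dict produces a single aligned line (in-memory files, made-up values):

```python
import csv
import io

fieldnames = ['Name', 'Time']

# Offset: two writerow calls -> two lines, Time lands on its own row.
wrong = io.StringIO()
w = csv.DictWriter(wrong, fieldnames=fieldnames)
w.writerow({'Name': 'Alice'})
w.writerow({'Time': '08:00:00'})

# Aligned: merge into one dict -> one line with both fields.
right = io.StringIO()
w = csv.DictWriter(right, fieldnames=fieldnames)
row = {'Name': 'Alice'}
row['Time'] = '08:00:00'
w.writerow(row)

print(wrong.getvalue())  # two lines
print(right.getvalue())  # one line
```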
I have a CSV containing ~600k part numbers to be uploaded to my website's inventory. However, this CSV only contains limited information: we're missing pricing and other related information. To get this information I'm required to make requests to the provider's API and add the results to the CSV. At the moment I've been splitting this part file into 6 pieces and running the script on each of these files simultaneously. If I run one script it will take hours, whereas if I split it up it goes considerably faster.
The Process:
Read Partnumber from CSV
Make request
If errors, continue, and notate error
If inventory, write to inventory.csv with ID and warehouse info
Place part info into results.csv
Onto the next one
I was thinking that I could assign each item a unique ID, have the script request that information, go back into the original csv and finally place the information back into the original document.
How can I utilize the full potential of the system I'm running this script on?
Here's what I've got so far:
import csv
import zeep

wsdl = '#####'
client = zeep.Client(wsdl=wsdl)

def get_data():
    with open('partfile.csv') as f:
        parts = csv.reader(f, delimiter='|')
        with open('results.csv', 'w+') as outfile:
            with open('inventory.csv', 'w+') as inventoryfile:
                output = csv.writer(outfile, delimiter=',')
                inventoryoutput = csv.writer(inventoryfile, delimiter=',')
                inventoryoutput.writerow([
                    'ID',
                    'WarehouseNumber',
                    'WarehouseName',
                    'QuantityAvailable'
                ])
                # Header Row
                output.writerow([
                    'ID',
                    'Make',
                    'Part Number',
                    'Price',
                    'Dealer Price',
                    'Retail Price',
                    'List Price',
                    'Core Cost',
                    'Part Description',
                    'Is Discontinued',
                    'Is Dropship Only',
                    'Is Refrigerant',
                    'Is Oversize',
                    'Is Hazmat',
                    'Sub Parts',
                    'Cross Reference Parts',
                    'Log',
                    'Total Inventory'
                ])
                itemId = 0
                for row in parts:
                    try:
                        item = client.service.ExactPartLookup('#####', '#####', row[0], row[1])
                        if (item == None):
                            raise Exception('Item is None')
                    except:
                        write_error(row[1])
                        continue
                    item = item.PartInformation_v2[0]
                    totalInventory = 0
                    data = [
                        itemId,
                        item.Make,
                        item.PartNumber,
                        item.Price,
                        item.Dealer,
                        item.Retail,
                        item.List,
                        item.CoreCost,
                        item.PartDescription,
                        item.IsDiscontinued,
                        item.IsDropShipOnly,
                        item.IsRefrigerant,
                        item.IsOversize,
                        item.IsHazmat,
                        item.SubParts,
                        item.CrossReferenceParts,
                        item.Log
                    ]
                    print(item.PartNumber)
                    if (item.Inventory != None):
                        inventory = item.Inventory.InventoryInformation_v2
                        iterator = 0
                        for i in inventory:
                            inventoryoutput.writerow([
                                itemId,
                                inventory[iterator].WarehouseNumber,
                                inventory[iterator].WarehouseName,
                                inventory[iterator].QuantityAvailable
                            ])
                            totalInventory += inventory[iterator].QuantityAvailable
                            iterator += 1
                    data.append(totalInventory)
                    itemId += 1
                    output.writerow(data)

def write_error(partNumber):
    with open("errors.log", "a+") as errorfile:
        errorfile.write("Error! Part Number: " + partNumber + "\n")

get_data()
Please let me know if there is any more information I could provide.
Thank you!
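For what it's worth, since the per-row lookups are network-bound, a thread pool inside a single process is a common way to use the machine more fully; this is only a sketch, with lookup_part standing in for the real client.service.ExactPartLookup call and in-memory output in place of results.csv:

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

def lookup_part(row):
    # Placeholder for the real (slow, network-bound) API call.
    return {'part_number': row[0], 'price': '9.99'}

parts = [['ABC123', 'Make1'], ['DEF456', 'Make2'], ['GHI789', 'Make3']]

results_file = io.StringIO()
writer = csv.writer(results_file)
writer.writerow(['Part Number', 'Price'])

# Threads overlap the waiting time of many API calls;
# map() yields results in the same order as the input.
with ThreadPoolExecutor(max_workers=8) as pool:
    for result in pool.map(lookup_part, parts):
        writer.writerow([result['part_number'], result['price']])

print(results_file.getvalue())
```

Keeping all the writerow calls in the main loop (rather than in the worker threads) avoids having to lock the csv writer.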
The csv file works fine, and so does the dictionary, but I can't seem to check the values in the csv file to make sure I'm not adding duplicate entries. How can I check this? The code I tried is below:
def write_csv():
    csvfile = csv.writer(open("address.csv", "a"))
    check = csv.reader(open("address.csv"))
    for item in address2:
        csvfile.writerow([address2[items]['address']['value'], address2[items]['address']['count'], items, datetime.datetime.now()])

def check_csv():
    check = csv.reader(open("address.csv"))
    csvfile = csv.writer(open("address.csv", "a"))
    for stuff in address2:
        address = address2[str(stuff)]['address']['value']
        for sub in check:
            if sub[0] == address:
                print "equals"
                try:
                    address2[stuff]['delete'] = True
                except:
                    address2[stuff]['delete'] = True
            else:
                csvfile.writerow([address2[stuff]['address']['value'], address2[stuff]['address']['count'], stuff, datetime.datetime.now()])
Any ideas?
Your CSV and dict structures are a little wonky - I'd love to know whether they are fixed or whether you can change them to be more useful. Here is an example that does basically what you want - you'll have to change some things to fit your format. The most important change is probably not writing to a file that you are reading - that is going to lead to headaches.
This does what you asked with the delete flag - is there an external need for it? If not, there is almost certainly a better way (removing the bad rows, saving the good rows somewhere else, etc. - depends on what you are doing).
Anyway, here is the example. I used just the commented block to create the csv file in the first place, then added the new address to the list and ran the rest. Instead of looping through the file over and over it makes a lookup dict by address and stores the row number, which it then uses to update the delete flag if it is found when it reads the csv file. You'll want to take the prints out and uncomment the last line to actually write the new rows.
import csv, datetime

addresses = [
    {'address': {'value': '123 road', 'count': 1}, 'delete': False},
    {'address': {'value': '456 road', 'count': 1}, 'delete': False},
    {'address': {'value': '789 road', 'count': 1}, 'delete': False},
    {'address': {'value': '1 new road', 'count': 1}, 'delete': False},
]

now = datetime.datetime.now()

### create the csv
##with open('address.csv', 'wb') as csv_file:
##    writer = csv.writer(csv_file)
##    for row in addresses:
##        writer.writerow([ row['address']['value'], row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])

# make lookup keys for the dict
address_lookup = {}
for i in range(len(addresses)):
    address_row = addresses[i]
    address_lookup[address_row['address']['value']] = i

# read csv once
with open('address.csv', 'rb') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        print row
        # if address is found in the dict, set delete flag to true
        if row[0] in address_lookup:
            print 'flagging address as old: %s' % row[0]
            addresses[ address_lookup[row[0]] ]['delete'] = True

with open('address.csv', 'ab') as csv_file:
    # go back through addresses and add any that shouldnt be deleted to the csv
    writer = csv.writer(csv_file)
    for address_row in addresses:
        if address_row['delete'] is False:
            print 'adding row: '
            print address_row
            #writer.writerow([ row['address']['value'], row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])
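On Python 3, the duplicate check itself can be reduced to one pass that loads the existing addresses into a set and appends only unseen ones; a sketch with in-memory files and made-up rows:

```python
import csv
import io

# Existing contents of address.csv (first column is the address).
existing_csv = io.StringIO('123 road,1,2024-01-01\r\n456 road,1,2024-01-01\r\n')
seen = {row[0] for row in csv.reader(existing_csv) if row}

new_addresses = ['456 road', '789 road']

appended = io.StringIO()
writer = csv.writer(appended)
for address in new_addresses:
    # Skip anything already in the file instead of re-reading it per entry.
    if address in seen:
        continue
    writer.writerow([address, 1])
    seen.add(address)

print(appended.getvalue())
```

Membership tests against a set are O(1), so this stays fast even as the csv grows, and the file is only ever read once.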