The csv file works fine. So does the dictionary, but I can't seem to check the values in the csv file to make sure I'm not adding duplicate entries. How can I check this? The code I tried is below:
def write_csv():
    csvfile = csv.writer(open("address.csv", "a"))
    check = csv.reader(open("address.csv"))
    for items in address2:
        csvfile.writerow([address2[items]['address']['value'], address2[items]['address']['count'], items, datetime.datetime.now()])

def check_csv():
    check = csv.reader(open("address.csv"))
    csvfile = csv.writer(open("address.csv", "a"))
    for stuff in address2:
        address = address2[str(stuff)]['address']['value']
        for sub in check:
            if sub[0] == address:
                print "equals"
                try:
                    address2[stuff]['delete'] = True
                except:
                    address2[stuff]['delete'] = True
            else:
                csvfile.writerow([address2[stuff]['address']['value'], address2[stuff]['address']['count'], stuff, datetime.datetime.now()])
Any ideas?
Your CSV and dict structures are a little wonky - I'd love to know whether that is set in stone or whether you can change them to be more useful. Here is an example that does basically what you want -- you'll have to change some things to fit your format. The most important change is probably not writing to a file that you are reading - that will lead to headaches.
This does what you asked with the delete flag -- is there an external need for it? If not, there is almost certainly a better way (removing the bad rows, saving the good rows somewhere else, etc., depending on what you are doing -- see the set-based sketch after the example below).
Anyway, here is the example. I used just the commented block to create the csv file in the first place, then added the new address to the list and ran the rest. Instead of looping through the file over and over, it builds a lookup dict keyed by address that stores the row index, which it then uses to set the delete flag when that address turns up while reading the csv file. You'll want to take the prints out and uncomment the last line to actually write the new rows.
import csv, datetime

addresses = [
    {'address': {'value': '123 road', 'count': 1}, 'delete': False},
    {'address': {'value': '456 road', 'count': 1}, 'delete': False},
    {'address': {'value': '789 road', 'count': 1}, 'delete': False},
    {'address': {'value': '1 new road', 'count': 1}, 'delete': False},
]

now = datetime.datetime.now()

### create the csv
##with open('address.csv', 'wb') as csv_file:
##    writer = csv.writer(csv_file)
##    for row in addresses:
##        writer.writerow([ row['address']['value'], row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])

# make lookup keys for the dict
address_lookup = {}
for i in range(len(addresses)):
    address_row = addresses[i]
    address_lookup[address_row['address']['value']] = i

# read csv once
with open('address.csv', 'rb') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        print row
        # if address is found in the dict, set delete flag to true
        if row[0] in address_lookup:
            print 'flagging address as old: %s' % row[0]
            addresses[ address_lookup[row[0]] ]['delete'] = True

with open('address.csv', 'ab') as csv_file:
    # go back through addresses and add any that shouldn't be deleted to the csv
    writer = csv.writer(csv_file)
    for address_row in addresses:
        if address_row['delete'] is False:
            print 'adding row: '
            print address_row
            #writer.writerow([ address_row['address']['value'], address_row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])
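If the delete flag isn't needed elsewhere, a set of addresses already in the file is enough to skip duplicates. Here is a minimal sketch of that alternative, assuming the same file name and dict layout as the example above:

import csv, datetime

# read the addresses already written to the csv into a set
existing = set()
with open('address.csv', 'rb') as csv_file:
    for row in csv.reader(csv_file):
        existing.add(row[0])          # column 0 holds the address value

# append only rows whose address has not been seen yet
now = datetime.datetime.now()
with open('address.csv', 'ab') as csv_file:
    writer = csv.writer(csv_file)
    for address_row in addresses:
        value = address_row['address']['value']
        if value not in existing:
            writer.writerow([value, address_row['address']['count'],
                             now.strftime('%Y-%m-%d %H:%M:%S')])
            existing.add(value)       # also guards against duplicates within the list itself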
Related
I got a list in Python with Twitter user information and exported it with Pandas to an Excel file.
Each row is one Twitter user with nearly all of the user's information (name, #-tag, location, etc.).
Here is my code to create the list and fill it with the user data:
def get_usernames(userids, api):
    fullusers = []
    u_count = len(userids)
    try:
        for i in range(int(u_count/100) + 1):
            end_loc = min((i + 1) * 100, u_count)
            fullusers.extend(
                api.lookup_users(user_ids=userids[i * 100:end_loc])
            )
        print('\n' + 'Done! We found ' + str(len(fullusers)) + ' follower in total for this account.' + '\n')
        return fullusers
    except:
        import traceback
        traceback.print_exc()
        print('Something went wrong, quitting...')
The only problem is that every row is a JSON object and therefore one long comma-separated string. I would like to create headers (no problem with Pandas) and only write parts of the string (e.g. ID or name) to columns.
Here is an example of a row from my output.xlsx:
User(_api=<tweepy.api.API object at 0x16898928>, _json={'id': 12345, 'id_str': '12345', 'name': 'Jane Doe', 'screen_name': 'jdoe', 'location': 'Nirvana, NI', 'description': 'Just some random descrition')
I have two ideas, but I don't know how to realize them due to my lack of skills and experience with Python.
Create a loop which saves certain parts ('id', 'name', etc.) from the JSON string into columns.
Cut off the User(_api=<tweepy.api. API object at 0x16898928>, _json={ at the beginning and ) at the end, so that I can export the file as CSV.
Could anyone help me out with one of my two solutions or suggest a "simple" way to do this?
FYI: I want to do this to gather data for my thesis.
Try the Python json library:
import json

jsonstring = '{"id": 12345, "id_str": "12345", "name": "Jane Doe", "screen_name": "jdoe", "location": "Nirvana, NI", "description": "Just some random description"}'
jsondict = json.loads(jsonstring)
# type(jsondict) == dict
Now you can just extract the data you want from it:
id = jsondict["id"]
name = jsondict["name"]
newdict = {"id":id,"name":name}
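Since the sample output shows each tweepy User carries its raw data on _json, here is a hedged sketch (assuming fullusers is the list returned by get_usernames above) that keeps only the wanted fields and writes them as columns with pandas:

import pandas as pd

wanted = ('id', 'name', 'screen_name', 'location', 'description')
# one small dict per user, keeping only the fields we want as columns
rows = [{field: user._json.get(field) for field in wanted} for user in fullusers]

df = pd.DataFrame(rows, columns=wanted)
df.to_excel('output.xlsx', index=False)   # or df.to_csv('output.csv', index=False)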
New to Python/Boto3, so this is a little confusing. I am trying to get AWS Security Hub findings written to a CSV using csv.writer, but only certain items from the response. I can get the correct columns and rows written to the CSV; however, when I try to loop through the writer it just repeats the same row rather than the other data from the response. I feel like I'm overlooking something simple; any help is appreciated.
def getSecurityHubFindings():
    hub = boto3.client('securityhub')
    findingsList = []
    for key in paginate(hub.get_findings, Filters=filters, PaginationConfig={'MaxItems': MAX_ITEMS}):
        scantype = key['Types']
        str1 = ''.join(scantype)
        port = key['ProductFields']['attributes:2/value']
        vgw = key['ProductFields']['attributes:3/value']
        scantype = key['Types']
        str1 = ''.join(scantype)
        findingAccountId = key['AwsAccountId']
        findingLastObservedAt = key['LastObservedAt']
        findingFirstObservedAt = key['FirstObservedAt']
        findingCreatedAt = key['CreatedAt']
        findingrecommendation = key['Remediation']['Recommendation']
        findingTypes = key['Types']
        InstanceId = key['Resources'][0]['Id']
        findingInstanceId = str(InstanceId)
        findingAppCode = key['Resources'][0]['Tags']['AppCode']
        findingGeneratorId = key['GeneratorId']
        findingProductArn = key['ProductArn']
        findingTitle = key['Title']
        findingsList.append(key)
        if (str1 == 'Software and Configuration Checks/AWS Security Best Practices/Network Reachability - Recognized port reachable from a Peered VPC'):
            vgw = ''
            port = key['ProductFields']['attributes:4/value']
            peeredvpc = key['ProductFields']['attributes:2/value']
        if (str1 == 'Software and Configuration Checks/AWS Security Best Practices/Network Reachability - Recognized port reachable from a Virtual Private Gateway'):
            peeredvpc = ''
        sev = key['Severity']['Product']
        if (sev == 3):
            findingSeverity = 'LOW'
        elif (sev == 6):
            findingSeverity = 'MEDIUM'
        elif (sev == 9):
            findingSeverity = 'HIGH'
        rows = [findingAccountId, findingGeneratorId, findingTitle, findingProductArn, findingSeverity, findingAppCode, findingFirstObservedAt, findingLastObservedAt, findingCreatedAt, findingrecommendation, findingTypes, port, vgw, peeredvpc, findingInstanceId]
        columns = ('Account ID', 'Generator ID', 'Title', 'Product ARN', 'Severity', 'AppCode', 'First Observed At', 'Last Observed At', 'Created At', 'Recommendation', 'Types', 'Port', 'VGW', 'Peered VPC', 'Instance #ID')
        with open(FILE_NAME, mode='w', newline='') as writefile:
            writefile_writer = csv.writer(writefile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
            writefile_writer.writerow(columns)
            i = 0
            while i < MAX_ITEMS:
                writefile_writer.writerow(rows)
                i += 1
    return(findingsList)
The general flow should be:
def getSecurityHubFindings():
    ...
    # Open output file and write header
    columns = ('Account ID', 'Generator ID', 'Title', 'Product ARN', 'Severity', 'AppCode', 'First Observed At', 'Last Observed At', 'Created At', 'Recommendation', 'Types', 'Port', 'VGW', 'Peered VPC', 'Instance #ID')
    with open(FILE_NAME, mode='w', newline='') as writefile:
        writefile_writer = csv.writer(writefile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
        writefile_writer.writerow(columns)

        ## Loop through response
        for key in paginate(...):
            ...
            # (get data here)
            ...
            # Write output
            row = [findingAccountId, findingGeneratorId, findingTitle, findingProductArn, findingSeverity, findingAppCode, findingFirstObservedAt, findingLastObservedAt, findingCreatedAt, findingrecommendation, findingTypes, port, vgw, peeredvpc, findingInstanceId]
            writefile_writer.writerow(row)
You are opening your file within the for loop every time with the 'w' option, which truncates the file [1] and writes from the beginning, so you're overwriting your csv each time.
The block
    while i < MAX_ITEMS:
        writefile_writer.writerow(rows)
        i += 1
also seems wrong: it just writes the same row (even though it's called rows) MAX_ITEMS times. You probably want to open your csv file and write the header names outside of the for loop, and then write a single row for each iteration of the for loop.
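For reference, a minimal self-contained sketch of that pattern (the column names and stand-in data here are placeholders, not the full Security Hub record from the question):

import csv

def write_findings(findings, file_name='findings.csv'):
    columns = ('Account ID', 'Title', 'Severity')
    # open once and write the header once ...
    with open(file_name, mode='w', newline='') as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        writer.writerow(columns)
        # ... then write exactly one row per finding inside the loop
        for finding in findings:
            writer.writerow([finding.get('AwsAccountId'),
                             finding.get('Title'),
                             finding.get('Severity', {}).get('Product')])

write_findings([{'AwsAccountId': '111122223333', 'Title': 'Open port reachable',
                 'Severity': {'Product': 9}}])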
I'm new to Python and have been given a task for the company's HRIS. I have written code that re-writes a raw .csv file, filtering out other data so that only the first IN and OUT instance for each person is listed. How can I insert or append a person's time-in so it ends up on the very same row?
employeeInfo2 = {'Name': employeeName, 'Date': employeeDate, 'Status': employeeStatus}
if employeeInfo2 not in employeeProfile:
    employeeProfile.append(employeeInfo2)
When I tried putting this line of code just below the code above, the time is written to the csv file on a new row:
employeeProfile.append({'Time': employeeTime})
import csv
import string
import datetime
from dateutil.parser import parse
from collections import OrderedDict

employeeProfile = []

with open('newDTR.csv', 'r') as csv_file:
    employee = csv.DictReader(csv_file, delimiter=",")
    for employeeList in employee:
        stringDate = employeeList['Date/Time']
        employeeName = employeeList['Name']
        employeeStatus = employeeList['Status']
        dateTimeObject = datetime.datetime.strptime(stringDate, '%d/%m/%Y %I:%M:%S %p')
        employeeDate = dateTimeObject.strftime("%d/%m/%Y")
        employeeTime = dateTimeObject.strftime("%H:%M:%S")
        parsedTimeOut = parse(employeeTime)
        expected = parsedTimeOut + datetime.timedelta(hours=9)
        timeOut = expected.time()
        employeeInfo2 = {'Name': employeeName, 'Date': employeeDate, 'Status': employeeStatus}
        if employeeInfo2 not in employeeProfile:
            employeeProfile.append(employeeInfo2)

with open('fixedDTR.csv', mode='w', newline='', encoding='utf8') as new_csv_file:
    fieldnames = ['Name', 'Date', 'Status', 'Time', 'Expected Time Out']
    csv_writer = csv.DictWriter(new_csv_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    for b in employeeProfile:
        print(b)
        csv_writer.writerow(b)
I was expecting employeeTime to be aligned with each line of data, but it is not, probably because employeeProfile.append({'Time': employeeTime}) adds a separate element. What is the best approach?
Well, looking at your code, there isn't an insert for Time, since you are only writing b from employeeProfile.
Simply put, each call to csv_writer.writerow(b) writes one row of the csv file. You could add your Time key to the dict stored in employeeProfile:
employeeInfo2 = {'Name': employeeName, 'Date': employeeDate, 'Status': employeeStatus}
if employeeInfo2 not in employeeProfile:
    employeeInfo2["Time"] = employeeTime  # or whatever you wanted in the field
    employeeProfile.append(employeeInfo2)
This would add the Time column to your dict, which would then be written nicely by csv_writer.writerow.
Based on your output I am guessing you are writing the time after like this:
csv_writer.writerow(b)
csv_writer.writerow(times)
where times is your dict for the times, which causes the offset, since each writerow call produces its own line in your csv.
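For reference, a small self-contained sketch (with made-up rows) of how the Time key carried on each dict comes out as its own column when each profile is written exactly once with csv.DictWriter:

import csv

# made-up example data; in the real script these dicts come from employeeProfile
employeeProfile = [
    {'Name': 'A. Cruz', 'Date': '01/07/2019', 'Status': 'C/In', 'Time': '08:05:00'},
    {'Name': 'B. Reyes', 'Date': '01/07/2019', 'Status': 'C/In', 'Time': '08:31:12'},
]

with open('fixedDTR.csv', mode='w', newline='', encoding='utf8') as new_csv_file:
    fieldnames = ['Name', 'Date', 'Status', 'Time', 'Expected Time Out']
    csv_writer = csv.DictWriter(new_csv_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    for profile in employeeProfile:
        # 'Time' lands in its own column; fields missing from the dict are left blank
        csv_writer.writerow(profile)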
I have a need to add entries to a dictionary with the following keys:
name
element
type
I want each entry to be appended to a JSON file, where I will access them for another piece of the project.
What I have below technically works, but there are a couple of things (at least) wrong with it.
First, it doesn't prevent duplicates from being entered. For example, I can have 'xyz', '4444' and 'test2' appear as JSON entries multiple times. Is there a way to correct this?
Second, is there a cleaner way to write the actual data-entry piece, so the values aren't passed directly in the parentheses when I enter them into the dictionary?
Finally, is there a better place to put the JSON piece? Should it be inside the function?
Just trying to clean this up a bit. Thanks
import json

element_dict = {}

def add_entry(name, element, type):
    element_dict["name"] = name
    element_dict["element"] = element
    element_dict["type"] = type
    return element_dict

#add entry
entry = add_entry('xyz', '4444', 'test2')

#export to JSON
with open('elements.json', 'a', encoding="utf-8") as file:
    x = json.dumps(element_dict, indent=4)
    file.write(x + '\n')
There are several questions here. The main points worth mentioning:
You can use a list to hold your arguments and unpack them with *args when you supply them to add_entry.
To check for / avoid duplicates, you can use a set to track items already added.
For writing to JSON, since you now have a list, you can simply iterate over it and write everything in one function at the end.
Putting these aspects together:
import json

res = []
seen = set()

def add_entry(res, name, element, type):
    # check if in seen set
    if (name, element, type) in seen:
        return res
    # add to seen set
    seen.add((name, element, type))
    # append to results list
    res.append({'name': name, 'element': element, 'type': type})
    return res

args = ['xyz', '4444', 'test2']
res = add_entry(res, *args)   # add entry - SUCCESS
res = add_entry(res, *args)   # try to add again - FAIL

args2 = ['wxy', '3241', 'test3']
res = add_entry(res, *args2)  # add another - SUCCESS
Result:
print(res)
[{'name': 'xyz', 'element': '4444', 'type': 'test2'},
{'name': 'wxy', 'element': '3241', 'type': 'test3'}]
Writing to JSON via a function:
def write_to_json(lst, fn):
    with open(fn, 'a', encoding='utf-8') as file:
        for item in lst:
            x = json.dumps(item, indent=4)
            file.write(x + '\n')
#export to JSON
write_to_json(res, 'elements.json')
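One caveat worth noting: appending several pretty-printed objects this way means the file as a whole is no longer a single valid JSON document. If the entries need to be read back later, a hedged alternative (an addition here, not part of the original answer) is one compact object per line, i.e. JSON Lines:

import json

def write_jsonl(lst, fn):
    # append one compact JSON object per line
    with open(fn, 'a', encoding='utf-8') as file:
        for item in lst:
            file.write(json.dumps(item) + '\n')

def read_jsonl(fn):
    # read the file back into a list of dicts
    with open(fn, encoding='utf-8') as file:
        return [json.loads(line) for line in file if line.strip()]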
You can try it this way:
import json
import hashlib

def add_entry(name, element, type):
    # key each entry by a hash of its fields so re-adding the same values overwrites rather than duplicates
    return {hashlib.md5((name + element + type).encode()).hexdigest(): {"name": name, "element": element, "type": type}}

#add entry
entry = add_entry('xyz', '4444', 'test2')

#Update to JSON
with open('my_file.json', 'r') as f:
    json_data = json.load(f)
    print(json_data.values())  # View previous entries
    json_data.update(entry)

with open('elements.json', 'w') as f:
    f.write(json.dumps(json_data))
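If my_file.json does not exist yet, the read step above fails. A small guard (an assumption on my part, not part of the original answer) is to start from an empty dict:

import json, os

# hypothetical guard: fall back to an empty dict when the file is missing
json_data = {}
if os.path.exists('my_file.json'):
    with open('my_file.json', 'r') as f:
        json_data = json.load(f)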
I have a list of multiple dictionaries (as JSON). I have a list of values, and based on each value I want the JSON object that matches it. For example:
[{'content_type': 'Press Release',
  'content_id': '1',
  'Author': 'John'},
 {'content_type': 'editorial',
  'content_id': '2',
  'Author': 'Harry'},
 {'content_type': 'Article',
  'content_id': '3',
  'Author': 'Paul'}]
I want to fetch the complete object where the author is Paul.
This is the code I have made so far.
import json

newJson = "testJsonNewInput.json"
ListForNewJson = []

def testComparision(newJson, oldJson):
    with open(newJson, mode='r') as fp_n:
        json_data_new = json.load(fp_n)
        for jData_new in json_data_new:
            ListForNewJson.append(jData_new['Author'])
If any other information is required, please ask.
Case 1
One-time access
It is perfectly alright to read your data and iterate over it, returning the first match found.
import json

def access(path, author):
    with open(path) as f:
        data = json.load(f)
        for d in data:
            if d['Author'] == author:
                return d
        else:
            return 'Not Found'
Case 2
Repeated access
In this instance, it would be wise to reshape your data in such a way that accessing objects by author names is much faster (think dictionaries!).
For example, one possible option would be:
with open(file) as f:
    data = json.load(f)

newData = {}
for d in data:
    newData[d['Author']] = d
Now, define a function and pass your pre-loaded data along with a list of author names.
def access(myData, author_list):
    for a in author_list:
        yield myData.get(a)
The function is called like this:
for i in access(newData, ['Paul', 'John', ...]):
    print(i)
Alternatively, store the results in a list r. The list(...) call is necessary because a function that yields returns a generator object, which you must exhaust by iterating over it.
r = list(access(newData, [...]))
Why not do something like this? It should be fast, and you will not have to load the authors that won't be searched.
alreadyknown = {}

list_of_obj = [{'content_type': 'Press Release',
                'content_id': '1',
                'Author': 'John'},
               {'content_type': 'editorial',
                'content_id': '2',
                'Author': 'Harry'},
               {'content_type': 'Article',
                'content_id': '3',
                'Author': 'Paul'}]

def func(author):
    # cache results so each author is only looked up once
    if author not in alreadyknown:
        obj = get_obj(author)
        alreadyknown[author] = obj
    return alreadyknown[author]

def get_obj(auth):
    return [obj for obj in list_of_obj if obj['Author'] == auth]

print(func('Paul'))