How to initialise a data structure once in a Python loop

I am trying to build up a data structure from a CSV file. The CSV file contents are below.
‘Windows 8’,10.1.1.1,’Windows 8 Server’,’SiteA’
‘Windows 8’,10.2.2.2,’Windows 8 Server’,’SiteB’
‘Cisco Router,’172.16.1.1’,’Cisco Router 881’,’SiteA’
‘Cisco Router,’172.16.1.3’,’Cisco Router 881’,’SiteC’
‘Cisco Router,’172.16.1.4’,’Cisco Router 881’,’SiteB’
I am trying to group the data by Device Type, then Site, and have a list of the common IP addresses along with the description.
The problem I am having is that I cannot work out how to ensure I am only initialising the various parts of the data structure once.
Below is my code.
import csv
import pprint

data = {}
pp = pprint.PrettyPrinter(indent=4)

f = open('/Users/marcos/Desktop/vulns/data.csv', 'rt')
try:
    reader = csv.reader(f)
    for row in reader:
        product = row[0]
        ip = row[1]
        description = row[2]
        site = row[3]
        try:
            data[product][site]['ipaddresses'].append(ip)
            data[product][site]['description'] = description
        except:
            data[product] = {}
            data[product][site] = {}
            data[product][site]['ipaddresses'] = []
            data[product][site]['ipaddresses'].append(ip)
            data[product][site]['description'] = description
finally:
    f.close()

pp.pprint(data)
What I am currently getting is the following, which I believe is because my except block is always triggering:
{   '‘Cisco Router': {   '’SiteB’': {   'description': '’Cisco Router 881’',
                                        'ipaddresses': ['’172.16.1.4’']}},
    '‘Windows 8’': {   '’SiteB’': {   'description': '’Windows 8 Server’',
                                      'ipaddresses': ['10.2.2.2']}}}

Raising an exception is useful in showing what is actually wrong. When I did this I saw KeyErrors, so I used this approach:
try:
    reader = csv.reader(f)
    for row in reader:
        product = row[0]
        ip = row[1]
        description = row[2]
        site = row[3]
        try:
            if product not in data:
                data[product] = {}
            if site not in data[product]:
                data[product][site] = {}
            if 'description' not in data[product][site]:
                data[product][site]['description'] = description
            if 'ipaddresses' not in data[product][site]:
                data[product][site]['ipaddresses'] = []
            data[product][site]['ipaddresses'].append(ip)
            data[product][site]['description'] = description
        except Exception as e:
            raise
finally:
    f.close()

pp.pprint(data)
Notice that I am creating any keys, lists, or dicts that are needed before trying to work with them.
This gives me the following output:
{   'Cisco Router': {   'SiteA': {   'description': 'Cisco Router 881',
                                     'ipaddresses': ['172.16.1.1']},
                        'SiteB': {   'description': 'Cisco Router 881',
                                     'ipaddresses': ['172.16.1.4']},
                        'SiteC': {   'description': 'Cisco Router 881',
                                     'ipaddresses': ['172.16.1.3']}},
    'Windows 8': {   'SiteA': {   'description': 'Windows 8 Server',
                                  'ipaddresses': ['10.1.1.1']},
                     'SiteB': {   'description': 'Windows 8 Server',
                                  'ipaddresses': ['10.2.2.2']}}}
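If you want to avoid the membership checks altogether, a collections.defaultdict can do the initialisation for you. Here is a minimal sketch of that alternative (not the code above), assuming every row has exactly the four columns product, ip, description, site:

import csv
from collections import defaultdict
from pprint import pprint

# Missing products and sites are created automatically on first access
data = defaultdict(lambda: defaultdict(lambda: {'ipaddresses': [], 'description': None}))

with open('/Users/marcos/Desktop/vulns/data.csv', 'rt') as f:
    for product, ip, description, site in csv.reader(f):
        data[product][site]['ipaddresses'].append(ip)
        data[product][site]['description'] = description

pprint(data)  # prints defaultdicts, but the nesting is the same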

Here is an approach using the .setdefault method. When used in a loop it does exactly what you're asking for: it initialises the value if the key does not exist, otherwise it returns the stored value.
I personally like it, but I can see how others don't, because it makes nested lookups a bit harder to read. It's a matter of taste:
reader = """
‘Windows 8’,10.1.1.1,’Windows 8 Server’,’SiteA’
‘Windows 8’,10.2.2.2,’Windows 8 Server’,’SiteB’
‘Cisco Router,’172.16.1.1’,’Cisco Router 881’,’SiteA’
‘Cisco Router,’172.16.1.3’,’Cisco Router 881’,’SiteC’
‘Cisco Router,’172.16.1.4’,’Cisco Router 881’,’SiteB’
"""
reader = [line.split(',') for line in reader.replace("'", '').strip().split('\n')]

data = {}
for row in reader:
    product, ip, description, site = row[:4]
    site_data = data.setdefault(product, {}).setdefault(site, {})
    site_data.setdefault('ipaddresses', []).append(ip)
    site_data['description'] = description

import pprint
pprint.pprint(data)
Prints:
{'‘Cisco Router': {'’SiteA’': {'description': '’Cisco Router 881’',
                               'ipaddresses': ['’172.16.1.1’']},
                   '’SiteB’': {'description': '’Cisco Router 881’',
                               'ipaddresses': ['’172.16.1.4’']},
                   '’SiteC’': {'description': '’Cisco Router 881’',
                               'ipaddresses': ['’172.16.1.3’']}},
 '‘Windows 8’': {'’SiteA’': {'description': '’Windows 8 Server’',
                             'ipaddresses': ['10.1.1.1']},
                 '’SiteB’': {'description': '’Windows 8 Server’',
                             'ipaddresses': ['10.2.2.2']}}}

This seems like a useful time to use pandas.
import pandas as pd

# The CSV has no header row, so supply the column names ourselves
df = pd.read_csv('path-to-data.csv', header=None)
df.columns = ['product', 'ip', 'description', 'site']

# Create a 'grouped' dataset object
grouped = df.groupby(['product', 'site', 'ip'])

# Create a dataset with a list of unique 'description' values,
# grouped by the columns above
unique_desc_by_group = grouped['description'].aggregate(lambda x: tuple(x))
print(unique_desc_by_group)
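If you then want something closer to the nested dictionary from the question, the grouped data can be folded back into plain Python structures; a rough sketch, assuming the df built above:

# Nested dict: {product: {site: {'ipaddresses': [...], 'description': ...}}}
data = {}
for (product, site), group in df.groupby(['product', 'site']):
    data.setdefault(product, {})[site] = {
        'ipaddresses': group['ip'].tolist(),
        'description': group['description'].iloc[0],
    }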

Related

Parse a txt file and store data into a dictionary

I have a set of data that I would like to extract from a txt file and store in a specific format. The data is currently in a txt file like so:
set firewall family inet filter INBOUND term TEST from source-address 1.1.1.1/32
set firewall family inet filter INBOUND term TEST from destination-prefix-list test-list
set firewall family inet filter INBOUND term TEST from protocol udp
set firewall family inet filter INBOUND term TEST from destination-port 53
set firewall family inet filter INBOUND term TEST then accept
set firewall family inet filter PROD term LAN from source-address 4.4.4.4/32
set firewall family inet filter PROD term LAN from source-address 5.5.5.5/32
set firewall family inet filter PROD term LAN from protocol tcp
set firewall family inet filter PROD term LAN from destination-port 443
set firewall family inet filter PROD term LAN then deny
I would like the data to be structured so that each rule has its respective options placed into a dictionary and appended to a list. For example:
Expected Output
[{'Filter': 'INBOUND', 'Term': 'TEST', 'SourceIP': '1.1.1.1/32', 'DestinationList': 'test-list', 'Protocol': 'udp', 'DestinationPort': '53', 'Action': 'accept'},
{'Filter': 'PROD', 'Term': 'LAN', 'SourceIP': ['4.4.4.4/32','5.5.5.5/32'], 'Protocol': 'tcp', 'DestinationPort': '443', 'Action': 'deny'}]
As you can see there may be instances where a certain trait does not exist for a rule. I would also have to add multiple IP addresses as a value. I am currently using Regex to match the items in the txt file. My thought was to iterate through each line in the file, find any matches and add them as a key-value pair to a dictionary.
Once I get an "accept" or "deny", that should signal the end of the rule, and I will append the dictionary to the list, clear the dictionary, and start the process with the next rule. However, this does not seem to be working as intended. My Regex seems fine, but I can't seem to figure out the logic when processing each line, adding multiple values to a value list, and adding values to the dictionary. Here is my code below:
import re

data_file = "sample_data.txt"

##### REGEX PATTERNS #####
filter_re = r'(?<=filter\s)(.*)(?=\sterm.)'
term_re = r'(?<=term\s)(.*)(?=\sfrom|\sthen)'
protocol_re = r'(?<=protocol\s)(.*)'
dest_port_re = r'(?<=destination-port\s)(.*)'
source_port_re = r'(?<=from\ssource-port\s)(.*)'
prefix_source_re = r'(?<=from\ssource-prefix-list\s)(.*)'
prefix_dest_re = r'(?<=from\sdestination-prefix-list\s)(.*)'
source_addr_re = r'(?<=source-address\s)(.*)'
dest_addr_re = r'(?<=destination-address\s)(.*)'
action_re = r'(?<=then\s)(deny|accept)'

pattern_list = [filter_re, term_re, source_addr_re, prefix_source_re, source_port_re, dest_addr_re, prefix_dest_re, dest_port_re, protocol_re, action_re]
pattern_headers = ["Filter", "Term", "Source_Address", "Source_Prefix_List", "Source_Port", "Destination_Address," "Destination_Prefix_List", "Destination_Port", "Protocol", "Action"]

final_list = []

def open_file(file):
    rule_dict = {}
    with open(file, 'r') as f:
        line = f.readline()
        while line:
            line = f.readline().strip()
            for header, pattern in zip(pattern_headers, pattern_list):
                match = re.findall(pattern, line)
                if len(match) != 0:
                    if header != 'accept' or header != 'deny':
                        rule_dict[header] = match[0]
                    else:
                        rule_dict[header] = match[0]
                        final.append(rule_dict)
                        rule_dict = {}
    print(rule_dict)
    print(final_list)
The final list is empty, and rule_dict only contains the final rule from the text file, not both of the rulesets. Any guidance would be greatly appreciated.
There are a few little mistakes in your code:

- In your while loop, f.readline() needs to be at the end, otherwise you already begin at line 2 (readline is called twice before doing anything).
- final_list has to be defined in your function and also used correctly (instead of only "final").
- if header != 'accept' or header != 'deny': here needs to be an and. One of them is always True, so the else part never gets executed.
- You need to check the match for accept|deny, not the header.
- For example, in Source_IP you want to have a list with all IPs you find. The way you do it now, the value would always be overwritten and only the last found IP would be in your final_list.
def open_file(file):
    final_list = []
    rule_dict = {}
    with open(file) as f:
        line = f.readline()
        while line:
            line = line.strip()
            for header, pattern in zip(pattern_headers, pattern_list):
                match = re.findall(pattern, line)
                if len(match) != 0:
                    if (match[0] != "accept") and (match[0] != "deny"):
                        rule_dict.setdefault(header, set()).add(match[0])
                    else:
                        rule_dict.setdefault(header, set()).add(match[0])
                        # adjust values of the dict to a list (if multiple values) or a single value (instead of a set) before appending to the list
                        final_list.append({k: (list(v) if len(v) > 1 else v.pop()) for k, v in rule_dict.items()})
                        rule_dict = {}
            line = f.readline()
    print(f"{rule_dict=}")
    print(f"{final_list=}")

open_file(data_file)
Output:
rule_dict={}
final_list=[
    {
        'Filter': 'INBOUND',
        'Term': 'TEST',
        'Source_Address': '1.1.1.1/32',
        'Destination_Prefix_List': 'test-list',
        'Protocol': 'udp',
        'Destination_Port': '53',
        'Action': 'accept'
    },
    {
        'Filter': 'PROD',
        'Term': 'LAN',
        'Source_Address': ['5.5.5.5/32', '4.4.4.4/32'],
        'Protocol': 'tcp',
        'Destination_Port': '443',
        'Action': 'deny'
    }
]
There are a few things that I have changed in your code:

- When "accept" or "deny" is found in the action, append final_dict to final_list and empty final_dict.
- Allow adding more than one SourceIP: turn the SourceIP value into a list when more than one SourceIP is found.
import re

data_file = "/home/hiraltalsaniya/Documents/Hiral/test"

filter_re = r'(?<=filter\s)(.*)(?=\sterm.)'
term_re = r'(?<=term\s)(.*)(?=\sfrom|\sthen)'
protocol_re = r'(?<=protocol\s)(.*)'
dest_port_re = r'(?<=destination-port\s)(.*)'
source_port_re = r'(?<=from\ssource-port\s)(.*)'
prefix_source_re = r'(?<=from\ssource-prefix-list\s)(.*)'
prefix_dest_re = r'(?<=from\sdestination-prefix-list\s)(.*)'
source_addr_re = r'(?<=source-address\s)(.*)'
dest_addr_re = r'(?<=destination-address\s)(.*)'
action_re = r'(?<=then\s)(deny|accept)'

pattern_list = [filter_re, term_re, source_addr_re, prefix_source_re, source_port_re, dest_addr_re, prefix_dest_re,
                dest_port_re, protocol_re, action_re]
pattern_headers = ["Filter", "Term", "SourceIP", "Source_Prefix_List", "Source_Port", "Destination_Address",
                   "DestinationList", "Destination_Port", "Protocol", "Action"]

def open_file(file):
    final_dict: dict = dict()
    final_list: list = list()
    with open(file) as f:
        for line in f:
            for header, pattern in zip(pattern_headers, pattern_list):
                match = re.search(pattern, line)
                if match:
                    # check for accept or deny, which means the end of the rule, then empty the dictionary
                    if str(match.group()) == "accept" or match.group() == "deny":
                        final_list.append(final_dict)
                        final_dict: dict = dict()
                    # if more than one SourceIP then create a list of SourceIPs
                    elif header == "SourceIP" and header in final_dict.keys():
                        final_dict[header] = [final_dict[header]]
                        final_dict.setdefault(header, final_dict[header]).append(match.group())
                    else:
                        final_dict[header] = match.group()
    print("final_list=", final_list)

open_file(data_file)
Output:
final_list= [{'Filter': 'INBOUND',
              'Term': 'TEST',
              'SourceIP': '1.1.1.1/32',
              'DestinationList': 'test-list',
              'Protocol': 'udp',
              'Destination_Port': '53'},
             {'Filter': 'PROD',
              'Term': 'LAN',
              'SourceIP': ['4.4.4.4/32', '5.5.5.5/32'],
              'Protocol': 'tcp',
              'Destination_Port': '443'}]

Importing 5k+ rows in Odoo 12 gives me a timeout

I'm trying to import 5000+ rows in Odoo 12. It's basically a mapping from a CSV, done in a custom method in a module. The problem is that I'm getting a timeout on the request, which happens when writing to the database. I'm using the standard ERP methods create and write.
How can I work around this? I know a bulk insert is not possible here; is there any other solution?
Is a SQL command for the insertion OK to use?
import base64
import csv
import logging
from datetime import datetime

from odoo import api, fields, models


class file_reader(models.TransientModel):
    _name = "rw.file.reader"

    csv_file = fields.Binary(string='CSV File', required=True)

    @api.multi
    def import_csv(self):
        # csv importer handler
        file = base64.b64decode(self.csv_file).decode().split('\n')
        reader = csv.DictReader(file)
        # account.analytic.line
        ignored = []
        time1 = datetime.now()
        self._cr.execute('select id, name from project_project where active = true')
        projects = self._cr.fetchall()
        self._cr.execute('select id, login from res_users')
        users = self._cr.fetchall()
        self._cr.execute('select id, work_email from hr_employee')
        employees = self._cr.fetchall()
        LOG_EVERY_N = 100
        for row in reader:
            project_name = row['Project - Name']
            email = row['User - Email Address']
            project = [item for item in projects if item[1] == project_name]
            if len(project) > 0:
                user = [item for item in users if item[1] == email]
                employee = [item for item in employees if item[1] == email]
                if len(user) > 0 and len(employee) > 0:
                    task = self.env['project.task'].search([['user_id', '=', user[0][0]],
                                                            ['project_id', '=', project[0][0]]], limit=1)
                    if task:
                        y = row['Duration'].split(':')
                        i, j = y[0], y[1]
                        model = {
                            'project_id': project[0][0],
                            'task_id': task['id'],
                            'employee_id': employee[0][0],
                            'user_id': user[0][0],
                            'date': row['Date'],
                            'unit_amount': int(i) + (float(j) / 60),  # Time Spent conversion to float
                            'is_timesheet': True,
                            'billable': True if row['Billable'] == 'Yes' else False,
                            'nexonia_id': row['ID']
                        }
                        time_sheet = self.env['account.analytic.line'].search([['nexonia_id', '=', row['ID']]], limit=1)
                        if time_sheet:
                            model.update({'id': time_sheet.id})
                            self.env['account.analytic.line'].sudo().write(model)
                        else:
                            self.env['account.analytic.line'].sudo().create(model)
                else:
                    if email not in ignored:
                        ignored.append(email)
            else:
                if project_name not in ignored:
                    ignored.append(project_name)

        all_text = 'Nothing ignored'
        if ignored is not None:
            all_text = "\n".join(filter(None, ignored))

        message_id = self.env['message.wizard'].create({
            'message': "Import data completed",
            'ignored': all_text
        })
        time2 = datetime.now()
        logging.info('total time ------------------------------------------ %s', time2 - time1)
        return {
            'name': 'Successfull',
            'type': 'ir.actions.act_window',
            'view_mode': 'form',
            'res_model': 'message.wizard',
            # pass the id
            'res_id': message_id.id,
            'target': 'new'
        }
I enhanced your code a little bit, because you are searching for each project, user and employee using a loop, for each row, and for 5000+ rows.
Using ORM methods is always good because they handle stored computed fields and Python constraints, but this takes time too.
If you don't have any complex computed fields, you can use an INSERT or UPDATE query; this will speed up the import 100 times.
@api.multi
def import_csv(self):
    # when you use env[model] more than once, extract it to a variable, it's better
    # notice how I added sudo to the name of the variable
    AccountAnalyticLine_sudo = self.env['account.analytic.line'].sudo()
    # csv importer handler
    file = base64.b64decode(self.csv_file).decode().split('\n')
    reader = csv.DictReader(file)
    # account.analytic.line
    ignored = []
    time1 = datetime.now()
    # convert the results to dictionaries for easy access later
    self._cr.execute('select id, name from project_project where active = true order by name')
    projects = {p[1]: p for p in self._cr.fetchall()}
    self._cr.execute('select id, login from res_users order by login')
    users = {u[1]: u for u in self._cr.fetchall()}
    self._cr.execute('select id, work_email from hr_employee order by work_email')
    employees = {emp[1]: emp for emp in self._cr.fetchall()}
    LOG_EVERY_N = 100
    for row in reader:
        project_name = row['Project - Name']
        email = row['User - Email Address']
        # no need for a loop, and the dictionary lookup is very fast
        project = projects.get(project_name)
        if project:
            user = users.get(email)
            employee = employees.get(email)
            if user and employee:
                task = self.env['project.task'].search([('user_id', '=', user[0]),
                                                        ('project_id', '=', project[0])],
                                                       limit=1)
                if task:
                    y = row['Duration'].split(':')
                    i, j = y[0], y[1]
                    # by convention, dictionaries passed to create or write should be named vals or values
                    vals = {
                        'project_id': project[0],
                        'task_id': task['id'],
                        'employee_id': employee[0],
                        'user_id': user[0],
                        'date': row['Date'],
                        'unit_amount': int(i) + (float(j) / 60),  # Time Spent conversion to float
                        'is_timesheet': True,
                        'billable': True if row['Billable'] == 'Yes' else False,
                        'nexonia_id': row['ID']
                    }
                    time_sheet = AccountAnalyticLine_sudo.search([('nexonia_id', '=', row['ID'])], limit=1)
                    # I think adding a logger message here, or create/update counters, would help to know how many records were updated or created
                    if time_sheet:
                        # I think you want to update the existing timesheet record, so do this
                        time_sheet.write(vals)
                        # you were updating an empty RecordSet:
                        # self.env['account.analytic.line'].sudo().write(model)
                    else:
                        # create a new one
                        AccountAnalyticLine_sudo.create(vals)
            else:
                if email not in ignored:
                    ignored.append(email)
        else:
            if project_name not in ignored:
                ignored.append(project_name)

    all_text = 'Nothing ignored'
    # "ignored is not None" is always True because ignored is a list
    if ignored:
        all_text = "\n".join(filter(None, ignored))

    message_id = self.env['message.wizard'].create({
        'message': "Import data completed",
        'ignored': all_text
    })
    time2 = datetime.now()
    logging.info('total time ------------------------------------------ %s', time2 - time1)
    return {
        'name': 'Successfull',
        'type': 'ir.actions.act_window',
        'view_mode': 'form',
        'res_model': 'message.wizard',
        # pass the id
        'res_id': message_id.id,
        'target': 'new'
    }
I hope this helps a little bit, even though the question is about something else; I'm confused, because Odoo usually allows a request to be handled for up to 60 minutes.
While you are importing records through a script, code optimisation is very important. Try to reduce the number of search/read calls by using dictionaries to save each result, or use SQL, which I don't recommend.
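For reference, here is a rough sketch of what that raw SQL path could look like, reusing the same cursor the code above already uses (self._cr.execute) and the column names from the vals dictionary, assumed here to be plain stored columns of account_analytic_line. Keep in mind that writing rows this way bypasses ORM computed fields, constraints and access rights, which is exactly why it is a last resort:

# Inside the CSV loop, instead of AccountAnalyticLine_sudo.create(vals):
self._cr.execute(
    """INSERT INTO account_analytic_line
           (project_id, task_id, employee_id, user_id, date,
            unit_amount, is_timesheet, billable, nexonia_id)
       VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)""",
    (vals['project_id'], vals['task_id'], vals['employee_id'],
     vals['user_id'], vals['date'], vals['unit_amount'],
     vals['is_timesheet'], vals['billable'], vals['nexonia_id']),
)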

How to structure a list with JSON objects in Python?

I got a list in Python with Twitter user information and exported it with Pandas to an Excel file.
One row is one Twitter user, with nearly all of the user's information (name, #-tag, location, etc.).
Here is my code to create the list and fill it with the user data:
def get_usernames(userids, api):
    fullusers = []
    u_count = len(userids)
    try:
        for i in range(int(u_count / 100) + 1):
            end_loc = min((i + 1) * 100, u_count)
            fullusers.extend(
                api.lookup_users(user_ids=userids[i * 100:end_loc])
            )
        print('\n' + 'Done! We found ' + str(len(fullusers)) + ' follower in total for this account.' + '\n')
        return fullusers
    except:
        import traceback
        traceback.print_exc()
        print('Something went wrong, quitting...')
The only problem is that every row is a JSON object and therefore one long comma-separated string. I would like to create headers (no problem with Pandas) and only write parts of the string (i.e. ID or name) to columns.
Here is an example of a row from my output.xlsx:
User(_api=<tweepy.api.API object at 0x16898928>, _json={'id': 12345, 'id_str': '12345', 'name': 'Jane Doe', 'screen_name': 'jdoe', 'location': 'Nirvana, NI', 'description': 'Just some random descrition')
I have two ideas, but I don't know how to realize them due to my lack of skills and experience with Python.
Create a loop which saves certain parts ('id', 'name', etc.) from the JSON string in columns.
Cut off the User(_api=<tweepy.api. API object at 0x16898928>, _json={ at the beginning and ) at the end, so that I may export the file as CSV.
Could anyone help me out with one of my two solutions or suggest a "simple" way to do this?
fyi: I want to do this to gather data for my thesis.
Try the Python json library:
import json

# json.loads needs valid JSON, i.e. double-quoted keys and strings
jsonstring = '{"id": 12345, "id_str": "12345", "name": "Jane Doe", "screen_name": "jdoe", "location": "Nirvana, NI", "description": "Just some random description"}'
jsondict = json.loads(jsonstring)
# type(jsondict) == dict
Now you can just extract the data you want from it:
id = jsondict["id"]
name = jsondict["name"]
newdict = {"id":id,"name":name}
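Since the rows come from tweepy, another option (a sketch, assuming the fullusers list returned by get_usernames above and pandas installed) is to skip string parsing entirely: each tweepy User object carries the raw API payload in its _json attribute, so you can build the DataFrame from those dicts and pick the columns you need:

import pandas as pd

# Each tweepy User object exposes the raw API dict as ._json
records = [u._json for u in fullusers]

# Keep only the fields of interest as columns
df = pd.DataFrame(records)[['id', 'name', 'screen_name', 'location', 'description']]
df.to_excel('output.xlsx', index=False)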

How to select particular JSON object with specific value?

I have a list with multiple dictionaries inside it (as JSON). I have a list of values, and based on each value I want the JSON object for that particular value. For example:
[{'content_type': 'Press Release',
'content_id': '1',
'Author':John},
{'content_type': 'editorial',
'content_id': '2',
'Author': Harry
},
{'content_type': 'Article',
'content_id': '3',
'Author':Paul}]
I want to fetch the complete object where the author is Paul.
This is the code I have made so far:
import json

newJson = "testJsonNewInput.json"
ListForNewJson = []

def testComparision(newJson, oldJson):
    with open(newJson, mode='r') as fp_n:
        json_data_new = json.load(fp_n)
        for jData_new in json_data_new:
            ListForNewJson.append(jData_new['author'])
If any other information is required, please ask.
Case 1
One time access
It is perfectly alright to read your data and iterate over it, returning the first match found.
def access(file, author):
    with open(file) as f:
        data = json.load(f)
        for d in data:
            if d['Author'] == author:
                return d
        else:
            return 'Not Found'
Case 2
Repeated access
In this instance, it would be wise to reshape your data in such a way that accessing objects by author names is much faster (think dictionaries!).
For example, one possible option would be:
with open(file) as f:
    data = json.load(f)
    newData = {}
    for d in data:
        newData[d['Author']] = d
Now, define a function and pass your pre-loaded data along with a list of author names.
def access(myData, author_list):
    for a in author_list:
        yield myData.get(a)
The function is called like this:
for i in access(newData, ['Paul', 'John', ...]):
    print(i)
Alternatively, store the results in a list r. The list(...) is necessary, because yield returns a generator object which you must exhaust by iterating over it.
r = list(access(newData, [...]))
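As a small aside, the reshaping step in Case 2 can also be written as a single dict comprehension, which does the same thing:

newData = {d['Author']: d for d in data}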
Why not do something like this? It should be fast, and you will not have to load the authors that won't be searched.
alreadyknown = {}

list_of_obj = [{'content_type': 'Press Release',
                'content_id': '1',
                'Author': 'John'},
               {'content_type': 'editorial',
                'content_id': '2',
                'Author': 'Harry'},
               {'content_type': 'Article',
                'content_id': '3',
                'Author': 'Paul'}]

def func(author):
    if author not in alreadyknown:
        obj = get_obj(author)
        alreadyknown[author] = obj
    return alreadyknown[author]

def get_obj(auth):
    # compare with == rather than "is": identity checks on strings are unreliable
    return [obj for obj in list_of_obj if obj['Author'] == auth]

print(func('Paul'))

Python Dictionaries & CSV Values | Check CSV

The csv file works fine. So does the dictionary, but I can't seem to check the values in the csv file to make sure I'm not adding duplicate entries. How can I check this? The code I tried is below:
def write_csv():
    csvfile = csv.writer(open("address.csv", "a"))
    check = csv.reader(open("address.csv"))
    for item in address2:
        csvfile.writerow([address2[items]['address']['value'], address2[items]['address']['count'], items, datetime.datetime.now()])

def check_csv():
    check = csv.reader(open("address.csv"))
    csvfile = csv.writer(open("address.csv", "a"))
    for stuff in address2:
        address = address2[str(stuff)]['address']['value']
        for sub in check:
            if sub[0] == address:
                print "equals"
                try:
                    address2[stuff]['delete'] = True
                except:
                    address2[stuff]['delete'] = True
            else:
                csvfile.writerow([address2[stuff]['address']['value'], address2[stuff]['address']['count'], stuff, datetime.datetime.now()])
Any ideas?
Your CSV and dict structures are a little wonky - I'd love to know if they are fixed or if you can change them to be more useful. Here is an example that does basically what you want -- you'll have to change some things to fit your format. The most important change is probably not writing to a file that you are reading - that is going to lead to headaches.
This does what you asked with the delete flag -- is there an external need for it? If not, there is almost certainly a better way (removing the bad rows, saving the good rows somewhere else, etc. - it depends on what you are doing).
Anyway, here is the example. I used just the commented block to create the csv file in the first place, then added the new address to the list and ran the rest. Instead of looping through the file over and over, it makes a lookup dict keyed by address that stores each entry's list index, which it then uses to update the delete flag when the address is found while reading the csv file. You'll want to take the prints out and uncomment the last line to actually write the new rows.
import csv, datetime

addresses = [
    {'address': {'value': '123 road', 'count': 1}, 'delete': False},
    {'address': {'value': '456 road', 'count': 1}, 'delete': False},
    {'address': {'value': '789 road', 'count': 1}, 'delete': False},
    {'address': {'value': '1 new road', 'count': 1}, 'delete': False},
]

now = datetime.datetime.now()

### create the csv
##with open('address.csv', 'wb') as csv_file:
##    writer = csv.writer(csv_file)
##    for row in addresses:
##        writer.writerow([ row['address']['value'], row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])

# make lookup keys for the dict
address_lookup = {}
for i in range(len(addresses)):
    address_row = addresses[i]
    address_lookup[address_row['address']['value']] = i

# read csv once
with open('address.csv', 'rb') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        print row
        # if address is found in the dict, set delete flag to true
        if row[0] in address_lookup:
            print 'flagging address as old: %s' % row[0]
            addresses[ address_lookup[row[0]] ]['delete'] = True

with open('address.csv', 'ab') as csv_file:
    # go back through addresses and add any that shouldnt be deleted to the csv
    writer = csv.writer(csv_file)
    for address_row in addresses:
        if address_row['delete'] is False:
            print 'adding row: '
            print address_row
            #writer.writerow([ address_row['address']['value'], address_row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])
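If the delete flag isn't needed outside this step, a simpler variant (just a sketch, Python 3, assuming the same column layout with the address in the first column) is to read the existing addresses into a set and append only the rows that aren't already there:

import csv
import datetime

def append_new_addresses(addresses, path='address.csv'):
    # Collect the addresses already present in the CSV (first column)
    try:
        with open(path, newline='') as f:
            existing = {row[0] for row in csv.reader(f) if row}
    except FileNotFoundError:
        existing = set()

    now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        for entry in addresses:
            value = entry['address']['value']
            if value not in existing:
                writer.writerow([value, entry['address']['count'], now])
                existing.add(value)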
