I am trying to write this, but I just realized attr_dict is not supported in the new version of NetworkX, since I am not getting the desired rows from the code below.
Can someone tell me how to update this piece of code? Adding the row attributes is the issue, as attr_dict is no longer supported.
if not this_user_id in G:
    G.add_node(this_user_id, attr_dict={
        'followers': row['followers'],
        'age': row['age'],
    })
This is the code in context:
# Gather the data out of the row
#
this_user_id = row['author']
author = row['retweet_of']
followers = row['followers']
age = row['age']
rtfollowers = row['rtfollowers']
rtage = row['rtage']
#
# Is the sender of this tweet in our network?
#
if not this_user_id in G:
    G.add_node(this_user_id, attr_dict={
        'followers': row['followers'],
        'age': row['age'],
    })
#
# If this is a retweet, is the original author a node?
#
if author != "" and not author in G:
    G.add_node(author, attr_dict={
        'followers': row['rtfollowers'],
        'age': row['rtage'],
    })
#
# If this is a retweet, add an edge between the two nodes.
#
if author != "":
    if G.has_edge(author, this_user_id):
        G[author][this_user_id]['weight'] += 1
    else:
        G.add_weighted_edges_from([(author, this_user_id, 1.0)])
nx.write_gexf(G, 'tweets1.gexf')
Now the add_node function accepts the node attributes directly as keyword arguments, so you can reformulate the call as:
G.add_node(this_user_id, followers=row['followers'], age=row['age'])
or, if you have the attributes saved in a dict named my_attr_dict, you can write:
G.add_node(this_user_id, **my_attr_dict)
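Putting that together, here is a minimal sketch of the node-adding section rewritten for NetworkX 2.x; the sample row values are placeholders, not data from the original question:

import networkx as nx

# Assumed setup mirroring the question: a graph and one row of tweet data.
G = nx.Graph()
row = {'author': 'alice', 'retweet_of': 'bob',
       'followers': 120, 'age': 34, 'rtfollowers': 250, 'rtage': 41}

this_user_id = row['author']
author = row['retweet_of']

# NetworkX 2.x: pass node attributes as keyword arguments, not attr_dict.
if this_user_id not in G:
    G.add_node(this_user_id, followers=row['followers'], age=row['age'])
if author != "" and author not in G:
    G.add_node(author, followers=row['rtfollowers'], age=row['rtage'])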
I have a Python script where I'm trying to fetch data from the Meraki dashboard through its API. The data is stored in a dataframe, which needs to be pushed to a Smartsheet using the Smartsheet API integration. I've searched the Smartsheet API documentation but couldn't find any solution to the problem. Has anyone worked on this kind of use case before, or does anyone know a script to push a simple dataframe to Smartsheet?
The code is something like this:
for device in list_of_devices:
    try:
        dict1 = {'Name': [device['name']],
                 "Serial_No": [device['serial']],
                 'MAC': [device['mac']],
                 'Network_Id': [device['networkId']],
                 'Product_Type': [device['productType']],
                 'Model': [device['model']],
                 'Tags': [device['tags']],
                 'Lan_Ip': [device['lanIp']],
                 'Configuration_Updated_At': [device['configurationUpdatedAt']],
                 'Firmware': [device['firmware']],
                 'URL': [device['url']]
                 }
    except KeyError:
        dict1['Lan_Ip'] = "NA"
    temp = pd.DataFrame.from_dict(dict1)
    alldata = alldata.append(temp)
alldata.reset_index(drop=True, inplace=True)
The dataframe ("alldata") looks something like this:
Name Serial_No MAC \
0 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
1 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
2 xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
The dataframe has somewhere around 1000 rows and 11 columns.
I've tried pushing this dataframe following the code mentioned in the comments, but I'm getting a "Bad Request" error.
smart = smartsheet.Smartsheet(access_token='xxxxxxxx')
sheet_id = xxxxxxxxxxxxx
sheet = smart.Sheets.get_sheet(sheet_id)
column_map = {}
for column in sheet.columns:
    column_map[column.title] = column.id
data_dict = alldata.to_dict('index')
rowsToAdd = []
for i, row_data in data_dict.items():
    new_row = smart.models.Row()
    new_row.to_top = True
    for k, v in row_data.items():
        new_cell = smart.models.Cell()
        new_cell.column_id = column_map[k]
        new_cell.value = v
        new_row.cells.append(new_cell)
    rowsToAdd.append(new_row)
result = smart.Sheets.add_rows(sheet_id, rowsToAdd)
{"response": {"statusCode": 400, "reason": "Bad Request", "content": {"detail": {"index": 0}, "errorCode": 1012, "message": "Required object attribute(s) are missing from your request: cell.value.", "refId": "1ob56acvz5nzv"}}}
(Screenshot omitted: the Smartsheet where the data must be pushed.)
The following code adds data from a dataframe to a sheet in Smartsheet -- this should be enough to at least get you started. If you still can't get the desired result using this code, please update your original post to include the code you're using, the outcome you want, and a detailed description of the issue you encountered. (Add a comment to this answer if you update your original post, so I'll be notified and will know to look.)
# target sheet
sheet_id = 3932034054809476
sheet = smartsheet_client.Sheets.get_sheet(sheet_id)

# translate column names to column id
column_map = {}
for column in sheet.columns:
    column_map[column.title] = column.id

df = pd.DataFrame({'item_id': [111111, 222222],
                   'item_color': ['red', 'yellow'],
                   'item_location': ['office', 'kitchen']})
data_dict = df.to_dict('index')

rowsToAdd = []
# each object in data_dict represents 1 row of data
for i, row_data in data_dict.items():
    # create a new row object
    new_row = smartsheet_client.models.Row()
    new_row.to_top = True
    # for each key value pair, create & add a cell to the row object
    for k, v in row_data.items():
        # create the cell object and populate with value
        new_cell = smartsheet_client.models.Cell()
        new_cell.column_id = column_map[k]
        new_cell.value = v
        # add the cell object to the row object
        new_row.cells.append(new_cell)
    # add the row object to the collection of rows
    rowsToAdd.append(new_row)

# add the collection of rows to the sheet in Smartsheet
result = smartsheet_client.Sheets.add_rows(sheet_id, rowsToAdd)
UPDATE #1 - re Bad Request error
Seems like the error you've described in your first comment below is perhaps being caused by some of the cells in your dataframe not having a value. When you add a new row using the Smartsheet API, each cell that's specified for the row must specify a value -- otherwise you'll get the Bad Request error you've described. Maybe try adding an if statement inside the for loop to skip adding the cell if the value of v is None:
for k, v in row_data.items():
    # skip adding this cell if there's no value
    if v is None:
        continue
    ...
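One caveat worth noting, as an assumption about how the dataframe is built: pandas typically stores missing values as NaN rather than None, and v is None won't match NaN. A small variant of the loop above using pd.isna() covers both cases:

import pandas as pd

for k, v in row_data.items():
    # skip adding this cell if there's no value (None or NaN)
    if pd.isna(v):
        continue
    ...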
UPDATE #2 - re further troubleshooting
In response to your second comment below: you'll need to debug further using the data in your dataframe, as I'm unable to repro the issue you describe using other data.
To simplify things, I'd suggest that you start by trying to debug with just one item in the dataframe. You can do so by adding a break statement at the end of the for loop that's building the dict -- that way, only the first device will be added.
for device in list_of_devices:
    try:
        ...
    except KeyError:
        dict1['Lan_Ip'] = "NA"
    temp = pd.DataFrame.from_dict(dict1)
    alldata = alldata.append(temp)
    # break out of loop after one item is added
    break
alldata.reset_index(drop=True, inplace=True)
# print dataframe contents
print (alldata)
If you get the same error when testing with just one item, and can't recognize what it is about that data (or the way it's stored in your dataframe) that's causing the Smartsheet error, then feel free to add a print (alldata) statement after the for loop (as I show in the code snippet above) and update your original post again to include the output of that statement (changing any sensitive data values, of course) -- then I can try to repro/troubleshoot using that data.
UPDATE #3 - repro'd issue
Okay, so I've reproduced the error you've described -- by specifying None as the value of a field in the dict.
The following code successfully inserts two new rows into Smartsheet -- because every field in each dict it builds contains a (non-None) value. (For simplicity, I'm manually constructing two dicts in the same manner as you do in your for loop.)
# target sheet
sheet_id = 37558492129156
sheet = smartsheet_client.Sheets.get_sheet(sheet_id)

# translate column names to column id
column_map = {}
for column in sheet.columns:
    column_map[column.title] = column.id

#----
# start: repro SO question's building of dataframe
#----
alldata = pd.DataFrame()

dict1 = {'Name': ['name1'],
         "Serial_No": ['serial_no1'],
         'MAC': ['mac1'],
         'Network_Id': ['networkId1'],
         'Product_Type': ['productType1'],
         'Model': ['model1'],
         'Tags': ['tags1'],
         'Lan_Ip': ['lanIp1'],
         'Configuration_Updated_At': ['configurationUpdatedAt1'],
         'Firmware': ['firmware1'],
         'URL': ['url1']
         }
temp = pd.DataFrame.from_dict(dict1)
alldata = alldata.append(temp)

dict2 = {'Name': ['name2'],
         "Serial_No": ['serial_no2'],
         'MAC': ['mac2'],
         'Network_Id': ['networkId2'],
         'Product_Type': ['productType2'],
         'Model': ['model2'],
         'Tags': ['tags2'],
         'Lan_Ip': ['lanIp2'],
         'Configuration_Updated_At': ['configurationUpdatedAt2'],
         'Firmware': ['firmware2'],
         'URL': ['URL2']
         }
temp = pd.DataFrame.from_dict(dict2)
alldata = alldata.append(temp)

alldata.reset_index(drop=True, inplace=True)
#----
# end: repro SO question's building of dataframe
#----

data_dict = alldata.to_dict('index')

rowsToAdd = []
# each object in data_dict represents 1 row of data
for i, row_data in data_dict.items():
    # create a new row object
    new_row = smartsheet_client.models.Row()
    new_row.to_top = True
    # for each key value pair, create & add a cell to the row object
    for k, v in row_data.items():
        # create the cell object and populate with value
        new_cell = smartsheet_client.models.Cell()
        new_cell.column_id = column_map[k]
        new_cell.value = v
        # add the cell object to the row object
        new_row.cells.append(new_cell)
    # add the row object to the collection of rows
    rowsToAdd.append(new_row)

result = smartsheet_client.Sheets.add_rows(sheet_id, rowsToAdd)
However, running the following code (where the value of the URL field in the second dict is set to None) results in the same error you've described:
{"response": {"statusCode": 400, "reason": "Bad Request", "content": {"detail": {"index": 1}, "errorCode": 1012, "message": "Required object attribute(s) are missing from your request: cell.value.", "refId": "dw1id3oj1bv0"}}}
Code that causes this error (identical to the successful code above except that the value of the URL field in the second dict is None):
# target sheet
sheet_id = 37558492129156
sheet = smartsheet_client.Sheets.get_sheet(sheet_id)

# translate column names to column id
column_map = {}
for column in sheet.columns:
    column_map[column.title] = column.id

#----
# start: repro SO question's building of dataframe
#----
alldata = pd.DataFrame()

dict1 = {'Name': ['name1'],
         "Serial_No": ['serial_no1'],
         'MAC': ['mac1'],
         'Network_Id': ['networkId1'],
         'Product_Type': ['productType1'],
         'Model': ['model1'],
         'Tags': ['tags1'],
         'Lan_Ip': ['lanIp1'],
         'Configuration_Updated_At': ['configurationUpdatedAt1'],
         'Firmware': ['firmware1'],
         'URL': ['url1']
         }
temp = pd.DataFrame.from_dict(dict1)
alldata = alldata.append(temp)

dict2 = {'Name': ['name2'],
         "Serial_No": ['serial_no2'],
         'MAC': ['mac2'],
         'Network_Id': ['networkId2'],
         'Product_Type': ['productType2'],
         'Model': ['model2'],
         'Tags': ['tags2'],
         'Lan_Ip': ['lanIp2'],
         'Configuration_Updated_At': ['configurationUpdatedAt2'],
         'Firmware': ['firmware2'],
         'URL': [None]
         }
temp = pd.DataFrame.from_dict(dict2)
alldata = alldata.append(temp)

alldata.reset_index(drop=True, inplace=True)
#----
# end: repro SO question's building of dataframe
#----

data_dict = alldata.to_dict('index')

rowsToAdd = []
# each object in data_dict represents 1 row of data
for i, row_data in data_dict.items():
    # create a new row object
    new_row = smartsheet_client.models.Row()
    new_row.to_top = True
    # for each key value pair, create & add a cell to the row object
    for k, v in row_data.items():
        # create the cell object and populate with value
        new_cell = smartsheet_client.models.Cell()
        new_cell.column_id = column_map[k]
        new_cell.value = v
        # add the cell object to the row object
        new_row.cells.append(new_cell)
    # add the row object to the collection of rows
    rowsToAdd.append(new_row)

result = smartsheet_client.Sheets.add_rows(sheet_id, rowsToAdd)
Finally, note that the error message I received contains {"index": 1} -- this implies that the value of index in the error message indicates the (zero-based) index of the problematic row. The fact that your error message contains {"index": 0} suggests there's a problem with the data in the first row you're trying to add to Smartsheet (i.e., the first item in the dataframe). Following the troubleshooting guidance in Update #2 above should therefore let you closely examine the data for that first item/row and hopefully spot the problematic data (i.e., where the value is missing).
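As a quick way to act on that index value, here is a hypothetical pandas check (assuming alldata is the dataframe built in the question) that lists any missing cells in the row Smartsheet rejected:

# Inspect row 0 of the dataframe (the index reported in the error) and
# print every column whose value is missing (None or NaN).
first_row = alldata.iloc[0]
print(first_row[first_row.isna()])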
I'm performing what I imagine is a common pattern when indexing graph databases: my data is a list of edges, and I want to "stream" the upload of this data. That is, for each edge, I want to create the two nodes on either side and then create the edge between them; I don't want to first upload all the nodes and then link them afterwards. A naive implementation would obviously result in a lot of duplicate nodes. Therefore, I want to implement some sort of "get_or_create" to avoid duplication.
My current implementation is below, using pyArango:
def get_or_create_graph(self):
    db = self._get_db()
    if db.hasGraph('citator'):
        self.g = db.graphs["citator"]
        self.judgment = db["judgment"]
        self.citation = db["citation"]
    else:
        self.judgment = db.createCollection("judgment")
        self.citation = db.createCollection("citation")
        self.g = db.createGraph("citator")

def get_or_create_node_object(self, name, vertex_data):
    object_list = self.judgment.fetchFirstExample(
        {"name": name}
    )
    if object_list:
        node = object_list[0]
    else:
        node = self.g.createVertex('judgment', vertex_data)
        node.save()
    return node
My problems with this solution are:
Since the application, not the database, is checking existence, there could be an insertion between the existence check and the creation. I have found duplicate nodes in practice, and I suspect this is why.
It isn't very fast, probably because it potentially hits the DB twice.
I am wondering whether there is a faster and/or more atomic way to do this, ideally a native ArangoDB query. Suggestions? Thank you.
Update
As requested, the calling code is shown below. It's in a Django context, where Link is a Django model (i.e., data in a database):
... # Class definitions etc.

links = Link.objects.filter(dirty=True)
for i, batch in enumerate(batch_iterator(links, limit=LIMIT, batch_size=ITERATOR_BATCH_SIZE)):
    for link in batch:
        source_name = cleaner.clean(link.case.mnc)
        target_name = cleaner.clean(link.citation.case.mnc)
        if source_name == target_name: continue
        source_data = _serialize_node(link.case)
        target_data = _serialize_node(link.citation.case)
        populate_pair(citation_manager, source_name, source_data, target_name, target_data, link)

def populate_pair(citation_manager, source_name, source_data, target_name, target_data, link):
    source_node = citation_manager.get_or_create_node_object(
        source_name,
        source_data
    )
    target_node = citation_manager.get_or_create_node_object(
        target_name,
        target_data
    )
    description = source_name + " to " + target_name
    citation_manager.populate_link(source_node, target_node, description)
    link.dirty = False
    link.save()
And here's a sample of what the data looks like after cleaning and serializing:
source_data: {'name': 'P v R A Fu', 'court': 'ukw', 'collection': 'uf', 'number': 'CA 139/2009', 'tag': 'NA', 'node_id': 'uf89638', 'multiplier': '5.012480529547776', 'setdown_year': 0, 'judgment_year': 0, 'phantom': 'false'}
target_data: {'name': 'Ck v R A Fu', 'court': 'ukw', 'collection': 'uf', 'number': '10/22147', 'tag': 'NA', 'node_id': 'uf67224', 'multiplier': '1.316227766016838', 'setdown_year': 0, 'judgment_year': 0, 'phantom': 'false'}
source_name: [2010] ZAECGHC 9
target_name: [2012] ZAGPJHC 189
I don't know how to do it with the Python driver, but it could be done using AQL:

FOR doc IN judgment
    FILTER doc.name == "name"
    LIMIT 1
    INSERT MERGE(vertexObject, { _from: doc._id }) INTO citator

The vertexObject needs to be an AQL object with at least the _to value.
Note: there may be typos, I'm answering from my phone.
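For the atomicity concern specifically, here is a minimal sketch using AQL's UPSERT, which performs the lookup and the insert in a single statement, run through pyArango's AQLQuery. The collection name judgment comes from the question; the bind parameter names are assumptions:

# Hedged sketch: get-or-create a judgment vertex with a single AQL UPSERT.
# `db` is the pyArango database handle returned by the question's _get_db().
aql = """
UPSERT { name: @name }
INSERT @vertex
UPDATE {}
IN judgment
RETURN NEW
"""
result = db.AQLQuery(aql, rawResults=True,
                     bindVars={'name': name, 'vertex': vertex_data})
node = result[0]  # the existing document, or the one just inserted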
I'm trying to import 5000+ rows into Odoo 12. It's basically a mapping from a CSV, handled in a custom method in a module. The problem is that the request times out, which happens while writing to the database; I'm using the standard ERP methods create and write.
How can I work around this? I know bulk insert is not possible here, but is there any other solution?
Is a SQL command for insertion OK to use?
class file_reader(models.TransientModel):
    _name = "rw.file.reader"

    csv_file = fields.Binary(string='CSV File', required=True)

    @api.multi
    def import_csv(self):
        # csv importer handler
        file = base64.b64decode(self.csv_file).decode().split('\n')
        reader = csv.DictReader(file)
        # account.analytic.line
        ignored = []
        time1 = datetime.now()
        self._cr.execute('select id, name from project_project where active = true')
        projects = self._cr.fetchall()
        self._cr.execute('select id, login from res_users')
        users = self._cr.fetchall()
        self._cr.execute('select id, work_email from hr_employee')
        employees = self._cr.fetchall()
        LOG_EVERY_N = 100
        for row in reader:
            project_name = row['Project - Name']
            email = row['User - Email Address']
            project = [item for item in projects if item[1] == project_name]
            if len(project) > 0:
                user = [item for item in users if item[1] == email]
                employee = [item for item in employees if item[1] == email]
                if len(user) > 0 and len(employee) > 0:
                    task = self.env['project.task'].search([['user_id', '=', user[0][0]],
                                                            ['project_id', '=', project[0][0]]], limit=1)
                    if task:
                        y = row['Duration'].split(':')
                        i, j = y[0], y[1]
                        model = {
                            'project_id': project[0][0],
                            'task_id': task['id'],
                            'employee_id': employee[0][0],
                            'user_id': user[0][0],
                            'date': row['Date'],
                            'unit_amount': int(i) + (float(j) / 60),  # Time Spent conversion to float
                            'is_timesheet': True,
                            'billable': True if row['Billable'] == 'Yes' else False,
                            'nexonia_id': row['ID']
                        }
                        time_sheet = self.env['account.analytic.line'].search([['nexonia_id', '=', row['ID']]], limit=1)
                        if time_sheet:
                            model.update({'id': time_sheet.id})
                            self.env['account.analytic.line'].sudo().write(model)
                        else:
                            self.env['account.analytic.line'].sudo().create(model)
                else:
                    if email not in ignored:
                        ignored.append(email)
            else:
                if project_name not in ignored:
                    ignored.append(project_name)
        all_text = 'Nothing ignored'
        if ignored is not None:
            all_text = "\n".join(filter(None, ignored))
        message_id = self.env['message.wizard'].create({
            'message': "Import data completed",
            'ignored': all_text
        })
        time2 = datetime.now()
        logging.info('total time ------------------------------------------ %s', time2 - time1)
        return {
            'name': 'Successful',
            'type': 'ir.actions.act_window',
            'view_mode': 'form',
            'res_model': 'message.wizard',
            # pass the id
            'res_id': message_id.id,
            'target': 'new'
        }
I enhanced your code a little bit, because you were searching for each project, user, and employee using a loop, for each row, for 5000+ rows.
Using ORM methods is always good because they handle stored computed fields and Python constraints, but this takes time too.
If you don't have any complex computations, you can use an INSERT or UPDATE query; this will speed up the import a hundred times.
@api.multi
def import_csv(self):
    # when you use env[model] more than once, extract it to a variable -- it's better
    # notice how I added sudo to the name of the variable
    AccountAnalyticLine_sudo = self.env['account.analytic.line'].sudo()
    # csv importer handler
    file = base64.b64decode(self.csv_file).decode().split('\n')
    reader = csv.DictReader(file)
    # account.analytic.line
    ignored = []
    time1 = datetime.now()
    # convert results to dictionaries for easy access later
    self._cr.execute('select id, name from project_project where active = true order by name')
    projects = {p[1]: p for p in self._cr.fetchall()}
    self._cr.execute('select id, login from res_users order by login')
    users = {u[1]: u for u in self._cr.fetchall()}
    self._cr.execute('select id, work_email from hr_employee order by work_email')
    employees = {emp[1]: emp for emp in self._cr.fetchall()}
    LOG_EVERY_N = 100
    for row in reader:
        project_name = row['Project - Name']
        email = row['User - Email Address']
        # no need for a loop -- the dictionary lookup is very fast
        project = projects.get(project_name)
        if project:
            user = users.get(email)
            employee = employees.get(email)
            if user and employee:
                task = self.env['project.task'].search([('user_id', '=', user[0]),
                                                        ('project_id', '=', project[0])],
                                                       limit=1)
                if task:
                    y = row['Duration'].split(':')
                    i, j = y[0], y[1]
                    # by convention, dictionaries passed to create or write should be named vals or values
                    vals = {
                        'project_id': project[0],
                        'task_id': task['id'],
                        'employee_id': employee[0],
                        'user_id': user[0],
                        'date': row['Date'],
                        'unit_amount': int(i) + (float(j) / 60),  # Time Spent conversion to float
                        'is_timesheet': True,
                        'billable': True if row['Billable'] == 'Yes' else False,
                        'nexonia_id': row['ID']
                    }
                    time_sheet = AccountAnalyticLine_sudo.search([('nexonia_id', '=', row['ID'])], limit=1)
                    # adding a logger message here, or created/updated counters, would show how many records were created or updated
                    if time_sheet:
                        # you want to update the existing time sheet record, so do this
                        time_sheet.write(vals)
                        # the original was updating an empty RecordSet:
                        # self.env['account.analytic.line'].sudo().write(model)
                    else:
                        # create a new one
                        AccountAnalyticLine_sudo.create(vals)
            else:
                if email not in ignored:
                    ignored.append(email)
        else:
            if project_name not in ignored:
                ignored.append(project_name)
    all_text = 'Nothing ignored'
    # "ignored is not None" is always True because ignored is a list
    if ignored:
        all_text = "\n".join(filter(None, ignored))
    message_id = self.env['message.wizard'].create({
        'message': "Import data completed",
        'ignored': all_text
    })
    time2 = datetime.now()
    logging.info('total time ------------------------------------------ %s', time2 - time1)
    return {
        'name': 'Successful',
        'type': 'ir.actions.act_window',
        'view_mode': 'form',
        'res_model': 'message.wizard',
        # pass the id
        'res_id': message_id.id,
        'target': 'new'
    }
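Since the raw-SQL route is mentioned above, here is a heavily hedged sketch of what a bulk INSERT might look like. The column names are assumptions derived from the vals dict; bypassing the ORM skips computed fields, constraints, and required columns Odoo normally fills in, so treat this as illustration only:

# Hypothetical bulk INSERT that bypasses the ORM entirely.
rows_to_insert = []

# ... inside the CSV loop, instead of AccountAnalyticLine_sudo.create(vals):
rows_to_insert.append((vals['project_id'], vals['task_id'], vals['employee_id'],
                       vals['user_id'], vals['date'], vals['unit_amount'],
                       vals['nexonia_id']))

# ... after the loop, send everything to PostgreSQL in one statement:
placeholders = ", ".join(["(%s, %s, %s, %s, %s, %s, %s)"] * len(rows_to_insert))
self._cr.execute(
    "INSERT INTO account_analytic_line "
    "(project_id, task_id, employee_id, user_id, date, unit_amount, nexonia_id) "
    "VALUES " + placeholders,
    [param for row in rows_to_insert for param in row],
)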
I hope this helps you a little bit, even though the question was about something else; I'm a bit confused, though, because Odoo usually allows a request to run for up to 60 minutes.
While you are importing records through a script, code optimization is very important. Try to reduce the number of search/read calls by using a dictionary to save each result, or use SQL, which I don't recommend.
I need to create the body for multiple updates to a Google Spreadsheet using Python.
I used a Python dict(), but that doesn't work for multiple repeated values, as dict() doesn't allow duplicate keys.
My code snippet is:
body = {}
for i in range(0, len(deltaListcolNames)):
    rangeItem = deltaListcolNames[i]
    batch_input_value = deltaListcolVals[i]
    body["range"] = rangeItem
    body["majorDimension"] = "ROWS"
    body["values"] = "[[" + str(batch_input_value) + "]]"
    batch_update_values_request_body = {
        # How the input data should be interpreted.
        'value_input_option': 'USER_ENTERED',
        # The new values for the input sheet... to apply to the spreadsheet.
        'data': [
            dict(body)
        ]
    }
    print(batch_update_values_request_body)
request = service.spreadsheets().values().batchUpdate(
    spreadsheetId=spreadsheetId,
    body=batch_update_values_request_body)
response = request.execute()
Thanks for the answer, Graham.
I doubled back, moved away from using the dict paradigm, and found that by using this grid I was able to make the data structure. Here is how I coded it...
perhaps a bit quirky, but it works nicely:
range_value_data_list = []
width = 1
#
height = 1
for i in range(0, len(deltaListcolNames)):
    rangeItem = deltaListcolNames[i]
    # print(" the value for rangeItem is : ", rangeItem)
    batch_input_value = str(deltaListcolVals[i])
    print(" the value for batch_input_value is : ", batch_input_value)
    # construct the data structure for the value
    grid = [[None] * width for i in range(height)]
    grid[0][0] = batch_input_value
    range_value_item_str = {'range': rangeItem, 'values': grid}
    range_value_data_list.append(range_value_item_str)
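A small aside on the grid construction above: since width and height are both 1 here, the two-step build collapses to a literal.

# Equivalent shorthand for the 1x1 grid built above
range_value_item_str = {'range': rangeItem, 'values': [[batch_input_value]]}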
Review the documentation for the Python client library methods: the data portion is a list of dict objects.
So your construct is close; you just need a loop that fills the data list:
data = []
for i in range(0, len(deltaListcolNames)):
    body = {}
    # fill out the body
    rangeItem = deltaListcolNames[i]
    ....
    # Add this update's body to the array with the other update bodies.
    data.append(body)
# build the rest of the request
...
# send the request
...
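For concreteness, here is a hedged sketch of the filled-in loop, assuming (as in the question) that deltaListcolNames holds A1-notation ranges, deltaListcolVals holds the matching single values, and service and spreadsheetId are already set up:

data = []
for range_item, value in zip(deltaListcolNames, deltaListcolVals):
    data.append({
        'range': range_item,
        'majorDimension': 'ROWS',
        'values': [[value]],  # a real 2-D list, not the string "[[...]]"
    })

batch_update_values_request_body = {
    'value_input_option': 'USER_ENTERED',
    'data': data,
}

response = service.spreadsheets().values().batchUpdate(
    spreadsheetId=spreadsheetId,
    body=batch_update_values_request_body,
).execute()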
There is probably a term for what I'm attempting to do, but it escapes me. I'm using peewee to set some values in a class, and I want to iterate through a list of keys and values to generate the command to store the values.
Not all 'collections' contain each of the values within the class, so I want to include only the ones that are present in my data set. This is how far I've made it:
for value in result['response']['docs']:
    for keys in value:
        print keys, value[keys]  # keys are "identifier, title, language"

#for value in result['response']['docs']:
#    collection = Collection(
#        identifier = value['identifier'],
#        title = value['title'],
#        language = value['language'],
#        mediatype = value['mediatype'],
#        description = value['description'],
#        subject = value['subject'],
#        collection = value['collection'],
#        avg_rating = value['avg_rating'],
#        downloads = value['downloads'],
#        num_reviews = value['num_reviews'],
#        creator = value['creator'],
#        format = value['format'],
#        licenseurl = value['licenseurl'],
#        publisher = value['publisher'],
#        uploader = value['uploader'],
#        source = value['source'],
#        type = value['type'],
#        volume = value['volume']
#    )
#    collection.save()
for value in result['response']['docs']:
    Collection(**value).save()
See this question for an explanation on how **kwargs work.
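If some docs carry keys that aren't fields on the model, a hypothetical guard (assuming Collection is the peewee model from the question) is to filter the dict against the model's field names before unpacking:

# Keep only keys that are actual model fields before unpacking.
field_names = set(Collection._meta.fields)  # peewee's field-name -> Field map
for value in result['response']['docs']:
    kwargs = {k: v for k, v in value.items() if k in field_names}
    Collection(**kwargs).save()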
Are you talking about how to find out whether a key is in a dict or not?
>>> somedict = {'firstname': 'Samuel', 'lastname': 'Sample'}
>>> if somedict.get('firstname'):
...     print somedict['firstname']
...
Samuel
>>> print somedict.get('address', 'no address given')
no address given
If there is a different problem you'd like to solve, please clarify your question.