for loop in mysql id dictionary with values - python

My goal is to come up with the average of the data points.
I'm using a for loop to do a MySQL query in Python. The query returns the following ids along with some values:
{'speed_range_id': 0, 'count(speed_range_id)': 511}
{'speed_range_id': 1, 'count(speed_range_id)': 1827}
{'speed_range_id': 2, 'count(speed_range_id)': 48}
{'speed_range_id': 4, 'count(speed_range_id)': 100}
{'speed_range_id': 8, 'count(speed_range_id)': 60}
What I want to do is create a dictionary that maps each id to a value, say speed_range_id 1 = 15 km/hr, speed_range_id 2 = 25 km/hr, speed_range_id 4 = 50 km/hr, and so on.
I would then multiply the count(speed_range_id) value (1827) by the value I gave the id, in this case 1827*15, and do the same for every other id. I'd then add up the result for every id and divide by the total sum of the counts, 1827+48+100+60 = 2035, to come up with the average km/hr.
I am stuck trying to create the dictionary of values for the speed_range_ids and store them in a variable. Do I need some if statements?
Any help or guidance is appreciated.
My for loop currently looks like this:
for rowdict in result:
    cursor2.execute(speed_query, rowdict)
    speed_result = cursor2.fetchall()
    for rowdict2 in speed_result:
        print(rowdict2)

Declare a dict id_vals that stores the id-to-value associations, alongside the speed_result you already have in your loop. You can then use sum with a generator expression to evaluate the sum of an expression over each element in speed_result:
id_vals = {1:15, 2:25, 4:50...}
result_sum = sum(id_vals[row['speed_range_id']]*row['count(speed_range_id)'] for row in speed_result)
count_sum = sum(row['count(speed_range_id)'] for row in speed_result)
avg = result_sum/count_sum
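Putting it together with the sample counts from the question (the id-to-speed mapping below is made up; substitute the real km/h value for each of your speed ranges):

```python
# Minimal sketch of the weighted average, using the sample rows from the
# question (id 0 is left out, matching the 2035 total in the question).
speed_result = [
    {'speed_range_id': 1, 'count(speed_range_id)': 1827},
    {'speed_range_id': 2, 'count(speed_range_id)': 48},
    {'speed_range_id': 4, 'count(speed_range_id)': 100},
    {'speed_range_id': 8, 'count(speed_range_id)': 60},
]
id_vals = {1: 15, 2: 25, 4: 50, 8: 80}  # id -> km/h (assumed placeholder values)

# Weighted sum of speeds, then divide by the total count.
result_sum = sum(id_vals[row['speed_range_id']] * row['count(speed_range_id)']
                 for row in speed_result)
count_sum = sum(row['count(speed_range_id)'] for row in speed_result)
avg = result_sum / count_sum
print(round(avg, 2))  # 18.87
```

No if statements are needed: the dict lookup `id_vals[...]` does the mapping for you.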


Why is mapping dataseries with dataframe column names taking so long with map function

Hi all, I have a dataframe of approx 400k rows with a column of interest. I would like to map each element in the column to a category (LU, HU, etc.). This is obtained from a smaller dataframe whose column names are the categories. The function below, however, runs very slowly for only 400k rows, and I'm not sure why. In the small example below it is of course fast.
cwp_sector_mapping = {
    'LU': ['C2P34', 'C2P35', 'C2P36'],
    'HU': ['C2P37', 'C2P38', 'C2P39'],
    'EH': ['C2P40', 'C2P41', 'C2P42'],
    'EL': ['C2P43', 'C2P44', 'C2P45'],
    'WL': ['C2P12', 'C2P13', 'C2P14'],
    'WH': ['C2P15', 'C2P16', 'C2P17'],
    'NL': ['C2P18', 'C2P19', 'C2P20'],
}
df_cwp = pd.DataFrame.from_dict(cwp_sector_mapping)
columns = df_cwp.columns
ls = pd.Series(['C2P44', 'C2P43', 'C2P12', 'C2P1'])
temp = list(map(lambda pos: columns[df_cwp.eq(pos).any()][0]
                if columns[df_cwp.eq(pos).any()].size != 0 else 'UN', ls))
Use the next with iter trick to get the first matched value of columns; if there is no match, fall back to the default value 'UN':
temp = [next(iter(columns[df_cwp.eq(pos).any()]), 'UN') for pos in ls]
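To see the pattern in isolation, here is a self-contained sketch with a trimmed-down version of the mapping above. `next(iter(seq), default)` returns the first element of `seq`, or the default when `seq` is empty, so it avoids the double `df_cwp.eq(pos).any()` evaluation of the lambda version:

```python
import pandas as pd

# Two categories are enough to show the behaviour.
cwp_sector_mapping = {
    'LU': ['C2P34', 'C2P35', 'C2P36'],
    'EL': ['C2P43', 'C2P44', 'C2P45'],
}
df_cwp = pd.DataFrame.from_dict(cwp_sector_mapping)
columns = df_cwp.columns

ls = pd.Series(['C2P44', 'C2P34', 'C2P1'])
# For each code, keep the first column whose values contain it, else 'UN'.
temp = [next(iter(columns[df_cwp.eq(pos).any()]), 'UN') for pos in ls]
print(temp)  # ['EL', 'LU', 'UN']
```

`df_cwp.eq(pos).any()` produces a boolean Series indexed by the column names, so `columns[...]` keeps only the matching columns; `iter`/`next` then takes the first one without raising on an empty match.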

Looped append of values to list resulting in ValueError: too many values to unpack

Apologies if I'm making a rookie error here; I've only been coding for a couple of weeks. I am attempting to build a function that takes a series of data and builds a set of bootstrap replicates.
To do this, I believe I want to:
build a list of values selected at random from the initial data series
aggregate the values into a list of key, value pairs
create an empty list to collect these aggregates
repeat steps 1 and 2 some number of times, appending the values of each aggregate (from step 2) to the list (from step 3)
I can get 1 and 2 down, I think... My first block of code is the following:
test_list = ['ABBV', 'ABBV', 'ACV1', 'ACV1', 'ACV1', 'ACV1', 'ACV1', 'AMKR', 'AMKR', 'AMKR']
data = test_list

def bootstrap_replicate_vc(data):
    """Generate single bootstrap replicate of 1D data."""
    bs_sample = np.random.choice(data, len(data))
    bs_rep = collections.defaultdict(list)
    for value in bs_sample:
        if value not in bs_rep:
            bs_rep[value] = 1
        else:
            bs_rep[value] = bs_rep[value] + 1
    return bs_rep
When I run this I get what I expect as a single bootstrapped replication of my categorical series, for example:
bs_rep1 = bootstrap_replicate_vc(data) #check
print(bs_rep1)
defaultdict(<class 'list'>, {'ACV1': 6, 'ABBV': 1, 'AMKR': 3})
My desire is to then create a function to iterate on the prior function to build additional bootstrapped replicates and append the values to the appropriate keys. So, an additional iteration (now size 2) should look something like this:
bs_reps = bootstrap_reps_categorical(data, size=2) #check
print(bs_rep1)
defaultdict(<class 'list'>, {'ACV1': [6, 5], 'ABBV': [1,3], 'AMKR': [3,2]})
What I wrote to get there was:
def generate_bs_reps_categorical(data, size=1):
    # Initialize array of replicates: bs_replicates
    bs_reps = collections.defaultdict(list)
    # Generate replicates
    for i in range(size):
        bs_rep = bootstrap_replicate_vc(data)
        for key, value in bs_rep:
            bs_reps[key].append(value)
    return bs_reps
However, when I run this I get a ValueError:
ValueError: too many values to unpack (expected 2)
I modeled this off of the python documentation I saw when researching this but I'm definitely not getting the right result.
Any help would be appreciated!
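The ValueError almost certainly comes from the inner loop: iterating a dict directly yields only its keys (strings here), so `key, value` cannot unpack. Iterating `.items()` yields (key, value) pairs. A minimal sketch of the likely fix, with plain dicts standing in for the bootstrap results so it runs on its own:

```python
import collections

def append_replicate(bs_reps, bs_rep):
    # .items() yields (key, value) pairs; iterating the dict itself
    # yields only keys, which is what triggers the unpacking error.
    for key, value in bs_rep.items():
        bs_reps[key].append(value)

bs_reps = collections.defaultdict(list)
append_replicate(bs_reps, {'ACV1': 6, 'ABBV': 1, 'AMKR': 3})
append_replicate(bs_reps, {'ACV1': 5, 'ABBV': 3, 'AMKR': 2})
print(dict(bs_reps))  # {'ACV1': [6, 5], 'ABBV': [1, 3], 'AMKR': [3, 2]}
```

So in `generate_bs_reps_categorical`, changing `for key, value in bs_rep:` to `for key, value in bs_rep.items():` should produce the desired output.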

Python: append list in for-loop unexpected result

I am trying to build a new list ('provider') by checking whether some ids are present in a column of the data frame:
import pandas as pd
xx = {'provider_id': [1, 2, 30, 8, 8, 7, 9]}
xx = pd.DataFrame(data=xx)
ids = [8,9,30]
names = ["netflix", "prime","sky"]
for id_, name in zip(ids, names):
    provider = []
    if id_ in xx["provider_id"]:
        provider.append(name)
provider
Expected result:
['netflix', 'prime', 'sky']
actual result:
['sky']
So the for loop keeps overwriting the result inside the loop? This behaviour seems weird to me, and I honestly don't know how to prevent it other than writing three individual if statements.
Your loop keeps re-initialising the list. Move the list outside the loop:
provider = []
for id_, name in zip(ids, names):
    if id_ in xx["provider_id"]:
        provider.append(name)
print(provider)
Scrap the loops altogether and use the built-in pandas methods. It will work much faster.
df = pd.DataFrame({'ids': [8,9,30], 'names': ["netflix", "prime","sky"]})
cond = df.ids.isin(xx.provider_id)
df.loc[cond, 'names'].tolist()
['netflix', 'prime', 'sky']
One way to make this more efficient is using sets and isin to find the matching ids in the dataframe, and then a list comprehension with zip to keep the corresponding names.
The error, as #quamrana points out, is that you keep resetting the list inside the loop.
s = set(xx.loc[xx.isin(ids).values, 'provider_id'].values)
# {8, 9, 30}
[name for id_, name in zip(ids, names) if id_ in s]
# ['netflix', 'prime', 'sky']
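For reference, a self-contained run of the set-plus-isin approach, using the xx dataframe from the question (with isin applied directly to the provider_id column, a slight simplification of the snippet above):

```python
import pandas as pd

xx = pd.DataFrame({'provider_id': [1, 2, 30, 8, 8, 7, 9]})
ids = [8, 9, 30]
names = ["netflix", "prime", "sky"]

# ids that actually occur in the dataframe column.
s = set(xx.loc[xx['provider_id'].isin(ids), 'provider_id'])
# Keep the names whose id matched, in the original order.
provider = [name for id_, name in zip(ids, names) if id_ in s]
print(provider)  # ['netflix', 'prime', 'sky']
```

Note that `id_ in some_series` tests the Series *index*, not its values, which is another reason the isin form is preferable to the `in` check in the original loop.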

sum content from looping python

I want to write an MS Excel file with this script. I get the data from this table:
dataangsuran = Trpinjaman.objects.filter(ckarid=str(id)).select_related('ckarid')
Then I read the data in a loop:
col_num = 0
for obj in dataangsuran:
    col = [
        str(obj.ckarid),
        str(obj.ckarid.cnik_nip),
        str(obj.ckarid.tunit),
        str(obj.cangsuranpokok),
    ]
    for row_num in xrange(len(col)):
        ws.write(row_pend, col_num, col[row_num])
How can I sum this value while looping over the data?
str(obj.cangsuranpokok)
I think you could try one of these:
Sum inside the for loop
total = 0
for obj in dataangsuran:
    total = total + obj.cangsuranpokok
    col = [
        str(obj.ckarid),
        str(obj.ckarid.cnik_nip),
        total,
        str(obj.cangsuranpokok),
    ]
And then use total
Take the sum from the ORM
from django.db.models import Sum
dataangsuran.aggregate(total=Sum('cangsuranpokok'))
Keep in mind that dataangsuran is a QuerySet, so you can run the aggregate after the first loop, when you write the Excel file.
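Stripped of the Django and xlwt specifics, the first suggestion is just the running-total pattern. A minimal sketch with a namedtuple standing in for the Trpinjaman rows (the values are made up):

```python
from collections import namedtuple

# Stand-in for the queryset rows; only the field being summed is modeled.
Row = namedtuple('Row', ['cangsuranpokok'])
dataangsuran = [Row(100), Row(250), Row(75)]

total = 0
for obj in dataangsuran:
    total = total + obj.cangsuranpokok
    # ... build col and write the row to the worksheet here ...
print(total)  # 425
```

The key point is to accumulate the numeric value, not `str(obj.cangsuranpokok)`: summing the raw field keeps it a number, while the string version is only for writing to the cell.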

Adding Values to an Array and getting distinct values using Python

I have python code below that will loop through a table and print out values within a particular column. What is not shown is the form in which the user selects a Feature Layer. Once the Feature Layer is selected a second Dropdown is populated with all the Column Headings for that Feature and the user chooses which Column they want to focus on. Now within the python script, I simply print out each value within that column. But I want to store each value in a List or Array and get Distinct values. How can I do this in Python?
Also is there a more efficient way to loop through the table than to go row by row? That is very slow for some reason.
many thanks
# Import system modules
import sys, string, os, arcgisscripting
# Create the Geoprocessor object
gp = arcgisscripting.create(9.3)
gp.AddToolbox("E:/Program Files (x86)/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx")
# Declare our user input args
input_dataset = sys.argv[1] #This is the Feature Layer the User wants to Query against
Atts = sys.argv[2] #This is the Column Name The User Selected
#Lets Loop through the rows to get values from a particular column
fc = input_dataset
gp.AddMessage(Atts)
rows = gp.searchcursor(fc)
row = rows.next()
NewList = []
for row in gp.SearchCursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    NewList.add(fcValue)
You can store distinct values in a set:
>>> a = [ 1, 2, 3, 1, 5, 3, 2, 1, 5, 4 ]
>>> b = set( a )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 5 )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 6 )
>>> b
{1, 2, 3, 4, 5, 6}
Also you can make your loop more pythonic, although I'm not sure why you loop over the row to begin with (given that you are not using it):
for row in gp.searchcursor( fc ):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    gp.AddMessage(fcValue)
And btw, """ text """ is not a comment. Python only has single line comments starting with #.
One way to get distinct values is to use a set to see if you've seen the value already, and display it only when it's a new value:
fcValues = set()
for row in gp.searchcursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    if fcValue not in fcValues:
        gp.AddMessage(fcValue)
        fcValues.add(fcValue)
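The seen-set pattern in the last snippet works on any iterable, not just a geoprocessing cursor. A cursor-free sketch (the values list is made up) that also keeps the distinct values in first-seen order, which a bare set would not:

```python
values = ['road', 'river', 'road', 'lake', 'river', 'road']

seen = set()
distinct = []  # preserves first-seen order, unlike a plain set
for value in values:
    if value not in seen:
        seen.add(value)
        distinct.append(value)
print(distinct)  # ['road', 'river', 'lake']
```

Set membership tests are O(1) on average, so this stays fast even for large tables, unlike `value not in some_list`, which scans the whole list each time.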
