Adding Values to an Array and getting distinct values using Python - python

I have python code below that will loop through a table and print out values within a particular column. What is not shown is the form in which the user selects a Feature Layer. Once the Feature Layer is selected a second Dropdown is populated with all the Column Headings for that Feature and the user chooses which Column they want to focus on. Now within the python script, I simply print out each value within that column. But I want to store each value in a List or Array and get Distinct values. How can I do this in Python?
Also, is there a more efficient way to loop through the table than going row by row? It is very slow for some reason.
Many thanks.
# Import system modules
import sys, string, os, arcgisscripting
# Create the Geoprocessor object
gp = arcgisscripting.create(9.3)
gp.AddToolbox("E:/Program Files (x86)/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx")
# Declare our user input args
input_dataset = sys.argv[1] #This is the Feature Layer the User wants to Query against
Atts = sys.argv[2] #This is the Column Name The User Selected
#Lets Loop through the rows to get values from a particular column
fc = input_dataset
gp.AddMessage(Atts)
rows = gp.searchcursor(fc)
row = rows.next()
NewList = []
for row in gp.SearchCursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    NewList.add(fcValue)

You can store distinct values in a set:
>>> a = [ 1, 2, 3, 1, 5, 3, 2, 1, 5, 4 ]
>>> b = set( a )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 5 )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 6 )
>>> b
{1, 2, 3, 4, 5, 6}
Also you can make your loop more pythonic, although I'm not sure why you loop over the row to begin with (given that you are not using it):
for row in gp.searchcursor( fc ):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    gp.AddMessage(fcValue)
And btw, """ text """ is not a comment. Python only has single line comments starting with #.

One way to get distinct values is to use a set to see if you've seen the value already, and display it only when it's a new value:
fcValues = set()
for row in gp.searchcursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    if fcValue not in fcValues:
        gp.AddMessage(fcValue)
        fcValues.add(fcValue)
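If you also want the distinct values back as a list, in the order they were first seen, the same set idea can be wrapped in a small helper. This is only a sketch: `distinct_in_order` and the sample `column_values` list are made up for illustration, and the list stands in for whatever the cursor returns for the chosen column.

```python
def distinct_in_order(values):
    """Return the distinct items of `values`, keeping first-seen order."""
    seen = set()      # fast membership test
    result = []       # preserves the order of first appearance
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical stand-in for the values read from the selected column.
column_values = ["Road", "River", "Road", "Park", "River"]
distinct_values = distinct_in_order(column_values)
print(distinct_values)
```

A plain `set(column_values)` gives the same members but does not promise any particular order, so the helper is only worth it when order matters.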

Related

Python: append list in for-loop unexpected result

I am trying to create a new variable from a list ('provider') that checks if some ids are present in another column in the data frame:
import pandas as pd
xx = {'provider_id': [1, 2, 30, 8, 8, 7, 9]}
xx = pd.DataFrame(data=xx)
ids = [8,9,30]
names = ["netflix", "prime","sky"]
for id_,name in zip(ids,names):
    provider = []
    if id_ in xx["provider_id"]:
        provider.append(name)
provider
expected result:
['netflix', 'prime', 'sky']
actual result:
['sky']
So the for loop keeps overwriting the result of name inside the loop? This behaviour seems weird to me and I honestly don't know how to prevent it other than to write three individual if statements.
Your loop re-initialises the list on every iteration. Move the list outside the loop:
provider = []
for id_,name in zip(ids,names):
    # .values is needed: "in" on a Series tests the index labels, not the data
    if id_ in xx["provider_id"].values:
        provider.append(name)
print(provider)
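To see why the original loop ended up with a single name, here is a plain-Python sketch of both variants side by side. It assumes the DataFrame column is replaced by an ordinary list (`provider_ids`) so that it runs without pandas; the variable names `reset_each_pass` and `accumulated` are made up for the illustration.

```python
provider_ids = [1, 2, 30, 8, 8, 7, 9]   # stand-in for xx["provider_id"]
ids = [8, 9, 30]
names = ["netflix", "prime", "sky"]

# List created inside the loop: every pass throws away the previous result,
# so only the outcome of the last pass survives.
for id_, name in zip(ids, names):
    reset_each_pass = []
    if id_ in provider_ids:
        reset_each_pass.append(name)

# List created once before the loop: matches accumulate across passes.
accumulated = []
for id_, name in zip(ids, names):
    if id_ in provider_ids:
        accumulated.append(name)

print(reset_each_pass)
print(accumulated)
```

The first list holds only what the final iteration appended, while the second holds every match.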
Scrap the loops altogether and use the built-in pandas methods. It will work much faster.
df = pd.DataFrame({'ids': [8,9,30], 'names': ["netflix", "prime","sky"]})
cond = df.ids.isin(xx.provider_id)
df.loc[cond, 'names'].tolist()
['netflix', 'prime', 'sky']
One way to make this more efficient is using sets and isin to find the matching ids in the dataframe, and then a list comprehension with zip to keep the corresponding names.
The error, as @quamrana points out, is that you keep resetting the list inside the loop.
s = set(xx.loc[xx['provider_id'].isin(ids), 'provider_id'].values)
# {8, 9, 30}
[name for id_, name in zip(ids, names) if id_ in s]
# ['netflix', 'prime', 'sky']

Python extract data from a semi-structured .xlsx file

I have a .xlsx file which looks as the attached file. What is the most common way to extract the different data parts from this excel file in Python?
Ideally there would be a method that is defined as :
pd.read_part_csv(columns=['data1', 'data2','data3'], rows=['val1', 'val2', 'val3'])
and returns an iterator over pandas dataframes which hold the values in the given table.
Here is a solution using pylightxl, which might be a good fit for your project if all you are doing is reading. I wrote the solution in terms of rows, but it could just as well be done in terms of columns. See the docs for more info on pylightxl: https://pylightxl.readthedocs.io/en/latest/quickstart.html
import pylightxl
db = pylightxl.readxl('Book1.xlsx')
# pull out all the rowIDs where data groups start
keyrows = [rowID for rowID, row in enumerate(db.ws('Sheet1').rows,1) if 'val1' in row]
# find the columnIDs where data groups start (like in your example, not all data groups start in col A)
keycols = []
for keyrow in keyrows:
    # add +1 since python index start from 0
    keycols.append(db.ws('Sheet1').row(keyrow).index('val1') + 1)
# define a dict to hold your data groups
datagroups = {}
# populate datatables
for tableIndex, keyrow in enumerate(keyrows,1):
    i = 0
    # data groups: keys are group IDs starting from 1, list: list of data rows (ie: val1, val2...)
    datagroups.update({tableIndex: []})
    while True:
        # pull out the current group row of data, and remove leading cells with keycols
        datarow = db.ws('Sheet1').row(keyrow + i)[keycols[tableIndex-1]:]
        # check if the current row is still part of the datagroup
        if datarow[0] == '':
            # current row is empty and is no longer part of the data group
            break
        datagroups[tableIndex].append(datarow)
        i += 1
print(datagroups[1])
print(datagroups[2])
[[1, 2, 3, ''], [4, 5, 6, ''], [7, 8, 9, '']]
[[9, 1, 4], [2, 4, 1], [3, 2, 1]]
Note that each row of table 1 has an extra '' at the end; that is because the sheet is wider than that data group. You can remove it with list.remove('') if you like (remove drops only the first matching item in the list).
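If a group can have more than one padding cell, or a legitimately empty cell in the middle of a row, trimming only from the right end may be safer than `list.remove('')`. A minimal sketch, assuming rows are plain lists like the ones printed above; `strip_trailing_empty` is a hypothetical helper, not part of pylightxl.

```python
def strip_trailing_empty(row, empty=''):
    """Return a copy of `row` without the empty cells at its right end."""
    end = len(row)
    # walk left from the end until a non-empty cell is found
    while end > 0 and row[end - 1] == empty:
        end -= 1
    return row[:end]


# Table 1 as printed above, including the padding cell on each row.
table1 = [[1, 2, 3, ''], [4, 5, 6, ''], [7, 8, 9, '']]
cleaned = [strip_trailing_empty(row) for row in table1]
print(cleaned)
```

Empty cells that sit between real values are left alone, since only the tail of each row is trimmed.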

for loop in mysql id dictionary with values

My goal is to come up with the average of points.
I'm using a for loop to run a MySQL query in Python. This query returns the following ids along with some values:
{'speed_range_id': 0, 'count(speed_range_id)': 511}
{'speed_range_id': 1, 'count(speed_range_id)': 1827}
{'speed_range_id': 2, 'count(speed_range_id)': 48}
{'speed_range_id': 4, 'count(speed_range_id)': 100}
{'speed_range_id': 8, 'count(speed_range_id)': 60}
What I want to do is create a dictionary that maps each id to a value, say speed_range_id:1 = 15 km/hr, speed_range_id:2 = 25 km/hr, speed_range_id:4 = 50 km/hr and so on.
I would then like to multiply each count(speed_range_id) by the value I gave that id, in this case 1827*15, and so on for every other id. I'd then add up the results for every id and divide by the total sum of the counts, 1827+48+100+60=2035, to come up with the average km/hr.
I am stuck trying to create the dictionary of values for the speed_range_ids and store them in a variable. I think it's necessary to use some if statements?
Any help or guidance is appreciated.
My for loop currently looks like this:
for rowdict in result:
    cursor2.execute(speed_query, rowdict)
    speed_result = cursor2.fetchall()
    for rowdict2 in speed_result:
        print(rowdict2)
Declare a dict id_vals to store the associations of id and values, and also speed_result as you have in your loop already. You can use sum with a generator expression to evaluate the sum of something for each element in speed_result.
id_vals = {1:15, 2:25, 4:50...}
result_sum = sum(id_vals[row['speed_range_id']]*row['count(speed_range_id)'] for row in speed_result)
count_sum = sum(row['count(speed_range_id)'] for row in speed_result)
avg = result_sum/count_sum
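Here is the same calculation worked through on the rows from the question, as a self-contained sketch. The speeds for ids 1, 2 and 4 are the ones given in the question; the speed for id 8 is a made-up placeholder, and rows whose id has no speed assigned (id 0 here) are skipped, which matches the 2035 total in the question.

```python
# Rows as printed in the question.
speed_result = [
    {'speed_range_id': 0, 'count(speed_range_id)': 511},
    {'speed_range_id': 1, 'count(speed_range_id)': 1827},
    {'speed_range_id': 2, 'count(speed_range_id)': 48},
    {'speed_range_id': 4, 'count(speed_range_id)': 100},
    {'speed_range_id': 8, 'count(speed_range_id)': 60},
]

# id -> km/hr. 1, 2 and 4 come from the question; 8: 90 is a placeholder.
id_vals = {1: 15, 2: 25, 4: 50, 8: 90}

# Keep only rows that have a speed assigned, so a missing id cannot raise KeyError.
rows = [row for row in speed_result if row['speed_range_id'] in id_vals]

result_sum = sum(id_vals[row['speed_range_id']] * row['count(speed_range_id)']
                 for row in rows)
count_sum = sum(row['count(speed_range_id)'] for row in rows)

# Guard against an empty result set before dividing.
avg = result_sum / count_sum if count_sum else 0.0
print(count_sum, result_sum, round(avg, 2))
```

No if statements per id are needed: the dictionary lookup does the mapping, and the membership test decides which rows take part.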

dataframe generating own column names

For a project, I want to create a script that allows the user to enter values (like a value in centimetres) multiple times. I had a While-loop in mind for this.
The values need to be stored in a dataframe, which will be used to generate a graph of the values.
Also, there is no maximum number of entries that the user can enter, so the names of the variables that hold the values have to be generated with each entry (such as M1, M2, M3…Mn). However, the dataframe will only consist of one row (only for the specific case that the user is entering values for).
So, my question boils down to this:
How do I create a dataframe (with pandas) where the script generates its own column name for a measurement, like M1, M2, M3, …Mn, so that all the values are stored.
I can't access my code right now, but I have created a While-loop that allows the user to enter values, and I'm stuck on the dataframe and columns part.
Any help would be greatly appreciated!
I agree with @mischi: without additional context, pandas seems overkill, but here is an alternate method to create what you describe...
This code proposes a method to collect the values using a while loop and input() (your while loop is probably similar).
colnames = []
inputs = []
counter = 0
while True:
    value = input('Add a value: ')
    if value == 'q':  # provides a way to leave the loop
        break
    else:
        key = 'M' + str(counter)
        counter += 1
        colnames.append(key)
        inputs.append(value)
from pandas import DataFrame
df = DataFrame(inputs, colnames)  # this creates a DataFrame with
                                  # a single column and an index
                                  # using the colnames
df = df.T                         # this transposes the DataFrame
                                  # so the indexes become the colnames
df.index = ['values']             # sets the name of your row
print(df)
The output of this script looks like this...
Add a value: 1
Add a value: 2
Add a value: 3
Add a value: 4
Add a value: q
       M0 M1 M2 M3
values  1  2  3  4
pandas seems a bit of an overkill, but to answer your question.
Assuming you collect numerical values from your users and store them in a list:
import numpy as np
import pandas as pd
values = np.random.randint(0, 11, 10)  # random_integers is deprecated; randint's upper bound is exclusive
print(values)
[1 5 0 1 1 1 4 1 9 6]
columns = {}
column_base_name = 'Column'
for i, value in enumerate(values):
    columns['{:s}{:d}'.format(column_base_name, i)] = value
print(columns)
{'Column0': 1,
'Column1': 5,
'Column2': 0,
'Column3': 1,
'Column4': 1,
'Column5': 1,
'Column6': 4,
'Column7': 1,
'Column8': 9,
'Column9': 6}
df = pd.DataFrame(data=columns, index=[0])
print(df)
   Column0  Column1  Column2  Column3  Column4  Column5  Column6  Column7  \
0        1        5        0        1        1        1        4        1
   Column8  Column9
0        9        6
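Since the question asks for names starting at M1 rather than M0 or Column0, the naming step on its own can be sketched like this. `label_measurements` and the sample `entries` are hypothetical; the sketch deliberately stops at a plain dict so it runs without pandas.

```python
def label_measurements(values, prefix="M", start=1):
    """Map each value to a generated name (M1, M2, ...) in entry order."""
    return {f"{prefix}{i}": value for i, value in enumerate(values, start)}


# Hypothetical centimetre readings collected from the user.
entries = [12.5, 13.0, 11.8]
row = label_measurements(entries)
print(row)
```

Passing `[row]` to `pandas.DataFrame` would then give a one-row frame whose columns are the generated names.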

Convert json to array in python (value,id,value,id....)

I am trying to create an array from a JSON object. I can print the required values but can't push them into an array in Python. How can I do that?
data={"wc":[{"value":8,"id":0},{"value":9,"id":1}]}
dataset = []
test=[]
for i in data['wc']:
    print(i['value'],',',i['id'])
    test=i['value'],i['id']
    dataset.append(test)
print(dataset)
I am getting the correct values as required, but with '(' and ')'.
How can I remove them and get final output as
[8,0,9,1]
Like [value,id,value,id....]
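You already have a nested dictionary. Just iterate over the values of the nested dicts (on Python 3.7+ they come out in insertion order, value then id; older versions do not guarantee any order):
dataset = []
for entry in data['wc']:
    for value in entry.values():
        dataset.append(value)
>>> dataset
[8, 0, 9, 1]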
With the order made explicit, value first and id second:
dataset = []
for entry in data['wc']:
    dataset.extend([entry['value'], entry['id']])
dataset
[8, 0, 9, 1]
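The same flattening can also be written without an explicit loop body, using `itertools.chain.from_iterable` from the standard library. This is just an alternative sketch of the idea above, with the key order spelled out so it does not depend on how the dicts happen to be ordered.

```python
from itertools import chain

data = {"wc": [{"value": 8, "id": 0}, {"value": 9, "id": 1}]}

# Build (value, id) pairs explicitly, then flatten them into one list.
pairs = ((entry["value"], entry["id"]) for entry in data["wc"])
dataset = list(chain.from_iterable(pairs))
print(dataset)
```

The parentheses in the original output came from appending each tuple as one item; flattening the pairs puts the individual numbers in the list instead.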
