I am trying to create an array from a JSON object. I can print the required values, but I can't push them into an array in Python. How can I do that?
data = {"wc": [{"value": 8, "id": 0}, {"value": 9, "id": 1}]}
dataset = []
test = []
for i in data['wc']:
    print(i['value'], ',', i['id'])
    test = i['value'], i['id']
    dataset.append(test)
print(dataset)
I'm getting the correct values, but they come wrapped in '(' and ')'. How can I remove them and get the final output as
[8, 0, 9, 1]
i.e. [value, id, value, id, ...]?
You already have a list of dictionaries. Just iterate over the values of each dict (note that before Python 3.7 dict ordering was not guaranteed, which is why the output below comes out id first):
dataset = []
for entry in data['wc']:
    for value in entry.values():
        dataset.append(value)
>>> dataset
[0, 8, 1, 9]
To guarantee the order value first, id second, index the keys explicitly:
dataset = []
for entry in data['wc']:
    dataset.extend([entry['value'], entry['id']])

>>> dataset
[8, 0, 9, 1]
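Equivalently, the whole flattening can be written as one list comprehension (a sketch of the same logic):

dataset = [v for entry in data['wc'] for v in (entry['value'], entry['id'])]
# [8, 0, 9, 1]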
Apologies if I'm making a rookie error here; I've only been coding for a couple of weeks. I am attempting to build a function that takes a series of data and builds a set of bootstrap replicates.
To do this, I believe I want to:
1. build a list of values selected at random from the initial data series
2. aggregate the values into a list of key, value pairs
3. create an empty list to collect these aggregates
4. loop through steps 1 and 2 some number of times, appending the values of each aggregate (from step 2) into the list (from step 3)
I can get steps 1 and 2 down, I think... My first block of code is the following:
import collections
import numpy as np

test_list = ['ABBV', 'ABBV', 'ACV1', 'ACV1', 'ACV1', 'ACV1', 'ACV1', 'AMKR', 'AMKR', 'AMKR']
data = test_list

def bootstrap_replicate_vc(data):
    """Generate single bootstrap replicate of 1D data."""
    bs_sample = np.random.choice(data, len(data))
    # count occurrences of each category in the resample
    bs_rep = collections.defaultdict(list)
    for value in bs_sample:
        if value not in bs_rep:
            bs_rep[value] = 1
        else:
            bs_rep[value] = bs_rep[value] + 1
    return bs_rep
When I run this I get what I expect as a single bootstrapped replication of my categorical series, for example:
bs_rep1 = bootstrap_replicate_vc(data) #check
print(bs_rep1)
defaultdict(<class 'list'>, {'ACV1': 6, 'ABBV': 1, 'AMKR': 3})
My desire is to then create a function that iterates on the prior function to build additional bootstrapped replicates and appends the values to the appropriate keys. So, an additional iteration (now size 2) should look something like this:
bs_reps = bootstrap_reps_categorical(data, size=2) #check
print(bs_reps)
defaultdict(<class 'list'>, {'ACV1': [6, 5], 'ABBV': [1, 3], 'AMKR': [3, 2]})
What I wrote to get there was:
def generate_bs_reps_categorical(data, size=1):
    # Initialize array of replicates: bs_replicates
    bs_reps = collections.defaultdict(list)
    # Generate replicates
    for i in range(size):
        bs_rep = bootstrap_replicate_vc(data)
        for key, value in bs_rep:
            bs_reps[key].append(value)
    return bs_reps
However, when I run this I get a ValueError:
ValueError: too many values to unpack (expected 2)
I modeled this off of the Python documentation I found while researching this, but I'm definitely not getting the right result.
Any help would be appreciated!
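For reference, the likely cause: iterating a dict directly yields only its keys, so for key, value in bs_rep tries to unpack each key string (e.g. 'ABBV') into two names, which raises exactly this ValueError. Iterating bs_rep.items() yields (key, value) pairs instead; a minimal sketch of that fix:

def generate_bs_reps_categorical(data, size=1):
    bs_reps = collections.defaultdict(list)
    for i in range(size):
        bs_rep = bootstrap_replicate_vc(data)
        # .items() yields (key, count) pairs, which unpack cleanly
        for key, value in bs_rep.items():
            bs_reps[key].append(value)
    return bs_reps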
I am trying to create a new variable ('provider') from a list, by checking whether some ids are present in a column of the data frame:
import pandas as pd
xx = {'provider_id': [1, 2, 30, 8, 8, 7, 9]}
xx = pd.DataFrame(data=xx)
ids = [8,9,30]
names = ["netflix", "prime","sky"]
for id_, name in zip(ids, names):
    provider = []
    if id_ in xx["provider_id"]:
        provider.append(name)

provider
Expected result:
['netflix', 'prime', 'sky']
Actual result:
['sky']
So the for loop keeps overwriting the result of name inside the loop? This behaviour seems weird to me, and I honestly don't know how to prevent it other than writing three individual if statements.
Your loop keeps reinitialising the list. Move the list outside the loop:
provider = []
for id_, name in zip(ids, names):
    if id_ in xx["provider_id"]:
        provider.append(name)
print(provider)
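One caveat worth flagging as an aside, since pandas behaviour here is a common trip-up: a membership test on a Series, like id_ in xx["provider_id"], checks the index labels rather than the cell values. Testing against .values, or using isin, makes the intent explicit:

import pandas as pd
s = pd.Series([10, 20, 30])
print(10 in s)             # False: 'in' checks the index labels 0, 1, 2
print(10 in s.values)      # True: checks the actual values
print(s.isin([10]).any())  # True: the idiomatic pandas test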
Scrap the loops altogether and use the built-in pandas methods. It will work much faster.
df = pd.DataFrame({'ids': [8,9,30], 'names': ["netflix", "prime","sky"]})
cond = df.ids.isin(xx.provider_id)
df.loc[cond, 'names'].tolist()
['netflix', 'prime', 'sky']
One way to make this more efficient is to use sets and isin to find the matching ids in the dataframe, and then a list comprehension with zip to keep the corresponding names.
The error, as #quamrana points out, is that you keep resetting the list inside the loop.
s = set(xx.loc[xx['provider_id'].isin(ids), 'provider_id'])
# {8, 9, 30}
[name for id_, name in zip(ids, names) if id_ in s]
# ['netflix', 'prime', 'sky']
I have a .xlsx file which looks like the attached file. What is the most common way to extract the different data parts from this Excel file in Python?
Ideally there would be a method defined as:
pd.read_part_csv(columns=['data1', 'data2','data3'], rows=['val1', 'val2', 'val3'])
and returns an iterator over pandas dataframes which hold the values in the given table.
Here is a solution with pylightxl that might be a good fit for your project if all you are doing is reading. I wrote the solution in terms of rows, but you could just as well have done it in terms of columns. See the docs for more info on pylightxl: https://pylightxl.readthedocs.io/en/latest/quickstart.html
import pylightxl
db = pylightxl.readxl('Book1.xlsx')

# pull out all the rowIDs where data groups start
keyrows = [rowID for rowID, row in enumerate(db.ws('Sheet1').rows, 1) if 'val1' in row]

# find the columnIDs where data groups start (like in your example, not all data groups start in col A)
keycols = []
for keyrow in keyrows:
    # add +1 since python indexing starts from 0
    keycols.append(db.ws('Sheet1').row(keyrow).index('val1') + 1)

# define a dict to hold your data groups
datagroups = {}
# populate datatables
for tableIndex, keyrow in enumerate(keyrows, 1):
    i = 0
    # data groups: keys are group IDs starting from 1, values: lists of data rows (ie: val1, val2...)
    datagroups.update({tableIndex: []})
    while True:
        # pull out the current group row of data, and strip the leading cells before the key column
        datarow = db.ws('Sheet1').row(keyrow + i)[keycols[tableIndex - 1]:]
        # check if the current row is still part of the datagroup
        if datarow[0] == '':
            # current row is empty and is no longer part of the data group
            break
        datagroups[tableIndex].append(datarow)
        i += 1

print(datagroups[1])
print(datagroups[2])
[[1, 2, 3, ''], [4, 5, 6, ''], [7, 8, 9, '']]
[[9, 1, 4], [2, 4, 1], [3, 2, 1]]
Note that the output of table 1 has an extra '' in each row; that is because the sheet's used range is wider than that group. You can easily remove these with list.remove('') if you like.
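If pandas is preferred, a similar block-splitting helper can be sketched on top of pd.read_excel (a sketch, not a standard API; iter_excel_blocks is a made-up name, and it assumes the groups are separated by fully empty rows, with each group's first row holding the headers):

import pandas as pd

def iter_excel_blocks(path, sheet_name=0):
    # read the whole sheet without headers; empty cells become NaN
    raw = pd.read_excel(path, sheet_name=sheet_name, header=None)
    empty = raw.isna().all(axis=1)   # fully empty separator rows
    block_id = empty.cumsum()        # same id for all rows of one block
    for _, block in raw[~empty].groupby(block_id):
        block = block.dropna(axis=1, how='all')  # drop padding columns
        block.columns = block.iloc[0]            # promote first row to header
        yield block.iloc[1:].reset_index(drop=True)

for df in iter_excel_blocks('Book1.xlsx'):
    print(df)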
I'm new to Python.
Is there a way to search a list of values (words and phrases) in another list (a CSV table), and get only the matched rows?
Example:
LiastOfValues=['smoking','hard smoker','alcoholic']
ListfromCSV =
ID,TYPE,STRING1,NUMBER
1, a,'this is hard smoker man',4
2, b,'this one likes to drink',5
3, c,'dont like sigarets',6
4, e,'this one is smoking',7
I want to search for LiastOfValues in each row and return only the matched rows.
The Output:
Output=
ID,TYPE,STRING1,NUMBER
1, a,'this is hard smoker man',4
4, e,'this one is smoking',7
I have tried this:
import csv

ListfromCSV = "ListfromCSV.txt"
LiastOfValues = ['smoking', 'hard smoker', 'alcoholic', 'smoker']
with open(ListfromCSV, 'r') as f:
    LineReader = csv.reader(f, delimiter=',')
    for i in LineReader:
        if any(value in i[2] for value in LiastOfValues):
            print(i)
Try this. It assumes your csv is a flat list of row strings; if it is nested (a list of lists), you can convert the inner lists to strings first:
[row for row in rows if any(map(lambda x: x in row, LiastOfValues))]
Here rows is the list of line strings read from the file. This should give you a list of the matched rows (it does not include the header row unless the header itself matches).
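For example, applied to the file from the question (a sketch; it assumes the first line is the header and that the whole line is searched, not just the STRING1 column):

LiastOfValues = ['smoking', 'hard smoker', 'alcoholic']
with open('ListfromCSV.txt', 'r') as f:
    rows = [line.strip() for line in f]

header, body = rows[0], rows[1:]
matched = [row for row in body if any(value in row for value in LiastOfValues)]
print([header] + matched)
# ["ID,TYPE,STRING1,NUMBER", "1, a,'this is hard smoker man',4", "4, e,'this one is smoking',7"]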
I have Python code below that loops through a table and prints out values within a particular column. What is not shown is the form in which the user selects a Feature Layer. Once the Feature Layer is selected, a second dropdown is populated with all the column headings for that feature, and the user chooses which column they want to focus on. Within the Python script, I simply print out each value in that column, but I want to store each value in a list or array and get the distinct values. How can I do this in Python?
Also, is there a more efficient way to loop through the table than to go row by row? That is very slow for some reason.
many thanks
# Import system modules
import sys, string, os, arcgisscripting

# Create the Geoprocessor object
gp = arcgisscripting.create(9.3)
gp.AddToolbox("E:/Program Files (x86)/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx")

# Declare our user input args
input_dataset = sys.argv[1]  # This is the Feature Layer the User wants to Query against
Atts = sys.argv[2]  # This is the Column Name The User Selected

# Lets Loop through the rows to get values from a particular column
fc = input_dataset
gp.AddMessage(Atts)
rows = gp.searchcursor(fc)
row = rows.next()

NewList = []
for row in gp.SearchCursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    NewList.add(fcValue)
You can store distinct values in a set:
>>> a = [ 1, 2, 3, 1, 5, 3, 2, 1, 5, 4 ]
>>> b = set( a )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 5 )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 6 )
>>> b
{1, 2, 3, 4, 5, 6}
Also, you can make your loop more pythonic, although I'm not sure why you loop over the rows to begin with (given that you are not using the row variable):
for row in gp.searchcursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    gp.AddMessage(fcValue)
And btw, """ text """ is not a comment; it's just a string literal. Python only has single-line comments, starting with #.
One way to get distinct values is to use a set to see if you've seen the value already, and display it only when it's a new value:
fcValues = set()
for row in gp.searchcursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    if fcValue not in fcValues:
        gp.AddMessage(fcValue)
        fcValues.add(fcValue)
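Putting the pieces together, here's a minimal sketch of the full collect-distinct-values loop. It assumes the per-row accessor is row.GetValue(Atts), which the fields.getvalue lines above presumably intended; check the 9.3 geoprocessor docs for the exact spelling:

distinct_values = set()
for row in gp.SearchCursor(fc):
    # sets silently ignore duplicates, so only distinct values accumulate
    distinct_values.add(row.GetValue(Atts))
for value in distinct_values:
    gp.AddMessage(str(value))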