How to append/extend multiple elements into an Array? - python

I'm having issues trying to add headers to my array: I can't append headers for the values already in the array. I've also attempted to use .extend, but it's giving me issues as well.
mylist = []
for i in jsonResponse['M']:
    if i['x'] == 'n':
        x = i['x']
        y = i['x']['y']
        z = i['x']['z']
        mylist.append([x, y, z])

import pandas as pd

# pandas-related stuff
df0 = pd.DataFrame(data=mylist)
df0.to_excel(writer, sheet_name='Sheet1', header=True, index=False)
I'm trying to have:
mylist.append('Header1': [x],['Header2': [y],['Header3': [z])
I've tried using .extend, as I'd think it's the most appropriate, but it gives me an error.
The purpose of this is that when I write the data to Excel, it would be ideal to have a header row. I think the problem stems from creating the empty list at the very beginning, above the JSON pull; I assumed I had to create the list first as a means to allocate storage for the values.
I was thinking a solution would be to create a separate array for each value from the JSON pull, meaning:
x = []
y = []
z = []
for i in jsonResponse['M']:
    if i['x'] == 'n':
        x = i['x']
        y = i['x']['y']
        z = i['x']['z']
        x.append('header1': [x])
        y.append('header2': [y])
        z.append('header3': [z])
But I don't know if that's going to work, and I really don't want to go through that, as it's a bit labor intensive because I have a lot of JSON objects.
I'm new to Python and have never written anything in any language before. I'm trying; I've spent most of my time copying snippets from the documentation and from other users, attempting to piece together workable code while also trying to understand what I'm doing.

What you're looking for is called a dictionary (or dict) in Python.
A dict allows you to create mappings like this:
{'name': 'John', 'age': 30}
So now all you have to do is create a dict inside your for loop and then append that dict to mylist:
for i in jsonResponse['M']:
    ...
    mylist.append({'Header1': x, 'Header2': y, 'Header3': z})
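To see why this gives you headers in the Excel output: when a DataFrame is built from a list of dicts, the dict keys become the column names. A minimal sketch, where jsonResponse is invented stand-in data since the real response isn't shown:

```python
import pandas as pd

# Hypothetical stand-in for the real API response.
jsonResponse = {'M': [{'x': 'n', 'y': 1, 'z': 2},
                      {'x': 'n', 'y': 3, 'z': 4}]}

mylist = []
for i in jsonResponse['M']:
    if i['x'] == 'n':
        # Dict keys become the DataFrame's column headers.
        mylist.append({'Header1': i['x'], 'Header2': i['y'], 'Header3': i['z']})

df0 = pd.DataFrame(data=mylist)
print(list(df0.columns))  # ['Header1', 'Header2', 'Header3']
```

df0.to_excel(..., header=True) would then write those column names as the header row.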

Why is my list coming up blank when trying to import data from a CSV file?

Python is completely new to me and I'm still trying to figure out the basics. We were given a project to analyse and determine particular things in a CSV file that was given to us.
There are many columns, but the first is the most important, as one of the variables in the function we need to create refers to it. It's labelled 'adultids', where a combination of letters and numbers is given, and one 'adultid' takes up 15 rows of different information; there are many different 'adultids' within the file.
To start off, I am trying to make a list from that CSV file that contains only the information for the 'adultids' given (which, as a variable in the function, is a list of two 'adultids' from the CSV file), basically trying to single out that information from the rest of the data in the file. When I run it, it comes up with '[]', and I can't figure out why. Can someone tell me what's wrong?
I'm not sure if any of that makes sense; it's very hard to describe, so I apologise in advance, but here is the code I tried:
def file_to_read(csvfile, adultIDs):
    with open(csvfile, 'r') as asymfile:
        lst = asymfile.read().split("\n")
    new_lst = []
    if adultIDs == True:
        for row in lst:
            adultid, point, x, y, z = row.split(',')
            if adultid == adultIDs:
                new_lst.append([adultid, point, x, y, z])
    return new_lst
Try this. You get the output [] because when adultIDs is False, the if block is skipped, so you return the new_lst that was just assigned to []; in that case you probably want to return the full lst instead:
def file_to_read(csvfile, adultIDs):
    with open(csvfile, 'r') as asymfile:
        lst = asymfile.read().split("\n")
    new_lst = []
    if adultIDs == True:
        for row in lst:
            adultid, point, x, y, z = row.split(',')
            if adultid == adultIDs:
                new_lst.append([adultid, point, x, y, z])
        return new_lst
    return lst
As far as I understand, you pass a list of ids like ['R10', 'R20', 'R30'] as the second argument of your function. Those ids are also contained in the csv-file you are trying to parse. In that case you should probably rewrite your function so that it checks whether the adultid from a row of your csv-file is contained in the list adultIDs that you pass into the function. I'd rather do it like this:
def file_to_read(csvfile, adult_ids):  # [1]
    lst = []
    with open(csvfile, 'r') as asymfile:
        for row in asymfile:  # [2]
            r = row[:-1].split(',')  # [3]
            if r[0] in adult_ids:  # [4]
                lst.append(r)
    return lst
Description for commented digits in brackets:
Python programmers usually prefer snake_case names for variables and arguments. You can learn more about this in PEP 8. Although it's not connected to your question, it may be helpful for your future projects, when other programmers review your code.
You don't need to read the whole file into a variable; you can iterate over it row by row, saving memory. This helps if you work with huge files or have limited memory.
You need to take the whole string except the last character, which is \n.
in checks whether the adult_id from the row of the csv-file is contained in the argument you pass. For that reason I would recommend using the set datatype for adult_ids rather than a list. You can read about sets in the documentation.
I hope I got your task right, and that helps you. Have a nice day!
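A quick usage sketch of the rewritten function, against a throwaway file of made-up rows (the ids and values are invented, in the question's 5-column shape):

```python
import os
import tempfile

def file_to_read(csvfile, adult_ids):
    lst = []
    with open(csvfile, 'r') as asymfile:
        for row in asymfile:
            r = row[:-1].split(',')  # drop the trailing \n
            if r[0] in adult_ids:
                lst.append(r)
    return lst

# Invented sample data: adultid, point, x, y, z per row.
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w') as f:
    f.write('R10,p1,1,2,3\nR20,p2,4,5,6\nR99,p3,7,8,9\n')

rows = file_to_read(path, {'R10', 'R20'})  # a set, as recommended in [4]
print(rows)  # [['R10', 'p1', '1', '2', '3'], ['R20', 'p2', '4', '5', '6']]
os.remove(path)
```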

Defining max/min stats from nested JSON data in python

I'd like to preface this with: I'm new to Python (and not traditionally a programmer), so all suggestions to clean up any of the code (even unrelated to the problem below) are entirely welcome. I've been stuck on this for a couple of days now, so I figured I'd give it a shot here.
I have a script that calls a RESTful API via the requests package, parses the returned JSON data via the json package, assigns the data to variables, and then writes them to a CSV file via the csv package (which is later written to an Excel file by a VBA script). This all seems to work fine, but currently it writes individual data points, and while I'd like to keep doing that, I'd also like to calculate summary statistics for that data (min, max, average, standard deviation, etc.) in Python before writing them to a separate CSV output file.
I imagine the correct way to do this is to write those saved variables (initially from JSON) into a nested dictionary/list and then use max/min functions etc. on the correct list, but I'm having trouble constructing the nested dictionaries dynamically.
To be clear, each data point in the jData['ProductActivity'] node is a separate transaction. I'm trying to build a dictionary that logically looks like:
{prodsize1: {'bid':   [bid1, bid2],
             'ask':   [ask1, ask2],
             'trade': [trade1, trade2]},
 prodsize2: {'bid':   [bid1, bid2],
             'ask':   [ask1, ask2],
             'trade': [trade1, trade2]}}
Where the [prodSize] keys and the [bid], [trade] & [ask] value lists are all added dynamically from the jData.
Code:
state_dict = {"trade": "3", "bid": "2", "ask": "1"}
market_activity = {}
bid_list = []
ask_list = []
trade_list = []

def get_hist(side):
    state = state_dict.get(side)
    jData = json.loads(myResponse.content)
    page_length = len(jData['ProductActivity'])
    for i in range(0, page_length):
        chainId = jData['ProductActivity'][i]['chainId']
        skuUuid = jData['ProductActivity'][i]['skuUuid']
        createdAt = jData['ProductActivity'][i]['createdAt']
        prodSize = float(jData['ProductActivity'][i]['prodSize'])
        amount = float(jData['ProductActivity'][i]['amount'])
        localAmount = float(jData['ProductActivity'][i]['localAmount'])
        localCurrency = jData['ProductActivity'][i]['localCurrency']
        productId = jData['ProductActivity'][i]['productId']
        customerId = jData['ProductActivity'][i]['customerId']
        if "frequency" in jData['ProductActivity'][i]:
            frequency = jData['ProductActivity'][i]['frequency']
        else:
            frequency = 1
        csv_writer.writerow([chainId, skuUuid, createdAt, styleId, name, target_product,
                             side, prodSize, amount, localAmount, frequency,
                             localCurrency, productId, customerId])
        if side == 'bid':
            market_activity[prodSize] = {'bid': bid_list.append(amount)}
        elif side == 'ask':
            market_activity[prodSize] = {'ask': ask_list.append(amount)}
        elif side == 'trade':
            market_activity[prodSize] = {'trade': trade_list.append(amount)}
    myResponse.raise_for_status()

get_hist(side="trade")
get_hist(side="bid")
get_hist(side="ask")
available_sizes = []
for key in market_activity.keys():
    available_sizes.append(key)

summary_stats = {'max_bid': '', 'min_ask': '', 'avg_trade': ''}

def generate_summary_stats:
    for size in available_shoe_sizes:
        summary_stats[size].update(max(market_info[size]['bid']))
        summary_stats[size].update(min(market_info[size]['ask']))
        # add in rest of stats

generate_summary_stats()
data_to_file.close()
I think I may need to add new keys separately and then append to the lists stored as values. I also fear that the way I have it written will overwrite the 'state' (bid, ask, trade) values instead of adding to each list.
It's difficult to understand what you're expecting to get without an example. Can you post a sample of what your JSON data looks like, and how you ultimately want the market_activity dictionary to look? As for overwriting, notice that market_activity[prodSize] = {'bid': bid_list.append(amount)} for example tries to assign a new value to market_activity[prodSize] on each run. Maybe you want something more like market_activity[prodSize]['bid'].append(amount) here. Though you'll have to set the initial {'bid': []} empty list before you can "append" anything to it.
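One way to express that "set the initial empty list first" idea is a defaultdict, so each new prodSize automatically starts with its own empty 'bid'/'ask'/'trade' lists. A sketch with invented sizes and prices standing in for the parsed JSON:

```python
from collections import defaultdict

# Each new key gets a fresh dict of three empty lists.
market_activity = defaultdict(lambda: {'bid': [], 'ask': [], 'trade': []})

# Invented (prodSize, side, amount) transactions.
sample = [('9.5', 'bid', 120.0), ('9.5', 'ask', 135.0),
          ('9.5', 'bid', 122.0), ('10', 'trade', 128.0)]

for prod_size, side, amount in sample:
    # Appends to the per-size list instead of overwriting the dict.
    market_activity[prod_size][side].append(amount)

print(market_activity['9.5']['bid'])       # [120.0, 122.0]
print(max(market_activity['9.5']['bid']))  # 122.0
```

This also removes the need for the global bid_list/ask_list/trade_list, since each size keeps its own lists.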
Also, since you asked for some general Python coding suggestions:
market_activity.keys() already gives you an iterable of the keys (a list in Python 2, a view object in Python 3), so you should be able to just replace for size in available_shoe_sizes: with for size in market_activity.keys(): (or simply for size in market_activity:) and skip building the available_shoe_sizes list completely
Since "side" is the only input variable in your get_hist() function, you can just call the function as get_hist("trade"), for example. In general, the order of the parameters you pass in just has to match up with the order they're defined.
I'm not sure where market_info[] comes from. Is that supposed to be market_activity?
In general, definitions (functions) should be put near the top of your code, instead of being defined in between other commands you're running.
Likewise, generate_summary_stats should have its parameters passed into it instead of relying on globals. Then return the value you want, which is probably the summary_stats dictionary. So, something more like this:
def generate_summary_stats(sizes):
    summary_stats = {'max_bid': '', 'min_ask': '', 'avg_trade': ''}
    for size in sizes:
        # Here's the loop where you'd insert your code
        # for updating summary_stats properly.
        pass
    return summary_stats

# Getting the stats, outside of your function.
stats = generate_summary_stats(market_activity)
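Filling in that loop body, a sketch of what the stats computation might look like (the sizes and prices are invented; the remaining stats follow the same pattern):

```python
def generate_summary_stats(activity):
    summary_stats = {}
    for size, sides in activity.items():
        # Guard against empty lists so max()/min() don't raise ValueError.
        summary_stats[size] = {
            'max_bid': max(sides['bid']) if sides['bid'] else None,
            'min_ask': min(sides['ask']) if sides['ask'] else None,
        }
    return summary_stats

# Invented sample data in the nested shape described above.
market_activity = {'9.5': {'bid': [120.0, 122.0], 'ask': [135.0], 'trade': []},
                   '10':  {'bid': [], 'ask': [140.0, 138.0], 'trade': [139.0]}}
stats = generate_summary_stats(market_activity)
print(stats['9.5']['max_bid'])  # 122.0
print(stats['10']['min_ask'])   # 138.0
```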

JSON Python Lists

I've got this JSON file:
[{"url": ["instrumentos-musicales-126030594.htm", "liquidacion-muebles-y-electrodomesticos-127660457.htm"], "title": ["INSTRUMENTOS MUSICALES", "LIQUIDACION, MUEBLES Y ELECTRODOMESTICOS"]}]
mydata = json.load(json_data)
And then I'd like to manipulate mydata to append each url/title pair to a list paginas.
What would be a good solution in Python? I want to be able to have something like:
paginas[0].url    -> "instrumentos-musicales-126030594.htm"
paginas[0].title  -> "INSTRUMENTOS MUSICALES"
paginas[1].url    -> "liquidacion-muebles-y-electrodomesticos-127660457.htm"
paginas[1].title  -> "LIQUIDACION, MUEBLES Y ELECTRODOMESTICOS"
paginas = mydata[0]
paginas["url"].append(new_url)
paginas["title"].append(new_title)
You still need to use json.dump to save it, of course.
If you want to extract the values to a new list:
new_list =[mydata[0]["url"],mydata[0]["title"]]
will give you:
[['instrumentos-musicales-126030594.htm', 'liquidacion-muebles-y-electrodomesticos-127660457.htm'], ['INSTRUMENTOS MUSICALES', 'LIQUIDACION, MUEBLES Y ELECTRODOMESTICOS']]
To access elements you can use indexing:
mydata[0]["url"][0] will give you "instrumentos-musicales-126030594.htm"
mydata[0]["title"][0] will give you "INSTRUMENTOS MUSICALES", etc.
You don't need to do anything to your data structure.
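That said, if you do want per-page url/title pairs rather than two parallel lists, one sketch is to zip them together; paginas then ends up as a list of dicts, so the access syntax is paginas[0]['url'] rather than paginas[0].url:

```python
import json

# Data copied from the question.
raw = ('[{"url": ["instrumentos-musicales-126030594.htm", '
       '"liquidacion-muebles-y-electrodomesticos-127660457.htm"], '
       '"title": ["INSTRUMENTOS MUSICALES", '
       '"LIQUIDACION, MUEBLES Y ELECTRODOMESTICOS"]}]')
mydata = json.loads(raw)

# Pair the i-th url with the i-th title.
paginas = [{'url': u, 'title': t}
           for u, t in zip(mydata[0]['url'], mydata[0]['title'])]

print(paginas[0]['url'])    # instrumentos-musicales-126030594.htm
print(paginas[1]['title'])  # LIQUIDACION, MUEBLES Y ELECTRODOMESTICOS
```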

How can I use a Pandas column to parse text from the web?

I've used the map function on a dataframe column of postcodes to create a new Series of tuples which I can then manipulate into a new dataframe.
def scrape_data(series_data):
    # A bit of code to create the URL goes here
    r = requests.get(url)
    root_content = r.content
    root = lxml.html.fromstring(root_content)
    address = root.cssselect(".lr_results ul")
    for place in address:
        address_property = place.cssselect("li a")[0].text
        house_type = place.cssselect("li")[1].text
        house_sell_price = place.cssselect("li")[2].text
        house_sell_date = place.cssselect("li")[3].text
    return address_property, house_type, house_sell_price, house_sell_date

df = postcode_subset['Postcode'].map(scrape_data)
While it works where there is only one property on a results page, it fails to create a tuple for multiple properties.
What I'd like to be able to do is iterate through a series of pages and then add that content to a dataframe. I know that Pandas can convert nested dicts into dataframes, but really struggling to make it work. I've tried to use the answers at How to make a nested dictionary and dynamically append data but I'm getting lost.
At the moment your function only returns values for a single place in address (usually in Python you would yield, rather than return, to retrieve all the results as a generator).
When subsequently doing an apply/map, you'll usually want the function to return a Series...
However, I think you just want to return the following DataFrame:
return pd.DataFrame([{'address_property': place.cssselect("li a")[0].text,
                      'house_type': place.cssselect("li")[1].text,
                      'house_sell_price': place.cssselect("li")[2].text,
                      'house_sell_date': place.cssselect("li")[3].text}
                     for place in address],
                    index=address)
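For completeness, the yield version mentioned above could be sketched like this; plain tuples stand in for the parsed lxml elements so the generator logic is visible on its own:

```python
def scrape_rows(address):
    # Yield one dict per place instead of returning only the last one.
    for place in address:
        # Each `place` is a stand-in for a parsed <ul> element.
        yield {'address_property': place[0],
               'house_type': place[1],
               'house_sell_price': place[2],
               'house_sell_date': place[3]}

# Invented sample rows.
sample = [('1 High St', 'Terraced', '250000', '2020-01-15'),
          ('2 High St', 'Flat', '180000', '2020-02-03')]
rows = list(scrape_rows(sample))
print(len(rows))              # 2
print(rows[1]['house_type'])  # Flat
```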
To make the code work, I eventually reworked Andy Hayden's solution to:
listed = []
for place in address:
    results = [{'postcode': postcode_bit,
                'address_property': place.cssselect("li a")[0].text,
                'house_type': place.cssselect("li")[1].text,
                'house_sell_price': place.cssselect("li")[2].text,
                'house_sell_date': place.cssselect("li")[3].text}]
    listed.extend(results)
return listed
At least I understand a bit more about how Python data structures work now.

using Python to import a CSV (lookup table) and add GPS coordinates to another output CSV

So I have already imported one XML-ish file with 3000 elements and parsed them into a CSV for output. But I also need to import a second CSV file with 'keyword', 'latitude', 'longitude' as columns and use it to add the GPS coordinates as additional columns to the first file.
Reading the python tutorial, it seems like {dictionary} is what I need, although I've read on here that tuples might be better. I don't know.
But either way - I start with:
floc = open('c:\python\kenya_location_lookup.csv','r')
l = csv.DictReader(floc)
for row in l: print row.keys()
The output looks like:
{'LATITUDE': '-1.311467078', 'LONGITUDE': '36.77352011', 'KEYWORD': 'Kianda'}
{'LATITUDE': '-1.315288401', 'LONGITUDE': '36.77614331', 'KEYWORD': 'Soweto'}
{'LATITUDE': '-1.315446430425027', 'LONGITUDE': '36.78170621395111', 'KEYWORD': 'Gatwekera'}
{'LATITUDE': '-1.3136151425171327', 'LONGITUDE': '36.785863637924194', 'KEYWORD': 'Kisumu Ndogo'}
I'm a newbie (and not a programmer). The question is: how do I use the keys to pluck out the corresponding row data and match it against words in the body of the element in the other set?
Reading the python tutorial, it seems like {dictionary} is what I need, although I've read on here that tuples might be better. I don't know.
They're both fine choices for this task.
print row.keys() The output look like:
{'LATITUDE': '-1.311467078',
No it doesn't! This is the output from print row, most definitely NOT print row.keys(). Please don't supply disinformation in your questions, it makes them really hard to answer effectively (being a newbie makes no difference: surely you can check that the output you provide actually comes from the code you also provide!).
I'm a newbie (and not a programmer). Question is how do I use the keys to pluck out the corresponding row data and match it against words in the body of the element in the other set?
Since you give us absolutely zero information on the structure of "the other set", you make it of course impossible to answer this question. Guessing wildly, if for example the entries in "the other set" are also dicts each with a key of KEYWORD, you want to build an auxiliary dict first, then merge (some of) its entries in the "other set":
l = csv.DictReader(floc)
dloc = dict((d['KEYWORD'], d) for d in l)
for d in otherset:
    d.update(dloc.get(d['KEYWORD'], ()))
This will leave the location missing from the other set when not present in a corresponding keyword entry in the CSV -- if that's a problem you may want to use a "fake location" dictionary as the default for missing entries instead of that () in the last statement I've shown. But, this is all wild speculation anyway, due to the dearth of info in your Q.
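Here is that merge pattern run end-to-end on the sample rows from the question (otherset is invented, since its structure wasn't shown):

```python
# Lookup rows copied from the question's output.
lookup_rows = [
    {'KEYWORD': 'Kianda', 'LATITUDE': '-1.311467078', 'LONGITUDE': '36.77352011'},
    {'KEYWORD': 'Soweto', 'LATITUDE': '-1.315288401', 'LONGITUDE': '36.77614331'},
]
dloc = dict((d['KEYWORD'], d) for d in lookup_rows)

# Invented stand-in for "the other set": dicts that each carry a KEYWORD.
otherset = [{'KEYWORD': 'Kianda', 'story': 'a story about Kianda'},
            {'KEYWORD': 'Nowhere', 'story': 'no matching keyword'}]
for d in otherset:
    # update() with an empty tuple is a no-op, so unmatched rows are left alone.
    d.update(dloc.get(d['KEYWORD'], ()))

print(otherset[0]['LATITUDE'])  # -1.311467078
```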
If you dump the DictReader into a list (data = [row for row in csv.DictReader(file)]) and you have a unique keyword for each row, you can convert that list of dictionaries into a dictionary of dictionaries, using the keyword as the key.
>>> data = [row for row in csv.DictReader(open('C:\\my.csv'),
... ('num','time','time2'))]
>>> len(data) # lots of old data :P
1410
>>> data[1].keys()
['time2', 'num', 'time']
>>> keyeddata = {}
>>> for row in data[2:]: # I have some junk rows
... keyeddata[row['num']] = row
...
>>> keyeddata['32']
{'num': '32', 'time2': '8', 'time': '13269'}
Once you have the keyword pulled out, you can iterate through your other list, grab the keyword from it, and use it as the key into the keyed lat/long dictionary. Pull out the lat/long stored under that key and add it to the other list.
Thanks.
Alex: my code for the other set is working, and the only relevant part is that I have a string that may or may not contain the 'keyword' that is in this dictionary.
Structurally, this is how I organized it:
def main():
    f = open('c:\python\ggce.sms', 'r')
    sensetree = etree.parse(f)
    senses = sensetree.getiterator('SenseMakingItem')
    bodies = sensetree.getiterator('Body')
    stories = []
    for body in bodies:
        fix_body(body)
        storybyte = unicode(body.text)
        storybit = storybyte.encode('ascii', 'ignore')
        stories.append(storybit)
    rows = [ids, titles, locations, stories]
    out = map(None, *rows)
    print out[120:121]
    write_data(out, 'c:\python\output_test.csv')
(I omitted the code for getting ids, titles and locations because it works and will not be used to get the real locations from the data within stories.)
Hope this helps.
