I have parsed the JSON with json.load.
Now I want to query that JSON dict using SQL-like commands. Does anything like this exist in Python? I tried Pynq (https://github.com/heynemann/pynq), but that didn't work well, and I've also looked into pandas, though I'm not sure it's what I need.
Here is a simple pandas example with Python 2.7 to get you started...
import json
import pandas as pd

jsonData = '[{"name": "Frank", "age": 39}, {"name": "Mike", "age": 18}, {"name": "Wendy", "age": 45}]'

# using json.loads because I'm working with a string in this example
d = json.loads(jsonData)

# convert to a pandas DataFrame
dframe = pd.DataFrame(d)

# Some example queries

# calculate mean age
mean_age = dframe['age'].mean()
# mean_age
# 34.0

# select participants under 40
young = dframe.loc[dframe['age'] < 40]
# young
#    age   name
# 0   39  Frank
# 1   18   Mike

# select Wendy from the data
wendy = dframe.loc[dframe['name'] == 'Wendy']
# wendy
#    age   name
# 2   45  Wendy
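If you want syntax even closer to SQL, pandas also offers DataFrame.query, which takes a WHERE-style string expression. A minimal sketch using the same data as above:

```python
import json
import pandas as pd

jsonData = '[{"name": "Frank", "age": 39}, {"name": "Mike", "age": 18}, {"name": "Wendy", "age": 45}]'
dframe = pd.DataFrame(json.loads(jsonData))

# SQL-like WHERE clauses as string expressions
young = dframe.query('age < 40')
wendy = dframe.query('name == "Wendy"')

print(young['name'].tolist())  # ['Frank', 'Mike']
print(wendy['age'].iloc[0])    # 45
```

This avoids the `df.loc[boolean mask]` boilerplate for simple filters, at the cost of putting the expression in a string.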
Hello all, I'm just learning dictionaries in Python. I have some data (below, from an Excel file, with duplicate values) and would like to know how to build a nested dictionary from it. Please explain using a for loop.
Name Account Dept
John AC Lab1
Dev AC Lab1
Dilip AC Lab1,Lab2
Sat AC Lab1,Lab2
Dina AC Lab3
Surez AC Lab4
I need the result in below format:
{
    'AC': {
        'Lab1': ['John', 'Dev', 'Dilip', 'Sat'],
        'Lab2': ['Dilip', 'Sat'],
        'Lab3': ['Dina'],
        'Lab4': ['Surez']
    }
}
Something like this should get you closer to an answer but I'd need your input file to optimize it:
import xlrd
from collections import defaultdict

wb = xlrd.open_workbook("<your filename>")
sheet = wb.sheet_by_name(wb.sheet_names()[0])

# defaultdict needs a zero-argument factory, hence the lambda
d = defaultdict(lambda: defaultdict(list))

for row_idx in range(1, sheet.nrows):  # start at 1 to skip the header row
    name = sheet.cell(row_idx, 0).value
    account = sheet.cell(row_idx, 1).value
    labs = sheet.cell(row_idx, 2).value  # .value gives the string, not the Cell object
    for lab in labs.split(","):
        d[account][lab.strip()].append(name)
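To check the nesting logic without an actual workbook, the same loop can be run on plain tuples. The rows below are copied from the question; xlrd is only needed for the real file:

```python
from collections import defaultdict

# sample rows as (Name, Account, Dept) tuples, taken from the question
rows = [
    ("John", "AC", "Lab1"),
    ("Dev", "AC", "Lab1"),
    ("Dilip", "AC", "Lab1,Lab2"),
    ("Sat", "AC", "Lab1,Lab2"),
    ("Dina", "AC", "Lab3"),
    ("Surez", "AC", "Lab4"),
]

d = defaultdict(lambda: defaultdict(list))
for name, account, depts in rows:
    # a cell like "Lab1,Lab2" contributes the name to both labs
    for lab in depts.split(","):
        d[account][lab.strip()].append(name)

print({k: dict(v) for k, v in d.items()})
# {'AC': {'Lab1': ['John', 'Dev', 'Dilip', 'Sat'], 'Lab2': ['Dilip', 'Sat'],
#         'Lab3': ['Dina'], 'Lab4': ['Surez']}}
```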
AcledData = pd.read_csv("https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv", sep=',',quotechar='"', encoding ='utf-8')
print(AcledData)
Empty DataFrame
Columns: [{"status":200, success:true, last_update:117, count:500, data:[{"data_id":"6996791", iso:"4", event_id_cnty:"AFG44631", event_date:"2020-03-21", year:"2020", event_type:"Battles", sub_event_type:"Armed clash", actor1:"Taliban", actor2:"Military Forces of Afghanistan (2014-)", region:"Caucasus and Central Asia", country:"Afghanistan", admin1:"Balkh", location:"Dawlat Abad", latitude:"36.9882", longitude:"66.8207", source:"Xinhua; Khaama Press", fatalities:"12", iso3:"AFG"}, {"data_id":"6997066", ...}, ...]
Index: []
The query returns a JSON string, but the relevant data is nested inside that JSON. You will have to:
read the returned string as a JSON object
use the relevant part of that object to feed a DataFrame
For example, using urllib.request, you could do:
import json
import urllib.request
import pandas as pd

url = 'https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv'
data = json.load(urllib.request.urlopen(url))['data']
df = pd.DataFrame(data)
If you want to convert that to a csv file, there is no need for pandas; the csv module is enough:
import csv
import json
import urllib.request

url = 'https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv'
data = json.load(urllib.request.urlopen(url))['data']
with open('file.csv', 'w', newline='') as fd:
    wr = csv.DictWriter(fd, fieldnames=data[0].keys())
    wr.writeheader()
    for d in data:
        wr.writerow(d)
This is not CSV, this is JSON:
import pandas as pd

api = "https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv"
AcledData = pd.read_json(api)
The data field is then again JSON, but you can use a similar technique and the usual DataFrame methods to get what you want.
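To see how the nested data field becomes a DataFrame without hitting the network, here is a sketch on a hand-made payload with the same shape as the API response (the field names are a small subset picked from the output above; the real response has many more):

```python
import pandas as pd

# minimal stand-in for the API's JSON response: metadata at the top
# level, the actual records nested under 'data'
payload = {
    "status": 200,
    "count": 2,
    "data": [
        {"data_id": "6996791", "event_type": "Battles", "fatalities": "12"},
        {"data_id": "6997066", "event_type": "Violence against civilians", "fatalities": "1"},
    ],
}

# feeding only the nested list to the DataFrame gives one row per event
df = pd.DataFrame(payload["data"])
print(df.shape)  # (2, 3)
```

Feeding the whole payload (as `pd.read_csv` tried to do) flattens everything into column names, which is why the question's DataFrame came back empty.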
I was writing a script to get stock data, and the output was just:
In Progress
[]
What's the matter?
import quandl
from datetime import datetime as dt

def get_stock_data(stock_ticker):
    print("In Progress")
    start_date = dt(2019, 1, 1)
    end_date = dt.now()
    quandl_api_key = "tJDGptkdfqwjYi123RVV"
    quandl.ApiConfig.api_key = quandl_api_key
    source = "WIKI/" + stock_ticker
    data = quandl.get(source, start_date=str(start_date), end_date=str(end_date))
    data = data[["Open", "High", "Low", "Volume", "Close"]].values
    print(data)
    return data

get_stock_data("AAPL")
There's nothing wrong with your code. However, recent stock data is a premium product on Quandl, and I presume you are on the free subscription, hence your dataframe comes back empty. If you change the dates to 2017 you will get some results, but that seems to be as far as the free subscription goes.
Let's say I have a list composed of a client number, a store number, a first name, a last name and an address, in this format:
11, 2, Lisa, Anderson, NewYork
13, 4, John, Smith, Alabama
54, 2, Lucy, Nicholsson, NewYork
etc.
What is the best way to organize this data in Python so that I can access it easily, for example by inputting a client number and getting back the location, and other lookups like that?
You can use pandas. It provides database-like (or spreadsheet-like) tables which can be used to store and query data. Like this:
import pandas as pd

df = pd.DataFrame([
    [11, 2, 'Lisa', 'Anderson', 'NewYork'],
    [13, 4, 'John', 'Smith', 'Alabama'],
    [54, 2, 'Lucy', 'Nicholsson', 'NewYork'],
], columns=['cl_number', 'store', 'first_name', 'last_name', 'address'])

# rows will be indexed by cl_number
df.index = df['cl_number']

# returns the record of the client with number 11
df.loc[11]

# returns the address of the client with number 11
df.loc[11, 'address']

# returns all clients with address 'NewYork'
df[df['address'] == 'NewYork']
However, you may also need full-featured database (see SQLite, for example).
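For comparison, a minimal sqlite3 sketch of the same table, using Python's built-in module (in-memory here; pass a filename instead of ':memory:' to persist the data):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE clients (cl_number INTEGER PRIMARY KEY, store INTEGER, '
            'first_name TEXT, last_name TEXT, address TEXT)')
con.executemany('INSERT INTO clients VALUES (?, ?, ?, ?, ?)', [
    (11, 2, 'Lisa', 'Anderson', 'NewYork'),
    (13, 4, 'John', 'Smith', 'Alabama'),
    (54, 2, 'Lucy', 'Nicholsson', 'NewYork'),
])

# address of client 11
addr = con.execute('SELECT address FROM clients WHERE cl_number = ?', (11,)).fetchone()[0]
print(addr)  # NewYork

# all clients in NewYork
ny = con.execute('SELECT first_name FROM clients WHERE address = ?', ('NewYork',)).fetchall()
print([r[0] for r in ny])  # ['Lisa', 'Lucy']
```

Unlike the pandas version, this gives you real SQL queries and on-disk persistence for free, at the cost of a bit more setup.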
If your data is reasonably consistent and there isn't so much that you want a fully-fledged database, you can get quite far with a namedtuple:
from collections import namedtuple

Client = namedtuple('Client', ('id', 'storeno', 'first_name', 'last_name',
                               'location'))

# Read the data
db = {}
with open('db.csv') as fi:
    for row in fi:
        f = row.split(',')
        db[int(f[0])] = Client(int(f[0]), int(f[1]), f[2].strip(),
                               f[3].strip(), f[4].strip())

def find(**kwargs):
    """Return a list of all Clients matching a given set of criteria."""
    matches = []
    for client in db.values():
        for k, v in kwargs.items():
            if getattr(client, k) != v:
                break
        else:
            # only reached when no criterion failed
            matches.append(client)
    return matches

# Client with ID 11
print(db[11])

# All clients in New York
print(find(location='NewYork'))

# All clients called Lisa in New York
print(find(location='NewYork', first_name='Lisa'))
I have an Excel file that contains four columns. I want to fetch this data and store it in MySQL. Later on I want to fetch the data from there and store it in Redis, then run a validation on it. I have already imported the data from Excel into Python.
You have to reshape your 4-column Excel data into single-column data.
The Redis client for Matlab/GNU Octave does this, e.g.: https://github.com/markuman/go-redis/wiki/Data-Structure#arrays
Take care: in that example, Matlab/Octave use column-major order.
Python uses row-major order: https://en.wikipedia.org/wiki/Row-major_order
So you have to save your 4-columns-by-X-rows data as a row-major-order list in Redis (RPUSH).
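The row-major flattening itself is straightforward; this small sketch (no Redis needed) shows how a 2-D table maps to a flat list and how to index back into it:

```python
# a 2-rows-by-3-columns table, like a tiny spreadsheet
rows = [[1, 'two', 'python'],
        ['excel', 'redis', 'action']]
ncols = len(rows[0])

# row-major order: walk each row left to right, top to bottom
flat = [value for row in rows for value in row]
print(flat)  # [1, 'two', 'python', 'excel', 'redis', 'action']

# cell (r, c) of the table lives at flat index r * ncols + c
assert flat[1 * ncols + 1] == 'redis'
```

The same indexing arithmetic recovers any cell from the flat list once it is stored in Redis, as long as you remember the column count.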
example
given this excel sheet
using this python3 code
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
Created on Tue Oct 20 23:02:53 2015
@author: markus
"""
import pandas as pd
import redis

# redis connection
r = redis.StrictRedis(host='localhost', port=6379, db=0)

# open the first worksheet
df = pd.read_excel('/home/markus/excel.xlsx', 0)

# read in as a list
# [[1, 'two', 'python'], ['excel', 'redis', 'action']]
a = list(df.T.itertuples())
print("this is a, your excel list")
print(a)

for row in a:  # don't shadow the built-in name `list`
    for value in row:
        r.rpush('myexceldata', str(value))

# read all back to python
b = r.lrange('myexceldata', '0', '-1')
print("A1 becomes 0, B1 becomes 3 ...")
print(b[3].decode('UTF-8'))
to save it serialized as a list in redis
127.0.0.1:6379> lrange myexceldata 0 -1
1) "1"
2) "two"
3) "python"
4) "excel"
5) "redis"
6) "action"
This is just one way to save a spreadsheet in Redis. It always depends on your data structure and what you're going to do with it.