I have the following CSV file (20000 lines in total):
ozone,paricullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avg_measured_time,avg_speed,median_measured_time,timestamp:1,vehicle_count,lat1,long1,lat2,long2,distance_between_2_points,duration_of_measurement,ndt_in_kmh
99,99,98,116,118,10.09351660921,56.1671665604395,1407575099.99998,0,0,0,1407575099.99998,0,56.1089513576227,10.1823955595246,56.1048021343541,10.1988040846558,1124,65,62
99,99,98,116,118,10.09351660921,56.1671665604395,1407575099.99998,0,0,0,1407575099.99998,0,56.10986429895,10.1627288048935,56.1089513576227,10.1823955595246,1254,71,64
99,99,98,116,118,10.09351660921,56.1671665604395,1407575099.99998,0,0,0,1407575099.99998,0,56.1425188527673,10.1868802625656,56.1417522836526,10.1927236478157,521,62,30
99,99,98,116,118,10.09351660921,56.1671665604395,1407575099.99998,18,84,18,1407575099.99998,1,56.1395320665735,10.1772034087371,56.1384485157567,10.1791506011887,422,50,30
I want to convert this into a dictionary like:
{'ozone': [99,99,99,99], 'paricullate_matter': [99,99,99,99], 'carbon_monoxide': [98,98,98,98], etc....}
What I have tried:
import csv
reader = csv.DictReader(open('resulttable.csv'))
output = open("finalprojdata.py","w")
result = {}
for row in reader:
    for column, value in row.iteritems():
        result.setdefault(column, []).append(float(value))
output.write(str(result))
The output I am getting contains only a few of the columns. It starts like this:
{'vehicle_count': [0,0,0,1], 'lat1': etc}
The whole CSV file is not being converted to the dictionary.
If you have pandas this is super easy:
import pandas as pd
data = pd.read_csv("data.csv")
data_dict = {col: list(data[col]) for col in data.columns}
This should do what you want:
import csv
def int_or_float(strg):
    val = float(strg)
    return int(val) if val.is_integer() else val
with open('test.csv') as in_file:
    it = zip(*csv.reader(in_file))
    dct = {el[0]: [int_or_float(val) for val in el[1:]] for el in it}
zip(*csv.reader(in_file)) transposes the rows into columns; the dictionary comprehension then uses the first element of each column (the header) as the key and converts the remaining elements to numbers.
dct now contains the dictionary you want.
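To see the transposition step in isolation, here is a minimal sketch that runs on a small in-memory sample instead of a file. The three-column sample and the names `sample`, `to_number`, `columns` and `table` are made up for illustration; only the `zip(*rows)` idea comes from the answer above.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real CSV file.
sample = io.StringIO(
    "ozone,latitude,vehicle_count\n"
    "99,56.1671665604395,0\n"
    "99,56.1671665604395,1\n"
)

def to_number(text):
    """Return an int for whole numbers, otherwise a float."""
    value = float(text)
    return int(value) if value.is_integer() else value

rows = list(csv.reader(sample))   # header row followed by the data rows
columns = list(zip(*rows))        # one tuple per column, header first
table = {col[0]: [to_number(v) for v in col[1:]] for col in columns}

print(columns[0])
print(table)
```

Each tuple in `columns` starts with the header and is followed by that column's values, which is why the comprehension splits it into `col[0]` and `col[1:]`.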
Awk version:
awk -F',' '
NR==1 { n=NF; for (i=1; i<=NF; i++) D[i] = sprintf("\"%s\": [", $i); next }
{ for (i=1; i<=NF; i++) D[i] = D[i] sprintf("%s%s", (NR>2 ? ", " : ""), $i) }
END {
    printf("{ ")
    for (i=1; i<=n; i++) printf("%s%s]", (i>1 ? ", " : ""), D[i])
    printf(" }\n")
}
' YourFile > final.py
Quick and dirty, not memory-optimized (20000 lines is not that much for modern memory):
from collections import defaultdict
import csv
columns = defaultdict(list)
with open('test.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)
print(columns)
#Output
defaultdict(<type 'list'>, {'vehicle_count': ['0', '0', '0', '1'], 'lat1': ['56.1089513576227', '56.10986429895', '56.1425188527673', '56.1395320665735'], 'lat2': ['56.1048021343541', '56.1089513576227', '56.1417522836526', '56.1384485157567'], 'paricullate_matter': ['99', '99', '99', '99'], 'timestamp': ['1407575099.99998', '1407575099.99998', '1407575099.99998', '1407575099.99998'], 'long1': ['10.1823955595246', '10.1627288048935', '10.1868802625656', '10.1772034087371'], 'longitude': ['10.09351660921', '10.09351660921', '10.09351660921', '10.09351660921'], 'nitrogen_dioxide': ['118', '118', '118', '118'], 'ozone': ['99', '99', '99', '99'], 'latitude': ['56.1671665604395', '56.1671665604395', '56.1671665604395', '56.1671665604395'], 'timestamp:1': ['1407575099.99998', '1407575099.99998', '1407575099.99998', '1407575099.99998'], 'distance_between_2_points': ['1124', '1254', '521', '422'], 'long2': ['10.1988040846558', '10.1823955595246', '10.1927236478157', '10.1791506011887'], 'avg_measured_time': ['0', '0', '0', '18'], 'carbon_monoxide': ['98', '98', '98', '98'], 'ndt_in_kmh': ['62', '64', '30', '30'], 'avg_speed': ['0', '0', '0', '84'], 'sulfure_dioxide': ['116', '116', '116', '116'], 'duration_of_measurement': ['65', '71', '62', '50'], 'median_measured_time': ['0', '0', '0', '18']})
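Note that the values collected this way are still strings. As a rough sketch (not part of the answer above), the same loop can convert each value with float() and the result can be serialized with json rather than str(), so it can be loaded back later. The in-memory sample and the names `sample`, `serialized` and `restored` are assumptions for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample; the real input would be the opened CSV file.
sample = io.StringIO(
    "ozone,longitude,vehicle_count\n"
    "99,10.09351660921,0\n"
    "99,10.09351660921,1\n"
)

columns = defaultdict(list)
for row in csv.DictReader(sample):
    for name, value in row.items():
        columns[name].append(float(value))  # convert while collecting

# defaultdict is a dict subclass, so json can serialize it directly.
serialized = json.dumps(columns)
restored = json.loads(serialized)
print(restored["vehicle_count"])
```

Writing `serialized` to a file gives output that any JSON reader can parse, which is usually easier to reuse than the repr of a dictionary.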
pyexcel version:
import pyexcel as p
p.get_dict(file_name='test.csv')
$ cat tst.awk
BEGIN { FS=OFS=","; ORS="}\n" }
NR==1 { split($0,hdr); next }
{
    for (i=1; i<=NF; i++) {
        vals[i] = (i in vals ? vals[i] "," : "") $i
    }
}
END {
    printf "{"
    for (i=1; i<=NF; i++) {
        printf "\047%s\047: [%s]%s", hdr[i], vals[i], (i<NF?OFS:ORS)
    }
}
$ awk -f tst.awk file
{'ozone': [99,99,99,99],'paricullate_matter': [99,99,99,99],'carbon_monoxide': [98,98,98,98],'sulfure_dioxide': [116,116,116,116],'nitrogen_dioxide': [118,118,118,118],'longitude': [10.09351660921,10.09351660921,10.09351660921,10.09351660921],'latitude': [56.1671665604395,56.1671665604395,56.1671665604395,56.1671665604395],'timestamp': [1407575099.99998,1407575099.99998,1407575099.99998,1407575099.99998],'avg_measured_time': [0,0,0,18],'avg_speed': [0,0,0,84],'median_measured_time': [0,0,0,18],'timestamp:1': [1407575099.99998,1407575099.99998,1407575099.99998,1407575099.99998],'vehicle_count': [0,0,0,1],'lat1': [56.1089513576227,56.10986429895,56.1425188527673,56.1395320665735],'long1': [10.1823955595246,10.1627288048935,10.1868802625656,10.1772034087371],'lat2': [56.1048021343541,56.1089513576227,56.1417522836526,56.1384485157567],'long2': [10.1988040846558,10.1823955595246,10.1927236478157,10.1791506011887],'distance_between_2_points': [1124,1254,521,422],'duration_of_measurement': [65,71,62,50],'ndt_in_kmh': [62,64,30,30]}
Related
Input data:
data = [
    ['QR', ''],
    ['Cust', ''],
    ['fea', 'restroom'],
    ['chain', 'pa'],
    ['store', 'cd'],
    ['App', ''],
    ['End', 'EndnR'],
    ['Request', '0'],
    ['Sound', '15'],
    ['Target', '60'],
    ['Is', 'TRUE']
]
I want to turn this into a dictionary, and each blank value indicates the start of a new, nested sub-dictionary.
Desired output:
{
    'QR': {
        'Cust': {
            'fea': 'restroom',
            'chain': 'pa',
            'store': 'cd'
        },
        'App': {
            'End': 'EndnR',
            'Request': '0',
            'Sound': '15',
            'Target': '60',
            'Is': 'TRUE'
        },
    }
}
Here is my code so far:
from collections import defaultdict
res = defaultdict(dict)
for i in data:
    res[i[0]] = i[1]
print(res)
But it only creates a flat dictionary with some blank values, not a nested dictionary.
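Try this:
result = {}
nbr_keys = 0
# Every key with a blank value: the root first, then each sub-dictionary name.
keys = [item[0] for item in data if item[1] == ""]
for index, item in enumerate(data):
    if index == 0:
        if item[1] == "":
            key = item[0]
            result[item[0]] = {}
    else:
        if item[1] == "":
            # A blank value starts a new sub-dictionary under the root key.
            result[key].update({item[0]: {}})
            nbr_keys += 1
        else:
            result[key][keys[nbr_keys]].update({item[0]: item[1]})
print(result)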
which outputs this:
{'QR': {'Cust': {'fea': 'restroom', 'chain': 'pa', 'store': 'cd'},
'App': {'End': 'EndnR',
'Request': '0',
'Sound': '15',
'Target': '60',
'Is': 'TRUE'}}}
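The same idea can be written as a small function that keeps a reference to the sub-dictionary currently being filled, instead of tracking an index into a list of keys. This is only a sketch under one assumption about the rule: the first blank-valued pair names the root, and every later blank-valued pair opens a new section directly under that root. The name `build_nested` is hypothetical.

```python
def build_nested(pairs):
    """Build {root: {section: {key: value}}} from [key, value] pairs.

    Assumption: the first blank-valued pair names the root, and every
    later blank-valued pair opens a new section under that root.
    """
    result = {}
    root = None      # dictionary stored under the root key
    section = None   # sub-dictionary currently being filled
    for key, value in pairs:
        if value == "":
            if root is None:
                root = result[key] = {}
            else:
                section = root[key] = {}
        elif section is not None:
            section[key] = value
        elif root is not None:
            root[key] = value     # plain value before any section is opened
        else:
            result[key] = value   # plain value before any root is opened
    return result


data = [
    ['QR', ''], ['Cust', ''], ['fea', 'restroom'], ['chain', 'pa'],
    ['store', 'cd'], ['App', ''], ['End', 'EndnR'], ['Request', '0'],
    ['Sound', '15'], ['Target', '60'], ['Is', 'TRUE'],
]
nested = build_nested(data)
print(nested)
```

Holding `section` as a reference means each value is written straight into the right sub-dictionary without looking its name up again.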
I have been trying to seed a Django DB with some COVID data from an API and get a KeyError for one particular field. In the source it is a floating_timestamp ("lab_report_date" : "2014-10-13T00:00:00.000"). (Edit: not sure if the type is relevant, but trying to be comprehensive here.)
I tried a simpler API request in Python but get the same KeyError. Below is my code and the error message.
import requests
response = requests.get("https://data.cityofchicago.org/resource/naz8-j4nc.json")
print(response.json())
The output looks like this:
[
{
"cases_age_0_17": "1",
"cases_age_18_29": "1",
"cases_age_30_39": "0",
"cases_age_40_49": "1",
"cases_age_50_59": "0",
"cases_age_60_69": "0",
"cases_age_70_79": "1",
"cases_age_80_": "0",
"cases_age_unknown": "0",
"cases_asian_non_latinx": "1",
"cases_black_non_latinx": "0",
"cases_female": "1",
"cases_latinx": "1",
"cases_male": "3",
"cases_other_non_latinx": "0",
"cases_total": "4",
"cases_unknown_gender": "0",
"cases_unknown_race_eth": "1",
"cases_white_non_latinx": "1",
"deaths_0_17_yrs": "0",
"deaths_18_29_yrs": "0",
"deaths_30_39_yrs": "0",
"deaths_40_49_yrs": "0",
show more (open the raw output data in a text editor) ...
"hospitalizations_unknown_gender": "3",
"hospitalizations_unknown_race_ethnicity": "16",
"hospitalizations_white_non_latinx": "135"
}
]
So far so good, but if I try to extract the problem key, I get the KeyError:
report_date = []
for i in response.json():
    ls = i['lab_report_date']
    report_date.append(ls)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/var/folders/h3/5wlbmz0s3jb978hyhtvf9f4h0000gn/T/ipykernel_2163/2095152945.py in <module>
1 report_date = []
2 for i in response.json():
----> 3 ls = i['lab_report_date']
4 report_date.append(ls)
KeyError: 'lab_report_date'
This issue occurs with or without using a for loop. I've gotten myself real turned around, so apologies if there are any errors or omissions in my code.
This happens because at least one item in the list returned by response.json() has no lab_report_date key, which can occur when the backend data is not clean.
Wrap the lookup in a try/except block to handle the exception. The following code runs without error:
import requests
response = requests.get("https://data.cityofchicago.org/resource/naz8-j4nc.json")
print("The total length of response is %s" % len(response.json()))
report_date = []
for i in response.json():
    try:
        ls = i['lab_report_date']
        report_date.append(ls)
    except KeyError:
        print("There is an item in the response containing no key lab_report_date:")
        print(i)
print("The length of report_date is %s" % len(report_date))
The output of the above code is as follows.
The total length of response is 592
There is an item in the response containing no key lab_report_date:
{'cases_total': '504', 'deaths_total': '1', 'hospitalizations_total': '654', 'cases_age_0_17': '28', 'cases_age_18_29': '116', 'cases_age_30_39': '105', 'cases_age_40_49': '83', 'cases_age_50_59': '72', 'cases_age_60_69': '61', 'cases_age_70_79': '25', 'cases_age_80_': '14', 'cases_age_unknown': '0', 'cases_female': '264', 'cases_male': '233', 'cases_unknown_gender': '7', 'cases_latinx': '122', 'cases_asian_non_latinx': '15', 'cases_black_non_latinx': '116', 'cases_white_non_latinx': '122', 'cases_other_non_latinx': '30', 'cases_unknown_race_eth': '99', 'deaths_0_17_yrs': '0', 'deaths_18_29_yrs': '0', 'deaths_30_39_yrs': '0', 'deaths_40_49_yrs': '1', 'deaths_50_59_yrs': '0', 'deaths_60_69_yrs': '0', 'deaths_70_79_yrs': '0', 'deaths_80_yrs': '0', 'deaths_unknown_age': '0', 'deaths_female': '0', 'deaths_male': '1', 'deaths_unknown_gender': '0', 'deaths_latinx': '0', 'deaths_asian_non_latinx': '0', 'deaths_black_non_latinx': '0', 'deaths_white_non_latinx': '1', 'deaths_other_non_latinx': '0', 'deaths_unknown_race_eth': '0', 'hospitalizations_age_0_17': '30', 'hospitalizations_age_18_29': '78', 'hospitalizations_age_30_39': '74', 'hospitalizations_age_40_49': '96', 'hospitalizations_age_50_59': '105', 'hospitalizations_age_60_69': '111', 'hospitalizations_age_70_79': '89', 'hospitalizations_age_80_': '71', 'hospitalizations_age_unknown': '0', 'hospitalizations_female': '310', 'hospitalizations_male': '341', 'hospitalizations_unknown_gender': '3', 'hospitalizations_latinx': '216', 'hospitalizations_asian_non_latinx': '48', 'hospitalizations_black_non_latinx': '208', 'hospitalizations_white_non_latinx': '135', 'hospitalizations_other_race_non_latinx': '31', 'hospitalizations_unknown_race_ethnicity': '16'}
The length of report_date is 591
You can use the dict get method to read the data from the JSON response, like this:
report_date = []
for i in response.json():
    if isinstance(i, dict):  # Check the type to avoid a runtime error.
        ls = i.get('lab_report_date', None)
        if ls:
            report_date.append(ls)
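Both approaches can be tried without a network call. The sketch below uses a made-up list of records (the dates and totals are placeholders, not real API data) and a membership test to split the rows into those that have the key and those that do not; `records`, `report_dates` and `missing` are hypothetical names.

```python
# Hypothetical records standing in for response.json(); values are placeholders.
records = [
    {"lab_report_date": "2020-03-01T00:00:00.000", "cases_total": "4"},
    {"cases_total": "504"},  # a row without lab_report_date
    {"lab_report_date": "2020-03-02T00:00:00.000", "cases_total": "7"},
]

# Keep the dates from rows that have the key.
report_dates = [r["lab_report_date"] for r in records if "lab_report_date" in r]

# Collect the rows that lack it, so they can be inspected separately.
missing = [r for r in records if "lab_report_date" not in r]

print(len(records), len(report_dates), len(missing))
```

Keeping the rows without the key in a separate list makes it easy to check whether they are summary rows or genuinely incomplete data.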
Hi, I have a similar issue: sometimes the API response comes back empty, which stops my code from executing.
I found an easy solution for it. Let's say you have:
requestfromapi = requests.get("https://api-server")
try:
    print(requestfromapi.json()['data']['something'])
except KeyError:
    pass  # the key is missing; skip it instead of stopping
# This makes sure that your code will not stop executing.
I need to read a CSV file and fill a dict with the data from it, so I wrote this method:
def read_data(self):
    with open('storage/data/heart.csv') as f:
        self.raw_data = {
            len(self.raw_data): {
                'age': line[0],
                'sex': line[1],
                'cp': line[2],
                'trtbps': line[3],
                'chol': line[4],
                'fbs': line[5],
                'restecg': line[6],
                'thalachh': line[7]
            } for line in csv.reader(f)}
But print(raw_data) returns this:
{0: {'age': '57', 'sex': '0', 'cp': '1', 'trtbps': '130', 'chol': '236', 'fbs': '0', 'restecg': '0', 'thalachh': '174'}}
As you can see, my method saves only one line to the dict, and it is the last line of the file. Please help me.
len(self.raw_data) does not change inside the dict comprehension, because self.raw_data is only assigned after the whole comprehension has been built. Every row therefore gets the same key and overwrites the previous one. Just use a normal loop, or enumerate, like this:
import csv

def read_data(self):
    with open('storage/data/heart.csv') as f:
        self.raw_data = {
            i: {
                'age': line[0],
                'sex': line[1],
                'cp': line[2],
                'trtbps': line[3],
                'chol': line[4],
                'fbs': line[5],
                'restecg': line[6],
                'thalachh': line[7]
            } for i, line in enumerate(csv.reader(f))}
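A possible variation, sketched here under the assumption that the file has no header row and exactly these eight columns, is to let csv.DictReader build each inner dictionary from a list of field names. The in-memory sample (its first two rows are made up) and the names `FIELDS` and `raw_data` are for illustration only.

```python
import csv
import io

# Hypothetical sample without a header row; the first two rows are made up.
sample = io.StringIO(
    "63,1,3,145,233,1,0,150\n"
    "37,1,2,130,250,0,1,187\n"
    "57,0,1,130,236,0,0,174\n"
)
FIELDS = ['age', 'sex', 'cp', 'trtbps', 'chol', 'fbs', 'restecg', 'thalachh']

# DictReader pairs each value with the matching name from FIELDS.
reader = csv.DictReader(sample, fieldnames=FIELDS)
raw_data = {i: row for i, row in enumerate(reader)}

print(len(raw_data))
print(raw_data[2])
```

If the real file has more columns than names, DictReader collects the extra values under its `restkey` (None by default), so the field list should match the file.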
I am trying to restructure the data into a dict of lists. I tried using defaultdict, but it gives an error.
data = """
[{'transit01_net': '192.168.1.0',
'transit01_subnet': '26',
'transit02_net': '192.168.2.0',
'transit02_subnet': '26',
'transit03_net': '192.168.3.0',
'transit03_subnet': '26',
}]
"""
output = {
'transit01': [],
'transit02': [],
'transit03': []
}
I would like to get:
{
'transit01': ['192.168.1.0', '26', 'Transit01'],
'transit02': ['192.168.2.0', '26', 'Transit02'],
'transit03': ['192.168.3.0', '26', 'Transit03'],
}
I have tried the following, but I am only able to print the first entry:
for item in data:
    # Iterating the elements in list
    output['transit01'].append(item['transit01_net'])
    output['transit01'].append(item['transit01_subnet'])
    output['transit01'].append('Transit01')
    output['transit02'].append(item['transit02_net'])
    output['transit02'].append(item['transit02_subnet'])
    output['transit02'].append('Transit02')
    output['transit03'].append(item['transit03_net'])
    output['transit03'].append(item['transit03_subnet'])
    output['transit03'].append('Transit03')
Step through this. You want to get from this:
data = """
[{'transit01_net': '192.168.1.0',
'transit01_subnet': '26',
'transit02_net': '192.168.2.0',
'transit02_subnet': '26',
'transit03_net': '192.168.3.0',
'transit03_subnet': '26',
}]
"""
To this:
{
'transit01': ['192.168.1.0', '26', 'Transit01'],
'transit02': ['192.168.2.0', '26', 'Transit02'],
'transit03': ['192.168.3.0', '26', 'Transit03'],
}
The former is a string that describes a literal data structure. Python's ast module can parse that string into a Python object for you.
import ast
evald_data = ast.literal_eval(data)
From there you need to do the more difficult work of actually parsing the structure. It looks like you can split each key on the underscore, though, and get what you need. Let's save off the name of each field for now.
result = {}
for d in evald_data:  # for each dictionary in the (single-item) list
    for k, v in d.items():
        name, key = k.split("_")
        result.setdefault(name, {})[key] = v
# this should give you
expected = {
    'transit01': {'net': '192.168.1.0', 'subnet': '26'},
    'transit02': {'net': '192.168.2.0', 'subnet': '26'},
    'transit03': {'net': '192.168.3.0', 'subnet': '26'},
}
assert result == expected
From there it's pretty simple stuff. I'd posit that you probably want a tuple instead of a list, since the order of these values seems to matter (sorting them isn't just bad, it's incorrect).
final_result = {k: (v['net'], v['subnet'], k.title()) for k, v in result.items()}
expected = {
    'transit01': ('192.168.1.0', '26', 'Transit01'),
    'transit02': ('192.168.2.0', '26', 'Transit02'),
    'transit03': ('192.168.3.0', '26', 'Transit03'),
}
assert final_result == expected
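The steps in this answer can be combined into one function. The following is a sketch rather than a definitive version: it returns lists, as the question asked for, uses str.partition so a key with more than one underscore does not raise, and strips the surrounding whitespace before parsing. The name `group_transits` and the shortened sample are hypothetical.

```python
import ast

def group_transits(text):
    """Parse the literal in `text` and group values by the key prefix."""
    records = ast.literal_eval(text.strip())
    grouped = {}
    for record in records:
        for full_key, value in record.items():
            # 'transit01_net' -> name 'transit01', field 'net'
            name, _, field = full_key.partition("_")
            grouped.setdefault(name, {})[field] = value
    return {
        name: [fields["net"], fields["subnet"], name.title()]
        for name, fields in grouped.items()
    }


# Shortened sample in the same shape as the question's data.
data = """
[{'transit01_net': '192.168.1.0',
  'transit01_subnet': '26',
  'transit02_net': '192.168.2.0',
  'transit02_subnet': '26'}]
"""
print(group_transits(data))
```

Swapping the square brackets in the return expression for parentheses gives the tuple form suggested above.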
Use collections.defaultdict.
Example:
from collections import defaultdict
data = [{'transit01_net': '192.168.1.0',
         'transit01_subnet': '26',
         'transit02_net': '192.168.2.0',
         'transit02_subnet': '26',
         'transit03_net': '192.168.3.0',
         'transit03_subnet': '26',
         }]
output = defaultdict(list)
temp = 1
for x in data[0]:
    key = x.split("_")[0]  # 'transit01_net' -> 'transit01'
    output[key].append(data[0][x])
    sub_key = "transit0{}_subnet".format(temp)
    if x == sub_key:
        # After the subnet, add the capitalized name and move to the next transit.
        output[key].append(key.capitalize())
        temp += 1
print(dict(output))
Output:
{'transit01': ['192.168.1.0', '26', 'Transit01'], 'transit02': ['192.168.2.0', '26', 'Transit02'], 'transit03': ['192.168.3.0', '26', 'Transit03']}