CSV to JSON array only numbers - python

import csv
import json

myDict = {}
jsonStr = json.dumps(myDict)
print(jsonStr)

with open('test - Cópia.csv', 'r', newline='') as csvdata:
    reader = csv.DictReader(csvdata, fieldnames=['Time', 'Yaw', 'Pitch', 'Roll', 'Ax', 'Ay', 'Az', 'Gx', 'Gy', 'Gz', 'Mx', 'My', 'Mz'])
    json.dump([row for row in reader], open('output.json', 'w+'))
I have a problem that I can't figure out yet. I have a CSV containing only numbers, with no header and 14 columns; the column names I want are the fieldnames listed above. I have to create a JSON file that maps each key name to an array of all the numbers in the corresponding CSV column.
The CSV looks like this:
1364.00,0.15,0.36,-0.13,-3.24,-0.42,-0.15,0.90,0.00,-0.01,0.02,0.26,0.01,-0.04
1374.00,0.30,0.76,-0.25,-3.25,-0.41,-0.13,0.91,0.00,-0.00,0.02,0.26,0.01,-0.04
1384.00,0.45,1.08,-0.35,-3.17,-0.41,-0.10,1.00,-0.00,-0.01,0.02,0.26,0.01,-0.07
1394.00,0.61,1.44,-0.49,-3.21,-0.40,-0.10,1.01,-0.00,-0.01,0.02,0.26,0.01,-0.07
1404.00,0.77,1.81,-0.65,-3.25,-0.40,-0.11,1.00,-0.01,-0.01,0.02,0.26,0.01,-0.07
1414.00,0.92,2.12,-0.83,-3.29,-0.38,-0.14,0.98,-0.00,-0.01,0.02,0.26,0.01,-0.07
1424.00,1.05,2.43,-1.01,-3.34,-0.37,-0.14,0.96,-0.00,-0.01,0.02,0.26,0.01,-0.07
1434.00,1.21,2.78,-1.15,-2.95,-0.38,-0.10,0.91,-0.00,-0.01,0.02,0.26,0.01,-0.05
1444.00,1.35,3.10,-1.27,-2.97,-0.37,-0.09,0.90,-0.00,-0.01,0.02,0.26,0.01,-0.05
1454.00,1.49,3.42,-1.39,-2.99,-0.37,-0.10,0.90,-0.00,-0.01,0.02,0.26,0.01,-0.05
1464.00,1.62,3.74,-1.57,-3.02,-0.37,-0.14,0.90,-0.00,-0.01,0.02,0.26,0.01,-0.05
1474.00,1.74,4.08,-1.77,-3.05,-0.38,-0.16,0.87,-0.00,-0.01,0.02,0.26,0.01,-0.05
2054.00,8.39,14.06,-10.55,-0.08,-0.05,0.06,1.20,-0.01,0.02,-0.00,0.24,-0.01,-0.04
and I want to create a JSON file like this:
{"session 1": {"Time": [an array with all the numbers from column 0 of the CSV],
               "Pitch": [an array with all the numbers from column 1 of the CSV],
               ...}}

Here's how to do it using only built-in functions and modules included in Python's standard library.
As I mentioned in a comment, you will need to read in the entire CSV file first, because its rows and columns need to be transposed in order to output them as arrays the way you want. Fortunately that is easy using the built-in zip() function.
Also note the use of csv.reader instead of csv.DictReader. This change was made because zip() can't (easily) be applied to a list of dictionaries. The field names are still used, but not until the dictionary is created, as described next. Note that this will ignore the extra value in each row that does not have a fieldname.
You can use a dictionary comprehension to create one formatted the way you want before calling json.dump() to write it to the output file.
import csv
import json

fieldnames = ('Time', 'Yaw', 'Pitch', 'Roll', 'Ax', 'Ay', 'Az',
              'Gx', 'Gy', 'Gz', 'Mx', 'My', 'Mz')
csv_filepath = 'test - Cópia.csv'
json_filepath = 'output.json'

with open(csv_filepath, 'r', newline='') as csv_file:
    rows = (map(float, row) for row in csv.reader(csv_file))  # Lazily convert each row to floats.
    data = tuple(zip(*rows))  # Transpose the CSV file's rows and cols.

# Pair each fieldname with its column; the extra 14th column is dropped by zip().
my_dict = {'session 1': dict(zip(fieldnames, data))}

with open(json_filepath, 'w') as json_file:
    json.dump(my_dict, json_file, indent=4)

print('done')

I recommend using Pandas to take care of the details of reading CSVs. It also has convenient features for manipulating the data.
import pandas as pd
import json

# Note: your column-name list has one fewer name than the sample CSV has
# columns, so a placeholder 'extra' column is added here.
df = pd.read_csv('test.csv', names=['Time', 'Yaw', 'Pitch', 'Roll', 'Ax', 'Ay', 'Az',
                                    'Gx', 'Gy', 'Gz', 'Mx', 'My', 'Mz', 'extra'])

# This can be simplified if you don't need the top-level "session 1" key.
output = {'session 1': df.to_dict(orient='list')}

with open('output.json', 'w') as f:
    json.dump(output, f)
If for some reason you need a solution using only the standard library, here's one that uses a defaultdict to build the inner output. An important note: the csv module reads everything as strings by default, so you need to convert the numbers to floats, otherwise they will be quoted in the output JSON.
import csv
import json
from collections import defaultdict

fieldnames = ['Time', 'Yaw', 'Pitch', 'Roll', 'Ax', 'Ay', 'Az',
              'Gx', 'Gy', 'Gz', 'Mx', 'My', 'Mz', 'extra']

inner_output = defaultdict(list)
with open('test.csv', newline='') as f:
    for row in csv.reader(f):
        for name, value in zip(fieldnames, row):
            # The csv module only reads strings, so convert explicitly.
            inner_output[name].append(float(value))

output = {'session 1': inner_output}
with open('output.json', 'w') as f:
    json.dump(output, f, indent=4)

Related

How to group elements in structures, count them and calculate a sum in Python?

This is probably a simple question. I am reading a CSV file with two columns: name and value. It can contain a lot of entries. What is the easiest and most efficient way to count the occurrences of each name and sum its values? I could do it myself with loops, but there is probably some smart way in Python to do this.
Example:
adam;10000
bartek;1000
tomasz;5000
adam;1000
bartek;3000
Result:
adam;11000;2
tomasz;5000;1
bartek;4000;2
You can leverage the csv module for this. Read the data from your file into a dictionary: use the name as the key and store the values in a list under that key. Using collections.defaultdict is easiest:
Write data file:
name = "f.txt"
with open(name, "w") as f:
    f.write("""adam;10000
bartek;1000
tomasz;5000
adam;1000
bartek;3000""")
Process data file:
import csv  # https://docs.python.org/3/library/csv.html
from collections import defaultdict

# Read the data into a dictionary of lists, keyed by name.
results = defaultdict(list)
with open(name, newline='') as f:
    reader = csv.reader(f, delimiter=";")
    for line in reader:
        if line:
            results[line[0]].append(int(line[1]))
print(results)

# Write name, sum and count from the dictionary to a new file.
with open("new" + name, "w", newline="") as f:
    writer = csv.writer(f, delimiter=";")
    for key in results:
        writer.writerow([key, sum(results[key]), len(results[key])])

# Read the new file back and print it.
print(open("new" + name).read())
Output:
# read data
defaultdict(<class 'list'>, {'adam': [10000, 1000],
'bartek': [1000, 3000],
'tomasz': [5000]})
# written results
adam;11000;2
bartek;4000;2
tomasz;5000;1
Assuming your data is in a list of tuples (and you don't/can't use pandas), you can do the following:
people = [('adam', 10000), ('bartek', 1000),
          ('tomasz', 5000), ('adam', 1000), ('bartek', 3000)]

report = {}
for person in people:
    name, salary = person
    # Initialize the entry the first time a name appears.
    if name not in report:
        report[name] = {'salary': 0, 'times': 0}
    # Then accumulate into it.
    report[name]['salary'] += salary
    report[name]['times'] += 1
Then you can retrieve each value using:
print(report)
print(report['adam'])
print(report['adam']['salary'])
print(report['adam']['times'])
One of the most popular packages in Python for working with data is Pandas. It lets you load your CSV data (with the read_csv function) into a Python object (called a Pandas DataFrame), and then apply multiple functions to it.
Once your data is in a pandas DataFrame (call it df), you can do the following:
df_result = df.groupby('name')['value'].sum().reset_index()
Doing that, you group your data by name and calculate the sum of the values sharing each name.
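Since the question also asks for the occurrence count, a single groupby call with agg can produce both columns at once. A minimal sketch (it reads the sample data from an in-memory string; the real filename handling is up to you):

```python
import io
import pandas as pd

csv_text = "adam;10000\nbartek;1000\ntomasz;5000\nadam;1000\nbartek;3000\n"
df = pd.read_csv(io.StringIO(csv_text), sep=";", names=["name", "value"])

# One groupby produces both the sum and the number of occurrences per name.
summary = df.groupby("name")["value"].agg(["sum", "count"]).reset_index()

# Write it out in the requested name;sum;count format.
summary.to_csv("result.csv", sep=";", header=False, index=False)
```

This writes result.csv with rows like adam;11000;2, one per name.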

Output nested array with variable length to CSV

I've got a data sample that I would like to output in a CSV file. The data structure is a nested list of German terms (dicts) and the corresponding possible English translations (lists):
all_terms = [{'Motor': ['engine', 'motor']},
             {'Ziel': ['purpose', 'goal', 'aim', 'destination']}]
As you can see, one German term can have a variable number of English translations. I want to output each German term and each of its corresponding translations into separate columns of one row, so "Motor" goes in column 1, "engine" in column 2 and "motor" in column 3.
I just don't know how to loop correctly through the data.
So far, my code to output:
with open(filename, 'a') as csv_file:
    writer = csv.writer(csv_file)
    # The for loop
    for x in all_terms:
        for i in x:
            for num in i:
                writer.writerow([i, x[i][num]])
But this error is thrown:
writer.writerow([i, x[i][num]])
TypeError: list indices must be integers, not unicode
Any hint appreciated, and maybe there's even a smarter way than 3 nested for loops.
How about the following solution:
import csv

all_terms = [{'Motor': ['engine', 'motor']},
             {'Ziel': ['purpose', 'goal', 'aim', 'destination']}]

with open('test.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    for small_dict in all_terms:
        for key in small_dict:
            output = [key, *small_dict[key]]
            writer.writerow(output)
Output in test.csv:
Motor,engine,motor
Ziel,purpose,goal,aim,destination
I used the * operator to unpack all the items in the dictionary's value into the row for writerow to write. Together with the inner loop, this also handles the case where a dictionary inside all_terms has multiple entries.
Here's a way to do it:
import csv

all_terms = [{'Motor': ['engine', 'motor']},
             {'Ziel': ['purpose', 'goal', 'aim', 'destination']}]

filename = 'translations.csv'
with open(filename, 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for term in all_terms:
        word, translations = term.popitem()  # Note: popitem() removes the entry from the dict.
        row = [word] + translations
        writer.writerow(row)
CSV file's contents afterwards:
Motor,engine,motor
Ziel,purpose,goal,aim,destination

Simple script for importing csv, modifying columns and writing back in python

I am working on a project and have a large CSV document from Excel that I'd like to modify with Python so that I can correctly run it through a statistical analysis program. I am learning Python as my first language and know enough to be dangerous, but I'm getting flustered piecing together code from several sources. Right now I have the import down:
import csv

with open("file_path.csv", 'r') as file:
    reader = csv.reader(file)
    file_list = list(reader)
print(file_list)
What I need to do is go from this:
['person_id', 'person_name', 'sex', 'blood_pressure_type', 'blood_pressure_value'], ['1', 'Fred', 'M', 'systolic', '160'], ['1', 'Fred', 'M', 'diastolic', '80'], ['2', 'Linda', 'F', 'systolic', '155'], ['2', 'Linda', 'F', 'diastolic', '78']
to this:
['person_id', 'person_name', 'sex', 'blood_pressure_type', 'blood_pressure_value', 'blood_pressure_type', 'blood_pressure_value'], ['1', 'Fred', 'M', 'systolic', '160', 'diastolic', '80'], ['2', 'Linda', 'F', 'systolic', '155','diastolic', '78']
so that tuples aren't repeated for the same person, and both systolic and diastolic blood pressures are in a single tuple for each person while maintaining separate attribute columns.
After this, I'd like to write back to the CSV file with the new tuples overwriting the old, or simply create a new file if that's easier. Can anyone help with an import/modify/export for Python 3.x? I'm most appreciative of any help, even partial.
P.S. Like I said, I'm a newbie, and I've read that Python 2.7 has a lot to offer. If I should be using something other than 3.x for this, I'd love to hear opinions.
If the input file is sorted by person_id, then you can read the data rows in pairs and merge each pair into a single row:
rows = [['person_id', 'person_name', 'sex', 'blood_pressure_type',
         'blood_pressure_value', 'blood_pressure_type', 'blood_pressure_value']]

# file_list[0] is the original header, so pair up the data rows after it.
for first, second in zip(file_list[1::2], file_list[2::2]):
    row = first + second[3:]
    rows.append(row)
Then write rows into a new CSV file.
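For example, writing the merged rows out could look like this (a sketch; the output filename and sample rows are placeholders):

```python
import csv

# `rows` as built above: the new header followed by the merged data rows.
rows = [
    ['person_id', 'person_name', 'sex', 'blood_pressure_type',
     'blood_pressure_value', 'blood_pressure_type', 'blood_pressure_value'],
    ['1', 'Fred', 'M', 'systolic', '160', 'diastolic', '80'],
    ['2', 'Linda', 'F', 'systolic', '155', 'diastolic', '78'],
]

# newline='' avoids blank lines between rows on Windows.
with open('merged.csv', 'w', newline='') as out_file:
    csv.writer(out_file).writerows(rows)
```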
Update: you can use a similar approach to process pairs that belong together but aren't adjacent. Just split the data rows into two lists by the value of 'blood_pressure_type', then zip() and merge them:
systolic = []
diastolic = []
for row in file_list[1:]:  # Skip the header row.
    if row[-2] == 'systolic':
        systolic.append(row)
    else:
        diastolic.append(row)

for first, second in zip(systolic, diastolic):
    row = first + second[3:]
    rows.append(row)
I like to use DictWriter() and DictReader() for things like this.
DictReader() operates like a regular reader but maps each row it reads into a dict, so the fieldnames are the keys of the dictionary, e.g.:
for row in reader:
    data = row['fieldname']
I think this is easily achievable by using a dictionary to accumulate the information to write:
import csv
import os
from collections import defaultdict

# Create a dictionary to store each person's information in.
info_dict = defaultdict(list)

file_data = csv.DictReader(open(os.path.join(pathname, filename)).readlines())
for info in file_data:
    # The first time a person_id is seen, store the full record;
    # afterwards just add the new blood_pressure_type and blood_pressure_value.
    if info['person_id'] not in info_dict:
        info_dict[info['person_id']].extend(
            [info['person_name'], info['sex'],
             info['blood_pressure_type'], info['blood_pressure_value']])
    else:
        info_dict[info['person_id']].extend(
            [info['blood_pressure_type'], info['blood_pressure_value']])

# Now you have a dictionary that looks like:
# {person_id: [person_name, sex, blood_pressure_type, blood_pressure_value,
#              blood_pressure_type, blood_pressure_value]}

with open("file_write_path.csv", 'w', newline='') as file_write:
    # Declare the fieldnames for the top row of the csv
    # (they don't have to be blood_pressure_...1/2).
    fieldnames = ['person_id', 'person_name', 'sex',
                  'blood_pressure_type1', 'blood_pressure_value1',
                  'blood_pressure_type2', 'blood_pressure_value2']
    # Make a writer and pass the headers as the fieldnames for the csv writer.
    writer = csv.DictWriter(file_write, fieldnames=fieldnames)
    # Write the headers.
    writer.writeheader()
    # Each value in info_dict is the list of fields for that person_id,
    # so pair it up with the fieldnames to write each row.
    for person_id, info in info_dict.items():
        writer.writerow(dict(zip(fieldnames, [person_id] + info)))
If this doesn't make sense, I can clarify any of it!
Update: I fixed the opening for DictReader.
You could alternatively do the same thing with with open(filename, 'r') as file: and then file_data = csv.DictReader(file.readlines()).

Storing list in to csv file using python

I'm trying to store a list-type variable in a CSV file using Python. Here is what I got after hours on Stack Overflow and the Python documentation:
Code:
row = {'SgDescription': 'some group', 'SgName': 'sku', 'SgGroupId': u'sg-abcdefgh'}

new_csv_file = open("new_file.csv", 'wb')
ruleswriter = csv.writer(new_csv_file, dialect='excel', delimiter=',')
ruleswriter.writerows(row)
new_csv_file.close()
Result:
$ more new_file.csv
S,g,D,e,s,c,r,i,p,t,i,o,n
S,g,N,a,m,e
S,g,G,r,o,u,p,I,d
Can anyone please advise how to store the values in the file like this:
some group,sku,sg-abcdefgh
Thanks a ton!
writerows() expects a sequence of sequences, for example a list of lists. You're passing in a dict, and a dict happens to be iterable: iterating it yields the dictionary's keys. Each key, a string, happens to be iterable as well. So what you get is one element of each iterable per cell, i.e. one character. You got exactly what you asked for :-)
What you want to do is write one row with the keys in it, and then maybe another with the values, e.g.:
import csv

row = {
    'SgDescription': 'some group',
    'SgName': 'sku',
    'SgGroupId': u'sg-abcdefgh'
}

with open("new_file.csv", 'wb') as f:
    ruleswriter = csv.writer(f)
    ruleswriter.writerows([row.keys(), row.values()])
If order is important, use collections.OrderedDict.
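A minimal sketch of that idea, written to an in-memory buffer instead of a file (note that in Python 3.7+ plain dicts preserve insertion order anyway):

```python
import csv
import io
from collections import OrderedDict

# Build the row in the column order you want preserved.
row = OrderedDict([('SgDescription', 'some group'),
                   ('SgName', 'sku'),
                   ('SgGroupId', 'sg-abcdefgh')])

buf = io.StringIO()
writer = csv.writer(buf)
# One row of keys, one row of values, in a stable order.
writer.writerows([list(row.keys()), list(row.values())])
```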
Extract your desired data before writing it to a csv file:
row = [row['SgDescription'], row['SgName'], row['SgGroupId']]  # ['some group', 'sku', u'sg-abcdefgh']

# write to a csv file
with open("new_file.csv", 'wb') as f:
    ruleswriter = csv.writer(f)
    ruleswriter.writerow(row)
PS: if you don't care about the order, just use row.values().
Or use csv.DictWriter,
import csv

row = {'SgDescription': 'some group', 'SgName': 'sku', 'SgGroupId': u'sg-abcdefgh'}

with open('new_file.csv', 'w', newline='') as csvfile:
    fieldnames = ['SgDescription', 'SgName', 'SgGroupId']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow(row)

Syntax - saving a dictionary as a csv file

I am trying to "clean up" some data - I'm creating a dictionary of the channels that I need to keep and then I've got an if block to create a second dictionary with the correct rounding.
Dictionary looks like this:
{'time, s': (imported array), 'x temp, C': (imported array),
 'x pressure, kPa': (different imported array), ...}
Each imported array is 1-d.
I was looking at this example, but I didn't quite get the way to parse it so that I ended up with what I want.
My desired output is a csv file (do not care if the delimiter is spaces or commas or whatever) with the first row being the keys and the subsequent rows simply being the values.
I feel like what I'm missing is how to use the map function properly.
Also, I'm wondering if I'm using DictWriter when I should be using DictReader.
This is what I originally tried:
with open(filename, 'wb') as outfile:
    write = csv.DictWriter(outfile, Fieldname_order)
    write.writer.writerow(Fieldname_order)
    write.writerows(data)
DictWriter's API doesn't match the data structure you have: DictWriter requires a list of dictionaries, but you have a dictionary of lists.
You can use the ordinary csv.writer:
my_data = {'time, s': [0, 1, 2, 3], 'x temp, C': [0, 10, 20, 30],
           'x pressure, kPa': [0, 100, 200, 300]}

import csv

with open('outfile.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(my_data.keys())
    writer.writerows(zip(*my_data.values()))
That will write the columns in arbitrary order, which may change from run to run. One way to make the order consistent is to replace the last two lines with:
writer.writerow(sorted(my_data.keys()))
writer.writerows(zip(*(my_data[k] for k in sorted(my_data.keys()))))
Edit: in this example data is a list of dictionaries. Each row in the csv contains one value for each key.
To write your dictionary with a header row and then data rows:
with open(filename, 'wb') as outfile:
    writer = csv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    writer.writerows(data)
To read in data as a dictionary then you do need to use DictReader:
with open(filename, 'r') as infile:
    reader = csv.DictReader(infile)
    data = [row for row in reader]
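If you then want the original dict-of-lists shape back, one way (a minimal sketch with made-up sample rows) is a comprehension over the fieldnames:

```python
# Rows as csv.DictReader would produce them (values come back as strings).
rows = [{'time, s': '0', 'x temp, C': '10'},
        {'time, s': '1', 'x temp, C': '20'}]

# Invert the list of row dicts into a dict of column lists.
columns = {key: [row[key] for row in rows] for key in rows[0]}
# columns == {'time, s': ['0', '1'], 'x temp, C': ['10', '20']}
```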
