Joining multiple JSON files using Python

I am new to JSON and have never used Python packages to manipulate JSON files. I have 10 JSON files that I would like to merge into one using Python.
Each of the 10 files has exactly the same structure and about 50,000 entries.
Example:
File one
{"tracking_code":"21703238","from_country":"FR","to_country":"FR","amount":3.23}
...
Example: File two
{"tracking_code":"41545695","from_country":"FR","to_country":"FR","amount":2.9}
...
Desired output would simply be:
{"tracking_code":"21703238","from_country":"FR","to_country":"FR","amount":3.23}
{"tracking_code":"41545695","from_country":"FR","to_country":"FR","amount":2.9}
The second part of my question is this: how would I join JSON files based on one key? I would like to join these two files by "tracking_code"; the output would simply add "amount":3.23 to the matching record of the first file.
Example: File one:
{"tracking_code":"29285908","from_country":"FR","to_country":"FR",
"package_type_id":10,"transaction_id":172238850,
"shipping_label_created":"2018-09-25 18:40:52"}
Example: File two
{"tracking_code":"29285908","from_country":"FR","to_country":"FR","amount":3.23}
Desired output:
{"tracking_code":"29285908","from_country":"FR","to_country":"FR",
"package_type_id":10,"transaction_id":172238850,
"shipping_label_created":"2018-09-25 18:40:52","amount":3.23}
Thank you.

If you use json.loads() (which converts JSON to a Python dictionary), you can merge two records with a small helper function:
def dict_merge(dict1, dict2):
    merged = dict(dict1)   # copy so neither input is mutated
    merged.update(dict2)   # note: dict.update() returns None, so don't return it directly
    return merged
and then use json.dumps() to serialize the resulting dictionary back to JSON.
Another solution: you can also use json-merger (installation via pip install json-merger).
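For the concrete case in the question, here is a minimal sketch, assuming each file stores one JSON object per line (JSON Lines) and using the hypothetical names file1.json and file2.json:
import json

# Part 1: concatenate several JSON Lines files into one.
files = ["file1.json", "file2.json"]  # hypothetical names; list all 10 files here
with open("merged.json", "w") as out:
    for name in files:
        with open(name) as f:
            for line in f:
                out.write(line.rstrip("\n") + "\n")

# Part 2: join two files on "tracking_code".
amounts = {}
with open("file2.json") as f:
    for line in f:
        record = json.loads(line)
        amounts[record["tracking_code"]] = record["amount"]

with open("joined.json", "w") as out, open("file1.json") as f:
    for line in f:
        record = json.loads(line)
        if record["tracking_code"] in amounts:
            record["amount"] = amounts[record["tracking_code"]]
        out.write(json.dumps(record) + "\n")
Building the lookup dictionary first keeps the join linear in the number of records, instead of rescanning the second file for every line of the first.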

How to read JSON files in a directory separately with a for loop and perform a calculation

Update: Sorry, it seems my question wasn't asked properly. I am analyzing a transportation network consisting of more than 5,000 links; all the data is in one big CSV file. I have several JSON files, each consisting of a subset of this network. I am trying to loop through all the JSON files individually (i.e. not trying to concatenate them), read each JSON file, extract the corresponding information from the CSV file, perform a calculation, and save the result along with the name of the file in a new dataframe.
This is the code I wrote, but I am not sure it is efficient enough.
import os
import glob
import json
import pandas as pd

name = []
percent_of_truck = []
path_to_json = "\\directory"  # directory containing the JSON files

z = glob.glob(os.path.join(path_to_json, '*.json'))
for i in z:
    with open(i, 'r') as myfile:
        l = json.load(myfile)
    name.append(i)
    d_2019 = final.loc[final['LINK_ID'].isin(l)]  # retrieve rows from the main CSV dataframe ('final')
    avg_m = (d_2019['AADTT16'] / d_2019['AADT16'] * d_2019['Length']).sum() / d_2019['Length'].sum()  # length-weighted calculation
    percent_of_truck.append(avg_m)

f = pd.DataFrame()
f['Name'] = name
f['% of truck'] = percent_of_truck
I'm assuming here you just want a dictionary of all the JSON. If so, use the json library (import json); this code may be of use:
import json

def importSomeJSONFile(f):
    # open and parse the file; the context manager closes the handle afterwards
    with open(f) as fh:
        return json.load(fh)

# make sure the file exists in the same directory
example = importSomeJSONFile("example.json")
print(example)

# access a value within it, replacing "key" with the key you want, like "name"
print(example["key"])
Since you haven't added any schema or other specific requirements, you can follow this approach to solve your problem in any language you prefer (a minimal Python sketch follows the steps below):
Get the directory of the JSON files which need to be read
Get the list of all files present in that directory
For each file name returned in step 2:
Read the file
Parse the JSON from the string
Perform the required calculation
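A minimal sketch of these steps in Python, with a hypothetical directory name and a placeholder calculation:
import glob
import json
import os

directory = "json_files"  # step 1: hypothetical directory of JSON files
results = {}
for path in glob.glob(os.path.join(directory, "*.json")):  # step 2: list the files
    with open(path) as f:  # step 3: read the file
        data = json.load(f)  # parse the JSON from the string
    results[os.path.basename(path)] = len(data)  # placeholder for the required calculation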

Merge CSV columns with irregular timestamps and different header names per file

I have long CSV files with different headers in every file.
The first column is always a timestamp, but the timings are irregular, so timestamps rarely match across files.
file1.csv
time,L_pitch,L_roll,L_yaw
2020-08-21T09:58:07.570,-0.0,-6.1,0.0
2020-08-21T09:58:07.581,-0.0,-6.1,0.0
2020-08-21T09:58:07.591,-0.0,-6.1,0.0
....
file2.csv
time,R_pitch,R_roll,R_yaw
2020-08-21T09:58:07.591,1.3,-5.7,360.0
2020-08-21T09:58:07.607,1.3,-5.7,360.0
2020-08-21T09:58:07.617,1.3,-5.7,360.0
....
file3.csv
time,L_accel_lat,L_accel_long,L_accel_vert
2020-08-21T09:58:07.420,-0.00,-0.00,0.03
2020-08-21T09:58:07.430,-0.00,0.00,0.03
2020-08-21T09:58:07.440,-0.00,0.00,0.03
....
At the moment there can be up to 6 CSV files in that format in a folder.
I would like to merge these CSVs into one file where all columns are recognized and sorted according to the timestamps. When timestamps match, the data is merged into the corresponding line; when they don't, the row gets its own line with empty fields.
The result should look like this.
time,L_pitch,L_roll,L_yaw,R_pitch,R_roll,R_yaw,L_accel_lat,L_accel_long,L_accel_vert
2020-08-21T09:58:07.420,,,,,,,-0.00,-0.00,0.03
2020-08-21T09:58:07.430,,,,,,,-0.00,0.00,0.03
2020-08-21T09:58:07.440,,,,,,,-0.00,0.00,0.03
....
2020-08-21T09:58:07.581,-0.0,-6.1,0.0,,,,,,
2020-08-21T09:58:07.591,-0.0,-6.1,0.0,1.3,-5.7,360.0,,,
The last line is an example of a matching timestamp, where the data from two files is merged into one line.
So far I tried this GitHub link, but it merges the file names into the CSV and does no sorting.
pandas in Python seems to be up to the task, but my skills are not. I also tried some Python files from GitHub...
This one seemed the most promising after changing the user-specific parts, but it runs without ever finishing (files too big?).
Is it possible to do this in a PowerShell ps1 or a somewhat (for me) "easy" Python script?
I would build this into a batch file to work in several folders.
Thanks in advance
goam
As you mentioned, you can solve your problem rather conveniently using pandas.
import pandas as pd
import glob

tmp = []
for f in glob.glob("file*"):
    print(f)
    tmp.append(pd.read_csv(f, index_col=0, parse_dates=True))

pd.concat(tmp, axis=1, sort=True).to_csv('merged.csv')
Some explanation:
Here, we use glob to get the list of files matching the wildcard pattern file*. We loop over this list and read each file using pandas read_csv. Note that we parse the dates of the file (converting the time column to dtype datetime64[ns]) and use that column as the index of the dataframe. We store the dataframes in a list called tmp. Finally, we concatenate the individual dataframes in tmp using concat and immediately write the result to a file called merged.csv using pandas to_csv. Since concat with axis=1 performs an outer join on the index by default, timestamps present in only one file get NaN in the other columns, which show up as the empty fields in the desired output.

How to update data in a CSV file by a specific field using Robot Framework

Now I use this keyword:
Append Data
    ${list}=    Create List    Test1    Test2
    ${data}=    Create List    ${list}
    Append To Csv File    ${File_Path}    ${list}
but it cannot specify the position of the data that I want to update. In my test cases I have to update the data every time a case finishes, so the next case can use the new data. (I keep the test data in a CSV file.)
Looks like you are already making use of CSVLibrary.
In this library you have only the following keywords. What we can notice from here is that there is no keyword to replace a CSV line or file; hence, we need to come up with our own procedure.
Append To Csv File
Csv File From Associative
Empty Csv File
Read Csv File To Associative
Read Csv File To List
APPROACH#1
In my test case I have to update new data everytimes after finished
case to use new data in next case.
One of the ways which can be employed to solve your problem is to convert all of the CSV file data into a list of dicts (a plain-Python sketch of the procedure follows the steps below).
Read the CSV into a list of dicts using Read Csv File To Associative
Make a copy of the original list of dicts, just in case you would like to go back in time for a quick referral
Start of Testcase#1
Make the modification to the list of dicts
End of Testcase#1
Start of Testcase#2
Make and use the modified content of the list of dicts from Testcase#1
End of Testcase#2
And so on for the rest of the test cases.
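In plain Python terms, the procedure could look roughly like this; the file name and column name are hypothetical:
import csv
import copy

# Read the CSV into a list of dicts, as Read Csv File To Associative would.
with open("testdata.csv", newline="") as f:  # hypothetical file name
    rows = list(csv.DictReader(f))

original = copy.deepcopy(rows)  # copy kept for quick referral

# A test case modifies the data in place...
rows[0]["Column1"] = "new value"  # hypothetical column name

# ...and writes it back so the next test case sees the new data.
with open("testdata.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)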
Here there is no need to use the CSV library.
If we always want to create a new CSV file with new data, we can use the Create File keyword from the OperatingSystem library:
Create File    filename.csv    content=content_added_in_csvFile
e.g. Create File    ${CURDIR}/Demo.csv    content=675432561
If we want to add multiple data items to the CSV:
Create File    ${CURDIR}/Demo.csv    content=68868686,85757464,5757474
When we run this code, the old file will be replaced by a new file with the provided content.
Hope this resolves the issue.

How can I compare the differences between two files (Python dictionary format)?

I use a Python script to output the contents of a dictionary to a file. The first file is named 2017-12-29.txt, and its content is:
{u'AP1': [u'i-001', u'i-002'], u'AP2': [u'i-003', u'i-004'], u'AP3': [u'i-005', u'i-006'], u'AP4': [u'i-007', u'i-008'], u'AP5': [u'i-009', u'i-010']}
The other file is named 2017-12-30.txt, and its content is:
{u'AP1': [u'i-001'], u'AP2': [u'i-003', u'i-004'], u'AP3': [u'i-005'], u'AP4': [u'i-007', u'i-008'], u'AP5': [u'i-009', u'i-010', u'i-011']}
How can I compare the differences between the two files, then export the difference to another file in dictionary format or some other format?
Generate the output files with pprint, run /usr/bin/diff -u or difflib against them, and you're done.
Oh, and do code this in Python 3. You'll be glad you did.
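A minimal sketch of the difflib route, hard-coding two abbreviated dictionaries from the question (in practice they would be loaded from the .txt files):
import difflib
import pprint

# The two dictionaries (abbreviated), normally read from the files.
d1 = {u'AP1': [u'i-001', u'i-002'], u'AP3': [u'i-005', u'i-006']}
d2 = {u'AP1': [u'i-001'], u'AP3': [u'i-005']}

# Pretty-print each dict with one entry per line so the diff is line-oriented.
a = pprint.pformat(d1, width=1).splitlines()
b = pprint.pformat(d2, width=1).splitlines()

diff = difflib.unified_diff(a, b, "2017-12-29.txt", "2017-12-30.txt", lineterm="")
with open("diff.txt", "w") as out:
    out.write("\n".join(diff))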

Writing a list of lists in float format from Python to a text file

I have a list of lists which I want to write to a text file. Then I have to read that text file, edit the list of lists, and write it back to the text file again.
Right now I can only write it in string format, so when I read it back I cannot make the required changes because everything is a string.
For my case, list of lists looks like
A = [[0,[0.0,1010.0,10.0],[10.0,1110.0,10.0]],[0,[10.0,1011.0,15.0],[15.0,1111.0,19.0]]]
I didn't find any solution for this problem. Any help is highly appreciated.
Note: when I read A back and tried converting it to float, it attempted to convert '[' into a float, so it didn't work.
You should dump it into the file as-is rather than as a plain string; there are many ways of doing this, one of which is to use json.
import json

A = [[0, [0.0, 1010.0, 10.0], [10.0, 1110.0, 10.0]],
     [0, [10.0, 1011.0, 15.0], [15.0, 1111.0, 19.0]]]

with open("file.txt", "w") as f:  # your file path
    json.dump(A, f)
To load it back you can do:
with open("file.txt") as f:
    A = json.load(f)
Another way is to dump the list as a string and retrieve it using ast.literal_eval, as suggested by Jean-François.
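A minimal sketch of that alternative, under the same file-name assumption:
import ast

A = [[0, [0.0, 1010.0, 10.0], [10.0, 1110.0, 10.0]],
     [0, [10.0, 1011.0, 15.0], [15.0, 1111.0, 19.0]]]

with open("file.txt", "w") as f:
    f.write(repr(A))  # write the list's literal representation

with open("file.txt") as f:
    B = ast.literal_eval(f.read())  # parses back into a real list of lists, floats intact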
