I'm trying to convert a radar data file, which was sent to me in JSON format, into a manageable DataFrame.
The first three lines of the file look like this:
{"id":1,"length":43,"crc":"D81B2DB5","timestamp":1617,"hexdata":"30002EFFD7021483000069E03BF78BE702A001E0FE2104B51D21020234269604D174E75DA008A50312B0620620B6","CAT048":{"I010":{"SAC":20,"SIC":131},"I140":{"ToD":0.8203125},"I020":{"TYP":7,"SIM":0,"RDP":0,"SPI":0,"RAB":0,"FX":0},"I040":{"RHO":59.9648438,"THETA":196.7376709},"I070":{"V":0,"G":0,"L":0,"spare":0,"Mode3A":"1240"},"I090":{"V":0,"G":0,"FL":120},"I130":{"SRLP":{"SRL":1.4501953},"SRRP":{"SRR":4},"SAMP":{"SAM":-75},"PRLP":{"PRL":1.2744141},"PAMP":{"PAM":33},"RPDP":{"RPD":0.0078125},"APDP":{"APD":0.0439453}},"I220":{"ACAddr":"342696"},"I240":{"TId":"AME4956 "},"I161":{"Tn":2213},"I200":{"CGS":172.92,"CHdg":248.0383301},"I170":{"CNF":0,"RAD":0,"DOU":0,"MAH":0,"CDM":3,"FX":0},"I230":{"COM":1,"STAT":0,"SI":0,"spare":0,"ModeSSSC":1,"ARC":0,"AIC":1,"BDS16":1,"BDS37":6}},"lat":38.585666818124,"lon":2.3784905351223,"h":3658.0244306503}
{"id":1,"length":40,"crc":"065756DA","timestamp":2468,"hexdata":"30002BFBB70214830000D2A000C8C0510A38E01804EA34239701803000000000004008BE00369EAE4624A0","CAT048":{"I010":{"SAC":20,"SIC":131},"I140":{"ToD":1.640625},"I020":{"TYP":5,"SIM":0,"RDP":0,"SPI":0,"RAB":0,"FX":0},"I040":{"RHO":0.78125,"THETA":270.4449463},"I070":{"V":0,"G":0,"L":0,"spare":0,"Mode3A":"5070"},"I130":{"SRLP":{"SRL":1.0546875},"SRRP":{"SRR":4},"SAMP":{"SAM":-22}},"I220":{"ACAddr":"342397"},"I250":[{"MCP_ALT_STATUS":1,"MCP_ALT":96,"FMS_ALT_STATUS":0,"FMS_ALT":0,"BP_STATUS":0,"BP":0,"res":0,"MODE_STATUS":0,"VNAV":0,"ALT_HOLD":0,"APP":0,"TARGET_ALT_STATUS":0,"TARGET_ALT_SOURCE":0,"BDS":"40"}],"I161":{"Tn":2238},"I200":{"CGS":11.88,"CHdg":223.1433105},"I170":{"CNF":0,"RAD":2,"DOU":0,"MAH":0,"CDM":3,"FX":0},"I230":{"COM":1,"STAT":1,"SI":0,"spare":0,"ModeSSSC":1,"ARC":0,"AIC":1,"BDS16":0,"BDS37":0}},"lat":39.543535327942,"lon":2.7284206653891,"h":4.2666605189443}
{"id":2,"length":64,"crc":"A45FA0D0","timestamp":2468,"hexdata":"300043FFF7021483000115A0896BE1B70AC105C8E01403BC4BB184508672CB482003C8480030A4018040FFD3C13A7FFCEC509E1A1F342037FF6008C1081E3CF54620F5","CAT048":{"I010":{"SAC":20,"SIC":131},"I140":{"ToD":2.1640625},"I020":{"TYP":5,"SIM":0,"RDP":0,"SPI":0,"RAB":0,"FX":0},"I040":{"RHO":137.4179688,"THETA":317.411499},"I070":{"V":0,"G":0,"L":0,"spare":0,"Mode3A":"5301"},"I090":{"V":0,"G":0,"FL":370},"I130":{"SRLP":{"SRL":0.8789062},"SRRP":{"SRR":3},"SAMP":{"SAM":-68}},"I220":{"ACAddr":"4BB184"},"I240":{"TId":"THY224 "},"I250":[{"MCP_ALT_STATUS":1,"MCP_ALT":37008,"FMS_ALT_STATUS":0,"FMS_ALT":0,"BP_STATUS":1,"BP":213,"res":0,"MODE_STATUS":1,"VNAV":1,"ALT_HOLD":0,"APP":0,"TARGET_ALT_STATUS":0,"TARGET_ALT_SOURCE":0,"BDS":"40"},{"RA_STATUS":1,"RA":-0.3515625,"TTA_STATUS":1,"TTA":84.375,"GS_STATUS":1,"GS":466,"TAR_STATUS":1,"TAR":-0.03125,"TAS_STATUS":1,"TAS":472,"BDS":"50"},{"HDG_STATUS":1,"HDG":84.5507812,"IAS_STAT":1,"IAS":271,"MACH_STATUS":1,"MACH":0.832,"BAR_STATU
I can see these lines contain the info I need, like callsign ("TId": "AME4956 "), heading and so on.
Is there a nice Pythonic way to get these values into a DataFrame?
Each line is a separate, valid JSON object (the JSON Lines format), except that the final line seems to be truncated.
Pandas can import dictionaries with almost no pain:
import json
import pandas as pd
infos = []
with open(infofile) as fid:  # infofile: path to the radar JSON file
    for ln in fid:
        infos.append(json.loads(ln))  # one JSON object per line
df = pd.DataFrame(infos)
print(df)
prints:
id length crc timestamp hexdata CAT048 lat lon h
0 1 43 D81B2DB5 1617 30002EFFD7021483000069E03BF78BE702A001E0FE2104... {'I010': {'SAC': 20, 'SIC': 131}, 'I140': {'To... 38.585667 2.378491 3658.024431
1 1 40 065756DA 2468 30002BFBB70214830000D2A000C8C0510A38E01804EA34... {'I010': {'SAC': 20, 'SIC': 131}, 'I140': {'To... 39.543535 2.728421 4.266661
2 2 64 A45FA0D0 2468 300043FFF7021483000115A0896BE1B70AC105C8E01403... {'I010': {'SAC': 20, 'SIC': 131}, 'I140': {'To... NaN NaN NaN
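The nested CAT048 column can also be flattened without a manual loop. A sketch, assuming pandas 1.0 or later (where `pd.json_normalize` is a top-level function): it turns nested dicts into dotted column names such as `CAT048.I240.TId`, and keys missing from a record become NaN. The helper names `load_radar_lines` and `flatten_cat048` are illustrative, the sample records are trimmed-down copies of the lines from the question, and skipping lines that fail to parse is a choice made here to cope with the truncated last line. List-valued items such as I250 are left as Python lists.

```python
import json

import pandas as pd


def load_radar_lines(lines):
    """Parse JSON Lines records, skipping lines that are not valid JSON."""
    records = []
    for ln in lines:
        try:
            records.append(json.loads(ln))
        except json.JSONDecodeError:
            continue  # assumption: silently drop broken (e.g. truncated) lines
    return records


def flatten_cat048(records):
    # json_normalize: {"CAT048": {"I240": {"TId": ...}}} -> column "CAT048.I240.TId"
    flat = pd.json_normalize(records)
    wanted = {
        "CAT048.I240.TId": "callsign",
        "CAT048.I200.CHdg": "heading",
        "CAT048.I200.CGS": "ground_speed",
        "CAT048.I090.FL": "flight_level",
    }
    # reindex tolerates columns that never occur in the data (they become all-NaN)
    return flat.reindex(columns=list(wanted)).rename(columns=wanted)


sample = [
    '{"id":1,"CAT048":{"I200":{"CGS":172.92,"CHdg":248.0383301},'
    '"I090":{"V":0,"G":0,"FL":120},"I240":{"TId":"AME4956 "}}}',
    '{"id":1,"CAT048":{"I200":{"CGS":11.88,"CHdg":223.1433105}}}',
    '{"id":2,"CAT048":{"I200":{"CGS":',  # truncated, like the last line of the file
]
df = flatten_cat048(load_radar_lines(sample))
print(df)
```

With the real file, `load_radar_lines(open(infofile))` would take the place of the sample list.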
It worked with the following code:
import json
import pandas as pd
infos = []
with open('C:/Users/jobbr/Downloads/211201-est-000001/211201-est-000001.json') as fid:
    for ln in fid:
        infos.append(json.loads(ln))
df1 = pd.DataFrame(infos)
#print(df1)
# unravel this dataframe
c = df1['CAT048'].to_dict() # The CAT048 field holds important parameters
for i in range(0,len(df1)):
    #print(i, c[i]['I200']['CGS'], c[i]['I200']['CHdg'])
    df1.at[i, 'ground_speed'] = c[i]['I200']['CGS']
    df1.at[i, 'heading'] = c[i]['I200']['CHdg']
    df1.at[i, 'ACAddr'] = c[i]['I220']['ACAddr']
    df1.at[i, 'ToD'] = c[i]['I140']['ToD']
    try: # not all parameters seem to be present always
        df1.at[i, 'flight_level'] = c[i]['I090']['FL']
    except KeyError:  # I090 missing in this record
        df1.at[i, 'flight_level'] = 0
    try:
        df1.at[i, 'callsign'] = c[i]['I240']['TId']
    except KeyError:  # I240 missing in this record
        df1.at[i, 'callsign'] = 'Unknown'
Thanks for the help!
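The same columns, with the same defaults for missing items (0 and 'Unknown'), can be produced without try/except and per-cell `.at` writes by mapping a small lookup function over the CAT048 column. A sketch: `field` and `unravel` are hypothetical helper names, and the two-row `df1` below is made-up test data shaped like the CAT048 dicts above.

```python
import pandas as pd


def field(path, default=None):
    """Return a function that walks nested dicts along `path`,
    returning `default` as soon as a key is missing."""
    def getter(record):
        value = record
        for key in path:
            if not isinstance(value, dict) or key not in value:
                return default
            value = value[key]
        return value
    return getter


def unravel(df1):
    cat = df1["CAT048"]  # one nested dict per row
    out = df1.copy()
    out["ground_speed"] = cat.map(field(["I200", "CGS"]))
    out["heading"] = cat.map(field(["I200", "CHdg"]))
    out["ACAddr"] = cat.map(field(["I220", "ACAddr"]))
    out["ToD"] = cat.map(field(["I140", "ToD"]))
    # same fallbacks as the loop above
    out["flight_level"] = cat.map(field(["I090", "FL"], default=0))
    out["callsign"] = cat.map(field(["I240", "TId"], default="Unknown"))
    return out


df1 = pd.DataFrame({"CAT048": [
    {"I200": {"CGS": 172.92, "CHdg": 248.0}, "I220": {"ACAddr": "342696"},
     "I140": {"ToD": 0.82}, "I090": {"FL": 120}, "I240": {"TId": "AME4956 "}},
    {"I200": {"CGS": 11.88, "CHdg": 223.1}, "I220": {"ACAddr": "342397"},
     "I140": {"ToD": 1.64}},
]})
result = unravel(df1)
print(result[["callsign", "flight_level", "ground_speed"]])
```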
In an Excel file I have two large tables. Table A ("Dissection", 409 rows x 25 columns) contains unique entries, each identified by a unique ID. Table B ("Damage", 234 rows x 39 columns) uses the ID of Table A in its first cell and extends it. To analyze the data in Minitab, all data must be in a single long row, meaning the values of "Damage" have to follow those of "Dissection". The whole thing looks like this:
Table A - i.e. Dissection
- ID1 [valueTabA] [valueTabA]
- ID2 [valueTabA] [valueTabA]
- ID3 [valueTabA] [valueTabA]
- ID4 [valueTabA] [valueTabA]
Table B - i.e. Damage
- ID1 [valueTabB1] [valueTabB1]
- ID1 [valueTabB2] [valueTabB2]
- ID4 [valueTabB] [valueTabB]
They are supposed to be combined like this:
Table A
- ID1 [valueTabA] [valueTabA] [valueTabB1] [valueTabB1] [valueTabB2] [valueTabB2]
- ID2 [valueTabA] [valueTabA]
- ID3 [valueTabA] [valueTabA]
- ID4 [valueTabA] [valueTabA] [valueTabB] [valueTabB]
What is the best way to do that?
The following describes my two approaches. Both use the same data in the same tables, but stored in two different files so that I can test both scenarios.
The first approach uses a file where both tables are in the same worksheet; the second uses a file where the tables are in different worksheets.
Scenario 1: both tables are in the same worksheet, and I'm trying to move the row as a range
current_row = 415 # start without headers of table A
current_line = 2 # start without headers of table B
count = 0
for row in ws.iter_rows(min_row=415, max_row=647):
    # loop through damage
    id_A = ws.cell(row=current_row, column=1).value
    max_col = 25
    for line in ws.iter_rows(min_row=2, max_row=409):
        # loop through dissection
        id_B = ws.cell(row=current_line, column=1).value
        if id_A == id_B:
            copy_range = ((ws.cell(row=current_line, column=2)).column_letter + str(current_line) + ":" +
                          (ws.cell(row=current_line, column=39)).column_letter + str(current_line))
            ws.move_range(copy_range, rows=current_row, cols=max_col+1)
            print("copied range: " + copy_range +" to: " + str(current_row) + ":"+str(max_col+1))
            count += 1
            break
        if current_line > 409:
            current_line = 2
        else:
            current_line += 1
    current_row += 1
-> Here I'm struggling to append the range to the right row of Table A, without overwriting the previous row (see example ID1 above)
Scenario 2: both tables are located in separate sheets
dissection = wb["Dissection"]
damage = wb["Damage"]
recovery = wb["Recovery"]
current_row, current_line = 2, 2
for row in damage.iter_rows():
    # loop through first table
    id_A = damage.cell(row=current_row, column=1).value
    for line in dissection.iter_rows():
        # loop through second table
        id_B = dissection.cell(row=current_line, column=1).value
        copyData = []
        if id_A == id_B:
            for col in range(2, 39):
                # add data to the list, skipping the ID
                copyData.append(damage.cell(row=current_line, column=col).value)
            # print(copyData) for debugging purposes
            for item in copyData:
                column_count = dissection.max_column
                dissection.cell(row=current_row, column=column_count).value = item
                column_count += 1
            current_row += 1
            break
        if not current_line > 409:
            # prevent looping out of range
            current_line += 1
        else:
            current_line = 2
-> Same problem as in 1.: at some point it stops adding the damage values to copyData and adds None instead, and in the end it doesn't paste the items at all (the cells stay blank).
I've tried everything Excel-related that I could find, but unfortunately nothing worked. Would pandas be more useful here, or am I just not seeing something?
Thanks for taking the time to read this :)
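One detail behind the first scenario: openpyxl's `Worksheet.move_range` interprets `rows` and `cols` as offsets relative to the source range, not as a target position, and it moves (cuts) the cells rather than copying them. So `rows=current_row, cols=max_col+1` shifts the range by that many rows and columns. The sketch below converts an absolute target into the offsets `move_range` expects; `move_row_to` is a hypothetical helper, and the cell values are made-up test data.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def move_row_to(ws, src_row, first_col, last_col, dst_row, dst_col):
    """Hypothetical helper: move columns first_col..last_col of src_row so the
    block starts at (dst_row, dst_col). move_range only takes offsets, so the
    absolute target is turned into a row/column shift."""
    src = f"{get_column_letter(first_col)}{src_row}:{get_column_letter(last_col)}{src_row}"
    ws.move_range(src, rows=dst_row - src_row, cols=dst_col - first_col)


wb = Workbook()
ws = wb.active
ws["A2"] = "ID1"  # Dissection row
ws["A5"] = "ID1"  # matching Damage row further down
ws["B5"] = "x"
ws["C5"] = "y"
# Put the Damage values right after the 25 Dissection columns, i.e. from column 26 (Z)
move_row_to(ws, src_row=5, first_col=2, last_col=3, dst_row=2, dst_col=26)
print(ws["Z2"].value, ws["AA2"].value, ws["B5"].value)
```

For a second Damage row with the same ID, `dst_col` would have to be the next free column of that row rather than a fixed 26, otherwise the first block is overwritten.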
I highly recommend using pandas for situations like this. It is still a bit unclear how your data is formatted in the Excel file, but given your second option I assume that the tables are on different sheets of the Excel file. I also assume that the first row of each sheet contains the table title (e.g. "Table A - i.e. Dissection"). If this is not the case, just remove skiprows=1:
import pandas as pd
df = pd.concat(pd.read_excel("filename.xlsx", sheet_name=None, skiprows=1, header=None), axis=1, ignore_index=True)
df.to_excel('combined_data.xlsx')  # save to Excel
read_excel loads the Excel file into pandas DataFrames. sheet_name=None tells it to load all sheets, which are returned as a dict of DataFrames keyed by sheet name. pd.concat then concatenates these DataFrames into a single DataFrame; axis=1 places them side by side, aligned by row position rather than by ID. You can explore the data with df.head(), or save the DataFrame to Excel with df.to_excel.
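Because `pd.concat(..., axis=1)` lines rows up by position, Damage rows only end up next to the right Dissection rows if both sheets happen to be in the same order. A sketch that matches on the ID instead: each Damage row is numbered within its ID with `groupby(...).cumcount()`, `unstack` spreads repeated IDs (like ID1) into extra columns, and a left `merge` keeps every Dissection row. The column name "ID" and the function name `widen_by_id` are assumptions, not taken from the workbook.

```python
import pandas as pd


def widen_by_id(dissection, damage, id_col="ID"):
    """Append every Damage row to the Dissection row with the same ID.

    Repeated IDs are laid out side by side as <col>_1, <col>_2, ...;
    IDs without damage get NaN in the new columns."""
    damage = damage.copy()
    damage["n"] = damage.groupby(id_col).cumcount()  # 0 for the first row of an ID, 1 for the second...
    wide = damage.set_index([id_col, "n"]).unstack("n")
    # order as (all fields of match 1), (all fields of match 2), ...; sorted() is stable
    wide = wide[sorted(wide.columns, key=lambda c: c[1])]
    wide.columns = [f"{col}_{n + 1}" for col, n in wide.columns]
    return dissection.merge(wide.reset_index(), on=id_col, how="left")


# Usage with the workbook (file name and the "ID" header are assumptions):
# sheets = pd.read_excel("filename.xlsx", sheet_name=["Dissection", "Damage"])
# widen_by_id(sheets["Dissection"], sheets["Damage"]).to_excel("combined_data.xlsx", index=False)

dissection = pd.DataFrame({"ID": ["ID1", "ID2", "ID3", "ID4"], "a": [1, 2, 3, 4]})
damage = pd.DataFrame({"ID": ["ID1", "ID1", "ID4"], "d": ["B1", "B2", "B"]})
out = widen_by_id(dissection, damage)
print(out)
```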
I ended up using scenario 2 (one file, two worksheets), but this code should be adaptable to scenario 1 (one file, one worksheet) as well.
I copied the rows of Table B using code taken from here.
And handled the offset with code from here.
Also, I added a few extras to my solution to make it more generic:
import os
import openpyxl
from openpyxl.utils import range_boundaries
# Introduction
print("Welcome!\n[!] Advice: Always have a backup of the file you want to sort.\n[+] Please put the file to be sorted in the same directory as this program.")
print("[+] This program assumes that the value to be sorted by is located in the first column of the outgoing table.")
# File listing
# openpyxl only reads the Office Open XML formats; legacy .xls/.xlt files are not supported
valid_types = ["xlsx", "xlsm", "xltx", "xltm"]
while True:
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    print("\n[+] Current directory: " + os.getcwd())
    print("[+] Excel files in the current directory: ")
    for f in files:
        # splitext copes with names that contain several dots or none at all
        if os.path.splitext(f)[1].lstrip(".").lower() in valid_types:
            print(f)
    file = input("\nWhich file would you like to sort: ")
    ending = os.path.splitext(file)[1].lstrip(".").lower()
    if ending in valid_types:
        break
    print("Please only enter Excel files.")
wb = openpyxl.load_workbook(file)
# Handling Worksheets
print("\nAvailable Worksheets: " + str(wb.sheetnames))
print("Which sheet would you like to sort? (please copy the name without the quotes)")
outgoing_sheet = wb[input("Outgoing sheet: ")]
print("\nAvailable Worksheets: " + str(wb.sheetnames))
print("Which is the receiving sheet? (please copy the name without the quotes)")
receiving_sheet = wb[input("Receiving sheet: ")]
# Declaring functions
def copy_row(source_range, target_start, source_sheet, target_sheet):
    # Define start Range(target_start) in the new Worksheet
    min_col, min_row, max_col, max_row = range_boundaries(target_start)
    # Iterate Range you want to copy
    for row, row_cells in enumerate(source_sheet[source_range], min_row):
        for column, cell in enumerate(row_cells, min_col):
            # Copy Value from Copy.Cell to given Worksheet.Cell
            target_sheet.cell(row=row, column=column).value = cell.value
def ask_yes_no(prompt):
    """
    :param prompt: The question to be asked
    :return: True for "y", False for "n"
    """
    while True:
        answer = input(prompt + " (y/n): ")
        if answer == "y":
            return True
        elif answer == "n":
            return False
        print("Please only enter y or n.")
def ask_integer(prompt):
    while True:
        try:
            answer = int(input(prompt + ": "))
            break
        except ValueError:
            print("Please only enter integers (e.g. 1, 2 or 3).")
    return answer
def scan_empty(index):
    print("Scanning for empty cells...")
    scan, fill = False, False
    min_col = outgoing_sheet.min_column
    max_col = outgoing_sheet.max_column
    cols = range(min_col, max_col+1)
    break_loop = False
    count = 0
    if not scan:
        search_index = index
        for row in outgoing_sheet.iter_rows():
            for n in cols:
                cell = outgoing_sheet.cell(row=search_index, column=n).value
                if cell:
                    pass
                else:
                    choice = ask_yes_no("\n[!] Empty cells found, would you like to fill them? (recommended)")
                    if choice:
                        fill = input("Fill with: ")
                        scan = True
                        break_loop = True
                        break
                    else:
                        print("[!] Attention: This can produce mismatches in the sorting algorithm.")
                        confirm = ask_yes_no("[>] Are you sure you don't want to fill them?\n[+] Hint: You can also enter spaces.\n(n)o I really don't want to\noka(y) I'll enter something, just let me sort already.\n")
                        if confirm:
                            fill = input("Fill with: ")
                            scan = True
                            break_loop = True
                            break
                        else:
                            print("You have chosen not to fill the empty cells.")
                            scan = True
                            break_loop = True
                            break
            if break_loop:
                break
            search_index += 1
    if fill:
        search_index = index
        for row in outgoing_sheet.iter_rows(max_row=outgoing_sheet.max_row-1):
            for n in cols:
                cell = outgoing_sheet.cell(row=search_index, column=n).value
                if cell:
                    pass
                elif cell != 0:
                    # empty cell (None or ""); genuine zeros are left alone
                    count += 1
                    outgoing_sheet.cell(row=search_index, column=n).value = fill
            search_index += 1
        print("Filled " + str(count) + " cells with: " + fill)
    return fill, count
# Declaring basic variables
first_value = ask_yes_no("Is the first row containing values the 2nd in both tables?")
if first_value:
    current_row, current_line = 2, 2
else:
    current_row = ask_integer("Sorting table first row")
    current_line = ask_integer("Receiving table first row")
verbose = ask_yes_no("Verbose output?")
reset = current_line
rec_max = receiving_sheet.max_row
scan_empty(current_row)
count = 0
print("\nSorting: " + str(outgoing_sheet.max_row - 1) + " rows...")
for row in outgoing_sheet.iter_rows():
    # loop through first table - Table you want to sort
    id_A = outgoing_sheet.cell(row=current_row, column=1).value
    if verbose:
        print("\nCurrently at: " + str(current_row - 1) + "/" + str(outgoing_sheet.max_row - 1))
        # str() avoids a TypeError when the ID is None or a number
        print("Sorting now: " + str(id_A))
    for line in receiving_sheet.iter_rows():
        # loop through second table - The receiving table
        id_B = receiving_sheet.cell(row=current_line, column=1).value
        if id_A == id_B:
            try:
                # calculate the offset: first free column after the last filled cell of this row
                offset = max(cell.column for cell in receiving_sheet[current_line] if cell.value is not None) + 1
            except ValueError:
                # max() of an empty sequence: the row is blank (both IDs are None),
                # so paste directly after the ID column
                offset = 2
            start_paste_from = receiving_sheet.cell(row=current_line, column=offset).column_letter + str(current_line)
            copy_Range = ((outgoing_sheet.cell(row=current_row, column=2)).column_letter + str(current_row) + ":" +
                          (outgoing_sheet.cell(row=current_row, column=outgoing_sheet.max_column)).column_letter + str(current_row))
            # Don't copy the ID, alternatively set damage.min_column for the first and damage.max_column for the second
            copy_row(copy_Range, start_paste_from, outgoing_sheet, receiving_sheet)
            count += 1
            current_row += 1
            if verbose:
                print("Copied " + copy_Range + " to: " + str(start_paste_from))
            break
        if not current_line > rec_max:
            # prevent looping out of range
            current_line += 1
        else:
            current_line = reset
wb.save(file)
print("\nSorted: " + str(count) + " rows.")
print("Saving the file to: " + os.getcwd())
print("Done.")
Note: In my file, the values of Table B ("Damage") are sorted by ID, although the script above does not require this. If you want to sort them, you can do so with pandas:
import pandas as pd
# open the correct worksheet
df = pd.read_excel("excel/separated.xlsx", sheet_name="Damage")
# sort_values returns a new DataFrame, so the result has to be assigned
df = df.sort_values(by="Identification")
df.to_excel("sorted.xlsx", index=False)
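The nested loops in the script above rescan the receiving sheet for every outgoing row. A sketch of the same idea with a single dictionary lookup: it builds an ID-to-row map of the receiving sheet once, and remembers the next free column per row so that a second match (like ID1) is appended after the first instead of overwriting it. `append_rows_by_id` is a hypothetical name; it assumes the IDs sit in column 1 and the headers in row 1, and the small in-memory workbook is made-up test data.

```python
from openpyxl import Workbook


def append_rows_by_id(source, target, first_row=2):
    """Copy columns 2.. of each source row onto the target row with the same
    ID in column 1, appending after the last used cell of that target row."""
    row_of = {}  # ID -> row number in the target sheet (first occurrence wins)
    for row in target.iter_rows(min_row=first_row, max_col=1):
        cell = row[0]
        if cell.value is not None:
            row_of.setdefault(cell.value, cell.row)

    next_col = {}  # target row -> next free column
    for values in source.iter_rows(min_row=first_row, values_only=True):
        id_value, data = values[0], values[1:]
        if id_value not in row_of:
            continue  # no matching row in the target table
        r = row_of[id_value]
        if r not in next_col:
            used = [c.column for c in target[r] if c.value is not None]
            next_col[r] = max(used) + 1  # column 1 holds the ID, so `used` is never empty
        for offset, value in enumerate(data):
            target.cell(row=r, column=next_col[r] + offset, value=value)
        next_col[r] += len(data)


wb = Workbook()
dissection = wb.active
dissection.title = "Dissection"
for values in [["ID", "a"], ["ID1", 1], ["ID2", 2], ["ID4", 4]]:
    dissection.append(values)
damage = wb.create_sheet("Damage")
for values in [["ID", "d"], ["ID1", "B1"], ["ID1", "B2"], ["ID4", "B"]]:
    damage.append(values)
append_rows_by_id(damage, dissection)
```

With the real workbook, `wb = openpyxl.load_workbook(file)`, a call with `wb["Damage"]` and `wb["Dissection"]`, and `wb.save(file)` would surround the call.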
So I have a text file that I need to trim based on the value in the second-to-last column: if it is 1, delete the line; if it is 0, keep the line.
The text looks like this; it just has thousands of rows:
#name #bunch of values #column of interest
00051079+4547116 00 05 10.896 +45 47 11.570 0 0 \n
00051079+4547117 00 05 10.896 +45 47 11.570 432 3 0 0 \n
00051079+4547118 00 05 10.896 +45 47 11.570 34 6 1 0 \n
I have tried this (plus about a hundred variations of this):
with open("Desktop/MStars.txt") as M:
    data = M.read()
    data = data.split('\n')
    mactivity = [row.split()[-2] for row in data]
    #name = [row.split(' ')[0] for row in data]
    #print ((mactivity))
with open("Desktop/MStars.txt","r") as input:
    with open("Desktop/MStarsReduced.txt","w") as output:
        for line in input:
            if mactivity =="0":
                output.write(line)
Thank you in advance, it is driving me mad.
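Recall that iterating over a file yields each line as a plain string, not as a list of fields like a CSV reader would. So line[-2] is just the second-to-last character, and a string never equals the integer 0. Split the line into whitespace-separated fields first and compare the second-to-last field with the string "0".
Editing your last little code block:
with open("Desktop/MStars.txt", "r") as input_file:
    with open("Desktop/MStarsReduced.txt", "w") as output_file:
        for line in input_file:
            fields = line.split()  # whitespace-separated columns
            # skip blank or short lines; keep the line only if the second-to-last field is "0"
            if len(fields) >= 2 and fields[-2] == "0":
                output_file.write(line)
This writes a line if and only if its second-to-last field is 0; all other lines are skipped.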
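To check the filter on the sample rows without touching any files, the logic can be wrapped in a generator. A sketch: `keep_inactive` is a hypothetical name, it assumes the "\n" shown in the question stands for the line break rather than literal text, and passing comment lines (starting with "#") through unchanged is a choice made here so the output keeps its header.

```python
def keep_inactive(lines, keep_value="0"):
    """Yield comment lines unchanged, plus every data line whose
    second-to-last whitespace-separated field equals keep_value."""
    for line in lines:
        fields = line.split()
        if fields and fields[0].startswith("#"):
            yield line  # header/comment line
        elif len(fields) >= 2 and fields[-2] == keep_value:
            yield line


sample = [
    "#name #bunch of values #column of interest\n",
    "00051079+4547116 00 05 10.896 +45 47 11.570 0 0\n",
    "00051079+4547117 00 05 10.896 +45 47 11.570 432 3 0 0\n",
    "00051079+4547118 00 05 10.896 +45 47 11.570 34 6 1 0\n",
]
kept = list(keep_inactive(sample))
print("".join(kept), end="")
```

Applied to the real files, this becomes `dst.writelines(keep_inactive(src))` inside `with open("Desktop/MStars.txt") as src, open("Desktop/MStarsReduced.txt", "w") as dst:`.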