Python rows and columns arrangement


I have the data file below and I want three columns with the headings "TIMESTEP", "id" and "mass".
The corresponding values are immediately below each heading. How can I do this? Please help.
Link 1 below is a snapshot of my data file and link 2 is my desired arrangement.

I agree with the comments that the question is hard to understand, but from my understanding of your problem I have found this solution:
import pandas as pd
data_input = """TIMESTEP
5000
id mass
TIMESTEP
5100
id mass
42 24
TIMESTEP
5200
id mass
99 123
32 84
79 424"""
columns = ["TIMESTEP", "id", "mass"]
data = []
previous_line = ""
for line in data_input.split("\n"):
    if columns[0] in previous_line and columns[1] not in line:
        # the line after a "TIMESTEP" header holds the timestep value
        data.append({"TIMESTEP": line})
    elif columns[1] in previous_line and columns[0] not in line:
        # the first data line after an "id mass" header
        data[-1]["id"], data[-1]["mass"] = line.split(" ")
    elif all(col not in line for col in columns):
        # further data lines belong to the same timestep
        data.append({"TIMESTEP": data[-1]["TIMESTEP"]})
        data[-1]["id"], data[-1]["mass"] = line.split(" ")
    previous_line = line
df = pd.DataFrame(data)
print(df)
Try to run this script and see if this is what you were looking for.

Related

Reading an Asterix file from JSON output

I am trying to convert a radar data file, which was sent to me in JSON format, into a manageable DataFrame.
The first three lines of the file look like this:
{"id":1,"length":43,"crc":"D81B2DB5","timestamp":1617,"hexdata":"30002EFFD7021483000069E03BF78BE702A001E0FE2104B51D21020234269604D174E75DA008A50312B0620620B6","CAT048":{"I010":{"SAC":20,"SIC":131},"I140":{"ToD":0.8203125},"I020":{"TYP":7,"SIM":0,"RDP":0,"SPI":0,"RAB":0,"FX":0},"I040":{"RHO":59.9648438,"THETA":196.7376709},"I070":{"V":0,"G":0,"L":0,"spare":0,"Mode3A":"1240"},"I090":{"V":0,"G":0,"FL":120},"I130":{"SRLP":{"SRL":1.4501953},"SRRP":{"SRR":4},"SAMP":{"SAM":-75},"PRLP":{"PRL":1.2744141},"PAMP":{"PAM":33},"RPDP":{"RPD":0.0078125},"APDP":{"APD":0.0439453}},"I220":{"ACAddr":"342696"},"I240":{"TId":"AME4956 "},"I161":{"Tn":2213},"I200":{"CGS":172.92,"CHdg":248.0383301},"I170":{"CNF":0,"RAD":0,"DOU":0,"MAH":0,"CDM":3,"FX":0},"I230":{"COM":1,"STAT":0,"SI":0,"spare":0,"ModeSSSC":1,"ARC":0,"AIC":1,"BDS16":1,"BDS37":6}},"lat":38.585666818124,"lon":2.3784905351223,"h":3658.0244306503}
{"id":1,"length":40,"crc":"065756DA","timestamp":2468,"hexdata":"30002BFBB70214830000D2A000C8C0510A38E01804EA34239701803000000000004008BE00369EAE4624A0","CAT048":{"I010":{"SAC":20,"SIC":131},"I140":{"ToD":1.640625},"I020":{"TYP":5,"SIM":0,"RDP":0,"SPI":0,"RAB":0,"FX":0},"I040":{"RHO":0.78125,"THETA":270.4449463},"I070":{"V":0,"G":0,"L":0,"spare":0,"Mode3A":"5070"},"I130":{"SRLP":{"SRL":1.0546875},"SRRP":{"SRR":4},"SAMP":{"SAM":-22}},"I220":{"ACAddr":"342397"},"I250":[{"MCP_ALT_STATUS":1,"MCP_ALT":96,"FMS_ALT_STATUS":0,"FMS_ALT":0,"BP_STATUS":0,"BP":0,"res":0,"MODE_STATUS":0,"VNAV":0,"ALT_HOLD":0,"APP":0,"TARGET_ALT_STATUS":0,"TARGET_ALT_SOURCE":0,"BDS":"40"}],"I161":{"Tn":2238},"I200":{"CGS":11.88,"CHdg":223.1433105},"I170":{"CNF":0,"RAD":2,"DOU":0,"MAH":0,"CDM":3,"FX":0},"I230":{"COM":1,"STAT":1,"SI":0,"spare":0,"ModeSSSC":1,"ARC":0,"AIC":1,"BDS16":0,"BDS37":0}},"lat":39.543535327942,"lon":2.7284206653891,"h":4.2666605189443}
{"id":2,"length":64,"crc":"A45FA0D0","timestamp":2468,"hexdata":"300043FFF7021483000115A0896BE1B70AC105C8E01403BC4BB184508672CB482003C8480030A4018040FFD3C13A7FFCEC509E1A1F342037FF6008C1081E3CF54620F5","CAT048":{"I010":{"SAC":20,"SIC":131},"I140":{"ToD":2.1640625},"I020":{"TYP":5,"SIM":0,"RDP":0,"SPI":0,"RAB":0,"FX":0},"I040":{"RHO":137.4179688,"THETA":317.411499},"I070":{"V":0,"G":0,"L":0,"spare":0,"Mode3A":"5301"},"I090":{"V":0,"G":0,"FL":370},"I130":{"SRLP":{"SRL":0.8789062},"SRRP":{"SRR":3},"SAMP":{"SAM":-68}},"I220":{"ACAddr":"4BB184"},"I240":{"TId":"THY224 "},"I250":[{"MCP_ALT_STATUS":1,"MCP_ALT":37008,"FMS_ALT_STATUS":0,"FMS_ALT":0,"BP_STATUS":1,"BP":213,"res":0,"MODE_STATUS":1,"VNAV":1,"ALT_HOLD":0,"APP":0,"TARGET_ALT_STATUS":0,"TARGET_ALT_SOURCE":0,"BDS":"40"},{"RA_STATUS":1,"RA":-0.3515625,"TTA_STATUS":1,"TTA":84.375,"GS_STATUS":1,"GS":466,"TAR_STATUS":1,"TAR":-0.03125,"TAS_STATUS":1,"TAS":472,"BDS":"50"},{"HDG_STATUS":1,"HDG":84.5507812,"IAS_STAT":1,"IAS":271,"MACH_STATUS":1,"MACH":0.832,"BAR_STATU
I can see these lines contain the info I need, like callsign ("TId": "AME4956 "), heading and so on.
Is there a nice Pythonic way to get these values into a DataFrame?

This is almost valid JSON, except the final line seems to be truncated.
Pandas can import dictionaries with almost no pain:
import json
import pandas as pd
infos = []
with open(infofile) as fid:
    for ln in fid:
        infos.append(json.loads(ln))
df = pd.DataFrame(infos)
print(df)
prints:
id length crc timestamp hexdata CAT048 lat lon h
0 1 43 D81B2DB5 1617 30002EFFD7021483000069E03BF78BE702A001E0FE2104... {'I010': {'SAC': 20, 'SIC': 131}, 'I140': {'To... 38.585667 2.378491 3658.024431
1 1 40 065756DA 2468 30002BFBB70214830000D2A000C8C0510A38E01804EA34... {'I010': {'SAC': 20, 'SIC': 131}, 'I140': {'To... 39.543535 2.728421 4.266661
2 2 64 A45FA0D0 2468 300043FFF7021483000115A0896BE1B70AC105C8E01403... {'I010': {'SAC': 20, 'SIC': 131}, 'I140': {'To... NaN NaN NaN

It worked with the following code:
import json
import pandas as pd
infos = []
with open('C:/Users/jobbr/Downloads/211201-est-000001/211201-est-000001.json') as fid:
    for ln in fid:
        infos.append(json.loads(ln))
df1 = pd.DataFrame(infos)
#print(df1)
# unravel this dataframe
c = df1['CAT048'].to_dict()  # The CAT048 field holds important parameters
for i in range(len(df1)):
    #print(i, c[i]['I200']['CGS'], c[i]['I200']['CHdg'])
    df1.at[i, 'ground_speed'] = c[i]['I200']['CGS']
    df1.at[i, 'heading'] = c[i]['I200']['CHdg']
    df1.at[i, 'ACAddr'] = c[i]['I220']['ACAddr']
    df1.at[i, 'ToD'] = c[i]['I140']['ToD']
    try:  # not all parameters seem to be present always
        df1.at[i, 'flight_level'] = c[i]['I090']['FL']
    except KeyError:
        df1.at[i, 'flight_level'] = 0
    try:
        df1.at[i, 'callsign'] = c[i]['I240']['TId']
    except KeyError:
        df1.at[i, 'callsign'] = 'Unknown'
Thanks for the help!
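As an alternative to the row-by-row loop, pandas can flatten the nested dictionaries itself with pd.json_normalize. The following is only a sketch: the two records are trimmed from the sample lines in the question, and the short column names in `wanted` are my own choice, not something the file format prescribes.

```python
import pandas as pd

# Two records trimmed from the sample lines in the question;
# the second one has no I090 (flight level) or I240 (callsign) item.
records = [
    {"id": 1, "CAT048": {"I200": {"CGS": 172.92, "CHdg": 248.0383301},
                         "I220": {"ACAddr": "342696"},
                         "I090": {"FL": 120},
                         "I240": {"TId": "AME4956 "}}},
    {"id": 1, "CAT048": {"I200": {"CGS": 11.88, "CHdg": 223.1433105},
                         "I220": {"ACAddr": "342397"}}},
]

# Nested keys become dotted column names such as "CAT048.I200.CGS";
# items missing from a record show up as NaN.
flat = pd.json_normalize(records)

# Hypothetical mapping from dotted names to shorter column names.
wanted = {
    "CAT048.I200.CGS": "ground_speed",
    "CAT048.I200.CHdg": "heading",
    "CAT048.I220.ACAddr": "ACAddr",
    "CAT048.I090.FL": "flight_level",
    "CAT048.I240.TId": "callsign",
}
df = flat[list(wanted)].rename(columns=wanted)

# Same defaults as the loop above: 0 and 'Unknown' for missing items.
df["flight_level"] = df["flight_level"].fillna(0)
df["callsign"] = df["callsign"].str.strip().fillna("Unknown")
print(df)
```

With the real file, the list passed to pd.json_normalize would be the `infos` list built from json.loads. Selecting `flat[list(wanted)]` assumes that every wanted item appears in at least one record; otherwise the column does not exist and the selection raises a KeyError.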

Openpyxl - combine matching rows of two tables into one long row

In an Excel file I have two large tables. Table A ("Dissection", 409 rows x 25 columns) contains unique entries, each identified by a unique ID. Table B ("Damage", 234 rows x 39 columns) uses the ID of Table A in the first cell and extends it. To analyze the data in Minitab, all data must be in a single long row, meaning the values of "Damage" have to follow "Dissection". The whole thing looks like this:
Table A - i.e. Dissection
- ID1 [valueTabA] [valueTabA]
- ID2 [valueTabA] [valueTabA]
- ID3 [valueTabA] [valueTabA]
- ID4 [valueTabA] [valueTabA]
Table B - i.e. Damage
- ID1 [valueTabB1] [valueTabB1]
- ID1 [valueTabB2] [valueTabB2]
- ID4 [valueTabB] [valueTabB]
They are supposed to combine something like this:
Table A
- ID1 [valueTabA] [valueTabA] [valueTabB1] [valueTabB1] [valueTabB2] [valueTabB2]
- ID2 [valueTabA] [valueTabA]
- ID3 [valueTabA] [valueTabA]
- ID4 [valueTabA] [valueTabA] [valueTabB] [valueTabB]
What is the best way to do that?
The following describes my two approaches. Both use the same data in the same tables but in two different files, to be able to test both scenarios.
The first approach uses a file, where both tables are in the same worksheet, the second uses a file where both tables are in different worksheets.
Scenario: both tables are in the same worksheet, where I'm trying to move the row as a range
current_row = 415  # start without headers of table A
current_line = 2  # start without headers of table B
for row in ws.iter_rows(min_row=415, max_row=647):
    # loop through damage
    id_A = ws.cell(row=current_row, column=1).value
    max_col = 25
    for line in ws.iter_rows(min_row=2, max_row=409):
        # loop through dissection
        id_B = ws.cell(row=current_line, column=1).value
        if id_A == id_B:
            copy_range = ((ws.cell(row=current_line, column=2)).column_letter + str(current_line) + ":" +
                          (ws.cell(row=current_line, column=39)).column_letter + str(current_line))
            ws.move_range(copy_range, rows=current_row, cols=max_col+1)
            print("copied range: " + copy_range + " to: " + str(current_row) + ":" + str(max_col+1))
            count += 1
            break
        if current_line > 409:
            current_line = 2
        else:
            current_line += 1
    current_row += 1
-> Here I'm struggling to append the range to the right row of Table A, without overwriting the previous row (see example ID1 above)
Scenario: both tables are located in separated sheets
dissection = wb["Dissection"]
damage = wb["Damage"]
recovery = wb["Recovery"]
current_row, current_line = 2, 2
for row in damage.iter_rows():
    # loop through first table
    id_A = damage.cell(row=current_row, column=1).value
    for line in dissection.iter_rows():
        # loop through second table
        id_B = dissection.cell(row=current_line, column=1).value
        copyData = []
        if id_A == id_B:
            for col in range(2, 39):
                # add data to the list, skipping the ID
                copyData.append(damage.cell(row=current_line, column=col).value)
            # print(copyData) for debugging purposes
            for item in copyData:
                column_count = dissection.max_column
                dissection.cell(row=current_row, column=column_count).value = item
                column_count += 1
            current_row += 1
            break
        if not current_line > 409:
            # prevent looping out of range
            current_line += 1
        else:
            current_line = 2
-> Same problem as in 1., at some point it's not adding the damage values to copyData anymore but None instead, and finally it's just not pasting the items (cells stay blank)
I've tried everything Excel-related that I could find, but unfortunately nothing worked. Would pandas be more useful here, or am I just not seeing something?
Thanks for taking the time to read this :)

I highly recommend using pandas for situations like this. It is still a bit unclear how your data is formatted in the excel file, but given your second option I assume that the tables are both on different sheets in the excel file. I also assume that the first row contains the table title (e.g. Table A - i.e. Dissection). If this is not the case, just remove skiprows=1:
import pandas as pd
df = pd.concat(pd.read_excel("filename.xlsx", sheet_name=None, skiprows=1, header=None), axis=1, ignore_index=True)
df.to_excel('combined_data.xlsx')  # save to excel
read_excel will load the Excel file into a pandas DataFrame. sheet_name=None indicates that all sheets should be loaded into a dict of DataFrames, keyed by sheet name. pd.concat will concatenate these DataFrames into one single DataFrame (axis=1 places them side by side). Note that with axis=1 the sheets are aligned on row position, not on the ID column. You can explore the data with df.head(), or save the DataFrame to Excel with df.to_excel.
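If the rows of Table B have to be matched to Table A by ID, as in the desired layout in the question, one possible pandas approach is to number the repeated IDs in Table B, spread each occurrence into its own block of columns, and then left-merge onto Table A. This is only a sketch: the small frames and the column names ("ID", "a1", "b1", ...) are made up to stand in for the two sheets.

```python
import pandas as pd

# Made-up stand-ins for the two sheets; in practice they would come
# from pd.read_excel(..., sheet_name="Dissection") and "Damage".
dissection = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "a1": [1, 2, 3, 4],
    "a2": [10, 20, 30, 40],
})
damage = pd.DataFrame({
    "ID": ["ID1", "ID1", "ID4"],
    "b1": [5, 6, 7],
    "b2": [50, 60, 70],
})

# cumcount() numbers the occurrences of each ID: 0 for the first
# damage row of an ID, 1 for the second, and so on.
occurrence = damage.groupby("ID").cumcount()

# Build one block of columns per occurrence, suffixed _1, _2, ...
parts = []
for n, block in damage.groupby(occurrence):
    parts.append(block.set_index("ID").add_suffix(f"_{n + 1}"))
wide = pd.concat(parts, axis=1)

# Left merge keeps every row of Table A; IDs without damage get NaN.
combined = dissection.merge(wide, left_on="ID", right_index=True, how="left")
print(combined)
```

The result has one row per ID of Table A, with the values of each matching Table B row appended to the right, and it can be written out with combined.to_excel. The sketch assumes the IDs in Table A are unique.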

I ended up using the second scenario (one file, two worksheets), but this code should be adaptable to the first scenario (one file, one worksheet) as well.
I copied the rows of Table B using code taken from here.
And handled the offset with code from here.
Also, I added a few extras to my solution to make it more generic:
import openpyxl, os
from openpyxl.utils import range_boundaries
# Introduction
print("Welcome!\n[!] Advice: Always have a backup of the file you want to sort.\n[+] Please put the file to be sorted in the same directory as this program.")
print("[+] This program assumes that the value to be sorted by is located in the first column of the outgoing table.")
# File listing
while True:
    files = [f for f in os.listdir('.') if os.path.isfile(f)]
    # Note: openpyxl reads xlsx/xlsm/xltx/xltm; legacy xls/xlt files will fail to load
    valid_types = ["xlsx", "xltx", "xlt", "xls"]
    print("\n[+] Current directory: " + os.getcwd())
    print("[+] Excel files in the current directory: ")
    for f in files:
        # rsplit avoids an IndexError for files without an extension
        if f.rsplit(".", 1)[-1] in valid_types:
            print(f)
    file = input("\nWhich file would you like to sort: ")
    try:
        ending = file.split(".")[1]
    except IndexError:
        print("Please only enter Excel files.")
        continue
    if ending in valid_types:
        break
    else:
        print("Please only enter Excel files.")
wb = openpyxl.load_workbook(file)
# Handling Worksheets
print("\nAvailable Worksheets: " + str(wb.sheetnames))
print("Which sheet would you like to sort? (please copy the name without the quotes)")
outgoing_sheet = wb[input("Outgoing sheet: ")]
print("\nAvailable Worksheets: " + str(wb.sheetnames))
print("Which is the receiving sheet? (please copy the name without the quotes)")
receiving_sheet = wb[input("Receiving sheet: ")]
# Declaring functions
def copy_row(source_range, target_start, source_sheet, target_sheet):
    # Define start Range(target_start) in the new Worksheet
    min_col, min_row, max_col, max_row = range_boundaries(target_start)
    # Iterate Range you want to copy
    for row, row_cells in enumerate(source_sheet[source_range], min_row):
        for column, cell in enumerate(row_cells, min_col):
            # Copy Value from Copy.Cell to given Worksheet.Cell
            target_sheet.cell(row=row, column=column).value = cell.value
def ask_yes_no(prompt):
    """
    :param prompt: The question to be asked
    :return: Value to check
    """
    while True:
        answer = input(prompt + " (y/n): ")
        if answer == "y":
            return True
        elif answer == "n":
            return False
        print("Please only enter y or n.")
def ask_integer(prompt):
    while True:
        try:
            answer = int(input(prompt + ": "))
            break
        except ValueError:
            print("Please only enter integers (e.g. 1, 2 or 3).")
    return answer
def scan_empty(index):
    print("Scanning for empty cells...")
    scan, fill = False, False
    min_col = outgoing_sheet.min_column
    max_col = outgoing_sheet.max_column
    cols = range(min_col, max_col+1)
    break_loop = False
    count = 0
    if not scan:
        search_index = index
        for row in outgoing_sheet.iter_rows():
            for n in cols:
                cell = outgoing_sheet.cell(row=search_index, column=n).value
                if cell:
                    pass
                else:
                    choice = ask_yes_no("\n[!] Empty cells found, would you like to fill them? (recommended)")
                    if choice:
                        fill = input("Fill with: ")
                        scan = True
                        break_loop = True
                        break
                    else:
                        print("[!] Attention: This can lead to mismatches in the sorting algorithm.")
                        confirm = ask_yes_no("[>] Are you sure you don't want to fill them?\n[+] Hint: You can also enter spaces.\n(n)o I really don't want to\noka(y) I'll enter something, just let me sort already.\n")
                        if confirm:
                            fill = input("Fill with: ")
                            scan = True
                            break_loop = True
                            break
                        else:
                            print("You have chosen not to fill the empty cells.")
                            scan = True
                            break_loop = True
                            break
            if break_loop:
                break
            search_index += 1
    if fill:
        search_index = index
        for row in outgoing_sheet.iter_rows(max_row=outgoing_sheet.max_row-1):
            for n in cols:
                cell = outgoing_sheet.cell(row=search_index, column=n).value
                if cell:
                    pass
                elif cell != int(0):
                    count += 1
                    outgoing_sheet.cell(row=search_index, column=n).value = fill
            search_index += 1
        print("Filled " + str(count) + " cells with: " + fill)
    return fill, count
# Declaring basic variables
first_value = ask_yes_no("Is the first row containing values the 2nd in both tables?")
if first_value:
    current_row, current_line = 2, 2
else:
    current_row = ask_integer("Sorting table first row")
    current_line = ask_integer("Receiving table first row")
verbose = ask_yes_no("Verbose output?")
reset = current_line
rec_max = receiving_sheet.max_row
scan_empty(current_row)
count = 0
print("\nSorting: " + str(outgoing_sheet.max_row - 1) + " rows...")
for row in outgoing_sheet.iter_rows():
    # loop through first table - Table you want to sort
    id_A = outgoing_sheet.cell(row=current_row, column=1).value
    if verbose:
        print("\nCurrently at: " + str(current_row - 1) + "/" + str(outgoing_sheet.max_row - 1) + "")
        try:
            print("Sorting now: " + id_A)
        except TypeError:
            # Handling None type exceptions
            pass
    for line in receiving_sheet.iter_rows():
        # loop through second table - The receiving table
        id_B = receiving_sheet.cell(row=current_line, column=1).value
        if id_A == id_B:
            try:
                # calculate the offset
                offset = max((row.column for row in receiving_sheet[current_line] if row.value is not None)) + 1
            except ValueError:
                # typical "No idea why, but it doesn't work without it" - code
                pass
            start_paste_from = receiving_sheet.cell(row=current_line, column=offset).column_letter + str(current_line)
            copy_Range = ((outgoing_sheet.cell(row=current_row, column=2)).column_letter + str(current_row) + ":" +
                          (outgoing_sheet.cell(row=current_row, column=outgoing_sheet.max_column)).column_letter + str(current_row))
            # Don't copy the ID, alternatively set damage.min_column for the first and damage.max_column for the second
            copy_row(copy_Range, start_paste_from, outgoing_sheet, receiving_sheet)
            count += 1
            current_row += 1
            if verbose:
                print("Copied " + copy_Range + " to: " + str(start_paste_from))
            break
        if not current_line > rec_max:
            # prevent looping out of range
            current_line += 1
        else:
            current_line = reset
wb.save(file)
print("\nSorted: " + str(count) + " rows.")
print("Saving the file to: " + os.getcwd())
print("Done.")
Note: The values of table B ("Damage") are sorted according to the ID, although that is not required. However, if you choose to do so, this can be done using pandas.
import pandas as pd
# open the correct worksheet
df = pd.read_excel("excel/separated.xlsx", sheet_name="Damage")
# sort_values returns a new DataFrame, so keep the result
df = df.sort_values(by="Identification")
df.to_excel("sorted.xlsx")

Reading a CSV file as Text File that performs Addition and Average

import csv
with open('Annual_Budget.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    Column_Sum = []
    Third_Column_Avg = []
    High_Value = []
    Low_Value = []
    for row in readCSV:
        Column_Sum = []
        Third_Column_Avg = []
        High_Value = []
        Low_Value = []
        Column_Sum.append(Column_Sum)
        Third_Column_Avg.append(Third_Column_Avg)
        High_Value.append(High_Value)
        Low_Value.append(Low_Value)
    print(Column_Sum)
    print(Third_Column_Avg)
    print(High_Value)
    print(Low_Value)
How can I read a CSV as a text file and, for each row, add up all of the numeric columns, skipping any columns that cannot be interpreted as numbers, and display the sum when it is complete? It must also display the average of all values in the third column, as well as the highest and lowest values from the second column together with the rows in which they appeared. I put a mock annual budget in picture format so you can get the idea of what I am trying to accomplish.
CSV SCREENSHOT EXAMPLE
Output: [SUM OF ALL NUMERIC COLUMNS], [AVERAGE OF ALL VALUES IN THIRD COLUMN], [HIGHEST VALUE FROM SECOND COLUMN][LOWEST VALUE FROM SECOND COLUMN]

This can be done with the pandas library (I made a file just like your screenshot). If you don't have this library, just run pip install pandas.
Then:
In [1]: import pandas as pd
In [2]: my_file = pd.read_csv('stack.csv')
In [3]: my_file
Out[3]:
anual budget q2 q4
0 100 450 20
1 600 765 50
2 500 380 79
3 800 480 455
4 1100 65 4320
Sum of the "anual budget ", q2 and q4 columns (the names are spelled as in the file)
my_file['anual budget '].sum()
my_file['q2'].sum()
my_file['q4'].sum()
Average of third column
my_file['q4'].mean()
Min and max value of second column
my_file['q2'].max()
my_file['q2'].min()
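The question also asks for the rows in which the highest and lowest values appear, and for skipping cells that are not numbers. A minimal sketch with only the standard csv module could look like this. The sample data, the function names and the assumption that the first row is a header are mine; the real Annual_Budget.csv may be laid out differently.

```python
import csv
import io

# Made-up sample standing in for Annual_Budget.csv (an assumption).
SAMPLE = """category,q2,q4
rent,450,20
food,765,50
travel,380,79
tools,480,455
misc,65,4320
"""


def to_number(text):
    """Return text as a float, or None when it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def summarize(lines):
    """Summarize CSV rows; the first row is assumed to be a header."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    total = 0.0
    second = []  # (value, row number) pairs from the second column
    third = []   # numeric values from the third column
    for row_number, row in enumerate(reader, start=1):
        numbers = [to_number(cell) for cell in row]
        # add up every cell that parsed as a number, skip the rest
        total += sum(n for n in numbers if n is not None)
        if len(numbers) > 1 and numbers[1] is not None:
            second.append((numbers[1], row_number))
        if len(numbers) > 2 and numbers[2] is not None:
            third.append(numbers[2])
    return {
        "total": total,
        "third_column_average": sum(third) / len(third) if third else None,
        "second_column_max": max(second) if second else None,
        "second_column_min": min(second) if second else None,
    }


result = summarize(io.StringIO(SAMPLE))
print(result)
```

For a real file, pass the open file object instead of the StringIO, for example `with open("Annual_Budget.csv", newline="") as f: result = summarize(f)`. The maximum and minimum are returned as (value, row number) pairs, with rows counted from 1 after the header.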

Remove rows based on a value in second last column in Python

So I have a text file that I need to trim based on a value in the second last column - if it says 1, delete the line, if 0, keep the line.
The text looks like this, it just has thousands of rows:
#name #bunch of values #column of interest
00051079+4547116 00 05 10.896 +45 47 11.570 0 0 \n
00051079+4547117 00 05 10.896 +45 47 11.570 432 3 0 0 \n
00051079+4547118 00 05 10.896 +45 47 11.570 34 6 1 0 \n
I have tried this (plus about a hundred variations of this):
with open("Desktop/MStars.txt") as M:
    data = M.read()
data = data.split('\n')
mactivity = [row.split()[-2] for row in data]
#name = [row.split(' ')[0] for row in data]
#print ((mactivity))
with open("Desktop/MStars.txt","r") as input:
    with open("Desktop/MStarsReduced.txt","w") as output:
        for line in input:
            if mactivity =="0":
                output.write(line)
Thank you in advance, it is driving me mad.

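Each line read from a text file is a single string, so it has to be split into whitespace-separated fields before the columns can be inspected. The fields are strings too, so the comparison must be against "0", not the integer 0.
Editing your last little code block:
with open("Desktop/MStars.txt","r") as input:
    with open("Desktop/MStarsReduced.txt","w") as output:
        for line in input:
            fields = line.split()
            # keep the line only if the second-to-last field is "0"
            if len(fields) >= 2 and fields[-2] == "0":
                output.write(line)
This will write your line if and only if the second-to-last field is 0. Otherwise, it will not be written.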

Reading one 'cell' of a fixed width table (.txt file) that is split over two lines in python/pandas

How do I read one "cell" of a fixed width column that is split over two lines? The data input is a fixed width table, like so;
ID Description QTY
1 Description split over 1
two lines
2 Description on one line 2
I'd like to have the data frame format the data as per below;
ID Description QTY
1 Description split over two lines 1
2 Description on one line 2
My current code is;
import pandas as pd
df = pd.read_fwf('test.txt', names = ['ID', 'Description', 'QTY'])
df
But this gives me;
ID Description QTY
1 Description split over 1
NaN two lines NaN
2 Description on one line 2
Any ideas?

# Append the next row's description to the current row if the ID of the next row is NaN.
df['Description'] = df.apply(
    lambda x: x.Description
    if x.name == len(df) - 1 or not pd.isna(df.iloc[x.name + 1]['ID'])
    else x.Description + ' ' + df.iloc[x.name + 1]['Description'],
    axis=1,
)
# Drop the continuation rows, which contain NaN.
df = df.dropna()
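The approach above joins at most one continuation line per row. If a description can spill over several lines, one alternative is to forward-fill the ID and then join all descriptions that share an ID. The following is a sketch under assumptions: the column widths (4, 25 and 3 characters) are invented to rebuild the sample table, IDs are assumed to be unique, and the first data row is assumed to carry an ID.

```python
import io
import pandas as pd

# Rebuild the sample table with assumed column widths of 4, 25 and 3.
rows = [
    ("ID", "Description", "QTY"),
    ("1", "Description split over", "1"),
    ("", "two lines", ""),
    ("2", "Description on one line", "2"),
]
text = "\n".join(f"{a:<4}{b:<25}{c:<3}" for a, b, c in rows)

df = pd.read_fwf(io.StringIO(text), colspecs=[(0, 4), (4, 29), (29, 32)])

# A continuation row has no ID, so carry the previous ID forward.
df["ID"] = df["ID"].ffill()

# Join all description fragments per ID; "first" takes the first
# non-missing QTY, which sits on the row that carries the ID.
merged = (
    df.groupby("ID", as_index=False)
      .agg({"Description": " ".join, "QTY": "first"})
      .astype({"ID": int, "QTY": int})
)
print(merged)
```

With a real file, the StringIO would be replaced by the file path, and colspecs either adjusted to the actual widths or left out so that read_fwf infers them.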
