How to extract a particular set of values from a file in Python?

I am stuck on the logic here. I have to extract some values from a text file that looks like this:
AAA
+-------------+------------------+
| ID | count |
+-------------+------------------+
| 3 | 1445 |
| 4 | 105 |
| 9 | 160 |
| 10 | 30 |
+-------------+------------------+
BBB
+-------------+------------------+
| ID | count |
+-------------+------------------+
| 3 | 1445 |
| 4 | 105 |
| 9 | 160 |
| 10 | 30 |
+-------------+------------------+
CCC
+-------------+------------------+
| ID | count |
+-------------+------------------+
| 3 | 1445 |
| 4 | 105 |
| 9 | 160 |
| 10 | 30 |
+-------------+------------------+
I am not able to extract the values from BBB alone and append them to a list, like this:
import sys

f = open(sys.argv[1], "r")
text = f.readlines()
B_Values = []
for i in text:
    if i.startswith("BBB"):  # (example)
        B_Values.append("only values of BBB")
    if i.startswith("CCC"):
        break
print(B_Values)
which should result in
['| 3 | 1445 |','| 4 | 105 |','| 9 | 160 |','| 10 | 30 |']

import sys

d = {}
with open(sys.argv[1]) as f:
    for line in f:
        if line[0].isalpha():  # is the first character in the line a letter?
            curr = d.setdefault(line.strip(), [])
        elif any(c.isdigit() for c in line):  # is there any digit in the line?
            curr.append(line.strip())
For this file, d is now:
{'AAA': ['| 3 | 1445 |',
'| 4 | 105 |',
'| 9 | 160 |',
'| 10 | 30 |'],
'BBB': ['| 3 | 1445 |',
'| 4 | 105 |',
'| 9 | 160 |',
'| 10 | 30 |'],
'CCC': ['| 3 | 1445 |',
'| 4 | 105 |',
'| 9 | 160 |',
'| 10 | 30 |']}
Your B_values are d['BBB']
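If the numbers themselves are needed rather than the raw table lines, the same idea can be taken one step further. The following is only a sketch: the function name parse_sections is made up for illustration, and it assumes every data row contains exactly two integers (ID and count), as in the sample file.

```python
import re


def parse_sections(lines):
    """Map each section name to a list of (id, count) integer pairs."""
    sections = {}
    current = None
    for line in lines:
        line = line.strip()
        if line[:1].isalpha():  # a section name such as "BBB"
            current = sections.setdefault(line, [])
        elif current is not None:
            numbers = re.findall(r"\d+", line)
            if len(numbers) == 2:  # a data row: | ID | count |
                current.append((int(numbers[0]), int(numbers[1])))
    return sections


sample = """AAA
+-------------+------------------+
| ID | count |
+-------------+------------------+
| 3 | 1445 |
| 4 | 105 |
+-------------+------------------+
BBB
+-------------+------------------+
| ID | count |
+-------------+------------------+
| 9 | 160 |
| 10 | 30 |
+-------------+------------------+
""".splitlines()

print(parse_sections(sample)["BBB"])
```

Border and header lines contain no digits, so they are skipped without any special handling.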

You can use a state flag, bstarted, to track when the B-group has begun.
After scanning the B-group, delete the three header rows and the one footer row.
B_Values = []
bstarted = False
for i in text:
    if i.startswith("BBB"):
        bstarted = True
    elif i.startswith("CCC"):
        bstarted = False
        break
    elif bstarted:
        B_Values.append(i)
del B_Values[:3]  # get rid of the header
del B_Values[-1]  # get rid of the footer
print(B_Values)

You should avoid iterating over the already-read lines. Call readline whenever you want to read the next line, and check what it is:
import sys

f = open(sys.argv[1], "r")
B_Values = []
i = f.readline()
while i != "":
    if i.startswith("BBB"):  # (example)
        for temp in range(3):
            f.readline()  # skip the 3 lines of table headers
        i = f.readline()
        # while we've not reached the table footer (a line starting with "+")
        while not i.startswith("+") and i != "":
            B_Values.append(i.strip())
            i = f.readline()
        break
    i = f.readline()
# Although not necessary, it is better to close the file, too.
f.close()
print(B_Values)
EDIT: @eumiro's method is more flexible than mine, since it reads the values from all sections. You could add the isalpha test to my example to read all the values as well, but his method is still easier to read.

Related

Issues appending to a list in python

I have the following data file.
>| --- | | Adelaide | | --- | | 2021 | | --- | | Rnd | T | Opponent | Scoring | F | Scoring | A | R | M | W-D-L | Venue | Crowd |
> Date | | R1 | H | Geelong | 4.4 11.7 13.9 15.13 | 103 | 2.3 5.5 10.8
> 13.13 | 91 | W | 12 | 1-0-0 | Adelaide Oval | 26985 | Sat 20-Mar-2021 4:05 PM | | R2 | A | Sydney | 3.2 4.6 6.14 11.22 | 88 |
> 4.1 9.6 15.11 18.13 | 121 | L | -33 | 1-0-1 | S.C.G. | 23946 | Sat 27-Mar-2021 1:45 PM |
I wrote code to turn that data into my desired result, which is a list. When I print my variable row at the current spot, it prints correctly.
However, when I append my list row to another list, my_array, I have issues: I get an empty list returned.
I think the issue is where I am placing the append?
My code is this:
with open('adelaide.md', 'r') as f:
    my_array = []
    team = ''
    year = ''
    for line in f:
        row = []
        line = line.strip()
        fields = line.split('|')
        num_fields = len(fields)
        if len(fields) == 3:
            val = fields[1].strip()
            if val.isnumeric():
                year = val
            elif val != '---':
                team = val
        elif num_fields == 15:
            row.append(team)
            row.append(year)
            for i in range(1, 14):
                row.append(fields[i].strip())
            print(row)
    my_array.append(row)

You need to append the row array inside the for loop.

I think the last line should be inside the for loop. As written, your code only appends the last 'row' list, once, after the loop has finished. Just indent it.
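To make the suggested fix concrete, here is a minimal sketch with the append moved inside the loop. It is wrapped in a function (parse_rows, a name chosen here for illustration) that takes any iterable of lines, so it can be tried without the adelaide.md file. The append is placed in the 15-field branch on the assumption that only complete data lines should be collected.

```python
def parse_rows(lines):
    """Collect one row per 15-field line, prefixed with the current team and year."""
    my_array = []
    team = ''
    year = ''
    for line in lines:
        fields = line.strip().split('|')
        if len(fields) == 3:
            val = fields[1].strip()
            if val.isnumeric():
                year = val
            elif val != '---':
                team = val
        elif len(fields) == 15:
            row = [team, year]
            row.extend(field.strip() for field in fields[1:14])
            my_array.append(row)  # appended inside the loop, once per data line
    return my_array


# Made-up sample lines in the same shape as the data file
cells = [str(n) for n in range(13)]
sample = ["| --- |", "| Adelaide |", "| 2021 |", "|" + "|".join(cells) + "|"]
print(parse_rows(sample))
```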

How to take data from a specific column and separate it?

I have a question about CSV. I have CSV data like this:
Runway | Data1 | Data2 | Data3 |
13 | 425 | 23 | Go Straight |
| 222 | 24 | Go Straight |
| 424 | 25 | Go Left |
---------------------------------------
16 | 555 | 13 | Go Right |
| 858 | 14 | Go Right |
| 665 | 15 | Go Straight |
How can I turn it into separate runways that look like this:
Runway | Data1 | Data2 | Data3 | S | Runway | Data1 | Data2 | Data3 |
13 | 425 | 23 | Go Straight | P | 16 | 555 | 13 | Go Right |
| 222 | 24 | Go Straight | A | | 858 | 14 | Go Right |
| 424 | 25 | Go Left | C | |
| | | | E |
Is this possible to do? Thank You

Sorry for taking so long. Here is my code; I tried to keep it simple.
import csv

with open("file_name.csv", "r") as file:
    csv_reader = csv.reader(file, delimiter=',')
    csv_reader = list(csv_reader)
# Get the table header and pop it
header = csv_reader.pop(0)

# Recursive function to extract each large row
def getLargeRow(rows, csvRows=None):
    if csvRows is None:
        csvRows = []
    if len(rows) != 0:
        largeRow = [rows.pop(0)]
        while rows != [] and rows[0][0] == '':
            largeRow.append(rows.pop(0))
        csvRows.append(largeRow)
        return getLargeRow(rows, csvRows)
    else:
        return csvRows

# Now we have all rows as a list of lists
rows = getLargeRow(csv_reader)
# Assuming that all large rows have the same height (same number of regular rows)
largeRowsHeight = len(rows[0])
# Number of large rows
largeRowsAmount = len(rows)
print(rows)
# The text of the new csv file
newCsvFileText = ''
for i in range(largeRowsAmount):
    newCsvFileText += ','.join(header)
    if i < largeRowsAmount - 1:
        newCsvFileText += ',,'
newCsvFileText += '\n'
for i in range(largeRowsHeight):
    for j, row in enumerate(rows):
        newCsvFileText += ','.join(row[i])
        if j < len(rows) - 1:
            newCsvFileText += ',,'
    newCsvFileText += '\n'
# Save into a new file
with open("new_file.csv", "w") as newfile:
    newfile.write(newCsvFileText)
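The code above assumes every runway group has the same number of rows. As a rough sketch of one way to relax that assumption, itertools.zip_longest can pad the shorter groups with blank cells. The function name side_by_side and the gap parameter are made up for this example, and the input is assumed to be a list of rows such as csv.reader would produce.

```python
from itertools import zip_longest


def side_by_side(rows, gap=1):
    """Lay out each runway group next to the previous one."""
    header, *body = rows
    groups = []
    for row in body:
        if not row or set(row[0].strip()) == {"-"}:
            continue  # skip blank lines and dashed separator lines
        if row[0].strip():  # a runway number starts a new group
            groups.append([row])
        elif groups:
            groups[-1].append(row)  # continuation row of the current group
    blank = [""] * len(header)
    out = []
    for parts in zip_longest(*([header] + g for g in groups), fillvalue=blank):
        line = []
        for k, part in enumerate(parts):
            if k:
                line.extend([""] * gap)  # empty spacer column(s) between groups
            line.extend(part)
        out.append(line)
    return out


rows = [
    ["Runway", "Data1", "Data2", "Data3"],
    ["13", "425", "23", "Go Straight"],
    ["", "222", "24", "Go Straight"],
    ["", "424", "25", "Go Left"],
    ["16", "555", "13", "Go Right"],
    ["", "858", "14", "Go Right"],
]
for line in side_by_side(rows):
    print(",".join(line))
```

The result is again a list of rows, so it can be passed to csv.writer(...).writerows() instead of being joined by hand.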

How to Create Footer for Python PrettyTable

I'm looking to add a footer to my PrettyTable, totalling the data stored in the rows above. I've created a count in the script, but I'd like to add this into the table.
The code I have to create the table below is as follows (.add_row is in a loop):
outTbl = PrettyTable(["Projects", "Number"])
outTbl.add_row([eachProj, count])
...which generates a table looking like this:
+--------------------------+-----------+
| Projects | Number |
+--------------------------+-----------+
| Project A | 5 |
| Project B | 9 |
| Project C | 8 |
| Project D | 2 |
+--------------------------+-----------+
...but I'm looking for the functionality to create the above table with a summary footer at the bottom:
+--------------------------+-----------+
| Projects | Number |
+--------------------------+-----------+
| Project A | 5 |
| Project B | 9 |
| Project C | 8 |
| Project D | 2 |
+--------------------------+-----------+
| Total | 24 |
+--------------------------+-----------+
I've searched the module docs online: PrettyTable tutorial, Google prettytable - Tutorial and can't see any reference to a footer, which I find surprising given header is one. Can this be done in PrettyTable, or is there another Python module with this functionality anyone can recommend?

You can use texttable with a small hack around it:
import texttable
table = texttable.Texttable()
table.add_rows([['Projects', 'Number'],
                ['Project A\nProject B\nProject C\nProject D', '5\n9\n8\n2'],
                ['Total', 24]])
print(table.draw())
Output:
+-----------+--------+
| Projects | Number |
+===========+========+
| Project A | 5 |
| Project B | 9 |
| Project C | 8 |
| Project D | 2 |
+-----------+--------+
| Total | 24 |
+-----------+--------+

There is no separate function to create a footer in PrettyTable. However, if you specifically want to use only PrettyTable, you can use a little trick, as follows:
total = 0
for row in outTbl:
    total = total + int(row.get_string(fields=["Number"]).split('\n')[3].replace('|','').replace(' ',''))
outTbl.add_row(['------------','-----------'])
outTbl.add_row(['Total', total])
print(outTbl)
Or, if you are looking for a dedicated function with footers, see
https://stackoverflow.com/a/26937531/3249782
for different approaches you can use.

I had the same problem today and used the following approach to treat the last n lines of my table as result lines that are separated by a horizontal line (like the one that separates the header):
from prettytable import PrettyTable
t = PrettyTable(['Project', 'Numbers'])
t.add_row(['Project A', '5'])
t.add_row(['Project B', '9'])
t.add_row(['Project C', '8'])
t.add_row(['Project D', '2'])
# NOTE: t is the prettytable table object
# Get string to be printed and create list of elements separated by \n
list_of_table_lines = t.get_string().split('\n')
# Use the first line (+---+-- ...) as horizontal rule to insert later
horizontal_line = list_of_table_lines[0]
# Print the table
# Treat the last n lines as "result lines" that are separated from the
# rest of the table by the horizontal line
result_lines = 1
print("\n".join(list_of_table_lines[:-(result_lines + 1)]))
print(horizontal_line)
print("\n".join(list_of_table_lines[-(result_lines + 1):]))
This results in the following output:
+-----------+---------+
| Project | Numbers |
+-----------+---------+
| Project A | 5 |
| Project B | 9 |
| Project C | 8 |
+-----------+---------+
| Project D | 2 |
+-----------+---------+

I know I'm late, but I've created a function to automagically append a "Total" row to the table. For now it does NOT handle a footer value that is wider than its column.
Python 3.6+
Function:
def table_footer(tbl, text, dc):
    res = f"{tbl._vertical_char} {text}{' ' * (tbl._widths[0] - len(text))} {tbl._vertical_char}"
    for idx, item in enumerate(tbl.field_names):
        if idx == 0:
            continue
        if item not in dc:
            res += f"{' ' * (tbl._widths[idx] + 1)} {tbl._vertical_char}"
        else:
            res += f"{' ' * (tbl._widths[idx] - len(str(dc[item])))} {dc[item]} {tbl._vertical_char}"
    res += f"\n{tbl._hrule}"
    return res
Usage:
tbl = PrettyTable()
tbl.field_names = ["Symbol", "Product", "Size", "Price", "Subtotal", "Allocation"]
tbl.add_row([......])
print(tbl)
print(table_footer(tbl, "Total", {'Subtotal': 50000, 'Allocation': '29 %'}))
+--------+-------------------------------+-------+---------+----------+------------+
| Symbol | Product | Size | Price | Subtotal | Allocation |
+--------+-------------------------------+-------+---------+----------+------------+
| AMD | Advanced Micro Devices Inc | 999.9 | 75.99 | 20000.0 | 23.00 |
| NVDA | NVIDIA Corp | 88.8 | 570.63 | 30000.0 | 6.00 |
+--------+-------------------------------+-------+---------+----------+------------+
| Total | | | | 50000 | 29 % |
+--------+-------------------------------+-------+---------+----------+------------+

After inspecting the source code of PrettyTable, you can see that once you have printed the table you can get the width of each column. Using this, you can create a footer yourself, because PrettyTable does not give you that option. Here is my approach:
from prettytable import PrettyTable
t = PrettyTable(['Project', 'Numbers'])
t.add_row(['Project A', '5'])
t.add_row(['Project B', '9'])
t.add_row(['Project C', '8'])
t.add_row(['Project D', '2'])
print(t)
total = '24'
padding_bw = (3 * (len(t.field_names)-1))
tb_width = sum(t._widths)
print('| ' + 'Total' + (' ' * (tb_width - len('Total' + total)) +
' ' * padding_bw) + total + ' |')
print('+-' + '-' * tb_width + '-' * padding_bw + '-+')
And here is the output:
+-----------+---------+
| Project | Numbers |
+-----------+---------+
| Project A | 5 |
| Project B | 9 |
| Project C | 8 |
| Project D | 2 |
+-----------+---------+
| Total 24 |
+---------------------+
Just change the total var in the code and everything should be working fine

I stole @Niels's solution and wrote this function to print with a delimiter before the last num_footers lines:
def print_with_footer(ptable, num_footers=1):
    """ Print a prettytable with an extra delimiter before the last `num_footers` rows """
    lines = ptable.get_string().split("\n")
    hrule = lines[0]
    lines.insert(-(num_footers + 1), hrule)
    print("\n".join(lines))
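The same trick can be tried without PrettyTable installed, since it only manipulates the rendered text. The sketch below is a variation that takes the table as a plain string and returns the result instead of printing it; the name add_footer_rule is made up here, and it assumes the first line of the text is a border line that can be reused as the separator.

```python
def add_footer_rule(table_text, num_footers=1):
    """Return table_text with an extra rule before the last num_footers rows."""
    lines = table_text.split("\n")
    hrule = lines[0]  # the top border doubles as the separator
    lines.insert(-(num_footers + 1), hrule)
    return "\n".join(lines)


table = "\n".join([
    "+-----------+--------+",
    "| Projects  | Number |",
    "+-----------+--------+",
    "| Project A | 5      |",
    "| Project B | 9      |",
    "| Total     | 14     |",
    "+-----------+--------+",
])
print(add_footer_rule(table))
```

Returning a string rather than printing makes the result easy to log or write to a file.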

Best way to compare 2 dfs, get the name of different col & before + after vals?

What is the best way to compare 2 dataframes w/ the same column names, row by row, and, if a cell is different, get the Before & After values and which cell is different in that dataframe?
I know this question has been asked a lot, but none of the applications fit my use case. Speed is important. There is a package called datacompy but it is not good if I have to compare 5000 dataframes in a loop (i'm only comparing 2 at a time, but around 10,000 total, and 5000 times).
I don't want to join the dataframes on a column. I want to compare them row by row. Row 1 with row 1. Etc. If a column in row 1 is different, I only need to know the column name, the before, and the after. Perhaps if it is numeric I could also add a column w/ the abs val. of the dif.
The problem is, there is sometimes an edge case where rows are out of order (only by 1 entry), and don’t want these to come up as false positives.
Example:
These dataframes would be created when I pass in race # (there are 5,000 race numbers)
df1:
+-----+-------+------+----------+-------------+
| Id  | Speed | Name | Distance | Location    |
+-----+-------+------+----------+-------------+
| 181 | 10.3  | Joe  | 2        | New York    |
| 192 | 9.1   | Rob  | 1        | Chicago     |
| 910 | 1.0   | Fred | 5        | Los Angeles |
| 97  | 1.8   | Bob  | 8        | New York    |
| 88  | 1.2   | Ken  | 7        | Miami       |
| 99  | 1.1   | Mark | 6        | Austin      |
+-----+-------+------+----------+-------------+
df2:
+-----+-------+------+----------+-------------+
| Id  | Speed | Name | Distance | Location    |
+-----+-------+------+----------+-------------+
| 181 | 10.3  | Joe  | 2        | New York    |
| 192 | 9.4   | Rob  | 1        | Chicago     |
| 910 | 1.0   | Fred | 5        | Los Angeles |
| 97  | 1.5   | Bob  | 8        | New York    |
| 99  | 1.1   | Mark | 6        | Austin      |
| 88  | 1.2   | Ken  | 7        | Miami       |
+-----+-------+------+----------+-------------+
diff:
+-------+----------+--------+-------+
| Race# | Diff_col | Before | After |
+-------+----------+--------+-------+
| 123 | Speed | 9.1 | 9.4 |
| 123 | Speed | 1.8 | 1.5 |
An example of a false positive is with the last 2 rows, Ken + Mark.
I could summarize the differences in one line per race, but if the dataframe has 3000 records and there are 1,000 differences (unlikely, but possible) then I will have tons of columns. I figured this way was easier, as I could export to Excel and then sort by race # to see all the differences, or by diff_col to see which columns are different.
import numpy as np

def DiffCol2(df1, df2, race_num):
    is_diff = False
    diff_cols_list = []
    row_coords, col_coords = np.where(df1 != df2)
    diffDf = []
    alldiffDf = []
    for y in set(col_coords):
        col_df1 = df1.iloc[:, y].name
        col_df2 = df2.iloc[:, y].name
        for index, row in df1.iterrows():
            if df1.loc[index, col_df1] != df2.loc[index, col_df2]:
                col_name = col_df1
                if col_df1 != col_df2:
                    col_name = (col_df1, col_df2)
                diffDf.append({'Race #': race_num, 'Column Name': col_name, 'Before': df2.loc[index, col_df2], 'After': df1.loc[index, col_df1]})
                try:
                    check_edge_case = df1.loc[index, col_df1] == df2.loc[index+1, col_df1]
                except Exception:
                    check_edge_case = False
                try:
                    check_edge_case_two = df1.loc[index, col_df1] == df2.loc[index-1, col_df1]
                except Exception:
                    check_edge_case_two = False
                if not (check_edge_case or check_edge_case_two):
                    col_name = col_df1
                    if col_df1 != col_df2:
                        # if for some reason the column name isn't the same, which should
                        # never happen, but just in case, I want to know both col names
                        col_name = (col_df1, col_df2)
                    is_diff = True
                    diffDf.append({'Race #': race_num, 'Column Name': col_name, 'Before': df2.loc[index, col_df2], 'After': df1.loc[index, col_df1]})
    return diffDf, alldiffDf, is_diff
[apologies in advance for weirdly formatted tables, i did my best given how annoying pasting tables into s/o is]

The code below works if the dataframes have the same number and names of columns and the same number of rows, so it compares only the values in the tables.
I am not sure where you want to get Race# from.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df2 = df1.copy(deep=True)
df2.loc[5, 'B'] = 100  # creating a difference
df2.loc[6, 'C'] = 100  # creating a difference
dif = []
for col in df1.columns:
    for bef, aft in zip(df1[col], df2[col]):
        if bef != aft:
            dif.append([col, bef, aft])
print(dif)
Alternative solution without loops:
df = df1.melt()
df.columns = ['Column', 'Before']
df.insert(2, 'After', df2.melt().value)
df[df.Before != df.After]
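Neither snippet above deals with the out-of-order edge case from the question. As a rough illustration of one way to handle it, here is a plain-Python sketch that works on lists of row dicts (for example from df.to_dict('records')). The function name diff_rows is hypothetical, and it assumes that a row which reappears unchanged one position up or down was merely swapped with its neighbour.

```python
def diff_rows(before, after, race_num):
    """Compare two equal-length lists of row dicts position by position."""
    diffs = []
    for i, (old, new) in enumerate(zip(before, after)):
        if old == new:
            continue
        # Assumption: an identical row one position away means a swap, not a change
        neighbours = [after[j] for j in (i - 1, i + 1) if 0 <= j < len(after)]
        if old in neighbours:
            continue
        for col in old:
            if old[col] != new.get(col):
                diffs.append({'Race #': race_num, 'Diff_col': col,
                              'Before': old[col], 'After': new.get(col)})
    return diffs


before = [
    {'Id': 192, 'Speed': 9.1, 'Name': 'Rob'},
    {'Id': 97, 'Speed': 1.8, 'Name': 'Bob'},
    {'Id': 88, 'Speed': 1.2, 'Name': 'Ken'},
    {'Id': 99, 'Speed': 1.1, 'Name': 'Mark'},
]
after = [
    {'Id': 192, 'Speed': 9.4, 'Name': 'Rob'},
    {'Id': 97, 'Speed': 1.5, 'Name': 'Bob'},
    {'Id': 99, 'Speed': 1.1, 'Name': 'Mark'},
    {'Id': 88, 'Speed': 1.2, 'Name': 'Ken'},
]
for d in diff_rows(before, after, 123):
    print(d)
```

A row that is both swapped and changed would still be reported column by column, so this only removes the pure-swap false positives.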

Random newline characters appearing in list [duplicate]

This question already has answers here:
Deal with new line character “\n” in Sqlite database using Python?
(2 answers)
Closed 8 years ago.
The problem
I am currently having a problem with lists whenever I get data (the classes) from the table students and put it into a list. I am using the following line of code to do this:
classes = [list(i) for i in query("SELECT class FROM students GROUP BY class").fetchall()]
The table students contains
+----+---------+-------+
| ID | name | class |
+----+---------+-------+
| 1 | John | 901 |
| 2 | Pat | 904 |
| 3 | Hal | 911 |
| 4 | Bill | 905 |
| 5 | Lyon | 902 |
| 6 | Lauren | 907 |
| 7 | Phillip | 908 |
| 8 | Charlie | 906 |
| 9 | Amy | 911 |
| 10 | Paul | 904 |
| 11 | Lewis | 903 |
+----+---------+-------+
However when I
print(classes)
The result is
[['901\n'], ['902'], ['903\n'], ['904'], ['905'], ['906'], ['907'], ['908'], ['911\n']]
Question
Why are the newline characters there, and is there a way to get rid of them? I would have thought this
for e in classes:
    e = str(e).replace("\n", ' ')
would have worked for replacing them, but it appears it is not working.

Reassigning the loop variable e does not change the list, and str(e) converts the whole inner list to its string representation rather than the string inside it. Use enumerate to get the index of each entry in the loop, and, since classes is a nested list, use the index j[0] to access the entries:
for i, j in enumerate(classes):
    classes[i] = j[0].replace("\n", '')
Then the result should be:
['901','902','903',.....]
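A more compact variant of the same idea, as a sketch: a list comprehension that unpacks each one-element row and strips it in one pass. The sample rows below are made up to mirror the output shown in the question, and str.strip() is used on the assumption that the stray characters are only leading or trailing whitespace.

```python
# Hypothetical rows shaped like the result of fetchall() in the question
rows = [('901\n',), ('902',), ('903\n',), ('904',)]

# Rebinding a loop variable never changes the list, so build a new list instead
classes = [row[0].strip() for row in rows]
print(classes)
```

After stripping, values such as '911\n' and '911' become the same string, so duplicates may need removing, for example with sorted(set(classes)).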
