Disable scroll down and show all data on table, plotly - python

I created a table using Plotly to display some financial calculations, and I would like to show the whole table in the figure (not just a few rows).
As you can see in the image, only 11 of my 30 rows are shown. I would like to show all the data of the table (all 30 rows with no scrollbar).
The code for the table is the following:
import plotly.graph_objects as go
fig6 = go.Figure(data=[go.Table(
    header=dict(values=list(df_table.columns),
                fill_color='#d3d3d3',
                align='left'),
    cells=dict(values=[df_table['date'],
                       df_table['P/E_Ratio'],
                       df_table['Stock Price']],
               fill_color='white',
               align='left'))
])

As Juan correctly stated, passing height to fig6.update_layout() will do the trick. If, however, you are looking for a more dynamic approach, you can use this function to calculate the height from a dataframe:
def calc_table_height(df, base=208, height_per_row=20, char_limit=30, height_padding=16.5):
    '''
    df: The dataframe with only the columns you want to plot
    base: The base height of the table (header without any rows)
    height_per_row: The height that one row requires
    char_limit: If the length of a value crosses this limit, the row's height needs to be expanded to fit the value
    height_padding: Extra height in a row when the length of a value exceeds char_limit
    '''
    total_height = base
    for x in range(df.shape[0]):
        total_height += height_per_row
        for y in range(df.shape[1]):
            # df.iloc[x, y] is purely positional; df.iloc[x][y] relies on deprecated positional fallback
            if len(str(df.iloc[x, y])) > char_limit:
                total_height += height_padding
    return total_height
You might have to adjust the other parameters if you use a font_size other than the default, or if you change the margins from their defaults. The char_limit argument is the other weak point: some characters take up more space than others (capital letters, for example), and a single long word can force a row to expand. char_limit should also be increased when there are fewer columns, and decreased when there are more. The function was written with 4 table columns in mind.
To use it, add fig6.update_layout(height=calc_table_height(df_table)).
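If you can keep every cell on a single line, an alternative to estimating wrap from character counts is to fix the header and cell heights yourself (go.Table accepts a height for both header and cells) and set the top and bottom margins explicitly, so the required figure height becomes plain arithmetic. This is a sketch under those assumptions: fixed_table_height is a hypothetical helper, the 28 px rows and 100/80 px margins are illustrative values you would pass to go.Table and update_layout yourself, and you may need a few pixels of slack for cell borders.

```python
def fixed_table_height(n_rows, header_height=28, row_height=28,
                       margin_top=100, margin_bottom=80):
    """Figure height in px that fits a header plus n_rows single-line rows.

    Hypothetical helper: assumes the same header_height, row_height and
    margins are passed to go.Table and fig.update_layout, and that no cell
    text wraps onto a second line.
    """
    return margin_top + header_height + n_rows * row_height + margin_bottom


# Usage with the table from the question (requires plotly, so left as comments):
#   header=dict(..., height=28), cells=dict(..., height=28)
#   fig6.update_layout(height=fixed_table_height(len(df_table)),
#                      margin=dict(t=100, b=80))

print(fixed_table_height(30))  # 100 + 28 + 30 * 28 + 80 = 1048
```

Because every input to the sum is set explicitly, the result does not depend on font metrics, which is the trade-off against calc_table_height: no wrapping support, but no guesswork either.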


Is there a more efficient way to process this data?

I have a bunch of data points in the format (x, y, z, a, b, c) where x is an integer, y is a date, z is an integer, and a, b, and c are different numerical values (integers and floats).
My goal is to allow the user to provide two dates (so two y values), and then be presented with the values of (delta_a, delta_b, delta_c) for all existing (x, z) values; delta_a would be the increase/decrease in the value of a between the two dates, etc.
For example, let's say there are only 3 possible values of x and 3 possible values of z. The user provides two dates, y1=date(2023,2,7) and y2=date(2023,2,15). The result is presented in a table (one row per x value, one column per z value), like this:
In reality, there are about 30 different values of x and about 400 different values of z, so the table actually has about 30 rows and about 400 columns (I use a horizontal scrollbar inside the table to browse the data).
Also, the value of y can be any date (at least since I started importing this data, about a month ago). So every day, about 12,000 new data entries are added to the database.
Currently I have a Django model, DataEntry, which looks roughly like this:
from django.db import models
class DataEntry(models.Model):
    x = models.IntegerField()
    y = models.DateField()
    z = models.IntegerField()
    a = models.IntegerField()
    b = models.FloatField()
    c = models.FloatField()
So every time the user generates a data table by entering two dates, it can take quite a while, because the system compares the ~12,000 data entries for y1 with the ~12,000 data entries for y2 and displays all the differences. Not every z value is actually displayed, though: the user also enters a minimum delta_a value (5 by default), so if a did not increase by at least 5, that table cell is empty. And if an entire column consists of empty cells, i.e. no x value in that z column had an increase in a of at least 5, the column is hidden. So sometimes as few as 20 columns are shown, and sometimes closer to 100. The user can also choose to display all data, meaning the full ~400 columns.
I hope I've explained this sufficiently. Is there a more efficient way to handle all this? Does it actually make sense to have a distinct object for every single data entry, or is there some way I could condense the data to speed up the process?
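The comparison described above can be sketched in plain Python as a dictionary join keyed on (x, z). This is an illustration, not the asker's code: it assumes each date's rows have already been fetched as (x, z, a, b, c) tuples, for example with DataEntry.objects.filter(y=y1).values_list("x", "z", "a", "b", "c"), which returns tuples rather than model instances. The function delta_table and its parameters are hypothetical.

```python
def delta_table(rows_y1, rows_y2, min_delta_a=5, show_all=False):
    """Return (cells, visible_z) for the deltas between two dates.

    rows_y1, rows_y2: iterables of (x, z, a, b, c) tuples for y1 and y2.
    cells maps (x, z) -> (delta_a, delta_b, delta_c) for cells that are shown;
    visible_z lists, in sorted order, the z columns with at least one shown cell.
    """
    # Index the y1 rows by (x, z) once, so each y2 row is matched by a dict lookup.
    first = {(x, z): (a, b, c) for x, z, a, b, c in rows_y1}
    cells = {}
    for x, z, a, b, c in rows_y2:
        if (x, z) not in first:
            continue  # no entry for this (x, z) on y1, so there is no delta
        a1, b1, c1 = first[(x, z)]
        delta = (a - a1, b - b1, c - c1)
        # Keep the cell if the user asked for all data or a rose by min_delta_a.
        if show_all or delta[0] >= min_delta_a:
            cells[(x, z)] = delta
    # A z column is hidden when none of its cells survived the filter.
    visible_z = sorted({z for _, z in cells})
    return cells, visible_z


# Made-up rows for y1 and y2, in (x, z, a, b, c) order.
rows_y1 = [(1, 10, 3, 1.0, 2.0), (1, 20, 7, 0.5, 0.5), (2, 10, 0, 0.0, 0.0)]
rows_y2 = [(1, 10, 9, 1.5, 2.0), (1, 20, 8, 0.5, 1.0), (2, 10, 2, 0.0, 0.0)]
cells, visible_z = delta_table(rows_y1, rows_y2)
print(cells, visible_z)  # {(1, 10): (6, 0.5, 0.0)} [10]
```

If the current code compares entries pairwise, indexing one date in a dict turns the join into a single pass over each day's ~12,000 rows.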
Any pointers?

Conflicting results when grouping observations in Stata vs Python

I have a longitudinal dataset and am trying to create two variables (period_1 and period_2) that correspond to two time periods defined by specific date ranges, so that I can analyze the effect of each period on my outcome.
My Stata code for grouping by ID is:
gen period_1 = date_eval < mdy(5,4,2020)
preserve
collapse period_1=period_1
count if period_1
This gives me the number of individuals in that period.
However, I get a different number with the following SQL query in Python:
evals_period_1 = ps.sqldf('SELECT id, COUNT(date_eval) FROM df WHERE strftime(date_eval) < strftime("%m/%d/%Y",{}) GROUP BY id'.format('5/4/2020'))
Am I grouping by ID differently in these two codes? Please let me know what you think.
I agree with Nick that a reproducible example would have been useful, or at least a description of the results and how they differ from what you expected. Still, I can say something about your Stata code. In the reproducible example below, your code always results in a count of 1, even though the data are randomized differently on each run.
* Create a data set with 50 rows where period_1 is a dummy (0/1),
* randomized differently on each run
clear
set obs 50
gen period_1 = (runiform() < .5)
* List the first 5 rows
list in 1/5
* This collapses all rows, leaving one row whose value
* is the mean of all rows
collapse period_1=period_1
* List the one remaining observation
list
* This is probably not what you expect: count evaluates period_1 for each
* observation, and after collapse only one observation is left, holding
* the random mean (around .5)
count if period_1
* That is equivalent to count if .5. Stata treats any nonzero value as
* true, so the condition holds for the single remaining observation and
* the count is 1. This will always be the case with this code, unless the
* random number generator creates the corner case where all rows are 0
count if .5
You probably want to drop the collapse line and change the last line to count if period_1 == 1 (note that with several rows per ID this counts evaluations rather than individuals). Whether this solves your original question, however, depends on how your data are structured.
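To make the difference between counting evaluations and counting individuals concrete, here is a small pandas sketch with made-up data (the id and date_eval column names come from the question; the values are invented). It computes both the row count that count if period_1 == 1 gives without collapse, and the number of individuals with at least one period-1 evaluation, which in Stata would correspond to collapse (max) period_1, by(id) followed by count if period_1.

```python
import pandas as pd

# Hypothetical longitudinal data: several evaluations per id.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "date_eval": pd.to_datetime(["2020-03-01", "2020-06-01", "2020-07-15",
                                 "2020-01-10", "2020-02-20", "2020-05-01"]),
})

cutoff = pd.Timestamp(2020, 5, 4)
# Same test as Stata's date_eval < mdy(5,4,2020)
df["period_1"] = df["date_eval"] < cutoff

# Evaluations (rows) in period 1: one count per row, like count if period_1 == 1.
n_evals = int(df["period_1"].sum())

# Individuals with at least one period-1 evaluation: reduce to one value per id first.
n_ids = int(df.groupby("id")["period_1"].any().sum())

print(n_evals, n_ids)  # 4 2
```

Which of the two numbers you want decides how the SQL and Stata versions should be written, so it is worth checking which one each of your queries is actually producing.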

Python-docx - merge ALL cells in a row or column of a table (or a specific subset of cells in a column) with one command

I am using python-docx to programmatically generate a very large and messy table inside a Word document.
Now, as part of the beautification process, I need to merge together all cells in specific rows or columns.
When I know in advance how many cells there are in a row or column, merging is trivial. MVP below:
from docx import Document
doc = Document()
#adding table of details with 4 columns
tableOverview = doc.add_table(rows=1, cols=4)
tableOverview.style = 'Table Grid'
#add some text in a first cell of a row
row = tableOverview.row_cells(0)
row[0].text = "Job details"
#merge all 4 cells in a row into one
merge1 = row[0].merge(row[1]).merge(row[2]).merge(row[3])
However:
Specifying a chain of merges like this looks really ugly and un-pythonic.
It gets tricky if I don't know in advance how many cells there are in a row (or column). This is a problem because I generate these tables from inputs, so the number of cells per row and column is dynamic and I can't hardcode such a merge chain.
A quick check of the documentation didn't turn up any good examples; they always merge just two cells at a time. Is there a reasonable way to merge a whole list of cells together?
Thanks!
All you need to provide to _Cell.merge() is the cell at the diagonally opposite corner of the rectangle you want to merge.
So, for example, if you wanted to merge the top-left 3 x 3 cell area in a 9 x 9 table, you could use:
table.cell(0, 0).merge(table.cell(2, 2))
or, more verbose but perhaps more instructive:
top_left = table.cell(0, 0)
bottom_right = table.cell(2, 2)
top_left.merge(bottom_right)
So all you need to do is get a reference to any two diagonal corners. Note that:
bottom_right.merge(top_left)
works just as well as the other direction. For that matter:
top_right = table.cell(0, 2)
bottom_left = table.cell(2, 0)
bottom_left.merge(top_right)
works just as well too. Any two diagonal "corner" cells can be used to define a merged cell.

Python Texttable - Total width for each row

Is there a way to determine the width of the table after the rows have been added?
For example, even if I set the following initialization parameter:
from texttable import Texttable
text_table = Texttable(max_width=160)
The table may render narrower when the rows' total width is less than that limit, which is the rendering I want.
However, I would like to know the actual width of the entire row in cases where it does not reach the max_width limit.
Trying text_table.width returns a list of numbers, but their sum does not equal the table width.
The following returns the width accurately:
table_width = max(len(line) for line in table_as_str.splitlines())
where table_as_str is the result from text_table.draw()
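The max-line-length approach can be wrapped in a small function. This sketch adds default=0 so an empty string does not raise ValueError, and uses a hand-written stand-in for draw() output so it runs without texttable installed; rendered_width is a hypothetical name. Note that len() counts code points, so cells containing wide characters (e.g. CJK) may display wider than the number reported.

```python
def rendered_width(table_as_str):
    """Width in characters of the widest line of an already-drawn table.

    table_as_str is assumed to be the string returned by text_table.draw();
    default=0 covers an empty string, where max() would otherwise raise.
    """
    return max((len(line) for line in table_as_str.splitlines()), default=0)


# Hand-written stand-in for draw() output, so the sketch runs without texttable.
sample = (
    "+-----+-------+\n"
    "| id  | price |\n"
    "+=====+=======+\n"
    "| 1   | 9.50  |\n"
    "+-----+-------+"
)
print(rendered_width(sample))  # 15
```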
Thanks

What rows are in view of a QAbstractTableModel

I have a custom QTableView with a custom QAbstractTableModel. I update every row where data has changed. The class that manages the data has a dirty flag which works well to help cut down the number of updates.
When I have a large number of rows (1000 or more), the table becomes a little less responsive. Instead of looping over every row to check whether it is dirty, I'd like to loop over only the 20 or so rows visible to the user, but I can't figure out how to get that information.
Is there a method or a convenient way to determine which rows are visible to a QAbstractTableModel?
The following will update only the rows visible to the user:
minRow = tableView.rowAt(0)  # very top of the viewport (rowAt() is a QTableView method)
if minRow >= 0:  # ensure there is at least one row
    maxRow = tableView.rowAt(tableView.viewport().height())  # very bottom of the viewport
    # there may not be enough rows to fill the viewport
    if maxRow < 0:
        maxRow = model.rowCount() - 1
    # valid column indices run from 0 to columnCount() - 1
    for row in range(minRow, maxRow + 1):
        model.dataChanged.emit(model.index(row, 0), model.index(row, model.columnCount() - 1))
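Since the question also mentions a dirty flag, one possible refinement is to intersect the visible-row range with the dirty rows and emit dataChanged once per contiguous block instead of once per row. A sketch, assuming the dirty rows are available as a set of row numbers; dirty_ranges is a hypothetical helper, written without Qt so it can be run on its own:

```python
def dirty_ranges(min_row, max_row, dirty_rows):
    """Group the dirty rows inside [min_row, max_row] into contiguous (start, end) runs.

    dirty_rows is any collection of row numbers flagged dirty (a set keeps
    membership tests cheap). Rows outside the visible range are ignored.
    """
    runs = []
    start = None
    for row in range(min_row, max_row + 2):  # one step past the end closes the last run
        if row <= max_row and row in dirty_rows:
            if start is None:
                start = row  # a new run begins
        elif start is not None:
            runs.append((start, row - 1))  # the run ended on the previous row
            start = None
    return runs


dirty = {3, 10, 11, 12, 15, 20, 25}
print(dirty_ranges(10, 20, dirty))  # [(10, 12), (15, 15), (20, 20)]
```

Each (start, end) pair could then be passed as model.dataChanged.emit(model.index(start, 0), model.index(end, model.columnCount() - 1)).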
