Python Texttable - Total width for each row - python

Is there a method to determine the overall width of the table after the rows have been added?
For example, even if I set the initialization parameter below:
text_table = Texttable( max_width= 160)
The table may render narrower than this if a row's total width is less than max_width, which is the rendering I want.
However, I would like to know the actual width of the entire row in cases where the width does not reach the max_width limit.

The suggested solution, text_table.width, returns a list of numbers, but their sum does not equal the table width.
The following returns the width accurately:
table_width = max( [ len(x) for x in table_as_str.split('\n') ])
where table_as_str is the result from text_table.draw()
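The same measurement can be wrapped in a small helper. This is only a sketch: `rendered_width` is a made-up name, and the sample string below is a hand-written stand-in for the output of text_table.draw(), not real Texttable output. The per-column numbers presumably fall short of the total because they do not count border and padding characters, which is why measuring the rendered string is the safer route.

```python
def rendered_width(table_as_str):
    """Return the length of the longest line in an already rendered table."""
    # splitlines() copes with a trailing newline; default=0 covers empty input
    return max((len(line) for line in table_as_str.splitlines()), default=0)


# Hand-written stand-in for the string returned by text_table.draw()
sample = "\n".join([
    "+-----+-----+",
    "| a   | b   |",
    "+-----+-----+",
])
print(rendered_width(sample))
```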
Thanks

Related

Conflicting results when grouping observations in Stata vs Python

I have a longitudinal dataset and I am trying to create two variables that correspond to two time periods based on specific date ranges (period_1 and period_2) to be able to analyze the effect of each of those time periods on my outcome.
My Stata code for grouping variables by ID is
gen period_1 = date_eval < mdy(5,4,2020)
preserve
collapse period_1=period_1
count if period_1
and it gives me a number of individuals during that period.
However, I get a different number if I use the SQL query in Python
evals_period_1 = ps.sqldf('SELECT id, COUNT(date_eval) FROM df WHERE strftime(date_eval) < strftime("%m/%d/%Y",{}) GROUP BY id'.format('5/4/2020'))
Am I grouping by ID differently in these two codes? Please let me know what you think.
I agree with Nick that a reproducible example would have been useful, or at least a description of the results and how they differ from what you expected. However, I can still say something about your Stata code. See the reproducible example below, and note how your code always results in a count of 1, even though the example randomizes the data differently on each run.
* Create a data set with 50 rows where period_1 is dummy (0,1) randomized
* differently each run
clear
set obs 50
gen period_1 = (runiform() < .5)
* List the first 5 rows
list in 1/5
* This collapses all rows and what you are left with is one row where the value
* is the average of all rows
collapse period_1=period_1
* List the one remaining observation
list
* Here the Stata syntax probably does not do what you expect. period_1 will
* here be replaced with the value in the first row, i.e. the random mean around .5.
* (This is my understanding, assuming it follows what "display period_1" would do)
count if period_1
* That is identical to count if .5. Stata treats any nonzero number
* as "true", so the condition holds for the single remaining row and
* the count is 1. This will always be the case in this code
* unless the random number generator creates the corner case where all rows are 0
count if .5
You probably want to drop the line with collapse and change the last line to count if period_1 == 1. Whether this solves your original question depends on how your data is formatted.
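For readers more at home in Python, here is a rough analogue of the two counts. It is not Stata and does not reproduce Stata's semantics in full; the function names are made up for illustration, and the "collapse" is modelled simply as replacing all rows with one row holding their mean.

```python
def count_after_collapse(flags):
    """Mimic: collapse period_1=period_1, then count if period_1."""
    collapsed = [sum(flags) / len(flags)]  # a single row holding the mean
    # any nonzero value counts as true, so the result can only be 0 or 1
    return sum(1 for value in collapsed if value != 0)


def count_equal_one(flags):
    """Mimic: count if period_1 == 1, without collapsing first."""
    return sum(1 for value in flags if value == 1)


flags = [1, 0, 1, 1, 0]
print(count_after_collapse(flags), count_equal_one(flags))
```

The first function reports 1 for any input that contains at least one nonzero flag, while the second reports how many rows actually satisfy the condition.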

Disable scroll down and show all data on table, plotly

I created a table using plotly to calculate some financials. I would like to show the whole table in the graph interface (not just a few rows):
As you can see in the image, only 11 of my 30 rows are shown. I would like to show all the data of the table (all 30 rows with no scrollbar).
The code for the table is the following:
import plotly.graph_objects as go

fig6 = go.Figure(data=[go.Table(
    header=dict(values=list(df_table.columns),
                fill_color='#d3d3d3',
                align='left'),
    cells=dict(values=[df_table['date'],
                       df_table['P/E_Ratio'],
                       df_table['Stock Price']],
               fill_color='white',
               align='left'))
])
As Juan correctly stated, adding height to fig6.update_layout() will do the trick. If, however, you are looking for a more dynamic workaround, you can use this function to calculate the height from a dataframe:
def calc_table_height(df, base=208, height_per_row=20, char_limit=30, height_padding=16.5):
    '''
    df: The dataframe with only the columns you want to plot
    base: The base height of the table (header without any rows)
    height_per_row: The height that one row requires
    char_limit: If the length of a value crosses this limit, the row's height needs to be expanded to fit the value
    height_padding: Extra height in a row when a length of value exceeds char_limit
    '''
    total_height = 0 + base
    for x in range(df.shape[0]):
        total_height += height_per_row
        for y in range(df.shape[1]):
            # positional lookup of the cell in row x, column y
            if len(str(df.iloc[x, y])) > char_limit:
                total_height += height_padding
    return total_height
You might have to play around with the other parameters if you use a font_size other than the default, or if you change the margin from the default. The char_limit argument is the other weak point: some characters take up more space than others, capital letters take up more space, and a single long word can force a row to be extended. It should also be increased if the table has fewer columns, and decreased if it has more. The function is written with 4 table columns in mind.
You would have to add fig6.update_layout(height=calc_table_height(df_table)) to make it work.
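To check the arithmetic without pandas or plotly, the same estimate can be run over plain rows. This is a dependency-free variant for illustration only: `estimate_table_height` and the sample rows are hypothetical, and the default numbers are simply the ones from the function above.

```python
def estimate_table_height(rows, base=208, height_per_row=20,
                          char_limit=30, height_padding=16.5):
    """Same estimate as above, but for an iterable of row tuples."""
    total = base
    for row in rows:
        total += height_per_row
        # each cell longer than char_limit adds extra padding for that row
        total += height_padding * sum(len(str(v)) > char_limit for v in row)
    return total


# 30 made-up rows with short values, so no padding is added
rows = [("2021-01-%02d" % day, 15.2, 101.5) for day in range(1, 31)]
print(estimate_table_height(rows))  # 208 + 30 * 20 = 808
```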

Revising the rates in a row/Python

I want to take the share of the target variable and distribute it to the other non-zero variables in the same row, in proportion to their own weights. Can you help with this?
Each row sums to 100% including the target variable. After distributing the target's share to the other variables, I want the row to sum to 100% again (with target set to zero).
What you describe is just normalization of the rows:
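no_target = df.columns != 'target'
norm = df.loc[:, no_target].sum(axis=1)  # row sums of all values except target
# divide row-wise (axis=0 aligns norm with the index), then rescale to percent
df.loc[:, no_target] = df.loc[:, no_target].div(norm, axis=0) * 100
df['target'] = 0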
I think you might be asking how to split the target value by the percentages in each column, replacing each percent value x with target * x. You could do this by iterating over each percentage value and multiplying it by the target. Zero is not a special case because 0 * target = 0. After each item in a row has been changed, set that row's target value to zero. If the original values in a row sum to 1 before multiplying, the values in that row will sum to the former target after multiplying.
If I'm not understanding your question, please post more details, including what you have tried so far.
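As a worked example of the first interpretation (rescaling the non-target values so the row sums to 100 again), here is a minimal pure-Python sketch for a single row. The function name, the column names and the numbers are invented for illustration; they do not come from the question.

```python
def redistribute_target(row, target_key="target"):
    """Spread the target's share over the other values in proportion to their size."""
    others = {k: v for k, v in row.items() if k != target_key}
    total = sum(others.values())
    if total == 0:
        return dict(row)  # nothing to distribute to; leave the row unchanged
    # zeros stay zero because 0 / total * 100 == 0
    result = {k: v / total * 100 for k, v in others.items()}
    result[target_key] = 0
    return result


row = {"a": 30, "b": 0, "c": 50, "target": 20}
print(redistribute_target(row))
```

Here a and c start at 30 and 50 out of a non-target total of 80, so they become 37.5 and 62.5, b stays at 0, and the row sums to 100 with target at 0.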

Find the values that meet a certain criteria in a Pivot table created by Python

I used Python to create a table:
data_pivot = pd.pivot_table(df, values='occ',
index='ALT_grid', columns='LCT_grid', aggfunc='count')
print(data_pivot)
The table is similar to the one in the image:
The figure shows the number of occurrences of an event of 90-130 kilometers corresponding to 1-24 hours. The minimum count in the table is 3.
I want to select the rows with numbers greater than 100.
My code is as follows:
limit2 = data_pivot.values >=100
print(data_pivot[limit2])
However, all I get is the individual values that pass the threshold, not whole rows in which the values are greater than 100, as shown below:
I would like the filtered data to contain only rows in which all the values are > 100.
The result I want is similar to the following figure, where every value in every row is greater than 100.
I checked a lot of resources but still could not solve this problem. Any help is appreciated.

Weighted database query with randomisation

I have a database model with a positive integer column named 'weight'. There are also other columns, but they're not important for this problem. The weight column basically describes how 'important' a row is. The higher the value of weight, the more important. The weight will only range from 0 to 3. The default is 0 (least important).
I'd like to perform a query which selects 50 rows ordered by the weight column, but slightly randomised, so that it also includes rows with weights lower than those a strict ordering would return.
For example, the first 50 rows ordered by weight may all have a weight of 3 and 2. The query needs to include mostly these results, but also include some with a weight of 1 and 0. They need to be slightly randomised as well so the same query won't always return the same results. Also, even though it's limiting the results to 50, it needs to do this last, otherwise the same 50 results will be returned just in a different order.
This will be integrated in a Django project, but the DB is MySQL, so raw SQL is OK.
Performance is critical because this will happen on a landing page of a high traffic website.
Any ideas would be appreciated.
Thanks
You can use the rand() function combined with your weight column:
select * from YOUR_TABLE order by weight * rand() desc
Note that this means a row with weight 3 is more likely to appear near the beginning than a row with weight 2.
Rows with weight 0 always appear at the end because 0 times any number is 0. If you don't want that, you can add 1 to weight and change the query to
select * from YOUR_TABLE order by (weight + 1) * rand() desc
Obviously, if you need only the first 50 rows, you can add a limit clause to the query.
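To see how this ordering behaves before running it against the database, the idea can be simulated in Python. This is only a sketch of the same key, (weight + offset) * random number, sorted descending and then cut to the limit; `weighted_pick`, its parameters and the sample rows are made up for illustration. Like the SQL, it computes a key for every row before limiting, which may matter on a large table.

```python
import random


def weighted_pick(rows, limit=50, offset=1, rng=None):
    """Order rows by (weight + offset) * random number, descending, then cut to limit."""
    rng = rng or random.Random()
    # one random key per row, scaled by its weight (plus offset)
    keyed = [((row["weight"] + offset) * rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    # the limit is applied last, after the randomised ordering
    return [row for _, row in keyed[:limit]]


# 200 made-up rows with weights cycling through 0..3
rows = [{"id": i, "weight": i % 4} for i in range(200)]
picked = weighted_pick(rows)
print(len(picked), sorted({row["weight"] for row in picked}))
```

With offset=0 the key for weight-0 rows is always 0, which matches the first query; offset=1 corresponds to the (weight + 1) version.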
