Web2Py - using starred expression for rendering HTML table - python

This question is an extension of: Web2Py - rendering AJAX response as HTML table
Basically, I come up with a dynamic list of response rows that I need to display on the UI as an HTML table.
Essentially the code looks like this,
response_results = []
row_one = ['1', 'Col 11', 'Col 12', 'Col 13']
response_results.append(row_one)
row_two = ['2', 'Col 21', 'Col 22', 'Col 23']
response_results.append(row_two)
html = DIV(TABLE(THEAD(TR(TH('Row #'), TH('Col 1'), TH('Col 2'), TH('Col 3')), _id=0),
                 TR([*response for response in response_results]),
                 _id='records_table', _class='table table-bordered'),
           _class='table-responsive')
return html
When I use this kind of code: TR([request.vars[input] for input in inputs]) or TR(*the_list), it works fine.
However, I now need a hybrid of these two, i.e. TR([*response for response in response_results]). But it fails with this error message:
"Python version 2.7 does not support this syntax. Starred expressions are not allowed as assignment targets in Python 2."
When I run this code without the '*' instead, i.e. TR([response for response in response_results]), it runs fine but puts all the columns of each row together in the first column of the generated HTML table, leaving the other columns blank.
Can someone kindly help me resolve this issue and guide on how can I achieve the required result of displaying each column of the rows at their proper spots in the generated HTML table?

You need to generate a TR for each item in response_results; that is, build a list of TR elements and then use Python argument expansion (i.e., the * syntax) to pass each TR as a positional argument to TABLE.
html = DIV(TABLE(THEAD(TR(TH('Row #'), TH('Col 1'), TH('Col 2'), TH('Col 3')), _id=0),
                 *[TR(response) for response in response_results],
                 _id='records_table', _class='table table-bordered'),
           _class='table-responsive')
Note, because each response is itself a list, you could also use argument expansion within the TR:
*[TR(*response) for response in response_results]
But that is not necessary, as TR optionally takes a list, converting each item in the list into a table cell.
Another option is to make response_results a list of TR elements, starting with the THEAD element, and then just pass that list to TABLE:
response_results = [THEAD(TR(TH('Row #'), TH('Col 1'), TH('Col 2'), TH('Col 3')), _id=0)]
row_one = ['1', 'Col 11', 'Col 12', 'Col 13']
response_results.append(TR(row_one))
row_two = ['2', 'Col 21', 'Col 22', 'Col 23']
response_results.append(TR(row_two))
html = DIV(TABLE(response_results, _id='records_table', _class='table table-bordered'),
           _class='table-responsive')
Again, you could do TABLE(*response_results, ...), but the * is not necessary, as TABLE can take a list of row elements.
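Both answers hinge on plain Python positional-argument expansion. A minimal framework-free sketch of the idea (the tag helper below only mimics how web2py's TABLE/TR/TD helpers treat positional arguments; it is not the real gluon API):

```python
def tag(name, *children):
    # Each positional argument becomes one child node, mirroring how
    # web2py helpers treat their positional arguments.
    return '<{0}>{1}</{0}>'.format(name, ''.join(children))

rows = [['1', 'a'], ['2', 'b']]

# One tr per row; *row expands each cell into its own td argument.
table = tag('table', *[tag('tr', *[tag('td', cell) for cell in row])
                       for row in rows])
print(table)
# <table><tr><td>1</td><td>a</td></tr><tr><td>2</td><td>b</td></tr></table>
```

The inner `*row` plays the role of `TR(*response)`, and the outer `*[...]` plays the role of expanding the list of TR elements into TABLE.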

Related

Spark Dataframe column name change does not reflect

I am trying to rename some special characters from my spark dataframe. For some weird reason, it shows the updated column name when I print the schema, but any attempt to access the data results in an error complaining about the old column name. Here is what I am trying:
# Original schema
upsertDf.columns
# Output: ['col 0', 'col (0)', 'col {0}', 'col =0', 'col, 0', 'col; 0']
for c in upsertDf.columns:
    upsertDf = upsertDf.withColumnRenamed(c, c.replace(" ", "_").replace("(", "__").replace(")", "__").replace("{", "___").replace("}", "___").replace(",", "____").replace(";", "_____").replace("=", "_"))
upsertDf.columns
# Works and returns the expected result
# Output: ['col_0', 'col___0__', 'col____0___', 'col__0', 'col_____0', 'col______0']
# Printing the contents of the dataframe throws an error for the original attribute name
upsertDf.show()
AnalysisException: 'Attribute name "col 0" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'
I have tried other options to rename the column (using alias, etc.) and they all return the same error. It's almost as if the show operation is using a cached version of the schema, but I can't figure out how to force it to use the new names.
Has anyone run into this issue before?
Have a look at this minimal example (using your renaming code, run in a pyspark shell, version 3.3.1):
df = spark.createDataFrame(
    [("test", "test", "test", "test", "test", "test")],
    ['col 0', 'col (0)', 'col {0}', 'col =0', 'col, 0', 'col; 0']
)
df.columns
['col 0', 'col (0)', 'col {0}', 'col =0', 'col, 0', 'col; 0']
for c in df.columns:
    df = df.withColumnRenamed(c, c.replace(" ", "_").replace("(", "__").replace(")", "__").replace("{", "___").replace("}", "___").replace(",", "____").replace(";", "_____").replace("=", "_"))
df.columns
['col_0', 'col___0__', 'col____0___', 'col__0', 'col_____0', 'col______0']
df.show()
+-----+---------+-----------+------+---------+----------+
|col_0|col___0__|col____0___|col__0|col_____0|col______0|
+-----+---------+-----------+------+---------+----------+
| test|     test|       test|  test|     test|      test|
+-----+---------+-----------+------+---------+----------+
As you can see, this executes successfully, so your renaming code is fine.
Since you haven't shared all of your code (how upsertDf is defined), we can't know exactly what's going on. But judging by your error message, it comes from ParquetSchemaConverter.scala in a Spark version earlier than 3.2.0 (this error message changed in 3.2.0; see SPARK-34402).
Make sure that you read in your data and then immediately rename the columns, without doing any other operation.
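As a side note, the long chain of .replace() calls can be factored into a small plain-Python helper, which makes the escape scheme easier to audit. The name sanitize and the SUBSTITUTIONS table below are my own, not part of any Spark API:

```python
# Characters that Parquet rejects in column names, mapped to the
# underscore escapes used in the question.
SUBSTITUTIONS = {' ': '_', '(': '__', ')': '__', '{': '___', '}': '___',
                 ',': '____', ';': '_____', '=': '_'}

def sanitize(name):
    """Replace each Parquet-invalid character with its underscore escape."""
    for bad, good in SUBSTITUTIONS.items():
        name = name.replace(bad, good)
    return name

cols = ['col 0', 'col (0)', 'col {0}', 'col =0', 'col, 0', 'col; 0']
print([sanitize(c) for c in cols])
# ['col_0', 'col___0__', 'col____0___', 'col__0', 'col_____0', 'col______0']
```

You would then apply it as `df.withColumnRenamed(c, sanitize(c))` in the loop.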

for loop dataframe last row of a group

I'm struggling with a for loop for a dataframe.
I want a function where I loop through a dataframe with object names and their properties.
Suppose the dataframe looks like this:
data = [['object 1', 'property 1'], ['object 1', 'property 11'],
        ['object 2', 'property 2'], ['object 2', 'property 22'],
        ['object 3', 'property 3'], ['object 3', 'property 33']]
I want to generate a string where the last row of each object doesn't end with a comma and all other rows do.
def addProperties(objects):
    obj = objects
    for index, row in obj.iterrows():
        if row['label'] != ...:  # last element of the group (how to detect this?)
            string = row['label'] + row['attribuutLabel'] + ','
        else:
            string = row['label'] + row['attribuutLabel']
    return string
Output should be something like this:
string = 'object 1 property 1, property 11 object 2 property 2, property 22 object 3 property 3, property 33'
I'm quite new to Python, so I don't know the best way to achieve this.
Can someone help out?
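A pandas groupby sidesteps the last-row check entirely: comma-join the properties within each object, then space-join the objects. A sketch (assuming the columns are named label and attribuutLabel, as in the function above; the exact output format is my reading of the example):

```python
import pandas as pd

data = [['object 1', 'property 1'], ['object 1', 'property 11'],
        ['object 2', 'property 2'], ['object 2', 'property 22'],
        ['object 3', 'property 3'], ['object 3', 'property 33']]
df = pd.DataFrame(data, columns=['label', 'attribuutLabel'])

# Comma-join the properties within each object, preserving row order.
per_object = df.groupby('label', sort=False)['attribuutLabel'].agg(', '.join)

# Then space-join the per-object strings.
string = ' '.join('{} {}'.format(label, props)
                  for label, props in per_object.items())
print(string)
# object 1 property 1, property 11 object 2 property 2, property 22 object 3 property 3, property 33
```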

How to filter a pandas column by list of strings?

The standard code for filtering through pandas would be something like:
output = df['Column'].str.contains('string')
strings = ['string 1', 'string 2', 'string 3']
Instead of 'string', though, I want to filter against a collection of strings in the list strings. So I tried something like
output = df['Column'].str.contains('*strings')
This is the closest solution I could find, but it did not work:
How to filter pandas DataFrame with a list of strings
Edit: I should note that I'm aware of the | (or) operator. However, I want to handle the general case where the list strings changes and I'm looping through lists of varying lengths.
You can create a regex alternation from the list and search using that string, like this:
df['Column'].str.contains('|'.join(strings), regex=True)
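A runnable check of this approach on a toy frame (na=False guards rows where the column is missing; re.escape keeps any regex metacharacters in the search strings literal):

```python
import re
import pandas as pd

df = pd.DataFrame({'Column': ['has string 1 inside', 'string 2', 'other', None]})
strings = ['string 1', 'string 2', 'string 3']

# Build one alternation pattern from the (possibly changing) list.
pattern = '|'.join(re.escape(s) for s in strings)
mask = df['Column'].str.contains(pattern, regex=True, na=False)
print(mask.tolist())
# [True, True, False, False]
```

Because the pattern is built from the list at runtime, this works for lists of any length.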
You probably should look into the isin() function (pandas.Series.isin); note that it matches whole cell values exactly, not substrings.
Check the code below:
df = pd.DataFrame({'Column':['string 1', 'string 1', 'string 2', 'string 2', 'string 3', 'string 4', 'string 5']})
strings = ['string 1', 'string 2', 'string 3']
output = df.Column.isin(strings)
df[output]
output:
     Column
0  string 1
1  string 1
2  string 2
3  string 2
4  string 3

Access dynamically created data frames

Hello Python community,
I have a problem with my code.
I wrote code that dynamically creates dataframes in a for loop. The problem is that I don't know how to access them.
Here is a part of code
list = ['Group 1', 'Group 2', 'Group 3']
for i in list:
    exec('df{} = pd.DataFrame()'.format(i))
for i in list:
    print(df+i)
The dataframes are created, but I cannot access them.
Could someone help me please?
Thank you in advance
I'm not sure exactly how your data is stored/accessed, but you could create a dictionary to pair your list items with each dataframe as follows:
import numpy as np
import pandas as pd

list_ = ['Group 1', 'Group 2', 'Group 3']
dataframe_dict = {}
for i in list_:
    data = np.random.rand(3, 3)  # create the data for each dataframe here
    dataframe_dict[i] = pd.DataFrame(data, columns=["your_column_one", "two", "etc"])
You can then retrieve each dataframe by using its associated group name as the dictionary key:
for key in dataframe_dict.keys():
    print(key)
    print(dataframe_dict[key])

Pandas - Iterate through lists / dictionaries for calculations

I am new to coding and am looking for a pythonic way to implement the following. Here is a sample dataframe with code:
np.random.seed(1111)
df2 = pd.DataFrame({
    'Product': np.random.choice(['Prod 1', 'Prod 2', 'Prod 3', 'Prod 4',
                                 'Prod 5', 'Prod 6', 'Box 1', 'Box 2', 'Box 3'], 10000),
    'Transaction_Type': np.random.choice(['Produced', 'Transferred', 'Scrapped', 'Sold'], 10000),
    'Quantity': np.random.randint(1, 100, size=(10000)),
    'Date': np.random.choice(pd.date_range('1/1/2017', '12/31/2018', freq='D'), 10000)})
idx = pd.IndexSlice
In the data set, each 'Box' ('Box 1', 'Box 2', etc.) is a raw material that corresponds to multiple products. For example, 'Box 1' is used for 'Prod 1' & 'Prod 2', 'Box 2' is used for 'Prod 3' & 'Prod 4', & 'Box 3' is used for 'Prod 5' & 'Prod 6'.
The data set I'm working with is much larger, but I have these groupings stored as lists, for example 'Box 1' = ['Prod 1', 'Prod 2']. If need be, I could store them as a dictionary with a tuple, like Box1 = {'Box 1': ('Prod 1', 'Prod 2')}, whatever is best.
For each grouping, I'm looking to calculate the total number of boxes used, which is the sum of 'Produced' + 'Scrapped' inventory. To get this value, I'm currently doing a manual filter on a groupby of each product. You can see I'm manually writing the list of products in the second assign statement.
For example, to calculate how much of 'Box 1' to relieve from inventory, each month, you would sum the values of 'Box 1' that was produced & scrapped. Then, you would calculate the values of 'Prod 1' through 'Prod 3' (since they use 'Box 1') that were produced & scrapped & add them all together to get a total 'Box 1' used & scrapped for each time frame. Here's an example of what I'm currently doing:
box1 = ['Box 1', 'Prod 1', 'Prod 2']
df2[df2['Transaction_Type'].isin(['Produced', 'Scrapped'])]\
    .groupby([pd.Grouper(key='Date', freq='A'), 'Product', 'Transaction_Type']).agg({'Quantity': 'sum'})\
    .unstack()\
    .loc[idx[:, box1], idx[:]]\
    .assign(Box_1=lambda x: 'Box 1')\
    .assign(List_of_Products=lambda x: 'Box 1, Prod 1, Prod 2')\
    .reset_index()\
    .set_index(['Box_1', 'List_of_Products', 'Date', 'Product'])\
    .groupby(level=[0, 1, 2]).sum()
I'd then have to do the same clunky manual exercise for 'Box 2', and so on.
Is there a more pythonic way? I would like to complete this analysis each month going forward. The actual data is much more complex with roughly 20 different 'Boxes' that have a varying number of products associated with each. I'm not sure if I should be looking to create a function or use a dictionary vs. lists, but would appreciate any help along the way. As a last request, I'd love to have the flexibility to write each of these 'Box_1' to a different excel worksheet.
Thanks in advance!
Not sure how you want the result in the end, but since each Prod uses only one Box, you can replace each Prod by its Box and then do the groupby as you do now. Suppose you have a dictionary such as:
box_dict = {'Box 1': ('Prod 1', 'Prod 2'),
            'Box 2': ('Prod 3', 'Prod 4'),
            'Box 3': ('Prod 5', 'Prod 6')}
then you reverse it to get each prod as a key and its box as the value:
dict_prod = {prod: box for box, l_prod in box_dict.items() for prod in l_prod}
Now you can use replace:
print(df2[df2['Transaction_Type'].isin(['Produced', 'Scrapped'])]
        .replace({'Product': dict_prod})  # here to change the prod to the box used
        .groupby([pd.Grouper(key='Date', freq='A'), 'Product', 'Transaction_Type'])['Quantity']
        .sum().unstack())

Transaction_Type  Produced  Scrapped
Date       Product
2017-12-31 Box 1     20450     19152
           Box 2     20848     21145
           Box 3     22475     21518
2018-12-31 Box 1     19404     16964
           Box 2     21655     20753
           Box 3     21343     21576
I think I would filter the source dataframe down to just what I need to query first, then do the grouping and aggregations:
df2.query('Transaction_Type in ["Produced","Scrapped"] and Product in ["Box 1","Prod 1","Prod 2"]')\
   .groupby([pd.Grouper(key='Date', freq='A'), 'Product', 'Transaction_Type'])['Quantity'].sum()\
   .unstack().reset_index(level=1).groupby(level=0)\
   .agg({'Product': lambda x: ', '.join(x), 'Produced': 'sum', 'Scrapped': 'sum'})
Output:
                          Product  Produced  Scrapped
Date
2017-12-31  Box 1, Prod 1, Prod 2     20450     19152
2018-12-31  Box 1, Prod 1, Prod 2     19404     16964
I do not understand why such a long expression is needed. If I'm not mistaken, it seems you only care about the total number of rows satisfying the condition.
d = {'Box 1': ('Box 1', 'Prod 1', 'Prod 2')}
d_type = {'Box 1': ('Produced', 'Scrapped')}
selected = df2[df2['Product'].isin(d['Box 1']) & df2['Transaction_Type'].isin(d_type['Box 1'])]
print(len(selected))
For your Excel exporting needs, something like the following would work (using ExcelWriter as a context manager, which saves the file on exit):
with pd.ExcelWriter("test.xlsx") as writer:
    selected.to_excel(writer, sheet_name='Sheet1')
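To scale this to the ~20 boxes and the one-worksheet-per-box export, one option is to loop over a box-to-products mapping. A sketch (box_dict follows the dictionary shape from the first answer; the file and sheet names are hypothetical):

```python
import numpy as np
import pandas as pd

np.random.seed(1111)
df2 = pd.DataFrame({
    'Product': np.random.choice(['Prod 1', 'Prod 2', 'Prod 3', 'Prod 4',
                                 'Prod 5', 'Prod 6', 'Box 1', 'Box 2', 'Box 3'], 10000),
    'Transaction_Type': np.random.choice(['Produced', 'Transferred', 'Scrapped', 'Sold'], 10000),
    'Quantity': np.random.randint(1, 100, size=10000),
    'Date': np.random.choice(pd.date_range('1/1/2017', '12/31/2018', freq='D'), 10000)})

box_dict = {'Box 1': ('Prod 1', 'Prod 2'),
            'Box 2': ('Prod 3', 'Prod 4'),
            'Box 3': ('Prod 5', 'Prod 6')}

# Yearly Produced+Scrapped totals per box (the box itself plus its products).
totals = {}
for box, prods in box_dict.items():
    members = [box, *prods]
    mask = (df2['Transaction_Type'].isin(['Produced', 'Scrapped'])
            & df2['Product'].isin(members))
    totals[box] = df2[mask].groupby(pd.Grouper(key='Date', freq='A'))['Quantity'].sum()

# One worksheet per box (requires an Excel engine such as openpyxl):
# with pd.ExcelWriter('boxes.xlsx') as writer:
#     for box, series in totals.items():
#         series.to_frame().to_excel(writer, sheet_name=box.replace(' ', '_'))
```

Adding a new box then only means adding one entry to box_dict, with no per-box code to copy.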
