Naming columns by mathematical symbols in pandas dataframe


I want to add the units of my parameters next to each parameter in the column names of my dataframe. I also need statistical symbols such as μ and σ² in some column names.
Following the usual way of writing mathematical symbols in Python, r"$...$", I tried the code below, but it does not work for a dataframe:
P[r"Infiltration rate ($1/\h^-1$)"]=r['ACH_Base']
The goal is to give the Infiltration rate parameter the unit (1/h^-1).
In my code I have already created a new dataframe "P" and I am adding the ACH_Base column in "r" dataframe to P.
How can I add mathematical symbols for naming the columns in dataframes?
Thanks!!

It should work, but it depends on the backend used to display the dataframe. For instance, matplotlib has support to render LaTeX in plots.
Here is an example:
https://matplotlib.org/users/usetex.html#text-rendering-with-latex
LaTeX can also be rendered in Jupyter notebooks, but only in Markdown cells, not in Python code:
http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html?highlight=latex#LaTeX-equations

"\h" is an unknown symbol.
Does P[r"Infiltration rate ($1/h^-1$)"]=r['ACH_Base'] work to display what you want?
Which unit do you wish to display? See https://matplotlib.org/users/mathtext.html and https://matplotlib.org/users/usetex.html#usetex-tutorial for more information on how to render text with LaTeX.
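Since column names are ordinary Python strings, one option that does not depend on any LaTeX-capable backend is to put the Unicode characters directly in the name. A minimal sketch, assuming made-up sample values for ACH_Base and assuming h⁻¹ is the unit you are after:

```python
import pandas as pd

# Hypothetical stand-in for the "r" dataframe from the question.
r = pd.DataFrame({"ACH_Base": [0.35, 0.50, 0.42]})

P = pd.DataFrame()
# Unicode superscripts and Greek letters are valid in column names.
P["Infiltration rate (h⁻¹)"] = r["ACH_Base"]
P["μ"] = r["ACH_Base"].mean()   # scalar is broadcast to every row
P["σ²"] = r["ACH_Base"].var()

print(P.columns.tolist())
```

The names display as written wherever the dataframe is printed or rendered, as long as the font in use has those glyphs.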

Related

Is it possible to calculate column totals/sums within a python panel tabulator table?

I can't find an example of anyone doing this anywhere.
I've searched the Panel Tabulator documentation, and it seems to suggest this may be possible through the Tabulator API, but I can't figure out how.

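Tabulator, the JavaScript library that Panel's widget wraps, supports column calculations: a column definition can name a calculation such as sum, avg, min, or max in its topCalc or bottomCalc option. The snippet below is kept as posted; calcs and interactive are not calls I can confirm in the tabulator Python package, so read it as pseudocode.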
import tabulator
# Load data into Tabulator
table = tabulator.Stream("data.csv")
# Specify calculation for a column
table.calcs("sum", columns=[{"field":"column_name", "title":"Column Total"}])
# Stream the data and display the table
table.stream().interactive()
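If the calculation options are not reachable in your setup, a fallback that relies only on pandas is to compute the totals yourself and append them as an extra row before handing the frame to the widget. A minimal sketch; the sample data, the column names, and the "Total" label are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data standing in for the table contents.
df = pd.DataFrame({
    "item": ["a", "b", "c"],
    "qty": [2, 5, 1],
    "price": [1.5, 0.25, 4.0],
})

# Sum every numeric column and label the row in the text column.
numeric_cols = df.select_dtypes("number").columns
totals = {col: df[col].sum() for col in numeric_cols}
totals["item"] = "Total"

with_totals = pd.concat([df, pd.DataFrame([totals])], ignore_index=True)
print(with_totals)
# with_totals can then be passed to the table widget, e.g. pn.widgets.Tabulator(with_totals)
```

The totals row is regular data here, so it will move with the other rows if the table is sorted or filtered.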

How can I iterate over columns of a csv file to split it into several files?

I am completely new to Python (I started last week!), so while I looked at similar questions, I have difficulty understanding what's going on and even more difficulty adapting them to my situation.
I have a csv file where rows are dates and columns are different regions (see image 1). I would like to create a file that has 3 columns: Date, Region, and Indicator where for each date and region name the third column would have the correct indicator (see image 2).
I tried turning wide into long data, but I could not quite get it to work. My second approach was to split the file up by columns and then merge it again. I'd be grateful for any suggestions.

This gives your solution using stack() in pandas:
import pandas as pd
# In your case, use pd.read_csv instead of this:
frame = pd.DataFrame({
    'Date': ['3/24/2020', '3/25/2020', '3/26/2020', '3/27/2020'],
    'Algoma': [None,0,0,0],
    'Brant': [None,1,0,0],
    'Chatham': [None,0,0,0],
})
# Move the region columns into rows, then name the two new columns
solution = frame.set_index('Date').stack().reset_index(name='Indicator').rename(columns={'level_1':'Region'})
# index=False keeps the row numbers out of the file, leaving the three requested columns
solution.to_csv('solution.csv', index=False)
This is the inverse of doing a pivot, as explained here: Doing the opposite of pivot in pandas Python. As you can see there, you could also consider using the melt function as an alternative.
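For comparison, the melt alternative mentioned above could look like the following minimal sketch. It reuses the same sample frame; unlike the stack() version, melt keeps the rows whose indicator is missing:

```python
import pandas as pd

frame = pd.DataFrame({
    'Date': ['3/24/2020', '3/25/2020', '3/26/2020', '3/27/2020'],
    'Algoma': [None, 0, 0, 0],
    'Brant': [None, 1, 0, 0],
    'Chatham': [None, 0, 0, 0],
})

# Keep Date as the identifier; every other column becomes a (Region, Indicator) pair.
long_frame = frame.melt(id_vars='Date', var_name='Region', value_name='Indicator')
print(long_frame)
```

Call long_frame.dropna(subset=['Indicator']) afterwards if the empty entries should be left out.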

First, your region columns are currently 'one-hot encoded'. What you are trying to do is to "reverse" the one-hot encoding of your region column. Maybe check if this link answers your question:
Reversing 'one-hot' encoding in Pandas.

Pandas right align columns with numbers after to_html()

I have a pandas dataframe that I want to export to HTML using the to_html() method. Is there a way this can be achieved with the formatters or float_format parameter? What would this look like? Could I create a function that checks if a column contains numbers only and then assigns a certain class to it that can then be linked to a CSS style? Note that I only want columns with numbers only to be right-aligned. All other columns should not be affected. Also, the headers should all be left-aligned.

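This regexp adds a class to the cells that contain only a float or an int.
Afterwards, you should add the relevant style to your page.
import re
# Match cells whose whole content is a number; the decimal part is optional and the dot is escaped
html = re.sub(r"<td>(-?\d+(?:\.\d+)?)</td>", r"<td class='my_class'>\1</td>", df.to_html())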
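A minimal end-to-end sketch of this cell-tagging idea on a hypothetical frame. The class name num, the sample data, and the CSS rules are assumptions for illustration. Note that it tags individual cells rather than whole columns, so a purely numeric entry in a text column would be right-aligned too:

```python
import re
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
df = pd.DataFrame({"name": ["a", "b"], "count": [1, 20], "ratio": [0.5, 12.25]})

# A cell qualifies when its entire content is an integer or a decimal number.
NUMERIC_CELL = re.compile(r"<td>(-?\d+(?:\.\d+)?)</td>")

html = NUMERIC_CELL.sub(r'<td class="num">\1</td>', df.to_html(index=False))

# Right-align tagged cells, left-align all headers.
css = "<style>td.num { text-align: right; } th { text-align: left; }</style>"
page = css + html
print(page)
```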

How to center align headers and values in a dataframe, and how to drop the index in a dataframe

I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'text': ['foo foo', 'bar bar'],
                   'number': [1, 2]})
df
How do I center-align both the column titles/headers and the values in a dataframe, and how do I drop the index (the column with the values 0 and 1) in a dataframe?

Found an answer for this. This should do the trick to center-align both headers and values and hide the index:
df1 = df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])
# hide_index() was removed in pandas 2.0; on pandas older than 1.4 use .hide_index() instead of .hide(axis='index')
df1.set_properties(**{'text-align': 'center'}).hide(axis='index')

Try IPython.display
from IPython.display import HTML
HTML(df.to_html(index=False))

Just to clarify the objective - what you want is to modify the display of the dataframe, not the dataframe itself. Of course, in the context of Jupyter, you may not care about the distinction - for example, if the only point of having the dataframe is to display it - but it's helpful to distinguish these things so you can use the right tools for the right thing.
In this case, as you've discovered, the styler gives you control over most aspects of the display of the dataframe - it does that by outputting and rendering html.
So if the default rendering of your dataframe in Jupyter is not what you want, you can use the styler to apply any number of styles (you chain them), so that styled HTML is rendered.
df.style.set_table_styles([dict(selector='th', props=[('text-align', 'center')])])\
.hide(axis='index')
In Jupyter this renders directly, but if you want to display it elsewhere and need the underlying HTML (let's say you're rendering a page in Flask based on this), you can call the styler's .to_html() method.
There are two essential advantages to working in this way:
1. The data in the dataframe remains in its original state: you haven't changed datatypes or content in any way.
2. The styler opens up a vast array of tools to make your output look exactly the way you want.
In the context of Jupyter and numbers, this is particularly helpful because you don't have to modify the numbers (e.g. with rounding or string conversion), you just apply a style format and avoid exponential notation when you don't want it.
Here's a modified example, which shows how easy it is to use the styler to 'fix' the Jupyter Pandas numeric display problem:

df.style.set_properties(subset=["Feature", "Value"], **{'text-align': 'center'})

How to insert array formula in an Excel sheet with openpyxl?

I'm using OpenPyxl to create and modify an Excel sheet.
I have the following formula in Excel:
=(SUM(IF(LEFT(Balances!$B$2:$B$100,LEN($B4))=$B4,Balances!$D$2:$D$100)))
This is an array formula: it works, but when typing it by hand I have to finish with CTRL+SHIFT+ENTER.
Excel then displays the formula as follows:
{=(SUM(IF(LEFT(Balances!$B$2:$B$100,LEN($B4))=$B4,Balances!$D$2:$D$100)))}
I want to be able to write this formula via OpenPyxl with the following code:
sheet.cell(row=j, column=i).value = '{=(SUM(IF(LEFT(Balances!$B$2:$B$100,LEN($B4))=$B4,Balances!$D$2:$D$100)))}'
However, it doesn't work: openpyxl writes the formula text into the cell, but Excel does not evaluate it as an array formula.
I could do it with XlsxWriter:
https://xlsxwriter.readthedocs.io/example_array_formula.html
However, XlsxWriter cannot modify files that already exist.
I don't see which path to follow.

Use the worksheet.formula_attributes to set the array formula. Place the formula in the desired cell, A1 for this example. Then set the formula_attributes to the cell range you want to apply the formula to.
ws["A1"] = "=B4:B8"
ws.formula_attributes['A1'] = {'t': 'array', 'ref': "A1:A5"}

If the solution above does not work, check whether you are using the English names of the functions in your formulas.
In my case I had been using a Czech function name; the formula worked when entered manually but not when inserted via openpyxl.
Switching to the English name of the function solved the issue!

In my case the formula was using arrays for intermediate results before summarizing with a MAX. The formula worked OK when typed in but not when inserted via openpyxl. The Office 365 version of Excel was inserting the new implicit intersection operator, @, incorrectly.
formula: ="Y" & MAX(tbl_mcare_opt[Year]*(tbl_mcare_opt[Who]=[#Who])*(tbl_mcare_opt[Year]<=intyear(this_col_name())))
It turns out that the formula attributes needed to be set, as above. This allowed Excel to interpret the formula correctly. In my case the ref turned out to be just the single cell address.
I used a regex to detect whether a formula used dynamic arrays. If it did, I added the formula attributes.
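import re
from openpyxl.utils import get_column_letter

# values, cn, cix, rix, ws and is_formula() come from the surrounding code (not shown)
# provision for dynamic arrays to be included in formulas - notify excel
if is_formula(values[cn]):
    # matches table column references such as tbl_mcare_opt[Year]
    regex_column = r'[A-Za-z_]+(\[\[?[ A-Za-z0-9]+\]?\])'
    pattern = re.compile(regex_column)
    matches = pattern.findall(values[cn])
    if len(matches):  # looks like a dynamic formula
        address = get_column_letter(cix) + str(rix)
        ws.formula_attributes[address] = {'t': 'array', 'ref': address}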
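The detection step on its own can be tried without a workbook. A minimal sketch that wraps the same regex in a helper; the function name uses_table_columns and the sample formulas are assumptions for illustration:

```python
import re

# Same pattern as above: a table name followed by a bracketed column name.
TABLE_COLUMN = re.compile(r'[A-Za-z_]+(\[\[?[ A-Za-z0-9]+\]?\])')

def uses_table_columns(formula: str) -> bool:
    """Return True if the formula references at least one table column."""
    return bool(TABLE_COLUMN.search(formula))

table_formula = '="Y" & MAX(tbl_mcare_opt[Year]*(tbl_mcare_opt[Year]<=2024))'
plain_formula = '=SUM(A1:A5)'

print(uses_table_columns(table_formula))
print(uses_table_columns(plain_formula))
```

This is only a heuristic: it flags any formula with a table column reference, whether or not Excel would actually treat it as an array formula.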
