I'm attempting to use Quantopian's qgrid to display dataframes in an IPython notebook. Here is a simple example based on the example notebook:
import qgrid
qgrid.nbinstall(overwrite=True)
qgrid.set_defaults(remote_js=True, precision=2)
from pandas import Timestamp
from pandas_datareader.data import get_data_yahoo
data = get_data_yahoo(symbols='SPY',
                      start=Timestamp('2014-01-01'),
                      end=Timestamp('2016-01-01'),
                      adjust_price=True)
qgrid.show_grid(data, grid_options={'forceFitColumns': True})
Other than the precision argument, how do you format the column data? It seems to be possible to pass in grid options like formatterFactory or defaultFormatter, but it is unclear exactly how a naive user should use them.
Alternative approaches were suggested in another question, but I like the interaction that the SlickGrid object provides.
Any help or suggestions much appreciated.
The short answer is that most grid options are passed through grid_options, which takes a dictionary. For example, to set options for a specific grid:
qgrid.show_grid(data,
                grid_options={'fullWidthRows': True,
                              'syncColumnCellResize': True,
                              'forceFitColumns': True,
                              'rowHeight': 40,
                              'enableColumnReorder': True,
                              'enableTextSelectionOnCells': True,
                              'editable': True})
Please see the details here.
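If the goal is per-column number formatting rather than grid behaviour, one simple workaround (plain pandas, not a qgrid option) is to format the data itself before handing it to qgrid. A minimal sketch; the column names are illustrative:

qgrid.show_grid(data.round({'Close': 2, 'Volume': 0}),  # round each column before display
                grid_options={'forceFitColumns': True})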
I changed the float format using df.round (it would be useful to be able to change the format using column_definitions instead). In any case, the slider filter isn't in line with the values in the columns. Why?
import pandas as pd
import qgrid
import numpy as np
col_def = {'B': {"Slick.Editors.Float.DefaultDecimalPlaces": 2}}
np.random.seed(25)
df = pd.DataFrame(np.random.random([5, 4]), columns=["A", "B", "C", "D"])
df = df.round({"A": 1, "B": 2, "C": 3, "D": 4})
qgrid_widget = qgrid.show_grid(df, show_toolbar=True, column_definitions=col_def)
qgrid_widget
I have read in data like this:
import numpy as np
arr = np.loadtxt("data/za.csv", delimiter=",")
display(arr)
Now the display looks like this:
array([[5.0e+01, 1.8e+00, 1.6e+00, 1.75e+00],
       [4.8e+01, 1.77e+00, 1.63e+00, 1.75e+00],
       [5.5e+01, 1.8e+00, 1.6e+00, 1.75e+00],
       ...,
       [5.0e+01, 1.8e+00, 1.6e+00, 1.75e+00],
       [4.8e+01, 1.77e+00, 1.63e+00, 1.75e+00],
       [5.0e+01, 1.8e+00, 1.6e+00, 1.75e+00]])
Now I would like to assign variables to the columns of this array:
the first is the weight of the person
the second is the height of the person
the third is the height of the mother
the fourth is the height of the father
How can I create variables that represent these columns?
Install the pandas library:
import pandas as pd
Then read the data, passing the column names through the names parameter (read_csv has no columns parameter, and header=None tells it the file has no header row):
df = pd.read_csv("data/za.csv", header=None, names=["weight", "height", "mother_height", "father_height"])
Hope you get the solution.
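Alternatively, sticking with plain NumPy since the data was loaded with np.loadtxt, the four columns can be unpacked directly (the variable names here are just illustrative):

weight, height, mother_height, father_height = arr.T  # arr.T yields the columns as rows

Each name is then a 1-D array holding one column of the file.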
As has already been advised, you may use pandas.read_csv for this purpose, as below:
df = pd.read_csv(**{
'filepath_or_buffer': "data/za.csv",
'header': None,
'names': ('weight_of_person', 'height_of_person', 'height_of_mother', 'height_of_father'),
})
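Each column is then available as a named Series, for example:

df['height_of_person'].mean()  # average height over all rows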
I have value_counts output data for a single column that I would like to feed into a table and format nicely. I would like to bold the headings, have alternating colors for the rows, change the font to serif, and italicize the main column. Kind of like this.
I thought I found something applicable, but I do not know how to apply it to my data (or perhaps it is not suited for what I want to achieve).
I found "table styles" with the following example:
df4 = pd.DataFrame([[1, 2], [3, 4]])
s4 = df4.style
props = 'font-family: "Times New Roman", Times, serif; color: #e83e8c; font-size: 1.3em;'
df4.style.applymap(lambda x: props, subset=[1])
Here is my code on its own. Please note I first had to split my data (here) so that I could properly count and end up with the value_counts output data. These are a few libraries I have been working with (though a few may be unnecessary here).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
#access file
data = pd.read_csv('E:/testing_data.csv')
supplies = pd.DataFrame(data)
supplies.Toppings = supplies.Toppings.str.split('\r\n')
supplies = supplies.explode('Toppings').reset_index(drop=True)
supplies.Toppings.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'
Please be as specific as possible as I am still getting used to Python terms. Thank you.
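A minimal sketch of one way to cover those requirements with pandas' Styler, assuming the value_counts output is first turned into a DataFrame (the CSS selectors, the grey color, and the 'Share' column name are illustrative choices, not anything dictated by this data):

counts = supplies.Toppings.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'
styled = (counts.to_frame(name='Share')
          .style
          .set_table_styles([
              {'selector': 'th', 'props': [('font-weight', 'bold'), ('font-family', 'serif')]},
              {'selector': 'tr:nth-child(even)', 'props': [('background-color', '#f2f2f2')]},
              {'selector': 'td', 'props': [('font-family', 'serif')]},
          ])
          .applymap(lambda v: 'font-style: italic;'))  # italicize the data cells
styled  # displaying the Styler in a notebook renders the formatted table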
I have this simple csv:
date,count
2020-07-09,144.0
2020-07-10,143.5
2020-07-12,145.5
2020-07-13,144.5
2020-07-14,146.0
2020-07-20,145.5
2020-07-21,146.0
2020-07-24,145.5
2020-07-28,143.0
2020-08-05,146.0
2020-08-10,147.0
2020-08-11,147.5
2020-08-14,146.5
2020-09-01,143.5
2020-09-02,143.0
2020-09-09,144.5
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
2021-09-25,131.0
2021-09-26,130.8
2021-09-27,130.6
2021-09-28,128.4
2021-09-30,126.8
2021-10-02,126.2
If I copy it into Excel and scatter plot it, it looks like this:
This is correct; there should be a big gap in the middle (look carefully at the data: it jumps from 2020 to 2021).
However if I do this in python:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv')
data.plot.scatter('date', 'count')
plt.show()
It looks like this:
It evenly spaces the points and the gap is gone. How do I stop that behavior? I tried to do
plt.xticks = data.date
But that didn't do anything different.
I don't know the exact types of the columns in data, but it is probably because the type of the 'date' column is string, so Python does not see comparable date values. Before plotting, try to convert its type:
data['date'] = pd.to_datetime(data['date'])
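Put together, a minimal version of the original script with that conversion added:

import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('data.csv')
data['date'] = pd.to_datetime(data['date'])  # make the x-axis a real datetime axis
data.plot.scatter('date', 'count')
plt.show()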
I've tested:
import io
import pandas as pd
txt = """
date,count
2020-07-09,144.0
2020-07-10,143.5
2020-07-12,145.5
2020-07-13,144.5
2020-07-14,146.0
2020-07-20,145.5
2020-07-21,146.0
2020-07-24,145.5
2020-07-28,143.0
2020-08-05,146.0
2020-08-10,147.0
2020-08-11,147.5
2020-08-14,146.5
2020-09-01,143.5
2020-09-02,143.0
2020-09-09,144.5
2020-09-10,143.5
2020-09-25,144.0
2021-09-21,132.4
2021-09-23,131.2
2021-09-25,131.0
2021-09-26,130.8
2021-09-27,130.6
2021-09-28,128.4
2021-09-30,126.8
2021-10-02,126.2"""
data = pd.read_csv(io.StringIO(txt), sep=r",", parse_dates=["date"])
data.plot.scatter('date', 'count')
and the result is:
Two observations:
date must be of a date type, which is ensured by the parse_dates=["date"] option;
importing matplotlib.pyplot is not necessary, because you used the pandas.DataFrame.plot.scatter method.
Python 3.6, PyCharm
import prettytable as pt
import numpy as np
import pandas as pd
a = np.random.randn(30, 2)
b = a.round(2)
df = pd.DataFrame(b)
df.columns = ['data1', 'data2']
tb = pt.PrettyTable()
def func1(columns):
    def func2(column):
        return tb.add_column(column, df[column])
    return map(func2, columns)
column1 = ['data1', 'data2']
print(column1)
print(func1(column1))
The result I want is the equivalent of:
tb.add_column('data1',df['data1'])
tb.add_column('data2',df['data2'])
As a matter of fact, the result is:
<map object at 0x000001E527357828>
I have been trying to find the answer on Stack Overflow for a long time; some answers tell me I can use list(func1(column1)), but the result is [None, None].
Based on the tutorial at https://ptable.readthedocs.io/en/latest/tutorial.html, PrettyTable.add_column modifies the PrettyTable in-place. Such functions generally return None, not the modified object.
You're also overcomplicating the problem by trying to use map and a fancy wrapper function. The code below is much simpler and produces the desired result.
import prettytable as pt
import numpy as np
import pandas as pd
column_names = ['data1', 'data2']
a = np.random.randn(30, 2)
b = a.round(2)
df = pd.DataFrame(b)
df.columns = column_names
tb = pt.PrettyTable()
for col in column_names:
    tb.add_column(col, df[col])
print(tb)
If you're still interested in learning about the thing that map returns, I suggest reading about iterables and iterators. map returns an iterator over the results of calling the function, and does not actually do any work until you iterate over it.
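A quick illustration of that laziness (plain Python, nothing specific to PrettyTable):

m = map(print, [1, 2, 3])   # nothing is printed yet
print(m)                    # <map object at 0x...>
result = list(m)            # iterating now prints 1, 2 and 3
print(result)               # [None, None, None], because print returns None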
Is it possible to capture the function call itself in any way (that is, to describe which values are assigned to the different arguments)?
Sorry for the poor phrasing of the question. Let me explain with some reproducible code:
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import inspect
# 1. Here is Dataframe with some random numbers
np.random.seed(123)
rows = 10
df = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()  # pd.datetime is deprecated; a plain date string works
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
#print(df)
# 2. And here is a very basic function to do something with the dataframe
def manipulate(df, factor):
    df = df * factor
    return df
# 3. Now I can describe the function using:
print(inspect.getargspec(manipulate))
# And get:
# ArgSpec(args=['df', 'factor'], varargs=None, keywords=None,
# defaults=None)
# __main__:1: DeprecationWarning: inspect.getargspec() is
# deprecated, use inspect.signature() or inspect.getfullargspec()
# 4. But what I'm really looking for is a way to
# extract or store the function AND the variables
# used when the function is called, like this:
df2 = manipulate(df = df, factor = 20)
# So in the example using Inspect, the desired output could be:
# ArgSpec(args=['df = df', 'factor = 10'], varargs=None,
# and so on...
I realize that this may seem a bit peculiar, but it would actually be of great use to me to be able to do something like this. If anyone is interested, I'd be happy to explain everything in more detail, including how this would fit into my data science workflow.
Thank you for any suggestions!
You can bind the parameters to the function and create a new callable:
import functools
func = functools.partial(manipulate, df=df, factor=20)
the resulting partial object allows argument inspection and modification using the attributes args and keywords:
func.keywords # {'df': <pandas dataframe>, 'factor': 20}
and it can finally be called using
func()
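For completeness, the standard library can also capture the arguments of a call without freezing it, via inspect.signature. A sketch using the manipulate function from the question:

import inspect

sig = inspect.signature(manipulate)
bound = sig.bind(df=df, factor=20)   # records which value goes to which argument
bound.apply_defaults()               # fills in any defaults that were not passed
print(bound.arguments)               # {'df': <the dataframe>, 'factor': 20}

# The captured call can then be replayed later:
df2 = manipulate(*bound.args, **bound.kwargs)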