import streamlit as st
import pandas as pd
data = {'Days': ["Sunday", "Wednesday", "Friday"],
'Predictions': [433.11, 97.9, 153.65]}
df = pd.DataFrame(data)
st.write(df)
Streamlit writes the dataframe with four decimals by default, but I expected two. print() produces the expected output, while st.write() produces the output below:
   Days      | Predictions
-------------|------------
0  Sunday    |    433.1100
1  Wednesday |     97.9000
2  Friday    |    153.6500
I tried:
df.round(2)
but it didn't help.
Desired output format:
   Days      | Predictions
-------------|------------
0  Sunday    |      433.11
1  Wednesday |       97.90
2  Friday    |      153.65
With the df.style.format() method, you can specify the desired number of decimal places and override the default display behavior in Streamlit.
Note: this approach only changes the displayed format; for any further computation you should keep working with the original DataFrame, df.
formatted_df = df.style.format({"Predictions": "{:.2f}"})
st.write(formatted_df)
You could also choose to write the dataframe without having to initialize a variable, like so:
st.write(df.style.format({"Predictions": "{:.2f}"}))
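To illustrate the note above, here is a minimal sketch showing that the Styler leaves the underlying DataFrame untouched, so downstream computations still see the raw floats:
import pandas as pd
df = pd.DataFrame({'Days': ["Sunday", "Wednesday", "Friday"],
                   'Predictions': [433.11, 97.9, 153.65]})
# the Styler changes only the rendering; df itself is untouched
styled = df.style.format({"Predictions": "{:.2f}"})
print(df['Predictions'].iloc[1])   # 97.9 -- the stored float is unchanged
print(df['Predictions'].sum())     # arithmetic still uses the full-precision values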
In order to plot the frequency of tornadoes every 10 days, I have grouped the data into groups of 10 days using
df_grouped = pd.DataFrame()
df_grouped['COUNT'] = df.groupby(pd.Grouper(key='DATE', freq='10D'))['DATE'].count().to_frame()
however, the DATE column does not exist in the resulting frame, as shown when I run:
>>> df_grouped.shape
(1041, 1)
despite the fact that I am able to view and plot the dates in the Jupyter notebook GUI.
This is an issue because I want to access this data later for other purposes, and I am unable to when using:
year = pd.to_datetime(df_grouped['DATE'], dayfirst = True, errors='coerce').dt.year.values
df_grouped['year'] = year
It raises an invalid indexing error, since the column no longer exists. Does anyone know how I can access the data?
MINIMAL REPRODUCIBLE EXAMPLE
import pandas as pd
df = pd.DataFrame(pd.date_range(start='1994-01-01', end='1994-01-21'), columns=['DATE'])
df_grouped = pd.DataFrame()
df_grouped['COUNT'] = df.groupby(pd.Grouper(key='DATE', freq='10D'))['DATE'].count().to_frame()
Expected output:
|DATE      |COUNT|
|----------|-----|
|1994-01-01|10   |
|1994-01-11|10   |
|1994-01-21|10   |
|1994-01-31|01   |
Actual output:
|          |COUNT|
|DATE      |     |
|1994-01-01|10   |
|1994-01-11|10   |
|1994-01-21|10   |
|1994-01-31|01   |
The DATE values are not gone; the groupby moved them into the index of the result. Call reset_index() to turn the index back into a regular column:
import pandas as pd
df = pd.DataFrame(pd.date_range(start='1994-01-01', end='1994-01-21'), columns=['DATE'])
# add a helper column, count rows per 10-day bin, then restore DATE as a column
df = (df.assign(COUNT=1)
        .groupby(pd.Grouper(key='DATE', freq='10D')).count()
        .reset_index())
print(df)
# DATE COUNT
# 0 1994-01-01 10
# 1 1994-01-11 10
# 2 1994-01-21 1
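Once reset_index() has turned DATE back into a regular column, the year extraction from the question works as expected; a minimal sketch continuing from the frame above:
# DATE is an ordinary datetime column again, so the .dt accessor works
df['year'] = pd.to_datetime(df['DATE'], errors='coerce').dt.year
print(df)
#         DATE  COUNT  year
# 0 1994-01-01     10  1994
# 1 1994-01-11     10  1994
# 2 1994-01-21      1  1994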
I have a data frame that contains (among others) columns for the time of day (00:00-23:59:59), day (1-7), month (1-12), and year (2000-2019). How can I combine the values of these columns, row by row, into a new DateTime object, and then store the new date-times in a new column? I've read other posts on this kind of task, but they all seem to involve converting one date column into one DateTime column, whereas I have four columns that need to be transformed. Any help is appreciated!
e.g.
| 4:30:59 | 1 | 1 | 2000 | TO 2000/1/1 4:30:59
This is the only code I have so far, and it probably doesn't do anything:
#creating datetime object (MISC)
data = pd.read_csv('road_accidents_data_clean.csv',delimiter=',')
df = pd.DataFrame(data)
format = '%Y-%m-%d %H:%M:%S'
n = 0
df['datetime'] = data.loc[n,'Crash_Day'],data.loc[n,'Crash_Month'],data.loc[n,'Year']
My DataFrame is laid out as follows:
Index | Age | Year | Crash_Month | Crash_Day | Crash_Time | Road_User | Gender |
0 37 2000 1 1 4:30:59 DRIVER MALE
1 42 2000 1 1 7:45:10 DRIVER MALE
2 25 2000 1 1 10:15:30 PEDESTRIAN FEMALE
Crash_Type | Injury_Severity | Crash_LGA | Crash_Area_Type | Datetime |
UNKNOWN 1 YARRA MELBOURNE NaN
OVERTAKING 1 YARRA MELBOURNE NaN
ADJACENT DIR 0 MONASH MELBOURNE NaN
NOTE: the dataframe is 13 columns wide; I just couldn't fit them all on one line, so Crash_Type starts to the right of Gender.
Below is the code I was suggested to use, along with my adaptation of it:
df = pd.DataFrame(dict(
    Crash_Time=['4:30:59','4:20:00'],
    Crash_Day=[1,20],
    Crash_Month=[1,4],
    Year=[2000,2020],
))
data['Datetime'] = df['Datetime'] = pd.to_datetime(
    np.sum([
        df['Year'].astype(str),
        '-',
        df['Crash_Month'].astype(str),
        '-',
        df['Crash_Day'].astype(str),
        ' ',
        df['Crash_Time'],
    ]),
    format='%Y-%m-%d %H:%M:%S',
)
I've adapted this code to combine the values for the datetime column with my original dataframe.
Combine the columns into a single series of strings using + (converting to str where needed with the pandas.Series.astype method), then pass that new series to pd.to_datetime before assigning it to a new column in your df:
import pandas as pd
df = pd.DataFrame(dict(time=['4:30:59'],date=[1],month=[1],year=[2000]))
df['datetime'] = pd.to_datetime(
    df['year'].astype(str) + '-' + df['month'].astype(str) + '-' + df['date'].astype(str) + ' ' + df['time'],
    format='%Y-%m-%d %H:%M:%S',
)
print(df)
example in python tutor
edit: You can also use numpy.sum to make that one long line of column concatenations easier on the eyes:
import pandas as pd
import numpy as np
df = pd.DataFrame(dict(
    time=['4:30:59','4:20:00'],
    date=[1,20],
    month=[1,4],
    year=[2000,2020],
))
df['datetime'] = pd.to_datetime(
    np.sum([
        df['year'].astype(str),
        '-',
        df['month'].astype(str),
        '-',
        df['date'].astype(str),
        ' ',
        df['time'],
    ]),
    format='%Y-%m-%d %H:%M:%S',
)
sum example in python tutor
edit 2: Using your actual column names, it should be something like this:
import pandas as pd
import numpy as np
'''
Index | Age | Year | Crash_Month | Crash_Day | Crash_Time | Road_User | Gender |
0 37 2000 1 1 4:30:59 DRIVER MALE
Crash_Type | Injury_Severity | Crash_LGA | Crash_Area_Type | Datetime |
UNKNOWN 1 YARRA MELBOURNE NaN
'''
df = pd.DataFrame(dict(
    Crash_Time=['4:30:59','4:20:00'],
    Crash_Day=[1,20],
    Crash_Month=[1,4],
    Year=[2000,2020],
))
df['Datetime'] = pd.to_datetime(
    np.sum([
        df['Year'].astype(str),
        '-',
        df['Crash_Month'].astype(str),
        '-',
        df['Crash_Day'].astype(str),
        ' ',
        df['Crash_Time'],
    ]),
    format='%Y-%m-%d %H:%M:%S',
)
print(df)
another python tutor link
One thing to note: you might want to double-check whether your csv file is separated by just a comma, or by a comma and a space. If there is an extra space separating the data in addition to the comma, you may need to load the data with df = pd.read_csv('road_accidents_data_clean.csv', sep=', '), because you don't want that extra space ending up in your data.
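If the file really does have a space after each comma (an assumption about the file layout, not something visible in the question), pandas' skipinitialspace option is an alternative that strips the leading space while keeping the default comma separator:
import pandas as pd
# hypothetical: fields separated by ", " -- skipinitialspace drops the space
data = pd.read_csv('road_accidents_data_clean.csv', skipinitialspace=True)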
And thank you for helping!
I would like to generate a heatmap in Python from the dataframe df.
(I am using pandas, seaborn, numpy, and matplotlib in my project.)
The dataframe df looks like:
index | a | b | c | year | month
0 | | | | 2013 | 1
1 | | | | 2015 | 4
2 | | | | 2016 | 10
3 | | | | 2017 | 1
In the dataset, each row is a ticket.
The dataset is big (51 columns and 100k+ rows), so a, b, c are just placeholders for some of the columns. (For month, 1 = Jan, 2 = Feb, ...)
For the heatmap:
x-axis = year,
y-axis = month,
value = a count of the rows in which a ticket was given in that year and month.
The result I imagine should look something like this example from the seaborn documentation:
https://seaborn.pydata.org/_images/seaborn-heatmap-4.png
I am new to coding; I have tried a lot of random things I found on the internet and have not been able to make it work.
Thank you for helping!
This should do (with generated data):
import pandas as pd
import seaborn as sns
import random

# generate 2000 random (year, month) pairs as sample data
y = [random.randint(2013, 2017) for n in range(2000)]
m = [random.randint(1, 12) for n in range(2000)]
df = pd.DataFrame([y, m]).T
df.columns = ['y', 'm']

# count the rows per (year, month) pair, then pivot: months as rows, years as columns
df['count'] = 1
df2 = df.groupby(['y', 'm'], as_index=False).count()
df_p = pd.pivot_table(df2, values='count', index='m', columns='y')
sns.heatmap(df_p)
You probably won't need the column count but I added it because I needed an extra column for the groupby to work.
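As an aside, you can avoid the dummy count column entirely with groupby(...).size(), which counts rows per group directly; a minimal sketch using the same generated df:
# size() counts the rows in each (month, year) group; unstack pivots years into columns
df_p = df.groupby(['m', 'y']).size().unstack('y')
sns.heatmap(df_p)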
I want to append columns from tables generated in a loop to a dataframe. I was hoping to accomplish this using pandas.merge, but it doesn't seem to be working out for me.
My code:
from datetime import date
from datetime import timedelta
import pandas
import numpy
import pyodbc
date1 = date(2017, 1, 1) #Starting Date
date2 = date(2017, 1, 10) #Ending Date
DateDelta = date2 - date1
DateAdd = DateDelta.days
StartDate = date1
count = 1
# Create the holding table
conn = pyodbc.connect('Server Information')
basetable = pandas.read_sql("SELECT....")
while count <= DateAdd:
    print(StartDate)
    datatable = pandas.read_sql("SELECT...WHERE Date = "+str(StartDate)+"...")
    finaltable = basetable.merge(datatable, how='left', left_on='OrganizationName', right_on='OrganizationName')
    StartDate = StartDate + timedelta(days=1)
    count = count + 1
print(finaltable)
Shortened the select statements for brevity's sake, but the tables produced look like this:
Basetable
School_District
---------------
District_Alpha
District_Beta
...
District_Zed
Datatable
School_District|2016-01-01|
---------------|----------|
District_Alpha | 400 |
District_Beta | 300 |
... | 200 |
District_Zed | 100 |
I have the datatable written so that the column takes the name of the date selected for that particular loop, so the column names will be unique once I get this up and running. My problem, however, is that the above code only produces one column of data. I have a good guess as to why: only the last merge is being processed. I thought pandas.append would be the way around that, but pandas.append doesn't "join" the way merge does. Is there some other way to accomplish a sort of join-and-append using pandas? My goal is to keep this flexible so that other dates can easily be input depending on our data needs.
In the end, what I want to see is:
School_District|2016-01-01|2016-01-02|... |2016-01-10|
---------------|----------|----------|-----|----------|
District_Alpha | 400 | 1 | | 45 |
District_Beta | 300 | 2 | | 33 |
... | 200 | 3 | | 5435 |
District_Zed | 100 | 4 | | 333 |
Your error is in the statement finaltable = basetable.merge(datatable, ...). On each loop iteration you merge the original basetable with the new datatable, store the result in finaltable... and then discard it. What you need is basetable = basetable.merge(datatable, ...). No finaltable needed.
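A sketch of the corrected loop, keeping the variable names from the question (the shortened SQL strings remain placeholders, as above):
while count <= DateAdd:
    print(StartDate)
    datatable = pandas.read_sql("SELECT...WHERE Date = " + str(StartDate) + "...")
    # accumulate: merge each day's column into the growing basetable
    basetable = basetable.merge(datatable, how='left',
                                left_on='OrganizationName', right_on='OrganizationName')
    StartDate = StartDate + timedelta(days=1)
    count = count + 1
print(basetable)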
I have a dataframe in the following format:
ID | 01/01/2016 | 02/03/2016 | 02/15/2016 | ........
11 | 100 | 200 | 100 | ........
I am trying to calculate the sum of, e.g., the last 3 months of data in a new column. The expected output should be as follows:
ID | 01/01/2016 | 02/03/2016 | 02/15/2016 | ........ | Last 3 Months
11 | 100 | 200 | 100 | ........ | 300
As a solution, I need to pick today's date, compare it with the dates in the columns, and sum up the matching values. However, I am not sure how to do that. Could you please give some tips?
Thank you.
This is not as straightforward as it may initially seem. You need to decide how you will handle year-to-year changes and the differing number of days in each month. I do this using a simple function. You can adjust the code below to meet your needs, but it should get you started.
from __future__ import division, print_function
import calendar
import datetime as dt
import numpy as np
import pandas as pd

def subtract_months(m):
    '''Subtract the specified number of months from today's date.

    Parameters
    ----------
    m : integer
        how many months to subtract from today's date

    Returns
    -------
    date : datetime.date value'''
    yr = dt.date.today().year
    mon = dt.date.today().month - m
    day = dt.date.today().day
    # test whether we went back into a previous year
    if mon <= 0:
        yr -= 1
        mon = 12 + mon
    # test whether we have exceeded the maximum number of days in the month
    if day > calendar.monthrange(yr, mon)[1]:
        day = calendar.monthrange(yr, mon)[1]
    return dt.date(yr, mon, day)

# build an example frame: one column per day, one row per group
dates = pd.date_range('20160101', '20170101', freq='1D')
data = np.random.randint(0, 100, (5, 367))
df = pd.DataFrame(data=data, index=list('ABCDE'), columns=dates)

# now add a new column
df['Last 3 Months'] = df.T.truncate(before=subtract_months(3), after=dt.date.today()).sum(axis=0)
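An equivalent route, as a sketch on a fresh copy of the frame: because the columns form a sorted DatetimeIndex, you can slice them directly with .loc and sum row-wise, which avoids the transpose:
# rebuild the example frame, then slice only the date columns in the window
df2 = pd.DataFrame(data=data, index=list('ABCDE'), columns=dates)
df2['Last 3 Months'] = df2.loc[:, subtract_months(3):dt.date.today()].sum(axis=1)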