Retrieve date from column index position in pandas and paste in PyQt - python

I want to retrieve the date of one index position of a Pandas data frame and paste it into the LineEdit of a PyQt Application.
What I have so far is:
purchase = sales[['Total','Date']]
pandas_value = purchase.iloc[-1:]['Date'] # last position of the "Date" column
pyqt_value = str(pandas_value)
# This returns:
67    2016-10-20
Name: Date, dtype: datetime64[ns]
The entire output appears in the LineEdit as: 67 2016-10-20 Name: Date, dtype: datetime64[ns]
I have also tried converting the date, to no avail:
pandas_value.strftime('%Y-%m-%d')
'Series' object has no attribute 'strftime'
Is there a way to retrieve and paste just the date like : 2016-10-20 ?
Or better : Is there a way to retrieve any value as a string from any index position in pandas?
Thanks in advance for any help.

You can do it this way:
In [37]: df
Out[37]:
Date a
0 2016-01-01 0.228208
1 2016-01-02 0.695593
2 2016-01-03 0.493608
3 2016-01-04 0.728678
4 2016-01-05 0.369823
5 2016-01-06 0.336615
6 2016-01-07 0.012200
7 2016-01-08 0.481646
8 2016-01-09 0.773467
9 2016-01-10 0.550114
In [38]: df.iloc[-1, df.columns.get_loc('Date')].strftime('%Y-%m-%d')
Out[38]: '2016-01-10'

pandas returns it as a Series, which is like a list (normally it holds one row or one column of data), so you have to use an index to get the value. Your Series has only one value, so you can use the label index [0] (or maybe [67], because your text shows 67 as the index); positionally, .iloc[0] always works:
pyqt_value = str(pandas_value.iloc[0])
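Putting the pieces together, here is a minimal sketch (the sales data and the lineEdit widget name are hypothetical stand-ins for the ones in the question):

import pandas as pd

# Hypothetical stand-in for the question's `sales` DataFrame
sales = pd.DataFrame({'Total': [100.0, 250.0],
                      'Date': pd.to_datetime(['2016-10-19', '2016-10-20'])})

purchase = sales[['Total', 'Date']]

# .iloc[-1] on the column returns a scalar Timestamp, not a Series,
# so strftime works directly on it
last_date = purchase['Date'].iloc[-1]
date_str = last_date.strftime('%Y-%m-%d')  # '2016-10-20'

# In the PyQt application the string can then be set on the widget, e.g.:
# self.lineEdit.setText(date_str)

The key point is that .iloc[-1] (a scalar accessor) and .iloc[-1:] (a slice, which returns a one-row Series) behave differently: only the scalar has strftime.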

Related

Change NaT to blank in pandas dataframe

I have a dataframe (df) that looks like:
DATES
0 NaT
1 01/08/2003
2 NaT
3 NaT
4 04/08/2003
5 NaT
6 30/06/2003
7 01/03/2004
8 18/05/2003
9 NaT
10 NaT
11 31/10/2003
12 NaT
13 NaT
I am struggling to find out how to transform the dataframe to remove the NaT values so the final output looks like:
DATES
0
1 01/08/2003
2
3
4 04/08/2003
5
6 30/06/2003
7 01/03/2004
8 18/05/2003
9
10
11 31/10/2003
12
13
I have tried :
df["DATES"].fillna("", inplace = True)
but with no success.
For information, the column is in a datetime format, set with:
df["DATES"] = pd.to_datetime(df["DATES"],errors='coerce').dt.strftime('%d/%m/%Y')
What can I do to resolve this?
The problem is that the NaT values are strings, so you need:
df["DATES"] = df["DATES"].replace('NaT', '')
df.fillna() works on numpy nan values. Your 'NaT' entries are probably strings, so you can do the following
if you want to use fillna():
df["DATES"].replace("NaT", np.nan, inplace=True)
df.fillna("", inplace=True)
Otherwise, you can just replace them with your desired string directly:
df["DATES"].replace("NaT", "", inplace=True)
Convert the column to object and then use Series.where:
df['Dates'] = df['Dates'].astype(object).where(df['Dates'].notnull(), np.nan)
Or replace np.nan with whatever value you want.
Your conversion to datetime did not work properly on the NaTs.
You can check this before calling fillna by printing df['DATES'][0] and seeing that you get 'NaT' (a string) and not NaT (the proper missing-value type).
Instead, use (for example): df['DATES'] = df['DATES'].apply(pd.Timestamp)
This example worked for me as-is, but notice that it produces pd.Timestamp rather than datetime objects (another time format, but an easy one to use). You do not need to specify your time format with this; your current format is understood by pd.Timestamp.
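Tying the answers above together, a minimal end-to-end sketch (the sample values are hypothetical; the replace/fillna combination covers both older pandas, where strftime leaves the string 'NaT', and newer versions, where it leaves NaN):

import pandas as pd

df = pd.DataFrame({'DATES': [None, '01/08/2003', None, '04/08/2003']})
df['DATES'] = pd.to_datetime(df['DATES'], dayfirst=True,
                             errors='coerce').dt.strftime('%d/%m/%Y')

# Cover both the string 'NaT' and real NaN missing values
df['DATES'] = df['DATES'].replace('NaT', '').fillna('')
print(df)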

comparing date_time values in a pandas DataFrame with a specific date_time value and returning the closest one

I have a date column in a pandas DataFrame as follows:
index date_time
1 2013-01-23
2 2014-01-23
3 2015-8-14
4 2015-10-23
5 2016-10-28
I want to compare the values in the date_time column with a specific date, for example date_x = 2015-9-14, and return the date that is before this date and closest to it, which is 2015-8-14.
I thought about converting the values in the date_time column to a list and then comparing them with the specific date. However, I do not think that is an efficient solution.
Any solution?
Thank you.
Here is one way using searchsorted (assumes import pandas as pd and import numpy as np). This method assumes the data is already ordered; if not, first do df = df.sort_values('date_time'):
df.date_time = pd.to_datetime(df.date_time)
date_x = '2015-9-14'
idx = np.searchsorted(df.date_time, pd.to_datetime(date_x))
df.date_time.iloc[idx-1]
Out[408]:
2 2015-08-14
Name: date_time, dtype: datetime64[ns]
Or we can do:
s = df.date_time - pd.to_datetime(date_x)
df.loc[[s[s.dt.days < 0].index[-1]]]
Out[417]:
index date_time
2 3 2015-08-14
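An equivalent sketch without searchsorted: filter with a boolean mask and take the max of the remaining dates. This works whether or not the column is sorted (sample data copied from the question):

import pandas as pd

df = pd.DataFrame({'date_time': pd.to_datetime(
    ['2013-01-23', '2014-01-23', '2015-08-14', '2015-10-23', '2016-10-28'])})
date_x = pd.to_datetime('2015-9-14')

# Keep only dates strictly before date_x, then take the latest of them
closest_before = df.loc[df['date_time'] < date_x, 'date_time'].max()
print(closest_before)  # 2015-08-14 00:00:00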

get next value in list Pandas

I have a list of unique dates in chronological order.
I have a dataframe with dates in it. I want to use the list of dates and the dataframe to get the NEXT date in the list (find the dataframe's date in the list and return the date to the right of it, i.e. the next chronological date).
Any ideas?
It appears that printing the list wouldn't work, and you haven't provided us with any code to work with or an example of what your datetimes look like. My best suggestion is to sort the dataframe:
dataframe = dataframe.sort_values('date')  # DataFrame.sort() was removed in newer pandas
If I wanted a specific date to print, I would have to print it by index number once it is sorted. Without knowing your computer's ability to handle print statements of this size, I suggest copying the sorted output to a text file to ensure that you are getting the proper response.
So, for every item in the dataframe there is an exact match for its date in the list of unique dates, and you want to move it to the next date.
You should really use a dictionary for this:
next_date_dictionary = dict(zip(sequential_list_of_dates, sequential_list_of_dates[1:]))
Then you simply look up the next date in the dictionary:
next_date = next_date_dictionary.get(row.date)
Alternatively, if you want to replace the date column, you can use replace (see the sketch below):
data_frame.replace({"date":next_date_dictionary})
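A runnable sketch of the dictionary approach, using hypothetical sample data in place of the asker's list and dataframe:

import pandas as pd

# Hypothetical sorted list of unique dates and a dataframe that uses them
sequential_list_of_dates = list(pd.to_datetime(
    ['2014-03-02', '2014-03-03', '2014-03-04', '2014-03-05']))
data_frame = pd.DataFrame({'date': [sequential_list_of_dates[0],
                                    sequential_list_of_dates[2]]})

# Map each date to the one after it; the last date has no successor
next_date_dictionary = dict(zip(sequential_list_of_dates,
                                sequential_list_of_dates[1:]))

# Single lookup for one row...
next_date = next_date_dictionary.get(data_frame.loc[0, 'date'])

# ...or shift the whole column at once
shifted = data_frame.replace({'date': next_date_dictionary})
print(next_date)
print(shifted)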
OK here is one way of doing this:
In [210]:
# generate some data (assumes: import pandas as pd; import datetime as dt)
df = pd.DataFrame({'dates':pd.date_range(start=dt.datetime(2014,3,2), end=dt.datetime(2014,4,23))})
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 53 entries, 0 to 52
Data columns (total 1 columns):
dates 53 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 848.0 bytes
Now I'd create a df from your date list:
In [219]:
base = dt.datetime(2014,5,3)
date_list = [base - dt.timedelta(days=x) for x in range(0, 70)]
date_df = pd.DataFrame({'dates':date_list})
date_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 70 entries, 0 to 69
Data columns (total 1 columns):
dates 70 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 1.1 KB
Then add a new column to this date_df that shifts the dates column by 1, and set the index to be the dates:
In [220]:
date_df['date_lookup'] = date_df['dates'].shift(1)
date_df = date_df.set_index('dates')
date_df.head()
Out[220]:
date_lookup
dates
2014-05-03 NaT
2014-05-02 2014-05-03
2014-05-01 2014-05-02
2014-04-30 2014-05-01
2014-04-29 2014-04-30
Then call map on the original df, passing date_df's date_lookup column; map will use the index to perform a lookup and return the corresponding next value:
In [221]:
df['date_next'] = df['dates'].map(date_df['date_lookup'])
df.head()
Out[221]:
dates date_next
0 2014-03-02 2014-03-03
1 2014-03-03 2014-03-04
2 2014-03-04 2014-03-05
3 2014-03-05 2014-03-06
4 2014-03-06 2014-03-07
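For larger data there is also pd.merge_asof (pandas 0.19+), which can do the "first date strictly after" lookup directly; a hedged sketch with hypothetical data:

import pandas as pd

df = pd.DataFrame({'dates': pd.to_datetime(['2014-03-02', '2014-03-04'])})
date_list = pd.DataFrame({'next_date': pd.to_datetime(
    ['2014-03-02', '2014-03-03', '2014-03-04', '2014-03-05'])})

# direction='forward' with allow_exact_matches=False picks the first
# list date strictly after each row's date, i.e. the next one;
# both frames must be sorted on their keys
result = pd.merge_asof(df.sort_values('dates'), date_list,
                       left_on='dates', right_on='next_date',
                       direction='forward', allow_exact_matches=False)
print(result)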

Getting rid of a hierarchical index in Pandas

I have just pivoted a dataframe to create the dataframe below:
date 2012-10-31 2012-11-30
term
red -4.043862 -0.709225
blue -18.046630 -8.137812
green -8.339924 -6.358016
The columns are supposed to be dates; the leftmost column is supposed to have strings in it.
I want to be able to run through the rows (using .apply()) and compare the values under each date column. The problem I am having is that I think the df has a hierarchical index.
Is there a way to give the whole df a new index (e.g. 1, 2, 3, etc.) and have a flat index (without getting rid of the terms in the first column)?
EDIT: When I try to use .reset_index() I get the error ending with 'AttributeError: 'str' object has no attribute 'view''.
EDIT 3: here is the description of the df:
<class 'pandas.core.frame.DataFrame'>
Index: 14597 entries, 101016j to zymogens
Data columns (total 6 columns):
2012-10-31 00:00:00 14597 non-null values
2012-11-30 00:00:00 14597 non-null values
2012-12-31 00:00:00 14597 non-null values
2013-01-31 00:00:00 14597 non-null values
2013-02-28 00:00:00 14597 non-null values
2013-03-31 00:00:00 14597 non-null values
dtypes: float64(6)
Thanks in advance.
df = df.reset_index()
This will take the current index, make it a column, and then give you a fresh integer index starting from 0.
Adding example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'2012-10-31': [-4, -18, -18], '2012-11-30': [-0.7, -8, -6]},
                  index=pd.Index(['red', 'blue', 'green'], name='term'))
df
       2012-10-31  2012-11-30
term
red            -4        -0.7
blue          -18        -8.0
green         -18        -6.0
df.reset_index()
    term  2012-10-31  2012-11-30
0    red          -4        -0.7
1   blue         -18        -8.0
2  green         -18        -6.0
EDIT: When I try to use .reset_index() I get the error ending with 'AttributeError: 'str' object has no attribute 'view''.
Try converting your date columns to string type first.
I think pandas doesn't like reset_index() here because you are trying to reset your string index into a frame whose columns consist only of dates. If you only have dates as columns, pandas will handle them internally as a DatetimeIndex. When calling reset_index(), pandas tries to set up your string index as a further column alongside the date columns and somehow fails. Looks like a bug to me, but I'm not sure.
Example:
t = pandas.DataFrame({pandas.to_datetime('2011') : [1,2], pandas.to_datetime('2012') : [3,4]}, index=['A', 'B'])
t
2011-01-01 00:00:00 2012-01-01 00:00:00
A 1 3
B 2 4
t.columns
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, 2012-01-01 00:00:00]
Length: 2, Freq: None, Timezone: None
t.reset_index()
...
AttributeError: 'str' object has no attribute 'view'
If you try it with string columns, it will work.
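A hedged workaround sketch based on that suggestion: convert the date columns to strings before resetting (recent pandas versions handle this case without the error, but the conversion is harmless either way):

import pandas as pd

df = pd.DataFrame({pd.to_datetime('2011'): [1, 2],
                   pd.to_datetime('2012'): [3, 4]},
                  index=['A', 'B'])

# Turn the DatetimeIndex columns into plain strings first
df.columns = df.columns.astype(str)
df = df.reset_index()  # the old string index becomes a regular column
print(df)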

Efficiently handling missing dates when aggregating Pandas Dataframe

Follow up from Summing across rows of Pandas Dataframe and Pandas Dataframe object types fillna exception over different datatypes
I am aggregating one of the columns using
df.groupby(['stock', 'same1', 'same2'], as_index=False)['positions'].sum()
This method is not very forgiving if there is missing data: if anything is missing in same1, same2, etc., it pads the result with totally unrelated values. The workaround is to do a fillna loop over the columns, replacing missing strings with '' and missing numbers with zero; that solves the problem.
However, I also have one column with missing dates. The column type is 'object', with float nan in the missing cells and datetime objects in the populated cells. It is important that I know the data is missing, i.e. the missing indicator must survive the groupby transformation.
Dataset outlining the problem:
csv file that I use as input is:
Date,Stock,Position,Expiry,same
2012/12/01,A,100,2013/06/01,AA
2012/12/01,A,200,2013/06/01,AA
2012/12/01,B,300,,BB
2012/6/01,C,400,2013/06/01,CC
2012/6/01,C,500,2013/06/01,CC
I then read in the file:
df = pd.read_csv('example', parse_dates=[0])

def convert_date(d):
    '''Converts YYYY/mm/dd to a datetime object'''
    if type(d) != str or len(d) != 10:
        return np.nan
    dd = d[8:]
    mm = d[5:7]
    YYYY = d[:4]
    return datetime.datetime(int(YYYY), int(mm), int(dd))

df['Expiry'] = df.Expiry.map(convert_date)
df
df looks like:
Date Stock Position Expiry same
0 2012-12-01 00:00:00 A 100 2013-06-01 00:00:00 AA
1 2012-12-01 00:00:00 A 200 2013-06-01 00:00:00 AA
2 2012-12-01 00:00:00 B 300 NaN BB
3 2012-06-01 00:00:00 C 400 2013-06-01 00:00:00 CC
4 2012-06-01 00:00:00 C 500 2013-06-01 00:00:00 CC
I can quite easily change the convert_date function to pop in anything else for the missing data in the Expiry column.
Then I use:
df.groupby(['Stock', 'Expiry', 'same'], as_index=False)['Position'].sum()
to aggregate the Position column. I get a TypeError: can't compare datetime.datetime to str for any non-date that I plug into the missing date cells. It is important for later functionality to know whether Expiry is missing.
You need to convert your dates to the datetime64[ns] dtype (which manages how datetimes work). An object column is not efficient, nor does it deal well with datelikes. datetime64[ns] allows missing values using NaT (not-a-time); see here: http://pandas.pydata.org/pandas-docs/dev/missing_data.html#datetimes
In [6]: df['Expiry'] = pd.to_datetime(df['Expiry'])
# alternative way of reading in the data (in 0.11.1, as ``NaT`` will be set
# for missing values in a datelike column)
In [4]: df = pd.read_csv('example',parse_dates=['Date','Expiry'])
In [9]: df.dtypes
Out[9]:
Date datetime64[ns]
Stock object
Position int64
Expiry datetime64[ns]
same object
dtype: object
In [7]: df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum()
Out[7]:
Stock Expiry same Position
0 A 2013-06-01 00:00:00 AA 300
1 B NaT BB 300
2 C 2013-06-01 00:00:00 CC 900
In [8]: df.groupby(['Stock', 'Expiry', 'same'] ,as_index=False)['Position'].sum().dtypes
Out[8]:
Stock object
Expiry datetime64[ns]
same object
Position int64
dtype: object
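One version caveat worth noting: in modern pandas, groupby drops rows whose group key is NaN/NaT by default, so reproducing the output above (where the NaT Expiry group survives) requires dropna=False (added in pandas 1.1). A minimal sketch with the same data:

import pandas as pd

df = pd.DataFrame({'Stock': ['A', 'A', 'B'],
                   'Expiry': pd.to_datetime(['2013-06-01', '2013-06-01', None]),
                   'same': ['AA', 'AA', 'BB'],
                   'Position': [100, 200, 300]})

# dropna=False keeps the group whose Expiry key is NaT (pandas >= 1.1)
out = df.groupby(['Stock', 'Expiry', 'same'], as_index=False,
                 dropna=False)['Position'].sum()
print(out)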
