I have a pandas DataFrame portfolio whose index consists of dates. I'm trying to access multiple rows with
print(portfolio.loc[['2007-02-26','2008-02-06'], :])
but am getting this error:
KeyError: "None of [Index(['2007-02-26', '2008-02-06'], dtype='object', name='Date')] are in the [index]"
However, print(portfolio.loc['2007-02-26',:]) successfully returns
holdings 1094.6124
pos_diff 100.0000
cash 98905.3876
total 100000.0000
returns 0.0000
Name: 2007-02-26 00:00:00, dtype: float64
Isn't this a valid format: df.loc[['key1', 'key2', 'key3'], 'Column1']?
It seems the issue is a type mismatch: the labels are strings while the index holds timestamps. The solution, therefore, is to explicitly convert the labels to datetimes before passing them to loc:
import pandas as pd

df = pd.DataFrame({"a": range(5)}, index=pd.date_range("2020-01-01", freq="1D", periods=5))
print(df)
==>
a
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
try:
    df.loc[["2020-01-01", "2020-01-02"], :]
except Exception as e:
    print(e)
==>
"None of [Index(['2020-01-01', '2020-01-02'], dtype='object')] are in the [index]"
# But - if you convert the labels to datetime before calling loc,
# it works fine.
df.loc[pd.to_datetime(["2020-01-01", "2020-01-02"]), :]
==>
a
2020-01-01 0
2020-01-02 1
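As a side note, slicing with date strings does work on a DatetimeIndex (pandas' partial string indexing), so when the wanted labels are contiguous you can skip the conversion entirely — a minimal sketch on the same toy frame:

```python
import pandas as pd

df = pd.DataFrame({"a": range(5)},
                  index=pd.date_range("2020-01-01", freq="1D", periods=5))

# A string slice is resolved against the DatetimeIndex,
# unlike a list of string labels, which raises KeyError
subset = df.loc["2020-01-01":"2020-01-02", :]
print(subset)
```

This only selects a contiguous range, though; for arbitrary labels, pd.to_datetime is still the way to go.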
I extracted the year from the Date column and added it as a new column to the DataFrame.
I need it to be like 2001, but it comes out as 2001.0.
Where is the .0 coming from?
This is the Output:
Datum LebensverbrauchMIN ... Lastfaktor Jahr
0 2001-01-01 00:00:00 0.001986 ... 0.249508 2001.0
1 2001-01-01 00:01:00 0.000839 ... 0.249847 2001.0
2 2001-01-01 00:02:00 0.000387 ... 0.250186 2001.0
# Read in Data
InnenTemp = ["LebensverbrauchMIN","HPT", "Innentemperatur", "Verlustleistung", "SolarEintrag", "Lastfaktor"]
Klima1min = pd.read_csv("Klima_keinPV11.csv", names=InnenTemp,
                        skiprows=0)
Datum = pd.read_csv("Klima_Lufttemp_GLobalstrahlung_Interpoliert_1min.csv", usecols=["Datum"],
                    skiprows=0)
Luft = pd.read_csv("Klima_Lufttemp_GLobalstrahlung_Interpoliert_1min.csv", usecols=["Lufttemperatur"],
                   skiprows=0)
frames = [Datum, Klima1min]
a = pd.concat(frames, axis=1)
a['Datum'] = pd.to_datetime(a['Datum'], format="%Y-%m-%dT%H:%M:%S")
a.set_index('Datum')
# Extract Year from Date(tried both lines)
a['Jahr'] = pd.DatetimeIndex(a['Datum']).year
#a['Jahr'] = a['Datum'].dt.year
print(a)
If a dataframe column contains a missing value (NaN), pandas stores the column as float, because NaN cannot be represented in a plain int column. This only affects integer columns; string (object) columns keep their dtype.
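If you want the year column to stay integer despite possible missing dates, pandas' nullable Int64 dtype is one way around this — a sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Datum": pd.to_datetime(["2001-01-01", None, "2002-01-01"])})

# The nullable Int64 dtype can hold missing values without upcasting to float
df["Jahr"] = df["Datum"].dt.year.astype("Int64")
print(df["Jahr"])
```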
I have the dataframe below, where the first column has no header, and I need to add 14 days to each value in that column. How can I do it?
L122.Y 5121.Y 110.Y
2021-08-30 14:00:00 0.0 0.0 35.778441
2021-08-30 15:00:00 0.0 0.0 35.741066
2021-08-30 16:00:00 0.0 0.0 35.737846
The first column is most likely the index; check with:
print(df.index)
If you need to convert it to a DatetimeIndex first, then add the days:
df.index = pd.to_datetime(df.index) + pd.Timedelta('14 days')
If it is already a DatetimeIndex:
df.index += pd.Timedelta('14 days')
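Put together on a tiny synthetic frame (values taken from the question; the string index is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"L122.Y": [0.0, 0.0, 0.0]},
                  index=["2021-08-30 14:00:00",
                         "2021-08-30 15:00:00",
                         "2021-08-30 16:00:00"])

# The index here is plain strings, so convert to datetimes before shifting
df.index = pd.to_datetime(df.index) + pd.Timedelta("14 days")
print(df.index)
```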
I have a table that has a column Months_since_Start_fin_year and a Date column. I need to add the number of months in the first column to the date in the second column.
DateTable['Date']=DateTable['First_month']+DateTable['Months_since_Start_fin_year'].astype("timedelta64[M]")
This works for month 0, but month 1 already has a shifted time, and from month 2 onwards the date itself is wrong.
In the output table the early months have the correct date, but for month 2, where I would expect June 1st, it shows May 31st.
It must be adding incomplete months, but I'm not sure how to fix it.
I have also tried
DateTable['Date']=DateTable['First_month']+relativedelta(months=DateTable['Months_since_Start_fin_year'])
but I get a type error that says
TypeError: cannot convert the series to <class 'int'>
My Months_since_Start_fin_year is type int32 and my First_month variable is datetime64[ns]
The problem with adding months as an offset to a date is that months are not all equally long (28-31 days), so you need pd.DateOffset, which handles that ambiguity for you. .astype("timedelta64[M]"), on the other hand, only gives you the average length of a month within a year (30 days 10:29:06).
Ex:
import pandas as pd
# a synthetic example since you didn't provide a mre
df = pd.DataFrame({'start_date': 7*['2017-04-01'],
                   'month_offset': range(7)})
# make sure we have datetime dtype
df['start_date'] = pd.to_datetime(df['start_date'])
# add month offset
df['new_date'] = df.apply(lambda row: row['start_date'] +
                          pd.DateOffset(months=row['month_offset']),
                          axis=1)
which would give you e.g.
df
start_date month_offset new_date
0 2017-04-01 0 2017-04-01
1 2017-04-01 1 2017-05-01
2 2017-04-01 2 2017-06-01
3 2017-04-01 3 2017-07-01
4 2017-04-01 4 2017-08-01
5 2017-04-01 5 2017-09-01
6 2017-04-01 6 2017-10-01
You can find similar examples here on SO, e.g. Add months to a date in Pandas. I only modified the answer there by using an apply to be able to take the months offset from one of the DataFrame's columns.
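If, as in the synthetic example, the dates are always month starts, a vectorized alternative that avoids the row-wise apply is to go through monthly periods — a sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({'start_date': pd.to_datetime(7 * ['2017-04-01']),
                   'month_offset': range(7)})

# Period arithmetic adds whole months; to_timestamp() returns the month start
df['new_date'] = (df['start_date'].dt.to_period('M')
                  + df['month_offset']).dt.to_timestamp()
print(df)
```

Unlike pd.DateOffset, this discards any day-of-month information, which is why it only suits month-start dates.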
My data set contains multiple columns of sales-related data. I have ORDEREDDATE and SHIPPINGDAYS in the DataFrame. I want to add a new column named DELIVEREDDATE in the dataset.
Current DataFrame
ORDEREDDATE SHIPPINGDAYS
2018-5-13 6
2017-8-24 4
2018-6-1 2
Expected output
ORDEREDDATE SHIPPINGDAYS DELIVEREDDATE
2018-5-13 6 2018-5-19
2017-8-24 4 2017-8-28
2018-6-1 2 2018-6-3
Types
ORDEREDDATE object
SHIPPINGDAYS object
Attempt to solve
df1['DELIVERYDATE'] = (datetime.datetime.strptime(df1['ORDEREDDATE'].astype(str), '%Y-%m-%d') + datetime.timedelta(df1['SHIPPINGDAYS'].astype(str).astype(int))
Here's a way to do it:
# make sure types are correct format
df['ORDEREDDATE'] = pd.to_datetime(df['ORDEREDDATE'])
df['SHIPPINGDAYS'] = df['SHIPPINGDAYS'].astype(int)
df['DELIVEREDDATE'] = df.apply(lambda x: x['ORDEREDDATE'] + pd.Timedelta(days=x['SHIPPINGDAYS']),
                               axis=1)
ORDEREDDATE SHIPPINGDAYS DELIVEREDDATE
0 2018-05-13 6 2018-05-19
1 2017-08-24 4 2017-08-28
2 2018-06-01 2 2018-06-03
First off, you need to transform the column into a datetime dtype:
df1['ORDEREDDATE'] = pd.to_datetime(df1['ORDEREDDATE'])
Then define the new column while also turning the int values from SHIPPINGDAYS into timedelta objects. That way the two can be summed, returning the desired output:
df1['DELIVEREDDATE'] = df1['ORDEREDDATE'] + df1['SHIPPINGDAYS'].astype(int).apply(lambda x: pd.Timedelta(x, unit='D'))
Output:
ORDEREDDATE SHIPPINGDAYS DELIVEREDDATE
0 2018-05-13 6 2018-05-19
1 2017-08-24 4 2017-08-28
2 2018-06-01 2 2018-06-03
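A fully vectorized variant (no apply at all) uses pd.to_timedelta on the whole column — a sketch assuming the same column names and string dtypes as in the question:

```python
import pandas as pd

df = pd.DataFrame({'ORDEREDDATE': ['2018-5-13', '2017-8-24', '2018-6-1'],
                   'SHIPPINGDAYS': ['6', '4', '2']})

# Convert both columns, then let pandas add datetime + timedelta elementwise
df['DELIVEREDDATE'] = (pd.to_datetime(df['ORDEREDDATE'])
                       + pd.to_timedelta(df['SHIPPINGDAYS'].astype(int), unit='D'))
print(df)
```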
Because the timedelta isn't getting the unit you think it is.
Initialize the timedelta with the named days argument (timedelta(days=n)) rather than relying on positional or default units — pd.Timedelta, for example, treats a bare integer as nanoseconds. Note also that datetime.strptime works on a single string, not on a whole Series.
Also, you end up with a datetime object, so you need to format it the way you want after the calculation is done.
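To make the units concrete (dates made up for the sketch): datetime.timedelta takes days as its first positional argument, while pd.Timedelta interprets a bare integer as nanoseconds, so naming the unit is the safe habit:

```python
import datetime
import pandas as pd

d = datetime.datetime(2018, 5, 13)

print(d + datetime.timedelta(days=6))   # six days later
print(d + pd.Timedelta(6, unit='D'))    # same result, unit named explicitly
print(pd.Timedelta(6))                  # bare integer: 6 *nanoseconds*
```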
I'm having this strange error when converting a datetime column.
This is the offending line of code:
data['date'] = data['datetime'].map(lambda x:datetime.utcfromtimestamp(x/1000))
To make things more interesting, this works:
datetime.utcfromtimestamp(data.datetime.max()/1000)
So the max value can be converted, but for some other value I get a "value too large for defined data type" error.
Thanks for the help!
In Pandas we can do it this way:
data['date'] = data['datetime'].astype(np.int64) // 10**9
that gives us a number of seconds since 1970-01-01 00:00:00 UTC.
If you want/need to get # of milliseconds:
data['date'] = data['datetime'].astype(np.int64) // 10**6
Demo:
In [15]: data = pd.DataFrame({'datetime':pd.date_range('2000-01-01', freq='99D', periods=10)})
In [16]: data
Out[16]:
datetime
0 2000-01-01
1 2000-04-09
2 2000-07-17
3 2000-10-24
4 2001-01-31
5 2001-05-10
6 2001-08-17
7 2001-11-24
8 2002-03-03
9 2002-06-10
In [17]: data['date'] = data['datetime'].astype(np.int64) // 10**9
In [18]: data
Out[18]:
datetime date
0 2000-01-01 946684800
1 2000-04-09 955238400
2 2000-07-17 963792000
3 2000-10-24 972345600
4 2001-01-31 980899200
5 2001-05-10 989452800
6 2001-08-17 998006400
7 2001-11-24 1006560000
8 2002-03-03 1015113600
9 2002-06-10 1023667200
You probably have a NaN in your datetime column: arithmetic with NaN won't raise (x/1000 simply returns NaN), but utcfromtimestamp cannot convert it and throws the "value too large for defined data type" error. max skips NaN, which is why that call succeeds.
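One NaN-tolerant way to do the conversion — a sketch with made-up millisecond timestamps — is pd.to_datetime with unit='ms', which maps NaN to NaT instead of raising:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'datetime': [1_300_000_000_000, np.nan, 1_400_000_000_000]})

# NaN rows become NaT rather than blowing up the conversion
data['date'] = pd.to_datetime(data['datetime'], unit='ms')
print(data)
```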