Dataframe questions about read excel - python

I can't see the result...
My result is 0 and it should be 824
import pandas as pd
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
dateindextime = data.set_index("timestamp")
rango = dateindextime.loc["2011-08-20":"2008-05-15"]
print(len(rango))
If I do
print(rango)
output:
Empty DataFrame Columns: [open, high, low, close, adjusted_close, volume] Index: []

It's hard to tell without the AAPL.xlsx dataset, but my guess is that you need to convert the "timestamp" column to datetime first using pd.to_datetime. From there you can slice on datetime values rather than on strings, which is what your code above does. Note also that your slice runs from the later date to the earlier one; on an index sorted in ascending order that returns an empty frame. If you post the AAPL.xlsx dataset, I can dig deeper.
import pandas as pd
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
# parse the "timestamp" column into real datetimes
data["datetime_timestamp"] = pd.to_datetime(data["timestamp"])
# index by date and sort ascending so range slicing is well defined
dateindextime = data.set_index("datetime_timestamp").sort_index()
# slice from the earlier date to the later one
ti = pd.Timestamp(2008, 5, 15)
tf = pd.Timestamp(2011, 8, 20)
rango = dateindextime.loc[ti:tf]
print(len(rango))
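Since the spreadsheet isn't available, here is a minimal sketch of the same idea on made-up data. The synthetic frame (daily rows, newest first, dates stored as strings) and the name by_date are assumptions for illustration, not the real AAPL file. It shows that once the index is a sorted DatetimeIndex, slicing from the earlier date to the later one returns rows, while the reversed bounds return nothing.

```python
import pandas as pd

# Hypothetical stand-in for AAPL.xlsx: daily rows, newest first, dates as strings.
dates = pd.date_range("2008-01-01", "2012-01-01", freq="D")[::-1]
data = pd.DataFrame({
    "timestamp": dates.strftime("%Y-%m-%d"),
    "close": range(len(dates)),
})

# Parse the strings, index by date, and sort so the index is ascending.
data["timestamp"] = pd.to_datetime(data["timestamp"])
by_date = data.set_index("timestamp").sort_index()

# Slice from the earlier date to the later one; both endpoints are included.
rango = by_date.loc["2008-05-15":"2011-08-20"]
print(len(rango))

# The reversed bounds select nothing on an ascending index.
reversed_rango = by_date.loc["2011-08-20":"2008-05-15"]
print(len(reversed_rango))
```

With a DatetimeIndex, plain date strings work as slice bounds too, so separate date objects are optional.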

Change date format

I have this code where I want to change the date format, but I only manage to change a single value and not the whole dataset.
Code:
import pandas as pd
df = pd.read_csv ("data_q_3.csv")
result = df.groupby ("Country/Region").max().sort_values(by='Confirmed', ascending=False)[:10]
pd.set_option('display.max_column', None)
print ("Covid 19 top 10 countries based on confirmed case:")
print(result)
from datetime import datetime
datetime.fromisoformat("2020-03-18T12:13:09").strftime("%Y-%m-%d-%H:%M")
Does anyone know how to fit the code so that the datetime changes in the whole dataset?
Thanks!
After looking at your problem for a while, I figured out how to change the values in the 'DateTime' column. The only problem that may arise is if the 'Country/Region' column has duplicate location names.
Editing the time is simple: all you have to do is use Python's string slicing. You can slice a string by typing
string = 'abcdefghijklmnopqrstuvwxyz'
print(string[0:5])
which will result in abcde, because the end index is excluded.
Below is the finished code.
import pandas as pd
# read the data
df = pd.read_csv("data_q_3.csv")
# top 10 countries by confirmed cases
result = df.groupby("Country/Region").max().sort_values(by='Confirmed', ascending=False)[:10]
pd.set_option('display.max_columns', None)
# you need a for loop to go through the whole column
for row in result.index:
    # get the current stored time
    time = result.at[row, 'DateTime']
    # reformat the time string by slicing the
    # string from index 0 to 10 (the date) and from index 11 to 16
    # (hours and minutes, skipping the 'T') and putting a dash in the middle
    time = time[0:10] + "-" + time[11:16]
    # store the new time in the result
    result.at[row, 'DateTime'] = time
# print result
print("Covid 19 top 10 countries based on confirmed case:")
print(result)
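As an alternative to looping and slicing, the whole column can usually be reformatted in one step by parsing it with pd.to_datetime and formatting it with .dt.strftime. This is only a sketch: the two-row frame below is made up, and it assumes the 'DateTime' column holds ISO strings like the "2020-03-18T12:13:09" example in the question.

```python
import pandas as pd

# Hypothetical sample standing in for data_q_3.csv.
df = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "DateTime": ["2020-03-18T12:13:09", "2020-03-19T08:05:00"],
})

# Parse the whole column once, then format every row in one call.
df["DateTime"] = pd.to_datetime(df["DateTime"]).dt.strftime("%Y-%m-%d-%H:%M")
print(df)
```

Note that after strftime the column holds strings again, so do any date arithmetic before formatting.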

Pandas Dataframe: New Column that uses Country if Province is empty, else use the Province

The meat of what I'm trying to do can be seen at the bottom.
Here's the dataset I'm using: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
What I want is to add to ['Names'] the data from ['Province/State'] if it isn't empty, else the data from ['Country/Region'].
I'm building an interactive heat map using plotly, and it works. But the problem is that there are multiple markers named "Canada" (one for each of the provinces there) and Greenland is named "Denmark," because in the CSV file, "Greenland" is under "Province/State" and "Denmark" is under "Country/Region."
import pandas as pd
import plotly.graph_objects as go
import requests
from datetime import date, timedelta
yesterday = date.today() - timedelta(days=1)
confirmed_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
deaths_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
yesterdays_date = yesterday.strftime('%-m/%d/%y')
confirmed = pd.read_csv(confirmed_url)
deaths = pd.read_csv(deaths_url)
confirmed.iloc[0]['Country/Region'] #Test
for place in deaths[['Province/State','Country/Region']]:
    if place is float:
        deaths_names.append('Country/Region')
    else:
        deaths_names.append('Province/State')
confirmed['Name'] = df(confirmed_names)
deaths['Name'] = df(deaths_names)
This worked:
def names_column(frame, lst): #Makes a new column called Name
    for i in range(len(frame)):
        if type(frame['Province/State'][i]) is str:
            lst.append(frame['Province/State'][i])
        else:
            lst.append(frame['Country/Region'][i])
    frame['Name'] = df(lst)
names_column(confirmed, confirmed_names)
names_column(deaths, deaths_names)
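For what it's worth, the same "province if present, else country" rule can be written without a loop using Series.fillna, which fills missing provinces from the country column row by row. A small sketch on made-up rows that mimic the CSV layout (the three sample rows are assumptions, not taken from the real file):

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking the layout of the JHU CSV.
confirmed = pd.DataFrame({
    "Province/State": ["Alberta", np.nan, "Greenland"],
    "Country/Region": ["Canada", "France", "Denmark"],
})

# Take the province where present, otherwise fall back to the country.
confirmed["Name"] = confirmed["Province/State"].fillna(confirmed["Country/Region"])
print(confirmed["Name"].tolist())
```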

Reading csv, saving in dataframe, putting if condition, not getting expected result

I'm reading a CSV, saving it into a dataframe and using an if condition, but I'm not getting the expected result.
My Python code is below:
import pandas as pd
import numpy as np
import datetime
import operator
from datetime import datetime
dt = datetime.now ( ).strftime ( '%m/%d/%Y' )
stockRules = pd.read_csv("C:\stock_rules.csv", dtype={"Product Currently Out of Stock": str}).drop_duplicates(subset="Product Currently Out of Stock", keep="last" )
pd.to_datetime(stockRules['FROMMONTH'], format='%m/%d/%Y')
pd.to_datetime(stockRules['TOMONTH'], format='%m/%d/%Y')
if stockRules['FROMMONTH'] <= dt and stockRules['TOMONTH'] >= dt:
    print(stockRules)
My csv file is below :
Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
I want to read the CSV file and print only the product numbers that meet the condition.
I played around with the code a bit and simplified it somewhat, but the idea behind the selection should still work the same:
dt = pd.Timestamp.now().normalize()  # today's date as a Timestamp rather than a formatted string
stockRules = pd.read_csv("data.csv", delimiter=";")
stockRules["FROMMONTH"] = pd.to_datetime(stockRules["FROMMONTH"], format="%m/%d/%Y")
stockRules["TOMONTH"] = pd.to_datetime(stockRules["TOMONTH"], format="%m/%d/%Y")
sub = stockRules[(stockRules["FROMMONTH"] <= dt) & (dt <= stockRules["TOMONTH"])]
print(sub["Productno"])
Notice that when using pd.to_datetime I am assigning the result of the operation to the original column, overriding whatever was in it before.
Hope this helps.
EDIT:
For my tests I changed the CSV to use ; as the delimiter, since I had trouble reading in the data as posted in your question. You may have to specify a different delimiter. For tabs, for example:
stockRules = pd.read_csv("data.csv", delimiter="\t")
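To make the selection above concrete: a plain if on a whole column fails because a Series has no single truth value, so the comparison has to be done row by row with a boolean mask combined with &. Here is a self-contained sketch using the three sample rows from the question. The tab separator, the fixed date standing in for "today", and the names stock_rules and mask are assumptions made so the example is reproducible.

```python
import io
import pandas as pd

# The sample rows from the question; the tab separator is an assumption.
csv_text = (
    "Productno\tFROMMONTH\tTOMONTH\n"
    "120041\t2/1/2019\t5/30/2019\n"
    "112940\t2/1/2019\t5/30/2019\n"
    "121700\t2/1/2019\t2/1/2019\n"
)
stock_rules = pd.read_csv(io.StringIO(csv_text), sep="\t")
for col in ("FROMMONTH", "TOMONTH"):
    stock_rules[col] = pd.to_datetime(stock_rules[col], format="%m/%d/%Y")

# A fixed, hypothetical "today" so the result does not depend on the run date.
today = pd.Timestamp("2019-03-15")

# Row-wise comparison: one boolean per row, combined with & (not `and`).
mask = (stock_rules["FROMMONTH"] <= today) & (today <= stock_rules["TOMONTH"])
selected = stock_rules.loc[mask, "Productno"].tolist()
print(selected)
```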

Python/Pandas convert string to time only

I have the following Pandas dataframe in Python 2.7.
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
dfc = pd.DataFrame(zip(*[trial_num,sail_rem_time]),columns=['Temp_Reading','Time_of_Sail'])
print dfc
The dataframe looks like this:
Temp_Reading Time_of_Sail
1 11:33:11
2 16:29:05
3 09:37:56
4 21:43:31
5 17:42:06
This dataframe comes from a *.csv file. I use Pandas to read in the *.csv file as a Pandas dataframe. When I use print dfc.dtypes, it shows me that the column Time_of_Sail has a datatype object. I would like to convert this column to datetime datatype BUT I only want the time part - I don't want the year, month, date.
I can try this:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
but the problem is that when I run print dfc.dtypes it still shows that the column Time_of_Sail is object.
Is there a way to convert this column into a datetime format that only has the time?
Additional Information:
To create the above dataframe and output, this also works:
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
data = [
[trial_num[0],sail_rem_time[0]],
[trial_num[1],sail_rem_time[1]],[trial_num[2],sail_rem_time[2]],
[trial_num[3],sail_rem_time[3]]
]
dfc = pd.DataFrame(data,columns=['Temp_Reading','Time_of_Sail'])
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
print dfc
print dfc.dtypes
These two lines:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
Can be written as:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'],format= '%H:%M:%S' ).dt.time
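Using to_timedelta, we can convert the strings to a timedelta type (timedelta64[ns]). Strings in HH:MM:SS form are parsed directly, so the unit argument is not needed (recent pandas versions raise an error if unit is passed together with string input):
dfc['Time_of_Sail'] = pd.to_timedelta(dfc['Time_of_Sail'])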
This seems to work:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S' ).apply(pd.Timestamp)
If anyone is searching for a more generalized answer try
dfc['Time_of_Sail']= pd.to_datetime(dfc['Time_of_Sail'])
If you just want a simple conversion you can do the below:
import datetime as dt
dfc.Time_of_Sail = dfc.Time_of_Sail.astype(dt.datetime)
or you could add a holder string to your time column as below, and then convert afterwards using an apply function:
dfc.Time_of_Sail = dfc.Time_of_Sail.apply(lambda x: '2016-01-01 ' + str(x))
dfc.Time_of_Sail = pd.to_datetime(dfc.Time_of_Sail).apply(lambda x: dt.datetime.time(x))
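To compare the approaches above side by side, here is a short sketch written for Python 3 and a recent pandas (an assumption; the question uses Python 2.7). It uses the first three rows of the question's data and shows why dtypes keeps reporting object for the .dt.time route: the column then holds plain datetime.time objects, for which pandas has no dedicated dtype, whereas to_timedelta gives a native timedelta column.

```python
import datetime
import pandas as pd

dfc = pd.DataFrame({
    "Temp_Reading": [1, 2, 3],
    "Time_of_Sail": ["11:33:11", "16:29:05", "09:37:56"],
})

# Option 1: datetime.time objects; the column dtype remains object.
as_time = pd.to_datetime(dfc["Time_of_Sail"], format="%H:%M:%S").dt.time
print(as_time.dtype)

# Option 2: durations since midnight, stored in a native timedelta dtype.
as_delta = pd.to_timedelta(dfc["Time_of_Sail"])
print(as_delta.dtype)
```

Which one fits depends on what you do next: the time objects are convenient for display and comparison, while the timedelta column supports vectorized arithmetic.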

Use Pandas GroupBy Columns in new DataFrame

I have a large temperature time series that I'm performing some functions on. I'm taking hourly observations and creating daily statistics. After I'm done with my calculations, I want to use the grouped year and Julian days that are objects in the Groupby ('aa' below) and the drangeT and drangeHI arrays that come out and make an entirely new DataFrame with those variables. Code is below:
import numpy as np
import scipy.stats as st
import pandas as pd
city = ['BUF']#,'PIT','CIN','CHI','STL','MSP','DET']
mons = np.arange(5,11,1)
for a in city:
    data = 'H:/Classwork/GEOG612/Project/'+a+'Data_cut.txt'
    df = pd.read_table(data,sep='\t')
    df['TempF'] = ((9./5.)*df['TempC'])+32.
    df1 = df.loc[df['Month'].isin(mons)]
    aa = df1.groupby(['Year','Julian'],as_index=False)
    maxT = aa.aggregate({'TempF':np.max})
    minT = aa.aggregate({'TempF':np.min})
    maxHI = aa.aggregate({'HeatIndex':np.max})
    minHI = aa.aggregate({'HeatIndex':np.min})
    drangeT = maxT - minT
    drangeHI = maxHI - minHI
    df2 = pd.DataFrame(data = {'Year':aa.Year,'Day':aa.Julian,'TRange':drangeT,'HIRange':drangeHI})
All variables in the df2 command are of length 8250, but I get this error message when I run it:
ValueError: cannot copy sequence with size 3 to array axis with dimension 8250
Any suggestions are welcomed and appreciated. Thanks!
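One way this could be approached, sketched on made-up data since the original files aren't available: aggregate once so that each (Year, Julian) pair becomes a single row, then build the ranges from ordinary columns. A possible cause of the error is that aa.Year and aa.Julian are groupby objects rather than columns, and that with as_index=False each aggregate result is a three-column frame (Year, Julian, and the statistic), so the subtraction also subtracts the keys. The six sample readings and the intermediate name daily are assumptions.

```python
import pandas as pd

# Hypothetical hourly observations: two days with three readings each.
df1 = pd.DataFrame({
    "Year": [2000] * 6,
    "Julian": [121, 121, 121, 122, 122, 122],
    "TempF": [50.0, 60.0, 55.0, 40.0, 70.0, 65.0],
    "HeatIndex": [51.0, 63.0, 56.0, 41.0, 75.0, 66.0],
})

# One row per (Year, Julian) with named max/min columns.
daily = df1.groupby(["Year", "Julian"], as_index=False).agg(
    maxT=("TempF", "max"),
    minT=("TempF", "min"),
    maxHI=("HeatIndex", "max"),
    minHI=("HeatIndex", "min"),
)

# Daily ranges; Year and Julian are already ordinary columns of `daily`.
df2 = pd.DataFrame({
    "Year": daily["Year"],
    "Day": daily["Julian"],
    "TRange": daily["maxT"] - daily["minT"],
    "HIRange": daily["maxHI"] - daily["minHI"],
})
print(df2)
```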
