I'm reading a CSV file into a DataFrame and filtering it with an if condition, but I'm not getting the expected result.
My Python code is below:
import pandas as pd
import numpy as np
import datetime
import operator
from datetime import datetime
dt = datetime.now ( ).strftime ( '%m/%d/%Y' )
stockRules = pd.read_csv("C:\stock_rules.csv", dtype={"Product Currently Out of Stock": str}).drop_duplicates(subset="Product Currently Out of Stock", keep="last" )
pd.to_datetime(stockRules['FROMMONTH'], format='%m/%d/%Y')
pd.to_datetime(stockRules['TOMONTH'], format='%m/%d/%Y')
if stockRules['FROMMONTH'] <= dt and stockRules['TOMONTH'] >= dt:
    print(stockRules)
My CSV file is below:
Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
I want to read the CSV file and print only the product numbers that meet the condition.
I played around with the code a bit and simplified it somewhat, but the idea behind the selection should still work the same:
dt = datetime.now().strftime("%m/%d/%Y")
stockRules = pd.read_csv("data.csv", delimiter=";")
stockRules["FROMMONTH"] = pd.to_datetime(stockRules["FROMMONTH"], format="%m/%d/%Y")
stockRules["TOMONTH"] = pd.to_datetime(stockRules["TOMONTH"], format="%m/%d/%Y")
sub = stockRules[(stockRules["FROMMONTH"] <= dt) & (dt <= stockRules["TOMONTH"])]
print(sub["Productno"])
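Notice that when using pd.to_datetime I assign the result back to the original column, overwriting whatever was in it before. Your code discarded the converted values, so the columns stayed as strings. Also, the selection is done with a boolean mask combined with & instead of an if statement, because the comparison yields one True/False value per row rather than a single boolean.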
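For anyone who wants to try the selection without a file on disk, here is a minimal sketch. It assumes the CSV is whitespace-separated as shown in the question, and it uses a fixed reference date (a made-up value) in place of today's date so the result is reproducible. The names csv_text, rules, today, mask and active are only for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the question's CSV (whitespace-separated here).
csv_text = """Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
"""

rules = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

# Assign the parsed columns back; pd.to_datetime does not modify in place.
for col in ("FROMMONTH", "TOMONTH"):
    rules[col] = pd.to_datetime(rules[col], format="%m/%d/%Y")

# Assumed reference date standing in for "today".
today = pd.Timestamp("2019-03-15")

# Element-wise & builds one True/False per row.
mask = (rules["FROMMONTH"] <= today) & (today <= rules["TOMONTH"])
active = rules.loc[mask, "Productno"]
print(active.tolist())  # [120041, 112940]
```

To use the real current date, pd.Timestamp.now().normalize() could replace the fixed value.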
Hope this helps.
EDIT:
For my tests I changed the CSV to use ; as the delimiter, since I had trouble reading in the data as posted in your question. You may have to specify a different delimiter. For tabs, for example:
stockRules = pd.read_csv("data.csv", delimiter="\t")
I can't see the result...
My result is 0 and it should be 824
import pandas as pd
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
dateindextime = data.set_index("timestamp")
rango = dateindextime.loc["2011-08-20":"2008-05-15"]
print(len(rango))
If I do
print(rango)
output:
Empty DataFrame Columns: [open, high, low, close, adjusted_close, volume] Index: []
It's hard to tell without the AAPL.xlsx dataset, but my guess is that you need to convert the "timestamp" column to datetime first using pd.to_datetime. You can then slice on datetime values rather than on strings, which is what your code does. Note also that the slice below puts the earlier date first. If you post the AAPL.xlsx dataset, I can dig deeper.
import pandas as pd
apple = r'C:\Users\User\Downloads\AAPL.xlsx'
data = pd.read_excel(apple)
# parse the "timestamp" column into real datetimes
data["datetime_timestamp"] = pd.to_datetime(data["timestamp"])
# index by the parsed column; sorting ascending is an extra precaution
# so that an earlier:later label slice is well defined
dateindextime = data.set_index("datetime_timestamp").sort_index()
ti = pd.Timestamp(2008, 5, 15)
tf = pd.Timestamp(2011, 8, 20)
rango = dateindextime.loc[ti:tf]
print(len(rango))
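Since the AAPL.xlsx file isn't available, here is a small sketch with made-up rows (the close values and dates are invented, and only two of the columns are reproduced). It shows that on an ascending DatetimeIndex an earlier:later slice returns rows, while the reversed bounds from the question return an empty frame. Whether that is the actual cause of the 0 in the question is only a guess.

```python
import pandas as pd

# Made-up data; not the real AAPL.xlsx contents.
data = pd.DataFrame({
    "timestamp": ["2011-08-22", "2011-08-19", "2008-05-15", "2008-05-14"],
    "close": [356.4, 356.0, 189.7, 186.3],
})

data["timestamp"] = pd.to_datetime(data["timestamp"])
# Sort ascending so label slicing is well defined.
by_date = data.set_index("timestamp").sort_index()

# Earlier bound first: both end points are inclusive.
rango = by_date.loc["2008-05-15":"2011-08-20"]
print(len(rango))  # 2

# Later bound first on an ascending index selects nothing.
reversed_rango = by_date.loc["2011-08-20":"2008-05-15"]
print(len(reversed_rango))  # 0
```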
I have a dataframe:
id timestamp
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:09:59"
I need to turn the timestamp into an integer so that I can iterate over conditions. It should look like this:
id timestamp
1 20250802190859
1 20250802190859
1 20250802190959
You can convert the column using pandas' string methods:
df = pd.DataFrame({'id':[1,1,1],'timestamp':["2025-08-02 19:08:59",
                                             "2025-08-02 19:08:59",
                                             "2025-08-02 19:09:59"]})
pd.set_option('display.float_format', lambda x: '%.3f' % x)
# regex=True is needed so the pattern is treated as a regular expression
df['timestamp'] = df['timestamp'].str.replace(r'[-\s:]', '', regex=True).astype('float64')
>>> df
id timestamp
0 1 20250802190859.000
1 1 20250802190859.000
2 1 20250802190959.000
Have you tried opening the file, skipping the first line (or better: validating that it contains the expected header fields) and, for each line, splitting it at the first whitespace? The second part, e.g. "2025-08-02 19:08:59", can be parsed using datetime.fromisoformat(). You can then turn the datetime object back into a string using datetime.strftime(format) with e.g. format = '%Y%m%d%H%M%S'. Note that there is no "milliseconds" format in strftime, though. You could use %f for microseconds.
Note: if datetime.fromisoformat() fails to parse the dates, try datetime.strptime(date_string, format) with a different format, e.g. format = '%Y-%m-%d %H:%M:%S'.
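The steps above could be sketched roughly like this. The lines list is a hypothetical stand-in for the file contents, and the surrounding double quotes are assumed to be part of each line, as in the question's listing.

```python
from datetime import datetime

# Hypothetical stand-in for the lines read from the file.
lines = [
    'id timestamp',
    '1 "2025-08-02 19:08:59"',
    '1 "2025-08-02 19:09:59"',
]

rows = []
for line in lines[1:]:  # skip the header line
    # Split at the first run of whitespace only.
    id_part, ts_part = line.split(maxsplit=1)
    # Drop the surrounding quotes before parsing.
    parsed = datetime.fromisoformat(ts_part.strip('"'))
    rows.append((int(id_part), int(parsed.strftime("%Y%m%d%H%M%S"))))

print(rows)  # [(1, 20250802190859), (1, 20250802190959)]
```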
You can use the solutions provided in the post "How to turn timestamp into float number?" and loop through the dataframe.
Assuming you have already imported pandas and have a dataframe df, the additional code looks like this:
import re
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    # strip every non-digit character; .loc avoids chained assignment
    df1.loc[x, 0] = re.sub(r'\D', '', df[0][x])
This way you do not modify the original dataframe df and get the desired output in a new dataframe df1.
Here is the full code that I tried (including creation of the first dataframe), which might help clear up any confusion:
import pandas as pd
import re
l = ["2025-08-02 19:08:59", "2025-08-02 19:08:59", "2025-08-02 19:09:59"]
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    # strip every non-digit character; .loc avoids chained assignment
    df1.loc[x, 0] = re.sub(r'\D', '', df[0][x])
I have this code where I want to change the date format, but I only manage to change one value and not the whole dataset.
Code:
import pandas as pd
df = pd.read_csv ("data_q_3.csv")
result = df.groupby ("Country/Region").max().sort_values(by='Confirmed', ascending=False)[:10]
pd.set_option('display.max_column', None)
print ("Covid 19 top 10 countries based on confirmed case:")
print(result)
from datetime import datetime
datetime.fromisoformat("2020-03-18T12:13:09").strftime("%Y-%m-%d-%H:%M")
Does anyone know how to adapt the code so that the datetime format changes across the whole dataset?
Thanks!
After looking at your problem for a while, I figured out how to change the values in the 'DateTime' column. The only problem that may arise is if the 'Country/Region' column has duplicate location names.
Editing the time is simple, as all you have to do is make use of Python's slicing. You can slice a string by typing
string = 'abcdefghijklmnopqrstuvwxyz'
print(string[0:5])
which will result in abcde (the end index is exclusive).
Below is the finished code.
import pandas as pd
# read the data
df = pd.read_csv("data_q_3.csv")
# top 10 countries by confirmed cases
result = df.groupby("Country/Region").max().sort_values(by='Confirmed', ascending=False)[:10]
pd.set_option('display.max_column', None)
# you need a for loop to go through the whole column
for row in result.index:
    # get the current stored time
    # (assumed to look like "2020-03-18T12:13:09", as in the question)
    time = result.at[row, 'DateTime']
    # reformat the time string by slicing the
    # string from index 0 to 10 (the date), and from index 11 to 16
    # (hours and minutes, skipping the 'T' at index 10)
    # and putting a dash in the middle
    time = time[0:10] + "-" + time[11:16]
    # store the new time in the result
    result.at[row, 'DateTime'] = time
# print result
print("Covid 19 top 10 countries based on confirmed case:")
print(result)
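As an alternative to the loop, the whole column can probably be converted in one step with pd.to_datetime and .dt.strftime. This is only a sketch: the sample rows are invented because data_q_3.csv isn't shown, and it assumes the DateTime values are ISO-style strings like the one in the question.

```python
import pandas as pd

# Hypothetical sample; the real data_q_3.csv is not available.
df = pd.DataFrame({
    "Country/Region": ["US", "Italy"],
    "Confirmed": [7783, 35713],
    "DateTime": ["2020-03-18T12:13:09", "2020-03-18T17:33:05"],
})

# Parse every value, then format every value, without an explicit loop.
df["DateTime"] = pd.to_datetime(df["DateTime"]).dt.strftime("%Y-%m-%d-%H:%M")
print(df["DateTime"].tolist())  # ['2020-03-18-12:13', '2020-03-18-17:33']
```

This reuses the format string from the question, so the output matches what the single fromisoformat call produced there.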
I have a Python script that cleans up a .csv before I append it to another data set. The file is missing a couple of columns, so I have been trying to figure out how to use pandas to add a column and fill its rows.
I currently have a column DiscoveredDate in a format like 10/1/2017 12:49.
What I'm trying to do is add a column FedFY that is filled with 2017 for every row whose DiscoveredDate falls in the range 10/1/2016-10/1/2017, and likewise with 2018 for the following year.
Below is my current script minus a few different column cleanups.
import os
import re
import pandas as pd
import Tkinter
import numpy as np
outpath = os.path.join(os.getcwd(), "CSV Altered")
# TK asks user what file to assimilate
from Tkinter import Tk
from tkFileDialog import askopenfilename
Tk().withdraw()
filepath = askopenfilename() # show an "Open" dialog box and return the path to the selected file
#Filepath is acknowledged and disseminated with the following totally human protocols
filenames = os.path.basename(filepath)
filename = [filenames]
for f in filename:
name = f
df = pd.read_csv(f)
# Make Longitude values negative if they aren't already.
df['Longitude'] = - df['Longitude'].abs()
# Add Federal Fiscal Year Field (FedFY)
df['FedFY'] = df['DiscoveredDate']
df['FedFY'] = df['FedFY'].replace({df['FedFY'].date_range(10/1/2016 1:00,10/1/2017 1:00): "2017",df['FedFY'].date_range(10/1/2017 1:00, 10/1/2018 1:00): "2018"})
I also tried this but figured I was completely fudging it up.
for rows in df['FedFY']:
if rows = df['FedFY'].date_range(10/1/2016 1:00, 10/1/2017 1:00):
then df['FedFY'] = df['FedFY'].replace({rows : "2017"})
elif df['FedFY'] = df['FedFY'].replace({rows : "2018"})
How should I go about this efficiently? Is it just my syntax messing me up? Or do I have it all wrong?
[Edited for clarity in title and throughout.]
OK, thanks to DyZ I am making progress; however, I figured out a much simpler way that handles all years.
Building on his np.where, I did the following:
from datetime import datetime
df['Date'] = pd.to_datetime(df['DiscoveredDate'])
df['CalendarYear'] = df['Date'].dt.year
df['Month'] = df.Date.dt.month
c = pd.to_numeric(df['CalendarYear'])
And here is the magic line.
df['FedFY'] = np.where(df['Month'] >= 10, c+1, c)
To mop up, I added a line to get it back into datetime format from numeric.
df['FedFY'] = (pd.to_datetime(df['FedFY'], format = '%Y')).dt.year
This is what really got me across the bridge: "Create a column based off a conditional with pandas".
Edit: Forgot to mention the datetime import (strictly speaking, the .dt accessor is part of pandas and works without it).
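Put together, the fiscal-year rule from this answer (month 10 or later belongs to the next year) can be sketched on a few made-up dates. The sample values and the dates variable are illustrative only, and the format string assumes the 10/1/2017 12:49 layout from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample rows in the question's DiscoveredDate layout.
df = pd.DataFrame(
    {"DiscoveredDate": ["10/1/2017 12:49", "9/30/2017 23:59", "3/15/2016 08:00"]}
)

dates = pd.to_datetime(df["DiscoveredDate"], format="%m/%d/%Y %H:%M")

# October through December roll over into the next fiscal year.
df["FedFY"] = np.where(dates.dt.month >= 10, dates.dt.year + 1, dates.dt.year)
print(df["FedFY"].tolist())  # [2018, 2017, 2016]
```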
If you are concerned only with these two FYs, you can compare your date directly to the start/end dates:
df["FedFY"] = np.where((df.DiscoveredDate < pd.to_datetime("10/1/2017")) &
                       (df.DiscoveredDate > pd.to_datetime("10/1/2016")),
                       2017, 2018)
Any date before 10/1/2016 will be labeled incorrectly! (You can fix this by adding another np.where).
Make sure that the start/end dates are correctly included or not included (change < and/or > to <= and >=, if necessary).
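A sketch of the "another np.where" fix mentioned above, assuming DiscoveredDate has already been converted with pd.to_datetime. The boundary choices (start date included, end date excluded) and the -1 sentinel for dates outside both years are my assumptions, so adjust them as needed.

```python
import numpy as np
import pandas as pd

# Made-up dates, already parsed to datetime64.
df = pd.DataFrame(
    {"DiscoveredDate": pd.to_datetime(
        ["2015-12-31", "2016-10-01", "2017-09-30", "2017-10-01"]
    )}
)

start_2017 = pd.Timestamp("2016-10-01")
start_2018 = pd.Timestamp("2017-10-01")
end_2018 = pd.Timestamp("2018-10-01")

d = df["DiscoveredDate"]
df["FedFY"] = np.where(
    (d >= start_2017) & (d < start_2018), 2017,
    # Second np.where handles FY2018; -1 is an arbitrary "neither year" marker.
    np.where((d >= start_2018) & (d < end_2018), 2018, -1),
)
print(df["FedFY"].tolist())  # [-1, 2017, 2017, 2018]
```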
I have the following Pandas dataframe in Python 2.7.
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
dfc = pd.DataFrame(zip(*[trial_num,sail_rem_time]),columns=['Temp_Reading','Time_of_Sail'])
print dfc
The dataframe looks like this:
Temp_Reading Time_of_Sail
1 11:33:11
2 16:29:05
3 09:37:56
4 21:43:31
5 17:42:06
This dataframe comes from a *.csv file. I use Pandas to read in the *.csv file as a Pandas dataframe. When I use print dfc.dtypes, it shows me that the column Time_of_Sail has a datatype object. I would like to convert this column to datetime datatype BUT I only want the time part - I don't want the year, month, date.
I can try this:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
but the problem is that when I run print dfc.dtypes it still shows that the column Time_of_Sail is object.
Is there a way to convert this column into a datetime format that only has the time?
Additional Information:
To create the above dataframe and output, this also works:
import pandas as pd
trial_num = [1,2,3,4,5]
sail_rem_time = ['11:33:11','16:29:05','09:37:56','21:43:31','17:42:06']
data = [
    [trial_num[0],sail_rem_time[0]],
    [trial_num[1],sail_rem_time[1]],
    [trial_num[2],sail_rem_time[2]],
    [trial_num[3],sail_rem_time[3]],
    [trial_num[4],sail_rem_time[4]]
]
dfc = pd.DataFrame(data,columns=['Temp_Reading','Time_of_Sail'])
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
print dfc
print dfc.dtypes
These two lines:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'])
dfc['Time_of_Sail'] = [time.time() for time in dfc['Time_of_Sail']]
Can be written as:
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S').dt.time
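One thing worth checking with this approach: as far as I know, plain pandas has no time-of-day-only dtype, so the column holds Python datetime.time objects and dtypes still reports object. A quick sketch on two of the sample values (times and as_time are illustrative names):

```python
import pandas as pd

times = pd.Series(["11:33:11", "16:29:05"])

# .dt.time extracts datetime.time objects from the parsed datetimes.
as_time = pd.to_datetime(times, format="%H:%M:%S").dt.time

print(as_time.dtype)          # object
print(type(as_time.iloc[0]))  # <class 'datetime.time'>
```

So an object dtype here does not mean the conversion failed; the individual values are proper time objects.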
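Using to_timedelta, we can convert the 'HH:MM:SS' strings to a timedelta dtype (timedelta64[ns]). The unit argument (seconds, minutes, etc.) applies only to numeric input and must be left out when the input contains strings:
dfc['Time_of_Sail'] = pd.to_timedelta(dfc['Time_of_Sail'])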
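A small sketch of what the timedelta route gives you, using two of the sample values. Unlike .dt.time, the result is a native pandas dtype, so arithmetic and comparisons are vectorized. The variable names are illustrative.

```python
import pandas as pd

times = pd.Series(["11:33:11", "16:29:05"])

# Strings in HH:MM:SS form are parsed as durations since midnight.
as_delta = pd.to_timedelta(times)

print(as_delta.dtype)
print(as_delta.dt.total_seconds().tolist())  # [41591.0, 59345.0]
```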
This seems to work (the result is a datetime column, with the date part defaulting to 1900-01-01):
dfc['Time_of_Sail'] = pd.to_datetime(dfc['Time_of_Sail'], format='%H:%M:%S' ).apply(pd.Timestamp)
If anyone is searching for a more generalized answer, try:
dfc['Time_of_Sail']= pd.to_datetime(dfc['Time_of_Sail'])
If you just want a simple conversion you can do the below:
import datetime as dt
dfc.Time_of_Sail = dfc.Time_of_Sail.astype(dt.datetime)
or you could add a holder string to your time column as below, and then convert afterwards using an apply function:
dfc.Time_of_Sail = dfc.Time_of_Sail.apply(lambda x: '2016-01-01 ' + str(x))
dfc.Time_of_Sail = pd.to_datetime(dfc.Time_of_Sail).apply(lambda x: dt.datetime.time(x))