I have a Python script that cleans up a .csv before I append it to another data set. It is missing a couple of columns, so I have been trying to figure out how to use pandas to add the columns and fill the rows.
I currently have a column DiscoveredDate in the format 10/1/2017 12:49.
What I'm trying to do is take that column and, for anything in the date range 10/1/2016-10/1/2017, fill a new FedFY column with 2017, and likewise with 2018 for the following year.
Below is my current script minus a few different column cleanups.
import os
import re
import pandas as pd
import Tkinter
import numpy as np

outpath = os.path.join(os.getcwd(), "CSV Altered")

# TK asks user what file to assimilate
from Tkinter import Tk
from tkFileDialog import askopenfilename
Tk().withdraw()
filepath = askopenfilename()  # show an "Open" dialog box and return the path to the selected file

# Filepath is acknowledged and disseminated with the following totally human protocols
filenames = os.path.basename(filepath)
filename = [filenames]
for f in filename:
    name = f
    df = pd.read_csv(f)
    # Make Longitude values negative if they aren't already.
    df['Longitude'] = -df['Longitude'].abs()
    # Add Federal Fiscal Year Field (FedFY)
    df['FedFY'] = df['DiscoveredDate']
    df['FedFY'] = df['FedFY'].replace({df['FedFY'].date_range(10/1/2016 1:00,10/1/2017 1:00): "2017", df['FedFY'].date_range(10/1/2017 1:00, 10/1/2018 1:00): "2018"})
I also tried this but figured I was completely fudging it up.
for rows in df['FedFY']:
    if rows = df['FedFY'].date_range(10/1/2016 1:00, 10/1/2017 1:00):
        then df['FedFY'] = df['FedFY'].replace({rows : "2017"})
    elif df['FedFY'] = df['FedFY'].replace({rows : "2018"})
How should I go about this efficiently? Is it just my syntax messing me up? Or do I have it all wrong?
OK, thanks to DyZ I am making progress; however, I figured out a much simpler way to do it that handles all years.
Building on his np.where, I did the following:
from datetime import datetime
df['Date'] = pd.to_datetime(df['DiscoveredDate'])
df['CalendarYear'] = df['Date'].dt.year
df['Month'] = df.Date.dt.month
c = pd.to_numeric(df['CalendarYear'])
And here is the magic line.
df['FedFY'] = np.where(df['Month'] >= 10, c+1, c)
To mop up, I added a line to get it back into datetime format from numeric.
df['FedFY'] = (pd.to_datetime(df['FedFY'], format = '%Y')).dt.year
This is what really got me across the bridge: Create a column based off a conditional with pandas.
Edit: Forgot to mention the datetime import at the top for the .dt stuff.
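For reference, the same idea can be condensed into a couple of lines (a minimal sketch, assuming DiscoveredDate parses cleanly with pd.to_datetime and that a plain integer year is wanted for FedFY):
d = pd.to_datetime(df['DiscoveredDate'])
df['FedFY'] = d.dt.year + (d.dt.month >= 10)  # October through December roll into the next federal fiscal year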
If you are concerned only with these two FYs, you can compare your date directly to the start/end dates:
df["FedFY"] = np.where((df.DiscoveredDate < pd.to_datetime("10/1/2017")) &\
(df.DiscoveredDate > pd.to_datetime("10/1/2016")),
2017, 2018)
Any date before 10/1/2016 will be labeled incorrectly! (You can fix this by adding another np.where).
Make sure that the start/end dates are correctly included or not included (change < and/or > to <= and >=, if necessary).
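A sketch of what that extra np.where could look like, assuming DiscoveredDate is already a datetime column and that anything before 10/1/2016 belongs to FY2016 (adjust the labels if earlier fiscal years appear in the data):
df["FedFY"] = np.where(df.DiscoveredDate < pd.to_datetime("10/1/2016"), 2016,
              np.where(df.DiscoveredDate < pd.to_datetime("10/1/2017"), 2017, 2018))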
The available option expiration dates are below. How can I write code that pulls all of those dates instead of having to type each one out on a separate line?
2022-03-11, 2022-03-18, 2022-03-25, 2022-04-01, 2022-04-08, 2022-04-14, 2022-04-22, 2022-05-20, 2022-06-17, 2022-07-15, 2022-10-21, 2023-01-20, 2024-01-19
import yfinance as yf
gme = yf.Ticker("gme")
opt = gme.option_chain('2022-03-11')
print(opt)
First of all, as these dates have no regular pattern, you should create a list of the dates.
list1=['2022-03-11', '2022-03-18', '2022-03-25', '2022-04-01', '2022-04-08', '2022-04-14', '2022-04-22', '2022-05-20', '2022-06-17', '2022-07-15', '2022-10-21', '2023-01-20', '2024-01-19']
After you have created the list, you can set up the ticker just as you have done:
import yfinance as yf
gme = yf.Ticker("gme")
But now, since you want to have everything printed out, and I assume you will want to save it to a file for a better view (I have checked the output, and I personally prefer CSV for yfinance), you can do this:
for date in list1:
    df = gme.option_chain(date)
    df_call = df[0]
    df_put = df[1]
    df_call.to_csv(f'call_{date}.csv')
    df_put.to_csv(f'put_{date}.csv')
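If you would rather end up with just two files instead of one pair per expiration, a small sketch along the same lines (assuming pandas is imported as pd and reusing list1 and gme from above) tags each frame with its date and concatenates:
import pandas as pd

all_calls, all_puts = [], []
for date in list1:
    chain = gme.option_chain(date)
    all_calls.append(chain[0].assign(expiration=date))  # calls for this expiration
    all_puts.append(chain[1].assign(expiration=date))   # puts for this expiration
pd.concat(all_calls).to_csv('all_calls.csv', index=False)
pd.concat(all_puts).to_csv('all_puts.csv', index=False)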
I have this code where I wish to change the date format, but I only manage to change one value and not the whole dataset.
Code:
import pandas as pd

df = pd.read_csv("data_q_3.csv")
result = df.groupby("Country/Region").max().sort_values(by='Confirmed', ascending=False)[:10]
pd.set_option('display.max_column', None)
print("Covid 19 top 10 countries based on confirmed case:")
print(result)

from datetime import datetime
datetime.fromisoformat("2020-03-18T12:13:09").strftime("%Y-%m-%d-%H:%M")
Does anyone know how to fix the code so that the datetime changes in the whole dataset?
Thanks!
After looking at your problem for a while, I figured out how to change the values in the 'DateTime' column. The only problem that may arise is if the 'Country/Region' column has duplicate location names.
Editing the time is simple, as all you have to do is make use of Python's slicing. You can slice a string by typing
string = 'abcdefghijklmnopqrstuvwxyz'
print(string[0:5])
which will result in abcde (the characters at indices 0 through 4).
Below is the finished code.
import pandas as pd

# read unknown data
df = pd.read_csv("data_q_3.csv")
# List of unknown data
result = df.groupby("Country/Region").max().sort_values(by='Confirmed', ascending=False)[:10]
pd.set_option('display.max_column', None)

# you need a for loop to go through the whole column
for row in result.index:
    # get the current stored time
    time = result.at[row, 'DateTime']
    # reformat the ISO time string by slicing out the date (indices 0 to 10)
    # and the hours and minutes (indices 11 to 16), joined with a dash
    time = time[0:10] + "-" + time[11:16]
    # store the new time in the result
    result.at[row, 'DateTime'] = time

# print result
print("Covid 19 top 10 countries based on confirmed case:")
print(result)
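For reference, if the DateTime values really are ISO strings like the one in the question, the whole column can also be reformatted in one vectorized step instead of a loop (a sketch, reusing the result frame from above):
result['DateTime'] = pd.to_datetime(result['DateTime']).dt.strftime('%Y-%m-%d-%H:%M')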
The meat of what I'm trying to do can be seen at the bottom.
Here's the dataset I'm using: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
What I want is to fill a ['Name'] column with the data from ['Province/State'] if it isn't empty, else with the data from ['Country/Region'].
I'm building an interactive heat map using plotly, and it works. But the problem is, there are multiple markers named "Canada" (one for each of its provinces), and Greenland is named "Denmark," because in the CSV file "Greenland" is under "Province/State" and "Denmark" is under "Country/Region."
import pandas as pd
import plotly.graph_objects as go
import requests
from datetime import date, timedelta

yesterday = date.today() - timedelta(days=1)

confirmed_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
deaths_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
yesterdays_date = yesterday.strftime('%-m/%d/%y')

confirmed = pd.read_csv(confirmed_url)
deaths = pd.read_csv(deaths_url)

confirmed.iloc[0]['Country/Region']  # Test

for place in deaths[['Province/State','Country/Region']]:
    if place is float:
        deaths_names.append('Country/Region')
    else:
        deaths_names.append('Province/State')

confirmed['Name'] = df(confirmed_names)
deaths['Name'] = df(deaths_names)
This worked:
confirmed_names = []
deaths_names = []

def names_column(frame, lst):  # Makes a new column called Name
    for i in range(len(frame)):
        if type(frame['Province/State'][i]) is str:
            lst.append(frame['Province/State'][i])
        else:
            lst.append(frame['Country/Region'][i])
    frame['Name'] = lst  # assign the collected names back to the frame

names_column(confirmed, confirmed_names)
names_column(deaths, deaths_names)
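For comparison, the same Name column can usually be built without a loop; a minimal sketch, assuming the empty Province/State cells come in as NaN (which is also why the str type check above works):
confirmed['Name'] = confirmed['Province/State'].fillna(confirmed['Country/Region'])
deaths['Name'] = deaths['Province/State'].fillna(deaths['Country/Region'])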
I am looking for an IronPython script that can reverse-sort a date column from a report/data flow in Spotfire 7.8, and it has to select the latest/current date in that column (filter type: list box) within a text area visualization.
Here's what I have tried to reverse-sort the dates, but it doesn't select the latest date (maybe a little tweak somewhere in the script below is needed, I guess). If not, please provide a new script that resolves this issue.
IronPython script:
from System.Reflection import Assembly
from Spotfire.Dxp.Data.Collections import *
from System.Runtime.Serialization import ISerializable
from System.Collections import IComparer
from System.Collections.Generic import IComparer

column = Document.Data.Tables['DATA_TABLE'].Columns['COLUMN_NAME']
values = column.RowValues.GetEnumerator()

myValues = []
for val in values:
    if val.HasValidValue:  # exclude empty values
        myValues.Add(val.ValidValue)

myValues.sort(reverse=True)
column.Properties.SetCustomSortOrder(myValues)
I have found the answer to my question; hope this helps other users as well. Below is the IronPython script that reverse-sorts the date order (descending) and selects the latest date in the list box filter.
from System.Reflection import Assembly
from Spotfire.Dxp.Data.Collections import *
from System.Runtime.Serialization import ISerializable
from System.Collections import IComparer
from System.Collections.Generic import IComparer
import Spotfire.Dxp.Application.Filters as filters  # assumed import, needed for filters.ListBoxFilter below

column = Document.Data.Tables['TABLE_NAME'].Columns['COLUMN_NAME']
values = column.RowValues.GetEnumerator()

# Val is a value inside the Column specified in the GetFilter
# print values
myValues = []
for Val in values:
    if Val.HasValidValue:  # exclude empty values
        myValues.Add(Val.ValidValue)
        print Val.ValidValue

myValues.sort(reverse=True)
column.Properties.SetCustomSortOrder(myValues)

myPanel = Document.ActivePageReference.FilterPanel
myFilter = myPanel.TableGroups[0].GetFilter("COLUMN_NAME")

# ListBoxFilter
# myFilter.FilterReference.TypeId = FilterTypeIdentifiers.ListBoxFilter
listBoxFilter = myFilter.FilterReference.As[filters.ListBoxFilter]()
listBoxFilter.IncludeAllValues = False
listBoxFilter.SetSelection(myValues[0])

# uncheck all
# listBoxFilter.IncludeEmpty = False  # make sure to clear the empty values
# for value in listBoxFilter.Values:
#     listBoxFilter.Uncheck(value)
# listBoxFilter.Check(myValues)
I'm reading a CSV, saving it into a DataFrame, and using an if condition, but I'm not getting the expected result.
My Python code is below:
import pandas as pd
import numpy as np
import datetime
import operator
from datetime import datetime

dt = datetime.now().strftime('%m/%d/%Y')
stockRules = pd.read_csv("C:\stock_rules.csv", dtype={"Product Currently Out of Stock": str}).drop_duplicates(subset="Product Currently Out of Stock", keep="last")
pd.to_datetime(stockRules['FROMMONTH'], format='%m/%d/%Y')
pd.to_datetime(stockRules['TOMONTH'], format='%m/%d/%Y')
if stockRules['FROMMONTH'] <= dt and stockRules['TOMONTH'] >= dt:
    print(stockRules)
My CSV file is below:
Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
I want to read the CSV file and print only the product numbers that meet the condition.
I played around with the code a bit and simplified it somewhat, but the idea behind the selection should still work the same:
dt = datetime.now().strftime("%m/%d/%Y")
stockRules = pd.read_csv("data.csv", delimiter=";")
stockRules["FROMMONTH"] = pd.to_datetime(stockRules["FROMMONTH"], format="%m/%d/%Y")
stockRules["TOMONTH"] = pd.to_datetime(stockRules["TOMONTH"], format="%m/%d/%Y")
sub = stockRules[(stockRules["FROMMONTH"] <= dt) & (dt <= stockRules["TOMONTH"])]
print(sub["Productno"])
Notice that when using pd.to_datetime I am assigning the result of the operation to the original column, overriding whatever was in it before.
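As a side note, the comparison also works against a real timestamp instead of a formatted string; a small sketch of that variant, reusing the converted FROMMONTH/TOMONTH columns from above:
today = pd.Timestamp.today().normalize()  # midnight today, as a Timestamp
sub = stockRules[(stockRules["FROMMONTH"] <= today) & (today <= stockRules["TOMONTH"])]
print(sub["Productno"])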
Hope this helps.
EDIT:
For my tests I changed the CSV to use ; as the delimiter, since I had trouble reading the data you provided in your question. It might be that you will have to specify another delimiter; for tabs, for example:
stockRules = pd.read_csv("data.csv", delimiter="\t")