I've been stuck on this problem for two days. Below is the csv file.
import pandas as pd

df = pd.read_csv('/14100017.csv')
df.head()

# Count the number of rows (months) per REF_DATE
df_year = df.groupby('REF_DATE')['REF_DATE'].count()
print(df_year)
This is my code. Could you please give me a hint, or point me to a website with similar questions? How do I convert monthly employment data into annual data by taking the average? I'm so confused.
Thank you very much! Much appreciated
I tried searching for similar questions on Reddit and Stack Overflow, but they all used resample to get the result.
Just convert REF_DATE to datetime and then extract the year:
df['date'] = pd.to_datetime(df['REF_DATE'])
df['year'] = df['date'].dt.year
Afterwards, aggregate the values by year:
monthly_year_avg = df.groupby('year')['VALUE'].mean()
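Since you mentioned resample: the same annual average can be computed by resampling on a datetime index. A minimal sketch, assuming the value column is named VALUE as in the snippet above:
# Resample to year-start frequency and average the monthly values
df['date'] = pd.to_datetime(df['REF_DATE'])
annual_avg = df.set_index('date')['VALUE'].resample('YS').mean()
print(annual_avg)
Both approaches give the same yearly means; groupby('year') is just easier to read if you don't otherwise need a datetime index.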
I've searched for 2 hours but can't find an answer for this that works.
I have this dataset I'm working with and I'm trying to find the latest date, but it seems like my code is not taking the year into account. Here are some of the dates that I have in the dataset.
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
Here's a snippet from my code
import pandas as pd
df=pd.read_csv('test.csv')
df['Date'] = pd.to_datetime(df['Date'])
st.write(df['Date'].max())
st.write gives me 12/21/2022 as the output instead of 01/09/2023, as it should be. So it seems like the code is not taking the year into account and is just looking at the month and day.
I tried changing the format to
df['Date'] = df['Date'].dt.strftime('%Y%m%d').astype(int) but that didn't change anything.
pandas.read_csv allows you to designate columns for conversion into dates. Let the content of test.csv be
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
then
import pandas as pd
df = pd.read_csv('test.csv', parse_dates=["Date"])
print(df['Date'].max())
gives output
2023-01-09 00:00:00
Explanation: I provide a list of the names of the columns holding dates, which read_csv then parses.
(tested in pandas 1.5.2)
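Since the sample mixes dates like 01/09/2023 and 12/21/2022, you can also pass an explicit format so pandas never has to guess the month/day order. A minimal sketch, assuming the column is month-first as in the sample data:
import pandas as pd

df = pd.read_csv('test.csv')
# Parse explicitly as month/day/year instead of letting pandas infer
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
print(df['Date'].max())  # 2023-01-09 00:00:00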
So, I have a dataframe (mean_df) with a very messy date column. It's messy because it is in this format: 1/1/2018, 1/2/2018, 1/3/2018..., when it should be 01/01/2018, 02/01/2018, 03/01/2018... Not only is it in the wrong format, but it's also sorted by the first day of every month, then the second day of every month, and so on...
So I wrote this code to fix the format:
mean_df["Date"] = mean_df["Date"].astype('datetime64[ns]')
mean_df["Date"] = mean_df["Date"].dt.strftime('%d-%m-%Y')
The column then displays in the dd-mm-yyyy format instead of the original one (though I have to run the same cell 3 times to make it work; it always throws an error the first time).
Finally, for the last few hours I've been trying to sort the 'Date' column in ascending order, but it keeps sorting it the wrong way:
mean_df = mean_df.sort_values(by='Date') # I tried this
But the output is still sorted with the day taking priority over the month.
Can someone guide me in the right direction?
Thank you in advance!
Convert it into the right format first:
mean_df["sort_date"] = pd.to_datetime(mean_df["Date"], format='%d/%m/%Y')
mean_df = mean_df.sort_values(by='sort_date') # Try this now
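If you don't want to keep the helper column afterwards, you can drop it once the frame is sorted:
# Remove the helper column used only for sorting
mean_df = mean_df.drop(columns='sort_date')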
You should sort by the date right after converting it to datetime, since dt.strftime converts the datetime back to a string:
mean_df["Date"] = pd.to_datetime(mean_df["Date"], dayfirst=True)
mean_df = mean_df.sort_values(by='Date')
mean_df["Date"] = mean_df["Date"].dt.strftime('%d-%m-%Y')
Here is my sample code.
import pandas as pd
df = pd.DataFrame()
df['Date'] = "1/1/2018, 1/2/2018, 1/3/2018".split(", ")
df['Date1'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Date2'] = df['Date1'].dt.strftime('%d/%m/%Y')
# Sort by the datetime column, not the formatted string
df.sort_values(by='Date1')
First, I convert Date to datetime format. As I observed, your data follows the '%d/%m/%Y' format. If you want to show the data in another form, change the format string, for example:
df['Date2'] = df['Date1'].dt.strftime('%d-%m-%Y')
So I have sales data that I'm trying to analyze. I have datetime data ["Order Date Time"] and I'd like to see the most common hours for sales, but more importantly I'd like to see which minutes have NO sales.
I have been spinning my wheels for a while and I can't get my brain around a solution. Any help is greatly appreciated.
I import the data:
df = pd.read_excel('Audit Period.xlsx')
print(df)
I clean up the data:
# Remove all columns except `Order Date Time` and drop null rows
time_df = df[df["Order Date Time"].notnull()]
# Ensure the index is still sequential
time_df = time_df[["Order Date Time"]].reset_index(drop=True)
# Select the first 10 rows
time_df.head(10)
I convert to datetime and I look at the month totals:
# Convert `Order Date Time` to datetime
time_df = time_df.copy()
time_df["Order Date Time"] = pd.to_datetime(time_df["Order Date Time"])
time_df = time_df.set_index(time_df["Order Date Time"])
# Group by month
grouped = time_df.resample("M").count()
time_df = pd.DataFrame({"count": grouped.values.flatten()}, index=grouped.index)
time_df.head(10)
I try to group by hour, but that gives me totals per day/hour rather than totals per hour of day, like every order ever placed at noon, etc.:
# Resample into 2-hour bins
grouped = time_df.resample("2H").count()
time_df = pd.DataFrame({"count": grouped.values.flatten()}, index=grouped.index)
time_df.head(10)
And that is where I'm stuck. I'm trying to integrate the below suggestions but can't quite get a grasp on them yet. Any help would be appreciated.
Not sure if this is the most brilliant solution, but I would start by generating a dataframe at the level of detail I wanted, whether that is 1-hour intervals, 5-minute intervals, etc. Then, in your df with all the actual data, do your grouping as you are currently doing above. Once it is grouped, join the two. That way you have one dataframe that includes empty rows for the time spans with no records. The tricky part is just making sure your date and time are formatted in a way that will match and join properly; see the sketch below.
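One way to sketch that join is to reindex the grouped counts against every possible minute of the day, which fills the missing spans with zeros. A minimal sketch, assuming time_df still holds the raw "Order Date Time" column converted to datetime (i.e., before the resample steps above):
import pandas as pd

# Assumes time_df["Order Date Time"] is already converted to datetime
t = time_df["Order Date Time"]

# Orders per hour of day (0-23), across all dates
per_hour = t.dt.hour.value_counts().sort_index()

# Orders per minute of day (0-1439), reindexed against all 1,440
# minutes so spans with no records show up with a count of 0
minute_of_day = t.dt.hour * 60 + t.dt.minute
per_minute = minute_of_day.value_counts().reindex(range(24 * 60), fill_value=0)
no_sales_minutes = per_minute[per_minute == 0].index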
Please excuse my lack of knowledge, but I am completely new at this, coming from a social services background. My classmates and I are all having trouble following our prof, sadly. We have a data frame that I've reduced to the 2 columns needed (an Excel doc). One column has different dates. We'd like to create a new df that tells us how many months are between each of those dates and May 31, 2019, using datetime. I'd appreciate any input or a reference to something similar. The most recent step I've tried is x = datetime.datetime(2019, 5, 31), but I'm not sure what to do next. I also made the df into an array, but I'm not sure if I'm even supposed to do that to begin with.
Let's say the column is called 'date'. First, convert it to a datetime object:
df.date = pd.to_datetime(df.date)
Then create a new column with the difference in days:
df['days_difference'] = (df.date - x).dt.days
If you want it in months, you can divide by roughly 30.42 (365/12).
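If you need whole calendar months instead of an approximation, a minimal alternative sketch is to take the difference of the year and month components directly (this assumes x is the datetime.datetime(2019, 5, 31) from the question):
# Calendar-month difference, ignoring the day of the month
df['months_difference'] = (x.year - df.date.dt.year) * 12 + (x.month - df.date.dt.month)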
Good evening,
I have a dataframe which consists of an order date and a dispatch date, each holding dates in the format 02-25-2013. I want to extract the month and year from these dates and generate new columns in my dataset named Order_Mt, Order_yr, Dispatch_Mt, Dispatch_Yr. I tried to extract them using strptime(), but to no avail. Can anyone tell me how to do this?
Thanks in advance
Use .dt to access the datetime methods.
Ex:
import pandas as pd
df = pd.DataFrame({'Order Date': ["02-25-2013"]})
df["Order Date"] = pd.to_datetime(df["Order Date"])
df["Order_Mt"] = df["Order Date"].dt.month
df["Order_yr"] = df["Order Date"].dt.year
print(df)
Output:
Order Date Order_Mt Order_yr
0 2013-02-25 2 2013
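The same pattern covers the dispatch columns from the question; a short sketch, assuming the column is named "Dispatch Date" (a hypothetical name, adjust it to your dataset):
# "Dispatch Date" is a hypothetical column name; rename to match your data
df["Dispatch Date"] = pd.to_datetime(df["Dispatch Date"], format='%m-%d-%Y')
df["Dispatch_Mt"] = df["Dispatch Date"].dt.month
df["Dispatch_Yr"] = df["Dispatch Date"].dt.year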