How to put the dates in chronological order in python CSV - python

So I am working on processing some data and after running my file, everything is fine except that the date are not in order. I used the code below to try putting them in order but didnt work. by the way 'updated_at' is the column I am trying to put in chronological
df = df.sort_values(by=["updated_at"], ascending=True)
Please let me know how I can make this work. I have attached a picture to for better understanding of my question.
"updated_at" column pic

We are missing a bit of the context, but it could be that the column "updated_at" is not a datetime column, but a simple string, since it looks to me it is sorted alfabetically. Check with df["updated_at"].dtype and if it's not a datetime type (it will probably be of type "object") then do
df["update_at"] = pd.to_datetime(df["update_at"])

Related

How to correctly read specific csv column

Hey everyone my question is kinda silly but i am new to python)
I am writing a python script for c# aplication and i got kinda strange issue when i work with csv document.
When i open it it and work with Date column it works fine
df=pd.read_csv("../Debug/StockHistoryData.csv")
df = df[['Date']]
But when i try to work with another columns it throws error
df = df[['Close/Last']]
KeyError: "None of [Index(['Close/Last'], dtype='object')] are in the [columns]"
It says there are no such Index but but when i print the whole table it works fine and shows all columns
Table Image
Error image
Take a look at the first row of your CSV file.
It contains colum names (comma-separated).
From time to time it happens that this initial line contains a space after
each comma.
For a human being it is quite readable and even intuitive.
But read_csv adds these spaces to column names, what is sometimes difficult
to discover.
Another option is to run print(df.columns) after you read your file.
Then look for any extra spaces in column names.

Dropping rows in a Data Frame

I am trying to drop some specific rows in a DataFrame df where, the column Time is anything except 06:00:00. I tried the following code but it dosen't seem to work. I even tried adding another column Index to my file to aid the process but still it is not working. Can you please help me. I am attaching the screenshots.
The val just contains the specific time 06:00:00. Also, please ignore the variable req. Thanks a lot.
In pandas, by default drop isn't inplace operation. Try specifying df.drop(j, inplace=True).
Have you tried?
df = df.drop(df[//expresion here//].index)
Or even better:
df = df[~df.a.str.contains("06:00:00")]
Where a is the name of the column you want to search the time in

Reading Date times from Excel to Python using Pandas

I'm trying to read from an Excel file that gets converted to python and then gets split into numbers (Integers and floats) and everything else. There are numerous columns of different types.
I currently bring in the data with
pd.read_excel
and then split the data up with
DataFrame.select_dtypes("number")
When users upload a time (so 12:30:00) they expect for it to be recognized as a time. However python (currently) treats it as dtype object.
If I specify the column with parse_dates then it works, however since I don't know what the data is in advance I ideally want this to be done automatically. I`ve tried setting parse_dates = True however it doesn't seem to make a difference.
I'm not sure if there is a way to recognize the datatime after the file is uploaded. Again however I would want this to be done without having to specify the column (so anything that can be converted is)
Many Thanks
If your data contains only one column with dtype object (I assume it is a string) you can do the following:
1) filter the column with dtype object
import pandas as pd
datatime_col = df.select_dtypes(object)
2) convert it to seconds
datetime_col_in_seconds = pd.to_timedelta(datatime_col.loc[0]).dt.total_seconds()
Then you can re-append the converted column to your original data and/or do whatever processing you want.
Eventually, you can convert it back to datetime.
datetime_col = pd.to_datetime(datetime_col_in_seconds, unit='s')
if you have more than one column with dtype object you might have to do some more pre-processing but I guess this is a good way to start tackling your particular case.
This does what I need
for column_name in df.columns:
try:
df.loc[:, column_name] = pd.to_timedelta(df.loc[:, column_name].astype(str))
except ValueError:
pass
This tries to convert every column into a timedelta format. If it isn't capable of transforming it, it returns a value error and moves onto the next column.
After being run any columns that could be recognized as a timedelta format are transformed.

Pandas Dataframe Sorting by Date Time [duplicate]

New to Pandas, so maybe I'm missing a big idea?
I have a Pandas DataFrame of register transactions with shape like (500,4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I'm working through my code in a Python3 Jupyter notebook. I can't get past sorting any column. Working through the different code examples for sort, I'm not seeing the output reorder when I inspect the df. So, I've reduced the problem to trying to order just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column title, or which boolean argument I use, the displayed results never change order.
Thinking it could be a Jupyter thing, I've previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last example is for Jupyter notebooks). I've also rerun the whole notebook from import CSV to this code. And, I'm also new to Python3 (from 2.7), so I get stuck with that sometimes, but I don't see how that's relevant in this case.
Another post has a similar problem, Python pandas dataframe sort_values does not work. In that instance, the ordering was on a column type string. But as you can see all of the columns here are unambiguously sortable.
Why does my Pandas DataFrame not display new order using sort_values?
df.sort_values(['Total Due']) returns a sorted DF, but it doesn't update DF in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
My problem, fyi, was that I wasn't returning the resulting dataframe, so PyCharm wasn't bothering to update said dataframe. Naming the dataframe after the return keyword fixed the issue.
Edit:
I had return at the end of my method instead of
return df,
which the debugger must of noticed, because df wasn't being updated in spite of my explicit, in-place sort.

Why does my Pandas DataFrame not display new order using `sort_values`?

New to Pandas, so maybe I'm missing a big idea?
I have a Pandas DataFrame of register transactions with shape like (500,4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I'm working through my code in a Python3 Jupyter notebook. I can't get past sorting any column. Working through the different code examples for sort, I'm not seeing the output reorder when I inspect the df. So, I've reduced the problem to trying to order just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column title, or which boolean argument I use, the displayed results never change order.
Thinking it could be a Jupyter thing, I've previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last example is for Jupyter notebooks). I've also rerun the whole notebook from import CSV to this code. And, I'm also new to Python3 (from 2.7), so I get stuck with that sometimes, but I don't see how that's relevant in this case.
Another post has a similar problem, Python pandas dataframe sort_values does not work. In that instance, the ordering was on a column type string. But as you can see all of the columns here are unambiguously sortable.
Why does my Pandas DataFrame not display new order using sort_values?
df.sort_values(['Total Due']) returns a sorted DF, but it doesn't update DF in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
My problem, fyi, was that I wasn't returning the resulting dataframe, so PyCharm wasn't bothering to update said dataframe. Naming the dataframe after the return keyword fixed the issue.
Edit:
I had return at the end of my method instead of
return df,
which the debugger must of noticed, because df wasn't being updated in spite of my explicit, in-place sort.

Categories