This is a dataframe with military spending data for some countries from 2010-2017. I would like to convert the year columns of the dataframe
into a column named "Year" and another column holding the value corresponding to each year for each country. It should look like this dataframe (ignore the name of the third column; it's just an example):
Using
df.reset_index().melt('Country')
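A minimal sketch of what that call does, assuming a wide table with one row per country and one column per year (the frame and column names here are invented for illustration):

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
df = pd.DataFrame(
    {2010: [100, 200], 2011: [110, 210]},
    index=pd.Index(["USA", "France"], name="Country"),
)

# reset_index turns the country index into a column; melt turns the
# year columns into a "Year" column plus a value column
long_df = df.reset_index().melt("Country", var_name="Year", value_name="Spending")
```

`var_name` and `value_name` control the names of the two new columns, so the third column can be called whatever fits.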
See the following data snippet:
The columns from the right are date variables ranging from August 2001 back to August 1997.
What I would like to do is merge all these columns together into one 'Date' column. For further context, the columns are of equal length.
If all you need is the dates, how much was purchased, and the id of the material, you could drop the columns that aren't dates (i.e. Del. time through Total) and transpose your dataset.
In pandas
dataframe = dataframe.T
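A rough sketch of that suggestion, with made-up column names standing in for the real ones:

```python
import pandas as pd

# Hypothetical frame: an id, a non-date column, and two date columns
df = pd.DataFrame({
    "id": [1, 2],
    "Del. time": [5, 7],
    "2001-08": [10, 20],
    "1997-08": [30, 40],
})

# Drop the non-date columns, keep the id as the index, then transpose
dates_only = df.drop(columns=["Del. time"]).set_index("id")
transposed = dates_only.T  # dates become the rows, one column per id
```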
Within one dataframe, I'm trying to concatenate rows which have the same price, customer, etc. The only variable that changes is Month.
Would the best solution be to split the dataframe into two and then use a merge function?
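Splitting and merging may not be necessary; one possible approach (a sketch with invented data and column names) is to group on the columns that stay the same and aggregate the Month values into one:

```python
import pandas as pd

# Hypothetical data: rows identical except for Month
df = pd.DataFrame({
    "customer": ["A", "A", "B"],
    "price": [10, 10, 20],
    "Month": ["Jan", "Feb", "Jan"],
})

# Group on the unchanging columns and join the Months of each group
combined = df.groupby(["customer", "price"], as_index=False)["Month"].agg(", ".join)
```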
I'm working on a project to get the average stock price for each year, but I'm currently stuck on a problem. I have a CSV file with two columns: Date (YYYY-MM-DD) and High. Basically, I want to create a third column called 'Year', and for every row I want to take just the year from the Date column and put it in the 'Year' column.
Here is my initial table:
Here is my desired output table:
Note: I just know how to add a column but I am not sure how to index the date of each row and append it to the 'Year' column for each row. So for example, for the row with the date '1980-12-12', I want the year column to have just '1980', for the row with the date '1980-12-18', I want the year column to have just '1980', etc.
Here is my code currently:
import pandas as pd
appleStock = pd.read_csv("Apple_stock_history.csv")
for i in appleStock["Date"]:
    appleStock["Year"] = i[0:4]
print(appleStock.head())
My output for the code is:
I figured out that my code is pretty inconsistent; basically, there are more rows in the original CSV file... The last row has a date of '2022-01-03', which probably explains why I am getting that in my Year column every time: each pass through the loop overwrites the whole 'Year' column, so only the last date's year survives. In line 4 of my code, when I change it to appleStock["Year"] = i[0:], it gives me the entire date (2022-01-03).
If your df['Date'] is in str format, like this:
df = pd.DataFrame({
'Date' : ['1980-12-12','1981-12-12'],
'High' : [0.1, 0.2]
})
print(df['Date'][0],type(df['Date'][0]))
1980-12-12 <class 'str'>
You can try this:
df['year'] = df['Date'].str[0:4]
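If the column might not be a clean string, a more robust alternative (a sketch using the same toy data) is to parse it with pd.to_datetime and use the .dt accessor:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1980-12-12', '1981-12-12'],
    'High': [0.1, 0.2],
})

# Parse the strings into datetimes, then extract the year as an integer
df['year'] = pd.to_datetime(df['Date']).dt.year
```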
I have a DataFrame with a date_time column. The date_time column contains a date and time. I also managed to convert the column to a datetime object.
I want to create a new DataFrame containing all the rows of a specific DAY.
I managed to do it when I set the date column as the index and used the "loc" method.
Is there a way to do it even if the date column is not set as the index? I only found a method which returns the rows between two days.
You can use the groupby() function. Let's say your dataframe is df:
df_group = df.groupby('Date') # assuming the column containing dates is called Date.
Now you can access rows of any date by passing the date in the get_group function,
df_group.get_group('date_here')
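Without touching the index, the same selection can also be done with a boolean mask; a sketch, assuming the column is called date_time and is already a datetime dtype (the data here is invented):

```python
import pandas as pd

df = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2021-05-01 09:00", "2021-05-01 17:30", "2021-05-02 10:00"]
    ),
    "value": [1, 2, 3],
})

# Compare only the date part of each timestamp to the target day
target = pd.Timestamp("2021-05-01").date()
one_day = df[df["date_time"].dt.date == target]
```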
I am trying to filter on certain values in many columns:
in the Dimension column, filter on Education; then in the next column (Indicator Name), filter on Mean years of schooling (years); then in the Country Name column, filter on USA, Canada, ...etc.
I have tried the script below, but I couldn't filter on the specific values mentioned above:
raw_data = {}
for Dimension in new_df["Dimension"]:
    dimension_df = new_df.loc[new_df["Dimension"] == Dimension]
    arr = []
    arr.append(dimension_df["Indicator Name"].values[0])
    arr.append(dimension_df["ISO Country Code"].values[0])
    raw_data[Dimension] = arr
pd.DataFrame(raw_data)
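That loop only ever keeps the first row per Dimension, so a boolean mask combining one condition per column may be closer to what's wanted. A sketch with invented rows (the real frame is new_df with these column names):

```python
import pandas as pd

# Invented rows mimicking the columns described in the question
new_df = pd.DataFrame({
    "Dimension": ["Education", "Health", "Education"],
    "Indicator Name": [
        "Mean years of schooling (years)",
        "Life expectancy",
        "Mean years of schooling (years)",
    ],
    "Country Name": ["USA", "Canada", "Brazil"],
})

# Combine one condition per column with &; isin handles the country list
mask = (
    (new_df["Dimension"] == "Education")
    & (new_df["Indicator Name"] == "Mean years of schooling (years)")
    & new_df["Country Name"].isin(["USA", "Canada"])
)
filtered = new_df[mask]
```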