This question already has answers here:
Increase index of pandas DataFrame by one
(2 answers)
Closed 6 months ago.
Currently I am trying to read in a .csv file and then use to_html() to create a table with the index down the side. All of my code is here:
import pandas as pd
df = pd.read_csv('file.csv')
df.to_html('example.html')
As expected I am currently getting:
Year Population Annual Growth Rate
0 1950 2557628654 1.458
1 1951 2594919657 1.611
2 1952 2636732631 1.717
3 1953 2681994386 1.796
4 1954 2730149884 1.899
However I want to start the indexing at 2 instead of 0. For example:
Year Population Annual Growth Rate
2 1950 2557628654 1.458
3 1951 2594919657 1.611
4 1952 2636732631 1.717
5 1953 2681994386 1.796
6 1954 2730149884 1.899
I know I could achieve this outcome by adding two dummy rows in the .csv file and then deleting them with df.ix[], but I do not want to do this.
Is there a way to change the indexing to start at something other than 0 without having to add or delete rows in the .csv file?
Thanks!
I know it looks like a hack, but you can simply shift the index. For example:
df.index = df.index + 2
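A minimal sketch of the whole flow, using a small frame reconstructed from the question's sample output in place of the actual file.csv:

```python
import pandas as pd

# Stand-in for pd.read_csv('file.csv'), rebuilt from the question's sample rows
df = pd.DataFrame({
    "Year": [1950, 1951, 1952],
    "Population": [2557628654, 2594919657, 2636732631],
    "Annual Growth Rate": [1.458, 1.611, 1.717],
})

# Shift the default RangeIndex so the first label is 2 instead of 0
df.index = df.index + 2
print(df.index.tolist())  # [2, 3, 4]
```

An equivalent, more explicit spelling is `df.index = pd.RangeIndex(start=2, stop=2 + len(df))`; either way, the shifted index is what `to_html()` will render in the left-hand column.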
This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 11 months ago.
I am quite new to pandas. This is my data frame:
name quantity weightage type
AAPL 10 20.0 TECH
FORD 20 12.0 AUTO
AMZN 15 10.0 TECH
TSLA 20 5.0 AUTO
The output data frame should be grouped by type, with quantity and weightage summed:
name quantity weightage
TECH 25 30.0
AUTO 40 17.0
Use the pandas.DataFrame.groupby method to split the data frame according to the values in the type column, select the quantity and weightage columns, and sum over them:
df.groupby('type')[['quantity', 'weightage']].sum()
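Put together with the question's sample data, the one-liner reproduces the expected output:

```python
import pandas as pd

# The asker's sample data, reconstructed
df = pd.DataFrame({
    "name": ["AAPL", "FORD", "AMZN", "TSLA"],
    "quantity": [10, 20, 15, 20],
    "weightage": [20.0, 12.0, 10.0, 5.0],
    "type": ["TECH", "AUTO", "TECH", "AUTO"],
})

# Group by type, then sum only the two numeric columns of interest
out = df.groupby("type")[["quantity", "weightage"]].sum()
print(out)
#       quantity  weightage
# type
# AUTO        40       17.0
# TECH        25       30.0
```

Note that the group labels land in the index (named `type`); add `.reset_index()` if you want them back as an ordinary column.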
This question already has answers here:
Normalizing pandas DataFrame rows by their sums
(2 answers)
Getting NaN for dividing each row value by row sum
(2 answers)
Pandas sum across columns and divide each cell from that value
(5 answers)
Closed last year.
I got a dataframe roughly like this:
Continent 1 2
Country USA Canada Germany France
City Boston Chicago Vancouver Cologne Paris Marseille
Date
---------------------------------------------------------------------------
2018-01-01 176 10982 794 34225 1875 29001
2018-02-01 500 756 10001 4523 11022 NaN
What I would like to do is create a df2 with relative values per row (e.g. instead of 176 I want to show what percentage 176 is of the total of the row 2018-01-01).
If I try df / df.sum(axis=1) * 100, I get 'ValueError: cannot join with no overlapping index names'
It does work for one row however: df.iloc[0,:] / df.iloc[0,:].sum() * 100
And it does work with a workaround (transpose and sum columns):
df2 = df.T / df.T.sum() * 100
df2 = df2.T
So I guess it has something to do with the Multilevel header?
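The usual fix is to divide with DataFrame.div and broadcast explicitly along the index, which sidesteps the label alignment that `df / series` attempts against the multi-level column header. A sketch with a simplified two-level header standing in for the question's frame:

```python
import pandas as pd

# Simplified stand-in with a two-level column header like the question's
cols = pd.MultiIndex.from_tuples(
    [("USA", "Boston"), ("USA", "Chicago"), ("Germany", "Cologne")],
    names=["Country", "City"],
)
df = pd.DataFrame(
    [[176, 10982, 34225], [500, 756, 4523]],
    index=pd.to_datetime(["2018-01-01", "2018-02-01"]),
    columns=cols,
)

# df / df.sum(axis=1) tries to align the row sums against the COLUMN labels
# and fails; div(..., axis=0) broadcasts along the index (rows) instead
df2 = df.div(df.sum(axis=1), axis=0) * 100
print(df2.round(2))
```

Each row of `df2` now sums to 100, matching the per-row workaround without the double transpose.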
I have a data frame like so. I am trying to make a plot with the mean of 'number' for each year on the y-axis and the year on the x-axis. I think that to do this I need to make a new data frame with two columns, 'year' and 'avg number', one row per year. How would I go about doing that?
year number
0 2010 40
1 2010 44
2 2011 33
3 2011 32
4 2012 34
5 2012 56
When opening a question about pandas, please make sure you follow these guidelines: How to make good reproducible pandas examples. It will help us reproduce your environment.
Assuming your dataframe is stored in the df variable:
df.groupby('year').mean().plot()
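A sketch with the question's sample data showing the intermediate result, so you can see what the plot receives (here the mean is computed on the 'number' column only; `.plot()` itself needs matplotlib installed):

```python
import pandas as pd

# The asker's sample data
df = pd.DataFrame({
    "year": [2010, 2010, 2011, 2011, 2012, 2012],
    "number": [40, 44, 33, 32, 34, 56],
})

# Mean of 'number' per year; the year becomes the index, i.e. the x-axis
avg = df.groupby("year")["number"].mean()
print(avg)  # 2010 -> 42.0, 2011 -> 32.5, 2012 -> 45.0
# avg.plot()  # draws the line chart, year on x, mean on y
```

If you really want the intermediate two-column frame from the question, `avg.reset_index(name="avg number")` gives you exactly that.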
My pandas df has a column containing the birth year of each household member, and it looks like this:
Birthyear_household_members
1960
1982 + 1989
1941
1951 + 1953
1990 + 1990
1992
I want to create a column with a variable that contains the number of people above 64 years old in a household.
Therefore, for each row, I need to separate the string and count the number of people with a birthyear before 1956.
How can I do this using pandas? My original df is very large.
Try using the apply method on that column: split each cell on " + " and count the years below the cutoff, comparing as integers rather than strings:
df['cnt'] = df['Birthyear_household_members'].apply(lambda x: sum(int(year) < 1956 for year in x.split(" + ")))
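Run against the sample column from the question, this gives one count per household:

```python
import pandas as pd

# The question's sample column
df = pd.DataFrame({
    "Birthyear_household_members": [
        "1960", "1982 + 1989", "1941", "1951 + 1953", "1990 + 1990", "1992",
    ]
})

# Count members born before 1956 in each household; int() makes the
# comparison numeric, so it stays correct regardless of string formatting
df["cnt"] = df["Birthyear_household_members"].apply(
    lambda x: sum(int(year) < 1956 for year in x.split(" + "))
)
print(df["cnt"].tolist())  # [0, 0, 1, 2, 0, 0]
```

Since the original df is very large, a vectorized alternative worth trying is `df["Birthyear_household_members"].str.split(" + ", regex=False).apply(...)`, but for string parsing like this, apply on the column is typically the simplest correct approach.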
This question already has answers here:
How to analyze all duplicate entries in this Pandas DataFrame?
(3 answers)
Closed 7 years ago.
I am new to Python. I would like to find the duplicated lines in a data frame.
To explain myself, I have the following data frame
type(data)
pandas.core.frame.DataFrame
data.head()
User Hour Min Day Month Year Latitude Longitude
0 0 1 48 17 10 2010 39.75000 -105.000000
1 0 6 2 16 10 2010 39.90625 -105.062500
2 0 3 48 16 10 2010 39.90625 -105.062500
3 0 18 25 14 10 2010 39.75000 -105.000000
I would like to find the duplicated lines in this data frame and return the 'User' that corresponds to each one.
Thanks a lot,
Is this what you are looking for?
user = data[data.duplicated()]['User']
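Two caveats worth knowing, shown on a reconstruction of the question's head: by default `duplicated()` flags only the second and later copies of a fully identical row (pass `keep=False` to flag every copy), and in the sample shown no row is identical across all columns, so you may instead want `subset=` to compare only the columns you care about, such as location:

```python
import pandas as pd

# Reconstruction of the question's data.head()
data = pd.DataFrame({
    "User": [0, 0, 0, 0],
    "Hour": [1, 6, 3, 18],
    "Min": [48, 2, 48, 25],
    "Day": [17, 16, 16, 14],
    "Month": [10, 10, 10, 10],
    "Year": [2010, 2010, 2010, 2010],
    "Latitude": [39.75, 39.90625, 39.90625, 39.75],
    "Longitude": [-105.0, -105.0625, -105.0625, -105.0],
})

# Full-row duplicates: none here, since Hour/Min/Day differ on every row
full_dups = data[data.duplicated(keep=False)]["User"]
print(full_dups.tolist())  # []

# Duplicates judged on location only: all four rows pair up
loc_dups = data[data.duplicated(subset=["Latitude", "Longitude"], keep=False)]["User"]
print(loc_dups.tolist())
```

So `data[data.duplicated()]['User']` is the right shape of answer; just pick `keep=` and `subset=` to match what "duplicated" means for your data.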