I am reading a JSON file with the following contents:
{"aa": 10, "bb": 20}
df = pd.read_json("filename.json", orient='index')
print(df)
0
aa 10
bb 20
How can I rename the columns of the data frame to something like "country, value"?
Here is one way to do it:
df.reset_index(inplace=True)
cols = ['Country','value']
df.columns=cols
df
Country value
0 aa 10
1 bb 20
OR
cols = [ 'value']
df.columns=cols
df.rename_axis(columns=['Country'], inplace=True)
df
Country value
aa 10
bb 20
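For completeness, a chained one-liner can do the same thing; this is just a sketch, assuming the single data column produced by read_json(orient='index') is labeled 0 as shown above:
import pandas as pd

df = pd.read_json("filename.json", orient='index')
# name the index, promote it to a column, and rename the data column
df = df.rename_axis('Country').reset_index().rename(columns={0: 'value'})
print(df)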
Related
My Pandas data frame contains the following data, read from a CSV file:
id,values
1001-MAC, 10
1034-WIN, 20
2001-WIN, 15
3001-MAC, 45
4001-LINUX, 12
4001-MAC, 67
df = pd.read_csv('example.csv')
df.set_index('id', inplace=True)
I have to sort this data frame based on the id column, ordered by the given suffix list ["WIN", "MAC", "LINUX"]. Thus, I would like to get the following output:
id,values
1034-WIN, 20
2001-WIN, 15
1001-MAC, 10
3001-MAC, 45
4001-MAC, 67
4001-LINUX, 12
How can I do that?
Here is one way to do that:
import pandas as pd
df = pd.read_csv('example.csv')
idx = df.id.str.split('-').str[1].sort_values(ascending=False).index
df = df.loc[idx]
df.set_index('id', inplace=True)
print(df)
Try:
df = df.sort_values(
by=["id"], key=lambda x: x.str.split("-").str[1], ascending=False
)
print(df)
Prints:
id values
1 1034-WIN 20
2 2001-WIN 15
0 1001-MAC 10
3 3001-MAC 45
5 4001-MAC 67
4 4001-LINUX 12
Add a column to the dataframe that contains only the suffixes (use the str.split() function for that) and sort the whole df based on that new column.
import pandas as pd
df = pd.DataFrame({
"id":["1001-MAC", "1034-WIN", "2001-WIN", "3001-MAC", "4001-LINUX", "4001-MAC"],
"values":[10, 20, 15, 45, 12, 67]
})
df["id_postfix"] = df["id"].apply(lambda x: x.split("-")[1])
df = df.sort_values("id_postfix", ascending=False)
df = df[["id", "values"]]
print(df)
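Note that sorting the suffix in descending alphabetical order only happens to give WIN, MAC, LINUX for this particular list. If the order really comes from an arbitrary suffix list, an ordered Categorical makes that explicit; a sketch reusing the example data from above (the "suffix" helper column name is mine):
import pandas as pd

df = pd.DataFrame({
    "id": ["1001-MAC", "1034-WIN", "2001-WIN", "3001-MAC", "4001-LINUX", "4001-MAC"],
    "values": [10, 20, 15, 45, 12, 67],
})
suffix_order = ["WIN", "MAC", "LINUX"]
# turn the suffix into an ordered categorical so sort_values follows the list
df["suffix"] = pd.Categorical(df["id"].str.split("-").str[1],
                              categories=suffix_order, ordered=True)
df = df.sort_values("suffix", kind="mergesort").drop(columns="suffix")
print(df)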
I am taking the confirmed cases data from here:
https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
I load the data in Python using a Pandas DataFrame.
My problem is: I am trying to turn the date columns into rows, and the 'Country/Region' column values into columns.
url_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
df = pd.read_csv(url_confirmed)
df = df.drop(columns=['Province/State','Lat','Long'],axis=1)
df_piv = pd.melt(df,id_vars=['Country/Region'],var_name='Date',value_name="Value")
I got this far and really don't know how to proceed.
My final dataframe is supposed to look like this:
Date Afghanistan Albania and so on
0 1/22/20 0 val
1 1/23/20 300 val
3 1/24/20 4023 val
6 1/25/20 300 val
7 1/26/20 2000 val
...
Thank you very much!
I think a simple transpose with renaming a column should do it:
url_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
df = pd.read_csv(url_confirmed)
df = df.drop(columns=['Province/State','Lat','Long'],axis=1)
df = df.T.reset_index() # Transpose and reset index
df.columns = df.iloc[0] # Set first row as header
df = df[1:]
df.rename(columns = {'Country/Region' : 'Date'}, inplace=True)
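If you prefer to continue from the melt you already wrote, pivoting the melted frame back gives the same wide shape; this sketch uses pivot_table with a sum so that countries spread over several province rows end up in a single column (an assumption about what you want for those countries):
import pandas as pd

url_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
df = pd.read_csv(url_confirmed)
df = df.drop(columns=['Province/State', 'Lat', 'Long'])
df_piv = pd.melt(df, id_vars=['Country/Region'], var_name='Date', value_name='Value')
# one row per date, one column per country
wide = df_piv.pivot_table(index='Date', columns='Country/Region',
                          values='Value', aggfunc='sum').reset_index()
wide.columns.name = None
# if chronological order matters, convert the strings first:
# wide['Date'] = pd.to_datetime(wide['Date']); wide = wide.sort_values('Date')
print(wide.head())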
I want to replace some rows of some columns in a bigger pandas df with data from a smaller pandas df. The column names are the same in both.
I tried using combine_first, but it only updates the null values.
For example, let's say df1.shape is (100, 25) and df2.shape is (10, 5):
df1
A B C D E F G ... X Y Z
1 abc 10.20 0 pd.NaT
df2
A B C D E
1 abc 15.20 1 10
Now after replacing df1 should look like:
A B C D E F G ... X Y Z
1 abc 15.20 1 10 ...
The condition for replacing values in df1 is where df1.A == df2.A and df1.B == df2.B.
How can it be achieved in the most pythonic way? Any help will be appreciated.
I don't know if I really understood your question, but does this solve your problem?
df1 = pd.DataFrame(data={'A':[1],'B':[2],'C':[3],'D':[4]})
df2 = pd.DataFrame(data={'A':[1],'B':[2],'C':[5],'D':[6]})
new_df=pd.concat([df1,df2]).drop_duplicates(['A','B'],keep='last')
print(new_df)
output:
A B C D
0 1 2 5 6
You could play with a MultiIndex.
First, let us create dataframes like the ones you are working with:
import numpy as np
import pandas as pd
from string import ascii_uppercase

cols = pd.Index(list(ascii_uppercase))
vals = np.arange(100 * len(cols)).reshape(100, len(cols))
df = pd.DataFrame(vals, columns=cols)
df1 = pd.DataFrame(vals[:10, :5], columns=cols[:5])
Then turn A and B into indices:
df = df.set_index(["A","B"])
df1 = df1.set_index(["A","B"])*1.5 # multiply just to make the other values different
df.loc[df1.index, df1.columns] = df1
df = df.reset_index()
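An equivalent route, instead of the .loc assignment, is DataFrame.update, which aligns on index and columns and overwrites the matching cells in place; a sketch that starts again from the freshly created df and df1 above:
# alternative to the .loc assignment
df = df.set_index(["A", "B"])
df1 = df1.set_index(["A", "B"]) * 1.5
df.update(df1)        # only cells whose (A, B) index and column exist in df1 change
df = df.reset_index()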
I have two DataFrames in Python.
The first one is df1:
'ID' 'B'
AA 10
BB 20
CC 30
DD 40
The second one is df2:
'ID' 'C' 'D'
BB 30 0
DD 35 0
What I want to get finally is like df3:
'ID' 'C' 'D'
BB 30 20
DD 35 40
How can I reach this goal?
My code is:
for i in df1.ID:
    if len(df2.ID[df2.ID == i]):
        df2.D[df2.ID == i] = df1.B[df2.ID == i]
but it doesn't work.
So first of all, I've interpreted the question differently, since your description is rather ambiguous. Mine boils down to this:
df1 is this data structure:
ID B <- column names
AA 10
BB 20
CC 30
DD 40
df2 is this data structure:
ID C D <- column names
BB 30 0
DD 35 0
Dataframes have a merge option, if you wanted to merge based on index the following code would work:
import pandas as pd
df1 = pd.DataFrame(
[
['AA', 10],
['BB', 20],
['CC', 30],
['DD', 40],
],
columns=['ID','B'],
)
df2 = pd.DataFrame(
[
['BB', 30, 0],
['DD', 35, 0],
], columns=['ID', 'C', 'D']
)
df3 = pd.merge(df1, df2, on='ID')
Now df3 only contains rows with ID's in both df1 and df2:
ID B C D <- column names
BB 20 30 0
DD 40 35 0
Now, you wanted D to be filled with column B's values and B removed, i.e.:
ID C D
BB 30 20
DD 35 40
Something that can be done with these simple steps:
df3 = pd.merge(df1, df2, on='ID') # merge them
df3.D = df3['B'] # set D to B's values
del df3['B'] # remove B from df3
Or to summarize:
def match(df1, df2):
df3 = pd.merge(df1, df2, on='ID') # merge them
df3.D = df3['B'] # set D to B's values
del df3['B'] # remove B from df3
return df3
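Calling it on the frames defined above reproduces the layout you asked for:
print(match(df1, df2))
#    ID   C   D
# 0  BB  30  20
# 1  DD  35  40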
The following code will replace the zeros in df1 with the corresponding values from df2:
df1 = pd.DataFrame(['A', 'B', 0, 4, 6], columns=['x'])
df2 = pd.DataFrame(['A', 'X', 3, 0, 5], columns=['x'])
df3 = df1[df1 != 0].fillna(df2)
Hello, I have the following DataFrame:
df =
ID Value
a 45
b 3
c 10
And another DataFrame with the numeric ID for each value:
df1 =
ID ID_n
a 3
b 35
c 0
d 7
e 1
I would like to have a new column in df with the numeric ID, so:
df =
ID Value ID_n
a 45 3
b 3 35
c 10 0
Thanks
Use pandas merge:
import pandas as pd
df1 = pd.DataFrame({
'ID': ['a', 'b', 'c'],
'Value': [45, 3, 10]
})
df2 = pd.DataFrame({
'ID': ['a', 'b', 'c', 'd', 'e'],
'ID_n': [3, 35, 0, 7, 1],
})
df1.set_index(['ID'], drop=False, inplace=True)
df2.set_index(['ID'], drop=False, inplace=True)
print(pd.merge(df1, df2, on="ID", how='left'))
output:
ID Value ID_n
0 a 45 3
1 b 3 35
2 c 10 0
You could use join(),
In [14]: df1.join(df2)
Out[14]:
Value ID_n
ID
a 45 3
b 3 35
c 10 0
If you want the index to be numeric, you could reset_index():
In [17]: df1.join(df2).reset_index()
Out[17]:
ID Value ID_n
0 a 45 3
1 b 3 35
2 c 10 0
You can do this in a single operation. join works on the index, which you don't appear to have set. Just set the index to ID, join df after also setting its index to ID, and then reset your index to return your original dataframe with the new column added.
>>> df.set_index('ID').join(df1.set_index('ID')).reset_index()
ID Value ID_n
0 a 45 3
1 b 3 35
2 c 10 0
Also, because you don't do an inplace set_index on df1, its structure remains the same (i.e. you don't change its indexing).
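If you would rather not touch the index at all, Series.map gives the same extra column; a minimal sketch using the df and df1 from the question:
# look up each ID in df1 and bring back its ID_n
df['ID_n'] = df['ID'].map(df1.set_index('ID')['ID_n'])
print(df)
#   ID  Value  ID_n
# 0  a     45     3
# 1  b      3    35
# 2  c     10     0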