I have a problem with our code that compares rows between two sheets of an Excel file. Here is the code we use to compare every row:
import pandas as pd
import numpy as np
old_df = pd.read_excel('Test.xlsx', sheet_name="Best Practice Config", names=["A"], header=None)
new_df = pd.read_excel('Test.xlsx', sheet_name="Existing Config", names=["B"], header=None)
compare = old_df[~old_df["A"].isin(new_df["B"])]
but I need to compare the two sheets row by row. Please advise on the best way to do that in pandas.
Try the pandas.DataFrame.compare method. The documentation is available here.
old_df.compare(new_df)
I hope it will be useful for you.
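Note that compare only works when the two frames share the same shape and labels; here is a minimal sketch with hypothetical one-column config frames (the values are made up):

```python
import pandas as pd

# compare requires identically-labeled frames of the same shape
old_df = pd.DataFrame({"A": ["server=a", "port=80", "debug=off"]})
new_df = pd.DataFrame({"A": ["server=a", "port=8080", "debug=off"]})

# rows that differ appear with a 'self' (old) and 'other' (new) column
diff = old_df.compare(new_df)
print(diff)
```

Rows that are identical in both frames are dropped from the result, so an empty result means the two sheets match.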
I am trying to create a DataFrame in pandas by reading a CSV file and print it, but it is not displayed properly.
This is the code:
import pandas as pd
df = pd.read_csv("weather.csv")
print(df)
And this is my output:
What can I do?
A sample of weather.csv would help, but I believe this will solve the issue:
import pandas as pd
df = pd.read_csv("weather.csv", sep=';')
print(df)
Next time, try to provide your data as text. You need to change the separator; the default is ','. So try this:
df = pd.read_csv('weather.csv', sep=';')
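Since no sample of the file was posted, here is a self-contained sketch using an in-memory stand-in for weather.csv (the column layout is an assumption):

```python
import io
import pandas as pd

# hypothetical semicolon-delimited content standing in for weather.csv
raw = "date;temp;humidity\n2021-01-01;3.5;80\n2021-01-02;4.1;75\n"

# without sep=';' all three fields would land in a single column
df = pd.read_csv(io.StringIO(raw), sep=';')
print(df.columns.tolist())  # ['date', 'temp', 'humidity']
```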
I have to import this Excel file in code, and I would like to unify the multi-level header into a single column. I would like to delete the unnamed columns and unify everything into one. I don't know if it's possible.
I have tried the following and it imports, but the output is not as expected. I am adding the code here too:
import pandas as pd
import numpy as np
macro = pd.read_excel(nameExcel, sheet_name=nameSheet, skiprows=3, header=[1,3,4])
macro = macro[macro.columns[1:]]
macro
The way to solve it is to assign a new header of the same length as the previous one:
cols = [...]
if len(df1.columns) == len(cols):
    df1.columns = cols
else:
    print("error")
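If the goal is to collapse the multi-level header automatically rather than retype it, one option (sketched on a hypothetical two-level frame, since the original Excel file is not available) is to join the levels of each column tuple while skipping the "Unnamed" placeholders pandas generates:

```python
import pandas as pd

# hypothetical frame with a two-level header, similar to the imported Excel
df1 = pd.DataFrame([[1, 2], [3, 4]],
                   columns=pd.MultiIndex.from_tuples([("A", "x"), ("A", "y")]))

# join the levels of each column tuple, dropping auto-generated 'Unnamed' parts
df1.columns = ["_".join(str(part) for part in tup if "Unnamed" not in str(part))
               for tup in df1.columns]
print(df1.columns.tolist())  # ['A_x', 'A_y']
```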
I have a CSV of data I've loaded into a dataframe that I'm trying to massage: I want to create a new column that contains the difference from one record to another, grouped by another field.
Here's my code:
import pandas as pd
import matplotlib.pyplot as plt
url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'
all_counties = pd.read_csv(url, dtype={"fips": str})
all_counties.date = pd.to_datetime(all_counties.date)
oregon = all_counties.loc[all_counties['state'] == 'Oregon']
oregon.set_index('date', inplace=True)
oregon.sort_values('county', inplace=True)
# This is not working; I was hoping to find the differences from one day to another on a per-county basis
oregon['delta'] = oregon.groupby(['state','county'])['cases'].shift(1, fill_value=0)
oregon.tail()
Unfortunately, I'm getting results where the delta is always the same as the cases.
I'm new at Pandas and relatively inexperienced with Python, so bonus points if you can point me towards how to best read the documentation.
Let's try:
oregon['delta']=oregon.groupby(['state','county'])['cases'].diff().fillna(0)
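On a toy frame the difference between shift (which only moves the values down) and diff (which subtracts the previous value within each group) is easy to see; the county names and counts below are made up:

```python
import pandas as pd

# made-up cumulative case counts for two counties
df = pd.DataFrame({
    "state": ["Oregon"] * 4,
    "county": ["Baker", "Baker", "Grant", "Grant"],
    "cases": [1, 4, 2, 7],
})

# day-over-day change per county; the first day of each county becomes 0
df["delta"] = df.groupby(["state", "county"])["cases"].diff().fillna(0)
print(df["delta"].tolist())  # [0.0, 3.0, 0.0, 5.0]
```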
Thank you in advance for taking the time to help me! (Code provided below) (Data Here)
I am trying to average the first 3 columns and insert it as a new column labeled 'Topsoil'. What is the best way to go about doing that?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()
df_selected_station = df_all_stations  # narrow this to a single station if needed
df_selected_station.fillna(method='ffill', inplace=True)
df_selected_station_D=df_selected_station.resample(rule='D').mean()
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear
mean=df_selected_station_D.groupby(by='Day').mean()
mean['Day']=mean.index
#mean.head()
Try this :
mean['avg3col']=mean[['5 cm', '10 cm','15 cm']].mean(axis=1)
df['new column'] = (df['col1'] + df['col2'] + df['col3'])/3
You could use the apply method in the following way:
mean['Topsoil'] = mean.apply(lambda row: np.mean(row[0:3]), axis=1)
You can read about the apply method in the following link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
The idea is that apply runs the same function once for every row (or column) along the chosen axis.
Note: it is unwise to give a variable the name of a built-in function; in your case mean_df would be better than mean.
Use DataFrame.iloc for select by positions - first 3 columns with mean:
mean['Topsoil'] = mean.iloc[:, :3].mean(axis=1)
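A quick sketch on a made-up depth table (the column names mirror the ones in the question, the numbers are invented) shows the positional selection at work:

```python
import pandas as pd

# hypothetical per-day means at several depths, like the grouped frame above
mean_df = pd.DataFrame({"5 cm": [1.0, 2.0], "10 cm": [3.0, 4.0],
                        "15 cm": [5.0, 6.0], "30 cm": [9.0, 9.0]})

# average of the first three columns, selected by position with iloc
mean_df["Topsoil"] = mean_df.iloc[:, :3].mean(axis=1)
print(mean_df["Topsoil"].tolist())  # [3.0, 4.0]
```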
I would like to create a filter for a pandas DataFrame that saves rows only up to a certain value in a specific column,
e.g. save only until df['cycle'] == 2.
From the answers below I gathered that df[df['cycle'] <= 2] solves my problem.
Edit: If I understand correctly, pandas always reads the whole file; with nrows you can stop at a given row index, but what if I don't want to use an index but a specific value from a column? How can I do that?
See my code below:
import pandas as pd
import numpy as np
l = list(np.linspace(0,10,12))
data = [
    ('time', l),
    ('A', [0, 5, 0.6, -4.8, -0.3, 4.9, 0.2, -4.7, 0.5, 5, 0.1, -4.6]),
    ('B', [0, 300, 20, -280, -25, 290, 30, -270, 40, 300, -10, -260]),
]
df = pd.DataFrame.from_dict(dict(data))
df['cycle'] = [df.index.get_loc(i) // 4 + 1 for i in df.index]
df = df[df['cycle'] <= 2]
df.to_csv(path_or_buf='test.out', index=True, sep='\t', columns=['time','A','B','cycle'], decimal='.')
So I modified the code according to the suggestions from the users.
I am grateful for any help I can get.
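To avoid reading the whole file, one option is to read it in chunks and stop at the first chunk that crosses the cutoff value. This is a sketch with in-memory data and an arbitrary chunk size; for a real file you would pass the path instead of the StringIO object:

```python
import io
import pandas as pd

# in-memory stand-in for a CSV like the one written above
raw = "time,cycle\n0,1\n1,1\n2,2\n3,2\n4,3\n5,3\n"

kept_chunks = []
for chunk in pd.read_csv(io.StringIO(raw), chunksize=2):
    kept = chunk[chunk["cycle"] <= 2]
    kept_chunks.append(kept)
    if len(kept) < len(chunk):  # this chunk crossed the cutoff, stop reading
        break

df = pd.concat(kept_chunks, ignore_index=True)
print(df["cycle"].tolist())  # [1, 1, 2, 2]
```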