import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv(r"C:\Users\amgup\Downloads\Model_Wells\combined.csv", sep=',', usecols=['ACOUSTICIMPEDANCE1', 'CALI', 'DT','GR','NPHI','RHOB','LLD','PIGN','SP','VCL'], dtype='unicode')
cor=data.corr()
print(cor)
Probably none of your columns are numeric: dtype='unicode' makes pandas read every column as text, and corr() needs numeric columns. Cast them to numeric values first, and then you can calculate the correlation.
Example:
data['CALI'] = data['CALI'].astype(np.float64)
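To convert every column at once rather than one at a time, something like the following sketch may work. The small frame is made-up data standing in for the CSV, and errors="coerce" is an assumption about how unparseable cells should be handled (they become NaN).

```python
import pandas as pd

# Made-up stand-in for the CSV: every value is a string, as with dtype='unicode'.
data = pd.DataFrame({
    "CALI": ["8.5", "8.7", "9.1", "bad"],
    "GR": ["60.2", "65.0", "71.3", "80.1"],
})

# Convert each column to a numeric dtype; cells that cannot be parsed become NaN.
numeric = data.apply(pd.to_numeric, errors="coerce")

# corr() skips NaN values pairwise, so the correlation can now be computed.
cor = numeric.corr()
print(cor)
```

Dropping dtype='unicode' from read_csv and letting pandas infer the types may also be enough, provided the file contains no stray text in those columns.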
I've got two datasets with exactly the same data, but they look different when plotted the same way. One is an .xlsx file and one is a .csv file.
Here are the two scripts:
For the CSV:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.cluster import KMeans
daten = pd.read_csv(r"Path\Übungsdaten.csv", header=0, sep=";")
print("Total rows: {0}".format(len(daten)))
print(daten.columns)
plt.scatter(daten['InsuredValue'], daten['Policy'])
plt.xlim(2500000)
plt.ylim(100100)
plt.show()
And for the xlsx:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.cluster import KMeans
daten = pd.read_excel(r"Path\Übungsdaten.xlsx")
print("Total rows: {0}".format(len(daten)))
plt.scatter(daten['InsuredValue'],daten['Policy'] )
plt.xlim(2500000)
plt.ylim(100100)
plt.show()
Here are the plots:
The csv with plt.xlim(2500000) and plt.ylim(100100):
The csv without restrictions:
And finally the .xlsx plot:
My first question is: why is there a black bar at the bottom of the first two plots? (I'm guessing it is every single value of "InsuredValue".) And how can I give the csv plot the same proportions as the xlsx plot?
Thank you very much
I had to convert the "InsuredValue" column to int with the following code (astype returns a new DataFrame, so the result must be assigned back):
daten = daten.astype({'InsuredValue': 'int'})
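A likely reason the conversion helps (an assumption, since the file itself is not shown) is that the CSV column was read as text, so matplotlib treated each value as a category label instead of a number. A minimal sketch with made-up data, where the text dtype is forced on purpose to reproduce that situation:

```python
import io
import pandas as pd

# Made-up semicolon-separated data. Reading InsuredValue as str is an
# assumption that mimics what may have happened with the real file.
csv_text = "InsuredValue;Policy\n1000;100001\n2500000;100002\n300;100003\n"
daten = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"InsuredValue": str})

print(daten.dtypes)  # InsuredValue is a text column at this point

# astype returns a new DataFrame, so the result has to be assigned back.
daten = daten.astype({"InsuredValue": "int"})

print(daten.dtypes)  # InsuredValue is now an integer column
```

If the real column contains decimal commas or thousands separators, a plain int cast will fail, and the decimal and thousands parameters of read_csv are worth checking instead.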
I am trying to decompose a time series. The data set is a 2x8638 matrix. The code follows.
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df1 = pd.read_csv("u_x_ts.csv").set_index("0")
df1.head()
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(df1, model='multiplicative')
result.plot()
plt.show()
Then Python returns the error message:
ValueError: Multiplicative seasonality is not appropriate for zero and negative values
I suspect statsmodels cannot handle such small values, since the values at the beginning of the series are very small.
If anyone knows a way around this problem, I would appreciate it.
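Going by the wording of the error message, the restriction concerns zero and negative values rather than small magnitudes. One way around it, sketched below under that reading, is to check the series first and fall back to the additive model when any value is not strictly positive. The helper names choose_model and decompose, the period value, and the sample series are all hypothetical.

```python
import numpy as np
import pandas as pd


def choose_model(series: pd.Series) -> str:
    """Return 'multiplicative' only if every value is strictly positive."""
    return "multiplicative" if (series > 0).all() else "additive"


def decompose(series: pd.Series, period: int):
    """Decompose with whichever model the data allows."""
    # Imported here so choose_model can be used without statsmodels installed.
    from statsmodels.tsa.seasonal import seasonal_decompose

    return seasonal_decompose(series, model=choose_model(series), period=period)


# Made-up series that starts at exactly zero.
t = np.arange(48)
series = pd.Series(np.sin(2 * np.pi * t / 12) + 0.01 * t)
print(choose_model(series))  # → additive
```

Calling decompose(series, period=12).plot() would then draw the components. Shifting the series so that its minimum is positive is another option, though it changes how the multiplicative components should be interpreted.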
I am trying to fit a cumulative Gaussian distribution to my data, but I get a strange result with negative mu... :
libraries:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import norm
import numpy as np
First I import the data from an Excel file:
data = pd.read_excel('....xlsx', sheet_name='test', na_filter=True)
the data look like:
then creating a data frame:
data_sort = pd.DataFrame(data, columns=['x','y'])
and fit the CDF:
mu,sigma = curve_fit(norm.cdf, data_sort['x'], data_sort['y'], p0=[0,1])[0]
and I get back mu= -0.512, sigma=0.106, which is just totally wrong...
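One possible cause (an assumption, since the data are not shown) is that the initial guess p0=[0, 1] is far from the scale of x, so the optimizer settles on a poor solution. It is also worth checking that y runs from roughly 0 to 1, since a cumulative Gaussian cannot match other ranges. The sketch below uses made-up data and a hypothetical wrapper gaussian_cdf, and starts the fit from values derived from x.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm


def gaussian_cdf(x, mu, sigma):
    """Cumulative Gaussian with mean mu and standard deviation sigma."""
    return norm.cdf(x, loc=mu, scale=sigma)


# Made-up data: an exact cumulative Gaussian with mu=5 and sigma=1.5.
x = np.linspace(0, 10, 50)
y = gaussian_cdf(x, 5.0, 1.5)

# Start from the scale of the data instead of p0=[0, 1].
p0 = [np.median(x), np.std(x)]
(mu, sigma), _ = curve_fit(gaussian_cdf, x, y, p0=p0)
print(mu, sigma)
```

With real measurements, dropping rows that contain NaN before fitting is also advisable, because curve_fit does not handle missing values.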
Thank you in advance for your help! (Code Provided Below) (Data Here)
I would like to remove the outliers that lie more than 5 or 6 standard deviations from the mean in the columns 5 cm through 225 cm, and replace them with the average value for that date (month/day) and depth. What is the best way to do that?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()
df_selected_station.ffill(inplace=True)
df_selected_station_D=df_selected_station.resample(rule='D').mean()
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear
mean=df_selected_station_D.groupby(by='Day').mean()
mean['Day']=mean.index
mean.head()
For a more general solution, assume that you are given a dataframe df with some column a:
import numpy as np
from scipy import stats
# Use .loc with a boolean mask; chained indexing would assign to a copy.
df.loc[np.abs(stats.zscore(df['a'])) > 5, 'a'] = df['a'].mean()
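To replace each outlier with the average for its own day of the year, as the question asks, rather than with the overall column mean, a sketch along these lines may help. The data are made up, only one depth column is shown, and grouping by day of year (as in the question's code) is an assumption: in leap years the same calendar date gets a different day number after February.

```python
import numpy as np
import pandas as pd

# Made-up daily data for one depth column over three years.
index = pd.date_range("2018-01-01", "2020-12-31", freq="D")
doy = index.dayofyear.to_numpy()
df = pd.DataFrame({"5 cm": 10.0 + np.sin(2 * np.pi * doy / 365.25)}, index=index)
df.loc[pd.Timestamp("2019-07-01"), "5 cm"] = 500.0  # injected outlier

threshold = 5          # number of standard deviations, as in the question
depth_cols = ["5 cm"]  # extend with the other depth columns

# z-score of every value relative to its own column.
z = (df[depth_cols] - df[depth_cols].mean()) / df[depth_cols].std()
outliers = z.abs() > threshold

# Mean per day of year, computed with the outliers masked out (set to NaN),
# and broadcast back to one value per row.
clean = df[depth_cols].mask(outliers)
day_mean = clean.groupby(doy).transform("mean")

# Keep the original values except where an outlier was flagged.
df[depth_cols] = df[depth_cols].where(~outliers, day_mean)
```

Computing the mean on the masked frame keeps the outlier itself from pulling up the value that replaces it.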
I want to convert that dataframe into this dataframe and plot a matplotlib graph with the date along the x axis.
Use df.T.plot(kind='bar'):
import pandas as pd
import matplotlib.pyplot as plt
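# DataFrame.from_csv was removed in pandas 1.0; read_csv with these arguments matches its defaults.
df = pd.read_csv('./housing_price_index_2010-11_100.csv', index_col=0, parse_dates=True)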
df.T.plot(kind='bar')
plt.show()
You can also assign the transpose to a new variable and plot that (as you asked in the comment):
import pandas as pd
import matplotlib.pyplot as plt
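# DataFrame.from_csv was removed in pandas 1.0; read_csv with these arguments matches its defaults.
df = pd.read_csv('./housing_price_index_2010-11_100.csv', index_col=0, parse_dates=True)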
df_transposed = df.T
df_transposed.plot(kind='bar')
plt.show()
Both give the same result:
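To try the transpose-and-plot approach without the CSV file, here is a small self-contained sketch. The region names and index values are made up, and the Agg backend is chosen only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values: regions as rows, dates as columns.
df = pd.DataFrame(
    {"2010-01": [100.0, 100.0], "2010-02": [101.5, 99.8], "2010-03": [102.3, 100.4]},
    index=["Region A", "Region B"],
)

# Transposing moves the dates to the index, which pandas uses for the x axis.
df_transposed = df.T
ax = df_transposed.plot(kind="bar")
ax.set_xlabel("Date")
```

In an interactive session, plt.show() after the last line displays the chart.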