I want to select a DataFrame column, iterate over it, keep only the numbers, and replace the values that contain letters or other signs with 'Unknown'. I've tried the isreal() method, but it didn't work. Is there a way to accomplish this task without a function?
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

file = 'C:/Users/Сынкетру/Desktop/attacks.csv'
df = pd.read_csv(file, sep=',', encoding='ISO-8859-1')
df_clean = df.Age.dropna()

def age(number):
    try:
        number = np.isreal(number)  # only tests the value; never raises ValueError
    except ValueError:
        number = 'Unknown'
    return number

df_clean = df_clean.map(age)
print(df_clean)
df = pd.DataFrame(dict(A=['1', 2, '_3', '4.', 'hello', 3.14]))
df['A'] = np.where(pd.to_numeric(df.A, errors='coerce').notnull(), df.A, 'unknown')
df
df.loc[~df.Age.apply(np.isreal), 'Age'] = "unknown"
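An equivalent mask that avoids np.isreal, built on pd.to_numeric (a minimal sketch; the Age values here are made up, not the asker's actual data):

```python
import pandas as pd

# Hypothetical stand-in for the Age column
df = pd.DataFrame({"Age": ["25", "18 months", "40.5", "teen", "12"]})

# Values that cannot be parsed as numbers become NaN under to_numeric,
# and the .loc mask replaces exactly those entries
df.loc[pd.to_numeric(df["Age"], errors="coerce").isna(), "Age"] = "unknown"
print(df["Age"].tolist())  # ['25', 'unknown', '40.5', 'unknown', '12']
```

Assigning through .loc also sidesteps pandas' chained-assignment warning.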
I want to load columns by their numbers, e.g. 1,3,2. In the param.txt file I have only an entry such as 1,3,2.
import pandas as pd
import numpy as np

df = pd.read_csv('sample1.csv')
with open('param.txt') as f:
    s = f.read()
b = df.iloc[:, [s]]
print(b.to_string(index=False))
When I run the script I get

raise IndexError(f".iloc requires numeric indexers, got {arr}")
IndexError: .iloc requires numeric indexers, got ['1,3,2']

How can I simply convert this string into numeric column indices?
Thanks for any help.
This should work, assuming f.read() returns "1,2,3":

import pandas as pd

df = pd.read_csv('sample1.csv')
with open('param.txt') as f:
    s = f.read()          # assuming this is a string such as "1,2,3"
s = s.split(",")          # split the string on commas -> ["1", "2", "3"]
s = [int(x) for x in s]   # convert each entry from str to int -> [1, 2, 3]
b = df.iloc[:, s]         # no extra brackets needed, since s is already a list
print(b.to_string(index=False))
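A self-contained sketch of the same fix, with hypothetical stand-ins for sample1.csv and the contents of param.txt so the 1,3,2 column order is visible:

```python
import io
import pandas as pd

# Hypothetical stand-ins for sample1.csv and the contents of param.txt
csv_text = "a,b,c,d\n10,20,30,40\n50,60,70,80\n"
param_text = "1,3,2"

df = pd.read_csv(io.StringIO(csv_text))
s = [int(x) for x in param_text.split(",")]  # "1,3,2" -> [1, 3, 2]
b = df.iloc[:, s]                            # selects columns b, d, c in that order
print(list(b.columns))  # ['b', 'd', 'c']
```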
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv("data/opti.csv")
df03 = df.loc[df["%"]==0.3]
df04 = df.loc[df["%"]==0.4]
df06 = df.loc[df["%"]==0.6]
df08 = df.loc[df["%"]==0.8]
df1 = df.loc[df["%"]==1]
x = np.array([0.3,0.4,0.6,0.8,1])
e = np.array([np.std(df03),np.std(df04),np.std(df06),np.std(df08),np.std(df1)])
df.head(5)
df["Tablettenmasse"]
When I write df["Tablettenmasse"] I get a KeyError. But when I select the column with iloc it works. Why isn't it working the normal way?
Edit: as mosc9575 suggested, there was a space before the column name. Thanks!
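A short sketch of that failure mode and the usual fix, stripping whitespace from every header (the CSV content below is a made-up stand-in for data/opti.csv):

```python
import io
import pandas as pd

# Made-up CSV whose second header has a stray leading space, as in the question
csv_text = "%, Tablettenmasse\n0.3,101\n0.4,99\n"
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))              # ['%', ' Tablettenmasse'] -> df["Tablettenmasse"] raises KeyError
df.columns = df.columns.str.strip()  # remove leading/trailing whitespace from all headers
print(df["Tablettenmasse"].tolist())  # [101, 99]
```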
I have a csv file like this:
year,value
1897.386301369863,0.6
1897.3890410958904,1.1
1897.3917808219178,0.0
1897.3945205479451,8.3
1897.3972602739725,3.3
1897.4,6.7
1897.4027397260274,0.6
1897.4054794520548,2.2
1897.4082191780822,0.6
1897.4109589041095,9.4
1897.4136986301369,9.4
1897.4164383561645,31.1
This is the code I've written:
import pandas as pd
df1 = pd.read_csv("[Path to file is here]", header=0, sep=",")
df1["year"] = df1["year"].astype(int)
n1 = df1.groupby("year")["value"].mean()
Yet I keep receiving this error message:
pandas.core.base.DataError: No numeric types to aggregate
I've checked this code many times; it has worked before, but I have no idea what's wrong.
You can do
df1["year"] = df1["year"].astype(int)
df1["value"] = pd.to_numeric(df1["value"])
n1 = df1.groupby("year")["value"].mean()
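Applied to the sample rows from the question (read here from an in-memory buffer instead of a file path), the year column collapses to 1897 and the mean aggregates cleanly:

```python
import io
import pandas as pd

# The sample rows from the question, read from an in-memory buffer
csv_text = """year,value
1897.386301369863,0.6
1897.3890410958904,1.1
1897.3917808219178,0.0
1897.3945205479451,8.3
1897.3972602739725,3.3
1897.4,6.7
1897.4027397260274,0.6
1897.4054794520548,2.2
1897.4082191780822,0.6
1897.4109589041095,9.4
1897.4136986301369,9.4
1897.4164383561645,31.1
"""
df1 = pd.read_csv(io.StringIO(csv_text), header=0, sep=",")
df1["year"] = df1["year"].astype(int)       # truncate fractional years to 1897
df1["value"] = pd.to_numeric(df1["value"])  # guarantee a numeric dtype
n1 = df1.groupby("year")["value"].mean()
print(n1)  # single group, year 1897
```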
If replacing missing values with 0 is fine, the following will solve your issue:
import pandas as pd
import numpy as np
df1 = pd.read_csv("./a.csv", header=0, sep=",")
df1["value"] = df1["value"].replace(r'^\s*$', np.nan, regex=True)
df1["value"] = df1["value"].astype(float)
df1["year"] = df1["year"].astype(int)
df1["value"] = df1["value"].fillna(0)
n1 = df1.groupby("year")["value"].mean()
print(n1)
If you want to omit the missing data instead, use the following:
import pandas as pd
import numpy as np
df1 = pd.read_csv("./a.csv", header=0, sep=",")
df1["value"] = df1["value"].replace(r'^\s*$', np.nan, regex=True)
df1 = df1[~df1["value"].isnull()]
df1["value"] = df1["value"].astype(float)
df1["year"] = df1["year"].astype(int)
n1 = df1.groupby("year")["value"].mean()
print(n1)
I'd like to plot "MJD" vs "MJD_DUPLICATE" for the data given here:
https://www.dropbox.com/s/cicgc1eiwrz93tg/DR14Q_pruned_several3cols.csv?dl=0
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ast

path = ''  # directory holding the file
filename = 'DR14Q_pruned_several3cols.csv'
datafile = path + filename
df = pd.read_csv(datafile)

df.plot.scatter(x='MJD', y='N_SPEC')
plt.show()

ser = df['MJD_DUPLICATE'].apply(ast.literal_eval).str[1]
df['MJD_DUPLICATE'] = pd.to_numeric(ser, errors='coerce')
df['MJD_DUPLICATE_NEW'] = pd.to_numeric(ser, errors='coerce')
df.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()
This makes a plot, but only for one value of MJD_DUPLICATE:
print(df['MJD_DUPLICATE_NEW'])
0 55214
1 55209
...
Thoughts??
There are two issues here:
Telling Pandas to parse tuples within the CSV. This is covered here: Reading back tuples from a csv file with pandas
Transforming the tuples into multiple rows. This is covered here: Getting a tuple in a Dafaframe into multiple rows
Putting those together, here is one way to solve your problem:
# Following https://stackoverflow.com/questions/23661583/reading-back-tuples-from-a-csv-file-with-pandas
import pandas as pd
import ast
df = pd.read_csv("DR14Q_pruned_several3cols.csv",
converters={"MJD_DUPLICATE": ast.literal_eval})
# Following https://stackoverflow.com/questions/39790830/getting-a-tuple-in-a-dafaframe-into-multiple-rows
df2 = pd.DataFrame(df.MJD_DUPLICATE.tolist(), index=df.MJD)
df3 = df2.stack().reset_index(level=1, drop=True)
# Now just plot!
df3.plot(marker='.', linestyle='none')
If you want to remove the 0 and -1 values, a mask will work:
df3[df3 > 0].plot(marker='.', linestyle='none')
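Since the Dropbox file may not be at hand, here is the same converter-plus-stack pipeline on a fabricated two-row stand-in for DR14Q_pruned_several3cols.csv (column names kept, values invented):

```python
import ast
import io
import pandas as pd

# Fabricated stand-in: MJD plus a tuple-valued MJD_DUPLICATE column
# stored as strings, as in the real CSV
csv_text = 'MJD,MJD_DUPLICATE\n55214,"(55214, 55209)"\n55209,"(55209, -1)"\n'
df = pd.read_csv(io.StringIO(csv_text),
                 converters={"MJD_DUPLICATE": ast.literal_eval})

# One row per tuple element, indexed by MJD
df2 = pd.DataFrame(df.MJD_DUPLICATE.tolist(), index=df.MJD)
df3 = df2.stack().reset_index(level=1, drop=True)
print(df3)  # four values: both elements of each tuple
```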
In pandas apply, the applied function takes each row of the DataFrame and returns another DataFrame. How can I get the combination (append) of the DataFrames returned by apply? For example:
# this is an example
import pandas as pd
import numpy as np

def newdata(X, data2):
    return X - data2[data2['no'] != X['no']].sample(1, random_state=100)

col = ['no', 'a', 'b']
data1 = pd.DataFrame(np.column_stack((range(5), np.random.rand(5, 2))), columns=col)
data2 = pd.DataFrame(np.column_stack((range(3), np.random.rand(3, 2))), columns=col)
Newdata = data1.apply(newdata, args=(data2,), axis=1)
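One way to get the appended result (a sketch, not the only option): iterate the rows yourself, collect the per-row DataFrames, and combine them with pd.concat, which keeps each returned frame intact instead of letting apply squeeze it:

```python
import numpy as np
import pandas as pd

def newdata(X, data2):
    # Subtract one sampled row of data2 (with a different 'no') from row X;
    # the Series-minus-DataFrame result is a one-row DataFrame
    return X - data2[data2["no"] != X["no"]].sample(1, random_state=100)

col = ["no", "a", "b"]
data1 = pd.DataFrame(np.column_stack((range(5), np.random.rand(5, 2))), columns=col)
data2 = pd.DataFrame(np.column_stack((range(3), np.random.rand(3, 2))), columns=col)

# Collect one returned DataFrame per row, then append them all
pieces = [newdata(row, data2) for _, row in data1.iterrows()]
Newdata = pd.concat(pieces, ignore_index=True)
print(Newdata.shape)  # (5, 3)
```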