I have written this code to drop a column in my dataset, but I get this error:
df = Data.drop('URL', axis=1)
----> 1 df = Data.drop('URL', axis=1)
AttributeError: '_io.TextIOWrapper' object has no attribute 'drop'
I was hoping to drop the column using pandas, but it did not work.
Hi 👋 Hope you are doing well!
Looks like Data is an _io.TextIOWrapper (an open file handle), but it needs to be a DataFrame before you can drop columns:
import pandas as pd
dummy_df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": [4, 5, 6],
        "C": [7, 8, 9],
    },
)
print(dummy_df)
# A B C
# 0 1 4 7
# 1 2 5 8
# 2 3 6 9
dummy_df = dummy_df.drop(columns=["B", "C"])
print(dummy_df)
# A
# 0 1
# 1 2
# 2 3
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
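As for the original error: an _io.TextIOWrapper usually means the file was opened with the built-in open() instead of being parsed by pandas. A minimal sketch of the likely fix (the CSV content here is made up for illustration):

```python
import pandas as pd
from io import StringIO

csv_text = "URL,name\nhttp://a.example,x\nhttp://b.example,y\n"

# Data = open("file.csv") would return a TextIOWrapper, which has no .drop
Data = pd.read_csv(StringIO(csv_text))  # parse the file into a DataFrame instead
df = Data.drop('URL', axis=1)           # now .drop works
print(df.columns.tolist())  # ['name']
```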
I have the following DataFrame:
Now I want to insert an empty row after every row where the column "Zweck" equals 7.
So for example the third row should be an empty row.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3, 4, 5], 'f': [1, 7, 3, 4, 7]})
ren_dict = {i: df.columns[i] for i in range(len(df.columns))}
ind = df[df['f'] == 7].index
df = pd.DataFrame(np.insert(df.values, ind + 1, values=[33], axis=0))
df.rename(columns=ren_dict, inplace=True)
ind_empt = df['a'] == 33
df[ind_empt] = ''
print(df)
Output
   a  b  f
0  1  1  1
1  2  2  7
2
3  3  3  3
4  4  4  4
5  5  5  7
6
Here the DataFrame is rebuilt in one pass, since appending rows one at a time would be resource-intensive. The new rows are first inserted with the sentinel value 33, because np.insert cannot insert string values into a numeric array. The columns are then renamed back to their original names with df.rename, and finally the rows where df['a'] == 33 are located and set to empty strings.
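A sentinel-free alternative is also possible: give each empty row a fractional index just after its target row, concat, and sort. A sketch, using a smaller example of the same shape:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'f': [1, 7, 3]})

# give each empty row a fractional index just after its target row
empty = pd.DataFrame('', index=df.index[df['f'] == 7] + 0.5,
                     columns=df.columns)
df = pd.concat([df, empty]).sort_index().reset_index(drop=True)
print(df)
# row 2 is now empty, right after the f == 7 row
```

This avoids the 33 placeholder entirely, at the cost of turning the numeric columns into object dtype.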
The pandas explode method creates a new row for each value found in the inner list of a given column; it is therefore a row-wise explode.
Is there an easy column-wise explode already implemented in pandas, i.e. something to transform df into the second DataFrame below?
MWE:
>>> s = pd.DataFrame([[1, 2], [3, 4]]).agg(list, axis=1)
>>> df = pd.DataFrame({"a": ["a", "b"], "s": s})
>>> df
Out:
a s
0 a [1, 2]
1 b [3, 4]
>>> pd.DataFrame(s.tolist()).assign(a=["a", "b"]).reindex(["a", 0, 1], axis=1)
Out[121]:
a 0 1
0 a 1 2
1 b 3 4
You can use apply to convert those values to pandas Series, which will ultimately transform the DataFrame into the required format:
>>> s.apply(pd.Series)
Out[28]:
0 1
0 1 2
1 3 4
As a side note, your df becomes a pandas Series after using agg.
For the updated data, you can concat the above result onto the existing DataFrame:
>>> pd.concat([df, df['s'].apply(pd.Series)], axis=1)
Out[48]:
a s 0 1
0 a [1, 2] 1 2
1 b [3, 4] 3 4
I have a DataFrame which has a dictionary inside a nested list, and I want to split column 'C':
A B C
1 a [ {"id":2,"Col":{"x":3,"y":4}}]
2 b [ {"id":5,"Col":{"x":6,"y":7}}]
Expected output:
A B C_id Col_x Col_y
1 a 2 3 4
2 b 5 6 7
From the comments, json_normalize might help you.
After extracting the id and Col columns with:
df[["Col", "id"]] = df["C"].apply(lambda x: pd.Series(x[0]))
You can explode the dictionary in Col with json_normalize and use concat to merge with existing dataframe:
df = pd.concat([df, json_normalize(df.Col)], axis=1)
Also, use drop to remove old columns.
Full code:
# Import modules
import pandas as pd
from pandas.io.json import json_normalize  # in newer pandas: from pandas import json_normalize
# Create dataframe
df = pd.DataFrame([[1, "a", [ {"id":2,"Col":{"x":3,"y":4}}]],
[2, "b", [ {"id":5,"Col":{"x":6,"y":7}}]]],
columns=["A", "B", "C"])
# Add col and id column + remove old "C" column
df = pd.concat([df, df["C"].apply(lambda x: pd.Series(x[0]))], axis=1) \
.drop("C", axis=1)
print(df)
# A B Col id
# 0 1 a {'x': 3, 'y': 4} 2
# 1 2 b {'x': 6, 'y': 7} 5
# Show json_normalize behavior
print(json_normalize(df.Col))
# x y
# 0 3 4
# 1 6 7
# Explode dict in "Col" column + remove "Col" column
df = pd.concat([df, json_normalize(df.Col)], axis=1) \
.drop(["Col"], axis=1)
print(df)
# A B id x y
# 0 1 a 2 3 4
# 1 2 b 5 6 7
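On newer pandas versions, where json_normalize is available at the top level, the whole transformation can be sketched in one pass (the flattened names id, Col.x, Col.y come from json_normalize's dotted column naming):

```python
import pandas as pd

df = pd.DataFrame([[1, "a", [{"id": 2, "Col": {"x": 3, "y": 4}}]],
                   [2, "b", [{"id": 5, "Col": {"x": 6, "y": 7}}]]],
                  columns=["A", "B", "C"])

# unwrap the one-element lists, then flatten the nested dicts
flat = pd.json_normalize(df["C"].str[0].tolist())  # columns: id, Col.x, Col.y
out = pd.concat([df.drop(columns="C"), flat], axis=1)
print(out)
```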
You can try the .apply method:
df['C_id'] = df['C'].apply(lambda x: x[0]['id'])
df['C_x'] = df['C'].apply(lambda x: x[0]['Col']['x'])
df['C_y'] = df['C'].apply(lambda x: x[0]['Col']['y'])
Code
import pandas as pd
A = [1, 2]
B = ['a', 'b']
C = [{"id":2,"Col":{"x":3,"y":4}}, {"id":5,"Col":{"x":6,"y":7}}]
df = pd.DataFrame({"A": A, "B": B,
                   "C_id": [element["id"] for element in C],
                   "Col_x": [element["Col"]["x"] for element in C],
                   "Col_y": [element["Col"]["y"] for element in C]})
Output:
   A  B  C_id  Col_x  Col_y
0  1  a     2      3      4
1  2  b     5      6      7
I want to make a DataFrame with defined labels, but I don't know how to tell pandas to take the labels from the list. I hope someone can help.
import numpy as np
import pandas as pd
df = []
thislist = []
thislist = ["A","D"]
thisdict = {
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [7, 8, 9]
}
df = pd.DataFrame(data= thisdict[thislist]) # <- here is my problem
I want to get this:
df = A D
1 7
2 8
3 9
Use:
df = pd.DataFrame(thisdict)[thislist]
print(df)
A D
0 1 7
1 2 8
2 3 9
We could also use DataFrame.drop
df = pd.DataFrame(thisdict).drop(columns = ['B','C'])
or DataFrame.reindex
df = pd.DataFrame(thisdict).reindex(columns = thislist)
or DataFrame.filter
df = pd.DataFrame(thisdict).filter(items=thislist)
We can also use the built-in filter to filter thisdict.items():
df = pd.DataFrame(dict(filter(lambda item: item[0] in thislist, thisdict.items())))
print(df)
A D
0 1 7
1 2 8
2 3 9
I think this answer is complemented by the solution from #anky_91.
Finally, I recommend reading up on how to index.
IIUC, use .loc[] with the dataframe constructor:
df = pd.DataFrame(thisdict).loc[:,thislist]
print(df)
A D
0 1 7
1 2 8
2 3 9
Use a dict comprehension to create a new dictionary that is a subset of your original so you only construct the DataFrame you care about.
pd.DataFrame({x: thisdict[x] for x in thislist})
A D
0 1 7
1 2 8
2 3 9
If you want to deal with the possibility of missing keys, add some logic so it behaves like reindex:
pd.DataFrame({x: thisdict[x] if x in thisdict.keys() else np.NaN for x in thislist})
df = pd.DataFrame(thisdict)
df[['A', 'D']]
Another alternative for your input:
thislist = ["A","D"]
thisdict = {
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [7, 8, 9]
}
df = pd.DataFrame(thisdict)
and then simply remove the columns not in thislist (you can drop them directly from the df, or collect them first):
remove_columns = []
for c in df.columns:
    if c not in thislist:
        remove_columns.append(c)
and remove it:
df.drop(columns=remove_columns, inplace=True)
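The loop plus drop above can also be sketched as a one-liner, using Index.difference to compute the columns to remove:

```python
import pandas as pd

thislist = ["A", "D"]
thisdict = {"A": [1, 2, 3], "B": [4, 5, 6],
            "C": [7, 8, 9], "D": [7, 8, 9]}

df = pd.DataFrame(thisdict)
# drop every column that is not in thislist, in one call
df = df.drop(columns=df.columns.difference(thislist))
print(df.columns.tolist())  # ['A', 'D']
```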
I'm trying to make a table, and the way Pandas formats its indices is exactly what I'm looking for. That said, I don't want the actual data, and I can't figure out how to get Pandas to print out just the indices without the corresponding data.
You can access the index attribute of a df using .index:
In [277]:
df = pd.DataFrame({'a':np.arange(10), 'b':np.random.randn(10)})
df
Out[277]:
a b
0 0 0.293422
1 1 -1.631018
2 2 0.065344
3 3 -0.417926
4 4 1.925325
5 5 0.167545
6 6 -0.988941
7 7 -0.277446
8 8 1.426912
9 9 -0.114189
In [278]:
df.index
Out[278]:
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
.index.tolist() is another way to get the index, as a plain list:
In [1391]: datasheet.head(20).index.tolist()
Out[1391]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
You can access the index attribute of a df using df.index[i]
>> import pandas as pd
>> import numpy as np
>> df = pd.DataFrame({'a':np.arange(5), 'b':np.random.randn(5)})
a b
0 0 1.088998
1 1 -1.381735
2 2 0.035058
3 3 -2.273023
4 4 1.345342
>> df.index[1] ## Second index
>> df.index[-1] ## Last index
>> for i in range(len(df)): print(df.index[i])  ## Using a loop
...
0
1
2
3
4
You can also use a list comprehension to get the index values as a plain list:
index = [x for x in df.index]
print(index)
You can always try df.index; this attribute will show you the RangeIndex.
Or you can always set your own index. Say you had a weather.csv file with the headers
'date', 'temperature' and 'event', and you want to set "date" as your index:
import pandas as pd
df = pd.read_csv('weather.csv')
df.set_index('date', inplace=True)
df