I have a pandas column with multiple string values in each row. I want to combine them into one list so that I can count them.
df.columnX
Row 1 ['A','B','A','C']
Row 2 ['A','C']
Row 3 ['D','A']
I want output like this:
Tag Count
A 4
B 1
C 2
D 1
I tried pulling them into a list, but the values come back wrapped in double quotes:
df.columnX.values = ["'A','B',,,,,,,,,'A'"]
Thanks in advance
What about this?
df.explode('columnX').columnX.value_counts().to_frame()
Note that explode requires pandas 0.25.0 or later.
If your lists are actually strings, you can first convert them to lists (as suggested by @Jon Clements):
import ast
df.columnX = df.columnX.map(ast.literal_eval)
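Here is a minimal end-to-end sketch that combines both steps. It assumes the lists are stored as strings like "['A','B','A','C']"; the sample DataFrame is hypothetical and rebuilt from the question. The result is the Tag/Count table:

```python
import ast

import pandas as pd

# Hypothetical sample rebuilt from the question; lists are stored as strings
df = pd.DataFrame({"columnX": ["['A','B','A','C']", "['A','C']", "['D','A']"]})

# Parse each string into a real Python list
df["columnX"] = df["columnX"].map(ast.literal_eval)

# One row per list element, then count each tag (needs pandas >= 0.25.0)
counts = (
    df.explode("columnX")["columnX"]
    .value_counts()
    .rename_axis("Tag")
    .reset_index(name="Count")
)
print(counts)
```

Ties such as B and D may appear in either order.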
I got it
import ast
flatList = [item for sublist in df.columnX.map(ast.literal_eval) for item in sublist]
dict((x,flatList.count(x)) for x in set(flatList))
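Calling list.count inside the loop rescans flatList once for every distinct tag. The sketch below gets the same counts in a single pass with collections.Counter. This technique is swapped in and is not the poster's; it assumes the same string-encoded lists:

```python
import ast
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical sample with the lists stored as strings
df = pd.DataFrame({"columnX": ["['A','B','A','C']", "['A','C']", "['D','A']"]})

# Flatten all parsed lists into one iterable and count every tag in one pass
tag_counts = Counter(chain.from_iterable(df["columnX"].map(ast.literal_eval)))
print(dict(tag_counts))
```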
Related
I have a pandas Series containing lists of strings, like so:
series_of_list.head()
0 ['hello','there','my','name']
1 ['hello','hi','my','name']
2 ['hello','howdy','my','name']
3 ['hello','mate','my','name']
4 ['hello','hello','my','name']
type(series_of_list)
pandas.core.series.Series
I would like to keep only the first two entries of each list, like so:
series_of_list.head()
0 ['hello','there']
1 ['hello','hi']
2 ['hello','howdy']
3 ['hello','mate']
4 ['hello','hello']
I have tried slicing it with series_of_list = series_of_list[:2], but that just returns the first two rows of the series:
series_of_list.head()
0 ['hello','there','my','name']
1 ['hello','hi','my','name']
I have also tried .drop and other slicing but the outcome is not what I want.
How can I only keep the first two items of the list for the entire pandas series?
Thank you!
Use pandas.Series.apply() to apply a function to each element:
series_of_list = series_of_list.apply(lambda x: x[:2])
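This sketch contrasts the two kinds of slicing on a shortened copy of the example series. The .str[:2] line is an alternative I am including on the assumption that the .str accessor slices each list element the same way it slices strings:

```python
import pandas as pd

series_of_list = pd.Series([
    ["hello", "there", "my", "name"],
    ["hello", "hi", "my", "name"],
    ["hello", "howdy", "my", "name"],
])

# Slicing the Series itself keeps its first two rows, which is not what we want
first_rows = series_of_list[:2]

# apply slices every list element instead
first_two = series_of_list.apply(lambda x: x[:2])

# Alternative: the .str accessor applies element[:2] to each value
first_two_str = series_of_list.str[:2]
print(first_two.tolist())
```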
I have a dataframe containing one column of lists.
names unique_values
[B-PER,I-PER,I-PER,B-PER] 2
[I-PER,N-PER,B-PER,I-PER,A-PER] 4
[B-PER,A-PER,I-PER] 3
[B-PER, A-PER,A-PER,A-PER] 2
I need to count the distinct values in each list of the column; if a value appears more than once in a list, it should be counted only once. How can I achieve this?
Thanks
Combine explode with nunique
df["unique_values"] = df.names.explode().groupby(level = 0).nunique()
You can use the built-in set data type to do this:
df['unique_values'] = df['names'].apply(lambda a : len(set(a)))
This works because a set cannot contain duplicate elements. Converting a list to a set removes all duplicates, so all you need is the length of the resulting set.
To ignore NaN values inside a list, you can do the following:
df['unique_values'] = df['names'].apply(lambda a : len([x for x in set(a) if str(x) != 'nan']))
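The str(x) != 'nan' check misses None, because str(None) is 'None'. The variant below tests each element with pd.isna instead, which treats both NaN and None as missing. This is my substitution and is not part of the answer above. The sample mixes NaN, None and an empty list:

```python
import math

import pandas as pd

# Hypothetical sample with missing values inside the lists
df = pd.DataFrame({"names": [
    ["B-PER", math.nan, "B-PER"],
    ["I-PER", None, "A-PER"],
    [],
]})

# Build a set of the non-missing elements and count them
df["unique_values"] = df["names"].apply(
    lambda a: len({x for x in a if not pd.isna(x)})
)
print(df)
```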
Try:
df["unique_values"] = df.names.explode().groupby(level = 0).unique().str.len()
Output
df
names unique_values
0 [B-PER, I-PER, I-PER, B-PER] 2
1 [I-PER, N-PER, B-PER, I-PER, A-PER] 4
2 [B-PER, A-PER, I-PER] 3
3 [B-PER, A-PER, A-PER, A-PER] 2
How can I get the values of one column in a CSV file by matching values in another column?
The CSV file looks like this:
One,Two,Three
x,car,5
x,bus,7
x,car,9
x,car,6
I only want to get the values of column 3, if they have the value "car" in column 2. I also do not want them to be added but rather have them printed in a list, or like that:
5
9
6
My approach looks like this, but doesn't really work:
import pandas as pd
df = pd.read_csv(r"example.csv")
ITEMS = ['car']  # I will need more items, this is just an example
for item in df.Two:
    if item in ITEMS:
        print(df.Three)
How can I get the exact value for a matched item?
You can do it in one line:
print(df['Three'][df['Two']=='car'].values)
Output:
[5 9 6]
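You can try this without an example.csv on disk. The sketch below reads the question's CSV from a string with io.StringIO and uses .loc with the boolean mask, which is an alternative to the chained df['Three'][...] form:

```python
import io

import pandas as pd

# The CSV from the question, read from a string instead of example.csv
csv_text = """One,Two,Three
x,car,5
x,bus,7
x,car,9
x,car,6
"""
df = pd.read_csv(io.StringIO(csv_text))

# .loc takes a boolean row mask and a column label in one indexing step
car_values = df.loc[df["Two"] == "car", "Three"].tolist()
print(*car_values, sep="\n")
```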
For multiple items try:
import pandas as pd
df = pd.DataFrame({'One': ['x','x','x','x', 'x'],'Two': ['car','bus','car','car','jeep'],'Three': [5,7,9,6,10]})
myitems = ['car', 'bus']
res_list = []
for item in myitems:
    # collect the Three values of the rows whose Two value equals item
    res_list += df['Three'][df['Two']==item].values.tolist()
print(*sorted(res_list), sep='\n')
Output:
5
6
7
9
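With several items, Series.isin can build one mask for all of them instead of looping. The sketch below uses the same sample data; isin is swapped in here and is not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'One': ['x', 'x', 'x', 'x', 'x'],
                   'Two': ['car', 'bus', 'car', 'car', 'jeep'],
                   'Three': [5, 7, 9, 6, 10]})
myitems = ['car', 'bus']

# isin marks every row whose Two value is any of myitems, in one step
res_list = sorted(df.loc[df['Two'].isin(myitems), 'Three'].tolist())
print(*res_list, sep='\n')
```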
Explanation
df['Two']=='car' returns a boolean Series that is True at the rows where column Two of df is car; for the example CSV it is [True, False, True, True]
df['Three'][<boolean Series>] keeps only the values of column Three at the rows where the mask is True
.values then returns those filtered values as a numpy.ndarray, e.g. [5 9 6]
To combine the results for several items, we convert each numpy.ndarray to a Python list using tolist() and extend res_list with it
Finally, sorted sorts res_list
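Here are the steps above, run one at a time on the question's four-row data. The intermediate names mask, filtered and values are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'One': ['x', 'x', 'x', 'x'],
                   'Two': ['car', 'bus', 'car', 'car'],
                   'Three': [5, 7, 9, 6]})

mask = df['Two'] == 'car'        # boolean Series, one entry per row
filtered = df['Three'][mask]     # keeps only the rows where mask is True
values = filtered.values         # numpy.ndarray of the kept values
print(mask.tolist(), values.tolist())
```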
I have a column in a DataFrame that contains lists. My column is:
[],
['NORM'],
['NORM'],
['NORM'],
['NORM'],
['MI', 'STTC'],
As you can see, there is an empty list and also a list with two elements. How can I change a two-element list so that it keeps just one of its elements (I don't care which one)?
I tried df.column.explode(), but this just adds more rows. I don't want more rows; I just need to keep one of the elements.
Thank you so much
You can use Series.map with a custom mapping function that keeps at most the first element of each list:
df['col'] = df['col'].map(lambda l: l[:1])
Result:
# print(df['col'])
0 []
1 [NORM]
2 [NORM]
3 [NORM]
4 [NORM]
5 [MI]
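Below is a self-contained sketch on the question's column. It shows that slicing with l[:1] leaves the empty list unchanged, whereas l[0] would raise IndexError on it:

```python
import pandas as pd

df = pd.DataFrame({"col": [[], ["NORM"], ["NORM"], ["NORM"], ["NORM"], ["MI", "STTC"]]})

# l[:1] keeps at most one element and returns [] for an empty list,
# while l[0] would raise IndexError on []
df["col"] = df["col"].map(lambda l: l[:1])
print(df["col"].tolist())
```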
Here i and j are the row label and the column label of the cell you need to access; this prints the first element of its list:
list_ = df.loc[i, j]
if len(list_) > 0:
    print(list_[0])
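If you want the element itself rather than a one-element list, the sketch below does that for the whole column and uses None for empty lists. The column name first is hypothetical. It also uses df.at to read a single cell by row and column label:

```python
import pandas as pd

df = pd.DataFrame({"col": [[], ["NORM"], ["MI", "STTC"]]})

# First element of every list, or None (shown as missing) for an empty list
df["first"] = [l[0] if l else None for l in df["col"]]

# Reading one cell by row label and column label
cell = df.at[2, "col"]
print(cell[0] if len(cell) > 0 else None)
```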
Since you are storing lists in a pandas column, I assume you are not worried about vectorization, so you could just use a list comprehension:
df[col] = [i[:1] for i in df[col]]
Suppose I have a list of dictionaries:
l = [{'car':'good'},
{'mileage':'high'},
{'interior':'stylish'},
{'car':'bad'},
{'engine':'powerful'},
{'safety':'low'}]
Basically these are noun-adjective pairs.
How can I visualize which adjectives are most associated with, let's say, car?
How do I convert this to a DataFrame? I have tried pd.DataFrame(l), but here the key is not meant to be the column name, so it gets a little tricky.
Any help would be appreciated.
If you want this done column-wise, you have to restructure your list of dictionaries so that one dictionary represents one row. Your example list would then look like this (I added a second row for clarity):
l = [
    # a dict holds each key only once; a repeated 'car' key would silently overwrite the earlier value
    {'car':'good','mileage':'high','interior':'stylish','engine':'powerful','safety':'low'}, # row 1
    {'car':'bad','mileage':'low','interior':'old','engine':'powerful','safety':'low'} # row 2
]
At this point, all you have to do is call pd.DataFrame(l).
EDIT: Based on your comments, I think you need to turn each dictionary into [key, value] pairs to get your desired result. Here is a quick way (I'm sure it can be done more efficiently):
import pandas as pd
l = [{'car':'good'},
     {'mileage':'high'},
     {'interior':'stylish'},
     {'car':'bad'},
     {'engine':'powerful'},
     {'safety':'low'}]
new_list = []
for item in l:
    for key, value in item.items():
        # one [noun, adjective] pair per key/value
        temp = [key, value]
        new_list.append(temp)
df = pd.DataFrame(new_list, columns=['Noun', 'Adjective'])
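To answer the original question (which adjectives go with car), the sketch below builds the same Noun/Adjective frame with a comprehension and then counts the adjectives for one noun. The variable car_adjectives is only illustrative:

```python
import pandas as pd

l = [{'car': 'good'}, {'mileage': 'high'}, {'interior': 'stylish'},
     {'car': 'bad'}, {'engine': 'powerful'}, {'safety': 'low'}]

# One (noun, adjective) row per key/value pair across all dictionaries
df = pd.DataFrame([(k, v) for d in l for k, v in d.items()],
                  columns=['Noun', 'Adjective'])

# How often each adjective occurs with "car", most frequent first
car_adjectives = df.loc[df['Noun'] == 'car', 'Adjective'].value_counts()
print(car_adjectives)
```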
You can construct your DataFrame from a list of tuples. To get tuples from a dict, use its items() method. Build the list of tuples with a list comprehension that takes the first (key, value) tuple of each dict's items:
import pandas as pd
# dict.items() is not indexable in Python 3, so next(iter(...)) takes the first pair
df = pd.DataFrame(data=[next(iter(d.items())) for d in l], columns=['A', 'B'])
print(df)
Gives:
A B
0 car good
1 mileage high
2 interior stylish
3 car bad
4 engine powerful
5 safety low
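As a step toward visualizing the associations, the sketch below turns the A/B frame into a noun-by-adjective count table with pd.crosstab. Plotting that table (for example as a heatmap) would need an extra library and is not shown:

```python
import pandas as pd

l = [{'car': 'good'}, {'mileage': 'high'}, {'interior': 'stylish'},
     {'car': 'bad'}, {'engine': 'powerful'}, {'safety': 'low'}]
df = pd.DataFrame([next(iter(d.items())) for d in l], columns=['A', 'B'])

# Rows are nouns, columns are adjectives, cells count co-occurrences
table = pd.crosstab(df['A'], df['B'])
print(table)
```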