Changing pandas column values into another format [duplicate]

This question already has answers here:
How to convert string representation of list to a list
(19 answers)
Closed 3 years ago.
The labels column of my test dataframe (test['labels']) looks like this:
0 ['Edit Distance']
1 ['Island Perimeter']
2 ['Longest Substring with At Most K Distinct Ch...
3 ['Valid Parentheses']
4 ['Intersection of Two Arrays II']
5 ['N-Queens']
Each value in the column is a string representation of a list ("['Edit Distance']"). I want to apply the function below to each value to convert it into an actual list.
ast.literal_eval(VALUE HERE)
What is a straightforward way to do this?

Use:
import ast
test['labels'] = test['labels'].apply(ast.literal_eval)
print(test)
labels
0 [Edit Distance]
1 [Island Perimeter]
2 [Longest Substring with At Most K Distinct Ch]
3 [Valid Parentheses]
4 [Intersection of Two Arrays II]
5 [N-Queens]
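
The approach above can be sketched end to end as follows. The sample frame and the parse_list helper are assumptions made for illustration; the helper only adds a guard so that missing (non-string) values are passed through instead of raising inside ast.literal_eval.

```python
import ast

import pandas as pd

# Hypothetical sample data mirroring the question's column, plus one missing value.
test = pd.DataFrame(
    {"labels": ["['Edit Distance']", "['Island Perimeter']", None, "['N-Queens']"]}
)


def parse_list(value):
    """Parse a string such as "['a', 'b']" into a list; pass anything else through."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value


test["labels"] = test["labels"].apply(parse_list)
print(test["labels"].iloc[0])        # ['Edit Distance']
print(type(test["labels"].iloc[0]))  # <class 'list'>
```

If every value is known to be a well-formed string, passing ast.literal_eval directly to apply, as in the answer, is enough.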

Related

How to select rows from pandas dataframe by looking a feature' data types when a feature contains more than one type of value [duplicate]

This question already has answers here:
Select row from a DataFrame based on the type of the object(i.e. str)
(3 answers)
Closed 3 months ago.
I have a dataframe with 3 features: id, name and point. I need to select the rows where the 'point' value is a string.
id  name  point
0   x     5
1   y     6
2   z     ten
3   t     nine
4   q     two
How can I split the dataframe based only on the type of the values in one feature?
I tried to adapt the select_dtypes method but got lost. I also tried to split the dataset using
df[df[point].dtype == str] or df[df[point].dtype is str]
but neither worked.

Technically, the answer would be:
out = df[df['point'].apply(lambda x: isinstance(x, str))]
But this would also select rows containing a string representation of a number ('5').
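If you want to select "strings" as opposed to "numbers", whether those are real numbers or string representations of numbers, you could use:
m = pd.to_numeric(df['point'], errors='coerce')
out = df[df['point'].notna() & m.isna()]
Here m is NaN wherever the value cannot be parsed as a number, so the mask keeps the rows that are not missing but are also not numeric.
The remaining question is what should happen with values such as '1A' or 'AB123'.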
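
To make the difference between the two masks concrete, here is a small sketch. The sample data is hypothetical and deliberately mixes real numbers, numeric strings, text, a mixed value ('1A') and a missing value.

```python
import pandas as pd

# Hypothetical data: a real number, a numeric string, text, '1A', and a missing value.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "name": ["x", "y", "z", "t", "q", "r"],
    "point": [5, "6", "ten", "nine", "1A", None],
})

# Mask 1: the Python type of the value is str (also true for the numeric string "6").
is_str = df["point"].apply(lambda x: isinstance(x, str))

# Mask 2: the value is present but cannot be parsed as a number.
numeric = pd.to_numeric(df["point"], errors="coerce")
not_numeric = df["point"].notna() & numeric.isna()

print(df[is_str]["name"].tolist())       # ['y', 'z', 't', 'q']
print(df[not_numeric]["name"].tolist())  # ['z', 't', 'q']
```

With the second mask, a value such as '1A' counts as a string because pd.to_numeric cannot parse it.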

Need to plot Pairplot for a dataframe that has duplicate indices [duplicate]

This question already has answers here:
dataframe to long format
(2 answers)
Reshape wide to long in pandas
(2 answers)
Closed 9 months ago.
I have a dataframe 'df' (310, 7) and need to plot a pairplot for it. But I'm getting an error <ValueError: cannot reindex from a duplicate axis> when I do it in a regular way.
sns.pairplot(df,hue='Class')
ValueError: cannot reindex from a duplicate axis
The data is of this form:
[data]
P_incidence P_tilt L_angle S_slope P_radius S_Degree Class
0 38.505273 16.964297 35.112814 21.540976 127.632875 7.986683 Normal
1 54.920858 18.968430 51.601455 35.952428 125.846646 2.001642 Normal
2 44.362490 8.945435 46.902096 35.417055 129.220682 4.994195 Normal
3 48.318931 17.452121 48.000000 30.866809 128.980308 -0.910941 Normal
4 45.701789 10.659859 42.577846 35.041929 130.178314 -3.388910 Normal
I tried removing the duplicates using:
df.loc[df['L_angle'].duplicated(), 'L_angle'] = ''
But this converts the column to object dtype, and I'm not able to undo that.
The expected output plot is as follows:
[expected]
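
The question has no posted answer here. As a hedged sketch only: one common cause of "cannot reindex from a duplicate axis" is a repeated index, which the title suggests, for example after concatenating several frames. The data below is hypothetical; it shows how to check for the condition and rebuild a unique index before plotting.

```python
import pandas as pd

# Hypothetical frame whose index repeats, as happens after concatenating parts.
part = pd.DataFrame({"P_tilt": [16.96, 18.97], "Class": ["Normal", "Normal"]})
df = pd.concat([part, part])

print(df.index.is_unique)    # False: the labels 0 and 1 each appear twice
print(df.columns.is_unique)  # duplicated column names can trigger the same error

# Rebuild a fresh 0..n-1 index; the data itself is left untouched.
clean = df.reset_index(drop=True)
print(clean.index.is_unique)  # True

# Assumption: the plot would then be drawn from the cleaned frame, e.g.
# sns.pairplot(clean, hue='Class')
```

Unlike blanking out repeated values in a column, this does not change any column's dtype.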

Search common items in 2 lists in pandas dataframe [duplicate]

This question already has answers here:
Find intersection of two nested lists?
(21 answers)
Closed 2 years ago.
I have 2 columns (input & target) in a pandas dataframe, and each cell holds a list. The goal is to count how many items the 2 lists have in common and save the result in a new column "common".
In the 1st row, only 'ELITE' is in common. In the 2nd row, both 'ACURAWATCH' & 'PLUS' appear in both lists.
Input:
frame = pd.DataFrame({'input' : [['MINIVAN','TOURING','ELITE'], ['4D','SUV','ACURAWATCH','PLUS']], 'target' : [['MINI','TOUR','ELITE'], ['ACURAWATCH','PLUS']]})
Expected output:
frame = pd.DataFrame({'input' : [['MINIVAN','TOURING','ELITE'], ['4D','SUV','ACURAWATCH','PLUS']], 'target' : [['MINI','TOUR','ELITE'], ['ACURAWATCH','PLUS']], 'common' :[1, 2]})

You can use set.intersection with df.apply:
In [4307]: frame["common"] = frame.apply(
    lambda x: len(set(x["input"]).intersection(set(x["target"]))), axis=1)
In [4308]: frame
Out[4308]:
input target common
0 [MINIVAN, TOURING, ELITE] [MINI, TOUR, ELITE] 1
1 [4D, SUV, ACURAWATCH, PLUS] [ACURAWATCH, PLUS] 2

You can apply a custom function with np.intersect1d:
import numpy as np
frame['common'] = frame.apply(lambda x: len(np.intersect1d(x['input'], x['target'])), axis=1)

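You could use:
# select the two columns explicitly so the unpacking still works once 'common' exists
frame['common'] = [len(set(x) & set(y)) for x, y in frame[['input', 'target']].to_numpy()]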
print(frame)
Output
input target common
0 [MINIVAN, TOURING, ELITE] [MINI, TOUR, ELITE] 1
1 [4D, SUV, ACURAWATCH, PLUS] [ACURAWATCH, PLUS] 2

How to convert Pandas data frame into 2 D array [duplicate]

This question already has answers here:
Pandas convert dataframe to array of tuples
(10 answers)
Closed 3 years ago.
I have a pandas data frame
X =
id var1 var2
0 20000049588638 3 61.62
1 100798486386 3 61.62
2 100799238114 3 61.62
I want to convert this into a simple 2D array so that I can write it to a Teradata database
Required Output
X =
[(20000049588638,3,61.62),
(100798486386,3,61.62),
(100799238114,3,61.62)]
I tried this:
X = X.values.tolist()
But I am getting the following output:
[[20000049588638, '3', '61.62'],
[100798486386, '3', '61.62'],
[100799238114, '3', '61.62']]
which I am not able to write into the database.
Please take a look.

As mentioned in the linked question, you can use itertuples() and then wrap the result in a list.
list(X.itertuples(index=False, name=None))
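
A minimal sketch of this, with one assumption added: the quoted '3' and '61.62' in the question's output suggest that var1 and var2 are stored as strings, so the example converts them before building the tuples. The reconstructed frame is hypothetical.

```python
import pandas as pd

# Hypothetical reconstruction of X; var1 and var2 are assumed to be strings,
# as the quoted values in the question's output suggest.
X = pd.DataFrame({
    "id": [20000049588638, 100798486386, 100799238114],
    "var1": ["3", "3", "3"],
    "var2": ["61.62", "61.62", "61.62"],
})

# Convert the string columns to numeric types first.
X = X.astype({"var1": int, "var2": float})

# index=False drops the index; name=None yields plain tuples, not namedtuples.
rows = list(X.itertuples(index=False, name=None))
print(rows)
```

If the columns are already numeric, the astype step can be dropped.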

change value in pandas dataframe based on length of current value [duplicate]

This question already has answers here:
Pandas - Add leading "0" to string values so all values are equal len
(3 answers)
Closed 6 years ago.
I have a pandas dataframe that has a certain column that should have values of a length of four. If the length is three, I would like to add a '0' to the beginning of the value. For example:
a b c
1 2 0054
3 6 021
5 5 0098
8 2 012
So in column c I would like to change the second row to '0021' and the last row to '0012'. The values are already strings. I've tried doing:
df.loc[len(df['c']) == 3, 'c'] = '0' + df['c']
but it's not working out. Thanks for any help!

If the values in column c are integers, you can do something like this:
df['c'] = df['c'].apply(lambda x: '0' * (4 - len(str(x))) + str(x) if len(str(x)) < 4 else str(x))
The lambda function checks whether the number of digits/characters in x is less than four. If so, it adds zeros in front until x has four characters (this is known as padding). If not, it returns the value as a string.
If the values are already strings, you can remove the str() calls, but the code works either way.
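
As an alternative sketch, pandas also has a vectorized string method, Series.str.zfill, that left-pads with zeros to a given width. The frame below rebuilds the question's example; the astype(str) call is only a safeguard in case the column holds integers.

```python
import pandas as pd

# The question's example data, with column c stored as strings.
df = pd.DataFrame({
    "a": [1, 3, 5, 8],
    "b": [2, 6, 5, 2],
    "c": ["0054", "021", "0098", "012"],
})

# Left-pad every value with zeros to a width of 4; longer values are unchanged.
df["c"] = df["c"].astype(str).str.zfill(4)
print(df["c"].tolist())  # ['0054', '0021', '0098', '0012']
```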
