How can I access the length of strings in a Pandas Series - python

How can I access the lengths of the strings (e.g. with a len function) in a Pandas Series?
How can I select 'eee' (3 characters) without accessing it by index?
test = pd.Series(['aaaa','bbbb','cccc','dddd','eee'])
==> 4 eee

What you want is a little unclear.
To get the length of each string, use str.len:
test.str.len()
output:
0 4
1 4
2 4
3 4
4 3
dtype: int64
To select the strings with 3 characters use boolean indexing:
test[test.str.len().eq(3)]
output:
4 eee
dtype: object
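If you then want the matching value itself, without its index label, one small sketch is positional access on the filtered result:

```python
import pandas as pd

test = pd.Series(['aaaa', 'bbbb', 'cccc', 'dddd', 'eee'])
# boolean indexing keeps only the 3-character strings
matches = test[test.str.len().eq(3)]
# .iloc[0] is positional, so the original index label (4) is ignored
value = matches.iloc[0]
print(value)  # eee
```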

Related

strip object series without deleting the int/float cells in the series

I have a Series of dtype object. When I use Series.str.strip(), the cells that contain only an int are turned into NaN.
How do I avoid this?
example
sr = pd.Series([1,2,3,'foo '])
sr.str.strip()
0 NaN
1 NaN
2 NaN
3 foo
dtype: object
desired outcome
0 1
1 2
2 3
3 foo
dtype: object
The simplest is to replace the missing values with the original values using Series.fillna:
sr = pd.Series([1,2,3,'foo '])
sr.str.strip().fillna(sr)
Or strip only the strings, tested by isinstance:
print (sr.apply(lambda x: x.strip() if isinstance(x, str) else x))
0 1
1 2
2 3
3 foo
dtype: object
You can cast the series to str altogether and then strip:
>>> sr.astype(str).str.strip()
0 1
1 2
2 3
3 foo
dtype: object
This way 1 becomes "1" and is unchanged by stripping. But the values will remain strings at the end, not integers; not sure if that's the desired output.
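Another option, sketched here as an alternative to the answers above, is to strip in place only the elements that are actual strings via a boolean mask; this also preserves the mixed types:

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 'foo '])
# mask of elements that are actual strings
mask = sr.apply(lambda x: isinstance(x, str))
# strip only the masked elements; ints are left untouched
sr[mask] = sr[mask].str.strip()
print(sr.tolist())  # [1, 2, 3, 'foo']
```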

how to ignore null values while dropping the matched pattern in a column?

I have a data frame in which I want to match a pattern and then drop the unmatched rows. For that I used str.contains, but while dropping I get an error saying it can't mask null values. How do I ignore the null values while dropping?
My code:
df =
a b
0 2 3
1 5
2 34we 9
3 4 9
df[df['a'].str.contains(r'^\d+$')]
Error: Can't mask NaN values
Expected output:
a b
0 2 3
1 5
3 4 9
Use na=True in the str.contains call to treat NaN as True:
In [907]: df[df['a'].str.contains(r'^\d+$', na=True)]
Out[907]:
a b
0 2 3
1 5
3 4 9
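As a self-contained sketch, with the frame reconstructed from the question and the blank cell assumed to be NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['2', np.nan, '34we', '4'],
                   'b': [3, 5, 9, 9]})
# na=True treats NaN as a match, so rows with missing 'a' are kept
kept = df[df['a'].str.contains(r'^\d+$', na=True)]
print(kept['b'].tolist())  # [3, 5, 9]
```

Use na=False instead if the NaN rows should be dropped rather than kept.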

How to reshape a python vector when some elements are empty

I have a df with values:
A B C D
0 1 2 3 2
1 2 3 3 9
2 5 3 6 6
3 3 6 7
4 6 7
5 2
df.shape is (6, 4), say.
df.iloc[:,1] pulls out the B column, but len(df.iloc[:,1]) is also 6.
How do I "reshape" df.iloc[:,1]? Which function can I use so that the output is the length of the actual values in the column?
My expected output in this case is 3.
You can use last_valid_index. Just note that since your series originally contained NaN values, which are considered float, even after filtering your series will be float. You may wish to convert to int as a separate step.
# first convert dataframe to numeric
df = df.apply(pd.to_numeric, errors='coerce')
# extract column
B = df.iloc[:, 1]
# slice up to and including the last valid value (label-based, so inclusive)
B_filtered = B.loc[:B.last_valid_index()]
print(B_filtered)
0 2.0
1 3.0
2 3.0
3 6.0
4 7.0
Name: B, dtype: float64
Alternatively, if the empty cells are empty strings, you can use a list comprehension:
len([x for x in df.iloc[:,1] if x != ''])
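Putting it together: if the blanks came in from the CSV as empty strings, coercing to numeric and counting non-NaN values per column gives the lengths directly. A sketch with the frame reconstructed from the question (note the asker's expected 3 corresponds to column D here; column B has 5 values):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 5, 3, 6, 2],
                   'B': [2, 3, 3, 6, 7, ''],
                   'C': [3, 3, 6, 7, '', ''],
                   'D': [2, 9, 6, '', '', '']})
# coerce empty strings to NaN so they no longer count as values
df = df.apply(pd.to_numeric, errors='coerce')
counts = df.count()  # number of non-NaN values per column
print(counts['D'])  # 3
print(counts['B'])  # 5
```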

Replace pandas Series with Series of different length but same indices

Suppose two pandas Series A and B:
A:
1 4
2 4
3 4
4 1
5 3
B:
3 4
4 4
5 2
A is larger than B, and B's indices all appear in A but with different values. I'm trying to replace the values of A with those of B.
A.replace(to_replace=B) seems obvious but does not work. What am I missing here?
I think you can use combine_first:
C = B.combine_first(A).astype(int)
print (C)
1 4
2 4
3 4
4 4
5 2
dtype: int32
An alternative solution with more basic pandas operations, assigning by index label:
A.loc[B.index] = B
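Series.update does the same index-aligned replacement in place, which may read more clearly (a sketch using the data from the question):

```python
import pandas as pd

A = pd.Series([4, 4, 4, 1, 3], index=[1, 2, 3, 4, 5])
B = pd.Series([4, 4, 2], index=[3, 4, 5])
# aligns on index labels and overwrites A's values in place
A.update(B)
print(A.tolist())  # [4, 4, 4, 4, 2]
```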

How to change the type of designated rows from string to int

I want to change the type of designated rows from string to int.
I am making a data analysis app using numpy, scipy, pandas, etc.
My app reads CSV files. I want to make only rows 10~15 int (all other rows are string type).
When I wrote code like
x = pandas.pd.read_csv('filename/csv file',header=1,parse_data=True,converters={9:14,lambda x:x.decode('int')})
a syntax error happens. I think my code is wrong, but I do not know how to fix it. What should I do?
It is possible, but not recommended, because you get mixed types (int with str) and some pandas functions fail.
Select the rows with loc if you need to select by index label, or with iloc by position, and then convert to int:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(10,5)), columns=list('ABCDE')).astype(str)
print (df)
A B C D E
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8
3 4 0 9 6 2
4 4 1 5 3 4
5 4 3 7 1 1
6 7 7 0 2 9
7 9 3 2 5 8
8 1 0 7 6 2
9 0 8 2 5 1
df.loc[3:8] = df.loc[3:8].astype(int)
print (type(df.loc[0, 'A']))
<class 'str'>
print (type(df.loc[4, 'A']))
<class 'int'>
converters={9:14,lambda x:x.decode('int')} is not valid Python syntax: look at the converters argument, it is neither a valid dict nor a set literal.
Moreover, you cannot change the data type per row, only per column: in pandas each column has a data type, not each cell or row.
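For reference, converters in read_csv maps a column name (or position) to a callable applied to each cell of that column, one entry per column. A minimal sketch with an in-memory CSV (the column names here are made up for illustration):

```python
import io
import pandas as pd

csv_text = "a,b\n1,x\n2,y\n"
# converters is a dict: column -> callable, applied cell by cell
df = pd.read_csv(io.StringIO(csv_text), converters={'a': int})
print(df['a'].tolist())  # [1, 2]
print(df['b'].tolist())  # ['x', 'y']
```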
