Checking if a data series is strings - python

I want to check if a column in a dataframe contains strings. I would have thought this could be done just by checking dtype, but that isn't the case. A pandas series that contains strings just has dtype 'object', which is also used for other data structures (like lists):
df = pd.DataFrame({'a': [1,2,3], 'b': ['Hello', '1', '2'], 'c': [[1],[2],[3]]})
print(df['a'].dtype)
print(df['b'].dtype)
print(df['c'].dtype)
Produces:
int64
object
object
Is there some way of checking if a column contains only strings?

You can use this to check whether all elements in each column are strings (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df.applymap(type).eq(str).all()
a False
b True
c False
dtype: bool
To check only whether any element is a string:
df.applymap(type).eq(str).any()
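The same per-column check can also be written without applymap. The sketch below is an alternative under stated assumptions, not part of the answer above: the helper name is_all_str is made up for illustration, and pd.api.types.infer_dtype is used as a second way to inspect what a column actually holds.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['Hello', '1', '2'], 'c': [[1], [2], [3]]})

def is_all_str(series):
    # Hypothetical helper: True only if every element is a str instance.
    return bool(series.map(lambda x: isinstance(x, str)).all())

all_str = {col: is_all_str(df[col]) for col in df.columns}
print(all_str)  # {'a': False, 'b': True, 'c': False}

# infer_dtype looks at the values rather than the dtype and reports
# 'string' when every (non-missing) element is a string.
inferred = {col: pd.api.types.infer_dtype(df[col]) for col in df.columns}
print(inferred['b'])  # string
```

Note that a missing value such as NaN is a float, so is_all_str returns False for a column that mixes strings and NaN, whereas infer_dtype skips missing values by default.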

You could map each element to True or False depending on whether its type is str, then check whether the result contains any False values.
The example below tests a list that contains elements other than str. It returns True if data of another type is present:
test = [1, 2, '3']
False in map((lambda x: type(x) == str), test)
Output: True
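The same idea can be expressed more directly with all and isinstance. This is a minimal sketch of an equivalent check; the variable names all_strings and has_non_string are chosen here for illustration.

```python
test = [1, 2, '3']

# True only if every element is a str (subclasses of str count as well).
all_strings = all(isinstance(x, str) for x in test)

# Equivalent to the "False in map(...)" test: True if any non-str is present.
has_non_string = not all_strings

print(all_strings)     # False
print(has_non_string)  # True
```

Unlike type(x) == str, isinstance also accepts subclasses of str, which is usually what is wanted.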

Related

Pandas replace() string with int "Cannot set non-string value in StringArray"

I'm trying to replace strings with integers in a pandas dataframe. I've already visited here but the solution doesn't work.
Reprex:
import pandas as pd
pd.__version__
> '1.4.1'
test = pd.DataFrame(data = {'a': [None, 'Y', 'N', '']}, dtype = 'string')
test.replace(to_replace = 'Y', value = 1)
> ValueError: Cannot set non-string value '1' into a StringArray.
I know that I could do this individually for each column, either explicitly or using apply, but I am trying to avoid that. I'd ideally replace all 'Y' in the dataframe with int(1), all 'N' with int(0) and all '' with None or pd.NA, so the replace function appears to be the fastest/clearest way to do this.
Use Int8Dtype. The nullable IntXXDtype extension types allow both integer values and <NA>:
test['b'] = test['a'].replace({'Y': '1', 'N': '0', '': pd.NA}).astype(pd.Int8Dtype())
print(test)
# Output
      a     b
0  <NA>  <NA>
1     Y     1
2     N     0
3        <NA>
>>> [type(x) for x in test['b']]
[pandas._libs.missing.NAType,
numpy.int8,
numpy.int8,
pandas._libs.missing.NAType]
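To cover every column of the frame at once, one option is to build the nullable integer arrays directly instead of going through replace. This is only a sketch under assumptions: the helper to_flags and the name converted are hypothetical, and every value other than 'Y' or 'N' (including '' and missing values) is assumed to map to <NA>.

```python
import pandas as pd

test = pd.DataFrame({'a': [None, 'Y', 'N', '']}, dtype='string')
mapping = {'Y': 1, 'N': 0}

def to_flags(column):
    # Hypothetical helper: anything not in the mapping
    # (missing values, '') becomes <NA>.
    values = [pd.NA if pd.isna(v) else mapping.get(v, pd.NA) for v in column]
    return pd.array(values, dtype='Int8')

# Build a new frame column by column, keeping the original index.
converted = pd.DataFrame(
    {col: to_flags(test[col]) for col in test.columns},
    index=test.index,
)
print(converted.dtypes)
```

This avoids writing integers into a StringArray, which is what triggers the ValueError in the question, at the cost of a Python-level loop over the values.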

Replace item in list with one of several options given its contents

As in:
data = data.replace(['a', 'b', 'c'], [0, 1, 2])
Given 'a' replace with 0. Given 'b' replace with 1. Given 'c' replace with 2.
I see there are ways to do it with regex, but I want to know if I can do it with something like the above.
Currently it's failing, because it thinks I'm trying to replace a list with a list.
You can use a dictionary to hold the pairs you want to replace and iterate over that dictionary:
replace_pairs = { 'a': '0', 'b': '1', 'c': '2' }
data = 'abcba'
for key, value in replace_pairs.items():
    data = data.replace(key, value)
Output:
>> data = '01210'
I would recommend the regex method, because the runtime is much shorter: How to replace multiple substrings of a string?
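For reference, the regex approach can be sketched as a single pass over the string. This is one common way to write it, not necessarily the code from the linked question; the names pattern, result and translated are chosen here for illustration.

```python
import re

replace_pairs = {'a': '0', 'b': '1', 'c': '2'}
data = 'abcba'

# Build one alternation of all keys; longer keys first so that a key
# which is a prefix of another cannot shadow it.
keys = sorted(replace_pairs, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys))

# Each match is looked up in the dictionary, so the string is scanned once.
result = pattern.sub(lambda m: replace_pairs[m.group(0)], data)
print(result)  # 01210

# When every key is a single character, str.translate does the same job.
translated = data.translate(str.maketrans(replace_pairs))
print(translated)  # 01210
```

A single pass also avoids a pitfall of the loop version, where an earlier replacement can produce text that a later replacement then matches.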

What is the best way to store the results of str.split() into a dictionary as part of a list comprehension? [duplicate]

Given the following sample data:
values=['A 1','B 2','C 3']
I want to create a dictionary where A maps to 1, B to 2, and C to 3. The following works, but there is repetition:
my_dict={value.split()[0]:value.split()[1] for value in values}
The repetition of value.split() looks ugly. Is there a way to more elegantly create the dictionary without repeating value.split()?
Two ways I can think of:
>>> {k:v for k,v in (s.split() for s in values)}
{'A': '1', 'B': '2', 'C': '3'}
>>> dict(s.split() for s in values)
{'A': '1', 'B': '2', 'C': '3'}
I suggest reading about the dict type: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict; in particular:
Each item in the iterable must itself be an iterable with exactly two objects. The first object of each item becomes a key in the new dictionary, and the second object the corresponding value.
as well as the introduction of dict-comprehensions in PEP 274:
The semantics of dict comprehensions can actually be demonstrated in stock Python 2.2, by passing a list comprehension to the built-in dictionary constructor:
>>> dict([(i, chr(65+i)) for i in range(4)])
is semantically equivalent to:
>>> {i : chr(65+i) for i in range(4)}
For a functional solution, you can use dict with map and str.split:
values = ['A 1', 'B 2', 'C 3']
res = dict(map(str.split, values))
{'A': '1', 'B': '2', 'C': '3'}
You can do it this way, which is Pythonic:
>>> values =['A 1','B 2','C 3']
>>> dict(map(str.split, values))
{'A': '1', 'C': '3', 'B': '2'}
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].
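The maxsplit argument matters when the value part may itself contain whitespace. A small sketch, assuming Python 3 and an input list invented for illustration (the 'B 2 extra' entry is not from the question):

```python
values = ['A 1', 'B 2 extra', 'C 3']

# maxsplit=1 guarantees at most two pieces per item, so dict() always
# receives (key, value) pairs even if the value contains spaces.
pairs = dict(s.split(maxsplit=1) for s in values)
print(pairs)  # {'A': '1', 'B': '2 extra', 'C': '3'}

# With an explicit separator, consecutive delimiters produce empty strings.
print('1,,2'.split(','))  # ['1', '', '2']
```

Without maxsplit, 'B 2 extra' would split into three pieces and dict() would raise a ValueError.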
map(function, iterable, ...)
Apply function to every item of iterable and return a list of the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. If one iterable is shorter than another it is assumed to be extended with None items. If function is None, the identity function is assumed; if there are multiple arguments, map() returns a list consisting of tuples containing the corresponding items from all iterables (a kind of transpose operation). The iterable arguments may be a sequence or any iterable object; the result is always a list.
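The documentation quoted above is from Python 2; in Python 3, map returns an iterator rather than a list, which dict accepts just the same. You can see that the dictionary above is not in the same order as your list, because dicts were unordered in Python 2 (since Python 3.7, dicts preserve insertion order). Using collections.OrderedDict, we can retain the order of the input on any version: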
>>> import collections
>>> values =['A 1','B 2','C 3']
>>> my_ordered_dict = collections.OrderedDict(map(str.split, values))
>>> my_ordered_dict
OrderedDict([('A', '1'), ('B', '2'), ('C', '3')])
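On Python 3.7 and later, a plain dict already keeps insertion order, so the OrderedDict step is only needed on older interpreters. A short sketch assuming Python 3.7+:

```python
values = ['A 1', 'B 2', 'C 3']

# In Python 3, map returns a lazy iterator, not a list.
pairs = map(str.split, values)
print(isinstance(pairs, list))  # False

# dict consumes the iterator; keys come out in input order on 3.7+.
my_dict = dict(pairs)
print(list(my_dict))  # ['A', 'B', 'C']
```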

Delete specific string from array column python dataframe

I'm trying to remove the string '$A' from the array elements in column a, but the code below doesn't seem to work.
In the code below I'm trying to replace the '$A' string with an empty string (which doesn't work either); what I would actually like is to delete that string entirely.
df = pd.DataFrame({'a': [['$A','1'], ['$A', '3','$A'],[]], 'b': ['4', '5', '6']})
df['a'] = df['a'].replace({'$A': ''}, regex=True)
print(df['a'])
replace doesn't look inside list elements, so you'll have to use a loop or apply in this case:
df['a'] = df.a.apply(lambda x: [s for s in x if s != '$A'])
df
#     a  b
#0  [1]  4
#1  [3]  5
#2   []  6

Simple way to convert list to dict

Please tell me the simplest way to convert a list object to a dictionary.
All parameters look like this:
['a=1', 'b=2', ...]
And I want to convert it into:
{'a': '1', 'b': '2' ...}
You could use:
>>> x
['a=1', 'b=2']
>>>
>>> dict( i.split('=') for i in x )
{'a': '1', 'b': '2'}
>>>
For each element in the list, split on the equal character, and add to dictionary using the resulting list from split.
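If a value can itself contain an equals sign, limiting the split to the first '=' keeps each item a two-element pair. A minimal sketch; the 'expr=c=3' entry is invented here to show the edge case:

```python
x = ['a=1', 'b=2', 'expr=c=3']

# Split only on the first '=' so everything after it stays in the value.
result = dict(item.split('=', 1) for item in x)
print(result)  # {'a': '1', 'b': '2', 'expr': 'c=3'}
```

With a plain split('='), the last item would yield three pieces and dict() would raise a ValueError.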
