I have this type of DataFrame I wish to use. But because the data I imported uses the letter i for the imaginary part of the complex numbers, Python doesn't let me convert the values to a numeric type.
5.0 0.01511+0.0035769i
5.0298 0.015291+0.0075383i
5.0594 0.015655+0.0094534i
5.0874 0.012456+0.011908i
5.1156 0.015332+0.011174i
5.1458 0.015758+0.0095832i
How can I proceed to change the i to j in each row of the DataFrame?
Thank you.
If you have a string like this: complexStr = "0.015291+0.0075383i", you could do:
complexFloat = complex(complexStr[:-1] + 'j')
If your data is a string like this: row = "5.0 0.01511+0.0035769i", you have to separate the first part, like this (avoid naming the variable str, which would shadow the built-in):
number, complexStr = row.split()
complexFloat = complex(complexStr[:-1] + 'j')
>>> complexFloat
(0.015291+0.0075383j)
>>> type(complexFloat)
<type 'complex'>
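To convert a whole DataFrame column at once rather than one string at a time, a vectorized sketch (assuming the complex strings sit in a column named 'b') could look like this:

import pandas as pd

df = pd.DataFrame({'a': [5.0, 5.0298],
                   'b': ['0.01511+0.0035769i', '0.015291+0.0075383i']})
# swap the trailing 'i' for 'j', then parse with the built-in complex()
df['b'] = df['b'].str.replace('i', 'j').apply(complex)
print(df['b'].dtype)  # complex128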
I'm not sure how you obtain your dataframe, but if you're reading it from a text file with a suitable header, then you can use a converter function to sort out the 'j' -> 'i' so that your dtype is created properly:
For file test.df:
a b
5.0 0.01511+0.0035769i
5.0298 0.015291+0.0075383i
5.0594 0.015655+0.0094534i
5.0874 0.012456+0.011908i
5.1156 0.015332+0.011174i
5.1458 0.015758+0.0095832i
the code
import pandas as pd
df = pd.read_table('test.df', delimiter=r'\s+',
                   converters={'b': lambda v: complex(v.replace('i', 'j'))})
gives df as:
a b
0 5.0000 (0.01511+0.0035769j)
1 5.0298 (0.015291+0.0075383j)
2 5.0594 (0.015655+0.0094534j)
3 5.0874 (0.012456+0.011908j)
4 5.1156 (0.015332+0.011174j)
5 5.1458 (0.015758+0.0095832j)
with column dtypes:
a float64
b complex128
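The same converter idea works with pd.read_csv, which is the more common entry point these days; a sketch (sep=r'\s+' matches runs of whitespace, just like the delimiter above):

import pandas as pd

df = pd.read_csv('test.df', sep=r'\s+',
                 converters={'b': lambda v: complex(v.replace('i', 'j'))})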
I need to extract numeric values from a string inside a pandas DataFrame.
Let's say the DataFrame cell is as follows (stored as a string):
[1.234,2.345]
I can get the first value with the following:
print(df['column_name'].str.extract('(\d+.\d+)').astype('float'))
Output:
1.234
Now my thoughts to find both values was to do the following:
print(df['column_name'].str.extract('(\d+.\d+),(\d+.\d+)').astype('float'))
but the output is then as follows:
NaN NaN
Expected output:
1.234 2.345
Why not just pd.eval:
>>> df['Float'] = pd.eval(df['String'])
>>> df
String Float
0 [1.234, 2.345] [1.234, 2.345]
1 [1.234, 2.345] [1.234, 2.345]
If you want to use a regex to extract floats, you can use str.findall:
>>> df['String'].str.findall(r'(-?\d+\.?\d+)').str.join(' ')
0    1.234 2.345
Name: String, dtype: object
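As an aside, the NaN in the question most likely comes from the space after the comma in '[1.234, 2.345]'. Allowing optional whitespace (and escaping the dots) makes the two-group extract work:

>>> df = pd.DataFrame({'String': ['[1.234, 2.345]']})
>>> df['String'].str.extract(r'(\d+\.\d+),\s*(\d+\.\d+)').astype(float)
       0      1
0  1.234  2.345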
Old answer:
Use ast.literal_eval:
import ast
import pandas as pd

df = pd.DataFrame({'String': ['[1.234, 2.345]']})
df['Float'] = df['String'].apply(ast.literal_eval)
Output:
>>> df
String Float
0 [1.234, 2.345] [1.234, 2.345]
>>> type(df.at[0, 'String'][0])
str
>>> type(df.at[0, 'Float'][0])
float
You can use Series.str.split, setting n=1 (a single split is enough for two values). If you want to expand the result into separate DataFrame columns, you must set expand=True. Strip the brackets first, otherwise astype(float) fails on '[1.234' and '2.345]'.
So the result might look like:
your_dataframe['your_column_name'].str.strip('[]').str.split(",", n=1, expand=True).astype(float)
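For example, on a one-row Series shaped like the question's data:

import pandas as pd

s = pd.Series(['[1.234,2.345]'])
print(s.str.strip('[]').str.split(',', n=1, expand=True).astype(float))
#        0      1
# 0  1.234  2.345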
This question is the reverse of:
In Python, how to specify a format when converting int to string?
Here I want to turn the string "0001" into the integer 1,
and the string "0023" into the integer 23.
I want to do this on a pandas DataFrame, since I have a column that looks like this:
dic = {'UPCCode': ["00783927275569", "0007839272755834", "003485934573", "06372792193", "8094578237"]}
df = pd.DataFrame(data=dic)
I want it to become something like this:
dic = {'UPCCode': [783927275569, 7839272755834, 3485934573, 6372792193, 8094578237]}
df = pd.DataFrame(data=dic)
If I write a literal like int(001) or float(0023), it gives me this error:
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
The best way is to use pd.to_numeric:
df['UPCCode'] = pd.to_numeric(df['UPCCode'])
print(df)
UPCCode
0 783927275569
1 7839272755834
2 3485934573
3 6372792193
4 8094578237
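If some cells might not be clean digit strings, pd.to_numeric also accepts errors='coerce', which turns anything unparseable into NaN instead of raising:

df['UPCCode'] = pd.to_numeric(df['UPCCode'], errors='coerce')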
Here is a quick solution: just use the astype method:
>>> df = df.astype(int)
>>> df
UPCCode
0 783927275569
1 7839272755834
2 3485934573
3 6372792193
4 8094578237
If you want to apply this to the 'UPCCode' column alone, assign the result back to that column (df = df['UPCCode'].astype(int) would replace the whole DataFrame with a single Series):
df['UPCCode'] = df['UPCCode'].astype(int)
I found another way to do it: strip the leading zeros first, then convert:
df["UPCCode"] = df["UPCCode"].str.lstrip("0")
df["UPCCode"] = df["UPCCode"].astype(int)
I am new to Python and pandas and I am trying to figure out a problem.
I am struggling with converting dtype values in my CSV.
I wrote a simple example to understand what the problem is, but I cannot see anything wrong there and I am not able to find out why it is not working. Please see below.
I have a CSV table with 3 columns. For A and B the dtype is int64; for C it is object.
If I set dtype=str, it changes the values from int64 to object.
My code looks like this:
import pandas as pd
data_Cisla = pd.read_csv("Cisla.csv", sep=";", dtype=str)
print(data_Cisla.dtypes)
print(data_Cisla)
def cisla():
    vstup = input("Input value ")
    print(vstup, type(vstup))
    print(data_Cisla.loc[vstup])
When I also use index_col="C" and call cisla(), it works.
The program asks me for an input from column C, so I write for example text_2 and it gives me the output (C) text_2, (A) 2, (B) 20 ----> this is what I am looking for, but with column A as the index_col.
But if I use index_col="A" and write 20 when the program asks for the input value, it doesn't work and gives me an error.
What I don't understand is: when I print data_Cisla.dtypes at each step, it tells me the whole time that all columns are object, so what is the difference?
Why does it work for column C and not for column A?
The final code looks like this:
import pandas as pd

data_Cisla = pd.read_csv("Cisla.csv", sep=";", dtype=str, index_col="C")

def cisla():
    vstup = input("Input value ")
    print(data_Cisla.loc[vstup])

cisla()
Thank you for helping me.
The reason for the observed behavior is that column 'C' is your index (it is set via index_col in your final code). My solution avoids relying on the index at all:
import pandas as pd
# build test data
data_Cisla = [[1, 10, 'text_1'],
              [2, 20, 'text_2']]
data_Cisla = pd.DataFrame.from_records(data=data_Cisla, columns=['A', 'B', 'C'])
data_Cisla = data_Cisla.reset_index()

def cisla(data_Cisla: pd.DataFrame, col: str, vstup: str):
    # Do not change data_Cisla, just make sure vstup is in the right format (str or float)
    try:
        vstup = float(vstup)
    except ValueError:
        pass
    mask = data_Cisla[col] == vstup
    return data_Cisla[mask]
It will produce the following result:
cisla(data_Cisla, 'C', 'text_1')  #-> 1 | 10 | text_1
cisla(data_Cisla, 'A', '1')       #-> 1 | 10 | text_1
cisla(data_Cisla, 'A', 1)         #-> 1 | 10 | text_1
I have a CSV with ~10 columns. One of the columns holds data in bytes form, i.e., b'gAAAA234'. But when I read the file with pandas via .read_csv("file.csv"), that column comes back in the DataFrame as a string, i.e., "b'gAAAA234'", rather than as bytes.
How do I read it as bytes directly, without having to read it as a string and then reconvert?
Currently, I'm working with this:
b = df['column_with_data_in_bytes'][i]
bb = bytes(b[2:len(b)-1],'utf-8')
#further processing of bytes
This works, but I was hoping to find a more elegant/Pythonic or more reliable way to do this.
You might consider parsing with ast.literal_eval:
import ast
df['column_with_data_in_bytes'] = df['column_with_data_in_bytes'].apply(ast.literal_eval)
Demo:
In [322]: df = pd.DataFrame({'Col' : ["b'asdfghj'", "b'ssdgdfgfv'", "b'asdsfg'"]})
In [325]: df
Out[325]:
Col
0 b'asdfghj'
1 b'ssdgdfgfv'
2 b'asdsfg'
In [326]: df.Col.apply(ast.literal_eval)
Out[326]:
0 asdfghj
1 ssdgdfgfv
2 asdsfg
Name: Col, dtype: object
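If you would rather have the column parsed while the file is being read, read_csv accepts the same function through its converters argument (a sketch, assuming the column name from the question):

import ast
import pandas as pd

df = pd.read_csv('file.csv',
                 converters={'column_with_data_in_bytes': ast.literal_eval})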
I'm reading a CSV file into a DataFrame. I need to strip whitespace from all the string-like cells, leaving the other cells unchanged, in Python 2.7.
Here is what I'm doing:
def remove_whitespace(x):
    if isinstance(x, basestring):
        return x.strip()
    else:
        return x

my_data = my_data.applymap(remove_whitespace)
Is there a better or more Pandas-idiomatic way to do this?
Is there a more efficient way (perhaps by doing things column-wise)?
I've tried searching for a definitive answer, but most questions on this topic seem to be how to strip whitespace from the column names themselves, or presume the cells are all strings.
Stumbled onto this question while looking for a quick and minimalistic snippet I could use. Had to assemble one myself from posts above. Maybe someone will find it useful:
data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
You could use pandas' Series.str.strip() method to do this quickly for each string-like column:
>>> data = pd.DataFrame({'values': [' ABC ', ' DEF', ' GHI ']})
>>> data
values
0 ABC
1 DEF
2 GHI
>>> data['values'].str.strip()
0 ABC
1 DEF
2 GHI
Name: values, dtype: object
We want to:
Apply our function to each element in our dataframe - use applymap.
Use type(x)==str (versus x.dtype == 'object') because Pandas will label columns as object for columns of mixed datatypes (an object column may contain int and/or str).
Maintain the datatype of each element (we don't want to convert everything to a str and then strip whitespace).
Therefore, I've found the following to be the easiest:
df.applymap(lambda x: x.strip() if type(x)==str else x)
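For example, on a mixed-type column the type check leaves the non-strings untouched:

>>> df = pd.DataFrame({'mixed': [' a ', 3, ' b ']})
>>> df.applymap(lambda x: x.strip() if type(x) == str else x)
  mixed
0     a
1     3
2     b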
When you call pandas.read_csv, you can use a regular expression that matches zero or more spaces followed by a comma followed by zero or more spaces as the delimiter.
For example, here's "data.csv":
In [19]: !cat data.csv
1.5, aaa, bbb , ddd , 10 , XXX
2.5, eee, fff , ggg, 20 , YYY
(The first line ends with three spaces after XXX, while the second line ends at the last Y.)
The following uses pandas.read_csv() to read the files, with the regular expression ' *, *' as the delimiter. (Using a regular expression as the delimiter is only available in the "python" engine of read_csv().)
In [20]: import pandas as pd
In [21]: df = pd.read_csv('data.csv', header=None, delimiter=' *, *', engine='python')
In [22]: df
Out[22]:
0 1 2 3 4 5
0 1.5 aaa bbb ddd 10 XXX
1 2.5 eee fff ggg 20 YYY
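Alternatively, if only the spaces after the commas matter, read_csv's skipinitialspace=True option handles those without switching to the python engine (note it does not remove trailing spaces before the next delimiter):

df = pd.read_csv('data.csv', header=None, skipinitialspace=True)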
The "data['values'].str.strip()" answer above did not work for me, but I found a simple work around. I am sure there is a better way to do this. The str.strip() function works on Series. Thus, I converted the dataframe column into a Series, stripped the whitespace, replaced the converted column back into the dataframe. Below is the example code.
import pandas as pd

data = pd.DataFrame({'values': [' ABC ', ' DEF', ' GHI ']})
print('-----')
print(data)
data['values'].str.strip()  # returns a stripped copy; does not modify data in place
print('-----')
print(data)
new = data['values'].str.strip()
data['values'] = new
print('-----')
print(new)
Here is a column-wise solution with pandas apply:
import numpy as np

def strip_obj(col):
    if col.dtypes == object:
        return (col.astype(str)
                   .str.strip()
                   .replace({'nan': np.nan}))
    return col

df = df.apply(strip_obj, axis=0)
This will convert values in object-type columns to strings. Take care with mixed-type columns: for example, if your column holds the zip codes 20001 (an int) and ' 21110 ' (a string), you will end up with the strings '20001' and '21110'.
This worked for me - applies it to the whole dataframe:
import pandas as pd

def panda_strip(x):
    r = []
    for y in x:
        if isinstance(y, str):
            y = y.strip()
        r.append(y)
    return pd.Series(r)

df = df.apply(lambda x: panda_strip(x))
I found the following code useful and something that would likely help others. This snippet will allow you to delete spaces in a column as well as in the entire DataFrame, depending on your use case.
import pandas as pd

def remove_whitespace(x):
    try:
        # remove spaces inside and outside of string
        x = "".join(x.split())
    except:
        pass
    return x

# Apply remove_whitespace to a single column only
df.orderId = df.orderId.apply(remove_whitespace)
print(df)

# Apply remove_whitespace to the entire DataFrame
df = df.applymap(remove_whitespace)
print(df)