I want to convert the hex strings in column x into the correct negative ints shown in column "true", but instead I get the result in column y.
x y true
fdf1 65009 -527
I tried this (I know it's not correct):
df["y"] = df["x"].apply(int,base=16)
and from this link I know this function:
def s16(value):
    return -(value & 0x8000) | (value & 0x7fff)
a = s16(int('fdf1', 16))
print(a)
can convert a single value correctly, but how do you apply it to create a new column in a Pandas DataFrame?
Use a lambda function:
df["y"] = df["x"].apply(lambda x: s16(int(x, base=16)))
Or change the function for cleaner code:
def s16(value):
    value = int(value, base=16)
    return -(value & 0x8000) | (value & 0x7fff)
df["y"] = df["x"].apply(s16)
print(df)
x y true
0 fdf1 -527 -527
The easiest way is to convert it to an integer and reinterpret it as a 16-bit integer by using .astype:
import numpy as np
df["y"] = df["x"].apply(lambda x: int(x, base=16)).astype(np.int16)
The dtype of column y will be int16, so any operation done on this column with other int16's will keep the values between -32768 and 32767.
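A quick sanity check with the single row from the question (a minimal sketch):

import numpy as np
import pandas as pd

df = pd.DataFrame({"x": ["fdf1"]})
df["y"] = df["x"].apply(lambda v: int(v, base=16)).astype(np.int16)
print(df["y"].iloc[0])  # -527
print(df["y"].dtype)    # int16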
I suspect the solution is quite simple, but I have been unable to figure it out. Essentially, I want to query a float column to see whether each value is >= 100.00. If it is, I want to take the value x and compute ((x - 100) * .25) + 100 = new value (preferably replacing the original values in place).
The data looks something like:
Some columns here    A percentage stored as float
foobar               84.85
foobar               15.95
fuubahr              102.25
The above operation would give the following result:
Some columns here    A percentage stored as float
foobar               84.85
foobar               15.95
fuubahr              100.5625
Thanks!
A list comprehension is an easy solution for this:
dataframe["A percentage stored as float"] = [((x - 100)*.25) + 100 if x >= 100 else x for x in dataframe["A percentage stored as float"]]
What it does: it loops through each row of the column, checks whether the value meets our if statement, and applies the calculation if it does; if the if statement is not met, it returns the original row value.
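If you prefer a fully vectorized version, a sketch with np.where (assuming the same column name) would be:

import numpy as np

col = "A percentage stored as float"
dataframe[col] = np.where(dataframe[col] >= 100,
                          (dataframe[col] - 100) * .25 + 100,
                          dataframe[col])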
I'm using the TensorFlow Datasets API, and I have data with a string column that represents a binary option (something like "yes" or "no"). I'm wondering how to convert it into 1 and 0 (integer values) respectively, while leaving the other columns unchanged.
My skeleton function is:
def mapper(features, target):
    # TODO: map features["str_col"] from "yes" to 1 and "no" to 0
    # TODO: return features with str_col transformed, plus target
    pass
Can you assist?
You can convert a bool to int:
y = tf.equal(features["str_col"], 'YES')
y = tf.cast(y, tf.int32)
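A minimal sketch of how the full mapper might look, assuming the column is named "str_col", the values match the 'YES' literal above (adjust the casing to your data), and dataset is your tf.data.Dataset:

import tensorflow as tf

def mapper(features, target):
    is_yes = tf.equal(features["str_col"], 'YES')    # boolean tensor
    features["str_col"] = tf.cast(is_yes, tf.int32)  # True -> 1, False -> 0
    return features, target

dataset = dataset.map(mapper)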
I am working with the measurements.csv dataset from the link below:
https://www.kaggle.com/anderas/car-consume/data
It has values like 21,5, but a float must be written as 21.5. Therefore, Python says: ValueError: could not convert string to float: '21,5'
My code is as follows:
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# get data ready
data = pd.read_csv('measurements.csv')
data.shape
# split out features and label
X = data.iloc[:, :-5].values
y = data.iloc[:, -4]
# map category to binary
y = np.where(y == 'E10', 1, 0)
enc = OneHotEncoder()
Second question:
I also want to use its other columns, which have string values or nulls (empty). How should I transform them to fit my input shape?
You can tell read_csv what the character for decimal point is:
data = pd.read_csv('measurements.csv', decimal=',')
From https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
In read_csv, you can specify the decimal character as
data = pd.read_csv('measurements.csv', decimal=",")
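If the file has already been read with the commas left in (so the column is a string), a hedged alternative is to fix it afterwards; the column name below is just an example:

data["consume"] = data["consume"].str.replace(",", ".").astype(float)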
I have a dataframe
df = pd.DataFrame(data=np.arange(10),columns=['v']).astype(float)
How do I make sure that the numbers in v are whole numbers? I am very concerned about rounding/truncation/floating-point representation errors.
Comparison with astype(int)
Tentatively convert your column to int and test with np.array_equal:
np.array_equal(df.v, df.v.astype(int))
True
float.is_integer
You can use this Python function in conjunction with apply:
df.v.apply(float.is_integer).all()
True
Or, use Python's all with a generator expression, for space efficiency:
all(x.is_integer() for x in df.v)
True
Here's a simpler, and probably faster, approach:
(df[col] % 1 == 0).all()
To ignore nulls:
(df[col].fillna(-9999) % 1 == 0).all()
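An equivalent sketch that simply drops the nulls instead of filling them:

(df[col].dropna() % 1 == 0).all()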
If you want to check multiple float columns in your dataframe, you can do the following:
col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
df.loc[:, float_to_int_cols] = df.loc[:, float_to_int_cols].astype(int)
Keep in mind that a float column containing all integers will not be selected if it has np.NaN values. To cast float columns with missing values to integer, you need to fill or remove the missing values, for example with median imputation:
float_cols = df.select_dtypes(include=['float'])
float_cols = float_cols.fillna(float_cols.median().round()) # median imputation
col_should_be_int = float_cols.applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
df.loc[:, float_to_int_cols] = float_cols[float_to_int_cols].astype(int)
For completeness, Pandas v1.0+ offers the convert_dtypes() utility, which (among three other conversions) performs the requested operation for all DataFrame columns (or Series) containing only integer numbers.
If you wanted to limit the conversion to a single column only, you could do the following:
>>> df.dtypes # inspect previous dtypes
v float64
>>> df["v"] = df["v"].convert_dtype()
>>> df.dtypes # inspect converted dtypes
v Int64
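If you wanted to apply the same conversion to every column at once, a one-line sketch would be:

df = df.convert_dtypes()  # float columns holding only whole numbers become nullable Int64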
On 27,331,625 rows this works well. Time: 1.3 s
df['is_float'] = df[field_fact_qty]!=df[field_fact_qty].astype(int)
This way took 4.9 s:
df[field_fact_qty].apply(lambda x : (x.is_integer()))
In a Pandas DataFrame column, I want to convert each character of a string to an integer (as ord() does) and prepend 100 on the left. I know how to do this with a regular string:
st = "JOHNSMITH4817001141979"
a=[ord(x) for x in st]
b=[]
for x in a:
    b.append('{:03}'.format(x))  # Add leading zero, ensuring 3 digits
b=['100']+b
b=''.join([ "%s"%x for x in b])
b=int(b)
b
Result: 100074079072078083077073084072052056049055048048049049052049057055057
But what if I wanted to perform this operation on every cell of a column in a Pandas data frame like this one?
import pandas as pd
df = pd.DataFrame({'string':['JOHNSMITH4817001141979','JOHNSMYTHE4817001141979']})
df
string
0 JOHNSMITH4817001141979
1 JOHNSMYTHE4817001141979
I just need a separate column with the result as an integer for each cell in 'string'.
Thanks in advance!
First, you transform your processing chain into a function such as:
def get_it(a):
    a = [ord(x) for x in a]
    b = []
    for x in a:
        b.append('{:03}'.format(x))  # Add leading zero, ensuring 3 digits
    b = ['100'] + b
    b = ''.join(["%s" % x for x in b])
    return int(b)
and then call it for each element in the column and make the resulting list the new column:
df['result'] = [get_it(i) for i in df['string']]
Although this does work, I still think you can find a better solution by optimizing your get_it function.
Also, you can do the following:
def get_it(a):
    a = [ord(x) for x in a]
    b = []
    for x in a:
        b.append('{:03}'.format(x))  # Add leading zero, ensuring 3 digits
    b = ['100'] + b
    b = ''.join(["%s" % x for x in b])
    return int(b)
df['result'] = df['string'].apply(get_it)
If you want a one-liner (Python 3.6+):
import pandas as pd
df = pd.DataFrame({'string':['JOHNSMITH4817001141979','JOHNSMYTHE4817001141979']})
df['result'] = df['string'].apply(lambda x: int(''.join(['100'] + [f'{ord(i):03}' for i in x])))
For Python < 3.6, replace the f-string with '{:03}'.format(ord(i)). What I have done is transform your function into a lambda expression and apply it to the column. Note that the result is far too large for int64, so it is kept as a Python int in an object column.
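For example, the same expression without f-strings (a sketch for older Python versions):

df['result'] = df['string'].apply(lambda x: int(''.join(['100'] + ['{:03}'.format(ord(i)) for i in x])))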