I'm using xlwings with numpy and unittest in Python to test an Excel spreadsheet. However, when xlwings imports a cell containing #N/A, the value comes through as -2146826246.
I understand that this may have something to do with xlwings importing values as float, and there may not be a good float representation of #N/A.
I want to compare #N/A with nan. Any advice on how to accomplish this?
For anyone who stumbles across the same problem in the future: I used a very crude method, building a dictionary that maps the error numbers to the values I wanted returned.
error_dict = {-2146826281: np.inf, -2146826246: np.nan}  # #DIV/0! -> inf, #N/A -> nan
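A minimal sketch of how such a dictionary might be applied inside a unittest test. The helper `clean`, the class `TestSheet` and the sample values are hypothetical stand-ins for what a range read through xlwings would return. Because `nan != nan` under `==`, the comparison uses `np.testing.assert_array_equal`, which treats NaNs in matching positions as equal.

```python
import unittest

import numpy as np

# Error codes as reported in the question: #DIV/0! and #N/A.
# Mapping them to inf/nan is this answer's choice, not something xlwings does.
ERROR_DICT = {-2146826281: np.inf, -2146826246: np.nan}


def clean(values):
    """Hypothetical helper: swap Excel error codes in a 2-D list for floats."""
    # dict.get(v, v) leaves ordinary numbers untouched; floats such as
    # -2146826246.0 still match the int keys because they compare equal.
    return np.array([[ERROR_DICT.get(v, v) for v in row] for row in values], dtype=float)


class TestSheet(unittest.TestCase):
    def test_error_codes_are_replaced(self):
        # Stand-in for something like sheet.range(...).value
        raw = [[1.5, -2146826246], [-2146826281, 2.0]]
        # assert_array_equal treats NaN == NaN, unlike the plain == operator
        np.testing.assert_array_equal(clean(raw), [[1.5, np.nan], [np.inf, 2.0]])
```

Run it with `python -m unittest` as usual. Converting once at import time keeps the individual test methods free of magic numbers.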
If anyone has a more elegant solution, please let me know!
Just to add another option: if you brought in (or converted) your Excel data as a pandas DataFrame, you can always use replace to convert the old #N/A values to NaNs (which are a little easier to deal with in Python/pandas). Note that replace returns a new DataFrame, so assign the result:
df = df.replace(-2146826246, float('nan'))
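A small sketch of the same idea handling several error codes in one call, assuming the frame already holds the raw codes. The sample data is made up; the two codes are the #N/A and #DIV/0! values from the dictionary earlier in this thread.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for data read from Excel
df = pd.DataFrame({"a": [1.0, -2146826246, 3.0], "b": [-2146826281, 5.0, 6.0]})

# With a dict and no `value` argument, replace() maps each key to its value.
# The result is a new DataFrame, so assign it back.
df = df.replace({-2146826246: np.nan, -2146826281: np.inf})
print(df)
```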
When converting a range to a DataFrame, use this option (it turns empty cells into NaN):
.options(empty=np.nan)
Then it is easy to treat the different error codes as NaN too:
df[df==-2146826246]=np.nan
This is very useful for avoiding errors when applying functions and calculations to the DataFrame.
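A sketch of the boolean-mask approach extended to a list of codes, assuming the data is already in a DataFrame (the xlwings range reading is left out because it needs Excel). `ERROR_CODES` and the sample frame are hypothetical; `isin` builds the mask and `mask` writes NaN wherever it is True.

```python
import numpy as np
import pandas as pd

# Hypothetical list of codes to blank out (#N/A and #DIV/0!)
ERROR_CODES = [-2146826246, -2146826281]

df = pd.DataFrame({"x": [1.0, -2146826246, np.nan], "y": [-2146826281, 2.0, 3.0]})

# isin() gives a boolean frame; mask() replaces the True cells with NaN
df = df.mask(df.isin(ERROR_CODES))

# Aggregations now skip the former error cells instead of summing huge negatives
print(df.sum())
```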
The rapids.ai cuDF type is somewhat compatible with pandas, but here is a strange incompatibility: cudf.Series has a .diff() method, but cudf.DataFrame does not appear to. This is very annoying (consider, for example, a data frame of stock prices with columns corresponding to instruments). There are, of course, kludgy ways to get around this (converting to a pandas data frame and back comes to mind), but I wonder what the canonical way is. Any advice?
cuDF Python covers a large segment of the pandas API, but there are some gaps (as you've run into here).
Today, the easiest way to run diff on every column and return a dataframe would be the following:
cudf.DataFrame({col: df[col].diff() for col in df.columns})
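cuDF needs a GPU, so this sketch checks the same column-by-column pattern with pandas, which does have DataFrame.diff to compare against. It assumes cuDF's Series.diff follows the pandas semantics; the price data is made up.

```python
import pandas as pd

# Made-up prices: one column per instrument
prices = pd.DataFrame({"AAA": [10.0, 11.0, 10.5], "BBB": [20.0, 19.0, 21.0]})

# The workaround pattern: diff each column and rebuild a frame from the dict
diffs = pd.DataFrame({col: prices[col].diff() for col in prices.columns})

# In pandas the built-in DataFrame.diff gives the same result
pd.testing.assert_frame_equal(diffs, prices.diff())
print(diffs)
```

With cuDF, only the import and constructor change; the dict comprehension is the same.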
I am trying to format how the values of a pandas DataFrame are displayed.
Basically, all I want is a thousands separator on my values.
I managed to do it using the df.style.format function. It does the job, but it also "breaks" my table's original design.
Here is an example of what is going on:
Is there anything I can do to avoid this? I want to keep the original table format and only change how the values are formatted.
PS: I don't know if it makes any difference, but I am using Google Colab.
In case anyone using Colab has the same problem I had, I found a solution:
appending .set_table_attributes('class="dataframe"') to the styler seems to solve the problem.
More info can be found here: https://github.com/googlecolab/colabtools/issues/1687
For this case you could do:
pdf.assign(a=pdf['a'].map("{:,.0f}".format))
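A runnable version of that one-liner with made-up data. `assign` returns a new frame, so `pdf` itself keeps its numeric column, and Colab renders the result with its normal DataFrame styling.

```python
import pandas as pd

pdf = pd.DataFrame({"a": [1234567.0, 89.0], "b": ["x", "y"]})

# map() turns column "a" into formatted strings; other columns are untouched
formatted = pdf.assign(a=pdf["a"].map("{:,.0f}".format))
print(formatted)
```

The trade-off is that the formatted column now holds strings, so keep `pdf` around for any further arithmetic.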
Here is an image of my data and the index value:
As the screenshot shows, the pandas DataFrame is returning two values. What could possibly be wrong? I am a beginner; sorry for the bad editing.
I think I see the issue.
data['Title'].iloc[0]
Try something like this. I think the .head() part of your code is causing the issue.
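A small sketch of why .head() could produce the "two values", assuming the original code used something like `data['Title'].head(1)`: head() still returns a Series, which prints with its index label, while .iloc[0] returns the bare value. The data here is made up.

```python
import pandas as pd

data = pd.DataFrame({"Title": ["Alpha", "Beta"]}, index=[10, 11])

# head(1) is still a Series, so printing it shows the index label and the value
first_as_series = data["Title"].head(1)
# iloc[0] selects by position and returns just the scalar
first_value = data["Title"].iloc[0]
print(type(first_as_series).__name__, type(first_value).__name__)
```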
I have this DataFrame from which I need to extract act1omschr from the column adresactiviteit. However, since it is an object column containing a list and a dict, I don't know how to extract these values.
Can someone help me out?
It looks like that's not a plain dictionary but JSON (JavaScript Object Notation). It's a bit like CSV but with nested values, and it is pretty common, especially for web data.
pandas has a function called json_normalize which should help. Using it on a single column specifically was answered pretty well over here; you should be able to use more or less the exact code given.
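A sketch under an assumption about the data: each cell of adresactiviteit is taken to hold a Python list of dicts that include an act1omschr key (the real structure isn't shown, so the keys and sample rows are hypothetical). explode() gives one row per list element and json_normalize() spreads each dict into columns.

```python
import pandas as pd

# Hypothetical data shaped like the question describes
df = pd.DataFrame({
    "id": [1, 2],
    "adresactiviteit": [
        [{"act1omschr": "Bakery", "code": "A"}],
        [{"act1omschr": "Florist", "code": "B"}, {"act1omschr": "Cafe", "code": "C"}],
    ],
})

# One row per list element, with a fresh 0..n-1 index
exploded = df.explode("adresactiviteit", ignore_index=True)
# Each dict becomes a row of columns (act1omschr, code)
flat = pd.json_normalize(exploded["adresactiviteit"].tolist())
# Indexes line up, so the pieces can be joined side by side
result = pd.concat([exploded[["id"]], flat], axis=1)
print(result[["id", "act1omschr"]])
```

If the cells turn out to be JSON strings rather than Python objects, parse them with json.loads first.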
I have a 10000 x 250 dataset in a csv file. When I use the command
data = pd.read_csv('pool.csv', delimiter=',',header=None)
(while in the correct directory), the values are imported.
This gives me a DataFrame. Since I want to work with numpy, I need to convert it to an array of its values using
data = data.values
And this is when it gets weird. At position [9999,0] the file has the value -0.3839. However, after importing and calculating with it, I noticed that Python (or numpy) does something strange while importing.
Accessing data[9999,0] SHOULD give the expected -0.3839, but it gives something like -0.383899892....
I have already imported the file in other languages such as MATLAB, and there was no rounding issue with those values. I also tried the .to_csv method from pandas instead of .values, but there is exactly the same problem.
The last 10 elements of the first column are:
-0.2716
0.3711
0.0487
-1.518
0.5068
0.4456
-1.753
-0.4615
-0.5872
-0.3839
Is there any import routine that does not have these rounding errors?
Passing float_precision='round_trip' should solve this issue:
data = pd.read_csv('pool.csv',delimiter=',',header=None,float_precision='round_trip')
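A self-contained sketch of the same call. pool.csv isn't available, so a small inline CSV built from values in the question stands in for it. With float_precision='round_trip', each field is converted with Python's own string-to-float routine, so the stored double is the one closest to -0.3839 and prints back as -0.3839.

```python
import io

import pandas as pd

# Inline stand-in for pool.csv (no header row, comma-separated)
csv_text = "-0.2716,0.1\n0.3711,0.2\n-0.3839,0.3\n"

data = pd.read_csv(io.StringIO(csv_text), delimiter=",", header=None,
                   float_precision="round_trip").values

value = data[2, 0]
# Python's float repr prints the shortest string that round-trips
print(float(value))
```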
That's a floating-point error. Most decimal fractions, such as 0.3839, have no exact binary floating-point representation, so the computer stores the nearest value it can represent. Don't be bothered by it; the difference is very small.
If you really need exact precision (because you are testing for exact values), you can look at Python's decimal module, but your program will be noticeably slower.
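A sketch of the decimal route, assuming the file is small enough that per-value Python objects are acceptable. Reading every field as text avoids any binary rounding, and Decimal then keeps the digits exactly as written. The inline CSV is a stand-in for the real file.

```python
import io
from decimal import Decimal

import pandas as pd

csv_text = "-0.3839,0.1\n0.0487,0.2\n"

# dtype=str keeps each field as the exact text from the file
raw = pd.read_csv(io.StringIO(csv_text), header=None, dtype=str)
# Convert column by column; Series.map works on every pandas version
exact = raw.apply(lambda col: col.map(Decimal))

print(exact.iloc[0, 0])
# Decimal arithmetic is exact for these values
print(exact.iloc[0, 0] + exact.iloc[1, 0])
```

The columns end up with object dtype, so numpy's vectorised maths no longer applies; that is where the slowdown comes from.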
You can read more here: https://docs.python.org/3/tutorial/floatingpoint.html
You should know that all languages have this problem; some are just better at hiding it. (Also note that in Python 3 this "hiding" of the floating-point error has been improved.)
Since there is no ideal solution to this problem, you have to choose the approach that is most appropriate for your situation.
I don't know about 'round_trip' and its limitations, but it can probably help you. Another option is the float_format parameter of the to_csv method. (https://docs.python.org/3/library/string.html#format-specification-mini-language)
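A short sketch of float_format on output. When given a string, to_csv expects a %-style format such as "%.4f", which rounds every float to four decimals when writing. The noisy value is the one quoted in the question.

```python
import pandas as pd

# The question's noisy value next to a clean one
df = pd.DataFrame([[-0.383899892, 0.0487]])

# Only the written text is rounded; df itself keeps full precision
text = df.to_csv(index=False, header=False, float_format="%.4f")
print(text)
```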