using unique function in pandas for a 2D array - python

In this question's answer I got the idea of using pandas' unique function instead of numpy's. Looking at the documentation here, I discovered that it only works on 1D arrays or tuples. My data has the format:
example = [[25.1, 0.03], [25.1, 0.03], [24.1, 15]]
It would be possible to convert it to tuples, use the unique function, and then convert back to an array. Does someone know a better way to do this? This question might be related, but it deals with cells. I don't want to use numpy's unique, since it sorts the result and I need to keep the order of the array the same.

You can convert the rows to tuples and then build an order-preserving unique list:
list(dict.fromkeys(map(tuple, example)))
Output:
[(25.1, 0.03), (24.1, 15)]

If you'd like to use Pandas:
To find the unique pairs in example, use DataFrame instead of Series and then drop_duplicates:
pd.DataFrame(example).drop_duplicates()
      0      1
0  25.1   0.03
2  24.1  15.00
(And .values will give you back a 2-D array.)
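A minimal, self-contained sketch of both approaches above, using the example data from the question:

```python
import pandas as pd

example = [[25.1, 0.03], [25.1, 0.03], [24.1, 15]]

# dict keys preserve insertion order (Python 3.7+), so the first
# occurrence of each pair survives.
unique_tuples = list(dict.fromkeys(map(tuple, example)))
print(unique_tuples)  # [(25.1, 0.03), (24.1, 15)]

# The pandas route also keeps row order and returns a 2-D array.
unique_rows = pd.DataFrame(example).drop_duplicates().values
print(unique_rows.shape)  # (2, 2)
```

Both keep the first occurrence of each row in its original position.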

Related

List to 2d array in pandas per line NEED MORE EFFICIENT WAY

I have a pandas dataframe of lists, and each of the lists can be converted to a numpy array with np.asarray(lst). The shape of each array should be (263, 300), so I do this:
a = dataframe.to_numpy()
# a.shape is (100000,)
output_array = np.array([])
for lst in a:
    output_array = np.append(output_array, np.asarray(lst))
Since there are 100000 rows in my dataframe, I expect to get
output_array.shape == (100000, 263, 300)
It works, but it takes a long time.
I want to know which part of my code costs the most and how to fix it.
Is there a more efficient method to reach this? Thanks!
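One hedged sketch of a faster approach, on toy data: np.append copies the whole growing array on every iteration, so the loop is quadratic; stacking all rows once avoids that.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's data: a Series of 5 nested lists,
# each of shape (2, 3) (the real data is 100000 lists of shape (263, 300)).
s = pd.Series([np.arange(6).reshape(2, 3).tolist() for _ in range(5)])
a = s.to_numpy()  # object array of shape (5,)

# Stack all rows in one call: linear time, and the 3-D shape is kept
# (np.append would have flattened everything into a 1-D array).
output_array = np.stack([np.asarray(row) for row in a])
print(output_array.shape)  # (5, 2, 3)
```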

How can I convert a Pandas Series to a Numpy Array and maintain order?

I have a pandas series that looks like:
cash 50121.599128
num_shares 436.000000
cost_basis 114.400002
open_price 113.650002
close_10 114.360001
close_9 115.769997
close_8 114.800003
close_7 114.040001
close_6 115.680000
close_5 115.930000
close_4 115.430000
close_3 113.339996
close_2 114.870003
close_1 114.050003
dtype: float64
I want to convert it to a numpy array, so I'm doing:
next_state_val = np.array([next_state.values])
However, there's no guarantee that my series will always have the same order. How can I maintain the same order across many series?
Assuming the indices are always the same (but not necessarily occurring in the same order), you can use .sort_index() on the series. This will ensure the series is consistently ordered by its index each time.
https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.sort_index.html
If I understand the docs correctly, the .values method returns an array and preserves order.
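A small sketch of the suggestion above, assuming two series that share the same index keys in different orders:

```python
import pandas as pd

# Same index keys, different insertion order.
s1 = pd.Series({"b": 2.0, "a": 1.0, "c": 3.0})
s2 = pd.Series({"c": 3.0, "a": 1.0, "b": 2.0})

# Sorting by index first guarantees the same element order in both arrays.
a1 = s1.sort_index().values
a2 = s2.sort_index().values
print(a1)  # [1. 2. 3.]
```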

Why does pandas.to_numeric result in a list of lists?

I am trying to import csv data into a pandas dataframe. To do this I am doing the following:
df = pd.read_csv(StringIO(contents), skiprows=4, delim_whitespace=True,index_col=False,header=None)
index = pd.MultiIndex.from_arrays((columns, units, descr))
df.columns = index
df.columns.names = ['Name','Unit','Description']
df = df.apply(pd.to_numeric)
data['isotherm'] = df
This produces e.g. the following table:
In: data['isotherm']
Out:
Name        Relative_Pressure  Volume_STP
Unit                        -       ccm/g
Description              p/p0
0                    0.042691     29.3601
1                    0.078319     30.3071
2                    0.129529     31.1643
3                    0.183355     31.8513
4                    0.233435     32.3972
5                    0.280847     32.8724
However if I only want to get the values of the column Relative_Pressure I get this output:
In: data['isotherm']['Relative_Pressure'].values
Out:
array([[0.042691],
       [0.078319],
       [0.129529],
       [0.183355],
       [0.233435],
       [0.280847]])
Of course I could flatten every column I want to use:
x = [item for sublist in data['isotherm']['Relative_Pressure'].values for item in sublist]
However, this would be a lot of extra effort and would also hurt readability. How can I make sure the data is flat for the whole DataFrame?
array([[...]]) is not a list of lists, but a 2-D numpy array. (I'm not sure why the values are returned as a single-column 2-D array rather than a 1-D array here, though. When I create a plain DataFrame, a single column's values come back as a 1-D array.)
You can flatten them using numpy's built-in functions, e.g.
x = data['isotherm']['Relative_Pressure'].values.flatten()
Edit: This might be caused by the MultiIndex.
The direct way of indexing into one column belonging to your MultiIndex object is with a tuple as follows:
data[('isotherm', 'Relative_Pressure')]
which will return a Series object whose .values attribute will give you the expected 1D array. The docs discuss this here
You should be careful using chained indexing like data['isotherm']['Relative_Pressure'] because you won't know if you are dealing with a copy of the data or a view of the data. Please do a SO search of pandas' SettingWithCopyWarning for more details or read the docs here.
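A self-contained sketch of the tuple-indexing suggestion, rebuilding a small MultiIndex frame like the one in the question (the column labels here mirror the question's output):

```python
import pandas as pd

# Rebuild a small frame with the question's three-level column index.
columns = pd.MultiIndex.from_arrays(
    [["Relative_Pressure", "Volume_STP"], ["-", "ccm/g"], ["p/p0", ""]],
    names=["Name", "Unit", "Description"],
)
df = pd.DataFrame([[0.042691, 29.3601], [0.078319, 30.3071]], columns=columns)

# A full tuple key selects one column and returns a Series,
# whose .values is the expected 1-D array.
x = df[("Relative_Pressure", "-", "p/p0")].values
print(x.ndim)  # 1
```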

How to create a series of numbers using Pandas in Python

I am new to Python and have recently learnt to create a series using Pandas. I can define a series, e.g. x = pd.Series([1, 2, 3, 4, 5]), but how do I define a series for a range, say 1 to 100, rather than typing all elements from 1 to 100?
As seen in the docs for pandas.Series, all that is required for your data parameter is an array-like, dict, or scalar value. Hence to create a series for a range, you can do exactly the same as you would to create a list for a range.
one_to_hundred = pd.Series(range(1,101))
one_to_hundred = pd.Series(np.arange(1, 101))
This creates the series from numpy's arange function, which generates values from 1 up to (but not including) 101 in steps of 1.
There's also this:
one_to_hundred = pd.RangeIndex(1, 101).to_series()
I'm still looking for a pandas function that creates a series containing a range (sequence) of numbers directly, but I don't think it exists.
Try pd.Series([0 for i in range(20)]).
It will create a Series with 20 rows (all zeros; substitute your own expression for the values).
num = np.arange(1, 101)
s = pd.Series(num)
Adjust the arguments as needed; for details about np.arange, see the link below:
https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.arange.html
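For what it's worth, the three approaches above all produce the same 1-to-100 series; a quick sketch checking that:

```python
import numpy as np
import pandas as pd

a = pd.Series(range(1, 101))
b = pd.Series(np.arange(1, 101))
# RangeIndex.to_series() uses the range as the index too,
# so reset it to the default 0..99 for a like-for-like comparison.
c = pd.RangeIndex(1, 101).to_series().reset_index(drop=True)

print((a.values == b.values).all())  # True
```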

Pandas: Series of arrays to series of transposed arrays

Ok, this is an easy one, I hope.
Using Pandas, I have a Series of 100 equal length Numpy arrays each with 30000 elements. I'd like to quickly transpose them into a series of 30000 arrays with 100 elements.
I of course can do it with list comprehensions or pulling the arrays but is there an efficient Pandas way to do it? Thanks!
UPDATE:
As per the request by @Alexander to make this a better example, here is some toy data.
import numpy as np
import pandas
s1 = pandas.Series([np.array(range(10)) for i in range(10)])
And what I want returned in this example is:
s2 = pandas.Series([np.ones(10)*i for i in range(10)])
That is, an element-wise transpose of a Series of arrays into a new Series of arrays. Thanks!
Ok, this works actually. Anyone have a more efficient solution?
pandas.Series(np.asarray(s1.tolist()).T.tolist())
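A runnable version of that one-liner on the toy data, kept as arrays rather than round-tripping through lists:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.arange(10) for _ in range(10)])

# Stack into a (10, 10) 2-D array, transpose, then split the rows back
# into a Series; row i of the result holds element i of every original array.
s2 = pd.Series(list(np.asarray(s1.tolist()).T))
print(len(s2))  # 10
```

Element s2[i] here is an array of ten i's, matching the expected element-wise transpose.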
