pandas df transformation: a better way than df.unstack().unstack() - python

Trying to convert pandas DataFrames from wide to long format.
I've tried melt() and wide_to_long() (the "easy melt()"), yet I kept being confused by the syntax and the output I received.
I've also read through many posts on SO and the web about this topic and tried quite some proposed approaches, yet the results were never what I was looking for.
This post helped me to discover unstack() - and I finally managed to get the result I wanted using it twice in a row: df.unstack().unstack().
I'm sure that this is not the best way to do this and was hoping for a tip! Here's my example:
import pandas as pd

# an example df (the real data makes more sense):
series_list = [
    pd.Series(list("hello! hello!"), name='greeting'),
    pd.Series(list("stackoverflow"), name='name'),
    pd.Series(list("howsit going?"), name='question')
]
wide_df = pd.DataFrame(series_list)
Creating a df like that always gives me one in wide format:
0 1 2 3 4 5 6 7 8 9 10 11 12
greeting h e l l o ! h e l l o !
name s t a c k o v e r f l o w
question h o w s i t g o i n g ?
However, I'd like each pd.Series's name= attribute to become a column name.
What worked for me is the mentioned df.unstack().unstack():
greeting name question
0 h s h
1 e t o
2 l a w
3 l c s
4 o k i
5 ! o t
6 v
7 h e g
8 e r o
9 l f i
10 l l n
11 o o g
12 ! w ?
But this sure is clunky and there must be a better way!
Thanks and have a good day : )

Using T
wide_df.T
Out[1108]:
greeting name question
0 h s h
1 e t o
2 l a w
3 l c s
4 o k i
5 ! o t
6 v
7 h e g
8 e r o
9 l f i
10 l l n
11 o o g
12 ! w ?
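Alternatively, you could skip the transpose altogether and build the frame in long format directly; a minimal sketch using pd.concat along axis=1:

```python
import pandas as pd

series_list = [
    pd.Series(list("hello! hello!"), name='greeting'),
    pd.Series(list("stackoverflow"), name='name'),
    pd.Series(list("howsit going?"), name='question')
]

# Concatenating along axis=1 keeps each Series as its own column,
# named after the Series, so no transpose is needed afterwards.
long_df = pd.concat(series_list, axis=1)
print(long_df.head())
```

This gives the same result as wide_df.T, just without first constructing the wide frame.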

Related

Control signs appear as smileys in Visual Studio Code

When I type
print( "Hello ,\x00\x01\x02world!" )
in IDLE, the special signs are ignored, but when I do this in Visual Studio Code, I get this:
Hello ,☺☻world!
I was wondering just why this is.
for x in range(127):
    print(chr(x), end=' ')
Edit: When I run this code, this displays on the terminal in vscode:
☺ ☻ ♥ ♦ ♣ ♠ ♫ ☼ ► ◄ ↕ ‼ ¶ § ▬ ↨ ↑ ↓ → ∟ ↔ ▲ ▼ 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
When I do this in IDLE, this shows up:
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
Maybe this has to do with the version?
I'm using Python 3.9.1.
Oh and for some reason there are these rectangles in IDLE that get printed before the '!', but they can't be shown in the answer. (these rectangles represent the control codes: https://en.wikipedia.org/wiki/List_of_Unicode_characters#Control_codes)
It seems that my VSC prints the first characters from code page 437
https://en.wikipedia.org/wiki/Code_page_437
Duplicate of Emoji symbols/emoticons in Python IDLE:
As per @mata:
Tcl (and therefore tkinter and IDLE) supports only characters in the 16-bit range (U+0000-U+FFFF), so you can't.
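For what it's worth, you can confirm that the control characters really are in the string, regardless of how each terminal chooses to render them; a minimal check (Python 3):

```python
# The control bytes are present in the string either way; only their
# on-screen rendering differs between IDLE and the VS Code terminal.
s = "Hello ,\x00\x01\x02world!"
print(len(s))                          # 16 - the three control chars count
print([hex(ord(c)) for c in s[7:10]])  # the control characters themselves
```

So the difference is purely in how each terminal displays codes 0x00-0x1F, not in what Python produces.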

Trying to verify last position of a string

I'm trying to verify that the last char is not in my list:
def acabar_char(input):
    list_chars = "a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9 0".split()
    tam = 0
    tam = (len(input)-1)
    for char in input:
        if char[tam] in list_chars:
            return False
        else:
            return True
When I try this I get this error:
if char[tam] in list_chars:
IndexError: string index out of range
You can index from the end (of a string or a list) with negative numbers:
def acabar_char(input, list_chars):
    return input[-1] not in list_chars
It seems that you are trying to assert that the last element of an input string (or also list/tuple) is NOT in a subset of disallowed chars.
Currently, your loop never even gets to the second or later iterations because you use return inside the loop; so the last element of the input only gets checked if the input has a length of 1.
I suggest something like this instead (also using the string.ascii_letters definition):
import string

DISALLOWED_CHARS = string.ascii_letters + string.digits

def acabar_char(val, disallowed_chars=DISALLOWED_CHARS):
    if len(val) == 0:
        return False
    return val[-1] not in disallowed_chars
Does this work for you?
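A quick sanity check of that suggestion (repeating the function so the snippet is self-contained):

```python
import string

DISALLOWED_CHARS = string.ascii_letters + string.digits

def acabar_char(val, disallowed_chars=DISALLOWED_CHARS):
    # Empty input has no last character, so treat it as failing the check.
    if len(val) == 0:
        return False
    return val[-1] not in disallowed_chars

print(acabar_char("hello!"))  # True  - '!' is not a letter or digit
print(acabar_char("hello"))   # False - 'o' is disallowed
print(acabar_char(""))        # False - empty input
```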
You are already iterating through your string in that for loop, so there's no need to use indices. You could use a list comprehension as the other answer suggests, but I'm guessing you're trying to learn Python, so here is a way to rewrite your function:
list_chars = "a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9 0".split()

def acabar_char(input):
    for char in input:
        if char in list_chars:
            return False
    return True
list_chars = "a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9 0".split()

def acabar_char(input):
    if input[-1] not in list_chars:
        print('True')

How to create data frame from pandas series containing lists of different length

I've got a pandas series with the below structure:
> 0 [{k1:a,k2:b,k3:c},{k1:d,k2:e,k3:f}]
> 1 [{k1:g,k2:h,k3:i},{k1:j,k2:k,k3:l},{k1:ł,k2:m,k3:n}]
> 2 [{k1:o,k2:p,k3:r}]
> 3 [{k1:s,k2:t,k3:w},{k1:q,k2:z,k3:w},{k1:x,k2:y,k3:z},{k1:v,k2:f,k3:g}]
As you can see, this series contains lists of different length as elements. The elements in each list are dictionaries. I would like to create a data frame which will look like this:
> k1 k2 k3
> 0 a b c
> 1 d e f
> 2 g h i
> 3 j k l
> 4 ł m n
> 5 o p r
> 6 s t w
> 7 q z w
> 8 x y z
> 9 v f g
I have tried the below code:
for index_val, series_val in series.iteritems():
    for dict in series_val:
        for key, value in dict.items():
            actions['key'] = value
However, PyCharm stops and produces nothing. Is there another method to do that?
Use concat with apply and pd.DataFrame, i.e.
x = pd.Series([
    [{'k1':'a','k2':'b','k3':'c'},{'k1':'d','k2':'e','k3':'f'}],
    [{'k1':'g','k2':'h','k3':'i'},{'k1':'j','k2':'k','k3':'l'},{'k1':'ł','k2':'m','k3':'n'}],
    [{'k1':'o','k2':'p','k3':'r'}],
    [{'k1':'s','k2':'t','k3':'w'},{'k1':'q','k2':'z','k3':'w'},{'k1':'x','k2':'y','k3':'z'},{'k1':'v','k2':'f','k3':'g'}]
])
df = pd.concat(x.apply(pd.DataFrame).tolist(), ignore_index=True)
Output :
k1 k2 k3
0 a b c
1 d e f
2 g h i
3 j k l
4 ł m n
5 o p r
6 s t w
7 q z w
8 x y z
9 v f g
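Another approach worth considering: flatten the lists of dicts first and build the frame in one pass, instead of creating one small DataFrame per list. A sketch on a shortened version of the data:

```python
import pandas as pd

# A shortened version of the series from the question.
x = pd.Series([
    [{'k1': 'a', 'k2': 'b', 'k3': 'c'}, {'k1': 'd', 'k2': 'e', 'k3': 'f'}],
    [{'k1': 'o', 'k2': 'p', 'k3': 'r'}],
])

# Flatten the lists into a single sequence of dicts, then let
# pd.DataFrame build all rows at once.
flat = [d for sublist in x for d in sublist]
df = pd.DataFrame(flat)
print(df)
```

Constructing the frame once from a flat list of dicts avoids the per-row concat overhead.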

Search for a string and get the values two lines below it in the same column in Python

"detail" has below contents:
1 2 3 4
a b c
1 2 3 4 5 6 7 8 status 10
a b c d e f g h up x
a b c d e f g h Idle y
What I am trying to do: I need to get the values below the status string in the detail contents (up and Idle, or whatever it has below the status string in the next two lines, in the same column). In this case the detail contents have up and Idle.
I have tried the below method in code:
var1, var2 = islice(line, 2)
but I am not able to get the output from the two lines below in the detail contents.
Can anyone help me with the best method to achieve this?
Here is the code I tried:
from itertools import islice
import string

detail = """1 2 3 4 5 6 7 8 status 10
a b c d e f g h up x
a b c d e f g h idle y"""

print detail

for line in detail.split("\n"):
    line = ' '.join(line.split())
    line = line.split(" ")
    print line
    if len(line) >= 9:
        if line[8] == "status":
            var1, var2 = islice(line, 2)
            if any("idle" in s for s in var1.lower()) or any("never" in s for s in var1.lower()):
                print var1[8]
            else:
                print var1[8]
            if any("idle" in s for s in var2.lower()) or any("never" in s for s in var2.lower()):
                print var2[8]
            else:
                print var2[8]
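A working sketch of the idea in Python 3, assuming whitespace-separated columns and that the two lines of interest directly follow the line containing "status":

```python
detail = """1 2 3 4 5 6 7 8 status 10
a b c d e f g h up x
a b c d e f g h idle y"""

lines = [line.split() for line in detail.splitlines()]

values = []
for i, fields in enumerate(lines):
    if "status" in fields:
        # Remember which column "status" is in...
        col = fields.index("status")
        # ...and read that same column from the next two lines.
        for next_fields in lines[i + 1:i + 3]:
            if len(next_fields) > col:
                values.append(next_fields[col])

print(values)  # ['up', 'idle']
```

Splitting each line first and then indexing by the column position of "status" avoids hard-coding column 8.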

Optimizing pandas filter inside apply function

I have a list of pairs--stored in a DataFrame--each pair having an 'a' column and a 'b' column. For each pair I want to return the 'b's that have the same 'a'. For example, given the following set of pairs:
a b
0 c d
1 e f
2 c g
3 e h
4 i j
5 e k
I would like to end up with:
a b equivalents
0 c d [g]
1 e f [h, k]
2 c g [d]
3 e h [f, k]
4 i j []
5 e k [f, h]
I can do this with the following:
def equivalents(x):
    l = pairs[pairs["a"] == x["a"]]["b"].tolist()
    return [y for y in l if y != x["b"]]

pairs["equivalents"] = pairs.apply(equivalents, axis=1)
But it is painfully slow on larger sets (e.g. 1 million plus pairs). Any suggestions how I could do this faster?
I think this ought to be a bit faster. First, just add them up.
df['equiv'] = df.groupby('a')['b'].transform(sum)
a b equiv
0 c d dg
1 e f fhk
2 c g dg
3 e h fhk
4 i j j
5 e k fhk
Now convert to a list and remove whichever letter is already in column 'b'.
df.apply(lambda x: [y for y in list(x.equiv) if y != x.b], axis=1)
0 [g]
1 [h, k]
2 [d]
3 [f, k]
4 []
5 [f, h]
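Another option worth timing: build each group's full list once with groupby, then strip each row's own value, which avoids re-filtering the whole frame per row. A sketch (note: if 'b' values can repeat within a group, filtering by value also drops those duplicates):

```python
import pandas as pd

pairs = pd.DataFrame({
    'a': ['c', 'e', 'c', 'e', 'i', 'e'],
    'b': ['d', 'f', 'g', 'h', 'j', 'k'],
})

# Build each group's complete list of b's in one pass...
groups = pairs.groupby('a')['b'].agg(list)

# ...then drop the row's own b from its group's list.
pairs['equivalents'] = [
    [y for y in groups[a] if y != b]
    for a, b in zip(pairs['a'], pairs['b'])
]
print(pairs)
```

The groupby runs once over the frame instead of once per row, which should scale much better to the million-pair case.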
