Pandas .stack() issue

I have been trying to use pandas to do a simple stack and it seems I am missing something.
I have a CSV file in this format:
I thought I would use stack to get this:
The number of columns and the number of items will vary.
import pandas as pd
df = pd.read_csv("z-textsource.csv")
data_stacked = df.stack()
data_stacked.to_csv("z-textsource_stacked.csv")
However, when I run the code, I get this:
Many thanks in advance!

The item column is not the index yet, so stack() treats it as an ordinary column. Please try:
df = pd.read_csv("z-textsource.csv", index_col=0)
and then run the same code you used.
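A minimal, self-contained sketch of why index_col=0 matters. The real file was not shown, so the CSV text and the column names item, col1 and col2 below are hypothetical stand-ins; io.StringIO simply replaces the file on disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for z-textsource.csv: an item column plus value columns
csv_text = """item,col1,col2
apple,1,2
pear,3,4
"""

# Without index_col, 'item' is an ordinary column, so its values get stacked too
unindexed = pd.read_csv(io.StringIO(csv_text)).stack()
print(len(unindexed))  # 2 rows x 3 columns = 6 entries

# With index_col=0, 'item' becomes the row index and only col1/col2 are stacked;
# the result is a Series indexed by (item, column name)
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
stacked = df.stack()
print(stacked)
```

Each entry of stacked is then one (item, column, value) triple, which is what to_csv writes out as three columns.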

Related

The drop() method appears to do nothing to my dataframe, but no error is given

I am using drop() to try to remove a row from a dataframe, based on an index. Nothing seems to happen. I get lots of errors if I play with the syntax, but the below example yields the same dataframe I started with.
import pandas as pd
data = {"col1": [1, 3, 3, 3], "col2": [4, 5, 6, 4], "col3": [7, 6, 6, 8]}
testdf2 = pd.DataFrame(data)
testdf2.drop([1])
testdf2
I assume I'm missing something obvious?

drop() returns a new DataFrame and leaves the original untouched, so you must assign the result back to your df (or to a new variable).
import pandas as pd
data = {"col1": [1, 3, 3, 3], "col2": [4, 5, 6, 4], "col3": [7, 6, 6, 8]}
testdf2 = pd.DataFrame(data)
testdf2 = testdf2.drop([1])
testdf2
Alternatively, pass inplace=True:
testdf2.drop([1], inplace=True)
However, inplace=True can lead to complications around views versus copies, so I usually reassign.
Hope that helps!
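A short sketch that makes the "returns a new DataFrame" behaviour visible, using the same data as the question. Note that [1] is an index label, not a position: drop() removes rows by label along axis 0 by default.

```python
import pandas as pd

data = {"col1": [1, 3, 3, 3], "col2": [4, 5, 6, 4], "col3": [7, 6, 6, 8]}
testdf2 = pd.DataFrame(data)

# drop() does not modify testdf2; it returns a new DataFrame without row label 1
result = testdf2.drop([1])
print(len(testdf2), len(result))  # 4 3

# Reassigning the name is what makes the change stick
testdf2 = result
print(testdf2.index.tolist())  # [0, 2, 3]
```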

Merging and combining columns with duplicates with Pandas

I'm new to Python pandas and haven't quite found what I need, so I'm hoping for some help. I am trying to format a file that looks something like this:
UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP
and produce a result that groups by UserId and also counts the number of domains for each user, like this:
UserId,NumDomains,Domains
TestTraderCAD,2,ALL|CAD
TestTraderUSD,2,ALL|USD
TestTraderGBP,2,ALL|GBP
I've tried to get started by playing around with the groupby feature, but I'm not having much luck with it.
import pandas as pd
df = pd.read_csv('User_Domains.csv')
#print (df)
df2 = df.groupby(['UserId'],['DomainId']).sum()
print (df2)
Any help to get started would be appreciated.

Use agg:
>>> df.groupby('UserId').agg({'UserId': ['first', 'count'],
...                           'DomainId': '|'.join})
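If you want exactly the UserId,NumDomains,Domains layout from the question, one option (a sketch, not the answer's own code) is pandas named aggregation, which names the output columns directly. sort=False keeps users in the order they first appear in the file, and io.StringIO stands in for User_Domains.csv.

```python
import io
import pandas as pd

# Sample data from the question, standing in for User_Domains.csv
csv_text = """UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP
"""
df = pd.read_csv(io.StringIO(csv_text))

# Named aggregation: output column name = (input column, aggregation function)
out = (
    df.groupby("UserId", sort=False)
    .agg(NumDomains=("DomainId", "count"), Domains=("DomainId", "|".join))
    .reset_index()  # turn the UserId group keys back into a column
)
print(out.to_csv(index=False))
```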

How to create new rows based on columns while keeping the index constant?

I have a dataframe similar to this one:
And I would like to create this dataframe:
I tried to implement this using df.melt() and df.transpose() but I did not succeed. Does anyone have any tips for that? I tried some solutions I found here but I guess this problem is slightly different from them.

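You can use pd.wide_to_long(). Call reset_index() before rename(), because month_num is an index level (not a column) until the index is reset:
df = pd.wide_to_long(df,
                     stubnames='month',
                     i=['id', 'Name', 'City'],
                     j='month_num',
                     sep='_').reset_index().rename(columns={'month': 'month_value', 'month_num': 'month'})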
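A runnable sketch with made-up data, since the question's tables were images. It assumes the wide columns are named month_1, month_2, and so on; wide_to_long's default suffix pattern only matches numeric suffixes, so other naming would need the suffix argument. The names Ann, Bob, Oslo and Rome are hypothetical.

```python
import pandas as pd

# Hypothetical wide data: one row per person, one column per month
df = pd.DataFrame({
    "id": [1, 2],
    "Name": ["Ann", "Bob"],
    "City": ["Oslo", "Rome"],
    "month_1": [10, 20],
    "month_2": [11, 21],
})

long_df = (
    pd.wide_to_long(df, stubnames="month", i=["id", "Name", "City"],
                    j="month_num", sep="_")
    .reset_index()  # id/Name/City/month_num move from the index into columns
    .rename(columns={"month": "month_value", "month_num": "month"})
)
print(long_df)
```

Each person now has one row per month, with the id, Name and City values repeated on every row.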

How to store the string of the column in excel using python

I have attached a screenshot of my Excel sheet. I want to store the length of every string in the SUPPLIER_ID LENGTH column, but when I run my code, the CSV columns are blank.
When I use the same code on a different CSV, it works well.
I am using the following code but am not able to print the data.
I have attached a snippet of the CSV. Can somebody tell me why this is happening?
import pandas as pd
data = pd.read_csv(r'C:/Users/patesari/Desktop/python work/nba.csv')
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
data.dropna(inplace = True)
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
data
print(df)
data.to_csv("C:/Users/patesari/Desktop/python work/nba.csv")

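I faced a similar problem in the past.
Instead of:
df = pd.DataFrame(data, columns=['SUPPLIER_ID', 'ACTION'])
use this:
data.columns = ['SUPPLIER_ID', 'ACTION']
Note that this renames the existing columns rather than selecting them, so it only works if the file has exactly these two columns. Also, I don't understand why you created the DataFrame df; it seems unnecessary to me.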

Aren't you getting a SettingWithCopyWarning from pandas? I would imagine (I haven't run this code) that these lines
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
would not do anything, and should be replaced with
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data.loc[:, 'SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
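One more thing worth checking, assuming the sheet contains a column with no values at all (for example an empty SUPPLIER_ID LENGTH column waiting to be filled): dropna() with its default how="any" drops every row that has at least one NaN, so a single fully empty column removes every row and the saved CSV comes out blank. The sketch below uses hypothetical data (the NOTES column and the IDs are made up) and restricts dropna() to the column that matters.

```python
import io
import pandas as pd

# Hypothetical stand-in for nba.csv: NOTES has a header but no values
csv_text = """SUPPLIER_ID,ACTION,NOTES
A100,add,
B2,remove,
"""
data = pd.read_csv(io.StringIO(csv_text))

# Default dropna() removes any row containing a NaN, i.e. every row here
print(len(data.dropna()))  # 0

# Only drop rows whose SUPPLIER_ID is missing, then store the string lengths
clean = data.dropna(subset=["SUPPLIER_ID"]).copy()
clean["SUPPLIER_ID LENGTH"] = clean["SUPPLIER_ID"].astype(str).str.len()
print(clean)
```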

How to label columns by reading in from a file in pandas

My real issue is that I can't seem to use the header names after I assign them, but I think it's caused by labeling the headers wrongly.
My code is as follows:
import pandas as pd
dataFrame1 = pd.read_csv('C:/Users/Desktop/data/data/featurenames.txt', header=None, encoding='utf-8')
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt')
dataFrame2.columns=[dataFrame1]
If I use print(dataFrame2), I get this result, where the headers are in brackets for some reason.
But if I use print(dataFrame2['id']), I get KeyError: 'id'.
Can anyone help me with this?

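Look at dataFrame2.columns; you will find the column names there.
You could use the following, assuming featurenames.txt lists one name per line so the names end up in column 0 of dataFrame1:
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt', header=None, names=dataFrame1[0].tolist())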
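A self-contained sketch of the idea: read the feature names, turn them into a plain list of strings, and pass that list as names. Passing the DataFrame itself (or wrapping it in a list, as in dataFrame2.columns=[dataFrame1]) does not give pandas a flat list of labels. The file contents below are hypothetical, and the sketch assumes one name per line and a data file without a header row.

```python
import io
import pandas as pd

# Hypothetical file contents: one feature name per line, data with no header row
feature_names_txt = "id\nage\nscore\n"
dataset_txt = "1,34,0.5\n2,27,0.9\n"

names_df = pd.read_csv(io.StringIO(feature_names_txt), header=None)
names = names_df[0].tolist()  # ['id', 'age', 'score'] as a plain list of strings

data = pd.read_csv(io.StringIO(dataset_txt), header=None, names=names)
print(data["id"])  # label-based access now works

# Equivalent fix after the fact: assign a flat list, not [names_df]
data.columns = names
```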
