I have been trying to use pandas to do a simple stack, and it seems I am missing something.
I have a CSV file in this format:
I thought I would use stack to get this:
The number of columns and the number of items will vary.
df = pd.read_csv("z-textsource.csv")
data_stacked = df.stack()
data_stacked.to_csv("z-textsource_stacked.csv")
However, when I run the code I get this
Many thanks in advance!
The item column is not being used as the index, so stack() treats it as ordinary data. Read the file with index_col=0:
df = pd.read_csv("z-textsource.csv", index_col=0)
Then run the rest of your code unchanged.
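As a rough illustration of the index_col=0 suggestion, here is a minimal sketch. The real z-textsource.csv is not shown in the question, so the column names and values below (item, col_a, col_b) are made up, and io.StringIO stands in for the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for z-textsource.csv; the real columns are not shown.
csv_text = "item,col_a,col_b\napple,1,2\npear,3,4\n"

# index_col=0 makes the first column (item) the row index,
# so stack() only stacks the remaining value columns.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
data_stacked = df.stack()

# The result is a Series indexed by (item, original column name).
print(data_stacked)
```

Without index_col=0 the item column would be stacked along with the values, under a default integer index.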
I am using drop() to try to remove a row from a dataframe, based on an index. Nothing seems to happen. I get lots of errors if I play with the syntax, but the below example yields the same dataframe I started with.
data= {"col1":[1, 3, 3,3],"col2":[4,5,6,4],"col3":[7,6,6,8]}
testdf2 = pd.DataFrame(data)
testdf2.drop([1])
testdf2
I assume I'm missing something obvious?
drop() returns a new DataFrame and leaves the original unchanged, so you must assign the result back to your df (or to a new variable).
import pandas as pd
data= {"col1":[1, 3, 3,3],"col2":[4,5,6,4],"col3":[7,6,6,8]}
testdf2 = pd.DataFrame(data)
testdf2 = testdf2.drop([1])
testdf2
Alternatively, pass inplace=True:
testdf2.drop([1], inplace=True)
However, inplace=True can cause confusion about views versus copies, so I usually reassign.
Hope that helps!
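To see why the original call appeared to do nothing, here is a small sketch using the same data. The variable name dropped is only for illustration.

```python
import pandas as pd

data = {"col1": [1, 3, 3, 3], "col2": [4, 5, 6, 4], "col3": [7, 6, 6, 8]}
testdf2 = pd.DataFrame(data)

# drop() builds a new DataFrame without the row labelled 1 ...
dropped = testdf2.drop([1])

# ... while the original still has all four rows.
print(len(testdf2))
print(list(dropped.index))
```

Note that drop([1]) removes the row whose index label is 1, which only coincides with the second row here because the index is the default 0, 1, 2, 3.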
I'm new to Python pandas and haven't quite found what I need, so I'm hoping for some help. I am trying to format a file that looks something like this:
UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP
and produce a result grouped by UserId, with a count of the number of domains for each user, as follows:
UserId,NumDomains,Domains
TestTraderCAD,2,ALL|CAD
TestTraderUSD,2,ALL|USD
TestTraderGBP,2,ALL|GBP
I've tried to get started by playing around with groupby, but I'm not having much luck with it.
import pandas as pd
df = pd.read_csv('User_Domains.csv')
#print (df)
df2 = df.groupby(['UserId'],['DomainId']).sum()
print (df2)
Any help to get started would be appreciated.
Use agg:
>>> df.groupby('UserId').agg({'UserId' : ['first', 'count'],
...                           'DomainId': '|'.join})
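If you want the output columns named exactly as in the question, one possible variant is named aggregation on the DomainId column. This is a sketch rather than the only way to do it: io.StringIO stands in for User_Domains.csv, and sort=False is my assumption that you want users in their original order.

```python
import io
import pandas as pd

# Stand-in for User_Domains.csv, using the sample rows from the question.
csv_text = """UserId,DomainId
TestTraderCAD,ALL
TestTraderCAD,CAD
TestTraderUSD,ALL
TestTraderUSD,USD
TestTraderGBP,ALL
TestTraderGBP,GBP
"""
df = pd.read_csv(io.StringIO(csv_text))

# Named aggregation: each keyword becomes an output column.
# sort=False keeps the users in order of first appearance.
result = (
    df.groupby("UserId", sort=False)["DomainId"]
      .agg(NumDomains="count", Domains="|".join)
      .reset_index()
)

print(result.to_csv(index=False))
```

From there, result.to_csv("some_output.csv", index=False) would write the file; the output filename is up to you.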
I have a dataframe similar to this one:
And I would like to create this dataframe:
I tried to implement this using df.melt() and df.transpose() but I did not succeed. Does anyone have any tips for that? I tried some solutions I found here but I guess this problem is slightly different from them.
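You can use pd.wide_to_long(). Call reset_index() before rename(), because month_num is an index level until then and rename(columns=...) only touches columns:
df = (pd.wide_to_long(df,
                      stubnames='month',
                      i=['id', 'Name', 'City'],
                      j='month_num',
                      sep='_')
        .reset_index()
        .rename(columns={'month': 'month_value', 'month_num': 'month'}))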
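Here is a small runnable sketch of the wide_to_long approach. The question's dataframe is not shown, so the sample frame below (two people, columns month_1 and month_2) is an assumption about its shape, and long_df is just an illustrative name.

```python
import pandas as pd

# Hypothetical wide frame; the real one from the question is not shown.
df = pd.DataFrame({
    "id": [1, 2],
    "Name": ["Ann", "Bob"],
    "City": ["Oslo", "Rome"],
    "month_1": [10, 30],
    "month_2": [20, 40],
})

# stubnames="month" with sep="_" picks up month_1, month_2, ...
# The numeric suffix lands in the new index level named by j.
long_df = (
    pd.wide_to_long(df, stubnames="month",
                    i=["id", "Name", "City"], j="month_num", sep="_")
      .reset_index()  # turn the index levels back into columns
      .rename(columns={"month": "month_value", "month_num": "month"})
)

print(long_df)
```

The columns listed in i must identify each row uniquely, otherwise wide_to_long raises an error.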
I have attached a screenshot of my Excel sheet. I want to store the length of every string in SUPPLIER_ID in a SUPPLIER_ID LENGTH column, but when I run my code the CSV columns are blank.
When I use the same code on a different CSV, it works well.
I am using the following code but am not able to print the data.
I have attached a snippet of the CSV. Can somebody tell me why this is happening?
import pandas as pd
data = pd.read_csv(r'C:/Users/patesari/Desktop/python work/nba.csv')
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
data.dropna(inplace = True)
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
data
print(df)
data.to_csv("C:/Users/patesari/Desktop/python work/nba.csv")
I faced a similar problem in the past.
Instead of:
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
Type this:
data.columns=['SUPPLIER_ID','ACTION']
Also, I don't understand why you created the DataFrame df. It was unnecessary in my opinion.
Aren't you getting a SettingWithCopyWarning from pandas? I would imagine (I haven't run this code) that these lines
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
would not do anything, and should be replaced with
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data.loc[:, 'SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
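One more thing that might be worth checking, though it is only a guess since the CSV itself is not shown: dropna() with no arguments drops every row that has a NaN in any column, so a file with one mostly-empty column can end up with no rows at all. The sketch below uses made-up data (the NOTES column is hypothetical) to show the effect and a narrower dropna(subset=...).

```python
import numpy as np
import pandas as pd

# Hypothetical data; NOTES is an invented column that is always empty.
data = pd.DataFrame({
    "SUPPLIER_ID": [12345, 678, 90],
    "ACTION": ["add", "remove", "add"],
    "NOTES": [np.nan, np.nan, np.nan],
})

# Plain dropna() removes every row here, because each row has a NaN in NOTES.
print(len(data.dropna()))

# Restricting dropna to the column that matters keeps the rows.
cleaned = data.dropna(subset=["SUPPLIER_ID"]).copy()
cleaned["SUPPLIER_ID LENGTH"] = cleaned["SUPPLIER_ID"].astype(str).str.len()
print(cleaned)
```

If SUPPLIER_ID is read as float, astype(str) will include the trailing ".0" in the length, so the dtype is worth checking too.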
My real issue is that I can't seem to use the header names after I assign them, but I think it's caused by labeling the headers wrong.
My code is as follows:
import pandas as pd
dataFrame1 = pd.read_csv('C:/Users/Desktop/data/data/featurenames.txt', header=None, encoding='utf-8')
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt')
dataFrame2.columns=[dataFrame1]
The result is the following:
If I use print (dataFrame2)
I get this result. The headers are in brackets for some reason.
But if I use print (dataFrame2['id'])
I get - KeyError: 'id'
Can anyone help me with this?
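Look at dataFrame2.columns to see what the column labels actually are; they will not be the plain strings you expect, which is why dataFrame2['id'] raises a KeyError.
You could pass the names when reading the file instead. Note that names expects a list of labels, so take the first column of dataFrame1 rather than passing the whole DataFrame:
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt', header=None, names=dataFrame1[0].tolist())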
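A minimal sketch of reading the names from one file and applying them to another. The contents of featurenames.txt and DataSet.txt are not shown, so the sample text below is invented, and I am assuming one name per line and a data file with no header row; io.StringIO stands in for both files.

```python
import io
import pandas as pd

# Hypothetical file contents: one feature name per line, and headerless data.
names_text = "id\nheight\nweight\n"
data_text = "1,170,65\n2,180,80\n"

# header=None gives a single column labelled 0 that holds the names.
dataFrame1 = pd.read_csv(io.StringIO(names_text), header=None, encoding="utf-8")
column_names = dataFrame1[0].tolist()  # plain list of strings

# header=None stops pandas from consuming the first data row as a header.
dataFrame2 = pd.read_csv(io.StringIO(data_text), header=None, names=column_names)

print(dataFrame2["id"])
```

If DataSet.txt does have its own header row, use header=0 together with names so that row is replaced rather than kept as data.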