How to label columns by reading in from a file in pandas - python

My real issue is that I can't seem to use the header names after I set them, but I think it's caused by labeling the headers incorrectly.
My code is as follows:
import pandas as pd
dataFrame1 = pd.read_csv('C:/Users/Desktop/data/data/featurenames.txt', header=None, encoding='utf-8')
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt')
dataFrame2.columns=[dataFrame1]
The result is the following. If I use print(dataFrame2), the headers appear in brackets for some reason.
But if I use print(dataFrame2['id']), I get KeyError: 'id'
Can anyone help me with this?

Look at dataFrame2.columns to see what the column labels actually are.
You could pass the names when reading the file instead:
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt',header=None,names=dataFrame1)
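As a hedged illustration: names expects a flat sequence of labels, so the sketch below pulls the names out of the first column of the names file before passing them. It assumes the names file holds one name per line and that the data file has no header row; the in-memory sample files and the variable feature_names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical stand-ins for featurenames.txt and DataSet.txt.
names_file = io.StringIO("id\nheight\nweight\n")
data_file = io.StringIO("1,170,65\n2,180,80\n")

# Assumption: one feature name per line, so the names sit in column 0.
feature_names = pd.read_csv(names_file, header=None)[0].tolist()

# Assumption: the data file has no header row of its own.
df = pd.read_csv(data_file, header=None, names=feature_names)

print(df.columns.tolist())
print(df["id"])
```

If the names were instead stored on a single comma-separated line, the list would have to be taken from the first row rather than the first column.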

Related

Pandas dataframe

I want to import an Excel file and keep only some of its columns.
This is my code:
df=pd.read_excel(file_location_PDD)
col=df[['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','shkzg','usname','sname','dmsol','dmhab']]
print(col)
col.to_excel("JETNEW.xlsx")
I selected all the columns I want, but two of them, 'usname' and 'sname', are not always present in the files I have to import.
Because of that, I get the error: ['usname','sname'] not in index
How can I handle this?
Thanks
Source -- https://stackoverflow.com/a/38463068/14515824
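You need to use df.reindex instead of df[[]]; reindex creates any missing column and fills it with NaN instead of raising an error. I also changed 'excel.xlsx' to r'excel.xlsx'; the r prefix makes it a raw string, so backslashes in a Windows path are not treated as escape sequences.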
An example:
df.reindex(columns=['a','b','c'])
Which in your code would be:
file_location_PDD = r'excel.xlsx'
df = pd.read_excel(file_location_PDD)
col = df.reindex(columns=['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','shkzg','usname','sname','dmsol','dmhab'])
print(col)
col.to_excel("output.xlsx")
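A minimal sketch of the difference, using a small in-memory frame instead of the Excel file (the sample values and the variable names wanted, with_nan and present are made up for illustration). It also shows a possible alternative, keeping only the columns that actually exist, in case NaN-filled placeholder columns are not wanted.

```python
import pandas as pd

# Hypothetical input that lacks the optional 'usname' column.
df = pd.DataFrame({"hkont": [1, 2], "dmbtr": [10.0, 20.0]})
wanted = ["hkont", "dmbtr", "usname"]

# reindex keeps the requested layout; missing columns are filled with NaN.
with_nan = df.reindex(columns=wanted)

# Alternative: select only the requested columns that are present.
present = df[[c for c in wanted if c in df.columns]]

print(with_nan)
print(present)
```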

Pandas .stack() issue

I have been trying to use pandas to do a simple stack and it seems I am missing something.
I have a csv file in this format
I thought I would use stack to get this
The number of columns and number of items will vary
df = pd.read_csv("z-textsource.csv")
data_stacked = df.stack()
data_stacked.to_csv("z-textsource_stacked.csv")
However, when I run the code I get this
Many thanks in advance!
The item column is not the index at the moment. Please try:
df = pd.read_csv("z-textsource.csv", index_col=0)
Then run the rest of your code unchanged.
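For illustration, a small sketch of what index_col=0 changes. The original file layout is not shown, so the sample CSV below (an item column followed by two value columns) is an assumption.

```python
import io

import pandas as pd

# Hypothetical sample; the real z-textsource.csv may look different.
csv_text = "item,col1,col2\nA,1,2\nB,3,4\n"

# Use the first column as the row index so stack() does not treat it as data.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# stack() moves the column labels into an inner index level,
# giving one row per (item, column) pair.
stacked = df.stack()
print(stacked)
```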

How to drop the first row number column pandas?

This question may sound similar to other questions posted, but I'm posting this after searching long for this exact solution.
So, I've a JSON from which I'm creating a pandas dataframe:
col_list = ["allocation","completion_date","has_expanded_access"]
final_data = dict((k,d[k]) for k in (col_list) if k in d)
a = json_normalize(final_data)
And then this:
I tried saving with:
df = df.reset_index(drop=True)
And
df = df.rename_axis(None)
As suggested in a few answers, but to no avail: when I save it, the default first column containing the row index still appears with a blank (null) header, and dropping it doesn't work either. Any help?
Try
df.to_csv('df_name.csv', sep = ';', encoding = 'cp1251', index = False)
to save df without the index.
Or use an existing column as the index with
df = df.set_index('col_name')
If you want to save the dataframe as a CSV file, you can do this:
df.to_csv(filename, index=False)
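A short sketch of the effect of index=False, writing to an in-memory buffer instead of a file so the output can be inspected directly. The sample values are made up.

```python
import io

import pandas as pd

# Hypothetical one-row frame using two of the column names from the question.
df = pd.DataFrame({"allocation": ["Randomized"], "completion_date": ["2020-01"]})

with_index = io.StringIO()
df.to_csv(with_index)  # default: the row index is written as an unnamed first column

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # the row index is left out

print(with_index.getvalue())
print(without_index.getvalue())
```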

How to store the string of the column in excel using python

I have attached a screenshot of my Excel sheet. I want to store the length of every string in the SUPPLIER_ID LENGTH column, but when I run my code, the CSV columns are blank.
When I use the same code on a different CSV, it works well.
I am using the following code but am not able to print the data.
I have attached a snippet of the CSV. Can somebody tell me why this is happening?
import pandas as pd
data = pd.read_csv(r'C:/Users/patesari/Desktop/python work/nba.csv')
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
data.dropna(inplace = True)
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
data
print(df)
data.to_csv("C:/Users/patesari/Desktop/python work/nba.csv")
I faced a similar problem in the past.
Instead of:
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
Type this:
data.columns=['SUPPLIER_ID','ACTION']
Also, I don't understand why you created the DataFrame df; it seems unnecessary to me.
Aren't you getting a SettingWithCopyWarning from pandas? I would imagine (I haven't run this code) that these lines
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
would not do anything, and should be replaced with
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data.loc[:, 'SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
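As a rough sketch of the length calculation itself, here is a small frame built directly in memory (so it is not a slice of another frame) with made-up integer IDs. It only illustrates astype(str) followed by .str.len() and does not reproduce the original CSV.

```python
import pandas as pd

# Hypothetical sample data; the real nba.csv contents are not shown.
data = pd.DataFrame({"SUPPLIER_ID": [101, 2345, 7], "ACTION": ["a", "b", "c"]})

# Convert to string first, then count the characters of each value.
data["SUPPLIER_ID LENGTH"] = data["SUPPLIER_ID"].astype(str).str.len()

print(data)
```

Note that float IDs would be rendered with a decimal part (for example 101.0), which changes the measured length.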

Select a column in dataframe from csv

I am trying to select the 'Name' column from a sample csv file named gradesM3.csv.
I have been following this tutorial but when it comes to selecting a single column, it doesn't work anymore.
My code:
import pandas as pd
df = pd.read_csv('gradesM3.csv')
df
The output:
Out[9]:
StudentID;Name;Assignment1;Assignment2;Assignment3
0 s123456;Michael Andersen;11;7;-3
1 s123789;Bettina Petersen;0;4;10
2 s123579;Marie Hansen;10;4;7
I believe there's already something wrong here as from what I've seen on other discussions, it's supposed to look more like a table.
When I try to display only the 'Name' column, with this command:
df['Name']
It returns:
KeyError: 'Name'
To sum up, I am trying to import my CSV file as a proper dataframe so I can work with it.
Thanks
SOLVED
Thanks to W-B's comment, it worked with this code:
df = pd.read_csv('gradesM3.csv',sep=';')
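To illustrate why this works, the sketch below reads the same rows shown in the question from an in-memory string. With the default comma separator the whole line ends up in a single column; with sep=';' each field gets its own column.

```python
import io

import pandas as pd

# The rows shown in the question, held in memory instead of gradesM3.csv.
csv_text = (
    "StudentID;Name;Assignment1;Assignment2;Assignment3\n"
    "s123456;Michael Andersen;11;7;-3\n"
    "s123789;Bettina Petersen;0;4;10\n"
    "s123579;Marie Hansen;10;4;7\n"
)

# Default separator (comma): everything lands in one column.
wrong = pd.read_csv(io.StringIO(csv_text))

# Semicolon separator: five proper columns.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

print(wrong.shape)
print(df["Name"])
```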
