Select a column in dataframe from csv - python

I am trying to select the 'Name' column from a sample csv file named gradesM3.csv.
I have been following this tutorial but when it comes to selecting a single column, it doesn't work anymore.
My code:
import pandas as pd
df = pd.read_csv('gradesM3.csv')
df
The output:
Out[9]:
StudentID;Name;Assignment1;Assignment2;Assignment3
0 s123456;Michael Andersen;11;7;-3
1 s123789;Bettina Petersen;0;4;10
2 s123579;Marie Hansen;10;4;7
I believe there's already something wrong here, because from what I've seen in other discussions the output is supposed to look more like a table.
When I try to display only the 'Name' column, with this command:
df['Name']
It returns:
KeyError: 'Name'
To sum up, I am trying to import my CSV file as a proper dataframe so I can work with it.
Thanks

SOLVED
Thanks to W-B's comment, it worked with this code:
df = pd.read_csv('gradesM3.csv',sep=';')
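For reference, a minimal sketch of the full flow, assuming the file really is semicolon-delimited like the sample output above:
import pandas as pd

# the sample file is semicolon-delimited, so tell pandas which separator to use
df = pd.read_csv('gradesM3.csv', sep=';')

# each field is now its own column, so 'Name' can be selected directly
print(df['Name'])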

Related

How to drop the index after creating the csv file in pandas

I am trying to select a couple of columns based on the column heading (using a wildcard), plus one more column. When I execute the code below I get the expected result, but an index column appears in the output. How do I drop the index? Any suggestions?
infile:
dir,name,ct1,cn1,ct2,cn2
991,name1,em,a#email.com,ep,1234
999,name2,em,b#email.com,ep,12345
872,name3,em,c#email.com,ep,123456
here is the code which I used.
import pandas as pd
df=pd.read_csv('infile.csv')
df_new=df.loc[:,df.columns.str.startswith('c')]
df_new_1=pd.read_csv('name.csv', usecols= ['dir'])
df_merge=pd.concat([df_new,df_new_1],axis=1, join="inner")
df_merge.to_csv('outfile.csv')
Pass index=False when you save to csv:
df_merge.to_csv('outfile.csv', index=False)
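As a rough illustration of the difference, using a tiny throwaway frame with made-up values:
import pandas as pd

# small frame just to show what the index parameter changes
df = pd.DataFrame({'ct1': ['em', 'em'], 'cn1': ['a#email.com', 'b#email.com']})

df.to_csv('with_index.csv')                   # writes an extra unnamed index column (0, 1, ...)
df.to_csv('without_index.csv', index=False)   # writes only the data columns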

Pandas dataframe

I want to import an excel where I want to keep just some columns.
This is my code:
df=pd.read_excel(file_location_PDD)
col=df[['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','shkzg','usname','sname','dmsol','dmhab']]
print(col)
col.to_excel("JETNEW.xlsx")
I selected all the columns I want, but two of the column names, 'usname' and 'sname', don't always appear in the files I have to import.
Because of that I get the error ['usname', 'sname'] not in index.
How can I do this ?
Thanks
Source -- https://stackoverflow.com/a/38463068/14515824
You need to use df.reindex instead of df[[]]. I also changed 'excel.xlsx' to r'excel.xlsx' (a raw string) so that backslashes in the file path are not treated as escape sequences.
An example:
df.reindex(columns=['a','b','c'])
Which in your code would be:
file_location_PDD = r'excel.xlsx'
df = pd.read_excel(file_location_PDD)
col = df.reindex(columns=['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','shkzg','usname','sname','dmsol','dmhab'])
print(col)
col.to_excel("output.xlsx")

How to store the string of the column in excel using python

I have attached a screenshot of my Excel sheet. I want to store the length of every string from the SUPPLIER_ID column in the SUPPLIER_ID LENGTH column, but when I run my code the columns in the CSV come out blank.
When I use the same code on a different CSV, it works fine.
I am using the following code but I am not able to print the data.
I have attached a snippet of the csv. Can somebody tell me why this is happening:
import pandas as pd
data = pd.read_csv(r'C:/Users/patesari/Desktop/python work/nba.csv')
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
data.dropna(inplace = True)
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
data
print(df)
data.to_csv("C:/Users/patesari/Desktop/python work/nba.csv")
I faced a similar problem in the past.
Instead of:
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
Type this:
data.columns=['SUPPLIER_ID','ACTION']
Also, I don't understand why you created the DataFrame df. It seems unnecessary to me.
Aren't you getting a SettingWithCopyWarning from pandas? I would imagine (I haven't run this code) that these lines
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
would not do anything, and should be replaced with
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data.loc[:, 'SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
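For what it's worth, here is a self-contained sketch of the length computation with made-up values, working directly on a freshly constructed frame so there is no copy of a slice involved:
import pandas as pd

# made-up values standing in for nba.csv
data = pd.DataFrame({'SUPPLIER_ID': [12345.0, 678.0], 'ACTION': ['add', 'drop']})

data['SUPPLIER_ID'] = data['SUPPLIER_ID'].astype(str)          # make the IDs strings
data['SUPPLIER_ID LENGTH'] = data['SUPPLIER_ID'].str.len()     # measure each string
data['SUPPLIER_ID'] = data['SUPPLIER_ID'].astype(float)        # convert back to numbers
print(data)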

How to use 'loc' for column selection of a dataframe in dask

Can anyone tell me how I should select one column with 'loc' in a dataframe using dask?
As a side note, when I load the dataframe using dd.read_csv with header set to None, the column names run from 0 up to 131094. I am trying to select the last column, whose name is 131094, and I get the error below.
code:
import dask.dataframe as dd
df = dd.read_csv('filename.csv', header=None)
y = df.loc['131094']
error:
File "/usr/local/dask-2018-08-22/lib/python2.7/site-packages/dask-0.5.0-py2.7.egg/dask/dataframe/core.py", line 180, in _loc
"Can not use loc on DataFrame without known divisions")
ValueError: Can not use loc on DataFrame without known divisions
Based on this guideline, http://dask.pydata.org/en/latest/dataframe-indexing.html#positional-indexing, my code should work, but I don't know what is causing the problem.
If you have a named column, then use: df.loc[:,'col_name']
But if you have a positional column, like in your example where you want the last column, then use the answer by #user1717828.
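A minimal sketch of the named-column case, assuming the CSV has a header row and a column literally called 'col_name':
import dask.dataframe as dd

df = dd.read_csv('filename.csv')   # the header row supplies the column names
y = df.loc[:, 'col_name']          # select a single column by label
print(y.compute())                 # materialize the result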
I tried this on a dummy csv and it worked. I can't help you for sure without seeing the file that is giving you problems. That said, you might be picking rows, not columns.
Instead, try this.
import dask.dataframe as dd
df = dd.read_csv('filename.csv', header=None)
y = df[df.columns[-1]]

How to label columns by reading in from a file in pandas

My real issue is that I can't seem to use the header names after I assign them, but I think it's caused by labeling the headers wrong.
My code is as follows:
import pandas as pd
dataFrame1 = pd.read_csv('C:/Users/Desktop/data/data/featurenames.txt', header=None, encoding='utf-8')
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt')
dataFrame2.columns=[dataFrame1]
The result is the following:
If I use print (dataFrame2)
I get a result where the headers are wrapped in brackets for some reason.
But if I use print (dataFrame2['id'])
I get - KeyError: 'id'
Can anyone help me with this?
Look at dataFrame2.columns; there you will see what the column names actually are.
You could use
dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt',header=None,names=dataFrame1)
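Note that dataFrame1 is itself a DataFrame, so if featurenames.txt holds one header name per line (an assumption here), you probably want to pass that single column as a plain list:
import pandas as pd

# assumption: featurenames.txt holds one header name per line
dataFrame1 = pd.read_csv('C:/Users/Desktop/data/data/featurenames.txt', header=None, encoding='utf-8')

dataFrame2 = pd.read_csv('C:/Users/Desktop/data/data/DataSet.txt',
                         header=None,
                         names=dataFrame1[0].tolist())

print(dataFrame2['id'])   # the column can now be selected by its header name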
