Column name issues using pandas.read_csv - python

I am pretty new to Python.
I am trying to import the SMSSpam Collection Data using the pandas read_csv function.
The import went well.
But as the file does not have a header, I tried to add column names (variable names "status" and "message") and ended up with an empty file.
Here is my code:
import numpy as np
import pandas as pd
file_loc=r"C:\Users\User\Documents\JP\SMSCollection.txt"  # raw string so the backslashes are not read as escapes
df=pd.read_csv(file_loc,sep='\t')
The above code works well; I got [5571 rows x 2 columns].
But when I add column names using the following line of code
df.columns=["status","message"]
I ended up with an empty df.
Any help on this?
Thanks

You could try to set the column names at read time:
df=pd.read_csv(file_loc,sep='\t',header=None,names=["status","message"])
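To see why this helps, here is a minimal sketch using a made-up two-line sample in place of the real file (the sample text is an assumption, not the actual SMS data). By default read_csv treats the first row as the header, so one row of data is lost; header=None keeps every row and names= supplies the labels.

```python
import io
import pandas as pd

# Hypothetical two-line sample standing in for the tab-separated SMS file.
sample = "ham\tGo until jurong point\nspam\tFree entry in 2 a wkly comp\n"

# Default behaviour: the first data row is consumed as the header.
df_default = pd.read_csv(io.StringIO(sample), sep="\t")

# header=None keeps every row as data; names= supplies the column labels.
df_named = pd.read_csv(
    io.StringIO(sample), sep="\t", header=None, names=["status", "message"]
)

print(df_default.shape)        # one row fewer than the file has lines
print(df_named.shape)          # every line kept as data
print(list(df_named.columns))  # the labels passed through names=
```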

Related

Problem with csv data imported on jupyter notebook

I'm new on this site, so be indulgent if I make a mistake :)
I recently imported a CSV file into my Jupyter notebook for a student project. I want to use some of the data from specific columns of this file. The problem is that after import, the file appears as a table with 5286 rows (which represent the dates and hours of the measurements) in a single column (which packs together all the variables, separated by ;, that I want to use for my work).
I don't know how to turn this into a regular table.
I used this code to import my CSV from my board:
import pandas as pd
data = pd.read_csv('/work/Weather_data/data 1998-2003.csv','error_bad_lines = false')
Desired output: the same data split into multiple columns on the ; separator.
You can try this:
import pandas as pd
data = pd.read_csv('<location>', sep=';')
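As a small illustration of the difference, the sketch below reads the same semicolon-separated text twice. The column names and values are invented for the example and are not taken from the weather file.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated sample; the real file's columns are unknown.
sample = (
    "date;temperature;humidity\n"
    "1998-01-01 00:00;3.5;81\n"
    "1998-01-01 01:00;3.1;83\n"
)

# With the default separator (a comma) each line ends up in one single column.
one_col = pd.read_csv(io.StringIO(sample))

# With sep=';' the fields are split into separate columns.
multi = pd.read_csv(io.StringIO(sample), sep=";")

print(one_col.shape)
print(multi.shape)
print(multi.dtypes)
```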

How to set a variable when using Pandas' read_csv

"test.csv" has columns "col_a", "col_b" and "col_c".
# import pandas
import pandas as pd
df = pd.read_csv('./data/test.csv',header=0,dtype={'col_a':object,'col_b':object,'col_c':object})
This code works well. But I would like to change it to use a variable "key_word" as follows, and that version does not work. Why? How should I modify this code?
# import pandas
import pandas as pd
key_word='col_a':object,'col_b':object,'col_c':object
df = pd.read_csv('./data/test.csv',header=0,dtype={key_word})
Make key_word a dictionary by initializing it like this:
key_word={'col_a':object,'col_b':object,'col_c':object}
Then pass it directly as dtype=key_word, without wrapping it in another pair of curly brackets. That should do the trick. Right now it cannot possibly work, because the assignment without curly brackets is a syntax error.
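A minimal sketch of the corrected version, using an in-memory sample instead of ./data/test.csv (the sample values are assumptions). Note that dtype={key_word} would try to build a set containing a dict, which Python rejects, so the variable is passed as is.

```python
import io
import pandas as pd

# Hypothetical contents standing in for ./data/test.csv.
sample = "col_a,col_b,col_c\n001,2,3.0\n010,5,6.5\n"

# The dtype mapping lives in an ordinary dict variable.
key_word = {"col_a": object, "col_b": object, "col_c": object}

# Pass the dict itself; do not wrap it in another pair of braces.
df = pd.read_csv(io.StringIO(sample), header=0, dtype=key_word)

# With object dtype the values are kept as the strings found in the file,
# so leading zeros in col_a survive.
print(df["col_a"].tolist())
```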

Extracting data from a large csv file:causes dtype warnings

I work for a company and I recently switched from using a spreadsheet package to Python. Since I am very new to Python, there are a lot of things that I have difficulty grasping. Using Python, I am trying to extract data from a large CSV file (37791 rows and 316 columns). Here is a piece of code I wrote:
Solution 1
import numpy as np
import pandas as pd
df=pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv',skiprows=1)
data=df.loc[:,['Steps','Parameter']]
This command generates a warning, namely DtypeWarning: Columns (0,1,2,3........81) have mixed types. Specify dtype option on import or set low_memory=False.
So I found a workaround.
Solution 2
import pandas as pd
import numpy as np
df=pd.read_csv('C:\\Users\\Maxwell\\Desktop\\Test.data.csv',skiprows=1,error_bad_lines=False, index_col=False, dtype='unicode')
data=df.loc[:,['Steps','Parameter']]
Two questions:
i) I was able to get around the warning, but now the columns that I want (Steps and Parameter) have been converted to objects (probably due to the dtype='unicode' option). How can I convert the Steps column to an integer type and the Parameter column to a float?
ii) Some people say that the dtype warning isn't really an error. But I found that when I use Solution 1 to read the CSV file, the Steps column contains some floats. The original CSV file doesn't have any floats in the Steps column. It looks as if some floats have been inserted by Python itself! Why does this happen?
(I am not able to upload the original CSV file, because my company doesn't allow it.)
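One possible approach to question (i), as a minimal sketch: convert the string columns after loading with pd.to_numeric. The values below are invented, since the original file is not available, and errors="coerce" is an assumption about how unparseable entries should be handled (they become missing values). On question (ii), one common cause, which may or may not apply to this file, is that pandas stores an integer column containing missing values as float by default.

```python
import pandas as pd

# Hypothetical values standing in for columns that were read as strings.
data = pd.DataFrame(
    {"Steps": ["1", "2", "3"], "Parameter": ["0.5", "1.25", "bad"]}
)

# Parse the strings as numbers; anything unparseable becomes NaN.
# The nullable "Int64" dtype keeps integers even if some values are missing.
data["Steps"] = pd.to_numeric(data["Steps"], errors="coerce").astype("Int64")
data["Parameter"] = pd.to_numeric(data["Parameter"], errors="coerce")

print(data.dtypes)
print(data)
```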

Searching a CSV file and then outputting a term associated with the search term

I am writing code that reads a CSV file, searches for a specific item code, and then outputs the name of the item.
How would I do this?
I don't have any code yet
Thanks
You can use pandas.
Install it, then try:
import pandas as pd
df = pd.read_csv('yourFile.csv')
print(df)
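Loading the file is only the first half; the lookup itself could look like the sketch below. The column names "code" and "name", the sample rows, and the helper item_name are all assumptions for illustration, since the question does not show the file layout.

```python
import io
import pandas as pd

# Hypothetical file contents; the real column names are not known.
sample = "code,name\nA100,Widget\nB200,Gadget\n"

# Read the codes as strings so values such as 0012 keep their leading zeros.
df = pd.read_csv(io.StringIO(sample), dtype={"code": str})


def item_name(frame, item_code):
    """Return the name for item_code, or None if the code is not present."""
    matches = frame.loc[frame["code"] == item_code, "name"]
    return matches.iloc[0] if not matches.empty else None


print(item_name(df, "B200"))
print(item_name(df, "Z999"))
```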

Read JSON file using pandas

I am trying to read a JSON file using the pandas read_json function. I am getting a result, but not the one I want.
My result has the header (titles) as its first row, and I want to ignore that first row.
Below is my Python code.
import json
import pandas as pd
result=pd.read_json('dummy_DB_clean.json')
print(result)
I tried the pandas json_normalize() function but did not get the desired output.
If any of you have come across this problem, please suggest a solution.
Thanks,
Try this:
import json
import pandas as pd
df=pd.read_json('dummy_DB_clean.json')
df.drop(df.head(1).index, inplace=True)
print(df)
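An equivalent way to skip the first row is positional slicing, sketched here on a made-up JSON string (the structure of dummy_DB_clean.json is not shown in the question, so the records below are an assumption).

```python
import io
import pandas as pd

# Hypothetical JSON records whose first entry holds titles rather than data.
sample = (
    '[{"a": "Title A", "b": "Title B"},'
    ' {"a": "1", "b": "2"},'
    ' {"a": "3", "b": "4"}]'
)

df = pd.read_json(io.StringIO(sample))

# Keep everything from the second row on and renumber the index from 0.
df = df.iloc[1:].reset_index(drop=True)

print(df)
```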
