pandas string values are not getting proper format - python

Before reading into pandas, my data looks like this in the SAS dataset:
Name
Alfred
Alice
After reading into pandas, the data comes out as:
Name
b'Alfred'
b'Alice'
Why does the data come out differently? Steps followed:
import pandas as pd
df=pd.read_sas(r'C:/ProgramData/Anaconda3/Python_local/class.sas7bdat',format='sas7bdat')
Need your help.

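When no encoding is given, read_sas returns string columns as raw bytes, which is why the values show a b'...' prefix. Pass an encoding so the values are decoded to ordinary strings: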
df=pd.read_sas(r'C:/ProgramData/Anaconda3/Python_local/class.sas7bdat',format='sas7bdat', encoding='iso-8859-1')
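If the data has already been loaded without an encoding, the byte strings can also be decoded afterwards. This is only a minimal sketch: the small hand-built DataFrame stands in for the SAS data, the names are copied from the question, and the Age column is made up for illustration.

```python
import pandas as pd

# Stand-in for what read_sas returns when no encoding is given
# (the Age column is hypothetical, added only to show a non-bytes column).
df = pd.DataFrame({"Name": [b"Alfred", b"Alice"], "Age": [14, 13]})

# Decode only the columns that actually hold bytes objects.
for col in df.columns:
    if df[col].map(lambda v: isinstance(v, bytes)).any():
        df[col] = df[col].str.decode("iso-8859-1")

print(df["Name"].tolist())
```

Passing encoding= at read time is still the simpler route; the loop is only useful when re-reading the file is inconvenient.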


How do I split columns of a CSV file after importing it in a Jupyter notebook?

I have imported the CSV file into a Jupyter notebook, but I am unable to visualize it properly.
Use pandas library and read your data as a DataFrame:
import pandas as pd
dataframe = pd.read_csv(r'\filepath')  # raw string, so '\f' is not read as an escape sequence
Then you visualize your columns as:
dataframe.columns
or you can visualize a snapshot of your data like this:
dataframe.head()
or perhaps access a column value by referencing it like this: dataframe['column_name'].
Read a quick tutorial here: https://www.datacamp.com/community/blog/python-pandas-cheat-sheet
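One common reason for columns not being split is a delimiter other than a comma, although the question does not confirm that this is the cause here. A small sketch under that assumption, with made-up CSV text and io.StringIO standing in for a real file path:

```python
import io
import pandas as pd

# Hypothetical CSV content that uses ';' instead of ',' as the delimiter.
csv_text = "name;age\nAlfred;14\nAlice;13\n"

# If every row lands in a single column, the delimiter is probably wrong.
# Passing sep explicitly splits the columns at read time.
dataframe = pd.read_csv(io.StringIO(csv_text), sep=";")

print(dataframe.columns.tolist())
print(dataframe.head())
```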

pandas vs SAS dataset: values are not exactly the same

Before reading into pandas, the data is stored in a SAS dataset. My data looks like:
SNYDJCM--integer
740.19999981
After reading into pandas, my data changes as below:
SNYDJCM--converting to float
740.200000
How can I get the same value after reading into a pandas DataFrame?
Steps followed:
1)
import pandas as pd
2)
pd.read_sas(path,format='sas7bdat',encoding='iso-8859-1')
Need your help
Try importing SAS7BDAT and opening your file with it before reading:
from sas7bdat import SAS7BDAT
SAS7BDAT('FILENAME.sas7bdat')
df = pd.read_sas('FILENAME.sas7bdat',format='sas7bdat')
or use it to directly read the file:
from sas7bdat import SAS7BDAT
sas_file = SAS7BDAT('FILENAME.sas7bdat')
df = sas_file.to_data_frame()
or use pyreadstat to read the file:
import pyreadstat
df, meta = pyreadstat.read_sas7bdat('FILENAME.sas7bdat')
First, 740.19999981 is not an integer; 740 would be the nearest integer. Also, when you round 740.19999981 to 6 decimal places you get 740.200000, so the value may only look different because of how it is displayed. I would suggest printing it with higher precision to see whether it has really changed.
print("%.12f"%(x,))
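To make the point concrete, here is a short sketch that stores the value from the question in a pandas Series and formats it at two precisions. The Series is a stand-in for the column read from the SAS file.

```python
import pandas as pd

# Stand-in for the SNYDJCM column after read_sas.
s = pd.Series([740.19999981])
value = s.iloc[0]

six_digits = "%.6f" % value      # what a 6-decimal display shows
twelve_digits = "%.12f" % value  # higher precision reveals the stored digits

print(six_digits)
print(twelve_digits)
```

If the second line still shows the original digits, the stored value is unchanged and only the display is rounded. Calling pd.set_option("display.precision", 10) makes pandas print more decimals.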

What is the correct way to convert json data (which is undefined/messy) into a DataFrame?

I am trying to understand how JSON data that is not parsed/extracted correctly can be converted into a (pandas) DataFrame.
I am using Python (3.7.1) and have tried the usual way of reading the JSON data. The code works if I use transpose or the axis=1 syntax, but doing so ignores a large number of values or variables in the data. So although the code runs, I am sure it is not giving the desired results.
import pandas as pd
import numpy as np
import csv
import json
sourcefile = open(r"C:\Users\jadil\Downloads\chicago-red-light-and-speed-camera-data\socrata_metadata_red-light-camera-violations.json")
json_data = json.load(sourcefile)
#print(json_data)
type(json_data)
dict
## this code works but is not loading/reading complete data
df = pd.DataFrame.from_dict(json_data, orient="index")
df.head(15)
#This is what I am getting for the first 15 rows
                 0
createdAt        1407456580
description      This dataset reflects the daily volume of viol...
rights           [read]
flags            [default, restorable, restorePossibleForType]
id               spqx-js37
oid              24980316
owner            {'type': 'interactive', 'profileImageUrlLarge'...
newBackend       False
totalTimesRated  0
attributionLink  http://www.cityofchicago.org
hideFromCatalog  False
columns          [{'description': 'Intersection of the location...
displayType      table
indexUpdatedAt   1553164745
rowsUpdatedBy    n9j5-zh
As you have seen, pandas will attempt to create a DataFrame out of JSON data even if it is not parsed or extracted correctly. If your goal is to understand exactly what pandas does when presented with a messy JSON file, you can look inside the code for pd.DataFrame.from_dict() to learn more. If your goal is to get the JSON data to convert correctly to a pandas DataFrame, you will need to provide more information about the JSON data, ideally by including a sample of the data as text in your question. If your data is sufficiently complicated, you might try the json_normalize() function.
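As a rough illustration of json_normalize(), here is a sketch on a made-up dictionary shaped loosely like the output in the question. The field values, the "displayName" key, and the two entries under "columns" are assumptions for illustration, not the real file contents.

```python
import pandas as pd

# Made-up stand-in for the metadata file: top-level fields, a nested
# dict ("owner") and a list of records ("columns").
json_data = {
    "id": "spqx-js37",
    "displayType": "table",
    "owner": {"type": "interactive", "displayName": "example"},
    "columns": [
        {"name": "INTERSECTION", "dataTypeName": "text"},
        {"name": "VIOLATIONS", "dataTypeName": "number"},
    ],
}

# One row per entry of the "columns" list, with the dataset id repeated.
columns_df = pd.json_normalize(json_data, record_path="columns", meta=["id"])

# One row for the whole document; nested dicts become dotted column names.
top_df = pd.json_normalize(json_data, max_level=1)

print(columns_df)
print(top_df.columns.tolist())
```

Which of the two shapes is useful depends on what one row should represent, which is why a sample of the real data matters.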

Columns names issues using pandas.read_csv

I am pretty new to Python.
I am trying to import the SMSSpam Collection Data using the pandas read_csv function.
The import went well.
But as the file does not have a header, I tried to add column names (variable names "status" and "message") and ended up with an empty file.
Here is my code:
import numpy as np
import pandas as pd
file_loc=r"C:\Users\User\Documents\JP\SMSCollection.txt"  # raw string: "\U" is otherwise an invalid escape in Python 3
df=pd.read_csv(file_loc,sep='\t')
The above code works well; I got [5571 rows x 2 columns].
But when I add column names using the following line of code
df.columns=["status","message"]
I end up with an empty df.
Any help on this?
Thanks
You could try to set the column names at read time:
df=pd.read_csv(file_loc,sep='\t',header=None,names=["status","message"])
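To show what header=None changes, here is a small sketch with two made-up tab-separated lines read from memory instead of the real SMSCollection.txt file:

```python
import io
import pandas as pd

# Two made-up lines in the same layout: label, tab, message, no header row.
sample = "ham\tHello there\nspam\tWin a prize now\n"

# Default behaviour: the first data line is consumed as the header.
default_df = pd.read_csv(io.StringIO(sample), sep="\t")

# header=None keeps every line as data; names supplies the column labels.
named_df = pd.read_csv(
    io.StringIO(sample), sep="\t", header=None, names=["status", "message"]
)

print(default_df.shape, named_df.shape)
```

Without header=None the first message silently becomes the column labels, so the frame has one row fewer than the file has lines.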

Save an object from pandas_datareader into a csv file in Python 3.5

To load stock prices from Yahoo into Python 3.5, I use the pandas_datareader module:
from pandas_datareader import data as dreader
symbols = ['AAPL','MRK']
pnls = {i:dreader.DataReader(i,'yahoo','2010-01-01','2016-09-01') for i in symbols}
It creates two "tables" (I don't know the name, sorry), one for each share (here 'AAPL' and 'MRK'). I want to save each table into a CSV file but I don't know how. Does anyone know?
Thanks,
Anthony
Just do this:
from pandas_datareader import data as dreader
symbols = ['AAPL','MRK']
for i in symbols:
    # DataReader returns a DataFrame; write each one to '<symbol>.csv'
    dreader.DataReader(i,'yahoo','2010-01-01','2016-09-01').to_csv(i+'.csv')
This saves your data to two CSV files.
DataReader actually returns a pandas DataFrame, and any pandas DataFrame can be written to a CSV file with the to_csv method.
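If the tables are already held in a dictionary, as in the question, they can be written out without downloading them again. A minimal sketch, with made-up prices standing in for the DataReader results and a temporary directory as an assumed output location:

```python
import os
import tempfile
import pandas as pd

# Made-up price tables standing in for the DataReader results.
dates = pd.to_datetime(["2010-01-04", "2010-01-05"])
pnls = {
    "AAPL": pd.DataFrame({"Close": [1.0, 2.0]}, index=dates),
    "MRK": pd.DataFrame({"Close": [3.0, 4.0]}, index=dates),
}

out_dir = tempfile.mkdtemp()  # assumed location; any writable folder works
paths = {}
for symbol, frame in pnls.items():
    paths[symbol] = os.path.join(out_dir, symbol + ".csv")
    frame.to_csv(paths[symbol], index_label="Date")

# Read one file back to check the round trip.
restored = pd.read_csv(paths["AAPL"], index_col="Date", parse_dates=True)
print(restored)
```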
