pandas string values are not getting proper format


Before reading into pandas, my data looks like this in the SAS dataset:
Name
Alfred
Alice
After reading into pandas, the data looks like this:
Name
b'Alfred'
b'Alice'
Why is the data different? Steps followed:
import pandas as pd
df=pd.read_sas(r'C:/ProgramData/Anaconda3/Python_local/class.sas7bdat',format='sas7bdat')
Need your help.

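SAS files need to be imported with an explicit encoding. Without one, pd.read_sas leaves character columns as raw bytes, which is why the values print as b'Alfred':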
df=pd.read_sas(r'C:/ProgramData/Anaconda3/Python_local/class.sas7bdat',format='sas7bdat', encoding='iso-8859-1')
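If the file has already been loaded without an encoding, the bytes can also be decoded afterwards. This is a minimal sketch: the small DataFrame below is a hypothetical stand-in for what pd.read_sas returns, and 'iso-8859-1' is assumed to match the file's actual encoding.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_sas output without encoding:
# character columns hold raw bytes objects.
df = pd.DataFrame({"Name": [b"Alfred", b"Alice"], "Age": [14, 13]})

# Decode every object column whose values are all bytes.
for col in df.select_dtypes(include="object").columns:
    if df[col].map(lambda v: isinstance(v, bytes)).all():
        df[col] = df[col].str.decode("iso-8859-1")

print(df["Name"].tolist())  # ['Alfred', 'Alice']
```

Passing encoding= to read_sas is still simpler; decoding afterwards is only useful when re-reading the file is inconvenient.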

Related

How do I split the columns of a CSV file after importing it in Jupyter Notebook?

I have imported the CSV file into Jupyter Notebook, but I am unable to view it properly.

Use pandas library and read your data as a DataFrame:
import pandas as pd
dataframe = pd.read_csv('filepath')  # replace 'filepath' with the path to your CSV file
Then you visualize your columns as:
dataframe.columns
or you can visualize a snapshot of your data like this:
dataframe.head()
or access a single column by referencing its name: dataframe['column_name'].
Read a quick tutorial here: https://www.datacamp.com/community/blog/python-pandas-cheat-sheet
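If every row still appears as one long column, the file probably uses a delimiter other than a comma, which is what read_csv splits on by default. A minimal sketch, assuming a semicolon-delimited file (the sample data below is hypothetical):

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited sample standing in for the CSV file.
raw = "name;age;city\nAlfred;14;Boston\nAlice;13;Denver\n"

# Default sep=',' finds no commas, so everything lands in a single column.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.columns.tolist())  # ['name;age;city']

# Telling read_csv the real delimiter splits the data into proper columns.
right = pd.read_csv(io.StringIO(raw), sep=";")
print(right.columns.tolist())  # ['name', 'age', 'city']
```

With a real file, pass the path instead of io.StringIO(raw) and set sep to whatever character separates the fields.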

pandas vs. SAS dataset: values are not exactly the same

Before reading into pandas, the data is in a SAS dataset. My data looks like this:
SNYDJCM (integer)
740.19999981
After reading into pandas, the data changes as below:
SNYDJCM (converted to float)
740.200000
How can I get the same value after reading into a pandas DataFrame?
Steps followed:
1) import pandas as pd
2) df = pd.read_sas(path, format='sas7bdat', encoding='iso-8859-1')
Need your help.

Try the sas7bdat package. Merely opening the file with SAS7BDAT before calling pd.read_sas does not change how pandas parses it, so use the package to read the file directly:
from sas7bdat import SAS7BDAT
sas_file = SAS7BDAT('FILENAME.sas7bdat')
df = sas_file.to_data_frame()
or use pyreadstat to read the file:
import pyreadstat
df, meta = pyreadstat.read_sas7bdat('FILENAME.sas7bdat')

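First, 740.19999981 is not an integer; the nearest integer would be 740. Also, when you round 740.19999981 to 6 decimal places you get 740.200000, and pandas displays floats with 6 decimal places by default (the display.precision option), so the stored value may not have changed at all. I would suggest printing the value with higher precision to see whether it has really changed: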
print("%.12f"%(x,))
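A minimal sketch of that check, assuming the column is named SNYDJCM as in the question (the one-row DataFrame is a hypothetical stand-in for the file):

```python
import pandas as pd

# Hypothetical stand-in for the column read from the SAS file.
df = pd.DataFrame({"SNYDJCM": [740.19999981]})

# The stored float keeps its full value; only the display is rounded.
value = df.loc[0, "SNYDJCM"]
print("%.12f" % value)  # 740.199999810000

# Temporarily raise the display precision to see more digits in the DataFrame repr.
with pd.option_context("display.precision", 10):
    print(df)
```

If the 12-digit output matches the SAS value, nothing was lost when reading; only the default display rounded it.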

What is the correct way to convert JSON data (which is undefined/messy) into a DataFrame?

I am trying to understand how JSON data that is not parsed/extracted correctly can be converted into a pandas DataFrame.
I am using Python 3.7.1 and have tried the usual way of reading the JSON data. The code runs if I use transpose or axis=1, but that ignores a large number of values and variables in the data, so although the code runs, it is not giving the desired results.
import pandas as pd
import numpy as np
import csv
import json
# Load the metadata file into a Python dict (the with-block closes the file afterwards)
with open(r"C:\Users\jadil\Downloads\chicago-red-light-and-speed-camera-data\socrata_metadata_red-light-camera-violations.json") as sourcefile:
    json_data = json.load(sourcefile)
#print(json_data)
print(type(json_data))  # <class 'dict'>
## this code runs but does not load/read the complete data
df = pd.DataFrame.from_dict(json_data, orient="index")
# This is what I get for the first 15 rows:
df.head(15)
0
createdAt 1407456580
description This dataset reflects the daily volume of viol...
rights [read]
flags [default, restorable, restorePossibleForType]
id spqx-js37
oid 24980316
owner {'type': 'interactive', 'profileImageUrlLarge'...
newBackend False
totalTimesRated 0
attributionLink http://www.cityofchicago.org
hideFromCatalog False
columns [{'description': 'Intersection of the location...
displayType table
indexUpdatedAt 1553164745
rowsUpdatedBy n9j5-zh

As you have seen, pandas will attempt to create a DataFrame out of JSON data even if it is not parsed or extracted correctly. If your goal is to understand exactly what pandas does when presented with a messy JSON file, you can look inside the code for pd.DataFrame.from_dict() to learn more. If your goal is to get the JSON data to convert correctly to a pandas DataFrame, you will need to provide more information about the JSON data, ideally by including a sample of the data as text in your question. If your data is sufficiently complicated, you might try the json_normalize() function as described here.
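As a rough illustration of json_normalize(): nested values such as the "columns" list in the output above can be flattened into their own table. This is a sketch only; the dict below is a hypothetical, trimmed-down stand-in for the real metadata file, and pd.json_normalize is the top-level name in pandas 1.0 and later (older versions expose it as pandas.io.json.json_normalize).

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for the metadata dict in the question.
json_data = {
    "id": "spqx-js37",
    "displayType": "table",
    "owner": {"type": "interactive", "displayName": "City"},
    "columns": [
        {"name": "INTERSECTION", "dataTypeName": "text", "format": {"align": "left"}},
        {"name": "VIOLATIONS", "dataTypeName": "number", "format": {"precisionStyle": "standard"}},
    ],
}

# One row per element of the nested "columns" list;
# nested dicts become dotted column names such as "format.align".
columns_df = pd.json_normalize(json_data["columns"])
print(columns_df)

# Flatten the remaining top-level fields into a single row.
top = {k: v for k, v in json_data.items() if k != "columns"}
meta_df = pd.json_normalize(top)
print(meta_df.columns.tolist())
```

Keys missing from some records (here format.align and format.precisionStyle) come out as NaN in the rows that lack them.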

Column name issues using pandas.read_csv

I am pretty new to Python.
I am trying to import the SMS Spam Collection data using the pandas read_csv function.
The import went well.
But as the file does not have a header, I tried to set the column names (variable names "status" and "message") and ended up with an empty DataFrame.
Here is my code:
import numpy as np
import pandas as pd
file_loc = r"C:\Users\User\Documents\JP\SMSCollection.txt"  # raw string, so "\U" is not read as an escape
df = pd.read_csv(file_loc, sep='\t')
The above code works well; I get [5571 rows x 2 columns].
But when I add column names using the following line of code
df.columns = ["status", "message"]
I end up with an empty df.
Any help with this?
Thanks

You could set the column names at read time. Pass header=None so that pandas treats the first line as data instead of using it as the column names:
df=pd.read_csv(file_loc,sep='\t',header=None,names=["status","message"])
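A small sketch of the difference, using a hypothetical two-line, tab-separated sample in the same label/text layout as the question's file:

```python
import io
import pandas as pd

# Hypothetical two-line sample: label<TAB>message, no header row.
raw = "ham\tGo until jurong point\nspam\tFree entry in 2 a wkly comp\n"

# Default header=0: the first message is consumed as the column names.
df_default = pd.read_csv(io.StringIO(raw), sep="\t")
print(df_default.columns.tolist())  # ['ham', 'Go until jurong point']
print(len(df_default))              # 1

# header=None plus names: every line is data, under the names we choose.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=["status", "message"])
print(len(df))                      # 2
```

This is also why the original read reports one row fewer than the file contains: the first message was used as the header.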

Save an object from pandas_datareader into a csv file in Python 3.5

To load stock prices from Yahoo into Python 3.5, I use the pandas_datareader module:
from pandas_datareader import data as dreader
symbols = ['AAPL','MRK']
pnls = {i:dreader.DataReader(i,'yahoo','2010-01-01','2016-09-01') for i in symbols}
It creates two "tables" (I don't know the name, sorry), one for each share (here 'AAPL' and 'MRK'). I want to save each table into a CSV file, but I don't know how. Does anyone know?
Thanks,
Anthony

Just do this:
from pandas_datareader import data as dreader
symbols = ['AAPL','MRK']
for i in symbols:
    dreader.DataReader(i,'yahoo','2010-01-01','2016-09-01').to_csv(i+'.csv')
This saves your data to two CSV files, AAPL.csv and MRK.csv.
DataReader actually returns a pandas DataFrame, and you can write any DataFrame to a CSV file with its to_csv method.
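If the data is already in the pnls dict from the question, it can be saved without downloading it again by looping over the dict. A minimal sketch: the two small DataFrames and the temporary output folder are hypothetical stand-ins for the real DataReader results and your own directory.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for the question's pnls dict (symbol -> DataFrame).
dates = pd.to_datetime(["2016-08-31", "2016-09-01"])
pnls = {
    "AAPL": pd.DataFrame({"Close": [1.0, 2.0]}, index=dates),
    "MRK": pd.DataFrame({"Close": [3.0, 4.0]}, index=dates),
}

out_dir = tempfile.mkdtemp()  # hypothetical output folder; use any directory you like
for symbol, frame in pnls.items():
    # The date index is written as the first column of each file.
    frame.to_csv(os.path.join(out_dir, symbol + ".csv"))

print(sorted(os.listdir(out_dir)))  # ['AAPL.csv', 'MRK.csv']
```

To read a file back with its dates as the index, pd.read_csv(path, index_col=0, parse_dates=True) works.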
