Reading an excel data set saved as CSV file in pandas - python

There is a very similar question to the one I am about to ask posted here:
Reading an Excel file in python using pandas
Except that when I try the solutions posted there, I get:
AttributeError: 'DataFrame' object has no attribute 'read'
All I want to do is convert this Excel sheet into the pandas format so that I can perform data analysis on some of the subjects in my table. I am super new to this, so any information, advice, or feedback that anybody could toss my way would be greatly appreciated.
Here's my code:
import pandas
file = pandas.read_csv('FILENAME.csv', 'rb')
# reads specified file name from my computer in Pandas format
print file.read()
By the way, I also tried running the same query with
file = pandas.read_excel('FILENAME.csv', 'rb') returning the same error.
Finally, when I try to resave the file as a .xlsx I am unable to open the document.
Cheers!

read_csv() returns a DataFrame directly, so there is no need to convert it; just assign the result to a variable.
I think this should work:
import pandas as pd #It is best practice to import package with as a short name. Makes it easier to reference later.
file = pd.read_csv('FILENAME.csv')
print (file)

Your error message means exactly what it says: AttributeError: 'DataFrame' object has no attribute 'read'
When you use pandas.read_csv you're actually reading the csv file into a dataframe. BTW, you don't need the 'rb'
df = pandas.read_csv('FILENAME.csv')
You can print (df) but you can not do print(df.read()) because the dataframe object doesn't have a .read() attribute. This is what's causing your error.
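To make the point above concrete, here is a minimal sketch. It assumes a tiny made-up CSV held in memory (the csv_text string and its column names are illustrative, not the asker's data) so that it runs without FILENAME.csv on disk.

```python
import io

import pandas as pd

# Made-up CSV content standing in for FILENAME.csv (an assumption for this sketch).
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv already returns a DataFrame; no mode argument such as 'rb' is needed.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)
print(df)

# A DataFrame is not a file handle, so it has no .read attribute.
has_read = hasattr(df, "read")
print(has_read)
```

With a real file, passing the path string to pd.read_csv in place of the StringIO object should behave the same way.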

Related

Convert pkl file to json file

I'm new on stack-overflow.
I'm trying to convert pkl file into json file using python. Below is my sample code
import pickle
import pandas as pd
# Load pickle file
input_file = open('file.pkl', 'rb')
new_dict = pickle.load(input_file)
input_file()
# Create a Pandas DataFrame
data_frame = pd.DataFrame(new_dict)
# Copy DataFrame index as a column
data_frame['index'] = data_frame.index
# Move the new index column to the front of the DataFrame
index = data_frame['index']
data_frame.drop(labels=['index'], axis=1, inplace = True)
data_frame.insert(0, 'index', index)
# Convert to json values
json_data_frame = data_frame.to_json(orient='values', date_format='iso', date_unit='s')
with open('data.json', 'w') as js_file:
    js_file.write(json_data_frame)
When I run this code I get TypeError: '_io.TextIOWrapper' object is not callable. Answers to similar questions (this one and this one) suggested using the write method on input_file at line 7, but then I get io.UnsupportedOperation: write. That seems to be a method for writing, whereas I am opening the file for reading, and I can't find an equivalent method for reading.
I also tried to read pickle file in following way
with open('file.pkl', 'rb') as input_file:
    new_dict = pickle.load(input_file)
and I'm getting this error
DataFrame constructor not properly called!.
How can I solve this problem?
Suggestions for other tools that can perform this task would also be appreciated. Thanks
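One possible approach, as a hedged sketch and not a confirmed fix: the TypeError comes from calling the file object (input_file()), which a with block makes unnecessary because it closes the file itself. The "DataFrame constructor not properly called!" error usually means the unpickled object is not something pd.DataFrame accepts (for example a plain string), so it helps to inspect its type first. The sample dictionary and temporary paths below are assumptions standing in for the real file.pkl.

```python
import json
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical data standing in for whatever file.pkl really contains.
sample = {"city": ["Oslo", "Lima"], "temp_c": [4, 19]}

with tempfile.TemporaryDirectory() as workdir:
    pkl_path = os.path.join(workdir, "file.pkl")
    json_path = os.path.join(workdir, "data.json")

    # Create the pickle so the sketch is self-contained.
    with open(pkl_path, "wb") as f:
        pickle.dump(sample, f)

    # The with block closes the file; no input_file() call is needed.
    with open(pkl_path, "rb") as input_file:
        obj = pickle.load(input_file)

    # Inspect what was unpickled before building a DataFrame from it.
    print(type(obj))
    data_frame = obj if isinstance(obj, pd.DataFrame) else pd.DataFrame(obj)

    # reset_index() moves the index into a leading column,
    # named "index" when the index itself has no name.
    data_frame = data_frame.reset_index()

    with open(json_path, "w") as js_file:
        js_file.write(
            data_frame.to_json(orient="values", date_format="iso", date_unit="s")
        )

    # Read the JSON back to check what was written.
    with open(json_path) as js_file:
        rows = json.load(js_file)

print(rows)
```

If type(obj) turns out to be something other than a dict, list, or DataFrame, the conversion step would need to be adapted to that type.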

Trying to import an Excel CSV (?!) file with pandas

I am new to Python/pandas and I am trying to import the following file in Jupyter notebook via pd.read_
Initial file lines:
Both pd.read_excel and pd.read_csv returned an error.
Removing the first row allowed me to read the file, but the CSV data were not split into columns.
Could you share the line of code you have used so far to import the data?
Maybe try this one here:
data = pd.read_csv(filename, delimiter=',')
It is always easier for people to help you if you share the relevant code accompanied by the error you are getting.
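As an illustration of the symptoms described in the question, here is a minimal sketch. It assumes a made-up file with one extra title line and semicolon-separated values; the real file's layout is not shown in the post, so both the content and the choice of ";" are assumptions.

```python
import io

import pandas as pd

# Hypothetical file content: one non-data title line, then ';'-separated rows.
raw = "Exported report\nid;name;price\n1;apple;0.5\n2;pear;0.75\n"

# Skipping the title line but keeping the default comma separator
# leaves every row in a single column.
wrong = pd.read_csv(io.StringIO(raw), skiprows=1)
print(wrong.shape)

# Naming the actual separator splits the rows into proper columns.
df = pd.read_csv(io.StringIO(raw), skiprows=1, sep=";")
print(df)
```

Opening the file in a text editor is a quick way to see which separator it really uses.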

what is the correct way to read a csv file into a pandas dataframe?

I am doing a data analysis project and while importing the csv file into spyder I am facing this error. Please help me to debug this as I am new to programming.
#import library
>>>import pandas as pd
#read the data from csv as a pandas dataframe
>>>df = pd.read.csv('/Documents/Melbourne_housing_FULL.csv')
This is the error shown when I use the pd.read.csv command:
File "C:/Users/mylaptop/.spyder-py3/temp.py", line 4, in <module>
df = pd.read.csv('/Documents/Melbourne_housing_FULL.csv')
AttributeError: module 'pandas' has no attribute 'read'
You should use:
df = pd.read_csv('/Documents/Melbourne_housing_FULL.csv')
See the docs here.
You need to use pandas.read_csv() instead of pandas.read.csv(). The error is literally telling you that pandas has no attribute named read.
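A quick way to check this from an interactive session, as a small sketch that only inspects the pandas namespace (the readers list is just for illustration):

```python
import pandas as pd

# The module has no attribute called "read", which is what the traceback reports.
print(hasattr(pd, "read"))

# The reader functions are top-level names joined with an underscore.
readers = sorted(name for name in dir(pd) if name.startswith("read_"))
print(readers)
```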

pandas pd.read_excel() returning empty dictionary

I am a novice Python programmer and I am having an issue loading an xlsx workbook with the pd.read_excel() function. The pandas read_excel documentation says that specifying 'sheet_name = None' should return "All sheets as a dictionary of DataFrames", however I am getting an empty dictionary back:
template_workbook = pd.read_excel(template_path, sheet_name=None, index_col=None)
template_workbook
Returns:
OrderedDict()
When I try to print the worksheet names in the dictionary:
template_workbook.sheet_name
Returns:
AttributeError Traceback (most recent call last)
<ipython-input-67 e76a0b915981> in <module>()
----> 1 template_workbook.sheet_name
AttributeError: 'OrderedDict' object has no attribute 'sheet_name'
It is not clear to me why the worksheets are not being listed in the output dictionary. Any tips are greatly appreciated.
I have 26 tabs/sheets, and am trying to fill 23 using the tab names for indexing.
When you use read_excel with multiple sheets, pandas will return a dictionary:
Returns: DataFrame or Dict of DataFrames
If you have a dictionary, you can use the .keys() method to see the file tabs, as in:
print(template_workbook.keys())
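A minimal sketch of working with such a dictionary follows. To stay runnable without an Excel file, it builds a stand-in dict by hand; the sheet names and columns are made up, and in real use the dict would come from pd.read_excel(template_path, sheet_name=None).

```python
import pandas as pd

# Stand-in for the dict that read_excel returns with sheet_name=None
# (sheet names and contents here are hypothetical).
template_workbook = {
    "Summary": pd.DataFrame({"total": [10]}),
    "Details": pd.DataFrame({"item": ["a", "b"], "qty": [4, 6]}),
}

# Sheet names are the dictionary keys, not a .sheet_name attribute.
sheet_names = list(template_workbook.keys())
print(sheet_names)

# Each value is an ordinary DataFrame.
for name, frame in template_workbook.items():
    print(name, frame.shape)

details = template_workbook["Details"]
```

An empty result from keys() would confirm that no sheets were parsed at all, which points at the file and not at how the dictionary is accessed.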
I found this post through Google as I ran into this same problem. Unfortunately, no errors were thrown which is not very helpful, so I'm posting this answer to help the next person who might find this.
The read_excel function in Pandas doesn't exhaustively support ALL Excel functionality. This means that if you are using some advanced Excel functionality (such as named ranges), your data might not be parsed correctly when Pandas tries to read your Excel data.
I tried to simplify my Excel file as much as possible which still didn't work, so I created a new Excel Workbook and copied my data in sheet by sheet. This ended up working for me.
So my advice is to keep your Excel file as simple as possible and you'll probably be able to import it with Pandas. If you send over your exact Excel file I'm happy to help debug (I know this is coming years after the question though).

Converting JSON file to SQLITE or CSV

I'm attempting to convert a JSON file to an SQLite or CSV file so that I can manipulate the data with python. Here is where the data is housed: JSON File.
I found a few converters online, but those couldn't handle the quite large JSON file I was working with. I tried using a python module called sqlbiter but again, like the others, was never really able to output or convert the file.
I'm not sure where to go now. If anyone has any recommendations or insights on how to get this data into a database, I'd really appreciate it.
Thanks in advance!
EDIT: I'm not looking for anyone to do it for me, I just need to be pointed in the right direction. Are there other methods I haven't tried that I could learn?
You can utilize pandas module for this data processing task as follows:
First, you need to read the JSON file using with, open and json.load.
Second, you need to change the format of your file a bit by changing the large dictionary that has a main key for every airport into a list of dictionaries instead.
Third, you can now utilize some pandas magic to convert your list of dictionaries into a DataFrame using pd.DataFrame(data=list_of_dicts).
Finally, you can utilize pandas's to_csv function to write your DataFrame to disk as a CSV file.
It would look something like this:
import pandas as pd
import json
with open('./airports.json.txt','r') as f:
    j = json.load(f)
l = list(j.values())
df = pd.DataFrame(data=l)
df.to_csv('./airports.csv', index=False)
You need to load your JSON file and parse it so that all the fields are available, or load the contents into a dictionary. Then you could use pyodbc to write these fields to the database, or write them to a CSV file if you import csv first.
But this is just a general idea. You need to study Python and how to do every step.
For instance, for writing to the database you could do something like:
for i in range(0,max_len):
    sql_order = "UPDATE MYTABLE SET MYTABLE.MYFIELD ...."
    cursor1.execute(sql_order)
cursor1.commit()
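Since the question asks about SQLite specifically, here is a hedged sketch that uses Python's built-in sqlite3 module in place of pyodbc. The JSON string, table name, and column names are assumptions made for illustration; the real file's fields may differ.

```python
import json
import sqlite3

# Hypothetical stand-in for the JSON file: a dict keyed by airport code.
raw = (
    '{"AAA": {"name": "Anaa", "country": "PF"},'
    ' "AAB": {"name": "Arrabury", "country": "AU"}}'
)
airports = json.loads(raw)

# An in-memory database keeps the sketch self-contained;
# pass a file path to connect() to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airports (code TEXT PRIMARY KEY, name TEXT, country TEXT)"
)

# Flatten the dict of dicts into one tuple per row.
rows = [(code, info["name"], info["country"]) for code, info in airports.items()]

# Parameterized inserts avoid building SQL strings by hand.
conn.executemany(
    "INSERT INTO airports (code, name, country) VALUES (?, ?, ?)", rows
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM airports").fetchone()[0]
names = [r[0] for r in conn.execute("SELECT name FROM airports ORDER BY code")]
print(count, names)
```

For a large file, executemany with a single commit at the end tends to be much faster than committing after every row.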
