import pandas as pd
xl = pd.ExcelFile('/Users/denniz/Desktop/WORKINGPAPER/FDIPOLITICS/python.xlsx')
dfs = pd.read_excel(xl, sheet_name=None,
                    dtype={'COUNTRY': str, 'YEAR': int, 'govtcon': float, 'trans': float},
                    na_values="Missing")
dfs.head()
After running the code above, I got the following error:
AttributeError: 'collections.OrderedDict' object has no attribute 'head'
Setting sheet_name=None will not give you a single DataFrame, and you can combine the ExcelFile and read_excel lines like this:
import pandas as pd
dfs = pd.read_excel('/Users/denniz/Desktop/WORKINGPAPER/FDIPOLITICS/python.xlsx',
                    sheet_name=0,
                    dtype={'COUNTRY': str, 'YEAR': int, 'govtcon': float, 'trans': float},
                    na_values="Missing")
dfs.head()
I have read the API reference for pandas.read_excel: it returns either a DataFrame or a dict of DataFrames.
Because you set sheet_name=None, you get all sheets back as a dict of DataFrames, keyed by sheet name.
So in your code snippet, dfs is a dict, not a DataFrame, and a dict has no head method. Your code should look like dfs[sheet_name].head().
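For example, to peek at every sheet in the returned dict, you can iterate over its items; a minimal sketch reusing the path from the question:
import pandas as pd

# sheet_name=None returns {sheet_name: DataFrame}
dfs = pd.read_excel('/Users/denniz/Desktop/WORKINGPAPER/FDIPOLITICS/python.xlsx',
                    sheet_name=None, na_values="Missing")

for name, frame in dfs.items():
    print(name)          # the sheet name (the dict key)
    print(frame.head())  # head() works on each individual DataFrame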
I want to convert a list of dictionaries to a pandas DataFrame, but I got ValueError: DataFrame constructor not properly called!
Below is an example and how I got the data:
import requests
import pandas as pd
# Send an HTTP GET request to the URL
response = requests.get(url)
# response.text is the raw body as a string; no JSON decoding happens here
scrapped_data = response.text
Content of response.text is:
[{"id":123456,"date":"12-12-2022","value":37},{"id":123456,"date":"13-12-2022","value":38}]
I want to convert it to a dataframe format like the following:
id      date        value
123456  12-12-2022  37
123456  13-12-2022  38
I tried the following methods:
df = pd.DataFrame(scrapped_data)
df = pd.DataFrame.from_dict(scrapped_data)
df = pd.DataFrame(scrapped_data, orient='columns')
All of them raised the same ValueError.
I also tried:
df = pd.json_normalize(scrapped_data)
but got NotImplementedError
scrapped_data is of type str.
Thanks for your help; let me know if you have any questions.
One reason for receiving this error from pandas is passing a str as the data. I think your data comes in as a str; if that is the case, then try this:
import json
import pandas as pd
original_data = '[{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}]'
scraped_data = json.loads(original_data)  # parse the JSON string into a list of dicts
df = pd.DataFrame(data=scraped_data)
df
As you said, scrapped_data is a string, so you first need to convert it into Python objects (with the loads method from the json library, for example).
Once scrapped_data = json.loads('[{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}]'),
you can just do df = pd.DataFrame(scrapped_data).
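Alternatively, since the data comes from requests, you can let the response object decode the JSON for you instead of parsing response.text by hand. A minimal sketch (the url below is a hypothetical placeholder):
import requests
import pandas as pd

url = "https://example.com/data.json"  # hypothetical endpoint
response = requests.get(url)
scrapped_data = response.json()   # decodes the JSON body into a list of dicts
df = pd.DataFrame(scrapped_data)  # one row per dict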
Python 3.9.5/Pandas 1.1.3
I use the following code to create a nested dictionary object from a csv file with headers:
import pandas as pd
import json
import os
csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)
csv_file['org'] = csv_file[['location', 'type']].apply(lambda s: s.to_dict(), axis=1)
This creates a nested object called org from the data in the columns called location and type.
Now let's say the type column doesn't even exist in the csv file, and I want to pass a literal string as the type value instead of values from a column. For example, I want to create a nested object called org using the values from the location column as before, but just use the string foo for all values of a key called type. How can I accomplish this?
You could just build it by hand:
csv_file['org'] = csv_file['location'].apply(
    lambda x: {'location': x, 'type': 'foo'})
Use ChainMap. This allows you to use multiple columns (columns_to_use), and even to override existing ones (if type is among those columns, it will be overridden):
from collections import ChainMap
# .. some code
csv_file['org'] = csv_file[columns_to_use].apply(
    lambda s: ChainMap({'type': 'foo'}, s.to_dict()), axis=1)
By the way, without adding constant values it could be done with df.to_dict('records'):
csv_file['org'] = csv_file[['location', 'type']].to_dict('records')
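Combining the two ideas: if you materialize the constant as a real column first, the plain to_dict('records') route works as well. A small self-contained sketch (the sample rows are hypothetical):
import pandas as pd

csv_file = pd.DataFrame({'location': ['NYC', 'LA']})  # stand-in for the real CSV
csv_file['type'] = 'foo'  # constant value for every row
csv_file['org'] = csv_file[['location', 'type']].to_dict('records')
print(csv_file['org'].tolist())
# [{'location': 'NYC', 'type': 'foo'}, {'location': 'LA', 'type': 'foo'}]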
I have a dictionary and a JSON file. I want to check whether the data exists in the dictionary and put the value pair into the JSON on the matched attribute.
import numpy as np
import pandas as pd

df = pd.read_csv("Iris2.csv", encoding='ISO-8859-1')
df.head()

# build a dict that maps the first csv column to the second
dict_from_csv = pd.read_csv('Iris2.csv', encoding='ISO-8859-1',
                            header=None, index_col=0, squeeze=True).to_dict()
print(dict_from_csv)
And then I read the JSON attribute:
json = pd.read_json(r'C:/Users/IT City/Downloads/data.json')
print(json)

df.venue_info = pd.DataFrame(json.venue_info.values.tolist())['venue_name']
print(df.venue_info)
Now I have the dictionary built from the csv file ("dict_from_csv") and the JSON attribute ("df.venue_info").
I first compared the JSON venue_name values with the dictionary and got the required results, so I finally have the "Lat" attribute. Now I want to add this new attribute to the JSON file wherever the "Lat" matches; otherwise it should place an empty attribute there.
for x in df.venue_info:
    if x in dict_from_csv:
        Lat = x + ":" + dict_from_csv[x]
        print(Lat)
    else:
        print("Not found")
Please help me in this regard. Thank you.
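One way to attach the matched value, sketched minimally: Series.map does the dictionary lookup for every venue at once, and fillna('') supplies the empty value for venues with no match (the sample data below is hypothetical):
import pandas as pd

# hypothetical stand-ins for the question's data
venue_info = pd.Series(['Cafe A', 'Cafe B', 'Cafe C'])
dict_from_csv = {'Cafe A': '31.5', 'Cafe C': '33.7'}

# map() looks each venue up in the dict; misses become NaN,
# and fillna('') turns those into the requested empty attribute
lat = venue_info.map(dict_from_csv).fillna('')
print(lat.tolist())  # ['31.5', '', '33.7']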
Using this page from the Pandas documentation, I wanted to read a CSV into a dataframe, and then turn that dataframe into a list of named tuples.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.itertuples.html?highlight=itertuples
I ran the code below...
import pandas as pd
def csv_to_tup_list(filename):
    df = pd.read_csv(filename, sep=',')
    df.columns = ["term", "code"]
    tup_list = []
    for row in df.itertuples(index=False, name="Synonym"):
        tup_list.append(row)
    return tup_list
test = csv_to_tup_list("test.csv")
type(test[0])
... and the type returned is pandas.core.frame.Synonym, not a named tuple. Is this how it is supposed to work, or am I doing something wrong?
My CSV data is just two columns of data:
a,1
b,2
c,3
for example.
"Named tuple" is not a type. namedtuple is a type factory. pandas.core.frame.Synonym is the type it created for this call, using the name you picked:
for row in df.itertuples(index=False, name="Synonym"):
#                                     ^^^^^^^^^^^^^^
This is expected behavior.
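To confirm that the rows really are namedtuples (just with a custom type name), you can check a few standard namedtuple properties on the question's test list:
row = test[0]
print(isinstance(row, tuple))  # True: namedtuples are tuple subclasses
print(row._fields)             # ('term', 'code'): the namedtuple field names
print(row.term, row.code)      # fields are accessible by attribute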
This is my first time posting here, and I am rather a newbie.
Anyhow, I have been playing around with Pandas and NumPy to make some calculations from Excel.
Now I want to create an .xlsx file to which I can output my results, and I want each sheet to be named after the dataframe being written to it.
This is my code; I tried a couple of different solutions but I can't figure out how to write it.
In the code you can see that save_excel just makes numbered sheets (and it works great), and save_excelB tries to do what I am describing, but I can't get it to work.
from generate import A, b, L, dr, dx
from pandas import DataFrame as df
from pandas import ExcelWriter as ew

A = df(A)  # turning numpy arrays into dataframes
b = df(b)
L = df(L)
dr = df(dr)
dx = df(dx)

C = [A, b, L, dr, dx]  # making a list of the dataframes to iterate through

def save_excel(filename, item):
    w = ew(filename)
    for n, i in enumerate(item):
        i.to_excel(w, "sheet%s" % n, index=False, header=False)
    w.save()

def save_excelB(filename, item):
    w = ew(filename)
    for name in item:
        i = globals()[name]
        i.to_excel(w, sheet_name=name, index=False, header=False)
    w.save()
I run both the same way: I call the function with the file name, and for item I pass the list C I have made.
So it would be:
save_excelB("file.xlsx", C)
and this is what I get
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
You need to pass the data frames' names as string literals to your function, not the actual DataFrame objects:
C = ['A', 'b', 'L', 'dr', 'dx']

def save_excelB(filename, item):
    w = ew(filename)
    for name in item:
        i = globals()[name]
        i.to_excel(w, sheet_name=name, index=False, header=False)
    w.save()
save_excelB("file.xlsx", C)
You can even build C dynamically from all dataframes currently in the global environment by checking which items are of the pandas DataFrame type:
import pandas as pd
...
C = [name for name in globals() if isinstance(globals()[name], pd.DataFrame)]
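As an alternative sketch that avoids globals() entirely, you can pass a dict mapping sheet names to DataFrames (the names and data below are hypothetical; note that recent pandas versions drop w.save() in favor of using ExcelWriter as a context manager):
import pandas as pd

frames = {  # hypothetical stand-ins for A, b, L, dr, dx
    "A": pd.DataFrame([[1, 2]]),
    "b": pd.DataFrame([[3, 4]]),
}

def save_excel_dict(filename, frames):
    # the context manager saves and closes the file on exit
    with pd.ExcelWriter(filename) as w:
        for name, frame in frames.items():
            frame.to_excel(w, sheet_name=name, index=False, header=False)

save_excel_dict("file.xlsx", frames)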