How to export a dictionary to Excel using Pandas - Python

I am trying to export some data from Python to Excel using Pandas, without success. The data is a dictionary where each key is a tuple of 4 elements.
I am currently using the following code:
df = pd.DataFrame(data)
df.to_excel("*file location*", index=False)
and I get an exported 2-column table as follows:
I am trying to get an Excel table where the first 3 elements of the key are split into their own columns, and the 4th element of the key (Period in this case) becomes a column name, similar to the example below:
I have tried various additions to the above code, but I'm a bit new to this and nothing has worked so far.

Based on what you show us (which is not reproducible), you need pandas.MultiIndex:
df_ = df.set_index(0)  # `0` since your tuples seem to be located in the first column
df_.index = pd.MultiIndex.from_tuples(df_.index)  # convert the flat index of tuples into a MultiIndex
# `unstack` moves your periods from the index into columns
df_.unstack(level=-1).droplevel(0, axis=1).to_excel(
    "file location", index=True
)
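For illustration, here is a minimal, self-contained sketch of the same idea with made-up data (the tuple contents and level names below are assumptions, not taken from the question):

import pandas as pd

# Hypothetical dictionary: keys are 4-element tuples, the last element is the period
data = {
    ("North", "A", 2023, "P1"): 10,
    ("North", "A", 2023, "P2"): 12,
    ("South", "B", 2023, "P1"): 7,
    ("South", "B", 2023, "P2"): 9,
}

s = pd.Series(data)
s.index = pd.MultiIndex.from_tuples(s.index, names=["Region", "Product", "Year", "Period"])

# Move the last index level (Period) into the columns, then export
s.unstack(level=-1).to_excel("output.xlsx", index=True)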

You could try exporting to a CSV instead:
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', index=False)
which can then be converted to an Excel file easily.
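If you take that route, the conversion back to Excel is a one-liner; a sketch with placeholder file names:

import pandas as pd

# Placeholder paths: read the exported CSV and re-save it as .xlsx
pd.read_csv("exported_file.csv").to_excel("exported_file.xlsx", index=False)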

Related

Python - Look up data in another sheet and copy to a new file if not found

I am new to programming and I am trying to learn. I am comparing 2 documents that have very similar data. I want to find out if data from column "concatenate" is found in the same column "concatenate" of the other document, because I want to find out what changes were made during the last update of the file.
If the value cannot be found this whole row should be copied to a new document. Then I know that this dataset has been changed.
Here is the code I have:
import pandas as pd
# load the data from the two files into Pandas dataframes
df1 = pd.read_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/MergedKeepa_2023-02-05.xlsx')
df2 = pd.read_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/MergedKeepa_2023-02-04.xlsx')
# extract the values from column 'concatenate' in both dataframes
col_a_df1 = df1['concatenate']
col_a_df2 = df2['concatenate']
# find the intersection of the values in column 'concatenate' of both dataframes
intersection = col_a_df1.isin(col_a_df2)
# filter the rows of df1 where the value in 'concatenate' is not found in df2
df1 = df1[intersection]
# write the filtered data to a new Excel file
df1.to_excel('/Users/bjoern/Desktop/PythonProjects/Comparison/filtered_data.xlsx', index=False)
I just duplicated the two input files, which means I should get a blank document, but data is still being copied to the new sheet.
What did I do wrong?
Many thanks for your support!
If the value cannot be found, this whole row should be copied to a new document.
IIUC, you need ~, the NOT operator, to negate your boolean mask:
df1 = df1[~intersection]
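As a minimal sketch of the difference, with toy frames standing in for the two Excel files (the values are made up):

import pandas as pd

df1 = pd.DataFrame({"concatenate": ["a", "b", "c"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"concatenate": ["a", "c"]})

intersection = df1["concatenate"].isin(df2["concatenate"])  # [True, False, True]
print(df1[intersection])   # rows also present in df2 (unchanged)
print(df1[~intersection])  # rows missing from df2, i.e. the changed ones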

How to read and modify CSV files in a loop inside a function and save each as a separate DataFrame in Python Pandas?

I am trying to create a function in Python Pandas where I:
read 5 CSVs
make some aggregations on each CSV that was read (to keep it simple, we can just delete one column)
save each modified CSV as a DataFrame
Currently I have something like below; nevertheless, it returns only one DataFrame as output, not 5. How can I change the code below?
def xx():
    # 1. read 5 CSVs
    for el in [col for col in os.listdir("mypath") if col.endswith(".csv")]:
        df = pd.read_csv(f"path/{el}")
        # 2. make aggregations
        df = df.drop("COL1", axis=1)
        # 3. save each modified CSV as a separate DataFrame
        ?????
Finally I need to have 5 separate DataFrames after the modifications. How can I modify my function to achieve that in Python Pandas?
You can create an empty dictionary and populate it gradually with the five processed dataframes.
Try this:
import os
import pandas as pd

def xx():
    dico_dfs = {}
    for el in [file for file in os.listdir("mypath") if file.endswith(".csv")]:
        # 1. read each CSV
        df = pd.read_csv(f"path/{el}")
        # 2. make aggregations
        df = df.drop("COL1", axis=1)
        # 3. store each modified DataFrame in the dict, keyed by its filename
        dico_dfs[el] = df
    return dico_dfs
You can access each dataframe by using the filename as a key, e.g. dico_dfs["file1.csv"].
If needed, you can make a single dataframe by using pandas.concat: pd.concat(dico_dfs).
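A short usage sketch (the filename "file1.csv" is hypothetical):

dfs = xx()
df_first = dfs["file1.csv"]  # one of the five modified DataFrames
combined = pd.concat(dfs)    # optional: one frame, with filenames as an extra index level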

ValueError: AAA is not a valid column name

What I am trying to do: I read an xlsm Excel file to get a list of tickers (symbols) I inputted, use those to web-crawl the corresponding values, and then export to the same xlsm file but on a new sheet.
import openpyxl
import pandas as pd

money = openpyxl.load_workbook(r'file_path', read_only=False, keep_vba=True)
allocation = money['Allocation']
df1 = pd.DataFrame(allocation.values)
df1.columns = df1.iloc[6]  # set header
df1 = df1[7:]
df2 = df1.iloc[0:, 1].dropna()
tickers = df2.drop(df2.tail(1).index).tolist()
From the list of tickers, I use that info to web-crawl the values and create a dictionary closing_price:
closing_price = {}
for ticker in tickers:
    closing_price[ticker] = sector_price_1  # sector_price_1 comes from the web-crawling step
So far, things work fine. The problem is when I am trying to export the information to a new sheet created in the original workbook by:
price_data = money.create_sheet('price_data')
price_data.append(closing_price)
money.save(r'file_path')
For the second line of code, it says ValueError: AAPL is not a valid column name.
I tried adding a column header ("AAA") by transforming the dict to a dataframe first:
closing_price_df = pd.DataFrame(list(closing_price.items()), columns=['Ticker', 'Price'])
but append() doesn't accept a dataframe. So I transformed it back from a dataframe to a dict, which I thought should now have the new header added, and then it shows ValueError: Ticker is not a valid column name. What else can I do?
Thanks in advance.
worksheet.append() appends one row at a time into a sheet. I think you are trying to write the whole dataframe at once. Instead, try something like this:
# Just an example of closing_price dummy data
closing_price = [['AAPL', '22nd Mar', '1234'], ['AMZN', '22nd Mar', '1111']]
for row in closing_price:
    price_data.append(row)
The problem is here, where append handles dicts:
elif isinstance(iterable, dict):
    for col_idx, content in iterable.items():
        if isinstance(col_idx, str):
            col_idx = column_index_from_string(col_idx)
Since you have AAPL there, the column_index_from_string function expects to get letters or combinations of letters such as B, E, AA, AF, etc. and convert them to column numbers for you. Since your dictionary contains ticker-price pairs rather than column letters, this of course fails. This means you would have to alter the dictionary so that its keys represent column identifiers and its values represent lists of data to fill under those columns.
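If closing_price maps tickers to prices, a simpler route (a sketch, not the only option) is to append each item as its own two-cell row:

# Each dict item becomes one row: ticker in column A, price in column B
for ticker, price in closing_price.items():
    price_data.append([ticker, price])
money.save(r'file_path')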

How do I read only specific columns from a JSON dataframe?

I have a JSON file that I read into a dataframe with 12 columns; however, I only want to read columns 2 and 5, which are named "name" and "score".
Currently, the code I have is:
df = pd.read_json("path",orient='columns', lines=True)
print(df.head())
What that does is displays every column, as would be expected.
After reading through the documentation here:
https://pandas.pydata.org/docs/reference/api/pandas.read_json.html
I can't find any real way to parse only certain columns from JSON, compared to CSV where you can select columns using usecols=[].
Pass a list of columns for indexing:
df[["name", "score"]]
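Putting it together with the read_json call from the question (path and options unchanged):

import pandas as pd

df = pd.read_json("path", orient='columns', lines=True)
df = df[["name", "score"]]  # keep only the two columns of interest
print(df.head())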

Is there a way to edit columns in a CSV file with Python?

I'm trying to standardize data in a large CSV file. I want to replace the string "Greek" with a different string "Q35497", but only in a single column (I don't want to replace every instance of the word "Greek" with "Q35497" in every column, just in the column named "P407"). This is what I have so far:
data_frame = pd.read_csv('/data.csv')
data_frame["P407"] = data_frame['P407'].astype(str)
data_frame["P407"].str.replace('Greek', 'Q35497')
But what this does is just create a single column "P407" with a list of strings (such as 'Q35497'), and I can't append it back to the whole CSV table.
I tried using DataFrame.replace
data_frame = data_frame.replace(
    to_replace={"P407": {'Greek': 'Q35497'}},
    inplace=True
)
But this just produces an empty result. I also can't figure out why data_frame["P407"] creates a separate Series that cannot be added back to the original CSV file.
Your approach is correct, but you forgot to store the modified column back into the dataframe. (Note also that replace(..., inplace=True) returns None, which is why assigning its result wipes out your dataframe.)
data_frame = pd.read_csv('/data.csv')
data_frame["P407"] = data_frame["P407"].str.replace('Greek', 'Q35497')
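To persist the change, write the frame back out; a sketch reusing the path from the question:

data_frame.to_csv('/data.csv', index=False)  # overwrite the CSV with the edited column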