ValueError: AAA is not a valid column name - python

What I am trying to do is read an xlsm Excel file to get a list of tickers (symbols) I inputted, use those to web crawl the corresponding values, and then export to the same xlsm file but a new sheet.
import openpyxl
import pandas as pd

money = openpyxl.load_workbook(r'file_path', read_only=False, keep_vba=True)
allocation = money['Allocation']
df1 = pd.DataFrame(allocation.values)
df1.columns = df1.iloc[6]  # set header
df1 = df1[7:]
df2 = df1.iloc[0:, 1].dropna()
tickers = df2.drop(df2.tail(1).index).tolist()
From the list of tickers, I web crawl the value for each one and build a dictionary "closing_price":
closing_price = {}
for ticker in tickers:
    closing_price[ticker] = sector_price_1
So far, things work fine. The problem comes when I try to export the information to a new sheet created in the original workbook:
price_data = money.create_sheet('price_data')
price_data.append(closing_price)
money.save(r'file_path')
For the second line of code, it says ValueError: AAPL is not a valid column name.
I tried adding a column header ("AAA") by transforming the dict to a dataframe first:
closing_price_df = pd.DataFrame(list(closing_price.items()), columns=['Ticker', 'Price'])
but append() doesn't accept a dataframe. So I transformed the dataframe back into a dict, which I thought should have the new header added already, and then it shows ValueError: Ticker is not a valid column name. What else can I do?
Thanks in advance.

worksheet.append() appends one row at a time to a sheet. I think you are trying to write the whole dataframe at once. Instead, try something like this...
# Just an example of closing_price dummy data
closing_price = [['AAPL', '22nd Mar', '1234'], ['AMZN', '22nd Mar', '1111']]
for row in closing_price:
    price_data.append(row)

The problem is here, in the branch where append() handles dicts:
elif isinstance(iterable, dict):
    for col_idx, content in iterable.items():
        if isinstance(col_idx, str):
            col_idx = column_index_from_string(col_idx)
Since you have AAPL there, column_index_from_string fails: that function expects a letter or combination of letters such as B, E, AA, AF etc. and converts them to column numbers for you. Your dictionary contains ticker-price pairs instead of column letters, so this of course fails. You would have to alter the dictionary so that its keys are column identifiers and its values are lists of data to fill under those columns.
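Alternatively, keep the dict as-is and write it out one row per ticker, so openpyxl never tries to interpret the ticker as a column letter. A minimal, self-contained sketch (the workbook and prices here are dummies):

```python
import openpyxl

# Dummy data standing in for the crawled closing prices
closing_price = {"AAPL": 160.25, "AMZN": 102.30}

wb = openpyxl.Workbook()
price_data = wb.create_sheet("price_data")

price_data.append(["Ticker", "Price"])       # header row
for ticker, price in closing_price.items():  # one list per row
    price_data.append([ticker, price])
```

Each append() call receives a plain list, which openpyxl writes left to right starting at column A.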

Related

Python: How can I write a text file that pulls data from an excel file. Format as "Column header: Cell Value"

I am trying to automate an output for an excel file so it is more easily readable. I have an excel file with columns A through BY. My plan is to read each row with a date that matches user input. Then read each cell in the row and if the cell is not blank, output the cell value. After this, the data will be written to a text file in the format:
Name
Column header C: cell Value
Column header K: cell value
Column header Z: Cell Value
Name 2
Column header C: cell value
etc.
“Name” is one of the columns. So I think the idea would be to make a dictionary of names with each name having a nested dictionary of each column header and the value.
I am using openpyxl to complete this task. I am able to properly loop through all of the rows and cells that match my criteria. However I am having trouble converting this to a text file in the format that I am interested in using. Can anyone provide some methods of printing the cell values and the accompanying column header using a for loop for all the columns? As well as in the desired format? Happy to provide any more information that is needed. Thanks to anyone that can help! This is also my first post so let me know how else I can help.
EDIT: I added some of the code I used to loop through the columns and rows. "work date" is a user input value that chooses only rows with the correct date value we want to work with.
# Go through each row of the given date and print cells with a value in them
cards = {}
activity_list = []
for row in active_sheet.iter_rows(max_row=22):
    if row[0].value == work_date:
        cards[row[1].value] = ['Start']
        for cell in row:
            if cell.value:
                activity_list.append(cell.value)
        for key, value in cards.items():
            cards[key].append(activity_list)
    else:
        continue
Here we select only rows with the correct date and keep only the relevant, non-empty cells. Then they are printed and formatted to a txt file.
with open(timecardtxt, 'w') as file_object:
    for key, values in cards.items():
        file_object.write("Name: " + key)
        for item in value[1]:
            print(item, file=file_object)
Your code contains a number of errors that I don't think need too much discussion here, but it's clear it won't run as presented, or do what you say you need.
However, I think this is really all you need:
import pandas as pd
df = pd.read_excel('data.xlsx')
df.to_csv('data.csv')
Note that you can install pandas with pip install pandas and that pandas requires openpyxl to be installed for read_excel to work.
This simple example doesn't do any filtering or anything, but that's easy using the dataframe df. And empty cells are just represented as nothing in a .csv anyway.
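If you do want the exact "Column header: cell value" layout in a text file, here is a minimal sketch with openpyxl, assuming the headers sit in row 1 and "Name" is the second column (the workbook contents below are dummies, not your real file):

```python
import openpyxl

# Dummy workbook standing in for the real file; headers in row 1, "Name" in column B
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Date", "Name", "Task C", "Task K"])
ws.append(["2021-03-22", "Alice", "review", None])

headers = [cell.value for cell in ws[1]]  # row 1 holds the column headers

lines = []
for row in ws.iter_rows(min_row=2):
    lines.append(row[1].value)  # the "Name" column comes first
    for header, cell in zip(headers, row):
        if cell.value is not None and header != "Name":
            lines.append(f"{header}: {cell.value}")

print("\n".join(lines))
```

Writing `lines` with `"\n".join(...)` to an open file gives the "Name, then header: value per line" format from the question; add your `work_date` filter inside the loop as needed.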

How to export a dictionary to excel using Pandas

I am trying to export some data from python to excel using Pandas, and not succeeding. The data is a dictionary, where the keys are a tuple of 4 elements.
I am currently using the following code:
df = pd.DataFrame(data)
df.to_excel("*file location*", index=False)
and I get an exported 2-column table as follows:
I am trying to get an excel table where the first 3 elements of the key are split into their own columns, and the 4th element of the key (Period in this case) becomes a column name, similar to the example below:
I have tried various additions to the above code, but I'm a bit new to this and nothing has worked so far.
Based on what you show us (which is not reproducible), you need pandas.MultiIndex:
df_ = df.set_index(0)  # `0` since your tuples seem to be located at the first column
df_.index = pd.MultiIndex.from_tuples(df_.index)  # convert the simple index into an N-dimensional index
# `~.unstack` does the job of locating your periods as columns
df_.unstack(level=-1).droplevel(0, axis=1).to_excel(
    "file location", index=True
)
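To make that concrete, here is a runnable sketch with dummy data standing in for the tuple-keyed dictionary (the names and periods are made up):

```python
import pandas as pd

# Hypothetical dict whose keys are 4-element tuples, the last element being the period
data = {
    ("North", "Widget", 2023, "Q1"): 10,
    ("North", "Widget", 2023, "Q2"): 12,
    ("South", "Widget", 2023, "Q1"): 7,
    ("South", "Widget", 2023, "Q2"): 9,
}

# One row per key: the tuple lands in column 0, the value in a second column
df = pd.DataFrame(list(data.items()), columns=[0, "value"])

df_ = df.set_index(0)
df_.index = pd.MultiIndex.from_tuples(df_.index)

# Move the last key element (the period) into the columns
wide = df_.unstack(level=-1).droplevel(0, axis=1)
print(wide)
```

The first three tuple elements remain as index levels (their own columns once exported with index=True), and each period becomes its own column, as in the desired table.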
You could also try exporting to a CSV instead:
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', index=False)
which can then be converted to an Excel file easily.

how to only read rows with the first column containing name i

I have two excel worksheets I am reading in Python. The first worksheet has a list of companies names. The second is a sheet with multiple of the same companies' names and data to the right that corresponds to the row.
[Screenshot: Worksheet 1]
[Screenshot: Worksheet 2]
I want to make some kind of condition: if the name in column A of WS 2 matches a name in WS 1, then print the data (columns A:F of WS 2) only for the rows corresponding to that name.
I am pretty new to coding, so I've been playing with it a lot without finding much luck. Right now I don't have much code because I tried restarting again. Been trying to use just pandas to read, sometimes I've been trying openpyxl.
import pandas as pd
import xlsxwriter as xlw
import openpyxl as xl

TickList = pd.read_excel("C:\\Users\\Ashley\\Worksheet1.xlsx", sheet_name='Tickers', header=None)
stocks = TickList.values.ravel()
Data = pd.read_excel("C:\\Users\\Ashley\\Worksheet2.xlsx", sheet_name='Pipeline', header=None, usecols="A:F")
data = Data.values.ravel()
for i in stocks:
    for t in data:
        if i == t:
            print(data)
I would imagine that the first thing you are doing wrong is not stipulating which part of "t" the "i" in stocks is meant to match. Remember: "t" is a whole row of values, all of them. You have to specify that you wish to match "i" against (probably) the first column of "t". What you appear to be doing here is akin to a vlookup without the target range properly specified.
Note that ravel() flattens the values into a single 1-D sequence, so each "t" would be a lone cell value and t[0] would fail. Keep the worksheet values two-dimensional (skip the ravel() on the data side), and something like this would be more likely to work:
for i in stocks:
    for t in Data.values:
        if i == t[0]:
            print(t)
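To illustrate with dummy data (the tickers and numbers are made up), keeping the worksheet values as 2-D rows so that t[0] is the company name:

```python
import numpy as np

# Stand-ins for the ravel'd ticker list and the 2-D worksheet values
stocks = np.array(["AAPL", "MSFT"])
rows = np.array([["AAPL", 150], ["GOOG", 100], ["MSFT", 250]], dtype=object)

matches = []
for i in stocks:
    for t in rows:        # each t is one full row; t[0] is the company name
        if i == t[0]:
            matches.append(list(t))  # keep the whole matching row

print(matches)
```

Only the rows whose first column equals one of the tickers survive, which is the vlookup-style behaviour the question describes.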

Pandas - merge/join/vlookup df and delete all rows that get a match

I am trying to reference a list of expired orders from one spreadsheet (df name = data2) and vlookup them against the new orders spreadsheet (df name = data) to delete all the rows that contain expired orders, then return a new spreadsheet (df name = results).
I am having trouble mimicking what I do in Excel with vlookup/sort/delete in pandas. Please view the pseudo code/steps as code:
1. Import simple.xls as a dataframe called 'data'.
2. Import wo.xlsm, sheet name "T", as a dataframe called 'data2'.
3. Do a vlookup, using Column "A" in 'data' as the values to be matched with any of the same values in Column "A" of 'data2' (they're both just Order IDs).
4. For all values that exist in Column "A" of 'data2' and also in Column "A" of 'data', delete the entire row (there are 26 columns) for each matched Order ID found in both datasets. To reiterate: delete the entire row for the matches found in the 'data' file. Save the smaller dataset as 'results'.
import pandas as pd
data = pd.read_excel("ors_simple.xlsx", encoding = "ISO-8859-1",
dtype=object)
data2 = pd.read_excel("wos.xlsm", sheet_name = "T")
results = data.merge(data2,on='Work_Order')
writer = pd.ExcelWriter('vlookuped.xlsx', engine='xlsxwriter')
results.to_excel(writer, sheet_name='Sheet1')
writer.save()
I re-read your question and think I understand it correctly. You want to find out whether any order in new_orders (you call it data) has expired, using expired_orders (you call it data2).
If you rephrase your question what you want to do is: 1) find out if a value in a column in a DataFrame is in a column in another DataFrame and then 2) drop the rows where the value exists in both.
Using pd.merge is one way to do this. But since you want to use expired_orders to filter new_orders, pd.merge seems a bit overkill.
Pandas actually has a method for doing this sort of thing and it's called isin() so let's use that! This method allows you to check if a value in one column exists in another column.
df_1['column_name'].isin(df_2['column_name'])
isin() returns a Series of True/False values that you can apply to filter your DataFrame by using it as a mask: df[bool_mask].
So how do you use this in your situation?
is_expired = new_orders['order_column'].isin(expired_orders['order_column'])
results = new_orders[~is_expired].copy()  # Use copy to avoid SettingWithCopyWarning.
~ is equal to not, so ~is_expired means that the order wasn't expired.
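A runnable sketch with made-up order data (the column name and Order IDs are placeholders for your real spreadsheets):

```python
import pandas as pd

# Stand-ins for the two spreadsheets
new_orders = pd.DataFrame({"Work_Order": [1, 2, 3, 4], "qty": [5, 6, 7, 8]})
expired_orders = pd.DataFrame({"Work_Order": [2, 4]})

# True where a new order's ID also appears in the expired list
is_expired = new_orders["Work_Order"].isin(expired_orders["Work_Order"])
results = new_orders[~is_expired].copy()  # keep only the non-expired rows

print(results)
```

All 26 columns of each surviving row come along for free, since the boolean mask filters whole rows; finish with results.to_excel(...) as in your original code.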

Python - How do I create a new column in a dataframe from the unique values from an existing column?

I have created a data frame from an excel file. I would like to create new columns from each unique value in the column 'animals'. Can anyone help with this? I am somewhat new to Python and Pandas. Thanks.
import pandas as pd

#INPUT FILE INFORMATION
path = r'C:\Users\MY_COMPUTER\Desktop\Stack_Example.xlsx'  # raw string so '\U' isn't treated as an escape
sheet = "Sheet1"

#READ FILE
dataframe = pd.read_excel(path, sheet_name=sheet)

#SET DATE AS INDEX
dataframe = dataframe.set_index('date')
You said you want to create new columns from each unique value in the column "animals". As you did not specify what you want the new columns to have as values, I assume you want None values.
So, here is the code:
for value in dataframe['animals']:
    if value not in dataframe:
        dataframe[value] = None
The first line loops through each value of the column 'animals'.
The second line checks to make sure the value is not already in one of the columns so that your condition of having only unique values is satisfied.
The third line creates new columns named under each unique value of column 'animals'.
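For example, with a dummy frame standing in for the Excel data (the animal names are made up):

```python
import pandas as pd

# Dummy frame standing in for the data read from Excel
dataframe = pd.DataFrame({"animals": ["cat", "dog", "cat"]})

for value in dataframe["animals"]:
    if value not in dataframe:  # `in` checks column labels, so duplicates are skipped
        dataframe[value] = None

print(list(dataframe.columns))  # ['animals', 'cat', 'dog']
```

The repeated "cat" only produces one new column, and each new column is filled with None as assumed above.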
