How to create an effective logging system using Excel, pandas, and NumPy - Python

Background:
I am creating a program that will need to keep track of what it has run, when, what it has sent, and to whom. The logging module in Python doesn't appear to accomplish what I need, but I'm still pretty new to this so I may be wrong. Alternative solutions to accomplish the same end are also welcome.
The program will need to take in a data file (preferably .xlsx or .csv) which will be formatted something like this (the Nones will need to be filled in by the program):
Run_ID  Date_Requested  Time_Requested  Requestor        Date_Completed  Time_Completed
R_423h  9/8/2022        1806            email#email.com  None            None
The program will then need to compare the Run_IDs from the log to the new run IDs provided in a similar format to the table above (in a .csv), i.e.:
ResponseId
R_jals893
R_hejl8234
I can compare the IDs myself, but the program then needs to update the log with the new IDs it has run, along with the times they were run, the emails, and so on, and then resave the log file. I'm sure this is easy, but it's throwing me for a loop.
My code:
import pandas as pd

log = pd.read_excel('run_log.xlsx', usecols=None, parse_dates=True)
new_run_requests = pd.read_csv('Run+Request+Sheet_September+6,+2022_14.28.csv', parse_dates=True)
old_runs = log.Run_ID[:]
new_runs = new_run_requests.ResponseId[:]
log['Run_ID'] = pd.concat([old_runs, new_runs], ignore_index=True)
After this, the dataframe does not change. This is one of the two or three things I have tried. Suggestions are appreciated!
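One hedged sketch of the update step, assuming the column names from the tables above and placeholder data in place of the real files: build whole rows for the new runs and concatenate them onto the log. Assigning pd.concat(...) back into log['Run_ID'] aligns on the existing index, so the extra rows are silently dropped; that is why the dataframe appears not to change.

```python
# Sketch only: inline placeholder data stands in for read_excel('run_log.xlsx')
# and read_csv(...) from the question.
from datetime import datetime

import pandas as pd

log = pd.DataFrame({
    "Run_ID": ["R_423h"],
    "Date_Requested": ["9/8/2022"],
    "Time_Requested": [1806],
    "Requestor": ["email#email.com"],
    "Date_Completed": [None],
    "Time_Completed": [None],
})
new_run_requests = pd.DataFrame({"ResponseId": ["R_423h", "R_jals893"]})

# keep only IDs that are not already in the log
new_ids = new_run_requests.loc[
    ~new_run_requests["ResponseId"].isin(log["Run_ID"]), "ResponseId"
]

# build whole rows for the new runs, stamping the completion date/time
now = datetime.now()
new_rows = pd.DataFrame({
    "Run_ID": new_ids,
    "Date_Completed": now.strftime("%m/%d/%Y"),
    "Time_Completed": now.strftime("%H%M"),
})

# concat appends rows; assigning the concat result to log['Run_ID'] instead
# aligns on the old index, which is why the original attempt had no effect
log = pd.concat([log, new_rows], ignore_index=True)
log.to_csv("run_log.csv", index=False)  # or log.to_excel(...) for .xlsx
```

Columns the new rows don't supply (Date_Requested etc.) come through as NaN and can be filled in later.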

Python reads only one column from my CSV file

First post here.
I am very new to programming, sorry if this is confusing.
I made a database by collecting different kinds of data online. All these data are in one xlsx file (one column per kind of data), which I converted to csv afterwards because my teacher only showed us how to use csv files in Python.
I installed pandas and had it read my csv file, but it seems it doesn't understand that I have multiple columns; it reads everything as one column. Thus, I can't get the info for each kind of data (and so I can't transform the data).
I tried df.info() and df.info(verbose=True, show_counts=True) but they show the same thing:
len(df.columns) == 1, which proves it doesn't see that each kind of data has its own column
len(df) == 1923, which is right
I was expecting this: https://imgur.com/a/UROKtxN (different project, not the same database)
database used: https://imgur.com/a/Wl1tsYb
And I get this instead: https://imgur.com/a/iV38YNe
database used: https://imgur.com/a/VefFrL4
I don't know, it looks pretty similar, so why doesn't it work? :((
Thanks.
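A common cause of this symptom, sketched here with made-up demo data: the csv came out of a European-locale Excel and is semicolon-separated, so pandas' default sep="," sees one big column. Letting pandas sniff the delimiter is a cheap first check:

```python
import pandas as pd

# write a small semicolon-separated file, like a European-locale Excel export
with open("data_demo.csv", "w") as f:
    f.write("name;score\nalice;10\nbob;12\n")

df = pd.read_csv("data_demo.csv")
print(len(df.columns))  # 1: everything landed in a single column

# sep=None with the python engine sniffs the delimiter from the file itself
df = pd.read_csv("data_demo.csv", sep=None, engine="python")
print(len(df.columns))  # 2
```

If the sniffer finds the right delimiter, passing it explicitly (sep=";") is faster and more predictable for repeated runs.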

Appending Excel cell values using pandas

Edit: I found a solution to my question. In short: look at the user manual for openPyxl instead of online tutorials; the tutorials raised errors when I tried them (I tried more than one), and their approach was significantly different from the one in the user manual. I also ended up not using pandas as much as I thought I would.
I am trying to update certain cell values in an Excel file with multiple sheets based on user input, and then write it back to the Excel file (without deleting the rest of the sheets). So far I have tried this, which seems to combine the data, but I didn't quite see how it applied to what I am doing, since I want to modify part of a sheet instead of rewriting the whole Excel file. I have also tried a few other things with ExcelWriter, but I don't quite understand it, since it usually wipes all the data in the file (I may be using it wrong).
episode_dataframe = pd.read_excel(r'All_excerpts (Siena Copy)_test.xlsx', sheet_name=episode)
# episode is a string supplied by the user; this builds a DataFrame for that sheet
episode_dataframe.loc[(int(pass_num) - 1), 'Resources'] = resources
# resources is also a user-supplied string; this sets the corresponding cell in the DataFrame
# mode='a' opens the existing workbook instead of replacing it, so the other
# sheets survive; if_sheet_exists='replace' (pandas >= 1.3) rewrites only this sheet
with pd.ExcelWriter('All_excerpts (Siena Copy)_test.xlsx', engine='openpyxl',
                    mode='a', if_sheet_exists='replace') as writer:
    episode_dataframe.to_excel(writer, sheet_name=episode, index=False)
Additionally, I have been running into a bunch of other smaller errors. A big one was 'Workbook' object has no attribute 'add worksheet', even though I'm not trying to add a worksheet; I also could not get their solution to work.
I am a bit of a novice at python, so my code might be a bit of a mess.
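Since the edit above points at the openpyxl user manual, here is a minimal sketch of the in-place approach (file and sheet names are made up for the demo): load the workbook, change one cell, and save, which leaves every other sheet untouched:

```python
from openpyxl import Workbook, load_workbook

# build a small demo workbook with two sheets (stand-in for the real file)
wb = Workbook()
ws = wb.active
ws.title = "Episode1"
ws.append(["Excerpt", "Resources"])
ws.append(["clip A", ""])
wb.create_sheet("Episode2")  # a second sheet that must survive the edit
wb.save("excerpts_demo.xlsx")

# load, change a single cell, save; no other sheet is rewritten
wb = load_workbook("excerpts_demo.xlsx")
wb["Episode1"].cell(row=2, column=2, value="new resource")  # 1-based indices
wb.save("excerpts_demo.xlsx")

check = load_workbook("excerpts_demo.xlsx")
print(check.sheetnames)  # both sheets are still there
print(check["Episode1"].cell(row=2, column=2).value)
```

This avoids pandas entirely for the write step, which matches the edit's conclusion that pandas was not needed as much as expected.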

Convert CSV file to CSV with the same amount of columns, via the command line

I downloaded several CSV files from a finance site. These files are inputs to a Python script that I wrote. The rows in the CSV files don't all have the same number of values (i.e. columns); in fact, blank lines have no values at all.
This is what the first few line of the downloaded file look like :
Performance Report
Date Produced,14-Feb-2020
When I attempt to add the rows to a pandas DataFrame, the script incurs a "mismatched columns" error.
I got around this by opening the files in Numbers on macOS and manually exporting each file to CSV. However, I don't want to do this every time I download a CSV file from the finance site. I have googled for ways to automate this but have not been successful.
This is what the first few lines of the "Numbers" exported csv file looks like:
,,,,,,,
Performance Report,,,,,,
Date Produced,14-Feb-2020,,,,,
,,,,,,,
I have tried playing with the dialect parameter of the csv module's reader but have not been successful.
I also tried appending the missing columns manually in the Python script, also without success.
Essentially, midway down the CSV file is the table that I place into the DataFrame. Below is an example.
Asset Name,Opening Balance $,Purchases $,Sales $,Change in Value $,Closing Balance $,Income $,% Return for Period
Asset A,0.00,35.25,66.00,26.51,42.74,5.25,-6.93
...
...
Sub Total,48.86,26,12.29,-16.7,75.82,29.06,
That table prior to exporting via "Numbers" looks like so:
Asset Name,Opening Balance $,Purchases $,Sales $,Change in Value $,Closing Balance $,Income $,% Return for Period
Asset A,0.00,35.25,66.00,26.51,42.74,5.25,-6.93
...
...
Sub Total,48.86,26,12.29,-16.7,75.82,29.06
Above, the Sub Total row does not have a value in the last column, and the file does not represent it as ,"", which would make all rows have an equal number of columns.
Does anyone have any ideas on how I can automate the Numbers export process? Any help would be appreciated. I presume these are varying formats of CSV.
In pandas read_csv you can skip rows. If the number of header rows is consistent, then:
pd.read_csv("myfile.csv", skiprows=2)
If the first few lines are not consistent, or the problem is actually deeper within the file, then you might experiment with try: and except:. Without more information on what the data file looks like, I can't come up with a more specific example using try: and except:.
There are many ways to do this in your script rather than adding commas with a separate program.
One way is to preprocess the file in memory in your script before handing it to pandas.
That said, when you are using pandas you should use its built-in power.
You have not shared what the actual data rows look like, and without that no one can really help you.
I would look into using the following two kwargs of read_csv to get the job done:
skiprows as a callable, i.e. make your own function and use it as a filter to drop unwanted rows
error_bad_lines set to False (replaced by on_bad_lines="skip" in newer pandas) to just ignore errors and deal with them after the data is in the DataFrame
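The preprocessing idea can be sketched like this (the sample text is condensed from the snippets above, not the real file): find the header row of the mid-file table by content, then point skiprows at everything before it:

```python
import io

import pandas as pd

raw = """Performance Report
Date Produced,14-Feb-2020

Asset Name,Opening Balance $,Income $,% Return for Period
Asset A,0.00,5.25,-6.93
Sub Total,48.86,29.06"""

# locate the real header row by content instead of a fixed offset
lines = raw.splitlines()
header_idx = next(i for i, line in enumerate(lines)
                  if line.startswith("Asset Name"))

# rows shorter than the header (like Sub Total) are padded with NaN by pandas;
# only rows with MORE fields than the header raise a tokenizing error
df = pd.read_csv(io.StringIO(raw), skiprows=header_idx)
print(df.shape)  # (2, 4)
```

With the preamble skipped, the short Sub Total row no longer needs trailing commas; pandas fills the missing last column with NaN on its own.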

Broken Excel output: Openpyxl formula settings?

I am creating some Excel spreadsheets from pandas DataFrames using the pandas.ExcelWriter().
Issue:
For some string input, this creates broken .xlsx files that need to be repaired ("problem with some content"; the removed-formula message is quoted below).
I assume this happens because Excel interprets the cell content not as a string but as a formula it cannot parse, e.g. when a string value starts with "=".
Question:
When using xlsxwriter as the engine, I can solve this issue by passing the argument options = {"strings_to_formulas": False}
Is there a similar argument for openpyxl?
Troubleshooting:
I found the data_only argument to Workbook, but it only seems to apply to reading files / I cannot get it to work with ExcelWriter().
Not all output values are strings / I'd like to avoid converting all output to str
Could not find an applicable question on here
Any hints are much appreciated, thanks!
Error messages:
We found a problem with some content in 'file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes
The log after opening says:
[...] summary="Following is a list of removed records:">Removed Records: Formula from /xl/worksheets/sheet1.xml part [...]
Code
import pandas

excelout = pandas.ExcelWriter(output_file, engine="openpyxl")
df.to_excel(excelout)
excelout.save()  # excelout.close() in newer pandas
Versions:
pandas 0.24.2
openpyxl 2.5.6
Excel 2016 for Mac (but replicates on Win)
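I don't know of an openpyxl counterpart to strings_to_formulas, so one hedged workaround is to escape only the offending values before writing, leaving non-string cells alone. Note the apostrophe will be visible in the cell unless Excel's quote-prefix style is also applied; this is a sketch, not a drop-in fix:

```python
import pandas as pd

def escape_formulas(df):
    # prefix only "="-leading strings with an apostrophe, Excel's
    # literal-text marker; numbers and other values pass through unchanged
    return df.applymap(
        lambda v: "'" + v if isinstance(v, str) and v.startswith("=") else v
    )

df = pd.DataFrame({"a": ["=broken(", "ok"], "b": [1, 2]})
safe = escape_formulas(df)
print(safe["a"].tolist())  # only the "="-leading string is escaped
```

This keeps the "avoid converting all output to str" constraint from the question, since the lambda touches only strings that actually start with "=".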
I've struggled with this issue too.
I found a strange solution for formulas:
I had to replace all ; (semicolon) signs with , (comma) in the formulas.
When I opened the resulting xlsx file in Excel, this error didn't appear and the formula in Excel had the usual ;.
I spent FAR too long trying to figure out this error.
It turned out I had an extra bracket, so the formula wasn't valid.
I know 99% of people will read this and say "that's not the issue" and move on, but take your formula and paste it into Excel if you can (replacing dynamic values as best you can) and see if Excel accepts it.
If it accepts it fine, move on and find whatever the real cause is; but if you find it doesn't like the formula, maybe I just saved you a couple of hours...
My command: f'''=IF(ISBLANK(E{row}),FALSE," "))'''
Tiny command, could not understand what was wrong with it. :facepalm:

pd.read_excel does recognize the file but does not actually read it

I've been busy working on some code, and one part of it imports an Excel file using the code below. Now, on one PC it works, but on another it does not (I did change the paths, though). Python does recognize the Excel file and does not give an error when loading, but when I print the table it says:
Empty DataFrame
Columns: []
Index: []
Just to be sure, I checked the file path, which seems to be correct. I also checked the sheet name, but that is all good too.
df = pd.read_excel(book_filepath, sheet_name='Potentie_alles')
description = df["#"].map(str)
This raises KeyError: '#' (# is the header of the first column of the sheet).
Does anyone know how to fix this?
Kind regards,
iCookieMonster
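One common cause, sketched with a made-up demo workbook that reproduces the symptom: the sheet exists, but the real header row is not the first row, so the expected columns never appear. Two cheap checks are to list the sheet names and to tell read_excel where the header actually is:

```python
import pandas as pd

# demo workbook whose real header row is the second row, below a title row
demo = pd.DataFrame([["Report", ""], ["#", "Name"], [1, "a"], [2, "b"]])
demo.to_excel("book_demo.xlsx", sheet_name="Potentie_alles",
              index=False, header=False)

# step 1: confirm the sheet really exists under that exact name
xls = pd.ExcelFile("book_demo.xlsx")
print(xls.sheet_names)

# step 2: point read_excel at the row that holds the headers
df = pd.read_excel("book_demo.xlsx", sheet_name="Potentie_alles", header=1)
description = df["#"].map(str)  # no KeyError once the header row is right
print(df.columns.tolist())
```

If sheet_names does not contain 'Potentie_alles' exactly (case and spaces matter), that alone explains an empty result on the second PC.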
