I'm totally puzzled by the situation at hand. I have a large dataset with a broad range of numbers, all between 0 and 2. However, when I write the data to a .csv file with
df_Signals.to_csv('signals_IDG_TOut1.csv', sep=',')
so that I can import the file into another program, something strange happens. When I, for example, print one of the numbers with
print(df_Signals["Column"].iloc[44])
python prints: 2.8147020866287068e-05
However, when I open the .csv file, it reads 281470208662,87. A quick inspection shows that this happens for every number written in E-notation that I could find. I have tried to figure out what is going on, but have no idea what the answer is. So my main question is: why does this happen? Secondly, how can I resolve it? And is this a structural problem when exporting to .csv files?
I use PyCharm 2017.1.4, with the Anaconda 3 interpreter.
Regards
Update: As the comments correctly pointed out, it is Excel that opens the data incorrectly, which still leaves me wondering why that happens.
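For the record, the symptom is consistent with Excel running in a locale that uses a comma as the decimal separator: it treats the "." in 2.8147020866287068e-05 as a grouping mark, reads the digits 28147020866287068, applies the e-05, and lands on 281470208662,87. A minimal sketch of one workaround, assuming such a comma-decimal Excel locale (the separator choices are my assumption; the keyword arguments are standard pandas to_csv options):

import pandas as pd

# df_Signals is the DataFrame from the question.
df_Signals.to_csv('signals_IDG_TOut1.csv',
                  sep=';',               # comma-decimal locales expect ';' between fields
                  decimal=',',           # write the decimal mark this Excel expects
                  float_format='%.12f')  # avoid E-notation entirely

Alternatively, importing the file through Excel's Text Import Wizard and declaring the decimal separator there avoids changing the writer at all.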
I have two programs written in Python and converted to one-file executables using auto-py-to-exe.
The first program writes to a file, which is read by the second program. The problem is that when the second program tries to read the file at the same time as it is being written, the code stops with a PermissionError.
The solutions that seemed to work are:
Using time management, which is not useful in my case, since the reading and writing times are not constant.
I could check whether the file is accessible, which might be a solution; however, I suppose it would still raise an error if the writer tries to change the file while it is being read.
I could use the size of the file to check whether writing has finished, and then run the reader; however, this seems neither logical nor Pythonic!
I found some solutions using os.pipe(), but to be honest, I couldn't understand what the process does. If this is a solution, I would be glad to have it explained in simple English.
That's it. Any suggestions?
P.S.: The OS is Windows and I am using Python 3.9.
Solved:
Thanks to the replies and suggestions, I learned that try/except blocks accept an exception type. I solved the problem by catching PermissionError: the code runs in a loop, and the file is checked again after a few seconds (a sketch follows below).
However, the drawback is this: the read must finish before the writer comes back to rewrite the file! In my case, as suggested in the replies, I combined the two programs so that they run sequentially.
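For reference, a minimal sketch of that retry approach (the function name, retry count, and delay are mine, not from the original code):

import time

def read_when_free(path, retries=10, delay=3):
    # Try to read the file; if the writer still holds the lock, wait and retry.
    for _ in range(retries):
        try:
            with open(path, 'r') as f:
                return f.read()
        except PermissionError:
            time.sleep(delay)  # the writer still has the file open; check again shortly
    raise PermissionError(f'{path} is still locked after {retries} attempts')

As noted above, this only works if each read finishes before the writer reopens the file.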
I am using Python 3.6.3.
One problem remains in my script, which is otherwise fully operational.
The main modules are pandas and xlsxwriter (with easygui for the GUI).
From a single master file (Excel 2010), this script can generate dozens of Excel files (with xlsxwriter), each of which can contain hundreds of columns of data (depending on parameters in the master file).
Indentation, logic and results are OK.
But the last Excel file is not committed to disk, and I have to restart Python to get it.
For example, if one run produces 100 files, only 99 will be written on disk. The last one is calculated, but not visible.
If Python is not restarted, this file is written to disk at the beginning of a next run of the script.
I suspected a flush problem and tried some solutions, but the problem still remains.
Are there any tricks to force the buffer to be flushed? I am not allowed to modify the environment variables on my professional computer.
Thank you for your help and your time :)
Thank you, Mark Tolonen.
You were right: the file was not closed properly, and it was because I made a mistake.
My code was a bit difficult to summarize, and I could not post a condensed version of my script.
First, a continue keyword (for the main loop) was incorrectly indented, and I moved it to the right place.
But just before this keyword, I was closing the xlsxwriter file with: workbook.close (there is only one workbook in the script, for the main loop).
But this was not reported as an error at run-time.
Each xlsxwriter file was committed to disk except the last one, as mentioned in my question above.
I then reviewed the documentation at https://xlsxwriter.readthedocs.io/workbook.html, and I noticed that the parentheses were missing where the workbook is closed.
After correcting this by adding the missing parentheses, workbook.close(), all is fine now :)
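For anyone wondering why no error was raised: workbook.close without parentheses is a legal Python expression that merely looks up the method and discards it. A minimal sketch of the bug and the fix (file and cell names are illustrative):

import xlsxwriter

workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write('A1', 'hello')

workbook.close    # bug: references the method but never calls it, so nothing is written out
workbook.close()  # fix: actually closes the workbook and commits the file to disk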
I would like to share this information, because others may have met the same problem.
Thank you also to progmatico for your information on flush behaviour.
Greetings from Paris, France :)
My code is an update of an existing script which outputs an xlsx file with a lot of data. The original script is pretty stable and has worked for ages.
What I'm trying to do is that, after the original script has ended and the xlsx is created, I want to load the file into Pandas and then run a series of analyses on it, using .loc, .iloc and .index.
But after I read the file into a variable, when I hit '.' after the variable's name in PyCharm, I get all the DataFrame and ndarray methods... except the three that I need.
No errors, no explanations. They are just not there.
And if I ignore this and type them in manually anyway, the variable I put the results into doesn't show ANY methods when I next hit '.' (instead of showing the methods for, say, a Series).
I've tried clearing the xlsx file of all formatting (it originally had empty lines hidden). I tried running .info() and .head() to make sure they both run fine (they seem to, yes). I even updated my code from Python 2.7 to Python 3.7 using the 2to3 scripts to see if that might change anything. It didn't.
import pandas as pd
analysis_file = pd.read_excel("F:\\myprogram\\output1.xlsx", "Sheet1")
analysis_file. <--- The problem's here
Really not sure how to proceed, and no one I've asked so far has been able to help me.
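One workaround worth trying (the type annotation is my suggestion, not something from the original post): PyCharm sometimes fails to infer the return type of pd.read_excel, and an explicit hint usually restores completion.

import pandas as pd

# Annotating the variable tells PyCharm it is a DataFrame, which normally
# brings back completion for .loc, .iloc and .index:
analysis_file: pd.DataFrame = pd.read_excel("F:\\myprogram\\output1.xlsx", "Sheet1")

Note that .loc and .iloc are indexers used with square brackets (e.g. analysis_file.iloc[44]), not methods called with parentheses, which can also affect what the completion list offers.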
This is literally day 1 of Python for me. I've coded in VBA, Java, and Swift in the past, but I am having a particularly hard time following the online guides for coding a PDF scraper. Since I have no idea what I am doing, I keep running into a wall every time I want to test out some of the code I've found online.
Basic Info
Windows 7 64bit
python 3.6.0
Spyder3
I have many of the PDF-related code packages (PyPDF2, pdfminer, pdfquery, pdfrw, etc.)
Goals
To create something in Python that allows me to convert PDFs from a folder into an Excel file (ideally), OR a text file (from which I will convert using VBA).
Issues
Every time I try some sample code from guides I've found online, I run into syntax errors on the lines where I am calling the PDF that I want to test the code on. Some guide links and error examples are below. Should I be putting my test.pdf into the same folder as the .py file?
How to scrape tables in thousands of PDF files?
I got an invalid syntax error due to "for" on the last line
PDFMiner guide (Link)
runfile('C:/Users/U587208/Desktop/pdffolder/pdfminer.py', wdir='C:/Users/U587208/Desktop/pdffolder')
File "C:/Users/U587208/Desktop/pdffolder/pdfminer.py", line 79
print pdf_to_csv('test.pdf', separator, threshold)
^
SyntaxError: invalid syntax
It seems that the tutorials you are following make use of Python 2. There are usually few noticeable differences, but the biggest is that in Python 3, print became a function, so you need
print()
I would recommend either changing your version of Python or finding a tutorial for Python 3. Hope this helps.
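Applied to the failing line from the traceback above (pdf_to_csv, separator and threshold all come from the tutorial script in the question), the fix is just the parentheses:

# Python 2, as the tutorial was written:
#   print pdf_to_csv('test.pdf', separator, threshold)
# Python 3, where print is a function:
print(pdf_to_csv('test.pdf', separator, threshold))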
Here:
Pdfminer python 3.5: an example of how to extract information from a PDF.
But it does not solve the problem with the tables you want to export to Excel. Commercial products are probably better at doing that...
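If plain text extraction is enough to get started, here is a minimal sketch, assuming the Python 3 fork pdfminer.six is installed (pip install pdfminer.six):

from pdfminer.high_level import extract_text

# 'test.pdf' is the sample file mentioned in the question above.
text = extract_text('test.pdf')
print(text)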
I am trying to do this exact same thing! I have been able to convert my PDF to text, but the formatting is extremely random and messy, and I need the tables to stay intact to be able to write them into Excel data sheets. I am now attempting to convert to XML to see if it will be easier to extract from. If I get anywhere on this I will let you know :)
By the way, use Python 2 if you're going to use pdfminer. Here's some help with pdfminer: https://media.readthedocs.org/pdf/pdfminer-docs/latest/pdfminer-docs.pdf
I have a script to format a bunch of data and then push it into Excel, where I can easily scrub the broken data and do a bit more analysis.
As part of this I'm pushing quite a lot of data to Excel and want Excel to do some of the legwork, so I'm putting a certain number of formulae into the sheet.
Most of these ("=AVERAGE(...)", "=A1+3", etc.) work absolutely fine, but when I add the standard deviation ("=STDEV.P(...)") I get a name error when I open the file in Excel 2013.
If I click in the cell within Excel and hit Enter (i.e. don't change anything within the cell), the cell recalculates without the name error, so I'm a bit confused.
Is there anything extra that needs to be done to get this to work?
Has anyone else had any experience of this?
Thanks,
Will
--
I've investigated further and this is the issue:
When saving the formula "STDEV.P" openpyxl saves it as:
"=_xludf.STDEV.P(...)"
which is correct for many formulas, but not this one.
The result should be:
"=_xlfn.STDEV.P(...)"
When I explicitly change the function to the latter, it works as expected.
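A minimal sketch of that workaround, writing the prefixed name by hand (the cell range and file name are illustrative):

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws['A1'] = 1
ws['A2'] = 2
ws['A3'] = 4
# openpyxl stores the formula string as-is, so spelling out the _xlfn prefix
# lets Excel resolve the post-2007 function name without a name error:
ws['B1'] = '=_xlfn.STDEV.P(A1:A3)'
wb.save('stdev_demo.xlsx')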
I'll file a bug report, so hopefully this is done automatically in the future.
I suspect that there might be a subtle difference in what you think you need to write as the formula and what is actually required. openpyxl itself does nothing with the formula, not even check it. You can investigate this by comparing two files (one from openpyxl, one from Excel) with ostensibly the same formula. The difference might be simple – using "." for decimals and "," as a separator between values even if English isn't the language – or it could be that an additional feature is required: Microsoft has continued to extend the specification over the years.
Once you have some pointers please submit a bug report on the openpyxl issue tracker.
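A minimal sketch of that comparison (the file names are illustrative): an .xlsx file is just a zip archive, so the formula each program actually stored can be read straight out of the worksheet XML.

import zipfile

for name in ('from_openpyxl.xlsx', 'from_excel.xlsx'):
    with zipfile.ZipFile(name) as z:
        # The first worksheet normally lives at xl/worksheets/sheet1.xml.
        print(name, z.read('xl/worksheets/sheet1.xml').decode('utf-8'))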