I am working with a .csv dataset that I download from an online source. Two date columns in the file come in a custom Excel format of "d-mmm", but when you select a cell, the formula bar (not the cell itself) shows its contents in "mm/dd/yyyy" format.
Picture to illustrate my problem: https://imgur.com/0qm9ScH
I'm struggling to get the correct date format in Python without manually changing the data type in Excel.
As stated, I have tried manually changing the data types and this works... but then I would have to do this in 15 different .csv files, and repeat the process whenever I update them.
I have also tried using parse_dates and date_parser in my pandas read_csv call, but haven't had any luck.
Ideally, there would be some process that opens a .csv, reads the contents of the two date fields, and converts them from "d-mmm" (as shown) to "mm/dd/yyyy".
Is this possible or would I have to manually change data types within excel?
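A minimal sketch, assuming the raw CSV text (worth checking in a plain text editor) looks like "5-Jan": Excel would then guess a date, fill in a year, and display it as d-mmm, which would be consistent with the formula bar showing a full date. If the raw text is already a full mm/dd/yyyy date, plain parse_dates should work and this step is unnecessary. DATE_COLUMNS, the helper names, and the year are hypothetical; because the text carries no year, you have to supply one yourself.

```python
import glob

import pandas as pd

# Hypothetical column names: replace with the two date columns in your files.
DATE_COLUMNS = ["start_date", "end_date"]


def fix_day_month_columns(df: pd.DataFrame, columns, year: int) -> pd.DataFrame:
    """Convert text like '5-Jan' (Excel's d-mmm display) into 'mm/dd/yyyy' strings.

    The text has no year, so `year` must be supplied (an assumption about your data).
    Blank or unparseable cells become NaN instead of raising.
    """
    out = df.copy()
    for col in columns:
        text = out[col].astype(str).str.strip() + f"-{year}"
        parsed = pd.to_datetime(text, format="%d-%b-%Y", errors="coerce")
        # Drop this strftime call to keep real datetime64 values instead of strings.
        out[col] = parsed.dt.strftime("%m/%d/%Y")
    return out


def load_all(pattern: str, year: int) -> dict:
    """Read every CSV matching `pattern` (e.g. 'downloads/*.csv') and fix its date columns."""
    return {
        path: fix_day_month_columns(pd.read_csv(path, dtype=str), DATE_COLUMNS, year)
        for path in sorted(glob.glob(pattern))
    }
```

Because it runs over a glob pattern, re-running it after each download would replace the manual Excel step for all 15 files.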
I have a table in a PDF in this format:
[Image: PDF table format]
I want to extract the data into Excel in this format:
[Image: required format of the data in Excel]
I have tried camelot and tabula, using both the lattice and stream flavors, but I am still not able to achieve the desired result.
In camelot I am facing an issue similar to the one in this GitHub issue, and I was not able to move forward:
https://github.com/atlanhq/camelot/issues/414
I also tried using a column separator, but it is only supported with the stream flavor in camelot. In that case, one row of the table gets split into multiple rows, which I don't want.
No luck with tabula either.
So, how can I solve the problem described above?
I want "Supp Info" and "(AMT)" in two different columns, as shown in the Excel format. How can I achieve this?
I am new to data extraction. Please help!
Thanks!
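Without seeing the table this is only a guess: if camelot or tabula returns "Supp Info" and the "(AMT)" value merged in a single cell, e.g. "Freight charges (1,250.00)", a regex over the extracted DataFrame can split it into two columns after extraction. camelot's split_text=True option for read_pdf, which splits text spanning several cells, may be worth trying first. The column names supp_info and amt, the function name, and the sample text are hypothetical.

```python
import pandas as pd


def split_amount(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Split cells like 'Freight charges (1,250.00)' into 'supp_info' and 'amt' columns.

    Assumes the amount is the last parenthesised group in the cell; cells without
    one get NaN in both new columns.
    """
    pattern = r"^\s*(?P<supp_info>.*?)\s*\((?P<amt>[^()]*)\)\s*$"
    parts = df[column].astype(str).str.extract(pattern)
    # Replace the merged column with the two extracted ones.
    return df.drop(columns=column).join(parts)
```

The result (for example from a camelot table's .df) can then be written with to_excel.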
I have a main DataFrame called df1, from which I take six different DataFrames based on different conditions. I then write each of them to a separate Excel sheet, so my workbook will have 6 sheets.
What I want is to colour the column headings of A:D with one colour, E:H with another colour, etc. But xlsxwriter is already slow without any formatting, so with formatting it could take much longer.
So I wonder whether I can somehow set the colour on df1 itself, so that when I take data from it, all six DataFrames already have the formatting.
What is the best way to do this ?
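A plain DataFrame does not store Excel formatting, so colouring df1 would not carry over to the six slices. A cheaper route, sketched here under assumptions, is to let to_excel write each sheet and then rewrite only the header row with xlsxwriter Format objects obtained through writer.book and writer.sheets; that is one write per column, so it should add little compared with writing the data. The palette, the blocks of four columns (A:D, E:H, ...), and the function names are hypothetical.

```python
import pandas as pd

# Hypothetical palette: one colour per block of four columns (A:D, E:H, I:L, M:P).
PALETTE = ["#DDEBF7", "#FCE4D6", "#E2EFDA", "#FFF2CC"]


def header_colour(col_index: int, palette=PALETTE) -> str:
    """Colour for a zero-based column index: columns 0-3 share one colour, 4-7 the next."""
    return palette[(col_index // 4) % len(palette)]


def write_with_coloured_headers(frames: dict, path: str, palette=PALETTE) -> None:
    """Write each DataFrame in `frames` (sheet name -> DataFrame) and recolour its header row."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        workbook = writer.book
        # Create each Format once per workbook and reuse it across all sheets.
        formats = {c: workbook.add_format({"bold": True, "bg_color": c, "border": 1}) for c in palette}
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
            worksheet = writer.sheets[sheet_name]
            # Overwrite only the header cells pandas just wrote, this time with a format.
            for col_num, value in enumerate(frame.columns):
                worksheet.write(0, col_num, value, formats[header_colour(col_num, palette)])
```

Usage would be something like write_with_coloured_headers({"cond1": df_a, "cond2": df_b}, "report.xlsx"), with the six filtered DataFrames as values.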
I have come across a problem in pandas, which goes as follows.
Have a look at the attached JPG.
I need to put it into a pandas DataFrame and have the entire data processed into JSON.
I have been trying, but I am not getting the proper output/JSON format for the UI.
Though I am good at pandas, I couldn't map X-Total, Y-Total, Z-Total to the proper indices.
Your suggestions would help me a lot. TIA
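The attached image isn't available, so this is a guess at the shape: assuming a long table with a group column (X, Y, Z) and a numeric value column, groupby can produce one JSON object per group holding its rows plus an "X-Total"-style key. The column names group, item and value, the function name, and the output layout are hypothetical and would need adjusting to what the UI expects.

```python
import json

import pandas as pd


def to_grouped_json(df: pd.DataFrame, group_col: str = "group", value_col: str = "value") -> str:
    """One JSON object per group: its rows plus a '<group>-Total' sum of `value_col`."""
    payload = {}
    for key, sub in df.groupby(group_col, sort=True):
        # Round-trip through to_json so numpy scalars become plain JSON numbers.
        rows = json.loads(sub.drop(columns=group_col).to_json(orient="records"))
        payload[str(key)] = {"rows": rows, f"{key}-Total": float(sub[value_col].sum())}
    return json.dumps(payload, indent=2)
```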
I am reading an unstructured Excel file (with a lot of merged cells of no particular size) into a pandas DataFrame. The content of each merged cell is read into the top-left cell position in pandas, and the other cells of the merged range are filled with null values. There are already many null values in the Excel file itself, and I want a way to track specifically the locations of the null cells that were created by unmerging when the file is read into pandas. I can't find any way in Python to do this. Could anyone guide me on how to approach this problem?
I guess you use pandas.read_excel() to get data from Excel into the data frame?
Given the merged cells in the Excel sheet, I believe you will need to read the data yourself using one of the libraries listed on http://www.python-excel.org/ (I recommend openpyxl).
You can then control the excel-to-dataframe conversion yourself.
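Following the openpyxl suggestion above, a minimal sketch: openpyxl lists the sheet's merged ranges in ws.merged_cells.ranges, and every cell of a range except its top-left corner is a null that pandas produced from a merge. It assumes the sheet is read with pd.read_excel(path, header=None) and no skipped rows, so Excel row r / column c maps to position (r - 1, c - 1); comparing df.shape with the sheet's dimensions is a sensible check. The function names are hypothetical.

```python
import pandas as pd
from openpyxl import load_workbook


def merged_filler_positions(path: str, sheet_name=None) -> set:
    """Zero-based (row, col) positions that pandas shows as NaN only because of a merge."""
    wb = load_workbook(path)  # normal (not read-only) mode so merged ranges are available
    ws = wb[sheet_name] if sheet_name is not None else wb.active
    positions = set()
    for rng in ws.merged_cells.ranges:
        for row in range(rng.min_row, rng.max_row + 1):
            for col in range(rng.min_col, rng.max_col + 1):
                if (row, col) != (rng.min_row, rng.min_col):
                    positions.add((row - 1, col - 1))  # Excel is 1-based, iat is 0-based
    return positions


def filler_mask(df: pd.DataFrame, positions: set) -> pd.DataFrame:
    """Boolean frame: True where a NaN came from unmerging, False elsewhere."""
    mask = pd.DataFrame(False, index=df.index, columns=df.columns)
    for row, col in positions:
        if row < mask.shape[0] and col < mask.shape[1]:
            mask.iat[row, col] = True
    return mask
```

Nulls that were already in the file are then df.isna() & ~mask.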
I need to open an existing Excel file, iterate through the column with dates (shown as numbers, like 42311 etc.) and change the format of the cells from numbers to dates (like 16/12/2016). Then save and close the file.
I've tried searching through the xlrd documentation and similar questions from other people, but I still don't understand how to do it.
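A minimal sketch using openpyxl instead of xlrd (xlrd can only read files, and since version 2.0 only .xls): numbers like 42311 are Excel date serials, so giving those cells a date number format is enough for Excel to display them as dates, while the stored values stay the same. The column letter, start_row (assuming a header in row 1), and the function name are hypothetical.

```python
from openpyxl import load_workbook


def format_serial_dates(path: str, column: str, sheet_name=None, start_row: int = 2,
                        number_format: str = "dd/mm/yyyy") -> None:
    """Give every numeric cell in `column` (e.g. 'B') a date number format, then save in place."""
    wb = load_workbook(path)
    ws = wb[sheet_name] if sheet_name is not None else wb.active
    for row in range(start_row, ws.max_row + 1):
        cell = ws[f"{column}{row}"]
        # bool is a subclass of int, so skip TRUE/FALSE cells explicitly.
        if isinstance(cell.value, (int, float)) and not isinstance(cell.value, bool):
            cell.number_format = number_format
    wb.save(path)
```

For example, format_serial_dates("report.xlsx", "B") would reformat column B of the first sheet; 42311 then displays as 03/11/2015.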