Pandas position into an excel - python

Does anyone know how I can insert a dataframe into an Excel file at a desired position?
For example, I would like my dataframe to start at cell "V78".

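The .to_excel() method has startrow and startcol arguments. Both are zero-based, so cell V78 (row 78, column V = 22nd column) corresponds to startrow=77 and startcol=21. The header and index are written too, so V78 is the top-left cell of the whole block. Recent pandas versions no longer write the legacy .xls format, so use .xlsx:
df.to_excel('excel.xlsx', startrow=77, startcol=21)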
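
To avoid counting columns by hand, the cell reference can be converted to zero-based indices. This is a minimal sketch; the helper name cell_to_indices is hypothetical and not part of pandas.

```python
import re


def cell_to_indices(cell):
    """Convert an A1-style reference such as "V78" to zero-based (row, col).

    Hypothetical helper for illustration; it is not a pandas function.
    """
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", cell)
    if match is None:
        raise ValueError(f"not a valid cell reference: {cell!r}")
    letters, digits = match.groups()

    # Column letters are a base-26 number with digits A=1 ... Z=26.
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)

    # Excel rows and columns are 1-based; to_excel expects 0-based offsets.
    return int(digits) - 1, col - 1


startrow, startcol = cell_to_indices("V78")
```

The result can then be passed on as df.to_excel('excel.xlsx', startrow=startrow, startcol=startcol).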

I have a solution which may or may not fit your requirements.
I would not write the dataframe directly into an existing Excel file, which may contain valuable data. Keeping the files separate may also be of use one day.
You could simply save the dataframe as its own Excel file:
df.to_excel('df.xlsx')
Then, in the Excel file that you want to insert it into, create an object of type file and link the two that way. See here.
Personally, keeping them separate seems better, as once two files become one there is no going back. You could also have multiple files this way for easy comparisons, without fiddling with row/column numbers!
Hope this was of some help!

Related

Other Ways to read specific columns

I am facing a small problem while reading columns.
My columns are ["onset1", "onset2", "onset3"], and I want to read their values from Excel. But each dataframe has different column names, so I need to change the names each time, which is a waste of time.
I am wondering if there is a more efficient way than reading df["onset1"].iloc[-1], df["onset2"].iloc[-1], ...
(I am thinking of reading by the column letters at the top of the sheet, like df["V"].iloc[-1], df["W"].iloc[-1].)
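
One possible approach is to select columns by position rather than by name, so the same code works whatever the columns are called. This is a minimal sketch with made-up data; the positions 0, 1 and 2 are assumptions about where the columns of interest sit.

```python
import pandas as pd

# Made-up frame standing in for one of the Excel sheets.
df = pd.DataFrame(
    {"onset1": [1, 2, 3], "onset2": [4, 5, 6], "onset3": [7, 8, 9]}
)

# Last row, columns picked by position instead of by name.
last_values = df.iloc[-1, [0, 1, 2]].tolist()

# The same line works unchanged on a frame with different column names.
other = df.rename(columns={"onset1": "a", "onset2": "b", "onset3": "c"})
other_last_values = other.iloc[-1, [0, 1, 2]].tolist()
```

When loading from Excel, pd.read_excel also accepts column letters through usecols (for example usecols="V:W"), which may be closer to the "V", "W" idea in the question.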

Columns moved after importing csv

I am new to Python/pandas. I am wondering if there is code that can help me fix how the columns shift to the right in the .csv files we pull out of our systems. One column is filled with user input (containing messy characters such as " and ,), so after loading, that column usually spreads across several columns instead of one and wrongly pushes the other columns to the right as well.
I fix this manually in Excel by filtering, deleting, and moving the columns to their correct place. It takes 20 minutes a day.
I would like to ask whether there is code I could try to clean and arrange the columns correctly, or whether the manual fix in Excel that I do now is easier. Thank you!
pandas is shifting the columns because it sees extra separators in the import file.
In Excel, count how many times a comma appears on each line. Using your example above, there should be 3 per line.
My quick and dirty solution would be to replace the last three commas on each line with a character that is almost impossible for a user to type. I typically go for a pipe '|' character.
Then try importing that into pandas, specifying the new delimiter/separator as in the example below:
import pandas as pd
df = pd.read_csv(filepath, sep="|")
df.head()
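
The comma replacement itself could be sketched as follows. It assumes the free-text column comes first, that exactly three real separators follow it on every line, and that users never type a pipe; the sample data is made up.

```python
import csv
import io

import pandas as pd

# Made-up sample: the first column is free text containing commas and quotes.
raw = (
    'comment,user,date,score\n'
    'hello, "world", again,alice,2020-01-01,5\n'
    'plain text,bob,2020-01-02,7\n'
)

N_TRAILING_SEPARATORS = 3  # assumption: three real separators per line

fixed_lines = []
for line in raw.splitlines():
    # Split from the right so only the last three commas act as separators.
    parts = line.rsplit(",", N_TRAILING_SEPARATORS)
    fixed_lines.append("|".join(parts))

# QUOTE_NONE keeps stray double quotes in the user text as literal characters.
df = pd.read_csv(
    io.StringIO("\n".join(fixed_lines)), sep="|", quoting=csv.QUOTE_NONE
)
```

For a real file, the same loop would run over the lines read from disk instead of the raw string.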
You cannot control the layout with CSV, which is a pure data transport format. Fortunately, there are third-party libraries that can work with .xlsx files here and here.

Make hash on excel data to detect data changed with openpyxl

I have an Excel file with a lot of sheets (100+). Each sheet is independent. I would like to know whether the data in a specific sheet has been altered since it was last opened. At the moment, I have a solution that loops over all the relevant cells and calculates a checksum from them. If the checksum is different, then the sheet has been changed. The problem is that I need to access a lot of cells, and Python is notoriously slow at that kind of task.
My question is: does anyone have a more efficient solution than my very naive one?
I am using openpyxl. I could use another library for this specific task, but it must be a Python library.
The data is not of a single kind: there is a mix of numbers and strings in each sheet. But every sheet is formatted with the same pattern. (i.e. always the same data type at a given coordinate)
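
For reference, the loop-and-checksum approach described above could be sketched like this, streaming whole rows in openpyxl's read-only mode instead of addressing cells one by one. Whether this is fast enough for 100+ sheets is untested. The function names hash_rows and hash_sheet are hypothetical.

```python
import hashlib


def hash_rows(rows):
    """Return a SHA-256 hex digest over an iterable of row tuples."""
    digest = hashlib.sha256()
    for row in rows:
        # repr() keeps numbers and strings distinct (1 vs "1" vs 1.0).
        digest.update(repr(tuple(row)).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def hash_sheet(path, sheet_name):
    """Hash the cell values of one sheet (hypothetical helper)."""
    # Imported here so hash_rows stays usable without openpyxl installed.
    from openpyxl import load_workbook

    # read_only streams rows; data_only returns cached formula results.
    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook[sheet_name]
        return hash_rows(sheet.iter_rows(values_only=True))
    finally:
        workbook.close()
```

Storing the digest per sheet and comparing it on the next open would then flag changed sheets.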

CSV Files and pandas

I assume this is a trick question on the homework I'm working on, but maybe it's not?
What object do you get after reading a csv file?
data frame
character vector
panel
all of the above
From what I know, you can use pandas to read a csv file into a dataframe. But I know a panel is a data structure in pandas too. I've never even heard of a character vector.
Anyone got any ideas? I'm fairly certain the answer is just dataframe, but hey, you never know.
When you read a CSV file into a variable with pandas, it is stored as a pandas.core.frame.DataFrame object, which you are already familiar with.
As for Panel, which represents wide-format panel data stored as a 3-dimensional array, it has been deprecated since version 0.20.0, as noted in the pandas Panel documentation.
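
A quick way to check this yourself is to read a small CSV and inspect the type of the result. This is a minimal sketch using an in-memory string in place of a file; the sample data is made up.

```python
import io

import pandas as pd

# In-memory CSV standing in for a file on disk.
csv_text = "a,b\n1,2\n3,4\n"

df = pd.read_csv(io.StringIO(csv_text))

# read_csv returns a DataFrame.
is_dataframe = isinstance(df, pd.DataFrame)
print(type(df))
```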

Using Pandas to create, read, and update hdf5 file structure

We would like to be able to allow the HDF5 files themselves to define their columns, indexes, and column types instead of maintaining a separate file that defines structure of the HDF5 data.
How can I create an empty HDF5 file from Pandas with a specific table structure like:
Columns
id (Int)
name (Str)
update_date (datetime)
some_float (float)
Indexes
id
name
Once the HDF5 file is created and saved to disk, how do I retrieve the column and index information without having to open the file completely each time, since it will likely contain several GB of data?
Many thanks in advance...
-- UPDATE --
Thanks for the comments. To clarify a bit more:
We do have some experience with Pandas but by no means are really proficient. The part that is tripping us up is creating an empty data structure and reading that structure from a file that you will not want to fully open. In all of the Pandas examples there is data. The Pandas examples also only show two ways to retrieve data/structure which are to read the entire frame into memory or issue a where clause. In this case, we would like to be able to see the table structure without query operations if possible.
I know this is an odd case. Why the heck would you want an empty dataframe?? Well, we want to have a great deal of flexibility in moving data around and want to be able to define a target dataframe structure prior to data writing, which could take place much later (e.g. hours or days). Since the HDF5 specification maintains all that information, it seems directionally incorrect to store the table structure information separately. Thus our desire to crack the code on this subject.
-- UPDATE 2 --
To add more detail as @jeff requested.
We would like to abstract some of the common Pandas functions like summing data or merging two frames. Thus we would like to be able to ask each frame what their columns are so we can present a view for the user to select the result frame columns.
For example, if we imported a CSV with columns A, B, C, D, and V and saved the frame to HDF5 as my_csv.hdf then we would be able to determine the columns by opening the file.
However, in our use case it is likely that the import frame for the CSV could be cleared periodically and no longer contain the data. Knowing that the my_csv frame has certain columns and types is important because we want to enable a user to then select those columns for summing in a downstream operation. Let's say a user wants to sum column V by the values in columns A and B only and save the frame as my_sum. Since we can't ensure my_csv will always have data, we would like to ensure it at least contains the structure.
Open to other suggestions obviously. It is also possible to store the table structure info in the user_block. This, again, is not ideal because the structure is now being kept in two different areas but I guess it would be possible to always update the user_block on save using the latest column and index information for the frame, although I believe the to_* operations in Pandas will blow away the user_block so...blah. I feel like I'm talking myself into maintaining a peer structure definition but I REALLY would love some suggestions to not have to do that.
