Adding 100 columns at once to a DataFrame - python

I am attempting to add a large number of columns [0:100] at once to a DataFrame (for now the values of the columns are all NA). I know how to add or insert a few at a time, through methods such as insert(), but that still involves writing out each column name and its attributes. Doing this for 100 columns would be tedious and not very clean. Is there an easier way to create a series of columns [0:100] without writing each one out, possibly through a loop?
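One possible sketch (assuming the existing DataFrame is called df): build all the new NA columns as a single DataFrame and attach them in one concat, so no column is written out by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # stand-in for the existing data

# Build 100 new columns (named 0..99) filled with NaN, then attach them all at once.
new_cols = pd.DataFrame(np.nan, index=df.index, columns=range(100))
df = pd.concat([df, new_cols], axis=1)
```

After this, df has its original columns plus columns 0 through 99, all NaN.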

Related

How to check differences between column values in pandas?

I'm manually comparing two or three very similar rows using pandas. Is there a more automated way to do this? I would like a better method than using '=='.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html
See if this will satisfy your needs.
df['sales_diff'] = df['sales'].diff()
The above code snippet creates a new column in your data frame containing the difference from the previous row (the default). You can adjust the axis parameter to compare across rows or columns, and change periods to compare against a row or column a given distance away.
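A small illustration of the periods parameter (toy data, column names are made up):

```python
import pandas as pd

df = pd.DataFrame({"sales": [10, 12, 15, 15]})
df["diff_1"] = df["sales"].diff()            # each row minus the previous row
df["diff_2"] = df["sales"].diff(periods=2)   # each row minus the row two back
```

The first one or two rows of each diff column are NaN, since there is no earlier row to subtract.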

Other Ways to read specific columns

I am facing a small problem while reading columns.
My columns are ["onset1", "onset2", "onset3"], and I want to read the values from Excel. But each DataFrame has different column names, so I need to change the names each time, which is a waste of time.
I'm wondering if there is a more efficient way to read them instead of df["onset1"].iloc[-1], df["onset2"].iloc[-1], ...
(I am thinking of reading by spreadsheet column letter, like df["V"].iloc[-1], df["W"].iloc[-1])
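One way to avoid depending on the names at all is to select by position with iloc, assuming the onset columns are always the last three (toy data below):

```python
import pandas as pd

df = pd.DataFrame({"onset1": [1, 2], "onset2": [3, 4], "onset3": [5, 6]})

# Last row of the last three columns, selected by position instead of name:
last_vals = df.iloc[-1, -3:].tolist()
```

This works regardless of what the columns happen to be called in each file.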

Replacing values in entire dataframe based on dictionary

I have a large dataframe in which I want to change every value based on a dictionary. Currently I am using a loop:
for column in df.columns:
    df = df.replace({column: dictionary})
This works but it is pretty slow. This single for loop takes around 90% of the time that it takes my entire code to run. Is there a faster method using loc, map or replace?
There are lots of other questions like this already but it seems pretty much everyone else is just trying to change individual columns rather than the entire dataframe.
I'm using Spyder with Python 3.9 on Windows. Thanks!
Edit: I found a way to make it faster by switching rows and columns. Previously I had lots of columns and few rows; now that it's the other way around, the code is a lot faster. Still, is there a way to replace values in the entire dataframe rather than just individual columns?
Is there a way to replace values in the entire dataframe rather than just individual columns?
Use:
df = df.replace(dictionary)
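A quick demonstration: when the dictionary is passed directly (with no column wrapper), replace applies it to every column in one call, so the per-column loop is unnecessary.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": ["y", "z"]})
mapping = {"x": 1, "y": 2, "z": 3}

# One call replaces matching values across the entire dataframe.
df = df.replace(mapping)
```

This also avoids rebinding df once per column, which is part of what made the loop slow.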

Pandas and complicated filtering and merge/join multiple sub-data frames

I have a seemingly complicated problem and I have a general idea of how I should solve it but I am not sure if it is the best way to go about it. I'll give the scenario and would appreciate any help on how to break this down. I'm fairly new with Pandas so please excuse my ignorance.
The Scenario
I have a CSV file that I import as a dataframe. The example I am working through contains 2742 rows × 136 columns. The rows are variable but the columns are set. I have a set of 23 lookup tables (also CSV files) named per year, per quarter (the range is 2020 3rd quarter back to 2015 1st quarter). The lookup files are named like PPRRVU203.csv, which contains values from the 3rd quarter of 2020. The lookup tables are matched on two columns ('Code' and 'Mod'), and I use three values that are associated in the lookup.
I am trying to filter sections of my data frame, pull the correct values from the matching lookup file, merge back into the original subset, and then replace into the original dataframe.
Thoughts
I can probably abstract this and wrap in a function but not sure how I can place back in. My question, for those that understand Pandas better than myself, what is the best method to filter, replace the values, and write the file back out.
The straightforward solution would be to filter the original dataframe into 23 separate dataframes, do the merge on each individual one, then concat into a new dataframe and output to CSV.
This seems highly inefficient, though.
I can post code, but I am looking more for high-level thoughts.
Not sure exactly what your DataFrame looks like, but the pandas query() method may prove useful for selecting the data.
name = df.query('columnname == "something"')
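As a rough sketch of the filter-merge-concat approach described above: if each row can be tagged with the quarter it belongs to, groupby splits the frame, each group is merged with its own lookup table on 'Code' and 'Mod', and the pieces are concatenated back together. Everything below is toy stand-in data; the `quarter` column and the lookup keys are assumptions, not the asker's real schema.

```python
import pandas as pd

# Hypothetical stand-ins for the real CSVs. `quarter` is an assumed column
# identifying which lookup table each row should be matched against.
df = pd.DataFrame({"Code": ["A", "B", "A"],
                   "Mod": ["1", "2", "1"],
                   "quarter": ["203", "203", "151"]})
lookups = {
    "203": pd.DataFrame({"Code": ["A", "B"], "Mod": ["1", "2"], "value": [10, 20]}),
    "151": pd.DataFrame({"Code": ["A"], "Mod": ["1"], "value": [99]}),
}

pieces = []
for quarter, sub in df.groupby("quarter"):
    # Left-merge keeps every original row even when the lookup has no match.
    merged = sub.merge(lookups[quarter], on=["Code", "Mod"], how="left")
    pieces.append(merged)
result = pd.concat(pieces, ignore_index=True)
```

This keeps the "23 separate dataframes" idea but wraps it in one loop, so no subset has to be placed back into the original by hand.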

Storing independent data tables in python with pandas

I have put ~100 dataframes containing data into a list tables, plus a list of names (so I can call by name or just iterate over the whole bunch without needing names).
This data will need to be stored, appended to, and later queried, so I want to store it as a pandas HDF5 store.
There are ~100 DFs but I can group them into pairs (two different observers).
In the end I want to iterate over the whole list of tables, but also call individual ones by name.
I've thought about Panels (but that will have annoying NaN values since the tables aren't the same length), hierarchical HDF5 (but that doesn't really solve anything, it just groups by observer), and one continuous dataframe, seeing as they have the same number of columns (but that will just make it harder because I'll have to piece the DFs back together afterwards).
Is there anything blatantly obvious I'm missing, or am I just going to have to grin and bear it with one of these? (If so, which one would you go for to give the greatest flexibility?)
Thanks
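A sketch of the "one continuous dataframe" option from the question: pd.concat accepts a dict of frames and adds the dict keys as an extra index level, so piecing an individual table back out afterwards is a single .loc lookup rather than a manual split. The table names below are hypothetical.

```python
import pandas as pd

dfs = {"obs1": pd.DataFrame({"x": [1, 2]}),
       "obs2": pd.DataFrame({"x": [3, 4, 5]})}

# One continuous frame with an extra index level naming each source table;
# unequal lengths are fine and introduce no NaN padding.
combined = pd.concat(dfs, names=["table"])

# Recovering one table afterwards:
obs2 = combined.loc["obs2"]
```

The combined frame can also be written to a single HDF5 key, and the key column lets later queries select by table or by observer.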
