Change the column name of dataframe at runtime - python

I am trying to initialize an empty dataframe with 5 columns, say column1, column2, column3, column4, column5.
Now I want to read data from a database and insert specific column values from the database into this dataframe. With only 5 columns it is easy to do this individually, but I have to extend the dataframe to 70 columns, so I am using a for loop.
To update the column value I was using
dataframe['column "+count+"'] = .... where count is an incremental variable ranging up to 70.
But the above code adds a new column to the dataframe. How can I use the count variable to access these column names?

I would recommend just using pandas.io.sql to download your database data; it returns your data in a DataFrame.
But if, for some reason, you want to access the columns by number, build the column name from count with string formatting:
assignment: df['column%d' % count] = data
retrieval: df['column%d' % count]
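A minimal sketch of the loop, assuming column names of the form column1 ... columnN. The frame size, the two-row index, and the placeholder data are made up for illustration; in the question the values would come from the database. Note that the key in the question, 'column "+count+"', is a single literal string, which is why pandas created a new column with that exact name.

```python
import pandas as pd

# Hypothetical sizes: 5 columns here, 70 in the question.
n_columns = 5
names = ['column%d' % i for i in range(1, n_columns + 1)]

# Pre-create the columns; the two-row index is an assumption for this sketch.
df = pd.DataFrame(index=range(2), columns=names)

for count in range(1, n_columns + 1):
    # Placeholder values standing in for data read from the database.
    data = [count * 10, count * 10 + 1]
    # The formatted name matches an existing column, so no new column is added.
    df['column%d' % count] = data

third = df['column%d' % 3]  # retrieval uses the same formatted name
```

The same names could also be built with 'column{}'.format(count) or an f-string; the point is only that count is formatted into the string rather than written inside the quotes.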

Related

How to get data from a row on a DataFrame

I have a list of lists that contains the indexes of the minimum values in each column of a DataFrame whose row and column names go from 0 to 399 (columns) and 0 to 1595 (rows). I want to use this list to access the data of another DataFrame. For example, I have the list (43,579,100) and I want to access the 43rd, 579th and 100th values of a column in the second DataFrame. However, the row names of this DataFrame do not go from 0 to 1595, so I don't want to make the mistake of accessing the row that happens to be named "43"; I want to access the 43rd row.
I added a picture of my Data Frames.
I would like to get a list with the data on the selected rows.
You can use .values to convert the column data to a numpy array and index with your list. For example, if your data is in variable df and the list of indexes is idxs, then for a given column:
df[column].values[idxs]
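As a small sketch of the difference between positions and labels, assuming a made-up frame whose row labels are not 0..n-1 (the column name value and the data are hypothetical). It shows the .values approach next to .iloc, which is pandas' own positional indexer:

```python
import pandas as pd

# Hypothetical frame: row labels are arbitrary, not 0, 1, 2, ...
df = pd.DataFrame({'value': [10.0, 20.0, 30.0, 40.0]}, index=[43, 7, 100, 2])

idxs = [0, 2]  # positions of the wanted rows, not labels

by_values = df['value'].values[idxs]        # NumPy array indexed by position
by_iloc = df['value'].iloc[idxs].tolist()   # same rows through .iloc

by_label = df['value'].loc[43]              # label lookup, for contrast
```

Both positional forms pick the first and third rows regardless of what those rows are named, while .loc[43] picks the row labelled 43.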

Update multiple rows in mysql table with single call

I have a table in a MySQL DB. I need to update a particular column for all the rows in the table.
I need to do this in a single call without deleting the rows, only updating the column values.
I have tried using df.to_sql(if_exists='replace'), but this deletes the rows and re-inserts them. Doing so drops the rows from other tables that are linked by the foreign key.
merge_df = merge_df[['id', 'deal_name', 'fund', 'deal_value']]
for index, row in merge_df.iterrows():
    ma_deal_obj = MA_Deals.objects.get(id=row['id'])
    ma_deal_obj.deal_value = row['deal_value']
    ma_deal_obj.save()
merge_df has other columns as well. I only need to update the 'deal_value' column for all rows.
One solution I have is to iterate over the dataframe rows and use the Django ORM to save each value, as above, but this is quite slow when there are many rows.

How do we convert or copy values of a column in apache spark dataframe into a list?

I have an Apache Spark data frame with two columns and I want to copy all values in the second column into a list. Let me know if there is any way to do this. I am new to Spark.
I assume you want to store the content of the second column of your data frame locally in a list. For this, the following steps can be used. They may not be the fastest or best way:
rows = df.select("column_name_2").collect()
# => Returns a list of Row objects with one column
my_list = [row["column_name_2"] for row in rows]
# => Extracts the column value from each Row into a Python list
You don't have to select the column in the first line; however, doing so reduces the space required to store the rows in driver memory.

Python Pandas DataFrame Data Identification

I took data from an .xlsx file and stored it in the dataframe. The data frame is called df, and the size of the dataframe is (51,3). 51 rows. 3 columns. The columns are unnamed and numbered 0,1,2. The rows are indexed from 0-50. What syntax would I use to extract data from a dataframe with pandas in python and put it into a csv? I know I would use DataFrame.to_csv("outputFile.csv" ), but I'm not sure how to identify a specific piece of data (row/column pair), so I can put it in a new location in the csv table in comparison to the old excel table.
You can use integer based indexing using iloc: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
In your case, the row/col value you are looking for can be retrieved by:
df.iloc[row_id, col_id]
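A short sketch of reading one cell by position and writing it to a different cell of a new table. The 3x3 data, the target position, and the in-memory buffer (standing in for "outputFile.csv") are assumptions for illustration only:

```python
import io
import pandas as pd

# Hypothetical frame with unnamed, integer-numbered columns, like the one in the question.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

cell = df.iloc[1, 2]  # row position 1, column position 2

# Put the value at a different location in a new, initially empty table.
out = pd.DataFrame(index=range(2), columns=range(2))
out.iloc[0, 1] = cell

# Write the new table as CSV; a real file path could be passed instead of the buffer.
buffer = io.StringIO()
out.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
```

Cells of out that were never assigned stay empty in the CSV output.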

Python - How do I create a new column in a dataframe from the unique values from an existing column?

I have created a data frame from an excel file. I would like to create new columns from each unique value in the column 'animals'. Can anyone help with this? I am somewhat new to Python and Pandas. Thanks.
In:
import pandas as pd
# INPUT FILE INFORMATION
# Raw string so the backslashes in the Windows path are not treated as escapes
path = r'C:\Users\MY_COMPUTER\Desktop\Stack_Example.xlsx'
sheet = "Sheet1"
# READ FILE
dataframe = pd.read_excel(path, sheet_name=sheet)
# SET DATE AS INDEX
dataframe = dataframe.set_index('date')
You said you want to create new columns from each unique value in the column "animals". As you did not specify what you want the new columns to have as values, I assume you want None values.
So, here is the code:
for value in dataframe['animals']:
    if value not in dataframe:
        dataframe[value] = None
The first line loops through each value of the column 'animals'.
The second line checks to make sure the value is not already in one of the columns so that your condition of having only unique values is satisfied.
The third line creates new columns named under each unique value of column 'animals'.
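For illustration, here is the same idea on a small made-up frame standing in for the Excel sheet (the dates and animal names are hypothetical). This sketch loops over .unique() so each distinct value is visited once, which is a small variation on the loop above rather than a change in the result:

```python
import pandas as pd

# Hypothetical data in place of the Excel file.
dataframe = pd.DataFrame(
    {'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
     'animals': ['cat', 'dog', 'cat']}
).set_index('date')

for value in dataframe['animals'].unique():
    # "value not in dataframe" tests against the column labels.
    if value not in dataframe:
        dataframe[value] = None  # new column, filled with None

new_columns = list(dataframe.columns)
```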
