I am trying to split a column in a DataFrame (df1) using the .str.split() function. The column name is hyphenated (lat-lon). When I run the code below, Python reads only the "lat" and ignores the "-lon", raising:
AttributeError: 'DataFrame' object has no attribute 'lat'
df1[['lat','lon']] = df1.lat-lon.str.split(" ",expand=True,)
df1
How do I get Python to read the entire column name (lat-lon) and not just lat?
df1[['lat','lon']] = df1["lat-lon"].str.split(" ",expand=True,)
You can't use attribute access with hyphenated names. The attribute must be a valid Python identifier (letters, digits, and underscores, not starting with a digit).
A good practice is actually to never use attribute access for column names in pandas.
Use:
df1[['lat','lon']] = df1['lat-lon'].str.split(" ",expand=True,)
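A minimal sketch of why the two access styles differ, on made-up data (the coordinate values here are hypothetical):

```python
import pandas as pd

# Hypothetical frame with a hyphenated column, mirroring the question.
df1 = pd.DataFrame({"lat-lon": ["40.7 -74.0", "34.1 -118.2"]})

# df1.lat-lon is parsed by Python as (df1.lat) - lon, hence the
# AttributeError. Bracket access works for any column label.
df1[["lat", "lon"]] = df1["lat-lon"].str.split(" ", expand=True)
print(df1["lat"].tolist())   # ['40.7', '34.1']
print(df1["lon"].tolist())   # ['-74.0', '-118.2']
```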
I'm new to Python. I used the code below...
StoreGrouper= DonHenSaless.groupby('Store')
StoreGrouperT= StoreGrouper["SalesDollars"].agg(np.sum)
StoreGrouperT.rename(columns={SalesDollars:TotalSalesDollars})
to group stores and sum by the SalesDollars, then rename SalesDollars to TotalSalesDollars. It produced the following error...
NameError: name 'SalesDollars' is not defined
I also tried using quotes:
StoreGrouper= DonHenSaless.groupby('Store')
StoreGrouperT= StoreGrouper["SalesDollars"].agg(np.sum)
StoreGrouperT= StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
This produced the error: rename() got an unexpected keyword argument 'columns'
Here is my df
df
In order to rename a column you need quotes so it would be:
StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
Also, I usually assign it to a variable:
StoreGrouperT = StoreGrouperT.rename(columns={'SalesDollars':'TotalSalesDollars'})
Use the pandas rename method to change the column name. You can also pass inplace=True if you want the change reflected in the DataFrame directly, rather than assigning the result back to the df variable:
df.rename(columns={'old_name':'new_name'}, inplace=True)
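For completeness, a small sketch of the whole pipeline on hypothetical data. Note that selecting a single column after groupby yields a Series, which is likely why rename(columns=...) complained; converting back to a DataFrame with reset_index sidesteps that:

```python
import pandas as pd

# Hypothetical sales data standing in for DonHenSaless.
DonHenSaless = pd.DataFrame({
    "Store": ["A", "A", "B"],
    "SalesDollars": [100, 50, 200],
})

# Sum per store, turn the resulting Series back into a DataFrame,
# then rename the aggregated column.
StoreGrouperT = (DonHenSaless.groupby("Store")["SalesDollars"]
                 .sum()
                 .reset_index()
                 .rename(columns={"SalesDollars": "TotalSalesDollars"}))
print(StoreGrouperT)
#   Store  TotalSalesDollars
# 0     A                150
# 1     B                200
```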
I am following this article, but I was only able to get it to work by making sure the column titles matched. The two files both contain computer names, but the columns are titled differently. How could I modify my command so that it still joins on the same data? Is that possible?
lj_df2 = pd.merge(d2, d3, on="PrimaryUser", how="left")
For example, I have this, but in my other csv the column is Employee #, not PrimaryUser.
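When the key columns have different names in the two frames, merge accepts left_on/right_on instead of on. A minimal sketch on made-up data (the frame contents and the "Employee #" label are assumptions from the question):

```python
import pandas as pd

# Hypothetical frames: d2 uses "PrimaryUser", d3 uses "Employee #".
d2 = pd.DataFrame({"PrimaryUser": ["alice", "bob"], "host": ["pc1", "pc2"]})
d3 = pd.DataFrame({"Employee #": ["alice", "carol"], "dept": ["IT", "HR"]})

# left_on/right_on lets merge match columns with different names;
# rows with no match on the right get NaN.
lj_df2 = pd.merge(d2, d3, left_on="PrimaryUser", right_on="Employee #",
                  how="left")
print(lj_df2[["PrimaryUser", "dept"]])
```

Both key columns survive in the result; drop one afterwards if you only need a single key.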
I am trying to run a Python script that uses explode(). Locally it works fine, but when I run it on the server it gives an error.
I am using the code below:
df_main1 = (df_main1.set_index(['rule_id', 'applied_sql_function1', 'input_condition', 'input_value', 'and_or_not_oprtor', 'output_condition', 'priority_order']).apply(lambda x: x.astype(str).str.split(',').explode()).reset_index())
Error I am getting:
("'Series' object has no attribute 'explode'", u'occurred at index comb_fld_order')
The problem is different pandas versions: Series.explode only works in later versions (new in version 0.25.0).
Try:
df_main1 = (df_main1.set_index(['rule_id', 'applied_sql_function1', 'input_condition', 'input_value', 'and_or_not_oprtor', 'output_condition', 'priority_order'])[col].str.split(',', expand=True).stack())
Where col is the name of the string column, which you wish to split and explode.
Generally, expand=True does the horizontal explode, while stack moves everything back into one column.
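The split + stack pattern can be sketched on a tiny hypothetical frame (a single index column and made-up values, rather than the question's full column list):

```python
import pandas as pd

# Minimal stand-in for the question's data.
df = pd.DataFrame({"rule_id": [1, 2], "comb_fld_order": ["a,b", "c"]})

out = (df.set_index("rule_id")["comb_fld_order"]
         .str.split(",", expand=True)     # horizontal explode into columns
         .stack()                         # move values into one column
         .reset_index(level=-1, drop=True)
         .reset_index(name="comb_fld_order"))
print(out["comb_fld_order"].tolist())   # ['a', 'b', 'c']
print(out["rule_id"].tolist())          # [1, 1, 2]
```

stack drops the NaN cells produced for shorter lists, so each row ends up repeated once per split value, matching what explode would produce.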
I used the code below to get rid of explode():
df_main1 = (df_main1.set_index(['rule_id', 'applied_sql_function1', 'input_condition', 'input_value', 'and_or_not_oprtor', 'output_condition', 'priority_order'])['comb_fld_order']
.astype(str)
.str.split(',', expand=True)
.stack()
.reset_index(level=-1, drop=True)
.reset_index(name='comb_fld_order'))
class MyClass:
    def __init__(self, some_string):
        self.some_string = anything
So similar to the DataFrame class of pandas where the columns can be referenced by appending the name of the column after the dot operator.
Like:
df.column_name_of_some_column
But the column name is a string value when read in from a CSV, for example. Hence the DataFrame class has some way to create a variable from a string value.
How does it do that?
The best option is to use a dictionary and map the keys to the names.
You can also add attributes at runtime, but take care that you don't override existing values.
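Both ideas can be sketched together. This is a simplified illustration, not how pandas is actually implemented; the class and parameter names are made up for the example:

```python
# A dict maps string keys to values; setattr turns a string into an
# attribute name at runtime, which is roughly how dot access to a
# string-named column can be made to work.
class MyClass:
    def __init__(self, some_string, value):
        self.columns = {some_string: value}   # dictionary mapping
        setattr(self, some_string, value)     # attribute from a string

obj = MyClass("column_name_of_some_column", [1, 2, 3])
print(obj.column_name_of_some_column)              # [1, 2, 3]
print(obj.columns["column_name_of_some_column"])   # [1, 2, 3]
```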
I'm trying to read a bunch of CSVs into pandas using a for loop. I want the table names to be the last bit of the full file path before the extension. For example,
ACS_BV0002_2016_Age.csv
would be
Age
I am doing this so I can create dictionaries with the table name as a key and the column names and data types as values, which I can then use in psycopg2 to create all of my tables in PostgreSQL in one fell swoop.
This seems to get the name I want:
path = r"C:\Data\Waste_Intervention\Census_Tables\Cleaned"
fList = os.listdir(path)
for doc in fList:
    csv = "{}\\{}".format(path, doc)
    name = doc.split("_")[-1][:-4]
    pd.read_csv(csv)
Is there a way to use the output of name as the variable name for the DataFrame read in by pd.read_csv?
From your code, it is not clear why you call read_csv without assigning its result to anything. Anyway, you asked:
Is there a way I can pass the bit of string I want in to the table
name for pd.csv_read so that I can get all csvs in the path into
pandas with a for loop and have them retain a simple understandable
name?
In this situation, there are a limited number of things you can do. DataFrame objects aren't really associated with a "name", per se; you use a descriptive variable name to handle that.
However, for your case, where you wish to create a variable number of variables, the easiest thing to do (that I'd do), is to use a dictionary.
dfs = {}
for doc in fList:
    i = "{}\\{}".format(path, doc)
    j = doc.split("_")[-1][:-4]
    dfs[j] = pd.read_csv(i)
You can now refer to the dataframe loaded from ACS_16_5YR_B02001_race.csv using dfs['race']!