I am starting to learn the pandas module in Python, and my issue is with the rename method. The rename works fine when I print its result; this shows that the column has been renamed:
print(data.rename(columns={"Rep": "Name"}))
However, when I use print(data) to show all of the data from the file, the column does not appear as renamed. It also does not appear renamed when the file is exported with data.to_csv("example.csv").
I would really appreciate it if somebody could shed some light on this.
Full source code below:
import pandas as pd
data = pd.read_excel(r"D:\Downloads\Book1.xlsx")
del data["Region"]
del data["Item"]
print(data.rename(columns={"Rep": "Name"}))
print(data)
data.to_csv("example.csv")
Use the inplace argument so that the change is applied to the DataFrame itself, like this:
data.rename(columns={"Rep": "Name"}, inplace=True)
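Try adding inplace=True to data.rename. Note that with inplace=True, rename() modifies data and returns None, so print the DataFrame afterwards instead of the return value:
data.rename(columns={"Rep": "Name"}, inplace=True)
print(data)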
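A minimal sketch of the two options, assuming a small hypothetical DataFrame in place of Book1.xlsx: either assign the renamed copy back to the same name, or rename in place and keep in mind that the call then returns None.

```python
import pandas as pd

# Hypothetical stand-in for the data read from Book1.xlsx.
data = pd.DataFrame({"Rep": ["Alice", "Bob"], "Units": [95, 50]})

# Option 1: rename() returns a new DataFrame by default, so assign it back.
data = data.rename(columns={"Rep": "Name"})

# Option 2: modify the frame in place; rename() then returns None.
other = pd.DataFrame({"Rep": ["Alice"]})
result = other.rename(columns={"Rep": "Name"}, inplace=True)

print(list(data.columns))   # ['Name', 'Units']
print(list(other.columns))  # ['Name']
print(result)               # None
```

Either way, a later data.to_csv("example.csv") writes the renamed header, because the name data now refers to the renamed frame.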
I am trying to run this line of code:
pd.get_dummies(pd_df, columns=['ethnicity'])
However, I keep getting the error 'DataFrame' object has no attribute '_internal'. It seems to be linked to the ...pyspark/pandas/namespace.py file, so I am not sure how to fix it.
Unfortunately, the DataFrame itself is private, so I can't show or describe it on Stack Overflow; however, any information about why this could be happening would be greatly appreciated!
I can make the example below work perfectly, but it won't work in my code even though it is exactly the same; the only difference is that my DataFrame has been converted from PySpark to pandas:
import numpy as np
import pandas as pd
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000, 49000, 55000, 67000, 65000, 67000],
    "region": ["East", "North", "East", "South", "West", "West", "South", "West", "West", "East", np.nan],
})
# One-hot encode the region column
pd.get_dummies(sales_data, columns=["region"])
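I had the same error. I was mixing up the two libraries by using ps (pyspark.pandas) instead of pd (pandas).
Make sure your aliases are correct and that you're not accidentally binding the pd alias to a different module.
For example, this makes pd refer to pyspark.pandas rather than pandas:
import pyspark.pandas as pd  # pd is now pandas-on-Spark, not pandas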
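A minimal sketch of keeping the aliases apart: plain pandas stays pd, pandas-on-Spark would get its own alias such as ps, and a hypothetical helper, to_plain_pandas, converts a Spark-backed frame before calling pd.get_dummies. It assumes pyspark.pandas DataFrames expose to_pandas() and Spark SQL DataFrames expose toPandas(); pyspark itself is not imported here, and the ethnicity values are made up.

```python
import pandas as pd  # plain pandas keeps the conventional alias "pd"
# If you also need pandas-on-Spark, give it a different alias, e.g.:
# import pyspark.pandas as ps


def to_plain_pandas(df):
    """Hypothetical helper: return df as a plain pandas DataFrame."""
    if isinstance(df, pd.DataFrame):
        return df
    if hasattr(df, "to_pandas"):  # pyspark.pandas DataFrame
        return df.to_pandas()
    if hasattr(df, "toPandas"):  # pyspark.sql DataFrame
        return df.toPandas()
    raise TypeError(f"Unsupported DataFrame type: {type(df)!r}")


pd_df = to_plain_pandas(pd.DataFrame({"ethnicity": ["A", "B", "A"]}))
dummies = pd.get_dummies(pd_df, columns=["ethnicity"])

print(pd.__name__)            # pandas
print(list(dummies.columns))  # ['ethnicity_A', 'ethnicity_B']
```

Checking pd.__name__ is a quick way to confirm which module the alias actually points to in a notebook where both libraries are imported.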
I am trying to write code that reads a CSV file and saves each column as a specific variable. I am having difficulty because the header is 7 lines long (I can control this, but would rather just skip it in code if possible), and my data has important decimal places, so it cannot be converted to int (or maybe string?). I've also tried saving each column by its position in the file, but I am struggling to get that to run. Any ideas?
The image shows my current code, trimmed to the important parts, with the data printed in my console circled.
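To save each column as a specific variable, assign the result of read_csv and index it by column name:
import pandas as pd
df = pd.read_csv('file.csv')  # assign the result so it can be indexed
x_col = df['X']  # the 'X' column as a pandas Series
See the pandas indexing guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html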
If what you are looking for is how to iterate through the columns, no matter how many there are (which is what I think you are asking), this code should do the trick:
import pandas as pd
# skiprows=6 skips the first 6 lines; the next line is then read as the column header
data = pd.read_csv('optitest.csv', skiprows=6)
def save(column_data):
    # You will need to define what this save() method does.
    # Just placing it here as an example: it prints the column name and its values.
    print(column_data.name, column_data.tolist())
for column in data.columns:
    save(data[column])
The part about formatting your data as a number or a string was a little vague, but if it's decimal data, you need to use float. See #9637665.
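A small sketch that combines both answers, assuming a hypothetical file with 6 metadata lines followed by a header row (an in-memory string stands in for the real file): skiprows drops the metadata, dtype=float keeps the decimals, and each column is then pulled into its own variable.

```python
import io

import pandas as pd

# Hypothetical file contents: 6 metadata lines, a header row, then data.
raw = """meta 1
meta 2
meta 3
meta 4
meta 5
meta 6
X,Y,Z
0.125,1.5,2.75
0.250,3.0,5.5
"""

# skiprows=6 drops the metadata lines; dtype=float parses every column as float.
df = pd.read_csv(io.StringIO(raw), skiprows=6, dtype=float)

# One variable per column: as a Series, a Python list, or a NumPy array.
x_col = df["X"]
y_values = df["Y"].tolist()
z_array = df["Z"].to_numpy()

print(x_col.tolist())  # [0.125, 0.25]
print(y_values)        # [1.5, 3.0]
```

With a real file you would pass its path instead of io.StringIO(raw) and adjust skiprows to the actual number of lines before the header row.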
I am following a tutorial to export a pandas DataFrame to an Excel file; however, the resulting file does not include the highlighting, and I have no idea why.
To style it:
df_styled = df.style.apply(lambda x: ['background: orange' for x in df.Margin_rate], axis=0)
and then to export it:
df_styled.to_excel('excel_python_tutorial_marked.xlsx', engine='openpyxl', index=False)
I have made sure to create a new styled DataFrame for the export; where am I going wrong?
It is meant to look like this:
But instead, it looks unformatted in Excel:
Apparently you need to pass style information explicitly into the openpyxl writer. Maybe this helps.
I have had a good experience with the following, but you might need additional packages and need to restructure your code a little: https://xlsxwriter.readthedocs.io/example_pandas_column_formats.html
I have a CSV file, diseases_matrix_KNN.csv, which contains an Excel-style table.
I would like to store all the numbers from a row in a list named after that row, like:
Hypothermia = [0,-1,0,0,0,0,0,0,0,0,0,0,0,0]
For some reason, I am unable to find a solution to this, even though I have looked. Please let me know whether I can read this type of data into that form using Python.
The most common way to work with Excel-style tables in Python is to use pandas.
Here is an example:
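import pandas as pd
# index_col=0 assumes the first column holds the disease names (the row labels)
df = pd.read_csv("diseases_matrix_KNN.csv", index_col=0)
print(df.loc["Hypothermia"].tolist())  # .loc selects by label; .iloc needs an integer position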
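A self-contained sketch of the same idea, assuming the first column of the CSV holds the disease names and the remaining columns hold the numbers (the small table below is hypothetical). It also shows how to build a dict of lists for every row instead of creating one variable per disease.

```python
import io

import pandas as pd

# Hypothetical stand-in for diseases_matrix_KNN.csv.
raw = """Disease,f1,f2,f3,f4
Hypothermia,0,-1,0,0
Fever,1,0,0,1
"""

df = pd.read_csv(io.StringIO(raw), index_col=0)

# .loc selects a row by its label; .iloc selects it by integer position.
hypothermia = df.loc["Hypothermia"].tolist()
same_row = df.iloc[0].tolist()

# Map every disease name to its list of values.
rows = {name: values.tolist() for name, values in df.iterrows()}

print(hypothermia)     # [0, -1, 0, 0]
print(rows["Fever"])   # [1, 0, 0, 1]
```

A dict keyed by disease name scales to any number of rows, whereas separate variables such as Hypothermia would have to be written out by hand.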
I have a one-column Excel file. I want to import all of its values into a variable x (something like x=[1,2,3,4.5,-6.....]) and then, after importing numpy, use it to compute the autocorrelation with numpy.correlate(x,x,mode='full').
When I enter x=[1,2,3...] manually, it works fine, but when I copy and paste all the values into x=[], I get NameError: name 'NO' is not defined.
Can someone tell me how to go about doing this?
You can use pandas to import the values: pd.read_csv for a CSV file, or pd.read_excel for an Excel workbook.
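A minimal sketch, assuming the column has a header (here the hypothetical name value) and that the pasted data included a non-numeric entry such as NO, which Python reads as an undefined name when pasted into a list literal. pd.to_numeric with errors="coerce" turns such entries into NaN so they can be dropped before calling numpy.correlate. An in-memory CSV stands in for the file; for an .xlsx file you would call pd.read_excel instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical one-column file with a header and one non-numeric entry.
raw = """value
1
2
3
4.5
NO
-6
"""

df = pd.read_csv(io.StringIO(raw))

# Coerce to numbers: entries like "NO" become NaN and are then dropped.
x = pd.to_numeric(df["value"], errors="coerce").dropna().to_numpy()

# Full autocorrelation: the result has 2 * len(x) - 1 values,
# with the zero-lag term (sum of squares) in the middle.
acf = np.correlate(x, x, mode="full")

print(x)
print(len(acf))
```

Dropping the non-numeric rows shortens x; if the position of each sample matters for your analysis, you may prefer to replace them with a value instead of removing them.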