Delete entire column with specific content in python using pandas - python

I have imported an Excel sheet into Python using pandas and now want to delete every column with specific content, as shown in the snapshot of the content.
From this image, I want to delete each column whose content is NaN, which represents no data entered. The remaining content will later be used for computation with pandas, and graphs will be plotted with matplotlib.
Is there a way to delete an entire column based on its content rather than on its label?

Try this:
import numpy as np
import pandas as pd
s = pd.DataFrame({'1':[1,2,3,4], '2':[np.nan, np.nan, np.nan, np.nan]}) # example DataFrame
s = s.dropna(axis=1, how='all') # dropna returns a new DataFrame, so assign the result
It works fine.
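For comparison, here is a small sketch of how the `how` argument changes which columns are dropped. The DataFrame and its column names are made up for illustration; `how="all"` drops only columns that are entirely NaN, while `how="any"` also drops columns that are only partly NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one full column, one all-NaN column, one partly-NaN column
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, np.nan, np.nan],
    "c": [1.0, np.nan, 3.0],
})

# Drop columns where every value is NaN
all_nan_dropped = df.dropna(axis=1, how="all")

# Drop columns where at least one value is NaN
any_nan_dropped = df.dropna(axis=1, how="any")

print(list(all_nan_dropped.columns))
print(list(any_nan_dropped.columns))
```

If the partly filled columns are still needed for computation, `how="all"` is probably the safer choice.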
Alternatively, try this:
for col in s.columns:
    if s[col].isna().all(): # True when every value in the column is NaN
        del s[col]

Related

Create hyperlink for each item in column in python(csv)

I am trying to create a Hyperlink for each item in a column based on another column.
Here is an image to help you understand better:
Each title should hyperlink to the corresponding URL. (when you click apple, it should go to apple.com, when you click banana it should go to banana.com, so on) Is there a way to do this to a CSV file in python?
(Let's say my data is 2000 rows)
Thanks in advance
You can use (I used) the pandas library to read the data from csv and later write it to excel, and leverage the Excel function HYPERLINK to make the cell a, well, Hyperlink. The Excel function for HYPERLINK requires the url we are going to (with http:// or https:// at the beginning) and then the visible text, or friendly-name.
One thing the Excel file will not have when you open it is blue underlined text for the Hyperlinked cells. If that is needed, you can probably leverage this solution somewhat and also use the XlsxWriter module.
import pandas as pd
df = pd.read_csv(r"path\test.csv") # put your actual path here; the raw string keeps "\t" from being read as a tab
# This reads the file and saves it as a 'dataframe', basically a table.
df['title'] = '=HYPERLINK("https://' + df["url"] + '","' + df["title"] + '")'
# The df that we originally pulled in has the columns 'title' and 'url'. This line rewrites the values in
# 'title' to use the HYPERLINK formula in Excel, where the link is the value from the 'url' column with 'https://'
# added to the beginning, and the value from 'title' is the text the user will see in Excel.
df.to_excel(r"save_location\test2.xlsx", index=False) # saves the new file as an Excel workbook; index=False omits the index column that every dataframe gets when it is created
If you want your output to just be one column, the one they click, your final line will be slightly different:
df['title'].to_excel(r"save_location\test2.xlsx", index=False)
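A slightly more defensive way to build the formula is sketched below. The helper name `hyperlink_formula` and the sample rows are made up for illustration. It only adds "https://" when the URL has no scheme yet, and it doubles any double quotes in the visible text, which is how Excel escapes quotes inside a string.

```python
import pandas as pd

def hyperlink_formula(url: str, text: str) -> str:
    """Build an Excel HYPERLINK formula string (hypothetical helper)."""
    # Add a scheme only if the URL does not already have one
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    # Excel escapes a double quote inside a string by doubling it
    safe_text = text.replace('"', '""')
    return f'=HYPERLINK("{url}","{safe_text}")'

# Hypothetical sample data with the same column names as above
df = pd.DataFrame({
    "title": ["apple", "banana"],
    "url": ["apple.com", "https://banana.com"],
})

df["title"] = [hyperlink_formula(u, t) for u, t in zip(df["url"], df["title"])]
print(df["title"].tolist())
```

The resulting column can be written with `to_excel` exactly as in the lines above.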
This did not work; perhaps you have included too many ' characters.
import pandas as pd
df = pd.read_csv('test.csv')
df['URL'] = 'https://url/'+df['id'].astype(str)
keep_col = ['URL']
newurl = df[keep_col]
newurl.to_csv("newurl.csv", index=False)
This code works, but the output file does not show a clickable URL.

Dataframe is not aligned properly

I'm getting data from a REST API, converting it to JSON and then into a dataframe. I then write that dataframe to a CSV file.
The problem is that, while it recognizes the column tags correctly, it shifts them one position to the right because a 0 shows up at the far left.
I know it's the row count, but how do I stop it from counting, OR how would I go about creating one additional column with the "counter" tag?
response_dividends = requests.get(
    f"https://sandbox.iexapis.com/stable/stock/aapl/dividends/quote?token={iex_api}")
response_dividends_parsed = json.loads(response_dividends.text)
df = pd.DataFrame(response_dividends_parsed)
df.to_csv("main_data.csv")
The result then looks like this:
,amount,currency,declaredDate,description,exDate,flag,frequency,paymentDate,recordDate,refid,symbol,id,key,subkey,updated
0,0.22,USD,2021-04-15,Sydhnrraas Oeir,2021-04-25,Cash,quarterly,2021-05-12,2021-04-27,2239859,AAPL,NDIDDSEIV,LAAP,2243550,1683800492545
The problem is that it's not correctly aligned.
I opened it in the CSV viewer plugin of PyCharm and it shows:
wrong aligned
If you set index=False, the row index (the row counter) will not be written to your CSV file.
df.to_csv("main_data.csv", index=False)
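Both options from the question can be sketched as follows. The tiny DataFrame is made up and only borrows two column names from the output above. `index=False` leaves the counter out, while `index_label="counter"` keeps it and gives its column a header, so the header row lines up with the data.

```python
import io
import pandas as pd

# Hypothetical one-row frame standing in for the API response
df = pd.DataFrame({"amount": [0.22], "currency": ["USD"]})

# Option 1: do not write the row counter at all
buf_no_index = io.StringIO()
df.to_csv(buf_no_index, index=False)

# Option 2: keep the row counter and name its column "counter"
buf_labeled = io.StringIO()
df.to_csv(buf_labeled, index_label="counter")

print(buf_no_index.getvalue())
print(buf_labeled.getvalue())
```

Writing to a file path instead of a `StringIO` buffer works the same way.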

Dataframe to CSV returns one empty column which is visible in the dataframe

After scraping, I put the information in a dataframe and want to export it to a .csv, but one of the three columns ("Content") comes out empty in the .csv file. This is weird since all three columns are visible in the dataframe; see screenshot.
Screenshot dataframe
Line I use to convert:
df.to_csv('filedestination.csv')
Inspecting the df returns objects:
Inspecting dataframe
Does anyone know how it is possible that the last column, "Content" does not show any data in the .csv file?
Screenshot .csv file
Following the suggestions, it seems the data is there when the file is opened as .txt. How is it possible that Excel does not show the data properly?
Screenshot .txt file data
What is the data type of the Content column?
If it is not a string, you can convert it to a string and then call df.to_csv.
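A minimal sketch of that check and conversion, assuming (hypothetically) that the scraped "Content" cells hold lists rather than strings. Since `df.dtypes` reports `object` for both, looking at the Python type of the individual values is more informative. Whether this fixes the display in Excel depends on what the column really contains.

```python
import pandas as pd

# Hypothetical scraped data: "Content" holds a list instead of a string
df = pd.DataFrame({
    "Title": ["first article"],
    "Content": [["line one", "line two"]],
})

# Inspect the Python type of each value in the column
value_types = df["Content"].map(type)
print(value_types.tolist())

# Convert every value to its string representation before exporting
df["Content"] = df["Content"].astype(str)
print(df["Content"].iloc[0])
```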
Sometimes this happens, oddly: what you view and what you export differ. Try resetting the index before exporting to .csv or Excel. This always works for me.
df = df.reset_index() # reset_index returns a new DataFrame, so assign the result
then,
df.to_csv(r'file location/filename.csv')

Some Hyperlinks not opening with Openpyxl

I have a few hundred files with data and hyperlinks in them that I was trying to upload and append to a single DataFrame when I realized that Pandas was not reading any of the hyperlinks.
I then tried to use Openpyxl to read the hyperlinks in the input Excel files and write a new column into the excels with the text of the hyperlink that hopefully Pandas can read into my dataframe.
However, I am running into issues when testing the openpyxl code. It is able to read and write some of the hyperlinks but not others.
My sample file has three rows and looks like this:
My actual data has hyperlinks in the way that I have it for "Google" in my test data set.
The other two hyperlinks in my test data I inserted by right-clicking on the cell and pasting the link.
Sample Test file here: Text.xlsx
Here is the code I wrote to read the hyperlink and paste it in a new column. It works for the first two rows (India and China) but fails for the third row (Google). It's unfortunate because all of my actual data is of that type. Can someone please help me figure it out?
import openpyxl
wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active
column_indices = [1]
max_col = ws.max_column
ws.cell(row=1,column = max_col+1).value = "Hyperlink Text"
for row in range(2, ws.max_row+1):
    for col in column_indices:
        print(ws.cell(row, column=1).hyperlink.target)
        ws.cell(column=max_col+1, row=row).value = ws.cell(row, column=1).hyperlink.target
wb.save('test.xlsx')
Cells that use the HYPERLINK function (like the google.com one) will not be of type hyperlink. You will need to process cells that use the HYPERLINK function with re or a similar approach.
The values look like this:
>>> ws.cell(2,1).value
'China'
>>> ws.cell(3,1).value
'India'
>>> ws.cell(4,1).value
'=HYPERLINK("www.google.com","google")'
Suggested code to handle HYPERLINK:
val = ws.cell(row, column).value
if isinstance(val, str) and val.find("=HYPERLINK") >= 0:
    hyplink = val # Or use the re module for a more robust check
Note: The second for loop that iterates over columns does not seem to be required, since you are always using column=1.
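The re-based check could look roughly like this. The function name `hyperlink_formula_target` and the pattern are my own sketch, not part of openpyxl. It assumes the first argument of HYPERLINK is a quoted string, as in the sample value above, and would not handle a cell reference as the first argument.

```python
import re

# Captures the first quoted argument of a formula such as =HYPERLINK("url","text")
HYPERLINK_RE = re.compile(r'^=HYPERLINK\(\s*"([^"]*)"', re.IGNORECASE)

def hyperlink_formula_target(value):
    """Return the URL from a HYPERLINK formula string, or None otherwise."""
    # Cell values can be None, numbers, dates, etc.
    if not isinstance(value, str):
        return None
    match = HYPERLINK_RE.match(value.strip())
    return match.group(1) if match else None

print(hyperlink_formula_target('=HYPERLINK("www.google.com","google")'))
print(hyperlink_formula_target('China'))
```

In the loop from the question, the cell's `hyperlink.target` could be used when `hyperlink` is not None, with this function applied to `cell.value` as a fallback. The workbook has to be loaded without `data_only=True` for the formula text to be visible.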

Opening a Python file from another application without saving the file first (Opening a Pandas table from Excel)

I am working with pandas, and I've just modified a table
Now, I would like to see my table in excel, but it's just a quick look, and I will have to modify the table again later on, so I don't want to save my table anywhere.
In other words, the solution
my_df = pd.DataFrame()
item_path = "my/path"
my_df.to_csv(item_path)
os.startfile(os.path.normpath(item_path))
Is not what I want. I would like to obtain the same behavior without saving the Dataframe as CSV first.
#Something like:
my_df = pd.DataFrame()
start_excel(table_to_load = my_df) #Opens excel with a COPY of my_df
Note
To quickly explore a DataFrame, df.head() is the way, but I want to open my DataFrame from a Tkinter application. I need to use an external program to open this temporary table
You can have a quick look using
<dataframe_name>.head()
It displays the top 5 rows by default,
or
you can specify how many rows you want:
<dataframe_name>.head(<rows_you_want>)
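Since head() only prints rows and does not open an external program, one possible workaround is to write the table to a throwaway temporary file and open that. This is my own sketch, not something from the answer above, and it still writes a file, just not one you have to manage in your project. The helper name `write_temp_csv` is made up, and `os.startfile` exists only on Windows, so the call is left as a comment.

```python
import os
import tempfile
import pandas as pd

def write_temp_csv(df: pd.DataFrame) -> str:
    """Write df to a temporary CSV file and return its path (hypothetical helper)."""
    tmp = tempfile.NamedTemporaryFile(suffix=".csv", delete=False)
    tmp.close()  # close the handle so pandas and other programs can open the file
    df.to_csv(tmp.name, index=False)
    return tmp.name

# Hypothetical table standing in for the modified DataFrame
my_df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
temp_path = write_temp_csv(my_df)
print(temp_path)

# On Windows, this would open the copy in the program associated with .csv files:
# os.startfile(temp_path)
```

Because the file is a copy, later changes to `my_df` do not affect it, and it can be removed with `os.remove(temp_path)` once it is no longer needed.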
