How to remove numbering from the output after extracting an .xls file with pandas [Python] - python

I have a Python script that extracts a specific column from an Excel .xls file, but the output shows a number next to each extracted value. How can I format the output so that these numbers don't appear?
My current code is this:
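import sys
import pandas as pd

# The path to the .xls file is passed as the first command-line argument
file_name = sys.argv[1]
workbook = pd.read_excel(file_name)
df = pd.DataFrame(workbook, columns=['NOM_LOGR_COMPLETO'])
df = df.drop_duplicates()
# Drop rows with missing values (recent pandas rejects how= and thresh= together)
df = df.dropna(axis=0, how='any')
print(df)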
My current output:
1 Street Alpha
2 Street Bravo
But the result I need is:
Street Alpha
Street Bravo
without the numbering, just the name of the streets.
Thanks!

I believe you want to print the DataFrame without its index. Note that a DataFrame always has an index; it cannot be removed, only left out of the output. So for your case, you can use:
print(df.values)
to see the values without the index column (this prints the underlying NumPy array, so the rows appear inside brackets). To save the output without the index, use:
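# The context manager saves and closes the file on exit
with pd.ExcelWriter("dataframe.xlsx", engine='xlsxwriter') as writer:
    # sheet_name must be a string, not the DataFrame itself
    df.to_excel(writer, sheet_name="Sheet1", index=False)
where "dataframe.xlsx" is the name of the output file; replace it with your own.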
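For the printing side of the question, here is a minimal sketch, assuming the goal is plain text with one street name per line. `DataFrame.to_string` accepts `index=False` and `header=False`, which leave out the row numbers and the column name. The sample data and the helper name `format_without_index` are made up for illustration.

```python
import pandas as pd


def format_without_index(df: pd.DataFrame) -> str:
    """Render a DataFrame as text without the index or the header row."""
    return df.to_string(index=False, header=False)


# Hypothetical sample standing in for the column read from the .xls file.
sample = pd.DataFrame({"NOM_LOGR_COMPLETO": ["Street Alpha", "Street Bravo"]})
print(format_without_index(sample))
```

Depending on the pandas version the lines may carry padding spaces, so strip each line if exact text matters.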
Further references can be found at:
How to print pandas DataFrame without index
Printing a pandas dataframe without row number/index
disable index pandas data frame
Python to_excel without row names (index)?

Related

Skip Columns with pandas

problem
I first concatenate all the data from the available Excel files into a single dataframe and then write that dataframe to a new Excel file. However, I would like to do two simple things:
a) leave two blank columns after each dataframe that is appended
b) keep the headers and the bold formatting, which disappear after the dataframes are appended (image: original formatting of one Excel file)
attempt
This is my attempt (image: two separate DataFrames):
data = []
for excel_file in excel_files:
    print(excel_file)  # the name for the dataframe
    data.append(pd.read_excel(excel_file, engine="openpyxl"))
df1 = pd.DataFrame(columns=['DVT', 'Col2', 'Col3'])  # blank df maybe?! this line is not important
# df1.style.set_properties(subset=['DVT'], {'font-weight:bold'})  this line is not important
# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)
I don't have Excel available right now, so I can't test, but something like this might be a good approach.
# Open the excel document using a context manager in 'append' mode.
with pd.ExcelWriter(excelAutoNamed, mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
    for excel_file in excel_files:
        print(excel_file)
        # Append Dataframe to Excel File.
        pd.read_excel(excel_file, engine="openpyxl").to_excel(writer, index=False)
        # Append Dataframe with two blank columns to File.
        pd.DataFrame([np.nan, np.nan]).T.to_excel(writer, index=False, header=False)
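Note that `to_excel` starts writing at column 0 unless `startcol` is given, so successive writes to the same sheet can land on top of each other. One way to leave two blank columns between frames is to track the column offset, as in this hedged sketch. `column_offsets` and `write_side_by_side` are hypothetical helper names, and the write step has not been tested against the asker's files.

```python
import pandas as pd


def column_offsets(frames, gap=2):
    """Return the starting column of each frame, leaving `gap` blank columns between frames."""
    offsets = []
    start = 0
    for frame in frames:
        offsets.append(start)
        # Next frame starts after this frame's columns plus the blank gap.
        start += frame.shape[1] + gap
    return offsets


def write_side_by_side(frames, path, gap=2):
    """Write all frames to one sheet of a new workbook, side by side."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for frame, start in zip(frames, column_offsets(frames, gap)):
            frame.to_excel(writer, index=False, startcol=start)
```

With `frames = [pd.read_excel(f, engine="openpyxl") for f in excel_files]`, calling `write_side_by_side(frames, excelAutoNamed)` would place each file's columns two columns apart, each with its own header row.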

In pandas, how can I read an Excel sheet that has a specific name in a column?

I have a large number of files that change every week. Is there a way to tell the Python script to grab the sheet that contains a specific column name? For example:
For file test.xlsx I have the following structure
sheet 1
columnA | columnB | columnC | Columnd | ColumnE | ColumnF
dsf     | sdfas   | asdf    | sadf    | asfdsd  | sdfasf
sdfsd   | sadfsd  | asdfsd  | asdfasd | asdfdsf | sdfadf
Sheet 2
jira_tt | alignID | issueID | testID
dsf     | sdfas   | asdf    | sadf
As you can see, the Excel file has 2 sheets. This is just an example: some files may have more than 2 sheets, and the sheet names change. As stated above, I want to read every sheet, in every file, that has the keyword "jira" in its column names. So far I have a script that reads all the files in the target folder, but I don't know how to select the sheets I need. Here is part of the code I've written so far:
import glob
import pandas as pd
# use glob to get all the files that have an .xlsx extension
ingestor = glob.glob("*.xlsx")
for f in ingestor:
    df = pd.read_excel(f)
    df
Any kind of assistance or guidance would be appreciated.
To combine all your files into one DataFrame, you can read them into a list and use the merge() method to join them, for example:
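import glob
from functools import reduce
import pandas as pd
ingestor = glob.glob("*.xlsx")
# Read every file, then merge the DataFrames pairwise on their common columns
df = reduce(lambda left, right: pd.merge(left, right), [pd.read_excel(data) for data in ingestor])
print(df.isin(['jira']))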
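If you want only the files that contain a specific word (like "jira") in their column names, check each file with a condition using the any() method on each iteration, then combine the matching data:
ingestor = glob.glob("*.xlsx")
frames = []
for f in ingestor:
    data = pd.read_excel(f)
    # Keep the file only if any column name contains "jira"
    if data.columns.astype(str).str.contains("jira").any():
        frames.append(data)
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
print(df)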
First, note that pd.read_excel(f) returns only the first sheet by default. To get another sheet, or more than one, use the sheet_name argument of pandas.read_excel (see the pandas documentation).
For the case that the number of sheets is unknown, specify None to get all worksheets:
pd.read_excel(f, sheet_name=None)
Note that now a dict of DataFrames is returned.
To get the sheets that have a column whose name contains "jira", check each column name:
files = glob.glob("*.xlsx")
for f in files:
    df_dict = pd.read_excel(f, sheet_name=None)
    for sheet_name, df in df_dict.items():
        for column in df.columns:
            # Column labels may not be strings, so cast before testing
            if 'jira' in str(column):
                # do something with this column or df
                print(df[column])
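The same check can be wrapped in a small helper that filters the dict returned by `pd.read_excel(f, sheet_name=None)`. This is only a sketch: `sheets_with_keyword` is a hypothetical name, and the example dict is an in-memory stand-in for the asker's test.xlsx rather than real file contents.

```python
import pandas as pd


def sheets_with_keyword(sheets, keyword="jira"):
    """Keep only the sheets that have at least one column name containing `keyword`."""
    matching = {}
    for sheet_name, df in sheets.items():
        # Column labels may be non-strings (e.g. integers), so cast before testing.
        if any(keyword in str(column) for column in df.columns):
            matching[sheet_name] = df
    return matching


# In-memory stand-in for pd.read_excel("test.xlsx", sheet_name=None).
example_sheets = {
    "sheet 1": pd.DataFrame({"columnA": ["dsf"], "columnB": ["sdfas"]}),
    "Sheet 2": pd.DataFrame({"jira_tt": ["dsf"], "alignID": ["sdfas"]}),
}
print(list(sheets_with_keyword(example_sheets)))
```

Because the helper works on the dict rather than on file paths, it does not depend on how many sheets a file has or what they are called.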

Excel sheet names in a pandas DataFrame

I have an Excel workbook that I have already loaded, combining all of its sheets. Now I would like to add a column that holds the name of each row's original sheet. I don't know whether I have to do this before I combine everything, or how to do it. I am using pandas. This is my code so far; I want the sheet name or number to be in the "Week" column.
xlsx = pd.ExcelFile('archivo.xlsx')
hojas = []
for hojaslibro in xlsx.sheet_names:
    hojas.append(xlsx.parse(hojaslibro))
estado = pd.concat(hojas, ignore_index=True)
estado['Week'] = 0
This should work:
xl = pd.ExcelFile('archivo.xlsx')
frames = []
for sheet_name in xl.sheet_names:
    df = xl.parse(sheet_name)
    df['Week'] = sheet_name  # this adds `sheet_name` into the column `Week`
    frames.append(df)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_combined = pd.concat(frames, ignore_index=True)
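A compact variant of the same idea, sketched under the assumption that the sheets are already available as a dict mapping sheet name to DataFrame (which is what `pd.read_excel(..., sheet_name=None)` returns). The helper name `combine_with_sheet_name` and the sample sheet names are made up for illustration.

```python
import pandas as pd


def combine_with_sheet_name(sheets, column="Week"):
    """Stack sheets vertically, recording each row's source sheet in `column`."""
    labelled = [df.assign(**{column: name}) for name, df in sheets.items()]
    return pd.concat(labelled, ignore_index=True)


# Stand-in for pd.read_excel('archivo.xlsx', sheet_name=None); sheet names are hypothetical.
example = {
    "Week 1": pd.DataFrame({"value": [1, 2]}),
    "Week 2": pd.DataFrame({"value": [3]}),
}
estado = combine_with_sheet_name(example)
```

Labelling each sheet before concatenating answers the ordering question in the post: once the sheets are combined, the information about which row came from which sheet is no longer available.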

Exporting dataframe to csv not showing first column

I'm trying to export my df to a .csv file. The df has just two columns of data: the image name (.jpg), and the 'value_counts' of how many times that .jpg name occurs in the 'concat_Xenos.csv' file, i.e:
M116_13331848_13109013329679.jpg 19
M116_13331848_13109013316679.jpg 14
M116_13331848_13109013350679.jpg 12
M116_13331848_13109013332679.jpg 11
etc. etc. etc....
However, whenever I export the df, the .csv file only displays the 'value_counts' column. How do I fix this?
My code is as follows:
concat_Xenos = r'C:\file_path\concat_Xenos.csv'
df = pd.read_csv(concat_Xenos, header=None, index_col=False)[0]
counts = df.value_counts()
export_csv = counts.to_csv (r'C:\file_path\concat_Xenos_valuecounts.csv', index=None, header=False)
Thanks! If any clarification is needed please ask :)
R
This is because value_counts() returns a Series whose index holds the image names, so the names are lost when the index is not written.
Use index=True:
export_csv = counts.to_csv(r'C:\file_path\concat_Xenos_valuecounts.csv', index=True, header=False)
or you can reset the index before exporting, which turns it into a regular column:
counts = counts.reset_index()
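To see the effect without touching any files, here is a small sketch that returns the CSV text instead of writing it (calling `to_csv` without a path returns a string). The file names are made up and `value_counts_csv` is a hypothetical helper name.

```python
import pandas as pd


def value_counts_csv(names):
    """Return CSV text with one 'name,count' row per distinct name."""
    counts = pd.Series(names).value_counts()
    # index=True writes the names (the Series index) as the first CSV column.
    return counts.to_csv(index=True, header=False)


# Made-up file names for illustration.
csv_text = value_counts_csv(["a.jpg", "b.jpg", "a.jpg"])
print(csv_text)
```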

Convert an Excel file with many sheets (with spaces in the sheet names) into a pandas data frame

I would like to convert an Excel file to a pandas dataframe. All the sheet names contain spaces, for instance 'part 1 of 22', 'part 2 of 22', and so on. In addition, the first column is the same in all the sheets.
I would like to convert this Excel file into a single dataframe. However, I don't know what happens to the names in Python: I was able to import the sheets, but I do not know the names of the resulting dataframes.
After this, I would like to use another 'for' loop with pd.merge() to create a single dataframe.
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    print(sheet_name.info())
Using only the code snippet you have shown, each sheet (each DataFrame) will be assigned to the variable sheet_name. Thus, this variable is overwritten on each iteration and you will only have the last sheet as a DataFrame assigned to that variable.
To achieve what you want to do you have to store each sheet, loaded as a DataFrame, somewhere, a list for example. You can then merge or concatenate them, depending on your needs.
Try this:
all_my_sheets = []
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    all_my_sheets.append(sheet_name)
Or, even better, using list comprehension:
all_my_sheets = [pd.read_excel(Matrix, sheet_name) for sheet_name in Matrix.sheet_names]
You can then concatenate them into one DataFrame like this:
final_df = pd.concat(all_my_sheets, sort=False)
You might consider using the openpyxl package:
from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(filename=file_path, read_only=True)
records = []
# Assuming your sheets all have the same header in their first row
for i, ws in enumerate(wb.worksheets):
    # Make sure you don't duplicate the header: keep it from the first sheet only
    first_row = 1 if i == 0 else 2
    for row in ws.iter_rows(min_row=first_row, values_only=True):
        records.append(list(row))
# Set the column names
header = records.pop(0)
# Create your df
df = pd.DataFrame(records, columns=header)
It may be easiest to call read_excel() once and save the contents into a dict of DataFrames.
So, the first step would look like this:
dfs = pd.read_excel(Matrix, sheet_name=["Sheet 1", "Sheet 2", "Sheet 3"])
Note that the sheet names in the list must match those in the Excel file exactly, spaces included. Then, if you wanted to vertically concatenate these sheets, you would just call:
final_df = pd.concat(list(dfs.values()), axis=0, ignore_index=True)
Note that final_df will include the union of the column headers from all three sheets, so ideally they would be the same. It sounds like you want to merge the information, which would be done differently; we can't help you with the merge without more information.
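Since the question mentions that the first column is the same in every sheet, one possible shape for the merge is a pairwise `pd.merge` on that column. This is a sketch under assumptions: the key column name "id", the sample data, and the helper name `merge_on_key` are all hypothetical, and an outer join is chosen only so that no rows are silently dropped.

```python
from functools import reduce

import pandas as pd


def merge_on_key(frames, key):
    """Merge a list of DataFrames pairwise on a shared key column."""
    return reduce(lambda left, right: pd.merge(left, right, on=key, how="outer"), frames)


# Made-up sheets sharing the hypothetical key column "id".
part1 = pd.DataFrame({"id": [1, 2], "a": [10, 20]})
part2 = pd.DataFrame({"id": [1, 2], "b": [30, 40]})
merged = merge_on_key([part1, part2], key="id")
```

With real data, the list of frames could come from `list(dfs.values())`, and `key` would be the actual name of the shared first column.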
I hope this helps!
