I'm reading a csv file with pandas that has no headers.
df = pd.read_csv('file.csv', header=0)
The csv file contains one column with several users, one per row:
admin
user
system
sysadmin
adm
administrator
I need to read the file into a df or a list, leaving out one entry (for example sysadmin),
and save the result back to the csv file:
admin
user
system
adm
administrator
Select the first column, filter by boolean indexing and write to file:
df = pd.read_csv('file.csv', header=0)
df[df.iloc[:, 0].ne('sysadmin')].to_csv('file.csv', index=False)
# if the csv has a header row, it is converted to the column name
#df[df['colname'].ne('sysadmin')].to_csv('file.csv', index=False)
If the csv has no header, pass header=None when reading and header=False when writing:
df = pd.read_csv('file.csv', header=None)
df[df.iloc[:, 0].ne('sysadmin')].to_csv('file.csv', index=False, header=False)
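To see the filtering step on its own, here is a minimal sketch that runs without touching the disk. The in-memory text csv_text is a hypothetical stand-in for file.csv, and the variable names are only for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory stand-in for file.csv (one user per line, no header)
csv_text = "admin\nuser\nsystem\nsysadmin\nadm\nadministrator\n"

# header=None keeps the first line ("admin") as data rather than a column name
df = pd.read_csv(StringIO(csv_text), header=None)

# Boolean mask: True for every row whose first column is not 'sysadmin'
filtered = df[df.iloc[:, 0].ne("sysadmin")]

# Calling to_csv without a path returns the CSV text instead of writing a file
result = filtered.to_csv(index=False, header=False)

# The same data as a plain Python list
users = filtered.iloc[:, 0].tolist()
print(users)
```

Passing a path such as 'file.csv' to to_csv instead would overwrite the original file with the filtered rows.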
You can give this a try:
df = pd.read_csv('file.csv')
df=df.transpose()
df.columns=['admin','user','system','sysadmin','adm','administrator']
df.head()
I think this will work.
If not, then try this:
df = pd.read_csv('file.csv')
df.columns=['admin','user','system','sysadmin','adm','administrator']
df.head()
I was trying to remove the empty columns in an Excel file with the pandas dropna() method, but I ended up with the above error message. My code is below:
import pandas as pd
df = pd.ExcelFile("1.xlsx")
print(df.sheet_names)
#df.dropna(how='all', axis=1)
newdf = df.dropna()
Please provide more code and context, but this might help:
import pandas as pd
excel_file_name = 'insert excel file path'
excel_sheet_name = 'insert sheet name'
# create dataframe from desired excel file
df = pd.read_excel(
    excel_file_name,
    engine='openpyxl',
    sheet_name=excel_sheet_name
)
# drop every column that contains at least one NaN value, modifying df in place
# (without the inplace option it would have to be: df = df.dropna(axis=1))
df.dropna(axis=1, inplace=True)
# write that dataframe to excel file
with pd.ExcelWriter(
    excel_file_name,  # file to write to
    engine='openpyxl',  # which engine to use
    mode='a',  # use mode append (has to be used for if_sheet_exists to work)
    if_sheet_exists='replace'  # if that sheet exists, replace it
) as writer:
    df.to_excel(writer, sheet_name=excel_sheet_name)
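Since the question asks about empty columns specifically, it may help to compare the two dropna modes. The following is a small sketch on a made-up DataFrame (the column names are hypothetical): the default how='any' drops a column as soon as it has a single NaN, while how='all' only drops columns that are entirely empty.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one full column, one with a gap, one completely empty
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# Default how='any': a column is dropped if it contains at least one NaN
any_dropped = df.dropna(axis=1)

# how='all': a column is dropped only if every value in it is NaN
all_dropped = df.dropna(axis=1, how="all")

print(list(any_dropped.columns))
print(list(all_dropped.columns))
```

If the goal is only to get rid of fully empty columns, how='all' is probably the safer choice, because it keeps columns that merely have a few missing cells.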
I have a .csv file with 100 rows of data displayed like this
"Jim 1234"
"Sam 1235"
"Mary 1236"
"John 1237"
What I'm trying to achieve is splitting the numbers from the names into 2 columns in python
edit*
Using,
import pandas as pd
df = pd.read_csv('test.csv', sep=r'\s+')
df.to_csv('result.csv', index=False)
I managed to get it to display like this in excel
However, the numbers still do not show up in column B as I expected.
Your data have only one column and a tab delimiter:
pd.read_csv('test.csv', quoting=1, header=None)[0] \
    .str.split('\t', expand=True) \
    .to_csv('result.csv', index=False, header=False)
A very simple way:
data=pd.DataFrame(['Jim1234','Sam4546'])
data[0].str.split(r'(\d+)', expand=True)
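As an alternative sketch, str.extract with two named groups returns exactly two columns. This is a different technique from the split shown above, and the sample values and the group names name and number are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the rows from the question
data = pd.Series(["Jim 1234", "Sam 1235", "Mary 1236", "John 1237"])

# Group 1: leading non-digits (the name); group 2: trailing digits (the number).
# The optional whitespace between them is matched but not captured.
parts = data.str.extract(r"^(?P<name>\D+?)\s*(?P<number>\d+)$")

print(parts)
```

The extracted numbers are still strings; parts["number"].astype(int) would convert them if numeric values are needed.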
If the fields in your csv file are separated by whitespace, then the following code will work:
import pandas as pd
df = pd.read_csv('a.csv', header=None, delimiter=r'\s', engine='python')
df
I have a csv file that has columns with no content, just headers. I want them to be included in the resulting DataFrame, but pandas cuts them off by default. Is there any way to solve this using read_csv rather than read_excel?
IIUC, you need header=None:
from io import StringIO
import pandas as pd
data = """
not_header_1,not_header_2
"""
df = pd.read_csv(StringIO(data), sep=',')
print(df)
OUTPUT:
Empty DataFrame
Columns: [not_header_1, not_header_2]
Index: []
Now, with header=None
df = pd.read_csv(StringIO(data), sep=',', header=None)
print(df)
OUTPUT:
              0             1
0  not_header_1  not_header_2
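For the case where the file does contain data rows and only some columns are empty, here is a small sketch. The sample text is hypothetical, with column c having a header but no values; read_csv keeps that column and fills it with NaN.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: column 'c' has a header but no values in any row
data = "a,b,c\n1,2,\n3,4,\n"

df = pd.read_csv(StringIO(data))

# The header-only column is still present, filled with NaN
columns = list(df.columns)
c_is_empty = bool(df["c"].isna().all())

print(columns, c_is_empty)
```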
I need to know how to open an existing xls file, delete some columns and then save the file. This is what I have, but I get an error when I try to delete the columns. How do I use the DataFrame function to delete columns and then save?
# Read in excel file
Workbook = xlrd.open_workbook("C:/Python/Python37/Files/firstCopy.xls", on_demand=True)
worksheet = Workbook.sheet_by_name("Sheet1")
# Delete a column
df.DataFrame.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)
Workbook.save('output.xls')
Without seeing your dataset and error it is hard to tell what is going on. See How to Ask and how to create a Minimal, Complete, and Verifiable example.
Here's what I would suggest:
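import pandas as pd
df = pd.read_excel('firstCopy.xls')
# drop returns a new DataFrame, so assign the result back
df = df.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1)
# the context manager saves and closes the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')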
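import pandas as pd
df = pd.read_excel('firstCopy.xls')
# inplace=True modifies df directly instead of returning a copy
df.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)
# the context manager saves and closes the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')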
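A detail that is easy to miss with drop: without inplace=True it returns a new DataFrame and leaves the original untouched. The sketch below shows this on a made-up frame; the sample values and the extra Name column are assumptions, only the three dropped column names come from the question.

```python
import pandas as pd

# Hypothetical frame with the three columns from the question plus one to keep
df = pd.DataFrame({
    "StartDate": ["2020-01-01"],
    "EndDate": ["2020-12-31"],
    "EmployeeID": [7],
    "Name": ["Jim"],
})
cols = ["StartDate", "EndDate", "EmployeeID"]

# Result discarded: df itself still has all four columns afterwards
df.drop(cols, axis=1)
unchanged_cols = list(df.columns)

# Assigning the result back (or passing inplace=True) makes the change stick
df = df.drop(columns=cols)
remaining_cols = list(df.columns)

print(unchanged_cols, remaining_cols)
```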
I have an Excel workbook with many tabs.
Each tab has the same set of headers as all others.
I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).
So far, I've tried:
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()
Can I use something for the parse argument that will mean "all spreadsheets"?
Or is this the wrong approach?
Thanks in advance!
Update: I tried:
a=xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b
But it's not "working".
This is one way to do it -- load all sheets into a dictionary of dataframes and then concatenate all the values in the dictionary into one dataframe.
import pandas as pd
# Set sheet_name to None in order to load all sheets into a dict of dataframes
# and ignore index to avoid overlapping values later (see comment by @bunji)
df = pd.read_excel('tmp.xlsx', sheet_name=None, index_col=None)
# Then concatenate all dataframes
cdf = pd.concat(df.values())
print(cdf)
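To try the concatenation step without an Excel file, here is a minimal sketch. The dict sheets is a hypothetical stand-in for what read_excel returns with sheet_name=None, and passing ignore_index=True to concat is an optional extra that renumbers the rows from 0.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., sheet_name=None):
# a dict mapping sheet names to DataFrames that share the same headers
sheets = {
    "Sheet1": pd.DataFrame({"name": ["Jim", "Sam"], "id": [1234, 1235]}),
    "Sheet2": pd.DataFrame({"name": ["Mary"], "id": [1236]}),
}

# Stack all sheets; ignore_index=True gives one continuous 0..n-1 index
combined = pd.concat(sheets.values(), ignore_index=True)

print(combined)
```

The headers appear only once in the result because each sheet's header row already became its column names when it was read.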
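import pandas as pd
f = 'file.xlsx'
# sheet_name=None loads every sheet into a dict of dataframes
df = pd.read_excel(f, sheet_name=None)
# ignore_index belongs to concat, not read_excel
df2 = pd.concat(df, sort=True, ignore_index=True)
df2.to_excel('merged.xlsx',
             engine='xlsxwriter',
             sheet_name='Merged',
             header=True,
             index=False)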
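If it matters which tab each row came from, passing the dict itself to concat keeps the sheet names as an index level, which can then be turned into a column. This is a sketch on hypothetical in-memory data; the level names sheet and row are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by read_excel(sheet_name=None)
sheets = {
    "Sheet1": pd.DataFrame({"name": ["Jim", "Sam"], "id": [1234, 1235]}),
    "Sheet2": pd.DataFrame({"name": ["Mary"], "id": [1236]}),
}

# Concatenating a dict uses its keys as the outer index level ("sheet")
stacked = pd.concat(sheets, names=["sheet", "row"])

# Move the sheet name out of the index into an ordinary column
with_source = stacked.reset_index(level="sheet")

print(with_source)
```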