Pandas cuts off empty columns from csv file - python

I have a csv file that has columns with no content, just headers. I want them included in the resulting DataFrame, but pandas cuts them off by default. Is there any way to solve this using read_csv, not read_excel?

IIUC, you need header=None:
from io import StringIO
import pandas as pd
data = """
not_header_1,not_header_2
"""
df = pd.read_csv(StringIO(data), sep=',')
print(df)
OUTPUT:
Empty DataFrame
Columns: [not_header_1, not_header_2]
Index: []
Now, with header=None:
df = pd.read_csv(StringIO(data), sep=',', header=None)
print(df)
OUTPUT:
              0             1
0  not_header_1  not_header_2
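For the usual case where the header row should become the column names, a minimal check (with hypothetical column names) that empty columns survive read_csv:

```python
from io import StringIO
import pandas as pd

# A CSV with headers but no data rows: pandas keeps every column
data = "col_a,col_b,col_c\n"
df = pd.read_csv(StringIO(data))

print(list(df.columns))  # all three headers are preserved
print(len(df))           # zero data rows
```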

Related

Extra column appears when appending selected row from one csv to another in Python

I have this code which appends a column of a csv file as a row to another csv file:
def append_pandas(s, d):
    import pandas as pd
    df = pd.read_csv(s, sep=';', header=None)
    df_t = df.T
    df_t.iloc[0:1, 0:1] = 'Time Point'
    df_t.at[1, 0] = 1
    df_t.columns = df_t.iloc[0]
    df_new = df_t.drop(0)
    pdb = pd.read_csv(d, sep=';')
    newpd = pdb.append(df_new)
    newpd.to_csv(d, sep=';')
The result is supposed to look like this:
Instead, every time the row is appended, there is an extra "Unnamed" column appearing on the left:
Do you know how to fix that? Please help :(
My csv documents from which I select a column look like this:
You have to add index=False to your to_csv() call:
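A self-contained sketch of the fix, with a small hypothetical frame standing in for `newpd` from the question:

```python
import pandas as pd

# Hypothetical stand-in for the `newpd` frame built in the question
newpd = pd.DataFrame({'Time Point': [1], 'value': [42]})

# Without index=False, to_csv writes the row index as an extra
# unnamed first column; with it, only the data columns appear.
csv_text = newpd.to_csv(sep=';', index=False)
print(csv_text)
```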

Formatting of JSON file

Can we convert the highlighted INTEGER values to STRING value (refer below link)?
https://i.stack.imgur.com/3JbLQ.png
CODE
import pandas as pd

filename = "newsample2.csv"
jsonFileName = "myjson2.json"

df = pd.read_csv(filename)
df.to_json(jsonFileName, indent=4)
print(df)
Try doing something like this.
import pandas as pd

filename = "newsample2.csv"
jsonFileName = "myjson2.json"

df = pd.read_csv(filename)
df['index'] = df.index
df.to_json(jsonFileName, indent=4)
print(df)
This will take indices of your data and store them in the index column, so they will become a part of your data.
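If the goal is literally to make the integer values come out as strings in the JSON, one approach is casting the relevant column with astype(str) before to_json. A sketch with hypothetical data standing in for newsample2.csv:

```python
import pandas as pd

# Hypothetical data standing in for the csv in the question
df = pd.DataFrame({'id': [1, 2], 'score': [10, 20]})

# Casting the column to str makes to_json emit quoted string values
df['score'] = df['score'].astype(str)
json_text = df.to_json(orient='records')
print(json_text)
```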

pandas drop rows based on cell content and no headers

I'm reading a csv file with pandas that has no headers.
df = pd.read_csv('file.csv', header=0)
csv file containing 1 row with several users:
admin
user
system
sysadmin
adm
administrator
I need to read the file to a df or a list except for example: sysadmin
and save the result to the csv file
admin
user
system
adm
administrator
Select the first column, filter by boolean indexing, and write to file:
df = pd.read_csv('file.csv', header=0)
df[df.iloc[:, 0].ne('sysadmin')].to_csv('file.csv', index=False)
# if there is a csv header converted to a column name:
# df[df['colname'].ne('sysadmin')].to_csv('file.csv', index=False)
If there is no header in the csv, you need parameters like:
df = pd.read_csv('file.csv', header=None)
df[df.iloc[:, 0].ne('sysadmin')].to_csv('file.csv', index=False, header=False)
You can give it a try:
df = pd.read_csv('file.csv')
df=df.transpose()
df.columns=['admin','user','system','sysadmin','adm','administrator']
df.head()
I think this will work. If not, then try:
df = pd.read_csv('file.csv')
df.columns=['admin','user','system','sysadmin','adm','administrator']
df.head()
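A self-contained sketch of the boolean-indexing approach from the first answer, with in-memory data standing in for file.csv:

```python
from io import StringIO
import pandas as pd

# In-memory stand-in for the headerless file.csv from the question
data = "admin\nuser\nsystem\nsysadmin\nadm\nadministrator\n"
df = pd.read_csv(StringIO(data), header=None)

# Keep every row whose first column is not 'sysadmin'
filtered = df[df.iloc[:, 0].ne('sysadmin')]
print(filtered[0].tolist())
```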

Pandas to_csv with extra zeroes

I am having some issues reading a csv into a dataframe: when I convert it back to csv, it has extra decimals in it.
Currently using pandas 1.0.5 and python 3.7
For example consider the simple example below:
from io import StringIO
import pandas as pd
d = """ticker,open,close
aapl,108.922,108.583
aapl,109.471,110.25
aapl,113.943,114.752
aapl,117.747,118.825
"""
df = pd.read_csv(StringIO(d), sep=",", header=0, index_col=0)
print(df)
print("\n", df.to_csv())
The output is:
open close
ticker
aapl 108.922 108.583
aapl 109.471 110.250
aapl 113.943 114.752
aapl 117.747 118.825
ticker,open,close
aapl,108.92200000000001,108.583
aapl,109.471,110.25
aapl,113.943,114.75200000000001
aapl,117.74700000000001,118.825
as you can see there are extra zeroes added to the to_csv() output. If I change the read_csv to have dtype=str like df = pd.read_csv(StringIO(d), sep=",", dtype=str, header=0, index_col=0) then I would get my desired output, but I want the dtype to be decided by pandas, to be int64, or float depending on the column values. Instead of forcing all to be object/str.
Is there a way to eliminate these extra zeroes without forcing the dtype to str?
These extra digits appear because the values are stored as binary floats, which cannot represent a number like 108.922 exactly. You can use the float_format argument:
d = """ticker,open,close
aapl,108.922,108.583
aapl,109.471,110.25
aapl,113.943,114.752
aapl,117.747,118.825
"""
df = pd.read_csv(StringIO(d), sep=",", header=0, index_col=0)
df.to_csv('output.csv', float_format='%.3f')
# This is how the output.csv file looks:
ticker,open,close
aapl,108.922,108.583
aapl,109.471,110.250
aapl,113.943,114.752
aapl,117.747,118.825

pandas Combine Excel Spreadsheets

I have an Excel workbook with many tabs.
Each tab has the same set of headers as all others.
I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).
So far, I've tried:
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()
Can I use something for the parse argument that will mean "all spreadsheets"?
Or is this the wrong approach?
Thanks in advance!
Update: I tried:
a = xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b
But it's not "working".
This is one way to do it -- load all sheets into a dictionary of dataframes and then concatenate all the values in the dictionary into one dataframe.
import pandas as pd

# Set sheet_name to None in order to load all sheets into a dict of DataFrames,
# and ignore the index to avoid overlapping values later (see comment by #bunji)
df = pd.read_excel('tmp.xlsx', sheet_name=None, index_col=None)

# Then concatenate all the DataFrames
cdf = pd.concat(df.values(), ignore_index=True)
print(cdf)
import pandas as pd

f = 'file.xlsx'
df = pd.read_excel(f, sheet_name=None)  # dict of DataFrames, one per sheet
df2 = pd.concat(df, ignore_index=True, sort=True)
df2.to_excel('merged.xlsx',
             engine='xlsxwriter',
             sheet_name='Merged',
             header=True,
             index=False)
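The core of both answers is concatenating the dict of per-sheet DataFrames that sheet_name=None returns. A sketch with in-memory frames standing in for the workbook (so it runs without an Excel file):

```python
import pandas as pd

# Stand-in for pd.read_excel('file.xlsx', sheet_name=None), which
# returns a dict mapping sheet name -> DataFrame
sheets = {
    'Sheet1': pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
    'Sheet2': pd.DataFrame({'a': [5], 'b': [6]}),
}

# ignore_index=True avoids duplicate row labels across sheets
combined = pd.concat(sheets.values(), ignore_index=True)
print(combined)
```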