I'd like to export 1 column from Excel to txt. I tried this:
import pandas as pd
xlsx = pd.read_excel('C:/Events.xlsx', sheet_name='Data', usecols='F:F')
with open('C:/filename.txt', 'w') as outfile:
    xlsx.to_string(outfile, index=False)
output:
20220,333333333333333
NaN
The problems are:
- the first line is blank.
- in the second row I get NaN.
Do you have any ideas?
Thank you for your support
Angelo
In the test file it looks like the cell in the second row (in col F) is blank. Pandas will automatically read in blank cells as NaN. Is this row blank in the main file too?
If you have blank cells and you don't want to read them in as datatype NaN, you can convert them to empty string instead by using the keep_default_na parameter when importing the file:
pd.read_excel('your_file_name.xlsx', keep_default_na=False)
Does this help?
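To make the effect concrete, here is a minimal sketch using made-up numbers standing in for column F. It uses read_csv on an in-memory string so it is self-contained, but keep_default_na behaves the same way in read_excel:

```python
import io
import pandas as pd

# Hypothetical two-column data; the blank cell in column F mimics the
# blank cell in the question's Events.xlsx.
data = "F,G\n20220.33,a\n,b\n41.5,c\n"

# Default behaviour: the blank cell is read as NaN.
df_default = pd.read_csv(io.StringIO(data))
print(df_default["F"].tolist())    # [20220.33, nan, 41.5]

# keep_default_na=False: the blank cell becomes an empty string instead.
df_blank = pd.read_csv(io.StringIO(data), keep_default_na=False)
print(df_blank["F"].tolist())      # ['20220.33', '', '41.5']

# To export just that column without the blank/NaN rows:
text = df_default["F"].dropna().to_string(index=False)
print(text)
```

Note that with keep_default_na=False the column keeps string dtype, since the empty cell blocks numeric conversion; dropna() before exporting avoids the issue entirely.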
Using Pandas, I'm trying to extract a value using its key, but I keep failing to do so. Could you help me with this?
There's a csv file like below:
value
"{""id"":""1234"",""currency"":""USD""}"
"{""id"":""5678"",""currency"":""EUR""}"
I imported this file in Pandas and made a DataFrame out of it:
dataframe from a csv file
However, when I try to extract a value using a key (e.g. df["id"]), I get an error message.
I'd like to see a value 1234 or 5678 using df["id"]. Which step should I take to get it done? This may be a very basic question but I need your help. Thanks.
The csv file isn't being read in correctly.
You haven't set a delimiter; pandas can automatically detect a delimiter but hasn't done so in your case. See the read_csv documentation for more on this. Because no delimiter was detected, the pandas dataframe has a single column, value, whose cells are entire lines from your file - the first entry is "{""id"":""1234"",""currency"":""USD""}". So the file doesn't have a column id, and you can't select data by id.
The data aren't formatted as a pandas df, with row titles and columns of data. One option is to process each row manually, though there may be slicker options.
id_vals = []
currency = []
with open('test.dat', 'r') as f:
    for line in f.readlines()[1:]:  # skip the header row
        # remove obfuscating characters
        for c in '"{}\n':
            line = line.replace(c, '')
        line = line.split(',')
        # extract values to two lists
        id_vals.append(line[0][3:])    # drop the leading 'id:'
        currency.append(line[1][9:])   # drop the leading 'currency:'
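For reference, here is a self-contained sketch of the same loop with the two sample rows inlined (so there is no dependency on a test.dat file), collecting the two lists into a dataframe at the end:

```python
import pandas as pd

# The two raw lines from the question's CSV, inlined for illustration.
lines = ['"{""id"":""1234"",""currency"":""USD""}"\n',
         '"{""id"":""5678"",""currency"":""EUR""}"\n']

id_vals, currency = [], []
for line in lines:
    # remove quotes, braces and the trailing newline
    for c in '"{}\n':
        line = line.replace(c, '')
    parts = line.split(',')
    id_vals.append(parts[0][3:])     # drop the leading 'id:'
    currency.append(parts[1][9:])    # drop the leading 'currency:'

df = pd.DataFrame({'id': id_vals, 'currency': currency})
print(df['id'].tolist())             # ['1234', '5678']
```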
You just need to clean up the CSV file a little and you are good. Here is every step:
import re
import pandas as pd

# open your csv and read it as a text string
with open('My_CSV.csv', 'r') as f:
    my_csv_text = f.read()

# remove problematic strings
find_str = ['{', '}', '"', 'id:', 'currency:', 'value']
replace_str = ''
for i in find_str:
    my_csv_text = re.sub(i, replace_str, my_csv_text)

# create a new csv file and save the cleaned text
new_csv_path = './my_new_csv.csv'  # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(my_csv_text)

# create the pandas dataframe
df = pd.read_csv('my_new_csv.csv', sep=',', names=['ID', 'Currency'])
print(df)
Output df:
ID Currency
0 1234 USD
1 5678 EUR
You need to parse each cell of your dataframe using json.loads() or eval(), something like this:
import json

for value in df['value']:
    print(json.loads(value)["id"])
    # or, though eval executes arbitrary code and is less safe:
    # print(eval(value)["id"])
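As a variation on the json.loads approach (rebuilding the question's two-row CSV in memory so the sketch is self-contained), pandas' json_normalize can expand the parsed dicts into proper columns in one step:

```python
import io
import json
import pandas as pd

# The CSV from the question, with its doubled quotes, as an in-memory file.
raw = ('value\n'
       '"{""id"":""1234"",""currency"":""USD""}"\n'
       '"{""id"":""5678"",""currency"":""EUR""}"\n')
df = pd.read_csv(io.StringIO(raw))

# Parse each JSON string, then expand the dicts into id/currency columns.
parsed = pd.json_normalize(df["value"].map(json.loads).tolist())
print(parsed["id"].tolist())    # ['1234', '5678']
```

After this, parsed["id"] and parsed["currency"] work exactly as the asker expected df["id"] to.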
I am trying to open a csv file by skipping first 5 rows. The data is not getting aligned in dataframe. See screenshot of file
PO = pd.read_table('acct.csv', sep='\t', skiprows=5, skip_blank_lines=True)
PO
Try sorting the data by date after importing it.
First make sure the file is imported with the proper separator/delimiter; otherwise the values stick to the index (see your data image again). Once the delimiter is set, you can do the following:
do = pd.read_csv('check_test.csv', delimiter='\t', skiprows=range(1, 7), skip_blank_lines=True, encoding='utf8')
d01 = do.iloc[:, 1:7]
d02 = d01.sort_values(['Date', 'Reference', 'Debit'])
This sorts the values the way you want.
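A minimal sketch of the multi-column sort, using made-up values under the answer's assumed Date/Reference/Debit column names (note that sort_values takes a list of column names, not one comma-joined string):

```python
import pandas as pd

# Hypothetical rows standing in for the imported account data.
do = pd.DataFrame({
    "Date": ["2023-02-01", "2023-01-15", "2023-01-15"],
    "Reference": ["B", "B", "A"],
    "Debit": [30.0, 10.0, 20.0],
})

# Sort by Date first, then Reference, then Debit.
d02 = do.sort_values(["Date", "Reference", "Debit"])
print(d02["Reference"].tolist())    # ['A', 'B', 'B']
```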
I am trying to change this Excel file into csv and I would like to replace empty cells with NaN. Also, do you have any advice on how to clean up the data from Excel better? My code so far:
sheet1 = wb.sheet_by_index(1)
with open("data%s.csv" % (sheet1.name.replace(" ", "")), "w", encoding='utf-8') as file:
    writer = csv.writer(file, delimiter=",")
    header = [cell.value for cell in sheet1.row(1)]
    writer.writerow(header)
    for row_idx in range(2, sheet1.nrows):
        row = [int(cell.value) if isinstance(cell.value, float) else cell.value
               for cell in sheet1.row(row_idx)]
        writer.writerow(row)
You can try the Pandas library in Python to organize your data better and more easily; it loads your data into a dataframe. With it, you can replace the empty values using something like
df = df.replace(r'^\s*$', np.nan, regex=True)
and then write the dataframe back to a csv file once you've cleaned it up.
The Pandas and numpy libraries have some great built-in functionality for working with csv's (and Excel spreadsheets). You can load your Excel sheet into a dataframe very easily using Pandas read_excel, then, using a bit of regex, replace whitespace-only cells with NaNs via numpy. Then save the dataframe as a csv using to_csv.
import pandas as pd
import numpy as np

# read in your excel sheet; the default is the first sheet
df = pd.read_excel("data.xlsx", sheet_name='data_tab')

# regex for hidden values, e.g. spaces or empty strings
df = df.replace(r'^\s*$', np.nan, regex=True)

# now save this as a csv using to_csv
df.to_csv("csv_data.csv")
I want to export my dataframe to a csv file. Normally my dataframe has 2 columns, but when I export it, the csv file contains only one column with the data separated by commas.
m is one column and s is another.
df = pd.DataFrame({'MSE':[m], 'SSIM': [s]})
To append new dataframes I used the function below and saved the data to a csv file:
with open('test.csv', 'a+') as f:
    df.to_csv(f, header=False)
print(df)
When I print the dataframe, the console output looks like:
MSE SSIM
0 0.743373 0.843658
but in the csv file everything lands in one column: first the index, second m and last s. I want them in 3 separate columns:
0,1.1264238582283046,0.8178900901529639
How can I solve this?
Your Excel list-separator setting is most likely ; (semicolon). Use:
df.to_csv(f, header=False, sep=';')
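To see what the separator actually does, here is a small sketch with made-up MSE/SSIM values, writing to an in-memory buffer so the raw output is visible without opening Excel:

```python
import io
import pandas as pd

# Hypothetical one-row dataframe like the question's.
df = pd.DataFrame({"MSE": [0.743373], "SSIM": [0.843658]})

buf = io.StringIO()
df.to_csv(buf, header=False, sep=';')
print(buf.getvalue())    # 0;0.743373;0.843658
```

Whether Excel splits this into three columns depends on the list-separator in your regional settings; if Excel expects ; and the file uses , (or vice versa), everything stays in one column.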
df=pd.DataFrame(['abc\n123\n232','1\n2\n3\n4\n5\n6'])
df.to_csv('text.csv')
I would like to have in a single cell in the xlsx (Edited: not csv):
abc
123
232
The desired output is A1 cell only being filled.
The dataframe has only 1 cell.
But the above code would result in the xlsx (Edited: not csv) printing that 1 cell into multiple cells.
Is there a way to format and write the xlsx (Edited: not csv) into multilines within each cell?
Edit:
I shall clarify my problem. There is nothing wrong with my dataframe definition. I would like the "\n" within the strings in each cell of the dataframe to become a line break within the xlsx (Edited: not csv) cell. Here is another example.
df=pd.DataFrame(['abc\n123\n232','1\n2\n3\n4\n5\n6'])
df.to_csv('text.csv')
The desired output is A1 and A2 cells only being filled.
Edit 2:
Not in csv but xlsx.
You can use .to_excel with index=False and header=False.
df.to_excel('test.xlsx', index=False, header=False)
But you may need to turn on 'Wrap Text' by yourself.
Why not use openpyxl for that? It will work:
from openpyxl import Workbook
from openpyxl.styles import Alignment

workbook = Workbook()
worksheet = workbook.active
worksheet.title = "Sheet1"
worksheet["A1"] = "abc\n123\n232"
worksheet["A1"].alignment = Alignment(wrap_text=True)
worksheet["B1"] = "1\n2\n3\n4\n5\n6"
worksheet["B1"].alignment = Alignment(wrap_text=True)
workbook.save('test.xlsx')