I am trying to read this file using read_csv in pandas (Python).
But I am not able to capture all of the columns.
Can you help?
Here is the code:
file = r'path of file'
df = pd.read_csv(file, encoding='cp1252', on_bad_lines='skip')
Thank you
I tried to read your file, and I first noticed that the encoding you specified does not correspond to the one used in your file. I also noticed that the separator is not a comma (,) but a tab (\t).
First, to get the file encoding (on Linux), you just need to run:
$ file -i kopie.csv
kopie.csv: text/plain; charset=utf-16le
In Python:
import pandas as pd
path_to_file = 'kopie.csv'
df = pd.read_csv(path_to_file, encoding='utf-16le', sep='\t')
And when I print the shape of the loaded dataframe:
>>> df.shape
(869, 161)
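If the `file` utility is not available (on Windows, for example), the same check can be approximated in Python. This is only a minimal sketch, not a full detector: it recognises the encoding only when the file starts with a byte-order mark, falls back to an assumed default otherwise, and lets the standard library's csv.Sniffer guess the separator. The name guess_encoding_and_sep and its parameters are made up for this example.

```python
import codecs
import csv

# Byte-order marks, with the 4-byte UTF-32 marks listed before the 2-byte
# UTF-16 ones because the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_and_sep(path, fallback="utf-8", sample_chars=4096):
    """Guess (encoding, separator) for a delimited text file.

    Hypothetical helper: the encoding is detected from a BOM only; if there
    is none, `fallback` is assumed, which may be wrong for your file.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)

    encoding = fallback
    for bom, name in _BOMS:
        if head.startswith(bom):
            encoding = name
            break

    # Decode a small sample and let csv.Sniffer pick among common separators.
    with open(path, encoding=encoding, newline="") as fh:
        sample = fh.read(sample_chars)
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return encoding, dialect.delimiter
```

The two returned values can then be passed on, e.g. pd.read_csv(path, encoding=encoding, sep=sep).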
Hi there Stack Overflow community,
I have several CSV files in a folder, and I need to add a column to each CSV containing the first 8 characters of its filename. After this step I want to save the DataFrame, including the new column, back to the same file.
I get the right output, but it doesn't save the changes in the csv file :/
Maybe someone has some inspiration for me. Thanks a lot!
import glob
import os
import pandas as pd
files = glob.glob(r'path\*.csv')
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    #for i in df('date'):
    #Decoder problem
    print(df)
Use df.to_csv, like this:
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    df.to_csv(fp, index=False)  # index=False if you don't want to save the index as a new column in the csv
By the way, I think this also works and is more readable:
for fp in files:
    df = pd.read_csv(fp)
    df['date'] = os.path.basename(fp).split('.')[0][:8]
    df.to_csv(fp, index=False)
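For reuse, the same read, tag, and write-back steps can be wrapped in a function. This is a rough sketch under a few assumptions of mine: the function name and its parameters are invented, pathlib is used in place of glob, and dtype=str is my addition so that existing values are written back as the text they were read as. Note that Path.stem drops only the last extension, which gives the same first 8 characters as split('.')[0] unless the name has a dot within them.

```python
from pathlib import Path

import pandas as pd


def stamp_csvs_with_filename(folder, column="date", n_chars=8):
    """Add the first n_chars of each CSV's filename as a column, saving in place.

    Hypothetical helper; returns the list of files it rewrote.
    """
    written = []
    for fp in sorted(Path(folder).glob("*.csv")):
        # Read everything as text so existing values are not re-typed.
        df = pd.read_csv(fp, dtype=str)
        df[column] = fp.stem[:n_chars]
        # Overwrite the original file, without the index column.
        df.to_csv(fp, index=False)
        written.append(fp)
    return written
```

Because the files are overwritten, running it twice on the same folder will rewrite the column rather than add a second one.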
I have a folder with multiple .txt files, all in the same tab-separated format. I'm trying to convert them to CSVs separated into columns.
I've tried a simple read_file.to_csv(r'C:\Users\Desktop\workspace\Converter\20200923.csv', index=False)
But it doesn't do the separation I'm looking for. Any suggestions are most welcome. Thank you!
Try something like this:
import os
import pandas as pd
folder = 'path/to/dir/'
for filename in os.listdir(folder):
    if filename.endswith('.txt'):
        # os.listdir returns bare names, so join them with the folder
        path = os.path.join(folder, filename)
        # header=None because you didn't say whether the first row is a header; if it is, just remove this.
        df = pd.read_table(path, sep='\t', header=None)
        df.to_csv(f'{path[:-3]}csv', index=False)
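The same conversion can be sketched with pathlib, which handles the folder path and the extension swap. Treat this as one possible variant rather than the answer's method: the function name and the has_header flag are invented here, and when has_header is False the output is written without a header row so that no numeric column labels are added.

```python
from pathlib import Path

import pandas as pd


def convert_tsv_folder(folder, has_header=False):
    """Convert every tab-separated .txt file in folder to a comma-separated .csv.

    Hypothetical helper; returns the list of .csv paths it wrote.
    """
    converted = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        df = pd.read_csv(txt_path, sep="\t", header=0 if has_header else None)
        csv_path = txt_path.with_suffix(".csv")
        # Only write a header row if the input had one.
        df.to_csv(csv_path, index=False, header=has_header)
        converted.append(csv_path)
    return converted
```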
So I am trying to read a CSV file with the code below:
import pandas as pd
user_cols = ['id','listing_type','status','listing_class','property_type','street_address','city','state','zip_4','cross_street','street_index','unit','floor','location','Latitude',
'longitude','subway','neighborhood','price','incentives','fee_type','fee_percentage','fee_details_broker',
'fee_details_clients','application_information','maintenance','taxes','max_financing','other_costs','beds',
'baths','full_baths','three_quarter_baths','half_baths','total_rooms','square_feet','exterior_square_feet',
'lot_area','lot_dimensions','date_available','date_listed','closed_on','year_built','recent_renovation',
'lease_min','lease_max','date_added','date_edited','date_update','contact','access','keys','mls_name','mls_id',
'courtesy_of','vow_opt_out','idx_opt_out','pet_details','notes','sync','private','listing_score','added_by_id',
'featured_office_id','date_expires','exclusive_file_id','condition','guarantor','blast_link']
data = pd.read_csv("C:\\Users\\Desktop\\dump-4.csv", low_memory=False, dtype=object, header=None, names=user_cols)
I am able to read the file, but when I try to display the columns, about 15-16 column names are missing. Why is this happening, and what can I do?
So when I deleted dtype=object and header=None, it did print all the columns. I'm not really sure what the correct dtype would have been, though. Thanks anyway! :)
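The thread does not establish why the names went missing, so the following is only an illustration of how header and names interact in read_csv, which may help when checking a case like this. The tiny CSV and the replacement names are invented for the example.

```python
import io

import pandas as pd

# Made-up data: a file that already has its own header line.
csv_text = "id,city,price\n1,Boston,100\n2,Austin,200\n"
names = ["listing_id", "town", "amount"]  # hypothetical replacement names

# header=None tells pandas the file has no header row, so the original
# header line is kept as the first row of data.
as_data = pd.read_csv(io.StringIO(csv_text), header=None, names=names, dtype=object)

# header=0 treats the first line as a header and replaces it with `names`.
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=names, dtype=object)

# Printing a very wide frame can elide columns with "...", so check the
# column index itself rather than the printed table.
print(len(as_data), len(replaced))
print(list(replaced.columns))
```

Comparing len(df.columns) with len(user_cols) is a quick way to confirm whether any names were actually dropped.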
I have a bunch of DAT files that I need to convert to XLS files using Python. Should I use the CSV library to do this or is there a better way?
I'd use pandas.
import pandas as pd
df = pd.read_table('DATA.DAT')
df.to_excel('DATA.xlsx')
And of course you can set up a loop to go through all your files. Something along these lines, maybe:
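import glob
import os
import pandas as pd
os.chdir("C:\\FILEPATH\\")
for file in glob.glob("*.DAT"):
    # Show which file is being converted
    print(file)
    df = pd.read_table(file)
    # Swap only the extension; file.replace('DAT','xlsx') would also change a 'DAT' inside the name
    file1 = os.path.splitext(file)[0] + '.xlsx'
    df.to_excel(file1)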
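# Recent pandas versions pass xlsxwriter options through engine_kwargs; the output name needs an .xlsx extension
writer = pd.ExcelWriter('pandas_example.xlsx',
                        engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_urls': False}})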
Or you can use:
df.to_excel('example.xlsx')
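One detail worth spelling out for the conversion loop is how the output name is built: a plain string replace of 'DAT' also rewrites any 'DAT' inside the base name, so 'DATA.DAT' would become 'xlsxA.xlsx'. The sketch below swaps only the extension. The function names are invented for illustration, the input is assumed to be tab-delimited (read_table's default), and to_excel needs an Excel engine such as openpyxl installed.

```python
from pathlib import Path

import pandas as pd


def xlsx_path_for(dat_path):
    """Return the same path with only its extension swapped for .xlsx."""
    return Path(dat_path).with_suffix(".xlsx")


def convert_dat_folder(folder, pattern="*.DAT"):
    """Convert every matching file in folder to .xlsx (hypothetical helper).

    Returns the list of paths written.
    """
    written = []
    for dat_path in sorted(Path(folder).glob(pattern)):
        # Assumption: the DAT files are tab-delimited text.
        df = pd.read_table(dat_path)
        out_path = xlsx_path_for(dat_path)
        df.to_excel(out_path)
        written.append(out_path)
    return written
```

If the files use another delimiter, pass sep=... to read_table inside the loop.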