I have a folder with multiple .txt files, all in the same tab-separated format. I'm trying to convert them to CSV files separated by column.
I've tried a simple read_file.to_csv (r'C:\Users\Desktop\workspace\Converter\20200923.csv', index=False)
But it doesn't do the separation I'm looking for. Any suggestions are most welcome. Thank you!
Try something like this:
import os
import pandas as pd
for filename in os.listdir('path/to/dir/'):
    if filename.endswith('.txt'):
        path = os.path.join('path/to/dir/', filename)  # os.listdir returns bare file names, so rebuild the full path
        df = pd.read_table(path, sep='\t', header=None)  # header=None because you didn't say whether the first row is data; if it is a header row, just remove this
        df.to_csv(f'{path[:-3]}csv', index=False)
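If pandas is not needed for anything else, the same batch conversion can be sketched with only the standard library. This is not the method from the answer above: it swaps `pd.read_table` for the `csv` module, and the helper name `convert_tsv_dir` is made up for illustration.

```python
import csv
from pathlib import Path


def convert_tsv_dir(directory):
    """Convert every tab-separated .txt file in `directory` to a .csv next to it.

    Hypothetical helper; returns the list of CSV paths that were written.
    """
    written = []
    for txt_path in sorted(Path(directory).glob("*.txt")):
        csv_path = txt_path.with_suffix(".csv")
        # newline="" lets the csv module handle line endings itself
        with open(txt_path, newline="", encoding="utf-8") as src, \
                open(csv_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.reader(src, delimiter="\t")
            writer = csv.writer(dst)  # comma-separated; quotes fields that contain commas
            writer.writerows(reader)
        written.append(csv_path)
    return written
```

Calling `convert_tsv_dir('path/to/dir/')` leaves the original .txt files untouched and writes one .csv per input file.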
Hi there Stack Overflow community,
I have several CSV files in a folder, and I need to add a column to each CSV containing the first 8 characters of its filename. After this step I want to save the DataFrame, including the new column, back to the same file.
I get the right output, but it doesn't save the changes in the csv file :/
Maybe someone has some inspiration for me. Thanks a lot!
import glob
import os
import pandas as pd
files = glob.glob(r'path\*.csv')
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    #for i in df('date'):
    #Decoder problem
print(df)
Use df.to_csv, like this:
for fp in files:
    df = pd.concat([pd.read_csv(fp).assign(date=os.path.basename(fp).split('.')[0][:8])])
    df.to_csv(fp, index=False)  # index=False if you don't want to save the index as a new column in the csv
By the way, I think this also works and is more readable:
for fp in files:
    df = pd.read_csv(fp)
    df['date'] = os.path.basename(fp).split('.')[0][:8]
    df.to_csv(fp, index=False)
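Putting the pieces together, a minimal sketch of the whole loop might look like the following. The helper name `add_date_column` and its `folder` argument are assumptions for illustration; the column name `date` and the eight-character prefix come from the question.

```python
import glob
import os

import pandas as pd


def add_date_column(folder):
    """Add a `date` column holding the first 8 characters of each file name
    to every CSV in `folder`, then overwrite the file in place."""
    for fp in glob.glob(os.path.join(folder, "*.csv")):
        df = pd.read_csv(fp)
        # Plain assignment: re-running the function overwrites the column
        # instead of adding a second one
        df["date"] = os.path.basename(fp)[:8]
        df.to_csv(fp, index=False)
```

Because the files are overwritten, it may be worth trying this on a copy of the folder first.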
I'd like to save the output I get from this piece of code:
import pandas as pd
df = pd.read_csv("inputfile.csv",sep=";",decimal=",", nrows=100)
print (df)
to a new file that then only includes the 100 rows from the input-file?
I tried something with 'w', but that didn't really work.
Thanks for your help!
pandas can also write to *.csv files:
# Write to file with same separator/decimal setting as in your input file
df.to_csv('my_csv.csv', sep=';', decimal=',')
You will need to use the pandas.DataFrame.to_csv method:
df.to_csv("outputfile.csv", sep=";", decimal=",")
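As a quick check of what these settings do, here is a small sketch that uses in-memory text instead of real files. The sample data and the helper name `first_rows_to_csv` are made up for illustration.

```python
import io

import pandas as pd


def first_rows_to_csv(src, dst, n):
    """Read at most `n` data rows from a ';'-separated source that uses ','
    as the decimal mark, and write them to `dst` in the same format."""
    df = pd.read_csv(src, sep=";", decimal=",", nrows=n)
    # index=False leaves out the row numbers pandas would otherwise add
    df.to_csv(dst, sep=";", decimal=",", index=False)
    return df


# Invented sample: three data rows, of which only the first two are kept
sample = io.StringIO("name;price\napple;1,5\npear;2,25\nplum;0,75\n")
out = io.StringIO()
df = first_rows_to_csv(sample, out, n=2)
print(out.getvalue())
```

With real files, `src` and `dst` would simply be the input and output paths, and `n` would be 100.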
I have a text file that contains data like this. It is just a small example, but the real one is pretty similar.
I am wondering how to display such data in an "Excel Table" like this using Python?
The pandas library is wonderful for reading csv files (which is the file content in the image you linked). You can read in a csv or a txt file using the pandas library and output this to excel in 3 simple lines.
import pandas as pd
df = pd.read_csv('input.csv') # if your file is comma separated
or if your file is tab delimited '\t':
df = pd.read_csv('input.csv', sep='\t')
To save to excel file add the following:
df.to_excel('output.xlsx', sheet_name='Sheet1')
complete code:
import pandas as pd
df = pd.read_csv('input.csv') # can replace with df = pd.read_table('input.txt') for '\t'
df.to_excel('output.xlsx', sheet_name='Sheet1')
This will explicitly keep the index, so if your input file was:
A,B,C
1,2,3
4,5,6
7,8,9
Your output excel would look like this:
You can see your data has been shifted one column and your index axis has been kept. If you do not want this index column (because you have not assigned your df an index so it has the arbitrary one provided by pandas):
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)
Your output will look like:
Here you can see the index has been dropped from the excel file.
You do not need python! Just rename your text file to CSV and voila, you get your desired output :)
If you want to rename using python then -
You can use os.rename function
os.rename(src, dst)
Where src is the source file and dst is the destination file
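A minimal sketch of that rename follows; the helper name `rename_to_csv` is an assumption. Note that renaming changes only the extension, not the contents, so a tab-separated file stays tab-separated.

```python
import os
from pathlib import Path


def rename_to_csv(src):
    """Rename `src` (for example data.txt) to the same name with a .csv extension."""
    src = Path(src)
    dst = src.with_suffix(".csv")
    os.rename(src, dst)  # the same os.rename(src, dst) call as described above
    return dst
```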
XLWT
I use the XLWT library. It produces native Excel files, which is much better than simply importing text files as CSV files. It is a bit of work, but provides most key Excel features, including setting column widths, cell colors, cell formatting, etc.
Saving the DataFrame to an Excel file is then:
df.to_excel("testfile.xlsx")
I have around 60 .csv files which I would like to combine in pandas. So far I've used this:
import pandas as pd
import glob
total_files = glob.glob("something*.csv")
data = []
for csv in total_files:
    frame = pd.read_csv(csv, encoding="utf-8", sep='delimiter', engine='python')
    data.append(frame)
biggerlist = pd.concat(data, ignore_index=True)
biggerlist.to_csv("output.csv")
This works somewhat, but the files I would like to combine all have the same structure of 15 columns with the same headers. When I use this code, only one column is filled, with the contents of the entire row, and the single column name is a concatenation of all the column names (e.g. SEARCH_ROW, DATE, TEXT, etc.).
How can I combine these csv files, while keeping the same structure of the original files?
Edit:
So perhaps I should be a bit more specific regarding my data. This is a snapshot of one of the .csv files I'm using:
As you can see it is just newspaper data, where the last column is 'TEXT', which isn't shown completely when you open the file.
This is a part of how it looks when I have combined the data using my code.
Separately, I can read any one of these .csv files without a problem using
data = pd.read_csv("something.csv",encoding="utf-8", sep='delimiter', engine='python')
I solved it!
The problem was the number of commas in the text part of my .csv files. So after removing all the commas (just using search/replace), I used:
import pandas
import glob
filenames = glob.glob("something*.csv")
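frames = []
for filename in filenames:
    frames.append(pandas.read_csv(filename, encoding="utf-8", sep=";"))
# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
df = pandas.concat(frames)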
Thanks for all the help.
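Removing the commas by hand is not the only option. In standard CSV, a field that contains the delimiter is wrapped in double quotes, and parsers then treat the comma as part of the text. The sketch below shows this with the standard library `csv` module; the sample row is invented, and whether the original newspaper files were quoted this way is not known.

```python
import csv
import io

# Invented sample: the TEXT field contains two commas
rows = [
    ["DATE", "TEXT"],
    ["2020-09-23", "Prices rose, then fell, then rose again"],
]

# Write: the writer quotes any field that contains the delimiter
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)

# Read back: the quoted commas stay inside a single field
buffer.seek(0)
parsed = list(csv.reader(buffer))
print(parsed == rows)
```

If the source files are quoted like this, reading them with the correct `sep` should keep the text column intact without any search/replace.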
I am trying to read a CSV file with the code below:
import pandas as pd
user_cols = ['id','listing_type','status','listing_class','property_type','street_address','city','state','zip_4','cross_street','street_index','unit','floor','location','Latitude',
'longitude','subway','neighborhood','price','incentives','fee_type','fee_percentage','fee_details_broker',
'fee_details_clients','application_information','maintenance','taxes','max_financing','other_costs','beds',
'baths','full_baths','three_quarter_baths','half_baths','total_rooms','square_feet','exterior_square_feet',
'lot_area','lot_dimensions','date_available','date_listed','closed_on','year_built','recent_renovation',
'lease_min','lease_max','date_added','date_edited','date_update','contact','access','keys','mls_name','mls_id',
'courtesy_of','vow_opt_out','idx_opt_out','pet_details','notes','sync','private','listing_score','added_by_id',
'featured_office_id','date_expires','exclusive_file_id','condition','guarantor','blast_link']
data = pd.read_csv("C:\\Users\\Desktop\\dump-4.csv", low_memory=False, dtype=object, header=None, names=user_cols)
I am able to read the file, but when I try to display the columns, about 15-16 column names are missing. Why is this happening, and what can I do?
So when I deleted dtype=object and header=None, it did print all the columns. Not really sure what would have been the correct dtype, though! Thanks anyway! :)
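One possible explanation, offered as an assumption since the printed output is not shown: pandas abbreviates the display of wide frames with `...`, so column names can look missing even though they were read. The sketch below uses a made-up 70-column frame to show that the names are all present and how to print every column.

```python
import pandas as pd

# Made-up wide frame: 70 columns named col_0 ... col_69, one row of zeros
cols = [f"col_{i}" for i in range(70)]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

# The column index always holds every name, whatever the printout shows
all_names = df.columns.tolist()
print(len(all_names))

# With a column limit, the middle columns are replaced by "..."
with pd.option_context("display.max_columns", 10):
    truncated = repr(df)

# With the limit lifted, every column is printed (wrapped over several blocks)
with pd.option_context("display.max_columns", None):
    complete = repr(df)
```

Checking `df.columns.tolist()` is a quick way to tell a display limit apart from columns that were really dropped while reading.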