I'm trying to copy a bunch of CSV files into one big CSV.
All 3 files have the same column headers, but I renamed them according to each file's name. For example, the columns that come from arousal_a_103_happy.csv now carry that file name in their headers in the new CSV.
My issues are:
1st: the columns are copied in a very strange order. They are not reversed; every column just ends up wherever it wants to be.
2nd: the files are not copied next to each other, but along a slope. If the first file finishes at P23, the next file starts at Q24.
This is the code:
import glob
import os
import pandas as pd
def concatenate(path="C:\\Users\\User\\Desktop\\Work\\subject", outfile="C:\\Users\\User\\Desktop\\Work\\subject\\concatenated.csv"):
    os.chdir(path)
    fileList = glob.glob("*happy.csv")
    dfList = []
    print(fileList)
    i = 1
    string = "subject"
    for files in fileList:
        df = pd.read_csv("C:\\Users\\Desktop\\Work\\subject\\" + files, encoding='CP1255')  # Gets an error because of the link!
        sub = files
        i += 1
        ColNames = [sub + " Level", sub + " Description", sub + " Number", sub + " Onset_Date", sub + " Onset_Time",
                    sub + " Offset_Date", sub + " Offset_Time", sub + " Duration_Date", sub + " Duration_Time",
                    sub + " Arousal", sub + " Gaze", sub + " Movement", sub + " Vocalization", sub + " eyes covered",
                    sub + " Mother's arrousal", sub + " Transcript"]
        df.columns = ColNames
        dfList.append(df)
    concatDf = pd.concat(dfList, axis=0, ignore_index=True, verify_integrity=True)
    concatDf.to_csv(outfile, index=None)
# Fetching files
import glob
import os
import pandas as pd
path = "C:\\Users\\User\\Desktop\\Work\\subject\\"
os.chdir(path)
# Sort the names so the files are merged in a predictable order
FileNames = sorted(glob.glob("*.csv"))
print(FileNames)
# Merging all .csv from your folder 'subject'
pathout = "C:\\Users\\User\\Desktop\\Work\\subject\\"
for filename in FileNames:
    df = pd.read_csv(filename, encoding='utf-8')
    saved_column = df.tweet  # 'tweet' is the column being collected; use your own column name
    # mode='a' appends each file's column to the end of mixed.csv
    saved_column.to_csv(pathout + "mixed.csv", mode='a')
print("File created successfully: mixed.csv")
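On the "slope" layout described in the question: pd.concat with axis=0 stacks frames vertically and aligns them by column name. Since every file's columns were renamed to be unique, nothing aligns, so each file ends up in its own block of rows and its own block of columns. A minimal sketch of the side-by-side alternative, assuming the goal is one group of columns per file. The helper concat_side_by_side and the sample data are made up for illustration:

```python
import pandas as pd


def concat_side_by_side(frames_by_name):
    """Prefix each frame's columns with its name and join them column-wise."""
    renamed = []
    for name, df in frames_by_name.items():
        # Reset the index so rows line up by position 0..n-1 in every frame.
        df = df.reset_index(drop=True)
        renamed.append(df.add_prefix(name + " "))
    # axis=1 puts the frames next to each other; axis=0 would stack them
    # and, because the column names differ, fill the gaps with NaN.
    return pd.concat(renamed, axis=1)


# Tiny in-memory stand-ins for the real CSV files (hypothetical data).
frames = {
    "a_happy.csv": pd.DataFrame({"Level": [1, 2], "Arousal": [3, 4]}),
    "b_happy.csv": pd.DataFrame({"Level": [5, 6], "Arousal": [7, 8]}),
}
wide = concat_side_by_side(frames)
print(wide.shape)  # (2, 4)
```

With real files, the dictionary would be built from pd.read_csv(...) results keyed by file name. Files with different row counts would leave NaN at the bottom of the shorter ones.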
I was working on a project where I had to scrape some text files from a source. I completed this task and now have 140 text files.
This is one of the text files I scraped.
I am trying to create a dataframe with one row for each text file, so I wrote the code below:
import pandas as pd
import os
txtfolder = r'/home/spx072/Black_coffer_assignment/'  # Change to your folder path
# Find the text files
textfiles = []
for root, folder, files in os.walk(txtfolder):
    for file in files:
        if file.endswith('.txt'):
            fullname = os.path.join(root, file)
            textfiles.append(fullname)
# textfiles.sort()  # Sort the file names
# Read each of them into a dataframe
for filenum, file in enumerate(textfiles, 1):
    if filenum == 1:
        df = pd.read_csv(file, names=['data'], sep='delimiter', header=None)
        df['Samplename'] = os.path.basename(file)
    else:
        tempdf = pd.read_csv(file, names=['data'], sep='delimiter', header=None)
        tempdf['Samplename'] = os.path.basename(file)
        df = pd.concat([df, tempdf], ignore_index=True)
df = df[['Samplename', 'data']]
The code runs fine, but the dataframe I am getting looks something like this:
I want each text file to sit in a single row, like:
1.txt should be in df['data'][0],
2.txt should be in df['data'][1], and so on.
I tried different code and also checked several questions, but I am still unable to get the desired result. Can anyone help?
I'm not sure why you need pd.read_csv() for this. Try it with pure Python:
result = pd.DataFrame(columns=['Samplename', 'data'])
for file in textfiles:
    with open(file) as f:
        data = f.read()
    result = pd.concat([result, pd.DataFrame({'Samplename': file, 'data': data}, index=[0])], axis=0, ignore_index=True)
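A variation on the same idea, as a rough sketch: collect one dictionary per file in a plain list and build the dataframe once at the end, which avoids calling pd.concat on every iteration. The function name text_files_to_frame, the UTF-8 encoding, and the two demo files are assumptions for illustration, not part of the original code:

```python
import os
import tempfile

import pandas as pd


def text_files_to_frame(paths):
    """Return a DataFrame with one row per file: its base name and full text."""
    rows = []
    for path in paths:
        # The encoding is an assumption; adjust it to match the scraped files.
        with open(path, encoding="utf-8") as handle:
            rows.append({"Samplename": os.path.basename(path), "data": handle.read()})
    return pd.DataFrame(rows, columns=["Samplename", "data"])


# Self-contained demo with two throwaway files.
with tempfile.TemporaryDirectory() as folder:
    demo_paths = []
    for name, text in [("1.txt", "first line\nsecond line"), ("2.txt", "only line")]:
        full = os.path.join(folder, name)
        with open(full, "w", encoding="utf-8") as handle:
            handle.write(text)
        demo_paths.append(full)
    demo = text_files_to_frame(demo_paths)

print(demo.shape)  # (2, 2)
```

Here demo['data'][0] holds the whole content of 1.txt, line breaks included, which matches the layout asked for in the question.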
I'm looking to read multiple CSV files and then save each one into a separately named dataframe.
path = os.getcwd()
csv_files = glob.glob(os.path.join(r'blabla/Data', "*.csv"))
for f in csv_files:
    df = pd.read_csv(f)
    print('Location:', f)
    print('File Name:', f.split("\\")[-1])
    print('Content:')
    df.pop('Unnamed: 0')
    display(df)
    print()
When I display(df) within the loop, it displays all three tables in the 3 csv files in that folder. However, when I print df outside the loop, it only gives me the last table that was generated. How do I save each table from the csv file into separate data frames?
It seems like you're overwriting the same variable on every pass through the loop. Append each dataframe to a list instead:
path = os.getcwd()
csv_files = glob.glob(os.path.join(r'blabla/Data', "*.csv"))
list_of_dfs = []
for f in csv_files:
    df = pd.read_csv(f)
    print('Location:', f)
    print('File Name:', f.split("\\")[-1])
    print('Content:')
    df.pop('Unnamed: 0')
    display(df)
    list_of_dfs.append(df)
Access the individual dataframes with list_of_dfs[0], list_of_dfs[1], and so on.
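If positions in a list are hard to keep track of, a dictionary keyed by file name is a common alternative. This is only a sketch: load_csvs_by_name and the temporary demo files are hypothetical, and it assumes each file's base name is unique within the folder:

```python
import glob
import os
import tempfile

import pandas as pd


def load_csvs_by_name(folder):
    """Map each CSV's base name (without extension) to its DataFrame."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        key = os.path.splitext(os.path.basename(path))[0]
        frames[key] = pd.read_csv(path)
    return frames


# Demo with two small files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(folder, "first.csv"), index=False)
    pd.DataFrame({"x": [3]}).to_csv(os.path.join(folder, "second.csv"), index=False)
    tables = load_csvs_by_name(folder)

print(sorted(tables))  # ['first', 'second']
```

Each table is then available as tables['first'], tables['second'], and so on, without creating a separate variable per file.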
I've a list of csv files (approx. 100) that I'd like to include in one single csv file.
The list is found using
PATH_DATA_FOLDER = 'mypath/'
list_files = os.listdir(PATH_DATA_FOLDER)
for f in list_files:
    list_columns = list(pd.read_csv(os.path.join(PATH_DATA_FOLDER, f)).columns)
    df = pd.DataFrame(columns=list_columns)
    print(df)
Which returns the files (it is just a sample, since I have 100 and more files):
['file1.csv', 'name2.csv', 'example.csv', '.DS_Store']
Unfortunately, this also includes hidden files, which I'd like to exclude.
Each file has the same columns:
Columns: [Name, Surname, Country]
I'd like to find a way to create one unique file with all these fields, plus information of the original file (e.g., adding a new column with the file name).
I've tried with
df1 = pd.read_csv(os.path.join(PATH_DATA_FOLDER, f))
df1['File'] = f # file name
df = df.append(df1)
df = df.reset_index(drop=True).drop_duplicates() # I'd like to drop duplicates in both Name and Surname
but it returns a dataframe with the last entry, so I guess the problem is in the for loop.
I hope you can provide some help.
import glob
import pandas as pd
extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))
# Combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
# Drop duplicates and reset the index (both return a new frame, so assign the result)
combined_csv = combined_csv.drop_duplicates().reset_index(drop=True)
# Save the combined file
combined_csv.to_csv("combined_csv.csv", index=False, encoding='utf-8-sig')
Have you tried using glob?
import glob
import pandas as pd
filenames = glob.glob("mypath/*.csv")  # list of all your csv files
# DataFrame.append was removed in pandas 2.0, so read every file and concatenate once
df = pd.concat([pd.read_csv(filename) for filename in filenames], ignore_index=True)
df = df.drop_duplicates().reset_index(drop=True)
Another way would be to concatenate the CSV files with the cat command after removing the headers, and then read the concatenated file with pd.read_csv.
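Neither snippet above records which file a row came from, which the question also asked for. A hedged sketch that adds a File column and drops duplicates on Name and Surname only. The function combine_with_source and the demo data are invented for illustration; the column names come from the question:

```python
import glob
import os
import tempfile

import pandas as pd


def combine_with_source(folder):
    """Stack every *.csv in `folder`, tagging rows with their file name."""
    frames = []
    # The "*.csv" pattern never matches .DS_Store, so no extra filter is needed.
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        frame = pd.read_csv(path)
        frame["File"] = os.path.basename(path)
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True)
    # Keep the first occurrence of each (Name, Surname) pair.
    return combined.drop_duplicates(subset=["Name", "Surname"]).reset_index(drop=True)


# Demo: two small CSVs plus a hidden file, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    pd.DataFrame(
        {"Name": ["Ann", "Bob"], "Surname": ["Lee", "Ray"], "Country": ["IT", "FR"]}
    ).to_csv(os.path.join(folder, "file1.csv"), index=False)
    pd.DataFrame(
        {"Name": ["Ann", "Cy"], "Surname": ["Lee", "Poe"], "Country": ["IT", "US"]}
    ).to_csv(os.path.join(folder, "name2.csv"), index=False)
    open(os.path.join(folder, ".DS_Store"), "w").close()
    merged = combine_with_source(folder)

print(len(merged))  # 3
```

Note that pd.concat raises a ValueError when the folder contains no CSV files at all, so a real script may want to check the list first.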
I am trying to loop through my files in different folders.
The first part of the code is working:
from os import walk
import pandas as pd
path = r'C:\Users\Sarah\Desktop\test2'
my_files = []
for (dirpath, dirnames, filenames) in walk(path):
    my_files.extend(filenames)
print(my_files)
The code successfully prints all the files in my subfolders.
However, the problem comes in this part, where I try to extract Excel columns from different files and save them in a dictionary:
all_dicts_list = []
for file_name in my_files:
    # Display sheet names using pandas
    pd.set_option('display.width', 300)
    mosul_file = file_name
    xl = pd.ExcelFile(mosul_file)
    mosul_df = xl.parse(0, header=[1], index_col=[0, 1, 2])
    # Read Excel and select columns
    mosul_file = pd.read_excel(file_name, sheet_name=0,
                               index_col=None, na_values=['NA'], usecols="C , F ,G")
    # Remove NaN values
    data_mosul_df = mosul_file.apply(pd.to_numeric, errors='coerce')
    data_mosul_df = mosul_file.dropna()
    # Save to dictionary
    datamosulx = data_mosul_df.to_dict()
    all_dicts_list.append(datamosulx)
All dictionaries will be in all_dicts_list.
I get the error FileNotFoundError: [Errno 2] No such file or directory. I don't understand the problem or how to fix it.
Thank you
It's hard to tell because you might have lost some of the formatting when copying and pasting, but make sure that after the line
for file_name in my_files:
everything that you want inside the for loop is indented with tabs or spaces to the same level.
Print out mosul_file after assigning it to see whether this could be the case, and then indent appropriately.
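Another likely cause is worth checking: os.walk yields bare file names in filenames, without the folder they live in, so opening them only works for files that happen to be in the current working directory. A small sketch that stores full paths instead. The helper list_files and the extension filter are assumptions added for illustration:

```python
import os
import tempfile


def list_files(root, extensions=(".xlsx", ".xls")):
    """Return full paths of files under `root` whose names end with `extensions`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                # Join with dirpath so the path works from any working directory.
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Demo: one file at the top level, one in a subfolder, one with another extension.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for relative in ("a.xlsx", os.path.join("sub", "b.xlsx"), "notes.txt"):
        open(os.path.join(root, relative), "w").close()
    paths = list_files(root)
    all_exist = all(os.path.isfile(p) for p in paths)

print(len(paths), all_exist)  # 2 True
```

Passing these full paths to pd.ExcelFile or pd.read_excel should avoid a FileNotFoundError for files inside subfolders, provided the missing folder part was indeed the problem.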
I have a folder containing about 500 .mp4 files :
abc.mp4
lmn.mp4
ijk.mp4
Also I have a .csv file containing the file names (>500) and some values associated with them:
file name value
abc.mp4 5
xyz.mp4 3
lmn.mp4 5
rgb.mp4 4
I want to match the file names of .csv and folder and then place the mp4 files in separate folders depending on the value.
**folder 5:**
abc.mp4
lmn.mp4
**folder 3:**
xyz.mp4
and so on
I tried link
names = []
names1 = []
for dirname, dirnames, filenames in os.walk('./videos_test'):
    for filename in filenames:
        if filename.endswith('.mp4'):
            names.append(filename)
file = open('names.csv', encoding='utf-8-sig')
lns = csv.reader(file)
for line in lns:
    nam = line[0]
    sc = line[1]
    names1.append(nam)
    if nam in names:
        print(nam, line[1])
        if line[1] == 5:
            print('5')
            print(nam)  # just prints the name of the file, does not save it
        elif line[1] == 3:
            print('3')
            print(nam)
This does not give any result.
I'd recommend using pandas if you're going to handle CSV files.
Here's some code that automatically creates the folders and puts the files in the right place for you, using shutil and pandas. I have assumed that your CSV's columns are "filename" and "value". Change them if there's a mismatch.
import pandas as pd
import shutil
import os
path_to_csv_file = "file.csv"
df = pd.read_csv(path_to_csv_file)
mp4_root = "mp4_root"
destination_path = "destination_path"
# Remove the destination folder if it was created previously. You can delete this if you don't like it.
if os.path.isdir(destination_path):
    shutil.rmtree(destination_path)
os.mkdir(destination_path)
unique_values = pd.unique(df['value'])
for u in unique_values:
    os.mkdir(os.path.join(destination_path, str(u)))
# Iterate over the rows of the csv file and join the value and the filename onto destination_path to get the new folder structure.
for index, row in df.iterrows():
    cur_path = os.path.join(destination_path, str(row['value']), str(row['filename']))
    source_path = os.path.join(mp4_root, str(row['filename']))
    shutil.copyfile(source_path, cur_path)
EDIT: If a file is listed in the CSV but not present in the source folder, you can check for it beforehand (more Pythonic), or you could handle it with a try/except block (not recommended).
Check the code below.
source_files = os.listdir(mp4_root)
for index, row in df.iterrows():
    if str(row['filename']) not in source_files:
        continue
    cur_path = os.path.join(destination_path, str(row['value']), str(row['filename']))
    source_path = os.path.join(mp4_root, str(row['filename']))
    shutil.copyfile(source_path, cur_path)
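For completeness, the same job can be done with the csv module from the question. One detail to keep in mind is that csv.reader yields every field as a string, so a comparison such as line[1] == 5 is never true, while line[1] == '5' can be. The sketch below uses the string value directly as the folder name. The function sort_by_value, the two-column layout, and the demo files are assumptions for illustration:

```python
import csv
import os
import shutil
import tempfile


def sort_by_value(csv_path, source_dir, dest_dir):
    """Copy each listed file that exists in source_dir into dest_dir/<value>/."""
    copied = []
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            name, value = row[0], row[1]  # both are strings, e.g. "5"
            source = os.path.join(source_dir, name)
            if not os.path.isfile(source):
                continue  # listed in the CSV but not on disk (a header row is skipped too)
            target_dir = os.path.join(dest_dir, value)
            os.makedirs(target_dir, exist_ok=True)
            shutil.copyfile(source, os.path.join(target_dir, name))
            copied.append((name, value))
    return copied


# Demo: two empty stand-in videos and a listing that also names a missing file.
with tempfile.TemporaryDirectory() as base:
    videos = os.path.join(base, "videos_test")
    sorted_dir = os.path.join(base, "sorted")
    os.makedirs(videos)
    for name in ("abc.mp4", "lmn.mp4"):
        open(os.path.join(videos, name), "w").close()
    listing = os.path.join(base, "names.csv")
    with open(listing, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows([["abc.mp4", "5"], ["xyz.mp4", "3"], ["lmn.mp4", "5"]])
    copied = sort_by_value(listing, videos, sorted_dir)
    folder_5 = sorted(os.listdir(os.path.join(sorted_dir, "5")))

print(copied)  # [('abc.mp4', '5'), ('lmn.mp4', '5')]
```

Swapping shutil.copyfile for shutil.move would relocate the videos instead of duplicating them, if that is what is wanted.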