Appending several panda dataframes are not working - python

I have this code
import os
import pandas as pd
path = r'c:\Temp\factory'
os.chdir(path)
files = os.listdir()
files_csv = [f for f in files if f[-3:] == 'csv']
x = pd.DataFrame()
for f in files_csv:
data = pd.read_csv(f, sep=';', encoding='latin-1')
x = x.append(data, ignore_index=True)
I have used the same code before to concatenate CSV files but now it just does not work.
The problem i face is that only the content of one file makes it to the dataframe by name x.
I know i process all files and i expect the x dataframe to contain in total about 10000 rows but i only get the content of one file aproximatley 2000 rows.
My files typically looks like this:
Computer;Managed by;Given Name
cp1;user1;olle
cp2;user2;niklas
cp3;user3;kalle

I've needed to do something similar before. My solution would be:
x = pd.DataFrame()
for f in files_csv:
data = pd.read_csv(f, sep=';', encoding='latin-1')
x = pd.concat([x, data], ignore_index=True, axis=1)

Related

Combining CSV files into Dataframe python

I am trying to add data from several files in a folder to a data frame. Each .csv file has varying lengths but has the same number of columns. I am trying to add all of them to one data frame with ignoring the index so that the new data frame is just vertically combined. For some reason every time I try to concatenate the data I am left with ~ 363 columns when there should only be 9. Each csv file has the same number of columns so I am confused.
import os
import pandas as pd
import glob
cwd = os.getcwd()
folder = cwd +'\\downloads\\prepared_csv_files\\prepared_csv_files\\'
all_files = glob.glob(folder + "/*.csv")
li = []
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
I have also tried
final_df = pd.DataFrame(li, columns = ['tool','pressure'])
# and I name all columns not doing it now
here final is the name of the final dataset.
I am assuming tool and pressure are the columns name in your all .csv files
final = pd.DataFrame(columns = ['tool','pressure'])
for filename in all_files:
df = pd.read_csv(filename)
df = pd.DataFrame(df)
final = pd.concat([final,df],ignore_index= True,join="inner")

Python Pandas join a few files

I import a few xlsx files into pandas dataframe. It works fine, but my problem that it copies all the data under each other (so I have 10 excel file with 100 lines = 1000 lines).
I need the Dataframe with 100 lines and 10 columns, so each file will be copied next to each other and not below.
Are there any ideas how to do it?
import os
import pandas as pd
os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.DataFrame()
for f in files:
info = pd.read_excel(f,'Sheet1')
allNames = allNames.append(info)
writer = pd.ExcelWriter ('Output.xlsx')
allNames.to_excel(writer, 'Copy')
writer.save()
You can feed your spreadsheets as an array of dataframes directly to pd.concat():
import os
import pandas as pd
os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.concat([pd.read_excel(f,'Sheet1') for f in files], axis=1)
writer = pd.ExcelWriter ('Output.xlsx')
allNames.to_excel(writer, 'Copy')
writer.save()
Instead of stacking the tables vertically like this:
allNames = allNames.append(info)
You'll want to concatenate them horizontally like this:
allNames = pd.concat([allNames , info], axis=1)

Read selected data from multiple files

I have 200 .txt files and need to extract one row data from each file and create a different dataframe.
For example (abc1.txt,abc2.txt, .etc) set of files and i need to extract 5th row data from each file and create a dataframe. When reading files, columns need to be separated by '/t' sign.
like this
data = pd.read_csv('abc1.txt', sep="\t", header=None)
I can not figure out how to do all this with a loop. Can you help?
Here is my answer:
import pandas as pd
from pathlib import Path
path = Path('path/to/dir')
files = path.glob('*.txt')
to_concat = []
for f in files:
df = pd.read_csv(f, sep="\t", header=None, nrows=5).loc[4:4]
to_concat.append(df)
result = pd.concat(to_concat)
I have used nrows to read only first 5 rows and then .loc[4:4] to get dataframe rather than series (when you use .loc[4].
Here you go:
import os
import pandas as pd
directory = 'C:\\Users\\PC\\Desktop\\datafiles\\'
aggregate = pd.DataFrame()
for filename in os.listdir(directory):
if filename.endswith(".txt"):
data = pd.read_csv(directory+filename, sep="\t", header=None)
row5 = pd.DataFrame(data.iloc[4]).transpose()
aggregate = aggregate.append(row5)

Join multiple excel files into one excel file via python

I need to combine all the excel files in my directory into one excel file.
for example, I've 3 excel files:
file1:
file 2:
file 3:
I need to concatenate them and get an output as follows
but instead, they were appended one after another
this is my code :
import pandas as pd
import numpy as np
import glob
all_data = pd.DataFrame()
for f in glob.glob('C:/Users/test-results/FinalResult/05-01-2019/*.xlsx'):
df = pd.read_excel(f)
all_data = all_data.append(df, ignore_index=True)
writer = pd.ExcelWriter('mycollected_data.xlsx', engine='xlsxwriter')
all_data.to_excel(writer, sheet_name='Sheet1')
writer.save()
during my quest, all I found was how to append dfs,as shown in my code and I didn't figure out how too use join
You could use
files = glob.glob('C:/Users/test-results/FinalResult/05-01-2019/*.xlsx')
dfs = (pd.read_excel(f, index_col=0) for f in files)
all_data = pd.concat(dfs, axis=1)
Try this:
all_data = pd.concat([all_data,df],axis=1)
all_data = all_data.merge(df, on = ['first_column_name'], how = 'left')

Adding file name in a Column while merging multible csv files to pandas- Python

I have multiple csv files in the same folder with all the same data columns,
20100104 080100;5369;5378.5;5365;5378;2368
20100104 080200;5378;5385;5377;5384.5;652
20100104 080300;5384.5;5391.5;5383;5390;457
20100104 080400;5390.5;5391;5387;5389.5;392
I want to merge the csv files into pandas and add a column with the file name to each line so I can track where it came from later. There seems to be similar threads but I haven't been able to adapt any of the solutions. This is what I have so far. The merge data into one data frame works but I'm stuck on the adding file name column,
import os
import glob
import pandas as pd
path = r'/filepath/'
all_files = glob.glob(os.path.join(path, "*.csv"))
names = [os.path.basename(x) for x in glob.glob(path+'\*.csv')]
list_ = []
for file_ in all_files:
list_.append(pd.read_csv(file_,sep=';', parse_dates=[0], infer_datetime_format=True,header=None ))
df = pd.concat(list_)
Instead of using a list just use DataFrame's append.
df = pd.DataFrame()
for file_ in all_files:
file_df = pd.read_csv(file_,sep=';', parse_dates=[0], infer_datetime_format=True,header=None )
file_df['file_name'] = file_
df = df.append(file_df)

Categories