I have managed to read the following log file into python:
import os
import glob
import pandas as pd
folder = r'C:\Users\x\x\x\x\\'
for infile in glob.glob(os.path.join(folder, 'console*')):
    file = open(infile, 'r').read()
    print(file)
print(file) gives me:
John, 1,7,8, text
Matt, 3,7,10, text2
Natasha, 4,60,3,text3
I am hoping to convert this into a pandas df:
df = pd.DataFrame(file)
but I am getting a ValueError: DataFrame constructor not properly called!
Does anyone know how to construct the DataFrame of 3 rows by 5 columns and then add in my own column headers? Thanks very much!
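If the lines are comma-separated like that, one way to build the 3-by-5 frame (a sketch; the column names here are invented, since the question doesn't give any) is to wrap the text in `io.StringIO` and hand it to `pd.read_csv` instead of passing the raw string to the `DataFrame` constructor:

```python
import io
import pandas as pd

# the text as read from the log file
raw = """John, 1,7,8, text
Matt, 3,7,10, text2
Natasha, 4,60,3,text3"""

# read_csv parses the string as CSV; the header names are illustrative
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["name", "a", "b", "c", "note"],
    skipinitialspace=True,  # strip the spaces after the commas
)
print(df.shape)  # (3, 5)
```

The `DataFrame` constructor itself only accepts data such as dicts, lists, or arrays, which is why passing one big string raises the "constructor not properly called" error.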
import os
import glob
import pandas as pd
folder = 'C:\\'
filename2 = glob.glob(f'{folder}\\*.*')
# In the case of .csv files.
dfs = []
for z in filename2:
    df = pd.read_csv(z, header=None)
    dfs.append(df)
df_cc = pd.concat(dfs, ignore_index=True)
I have a code that merges all txt files from a directory into a dataframe
follow the code below
import pandas as pd
import os
import glob
diretorio = r"F:\PROJETOS\LOTE45\ARQUIVOS\RISK\RISK_CUSTOM_FUND_N1"
files = [pd.read_csv(file, delimiter='\t')
         for file in glob.glob(os.path.join(diretorio, "*.txt"))]
df = pd.concat(files, ignore_index=True)
df
That gives this table.
I need to add a date column to this table, but the date is only available at the end of the filename. How can I extract the date from the end of the filename and put it into the dataframe? I have no idea how to do this.
Assuming the file naming pattern is constant, you can parse the end of the filename on every iteration of the loop this way:
from datetime import datetime

files = []
for file in glob.glob(os.path.join(diretorio, "*.txt")):
    df_f = pd.read_csv(file, delimiter='\t')
    # assumes an 8-character ddmmyyyy date right before the ".txt" extension
    df_f['date'] = datetime.strptime(file[-12:-4], "%d%m%Y")
    files.append(df_f)
df = pd.concat(files, ignore_index=True)
import pandas as pd
import os
diretorio = "F:/PROJETOS/LOTE45/ARQUIVOS/RISK/RISK_CUSTOM_FUND_N1/"
files = []
for filename in os.listdir(diretorio):
    if filename.endswith(".csv"):
        df = pd.read_csv(diretorio + filename, sep=";")
        df['Date'] = filename.split('.')[0].split("_")[-1]
        files.append(df)
df = pd.concat(files, ignore_index=True)
print(df)
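The split-based extraction above can be checked in isolation. Assuming a filename like `RISK_CUSTOM_FUND_31122020.csv` (the name is invented for illustration; only the `_ddmmyyyy` suffix matters), the date token comes out like this:

```python
from datetime import datetime

filename = "RISK_CUSTOM_FUND_31122020.csv"  # hypothetical filename
token = filename.split('.')[0].split("_")[-1]  # drop the extension, take the last "_" part
date = datetime.strptime(token, "%d%m%Y")
print(token, date.date())  # 31122020 2020-12-31
```

Splitting on "_" and taking the last piece is more robust than slicing by position, since it keeps working even if the prefix length changes.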
I have created a dataframe using pandas in Python. The dataframe uses two columns from a .csv file, filepath and filename, joins them, and then outputs them in full.
I am trying to use this output to zip the filename in question but it isn't working properly and just overwrites the file.
import pandas as pd
import zipfile
import os
from os import path
from os.path import basename
column_names = ["Path", "Filename", r"Path"]
df = pd.read_csv(r"resources.csv", usecols= ["Path","Filename"])
df = df.dropna()
df = ["/".join(i) for i in zip(df["Path"].map(str), df["Filename"].map(str))]
rows = list(df)
for row in rows:
    print(row)
I added the zipfile.ZipFile entries in the for row in rows block but replaced with print(row) to produce the list.
Can anybody help point me in the right direction?
import pandas as pd
import zipfile
import os
from os import path
from os.path import basename
column_names = ["Path", "Filename", r"Path"]
df = pd.read_csv(r"resources.csv", usecols=["Path", "Filename"])
df["fullpath"] = df[["Path", "Filename"]].agg("/".join, axis=1)
df["zipfilename"] = df["Filename"].str.replace(".py", "", regex=False)
rows = list(df.values)
for row in rows:
    zf = zipfile.ZipFile(row[3] + '.zip', 'w', zipfile.ZIP_DEFLATED)
    zf.write(row[2], basename(row[1]))
    zf.close()
    print(row)
After some head scratching I managed to get exactly what I needed from the dataframe and zip the individual files.
I import a few xlsx files into a pandas dataframe. It works fine, but my problem is that it copies all the data under each other (so I have 10 Excel files with 100 lines = 1000 lines).
I need a DataFrame with 100 lines and 10 columns, so each file is copied next to the others, not below them.
Any ideas how to do it?
import os
import pandas as pd
os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.DataFrame()
for f in files:
    info = pd.read_excel(f, 'Sheet1')
    allNames = allNames.append(info)
writer = pd.ExcelWriter('Output.xlsx')
allNames.to_excel(writer, 'Copy')
writer.save()
You can feed your spreadsheets as a list of dataframes directly to pd.concat():
import os
import pandas as pd
os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.concat([pd.read_excel(f, 'Sheet1') for f in files], axis=1)
with pd.ExcelWriter('Output.xlsx') as writer:
    allNames.to_excel(writer, 'Copy')
Instead of stacking the tables vertically like this:
allNames = allNames.append(info)
You'll want to concatenate them horizontally like this:
allNames = pd.concat([allNames, info], axis=1)
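A minimal illustration of the difference, on toy frames rather than the spreadsheet data:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"y": [3, 4]})

# default axis=0 stacks vertically: 4 rows (with NaN where columns differ)
stacked = pd.concat([a, b])
# axis=1 places the frames side by side: 2 rows, 2 columns
side_by_side = pd.concat([a, b], axis=1)
print(stacked.shape, side_by_side.shape)  # (4, 2) (2, 2)
```

Note that `axis=1` aligns rows by index, so if the files do not share the same row index the result will contain NaN gaps.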
I have 200 .txt files and need to extract one row of data from each file and create a single dataframe.
For example (abc1.txt, abc2.txt, etc.), I need to extract the 5th row of data from each file and create a dataframe. When reading the files, columns need to be separated by the '\t' character,
like this
data = pd.read_csv('abc1.txt', sep="\t", header=None)
I can not figure out how to do all this with a loop. Can you help?
Here is my answer:
import pandas as pd
from pathlib import Path
path = Path('path/to/dir')
files = path.glob('*.txt')
to_concat = []
for f in files:
    df = pd.read_csv(f, sep="\t", header=None, nrows=5).loc[4:4]
    to_concat.append(df)
result = pd.concat(to_concat)
I have used nrows to read only the first 5 rows, and then .loc[4:4] to get a dataframe rather than a series (which is what .loc[4] returns).
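An equivalent way to grab just the 5th line (shown here on an in-memory file rather than a real .txt) is `skiprows=4` with `nrows=1`, which skips straight to the row of interest and avoids the `.loc[4:4]` step:

```python
import io
import pandas as pd

# six tab-separated rows standing in for one of the .txt files
text = "a\tb\nc\td\ne\tf\ng\th\ni\tj\nk\tl\n"

# skip the first 4 lines, then read exactly one: the 5th row
row5 = pd.read_csv(io.StringIO(text), sep="\t", header=None, skiprows=4, nrows=1)
print(row5.values.tolist())  # [['i', 'j']]
```

This can be slightly cheaper on large files, since pandas never parses anything past the 5th line.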
Here you go:
import os
import pandas as pd
directory = 'C:\\Users\\PC\\Desktop\\datafiles\\'
rows = []
for filename in os.listdir(directory):
    if filename.endswith(".txt"):
        data = pd.read_csv(directory + filename, sep="\t", header=None)
        # double brackets keep a one-row DataFrame instead of a Series
        rows.append(data.iloc[[4]])
aggregate = pd.concat(rows, ignore_index=True)
Please help me to find solution for the problem with importing data from multiple csv files to one DataFrame in python.
Code is:
import pandas as pd
import os
import glob
path = r'my_full_path'
os.chdir(path)
results = pd.DataFrame()
for counter, current_file in enumerate(glob.glob("*.csv")):
    namedf = pd.read_csv(current_file, header=None, sep=",", delim_whitespace=True)
    results = pd.concat([results, namedf], join='outer')
results.to_csv('Result.csv', index=None, header=None, sep=",")
The problem is that some of the data ends up in new rows instead of in new columns as required.
What is wrong in my code?
P.S.: I found questions about importing multiple csv-files to DataFrame, for example here: Import multiple csv files into pandas and concatenate into one DataFrame, but solution doesn't solve my issue:-(
It was solved by reading each file with read_csv(), appending the dataframes to a list, and then joining them column-wise with concat:
def get_merged_files(files_list, **kwargs):
    dataframes = []
    for file in files_list:
        df = pd.read_csv(os.path.join(file), **kwargs)
        dataframes.append(df)
    return pd.concat(dataframes, axis=1)
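For example, calling that helper on two small CSVs (written to a temporary directory here purely for the demonstration) keeps each file's columns side by side:

```python
import os
import tempfile
import pandas as pd

def get_merged_files(files_list, **kwargs):
    dataframes = []
    for file in files_list:
        df = pd.read_csv(os.path.join(file), **kwargs)
        dataframes.append(df)
    return pd.concat(dataframes, axis=1)

# two throwaway CSVs standing in for the real input files
with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, "a.csv"), os.path.join(d, "b.csv")
    with open(p1, "w") as f:
        f.write("x\n1\n2\n")
    with open(p2, "w") as f:
        f.write("y\n3\n4\n")
    merged = get_merged_files([p1, p2])
    print(merged.shape)  # (2, 2): columns from both files, rows aligned by index
```

Any keyword arguments (e.g. `sep=";"`, `header=None`) are forwarded straight to read_csv via `**kwargs`.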
You can try using this:
import pandas as pd
import os
files = os.listdir('./Your_Folder')  # Here is where all the files are located.
all_csv_files = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Your_Folder/" + file)
    all_csv_files = pd.concat([all_csv_files, df])
all_csv_files.to_csv("All_CSV_Files_Concat.csv", index=False)