I am reading csv files form multiple zip files to a dataframe and then using .to_csv to save the df with the below code.
import glob
import zipfile
import pandas as pd
dfs = []
for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\*.zip"):
zf = zipfile.ZipFile(zip_file)
dfs += [pd.read_csv(zf.open(f), header=None, sep=";", encoding='latin1') for f in zf.namelist()]
df = pd.concat(dfs,ignore_index=True)
df.to_csv("C:\Users\harsh\Desktop\Temp\data.csv")
However, I am getting a single column with , seperator
example:
0
0 Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,...
1 SC3,05/08/00,Albion Rvs,East Fife,0,1,A,0,0,D,...
...
215179 ,,,,,,,,,
There are NaN values as well in the df
Is there any way to save the df with proper structure and data in respective columns?
Related
Im new to python i have an issue with reading one column from different csv files
this codes works but i gives me all NAN values. The column length is different in some of the csv files
import pandas as pd
from glob import glob
def read_row(fn):
return pd.read_csv(fn, sep=r"\s+", usecols=[10])
files = glob('./*/*.csv')
df = pd.concat([read_row(fn) for fn in files], axis=1)
df = df.reset_index(drop=True)
df
Thanks
It worked i just changed sep=r"\s+" to delimiter=','
I import a few xlsx files into pandas dataframe. It works fine, but my problem that it copies all the data under each other (so I have 10 excel file with 100 lines = 1000 lines).
I need the Dataframe with 100 lines and 10 columns, so each file will be copied next to each other and not below.
Are there any ideas how to do it?
import os
import pandas as pd
os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.DataFrame()
for f in files:
info = pd.read_excel(f,'Sheet1')
allNames = allNames.append(info)
writer = pd.ExcelWriter ('Output.xlsx')
allNames.to_excel(writer, 'Copy')
writer.save()
You can feed your spreadsheets as an array of dataframes directly to pd.concat():
import os
import pandas as pd
os.chdir('C:/Users/folder/')
path = ('C:/Users/folder/')
files = os.listdir(path)
allNames = pd.concat([pd.read_excel(f,'Sheet1') for f in files], axis=1)
writer = pd.ExcelWriter ('Output.xlsx')
allNames.to_excel(writer, 'Copy')
writer.save()
Instead of stacking the tables vertically like this:
allNames = allNames.append(info)
You'll want to concatenate them horizontally like this:
allNames = pd.concat([allNames , info], axis=1)
I have 200 .txt files and need to extract one row data from each file and create a different dataframe.
For example (abc1.txt,abc2.txt, .etc) set of files and i need to extract 5th row data from each file and create a dataframe. When reading files, columns need to be separated by '/t' sign.
like this
data = pd.read_csv('abc1.txt', sep="\t", header=None)
I can not figure out how to do all this with a loop. Can you help?
Here is my answer:
import pandas as pd
from pathlib import Path
path = Path('path/to/dir')
files = path.glob('*.txt')
to_concat = []
for f in files:
df = pd.read_csv(f, sep="\t", header=None, nrows=5).loc[4:4]
to_concat.append(df)
result = pd.concat(to_concat)
I have used nrows to read only first 5 rows and then .loc[4:4] to get dataframe rather than series (when you use .loc[4].
Here you go:
import os
import pandas as pd
directory = 'C:\\Users\\PC\\Desktop\\datafiles\\'
aggregate = pd.DataFrame()
for filename in os.listdir(directory):
if filename.endswith(".txt"):
data = pd.read_csv(directory+filename, sep="\t", header=None)
row5 = pd.DataFrame(data.iloc[4]).transpose()
aggregate = aggregate.append(row5)
I am trying to merge all found csv files in a given directory. The problem is that all csv files have almost the same header, only one column differs. I want to add that column from all csv files to the merged csv file(and also 4 common columns for all csv).
So far, I have this:
import pandas as pd
from glob import glob
interesting_files = glob(
"C:/Users/iulyd/Downloads/*.csv")
df_list = []
for filename in sorted(interesting_files):
df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list, sort=False)
full_df.to_csv("C:/Users/iulyd/Downloads/merged_pands.csv", index=False)
With this code I managed to merge all csv files, but the problem is that some columns are empty in the first "n" rows, and only after some rows they get their proper values(from the respective csv). How can I make the values begin normally, after the column header?
Probably just you need add the name columns :
import pandas as pd
from glob import glob
interesting_files = glob(
"D:/PYTHON/csv/*.csv")
df_list = []
for filename in sorted(interesting_files):
print(filename)
#time,latitude,longitude
df_list.append(pd.read_csv(filename,usecols=["time", "latitude", "longitude","altitude"]))
full_df = pd.concat(df_list, sort=False)
print(full_df.head(10))
full_df.to_csv("D:/PYTHON/csv/mege.csv", index=False)
Please help me to find solution for the problem with importing data from multiple csv files to one DataFrame in python.
Code is:
import pandas as pd
import os
import glob
path = r'my_full_path'
os.chdir(path)
results = pd.DataFrame()
for counter, current_file in enumerate(glob.glob("*.csv")):
namedf = pd.read_csv(current_file, header=None, sep=",", delim_whitespace=True)
results = pd.concat([results, namedf], join='outer')
results.to_csv('Result.csv', index=None, header=None, sep=",")
The problem is that some part of data are moving to the rows instead of new columns as required.
What is wrong in my code?
P.S.: I found questions about importing multiple csv-files to DataFrame, for example here: Import multiple csv files into pandas and concatenate into one DataFrame, but solution doesn't solve my issue:-(
it was solved by using join inside of pd.read_csv.read_csv() -> append(dataFrames) -> concat:
def get_merged_files(files_list, **kwargs):
dataframes = []
for file in files_list:
df = pd.read_csv(os.path.join(file), **kwargs)
dataframes.append(df)
return pd.concat(dataframes, axis=1)
You can try using this:
import pandas as pd
import os
files = [file for file in os.listdir('./Your_Folder')] # Here is where all the files are located.
all_csv_files = pd.DataFrame()
for file in files:
df = pd.read_csv("./Your_Folder/"+file)
all_csv_files = pd.concat([all_csv_files, df])
all_csv_files.to_csv("All_CSV_Files_Concat.csv", index=False)