How to skip rows while importing a CSV in Python?

How do I skip rows based on a certain value in the first column of the dataset? For example, if the first column has some unwanted content in the first few rows, I want to skip those rows up to a trigger value. How can I do this when importing a CSV in Python?

You can achieve this with the skiprows argument of pandas.read_csv.
Here is sample code to start with:
import pandas as pd
df = pd.read_csv('users.csv', skiprows=<the row you want to skip>)
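Note that skiprows accepts a row count, a list of row indices, or a callable, e.g.:
df = pd.read_csv('users.csv', skiprows=3)                # skip the first 3 rows
df = pd.read_csv('users.csv', skiprows=[0, 2])           # skip specific row indices
df = pd.read_csv('users.csv', skiprows=lambda i: i % 2)  # skip rows by rule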
For a series of CSV files in a folder, you can loop over them with glob, read each CSV, drop the rows containing the unwanted string, and concatenate the result onto df_overall.
Example:
import glob
import pandas as pd

df_overall = pd.DataFrame()
dir_path = 'Insert your directory path'
for file_name in glob.glob(dir_path + '*.csv'):
    df = pd.read_csv(file_name, header=None)
    # keep only the rows whose column does not contain the unwanted string
    df = df[~df[<column_name>].str.contains("<your_string>")]
    df_overall = pd.concat([df_overall, df])
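If the goal is to skip everything up to a trigger value in the first column, one way is to scan that column first and then hand the rows before the trigger to skiprows. A minimal sketch, assuming the trigger value is the string 'TRIGGER' and the file is users.csv:
import pandas as pd

# read only the first column to find where the trigger value sits
first_col = pd.read_csv('users.csv', usecols=[0], header=None)
trigger_row = first_col[first_col[0] == 'TRIGGER'].index[0]

# skip every row before the trigger
df = pd.read_csv('users.csv', skiprows=range(trigger_row), header=None)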

Related

Delete multiple columns from all text files in a folder

I have a folder with hundreds of .txt files that each have the same 14 column headings on the first row. The values are space-separated, and I am trying to remove several of the columns from each of these files at once.
I have tried multiple times to do this using glob and pandas as seen below, but get stuck trying to read all of the files in the folder into a dataframe.
import glob
import pandas as pd

folder_path = 'Desktop/GRB/data/k1/'
file_list = glob.glob(folder_path + "*.txt")
main_dataframe = pd.DataFrame(pd.read_csv(file_list[0], delimiter=r"\s+"))
for i in range(1, len(file_list)):
    data = pd.read_csv(file_list[i], delimiter=r"\s+")
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)
print(main_dataframe)
I am new to using pandas and don't know how to do this successfully.
I also plan to use df = df.drop(df.columns[[0, 1, 3]], axis=1) to drop the columns but don't know where it should go in the code.
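For what it's worth, a minimal sketch of one way to do this, assuming the files should be stacked row-wise (they share the same 14 columns) and that the column positions to drop are the 0, 1, 3 from the question:
import glob
import pandas as pd

folder_path = 'Desktop/GRB/data/k1/'
frames = []
for path in glob.glob(folder_path + '*.txt'):
    df = pd.read_csv(path, delimiter=r'\s+')
    df = df.drop(df.columns[[0, 1, 3]], axis=1)  # drop unwanted columns by position
    frames.append(df)
# concatenate once at the end; appending inside the loop is much slower
main_dataframe = pd.concat(frames, ignore_index=True)
print(main_dataframe)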

Extra column appears when appending selected row from one csv to another in Python

I have this code which appends a column of a csv file as a row to another csv file:
def append_pandas(s, d):
    import pandas as pd
    df = pd.read_csv(s, sep=';', header=None)
    df_t = df.T
    df_t.iloc[0:1, 0:1] = 'Time Point'
    df_t.at[1, 0] = 1
    df_t.columns = df_t.iloc[0]
    df_new = df_t.drop(0)
    pdb = pd.read_csv(d, sep=';')
    newpd = pdb.append(df_new)
    newpd.to_csv(d, sep=';')
Every time the row is appended, an extra "Unnamed" column appears on the left of the destination file. Do you know how to fix that?
You have to add index=False to your to_csv() call; by default pandas writes the row index out as an unnamed first column, which is where the extra column comes from.
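For example, the last line of the function above becomes:
newpd.to_csv(d, sep=';', index=False)  # don't write the row index as a column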

Fetch non-empty cells from .xls file using pandas

I am new to Python. I want to fetch the values from the cells, and empty cells should be discarded.
I want to loop through rows and columns and assign the values to a list.
import pandas as pd

df = pd.read_excel('16Junedata_03062020_80163767_action_03062020_80163767_2624_01.xls', sheet_name='Sheet4')
# newdf = df.fillna({'business_day':0,'zone_id':0,'site_id':0,'device_id':0})
# newdf = df.fillna(method="ffill")
z_id = df['zone_id']
d_id = df['device_id']
s_id = df['site_id']
vst = df['visit_start_time']
# print(z_id)
# print(d_id)
# print(s_id)
for a, zone_id in z_id.iteritems():
    for b, site_id in s_id.iteritems():
        print(site_id)
You can get the list of non-NaN values for one column using the code below:
zone_id_updated = []
for item in df.zone_id.iteritems():
    if pd.isna(item[1]) == False:
        zone_id_updated.append(item[1])
The same can be done for the other columns.
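Alternatively, assuming you just want each column's non-empty values as a plain list, dropna does it in one line:
zone_id_updated = df['zone_id'].dropna().tolist()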

Copy and paste each column from an existing csv file into a new csv file

So I have an existing csv file with multiple columns. I am trying to copy each column (one by one) and paste it into a new csv file. The name of the new csv file will be the header of the column.
I am trying to tweak code that picks specific columns, but have had no luck so far with multiple columns.
import pandas as pd
cols = ['1']
my_file = r"D:/Excel/new_csv_3.csv"
pd.read_csv(my_file, usecols=cols).to_csv(r"D:/Excel/new1.csv",
                                          index=False)
Try this:
my_file = r"D:/Excel/new_csv_3.csv"
df = pd.read_csv(my_file)
for col in df.columns:
    df[col].to_csv(f'D:/Excel/new{col}.csv')
If you need specific columns, just change the for loop:
for col in ['1', '2', '3']:
    ...
This code works fine, with no additional column of row numbers:
import pandas as pd

my_file = r"D:/Excel/new_csv_3.csv"
df = pd.read_csv(my_file)
for col in df.columns:
    df[col].to_csv(f'D:/Excel/{col}.csv', index=False)

Pandas: import multiple csv files into dataframe using a loop and hierarchical indexing

I would like to read multiple CSV files (with a different number of columns) from a target directory into a single Python Pandas DataFrame to efficiently search and extract data.
Example file:
Events
1,0.32,0.20,0.67
2,0.94,0.19,0.14,0.21,0.94
3,0.32,0.20,0.64,0.32
4,0.87,0.13,0.61,0.54,0.25,0.43
5,0.62,0.21,0.77,0.44,0.16
Here is what I have so far:
import os, glob
import pandas as pd

# get a list of all csv files in the target directory
my_dir = "C:\\Data\\"
filelist = []
os.chdir(my_dir)
for files in glob.glob("*.csv"):
    filelist.append(files)

# read each csv file into a single dataframe and add a filename reference column
# (i.e. file1, file2, file3) for each file read
df = pd.DataFrame()
columns = range(1, 100)
for c, f in enumerate(filelist):
    key = "file%i" % c
    frame = pd.read_csv(my_dir + f, skiprows=1, index_col=0, names=columns)
    frame['key'] = key
    df = df.append(frame, ignore_index=True)
(the indexing isn't working properly)
Essentially, the script below is exactly what I want (tried and tested) but needs to be looped through 10 or more csv files:
df1 = pd.DataFrame()
df2 = pd.DataFrame()
columns = range(1, 100)
df1 = pd.read_csv("C:\\Data\\Currambene_001y09h00m_events.csv",
                  skiprows=1, index_col=0, names=columns)
df2 = pd.read_csv("C:\\Data\\Currambene_001y12h00m_events.csv",
                  skiprows=1, index_col=0, names=columns)
keys = ['file1', 'file2']
df = pd.concat([df1, df2], keys=keys, names=['fileno'])
I have found many related links; however, I am still not able to get this to work:
Reading Multiple CSV Files into Python Pandas Dataframe
Merge of multiple data frames of different number of columns into one big data frame
Import multiple csv files into pandas and concatenate into one DataFrame
You need to decide along which axis you want to append your files. pandas.concat aligns frames by their labels: concatenating along the row axis (the default) stacks the files and lines up columns that share a label, while concatenating along the column axis places rows that share an index label side by side.
The trick to appending efficiently is to give every file the same set of column labels, so the ragged rows parse and concat can line them up. Passing names with more labels than the widest row pads the short rows with NaN. This is my recipe:
from glob import glob
from pandas import concat, read_csv

files = sorted(glob('*.csv'))
columns = range(1, 100)  # more labels than the widest row
d = concat([read_csv(f, skiprows=1, index_col=0, header=None, names=columns)
            for f in files], keys=files)
The keys argument builds the hierarchical (file, event) index you are after.
EDIT:
With a different number of columns in each source file, the shorter files end up with all-NaN padding columns. If you want to drop those and renumber the surviving columns per file, wrap the read in a small function:
def reader(f):
    d = read_csv(f, skiprows=1, index_col=0, header=None, names=range(1, 100))
    d = d.dropna(axis=1, how='all')  # drop the pure-NaN padding columns
    d.columns = range(d.shape[1])    # renumber the surviving columns
    return d
df = concat([reader(f) for f in files], keys=files)
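Either way, the keys become the outer level of a MultiIndex, so you can pull a single file's rows back out afterwards, e.g.:
# rows that came from the first file in the list
print(df.loc[files[0]])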
