I am trying to merge two files, I am supplying them headers as they are not able to pick up headers when I merge them using concatenate , I get an error when I am trying to drop a column......
ValueError: labels ['lh.aparc.a2009s.meancurv'] not contained in axis
Therefore I am trying the below method.....
The headers are important because I want to compute average, mean etc on the basis of these headers....
But currently, the result file looks like this
CSV 1CSV1 looks like this CSV 2 looks the same just with rh
# !/bin/bash
ls -d */ | sed -e "s/\///g" | grep -v "Results" | grep -v "Output">> subjects.txt;
module unload freesurfer
module load freesurfer/5.3.0
module load python
export SUBJECTS_DIR=/N/u/shrechak/Karst/GENFL_FREESURFER53_KARST_RES
source $FREESURFER_HOME/FreeSurferEnv.sh
aparcstats2table --hemi lh --subjectsfile=subjects.txt --parc aparc.a2009s --meas meancurv --tablefile lh.a2009s.meancurv.txt
aparcstats2table --hemi rh --subjectsfile=subjects.txt --parc aparc.a2009s --meas meancurv --tablefile rh.a2009s.meancurv.txt
for f in *.txt; do
mv "$f" "${f%.txt}.csv"
done
python <<END_OF_PYTHON
import csv
import pandas as pd
names= ["meancurv",
"lh_G_and_S_frontomargin_meancurv",
"lh_G_and_S_occipital_inf_meancurv",
"lh_G_and_S_paracentral_meancurv",
"lh_G_and_S_subcentral_meancurv",
"lh_G_and_S_transv_frontopol_meancurv",
"lh_G_and_S_cingul-ant_meancurv",
"lh_G_and_S_cingul-Mid-Ant_meancurv",
"lh_G_and_S_cingul-Mid-Post_meancurv",
"lh_G_cingul-Post-dorsal_meancurv",
"lh_G_cingul-Post-ventral_meancurv",
"lh_G_cuneus_meancurv",
"lh_G_front_inf-Opercular_meancurv",
"lh_G_front_inf-orbital_meancurv",
"lh_G_front_inf-Triangul_meancurv",
"lh_G_front_middle_meancurv",
"lh_G_front_sup_meancurv",
"lh_G_Ins_lg_and_S_cent_ins_meancurv",
"lh_G_insular_short_meancurv",
"lh_G_occipital_middle_meancurv",
"lh_G_occipital_sup_meancurv",
"lh_G_oc-temp_lat-fusifor_meancurv",
"lh_G_oc-temp_med-Lingual_meancurv",
"lh_G_oc-temp_med-Parahip_meancurv",
"lh_G_orbital_meancurv",
"lh_G_pariet_infoangular_meancurv",
"lh_G_pariet_infSupramar_meancurv",
"lh_G_parietal_sup_meancurv",
"lh_G_postcentral_meancurv",
"lh_G_precentral_meancurv",
"lh_G_precuneus_meancurv",
"lh_G_rectus_meancurv",
"lh_G_subcallosal_meancurv",
"lh_G_temp_sup-G_T_transv_meancurv",
"lh_G_temp_sup-Lateral_meancurv",
"lh_G_temp_sup-Plan_polar_meancurv",
"lh_G_temp_supPlan_tempo_meancurv",
"lh_G_temporal_inf_meancurv",
"lh_G_temporal_middle_meancurv",
"lh_Lat_Fis-ant-Horizont_meancurv",
"lh_Lat_Fis-ant-Vertical_meancurv",
"lh_Lat_Fispost_meancurv",
"lh_Pole_occipital_meancurv",
"lh_Pole_temporal_meancurv",
"lh_S_calcarine_meancurv",
"lh_S_central_meancurv",
"lh_S_cingulMarginalis_meancurv",
"lh_S_circular_insula_ant_meancurv",
"lh_S_circular_insula_inf_meancurv",
"lh_S_circular_insula_sup_meancurv",
"lh_S_collat_transv_ant_meancurv",
"lh_S_collat_transv_post_meancurv",
"lh_S_front_inf_meancurv",
"lh_S_front_middle_meancurv",
"lh_S_front_sup_meancurv",
"lh_S_interm_prim-Jensen_meancurv",
"lh_S_intrapariet_and_P_trans_meancurv",
"lh_S_oc_middle_and_Lunatus_meancurv",
"lh_S_oc_sup_and_transversal_meancurv",
"lh_S_occipital_ant_meancurv",
"lh_S_oc-temp_lat_meancurv",
"lh_S_oc-temp_med_and_Lingual_meancurv",
"lh_S_orbital_lateral_meancurv",
"lh_S_orbital_med-olfact_meancurv",
"lh_S_orbital-H_Shaped_meancurv",
"lh_S_parieto_occipital_meancurv",
"lh_S_pericallosal_meancurv",
"lh_S_postcentral_meancurv",
"lh_S_precentral-inf-part_meancurv",
"lh_S_precentral-sup-part_meancurv",
"lh_S_suborbital_meancurv",
"lh_S_subparietal_meancurv",
"lh_S_temporal_inf_meancurv",
"lh_S_temporal_sup_meancurv",
"lh_S_temporal_transverse_meancurv"]
df1 = pd.read_csv('lh.a2009s.meancurv.csv', header = None, names = names)
names1 = ["meancurv",
"rh_G_and_S_frontomargin_meancurv",
"rh_G_and_S_occipital_inf_meancurv",
"rh_G_and_S_paracentral_meancurv",
"rh_G_and_S_subcentral_meancurv",
"rh_G_and_S_transv_frontopol_meancurv",
"rh_G_and_S_cingul-Ant_meancurv",
"rh_G_and_S_cingul-Mid-Ant_meancurv",
"rh_G_and_S_cingul-Mid-Post_meancurv",
"rh_G_cingul-Post-dorsal_meancurv",
"rh_G_cingul-Post-ventral_meancurv",
"rh_G_cuneus_meancurv",
"rh_G_front_inf-Opercular_meancurv",
"rh_G_front_inf-Orbital_meancurv",
"rh_G_front_inf-Triangul_meancurv",
"rh_G_front_middle_meancurv",
"rh_G_front_sup_meancurv",
"rh_G_Ins_lg_and_S_cent_ins_meancurv",
"rh_G_insular_short_meancurv",
"rh_G_occipital_middle_meancurv",
"rh_G_occipital_sup_meancurv",
"rh_G_oc-temp_lat-fusifor_meancurv",
"rh_G_oc-temp_med-Lingual_meancurv",
"rh_G_oc-temp_med-Parahip_meancurv",
"rh_G_orbital_meancurv",
"rh_G_pariet_inf-Angular_meancurv",
"rh_G_pariet_inf-Supramar_meancurv",
"rh_G_parietal_sup_meancurv",
"rh_G_postcentral_meancurv",
"rh_G_precentral_meancurv",
"rh_G_precuneus_meancurv",
"rh_G_rectus_meancurv",
"rh_G_subcallosal_meancurv",
"rh_G_temp_sup-G_T_transv_meancurv",
"rh_G_temp_sup-Lateral_meancurv",
"rh_G_temp_sup-Plan_polar_meancurv",
"rh_G_temp_sup-Plan_tempo_meancurv",
"rh_G_temporal_inf_meancurv",
"rh_G_temporal_middle_meancurv",
"rh_Lat_Fis-ant-Horizont_meancurv",
"rh_Lat_Fis-ant-Vertical_meancurv",
"rh_Lat_Fis-post_meancurv",
"rh_Pole_occipital_meancurv",
"rh_Pole_temporal_meancurv",
"rh_S_calcarine_meancurv",
"rh_S_central_meancurv",
"rh_S_cingulMarginalis_meancurv",
"rh_S_circular_insula_ant_meancurv",
"rh_S_circular_insula_inf_meancurv",
"rh_S_circular_insula_sup_meancurv",
"rh_S_collat_transv_ant_meancurv",
"rh_S_collat_transv_post_meancurv",
"rh_S_front_inf_meancurv",
"rh_S_front_middle_meancurv",
"rh_S_front_sup_meancurv",
"rh_S_interm_prim-Jensen_meancurv",
"rh_S_intrapariet_and_P_trans_meancurv",
"rh_S_oc_middle_and_Lunatus_meancurv",
"rh_S_oc_sup_and_transversal_meancurv",
"rh_S_occipital_ant_meancurv",
"rh_S_oc-temp_lat_meancurv",
"rh_S_oc-temp_med_and_Lingual_meancurv",
"rh_S_orbital_lateral_meancurv",
"rh_S_orbital_med-olfact_meancurv",
"rh_S_orbital-H_Shaped_meancurv",
"rh_S_parieto_occipital_meancurv",
"rh_S_pericallosal_meancurv",
"rh_S_postcentral_meancurv",
"rh_S_precentral-inf-part_meancurv",
"rh_S_precentral-sup-part_meancurv",
"rh_S_suborbital_meancurv",
"rh_S_subparietal_meancurv",
"rh_S_temporal_inf_meancurv",
"rh_S_temporal_sup_meancurv",
"rh_S_temporal_transverse_meancurv"
]
df2 = pd.read_csv('rh.a2009s.meancurv.csv', header = None, names = names1)
result = pd.merge(df1, df2, on='meancurv', how='outer')
result.to_csv('result.csv')
END_OF_PYTHON
echo "goodbye!";
So you want to skip the first row and only pull the data parts.
Here's an MCVE.
Code:
import io
import pandas as pd
csv1 = io.StringIO(u'''
a,b,c
1,4,7
2,5,8
3,6,9
''')
df = pd.read_csv(csv1, names = ['d','e','f'], skiprows = [1])
print df
Output:
d e f
0 1 4 7
1 2 5 8
2 3 6 9
Here's a way you can merge two files together file keeping the headers from the one of the files after merging.
Say you're keeping files in a list 'files':
files = ['file1.csv', 'file2.csv'] #keep files here
finalDF = pd.DataFrame() #this is an empty dataframe
for file in files:
thisDF = pd.read_csv(file)
finalDF = finalDF.append(thisDF, ignore_index=True)
Now if you want try these two lines:
say you want to check the header using a simple print head()
print finalDF.head()
and if you want to write this merged data frame to a csv file
finalDF.to_csv('merged-file.csv', encoding="utf-8", index=False)
for skipping rows are you trying to skip rows after or before
merging? let me know and i can try helping with that too.
Example:
file1.csv:
,column1,column2,column3,column4,Date,Device,sample_site
2,14888,0.060011931,248084,13.40535464,3/15/2017,DESKTOP,http://www.example1.com
11,1358,0.033212679,40888,7.465099785,3/15/2017,MOBILE,http://www.example2.com
23,130,0.02998155,4336,8.337638376,3/15/2017,TABLET,http://www.example3.com
file2.csv:
,column1,column2,column3,column4,Date,Device,sample_site
35,2685,0.034564882,77680,10.97812822,3/15/2017,DESKTOP,https://www.example4.com
45,280,0.026197605,10688,7.801272455,3/15/2017,MOBILE,https://www.example5.com
54,24,0.022878932,1049,8.202097235,3/15/2017,TABLET,https://www.example6.com
merged-file.csv:
Unnamed: 0,column1,column2,column3,column4,Date,Device,sample_site
2,14888,0.060011931,248084,13.40535464,3/15/2017,DESKTOP,http://www.example1.com
11,1358,0.033212679,40888,7.465099785,3/15/2017,MOBILE,http://www.example2.com
23,130,0.02998155,4336,8.337638376,3/15/2017,TABLET,http://www.example3.com
35,2685,0.034564882,77680,10.97812822,3/15/2017,DESKTOP,https://www.example4.com
45,280,0.026197605,10688,7.801272455,3/15/2017,MOBILE,https://www.example5.com
54,24,0.022878932,1049,8.202097235,3/15/2017,TABLET,https://www.example6.com
Reply:
Are you trying to merge data based on a column? In that case you can concat or merge with join based on an axis.
Say for example:
pd.concat([df1, df2]) #add axis and join type if necessary.
Here's the documentation to help you understand: merging and concat in pandas