I am trying to select couple of columns based on column heading with wild card and one more column. When I execute the below code , I am getting the expected result, but there is an index which is appearing. how to drop the index . Any suggestions.
infile:
dir,name,ct1,cn1,ct2,cn2
991,name1,em,a#email.com,ep,1234
999,name2,em,b#email.com,ep,12345
872,name3,em,c#email.com,ep,123456
here is the code which I used.
import pandas as pd
df=pd.read_csv('infile.csv')
df_new=df.loc[:,df.columns.str.startswith('c')]
df_new_1=pd.read_csv('name.csv', usecols= ['dir'])
df_merge=pd.concat([df_new,df_new_1],axis=1, join="inner")
df_merge.to_csv('outfile.csv')
Pass false for index when you save to csv :
df_merge.to_csv('outfile.csv', index=False)
Related
I ma trying to write a code that deletes the unnamed column , that comes right before Unix Timestamp. After deleting I will save the modified dataframe into data.csv. How would I be able to get the Expected Output below?
import pandas ads pd
data = pd.read_csv('data.csv')
data.drop('')
data.to_csv('data.csv')
data.csv file
,Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
2,1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
Updated csv (Expected Output):
Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
This is the index. Use index=False in to_csv.
data.to_csv('data.csv', index=False)
Set the first column as index df = pd.read_csv('data.csv', index_col=0) and set index=False when writing the results.
you can follow below code.it will take column from 1st position and then you can save that df to csv without index values.
df = df.iloc[:,1:]
df.to_csv("data.csv",index=False)
I want to import an excel where I want to keep just some columns.
This is my code:
df=pd.read_excel(file_location_PDD)
col=df[['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','shkzg','usname','sname','dmsol','dmhab']]
print(col)
col.to_excel("JETNEW.xlsx")
I selected all the columns which I want it but 2 names of columns don't appear all time in the files which I have to import and these columns are 'usname' and 'sname'.
Cause of that I received an error ['usname','sname'] not in index
How can I do this ?
Thanks
Source -- https://stackoverflow.com/a/38463068/14515824
You need to use df.reindex instead of df[[]]. I also have changed 'excel.xlsx' to r'excel.xlsx' to specify to only read the file.
An example:
df.reindex(columns=['a','b','c'])
Which in your code would be:
file_location_PDD = r'excel.xlsx'
df = pd.read_excel(file_location_PDD)
col = df.reindex(columns=['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','shkzg','usname','sname','dmsol','dmhab'])
print(col)
col.to_excel("output.xlsx")
So, I have 7200 txt files, each with 25 lines. I would like to create a dataframe from them, with 7200 rows and 25 columns -- each line of the .txt file would be a value a column.
For that, first I have created a list column_names with length 25, and tested importing one single .txt file.
However, when I try this:
pd.read_csv('Data/fake-meta-information/1-meta.txt', delim_whitespace=True, names=column_names)
I get 25x25 dataframe, with values only on the first column. How do I read this into the dataframe in a way that I can get the txt lines to be imputed as values into the columns, and not imputing everything into the first column and creating 25 rows?
My next step would be creating a for loop to append each text file as a new row.
Probably something like this:
dir1 = *folder_path*
list = os.listdir(dir1)
number_files = len(list)
for i in range(number_files):
title = list[i]
df_temp = pd.read_csv(dir1 + title, delim_whitespace=True, names=column_names)
df = df.append(df_temp,ignore_index=True)
I hope I have been clear. Thank you all in advance!
read_csv generates a row per line in the source file but you want them to be columns. You could read the rows and pivot to columns, but since these files have a single value per line, you can just read them in numpy and use each resulting array as a row in a dataframe.
import numpy as np
import pandas as pd
from pathlib import Path
dir1 = Path(".")
df = pd.DataFrame([np.loadtxt(filename) for filename in dir1.glob("*.txt")])
print(df)
tdelaney's answer is probably "better" than mine, but if you want to keep your code more stylistically closer to what you are currently doing the following is another option.
You are getting your current output (25x25 with data in the first column only) because your read data is 25x1 but you are forcing the dataframe to have 25 columns with your names=column_names parameter.
To solve, just wait until the end to apply the column names:
Get a 25x1 df (drop the names param):
df_temp = pd.read_csv(dir1 + title, delim_whitespace=True)
Append the 25x1 df forming a 25x7200 df: df = df.append(df_temp,ignore_index=True)
Transpose the df forming the final 7200x25 df: df=df.T
Add column names: df.columns=column_names
I am trying to select the 'Name' column from a sample csv file named gradesM3.csv.
I have been following this tutorial but when it comes to selecting a single column, it doesn't work anymore.
My code:
import pandas as pd
df = pd.read_csv('gradesM3.csv')
df
The output:
Out[9]:
StudentID;Name;Assignment1;Assignment2;Assignment3
0 s123456;Michael Andersen;11;7;-3
1 s123789;Bettina Petersen;0;4;10
2 s123579;Marie Hansen;10;4;7
I believe there's already something wrong here as from what I've seen on other discussions, it's supposed to look more like a table.
When I try to display only the 'Name' column, with this command:
df['Name']
It returns:
KeyError: 'Name'
To sum up, I am trying to import my CSV file as a proper dataframe so I can work with it
Thanks
SOLVED
Thanks to W-B's comment, it worked with this code:
df = pd.read_csv('gradesM3.csv',sep=';')
What the code does below is read a column (named "First") and look for the string "TOM".
I want to go through all the columns in the file ( not just the "First" column) - i was thinking of doing something like excelFile[i][j] where i and j are set in a loop but that does not work. Any ideas?
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import re
excelFile=pd.read_excel("test.xls")
for i in excelFile.index:
match=re.match(".*TOM.*",excelFile['First'][i])
if match:
print(excelFile['First'][i])
print("check")
excelFile.any(axis=None) will return a boolean value telling you if the value was found anywhere in the dataframe.
Documentation for pd.DataFrame.any
To print if the value was found, get the columns from the dataframe and use iterrows:
# Create a list of columns in the dataframe
columns = excelFile.columns.tolist()
# Loop through indices and rows in the dataframe using iterrows
for index, row in excelFile.iterrows():
# Loop through columns
for col in columns:
cell = row[col]
# If we find it, print it out
if re.match(".*TOM.*", cell):
print(f'Found at Index: {index} Column: {col}')
something like this loops through all of the columns and looking for a string match
for column in excelFile:
if 'tom' in column.lower():
print(column)