Deleting an unnamed column from a CSV file with pandas - python

I am trying to write code that deletes the unnamed column that comes right before Unix Timestamp. After deleting it, I will save the modified dataframe back to data.csv. How would I be able to get the expected output below?
import pandas as pd
data = pd.read_csv('data.csv')
data.drop('')
data.to_csv('data.csv')
data.csv file:
,Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
2,1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
Updated csv (Expected Output):
Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714

That unnamed column is the index. Use index=False in to_csv:
data.to_csv('data.csv', index=False)

Alternatively, set the first column as the index when reading, df = pd.read_csv('data.csv', index_col=0), and set index=False when writing the result.

You can follow the code below. It keeps the columns from position 1 onward, dropping the first (unnamed) one, and then saves the dataframe to csv without index values.
df = df.iloc[:,1:]
df.to_csv("data.csv",index=False)

Related

How to drop the index after creating the csv file in pandas

I am trying to select a couple of columns based on the column heading with a wildcard, plus one more column. When I execute the code below I get the expected result, but an index appears in the output. How do I drop the index? Any suggestions?
infile:
dir,name,ct1,cn1,ct2,cn2
991,name1,em,a#email.com,ep,1234
999,name2,em,b#email.com,ep,12345
872,name3,em,c#email.com,ep,123456
Here is the code I used.
import pandas as pd
df = pd.read_csv('infile.csv')
df_new = df.loc[:, df.columns.str.startswith('c')]   # wildcard: columns starting with 'c'
df_new_1 = pd.read_csv('name.csv', usecols=['dir'])  # the one extra column
df_merge = pd.concat([df_new, df_new_1], axis=1, join="inner")
df_merge.to_csv('outfile.csv')
Pass index=False when you save to csv:
df_merge.to_csv('outfile.csv', index=False)

Read the first n rows from a .csv and store a column in a list

I am facing a problem with implementing Python code that reads the first n rows from a .csv file and stores the values of a column in a list. The length of the list has to be 2000, and the list will be used to create a plot.
The columns in the .csv file are not labeled.
You can use pandas to do this:
import pandas as pd
df = pd.read_csv("test.csv", nrows=2000, header=None) #header = None avoids the first row to be read as column names
df_list = df.values.tolist()
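Note that df.values.tolist() gives a list of rows, each itself a list. If the goal is a flat list of 2000 values from one column for plotting, a sketch, assuming matplotlib and that the first column holds the values:
import pandas as pd
import matplotlib.pyplot as plt

# Read only the first 2000 rows; header=None because the file has no column labels
df = pd.read_csv("test.csv", nrows=2000, header=None)

# With header=None the columns are numbered 0, 1, ...; take column 0 as a flat list
col_list = df[0].tolist()

plt.plot(col_list)
plt.show()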
This might help, but for future reference, read the pandas documentation:
import pandas as pd
df = pd.read_csv(<full_path_to_file>, nrows=2000, sep=<file_separator>)
col_list = list(df.columns)  # note: df.columns holds the header names, not the data values

How to read single column of xlsx file into a dataframe?

I have an .xlsx file with 5 sheets, each sheet has 4 columns and I need to read the first column of the 5th sheet into a column of a dataframe.
I've tried this:
df = read_excel('file_path.xlsx', sheet_names='sheet_5', index_col='column_name')
However this seems to copy the whole sheet into the dataframe rather than just the first column.
Thanks to @Quang Hoang's comment, I found the solution.
df = pd.read_excel('file_path.xlsx', sheet_name='sheet_5', usecols=['column_name'])
The usecols option in read_excel reads only the column I wanted into the dataframe.
Hey, let's try it this way (read_excel rather than read_csv, since the file is an .xlsx):
import pandas as pd
df = pd.read_excel('path/to/file.xlsx', sheet_name='sheet_5', index_col=0)
print(df[['column_name']])
Tell me if it works. I recommend reading the documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
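If the sheet and column names are unknown, a positional sketch also works; sheet_name=4 (the 5th sheet, 0-based) and usecols=[0] (the first column) are assumptions about the file layout:
import pandas as pd

# Read only the first column of the 5th sheet, selecting both by position
df = pd.read_excel('file_path.xlsx', sheet_name=4, usecols=[0])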

How to convert Python dict to DataFrame columns after Pandas version 0.21.0?

I am trying to run the same script on two computers and convert my dict structure, data, to a pandas DataFrame.
df = pd.DataFrame(data, columns=[column_label])
df.to_csv('./result.csv', mode='w', index=False)
It works perfectly on the computer with a pandas version below 0.21.0. However, when I execute the same code on my server, it generates a csv file with just the column labels and none of the data.
I tried to print out the values of df, and on the server they are all NaN.
When I remove the columns part like this:
df = pd.DataFrame(data)
df.to_csv('./result.csv', mode='w', index=False)
Suddenly the data is back, albeit the column labels are missing and the data is not in order.
If I do
df = pd.DataFrame(data)
df.columns = column_label
df.to_csv('./result.csv', mode='w', index=False)
The column labels are back, but they are out of order, and the data doesn't match the order of the column labels.
If I do
df = pd.DataFrame(columns=[column_label])
df = pd.DataFrame(data)
df.to_csv('./result.csv', mode='w', index=False)
Then the data matches the order of the column labels, but the column labels themselves are out of order...
I have since upgraded my pandas library on the computer that was working, from 0.17.0 to 0.22.0, and it has also stopped working.
So for some reason, assigning data and columns=[column_label] to a DataFrame in one line seems to break the dict-to-DataFrame conversion after pandas version 0.21.0.
How should I do columns assignment with the newer versions of Pandas?
I am assuming column_label is a list. The columns parameter of pandas just requires a list of column names. By passing [column_label] to the columns parameter, you are passing a list of lists. Try it without the []:
df = pd.DataFrame(data, columns=column_label)
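A minimal sketch of the difference, with a hypothetical data dict and column_label list standing in for the question's variables:
import pandas as pd

# Hypothetical stand-ins for the question's data and column_label
data = {'a': [1, 2], 'b': [3, 4]}
column_label = ['a', 'b']

# columns=[column_label] is a list of lists: the resulting labels no longer
# match the dict keys, so every cell comes out NaN (as the question reports
# on pandas >= 0.21)
# broken = pd.DataFrame(data, columns=[column_label])

# columns=column_label is a plain list of names: keys match, data is kept
fixed = pd.DataFrame(data, columns=column_label)
print(fixed)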

Converting column names to integer with read_csv

I have constructed a matrix with integer values for its columns and index. The matrix is actually hierarchical, one level per month. My problem is that indexing and selecting data no longer works as before once I write the data to csv and load it back into pandas.
Selecting data before writing and reading data to file:
matrix.ix[1][4][3] would, for example, give 123
In words: select month January and get the (travel) flow from origin 4 to destination 3.
After writing the data to csv and reading it back into pandas, the original referencing fails, but it works if I convert the column label to a string:
matrix.ix[1]['4'][3]
... the column names have automatically been transformed from integer to string. But I would prefer the original indexing.
Any suggestions?
My current quick fix for handling the data after loading from csv is:
# Write df to file
mulitindex_df_Travel_monthly.to_csv(r'result/Final_monthly_FlightData_countrylevel_v4.csv')
# Load df back from csv
test_matrix = pd.read_csv(filepath_inputdata + '/Final_monthly_FlightData_countrylevel_v4.csv',
                          index_col=[0, 1])
test_matrix.rename(columns=int, inplace=True)  # Thanks, @ayhan
CSV FILE:
https://www.dropbox.com/s/4u2opzh65zwcn81/travel_matrix_SO.csv?dl=0
I used something like this:
df = df.rename(columns={str(c): c for c in columns})
where df is the pandas dataframe and columns is the list of column names to change.
You could also do
df.columns = df.columns.astype(int)
or
df.columns = df.columns.map(int)
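A minimal round-trip sketch of the problem and the fix, assuming integer column labels as in the question:
import pandas as pd

# Hypothetical matrix with integer column labels
df = pd.DataFrame([[123, 456], [789, 101]], columns=[3, 4])
df.to_csv('matrix.csv')

# read_csv parses the header as strings, so the integer labels are lost
loaded = pd.read_csv('matrix.csv', index_col=0)
print(loaded.columns)  # Index(['3', '4'], dtype='object')

# Restore the integer labels
loaded.columns = loaded.columns.astype(int)
print(loaded[4])  # integer-based selection works again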
Related: what is difference between .map(str) and .astype(str) in dataframe
