How to access a particular column in a CSV file using pandas - python

I'm trying to compare two CSV files and then access a particular column in each of them, but I can't access it.
I need a way to access a particular column of a CSV file while looping.

If you are using pandas, then after you create a dataframe, use df['your column name'] to access that particular column. Please note that the name you provide as 'your column name' should exactly match that in your csv file.
Eg:
import pandas as pd
df = pd.read_csv('path to your file')
column = df['your column name']
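For the comparison case in the question, a minimal sketch might look like the following. The two in-memory CSVs and the column names `id` and `price` are assumptions made purely for illustration; with real files you would pass their paths to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two CSV files on disk.
csv_a = io.StringIO("id,price\n1,10\n2,20\n3,30\n")
csv_b = io.StringIO("id,price\n1,10\n2,25\n3,30\n")

df_a = pd.read_csv(csv_a)
df_b = pd.read_csv(csv_b)

# Column-wise comparison without an explicit loop
# (both frames must have the same number of rows and the same index).
changed = df_a["price"] != df_b["price"]
changed_ids = df_a.loc[changed, "id"].tolist()

# The same comparison written as a loop over paired values.
diffs = []
for row_id, old, new in zip(df_a["id"], df_a["price"], df_b["price"]):
    if old != new:
        diffs.append((row_id, old, new))
```

The vectorised form is usually preferable; the loop is shown only because the question asks about accessing the column while looping.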

Related

Read Tables from an excel sheet and make them dataframe and merge those dataframes pandas

I have multiple Tables on a single excel sheet.
I want to make a data frame of each table and then merge them.
Each Table has some columns matching and some not matching.
I have tried and could identify table ranges under different group codes but couldn't find any further solution.
See the attached sheet:
https://1drv.ms/x/s!At2EihO7V0hmujF_VNOrj3xJjZWW?e=Exbvti
I tried the code below, but I don't know how to convert the identified tables into dataframes and merge them.
Please help me with this.
import pandas as pd
import glob
import os
#reading excel files folder
sheets = pd.read_excel(r"D:\PO-1473-Scaff-June22efff.xlsx",header=None)
sheets.dropna(axis=1, how="all" , inplace=True)
sheets.dropna(subset = [2] , inplace=True)
# Set a unique index for each group
sheets["group_id"] = (sheets[0] == "SN").cumsum()
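One possible way to continue from the `group_id` column is sketched below. It assumes each table starts with a header row whose first cell is "SN", as in the code above; the small in-memory frame is a hypothetical miniature of the real sheet, and real data may need extra cleanup (for example, dropping blank or duplicated header cells) before the tables can be combined.

```python
import pandas as pd

# Hypothetical miniature of a sheet read with header=None: two stacked tables,
# each starting with a header row whose first cell is "SN".
sheets = pd.DataFrame(
    [
        ["SN", "Item", "Qty"],
        [1, "Pipe", 10],
        [2, "Clamp", 4],
        ["SN", "Item", "Rate"],
        [1, "Board", 7.5],
    ]
)
sheets["group_id"] = (sheets[0] == "SN").cumsum()

tables = []
for group_id, block in sheets.groupby("group_id"):
    block = block.drop(columns="group_id")
    header = block.iloc[0].tolist()  # first row of the block holds the column names
    table = block.iloc[1:].copy()    # remaining rows are the data
    table.columns = header
    table["group_id"] = group_id     # remember which table each row came from
    tables.append(table)

# concat aligns on column names; columns missing from a table are filled with NaN.
merged = pd.concat(tables, ignore_index=True, sort=False)
```

Here `pd.concat` stacks the tables rather than joining them on a key; if the tables should be matched row by row on a shared column, `pd.merge` would be the tool to look at instead.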

Inserting Data into an Excel file using Pandas - Python

I have an excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data and finally, creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
path_data = **the path to the excel file**
recap = pd.read_excel(os.path.join(path_data,'My_Excel.xlsx')) # where I access the Excel file
recap['New information Column'] = Some Value
Is this the correct way of doing this? And if not, can someone suggest a better way (one that works, ehehe)?
Thank you a lot!
You can import the excel file into python using pandas.
import pandas as pd
df = pd.read_excel (r'Path\Filename.xlsx')
print (df)
If the workbook has several sheets, you can pick one with sheet_name:
import pandas as pd
df = pd.read_excel (r'Path\Filename.xlsx', sheet_name='sheetname')
print (df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then when you're ready, you can export it as xlsx:
import openpyxl
# to excel
df.to_excel(r'Path\filename.xlsx')
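If the new column should hold a different value for each dataset rather than one constant, a sketch along these lines may help. The column name `dataset` and the `analyse` function are hypothetical placeholders for whatever the real workbook and analysis look like.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_excel(...).
recap = pd.DataFrame({"dataset": ["alpha", "beta", "gamma"]})


def analyse(name):
    """Placeholder analysis: returns the length of the dataset name."""
    return len(name)


# One value per row, computed from that row's dataset name.
recap["New information Column"] = recap["dataset"].map(analyse)

# Writing back is then a single call (an Excel engine such as openpyxl must be installed):
# recap.to_excel(r'Path\filename.xlsx', index=False)
```

Passing `index=False` keeps pandas from writing the row index as an extra first column in the output file.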

Split multiple times?

So I'm currently transferring a txt file into a csv. It's mostly cleaned up, but even after splitting there are still empty columns between some of my data.
Below is my messy CSV file
And here is my current code:
Sat_File = '/Users'
output = '/Users2'
import csv
import matplotlib as plt
import pandas as pd
with open(Sat_File, 'r') as sat:
    with open(output, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        # keep only the lines that mention 2004
        for line in sat:
            if "2004" in line:
                line = line.split(' ')
                writer.writerow(line)
Basically, I'm just trying to eliminate those gaps between columns in the CSV picture I've provided. Thank you!
You can use the pandas library to clear out the empty columns:
import pandas as pd
df = pd.read_csv('path_to_csv_file').dropna(axis=1, how='all')
df.to_csv('path_to_clean_csv_file')
Basically we:
Import the pandas library.
Read the csv file into a variable called df (stands for data frame).
Then we use the dropna function, which discards empty columns or rows. axis=1 means drop columns (0 means rows), and how='all' means drop only the columns in which every value is empty.
We save the clean data frame df to a new, clean csv file.
$$$ Pr0f!t $$$
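The gaps can also be avoided at the source. Splitting on a single space with split(' ') yields an empty string for every extra space, whereas split() with no argument treats any run of whitespace as one separator. A small sketch, using a made-up line with uneven spacing:

```python
import csv
import io

# Hypothetical line with runs of spaces between fields.
line = "2004  01   15    3.7\n"

with_gaps = line.split(' ')  # one empty string per extra space
no_gaps = line.split()       # any run of whitespace counts as a single separator

# Write the cleaned fields as one CSV row (to an in-memory buffer here).
buffer = io.StringIO()
csv.writer(buffer).writerow(no_gaps)
```

Replacing line.split(' ') with line.split() in the question's loop should therefore produce rows without the empty columns, assuming the fields themselves never contain spaces.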

Pandas/Python/Dropna: Renaming header column names after a dropna takes place with intention to import to MySQL

With the code below, I've successfully removed rows where values may be blank in my CSV file, which consists of 33 columns.
import pandas as pd
from sqlalchemy import create_engine
data = pd.read_csv('TestCSV.csv', sep=',')
data.dropna()
data.dropna().to_csv('CleanCSV.csv', index=False)
Now I want to rename the 33 column headers to names of my own and then import the contents of the new file (with the renamed headers) into my MySQL database with the following code, which is still missing the renaming step:
data = pd.read_csv('CleanCSV.csv', sep=',')
cnx = create_engine('mysql+pymysql://root:password@localhost:3306/schema', echo=False)
data.to_sql(name='t_database', con=cnx, if_exists='append', index=False)
I've read a little about DataFrames, but is that option still valid when the contents are in a CSV file? If so, how do I assign the result of dropna to a DataFrame, rename the column headers there, and then import it into MySQL?
Thank you in advance.
Before you create the new csv, do this
new_df = data.dropna().rename(columns={'oldcol1': 'newcol1', 'oldcol2': 'newcol2'})
The columns argument is a dictionary with key and values as old and new column names respectively.
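With 33 columns, writing out a full dictionary can get tedious. A hedged sketch of both approaches on a tiny made-up frame follows; the column names are placeholders, and the positional variant assumes the list of new names is in the same order as the existing columns.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('TestCSV.csv').
data = pd.DataFrame({"oldcol1": [1, None, 3], "oldcol2": ["a", "b", None]})

# dropna returns a new frame; the result has to be kept in a variable.
clean = data.dropna()

# Rename by mapping old names to new names.
renamed = clean.rename(columns={"oldcol1": "newcol1", "oldcol2": "newcol2"})

# Rename by position: one new name per existing column, in order.
new_names = ["newcol1", "newcol2"]
positional = clean.copy()
positional.columns = new_names
```

Either frame can then be passed straight to to_sql, so the intermediate CleanCSV.csv file is optional.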

how to save Python pandas data into excel file?

I am trying to load data from a web source and save it as an Excel file, but I'm not sure how to do it. What should I do? The original dataframe has several columns. Let's say that I am trying to save the 'Open' column.
import matplotlib.pyplot as plt
import pandas_datareader.data as web
import datetime
import pandas as pd
def ViewStockTrend(compcode):
    start = datetime.datetime(2015,2,2)
    end = datetime.datetime(2016,7,13)
    stock = web.DataReader(compcode,'yahoo',start,end)
    print(stock['Open'])

compcode = ['FDX','GOOGL','FB']
aa= ViewStockTrend(compcode)
Once you have a pandas DataFrame, just call to_excel on the entire thing if you want (for this to work, ViewStockTrend has to return stock instead of only printing it):
aa.to_excel('output/filename.xlsx')
If stock is a pandas DataFrame, you can construct a new DataFrame from that column and write that one to Excel:
df = pd.DataFrame(stock['Open'])
df.to_excel('path/to/your/file')
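As a rough sketch of how the function could hand back something to save, the example below returns the 'Open' column instead of printing it. The small frame stands in for the result of web.DataReader, and the function name open_prices is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by web.DataReader(...).
stock = pd.DataFrame(
    {"Open": [10.0, 10.5], "Close": [10.4, 10.2]},
    index=pd.to_datetime(["2015-02-02", "2015-02-03"]),
)


def open_prices(frame):
    """Return the 'Open' column as a one-column DataFrame instead of printing it."""
    return frame[["Open"]]  # double brackets keep the result a DataFrame


aa = open_prices(stock)

# Saving is then one call (an Excel engine such as openpyxl must be installed):
# aa.to_excel('output/filename.xlsx')
```

Because the function returns a value, aa is no longer None and to_excel can be called on it.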
