Add columns to existing CSV file - python

I am trying to add new columns to an existing CSV file that already has rows and columns.
I would like it to append all the new column names to the columns after column 4.
The code I currently have is adding all the new column names as a row at the bottom of the CSV:
import csv

def extract_data_from_report3():
    with open('OMtest.csv', 'a', newline='') as f_out:
        writer = csv.writer(f_out)
        writer.writerow(
            ['OMGroup:OMRegister', 'OMGroup', 'OMRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn'])
Is there any way to do this effectively?

You can use the pandas library without iterating through the values. Here is an example:
# Import pandas package
import pandas as pd

new_header = ['OMGroup:OMRegister', 'OMGroup', 'OMRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn']

my_df = pd.read_csv(path_to_csv)
for column_name in new_header:
    new_column = [ ... your values ...]  # should be a list of your dataframe size
    my_df[column_name] = new_column
Keep in mind that each new column must have the same length as the number of rows in your table for this to work.
If you only need to add the new columns without values, you can do it like this:
for column_name in new_header:
    new_column = ["" for i in range(len(my_df.index))]  # should be a list of dataframe size
    my_df[column_name] = new_column
Then you can write back the csv in this way:
my_df.to_csv(path_to_csv)
Here are details on the read_csv method
Here are details on the to_csv method
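Putting those pieces together, here is a minimal runnable sketch of that approach; the file name OMtest.csv is taken from the question, and filling the new columns with empty strings is an assumption:
import pandas as pd

path_to_csv = 'OMtest.csv'  # assumed file name from the question
new_header = ['OMGroup:OMRegister', 'OMGroup', 'OMRegister', 'RegisterType',
              'Measures', 'Description', 'GeneratedOn']

my_df = pd.read_csv(path_to_csv)

# Add each new column, filled with empty strings so its length matches the row count
for column_name in new_header:
    my_df[column_name] = ["" for _ in range(len(my_df.index))]

# Write the result back; index=False avoids writing an extra index column
my_df.to_csv(path_to_csv, index=False)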

Extra column appears when appending selected row from one csv to another in Python

I have this code which appends a column of a csv file as a row to another csv file:
def append_pandas(s, d):
    import pandas as pd
    df = pd.read_csv(s, sep=';', header=None)
    df_t = df.T
    df_t.iloc[0:1, 0:1] = 'Time Point'
    df_t.at[1, 0] = 1
    df_t.columns = df_t.iloc[0]
    df_new = df_t.drop(0)
    pdb = pd.read_csv(d, sep=';')
    newpd = pdb.append(df_new)
    newpd.to_csv(d, sep=';')
The result is supposed to be the destination file with the selected column appended as a new row. Instead, every time the row is appended, there is an extra "Unnamed" column appearing on the left.
Do you know how to fix that?
You have to add index=False to your to_csv() call:
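For example, the last line of the function above would become:
newpd.to_csv(d, sep=';', index=False)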

How to skip rows while importing csv?

How do I skip rows based on a certain value in the first column of the dataset? For example, the first column has some unwanted stuff in the first few rows and I want to skip those rows up to a trigger value. Please help me with importing the CSV in Python.
You can achieve this by using the argument skiprows
Here is sample code below to start with:
import pandas as pd
df = pd.read_csv('users.csv', skiprows=<the row you want to skip>)
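If the number of rows to skip is not known in advance, one possible sketch (not from the original answer) is to scan the first column for the trigger value first and then pass that count to skiprows; the file name users.csv and the trigger value 'TRIGGER' are assumptions:
import csv
import pandas as pd

trigger = 'TRIGGER'  # assumed trigger value in the first column

# Count how many leading rows to skip by scanning the first column
rows_to_skip = 0
with open('users.csv', newline='') as f:
    for row in csv.reader(f):
        if row and row[0] == trigger:
            break
        rows_to_skip += 1

df = pd.read_csv('users.csv', skiprows=rows_to_skip)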
For a series of CSV files in a folder, you could use a for loop: read each CSV file, remove the rows containing the unwanted string from the df, and lastly concatenate it to df_overall.
Example:
import glob
from pandas import DataFrame, concat, read_csv

df_overall = DataFrame()
dir_path = 'Insert your directory path'

for file_name in glob.glob(dir_path + '*.csv'):
    df = read_csv(file_name, header=None)
    df = df[~df[<column_name>].str.contains("<your_string>")]  # drop rows containing the string
    df_overall = concat([df_overall, df])

Best way to read a text data file with key="value" format?

I have a text file formatted like:
item(1) description="Tofu" Group="Foods" Quantity=5
item(2) description="Apples" Group="Foods" Quantity=10
What's the best way to read this style of format in Python?
Here's one way you could do this in pandas to get a DataFrame of your items.
(I copy-pasted your text file into "test.txt" for testing purposes.)
This method automatically assigns column names and sets the item(...) column as the index. You could also assign the column names manually, which would change the script a bit.
import pandas as pd

# read in the data
df = pd.read_csv("test.txt", delimiter=" ", header=None)

# set the index as the first column
df = df.set_index(0)

# capture our column names, to rename columns
column_names = []

# for each column...
for col in df.columns:
    # extract the column name
    col_name = df[col].str.split("=").str[0].unique()[0]
    column_names.append(col_name)

    # extract the data
    col_data = df[col].str.split("=").str[1]

    # optional: remove the double quotes
    try:
        col_data = col_data.str.replace('"', "")
    except AttributeError:
        pass

    # store just the data back in the column
    df[col] = col_data

# store our new column names
df.columns = column_names
There are probably a lot of ways to do this based on what you're trying to accomplish and how much variation you expect in the data.
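As one alternative (not from the original answer), here is a plain-Python sketch that parses each line into a dict with a regular expression and builds the DataFrame from those records; the file name test.txt is reused from above, and the quote handling assumes values contain no spaces:
import re
import pandas as pd

# Matches key="value" or key=value pairs (assumes values have no spaces)
pattern = re.compile(r'(\w+)="?([^"\s]*)"?')

records = []
with open("test.txt") as f:
    for line in f:
        parts = line.split(maxsplit=1)  # split off the leading item(...) label
        if len(parts) < 2:
            continue
        label, rest = parts
        record = {'item': label}
        record.update(dict(pattern.findall(rest)))
        records.append(record)

df = pd.DataFrame(records).set_index('item')
print(df)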

Append only new values to CSV from DataFrame in Python

Let's suppose I have a CSV file which looks like this:
Date,High,Low,Open,Close,Volume,Adj Close
1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533
I also have a Pandas DataFrame which has exactly the same values, plus the new entries. My goal is to append only the new values to the CSV file.
I tried this, but unfortunately it appends not only the new entries but also the old ones:
df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)
You can just re-read your csv file after writing it and drop any duplicates before appending the newly fetched data.
The following code was working for me:
import pandas as pd
# Creating original csv
columns = ['Date','High','Low','Open','Close','Volume','Adj Close']
original_rows = [["1980-12-12",0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907],
                 ["1980-12-15",0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382]]
df_original = pd.DataFrame(columns=columns, data=original_rows)
df_original.to_csv('AAPL.CSV', mode='w', index=False)
# Fetching the new data
rows_updated = [["1980-12-12",0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907],
                ["1980-12-15",0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382],
                ["1980-12-16",0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533]]
df_updated = pd.DataFrame(columns=columns, data=rows_updated)
# Read in current csv values
current_csv_data = pd.read_csv('AAPL.CSV')
# Drop duplicates and append only new data
new_entries = pd.concat([current_csv_data, df_updated]).drop_duplicates(subset='Date', keep=False)
new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
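Note on the design: keep=False drops every row whose Date appears in both frames, so only the genuinely new dates survive the concat and get appended. This assumes the values for an already-written date never change between fetches; if they could change, the updated row would be discarded along with the old one.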

How to Perform Mathematical Operation on One Value of a CSV file?

I am dealing with a csv file that contains three columns and three rows containing numeric data. The csv data file simply looks like the following:
Colum1,Colum2,Colum3
1,2,3
1,2,3
1,2,3
My question is how to write Python code that takes a single value from one of the columns and performs a specific operation. For example, let's say I want to take the first value in 'Colum1' and subtract it from the sum of all the values in that column.
Here is my attempt:
import csv

f = open('columns.csv')
rows = csv.DictReader(f)
value_of_single_row = 0.0
for i in rows:
    value_of_single_row += float(i)  # trying to isolate a single value here!
print value_of_single_row - sum(float(r['Colum1']) for r in rows)
f.close()
Based on the code you provided, I suggest you take a look at the doc to see the preferred approach on how to read through a csv file. Take a look here:
How to use CsvReader
With that being said, you can modify the beginning of your code slightly to this:
import csv

with open('data.csv', 'rb') as f:
    rows = csv.DictReader(f)
    for row in rows:
        # perform operation per row
        pass
From there you now have access to each row.
This should give you what you need to do proper row-by-row operations.
What I suggest you do is play around with printing out your rows to see what your data looks like. You will see that each row that comes out is a dictionary.
So if you were going through each row, you can simply do something like this:
for row in rows:
    row['Colum1']  # or row.get('Colum1')
    # to do some math to add everything in Colum1
    s += float(row['Colum1'])
So all of that will look like this:
import csv

s = 0
with open('data.csv', 'rb') as f:
    rows = csv.DictReader(f)
    for row in rows:
        s += float(row['Colum1'])
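To finish what the question actually asks for (the first value in 'Colum1' minus the sum of the whole column), a small sketch extending the loop above could look like this; treating the first row seen by DictReader as "the first value" is an assumption:
import csv

s = 0.0
first_value = None
with open('data.csv', 'r') as f:
    rows = csv.DictReader(f)
    for row in rows:
        value = float(row['Colum1'])
        if first_value is None:
            first_value = value  # remember the first value in Colum1
        s += value

# first value minus the sum of all values in Colum1
print(first_value - s)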
You can do pretty much all of this with pandas
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
import os
Location = r'path/test.csv'
df = pd.read_csv(Location, names=['Colum1','Colum2','Colum3'])
df = df[1:] #Remove the headers since they're unnecessary
print df
df.loc[1, 'Colum1'] = int(df.loc[1, 'Colum1']) + 5  # assign via .loc to avoid chained-assignment issues
print df
You can write back to your csv using df.to_csv('File path', index=False, header=True). Having header=True will add the headers back in.
To do this more along the lines of what you already have, you can do it like this:
import csv
Location = r'C:/Users/tnabrelsfo/Documents/Programs/Stack/test.csv'
data = []
with open(Location, 'r') as f:
    for line in f:
        data.append(line.replace('\n', '').replace(' ', '').split(','))
data = data[1:]
print data
data[1][1] = 5
print data
It will read in each row, cut out the column names, and then you can modify the values by index.
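If you then want to save the change, a short sketch (not part of the original answer) writes the modified rows back out with csv.writer, re-adding the header from the question's sample data; Python 3 file handling is assumed:
import csv

header = ['Colum1', 'Colum2', 'Colum3']  # assumed header from the sample data

with open(Location, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(data)  # 'data' is the modified list of rows from above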
So here is my simple solution using the pandas library. Suppose we have a sample.csv file:
import pandas as pd
df = pd.read_csv('sample.csv') # df is now a DataFrame
df['Colum1'] = df['Colum1'] - df['Colum1'].sum() # here we replace the column by subtracting sum of value in the column
print df
df.to_csv('sample.csv', index=False) # save dataframe back to csv file
You can also use the map function to apply an operation to one column, for example:
import pandas as pd
df = pd.read_csv('sample.csv')
col_sum = df['Colum1'].sum() # sum of the first column
df['Colum1'] = df['Colum1'].map(lambda x: x - col_sum)
