Remove selected CSV columns in Python

I have a variable that contains a string of:
fruit_wanted = 'banana,apple'
I also have a csv file
fruit,'orange','grape','banana','mango','apple','strawberry'
number,1,2,3,4,5,6
value,3,2,2,4,2,1
price,3,2,1,2,3,4
Now how do I delete the columns whose 'fruit' is not listed in the 'fruit_wanted' variable,
so that the output file looks like this:
fruit,'banana','apple'
number,3,5
value,2,2
price,1,3
Thank you.

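Read the CSV file using the csv.DictReader class and ignore the columns you don't want:
import csv
fieldnames = ['fruit'] + ["'%s'" % f for f in fruit_wanted.split(',')]
wanted = set(fieldnames)
# inputfile and outputfile hold the paths of the source and destination files
with open(inputfile, newline='') as src, open(outputfile, 'w', newline='') as dst:
    outfile = csv.DictWriter(dst, fieldnames=fieldnames)
    outfile.writeheader()  # DictReader consumes the first line as field names, so write it back
    for row in csv.DictReader(src):
        outfile.writerow({k: row[k] for k in row if k in wanted})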

Here's some pseudocode:
open the original CSV for input, and the new one for output
read the first row of the original CSV and figure out which columns you want to delete
write the modified first row to the output CSV
for each row in the input CSV:
    delete the columns you figured out before
    write the modified row to the output CSV
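The pseudocode above could be sketched in Python roughly as follows. This is only one way to do it: the function name keep_columns and its key parameter are made up for the illustration, and it assumes the header cells carry literal single quotes as in the sample file. Columns are kept in the order they appear in the file, not the order given in fruit_wanted.

```python
import csv
import io


def keep_columns(src, dst, wanted, key="fruit"):
    """Copy CSV rows from src to dst, keeping the key column and the wanted ones."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    # The sample header cells look like 'banana' (quotes included),
    # so strip the quotes before comparing with the wanted names.
    keep = [i for i, name in enumerate(header)
            if name == key or name.strip("'") in wanted]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])


sample = (
    "fruit,'orange','grape','banana','mango','apple','strawberry'\n"
    "number,1,2,3,4,5,6\n"
    "value,3,2,2,4,2,1\n"
    "price,3,2,1,2,3,4\n"
)
fruit_wanted = 'banana,apple'
out = io.StringIO()
keep_columns(io.StringIO(sample), out, set(fruit_wanted.split(',')))
print(out.getvalue())
```

With real files, pass objects opened with open(path, newline='') instead of the io.StringIO stand-ins.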

Related

How can I extract the integer from each row of a CSV file in Python?

Hi guys, my teacher has assigned me to get the integer out of the string in each row of a single column. All of this has to be done by reading a CSV file with Python. My terminal doesn't report any errors, but I also get no output that would point me to the problem. I want to take the integer from every row and print it.
Here is my code:
import pandas as pd
tx = ["T4.csv"]
for name_csv in tx:
    df = pd.read_csv(name_csv, names=["A"])
    for row in df:
        if row == ('NSIT ,A: ,'):
            # I don't know how to use split to take the integer and print it
            print("A", row)
        else:
            # I don't know how to use split to take the integer and print it
            print("B", row)
Here is what the CSV file contains (everything is in column A):
NSIT ,A: ,-213
NSIT ,A: ,-43652
NSIT ,B: ,-39
NSIT ,A: ,-2
NSIT ,B: ,-46
Above is my attempt in Python; I hope you can understand the problem I have.
df = pd.read_csv( "T4.csv", names=["c1", "c2", "c3"])
print(df.c3)
Read the file one line at a time. Split each line on commas. Print the last item in the resulting list.
with open('T4.csv') as data:
    for line in data:
        if len(tokens := line.split(',')) == 3:
            print(tokens[2].strip())
Alternative:
with open('T4.csv') as data:
    d = {}
    for line in data:
        if len(tokens := line.split(',')) == 3:
            _, b, c = map(str.strip, tokens)
            d.setdefault(b, []).append(c)
    for k, v in d.items():
        print(k, end='')
        print(*v, sep=',', end='')
        print(f' sum={sum(map(int, v))}')
Output:
A:-213,-43652,-2 sum=-43867
B:-39,-46 sum=-85
Your question was not very clear. So I assume you want to print out the 3rd column of the CSV file. I also think that you opened the CSV file in Excel, which is why you see that all the data is put in Column A.
A CSV (comma-separated values) file is a plain text file that contains data organised as a table of rows and columns, where each row represents a record, and each column represents a field or attribute of that record.
A newline character typically separates each row of data in a CSV file, and the values in each column are separated by a delimiter character, such as a comma (,). For example, here is a simple CSV file with three rows and three columns:
S.No, Student Name, Student Roll No.
1, Alpha, 123
2, Beta, 456
3, Gamma, 789
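As a rough illustration of that layout, the sample above can be parsed with the csv module. The text is fed in through io.StringIO here only so the sketch runs without a file on disk, and skipinitialspace=True is used because this sample has a blank after each comma.

```python
import csv
import io

sample = (
    "S.No, Student Name, Student Roll No.\n"
    "1, Alpha, 123\n"
    "2, Beta, 456\n"
    "3, Gamma, 789\n"
)

# skipinitialspace=True drops the blank that follows each comma in this sample.
rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))
header, records = rows[0], rows[1:]

# Every cell is read as a string; convert explicitly where numbers are needed.
roll_numbers = [int(record[2]) for record in records]

print(header)
print(roll_numbers)
```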
For a simple application like the one you describe, Pandas might not be required. You can use the csv module from Python's standard library to do this.
Please find the code below to print out the 3rd column of your CSV file.
import csv
with open("T4.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    headers = next(csv_reader)  # Get the first row (the column headers, if the file has them)
    print(headers[2])  # Print the 3rd column of that first row
    for row in csv_reader:
        print(row[2])  # Print the 3rd column data

Create and append CSV row by row

I would just like to create a csv file and at the same time add my data row by row with a for loop.
for x in y:
    newRow = "\n%s,%s\n" % (sentence1, sentence2)
    with open('Mydata.csv', "a") as f:
        f.write(newRow)
After the above process, I tried to read the csv file but I can't separate the columns. It seems that there is only one column, maybe I did something wrong in the csv creation process?
colnames = ['A_sentence', 'B_sentence']
Mydata = pd.read_csv(Mydata, names=colnames, delimiter=";")
print(Mydata['A_sntence']) #output Nan
When you are writing the file, it looks like you are using commas as separators, but when reading the file you are using semicolons (probably just a typo). Change delimiter=";" to delimiter="," and it should work.
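Separately from the delimiter mismatch, building rows by hand with "%s,%s" breaks as soon as a sentence itself contains a comma, and the leading "\n" leaves blank lines in the file. A minimal sketch of the same append-per-row idea with csv.writer is below; the helper names append_rows and read_rows and the temporary path are assumptions made for the example, not part of the question.

```python
import csv
import os
import tempfile


def append_rows(path, pairs):
    """Append (sentence1, sentence2) pairs to a CSV file, creating it if needed."""
    # newline='' lets the csv module manage line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(pairs)


def read_rows(path):
    """Read the file back as a list of rows, using the same comma delimiter."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


path = os.path.join(tempfile.mkdtemp(), "Mydata.csv")
append_rows(path, [("Hello, world", "first pair")])  # the comma gets quoted for us
append_rows(path, [("second", "pair")])
rows = read_rows(path)
print(rows)
```

The writer quotes any field that contains the delimiter, so the reader still sees two columns per row.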

Need a way to take three csv files and put into one as well as remove duplicates and replace values in Python

I'm new to Python but I need help creating a script that will take in three different csv files, combine them together, remove duplicates from the first column as well as remove any rows that are blank, then change a revenue area to a number.
The three CSV files are setup the same.
The first column is a phone number and the second column is a revenue area (city).
The first column will need all duplicates & blank values removed.
The second column will have values like "Macon", "Marceline", "Brookfield", which will need to be changed to a specific value like:
Macon = 1
Marceline = 8
Brookfield = 4
And then if it doesn't match one of those values put a default value of 9.
Welcome to Stack Overflow!
Firstly, you'll want to be using the csv library for the "reader" and "writer" functions, so import the csv module.
Then, you'll want to open the new file to be written to, and use the csv.writer function on it.
After that, you'll want to define a set (I name it seen). This will be used to prevent duplicates from being written.
Write your headers (if you need them) to the new file using the writer.
Open each old file with the csv module's "reader" and iterate through its rows using a for loop. Skip blank rows; if a row's phone number is already in the "seen" set, simply "continue" instead of writing it, otherwise add the number to the set and write the row. The three files are handled identically, so a loop over the file names avoids repeating the code.
To assign the values to the cities, define a dictionary that holds the city names as the keys and the new values as the values. Looking a city up with .get() and a default of 9 covers any city that is not listed.
So, your code should look something like this:
import csv
myDict = {'Macon': 1, 'Marceline': 8, 'Brookfield': 4}
seen = set()  # phone numbers that have already been written
# The newline argument prevents the writer from emitting extra newlines (empty rows).
with open('newFile.csv', 'w', newline='') as newFile:
    writer = csv.writer(newFile)
    writer.writerow(['Phone Number', 'City'])  # This will write a header row for you.
    # For each input file: skip empty rows, skip duplicate phone numbers,
    # change the value of "City", and write to the new file.
    for name in ('firstFile.csv', 'secondFile.csv', 'thirdFile.csv'):
        with open(name, newline='') as inFile:
            for row in csv.reader(inFile):
                if len(row) < 2 or not row[0].strip():
                    continue
                if row[0] in seen:
                    continue
                seen.add(row[0])
                row[1] = myDict.get(row[1], 9)  # 9 is the default for unlisted cities
                writer.writerow(row)
I have not tested this myself, but it is very similar to two different programs that I wrote, and I have attempted to combine them into one. Let me know if this helps, or if there is something wrong with it!
-JCoder96

How to add key-pair values to an open csv file?

I am new to Python. I have used single letters to simplify my code below. My code writes a CSV file with columns of a, b, c, d values, each with 10 rows (length). I would like to add the average values of c and d to the same CSV file as two additional columns, each holding a single row with the average. I have tried to append field names and write the new values, but it didn't work.
with open('out.csv', 'w') as csvfile:
    fieldnames=['a','b','c','d']
    csv_writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    csv_writer.writeheader()
    total_c=0
    total_d=0
    for i in range(1,length):
        # do something to get the a, b, c, d values
        total_c += c
        total_d += d
        csv_writer.writerow({'a': a,'b':b,'c':c,'d':d })
    mean_c=total_c /length
    mean_d=total_d /length
I expect to see something in the picture:
Try using the pandas library to deal with the CSV file. I have provided sample code below; it assumes that the CSV file has no header on its first line.
import pandas as pd
# header=None: the file has no header row, so the column names are supplied here
# (if the file was written with writeheader(), a plain pd.read_csv('out.csv') is enough)
data = pd.read_csv('out.csv', header=None, names=['a','b','c','d'])
# making sure I am using a copy of the dataframe
avg_data = data.copy()
# creating the new average columns in the same dataframe (the mean is repeated on every row)
avg_data['mean_c'] = avg_data.iloc[:,2].mean()
avg_data['mean_d'] = avg_data.iloc[:,3].mean()
# writing the updated data to the CSV file
avg_data.to_csv('out.csv', sep=',', encoding='utf-8')
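If the averages should appear only once rather than on every row, one option that needs nothing beyond the csv module is to declare the two extra field names up front and fill them on the first data row only. This is a sketch under that assumption; write_with_means is a made-up helper name, the sample rows are invented, and the rows are collected in a list first because the means are not known until all values have been seen.

```python
import csv
import io


def write_with_means(dst, rows):
    """Write rows of a, b, c, d and put the means of c and d on the first data row."""
    fieldnames = ["a", "b", "c", "d", "mean_c", "mean_d"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    means = {
        "mean_c": sum(r["c"] for r in rows) / len(rows),
        "mean_d": sum(r["d"] for r in rows) / len(rows),
    }
    for i, row in enumerate(rows):
        # Keys missing from a row are written as empty cells (DictWriter's default).
        writer.writerow({**row, **means} if i == 0 else row)


# Invented sample data standing in for the values computed in the question.
rows = [{"a": i, "b": i * 2, "c": i * 3, "d": i * 4} for i in range(1, 5)]
out = io.StringIO()
write_with_means(out, rows)
print(out.getvalue())
```

For a real file, replace the io.StringIO object with open('out.csv', 'w', newline='').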

Copying one column of a CSV file and adding it to another file using python

I have two files, the first one is called book1.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
The second file is called book2.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,4
My goal is to copy the column that contains the 5's in book1.csv to the corresponding column in book2.csv.
The problem with my code seems to be that it is not appending correctly, nor is it selecting just the index that I want to copy. It also gives an error that I have selected an incorrect index position. The output is as follows:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,41,2,3,4,5
Here is my code:
import csv
with open('C:/Users/SAM/Desktop/book2.csv','a') as csvout:
    write=csv.writer(csvout, delimiter=',')
    with open('C:/Users/SAM/Desktop/book1.csv','rb') as csvfile1:
        read=csv.reader(csvfile1, delimiter=',')
        header=next(read)
        for row in read:
            row[5]=write.writerow(row)
What should I do to get this to append properly?
Thanks for any help!
How about something like this? I read in both books and, for every row in book2, append the last element of the matching book1 row; the combined rows are stored in a list. Then I write the contents of that list to a new .csv file.
import csv
with open('book1.csv', 'r', newline='') as book1:
    with open('book2.csv', 'r', newline='') as book2:
        reader1 = csv.reader(book1, delimiter=',')
        reader2 = csv.reader(book2, delimiter=',')
        both = []
        fields = next(reader1) # read header row
        next(reader2) # read and ignore header row
        for row1, row2 in zip(reader1, reader2):
            row2.append(row1[-1])
            both.append(row2)
with open('output.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerow(fields) # write a header row
    writer.writerows(both)
Although some of the code above will work, it does not scale well, and a vectorised approach is a better fit. Learning a little numpy or pandas will make tasks like this easier.
You can download pandas from the Pandas Website
# Load Pandas
import pandas as pd
# Load each file into a pandas dataframe, which is based on a numpy array
data1 = pd.read_csv('csv1.csv', sep=',')
data2 = pd.read_csv('csv2.csv', sep=',')
# Now add 'header5' from data1 to data2
data2['header5'] = data1['header5']
# Save it back to csv, without the row index
data2.to_csv('output.csv', index=False)
Regarding the "error that I have selected an incorrect index position," I suspect this is because you're using row[5] in your code. Indexing in Python starts from 0, so if you have A = [1, 2, 3, 4, 5] then to get the 5 you would do print(A[4]).
Assuming the two files have the same number of rows and the rows are in the same order, I think you want to do something like this:
import csv
# Open the two input files, which I've renamed to be more descriptive,
# and also an output file that we'll be creating
with open("four_col.csv", mode='r') as four_col, \
     open("five_col.csv", mode='r') as five_col, \
     open("five_output.csv", mode='w', newline='') as outfile:
    four_reader = csv.reader(four_col)
    five_reader = csv.reader(five_col)
    five_writer = csv.writer(outfile)
    _ = next(four_reader)  # Ignore headers for the 4-column file
    headers = next(five_reader)
    five_writer.writerow(headers)
    for four_row, five_row in zip(four_reader, five_reader):
        last_col = five_row[-1]  # Or use five_row[4]
        four_row.append(last_col)
        five_writer.writerow(four_row)
Why not read the files line by line and use the -1 index to find the last item?
endings=[]
with open('book1.csv') as book1:
    for line in book1:
        # if not header line:
        endings.append(line.rstrip('\n').split(',')[-1])
linecounter=0
with open('book2.csv') as book2:
    for line in book2:
        # if not header line:
        print(line.rstrip('\n')+','+endings[linecounter]) # or write to file
        linecounter+=1
You should also catch errors if row numbers don't match.
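One way to notice such a mismatch, sketched here with itertools.zip_longest instead of a manual counter, is to pair the rows up and raise as soon as either file runs out first. The function name append_last_column is hypothetical, and the in-memory strings stand in for the data rows of book1.csv and book2.csv with the header lines already removed.

```python
import csv
import io
from itertools import zip_longest


def append_last_column(src_rows, dst_rows):
    """Yield each dst row with the last cell of the matching src row appended."""
    missing = object()  # sentinel that marks an exhausted input
    pairs = zip_longest(src_rows, dst_rows, fillvalue=missing)
    for n, (src, dst) in enumerate(pairs, start=1):
        if src is missing or dst is missing:
            raise ValueError(f"row count mismatch at data row {n}")
        yield dst + [src[-1]]


# Stand-ins for the data rows of the two files (headers omitted).
book1 = csv.reader(io.StringIO("1,2,3,4,5\n1,2,3,4,5\n"))
book2 = csv.reader(io.StringIO("1,2,3,4\n1,2,3,4\n"))
merged = list(append_last_column(book1, book2))
print(merged)
```

On Python 3.10 or later, zip(src_rows, dst_rows, strict=True) raises a ValueError in the same situation with less code.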
