Vlookup in python

Vlookup in python - python

I am new to python and leaning as fast as possible. I know how to do my problem in bash and trying to work on python.
I have a data file (data_array.csv in the example) and index file, index.csv, at which I want to extract the data from the data file that have the same ID in the index file and store in to a new file, Out.txt. I also want to put NA ,in the Out.txt, for those ID's that have no value in the data file. I know how to do it for one column. But my data has more than 1000 columns (from 1 to 1344). I want you help me with a script that can do it faster. My data file, index id and proposed out put as follows.
data_array.csv
Id 1 2 3 . . 1344
1 10 20 30 . . -1
2 20 30 40 . . -2
3 30 40 50 . . -3
4 40 50 60 . . -4
6 60 60 70 . . -5
8 80 70 80 . . -6
10 100 80 90 . . -7
index.csv
Id
1
2
8
9
10
Required Output is
Out.txt
Id 1 2 3 . . 1344
1 10 20 30 . . -1
2 20 30 40 . . -2
8 80 70 80 . . -6
9 NA NA NA NA
10 100 80 90 . . -7
I tried
#! /usr/bin/python
import csv
with open('data_array.csv','r') as lookuplist:
with open('index.csv', "r") as csvinput:
with open('VlookupOut','w') as output:
reader = csv.reader(lookuplist)
reader2 = csv.reader(csvinput)
writer = csv.writer(output)
for i in reader2:
for xl in reader:
if i[0] == xl[0]:
i.append(xl[1:])
writer.writerow(i)
But it only do for the first row. I want the program to work for the entire rows and columns of my data files.

It only output the first row because after xl in reader for the first time, you are at the end of the file. You need to point to the beginning of the file after that. To increase efficiency, you can read the csvinput into a dictionary first, then use dictionary lookup to get the row you need:
#! /usr/bin/python
import csv
with open('data_array.csv','r') as lookuplist:
with open('index.csv', "r") as csvinput:
with open('VlookupOut','w') as output:
reader = csv.reader(lookuplist)
reader2 = csv.reader(csvinput)
writer = csv.writer(output)
d = {}
for xl in reader2:
d[xl[0]] = xl[1:]
for i in reader:
if i[0] in d:
i.append(d[i[0]])
writer.writerow(i)

When you read a CSV file using for xl in readerit will go through every row until it reaches the end. But it will only do this once. You can tell it to go back to the first row of the CSV file by using .seek(0).
#! /usr/bin/python
import csv
with open('data_array.csv','r') as lookuplist:
with open('index.csv', "r") as csvinput:
with open('VlookupOut','w') as output:
reader = csv.reader(lookuplist)
reader2 = csv.reader(csvinput)
writer = csv.writer(output)
for i in reader2:
for xl in reader:
if i[0] == xl[0]:
i.append(xl[1:])
writer.writerow(i)
lookuplist.seek(0)

Related

How to read specific columns in the csv file?

I have lots of live data coming from sensor. Currently, I stored the data in a csv file as following:
0 2 1 437 464 385 171 0:44:4 dog.jpg
1 1 3 452 254 444 525 0:56:2 cat.jpg
2 3 2 552 525 785 522 0:52:8 car.jpg
3 8 4 552 525 233 555 0:52:8 car.jpg
4 7 5 552 525 433 522 1:52:8 phone.jpg
5 9 3 552 525 555 522 1:52:8 car.jpg
6 6 6 444 392 111 232 1:43:4 dog.jpg
7 1 1 234 322 191 112 1:43:4 dog.jpg
.
.
.
.
Third column has numbers between 1 to 6. I want to read information of columns #4 and #5 for all the rows that have number 2 and 5 in the third columns. I also want to write them in another csv file line by line every 2 second, one line at the time.
I do so because I have another code which would go through the data and read the data from there. I was wondering how could I write the information for the lines that have 3 and 5 in their 3rd column? Please advise!
for example:
2 552 525
5 552 525
......
......
.....
.
import csv
with open('newfilename.csv', 'w') as f2:
with open('mydata.csv', mode='r') as infile:
reader = csv.reader(infile) # no conversion to list
header = next(reader) # get first line
for row in reader: # continue to read one line per loop
if row[5] == 2 & 5:

The third column has index 2 so you should be checking if row[2] is one of '2' or '5'. I have done this by defining the set select = {'2', '5'} and checking if row[2] in select.
I don't see what you are using header for but I assume you have more code that processes header somewhere. If you don't need header and just want to skip the first line, just do next(reader) without assigning it to header but I have kept header in my code under the assumption you use it later.
We can use time.sleep(2) from the time module to help us write a row every 2 seconds.
Below, "in.txt" is the csv file containing the sample input you provided and "out.txt" is the file we write to.
Code
import csv
import time
select = {'2', '5'}
with open("in.txt") as f_in, open("out.txt", "w") as f_out:
reader = csv.reader(f_in)
writer = csv.writer(f_out)
header = next(reader)
for row in reader:
if row[2] in select:
print(f"Writing {row[2:5]} at {time.time()}")
writer.writerow(row[2:5])
# f_out.flush() may need to be run here
time.sleep(2)
Output
Writing ['2', '552', '525'] at 1650526118.9760585
Writing ['5', '552', '525'] at 1650526120.9763758
"out.txt"
2,552,525
5,552,525
Input
"in.txt"
0,2,1,437,464,385,171,0:44:4,dog.jpg
1,1,3,452,254,444,525,0:56:2,cat.jpg
2,3,2,552,525,785,522,0:52:8,car.jpg
3,8,4,552,525,233,555,0:52:8,car.jpg
4,7,5,552,525,433,522,1:52:8,phone.jpg
5,9,3,552,525,555,522,1:52:8,car.jpg
6,6,6,444,392,111,232,1:43:4,dog.jpg
7,1,1,234,322,191,112,1:43:4,dog.jpg

I think you'd just need to change your if statement to be able to get the rows you want.
for example:
import csv
with open('newfilename.csv', 'w') as f2:
with open('mydata.csv', mode='r') as infile:
reader = csv.reader(infile) # no conversion to list
header = next(reader) # get first line
for row in reader: # continue to read one line per loop
if row[5] in [2,5]:
inside the if, you'll get the rows that have 2 or 5

Issue computing difference between two csv files

I'm trying to obtain the difference between two csv files A.csv and B.csv in order to obtain new rows added in the second file. A.csv has the following data.
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
B.csv has the following data.
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
To write the new rows added into an output file I'm using the following script.
input_file1 = "A.csv"
input_file2 = "B.csv"
output_path = "out.csv"
with open(input_file1, 'r') as t1:
fileone = set(t1)
with open(input_file2, 'r') as t2, open(output_path, 'w') as outFile:
for line in t2:
if line not in fileone:
outFile.write(line)
Expected output is :
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
Output obtained through the above script is :
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
I'm not sure where I'm making a mistake, tried debugging it but with no progress.

You need to be careful with trailing newlines. As such it is safer to remove the newlines before comparing and then add them back when writing:
input_file1 = "A.csv"
input_file2 = "B.csv"
output_path = "out.csv"
with open(input_file1, 'r') as t1:
fileone = set(t1.read().splitlines())
with open(input_file2, 'r') as t2, open(output_path, 'w') as outFile:
for line in t2:
line = line.strip()
if line not in fileone:
outFile.write(line + '\n')

How to extract lines from csv file to anothe csv file?

By reference to this question: How to extract and copy lines from csv file to another csv file in python? the result That I have after executing this code based on this array:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
This is the code:
# read
data = []
with open('test.csv', 'r') as f:
f_csv = csv.reader(f)
# header = next(f_csv)
for row in f_csv:
data.append(row)
# write
with open('newtest.csv', 'w+') as f:
writer = csv.writer(f)
for i in range(int(len(data) * 30 / 100)):
writer.writerow(data[i])
This is the result in the newtest.csv file:
frame.number frame.len frame.cap_len frame.Type
empty line....
1 100 100 ICMP_tt
empty line....
2 64 64 UDP
However, I hope that the result looks like this:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
The test.csv file stil the same I mean that the two lines copied are not deleted. that means that I want to have:
frame.number frame.len frame.cap_len frame.Type
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
I hope that you can help me please.

To do the read and write separately in Python 2.x, you could use the following approach:
import csv
with open('test.csv', 'rb') as f_input:
data = list(csv.reader(f_input))
with open('newtest.csv', 'wb') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerows(data)
This would mean data holds all of the rows as a list of lists. Make sure the files are opened in binary mode for both reading and writing. If this is not done, you will get empty lines.
To do this one row at at time (useful if the CSV file is too large for memory), you could do the following:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)
for row in csv_input:
csv_output.writerow(row)
Or even:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)
csv_output.writerows(csv_input)

Python adding cr to lf in binary data?

I was writing a python script which converts an ascii file containing one pair numbers per line to a straight binary representation. Here is my script:
in_file = open("p02_0609.bin", 'r')
out_file = open("sta013.bin", 'w')
out_data = bytearray()
for line in in_file:
addr, i2c_data = [int(x) for x in line.split(" ")]
out_data.append(addr)
out_data.append(i2c_data)
out_file.write(out_data)
out_file.close()
in_file.close()
and a sample of the file it's reading (about 2000 lines total)
58 1
42 4
40 0
41 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 1
40 2
33 143
40 3
33 0
40 4
40 5
40 6
40 7
40 8
40 9
40 10
40 11
The output file ends on an odd byte, which it shouldn't since all the data is in pairs, and is about 80 bytes longer than expected. After poking around with a hex editor, I finally found the culprit. Every instance of "10" (Ascii LF) has had a CR appended in front of it. How do I make it stop doing that?
Tl;dr: Python is being a dumbass and adding CR to LF in binary data where that makes no sense. How to fix?

You are working with text files so line endings are automatically added by open function. You need to use the mode 'wb' in open for reading and writing bytes.

Read in file as arrays then rearrange columns

I would like to read in a file with multiple columns and write out a new file with columns in a different order than the original file. One of the columns has some extra text that I want eliminated in the new file as well.
For instance, if I read in file: data.txt
1 6 omi=11 16 21 26
2 7 omi=12 17 22 27
3 8 omi=13 18 23 28
4 9 omi=14 19 24 29
5 10 omi=15 20 25 30
I would like the written file to be: dataNEW.txt
26 1 11 16
27 2 12 17
28 3 13 18
29 4 14 19
30 5 15 20
With the help of inspectorG4dget, I came up with this:
import csv as csv
import sys as sys
infile = open('Rearrange Column Test.txt')
sys.stdout = open('Rearrange Column TestNEW.txt' , 'w')
for line in csv.reader(infile, delimiter='\t'):
newline = [line[i] for i in [5, 0, 2, 3]]
newline[2] = newline[2].split('=')[1]
print newline[0], newline[1], newline[2], newline[3]
sys.stdout.close()
Is there a more concise way to get an output without any commas than listing each line index from 0 to the total number of lines?

import csv
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
writer = csv.writer(outfile)
for line in csv.reader(infile, delimiter='\t'):
newline = [line[i] for i in [-1, 0, 2 3]]
newline[2] = newline[2].split('=')[1]
writer.writerow(newline)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Vlookup in python - python

Related

How to read specific columns in the csv file?

Issue computing difference between two csv files

How to extract lines from csv file to anothe csv file?

Python adding cr to lf in binary data?

Read in file as arrays then rearrange columns

Categories

Resources