How to store a list of data in Python

def pk():
    with open('doc.csv') as csv_file:
        lines = csv_file.readlines()
        for line in lines[1::]:
            array = line.split(',')
            list_pk = array[1].replace('"', '')
            # print(list_pk)
            return list_pk
print(pk())
I wonder why printing list_pk just before the return prints all the extracted data, but printing pk() prints only the first row. What should I do to get the full list of data when I print pk()?

You are currently returning after only a single pass through your for loop. A return statement exits the function immediately, which also ends the loop. You could accumulate results in your for loop and then move the return statement to after it:
def pk():
    with open('doc.csv') as csv_file:
        lines = csv_file.readlines()
        full_list = []
        for line in lines[1::]:
            array = line.split(',')
            list_pk = array[1].replace('"', '')
            full_list.append(list_pk)  # accumulate results
        return full_list  # return the full result
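The same accumulation can also be sketched with the standard csv module, which strips the surrounding quotes itself so the manual replace is not needed. This is only a minimal sketch, assuming the file has a header row and the wanted value sits in the second column; the path parameter is added here for illustration and is not part of the original function.

```python
import csv

def pk(path='doc.csv'):
    # newline='' is the mode the csv module documentation recommends for reader input
    with open(path, newline='') as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row, if there is one
        # collect the second column of every remaining row that has one
        return [row[1] for row in reader if len(row) > 1]
```

Calling print(pk()) then prints the whole list, because the function returns only after every row has been read.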

Related

How to correctly append the output to an empty list

I am looking for words in my chosen text files.
def cash_sum(self):
    with open(self.infile.name, "r") as myfile:
        lines = myfile.readlines()
        for line in lines:
            if re.search("AMOUNT", line):
                x = []
                x.append(line[26:29])
                print(x)
I have this output:
['100']
['100']
['100']
And I want to add them up so I get the sum of all amounts from this file.
Any advice?
Initialize x outside the line-iterating loop.
Also, you don't need to readlines() separately...
def cash_sum(self):
    x = []
    with open(self.infile.name, "r") as myfile:
        for line in myfile:
            if re.search("AMOUNT", line):
                x.append(line[26:29])
                # or cast to int before appending:
                # x.append(int(line[26:29]))
    return x
This will give you the sum...
def cash_sum(self):
    with open(self.infile.name, 'r') as myfile:
        lines = myfile.readlines()
    totalAmount = 0
    for line in lines:
        if re.search("AMOUNT", line):
            totalAmount += float(line[26:29])
    return totalAmount  # hand the total back to the caller
First thing I'd recommend is casting that value to a float. I say float instead of int just in case you have a decimal point. For example, float('100') == 100.0 . Once you have that, you should be able to then add them up as you would normal numbers.
You have redefined x inside your loop, so every time your loop iterates, it resets x to an empty list. Try the following:
def cash_sum(self):
    with open(self.infile.name, "r") as myfile:
        lines = myfile.readlines()
    x = []  # That way x is defined outside the loop
    for line in lines:
        if re.search("AMOUNT", line):
            x.append(line[26:29])
    print(x)
    # And if you want to add each item in the list...
    num = float(0)
    for number in x:
        num = num + float(number)
    return num
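The collect-then-add steps can also be folded into a single pass with sum() and a generator expression. The following is a rough sketch rather than a drop-in replacement: it is written as a standalone function taking any iterable of lines (an assumption, since the original is a method reading self.infile), and it assumes the amount always occupies columns 26 to 28 as in the question.

```python
import re

def cash_sum(lines):
    # `lines` can be an open file or a list of strings
    # line[26:29] is the amount slice used in the question
    return sum(float(line[26:29]) for line in lines if re.search("AMOUNT", line))
```

With an open file this would be called as cash_sum(myfile) inside the with block.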

How to increase the speed of CSV data matching?

I have a script that parses two CSV files and compares the first column of one file with the second column of the other. The problem is that the files are big, so the process takes some time to finish. How can I improve the speed? I tried to use yield from lines before the for loop, but then I have to convert lines[1:] to list(lines[1:]), which defeats the purpose.
def pk():
    with open('way/to/first.csv') as csv_file:
        lines = csv_file.readlines()
        full_list = []
        for line in lines[1:]:
            array = line.split(',')
            list_pk = array[0].replace('"', '')
            full_list.append(list_pk)
        return full_list

def fk():
    with open('way/to/second.csv') as csv_file:
        lines = csv_file.readlines()
        full_list = []
        for line in lines[1:]:
            array = line.split(',')
            list_fk = array[1].replace('"', '')
            full_list.append(list_fk)
        return full_list

def res():
    f = fk()
    p = pk()
    for i in f:
        if i not in p:
            raise AssertionError(f'{i} not found')
Try using Python's "set difference" to find the elements in set A that do not have a match in set B:
def res():
    fset = set(fk())
    pset = set(pk())
    print('items in F that are missing from P:')
    print(fset - pset)
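If the original order of the foreign keys matters, or duplicates should be reported, a set can still be used just for the membership test. This is a small sketch under that assumption; missing_keys and its parameter names are hypothetical, and it takes the two lists (for example the results of fk() and pk()) as arguments.

```python
def missing_keys(foreign_keys, primary_keys):
    # build the set once; each `in` check is then O(1) on average
    # instead of a scan over the whole list
    primary = set(primary_keys)
    # keep the order and duplicates of the foreign keys
    return [key for key in foreign_keys if key not in primary]
```

The slow part of the original res() is `i not in p` on a list, which rescans p for every item of f; only that lookup changes here.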

joining every 4th line in csv-file

I'd like to join every 4th line together so I thought something like this would work:
import csv
filename = "mycsv.csv"
f = open(filename, "rb")
new_csv = []
count = 1
for i, line in enumerate(file(filename)):
    line = line.rstrip()
    print line
    if count % 4 == 0:
        new_csv.append(old_line_1 + old_line_2 + old_line_3+line)
    else:
        old_line_1 = line[i-2]
        old_line_2 = line[i-1]
        old_line_3 = line
    count += 1
print new_csv
But line[i-1] and line[i-2] do not give me the previous line and the one before it, as I thought they would. So how can I access the lines one and two positions before the current line?
The variable line contains only the line for the current iteration, so accessing line[i-1] will only give you one character within the current line. The other answer is probably the tersest way to put it but, building on your code, you could do something like this instead:
filename = "mycsv.csv"
with open(filename) as f:
    new_csv = []
    lines = []
    # iterate over the file itself: csv.reader would yield lists, which have no rstrip()
    for i, line in enumerate(f):
        line = line.rstrip()
        lines.append(line)
        if (i + 1) % 4 == 0:
            new_csv.append("".join(lines))
            lines = []
print(new_csv)
This should do as you require
join_every_n = 4
with open(filename) as f:  # the question's file(filename) is the Python 2 spelling of open(filename)
    all_lines = [line.rstrip() for line in f]
transposed_lines = list(zip(*[all_lines[n::join_every_n] for n in range(join_every_n)]))
joined = [''.join([l1,l2,l3,l4]) for (l1,l2,l3,l4) in transposed_lines]
Likewise, you could also do
joined = list(map(''.join, transposed_lines))
Explanation
This returns every i-th element of your_list, starting at offset n:
your_list[n::i]
Then you can combine this across range(4) to group the lines by their position within each block of four, so that you get
[[line0, line4, ...], [line1, line5, ...], [line2, line6, ...], [line3, line7, ...]]
Then transposed_lines transposes this array so that it becomes
[[line0, line1, line2, line3], [line4, line5, line6, line7], ...]
Now you can simply unpack and join each individual list element.
Example
all_lines = list(map(str, range(100)))  # a list, so it can be sliced in Python 3
transposed_lines = zip(*[all_lines[n::4] for n in range(4)])
joined = [''.join([l1,l2,l3,l4]) for (l1,l2,l3,l4) in transposed_lines]
gives
['0123',
'4567',
'891011',
...
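A plain slicing loop is another way to get the same grouping, sketched here for comparison. The name join_every is hypothetical. One behavioural difference worth noting: zip silently drops a trailing group shorter than four lines, whereas this version keeps it.

```python
def join_every(lines, n=4):
    # step through the list n items at a time and join each slice
    return [''.join(lines[i:i + n]) for i in range(0, len(lines), n)]
```

For example, join_every(['a', 'b', 'c', 'd', 'e']) gives ['abcd', 'e'].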

"IndexError: list index out of range" when reading file

Just started learning Python and I'm struggling with this a little.
I'm opening a txt file of variable length and I need to iterate over a user-definable number of lines at a time. When I get to the end of the file I receive the error in the subject field. I've also tried the readlines() function and a couple of variations on the "if" statement that causes the problem. I just can't seem to get the code to find EOF.
Hmm, as I write this, I'm thinking ... do I need to add "EOF" to the list and just look for that? Is that the best solution, to find a custom EOF?
My code snippet goes something like:
### variables defined outside of scapy PacketHandler ##
x = 0
B = 0
##########
with open('dict.txt') as f:
    lines = list(f)
global x
global B
B = B + int(sys.argv[3])
while x <= B:
    while y <= int(sys.argv[2]):
        if lines[x] != "":
            #...do stuff...
            # Scapy send packet Dot11Elt(ID="SSID",info="%s" % (lines[x].strip())
            # ....more code...
        x = x + 1
Let’s say you need to read X lines at a time, put it in a list and process it:
with open('dict.txt') as f:
    enoughLines = True
    while enoughLines:
        lines = []
        for i in range(X):
            l = f.readline()
            if l != '':
                lines.append(l)
            else:
                enoughLines = False
                break
        if enoughLines:
            pass  # Do what has to be done with the list "lines"
        else:
            break
    # Do what needs to be done with the list "lines" that has less than X lines in it
Try a for ... in loop. You have created your list, now iterate through it.
with open('dict.txt') as f:
    lines = list(f)
for item in lines:  # each item here is an item in the list you created
    print(item)
This way you go through each line of your text file and don't have to worry about where it ends.
Edit: you can also iterate over the file object directly:
with open('dict.txt') as f:
    for row in f:
        print(row)
The following function will return a generator that returns the next n lines in a file:
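def iter_n(obj, n):
    iterator = iter(obj)
    while True:
        result = []
        try:
            while len(result) < n:
                result.append(next(iterator))
        except StopIteration:
            if len(result) == 0:
                # nothing left: end the generator (re-raising StopIteration
                # inside a generator is a RuntimeError since Python 3.7)
                return
        yield result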
Here is how you can use it:
>>> with open('test.txt') as f:
...     for three_lines in iter_n(f, 3):
...         print(three_lines)
...
['first line\n', 'second line\n', 'third line\n']
['fourth line\n', 'fifth line\n', 'sixth line\n']
['seventh line\n']
Contents of test.txt:
first line
second line
third line
fourth line
fifth line
sixth line
seventh line
Note that, because the file does not have a multiple of 3 lines, the last value returned is not 3 lines, but just the rest of the file.
Because this solution uses a generator, it doesn't require that the full file be read into memory (into a list), but iterates over it as needed.
In fact, the above function can iterate over any iterable object, like lists, strings, etc:
>>> for three_numbers in iter_n([1, 2, 3, 4, 5, 6, 7], 3):
...     print(three_numbers)
...
[1, 2, 3]
[4, 5, 6]
[7]
>>> for three_chars in iter_n("1234567", 3):
...     print(three_chars)
...
['1', '2', '3']
['4', '5', '6']
['7']
If you want to get n lines in a list use itertools.islice yielding each list:
from itertools import islice

def yield_lists(f, n):
    with open(f) as f:
        # iter(callable, sentinel) keeps calling until an empty list comes back
        for sli in iter(lambda: list(islice(f, n)), []):
            yield sli
If you want to use loops, you don't need a while loop at all. Use an inner loop over range(n-1) that calls next on the file object with a default value of an empty string. If we get an empty string, break out of the loop; otherwise append the line. Then yield each list:
def yield_lists(f, n):
    with open(f) as f:
        for line in f:
            temp = [line]
            for i in range(n-1):
                line = next(f, "")
                if not line:
                    break
                temp.append(line)
            yield temp
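The islice idiom is not tied to files. As a small sketch, here it is applied to an arbitrary iterable; the helper name chunks is hypothetical and not part of the answers above.

```python
from itertools import islice

def chunks(iterable, n):
    iterator = iter(iterable)
    # each call takes up to n items; an empty list is the sentinel that stops iteration
    return iter(lambda: list(islice(iterator, n)), [])
```

list(chunks([1, 2, 3, 4, 5, 6, 7], 3)) produces [[1, 2, 3], [4, 5, 6], [7]], matching the behaviour of iter_n shown earlier.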

How to rearrange numbers from different lines of a text file in python?

So I have a text file consisting of one column, where each line consists of two numbers:
190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893
I would like to discard the very first and the very last number (in this case, 190 and 9893) and move the rest of the numbers one spot forward, like this.
My desired output:
255..337
2799..2801
3733..3734
5020..5234
5530..5683
6459..8238
9191..9306
I hope that makes sense. I'm not sure how to approach this.
lines = """190..255
337..2799
2801..3733"""
values = [int(v) for line in lines.split() for v in line.split('..')]
# values = [190, 255, 337, 2799, 2801, 3733]
pairs = zip(values[1:-1:2], values[2:-1:2])
# pairs = [(255, 337), (2799, 2801)]
out = '\n'.join('%d..%d' % pair for pair in pairs)
# out = "255..337\n2799..2801"
Try this:
with open(filename, 'r') as f:
    lines = f.readlines()
numbers = []
for row in lines:
    # strip the trailing newline before splitting
    numbers.extend(row.strip().split('..'))
numbers = numbers[1:-1]  # drop the first and last number
newLines = ['..'.join(numbers[idx:idx+2]) for idx in range(0, len(numbers), 2)]
with open(filename, 'w') as f:
    for line in newLines:
        f.write(line)
        f.write('\n')
Try this:
Read all of them into one list, split each line into two numbers, so you have one list of all your numbers.
Remove the first and last item from your list
Write out your list, two items at a time, with dots in between them.
Here's an example:
a = """190..255
337..2799
2801..3733
3734..5020
5234..5530
5683..6459
8238..9191
9306..9893"""
a_list = a.replace('..','\n').split()
b_list = a_list[1:-1]
b = ''
for i in range(len(b_list) // 2):  # integer division; one iteration per output pair
    b += '..'.join(b_list[2*i:2*i+2]) + '\n'
temp = []
with open('temp.txt') as ofile:
    for x in ofile:
        temp.append(x.rstrip("\n"))
# pair the second number of each line with the first number of the next line
for x in range(0, len(temp) - 1):
    print(temp[x].split("..")[1] + ".." + temp[x+1].split("..")[0])
Maybe this will help:
def makeColumns(listOfNumbers):
    n = 0
    while n < len(listOfNumbers):
        # sep='' avoids the spaces print() would otherwise put around '..'
        print(listOfNumbers[n], '..', listOfNumbers[n+1], sep='')
        n += 2

def trim(listOfNumbers):
    listOfNumbers.pop(0)  # remove the first number
    listOfNumbers.pop()   # remove the last number

listOfNumbers = [190, 255, 337, 2799, 2801, 3733, 3734, 5020, 5234, 5530, 5683, 6459, 8238, 9191, 9306, 9893]
makeColumns(listOfNumbers)
print()
trim(listOfNumbers)
makeColumns(listOfNumbers)
I think this might be useful too. I am reading data from a file named list.
value = []
with open("list", "r") as data:
    for line in data:
        temp = line.split("..")
        value.append(temp[0])
        value.append(temp[1])
for i in range(1, len(value) - 1, 2):
    print(value[i].strip() + ".." + value[i+1])
print(value)
After reading the data, I split each line and store it in the temporary list. After that, I copy the data to the main list, value, which holds all of the data. Then I iterate from the second element to the second-to-last element to get the output of interest. The strip function is used to remove the '\n' character from the value.
You can later write these values to a file instead of printing them out.
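The approach shared by these answers (flatten, drop both ends, re-pair) can be condensed into one function. This is only a sketch, assuming every non-empty line has the form a..b; the name shift_ranges is hypothetical.

```python
def shift_ranges(lines):
    # flatten the "a..b" lines into one list of number strings, skipping blank lines
    numbers = [n for line in lines if line.strip() for n in line.strip().split('..')]
    inner = numbers[1:-1]  # drop the very first and the very last number
    # re-pair neighbours: (inner[0], inner[1]), (inner[2], inner[3]), ...
    return ['..'.join(pair) for pair in zip(inner[0::2], inner[1::2])]
```

The function accepts any iterable of lines, so an open file can be passed directly, and the returned strings can be written back out with '\n'.join(...).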
