Dijkstra python library passing value to the add_edge in Graph - python

I am trying to use the Dijkstar library in Python and have a question about the values passed to add_edge. Please help.
from dijkstar import find_path, Graph
graph = Graph()
input_file = input('Input the file name')
w = list()
i = 0
with open(input_file, 'r') as file:
    for line in file:
        for word in line.split():
            w.append(word)
        graph.add_edge(w[0], w[1], w[2])
        print(w[0], w[1], w[2])
        i = 0
        w.clear()
print(find_path(graph, 1, 4))
The input file is shown below; w[0], w[1] and w[2] are read and printed correctly.
1 2 1000
2 3 2000
3 4 3000
1 4 4000
The output shows the following error:
raise NoPathError('Could not find a path from {0} to {1}'.format(s, d))
dijkstar.algorithm.NoPathError: Could not find a path from 1 to 4
There are two paths from 1 to 4, so I don't understand why it raises this error.
Any help would be appreciated.

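I believe the issue is that you were not converting the input to numbers: the node names and weights were still strings, so the integer nodes 1 and 4 passed to find_path did not exist in the graph.
Try the following.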
Code
from dijkstar import find_path, Graph
input_file = input('Input the file name: ')
with open(input_file, 'r') as file:
    graph = Graph() # place closer to where first used
    for line in file:
        line = line.rstrip() # remove trailing '\n'
        w = list(map(int, line.split())) # convert line to list of ints
        graph.add_edge(w[0], w[1], w[2]) # add edge with weights
        print(w[0], w[1], w[2])
print(find_path(graph, 1, 4))
Input
file.txt
1 2 1000
2 3 2000
3 4 3000
1 4 4000
Output
PathInfo(nodes=[1, 4], edges=[4000], costs=[4000], total_cost=4000)
Comments
There is no need to declare w as a list or to clear it between uses
w = list() # no need
w.clear() # no need
You should almost always strip off the trailing '\n' when iterating over a file
line = line.rstrip()
This is an inefficient way of placing elements in w
for word in line.split():
    w.append(word)
It is simpler to assign directly.
w = line.split()
However, w would then be filled with strings, so they need to be mapped to ints.
w = list(map(int, line.split()))
Variable i is not used (or needed) so remove.
i = 0
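To make the type issue concrete, here is a minimal sketch that does not use dijkstar at all. parse_edges and the string_keyed dict are hypothetical names for illustration only: the dict merely mimics how a structure keyed by the strings read from the file will not contain the integer nodes 1 and 4.

```python
def parse_edges(lines):
    """Yield (u, v, weight) tuples of ints from whitespace-separated lines."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        u, v, weight = map(int, parts)
        yield u, v, weight


sample = ["1 2 1000\n", "2 3 2000\n", "3 4 3000\n", "1 4 4000\n"]
edges = list(parse_edges(sample))

# Hypothetical stand-in for a graph whose node names were never converted:
# its keys are the strings '1', '2', '3', so the int 1 is not found.
string_keyed = {line.split()[0]: None for line in sample}

print(edges)
print(1 in string_keyed, "1" in string_keyed)
```

Each tuple from parse_edges could then be passed to graph.add_edge(u, v, weight) as in the answer above.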

Related

Extract the index of largest number in different lines

I am writing code to extract specific lines from my file and then find the maximum number, more specifically its position (index).
So I start my code looking for the lines:
with open (filename,'r') as f:
    lines = f.readlines()
    for index, line in enumerate(lines):
        if 'a ' in line:
            x=(lines[index])
            print(x)
So here from my code I got the lines I was looking for:
a 3 4 5
a 6 3 2
Then the rest of my code is looking for the maximum between the numbers and prints the index:
y = [float(item) for item in x.split()]
z=y.index(max(y[1:3]))
print(z)
Now the code finds the index of the largest number in each of the two lines (so 5 in the first line and 6 in the second):
3
1
But I also want my code to compare the numbers across the two lines (the largest number among 3, 4, 5, 6, 3, 2) and to output the index of the line in the file that contains the largest number (for example, line 300) together with its position within that line (1).
Can you suggest some possible solutions?
You can try something like this.
max_value is a list that holds the maximum number, its line and its position.
max_value = [0, 0, 0] # value, line, position
with open(filename, 'r') as f:
    lines = f.readlines()
    for index, line in enumerate(lines):
        if 'a ' in line:
            # get line data with digits; split() with no argument also drops the trailing '\n',
            # so the last number on the line still passes isdigit()
            line_data = line.split()[1:]
            # check if element is a digit and bigger than max value - save it
            for el_index, element in enumerate(line_data):
                if element.isdigit() and int(element) > max_value[0]:
                    max_value = [int(element), index, el_index]
print(max_value)
Input data
a 3 4 5
a 6 3 2
Output data
# 6 - is max, 1 - line, 0 - position
[6, 1, 0]
You should iterate over every line and keep track of both the line number and the position of the items in that line. (str.startswith() is available in every Python version; only the f-string in the last line requires Python 3.6+.)
with open(filename) as f:
    lines = [line.rstrip() for line in f]
max_ = 0
line_and_position = (0, 0)
for i, line in enumerate(lines):
    if line.startswith('a '):
        # building list of integers for finding the maximum
        list_ = [int(n) for n in line.split()[1:]]
        for item in list_:
            if item > max_:
                max_ = item
                # setting the line number and position in that line
                # (line.find() returns the character offset of the first match)
                line_and_position = i, line.find(str(item))
print(f'maximum number {max_} is in line {line_and_position[0] + 1} at index {line_and_position[1]}')
Input :
a 3 4 5
a 6 3 2
a 1 31 4
b 2 3 2
a 7 1 8
Output:
maximum number 31 is in line 3 at index 4
You can do it as shown below; each line is commented for explanation. This method differs from the others in that, using a regex, we get the current number and its character position from one source. In other words, there is no going back into the line to find data after the fact: everything we need arrives on each iteration of the loop. The lines are also filtered as they are read. Together, these remove the need for a stack of conditions. We end up with two loops that get directly to the point and one condition that checks whether the stored data needs to be updated.
import re
with open(filename, 'r') as f:
    #prime data
    data = (0, 0, 0)
    #store every line that starts with 'a' or blank line if it doesn't
    #(startswith is used because comparing strings with "is" is unreliable)
    for L, ln in enumerate([ln if ln.startswith('a') else '' for ln in f.readlines()]):
        #get number and line properties
        for res in [(int(m.group('n')), L, m.span()[0]) for m in re.compile(r'(?P<n>\d+)').finditer(ln)]:
            #compare new number with current max
            if res[0] > data[0]:
                #store new properties if greater
                data = res
#print final
print('Max: {}, Line: {}, Position: {}'.format(*data))
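For comparison, the same search can be sketched with a single generator expression and max(). This is only an illustration, not a replacement for the answers above: find_max and its prefix parameter are hypothetical names, positions are counted in numbers rather than characters, and every token after the prefix is assumed to be an integer.

```python
def find_max(lines, prefix="a "):
    """Return (value, line_index, position) of the largest number, or None."""
    candidates = (
        (int(token), line_no, pos)
        for line_no, line in enumerate(lines)
        if line.startswith(prefix)
        for pos, token in enumerate(line.split()[1:])
    )
    # key compares values only; on ties max() keeps the first candidate seen
    return max(candidates, key=lambda c: c[0], default=None)


sample = ["a 3 4 5\n", "a 6 3 2\n", "a 1 31 4\n", "b 2 3 2\n", "a 7 1 8\n"]
print(find_max(sample[:2]))
print(find_max(sample))
```

Both indices are 0-based here; add 1 to the line index if you want the line number as shown in an editor.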

Selecting line from file by using "startswith" and "next" commands

I have a file from which I want to create a list ("timestep") from the numbers which appear after each line "ITEM: TIMESTEP" so:
timestep = [253400, 253500, .. etc]
Here is the sample of the file I have:
ITEM: TIMESTEP
253400
ITEM: NUMBER OF ATOMS
378
ITEM: BOX BOUNDS pp pp pp
-2.6943709180241954e-01 5.6240920636804063e+01
-2.8194230631882372e-01 5.8851195163321044e+01
-2.7398090193568775e-01 5.7189372326936599e+01
ITEM: ATOMS id type q x y z
16865 3 0 28.8028 1.81293 26.876
16866 2 0 27.6753 2.22199 27.8362
16867 2 0 26.8715 1.04115 28.4178
16868 2 0 25.7503 1.42602 29.4002
16869 2 0 24.8716 0.25569 29.8897
16870 3 0 23.7129 0.593415 30.8357
16871 3 0 11.9253 -0.270359 31.7252
ITEM: TIMESTEP
253500
ITEM: NUMBER OF ATOMS
378
ITEM: BOX BOUNDS pp pp pp
-2.6943709180241954e-01 5.6240920636804063e+01
-2.8194230631882372e-01 5.8851195163321044e+01
-2.7398090193568775e-01 5.7189372326936599e+01
ITEM: ATOMS id type q x y z
16865 3 0 28.8028 1.81293 26.876
16866 2 0 27.6753 2.22199 27.8362
16867 2 0 26.8715 1.04115 28.4178
16868 2 0 25.7503 1.42602 29.4002
16869 2 0 24.8716 0.25569 29.8897
16870 3 0 23.7129 0.593415 30.8357
16871 3 0 11.9253 -0.270359 31.7252
To do this I tried to use startswith and next together, but it didn't work. Is there another way to do it? Here is the code I am trying to use:
timestep = []
with open(file, 'r') as f:
    lines = f.readlines()
    for line in lines:
        line = line.split()
        if line[0].startswith("ITEM: TIMESTEP"):
            timestep.append(next(line))
print(timestep)
The logic is to decide whether or not to append the current line to timestep. So what you need is a variable that tells you to append the current line whenever it is True.
timestep = []
append_to_list = False # decision variable
with open(file, 'r') as f:
    lines = f.readlines()
    for line in lines:
        line = line.strip() # remove "\n" from line
        if line.startswith("ITEM"):
            # Update append_to_list
            if line == 'ITEM: TIMESTEP':
                append_to_list = True
            else:
                append_to_list = False
        else:
            # append to list if line doesn't start with "ITEM" and append_to_list is True
            if append_to_list:
                timestep.append(line)
print(timestep)
output:
['253400', '253500']
First, I don't like this approach because it doesn't scale. You can only get the immediately following line cleanly; anything else gets ugly.
But you asked, so: for x in lines creates an iterator over lines and uses it to keep the position. You don't have access to that iterator, so next will not give you the next element you're expecting. But you can make your own iterator and use that:
lines_iter = iter(lines)
for line in lines_iter:
    # whatever was here
    timestep.append(next(lines_iter))
However, if you ever want to scale it, a for loop is not a good way to iterate over a file like this, because you want to know what is in the next or previous line. I would suggest using while:
timestep = []
with open('example.txt', 'r') as f:
    lines = f.readlines()
i = 0
while i < len(lines):
    if lines[i].startswith("ITEM: TIMESTEP"):
        i += 1
        # collect every line up to the next "ITEM: " header (or the end of the file)
        while i < len(lines) and not lines[i].startswith("ITEM: "):
            timestep.append(lines[i].strip())
            i += 1
    else:
        i += 1
This way you can extend it to different types of ITEM blocks of variable length.
So the problem with your code is subtle. You have a list lines which you iterate over, but you can't call next on a list.
Instead, turn it into an explicit iterator and you should be fine
timestep = []
with open(file, 'r') as f:
    lines = f.readlines()
    lines_iter = iter(lines)
    for line in lines_iter:
        line = line.strip() # removes the newline
        if line.startswith("ITEM: TIMESTEP"):
            timestep.append(next(lines_iter, None)) # the second argument here prevents errors
                                                    # when ITEM: TIMESTEP appears as the
                                                    # last line in the file
print(timestep)
I'm also not sure why you included line.split, which seems to be incorrect (in any case line.split()[0].startswith('ITEM: TIMESTEP') can never be true, since the split will separate ITEM: and TIMESTEP into separate elements of the resulting list.)
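A short sketch of the point about next(), using a few in-memory lines instead of a file (the sample list and the names error and seen are illustrative only). Calling next() on the list itself raises TypeError, while sharing one explicit iterator between the for loop and next() consumes the line after each marker.

```python
lines = ["ITEM: TIMESTEP\n", "253400\n", "ITEM: NUMBER OF ATOMS\n", "378\n"]

# A list is iterable but is not itself an iterator, so next() rejects it.
try:
    next(lines)
except TypeError as exc:
    error = str(exc)

# The for loop and next() advance the same iterator object.
lines_iter = iter(lines)
seen = []
for line in lines_iter:
    if line.startswith("ITEM: TIMESTEP"):
        value = next(lines_iter, None)  # None if the marker is the last line
        if value is not None:
            seen.append(value.strip())

print(error)
print(seen)
```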
For a more robust answer, consider grouping your data based on when the line begins with ITEM.
def process_file(f):
    ITEM_MARKER = 'ITEM: '
    item_title = '(none)'
    values = []
    for line in f:
        if line.startswith(ITEM_MARKER):
            if values:
                yield (item_title, values)
            item_title = line[len(ITEM_MARKER):].strip() # strip off the marker
            values = []
        else:
            values.append(line.strip())
    if values:
        yield (item_title, values)
This will let you pass in the whole file and will lazily produce a set of values for each ITEM: <whatever> group. Then you can aggregate in some reasonable way.
with open(file, 'r') as f:
    groups = process_file(f)
    aggregations = {}
    for name, values in groups:
        aggregations.setdefault(name, []).extend(values)
print(aggregations['TIMESTEP']) # this is what you want
You can use enumerate to help with index referencing. We check whether the string ITEM: TIMESTEP is in the previous line and, if so, add the integer on the current line to our timestep list.
timestep = []
with open('example.txt', 'r') as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    # skip i == 0: lines[-1] would wrap around to the last line of the file
    if i > 0 and "ITEM: TIMESTEP" in lines[i-1]:
        timestep.append(int(line.strip()))
print(timestep)
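The "look at the previous line" idea can also be sketched without index arithmetic by pairing each line with its successor via zip. This is an illustrative variant, not one of the posted answers; timesteps_after_marker and its marker parameter are hypothetical names.

```python
def timesteps_after_marker(lines, marker="ITEM: TIMESTEP"):
    """Return the integer on each line that directly follows the marker line."""
    stripped = [line.strip() for line in lines]
    # zip(stripped, stripped[1:]) yields (previous_line, current_line) pairs
    return [int(cur) for prev, cur in zip(stripped, stripped[1:]) if prev == marker]


sample = [
    "ITEM: TIMESTEP\n", "253400\n",
    "ITEM: NUMBER OF ATOMS\n", "378\n",
    "ITEM: TIMESTEP\n", "253500\n",
    "ITEM: NUMBER OF ATOMS\n", "378\n",
]
print(timesteps_after_marker(sample))
```

Because the pairs stop at the last line, a marker on the final line of the file is simply ignored.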

How can I merge each two lines of a large text file into a Python list?

I have a .txt file that is split into multiple lines, and I would like to merge every two of these lines into a single entry of a list. How do I do that?
Thanks a lot!
What I have is organized like this:
[1 2 3 4
5 6]
[1 2 3 4
5 6 ]
while what I need would be:
[1 2 3 4 5 6]
[1 2 3 4 5 6]
data =[]
with open(r'<add file path here >','r') as file:
    x = file.readlines()
for i in range(0,len(x),2):
    data.append(x[i:i+2])
new =[' '.join(i) for i in data]
for i in range(len(new)):
    new[i]=new[i].replace('\n','')
new_file_name = r'' #give new file path here
with open(new_file_name,'w+') as file:
    for i in new:
        file.write(i+'\n')
Try this:
final_data = []
with open('file.txt') as a:
    fdata= a.readlines()
for ln in range(0,len(fdata),2):
    final_data.append(" ".join([fdata[ln].strip('\n'), fdata[ln+1].strip('\n')]))
print (final_data)
I think you can use a regex to solve this:
#! /usr/bin/env python2.7
import re
with open("textfilename.txt") as r:
    text_data = r.read()
# search the text that was read, not the file object
independent_lists = re.findall(r"\[(.+?)\]", text_data, re.DOTALL)
#now that we have got each independent_list we can next work on
#turning it into a list
final_list_of_objects = [each_string.replace("\n"," ").split() for each_string in independent_lists]
print(final_list_of_objects)
However, if you do not want them as list objects and just want the text without the newline characters inside each bracketed list, then:
#! /usr/bin/env python2.7
import re
with open("textfilename.txt") as r:
    text_data = r.read()
new_txt = ""
bool_char = False # True while we are inside a [...] block
for each_char in text_data:
    if each_char == "[":
        bool_char = True
    elif each_char == "]":
        bool_char = False
    elif each_char == "\n" and bool_char:
        each_char = " "
    new_txt += each_char
new_txt = re.sub(r"[ \t]+", " ", new_txt) # collapse repeated spaces between numbers, keeping the newlines between lists
You can do two things here:
1) If the text file was created by writing using numpy's savetxt function, you can simply use numpy.loadtxt function with appropriate delimiter.
2) Read file in a string and use a combination of replace and split functions.
file = open(filename,'r')
dataset = file.read()
dataset = dataset.replace('\n',' ').replace('] ',']\n').split('\n')
dataset = [x.replace('[','').replace(']','').split(' ') for x in dataset]
with open('test.txt') as file:
    new_data = (" ".join(line.strip() for line in file).replace('] ',']\n').split('\n')) # ['[1 2 3 4 5 6]', ' [1 2 3 4 5 6 ]']
with open('test.txt','w+') as file:
    for data in new_data:
        file.write(data+'\n')
line.strip() removes leading and trailing whitespace, including the trailing newline ('\n'), from the line.
You need to pass all of the read and stripped lines to ' '.join(), not each line on its own. Strings in Python are sequences too, so a single line passed on its own to ' '.join() is treated as a sequence of separate characters.
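As a further illustration of merging every two lines, here is a minimal sketch that pairs lines by passing the same iterator twice to itertools.zip_longest. merge_pairs is a hypothetical helper name, and the sample list stands in for the lines of the file.

```python
from itertools import zip_longest


def merge_pairs(lines):
    """Join each pair of consecutive lines into one string."""
    stripped = (line.strip() for line in lines)
    # The same iterator is passed twice, so each tuple consumes two lines;
    # fillvalue="" keeps a trailing unpaired line instead of dropping it.
    return [" ".join(pair).strip()
            for pair in zip_longest(stripped, stripped, fillvalue="")]


sample = ["[1 2 3 4\n", "5 6]\n", "[1 2 3 4\n", "5 6 ]\n"]
print(merge_pairs(sample))
```

Unlike indexing with fdata[ln+1], this does not raise an error when the file has an odd number of lines.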

Possible to do this more efficiently (turn compact file to sparse)

I have to read in a file line by line that has indices of where a vector has 1's
so for example:
1 3 9 10
means:
0,1,0,1,0,0,0,0,0,1,1
My goal is to write a program that takes each line and prints out the full vector, including the 0's.
I am able to do this with my current program for a few lines:
#create a sparse vector
list_line_sparse = [0] * int(num_features)
#loop over all the lines
for item in lines:
    #split the line on spaces
    zz = item.split(' ')
    #get all ints on a line
    d = [int(x.strip()) for x in zz]
    #loop over all ints and change index to 1 in sparse vector
    for i in d:
        list_line_sparse[i]=1
    out_file += (', '.join(str(item) for item in list_line_sparse))
    #change back to 0's
    for i in d:
        list_line_sparse[i]=0
    out_file +='\n'
f = open('outfile', 'w')
f.write(out_file)
f.close()
The problem is that for a file with a lot of features and lines, my program is very inefficient: it basically never finishes. Is there anything that stands out that I should change to make it more efficient (e.g. the two for loops)?
It would probably be more efficient to write each line of data to your output file as it is generated, rather than building up a huge string in memory.
numpy is a popular Python module that's good for doing bulk operations on numbers. If you start with:
import numpy as np
list_line_sparse = np.zeros(num_features, dtype=np.uint8)
Then, given d as the list of numbers on the current line, you can simply do:
list_line_sparse[d] = 1
to set ALL of those indexes in the array at the same time, no loop required. (At the Python level at least, obviously there's still a loop involved, but it's down in the C implementation of numpy).
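Putting the two suggestions together (write each line as it is generated, and set the indexes with numpy in one step), a minimal sketch could look like this. It assumes numpy is installed; densify is a hypothetical function name, and io.StringIO stands in for the real output file.

```python
import io

import numpy as np


def densify(lines, num_features, sink):
    """Write one comma-separated 0/1 row to sink for each line of indexes."""
    row = np.zeros(num_features, dtype=np.uint8)
    for line in lines:
        d = [int(token) for token in line.split()]
        row[d] = 1  # set all listed indexes at once
        sink.write(",".join(map(str, row.tolist())) + "\n")
        row[d] = 0  # reset only the touched entries for the next line


out = io.StringIO()
densify(["1 3 9 10\n", "2 7 3 5\n"], 12, out)
print(out.getvalue(), end="")
```

With a real file you would pass the open output file as sink, so no large string is ever built in memory.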
It is slowing down because you are doing string concatenation. It is better to work with lists.
You could also use csv to read in your space-separated lines and then write each row with the commas added automatically:
import csv
num_features = 20
with open('input.txt', 'r', newline='') as f_input, open('output.txt', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input, delimiter=' ')
    csv_output = csv.writer(f_output)
    for row in csv_input:
        list_line_sparse = [0] * int(num_features)
        for v in map(int, row):
            list_line_sparse[v] = 1
        csv_output.writerow(list_line_sparse)
So if input.txt contained the following:
1 3 9 10
1 3 9 11
2 7 3 5
this would give you an output.txt containing:
0,1,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0
0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0
0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0
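To illustrate the "work with lists" advice without csv, here is a small sketch that collects the finished rows in a list and joins them once at the end instead of growing a string with +=. to_dense_text is a hypothetical name, and the indexes are assumed to be valid (smaller than num_features).

```python
def to_dense_text(lines, num_features):
    """Return the dense 0/1 rows as one string, joined a single time."""
    out_rows = []
    for line in lines:
        row = ["0"] * num_features
        for token in line.split():
            row[int(token)] = "1"
        out_rows.append(",".join(row))
    # one join at the end instead of repeated string concatenation
    return "".join(r + "\n" for r in out_rows)


text = to_dense_text(["1 3 9 10\n", "1 3 9 11\n", "2 7 3 5\n"], 20)
print(text, end="")
```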
Too many loops: first the item.split(), then the for x in zz, then for i in d, then for item in list_line_sparse, and then for i in d again. String concatenation could be your most expensive part: the .join and the out_file +=. And all of this happens for every line.
You could try "character by character" parsing and writing. Something like this:
#features per line
count = int(num_features)
f = open('outfile.txt', 'w')
#loop over all lines
for item in lines:
    #reset the feature
    i = 0
    #the characters buffer
    index = ""
    #parse character by character
    for character in item:
        #if a space or end of line is found,
        #and the characters buffer (index) is not empty
        if character in (" ", "\r", "\n"):
            if index:
                #parse the characters buffer
                index = int(index)
                #if is not the first feature
                if i > 0:
                    #add the separator
                    f.write(", ")
                #add 0's until index
                while i < index:
                    f.write("0, ")
                    i += 1
                #and write 1
                f.write("1")
                i += 1
                #reset the characters buffer
                index = ""
        #if is not a space or end of line
        else:
            #add the character to the buffer
            index += character
    #if the last line didn't end with a carriage return,
    #index could be waiting to be parsed
    if index:
        index = int(index)
        if i > 0:
            f.write(", ")
        while i < index:
            f.write("0, ")
            i += 1
        f.write("1")
        i += 1
        index = ""
    #fill with 0's
    while i < count:
        if i == 0:
            f.write("0")
        else:
            f.write(", 0")
        i += 1
    f.write("\n")
f.close()
Let's rework your code into a simpler package that takes better advantage of Python's features:
import sys
NUM_FEATURES = 12
with open(sys.argv[1]) as source, open(sys.argv[2], 'w') as sink:
    for line in source:
        list_line_sparse = [0] * NUM_FEATURES
        indices = map(int, line.rstrip().split())
        for index in indices:
            list_line_sparse[index] = 1
        print(*list_line_sparse, file=sink, sep=',')
I revisited this problem with your "more efficiently" in mind. Although the above is more memory efficient, it is a hair slower time-wise. I reconsidered your original and came up with a solution that is less memory efficient but about 2x faster than your code:
import sys
NUM_FEATURES = 12
data = ''
with open(sys.argv[1]) as source:
    for line in source:
        list_line_sparse = ["0"] * NUM_FEATURES
        indices = map(int, line.rstrip().split())
        for index in indices:
            list_line_sparse[index] = "1"
        data += ",".join(list_line_sparse) + '\n'
with open(sys.argv[2], 'w') as sink:
    sink.write(data)
Like your original solution, it stores all the data in memory and writes it out at the end which is both a disadvantage (memory-wise) and an advantage (time-wise.)
input.txt
1 3 9 10
1 3 9 11
2 7 3 5
USAGE
% python3 test.py input.txt output.txt
output.txt
0,1,0,1,0,0,0,0,0,1,1,0
0,1,0,1,0,0,0,0,0,1,0,1
0,0,1,1,0,1,0,1,0,0,0,0

python: how to count number in one file?

I need to write a Python program to read the values in a file, one per line, such as file: test.txt
1
2
3
4
5
6
7
8
9
10
Denoting these as j1, j2, j3, ... jn,
I need to sum the differences of consecutive values:
a=(j2-j1)+(j3-j2)+...+(jn-j[n-1])
I have example source code
a=0
for(j=2;j<=n;j++){
a=a+(j-(j-1))
}
print a
and the output is
9
If I understand correctly, you want to evaluate the following equation:
a = (j2-j1) + (j3-j2) + ... + (jn-j[n-1])
As you iterate over the file, this subtracts the value on the previous line from the value on the current line and adds up all those differences.
a = 0
with open("test.txt", "r") as f:
    previous = next(f).strip()
    for line in f:
        line = line.strip()
        if not line: continue
        a = a + (int(line) - int(previous))
        previous = line
print(a)
Solution (Python 3)
res = 0
with open("test.txt","r") as fp:
    lines = list(map(int,fp.readlines()))
for i in range(1,len(lines)):
    res += lines[i]-lines[i-1]
print(res)
Output: 9
test.txt contains:
1
2
3
4
5
6
7
8
9
10
I'm not even sure if I understand the question, but here's my best attempt at solving what I think is your problem:
To read values from a file, use "with open()" in read mode ('r'):
with open('test.txt', 'r') as f:
    # your code here
"as f" means that "f" refers to your file anywhere inside that block.
So, to read all the lines and store them in a list, do this:
all_lines = f.readlines()
You can now do whatever you want with the data.
If you look at the function you're trying to solve, a=(j2-j1)+(j3-j2)+...+(jn-(jn-1)), you'll notice that many of the values cancel out, e.g. (j2-j1)+(j3-j2) = j3-j1. Thus, the entire function boils down to jn-j1, so all you need is the first and last number.
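The cancellation can be checked with a short sketch; sum_of_differences and telescoped are hypothetical names, and the list stands in for the numbers read from test.txt.

```python
def sum_of_differences(values):
    """Sum (j2-j1) + (j3-j2) + ... term by term."""
    return sum(b - a for a, b in zip(values, values[1:]))


def telescoped(values):
    """Same result using only the first and last value."""
    return values[-1] - values[0] if values else 0


values = list(range(1, 11))  # the numbers 1..10 from the example file
print(sum_of_differences(values), telescoped(values))
```

The two functions agree for any list of numbers, not only for consecutive integers.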
Edit: That being said, please try and search this forum first before asking any questions. As someone who's been in your shoes before, I decided to help you out, but you should learn to reference other people's questions that are identical to your own.
The correct answer is 9:
with open("data.txt") as f:
    # set prev to first number in the file
    prev = int(next(f))
    sm = 0
    # iterate over the remaining numbers
    for j in f:
        j = int(j)
        sm += j - prev
        # update prev
        prev = j
print(sm)
Or using itertools.tee and zip:
from itertools import tee
with open("data.txt") as f:
    a,b = tee(f)
    next(b)
    print(sum(int(j) - int(i) for i,j in zip(a, b)))
