Read links from a list in a txt file - Python

I have a file named links.txt which contains the following list
(List name : Set_of_links) :
[https://link1.com, https://link2.com, https://link3.com/hello, https://links4.com/index.php, . . . . ]
I'm running a program, links_python.py, which needs to read each link from that file and store it in a local variable in the Python script. I'm using the following code:
i = 0
with open(links.txt, "r") as f:
    f.read(set_of_links[i])
    i += 1
It doesn't seem to work.

If you have only one line of links, throw away the brackets and spaces and try:
links = []
with open('links.txt') as f:
    links = f.read().split(',')

Try the following (thanks to #Jean for the edit):
with open("links.txt", "r") as f:
    set_of_links = [line.strip() for line in f.readlines()]

If you want to separate each link and append it to set_of_links, you can use re to strip out the bracket and comma characters, then build the list by splitting. Using a list comprehension it looks like:
import re

with open('links.txt', 'r') as f:
    set_of_links = [re.sub(r'[(\[\],)]', '', x) for x in f.read().split()]
print(set_of_links)
output:
['https://link1.com', 'https://link2.com', 'https://link3.com/hello', 'https://links4.com/index.php']
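Putting the pieces together, here is a minimal sketch that avoids regex entirely, assuming links.txt holds a single bracketed line as shown in the question (the helper name parse_links is mine):

```python
def parse_links(raw: str) -> list:
    # Strip the surrounding brackets, split on commas, and trim whitespace.
    return [part.strip() for part in raw.strip().strip("[]").split(",") if part.strip()]

# With the actual file it would be used as:
# with open("links.txt") as f:
#     set_of_links = parse_links(f.read())

set_of_links = parse_links("[https://link1.com, https://link2.com, https://link3.com/hello]")
print(set_of_links)
```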

Related

Python list iterate not working as expected

I have a file called list.txt:
['d1','d2','d3']
I want to loop through all the items in the list. Here is the code:
deviceList = open("list.txt", "r")
deviceList = deviceList.read()
for i in deviceList:
    print(i)
The issue is that when I run the code, it prints every character separately:
% python3 run.py
[
'
d
1
'
,
'
d
2
'
,
'
d
3
'
]
It's as if all the items were treated as one string. I think it needs to be parsed? Please let me know what I am missing.
That's simply because you do not have a list; you are reading plain text...
I suggest writing the list without the [] so you can use the split() function.
Write the file like this: d1;d2;d3
and use this script to obtain a list
f = open("filename", 'r')
line = f.read()
f.close()
device_list = line.split(";")
If you need the [] in the file, simply add a strip() call like this:
f = open("filename", 'r')
line = f.read()
f.close()
stripped = line.strip("[]")
device_list = stripped.split(";")
It should work the same.
This isn't the cleanest solution, but it will do if your .txt file is always just in the "[x,y,z]" format.
deviceList = open("list.txt", "r").read()
deviceList = deviceList[1:-1]
deviceList = deviceList.split(",")
for i in deviceList:
    print(i)
This takes your string, strips the "[" and "]", and then separates the entire string between the commas and turns that into a list. As other users have suggested, there are probably better ways to store this list than a text file as it is, but this solution will do exactly what you are asking. Hope this helps!
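As an alternative to hand-stripping brackets: since list.txt already contains a valid Python literal like ['d1','d2','d3'], the standard library's ast.literal_eval can parse it directly. A sketch, assuming the file's single line is exactly that literal (the helper name read_device_list is mine):

```python
import ast

def read_device_list(text: str) -> list:
    # Safely evaluate a Python literal such as "['d1','d2','d3']".
    return ast.literal_eval(text)

# With the actual file it would be used as:
# with open("list.txt") as f:
#     devices = read_device_list(f.read())

devices = read_device_list("['d1','d2','d3']")
for device in devices:
    print(device)
```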

read text file in python and extract specific value in each line?

I have a text file where each line is as follows:
n:1 mse_avg:8.46 mse_y:12.69 mse_u:0.00 mse_v:0.00 psnr_avg:38.86 psnr_y:37.10 psnr_u:inf psnr_v:inf
n:2 mse_avg:12.20 mse_y:18.30 mse_u:0.00 mse_v:0.00 psnr_avg:37.27 psnr_y:35.51 psnr_u:inf psnr_v:inf
I need to read each line and extract psnr_y and its value into a matrix. Does Python have functions for reading a text file like this? I have MATLAB code for it, but I need Python code and I am not familiar with Python's functions. Could you please help me with this?
This is the MATLAB code:
opt = {'Delimiter',{':',' '}};
fid = fopen('data.txt','rt');
nmc = nnz(fgetl(fid)==':');
frewind(fid);
fmt = repmat('%s%f',1,nmc);
tmp = textscan(fid,fmt,opt{:});
fclose(fid);
fnm = [tmp{:,1:2:end}];
out = cell2struct(tmp(:,2:2:end),fnm(1,:),2)
You can use a regex like the one below:
import re

with open('textfile.txt') as f:
    a = f.readlines()

pattern = r'psnr_y:([\d.]+)'
for line in a:
    print(re.search(pattern, line)[1])
This code prints only psnr_y's value. You can replace [1] with [0] to get the full match instead, like "psnr_y:37.10".
If you want to collect the values into a list, the code would look like this:
import re

a_list = []
with open('textfile.txt') as f:
    a = f.readlines()

pattern = r'psnr_y:([\d.]+)'
for line in a:
    a_list.append(re.search(pattern, line)[1])
Use the regular expression r'psnr_y:([\d.]+)' on each line you read and extract match.group(1) from the result; if needed, convert it to float with float(match.group(1)).
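The steps above can be sketched as a small function, with sample lines taken from the question (the name psnr_y_values is mine):

```python
import re

PATTERN = re.compile(r'psnr_y:([\d.]+)')

def psnr_y_values(lines):
    # Search each line, extract group(1), and convert it to float.
    values = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            values.append(float(match.group(1)))
    return values

sample = [
    "n:1 mse_avg:8.46 mse_y:12.69 psnr_avg:38.86 psnr_y:37.10 psnr_u:inf",
    "n:2 mse_avg:12.20 mse_y:18.30 psnr_avg:37.27 psnr_y:35.51 psnr_u:inf",
]
print(psnr_y_values(sample))  # [37.1, 35.51]
```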
Since I hate regex, I would suggest:
s = 'n:1 mse_avg:8.46 mse_y:12.69 mse_u:0.00 mse_v:0.00 psnr_avg:38.86 psnr_y:37.10 psnr_u:inf psnr_v:inf \nn:2 mse_avg:12.20 mse_y:18.30 mse_u:0.00 mse_v:0.00 psnr_avg:37.27 psnr_y:35.51 psnr_u:inf psnr_v:inf'
lst = s.split('\n')
out = []
for line in lst:
    psnr_y_pos = line.index('psnr_y:')
    next_key = line[psnr_y_pos:].index(' ')
    psnr_y = line[psnr_y_pos + 7:psnr_y_pos + next_key]
    out.append(psnr_y)
print(out)
out is a list of the values of psnr_y in each line.
For a simple answer with no need to import additional modules, you could try:
rows = []
with open("my_file", "r") as f:
    for row in f.readlines():
        value_pairs = row.strip().split(" ")
        print(value_pairs)
        values = {pair.split(":")[0]: pair.split(":")[1] for pair in value_pairs}
        print(values["psnr_y"])
        rows.append(values)
print(rows)
This gives you a list of dictionaries (basically JSON structure but with python objects).
This probably won't be the fastest solution, but the structure is nice and you don't have to use regex.
import fileinput
import re

for line in fileinput.input():
    row = dict([s.split(':') for s in re.findall(r'\S+:\S+', line)])
    print(row['psnr_y'])
To verify,
python script_name.py < /path/to/your/dataset.txt

Python stripping words upon specific condition in a list of sentences

My starting file was a .txt one that looked like:
https://www.website.com/something1/id=39494 notes !!!! other notes
https://www.website2.com/something1/id=596774 ... notes2 !! other notes2
and so on... so it's very messy.
To clean it up I did:
import re

with open('file.txt', 'r') as filehandle:
    places = [current_place.rstrip() for current_place in filehandle.readlines()]
filtered = [x for x in places if x.strip()]
This gave me a list of websites (without blank lines in between), but each string still has the notes in it.
My goal is to first have a list of "cleaned" websites without any notes afterwards:
https://www.website.com/something1/id=39494
https://www.website2.com/something1/id=596774
To do that, I thought to target the space after the end of the website and get rid of all the words after it:
for s in filtered:
    f = re.search('\s')
This returns an error, but even if it worked it wouldn't return what I thought.
The second step is to strip the website of some characters and compose it like: https://www.website.com/embed/id=39494
but this would come later.
I just wonder how can I achieve the first step and get rid of the notes after the website and have a clean list.
If each line consists of a URL followed by a space and any other text, you can simply split on the space and take the first element of each line:
urls = []
with open('file.txt') as filehandle:
    for line in filehandle:
        if not line.strip():
            continue  # skip empty lines
        urls.append(line.split(" ")[0])
# now the variable `urls` should contain all the URLs you are looking for
EDIT: second step
for url in urls:
    print('<iframe src="{}"></iframe>'.format(url))
You can use this:
# to read the lines
with open('file.txt', 'r') as f:
    strlist = f.readlines()

# list to store the URLs
webs = []
for x in strlist:
    webs.append(x.split(' ')[0])
print(webs)
In case the URL is not always at the beginning of the line, you can try the pattern
https?:\/\/www\.\w+\.com\/\w+\/id=(\d+)
and then use the match groups to get the URL and id.
Code example:
import re

with open('file.txt') as file:
    for line in file:
        m = re.search(r'https?://www\.\w+\.com/\w+/id=(\d+)', line)
        if m:
            print("URL=%s" % m.group(0))
            print("ID=%d" % int(m.group(1)))

Python compare two files using a list

I'm trying to compare two files via regex strings and print the output. I seem to have an issue with my loop, as only the last line gets printed. What am I missing?
import re

delist = [r'"age":.*",', r'"average":.*",', r'"class":.*",']
with open('test1.txt', 'r') as bolo:
    boloman = bolo.read()

for dabo in delist:
    venga = re.findall(dabo, boloman)
    for vaga in venga:
        with open('test.txt', 'r') as f:
            content = f.read()
        venga2 = re.findall(dabo, content)
        for vaga2 in venga2:
            mboa = content.replace(vaga2, vaga, 1)
print(mboa)
The first problem I see is that you keep overwriting mboa, so it only ever holds the last result. I think what you really want is to create a list and append each result to it:
import re

mboa = []
delist = [r'"age":.*",', r'"average":.*",', r'"class":.*",']
with open('test1.txt', 'r') as bolo:
    boloman = bolo.read()

for dabo in delist:
    venga = re.findall(dabo, boloman)
    for vaga in venga:
        with open('test.txt', 'r') as f:
            content = f.read()
        venga2 = re.findall(dabo, content)
        for vaga2 in venga2:
            mboa.append(content.replace(vaga2, vaga, 1))
print(mboa)
Does that solve the issue? If it doesn't, add a comment to this question and I'll try to fix it ;)
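If the goal is instead a single merged output, here is a hedged sketch of that approach: apply every replacement to one working copy of the content and print it once at the end. The patterns come from the question; merge, source, and target are illustrative names of my own, and the sample strings stand in for the two files.

```python
import re

patterns = [r'"age":.*",', r'"average":.*",', r'"class":.*",']

def merge(source: str, target: str, patterns) -> str:
    # Replace each match in the target with the corresponding match
    # from the source, accumulating changes in one working copy.
    result = target
    for pat in patterns:
        new_vals = re.findall(pat, source)
        old_vals = re.findall(pat, result)
        for old, new in zip(old_vals, new_vals):
            result = result.replace(old, new, 1)
    return result

source = '"age": "30",\n"class": "a",'   # stands in for test1.txt
target = '"age": "25",\n"class": "b",'   # stands in for test.txt
print(merge(source, target, patterns))
```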

P4 Python error: file(s) not on client

I am using the P4 Python module to try to open several files for edit. Basically, I have a list of files that I am grabbing from a txt document. I then do some formatting on each item in the list and append those items to an empty list. My method is not the most efficient, but I am just trying to get this working before optimizing.
edit_files = []
with open('C:\\Users\\rgriffin\\Desktop\\replace.txt', 'r') as f:
    for line in f:
        l = line.partition(',')[0]
        e = l.replace('#(', '')
        r = e.replace('U:\\', '//Stream/main/')
        q = r.replace('\\', '/')
        edit_files.append(q)

for i in edit_files:
    p4.run("edit", i)
With this code I get an error:
[Warning]: '"//Stream/main/Data/anims/stuff/char/Emotion_Angry.hkx" - file(s) not on client.'
If I change the last line to this...
p4.run("edit" , "//Stream/main/Data/anims/stuff/char/Emotion_Angry.hkx")
The file is checked out as expected. I did a type check and i is a string.
Input data:
#("U:\Data\anims\stuff\char\Emotion_Angry_Partial.hkx", "u:\Assets\Actors\stuff\char\Animation\char_Idle.max")
Your paths still contain quote characters at both ends; remove them. There also seem to be empty lines.
Change:
for i in edit_files:
    p4.run("edit", i)
to:
for i in edit_files:
    f = i.replace('"', '').strip()
    if len(f) > 0:
        print("Opening [" + f + "]")
        p4.run("edit", f)
Or as a one-liner:
[p4.run("edit", i.replace('"', '').strip()) for i in edit_files if i.strip()]
Or you may want to change your populating code itself. Use:
edit_files = []
with open('C:\\Users\\rgriffin\\Desktop\\replace.txt', 'r') as f:
    for line in f:
        l = line.partition(',')[0].replace('#(', '').replace('U:\\', '//Stream/main/').replace('\\', '/').replace('"', '').strip()
        if len(l) > 0:
            edit_files.append(l)
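The per-line cleanup can also be isolated in a small helper so it is testable without Perforce. This is a sketch; clean_path is a name of my own, and it mirrors the replacements from the question plus the quote-stripping suggested above:

```python
def clean_path(line: str) -> str:
    # Keep the part before the first comma, drop the '#(' prefix and quotes,
    # map the U: drive to the depot path, and normalize slashes.
    part = line.partition(',')[0]
    part = part.replace('#(', '').replace('"', '').strip()
    part = part.replace('U:\\', '//Stream/main/')
    return part.replace('\\', '/')

print(clean_path('#("U:\\Data\\anims\\stuff\\char\\Emotion_Angry_Partial.hkx", ...'))
# //Stream/main/Data/anims/stuff/char/Emotion_Angry_Partial.hkx
```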
