Parsing txt using python sets data structure [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
How can I parse this file in Python to count how many unique URLs there are, regardless of the number after each one?

You can open the file and read its lines into a list of strings using:
with open("/path/to/file.txt") as file:
    lines = list(file)
This will give you a list of all lines in the text file.
Since you do not want duplicates, a set is a good fit (a set never contains duplicates).
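answer = set()
for x in lines:
    # Keep the text between the first space and the last colon
    answer.add(x[x.find(" ")+1:x.rfind(":")])
print(len(answer))  # number of unique URLs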
This iterates through all the lines and adds the part between the first space and the last : (not including the :) to the set, which takes care of duplicates. answer then contains all the unique URLs, and len(answer) is how many there are.
Tested with Python 3.6.
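The same set-based idea can be wrapped in a small function that uses str.partition and str.rpartition instead of index arithmetic. This is a sketch that assumes each line looks like `<label> <url>:<number>` (the layout of the sample data in the third answer); the function name count_unique_urls is hypothetical. Splitting on the last colon keeps URLs such as http://... intact.

```python
def count_unique_urls(lines):
    """Count distinct URLs in lines assumed to look like '<label> <url>:<number>'."""
    urls = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _, _, rest = line.partition(" ")  # drop the leading label
        url = rest.rpartition(":")[0] or rest  # drop the trailing ':<number>', if any
        urls.add(url)
    return len(urls)


sample = ["String_1 URL_1:10\n", "String_2 URL_2:20\n", "String_3 URL_1:30\n"]
print(count_unique_urls(sample))  # 2
```

With a real file, pass the open file object directly: `with open("/path/to/file.txt") as f: print(count_unique_urls(f))`.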

You can use a regular expression to parse your file line by line and extract the uids.
import re

uids = set()
with open('...') as f:
    for line in f:
        # re.match already anchors at the start; a leading '$' could never match
        m = re.match(r'[a-z0-9]+', line)
        if m:
            uids.add(m.group(0))
print(len(uids))
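If the lines have the `<label> <url>:<number>` layout (an assumption; the regex above matches a lowercase identifier at the start of each line instead), a single pattern can capture the URL directly. The names LINE_RE and unique_urls are illustrative, not from the original answer.

```python
import re

# Assumed line format: "<label> <url>:<number>"; group 1 captures the URL.
# The greedy (\S+) backtracks to the LAST ":<digits>" before the end of the line.
LINE_RE = re.compile(r"^\S+\s+(\S+):\d+\s*$")


def unique_urls(lines):
    """Return the set of URLs found in lines matching LINE_RE."""
    found = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # silently skip lines in any other format
            found.add(m.group(1))
    return found


sample = ["String_1 URL_1:10\n", "String_2 URL_2:20\n", "String_3 URL_1:30\n"]
print(len(unique_urls(sample)))  # 2
```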

import re

# Sample data: each line is "<string> <url>:<number>"
A = "String_1 URL_1:10\nString_2 URL_2:20\nString_3 URL_1:30".replace(" ", ",").split("\n")
urls = []
for line in A:
    result = re.search(",(.*):", line)
    # Skip lines that don't match, and record each URL only once
    if result and result.group(1) not in urls:
        urls.append(result.group(1))
print(len(urls))
This should solve your problem.

Related

Python nested list from file [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Let's say I have the following .txt file:
"StringA1","StringA2","StringA3"
"StringB1","StringB2","StringB3"
"StringC1","StringC2","StringC3"
And I want a nested list in the format:
nestedList = [["StringA1","StringA2","StringA3"],["StringB1","StringB2","StringB3"],["StringC1","StringC2","StringC3"]]
so I can access StringB2 for example like this:
nestedList[1][1]
What would be the best approach? I do not have a tremendous amount of data, maybe 100 lines at most, so I don't need a database or anything like that.
You can use this sample code:
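with open('file.txt') as f:
    # Strip the newline, split on commas, then drop the surrounding double quotes
    nestedList = [[item.strip('"') for item in line.strip().split(',')] for line in f]
print(nestedList[1][1])  # StringB2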
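with open('a.txt') as f:
    text = f.read()
l = []
res = text.splitlines()  # unlike split('\n'), no empty last row for a trailing newline
for i in range(len(res)):
    l.append([item.strip('"') for item in res[i].split(',')])  # drop the quotes
print(l[1][1])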
Assuming your file is named a.txt and has the same format as in the question, i.e. one row per line with comma-separated values, the code above gives the right output.
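Because every value in the question's file is wrapped in double quotes, the standard csv module is another option: csv.reader removes the quotes and line endings itself and also copes with commas inside quoted values. This is a sketch; read_nested_list is a hypothetical helper name, and the demo uses an in-memory file instead of file.txt.

```python
import csv
import io


def read_nested_list(f):
    """Parse comma-separated, double-quoted rows into a list of lists."""
    # csv.reader strips the surrounding quotes; blank lines come back as []
    return [row for row in csv.reader(f) if row]


# Demo with an in-memory file; with a real file use:
#   with open('file.txt', newline='') as f: nestedList = read_nested_list(f)
sample = io.StringIO('"StringA1","StringA2","StringA3"\n'
                     '"StringB1","StringB2","StringB3"\n'
                     '"StringC1","StringC2","StringC3"\n')
nestedList = read_nested_list(sample)
print(nestedList[1][1])  # StringB2
```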

How to read values from a file and set dynamically into an array in Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm new to Python and need to know how to achieve the following requirement.
I have a file with one value per line,
for example:
ABC
DEF
GHI
I want to read these values from the file and put them into a Python dictionary in the following format:
{"Keys":[{"common_key":"ABC"},{"common_key":"DEF"},{"common_key":"GHI"}]}
The dictionary should contain only one key, and its value should be a list of JSON objects that all share a common key, each assigned a different value read from the file.
This way works for me:
with open('filename.txt') as fin:
    # Strip each line; len(i) > 1 skips lines that contain only a newline
    lines = [i.strip() for i in fin.readlines() if len(i) > 1]
common_dict = {'Keys': [{'common_key': i} for i in lines]}
The best way to do this would probably be a simple list comprehension. If you need any splitting logic within the lines, str.split() is your friend.
with open("filename.txt") as txtfile:
    # strip() removes the trailing newline; blank lines are skipped
    yourDesiredDict = {"Keys": [{"common_key": x.strip()} for x in txtfile if x.strip()]}
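The question mentions "jsons"; if an actual JSON string is needed (for example to write it to a file or send it somewhere), the dictionary can be serialised with json.dumps from the standard library. A minimal sketch; build_keys_dict is a hypothetical helper that takes any iterable of lines, such as an open file.

```python
import json


def build_keys_dict(lines):
    """Wrap each non-blank line in {"common_key": value} under a single "Keys" entry."""
    values = [line.strip() for line in lines if line.strip()]
    return {"Keys": [{"common_key": v} for v in values]}


result = build_keys_dict(["ABC\n", "DEF\n", "GHI\n"])
print(json.dumps(result))
# {"Keys": [{"common_key": "ABC"}, {"common_key": "DEF"}, {"common_key": "GHI"}]}
```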

Converting a line-separated file into a list [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
So I have a file that looks something like this:
Apples
Red
Round
Banana
Yellow
Long
I want to create a list where each 'group' is in a separate list like this:
[['Apples', 'Red', 'Round'], ['Banana', 'Yellow', 'Long']]
I am completely stumped on how I should proceed.
We'll split on \n\n (two newlines) to separate the groups, then use plain str.split to divide each group into items.
with open('yourfile.txt') as f:
    l = list(map(str.split, f.read().split('\n\n')))
You can see a similar example (without the file I/O) running at this repl.it
I'd split the content once by a double newline in order to get each group separately, and then split each group individually:
with open('file.txt') as f:
    content = f.read()
result = [group.split() for group in content.split('\n\n')]
Short code is great, but what if you have to maintain it 2 years later? ;-)
So here's my more human-readable version, which works like the existing ones:
file = open('file.txt')
text = file.read()  # read everything into a string
file.close()
groups = text.split("\n\n")  # create the groups
result = [group.splitlines() for group in groups]  # split each group into its lines
This is two lines longer, but I prefer it because I find it easier to read and understand than the other versions. However, it's your decision. I hope this helps.
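The file as printed in the question has no blank line between the two groups, while the answers above split on "\n\n", which assumes one. The sketch below covers both readings: group_by_blank_lines tolerates several consecutive blank lines, Windows line endings and a trailing newline, and group_by_size assumes every group has exactly the same number of lines (3 in the sample). Both function names are hypothetical.

```python
def group_by_blank_lines(text):
    """Split text into groups of lines separated by one or more blank lines."""
    groups, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:  # a blank line closes the current group
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def group_by_size(text, size):
    """Chunk the non-blank lines into consecutive groups of `size` lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [lines[i:i + size] for i in range(0, len(lines), size)]


flat = "Apples\nRed\nRound\nBanana\nYellow\nLong\n"
print(group_by_size(flat, 3))  # [['Apples', 'Red', 'Round'], ['Banana', 'Yellow', 'Long']]
```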

Remove double and single square brackets from text file generated from python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have this text file which was generated from Python (Text File). I noticed that the data in two of the columns is enclosed within double and single square brackets, respectively. These two column values were saved as strings, as were the other columns.
How can I remove these brackets when I import the text file into Python?
Thank you,
If you just want to remove all instances of square brackets from a string, you can do the following:
s = "[[ hello] [there]]"
s = s.replace("[", "")
s = s.replace("]", "")
UPDATE:
If you want code that reads the file contents and makes the changes:
with open('/path/to/my_file.txt', 'r') as my_file:
    text = my_file.read()
text = text.replace("[", "")
text = text.replace("]", "")
# If you wish to save the updates back into a cleaned up file
with open('/path/to/my_file_clean.txt', 'w') as my_file:
    my_file.write(text)
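The two replace calls can also be done in a single pass with str.translate: in Python 3, str.maketrans takes an optional third argument listing characters to delete. A small sketch with a hypothetical helper name; it gives the same result as the replace version.

```python
def remove_square_brackets(text):
    """Delete every '[' and ']' character in a single pass."""
    # The third argument of str.maketrans lists characters to delete
    return text.translate(str.maketrans("", "", "[]"))


print(repr(remove_square_brackets("[[ hello] [there]]")))  # ' hello there'
```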
You can use a regular expression:
import re
pattern = re.compile(r'\d+\.\d+')
with open('someFile.txt', 'r') as f:
    for line in f:
        # Keep only the decimal numbers on this line, joined by spaces
        float_string = " ".join(pattern.findall(line))
        print(float_string)
This only finds the parts of each line that match the format number, period, number (for example 12.5).
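If the goal is numbers rather than a cleaned-up string, the matches can be converted with float. A sketch with hypothetical names; note that this pattern (the same one as above) drops a leading minus sign and ignores integers such as 7, so it would need widening for such data.

```python
import re

DECIMAL_RE = re.compile(r'\d+\.\d+')


def extract_floats(line):
    """Return the decimal numbers (digits.digits only) in a line as floats."""
    return [float(m) for m in DECIMAL_RE.findall(line)]


print(extract_floats("[[1.5, 2.25]] ['3.0']"))  # [1.5, 2.25, 3.0]
```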

How to remove all stuff by using a regular expression [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Here's a copy of some of the lines (solution, pos and gloss) in my txt file:
solution: (كَتَبَ kataba) [katab-u_1]
pos: katab/VERB_PERFECT+a/PVSUFF_SUBJ:3MS
gloss: ___ + write + he/it <verb>
I would like to return the word 'katab' that is inside the square brackets on the first line, and remove everything else (the other lines and the numbers). I'm working on Python 2.7.
I tried to write this code:
pattern = re.compile("'(?P[^']+)':\s*(?P<root>[^,]*)\d+")
Whenever you think "I need to match a pattern", regular expressions are a good starting point. See the docs. It is a little trickier here because the input file is Unicode.
import re
import codecs
with codecs.open("test.unicode.txt", "rb", "utf-8") as f:
    words = []
    for line in f.readlines():
        # A raw str pattern (rather than b"...") also works on Python 3,
        # where bytes patterns cannot match decoded text
        matches = re.match(r"solution:.+\[(?P<word>\w+).*\]", line, flags=re.U)
        if matches:
            words.append(matches.group("word"))
    print(words)
Output:
[u'katab']
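Python 2.7 reached end of life in 2020; on Python 3 the built-in open accepts an encoding, str patterns match Unicode (\w included) by default, and codecs is not needed. A sketch assuming the file is UTF-8; extract_solution_words and SOLUTION_RE are hypothetical names, and the function accepts any iterable of lines so it can be tried without a file.

```python
import re

# Capture the leading word characters inside the [...] on a "solution:" line
SOLUTION_RE = re.compile(r"solution:.*\[(?P<word>\w+)")


def extract_solution_words(lines):
    """Return the bracketed root word from every 'solution:' line."""
    words = []
    for line in lines:
        match = SOLUTION_RE.match(line)
        if match:
            words.append(match.group("word"))
    return words


sample = [
    "solution: (كَتَبَ kataba) [katab-u_1]\n",
    "pos: katab/VERB_PERFECT+a/PVSUFF_SUBJ:3MS\n",
    "gloss: ___ + write + he/it <verb>\n",
]
print(extract_solution_words(sample))  # ['katab']
```

With the real file: `with open("test.unicode.txt", encoding="utf-8") as f: print(extract_solution_words(f))`.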
