Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
So I have a file that looks something like this:
Apples
Red
Round

Banana
Yellow
Long
I want to create a list where each 'group' is in a separate list like this:
[[Apples, Red, Round], [Banana, Yellow, Long]]
I am completely stumped on how I should proceed.
We'll split on \n\n (two newlines) to separate the groups, then use regular str.split to divide those into items:
with open('yourfile.txt') as f:
    l = list(map(str.split, f.read().split('\n\n')))
You can see a similar example (without the file I/O) running at this repl.it
I'd split the content once by a double newline in order to get each group separately, and then split each group individually:
with open('file.txt') as f:
    content = f.read()
result = [group.split() for group in content.split('\n\n')]
Short code is great but what if you have to maintain it 2 years later? ;-)
So here's my more human readable version which works like the existing versions:
file = open('file.txt')
text = file.read() # read everything into a string
file.close()
groups = text.split("\n\n") # create the groups
result = [ group.split("\n") for group in groups ] # split each group again at \n
This is two lines longer but I prefer this way because I am able to read and understand that easier than the other versions. However it's your decision. I hope I could help you.
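For what it's worth, here is a minimal sketch of the same idea that also tolerates a trailing newline or several blank lines between groups. The function name parse_groups and the sample string are only illustrative assumptions, not something from the question:

```python
def parse_groups(text):
    """Split text into groups of lines separated by one or more blank lines."""
    groups = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the group collected so far.
            groups.append(current)
            current = []
    if current:
        # Keep the last group when the text does not end with a blank line.
        groups.append(current)
    return groups


# Hypothetical sample standing in for the contents of the file.
sample = "Apples\nRed\nRound\n\nBanana\nYellow\nLong\n"
print(parse_groups(sample))
```

It is longer than the one-liners above, but it never produces empty strings or empty groups.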
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm new to Python and need to know how to achieve the requirement below.
I have a file with values on separate lines,
e.g.
ABC
DEF
GHI
I want to read these values from a file and need to set them in the below format in python, basically into a dictionary.
{"Keys":[{"common_key":"ABC"},{"common_key":"DEF"},{"common_key":"GHI"}]}
The dictionary should contain only one key, whose value is an array of JSON objects that all share a common key, each with a different value read from the file.
This way works for me:
with open('filename.txt') as fin:
    lines = [i.strip() for i in fin.readlines() if len(i) > 1]
common_dict = {'Keys': [{'common_key': i} for i in lines]}
The best way to do this would probably be a simple list comprehension. If you need to do any separation logic on the lines, str.split() is your friend.
with open("filename.txt") as txtfile:
    # strip() removes the trailing newline that each line read from the file still carries
    yourDesiredDict = {"Keys": [{"common_key": x.strip()} for x in txtfile.readlines()]}
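As a rough sketch of how the pieces fit together, the snippet below builds the dictionary from a list of lines and prints it as JSON. The helper name build_keys_dict and the in-memory sample_lines (standing in for the file) are assumptions made for illustration:

```python
import json


def build_keys_dict(lines):
    """Wrap each non-empty line in a {"common_key": value} object."""
    values = [line.strip() for line in lines if line.strip()]
    return {"Keys": [{"common_key": value} for value in values]}


# Hypothetical lines as they would come from iterating over the file.
sample_lines = ["ABC\n", "DEF\n", "\n", "GHI\n"]
result = build_keys_dict(sample_lines)
print(json.dumps(result))
```

With a real file you would pass the open file object instead of sample_lines, since iterating over a file yields its lines.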
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
How can I parse this in Python so that I can find out how many unique URLs there are, regardless of the number behind them?
You can open the file and get the lines as a list of strings using:
with open("/path/to/file.txt") as file:
    lines = list(file)
This will give you a list of all lines in the text file.
Now, since you do not want duplicates, I think using a set would be a good way (a set does not contain duplicates):
answer = set()
for x in lines:
    answer.add(x[x.find(" ")+1:x.rfind(":")])
This iterates through all the lines and adds the part after the first space, up to but not including the last :, to the set, which takes care of duplicates. answer should now contain all the unique URLs.
Tested for Python3.6
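Here is a small self-contained sketch of the same slicing idea. It assumes each line looks like "label URL:number", which is a guess based on the sample string used in another answer here; the function name unique_urls and the sample lines are hypothetical:

```python
def unique_urls(lines):
    """Collect the text between the first space and the last colon of each line."""
    urls = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Drop the leading label, then cut off the trailing ":number".
        _, _, rest = line.partition(" ")
        url, sep, _ = rest.rpartition(":")
        if sep:
            urls.add(url)
    return urls


# Assumed line format, not taken from the original question.
sample = ["String_1 URL_1:10\n", "String_2 URL_2:20\n", "String_3 URL_1:30\n"]
print(len(unique_urls(sample)))
```

Using rpartition on the last colon means a URL that itself contains colons (such as one starting with a scheme) stays in one piece.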
You can use a regex to parse and extract the uids from your file, line by line.
import re
uids = set()
with open('...') as f:
    for line in f:
        # match a leading run of lowercase letters and digits
        m = re.match(r'^[a-z0-9]+', line)
        if m:
            uids.add(m.group(0))
print(len(uids))
import re
A, List = ("String_1 URL_1:10\nString_2 URL_2:20\nString_3 URL_1:30".replace(" ", ",")).split("\n"), []
for x in range(len(A)):
    Result = re.search(",(.*):", A[x])
    if Result.group(1) not in List:
        List.append(Result.group(1))
print(len(List))
This should solve your problem.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all the number sequences:
1168577519
1168594403
by regex.
I have never dealt with a regex problem before, but this time I need one to do some parsing.
So far I can only get the sequences after "aid" and "cmt_id" separately. I don't know how to merge them into one regex.
My current progress:
pattern = re.compile('(?<=aid: ").*?(?=",)')
print(pattern.findall(s))
and
pattern = re.compile('(?<=cmt_id = ).*?(?=;)')
print(pattern.findall(s))
There are many different approaches to designing a suitable regular expression which depend on the range of possible inputs you are likely to encounter.
The following would solve your exact question but could fail given different styled input. You need to provide more details, but this would be a start.
import re
# s holds the input text, as in the question
re_content = re.search(r'aid: "([0-9]*?)",\W*cmt_id = ([0-9]*?);', s)
print(re_content.groups())
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
The simplest solution is to use re.findall
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.
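If the input may contain other numbers you do not want, one possible way to merge the two patterns from the question into a single regex is to join two lookbehinds with |. This is only a sketch; the extra "other = 42;" field in the sample is made up to show that unrelated numbers are skipped:

```python
import re

# Sample text from the question plus a hypothetical unrelated number.
s = 'aid: "1168577519", cmt_id = 1168594403; other = 42;'

# Either branch may match: digits preceded by 'aid: "' or by 'cmt_id = '.
pattern = re.compile(r'(?<=aid: ")\d+|(?<=cmt_id = )\d+')
print(pattern.findall(s))
```

Each lookbehind has a fixed width, which Python's re module requires, so the two alternatives can sit side by side in one pattern.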
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to check a TSV of results and want to see how many times F and S come up (for failure or success). However, I'm not sure how to do this counting, or how to have it search the file.
I've come across something that is sort of what I'm looking for: Python Program how to count number of times the letter c shows up
def countletter(S, F):
    count = 0
    for x in S:
        if x.lower() == F.lower():
            count += 1
    return count
print (countletter("abcdefFf","F"))
But it isn't perfect and I'm not sure how to make it search the file.
Assuming that the count result applies to the whole file you can use a collections.Counter:
from collections import Counter
with open('input.tsv') as infile:
    counts = Counter(infile.read())
for c in 'SF':
    # indexing a Counter gives 0 for a character that never appears
    print('{}: {}'.format(c, counts[c]))
This has the advantage of allowing you to obtain counts of any character (not just "S" and "F") with one pass of the file.
You could also just use str.count() for a specific character (or a string), but if you need counts of more than one character you'll find a Counter more convenient and probably faster too.
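To make the comparison concrete, here is a tiny sketch that runs both approaches on an in-memory string. The sample text is invented and simply stands in for the contents of the file:

```python
from collections import Counter

# Hypothetical file contents: two tab-separated rows of S/F flags.
text = "S\tF\tS\nF\tS\tS\n"

# str.count scans the text once per character you ask about.
s_count = text.count("S")
f_count = text.count("F")
print(s_count, f_count)

# Counter tallies every character in a single pass.
counts = Counter(text)
print(counts["S"], counts["F"])
```

Both give the same numbers here; the difference only matters when you want many different characters or the file is large.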
You need to pass the file contents to your countletter function.
with open("FILE_TO_OPEN_AND_READ.txt") as f:
    data = f.read()
print (countletter(data,"F"))
This opens and reads the file into data. For this example, I'm assuming your file is relatively small. Then data is passed into countletter as the first parameter, instead of a hardcoded string ("abcdefFf" in your example).
One note about your code, you are missing a closing parenthesis in your print statement. I've added that in my example above.
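Counting characters over the whole file will also pick up any S or F that appears inside other fields. If the result flag lives in one known column, a sketch along these lines might be safer. The column index, the function name count_results and the sample rows are all assumptions, since the question does not show the file layout:

```python
import csv
import io
from collections import Counter


def count_results(tsv_file, column):
    """Tally the values found in one column of a tab-separated file."""
    reader = csv.reader(tsv_file, delimiter="\t")
    # Skip rows that are too short to have the requested column.
    return Counter(row[column] for row in reader if len(row) > column)


# Hypothetical data: a test name followed by an S/F flag.
sample = io.StringIO("test_a\tS\ntest_b\tF\nSlow test\tS\n")
counts = count_results(sample, column=1)
print(counts["S"], counts["F"])
```

Note that the capital S in "Slow test" is not counted, because only the second column is inspected.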
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Say I have a line like this:
aaaaaaa, bbbbbbb, ccccccc, ddddddd
And I need to cut it to look like this:
bbbbbbb, ccccccc
So how do I delete the first word and everything after the third comma?
Here are a couple of tricks you could use to achieve what you want:
Use .split(', ') to break your line into a list of words
Use slice notation ([1:3] for example) to keep the second and third words
Join the list back into a line using .join, supplying any delimiter you'd like (e.g. a new comma)
For example:
', '.join("aaa, bbb, ccc, dddd, eeee".split(', ')[1:3])
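The same three steps, written out one at a time on the line from the question so the intermediate values are visible. The variable names are just for illustration:

```python
line = "aaaaaaa, bbbbbbb, ccccccc, ddddddd"

# Step 1: break the line into a list of words.
words = line.split(", ")

# Step 2: keep the second and third words (indexes 1 and 2).
kept = words[1:3]

# Step 3: join them back together with the same delimiter.
result = ", ".join(kept)
print(result)
```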
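def wordsTwoAndThree(csvString, sep):
    # keep only the second and third fields
    return sep.join(csvString.split(sep)[1:3])
# pass ', ' (comma plus space) so the result has no leading space
print(wordsTwoAndThree("aaaaaaa, bbbbbbb, ccccccc, ddddddd", ', '))
This is similar to @Paedolos's suggestion.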