Similar questions have been asked but none quite like this.
I need to save two pieces of information in a text file: the username and their associated health integer. I then need to be able to look into the file, find the user, and see what value is connected with them. Writing it the first time, I plan to use open('text.txt', 'a') to append the new user and integer to the end of the txt file.
My main problem is this: how do I figure out which value is connected to a user string? If they're on the same line, can I do something like read only the number in that line?
What are your guys' suggestions? If none of this works, I guess I'll need to move over to json.
This may be what you're looking for. I'd suggest reading one line at a time to parse through the text file.
Another method would be to read the entire txt and separate strings using something like text_data.split("\n"), which should work if the data is separated by line (denoted by '\n').
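For example, a minimal sketch of that approach, assuming each line of text.txt holds a username followed by its health value separated by a space:
health_by_user = {}
with open('text.txt') as fh:
    for line in fh.read().split("\n"):
        if not line.strip():
            continue                        # skip blank lines
        name, value = line.split()          # e.g. "bob 100" -> ["bob", "100"]
        health_by_user[name] = int(value)   # keep only the number on that line
print(health_by_user)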
You're probably looking for configparser which is designed for just that!
Construct a new configuration
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config['Players'] = {
... "ti7": 999,
... "example": 50
... }
>>> with open('example.cfg', 'w') as fh:
...     config.write(fh)  # write directly to file handler
...
Now read it back
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.read("example.cfg")
['example.cfg']
>>> print(dict(config["Players"]))
{'ti7': '999', 'example': '50'}
Inspecting the written file
% cat example.cfg
[Players]
ti7 = 999
example = 50
If you already have a text config written in the form key value in each line, you can probably parse your config file as follows:
user_healths = {}                                # start empty dictionary
with open("text.txt", 'r') as fh:                # open file for reading
    for line in fh.read().strip().split('\n'):   # list lines, ignore last empty
        user, health = line.split(maxsplit=1)    # "a b c" -> ["a", "b c"]
        user_healths[user] = int(health)         # ValueError if not number
Note that this will make the user's health the last value listed in text.txt if it appears multiple times, which may be what you want if you always append to the file:
% cat text.txt
user1 100
user2 150
user1 200
Parsing text.txt above:
>>> print(user_healths)
{'user1': 200, 'user2': 150}
I have the following code that takes raw_file.txt and turns it into processed_file.txt.
Problem 1:
Besides item_location, I also need the item_id (as str, not int) to be in the processed file, perhaps as a list, so it would look like WANTED_processed_file.txt.
def process_file(raw_file, processed_file, target1):
    with open(raw_file, 'r') as raw_file:
        with open(processed_file, 'a') as processed_file:
            for line in raw_file:
                if target1 in line:
                    processed_file.write(line.split(target1)[1])

process_file('raw_file.txt', 'processed_file.txt', 'item_location: ')
By adding another if statement with target2, the content is appended below target1 (as expected), but I don't actually know how to make it a list.
Problem 2:
With my current code I'm only able to process the string corresponding to the line, but since WANTED_processed_file.txt contains multiple lists I need to adapt it.
def my_function():
    print(i)

with open('processed_file.txt', "r") as processed_file:
    items = processed_file.read().splitlines()

for i in items:
    my_function()
This is what I've tried but I'm not getting the desired results:
def my_function():
    print(f'Item {i[0]} is located at {i[1]}')

with open('WANTED_processed_file.txt', "r") as processed_file:
    items = processed_file.read()

for i in items:
    my_function()
raw_file.txt:
ITEM:
item_id: 0001
item_location: first location
item_description: something
ITEM:
item_id: 0002
item_location: second location
item_description: something else
processed_file.txt:
first location
second location
WANTED_processed_file.txt:
['0001', 'first location']
['0002', 'second location']
Thank you and apologies for the lengthy post
You want to parse multiline blocks from a text file. The robust way would be to process it line by line, searching for start-of-block markers and data markers, fill a data structure, and then store the data to a file.
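A rough sketch of that line-by-line approach, assuming the raw_file.txt layout shown above (ITEM: as the block marker and "key: value" data lines), could look like this:
records = []
current = None

with open('raw_file.txt') as fd:
    for line in fd:
        line = line.strip()
        if line == 'ITEM:':                    # start-of-block marker
            current = {}
            records.append(current)
        elif current is not None and ': ' in line:
            key, value = line.split(': ', 1)   # data marker, e.g. "item_id: 0001"
            current[key] = value

with open('WANTED_processed_file.txt', 'w') as out:
    for rec in records:
        print([rec['item_id'], rec['item_location']], file=out)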
If you are sure that your items will always be in the same order, you could use a multiline regex:
import re
...

with open('raw_file.txt') as fd:
    text = fd.read()

rx = re.compile(r'ITEM:.*?item_id: ([^\n]*).*?item_location: ([^\n]*)',
                re.MULTILINE | re.DOTALL)

with open('processed_file', 'w') as fd:
    for record in rx.finditer(text):
        print(list(record.groups()), file=fd)
But beware, it will be less robust than a true parser...
I am using python, and I have a large 'outputString' that consists of several outputs, each on a new line, to look something like this:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
However, there is so much data that I cannot see it all in the terminal, and would like to write code that automatically writes a json file. I am having trouble getting the json to keep the separated lines. Right now, it is all one large line in the json file. I have attached some code that I have tried. I have also attached the code used to make the string that I want to convert to a json. Thank you so much!
for value in outputList:
    newOutputString = json.dumps(value)
    outputString += (newOutputString + "\n")

with open('data.json', 'w') as outfile:
    for item in outputString.splitlines():
        json.dump(item, outfile)
        json.dump("\n", outfile)
If the input really is a string, you'll probably have to make sure it's properly formatted as JSON:
outputString = '''{"size":1, "title":"Hello", "space":0}
{"size":21, "title":"World", "space":10}
{"size":3, "title":"Goodbye", "space":20}'''
You could then use pandas to manipulate your data (so it's not a problem of screen size anymore).
import pandas as pd
import json
pd.DataFrame([json.loads(line) for line in outputString.split('\n')])
Which gives:
   size    title  space
0     1    Hello      0
1    21    World     10
2     3  Goodbye     20
On the other hand, from what I understand outputString is not a string but a list of dictionaries, so you could write a simpler version of this:
outputString = [{'size':1, 'title':"Hello", 'space':0},
                {'size':21, 'title':"World", 'space':10},
                {'size':3, 'title':"Goodbye", 'space':20}]
pd.DataFrame(outputString)
Which gives the same DataFrame as before. Using this DataFrame will allow you to query your data, and it will be much more comfortable than raw JSON. For example:
>>> df = pd.DataFrame(outputString)
>>> df[df['size'] >= 3]
   size    title  space
1    21    World     10
2     3  Goodbye     20
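If you then still want a file with one JSON object per line, pandas can write that directly; a small sketch, assuming the df above:
# orient='records' with lines=True writes one JSON object per line (JSON Lines)
df.to_json('data.json', orient='records', lines=True)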
You could also try IPython (or even Jupyter/JupyterLab), as it will probably also make your life easier.
You can use the code below:
json_data = json.loads(outputString)
with open('data.json', 'w') as outfile:
    json.dump(json_data, outfile, indent=5)
I have a script that reads through a log file that contains hundreds of these logs and looks for the ones that have an "On", "Off", or "Switch" type. Then I output each log into its own list. I'm trying to find a way to extract the Out and In times into a separate list/array and then subtract the two times to find the duration of each separate log. This is what the outputted logs look like:
['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']
This is my current code:
logfile = '/path/to/my/logfile'

with open(logfile, 'r') as f:
    text = f.read()

words = ["On", "Off", "Switch"]
text2 = text.split('\n')

for l in text.split('\n'):
    if (words[0] in l or words[1] in l or words[2] in l):
        log = l.split(',')[0:3]
I'm stuck on how to target only the Out and In time values from the logs and put them in an array and convert to a time value to find duration.
Initial log before the script: everything after the "In" time is useless for what I'm looking for, so I only output the first three indices.
2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a","Type":"Switch,"In":"2020-01-31T00:30:20.140Z","Path":"interface","message":"interface changed status from unknown to normal","severity":"INFORMATIONAL","display":true,"json_map":"{\"severity\":null,\"eventId\":\"65e-64d9-45-ab62-8ef98ac5e60d\",\"componentPath\":\"interface_css\",\"displayToGui\":false,\"originalState\":\"unknown\",\"closed\":false,\"eventType\":\"InterfaceStateChange\",\"time\":\"2019-04-18T07:04:32.747Z\",\"json_map\":null,\"message\":\"interface_css changed status from unknown to normal\",\"newState\":\"normal\",\"info\":\"Event created with current status\"}","closed":false,"info":"Event created with current status","originalState":"unknown","newState":"normal"}
Below is a possible solution. The wordmatch line is a bit of a hack, until I find something clearer: it's just a one-liner that creates an empty or one-element set of True if one of the words matches.
(Untested)
import re

logfile = '/path/to/my/logfile'
words = ["On", "Off", "Switch"]
dateformat = r'\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[Zz]?'
pattern = fr'Out:\s*\[(?P<out>{dateformat})\].*In":\s*\"(?P<in>{dateformat})\"'
regex = re.compile(pattern)

with open(logfile, 'r') as f:
    for line in f:
        wordmatch = set(filter(None, (word in line for word in words)))
        if wordmatch:
            match = regex.search(line)
            if match:
                intime = match.group('in')
                outtime = match.group('out')
                # whatever to store these strings, e.g., append to list or insert in a dict.
As noted, your log example is very awkward, so this works for the example line, but may not work for every line. Adjust as necessary.
I have also not included (if so wanted), a conversion to a datetime.datetime object. For that, read through the datetime module documentation, in particular datetime.strptime. (Alternatively, you may want to store your results in a Pandas table. In that case, read through the Pandas documentation on how to convert strings to actual datetime objects.)
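If you do want that, here is a minimal sketch of the strptime conversion and the duration subtraction, assuming timestamps exactly like those in the log above:
from datetime import datetime

fmt = '%Y-%m-%dT%H:%M:%S.%fZ'      # the trailing 'Z' is matched literally

out_dt = datetime.strptime('2020-01-31T00:30:20.150Z', fmt)
in_dt = datetime.strptime('2020-01-31T00:30:20.140Z', fmt)

duration = out_dt - in_dt          # a datetime.timedelta
print(duration.total_seconds())    # 0.01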
You also don't need to read and split on newlines yourself: for line in f will do that for you (provided f is indeed a filehandle).
Regex is probably the way to go (speed, efficiency, etc.) ... but ...
You could take a very simplistic (if very inefficient) approach of cleaning your data:
- join all of it into a string
- replace things that hinder easy parsing
- split wisely and filter the split
like so:
data = ['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']

all_text = " ".join(data)

# this is inefficient and will create throwaway intermediate strings - if you are
# in a hurry or operate on 100s of MB of data, this is NOT the way to go, unless
# you have time

# iterate pairs of ("bad thing", "what to replace it with") (or list of bad things)
for thing in [(": ", ":"), (list('[]{}"'), "")]:
    whatt = thing[0]
    withh = thing[1]

    # if list, do so for each bad thing
    if isinstance(whatt, list):
        for p in whatt:
            # replace it
            all_text = all_text.replace(p, withh)
    else:
        all_text = all_text.replace(whatt, withh)

# format is now far better suited to splitting/filtering
cleaned = [a for a in all_text.split(" ")
           if any(a.startswith(prefix) or "Switch" in a
                  for prefix in {"In:", "Switch:", "Out:"})]
print(cleaned)
Outputs:
['Out:2020-01-31T00:30:20.150Z', 'Type:Switch', 'In:2020-01-31T00:30:20.140Z']
After cleaning, your data would look like:
2020-01-31T12:04:57.976Z 1234 Out:2020-01-31T00:30:20.150Z Id:Id:4-f-4-9-6a Type:Switch In:2020-01-31T00:30:20.140Z
You can transform the clean list into a dictionary for ease of lookup:
d = dict( part.split(":",1) for part in cleaned)
print(d)
will produce:
{'In': '2020-01-31T00:30:20.140Z',
'Type': 'Switch',
'Out': '2020-01-31T00:30:20.150Z'}
You can use the datetime module to parse the times from your values, as shown in 0 0's post.
I'm having trouble getting anything to write in my output file (word_count.txt).
I expect the script to review all 500 phrases in my phrases.txt document, and output a list of all the words and how many times they appear.
from re import findall,sub
from os import listdir
from collections import Counter

# path to folder containing all the files
str_dir_folder = '../data'

# name and location of output file
str_output_file = '../data/word_count.txt'

# the list where all the words will be placed
list_file_data = '../data/phrases.txt'

# loop through all the files in the directory
for str_each_file in listdir(str_dir_folder):
    if str_each_file.endswith('data'):

        # open file and read
        with open(str_dir_folder+str_each_file,'r') as file_r_data:
            str_file_data = file_r_data.read()

        # add data to list
        list_file_data.append(str_file_data)

# clean all the data so that we don't have all the nasty bits in it
str_full_data = ' '.join(list_file_data)
str_clean1 = sub('t','',str_full_data)
str_clean_data = sub('n',' ',str_clean1)

# find all the words and put them into a list
list_all_words = findall('w+',str_clean_data)

# dictionary with all the times a word has been used
dict_word_count = Counter(list_all_words)

# put data in a list, ready for output file
list_output_data = []
for str_each_item in dict_word_count:
    str_word = str_each_item
    int_freq = dict_word_count[str_each_item]
    str_out_line = '"%s",%d' % (str_word,int_freq)

    # populates output list
    list_output_data.append(str_out_line)

# create output file, write data, close it
file_w_output = open(str_output_file,'w')
file_w_output.write('n'.join(list_output_data))
file_w_output.close()
Any help would be great (especially if I'm able to actually output 'single' words within the output list).
Thanks very much.
Would be helpful if we got more information such as what you've tried and what sorts of error messages you received. As kaveh commented above, this code has some major indentation issues. Once I got around those, there were a number of other logic errors to work through. I've made some assumptions:
- list_file_data is assigned to '../data/phrases.txt', but there is then a loop through all files in a directory. Since you don't have any handling for multiple files elsewhere, I've removed that logic and referenced the file listed in list_file_data (and added a small bit of error handling). If you do want to walk through a directory, I'd suggest using os.walk() (http://www.tutorialspoint.com/python/os_walk.htm)
- You named your file 'phrases.txt' but then check for files that end with 'data'. I've removed this logic.
- You've placed the data set into a list when findall works just fine with strings and ignores special characters that you've manually removed. Test here: https://regex101.com/ to make sure.
- Changed 'w+' to '\w+' - check out the above link
- Converting to a list outside of the output loop isn't necessary - your dict_word_count is a Counter object which has an 'iteritems' method to roll through each key and value. Also changed the variable name to 'counter_word_count' to be slightly more accurate.
- Instead of manually generating csv's, I've imported csv and utilized the writerow method (and quoting options)
Code below, hope this helps:
import csv
import os
from collections import Counter
from re import findall,sub

# name and location of output file
str_output_file = '../data/word_count.txt'

# the list where all the words will be placed
list_file_data = '../data/phrases.txt'

if not os.path.exists(list_file_data):
    raise OSError('File {} does not exist.'.format(list_file_data))

with open(list_file_data, 'r') as file_r_data:
    str_file_data = file_r_data.read()

# find all the words and put them into a list
list_all_words = findall('\w+',str_file_data)

# dictionary with all the times a word has been used
counter_word_count = Counter(list_all_words)

with open(str_output_file, 'w') as output_file:
    fieldnames = ['word', 'freq']
    writer = csv.writer(output_file, quoting=csv.QUOTE_ALL)
    writer.writerow(fieldnames)
    for key, value in counter_word_count.iteritems():
        output_row = [key, value]
        writer.writerow(output_row)
Something like this?
from collections import Counter
from glob import glob

def extract_words_from_line(s):
    # make this as complicated as you want for extracting words from a line
    return s.strip().split()

tally = sum(
    (Counter(extract_words_from_line(line))
     for infile in glob('../data/*.data')
     for line in open(infile)),
    Counter())

for k in sorted(tally, key=tally.get, reverse=True):
    print k, tally[k]
I have a tsv file which is prepared like:
*Settings*
Force, Tags FakeTag
Resource ../../robot_resources/global.tsv
*Test, Cases*
Testcase1 [Documentation] PURPOSE:
... Test, checking,,
...
...
... PRECONDITION:
... set,
... make, sure, that,
...
...
[Setup] stopProcessAndWaitForIt
startAdvErrorTrace
startAdvDumpEvents
stopProcessAndWaitForIt
stopAndDownloadAdvDumpEvents
Testcase2 [Documentation] PURPOSE:
... Test, checking,, if,
...
...
... PRECONDITION:
What I want to do is:
- start reading the file from Test, Cases
- read every testcase separately: testcase1, testcase2..n (every testcase starts without a tab, the testcase body starts with a tab)
- check whether all testcases have the expressions "startAdvErrorTrace", "startAdvDumpEvents", etc.
I have about 50 testcases in the tsv and want to evaluate the whole file.
I'm totally green at developing. I have found some ideas, like reading a csv file as tsv, but I don't know how to achieve what I want.
I don't know what file format this is, but you can do something like this:
items = ("startAdvErrorTrace", "startAdvDumpEvents") # add more as desired
import re
with open("testfile") as infile:
# skip contents until Test Cases
contents = re.search(r"(?s)\*Test, Cases\*.*", infile.read())
cases = contents.split("\n\n") # Split on two consecutive newlines
for case in cases:
if not all(item in case for item in items)
print("Not all items found in case:")
print(case)
Here's a little script to parse the flags per Testcase. The output is:
Flags per testcase:
{1: ['startAdvErrorTrace', 'startAdvDumpEvents'], 2: []}
And the script is:
import re

usage = {}
flags = ('startAdvErrorTrace', 'startAdvDumpEvents')

with open(testfile) as f:   # testfile holds the path to your tsv file
    lines = [line.strip() for line in f.readlines()]

start_parsing = False
for line in lines:
    if 'Test, Cases' in line:
        start_parsing = True
        continue
    if start_parsing:
        if 'Testcase' in line:
            case_nr = int(re.findall(r'\d+', line)[0])
            usage[case_nr] = []
            continue
        for flag in flags:
            if flag in line:
                usage[case_nr].append(flag)

print('Flags per testcase:')
print(usage)
Hope that helps.