I have a text file like this:
[0.52, '1_1man::army'], stack
[0.45, '3_3man::army'], flow
[0.52, '1_1man::army'], testing
[0.52, '2_2man:army'], expert
How can I load the file and print all the values for
'1_1man::army', '3_3man::army', '1_1man::army' and '2_2man:army'?
My code:
text = open("text.txt", "r").readlines()
print(text[1])
I then tried to apply the solutions some good people have shared, but I can't use their code since the file I have now is different from the one I posted (I wish to try out this new example).
How can I arrange the list according to a similar item in a certain location?
If that format is rigid throughout the file, you could simply use split() to extract the values between the quotes:
with open("text.txt", "r") as file:
for line in file:
print (line.split("'")[1])
line.split("'") slices the string up whenever it sees a '. In your case, every line would be sliced into a list of 3 elements:
[0.52,
1_1man::army
], stack
You want the middle one, which has index [1]. So line.split("'")[1] gives you exactly that.
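If you also want the word after the closing bracket (stack, flow, and so on), a small sketch along the same lines could collect the values per quoted key. This assumes every line keeps the exact [score, 'key'], value shape shown in the question:
values = {}
with open("text.txt", "r") as file:
    for line in file:
        key = line.split("'")[1]                 # e.g. 1_1man::army
        value = line.rsplit(",", 1)[1].strip()   # e.g. stack
        values.setdefault(key, []).append(value)

for key, vals in values.items():
    print(key, vals)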
An easier approach would be to make a JSON file instead. Python has a good built-in json library. This is what the JSON would look like (note that keys in a JSON object must be unique, so the two '1_1man::army' entries collapse into one and the later value wins):
{
    "1_1man::army": "stack",
    "3_3man::army": "flow",
    "1_1man::army": "testing",
    "2_2man:army": "expert"
}
You would enter this and change the file extension from .txt to .json. You can read it like this:
import json

with open("YourText/JsonFileHere.json") as f:
    data = json.load(f)

# Get the 1_1man::army value (the duplicate key keeps only the last one, "testing")
data["1_1man::army"]
# Get the 3_3man::army value
data["3_3man::army"]
# Get the 2_2man:army value
data["2_2man:army"]
# In order to add things to the json, do this:
data["What you want the new key to be called"] = "What the value is"
Let me know if this helps!
I have a large .txt file that is the result of a C file being parsed. It contains various blocks of data, but about 90% of them are useless to me. I'm trying to get rid of them and then save the result to another file, but I'm having a hard time doing so. At first I tried to delete all the useless information in the unparsed file, but then it won't parse. My .txt file is built like this:
Update: the files I'm trying to work on come from the pycparser module, which I found on GitHub.
The file before being parsed looks like this:
And after using pycparser:
file_to_parse = pycparser.parse_file(current_directory + r"\D_Out_Clean\file.d_prec")
I want to delete all blocks that start with the word Typedef. The module stores the parsed output in one big list that I can access via its ext attribute.
Currently my code looks like this:
len_of_ext_list = len(file_to_parse.ext)
i = 0
while i < len_of_ext_list:
    if 'TypeDecl' not in file_to_parse.ext[i]:
        print("NOT A TYPEDECL")
        print(file_to_parse.ext[i], type(file_to_parse.ext[i]))
        parsed_file_2 = open(current_directory + r"\Zadanie\D_Out_Clean_Parsed\clean_file.d_prec", "w+")
        parsed_file_2.write("%s%s\n" % ("", file_to_parse.ext[i]))
        parsed_file_2.close
        #file_to_parse_2 = file_to_parse.ext[i]
    i += 1
But the above code only saves the last FuncDef from the file, and I don't know how to change that.
So, now I'm trying to get rid of all the typedefs in the parsed file, as they don't hold any valuable information for me. I want to know what function definitions and declarations are in the file, and what types of global variables are stored in it. Hope this is clearer now.
I suggest reading the entire input file into a string, and then doing a regex replacement:
with open(current_directory + r"\D_Out\file.txt", "r+") as file:
with open(current_directory + r"\D_Out_Clean\clean_file.txt", "w+") as output:
data = file.read()
data = re.sub(r'type(?:\n\{.*?\}|[^;]*?;)\n?', '', data, flags=re.S)
output.write(line)
Here is a regex demo showing that the replacement logic is working.
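Since the question already has the parsed AST in file_to_parse, another option is to skip the text-level regex and filter the AST itself. This is only a rough sketch, assuming the standard pycparser layout where top-level nodes live in .ext and typedefs are instances of c_ast.Typedef; the output path below is just an example:
from pycparser import c_ast, c_generator

# Keep every top-level node except typedefs, then regenerate C code from them.
kept = [node for node in file_to_parse.ext if not isinstance(node, c_ast.Typedef)]
filtered_ast = c_ast.FileAST(kept)

generator = c_generator.CGenerator()
out_path = current_directory + r"\D_Out_Clean_Parsed\clean_file.d_prec"  # example path
with open(out_path, "w") as out:
    out.write(generator.visit(filtered_ast))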
Okay so long story short I want to read a .txt file into my program and then insert a string at a specific point in the text. The output would look something along the lines of this:
"text from file {string} more text from file"
This is the relevant code I'm currently working with:
with open(r"act 1 text\act_1_scene_1_talk.txt","r+") as scene_1_talk_file:
scene_1_talk = scene_1_talk_file.read()
print(input("Press enter to continue. "))
print(f"{scene_1_talk}")
I suppose I could just cut the text file in half and then put the string in between it, but I would prefer to keep the file in one body. I can provide additional code segments to help clarify anything.
Let's say you want to put your {string} at, say, the middle of the text file. Then you can do:
with open('note.txt', "r") as f:
    f_read = f.read()

middle_position = int(len(f_read) / 2) - 1  # minus one because indexing starts from zero
# Strings are immutable, so we need another variable to store the new string.
result = f_read[:middle_position] + "{string}" + f_read[middle_position:]
print(result)
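If the insertion point is a known spot rather than the literal midpoint, another sketch is to put a placeholder in the .txt file itself and fill it in when reading. This assumes you are free to add a literal {string} marker to act_1_scene_1_talk.txt; the inserted value here is just an example:
with open(r"act 1 text\act_1_scene_1_talk.txt", "r") as scene_1_talk_file:
    scene_1_talk = scene_1_talk_file.read()

inserted = "whatever you want to drop in"  # example value
# Replace the {string} placeholder in the file with the inserted text.
print(scene_1_talk.replace("{string}", inserted))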
I have a huge text file that contains several JSON objects that I want to parse into a CSV file. Since I'm dealing with someone else's data, I can't really change the format it's being delivered in.
Since I don't know how many JSON objects there are, I can't just create a fixed set of dictionaries, wrap them in a list and then json.loads() the list.
Also, since all the objects are on a single text line, I can't use a regex to separate each individual JSON object and put them in a list (it's super complicated and sometimes triple-nested JSON at some points).
Here's my current code:
import fileinput
import json
import re
import sys

def json_to_csv(text_file_name, desired_csv_name):
    # Cleans up a bit of the text file
    file = fileinput.FileInput(text_file_name, inplace=True)
    ile = fileinput.FileInput(text_file_name, inplace=True)
    for line in file:
        sys.stdout.write(line.replace(u'\'', u'"'))
    for line in ile:
        sys.stdout.write(re.sub(r'("[\s\w]*)"([\s\w]*")', r"\1\2", line))
    # Try to load the text file into the content var
    with open(text_file_name, "rb") as fin:
        content = json.load(fin)
    # Rest of the logic uses the json data in content
    # to build the desired csv format
This code gives a ValueError: Extra data: line 1 column 159816, because there is more than one object there.
I've seen similar questions on Google and Stack Overflow, but none of those solutions worked, because here it's just one really long line in a text file and I don't know how many objects there are in the file.
If you are trying to split apart the highest-level braces, you could do something like:
string = '{"NextToken": {"value": "...'
objects = eval("[" + string + "]")
and then parse each item in the list.
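If running eval() on someone else's data feels risky, a safer sketch is to walk the long string with json.JSONDecoder.raw_decode, which parses one object at a time and reports where it stopped. This assumes the objects really are back to back on one line, possibly separated by commas or whitespace; iter_json_objects is just an illustrative helper name:
import json

def iter_json_objects(text):
    """Yield each top-level JSON object found back to back in one long string."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip whitespace and commas between objects.
        while idx < len(text) and text[idx] in " \t\r\n,":
            idx += 1
        if idx >= len(text):
            break
        obj, end = decoder.raw_decode(text, idx)
        yield obj
        idx = end

with open(text_file_name) as fin:
    for obj in iter_json_objects(fin.read()):
        pass  # flatten obj into a csv row here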
I'm trying to load a large JSON file (300 MB) in order to parse it into Excel. I just started running into a MemoryError when I do json.load(file). Similar questions have been posted, but they haven't been able to answer my specific question. I want to be able to return all the data from the JSON file in one block, like I did in the code. What is the best way to do that? The code and JSON structure are below:
The code looks like this.
def parse_from_file(filename):
    """Load the json file that was given and verified,
    and return the data that was in it so it can actually be read.
    Args:
        filename (string): full branch location, used to grab the json file plus '_metrics.json'
    Returns:
        data: whatever data is being loaded from the json file
    """
    print("STARTING PARSE FROM FILE")
    with open(filename) as json_file:
        d = json.load(json_file)
        json_file.close()
        return d
The structure looks like this.
[
    {
        "analysis_type": "test_one",
        "date": 1505900472.25,
        "_id": "my_id_1.1.1",
        "content": {
            ...
        }
    },
    {
        "analysis_type": "test_two",
        "date": 1605939478.91,
        "_id": "my_id_1.1.2",
        "content": {
            ...
        }
    },
    ...
]
Inside "content" the information is not consistent but has 3 distinct but different possible template that can be predicted based of analysis_type.
I did it this way; hope it helps you. You may also need to skip the first line "[", and remove the trailing "," when a line ends with "},".
import ujson

with open(file) as f:
    for line in f:
        while True:
            try:
                jfile = ujson.loads(line)
                break
            except ValueError:
                # Not yet a complete JSON value; pull in the next line.
                line += next(f)
        # do something with jfile
If all the tested libraries are giving you memory problems, my approach would be to split the file into one file per object inside the array.
If the file has the newlines and padding you described in the OP, I would read it line by line, discarding lines that are just [ or ], and start writing to a new file every time you find a }, (where you also need to remove the trailing commas). Then try to load every file and print a message when you finish reading each one, to see where it fails, if it does.
If the file has no newlines or is not properly padded, you would need to read char by char, keeping two counters, increasing them when you find [ or { and decreasing them when you find ] or } respectively. Also take into account that you may need to ignore any curly or square bracket that appears inside a string, though that may not be needed.
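Another route, assuming you can install the third-party ijson package, is to stream the top-level array instead of loading it all at once; ijson.items yields one array element at a time. A minimal sketch, with the filename just an example:
import ijson

def iter_metrics(filename):
    # Yields one analysis object at a time without loading the whole file.
    with open(filename, "rb") as json_file:
        # The "item" prefix addresses each element of the top-level JSON array.
        for entry in ijson.items(json_file, "item"):
            yield entry

for entry in iter_metrics("branch_metrics.json"):  # example filename
    print(entry["analysis_type"], entry["_id"])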
I have a file that I wish to parse. It has data in the json format, but the file is not a json file. I want to loop through the file, and pull out the ID where totalReplyCount is greater than 0.
{ "totalReplyCount": 0,
"newLevel":{
"main":{
"url":"http://www.someURL.com",
"name":"Ronald Whitlock",
"timestamp":"2016-07-26T01:22:03.000Z",
"text":"something great"
},
"id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
}
},
{ "totalReplyCount": 4,
"newLevel":{
"main":{
"url":"http://www.someUR2L.com",
"name":"other name",
"timestamp":"2016-07-26T01:22:03.000Z",
"text":"something else great"
},
"id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
}
},
My initial attempt was to do the following
def readCsv(filename):
    with open(filename, 'r') as csvFile:
        for row in csvFile["totalReplyCount"]:
            print row
but I get an error stating
TypeError: 'file' object has no attribute '__getitem__'
I know this is just an attempt at printing and not doing what I want to do, but I am a novice at python and lost as to what I am doing wrong. What is the correct way to do this? My end result should look like this for the ids:
['insdisndiwneien23e2es', 'lsndion2ei2esdsd',....]
EDIT 1- 7/26/16
I saw that I made a mistake in my formatting when I copied the code (it was late, I was tired...). I switched it to a proper format that is more like JSON. This new edit properly matches the file I am parsing. I then tried to parse it with json, and got ValueError: Extra data: line 2 column 1 - line X column 1:, where line X is the end of the line.
def readCsv(filename):
    with open(filename, 'r') as file:
        data = json.load(file)
        pprint(data)
I also tried DictReader, and got a KeyError: 'totalReplyCount'. Is the dictionary un-ordered?
EDIT 2 -7/27/16
After taking a break, coming back to it, and thinking it over, I realized that what I have (after proper massaging of the data) is a CSV file that contains a proper JSON object on each line. So I have to parse the CSV file, then parse each line, which is a top-level, whole and complete JSON object. The code I used to try to parse this is below, but all I get is the first character of the string, an open curly brace '{':
def readCsv(filename):
    with open(filename, 'r') as csvfile:
        for row in csv.DictReader(csvfile):
            for item in row:
                print item[0]
I am guessing that DictReader is converting the JSON object to a string, and that is why I am only getting a curly brace as opposed to the first key. If I were to do print item[0:5] I would get a mishmash of the first 4 characters in an unordered fashion on each line, which I assume is because the format has turned into an unordered list? I think I understand my problem a little better, but I'm still wrapping my head around the data structures and the methods used to parse them. What am I missing?
After reading the question and all the above answers, please check whether this is useful to you.
I have treated the input file as a plain text file, not as a CSV or JSON file.
The flow of the code is as follows:
Open and read the file in reverse order.
Search for the id in each line. Extract the id and store it in a temp variable.
Keep reading the file line by line and search for totalReplyCount.
Once you get totalReplyCount, check whether it is greater than 0.
If yes, store the temp id in id_list and re-initialize the temp variable.
import re

tmp_id_to_store = ''
id_list = []
for line in reversed(open("a.txt").readlines()):
    m = re.search('"id":"(\w+)"', line.rstrip())
    if m:
        tmp_id_to_store = m.group(1)
    n = re.search('{ "totalReplyCount": (\d+),', line.rstrip())
    if n:
        fou = n.group(1)
        if int(fou) > 0:
            id_list.append(tmp_id_to_store)
            tmp_id_to_store = ''
print id_list
More check points can be added.
As the error stated, your csvFile is a file object, not a dict object, so you can't index into it.
If your csvFile is in CSV format, you can use the csv module to read each line of the csv into a dict:
import csv

with open(filename) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row['totalReplyCount']
Note the DictReader class from the csv module: it will read each csv line and parse it into a dict object.
If your input file is JSON, why not just use the json library to parse it and then run a for loop over that data? Then it is just a matter of iterating over the keys and extracting the data.
import json
from pprint import pprint

with open('data.json') as data_file:
    data = json.load(data_file)

pprint(data)
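Since a plain json.load will complain about extra data here (the sample is a comma-separated run of objects rather than one array), a small sketch that may work, assuming the file really looks like the sample above, is to wrap the whole contents in square brackets first and then filter:
import json

with open(filename) as f:
    text = f.read().strip().rstrip(",")  # drop a trailing comma if present

entries = json.loads("[" + text + "]")
ids = [entry["newLevel"]["id"]
       for entry in entries
       if entry["totalReplyCount"] > 0]
print(ids)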
See the earlier Stack Overflow question "Parsing values from a JSON file using Python?".
Look at Justin Peel's answer there; it should help.
Parsing values from a JSON file in Python: that same Stack Overflow question, "Parsing values from a JSON file using Python?", has it all.
Here is a shell one-liner that should solve your problem, though it's not Python.
egrep -o '"(?:totalReplyCount|id)":(.*?)$' filename | awk '/totalReplyCount/ {if ($2+0 > 0) {getline; print}}' | cut -d: -f2
output:
"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"