Python: loop through dictionary

I have a file that I wish to parse. It has data in the json format, but the file is not a json file. I want to loop through the file, and pull out the ID where totalReplyCount is greater than 0.
{ "totalReplyCount": 0,
  "newLevel": {
    "main": {
      "url": "http://www.someURL.com",
      "name": "Ronald Whitlock",
      "timestamp": "2016-07-26T01:22:03.000Z",
      "text": "something great"
    },
    "id": "z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
  }
},
{ "totalReplyCount": 4,
  "newLevel": {
    "main": {
      "url": "http://www.someUR2L.com",
      "name": "other name",
      "timestamp": "2016-07-26T01:22:03.000Z",
      "text": "something else great"
    },
    "id": "kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
  }
},
My initial attempt was to do the following
def readCsv(filename):
    with open(filename, 'r') as csvFile:
        for row in csvFile["totalReplyCount"]:
            print row
but I get an error stating
TypeError: 'file' object has no attribute '__getitem__'
I know this is just an attempt at printing and not doing what I want to do, but I am a novice at python and lost as to what I am doing wrong. What is the correct way to do this? My end result should look like this for the ids:
['insdisndiwneien23e2es', 'lsndion2ei2esdsd',....]
EDIT 1- 7/26/16
I saw that I made a mistake in my formatting when I copied the code (it was late, I was tired...). I switched it to a proper format that is more like JSON. This new edit properly matches the file I am parsing. I then tried to parse it with json and got ValueError: Extra data: line 2 column 1 - line X column 1:, where line X is the last line of the file.
import json
from pprint import pprint

def readCsv(filename):
    with open(filename, 'r') as file:
        data = json.load(file)
        pprint(data)
I also tried DictReader, and got a KeyError: 'totalReplyCount'. Is the dictionary unordered?
EDIT 2 -7/27/16
After taking a break, coming back to it, and thinking it over, I realized that what I have (after proper massaging of the data) is a CSV file that contains a proper JSON object on each line. So I have to parse the CSV file, then parse each line, which is a top-level, whole and complete JSON object. The code I used to try to parse this is below, but all I get is the first character of the string, an opening curly brace '{':
import csv

def readCsv(filename):
    with open(filename, 'r') as csvfile:
        for row in csv.DictReader(csvfile):
            for item in row:
                print item[0]
I am guessing that the DictReader is converting the JSON object to a string, and that is why I am only getting a curly brace as opposed to the first key. If I were to do print item[0:5] I would get a mishmash of the first few characters in an unordered fashion on each line, which I assume is because the format has turned into an unordered list? I think I understand my problem a little better, but I am still wrapping my head around the data structures and the methods used to parse them. What am I missing?
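To illustrate what I am aiming for, here is an untested sketch (assuming each massaged line really is a standalone JSON object with the nesting shown above, and a hypothetical file name comments.txt):
import json

def collect_ids(filename):
    ids = []
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip().rstrip(',')  # tolerate a trailing comma after each object
            if not line:
                continue
            obj = json.loads(line)  # each massaged line is one complete JSON object
            if obj.get("totalReplyCount", 0) > 0:
                ids.append(obj["newLevel"]["id"])
    return ids

print(collect_ids("comments.txt"))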

After reading the question and all the answers above, check whether this is useful to you.
I have treated the input file as a plain text file, not as a CSV or JSON file.
The flow of the code is as follows:
Open the file and read it in reverse order.
Search each line for "id"; when found, extract the ID and store it in a temporary variable.
Keep reading the file line by line, searching for totalReplyCount.
Once you find totalReplyCount, check whether it is greater than 0.
If it is, store the temporary ID in id_list and re-initialize the temporary variable.
import re

tmp_id_to_store = ''
id_list = []
for line in reversed(open("a.txt").readlines()):
    m = re.search(r'"id":"(\w+)"', line.rstrip())
    if m:
        tmp_id_to_store = m.group(1)
    n = re.search(r'{ "totalReplyCount": (\d+),', line.rstrip())
    if n:
        fou = n.group(1)
        if int(fou) > 0:
            id_list.append(tmp_id_to_store)
            tmp_id_to_store = ''
print id_list
More check points can be added.

As the error states, your csvFile is a file object, not a dict object, so you can't index into it.
If your file is in CSV format, you can use the csv module to read each line of the CSV into a dict:
import csv

with open(filename) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row['totalReplyCount']
Note the DictReader class from the csv module: it reads each CSV line and parses it into a dict object.

If your input file is JSON, why not just use the json library to parse it and then run a for loop over that data? Then it is just a matter of iterating over the keys and extracting the data.
import json
from pprint import pprint

with open('data.json') as data_file:
    data = json.load(data_file)
pprint(data)
Parsing values from a JSON file using Python?
Look at Justin Peel's answer. It should help.

Parsing values from a JSON file in Python: this link has it all, see Parsing values from a JSON file using Python? on Stack Overflow.

Here is a shell one-liner that should solve your problem, though it's not Python.
grep -Eo '"(totalReplyCount|id)":.*' filename | awk '/totalReplyCount/ {if ($2+0 > 0) {getline; print}}' | cut -d: -f2
output:
"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"

Related

Does json.dump() in Python rewrite or append to a JSON file?

When working with json.dump() I noticed that it appears to be rewriting the entire document. Is this correct, and is there another way to append to the dictionary like .append() does with lists?
When I write the function like this and change the key value (name), it would appear that the item is being appended.
import json

filename = "infohere.json"
name = "Bob"
numbers = 20

# Write to JSON
def writejson(name=name, numbers=numbers):
    with open(filename, "r") as info:
        xdict = json.load(info)
    xdict[name] = numbers
    with open(filename, "w") as info:
        json.dump(xdict, info)
When you write it out like this however, you can see that the code clearly writes over the entire dictionary/json file.
import json

filename = "infohere.json"
dict1 = {"Bob": 23, "Mark": 50}
dict2 = {"Ricky": 40}

# Write to JSON
def writejson2(d):
    with open(filename, "w") as info:
        json.dump(d, info)

writejson2(dict1)
writejson2(dict2)
In the second example only the last data input ever shows up, leading me to believe that this rewrites the entire document. If it does write the whole document on each json.dump, does this cause issues with larger JSON files, and if so, is there another method like .append() for dealing with JSON?
Thanks in advance.
Neither.
json.dump doesn't decide whether to delete prior content when it writes to a file. That decision happens when you run open(filename, "w"); that is what deletes old content.
But: Normal JSON isn't amenable to appends.
A single JSON document is one object. There are variants on the format that allow multiple documents in one file, the most common of which is JSONL (which has one JSON document per line). Unless you're using such a format, trying to append JSON to a non-empty file usually won't result in something that can be successfully parsed.
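If per-record appends are what you need, a minimal JSON Lines sketch (the file name records.jsonl is just an assumption here) could look like this:
import json

def append_record(record, path="records.jsonl"):
    # Each call appends exactly one JSON document on its own line,
    # so earlier lines are never rewritten.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def read_records(path="records.jsonl"):
    # Reading back is just parsing the file line by line.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

append_record({"Bob": 23, "Mark": 50})
append_record({"Ricky": 40})
print(read_records())  # [{'Bob': 23, 'Mark': 50}, {'Ricky': 40}]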

How to convert a series of JSON strings into one json file?

I am using python and json to construct a json file. I have a string, outputString, which consists of multiple lines of dictionaries turned into JSON, in the following format:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
I would like to take this string of JSON objects and write a new JSON file entirely, with each item still on its own line. I have attached the code showing how I got outputString and what I have tried to do. Right now, the code writes the file, but all on one line. I would like the lines to be separated as they are in the string.
for value in outputList:
    newOutputString = json.dumps(value)
    outputString += (newOutputString + "\n")

with open('data.json', 'w') as outfile:
    for item in outputString.splitlines():
        json.dump(item, outfile)
        json.dump("\n", outfile)
PROBLEM: when you json.dump("\n", outfile), it will always be written on the same line, because "\n" is not recognised as a new line once it is JSON-encoded.
SOLUTION: ensure that you write the new line as a plain Python string, not a JSON-encoded string:
with open('data.json', 'a') as outfile:  # append so we can add a new line for each of the JSON strings
    for item in outputString.splitlines():
        json.dump(item, outfile)
        outfile.write("\n")  # write the new line as a plain Python string, no need to encode it with json
See the comments for an explanation.
Please ensure that the file you write to is empty if you want only these JSON objects in it.
Your value rows are not in actual JSON format, because the property names are not enclosed in double quotes.
This would be a proper json data format:
{"size":1, "title":"Hello", "space":0}
Having said that, here is a solution to your question with the type of data you provided.
I am assuming your data comes like this:
outputList = ['{size:1, title:"Hello", space:0}',
              '{size:21, title:"World", space:10}',
              '{size:3, title:"Goodbye", space:20}']
So the only thing you need to do is write each value using the file.write() function.
Python 3.6 and above:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(f"{value}\n")
Python 3.5 and below:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(value + "\n")
The data.json file will look like this:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
Note: as someone already commented, your data.json file will not be a truly JSON-formatted file, but it serves the purpose of your question. Enjoy! :)
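If you do need genuinely valid output, a hedged variation (assuming outputList actually holds Python dicts before they are turned into strings, which the question does not show) is to serialise each item with json.dumps, giving one valid JSON object per line (JSON Lines):
import json

# Assumed stand-in for the real data; the original outputList is only shown as strings.
outputList = [{"size": 1, "title": "Hello", "space": 0},
              {"size": 21, "title": "World", "space": 10},
              {"size": 3, "title": "Goodbye", "space": 20}]

with open('data.jsonl', 'w') as outfile:
    for value in outputList:
        outfile.write(json.dumps(value) + "\n")  # one valid JSON object per line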

Print specific words from .json in Python

I am using Twitter and downloaded sample data from https://stream.twitter.com/1.1/statuses/sample.json
I used pretty printing, but it doesn't print the way I want it to.
I only need the user "name" or "screen_name", "user_mention", and "retweeted". I need this to draw a tree with nodes (names) and edges (retweets or mentions with sentiment value (+/-)).
First: I don't know how to strip everything else from the JSON and print just those three things.
Code:
import pprint

with open(fname) as json_file:
    for line in json_file.readlines():
        type(line)
        f_contents = json_file.read()
        keywords = ["id", "screen_name", "retweeted", "user_mention"]
        keywords = set(keywords)
        print(keywords)
        pprint.pprint(line, indent=4, width=5)
If you want to filter a dictionary by keys, there are several approaches; see the solutions discussed in this thread: https://stackoverflow.com/a/3420156/3921457
One solution in your case could be something like:
import json
import pprint

keywords = {'id', 'screen_name', 'retweeted', 'user_mention'}
with open(fname) as file:
    for raw_line in file.readlines():
        full_line = json.loads(raw_line)
        line = {key: full_line[key] for key in keywords}
        pprint.pprint(line, indent=4, width=5)
Use the json library to actually load the file as structured data. Trying to read the file a line at a time isn't going to work very well, because it ignores how JSON is structured; and the .read() call here ruins the strategy anyway (it reads the entire rest of the file aside from the first line, into f_contents, and then the loop doesn't run again after that).
So:
import json

with open(fname) as json_file:
    data = json.load(json_file)
Use Python operations to pull out the parts of the data you need. What you will have is an ordinary dict or list that contains more dicts or lists, etc., as deeply nested as the JSON is.
Now you can pprint the relevant fragments.
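As an illustration only, the extraction step could look roughly like the sketch below. Field names such as "user", "screen_name", "entities" and "user_mentions" follow the usual tweet payload layout and are assumptions here, and whether data is a single tweet or a list of tweets depends on your file:
import json

def extract_fields(tweet):
    # Keys follow the usual tweet payload layout; adjust if your data differs.
    user = tweet.get("user", {})
    return {
        "screen_name": user.get("screen_name"),
        "name": user.get("name"),
        "retweeted": tweet.get("retweeted"),
        "user_mentions": tweet.get("entities", {}).get("user_mentions", []),
    }

with open(fname) as json_file:
    data = json.load(json_file)

tweets = data if isinstance(data, list) else [data]
for tweet in tweets:
    print(extract_fields(tweet))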

How to load a text file in Python?

I have a text file like this:
[0.52, '1_1man::army'], stack
[0.45, '3_3man::army'], flow
[0.52, '1_1man::army'], testing
[0.52, '2_2man:army'], expert
How can I load the file and print all the values for
'1_1man::army', '3_3man::army', '1_1man::army' and '2_2man:army'
My code:
text = open("text.txt", "r").readlines()
print(text[1])
Then I want to implement the solutions some good people have shared. I can't use their code since the file I have now is different from the one I posted (I wish to try out this new example).
How can I arrange the list according to similar items in a certain location?
If that format is consistent throughout the file, you could simply use split() to extract the values between quotes:
with open("text.txt", "r") as file:
    for line in file:
        print(line.split("'")[1])
line.split("'") slices the string up whenever it sees a '. In your case, every line would be sliced into a list of 3 elements:
['[0.52, ', '1_1man::army', '], stack']
You want the middle one, which has index [1]. So line.split("'")[1] gives you exactly that.
An easier approach to this would be to make a JSON file instead. Python has a good built-in json library. This is what the JSON would look like:
{
    "1_1man::army": "stack",
    "3_3man::army": "flow",
    "1_1man::army": "testing",
    "2_2man::army": "expert"
}
You would enter this and change the file extension from .txt to .json. You can read it like this:
import json

with open("YourText/JsonFileHere.json") as f:
    data = json.load(f)

# Get the 1_1man::army value (note: duplicate keys collapse, so only the last one survives)
data["1_1man::army"]

# Get the 3_3man::army value
data["3_3man::army"]

# Get the 2_2man::army value
data["2_2man::army"]

# In order to add things to the json, do this:
data["What you want the new key to be called"] = "What the value is"
Let me know if this helps!

How to read a JSON file after skipping a few lines in Python?

I have a JSON file whose contents are as follows:
[
    {"time":"56990","device_id":"1","kwh":"279.4"},
    {"time":"60590","device_id":"1","kwh":"289.4"},
    {"time":"64190","device_id":"1","kwh":"299.4"},
    {"time":"67790","device_id":"1","kwh":"319.4"},
]
Now I want to read this file one line at a time using the seek and tell methods in Python. I tried this, but it shows an error saying it is not able to decode. I actually want to read the JSON file every 15 minutes or so, starting from the pointer where it was last read.
This is what I have tried.
import json

last_pointer = 0
with open(FILENAME) as f:
    f.seek(last_pointer)
    raw_data = json.load(f)  # this raw_data should load json starting from the last pointer
    # ..... process something .........
    last_position = f.tell()
If your data is arranged in lines exactly as shown, you can construct an ad-hoc solution by reading lines from the file one by one, trimming the trailing comma, and feeding the result to json.loads. But perhaps the better variant would be to use a streaming parser like ijson.
import json
import time

with open('dat') as f:
    line = f.readline()
    while line:
        try:
            raw_data = json.loads(line.strip().strip(','))
            print(raw_data)
            time.sleep(15 * 60)
        except ValueError:
            pass
        line = f.readline()
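If you go the ijson route mentioned above, a minimal sketch would be the following. It assumes the file holds one top-level JSON array and that the trailing comma shown in the question has been cleaned up, since ijson is a strict parser:
import ijson

with open('dat', 'rb') as f:
    # 'item' selects each element of the top-level array, streamed one record
    # at a time without loading the whole file into memory.
    for record in ijson.items(f, 'item'):
        print(record)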
