Reading a dictionary from within a dictionary - python
I have a json file for tweet data. The data that I want to look at is the text of the tweet. For some reason, some of the tweets are too long to put into the normal text part of the dictionary.
It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.
Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.
Here is the code I have so far:
import json
import pandas as pd
tweets = []
#This reads the json file line by line so that I can work with it. This part works correctly.
with open("filelocation.txt") as source:
    for line in source:
        if line.strip():
            tweets.append(json.loads(line))
print(len(tweets))

df = pd.DataFrame.from_dict(tweets)
df.info()
Looking at the info, you can see that there is a column called extended_tweet that is only populated for one of the two sample tweets. Within this column there seems to be another dictionary, and one of its keys is full_text.
I want to add another column to the dataframe that just has this information, using the normal text column when full_text is null.
My first thought was to try and read that specific column of the dataframe as a dictionary again using:
d = pd.DataFrame.from_dict(tweets['extended_tweet']['full_text'])
But this doesn't work, and I don't really understand why, since that is how I read the data the first time.
My guess is that I can't look up specific keys because I am going back to the list, and it would have to read all or none. The error it gives me is "KeyError: 'full_text'".
I also tried using the recommendation provided by this website. But this gave me a None value no matter what.
Thanks in advance!
I tried to do what @Dan D. suggested; however, this still gave me errors. But it gave me the idea to try this:
tweets[0]['extended_tweet']['full_text']
This works and gives me the value that I am looking for. But I need to run through the whole thing. So I tried this:
df['full'] = [tweets[i]['extended_tweet']['full_text'] for i in range(len(tweets))]
This gives me "KeyError: 'extended_tweet'".
Does it seem like I am on the right track?
I would suggest flattening out the dictionaries like this:
tweet = json.loads(line)
tweet['full_text'] = tweet['extended_tweet']['full_text']
tweets.append(tweet)
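A minimal sketch of that flattening applied to the loading loop from the question, assuming the same filelocation.txt and falling back to the ordinary text key when extended_tweet is absent (that absence is what caused the KeyError above):
import json
import pandas as pd

tweets = []
with open("filelocation.txt") as source:
    for line in source:
        if not line.strip():
            continue
        tweet = json.loads(line)
        # Only truncated tweets carry 'extended_tweet'; fall back to 'text' otherwise.
        extended = tweet.get('extended_tweet') or {}
        tweet['full_text'] = extended.get('full_text', tweet.get('text'))
        tweets.append(tweet)

df = pd.DataFrame.from_dict(tweets)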
I don't know if the answer suggested earlier works; I never got it to work successfully. But I did figure out something else that works well for me.
What I really needed was a way to display the full text of a tweet. I first loaded the tweets from the json with what I posted above. Then I noticed that in the data file there is a field called truncated. If this value is true, the tweet is cut short and the full tweet is placed within
tweets[i]['extended_tweet']['full_text']
In order to access it, I used this:
tweet_list = []
for i in range(len(tweets)):
    if tweets[i]['truncated']:  # json.loads gives a real boolean, not the string 'True'
        tweet_list.append(tweets[i]['extended_tweet']['full_text'])
    else:
        tweet_list.append(tweets[i]['text'])
Then I can work with the data using the whole text from each tweet.
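If you want that full text back in the dataframe built earlier, one way (assuming df and tweet_list line up row for row, which they do here because the loop covers every tweet in order) is:
df['full'] = tweet_list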
Related
transform JSON file to be usable
Long story short, I get the query from the Spotify API, which is JSON that has data about the newest albums. How do I get specific info from that, like, say, every band name or every album title? I've tried a lot of ways to get that info that I found on the internet and nothing seems to work for me, and after a couple of hours I'm kind of frustrated. The JSON data is on jsfiddle. Here is the request:
endpoint = "https://api.spotify.com/v1/browse/new-releases"
lookup_url = f"{endpoint}"
r = requests.get(lookup_url, headers=headers)
print(r.json())
When you make this request, as the comments have mentioned, you get a dictionary whose keys and values you can then access. For example, if you want to get the album_type you could do the following:
print(data["albums"]["items"][0]["album_type"])
Since items contains a list, you need to take the first element [0] and then access album_type. Output: single
Here is a link to the code I used with your json. I suggest you look into how to deal with JSON data in Python; this is a good place to start.
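A small sketch of that pattern applied to the new-releases endpoint from the question; the Authorization header here is a placeholder, since the real token lives in the asker's headers variable:
import requests

endpoint = "https://api.spotify.com/v1/browse/new-releases"
# Placeholder credentials; replace with a real OAuth bearer token.
headers = {"Authorization": "Bearer <your-access-token>"}

r = requests.get(endpoint, headers=headers)
data = r.json()

# data["albums"]["items"] is a list of album objects; pull out the title and artist names.
for album in data["albums"]["items"]:
    title = album["name"]
    artists = ", ".join(artist["name"] for artist in album["artists"])
    print(title, "-", artists)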
I copied the data from the jsfiddle link. Now try the following code:
import ast
pyobj = ast.literal_eval(str_cop_from_src)
Later you can try with keys:
pyobj["albums"]["items"][0]["album_type"]
pyobj will be a Python dictionary with all the data.
How would I be able to remove this part of the variable?
So I am making a code like a guessing game. The data for the guessing game is in a CSV file, so I decided to use pandas. I have tried to use pandas to import my csv file, pick a random row and put the data into variables so I can use it in the rest of the code, but I can't figure out how to format the data in the variable correctly. I've tried to split the string with split() but I am quite lost.
ar = pandas.read_csv('names.csv')
ar.columns = ["Song Name","Artist","Intials"]
randomsong = ar.sample(1)
songartist = randomsong["Artist"]
songname = (randomsong["Song Name"])
songintials = randomsong["Intials"]
print(songname)
My CSV file looks like this:
Song Name,Artist,Intials
Someone you loved,Lewis Capaldi,SYL
Bad Guy,Billie Eilish,BG
Ransom,Lil Tecca,R
Wow,Post Malone, W
I expect the output to be the name of the song from the csv file, for example:
Bad Guy
Instead the output is:
1    Bad Guy
Name: Song Name, dtype: object
If anyone knows the solution please let me know. Thanks
You're getting a series object as output. You can try randomsong["Song Name"].to_string()
Use df['column'].values to get the values of the column. In your case:
songartist = randomsong["Artist"].values[0]
because you want only the first element of the returned array.
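A quick sketch of both suggestions against the sample names.csv from the question (column names exactly as given there):
import pandas as pd

ar = pd.read_csv('names.csv')
ar.columns = ["Song Name", "Artist", "Intials"]

randomsong = ar.sample(1)

# .values gives the underlying array, so [0] is the bare string without index or dtype.
songname = randomsong["Song Name"].values[0]
songartist = randomsong["Artist"].values[0]
print(songname)    # e.g. Bad Guy
print(songartist)  # e.g. Billie Eilish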
Python- Insert new values into 'nested' list?
What I'm trying to do isn't a huge problem in PHP, but I can't find much assistance for Python. In simple terms, I have a list which produces output as follows:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765}...
All I want to do is add an extra field containing the 'runner name' (from the data set below) into each of the 'runners' sub-lists of the initial data set, based on selection_id == selectionId. So initially I iterate through the full dataset, and then create a separate list to get the runner name from the runner id (I should point out that runnerId === selectionId === selection_id, no idea why multiple names are used). This works fine and the code is shown below:
for market_book in market_books:
    market_catalogues = trading.betting.list_market_catalogue(
        market_projection=["RUNNER_DESCRIPTION", "RUNNER_METADATA", "COMPETITION", "EVENT", "EVENT_TYPE", "MARKET_DESCRIPTION", "MARKET_START_TIME"],
        filter=betfairlightweight.filters.market_filter(
            market_ids=[market_book.market_id],
        ),
        max_results=100)

    data = []
    for market_catalogue in market_catalogues:
        for runner in market_catalogue.runners:
            data.append(
                (runner.selection_id, runner.runner_name)
            )
So as you can see I have the data in data[], but what I need to do is add it to the initial data set, based on the selection_id. I'm more comfortable with PHP or JavaScript, so apologies if this seems a bit simplistic, but the code snippets I've found online only seem to assist with very simple Python lists and nothing 'nested' (to me the structure seems similar to a nested array).
As per the request below, here is the full list: {"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":20,"size":3},{"price":19.5,"size":26.36},{"price":19,"size":2}],"availableToLay":[{"price":21,"size":13},{"price":22,"size":2},{"price":23,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832767},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":11,"size":9.75},{"price":10.5,"size":3},{"price":10,"size":28.18}],"availableToLay":[{"price":11.5,"size":12},{"price":13.5,"size":2},{"price":14,"size":7.75}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832766},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":48,"size":2},{"price":46,"size":5},{"price":42,"size":5}],"availableToLay":[{"price":60,"size":7},{"price":70,"size":5},{"price":75,"size":10}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832769},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":18.5,"size":28.94},{"price":18,"size":5},{"price":17.5,"size":3}],"availableToLay":[{"price":21,"size":20},{"price":23,"size":2},{"price":24,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832768},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":4.3,"size":9},{"price":4.2,"size":257.98},{"price":4.1,"size":51.1}],"availableToLay":[{"price":4.4,"size":20.97},{"price":4.5,"size":30},{"price":4.6,"size":16}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832771},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":24,"size":6.75},{"price":23,"size":2},{"price":22,"size":2}],"availableToLay":[{"price":26,"size":2},{"price":27,"size":2},{"price":28,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832770},{"status":"ACTIVE"
,"ex":{"tradedVolume":[],"availableToBack":[{"price":5.7,"size":149.33},{"price":5.5,"size":29.41},{"price":5.4,"size":5}],"availableToLay":[{"price":6,"size":85},{"price":6.6,"size":5},{"price":6.8,"size":5}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":10064909}],"publishTime":1551612312125,"priceLadderDefinition":{"type":"CLASSIC"},"keyLineDescription":null,"marketDefinition":{"bspMarket":false,"turnInPlayEnabled":false,"persistenceEnabled":false,"marketBaseRate":5,"eventId":"28180290","eventTypeId":"2378961","numberOfWinners":1,"bettingType":"ODDS","marketType":"NONSPORT","marketTime":"2019-03-29T00:00:00.000Z","suspendTime":"2019-03-29T00:00:00.000Z","bspReconciled":false,"complete":true,"inPlay":false,"crossMatching":false,"runnersVoidable":false,"numberOfActiveRunners":8,"betDelay":0,"status":"OPEN","runners":[{"status":"ACTIVE","sortPriority":1,"id":10064909},{"status":"ACTIVE","sortPriority":2,"id":12832765},{"status":"ACTIVE","sortPriority":3,"id":12832766},{"status":"ACTIVE","sortPriority":4,"id":12832767},{"status":"ACTIVE","sortPriority":5,"id":12832768},{"status":"ACTIVE","sortPriority":6,"id":12832770},{"status":"ACTIVE","sortPriority":7,"id":12832769},{"status":"ACTIVE","sortPriority":8,"id":12832771},{"status":"LOSER","sortPriority":9,"id":10317013},{"status":"LOSER","sortPriority":10,"id":10317010}],"regulators":["MR_INT"],"countryCode":"GB","discountAllowed":true,"timezone":"Europe\/London","openDate":"2019-03-29T00:00:00.000Z","version":2576584033,"priceLadderDefinition":{"type":"CLASSIC"}}}
I think I understand what you are trying to do now. First, hold your data as a Python object (you gave us a JSON object):
import json

my_data = json.loads(my_json_string)
for item in my_data['runners']:
    item['selectionId'] = [item['selectionId'], my_name_here]
The thing is that my_data['runners'][i]['selectionId'] is a string; unless you want to concat the name and the id together, you should turn it into a list or even a dictionary. Each item is a dictionary, so you can always add new keys to it:
item['new_key'] = my_value
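A minimal sketch of that idea on the structure from the question; my_json_string and the (selection_id, runner_name) pairs are stand-ins here, since the real ones come from the market book and the market catalogue:
import json

# Stand-in data: the real my_json_string is the market book shown above,
# and data is the list of (selection_id, runner_name) tuples already collected.
my_json_string = '{"runners": [{"selectionId": 12832765}, {"selectionId": 12832767}]}'
data = [(12832765, "Runner A"), (12832767, "Runner B")]

my_data = json.loads(my_json_string)
name_by_id = dict(data)

for runner in my_data['runners']:
    # Add the name under a new key rather than overwriting selectionId.
    runner['runnerName'] = name_by_id.get(runner['selectionId'])

print(json.dumps(my_data['runners'], indent=2))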
So, essentially this works, with one exception: I can see from the print(...) inside the loop that the attribute is updated, but I can't seem to see this update outside the loop.
mkt_runners = []
for market_catalogue in market_catalogues:
    for r in market_catalogue.runners:
        mkt_runners.append((r.selection_id, r.runner_name))

for market_book in market_books:
    for runner in market_book.runners:
        for x in mkt_runners:
            if runner.selection_id in x:
                setattr(runner, 'x', x[1])
                print(market_book.market_id, runner.x, runner.selection_id)
    print(market_book.json())
So the print(market_book.market_id, ...) line displays as expected, but when I print the whole list it shows the un-updated version. I can't seem to find an obvious solution, which is odd, as it seems like a really simple thing (I tried messing around with indents, in case that was the problem, but it doesn't seem to be; it's like it isn't refreshing the market_book list after the update of the runners sub-list)!
Parsing multiple occurrences of an item into a dictionary
Attempting to parse several separate image links from JSON data through Python, but having some issues drilling down to the right level, due to what I believe is having a list of strings. For the majority of the items, I've had success with the below example, pulling back everything I need. Outside of this instance, everything is a 1:1 ratio of keys to values, but for this one there are multiple values associated with one key.
resultsdict['item_name'] = item['attribute_key']
I've been adding it all to a resultsdict = {}, but am only able to get to the below sample string when I print.
INPUT:
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
OUTPUT (only relevant section):
'images': [{u'VariationSpecificPictureSet': [{u'PictureURL': [u'http//imagelink1'], u'VariationSpecificValue': u'color1'}, {u'PictureURL': [u'http//imagelink2'], u'VariationSpecificValue': u'color2'}, {u'PictureURL': [u'http//imagelink3'], u'VariationSpecificValue': u'color3'}, {u'PictureURL': [u'http//imagelink4'], u'VariationSpecificValue': u'color4'}]
I feel like I could add ['VariationPictureSet']['PictureURL'] at the end of my initial input, but that throws an error because the indices are strings, not integers. Ideally, I would like to see the output as a simple comma-separated list of just the URLs, as follows:
OUTPUT:
'images': http//imagelink1, http//imagelink2, http//imagelink3, http//imagelink4
An answer to your comment that requires a bit of code. When using
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures']
you get a list with one element, so I recommend using this:
for item in data['Item']:
    resultsdict['images'] = item['Variations']['Pictures'][0]
Now you can use:
for image in resultsdict['images']['VariationSpecificPictureSet']:
    print(image['PictureURL'])
Thanks for the help, @uzzee, it's appreciated. I kept tinkering with it and was able to pull the continuous string of all the image URLs with the following code:
resultsdict['images'] = sum([x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']], [])
Without the sum it looks like this and pulls in the whole list of lists:
resultsdict['images'] = [x['PictureURL'] for x in item['variations']['Pictures'][0]['VariationSpecificPictureSet']]
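For reference, a small sketch that builds the comma-separated string the question asked for, using sample data shaped like the output above (note that PictureURL is itself a list, hence the inner loop):
# Sample data shaped like item['Variations']['Pictures'] from the question.
pictures = [{
    'VariationSpecificPictureSet': [
        {'PictureURL': ['http//imagelink1'], 'VariationSpecificValue': 'color1'},
        {'PictureURL': ['http//imagelink2'], 'VariationSpecificValue': 'color2'},
    ]
}]

urls = [url
        for pic_set in pictures[0]['VariationSpecificPictureSet']
        for url in pic_set['PictureURL']]

resultsdict = {'images': ", ".join(urls)}
print(resultsdict['images'])  # http//imagelink1, http//imagelink2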
Search a single column for a particular value in a CSV file and return an entire row
Issue
The code does not correctly identify the input (item). It simply dumps to my failure message even if such a value exists in the CSV file. Can anyone help me determine what I am doing wrong?
Background
I am working on a small program that asks for user input (function not given here), searches a specific column in a CSV file (Item) and returns the entire row. The CSV data format is shown below. I have shortened the data from the actual amount (49 field names, 18000+ rows).
Code
import csv
from collections import namedtuple
from contextlib import closing

def search():
    item = 1000001
    raw_data = 'active_sanitized.csv'
    failure = 'No matching item could be found with that item code. Please try again.'
    check = False
    with closing(open(raw_data, newline='')) as open_data:
        read_data = csv.DictReader(open_data, delimiter=';')
        item_data = namedtuple('item_data', read_data.fieldnames)
        while check == False:
            for row in map(item_data._make, read_data):
                if row.Item == item:
                    return row
                else:
                    return failure
CSV structure (active_sanitized.csv)
Item;Name;Cost;Qty;Price;Description
1000001;Name here:1;1001;1;11;Item description here:1
1000002;Name here:2;1002;2;22;Item description here:2
1000003;Name here:3;1003;3;33;Item description here:3
1000004;Name here:4;1004;4;44;Item description here:4
1000005;Name here:5;1005;5;55;Item description here:5
1000006;Name here:6;1006;6;66;Item description here:6
1000007;Name here:7;1007;7;77;Item description here:7
1000008;Name here:8;1008;8;88;Item description here:8
1000009;Name here:9;1009;9;99;Item description here:9
Notes
My experience with Python is relatively little, but I thought this would be a good problem to start with in order to learn more. I determined the methods to open (and wrap in a close function) the CSV file, read the data via DictReader (to get the field names), and then create a named tuple to be able to quickly select the desired columns for the output (Item, Cost, Price, Name). Column order is important, hence the use of DictReader and namedtuple. While there is the possibility of hard-coding each of the field names, I felt that if the program can read them on file open, it would be much more helpful when working on similar files that have the same column names but different column organization.
Research
CSV header and named tuple: What is the pythonic way to read CSV file data as rows of namedtuples?
Converting CSV data to tuple: How to split a CSV row so row[0] is the name and any remaining items are a tuple?
There were additional links of research, but I cannot post more than two.
You have three problems with this:
1. You return on the first failure, so it will never get past the first line.
2. You are reading strings from the file, and comparing to an int.
3. _make iterates over the dictionary keys, not the values, producing the wrong result (item_data(Item='Name', Name='Price', Cost='Qty', Qty='Item', Price='Cost', Description='Description')).
for row in (item_data(**data) for data in read_data):
    if row.Item == str(item):
        return row
return failure
This fixes the issues at hand: we check against a string, and we only return if none of the items matched (although you might want to begin converting the strings to ints in the data rather than this hackish fix for the string/int issue). I have also changed the way you are looping: using a generator expression makes for a more natural syntax, using the normal construction syntax for named attributes from a dict. This is cleaner and more readable than using _make and map(). It also fixes problem 3.
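Putting those fixes together, a self-contained sketch of the corrected search (file name and delimiter as in the question; this version converts the CSV string to int instead of comparing strings):
import csv
from collections import namedtuple

def search(item, raw_data='active_sanitized.csv'):
    failure = 'No matching item could be found with that item code. Please try again.'
    with open(raw_data, newline='') as open_data:
        read_data = csv.DictReader(open_data, delimiter=';')
        item_data = namedtuple('item_data', read_data.fieldnames)
        for data in read_data:
            row = item_data(**data)
            # CSV values are strings, so convert before comparing to the int item code.
            if int(row.Item) == item:
                return row
    # Only reached if no row matched.
    return failure

print(search(1000001))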