Concatenate json array; issue when adding a unique key - python

I am scraping a website full of JSON arrays. I'm trying to concatenate them and add a unique key to every array (see the code below).
On each iteration, the bot calls the function test, and if I print get_text, I get this:
{"status":{"code":0,"message":"Ok","user":{"isBanned":false,"isNotConfirmed":false}},"payload":{"vat":0,"price":{"201":{"100":{"batchsize_id":60916,"quantity":100,"product_price":14.12,"shipment_price":8.98,"country_id":"DE"},"200":{"batchsize_id":60922,"quantity":200,"product_price":19.13,"shipment_price":9.06,"country_id":"DE"},"300":{"batchsize_id":60928,"quantity":300,"product_price":23.64,"shipment_price":9.14,"country_id":"DE"},"400":{"batchsize_id":60934,"quantity":400,"product_price":28.4,"shipment_price":9.23,"country_id":"DE"},"500":{"batchsize_id":60940,"quantity":500,"product_price":32.93,"shipment_price":9.32,"country_id":"DE"},"600":{"batchsize_id":60946,"quantity":600,"product_price":37.08,"shipment_price":9.4,"country_id":"DE"},"700":{"batchsize_id":60952,"quantity":700,"product_price":41,"shipment_price":9.48,"country_id":"DE"},"800":{"batchsize_id":60958,"quantity":800,"product_price":44.72,"shipment_price":9.63,"country_id":"DE"},"900":{"batchsize_id":60964,"quantity":900,"product_price":48.24,"shipment_price":9.71,"country_id":"DE"},"1000":{"batchsize_id":60970,"quantity":1000,"product_price":51.59,"shipment_price":9.79,"country_id":"DE"},"minDeliveryDays":5,"maxDeliveryDays":8}},"productionCountry":["DE"],"minDeliveryDays":5,"maxDeliveryDays":8},"pager":{"total":null,"count":null,"current":1}}
My code below tries to concatenate the result of every iteration and add a unique key for each array.
def test(url_final):
    get_url = requests.get(url_final)
    #get_dict = json.loads(get_url.text)
    get_text = get_url.text
    print(get_text)
    dictionary['a' + str(uuid.uuid4())[:8]] = get_text
    print(dictionary)
printit()
My only issue is that print(dictionary) gives me this:
{'a6a465b1a': '{"status":{"code":0,"message":"Ok","user":{"isBanned":false,"isNotConfirmed":false}},"payload":{"vat":0,"price":{"201":{"100":{"batchsize_id":60916,"quantity":100,"product_price":14.12,"shipment_price":8.98,"country_id":"DE"},"200":{"batchsize_id":60922,"quantity":200,"product_price":19.13,"shipment_price":9.06,"country_id":"DE"},"300":{"batchsize_id":60928,"quantity":300,"product_price":23.64,"shipment_price":9.14,"country_id":"DE"},"400":{"batchsize_id":60934,"quantity":400,"product_price":28.4,"shipment_price":9.23,"country_id":"DE"},"500":{"batchsize_id":60940,"quantity":500,"product_price":32.93,"shipment_price":9.32,"country_id":"DE"},"600":{"batchsize_id":60946,"quantity":600,"product_price":37.08,"shipment_price":9.4,"country_id":"DE"},"700":{"batchsize_id":60952,"quantity":700,"product_price":41,"shipment_price":9.48,"country_id":"DE"},"800":{"batchsize_id":60958,"quantity":800,"product_price":44.72,"shipment_price":9.63,"country_id":"DE"},"900":{"batchsize_id":60964,"quantity":900,"product_price":48.24,"shipment_price":9.71,"country_id":"DE"},"1000":{"batchsize_id":60970,"quantity":1000,"product_price":51.59,"shipment_price":9.79,"country_id":"DE"},"minDeliveryDays":5,"maxDeliveryDays":8}},"productionCountry":["DE"],"minDeliveryDays":5,"maxDeliveryDays":8},"pager":{"total":null,"count":null,"current":1}}'}
How do I remove the string quotes around each array, so that it is stored as data rather than as text?

You are getting the JSON back as a string rather than as a dictionary like you expect. In order to convert from a string into some values, it has to be parsed. You have the right call there, just in the wrong place. Once you have get_text populated (you might want to consider better names), you need to run json_values = json.loads(get_text). Now json_values will contain the dictionary you expect, and you can assign it instead of get_text.
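A minimal sketch of that change, assuming the response body is valid JSON. The helper name store_response and the shortened sample string are made up for illustration; the key scheme is the one from the question.

```python
import json
import uuid


def store_response(raw_text, store):
    """Parse a JSON string and file the result under a short random key."""
    parsed = json.loads(raw_text)        # str -> dict
    key = 'a' + str(uuid.uuid4())[:8]    # same key scheme as in the question
    store[key] = parsed                  # store the dict, not the raw text
    return key


# Shortened stand-in for the real response body.
sample = '{"status": {"code": 0, "message": "Ok"}, "payload": {"vat": 0}}'
dictionary = {}
key = store_response(sample, dictionary)
print(dictionary[key]["status"]["message"])
```

With requests, calling get_url.json() performs the same parsing in one step, so json.loads(get_url.text) and get_url.json() should be interchangeable here.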

Related

Python Search Dictionary by Key to return Value

I have a fairly complex dictionary with lists that I'm trying to use. I'm attempting to search the dictionary for the key "rows", which has a list.
With my current sample of data, I can easily pull it like so with index operator:
my_rows = my_dict['inputContent']['document']['fields'][2]['value']['rows']
'rows' is a list and those are the values I am trying to pull. However, my data from 'rows' won't always be in the exact same location but it will always be in my_dict. It could be in a different index like so:
my_dict['inputContent']['document']['fields'][4]['value']['rows']
or
my_dict['inputContent']['document']['fields'][7]['value']['rows']
Really any number.
I've tried using just the basic:
my_rows = my_dict.get("rows")
But this returns None.
I find lots of articles on how to search for values and return the key, but I know the key and it will always be the same, while my values in 'rows' will always be different.
I'm new to Python and to dictionaries in general, and I'm really struggling to drill down into this dictionary to pull this list.
my_dict['inputContent']['document']['fields'][2]['value']['rows']
my_dict['inputContent']['document']['fields'][4]['value']['rows']
my_dict['inputContent']['document']['fields'][7]['value']['rows']
It looks like the overall structure is the same; the only thing that varies is the numeric index into the 'fields' list. So we need a loop that iterates over each element of that list:
for element in my_dict['inputContent']['document']['fields']:
    if 'rows' in element['value']:
        # found it!
        print(element['value']['rows'])
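The same loop can be wrapped in a function that returns the list instead of printing it. This is only a sketch: the function name find_rows and the sample data are hypothetical, and it assumes that some fields may have a 'value' that is missing or is not a dictionary, which the plain loop would not tolerate.

```python
def find_rows(document):
    """Return the first 'rows' list found under fields[*]['value'], else None."""
    for element in document['inputContent']['document']['fields']:
        value = element.get('value')
        # Skip fields whose 'value' is missing or is not a dict.
        if isinstance(value, dict) and 'rows' in value:
            return value['rows']
    return None


# Hypothetical sample data shaped like the paths in the question.
my_dict = {
    'inputContent': {
        'document': {
            'fields': [
                {'name': 'title', 'value': 'Invoice'},
                {'name': 'total', 'value': {'amount': 10}},
                {'name': 'table', 'value': {'rows': [[1, 2], [3, 4]]}},
            ]
        }
    }
}

my_rows = find_rows(my_dict)
print(my_rows)
```

Returning None when nothing matches mirrors what my_dict.get("rows") does, so the caller can test the result the same way.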

How to make a list of strings into a regular list?

In a for loop, I make several API requests and get links back from the JSON output (their number varies). I then convert them to strings, because there are some None values and I need to write them to Google Sheets.
As a result, print(value) outputs:
['"https://example"']
['"https://example"']
['null']
Each of these has string type.
I need to get a single list: value = [link1, link2, none]
cell_values = value
for i, val in enumerate(cell_values):
    cell_list[i].value = val
worksheet.update_cells(cell_list)
I think I've tried everything I can.
If I do not convert this to a list, then either each value is written to one cell and overwritten, or each letter is written separately to the range.
I found a bug in my code and now everything works.
I had created the list variable inside the loop, but it has to be created before the loop, with the strings then written to it inside the loop.
A simple mistake.
The question is closed.
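The fix described above might look like the following sketch. The responses list is a stand-in for the per-request output shown in the question, and decoding each string with json.loads is an assumption of this example, not something the poster stated.

```python
import json

# Stand-ins for the per-request output shown in the question.
responses = [['"https://example"'], ['"https://example"'], ['null']]

cell_values = []             # created once, before the loop
for value in responses:      # one iteration per API request
    # json.loads turns '"https://example"' into a plain str and 'null' into None.
    cell_values.append(json.loads(value[0]))

print(cell_values)
```

Because cell_values lives outside the loop, each iteration adds to it instead of replacing it, which is the bug the poster found.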

How to parse a string for array access

I'm trying to parse user input that specifies a value to look up in a dictionary.
For example, this user input may be the string fieldA[5]. I need to then look up the key fieldA in my dictionary (which would be a list) and then grab the element at position 5. This should be generalizable to any depth of lists, e.g. fieldA[5][2][1][8], and properly fail for invalid inputs fieldA[5[ or fieldA[5][7.
I have investigated doing:
import re
key = "fieldA[5]"
re.split(r'\[|\]', key)
which results in:
['fieldA', '5', '']
This works as an output (I can lookup in the dictionary with it), but does not actually enforce or check pairing of brackets, and is unintuitive in how it reads and how this output would be used to look up in the dictionary.
Is there a parser library, or alternative approach available?
If you don't expect it to get much more complicated than what you described, you might be able to get away with some regex. I usually wouldn't recommend it for this purpose, but I want you to be able to start somewhere...
import re
key = input('Enter field and indices (e.g. "fieldA[5][3]"): ')
field = re.match(r'[^[]*', key).group(0)
indices = [int(num) for num in re.findall(r'\[(\d+)]', key)]
Note that this does not detect a missing field or incorrect bracketing: it silently returns an empty string or an empty (or partial) list, respectively.
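If malformed input should be rejected rather than silently accepted, one option is to validate the whole string with re.fullmatch before extracting the indices. This is a sketch under that assumption; parse_key and lookup are hypothetical names, and the pattern only allows non-negative integer indices.

```python
import re

# A field name, then zero or more complete "[digits]" groups, and nothing else.
_KEY_RE = re.compile(r'([^\[\]]+)((?:\[\d+\])*)')


def parse_key(key):
    """Split 'fieldA[5][2]' into ('fieldA', [5, 2]); raise ValueError if malformed."""
    match = _KEY_RE.fullmatch(key)
    if match is None:
        raise ValueError(f'malformed key: {key!r}')
    field, index_part = match.groups()
    indices = [int(num) for num in re.findall(r'\[(\d+)\]', index_part)]
    return field, indices


def lookup(data, key):
    """Follow the parsed field and indices into a dictionary of nested lists."""
    field, indices = parse_key(key)
    value = data[field]
    for index in indices:
        value = value[index]
    return value


data = {'fieldA': [0, 1, 2, 3, 4, [10, 20]]}
print(parse_key('fieldA[5][2][1][8]'))
print(lookup(data, 'fieldA[5][1]'))
```

Since fullmatch must consume the entire input, unpaired brackets such as fieldA[5[ or fieldA[5][7 fail the match and raise ValueError instead of yielding a partial result.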

Accessing Data from Within a List or Tuple and Cleaning it

Here is a conceptual problem that I have been having regarding the cleaning of data and how to interact with lists and tuples that I'm not sure completely how to explain, but if I can get a fix for it, I can conceptually be better at using python.
Here it is: (using python 3 and sqlite3)
I have an SQLite database with a date column that holds text in the format MM-DD-YY 24:00. When viewed in DB Browser, the text looks fine. However, when I use fetchall() in Python, the dates print in the format 'MM-DD-YY\xa0'. I want to clean the \xa0 out of the data, so I tried some code that combines what I think I should do with another post I read on here. This is the code:
print(dates)
# output (I just typed this in here to show you guys the output):
# [('MM-DD-YY\xa0',), ('MM-DD-YY\xa0',), etc.]
dates_clean = []
for i in dates:
    clean = str(i).replace(u'\xa0', u' ')
    dates_clean.append(clean)
Now when I print dates_clean I get:
["('MM-DD-YY\xa0',)", "('MM-DD-YY\xa0',)"etc]
So now, as you can see, when I tried to clean it, it did what I wanted it to do, but the tuple that originally contained the value has become part of the text itself and is contained inside another tuple. Therefore, when I write this list back into SQLite using an UPDATE statement, all of the date values are contained inside a tuple.
It frustrates me because I have been facing issues like this for a while, where I want to edit something inside a list or a tuple and have the new value just replace the old one, instead of keeping all of the characters that say it is a tuple and turning them into text. Sorry if that is confusing; like I said, it's hard for me to explain. I always end up making my data dirtier when trying to clean it up.
Any insights into how to efficiently clean data inside lists and tuples would be greatly appreciated. I think I am confused about the difference between accessing the tuple and accessing what is inside the tuple. It would also be helpful if you could suggest the name of the conceptual problem I'm dealing with so I can do more research on my own.
Thanks!
You are garbling the output by calling str() on the tuple, either implicitly when printing the whole list at once, or explicitly when trying to “clean” it.
See (python3):
>>> print("MM-DD-YY\xa024:00")
MM-DD-YY 24:00
but:
>>> print(("MM-DD-YY\xa024:00",))
('MM-DD-YY\xa024:00',)
This is because converting a tuple to a string calls repr on each element, and repr escapes non-printable characters such as the non-breaking space.
However if you print the tuple elements as separate arguments, the result will be correct. So you want to replace the printing with something like:
for row in dates:
    print(*row)
The * expands the tuple to separate parameters. Since these are strings, they will be printed as is:
>>> row = ("MM-DD-YY\xa023:00", "MM-DD-YY\xa024:00")
>>> print(*row)
MM-DD-YY 23:00 MM-DD-YY 24:00
You can add a separator if you wish:
>>> print(*row, sep=', ')
MM-DD-YY 23:00, MM-DD-YY 24:00
... or you can format it:
>>> print('from {0} to {1}'.format(*row))
from MM-DD-YY 23:00 to MM-DD-YY 24:00
Here I am using the * again to expand the tuple to separate arguments and then simply {0} for zeroth member, {1} for first, {2} for second etc. (you can also use {} for next if you don't need to change the order, but giving the indices is clearer).
Ok, so now if you actually need to get rid of the non-breaking space anyway, replace is the right tool. You just need to apply it to each element of the tuple. There are two ways:
Explicit destructuring; applicable when the number of elements is fixed (should be; it is a row of known query):
Given:
>>> row = ('foo', 2, 5.5)
you can destructure it and construct a new tuple:
>>> (a, b, c) = row
>>> (a.replace('o', '0'), b + 1, c * 2)
('f00', 3, 11.0)
This lets you apply a different transformation to each column.
Mapping; applicable when you want to do the same transformation on all elements:
Given:
>>> row = ('foo', 'boo', 'zoo')
you just wrap a generator expression in a tuple constructor:
>>> tuple(x.replace('o', '0') for x in row)
('f00', 'b00', 'z00')
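Applied to the whole fetchall() result, the mapping approach could look like this sketch. The sample dates list is a stand-in for the real query result, and replacing the non-breaking space with a regular space is an assumption taken from the question's own attempt.

```python
# Stand-in for the result of cursor.fetchall(): a list of one-element tuples.
dates = [('01-02-20\xa0',), ('03-04-20\xa0',)]

# Rebuild each row as a new tuple with the non-breaking space replaced.
dates_clean = [tuple(value.replace('\xa0', ' ') for value in row) for row in dates]

# Or, since every row has exactly one column, unpack it to get plain strings.
dates_flat = [date.replace('\xa0', ' ') for (date,) in dates]

print(dates_flat)
```

The first form keeps the row shape, which suits parameters passed back to the database; the second gives plain strings when the tuple wrapper is no longer needed.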
On a side note, SQLite has some date and time functions, and they expect the timestamps to be in strict ISO 8601 format, i.e. %Y-%m-%dT%H:%M:%S (optionally with %z at the end; using strftime format; in TR#35 format it is yyyy-MM-dd'T'HH:mm:ss(xx)).
In your case, dates is actually a list of tuples, with each tuple containing one string element. The trailing comma after the date string is how you can recognize a single-element tuple.
The for loop you have needs to work on each element within the tuples, instead of the tuples themselves. Something along the lines of:
for i in dates:
    date = i[0]  # the string inside the one-element tuple
    clean = date.replace('\xa0', '')
    dates_clean.append(clean)  # append the cleaned value, not the original
I am not sure this is the best solution to your actual problem of manipulating data in the db, but it should answer your question.
EDIT: Also, refer to Jan's reply about unicode strings and Python 2 vs Python 3.
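For the database side, a round trip might look like the sketch below. It uses an in-memory database, and the table name events and column name date are made up for illustration; the real schema is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (date TEXT)')  # hypothetical table and column
conn.executemany('INSERT INTO events (date) VALUES (?)',
                 [('01-02-20\xa023:00',), ('03-04-20\xa024:00',)])

# Fetch rowid with each value so every cleaned string goes back to its own row.
rows = conn.execute('SELECT rowid, date FROM events').fetchall()
updates = [(date.replace('\xa0', ' '), rowid) for rowid, date in rows]
conn.executemany('UPDATE events SET date = ? WHERE rowid = ?', updates)
conn.commit()

cleaned = [date for (date,) in conn.execute('SELECT date FROM events ORDER BY rowid')]
print(cleaned)
```

Each UPDATE parameter is a plain string taken out of its tuple, so no tuple syntax ends up in the stored text.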

Django query returning too many values

I'm trying to get a list of the names in a table using Django. The field I'm searching for is "name", and I print out my response, which gives the following:
[u"name1", u"name2"]
However, when I send that to a web page and read it in JavaScript, I see that the length is 16, though console.log shows the same result as the Python print statements. When I try to iterate over the list that prints as above, I get the integers 0-15. The loop I am using is:
for (var name in names)
Why is the string representation of this list so much different than the actual representation, and how do I get a representation that matches the print representation if I can't iterate over it or anything?
This is because names is actually a string within your JavaScript. You need to pass back a JSON list or convert the stringified JSON into objects. This second part can be done with JSON.parse(). Unfortunately, your question doesn't show how you're returning the data or how you're handling the data within JavaScript, so I can't help you any further than this for now.
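On the Python side, the difference between a stringified list and real JSON can be seen in this small sketch. The names list is a stand-in for the query result, since the question does not show how the data is produced.

```python
import json

names = ['name1', 'name2']   # stand-in for the queried names

as_text = str(names)         # Python list syntax, which is not valid JSON
as_json = json.dumps(names)  # a JSON array that JSON.parse() can read

print(as_text)
print(as_json)
```

Sending as_json and calling JSON.parse() on it in the browser should give a real array of two names. In a Django view, JsonResponse(names, safe=False) is another way to return a list as JSON.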
