In my data frame, one of the columns holds string values that look like arrays. I read them into a list, which looks like this:
S=['[18831]', '[12329]', '[4526, 5101, 11276]', '[14388, 14389]']
I want it to be
S= [18831,12329,[4526, 5101, 11276],[14388, 14389]]
as a 2D array so that I can access these IDs. How can I do this in Python?
Those lists are in JSON format so you could use the built in JSON parser.
import json
stringArray = "[1,2,3]"
integerArray = json.loads(stringArray) # [1,2,3]
Check out https://docs.python.org/2/library/json.html
Try this:
[eval(a)[0] if len(eval(a)) == 1 else eval(a) for a in S]
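The two answers above can be combined without eval. The following is a minimal sketch, assuming every element of S is valid JSON as in the question; parse_ids is a hypothetical helper name, not something from the original posts.

```python
import json

S = ['[18831]', '[12329]', '[4526, 5101, 11276]', '[14388, 14389]']

def parse_ids(items):
    """Parse each JSON string; unwrap single-element lists to a bare value."""
    result = []
    for item in items:
        values = json.loads(item)  # e.g. '[4526, 5101]' -> [4526, 5101]
        result.append(values[0] if len(values) == 1 else values)
    return result

parsed = parse_ids(S)
print(parsed)  # [18831, 12329, [4526, 5101, 11276], [14388, 14389]]
```

Each string is parsed only once here, and json.loads never executes arbitrary code.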
I basically first converted a multidimensional array to a string array in order to set the values as my dictionary key, and now I need to convert the string array back to a regular float array. For example, what I have is:
str_array = ['[0.25 0.2916666666666667]', '[0.5833333333333334 0.2916666666666667]',
'[0.5555555555555555 0.3333333333333332]']
And I literally just need it back as a regular array
array = [[0.25 0.2916666666666667], [0.5833333333333334 0.2916666666666667],
[0.5555555555555555 0.3333333333333332]]
I have tried each of the following, independently:
for i in str_array:
    i.strip("'")
    np.array(i)
    float(i)
Yet none of them work. They either cannot convert str --> float or they still keep the type as a str. Please help.
Use ast.literal_eval to convert str to another data type
import ast
str_array = ['[0.25 0.2916666666666667]', '[0.5833333333333334 0.2916666666666667]',
'[0.5555555555555555 0.3333333333333332]']
result = [ast.literal_eval(i.replace(" ", ",")) for i in str_array]
print(result) # [[0.25, 0.2916666666666667], [0.5833333333333334, 0.2916666666666667], [0.5555555555555555, 0.3333333333333332]]
You can also use the built-in eval function.
[eval(x.replace(" ",",")) for x in str_array]
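If the values might be separated by more than one space, replacing each space with a comma stops working. A rough alternative, assuming each string is a single flat bracketed row of numbers as in the question, is to strip the brackets and split on whitespace; parse_row is a hypothetical helper name.

```python
str_array = ['[0.25 0.2916666666666667]',
             '[0.5833333333333334 0.2916666666666667]',
             '[0.5555555555555555 0.3333333333333332]']

def parse_row(text):
    """Turn '[a b c]' into [a, b, c] as floats; any run of whitespace separates values."""
    return [float(token) for token in text.strip("[]").split()]

result = [parse_row(s) for s in str_array]
print(result[0])  # [0.25, 0.2916666666666667]
```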
I'm using an API to gather some data that comes to me in JSON format. I'm using json.loads to import the data and can successfully write it to a CSV. Unfortunately, the data arrives in a format that I don't want, so I'd like to reformat the JSON list.
I've tried creating a new list and assigning the JSON list to the desired list. I get the following error: TypeError: list indices must be integers or slices, not str
import requests
import json
import csv
response = requests.get(url).text  # json source
data = json.loads(response)
newsdata = (data["response"]["docs"])
# These two lines reformat the date to what I want it to look like
newsdate = [y["pub_date"] for y in newsdata]
newsdate = [y.split('T')[0] for y in newsdate]
newsdata["pub_date"] = newsdate  # This line is what I've tried to replace the json
newssnip = [y["snippet"] for y in newsdata]
newshead = [y["headline"]["main"] for y in newsdata]
for z in newsdata:
    csvwriter.writerow([z["pub_date"],  # This is the JSON data I want to reformat
                        z["headline"]["main"],
                        z["snippet"],
                        z["web_url"]])
I expected the newsdata["pub_date"] to be overwritten when I assigned newsdate to it but I get the following error instead: TypeError: list indices must be integers or slices, not str
Thank you for your help! :)
EDIT:
I've uploaded an example json response here on github called "exmaple.json": https://github.com/theChef613/nytnewsscrapper
That error says that newsdata is a list, so it cannot be indexed with a string such as "pub_date". Post the raw JSON that is returned, or run print(type(newsdata)), to find out what class newsdata is and how to work with it. It's also possible that newsdata is a 2D (or N-d) array where the first element is the key and the second element is the value.
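Assuming newsdata is a list of dicts, which is what the list comprehensions in the question imply, one possible fix is to update the "pub_date" of each dict in place instead of assigning to the list. The sample records below are made up for illustration and only mimic the fields used in the question.

```python
# Hypothetical stand-in for data["response"]["docs"]
newsdata = [
    {"pub_date": "2019-01-05T10:00:00+0000", "snippet": "first",
     "headline": {"main": "A"}, "web_url": "https://example.com/a"},
    {"pub_date": "2019-02-11T23:30:00+0000", "snippet": "second",
     "headline": {"main": "B"}, "web_url": "https://example.com/b"},
]

# Overwrite the date inside each record; the list itself is indexed by position only.
for doc in newsdata:
    doc["pub_date"] = doc["pub_date"].split("T")[0]

print([doc["pub_date"] for doc in newsdata])  # ['2019-01-05', '2019-02-11']
```

After this loop, the existing csvwriter.writerow loop can read z["pub_date"] unchanged.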
It looks like I have a malformed numpy array in Python 3.x; it was saved as a list of lists of strings.
foo = [[7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06]
[1.3130367e-06 2.4584832e-01 2.2375602e-05 7.3299240e-06] [7.2646574e-06 7.1252006e-06 3.0184277e-01 ... 1.0048618e-05 3.1828706e-06 1.0196264e-06]..]
I get the following error trying to read in this data as np.float32 into a numpy array:
np.asarray(foo, dtype=np.float32)
error:
ValueError: could not convert string to float:[[7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06][1.3130367e-06 2.4584832e-01 2.2375602e-05 7.3299240e-06] [7.2646574e-06 7.1252006e-06 3.0184277e-01 ... 1.0048618e-05 3.1828706e-06 1.0196264e-06]..]
I've tried explicitly converting each list element into a float as follows:
try2 = np.asarray(map(np.float32, foo))
but it snags on a bracket:
ValueError: could not convert string to float: [
What is the recommended way to convert a list of lists of strings into a numpy array, type float?
If you replace the spaces with commas, you can use json.loads to read the string as a list, and pass that to np.asarray:
import json
import numpy as np
foo = "[[7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06] \
[1.3130367e-06 2.4584832e-01 2.2375602e-05 7.3299240e-06]]"
a = np.asarray(json.loads(foo.replace(" ", ",")), dtype=np.float32)
print(a)
#array([[7.0352220e-01, 5.3130367e-06, 1.5167372e-05, 1.0797821e-06],
# [1.3130367e-06, 2.4584832e-01, 2.2375602e-05, 7.3299240e-06]])
print(a.dtype)
#float32
This assumes there is exactly 1 space between values. If that is not the case, you can use re.sub to replace multiple spaces with a comma:
import re
a = np.asarray(json.loads(re.sub(r"\s+", ",", foo)), dtype=np.float32)
#array([[7.0352221e-01, 5.3130366e-06, 1.5167372e-05, 1.0797821e-06],
# [1.3130367e-06, 2.4584831e-01, 2.2375601e-05, 7.3299238e-06]],
# dtype=float32)
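The re.sub approach still breaks if whitespace sits directly next to a bracket, as it does when numpy pads its printed output. The sketch below normalizes those cases before calling json.loads. It assumes the string contains only brackets, numbers and whitespace (no "..." elisions); parse_bracketed is a hypothetical helper name. The resulting nested list can then be passed to np.asarray as above.

```python
import json
import re

def parse_bracketed(text):
    """Convert a whitespace-separated bracketed array string into nested lists."""
    text = re.sub(r"\[\s+", "[", text.strip())  # drop whitespace after '['
    text = re.sub(r"\s+\]", "]", text)          # drop whitespace before ']'
    text = re.sub(r"\]\s*\[", "],[", text)      # separate adjacent rows
    text = re.sub(r"\s+", ",", text)            # remaining runs separate values
    return json.loads(text)

rows = parse_bracketed("[[ 7.0e-01  5.0e-06 ]\n [1.0e-06 2.5e-01]]")
print(rows)  # [[0.7, 5e-06], [1e-06, 0.25]]
```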
As far as I have seen, np.asarray() works only if dtype has a different datatype from the initial datatype. Please try and remove that argument and see if it works.
How is your string data shaped? Probably the simplest way is to use split() and iterate over the list. Example (list of lists of strings) that worked for me:
foo = [['7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06'],
['7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06']]
arr = np.array([[value.split() for value in row][0] for row in foo], dtype='<f8')
(Note: the [0] is needed because split itself returns a list. Alternatively, you can use np.reshape.)
EDIT: if it's a string representation (not a list of strings as stated in the OP):
foo = '[[7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06][7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06]]'
arr=np.array([line.split() for line in foo.replace('[','').replace(']]','').split(']')], dtype='<f8')
Given:
foo = [['7.0352220e-01 5.3130367e-06 1.5167372e-05 1.0797821e-06'],
['1.3130367e-06 2.4584832e-01 2.2375602e-05 7.3299240e-06'],
['7.2646574e-06 7.1252006e-06 3.0184277e-01 1.0048618e-05']]
Try this to split each string:
foo = [row[i].split() for row in foo for i in range(len(foo[0]))]
Then convert each value to a float:
foo = [[float(row[i]) for i in range(len(foo[0]))] for row in foo]
print(type(foo[0][1]))
>> float
Then turn it into a numpy array:
foo = np.array(foo)
print(type(foo[0][1]))
>> numpy.float64
I am trying to convert a string to a dictionary with the dict function, like this:
import json
p = "{'id':'12589456'}"
d = dict(p)
print(d['id'])
But I get the following error
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Why does it fail? How can I fix this?
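What you have is a string, but the dict function expects a mapping or an iterable of key-value pairs (such as tuples) to construct a dictionary. Given a string, it iterates over the individual characters, and each character has length 1 rather than the required 2, which is what the error message reports. See the examples given in the dict's documentation.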
In this particular case, you can use ast.literal_eval to convert the string to the corresponding dict object, like this
>>> p = "{'id':'12589456'}"
>>> from ast import literal_eval
>>> d = literal_eval(p)
>>> d['id']
'12589456'
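To illustrate the difference described above, here is a small sketch showing dict succeeding on an iterable of key-value pairs and failing on the string from the question. The variable names are chosen only for this example.

```python
# dict() accepts an iterable of (key, value) pairs
pairs = [("id", "12589456")]
d = dict(pairs)
print(d["id"])  # 12589456

# A string is iterated character by character, so the first element is '{'
try:
    dict("{'id':'12589456'}")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```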
Since p is a string containing JSON (ish), you have to load it first to get back a Python dictionary. Then you can access items within it:
p = '{"id":"12589456"}'
d = json.loads(p)
print(d["id"])
However, note that the value in p is not actually JSON; JSON demands (and the Python json module enforces) that strings are quoted with double-quotes, not single quotes. I've updated it in my example here, but depending on where you got your example from, you might have more to do.
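When it is not known in advance whether the string uses double or single quotes, one option is to try the JSON parser first and fall back to ast.literal_eval. This is only a sketch; parse_dict_string is a hypothetical helper name, and the fallback assumes the string is a plain Python literal.

```python
import ast
import json

def parse_dict_string(text):
    """Parse strict JSON first; fall back to a Python literal (single quotes)."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ast.literal_eval(text)

print(parse_dict_string('{"id":"12589456"}'))  # {'id': '12589456'}
print(parse_dict_string("{'id':'12589456'}"))  # {'id': '12589456'}
```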
import json
array = '{"fruits": ["apple", "banana", "orange"]}'
data = json.loads(array)
That is my JSON array, but I would want to convert all the values in the fruits string to a Python list. What would be the correct way of doing this?
import json
array = '{"fruits": ["apple", "banana", "orange"]}'
data = json.loads(array)
print(data['fruits'])
# Python 3 displays:
# ['apple', 'banana', 'orange']
# (Python 2 displays [u'apple', u'banana', u'orange'])
You had everything you needed. data will be a dict, and data['fruits'] will be a list
Tested on Ideone.
import json
array = '{"fruits": ["apple", "banana", "orange"]}'
data = json.loads(array)
fruits_list = data['fruits']
print(fruits_list)
If the value under "fruits" was itself stored as a JSON-encoded string, data['fruits'] will look like a list but will still be a string; check with type(data['fruits']). Indexing such a string, as in data['fruits'][0], returns "[" because that is its first character. With the JSON shown in the question this is not the case: data['fruits'] is already a list.
In the string case, you can do json.loads(data['fruits']) to convert it to a Python list so that regular list indexing works. There are 2 other ways you can convert it back to a Python list suggested here
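The sketch below contrasts the JSON from the question with a double-encoded variant, which is the situation where a second json.loads call is needed. The double_encoded string is constructed here purely for illustration and does not come from the original posts.

```python
import json

plain = '{"fruits": ["apple", "banana", "orange"]}'
data = json.loads(plain)
first = data["fruits"][0]  # already a list, so this is 'apple'

# Hypothetical double-encoded payload: the list was serialized to a string first
double_encoded = json.dumps({"fruits": json.dumps(["apple", "banana", "orange"])})
nested = json.loads(double_encoded)
kind = type(nested["fruits"])          # str, so [0] would give '['
fruits = json.loads(nested["fruits"])  # second decode yields the list

print(first, kind.__name__, fruits)
```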