I have a .csv file with a column that contains a list of values structured as follows:
What I want to do is create a nested list with all the values in the following format, so I can iterate over them in another method:
[[1009, 1310], [9420, 9699], [11590, 12009], [12290, 12499], [14460, 14809]]
I tried to read it by simply converting the cell to a list:
df = pd.read_csv('example.csv', usecols=['anomaly_sequences'])
a = df.iloc[0]['anomaly_sequences']
print(a[0])
But the output I got is: [
If I check its type with print(a.dtype) I get:
AttributeError: 'str' object has no attribute 'dtype'
How can I read it directly as a list instead of a string?
You can use literal_eval from the standard library's ast module to take a string and safely evaluate it as a Python literal.
import ast
df['anomaly_sequences'] = df['anomaly_sequences'].apply(ast.literal_eval)
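Putting it together with the read from the question (a minimal sketch, assuming the file and column names above), you end up with real nested lists:

import ast
import pandas as pd

# Sketch: read the column as strings, then parse each cell into a list of lists.
df = pd.read_csv('example.csv', usecols=['anomaly_sequences'])
df['anomaly_sequences'] = df['anomaly_sequences'].apply(ast.literal_eval)

a = df.iloc[0]['anomaly_sequences']
print(a[0])     # [1009, 1310] instead of '['
print(type(a))  # <class 'list'>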
I have a dataframe as below,
df = pd.DataFrame({'URL_domains':[['wa.me','t.co','goo.gl','fb.com'],['tinyurl.com','bit.ly'],['test.in']]})
Here the column URL_domains holds observations that are each a list of domains.
I would like to get the length of each observation's URL domain list with:
df['len_of_url_list'] = df['URL_domains'].map(len)
and the output is the number of domains per row: 4, 2, 1.
That is fine, and there are no issues with the case above.
In my case, however, these list observations are stored as strings, so when I execute the same code on the string-typed URL domain column, the output is the character length of each string rather than the number of domains.
How can I convert the data type from string to list here in pandas?
Use ast.literal_eval, because eval is bad practice:
import ast
df['len_of_url_list'] = df['URL_domains'].map(ast.literal_eval)
df["URL_domains"] = df["URL_domains"].apply(eval)
I have a column additional_data whose values look like {'duration': 0, 'is_incoming': False}.
I want to fetch 0 and False out of this. How do I split it using Python (pandas)?
I tried - data["additional_data"] = data["additional_data"].apply(lambda x :",".join(x.split(":")[:-1]))
I want two columns, Duration and Incoming_Time.
How do I do this?
You can try converting those strings to actual dicts:
from ast import literal_eval
Finally:
out=pd.DataFrame(df['additional_data'].astype(str).map(literal_eval).tolist())
Now if you print out, you will get the expected output.
If needed, use the join() method:
df = df.join(out)
Now if you print df, you will get the expected result.
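A self-contained sketch of that approach (the single-row sample data is assumed for illustration, and the renaming follows the column names asked for in the question):

import pandas as pd
from ast import literal_eval

# Assumed sample: the dicts are stored as strings in the column.
df = pd.DataFrame({'additional_data': ["{'duration': 0, 'is_incoming': False}"]})

# Parse each string into a dict, then expand the dicts into columns.
out = pd.DataFrame(df['additional_data'].astype(str).map(literal_eval).tolist())

# Attach the new columns and rename them as requested.
df = df.join(out).rename(columns={'duration': 'Duration', 'is_incoming': 'Incoming_Time'})
print(df[['Duration', 'Incoming_Time']])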
If your column additional_data contains real dicts / JSON objects, you can directly use the string accessor .str[] to get the dict values by key, as follows:
data['Duration'] = data['additional_data'].str['duration']
data['Incoming_Time'] = data['additional_data'].str['is_incoming']
If your column additional_data contains strings of dicts (i.e. each dict is enclosed in a pair of single or double quotes), you need to convert the strings to dicts first:
from ast import literal_eval
data['Duration'] = data['additional_data'].map(literal_eval).str['duration']
data['Incoming_Time'] = data['additional_data'].map(literal_eval).str['is_incoming']
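A quick sketch of both cases, using assumed one-row sample data:

import pandas as pd
from ast import literal_eval

# Case 1: the column holds real dicts.
data = pd.DataFrame({'additional_data': [{'duration': 0, 'is_incoming': False}]})
data['Duration'] = data['additional_data'].str['duration']
data['Incoming_Time'] = data['additional_data'].str['is_incoming']
print(data[['Duration', 'Incoming_Time']])

# Case 2: the column holds strings of dicts -- parse first, then index the same way.
data2 = pd.DataFrame({'additional_data': ["{'duration': 0, 'is_incoming': False}"]})
data2['Duration'] = data2['additional_data'].map(literal_eval).str['duration']
data2['Incoming_Time'] = data2['additional_data'].map(literal_eval).str['is_incoming']
print(data2[['Duration', 'Incoming_Time']])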
Please don't flag my question right away; I searched several other questions that didn't solve my problem, like this one.
I'm trying to generate a python set of strings from a csv file. The printed pandas dataframe of the loaded csv file has the following structure:
     0
0   me
1  yes
2   it
For a project I need this to be formatted to look like this:
STOPWORDS = {'me', 'yes', 'it'}
I tried to do this with the following code:
import pandas as pd
df_stopwords = pd.read_csv("C:/Users/Jakob/stopwords.csv", encoding = 'iso8859-15', header=-1)
STOPWORDS = {}
for index, row in df_stopwords.iterrows():
    STOPWORDS.update(str(row))
print(STOPWORDS)
However, I get this error:
dictionary update sequence element #0 has length 1; 2 is required
When I use STOPWORDS.add(str(row)) instead, I get this error:
'dict' object has no attribute 'add'
Thank you all in advance!
You can directly create a set from the values in the dataframe with:
set(df.values.ravel())
{'me', 'yes', 'it'}
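A full sketch starting from the CSV (path and encoding taken from the question; header=None is assumed, since the file has no header row):

import pandas as pd

# Read the single-column stopword file; header=None means the file has no header row.
df_stopwords = pd.read_csv("C:/Users/Jakob/stopwords.csv",
                           encoding='iso8859-15', header=None)

STOPWORDS = set(df_stopwords.values.ravel())
print(STOPWORDS)  # e.g. {'me', 'yes', 'it'}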
A dictionary is a mapping of keys to values, like an object in many other languages. Since you need a set, define it as a set from the start instead of converting it later.
import pandas as pd

# header=None: the file has no header row (recent pandas versions reject header=-1)
df_stopwords = pd.read_csv("C:/Users/Jakob/stopwords.csv", encoding='iso8859-15', header=None)

STOPWORDS = set()
for index, row in df_stopwords.iterrows():
    STOPWORDS.add(row.iloc[0])  # add the cell value, not the string repr of the whole row
print(STOPWORDS)
It looks like you need to convert the values in your column to a list and then use the list as your stop words.
stopwords = df_stopwords['0'].tolist()
--> ['me', 'yes', 'it']
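If the set from the question is needed, the list can simply be wrapped; a sketch with assumed stand-in data, using positional column access since the unnamed column's label can differ between pandas versions:

import pandas as pd

# Assumed sample data standing in for the loaded CSV.
df_stopwords = pd.DataFrame(['me', 'yes', 'it'])

stopwords = df_stopwords.iloc[:, 0].tolist()  # ['me', 'yes', 'it']
STOPWORDS = set(stopwords)                    # {'me', 'yes', 'it'}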
As mentioned in the accepted answer here, you might want to use itertuples(), since it is faster.
STOPWORDS = set()
for index, row in df_stopwords.itertuples():
    STOPWORDS.add(row)
print(STOPWORDS)
I am trying to convert the data (from integer to string) in a list generated using pandas.
I got the data from a csv file.
Here is my code covering the pandas part (excluding the part that shows how the object 'InFile' (the csv file) is generated).
import pandas as pd
....
with open(InFile) as fp:
    # 'it' is assumed to be itertools, imported in the elided part of the script
    skip = next(it.ifilter(
        lambda x: x[1].startswith('ID'),
        enumerate(fp)
    ))[0]

dg = pd.read_csv(InFile, usecols=['ID'], skiprows=skip)
dgl = dg['ID'].values.tolist()
Currently, the output is a list (example below).
[111111, 2222, 3333333, 444444]
I am trying to match this data against another list (which is populated as String or Varchar, the data type in MySQL), but somehow I cannot come up with any match. My previous post -> How to find match from two Lists (from MySQL and csv)
So, I am guessing that the data type from the List generated by Pandas is an Integer.
So, how do I convert the data type from Integer to String?
On which line should I add something like str(10), for example?
You can use pd.Series.astype:
dgl = dg['ID'].astype(str).values.tolist()
print(dgl)
Output:
['111111', '2222', '3333333', '444444']
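To answer the "which line" part: only the line that builds the list changes. A sketch with an assumed stand-in for the dataframe read in the question:

import pandas as pd

# Assumed stand-in for the dataframe loaded from the csv file.
dg = pd.DataFrame({'ID': [111111, 2222, 3333333, 444444]})

# Convert the IDs to strings at the point the list is built.
dgl = dg['ID'].astype(str).values.tolist()
print(dgl)  # ['111111', '2222', '3333333', '444444']

# Equivalent alternative if the integer list already exists:
dgl = [str(x) for x in dg['ID'].tolist()]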
I have some data stored in a pandas DataFrame, and I would like to query my MongoDB with a list constructed from a single Series in the DataFrame. When I convert the Series with .tolist() or the list() function I apparently get a list, but when I pass this list to a PyMongo query I get the error:
bson.errors.InvalidDocument: Cannot encode object: <the first value of the list>
Here is an example that reproduces the error:
So first creating a Mongo database:
from pymongo import MongoClient
import pandas as pd
db = MongoClient().test
db.collection.insert_many([{'key_x':1},{'key_x':2},{'key_x':3}])
Then I query the database for the documents where key_x is in [1,3]:
x_list = [1,3]
for doc in db.collection.find({'key_x':{'$in': x_list}}):
    print doc
As expected there is no error, and the two entries {'key_x': 1} and {'key_x': 3} are printed to the console.
Now I first try transforming the list into a pandas Series and converting it back to a list.
ser = pd.Series([1,3])
x_list = ser.tolist()
print type(x_list) #Checking to see if it is indeed a list
> <type 'list'>
for doc in db.collection.find({'key_x':{'$in': x_list}}):
    print doc
Then this error message is printed:
>bson.errors.InvalidDocument: Cannot encode object: 1
Thank you very much for any input.
The different behavior arises from the different types of the lists' elements:
# first example
type(x_list[0])
int
# second example
type(x_list[0])
numpy.int64
One way to fix it is to use ser.values.tolist() instead of ser.tolist(). Apparently, pandas tolist() behaves differently than numpy's.
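A quick check of the two conversions (sketch; the numpy.int64 result from .tolist() reflects the pandas version used in the question):

import pandas as pd

ser = pd.Series([1, 3])

print(type(ser.tolist()[0]))         # numpy.int64 here (newer pandas returns plain int)
print(type(ser.values.tolist()[0]))  # int -- ndarray.tolist() gives native Python scalars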
The problem is that pandas returns a list of np.int64 objects, not built-in int objects.
In [50]: ser = pd.Series([1,3])
In [51]: type(ser.tolist()[0])
Out[51]: numpy.int64
The following
ser = pd.Series([1,3])
x_list = [int(i) for i in ser.tolist()]
for doc in db.test.find({'key_x':{'$in': x_list}}):
    print(doc)
works as expected.