Hi, I am trying to read data from MongoDB and convert it into a DataFrame, but I am getting an error that says: "("Could not convert ObjectId('620d3f43ae93743dbb6f6846') with type ObjectId: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column _id with type object')".
My code looks like this:
import streamlit as st
import pymongo
from pymongo import MongoClient
import pandas as pd
import json
import string
import io
import re
import time
import csv
from pandas import Timestamp
import certifi
import datetime
from pandas.io.json import json_normalize
client=pymongo.MongoClient()
connection = MongoClient("mongodb+srv://*****:****@cluster0.t4iwt.mongodb.net/testdb?retryWrites=true&w=majority",tlsCAFile=certifi.where())
db=connection["testdb"]
collection=db["test"]
cursor = collection.find()
entries=list(cursor)
entries[:]
df=pd.DataFrame(entries)
st.write(df)
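Change the type of _id to str. Streamlit converts the DataFrame to Arrow for display, and Arrow has no type for MongoDB's ObjectId values, so convert that column before calling st.write: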
df = pd.DataFrame(entries)
df = df.astype({"_id": str})
st.write(df)
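If you would rather fix this before the DataFrame is built, one option is to convert _id on each document. This is a minimal sketch, assuming the documents are plain dicts as returned by list(cursor). FakeObjectId and stringify_ids are hypothetical names used only so the example runs without a database; a real bson ObjectId likewise returns its 24-character hex string from str().

```python
class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId (illustration only)."""

    def __init__(self, hex_value):
        self._hex = hex_value

    def __str__(self):
        return self._hex


def stringify_ids(documents):
    """Return copies of the documents with any _id value converted to str."""
    cleaned = []
    for doc in documents:
        copy = dict(doc)
        if "_id" in copy:
            copy["_id"] = str(copy["_id"])
        cleaned.append(copy)
    return cleaned


# Stand-in for: entries = list(collection.find())
entries = [{"_id": FakeObjectId("620d3f43ae93743dbb6f6846"), "name": "example"}]
clean_entries = stringify_ids(entries)
print(clean_entries)
```

Passing clean_entries to pd.DataFrame should then give an _id column that holds only strings.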
I'm trying to import MongoDB collection data into a pandas DataFrame. When the database name is simple, like 'admin', the data loads into the DataFrame. However, when I try one of my required databases, named asdev-Admin (line 5), I get an empty DataFrame. Apparently the error is related to the special character in the database name, but I don't know how to get around it. How do I resolve this?
import pymongo
import pandas as pd
from pymongo import MongoClient
client = MongoClient()
db = client.asdev-Admin
collection = db.system.groups
data = pd.DataFrame(list(collection.find()))
print(data)
The error states: NameError: name 'Admin' is not defined
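Change db = client.asdev-Admin to db = client['asdev-Admin']. Python parses client.asdev-Admin as the subtraction client.asdev - Admin, which is why it complains that the name Admin is not defined. Dictionary-style access takes the name as a string, so the hyphen is not a problem.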
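To see why the dotted form fails, you can ask Python how it parses the expression. This is a small sketch; FakeClient is a hypothetical stand-in for MongoClient so it runs without a server.

```python
import ast


class FakeClient:
    """Hypothetical stand-in for MongoClient (illustration only)."""

    def __getitem__(self, name):
        # Item access receives the full name as a string, hyphen included.
        return f"database:{name}"


# Python parses the dotted form as a subtraction, not as one name.
tree = ast.parse("client.asdev-Admin", mode="eval")
is_subtraction = isinstance(tree.body, ast.BinOp) and isinstance(tree.body.op, ast.Sub)
print(is_subtraction)

client = FakeClient()
db = client["asdev-Admin"]
print(db)
```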
I have a list of UUIDs and I'm trying to extract the datetime from each one using the code below, which I found online, but it raises an error.
Is there another way to extract it, or a change that would make this code work?
import uuid
import time_uuid
my_uuid = uuid.UUID('{7a3febe0-3fcc-3a28-bdab-c29438888d12}')
ts = time_uuid.TimeUUID(bytes=my_uuid.bytes).get_timestamp()
Error: No module named time_uuid
Expected: Datetime
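One thing to check first: the sample UUID is version 3 (the first digit of the third group), which is name-based and carries no timestamp, so no library can recover a datetime from it. For version 1 (time-based) UUIDs the standard uuid module is enough, since the time attribute holds the number of 100-nanosecond intervals since 1582-10-15. This is a minimal sketch, assuming the other UUIDs in your list are version 1; uuid1_to_datetime is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID version 1 timestamps count 100-nanosecond intervals from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(value):
    """Return the UTC datetime embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError(f"UUID version {value.version} carries no timestamp")
    # value.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


# Hand-built version 1 UUID whose timestamp is the Unix epoch.
epoch_uuid = uuid.UUID("13814000-1dd2-11b2-8000-000000000000")
print(uuid1_to_datetime(epoch_uuid))

# The UUID from the question is version 3, so it is rejected.
my_uuid = uuid.UUID("{7a3febe0-3fcc-3a28-bdab-c29438888d12}")
try:
    uuid1_to_datetime(my_uuid)
except ValueError as exc:
    print(exc)
```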
I am running this query in Python. I get the error "Invalid string literal" in the regular expression part. I know the problem is in this regex, but I'm not sure what syntax is missing here. Any help would be appreciated.
import pandas as pd
import numpy as np
import datetime
from time import gmtime, strftime
import smtplib
import sys
from IPython.core.display import HTML
PROJECT = 'server'
queryString = '''
SELECT
mn as mName,
dt as DateTime,
ip as LocalIPAddress,
REGEXP_EXTRACT(path, 'ActName:([\s\S\w\W]*?)ActDomain:') AS ActName,
FROM Agent.Logs
'''
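You're missing an r prefix on the pattern string. Without it, the backslash sequences \s, \S, \w and \W are treated as string escapes, which is what produces "Invalid string literal". Mark the pattern as a raw string: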
REGEXP_EXTRACT(path, r'ActName:([\s\S\w\W]*?)ActDomain:') AS ActName
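Because the query lives inside a Python string, it can also help to make the Python literal raw, so that Python passes the backslashes through untouched. This is a minimal sketch, assuming the SQL dialect accepts r'...' raw string literals as the fix above implies. query_string, pattern and sample_path are illustrative names, and the Python re check is only a local sanity test of the pattern, not what the SQL engine runs.

```python
import re

# Raw Python string: backslashes reach the SQL text unchanged.
query_string = r'''
SELECT
  mn AS mName,
  dt AS DateTime,
  ip AS LocalIPAddress,
  REGEXP_EXTRACT(path, r'ActName:([\s\S\w\W]*?)ActDomain:') AS ActName
FROM Agent.Logs
'''

# Sanity-check the same pattern locally on a made-up sample value.
pattern = r'ActName:([\s\S\w\W]*?)ActDomain:'
sample_path = "ActName:alice ActDomain:example"
match = re.search(pattern, sample_path)
print(match.group(1))
```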
I am developing a project which has about a dozen different files. At the top of each file I have almost identical lines which import the same libraries and initialize a connection to my DB:
import re
import urllib2
import datetime
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.sql import *
from sqlalchemy.orm.collections import *
from table_def import Team, Player, Box_Score, Game, Name_Mapper
from datetime import timedelta
from bs4 import BeautifulSoup as bs
from datetime import date, datetime, timedelta
import numpy as np
import argparse
engine = create_engine('sqlite:///ncaaf.db', echo=False)
md = MetaData(bind=engine)
Session = sessionmaker(bind=engine)
s = Session()
teams_table = Table("teams", md, autoload=True)
games_table = Table("games", md, autoload=True)
box_scores_table = Table("box_scores", md, autoload=True)
players_table = Table("players", md, autoload=True)
names_table = Table("names", md, autoload=True)
Can I make a module to import all these modules and to initialize this DB connection? Is that standard? Or dumb for some reason I am not realizing?
When you import something into your module, it becomes available as if it was declared in your module itself. So, you can do what you want like this:
In common_imports.py:
from datetime import date, datetime, timedelta
import numpy as np
import argparse
...
In main_module.py:
from common_imports import *
a = np.array([]) # works fine
However, this is not recommended, since "explicit is better than implicit". If you do this, someone else (or even you in the future) won't understand where all these imported names come from. Instead, try either to organize your imports better or to decompose your module into several smaller ones. For example, in your import list I see argparse, SQL stuff and numpy, and I can't imagine a single module that needs all of these unrelated libraries.
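For the database setup specifically, a common alternative is a small module that exposes the connection through explicit names, which each file then imports by name. This is a minimal sketch, assuming a hypothetical db.py; it uses the standard-library sqlite3 module as a stand-in for the SQLAlchemy engine and session so that it runs on its own, and get_connection and DATABASE_PATH are illustrative names.

```python
# Contents of a hypothetical db.py shared by every file in the project.
import sqlite3
from functools import lru_cache

DATABASE_PATH = ":memory:"  # assumption: the real project would point at ncaaf.db


@lru_cache(maxsize=None)
def get_connection():
    """Create the connection on first use and reuse it afterwards."""
    return sqlite3.connect(DATABASE_PATH)


# In any other file, the import stays explicit:
#     from db import get_connection
conn = get_connection()
conn.execute("CREATE TABLE IF NOT EXISTS teams (name TEXT)")
conn.execute("INSERT INTO teams VALUES ('Example U')")
rows = conn.execute("SELECT name FROM teams").fetchall()
print(rows)
```

Each file still lists its own library imports, but the connection logic lives in one place and it is obvious where get_connection comes from.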
If you create a package, you can import them in the __init__.py file, although I would suggest leaving them where they are to increase code-readability.