Import error: No module named gspread or csv? - python

I am working with a Google spreadsheet, and am trying to use Python 2.7 to convert the spreadsheet data to a CSV file.
When I attempt to run the script I receive:
Import error: No module named gspread.
When I take out the gspread portion, then I receive:
Import error: No module named csv.
Any suggestions would be greatly appreciated. Thank you in advance.
import csv
import gspread
g=gspread.login('skiesgoinggreen#gmail.com', 'e-mail_password')
docid = "0AgNp9UJ4CX93dHl3RW9GRXJDS3kxaXRJMGNqWmhQWVE"
spreadsheet = g.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
    filename = docid + '-worksheet' + str(i) + '.csv'
    writer = csv.writer(open(filename, 'wb'))
    writer.writerows(worksheet.get_all_values())

Try the following:
Check you don't have a file named csv.py
Check you have python version >= 2.3
Install gspread with:
pip install gspread
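
To check the first point, it can help to see which file Python actually loads for each module. The following is a minimal sketch; the helper name module_location is made up for illustration. If csv resolves to a file in your own project directory rather than the standard library, a local csv.py is shadowing the built-in module.

```python
import importlib


def module_location(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in (compiled-in) modules may have no __file__ attribute.
    return getattr(module, "__file__", None)


for name in ("csv", "gspread"):
    print("%s -> %s" % (name, module_location(name)))
```

A result of None for gspread means it is not installed for the interpreter that runs the script.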

Let's get some clarification here: the "pip install gspread" command works on any platform where pip is installed, including Windows.
If you are on Windows and do not have pip, the modules need to be in C:\Python27\Lib\site-packages
The csv module comes built in, but gspread does not, so you need to download it and place the gspread package in the aforementioned directory.

Related

Trying to load Python dataframe into Hadoop (Impala) using `ibis`, getting "AttributeError: module 'ibis' has no attribute 'impala' "

I'm running the following block of Python commands in a Jupyter notebook to upload my dataframe, labeled df, to Impala:
import hdfs
from hdfs.ext.kerberos import KerberosClient
import pandas as pd
import ibis
hdfs = KerberosClient('< URL address >')
client = ibis.impala.connect(host="impala.sys.cigna.com", port=25003, timeout=3600, auth_mechanism="GSSAPI", hdfs_client=hdfs)
db=client.database("< database >")
db.create_table("pythonIBISTest", df)
. . . but am getting the error message AttributeError: module 'ibis' has no attribute 'impala'.
Note: I've already installed the hdfs, ibis, ibis-framework[Kerberos], and impyla modules in the Jupyter terminal.
What am I doing wrong?
You may need to pip install ibis-framework[impala] to get the impala part of ibis

How can I open a .snappy.parquet file in python?

How can I open a .snappy.parquet file in python 3.5? So far, I used this code:
import numpy
import pyarrow
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pyarrow.parquet.read_table(filename).to_pandas()
But, it gives this error:
AttributeError: module 'pyarrow' has no attribute 'compat'
P.S. I installed pyarrow this way:
pip install pyarrow
I had the same issue and managed to solve it by following the solution proposed in https://github.com/dask/fastparquet/issues/366.
1) install python-snappy by using conda install (for some reason with pip install, I couldn't download it)
2) Add the snappy_decompress function.
from fastparquet import ParquetFile
import snappy
def snappy_decompress(data, uncompressed_size):
    return snappy.decompress(data)
pf = ParquetFile('filename') # filename includes .snappy.parquet extension
dff=pf.to_pandas()
The error AttributeError: module 'pyarrow' has no attribute 'compat' is sadly a bit misleading. To execute the to_pandas() function on a pyarrow.Table instance you need pandas installed. The above error is a symptom of the missing requirement.
pandas is not a hard requirement of pyarrow, as most of its functionality is usable with just Python built-ins and NumPy. Thus users of pyarrow who do not use pandas can work with it without needing to have pandas installed.
You can use pandas to read snappy.parquet files into a Python pandas dataframe.
import pandas as pd
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pd.read_parquet(filename)
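
pd.read_parquet itself relies on a Parquet engine (pyarrow or fastparquet), so both pandas and one of those engines need to be present. As a rough sketch, assuming Python 3, the following reports which of these packages the current interpreter can find; the helper name installed_modules is made up for illustration.

```python
import importlib.util


def installed_modules(names):
    """Return the subset of top-level module names that can be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


# pandas plus at least one engine (pyarrow or fastparquet) should appear here.
print(installed_modules(["pandas", "pyarrow", "fastparquet", "snappy"]))
```

If pandas is missing from the output, installing it should also resolve the misleading 'compat' error described above.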

Export dataframe in pyspark to excel file given the 'openpyxl' module is not installed

I am trying to write my Spark dataframes to an Excel file to generate the desired reports, by converting them to pandas dataframes and then using
panda_df = df.toPandas()
writer = pd.ExcelWriter(filename)
panda_df.to_excel(writer,'Sheet1', startcol = 0, startrow = 0)
this gives an error saying
File "/usr/lib64/python2.6/site-packages/pandas/io/excel.py", line 350, in __init__
from openpyxl.workbook import Workbook
ImportError: No module named openpyxl.workbook
I am running this on a remote server and do not have admin rights: sudo apt-get fails with "Sudo: apt-get: command not found". I have also tried pip, to no avail, as it is not installed either. Is there any other way I can write my dataframes to Excel?
You can proceed as follows.
You can clone the library from its source repository here:
git clone https://bitbucket.org/openpyxl/openpyxl
Go into the openpyxl directory, then run the following to install it for your user without admin permission:
python setup.py install --user
Then, you can add the path to the openpyxl to your code as follows:
import sys
sys.path.append('/path/to/openpyxl/folder')
panda_df = df.toPandas()
writer = pd.ExcelWriter(filename)
panda_df.to_excel(writer,'Sheet1', startcol = 0, startrow = 0)
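
A package installed with --user normally lands in the per-user site-packages directory, which Python adds to sys.path on its own when that directory exists. As a small sketch (assuming Python 2.7 or later, where site.getusersitepackages() is available), you can check where that directory is and whether it is already on the path before adding anything by hand:

```python
import site
import sys

# Directory used by "python setup.py install --user" and "pip install --user".
user_site = site.getusersitepackages()

print("User site-packages: %s" % user_site)
print("Already on sys.path: %s" % (user_site in sys.path))
```

If the second line prints True, the sys.path.append call is probably not needed.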
Alternatively, you can use the Spark2 datasource of the HadoopOffice library (which also supports Python). You can read/write Excel files that are encrypted, linked to other workbooks, have metadata, etc.
Furthermore, it has a low-footprint mode, which lets you quickly write larger Excel files without requiring large amounts of memory or CPU:
https://github.com/ZuInnoTe/spark-hadoopoffice-ds
The datasource is based on the HadoopOffice library enabling virtually any Hadoop application to read/write Excel files, because it has corresponding Hadoop FileInputFormats and FileOutputFormats:
https://github.com/ZuInnoTe/hadoopoffice

Why do I get a "no such file or directory" error when the file is known to exist?

I have uploaded a package to PyPI and GitHub. I then installed the package and tried to use it. It contains a Python script which needs to read from a file. I have placed both in the same directory.
pip install pycricket
from pycricket import cricket
c = cricket.Cricket()
c.query()
The query() function reads from a file. When I look at the 'pycricket' package in the library folder, both the script and the file are in the same folder.
query():
    with open('matches.csv', 'r') as f:
        #code
I don't know why I get the error.
You can inspect the current working directory with:
>>> import os
>>> os.getcwd()
If your data is in a different directory (unclear from the question, but likely given the error message), then change to the directory where the data is stored:
>>> os.chdir(path_to_data_directory)
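
Changing the working directory works for an interactive session, but for a data file shipped inside a package a more robust option is to build the path from the module's own location, since a relative path like 'matches.csv' is resolved against the current working directory and not against the script. A minimal sketch follows; the helper name data_path and the example path are made up for illustration.

```python
import os


def data_path(filename, module_file):
    """Build the path to `filename` in the same directory as `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)


# Inside the package's module this would be called with __file__, e.g.
#     with open(data_path('matches.csv', __file__), 'r') as f: ...
# The path below is a hypothetical install location used only for the demo.
print(data_path("matches.csv", "/tmp/pycricket/cricket.py"))
```

With this approach the file is found no matter which directory the user runs Python from.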

Python ImportError on web hosting

I'm fairly new to Python, so forgive me if I'm missing something obvious.
I have been using the Topia TermExtract package, and the code I wrote has been working fine on my local machine (Mac OS 10.6.5; Python 2.6). However, when I copy the entire directory, complete with package files, to my GoDaddy hosting, I get this error:
  File "test.py", line 2, in ?
    from topia.termextract import extract
  File "/home/DIRECTORY_HERE/topia/__init__.py", line 1, in ?
    import pkg_resources
ImportError: No module named pkg_resources
I'm not sure what I need to do to make this work. Here is the script I wrote:
import sys
from topia.termextract import extract
extractor = extract.TermExtractor()
extractor
extractor.filter = extract.DefaultFilter(singleStrengthMinOccur=1)
# join array into string from command-line arguments.
str = ' '.join(sys.argv)
x = extractor(str)
print "\nExtracted text:\n"
# for each extracted word, print it out.
for i in range(0, len(x)):
    if ((x[i][0])[-3:] != ".py"):
        print x[i][0]
print "\n"
Thanks!
The pkg_resources package is part of setuptools. Install that on the hosting.
I got it. I had to install VirtualEnv. If anyone has a similar problem, check out this post:
How to install setuptools?
