Python for items list, to individual string? - python

Just wondering if someone would be kind enough to tell me what I am doing wrong.
I'm working on a bit of code for use with my CherryPy project:
import glob
import os
def get_html(apath):
    print (apath)
    fhtml = open(apath,'r')
    data = fhtml.read()
    return data
article_path = os.getcwd() +'/articles/*.html'
myArticles = [glob.glob(article_path)]
print len(myArticles)
for items in myArticles:
    get_html(items)
Results in:
['/home/sd/Documents/projects/cherryweb/articles/Module 5-Exploitation Techniques And Exercise.pdf.html']
Traceback (most recent call last):
File "fntest.py", line 22, in <module>
get_html(items)
File "fntest.py", line 10, in get_html
fhtml = open(apath,'r')
TypeError: coercing to Unicode: need string or buffer, list found
I assume it's because the filename I'm passing still has the [' and '] from the list around the string.
I could write a function to trim these parts off, but is that the only option, or am I doing something stupid?
Thanks

myArticles = [glob.glob(article_path)]
should be:
myArticles = glob.glob(article_path)
glob.glob already returns a list; by wrapping it in [] you made a list that contains that list.
So, in this loop:
for items in myArticles:
    get_html(items)
the only element is the whole inner list, so you passed a list to get_html, and open raised that error.
Demo:
>>> open([])
Traceback (most recent call last):
File "<ipython-input-242-013dc85bc958>", line 1, in <module>
open([])
TypeError: coercing to Unicode: need string or buffer, list found
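For reference, a minimal sketch of the corrected script. The helper name load_articles and the use of a with-block are illustrative choices, not part of the original answer; the only change the answer actually calls for is dropping the extra [] around glob.glob.

```python
import glob
import os


def get_html(apath):
    """Return the full text of the file at `apath`."""
    # The with-block closes the file even if read() raises.
    with open(apath, "r") as fhtml:
        return fhtml.read()


def load_articles(directory):
    """Map each .html path in `directory` to its contents (hypothetical helper)."""
    pattern = os.path.join(directory, "*.html")
    # glob.glob already returns a list of path strings; no extra [] needed.
    return {path: get_html(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    articles = load_articles(os.path.join(os.getcwd(), "articles"))
    print(len(articles))
```

Each value passed to get_html is now a single path string, so open receives the type it expects.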

Related

Get value from a dictionary into a JSON file

I need to get all the bodyHtml and authorId values from the file available here: https://drive.google.com/file/d/10EGOAWsw3G5-ETUryYX7__JPOfNwUsL6/view?usp=sharing
I have tried several approaches, but I always get the error: TypeError: list indices must be integers, not str
This is my latest code:
# -*- coding: utf-8 -*-
import json
import requests
import datetime
data = json.loads(open('file.json').read())
coments = data['headDocument']['content']['id']
for comment in data['headDocument']['content']['content']['bodyHtml']:
    info = comment
    print(info)
and get this error:
Traceback (most recent call last):
File "coments.py", line 16, in <module>
for comment in data['headDocument']['content']['content']['bodyHtml']:
TypeError: list indices must be integers, not str
Can anyone help with this problem?
data['headDocument']['content'] is a list, so you need to loop through it rather than index it with a string. Like this:
for item in data['headDocument']['content']:
    print(item['content']['bodyHtml'])
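A small sketch of the same loop collecting both fields. The real file is not shown in the thread, so SAMPLE below is a hypothetical structure, and placing authorId next to bodyHtml inside each item's content is an assumption that may need adjusting for the actual data.

```python
import json

# Hypothetical sample mirroring the shape implied by the question;
# the real file's layout is not shown in the thread.
SAMPLE = """
{"headDocument": {"content": [
  {"content": {"bodyHtml": "<p>first</p>", "authorId": 1}},
  {"content": {"bodyHtml": "<p>second</p>", "authorId": 2}}
]}}
"""


def extract_comments(data):
    """Return (bodyHtml, authorId) pairs, one per list item."""
    pairs = []
    for item in data["headDocument"]["content"]:
        inner = item.get("content", {})
        # .get returns None instead of raising KeyError for a missing key.
        pairs.append((inner.get("bodyHtml"), inner.get("authorId")))
    return pairs


if __name__ == "__main__":
    for body, author in extract_comments(json.loads(SAMPLE)):
        print(author, body)
```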

How to read every log line to match a regex pattern using spark?

The following program throws an error
from pyparsing import Regex, re
from pyspark import SparkContext
sc = SparkContext("local","hospital")
LOG_PATTERN ='(?P<Case_ID>[^ ;]+);(?P<Event_ID>[^ ;]+);(?P<Date_Time>[^ ;]+);(?P<Activity>[^;]+);(?P<Resource>[^ ;]+);(?P<Costs>[^ ;]+)'
logLine=sc.textFile("C:\TestLogs\Hospital.log").cache()
#logLine='1;35654423;30-12-2010:11.02;register request;Pete;50'
for line in logLine.readlines():
    match = re.search(LOG_PATTERN,logLine)
    Case_ID = match.group(1)
    Event_ID = match.group(2)
    Date_Time = match.group(3)
    Activity = match.group(4)
    Resource = match.group(5)
    Costs = match.group(6)
    print Case_ID
    print Event_ID
    print Date_Time
    print Activity
    print Resource
    print Costs
Error:
Traceback (most recent call last):
  File "C:/Spark/spark-1.6.1-bin-hadoop2.4/bin/hospital2.py", line 7, in <module>
    for line in logLine.readlines():
AttributeError: 'RDD' object has no attribute 'readlines'
If I add the open function to read the file, I get the following error:
Traceback (most recent call last):
  File "C:/Spark/spark-1.6.1-bin-hadoop2.4/bin/hospital2.py", line 7, in <module>
    f = open(logLine,"r")
TypeError: coercing to Unicode: need string or buffer, RDD found
I can't figure out how to read the file line by line and extract the words that match the pattern.
Also, if I pass only a single log line, logLine='1;35654423;30-12-2010:11.02;register request;Pete;50', it works. I'm new to Spark and know only the basics of Python. Please help.
You are mixing things up.
The line
logLine=sc.textFile("C:\TestLogs\Hospital.log")
creates an RDD, and RDDs do not have a readlines() method.
See the RDD API here:
http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD
You can use collect() to retrieve the content of the RDD line by line.
readlines() is part of the standard Python file API, but you do not usually need it when working with files in Spark.
You simply load the file with textFile() and then process it with the RDD API (see the link above).
As answered by Matei, readlines() belongs to the Python file API, whereas sc.textFile creates an RDD; hence the error that the RDD has no attribute readlines.
If you have to process the file using Spark APIs, you can use filter on the RDD to keep the lines that match the pattern, and then map over the result to split each line on the delimiter.
An example is below:
import re
logLine = sc.textFile(r"C:\TestLogs\Hospital.log")
# Keep only the lines that match the pattern from the question.
logLine_Filtered = logLine.filter(lambda x: re.search(LOG_PATTERN, x) is not None)
# Replace "<delimiter>" with the actual separator (";" in the question's log).
logLine_output = logLine_Filtered.map(lambda a: a.split("<delimiter>"))
logLine_output.first()
A DataFrame would be even better.
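The per-line parsing can be written as a plain function and checked without a Spark cluster. The sketch below reuses the pattern and sample line from the question; the function name parse_line is hypothetical, and the Spark usage in the trailing comment is an assumption about how it might be wired in, not tested code.

```python
import re

# Pattern copied from the question; each named group is one ';'-separated field.
LOG_PATTERN = re.compile(
    r"(?P<Case_ID>[^ ;]+);(?P<Event_ID>[^ ;]+);(?P<Date_Time>[^ ;]+);"
    r"(?P<Activity>[^;]+);(?P<Resource>[^ ;]+);(?P<Costs>[^ ;]+)"
)


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "1;35654423;30-12-2010:11.02;register request;Pete;50"
    print(parse_line(sample))
    # With Spark available, the same function could be applied per line, e.g.:
    #   parsed = sc.textFile(path).map(parse_line).filter(lambda r: r is not None)
```

Returning None for non-matching lines avoids the AttributeError that match.group would raise on a failed search.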

Regexpr in python

for printJobString in logfile:
    userRegex = re.search('(\suser:\s)(.+?)(\sprinter:\s)', printJobString)
    if userRegex:
        userString = userRegex.group(2)
        pagesInt = int(re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString).group(2))
Above is my code. When I run this program, I end up getting:
Traceback (most recent call last):
File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 45, in <module>
log2hist("log") # version 2.
File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 29, in log2hist
pagesInt = int(re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString).group(2))
AttributeError: 'NoneType' object has no attribute 'group'
I know this error means the search is returning None, but I'm not sure how to handle this case. Any help would be appreciated; I'm very new to Python and still learning the basics.
I am writing a program that should print out the number of pages each user has printed.
180.186.109.129 code: k n h user: luis printer: core 2 pages: 32
is a sample input line. My Python file is trying to create a data file that has one line for each user, containing the total number of pages that user printed.
This happens because your regexp does not find anything, so re.search returns None:
re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString) returns None
In your sample line, code: comes before pages:, so a pattern that expects code: after pages: cannot match.
Use an if statement to check that the match is not None before you call group:
for printJobString in logfile:
    userRegex = re.search('(\suser:\s)(.+?)(\sprinter:\s)', printJobString)
    if userRegex:
        userString = userRegex.group(2)
        pagesMatch = re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString)
        if pagesMatch:
            pagesInt = int(pagesMatch.group(2))
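As a rough sketch of the per-user totals the question describes: the patterns below assume every line follows the layout of the single sample line (user: before printer:, pages: followed by digits), which may not hold for the whole log. The names USER_RE, PAGES_RE and total_pages are illustrative.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the single sample line in the question:
# "... user: <name> printer: <printer> pages: <count>"
USER_RE = re.compile(r"\suser:\s(.+?)\sprinter:\s")
PAGES_RE = re.compile(r"\spages:\s(\d+)")


def total_pages(lines):
    """Sum pages per user, skipping lines where either field is missing."""
    totals = defaultdict(int)
    for line in lines:
        user = USER_RE.search(line)
        pages = PAGES_RE.search(line)
        if user is None or pages is None:
            continue  # re.search returned None: nothing to group
        totals[user.group(1)] += int(pages.group(1))
    return dict(totals)


if __name__ == "__main__":
    log = [
        "180.186.109.129 code: k n h user: luis printer: core 2 pages: 32",
        "a line without the expected fields",
    ]
    print(total_pages(log))
```

Checking both matches before calling group keeps one malformed line from aborting the whole run.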

Python3 and hmac. How to handle string not being binary

I had a script in Python2 that was working great.
def _generate_signature(data):
    return hmac.new('key', data, hashlib.sha256).hexdigest()
Where data was the output of json.dumps.
Now, if I try to run the same kind of code in Python 3, I get the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/hmac.py", line 144, in new
return HMAC(key, msg, digestmod)
File "/usr/lib/python3.4/hmac.py", line 42, in __init__
raise TypeError("key: expected bytes or bytearray, but got %r" %type(key).__name__)
TypeError: key: expected bytes or bytearray, but got 'str'
If I try something like transforming the key to bytes like so:
bytes('key')
I get
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
I'm still struggling to understand the encodings in Python 3.
You can use a bytes literal: b'key'
def _generate_signature(data):
    return hmac.new(b'key', data, hashlib.sha256).hexdigest()
In addition to that, make sure data is also bytes. For example, if it is read from a file, you need to open the file in binary mode (rb).
Not to resurrect an old question, but I wanted to add something I feel is missing from this answer, and for which I had trouble finding an appropriate explanation or example anywhere else:
Aquiles Carattino was pretty close with his attempt at converting the string to bytes, but was missing the second argument: the encoding of the string to be converted.
If someone would like to convert a string to bytes by some means other than a static literal (for example, when the key is read from a config file or a DB), the following should work:
(Python 3+ only, not compatible with Python 2)
import hmac, hashlib
def _generate_signature(data):
    key = 'key' # Defined as a simple string.
    key_bytes = bytes(key, 'latin-1') # Commonly 'latin-1' or 'ascii'
    data_bytes = bytes(data, 'latin-1') # Assumes `data` is also an ASCII string.
    return hmac.new(key_bytes, data_bytes, hashlib.sha256).hexdigest()
print(
    _generate_signature('this is my string of data')
)
Try codecs.encode(), which can be used in both Python 2.7.12 and 3.5.2:
import hashlib
import codecs
import hmac
a = "aaaaaaa"
b = "bbbbbbb"
hmac.new(codecs.encode(a), msg=codecs.encode(b), digestmod=hashlib.sha256).hexdigest()
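For Python 3, this is how I solved it:
import codecs
import hashlib
import hmac
key = 'key' # Same placeholder key as in the question.
def _generate_signature(data):
    # Only the key and the message are encoded; the digest constructor is passed as-is.
    return hmac.new(codecs.encode(key), codecs.encode(data), hashlib.sha256).hexdigest()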
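Pulling the answers together, here is one possible sketch for the json.dumps case from the question, using str.encode. The UTF-8 choice, the sort_keys option, and the verify_signature helper are assumptions added for illustration; none of them is specified in the thread.

```python
import hashlib
import hmac
import json


def generate_signature(key, payload):
    """Return the hex HMAC-SHA256 of `payload` serialized as JSON."""
    # sort_keys makes the serialized text stable across runs (an added choice,
    # not something the thread specifies).
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_signature(key, payload, signature):
    """Compare signatures in constant time (hypothetical helper)."""
    return hmac.compare_digest(generate_signature(key, payload), signature)


if __name__ == "__main__":
    sig = generate_signature("key", {"user": "luis", "pages": 32})
    print(sig)
    print(verify_signature("key", {"pages": 32, "user": "luis"}, sig))
```

Both the key and the message must be encoded with the same encoding on the signing and verifying sides, otherwise the digests will differ.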

Python: TypeError: 'float' object is not callable

I am trying to join 2 strings using this code:
def __get_temp(self):
    return float(self.ask('RS'))
def __set_temp(self, temp):
    set = ('SS' + repr(temp))
    stat = self.ask(set)
    return self.check(stat)
temp = property(__get_temp, __set_temp)
Once together, I then send a signal over a serial bus using PyVisa. However, when I try to call the function, I get
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
chil.temp(13)
TypeError: 'float' object is not callable
I've tried looking around for explanation of this error, but none of them make any sense. Anyone know what is going on?
It looks like you are trying to set the property temp, but what you're actually doing is getting the property (a float) and then trying to call it as a function with the argument 13. The syntax for setting is:
chil.temp = 13
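To see the difference in isolation, here is a small sketch with a hypothetical Chiller class that stores the setpoint in memory instead of talking to an instrument. The class and attribute names are made up for the example; only the property(getter, setter) pattern comes from the question.

```python
class Chiller(object):
    """Hypothetical stand-in for the instrument class in the question."""

    def __init__(self):
        self._setpoint = 0.0  # stands in for the value the device would report

    def _get_temp(self):
        return float(self._setpoint)

    def _set_temp(self, temp):
        self._setpoint = temp

    temp = property(_get_temp, _set_temp)


if __name__ == "__main__":
    chil = Chiller()
    chil.temp = 13      # assignment invokes _set_temp(13)
    print(chil.temp)    # attribute access invokes _get_temp()
    try:
        chil.temp(13)   # reads the float first, then tries to call it
    except TypeError as exc:
        print(exc)
```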
