Exact number of characters in a JSON - python

I have a file with several JSON objects on each line, and I need to know the length of each object, that is, its number of characters. But when I run the code below, it reports fewer characters than expected.
jsonFile = open(File, 'r')
line = jsonFile.readline()
len(line)
It counts elements like "\n" as one character, but I want them counted as two. Do you have any idea, please?

import os
print os.path.getsize('myfile.json')
http://devdocs.io/python/library/os.path#os.path.getsize
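Note that os.path.getsize reports the size of the whole file in bytes, not the length of a single line. If the goal is to count an escape such as a newline as two characters, one possible approach is to measure the escaped form of the string. The sketch below is not from the original answer: the helper name escaped_length is made up for illustration, and it assumes Python 3 and that non-ASCII characters should also be counted in their escaped form.

```python
def escaped_length(text):
    """Length of text with special characters written as escape sequences.

    Hypothetical helper: a newline is counted as the two characters
    backslash + n, a tab as backslash + t, and so on.
    """
    return len(text.encode("unicode_escape"))


line = '{"a": 1}\n'
print(len(line))             # 9: the newline counts as one character
print(escaped_length(line))  # 10: the newline counts as two characters
```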

Related

Amend list from file - Correct syntax and file format?

I currently have a list hard coded into my python code. As it keeps expanding, I wanted to make it more dynamic by reading the list from a file. I have read through many articles about how to do this, but in practice I can't get this working. So firstly, here is an example of the existing hardcoded list:
serverlist = []
serverlist.append(("abc.com", "abc"))
serverlist.append(("def.com", "def"))
serverlist.append(("hji.com", "hji"))
When I enter the command 'print serverlist' the output is shown below and my list works perfectly when I access it:
[('abc.com', 'abc'), ('def.com', 'def'), ('hji.com', 'hji')]
Now I've replaced the above code with the following:
serverlist = []
with open('/server.list', 'r') as f:
    serverlist = [line.rstrip('\n') for line in f]
With the contents of server.list being:
'abc.com', 'abc'
'def.com', 'def'
'hji.com', 'hji'
When I now enter the command print serverlist, the output is shown below:
["'abc.com', 'abc'", "'def.com', 'def'", "'hji.com', 'hji'"]
And the list is not working correctly. So what exactly am I doing wrong? Am I reading the file incorrectly or am I formatting the file incorrectly? Or something else?
The contents of the file are not interpreted as Python code. When you read a line in f, it is a string; and the quotation marks, commas etc. in your file are just those characters as parts of a string.
If you want to create some other data structure from the string, you need to parse it. The program has no way to know that you want to turn the string "'abc.com', 'abc'" into the tuple ('abc.com', 'abc'), unless you instruct it to.
This is the point where the question becomes "too broad".
If you are in control of the file contents, then you can simplify the data format to make this more straightforward. For example, if you just have abc.com abc on the line of the file, so that your string ends up as 'abc.com abc', you can then just .split() that; this assumes that you don't need to represent whitespace inside either of the two items. You could instead split on another character (like the comma, in your case) if necessary (.split(',')). If you need a general-purpose hammer, you might want to look into JSON. There is also ast.literal_eval which can be used to treat text as simple Python literal expressions - in this case, you would need the lines of the file to include the enclosing parentheses as well.
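As a rough illustration of two of the options mentioned above, here is a minimal sketch. The sample lines are made up, and the second format assumes each line of the file is written as a complete Python tuple literal, parentheses included.

```python
import ast

# Option 1: whitespace-separated items, e.g. a line reading "abc.com abc"
line = "abc.com abc\n"
pair_from_split = tuple(line.split())  # split() with no argument also drops the newline
print(pair_from_split)                 # ('abc.com', 'abc')

# Option 2: each line is a Python tuple literal, e.g. ('def.com', 'def')
line = "('def.com', 'def')\n"
pair_from_literal = ast.literal_eval(line.strip())  # safely evaluates literals only
print(pair_from_literal)                            # ('def.com', 'def')
```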
If you are willing to let go of the quotes in your file and rewrite it as
abc.com, abc
def.com, def
hji.com, hji
the code to load can be reduced to a one liner using the fact that files are iterables
with open('servers.list') as f:
    servers = [tuple(line.rstrip('\n').split(', ')) for line in f]
Remember that iterating over a file does not strip the newlines, so each line is stripped before it is split.
You can allow arbitrary whitespace by doing something like
servers = [tuple(word.strip() for word in line.split(',')) for line in f]
It might be easier to use something like regex to parse the original format. You could use an expression that captures the parts of the line you care about and matches but discards the rest:
import re
pattern = re.compile('\'(.+)\',\\s*\'(.+)\'')
You could then extract the names from the matched groups
with open('servers.list') as f:
    servers = [pattern.fullmatch(line.strip()).groups() for line in f]
This is just a trivialized example. You can make it as complicated as you wish for your real file format.
Try this:
serverlist = []
with open('/server.list', 'r') as f:
    for line in f:
        serverlist.append(tuple(line.rstrip('\n').split(',')))
Explanation
You want an explicit for loop so you cycle through each line as expected.
You need list.append for each line to append to your list.
You need to use split(',') in order to split by commas.
Convert to tuple as this is your desired output.
List comprehension method
The for loop can be condensed as below:
with open('/server.list', 'r') as f:
    serverlist = [tuple(line.rstrip('\n').split(',')) for line in f]
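One caveat: with the file exactly as shown in the question, splitting on ',' alone leaves the quote characters and the space after the comma inside each item. A possible variation that strips them is sketched below. It assumes Python 3, uses io.StringIO as a stand-in for the real file, and assumes no item contains a comma or begins or ends with a quote of its own.

```python
import io

# Stand-in for open('/server.list'), holding the contents shown in the question
f = io.StringIO("'abc.com', 'abc'\n'def.com', 'def'\n'hji.com', 'hji'\n")

serverlist = [
    # strip spaces, single quotes and the newline from both ends of each item
    tuple(part.strip(" '\n") for part in line.split(','))
    for line in f
]
print(serverlist)
# [('abc.com', 'abc'), ('def.com', 'def'), ('hji.com', 'hji')]
```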

Python: Read in Data from File

I have to read data from a text file from the command line. It is not too difficult to read in each line, but I need a way to separate each part of the line.
The file contains the following in order for several hundred lines:
String (Sometimes more than 1 word)
Integer
String (Sometimes more than 1 word)
Integer
So for example the input could have:
Hello 5 Sample String 10
The current implementation I have for reading in each line is as follows... how can I modify it to separate it into what I want? I have tried splitting the line, but I always end up getting only one character of the first string this way with no integers or any part of the second string.
with open(sys.argv[1],"r") as f:
    for line in f:
        print(line)
The desired output would be:
Hello
5
Sample String
10
and so on for each line in the file. There could be thousands of lines in the file. I just need to separate each part so I can work with them separately.
The program can't magically split lines the way you want. You will need to read in one line at a time and parse it yourself based on the format.
Since there are two integers and an indeterminate number of (what I assume are) space-delimited words, you may be able to use a regular expression to find the integers then use them as delimiters to split up the line.
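The regular-expression idea could look roughly like the sketch below. It assumes Python 3, that every line has exactly the shape "words integer words integer", and that the word parts never contain a standalone integer themselves; the names LINE_RE and parse_line are made up for illustration.

```python
import re

# text, integer, text, integer, separated by whitespace
LINE_RE = re.compile(r'(.+?)\s+(\d+)\s+(.+?)\s+(\d+)')


def parse_line(line):
    """Split one line into (str, int, str, int), or return None if it does not fit."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        return None
    first, n1, second, n2 = m.groups()
    return first, int(n1), second, int(n2)


print(parse_line("Hello 5 Sample String 10\n"))
# ('Hello', 5, 'Sample String', 10)
```

Returning None for lines that do not match lets the caller decide whether to skip or report them.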

Python 2 - How do I import a text file containing a long sequence of digits and convert it to a string of individual numbers?

I want to take a text file that is a long sequence of digits with line breaks,
i.e. something like
38482406847387
85869153438194
96531040384827
43157689643163
but much larger, and convert it to a string that would just read
[3,8,4,8,...,1,6,3]
so that I can iterate over it, manipulate it, visualise it and so on.
I have had a look at the open() function but so far I can only get it to break up the file into separate lines. I know I can use a for loop to go through the giant string of the whole document and form a list that way, but then I get '\n' and spaces showing up everywhere, which is undesirable.
For context, I grabbed a text file from the web of some preposterous number of digits of pi, and I thought it would be instructive and interesting to go through it and look for patterns, plot the distribution of digits, convert to ASCII and other such nonsense. I figured it would be a fun way for me to learn a bit more about Python.
import re
print re.findall(r'\d', open('file.txt', 'r').read())
with open("/path/to/file") as f:
    print [int(x) for x in f.read() if x.isdigit()]
This is shorter.
If you get all the numbers as a string then you can use
# here, digits is the numbers as a string including the \n character
digit_list = [digit for digit in digits.replace('\n', '')]  # avoid shadowing the built-in list
For small files:
with open('path/to/file') as infile:
    answer = list(int(i) for i in ''.join(line.strip() for line in infile))
For larger files:
answer = []
with open('path/to/file') as infile:
    for line in infile:
        answer.extend([int(i) for i in line.strip()])

Removing selected characters from text file

I have a long text file where each line looks something like /MM0001 (Table(12,)) or /MM0015 (Table(11,)). I want to keep only the four-digit number next to /MM. If it weren't for the "Table(12,)" part I could just strip all the non-numeric characters, but I don't know how to extract the four-digit numbers only. Any advice on getting started?
If it's exactly that format, you could just print out line[3:7]
You could parse text line by line and then use 4th to 7th char of every line.
ln[3:7]
import re
R=re.compile(r'/MM(\d+)')
for line in file:
    L=R.match(line)
    if L:
        print L.group(1)
or, more succinctly...
lines=[R.match(line).group(1) for line in file] #works if the lines are guaranteed to start with /MM
This should give you only the integers following a /MM and should work no matter how long the strings of integers are. If they're guaranteed to be a certain length, then you're better off with one of the other examples (which don't use regex).
If each line starts with /MM, then just go through the file and print out line[3:7], e.g.:
for line in file:
    print line[3:7]

python read output

Write a program that outputs the first number within a file specified by the user. It should behave like:
Enter a file name: l11-1.txt
The first number is 20.
You will need to use the file object method .read(1) to read 1 character at a time, and a string object method to check if it is a number. If there is no number, the expected behaviour is:
Enter a file name: l11-2.txt
There is no number in l11-2.txt.
Why is reading 1 character at a time a better algorithm than calling .read() once and then processing the resulting string using a loop?
I have the files and they do correspond to the outputs above, but I'm not sure how to make my program output properly.
The code I have so far is below:
filenm = raw_input("Enter a file name: ")
datain=file(filenm,"r")
try:
    c=datain.read(1)
    result = []
    while int(c) >= 0:
        result.append(c)
        c = datain.read(1)
except:
    pass
if len(result) > 0:
    print "The first number is",(" ".join(result))+" . "
else:
    print "There is no number in" , filenm + "."
So far this opens the file and reads it, but the output is always "no number" even if there is one. Can anyone help me?
OK, you've been given some instructions:
read a string input from the user
open the file given by that string
.read(1) a character at a time until you get the first number or EOF
print the number
You've got the first and second parts here (although you should use open instead of file to open a file), what next? The first thing to do is to work out your algorithm: what do you want the computer to do?
Your last line starts looping over the lines in the file, which sounds like not what your teacher wants -- they want you to read a single character. File objects have a .read() method that lets you specify how many bytes to read, so:
c = datain.read(1)
will read a single character into a string. You can then call .isdigit() on that to determine if it's a digit or not:
c.isdigit()
It sounds like you're supposed to keep reading digits until you run out, and then concatenate them all together; if the first thing you read isn't a digit (c.isdigit() is False) you should just error out.
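The read-one-character loop described above might be sketched as follows. This is only one reading of the assignment: it assumes Python 3 and, unlike the suggestion to error out, it skips any non-digit characters that come before the number. The function name first_number is made up, and io.StringIO stands in for the opened file.

```python
import io


def first_number(stream):
    """Return the first run of digits in stream as a string, or None if there is none."""
    digits = []
    while True:
        c = stream.read(1)   # read exactly one character
        if c == "":          # an empty string means end of file
            break
        if c.isdigit():
            digits.append(c)
        elif digits:         # a non-digit after some digits ends the number
            break
    return "".join(digits) or None


print(first_number(io.StringIO("abc 20 and 30")))   # 20
print(first_number(io.StringIO("no digits here")))  # None
```

Because the loop stops as soon as the number ends, the rest of the file is never read.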
Your datain variable is a file object. Use its .read(1) method to read 1 character at a time. Take a look at the string methods and find one that will tell you if a string is a number.
Why is reading 1 character at a time a better algorithm than calling .read() once and then processing the resulting string using a loop?
Define "better".
In this case, it's "better" because it makes you think.
In some cases, it's "better" because it can save reading an entire line when reading the first few bytes is enough.
In some cases, it's "better" because the entire line may not be sitting around in the input buffer.
You could use regex like (searching for an integer or a float):
import re
with open(filename, 'r') as fd:
    # re.search scans the whole text; re.match would only match at the very start
    match = re.search(r'([-]?\d+(\.\d+|))', fd.read())
if match:
    print 'My first number is', match.groups()[0]
This works with anything like "Hello 111." and will output 111.
