Turning this list off a website into a python list

https://github.com/danielmiessler/SecLists/blob/master/Passwords/10k_most_common.txt
What would be the most efficient way of turning this list into a Python list? I only really need the top 100.
Thanks, leon.
FYI, I have looked around the website and found many things relating to multiline lists, but nothing has helped me.

yourlist = open("10k_most_common.txt", "r").read().split("\n")[:100]
open(): opens the file. It takes two arguments: the file name and the mode, in this case "r" for reading.
read(): now you have the file object, but you need what's written inside it; read() returns the whole content of the file as one string.
split(): divides the string at the given separator, in this case "\n" (the newlines), so the string becomes a list of lines.
[:100]: a slice that selects part of the list. [:100] is the same as [0:100:1], which means: start at index 0, stop before index 100, with a step of one (so every element from index 0 to 99).
If that is too hard to follow, you can use this longer form, which also closes the file when it is done:
with open("10k_most_common.txt", "r") as file:
    string = file.read()
yourlist = string.split("\n")[:100]
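Since only the first 100 entries are needed, one possible variation is to stop reading after 100 lines instead of loading the whole file. The following is a minimal sketch, assuming the list has already been saved locally as 10k_most_common.txt; here a temporary file with made-up entries stands in for it so the example runs on its own.

```python
import os
import tempfile
from itertools import islice

# Stand-in for the downloaded file: 10,000 made-up entries, one per line.
path = os.path.join(tempfile.mkdtemp(), "10k_most_common.txt")
with open(path, "w") as f:
    f.write("\n".join("password%d" % i for i in range(10000)))

# Iterating over a file yields one line at a time, so islice can stop
# after 100 lines without reading the remaining 9,900.
with open(path, "r") as f:
    top_100 = [line.rstrip("\n") for line in islice(f, 100)]

print(len(top_100), top_100[:3])
```

For a file this small the difference is negligible, so the one-liner above is perfectly reasonable; the sketch mainly matters for much larger files.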

Related

What is the purpose of readline().strip()?

What is the purpose of readline().strip() (especially in the below code)?
Context:
I was taking a look at the following code:
op = open('encyin.txt', 'r')
n, q = op.readline().split()
n = int(n)
q = int(q)
dic = {}
for i in range(1, n + 1):
    dic[str(i)]=(op.readline().strip())
And trying to interpret it.
My Interpretation:
The start is simple enough: it opens the file encyin.txt in read mode. It takes two inputs, n and q, from the first line, with .split() separating them. They are then converted to integers, and an empty dictionary, dic, is created?
From there, a for loop is utilised.
But what does the last line mean? I am not familiar with (a) readline().strip() and (b) how this affects dic and the values of the input.
For example:
If encyin.txt was the following:
6 5
1151
723
1321
815
780
931
What happens to the other numbers from the 2nd line downwards? Does readline().strip() assign them a line number? Does it add them to dic, a bit like .append?

In your text file, there are whitespace characters. Often these are spaces or newlines ('\n') that you want to get rid of. strip() removes these whitespace characters from both ends of the string.
If you were to print the numbers after reading them without stripping, each one would be followed by an empty line:
number1
(empty line)
number2
(empty line)
number3
...
That's because you haven't removed the hidden newline character, and print adds one of its own.
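To see the hidden character for yourself, you can print the repr of a line with and without stripping. This is a small sketch that uses io.StringIO as a stand-in for the open file, with two of the sample numbers from the question as its content.

```python
import io

# Hypothetical stand-in for an open text file containing two numbers.
op = io.StringIO("1151\n723\n")

raw = op.readline()               # the trailing newline is still attached
clean = op.readline().strip()     # the newline has been stripped

print(repr(raw))
print(repr(clean))
```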

When reading a Python script and you come across a function that you don't know, your first goal should be to understand the function out of context; then you can figure out what it is doing in context.
The first port of call for understanding builtin/standard library functions (as opposed to functions from some extra library) should be the python docs. When the docs fail you, move on to other sources (there are plenty).
In this case, you want to know what op.readline() does. Well, what is op? I would go to open, and see that it creates a file object, which tells you that the actual implementation used is in io. Here we can search the page for readline.
What do the docs have to say about readline?
Read and return one line from the stream.
Here, I would assume, since it's a text file, that "a line from the stream" is a string object (but you could always open a Python interpreter to check), and look up str.strip(), which says:
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace.
Now put them together. They call (op.readline().strip()).
We know op is a "file object" using io
io's readline reads a single line from the stream
some_string.strip() called without parameters removes all whitespace from the start and end of some_string
Although python uses duck-typing, objects still have types/behaviours and understanding code often involves knowing what kind of object you are dealing with at any point so you can look into how it should work.
For example, if you know something is a dictionary, but you don't know what a dictionary is, you should search the docs for some info and try to understand what it does out of context first.

op = open('encyin.txt', 'r')
n, q = op.readline().split()
n = int(n)
q = int(q)
dic = {}
for i in range(1, n + 1):
    # Here you're creating a key-value pair. The str value of the loop variable i
    # is the dictionary key, i.e. dic[str(i)] creates the key, and the value is
    # op.readline().strip(). strip() is a str method that removes leading and
    # trailing characters; by default it removes whitespace (including the
    # newline) from the beginning and end of the string.
    dic[str(i)]=(op.readline().strip())
https://docs.python.org/3/library/stdtypes.html?highlight=str#str.strip

readline() returns a single line from your file as a string.
For example, given a text file containing:
Danni Loss
Shani Amari
Michele favarotti
readline() will return the first line:
Danni Loss\n
strip() then removes all whitespace characters from the start and end of the string, so you will get:
Danni Loss

.readline() reads a line from a file. The result includes a trailing '\n'.
.strip() removes all leading & trailing whitespace (e.g. the above-mentioned '\n') from a string.
Thus, the last line of code dic[str(i)]=(op.readline().strip()) does the following:
Reads line from the open file
Strips whitespace from the line
Stores the stripped line in the dictionary using the index (converted to string) as a key
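The three steps can be checked against the sample input from the question. The sketch below uses io.StringIO in place of the real encyin.txt so that it runs without a file on disk; everything else follows the original code.

```python
import io

# Stand-in for open('encyin.txt', 'r'), using the sample input from the question.
op = io.StringIO("6 5\n1151\n723\n1321\n815\n780\n931\n")

n, q = op.readline().split()   # first line: "6 5" -> ['6', '5']
n = int(n)
q = int(q)

dic = {}
for i in range(1, n + 1):
    # Each call to readline() consumes the next line of the file.
    dic[str(i)] = op.readline().strip()

print(dic)
```

So the numbers from the second line downwards are not lost: each one ends up as a value in dic, keyed by its position ('1' to '6') as a string.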

unable to convert python type str into a list array

I'm new to Python, and I am developing a tool/script for ssh and sftp. I noticed some of the code I'm using creates what I thought was a string array
channel_data = str()
to hold the console output from an ssh session. If I check "type" on channel_data it comes back as class 'str',
yet channel_data contains what appears to be 30 lines from an ssh console, and if I use a for loop to read each item in it
for line in channel_data:
    if "my text" in line:
        found = True
each iteration of "line" shows a single character, as if the whole ssh console output of 30 lines of text is broken down into a single-character array. I do have \n within all the text.
For example, channel_data would contain "Cisco Nexus Operation System (NX-OS) Software\r\nCopyright (c) 2002-2016\r\n ..... etc. etc.. ", but my for loop would read it and print out "C" then "i" then "s" etc.
I'm trying to understand: do I have a char array here, or a string array made up of single-character strings, and how do I convert it into a list of strings split on \n in Python?

You can iterate over a string just like a list in Python. So, yes, as expected, iterating over your string channel_data will in fact give you every character.
Python does not have a char type. A single character is just a string of length one, so a list of characters is a list of strings:
>>> type(['a', 'b'])
<class 'list'>
Also, just for the sake of adding some extra information for your own knowledge when it comes to usage of terminology, there is a difference between array and list in Python: Python List vs. Array - when to use?
So, what you are actually looking to do here is take the channel_data string and make it a list by calling the split method on it.
The split method will, by default, split on whitespace characters only. Check the documentation. So decide what you actually want to split on and pass that separator to the method.
You can take a look at splitlines to see if that works for you.
As specified in the documentation for splitlines:
Line breaks are not included in the resulting list unless keepends is
given and true.
Your result will then be a list of strings as you expect. So, as an example you can do:
your_new_list_of_str = channel_data.split('\n')
or
your_new_list_of_str = channel_data.splitlines()
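The two options differ slightly when the text uses '\r\n' line endings, as the console output in the question does: split('\n') leaves a '\r' at the end of each item, while splitlines() removes the whole line break. A short sketch, using a sample string modelled on the question (the third line is made up so that the search has something to find):

```python
# Sample text modelled on the output described in the question.
channel_data = "Cisco Nexus Operation System (NX-OS) Software\r\nCopyright (c) 2002-2016\r\nmy text here\r\n"

by_newline = channel_data.split("\n")   # keeps '\r' and adds a final '' item
by_lines = channel_data.splitlines()    # handles '\r\n' and drops the line breaks

# Iterating over the list now yields whole lines, not single characters.
found = any("my text" in line for line in by_lines)

print(by_newline)
print(by_lines)
print(found)
```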

string_list = channel_data.splitlines()
See docs at https://docs.python.org/3.6/library/stdtypes.html#str.splitlines

Why does ['Mr David',23,'City'] become [[\'Mr David\',23,\'City\'] in Python?

My problem is this: ['Mr David',23,'City'] becomes [\'Mr David\',23,\'City\']. Can you please suggest how to fix it?
here is some code if you wanna see..
path = r'D:/ListFile'
rdata = open(path,'rb')
ListNow = []
for ch in rdata:
    ListNow.append(ch)
print ListNow
What I am trying to do: I read it from a file and try to load it into memory (because I don't know how to work with lists stored in files on disk).

I think you have a file D:/ListFile with lines containing Python lists such as
['Mr David',23,'City']
which you want to read into a list of lists. However, as you loop over the file, each line is read as a string. So ch is a string, but you were expecting it to be a list.
If you trust the contents of the list file to contain safe expressions you can get python to evaluate the strings
ListNow.append(eval(ch))
This is dangerous if the listfile isn't trusted (for example if it contains data collected from a website) because malicious code in the listfile would be run. In that case you would have to analyse the string, starting with ch.split(',')
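If the lines only ever contain plain literals like the one shown, a safer alternative to eval is ast.literal_eval from the standard library, which parses literals but refuses to run code. This is a sketch rather than a drop-in fix: the lines list stands in for the file contents, and the second entry is made up for illustration.

```python
import ast

# Stand-in for the lines read from the file in the question.
lines = ["['Mr David',23,'City']\n", "['Ms Jones',31,'Town']\n"]

list_now = []
for ch in lines:
    # literal_eval accepts only Python literals (lists, strings, numbers, ...)
    # and raises an exception for anything else, such as a function call.
    list_now.append(ast.literal_eval(ch.strip()))

print(list_now[0][0])

# A line containing code is rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```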

When you print a list in Python, it uses repr to reproduce the value, which will represent each list item as a Python string literal. If you want to print each value separately, you can do so:
for item in ListNow:
    print(item)
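The difference between printing the list and printing its items can be seen with a single line of the kind described in the question. A small sketch, with the line held as a plain string:

```python
# One line as it would be read from the file, kept as a plain string.
line = "['Mr David',23,'City']\n"
list_now = [line]

print(list_now)      # the list shows repr(line): quoted, with escapes
print(list_now[0])   # the item itself prints as the raw text
```

The extra quotes and backslashes are only part of how the list is displayed; the string stored in the list is unchanged.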

output list of special letters string with python

I have 2 lists of strings, constructed from database content, but I have difficulty writing them into a txt file
list_a={u'XYZ-123-456',u'XYZ-123-456',u'XYZ-789-456',....}
list_b={u'Tom \n is creative', u'Sky is blue',....}
What I need to do is get rid of the 'u' at the beginning of each string in both lists, and get rid of the \n within each string in list_b where applicable.
Then write each element in list_b to a text file, one per line, whose title is the corresponding element in list_a. Strings with the same title should go in the same txt file. Can anyone help?
I think I can use the pop function to get rid of the u, but how about the '\n' within the string?
I am also not 100% sure about the I/O functions. Can anyone help me out with a sample script?
Thanks a lot, a Python newbie

Welcome to SO.
list_a is actually a set. The {} indicates a set, which is a collection of unique items, so you may not want this type if you expect duplicate values in the 'list'. If you want an actual list then enclose the items in [...] instead of {...}
The 'u' in front of each string indicates the text is unicode. It is only part of how the string is displayed, not part of its value. If you want it removed (Python 2):
list_a = map(str, list_a)
\n is a control character indicating a new line. It will be shown as a line break when printed in the console. If you really don't want new lines then do:
list_b = {item.replace('\n', '') for item in list_b}
I assume you want each item on a new line in the text file. If so:
with open(r'c:\file.txt', 'w') as file:
    file.write('\n'.join(list_a))
    file.write('\n')
    file.write('\n'.join(list_b))
For non-duplicates, merge the two into a set. Using set() on both works whether they are lists or sets:
with open(r'c:\file.txt', 'w') as file:
    file.write('\n'.join(set(list_a) | set(list_b)))
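The question also asks for one file per title, with texts that share a title going into the same file. That needs the titles and texts to be paired up, which sets cannot do because they have no order. The sketch below therefore assumes two parallel lists, which is an assumption about the data rather than something the question confirms; the names titles, texts and out_dir and the third text are made up for illustration.

```python
import os
import tempfile
from collections import defaultdict

# Hypothetical data: titles[i] is the file title for texts[i].
titles = [u'XYZ-123-456', u'XYZ-123-456', u'XYZ-789-456']
texts = [u'Tom \n is creative', u'Sky is blue', u'Grass is green']

# Group the cleaned texts by title so that equal titles share one file.
grouped = defaultdict(list)
for title, text in zip(titles, texts):
    grouped[title].append(text.replace('\n', ''))

out_dir = tempfile.mkdtemp()  # hypothetical output folder
for title, lines in grouped.items():
    with open(os.path.join(out_dir, title + '.txt'), 'w') as f:
        f.write('\n'.join(lines) + '\n')

print(sorted(os.listdir(out_dir)))
```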

python print string on multiple lines

I have a function that can only accept strings. (It creates an image from the string, but the string has little formatting and no word wrapping, so a long string just bleeds right through the edge of the image and keeps going into the abyss, when I would have liked it to form a paragraph instead of one endless line.)
I need it to print with line breaks. Currently the file is being read in using
inputFiles.readlines()
so that this reads the entire file. Storing the result of inputFiles.readlines() creates a list, so it cannot be passed to my function, which expects a string.
I used
inputFileContent = ' \n'.join(inputFiles.readlines())
in an attempt to force hard line breaks into the string between each list item. This does not work (edit: elaboration here), meaning the inputFileContent string does not have line breaks even though I put '\n' between the list elements. From my understanding, the readlines() function puts the individual lines into individual elements of a list.
Any suggestions? Thank you

Use inputFiles.read() which creates a string. Does that help?

The 'join' should have worked. Your problem may be that the writing of the string ignores newline characters. You could maybe try '\r\n'.join(...)
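To compare the two suggestions, here is a small sketch using io.StringIO as a stand-in for the open file (the sample text is made up). It shows that read() already returns one string with the line breaks in it, and that the items returned by readlines() keep their trailing '\n', so joining them with an empty separator gives the same string.

```python
import io

# Hypothetical stand-in for the open input file.
text = "first line\nsecond line\nthird line\n"

as_string = io.StringIO(text).read()      # one str, line breaks included
as_list = io.StringIO(text).readlines()   # list of str, each ending in '\n'

# The list items already carry their newlines, so an empty separator
# is enough to rebuild the original string.
rejoined = "".join(as_list)

print(repr(as_string))
print(as_list)
```

If the line breaks are present in the string but still missing from the image, the function that draws the text is the more likely place to look.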
