I have this txt file and I need to parse it in a Python script.
00:08:e3:ff:fd:90 - 172.21.152.1
70:70:8b:85:67:80 - 172.21.155.4
I want to separate out and store only the MAC addresses in an array. How can I do this?
You can use the built-in function open to read the file, giving it the path to the file and passing the "r" argument to indicate that you want to read it. Then use the readlines method on the file handle, which returns a list of lines. For each line you can split the text on the dash character; the MAC address will be the first element of the list returned by split.
with open("file.txt", "r") as f :
macs = [line.split(" - ")[0] for line in f.readlines()]
You could achieve this with pandas as well
import pandas as pd
macs = pd.read_table('file.txt', header=None, usecols=[0], delim_whitespace=True)
I think it would be unnecessary to use pandas for this purpose alone. However, if you are already using pandas anyway, I would prefer this approach.
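If you do go the pandas route and want a plain Python list rather than a DataFrame (the question asks for an array of just the MAC addresses), you can convert the single column; a minimal sketch building on the snippet above:

import pandas as pd

# read only the first whitespace-delimited column and turn it into a list
macs = pd.read_table('file.txt', header=None, usecols=[0],
                     delim_whitespace=True)[0].tolist()

(On recent pandas versions delim_whitespace is deprecated, in which case sep=r'\s+' does the same thing.)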
Related
I want to read a CSV file without using the open() function.
file.txt
'xxr'|'wer'|'xxr'|'xxr'
'xxt'|'dse'|'xxt'|'xxt'
'xxv'|'cad'|'xxv'|'xxv'
'xxe'|'sdf'|'xxe'|'xxe'
'xxw'|'sder'|'xxw'|'xxw'
'xxz'|'csd'| 'xxz'| 'xxz'
I've tried this, but it doesn't open a file; it just uses 'file.txt' as a string:
file = ('file.txt')
reader = csv.reader(file,delimiter="|")
mylist = list(reader)
I cannot use the regular with open('file.txt', 'r') ....
Reason: The customer is sending this data pipeline to a platform that doesn't support the open() function, due to directory function restrictions (not a permissions issue).
I also cannot read it as a DataFrame, because these are unstructured lists, and this template is much simpler.
This is a conversion from a Python script to Data Dream, with Spark. Kind of odd... but they can reproduce pandas and numpy. They just can't use the open() function or with.
Any ideas?
You could use fileinput. I'm unsure how the module deals with opening files under the hood, or whether it differs from open in that respect, but it does allow multiple files to be read in order through one stream and it seems to allow more flexibility in how the file is read:
import csv
import fileinput

with fileinput.input('file.txt') as f:
    reader = csv.reader(f, delimiter="|")
    mylist = list(reader)
There is nothing wrong with:
reader = csv.reader(open(file), delimiter="|")
Or with pandas:
import pandas as pd
mylist = pd.read_csv(file, sep="|", header=None).to_numpy().tolist()
I am a complete noob at Python so I apologize if the solution is obvious.
I am trying to read some .csv field data in Python for processing. Currently I have:
data = pd.read_csv('somedata.csv', sep=' |,', engine='python', usecols=(range(0,10)), skiprows=155, skipfooter=3)
However, depending on whether the data collection was interrupted, the last few lines of the file may be something like:
#data_end
Run Complete
Or
Run Interrupted
ERROR
A bunch of error codes
Hence I can't just use skipfooter=3. Is there a way for Python to detect the length of the footer and skip it? Thank you.
You can first read the content of your file as a plain text file into a Python list, remove the lines that don't contain the expected number of separators, and then transform the list into an IO stream. This IO stream is then passed on to pd.read_csv as if it were a file object.
The code might look like this:
from io import StringIO
import pandas as pd
# adjust these variables to meet your requirements:
number_of_columns = 11
separator = ","  # the literal separator to count in each raw line
# read the content of the file as plain text:
with open("somedata.csv", "r") as infile:
    raw = infile.readlines()
# keep only the rows that contain the expected number of separators
# (a row with n columns contains n - 1 separators):
raw = [x for x in raw if x.count(separator) == number_of_columns - 1]
# turn the list into an IO stream (after joining the rows into a big string):
stream = StringIO("".join(raw))
# pass the stream to pd.read_csv() as if it were a file object:
df = pd.read_csv(stream, sep=' |,', engine='python',
                 usecols=range(0, 10), skiprows=155)
If you use Python 2.7, you have to replace the first line, from io import StringIO, with the following two lines:
from __future__ import unicode_literals
from cStringIO import StringIO
This is because StringIO requires a unicode string (which is not the default in Python 2.7), and because the StringIO class lives in a different module in Python 2.7.
I think you simply have to resort to counting the commas on each line and manually finding the last correct one. I'm not aware of a read_csv parameter that automates that.
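A rough sketch of that manual approach (the expected comma count is an assumption about the data; the file name, sep, usecols and skiprows values are taken from the question):

from io import StringIO
import pandas as pd

expected_commas = 9  # assumption: 10 data columns separated by commas

with open('somedata.csv', 'r') as f:
    lines = f.readlines()

# index of the last line that still looks like a data row
# (assumes the file contains at least one such line)
last_good = max(i for i, line in enumerate(lines)
                if line.count(',') >= expected_commas)

df = pd.read_csv(StringIO(''.join(lines[:last_good + 1])),
                 sep=' |,', engine='python',
                 usecols=range(0, 10), skiprows=155)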
So I basically just want to have a list of all the pixel colour values that overlap written in a text file so I can then access them later.
The only problem is that the text file ends up with (set([ or whatever written into it.
Here's my code:
import cv2
import numpy as np
import time
om=cv2.imread('spectrum1.png')
om=om.reshape(1,-1,3)
om_list=om.tolist()
om_tuple={tuple(item) for item in om_list[0]}
om_set=set(om_tuple)
im=cv2.imread('RGB.png')
im=cv2.resize(im,(100,100))
im= im.reshape(1,-1,3)
im_list=im.tolist()
im_tuple={tuple(item) for item in im_list[0]}
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Also, if I run this program again but with a different picture for comparison, will it append the data or overwrite it? It's kinda hard to tell when just looking at numbers.
If you replace these lines:
im=cv2.imread('RGB.png')
File= open('Weedlist', 'w')
File.write(str(ColourCount))
with:
import sys
im=cv2.imread(sys.argv[1])
open(sys.argv[1]+'Weedlist', 'w').write(str(list(ColourCount)))
you will get a new file for each input file and also you don't have to overwrite the RGB.png every time you want to try something new.
Files opened with mode 'w' will be overwritten. You can use 'a' to append.
You opened the file with the 'w' mode, write mode, which will truncate (empty) the file when you open it. Use the 'a' append mode if you want data to be added to the end each time.
You are writing the str() conversion of a set object to your file:
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Don't use str to convert the whole object; format your data to a string you find easy to read back again. You probably want to add a newline too if you want each new entry to be added on a new line. Perhaps you want to sort the data too, since a set lists items in an order determined by implementation details.
If comma-separated works for you, use str.join(); your set contains tuples of integer numbers, and it sounds as if you are fine with the repr() output per tuple, so we can re-use that:
with open('Weedlist', 'a') as outputfile:
    output = ', '.join([str(tup) for tup in sorted(ColourCount)])
    outputfile.write(output + '\n')
I used with there to ensure that the file object is automatically closed again after you are done writing; see Understanding Python's with statement for further information on what this means.
Note that if you plan to read this data again, the above is not going to be all that efficient to parse again. You should pick a machine-readable format. If you need to communicate with an existing program, you'll need to find out what formats that program accepts.
If you are programming that other program as well, pick a format that the other programming language supports. JSON is widely supported, for example (use the json module and convert your set to a list first; json.dump(sorted(ColourCount), fileobj) followed by fileobj.write('\n') would produce newline-separated JSON objects).
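A minimal sketch of that JSON idea (one JSON array per line, appended to the same Weedlist file used above):

import json

with open('Weedlist', 'a') as fileobj:
    json.dump(sorted(ColourCount), fileobj)
    fileobj.write('\n')

# reading it back later: one list of colour triplets per line
with open('Weedlist') as fileobj:
    sets = [json.loads(line) for line in fileobj]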
If that other program is coded in Python, consider using the pickle module, which writes Python objects to a file efficiently in a format the same module can load again:
import pickle

with open('Weedlist', 'ab') as picklefile:
    pickle.dump(ColourCount, picklefile)
and reading is as easy as:
sets = []
with open('Weedlist', 'rb') as picklefile:
    while True:
        try:
            sets.append(pickle.load(picklefile))
        except EOFError:
            break
See Saving and loading multiple objects in pickle file? as to why I use a while True loop there to load multiple entries.
How would you like the data to be written? Replace the final line by
File.write(str(list(ColourCount)))
Maybe you like that more.
If you run that program again, it will overwrite the previous content of the file. If you prefer to append the data, open the file with:
File= open('Weedlist', 'a')
I've got a CSV file, and whenever I access the elements I get
aapl,2001-12-4,,,,,
The commas at the end are causing my functions to not work properly in my other application. How can I remove these to get rid of any additional commas after the elements?
For example, the above after correction would be
aapl,2001-12-4
Anything will help, thanks so much.
Why would you remove the trailing commas? Typically the commas with no value between them would mean that the particular field is empty.
I think it would be better not to modify the line/file, but instead to use a method in your application that splits the line on commas and drops the empty fields. Then do what you need to do with the list of data:
import csv
csv_file = file('test.csv', 'rU')
csv_list = csv.reader(csv_file)
for k in csv_list:
    print filter(None, k)
>>>
['aapl', '2001-12-4']
Here's how to remove the excess commas from the right hand side of a string:
In [2]: mystring = '1,2,3,4,"Hello!",,,,,,,,,'
In [3]: mystring.rstrip(',')
Out[3]: '1,2,3,4,"Hello!"'
Expand on this to perform the comma-stripping operation for each line of a file:
Open the original .csv file.
Process one line, removing excess commas.
Write the processed line to a new file.
Repeat until your original .csv file is completely processed.
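Putting those steps together, a minimal sketch (the input and output file names are placeholders):

with open('original.csv', 'r') as src, open('cleaned.csv', 'w') as dst:
    for line in src:
        # drop the newline first so the trailing commas can be stripped
        dst.write(line.rstrip('\n').rstrip(',') + '\n')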
Use str.rstrip:
>>> 'aapl,2001-12-4,,,,,'.rstrip(',')
'aapl,2001-12-4'
If you can use sed, then you can do it this way from the command line:
sed -re 's/,*$//g' temp.csv
One of the simplest tricks is to use the parameter usecols in the read_csv function to limit what columns you read in:
For example
import pandas as pd
from google.colab import files
import io
uploaded = files.upload()
x_train = pd.read_csv(io.StringIO(uploaded['x_train.csv'].decode('utf-8')),
                      skiprows=1, usecols=range(10), header=None)
This limits the reader to the first 10 columns, since the extra commas start at column 11.
I'm trying to read a CSV file in Python, so that I can then find the average of the values in one of the columns using numpy.average.
My script looks like this:
import os
import numpy
import csv
listing = os.listdir('/path/to/directory/of/files/i/need')
os.chdir('/path/to/directory/of/files/i/need')
for file in listing[1:]:
    r = csv.reader(open(file, 'rU'))
    for row in r:
        if len(row) < 2: continue
        if float(row[2]) <= 0.05:
            avg = numpy.average(float(row[2]))
            print avg
but I keep getting the error ValueError: invalid literal for float(). The csv reader seems to be reading the numbers as strings and won't let me convert them to floats. Any suggestions?
Judging by the comments, your program is running into problems with the headers.
Two solutions to this are to call r.next() (next(r) in Python 3), which skips a line, before your for loop, or to use the DictReader class. The advantage of the DictReader class is that you can treat each row as a dictionary instead of a tuple, which may make for more readability in some cases; you can pass the list of headers to its constructor, and if you omit it the first row of the file is used as the field names.
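For example, a minimal sketch of the first option (the file name is a placeholder, and the values are collected into a list so numpy.average has something to average over):

import csv
import numpy

values = []
with open('somefile.csv', 'r') as f:  # placeholder file name
    r = csv.reader(f)
    next(r)  # skip the header row
    for row in r:
        if len(row) < 2:
            continue
        if float(row[2]) <= 0.05:
            values.append(float(row[2]))

avg = numpy.average(values)  # average over all collected values
print(avg)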
change:
float(row[2])
to:
float(row[2].strip("'\""))