Based on the following link:
I can easily format a single MAC, but I'm having an issue trying to do multiple from a CSV file. When I run the script, it converts them, but it converts each one about 6 times. If I add "return", then it only converts the first one 6 times.
def readfile_csv():
    with open('ap_macs.csv', 'r', encoding='utf-8-sig') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        for lines in csv_reader:
            data = lines[0]
            for i in range(0, 12, 2):
                format_mac = ':'.join(data[i:i + 2] for i in range(0, 12, 2))
                print(format_mac.swapcase())
Ideally, I'd love to be able to do this with Pandas and Excel but the indexing is killing me. Appreciate any help. Thank you.
ap_macs
A1B2C3D4E5F6
A1B2C3D4E5F7
A1B2C3D4E5F8
A1B2C3D4E5F9
a1b2c3d4e5f6
a1b2c3d4e5f7
a1b2c3d4e5f8
a1b2c3d4e5f9
You could use pandas for this. Note that pandas is overkill if all you're using it for is to read the csv.
import pandas as pd

df = pd.read_csv('ap_macs.csv')
# Slice the mac addresses into chunks
# This list will contain one `pd.Series` each for the second through last chunks
chunks = [df["ap_macs"].str[i:i+2] for i in range(2, 12, 2)]
# Then concatenate all the chunks, with a separator, to the first chunk
df["MAC"] = df['ap_macs'].str[0:2].str.cat(chunks, ":")
which gives:
ap_macs MAC
0 A1B2C3D4E5F6 A1:B2:C3:D4:E5:F6
1 A1B2C3D4E5F7 A1:B2:C3:D4:E5:F7
2 A1B2C3D4E5F8 A1:B2:C3:D4:E5:F8
3 A1B2C3D4E5F9 A1:B2:C3:D4:E5:F9
4 a1b2c3d4e5f6 a1:b2:c3:d4:e5:f6
5 a1b2c3d4e5f7 a1:b2:c3:d4:e5:f7
6 a1b2c3d4e5f8 a1:b2:c3:d4:e5:f8
7 a1b2c3d4e5f9 a1:b2:c3:d4:e5:f9
Of course, you can overwrite the ap_macs column if you want, but I created a new column for this demonstration.
If you want to use your csv.reader approach, the extra "for i in range(0, 12, 2):" loop is what prints each MAC six times; the join already iterates over the chunks, so you only need to build the string once per row, and then print it.
def readfile_csv():
    # csv_data = []
    with open('ap_macs.csv', 'r', encoding='utf-8-sig') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        for row in csv_reader:
            data = row[0]
            format_mac = ':'.join(data[i:i + 2] for i in range(0, 12, 2)).swapcase()
            print(format_mac)
            # csv_data.append(format_mac)
    # return csv_data
which will print:
a1:b2:c3:d4:e5:f6
a1:b2:c3:d4:e5:f7
a1:b2:c3:d4:e5:f8
a1:b2:c3:d4:e5:f9
A1:B2:C3:D4:E5:F6
A1:B2:C3:D4:E5:F7
A1:B2:C3:D4:E5:F8
A1:B2:C3:D4:E5:F9
Note that printing is not the same as returning data; if you actually want to use this data outside the function, you'll have to return it (uncomment the commented lines).
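To show the build-then-return pattern without the file I/O, here is a minimal sketch; the helper name format_macs is made up for this demo:

```python
def format_macs(raw_macs):
    """Build the list of formatted MACs first, then return it (instead of printing)."""
    formatted = []
    for data in raw_macs:
        # Split into 2-character chunks, join with ':', and flip the case
        mac = ':'.join(data[i:i + 2] for i in range(0, 12, 2)).swapcase()
        formatted.append(mac)
    return formatted

macs = format_macs(["A1B2C3D4E5F6", "a1b2c3d4e5f7"])
print(macs)  # ['a1:b2:c3:d4:e5:f6', 'A1:B2:C3:D4:E5:F7']
```

The caller can then iterate over or write out the returned list instead of relying on printed output.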
I want to read a CSV file generated by another script of mine, and I need to check 2 columns at the same time. The problem is that my loop stops because some lines have empty values and it can't reach the following value. For example:
HASH 1111
HASH 2222
HASH 3333
HASH 4444
HASH 5555
HASH
HASH
HASH 6666
I can't read past row 5, because rows 6 and 7 have empty values and I also need to read row 8. Here is my code:
import csv

with open('vts.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter=';')
    next(csvReader)
    VTs = []
    for row in csvReader:
        VT = row
        VTs.append(VT)
for row in VTs:
    print(row[0], row[4])
Is there any way to continue the listing without manually fixing the file in Excel?
First, a csv file is not an Excel file. The former is a delimited text file, the latter a binary one.
Next, your problem is not at reading time: the csv module easily accepts files with a variable number of fields across rows, including empty lines, which just produce empty lists for row.
So the fix is just:
...
for row in VTs:
    if len(row) > 4:
        print(row[0], row[4])
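You can see that behaviour with an in-memory sample; io.StringIO stands in for the open file here:

```python
import csv
import io

# Short rows and blank lines parse without error
sample = "HASH;1111\n\nHASH\nHASH;6666\n"
rows = list(csv.reader(io.StringIO(sample), delimiter=';'))
print(rows)  # [['HASH', '1111'], [], ['HASH'], ['HASH', '6666']]
```

The len(row) > 4 guard then simply skips the short and empty rows.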
There is no problem with your code except for the print(row[0], row[4]): the given data doesn't have that many columns. I tested your code as follows:
.py
import csv

with open('vts.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter=';')
    next(csvReader)
    VTs = []
    for row in csvReader:
        VT = row
        VTs.append(VT)
for row in VTs:
    print(row[0], row[1])
vts.csv
HASH;1111
HASH;2222
HASH;3333
HASH;4444
HASH;5555
HASH;
HASH;
HASH;6666
If your data is as in the sample, you don't really need delimiter=';', since a csv is comma-separated (hence the name), not semicolon-separated.
Anyway, you can just ignore rows where the intended column doesn't exist. Assuming your input is in proper csv format as below:
col1,col2
hash1,1111
hash2,2222
...
You can use csv.reader as what you did.
import csv

with open('vts.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter=';')
    next(csvReader)
    # csv.reader returns an iterator, which you can convert to a list as below
    VTs = list(csvReader)
for row in VTs:
    if len(row) == 2:
        print(row[0], row[1])
If your goal is only for inspecting the data, you can conveniently use pandas.DataFrame:
import pandas as pd
df = pd.read_csv("vts.csv")
print(df.dropna()) # This will print all rows without any missing data
I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(column) for row in incsv)
    least_value = min(data)
    print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.
You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)
    print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:
To skip the first line just call:
next(inf)
Files in Python are iterators over lines.
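A quick sketch of that, with io.StringIO standing in for the open file:

```python
import io

f = io.StringIO("header\nrow1\nrow2\n")
next(f)  # consumes and discards the first line
remaining = [line.strip() for line in f]  # iteration resumes at the second line
print(remaining)  # ['row1', 'row2']
```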
Borrowed from the Python Cookbook, a more concise template might look like this:
import csv

with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        # Process row ...
        ...
In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.
with open('all16.csv') as tmp:
    # Skip the first line (if any)
    next(tmp, None)
    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))
You would normally use next(incsv), which advances the iterator one row so that you skip the header. The other option (say you wanted to skip 30 rows) would be:
from itertools import islice

for row in islice(incsv, 30, None):
    # process the row
    ...
Use csv.DictReader instead of csv.reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the field names. You would then be able to access field values using row["1"], etc.
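A sketch of that approach with an in-memory sample; the header name "1" is only an assumption about what the file's first row contains:

```python
import csv
import io

sample = "0,1\n4.0,2.5\n3.0,1.5\n"
reader = csv.DictReader(io.StringIO(sample))  # the first row becomes the field names
least = min(float(row["1"]) for row in reader)
print(least)  # 1.5
```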
Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next()  # skip the first row
for row in csv_data:
    print(row)  # should print the second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__()  # skip the first row
for row in csv_data:
    print(row)  # should print the second row
The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file, but you need to explicitly call its has_header() method, passing it a sample of the file, to determine whether the file has a header line. If it does, skip the first row before iterating over the CSV rows. You can do it like this:

sniffer = csv.Sniffer()
sample = csvfile.read(1024)
csvfile.seek(0)
reader = csv.reader(csvfile, sniffer.sniff(sample))
if sniffer.has_header(sample):
    next(reader)  # discard the header row
for data_row in reader:
    # do something with the row
    ...
This might be a very old question, but with pandas we have a very easy solution:
import pandas as pd

data = pd.read_csv('all16.csv', skiprows=1, header=None)
data[1].min()

With skiprows=1 we skip the first row, and header=None stops pandas from promoting the next data row to a header; we can then find the least value in column 1 using data[1].min().
The newer pandas package might be more relevant than csv. The code below reads a CSV file, by default interpreting the first line as the column header, and finds the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()
Because this is related to something I was doing, I'll share it here.
What if we're not sure whether there's a header, and we don't feel like importing Sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you can just use an if statement:
import csv

# Let's say there are 4 columns
array = []
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    # read the first line
    first_line = next(csvreader)
    # My headers were just text. You can use any suitable condition here
    if len(first_line) == 4:
        array.append(first_line)
    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know the header of column index one, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])
For me the easiest way is to use range:
import csv

with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)

# Start with the data, skipping the header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
I would convert the csvreader to a list, then pop the first element:
import csv

with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)  # Convert to a list
    data.pop(0)  # Removes the first row

for row in data:
    print(row)
I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py
Just add [1:]. Example below:

data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]

That works for me in IPython.
Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header; there is also a bug with the UTF-8 BOM (the first character in the file).
This works for me using only the csv module:
import csv

def read_csv(csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove the UTF8 BOM (the first character in the file).
        txt = f.read()[1:]
    # Remove the header line.
    header = txt.splitlines()[:1]
    lines = txt.splitlines()[1:]
    # Convert to a list of rows.
    csv_rows = list(csv.reader(lines, delimiter=delimiter))
    for row in csv_rows:
        value = row[INDEX_HERE]
A simple solution is to use csv.DictReader():

import csv

def read_csv(file):
    with open(file, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            print(row["column_name"])  # Replace with the name of your column header.
I have a CSV file with 100 rows.
How do I read specific rows?
I want to read, say, the 9th line or the 23rd line, etc.
You could use a list comprehension to filter the file like so:

with open('file.csv') as fd:
    reader = csv.reader(fd)
    interestingrows = [row for idx, row in enumerate(reader) if idx in (28, 62)]
# now interestingrows contains the 28th and the 62nd rows after the header
Use list to grab all the rows at once as a list. Then access your target rows by their index/offset in the list. For example:
#!/usr/bin/env python

import csv

with open('source.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    rows = list(csv_reader)

print(rows[8])
print(rows[22])
You simply skip the necessary number of rows:
with open("test.csv", "rb") as infile:
    r = csv.reader(infile)
    for i in range(8):  # count from 0 to 7
        next(r)  # and discard the rows
    row = next(r)  # "row" contains row number 9 now
You could read all of them and then use normal lists to find them.
with open('bigfile.csv', 'rb') as longishfile:
    reader = csv.reader(longishfile)
    rows = [r for r in reader]
print rows[9]
print rows[88]
If you have a massive file, this can kill your memory, but if the file has fewer than 10,000 lines you shouldn't run into any big slowdowns.
You can do something like this :
with open('raw_data.csv') as csvfile:
    readCSV = list(csv.reader(csvfile, delimiter=','))

row_you_want = readCSV[index_of_row_you_want]
Maybe this could help you; using pandas you can easily do it with loc:

'''
Reading the 3rd record using pandas -> loc
Note: the index starts from 0,
so to read the third record use 3 - 1 -> 2.
loc[[2], :] -> read the row at index 2, and ':' -> the entire row's details
'''
import pandas as pd
df = pd.read_csv('employee_details.csv')
df.loc[[2],:]
I have a for loop that prints 4 details:
deats = soup.find_all('p')
for n in deats:
    print n.text
The output is 4 printed lines.
Instead of printing, what I'd like to do is have each 'n' written to a different column in a .csv file. Obviously, when I use a regular .write() it puts everything in the same column. In other words, how would I make it write each iteration of the loop to the next column?
You would build the csv row in a loop (or with a list comprehension); I'll show the explicit loop for ease of reading, and you can change it to a single list comprehension line yourself.

row = []
for n in deats:
    row.append(n.text)

Now you have row ready to write to the .csv file using csv.writer().
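For completeness, writing that row out could look like this (a sketch; output.csv and the sample values are placeholders):

```python
import csv

row = ["detail1", "detail2", "detail3", "detail4"]  # e.g. the collected n.text values
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(row)  # one row: each item lands in its own column
```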
Hey, try it like this:
import csv

csv_output = csv.writer(open("output.csv", "wb"))  # output.csv is the output file name!
csv_output.writerow(["Col1", "Col2", "Col3", "Col4"])  # Set the first row with all the column titles

temp = []
deats = soup.find_all('p')
for n in deats:
    temp.append(str(n.text))
csv_output.writerow(temp)
You use the csv module for this:
import csv
with open('output.csv', 'wb') as csvfile:
    opwriter = csv.writer(csvfile, delimiter=',')
    opwriter.writerow([n.text for n in deats])
extra_stuff = ["pie", "cake", "eat", "too"]
some_file.write(",".join(n.text for n in deats) + "," + ",".join(str(s) for s in extra_stuff))

Is that all you are looking for?
I am trying to read in a table from a .CSV file which should have 5 columns.
But, some rows have corrupt data..making it more than 5 columns.
How do I reject those rows and continue reading further ?
Using

temp = read_table(folder + r'\temp.txt', sep='\t')

just gives an error and stops the program.
I am new to Python...please help
Thanks
Look into using Python's csv module.
Without testing the damaged file it is difficult to say whether this will do the trick; however, csv.reader reads a csv file's rows as lists of strings, so you could check whether each list has 5 elements and proceed that way.
A code example:
import csv

out = []
with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    for row in reader:
        if len(row) == 5:
            out.append(row)
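If you'd rather stay with pandas, recent versions (1.3+) can drop the corrupt rows for you via on_bad_lines (older versions used error_bad_lines=False); a sketch with an in-memory sample:

```python
import io

import pandas as pd

# The third data line has 7 fields instead of 5 and will be skipped
sample = "a\tb\tc\td\te\n1\t2\t3\t4\t5\n1\t2\t3\t4\t5\t6\t7\n9\t8\t7\t6\t5\n"
df = pd.read_csv(io.StringIO(sample), sep='\t', on_bad_lines='skip')
print(len(df))  # 2 -- only the two well-formed data rows remain
```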