Get a list into a string - Python

import csv
import os
filename = 'result'
column = 'Latitude'
os.system("wget http://earthquake.usgs.gov/earthquakes/feed/csv/1.0/hour")
#csv_data = csv.reader(downloaded_data)
file = csv.reader(open('/home/coperthought/Documents/hour' , 'rb'), delimiter='\t')
data = [] # This will contain our data
# Create a csv reader object to iterate through the file
reader = csv.reader( open( '/home/coperthought/Documents/hour' , 'rU'), delimiter=',', dialect='excel')
hrow = reader.next() # Get the top row
idx = hrow.index(column) # Find the column of the data you're looking for
for row in reader: # Iterate the remaining rows
    data.append( row[idx] )
os.remove ( '/home/coperthought/Documents/hour')
print data
Then data is
['63.190', '63.730', '59.935', '38.805', '61.416', '63.213']
How can I get this into a string? I know join is one option.
Thanks

how can I get this into a string. Join is one..
Just use 'whateveryouwanthere'.join(data).
You already mentioned the join method, so if you do not want a solution involving join, you need to explain what the problem with it is.
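As a minimal sketch of the join approach (written in Python 3 syntax; the ', ' separator is only an illustrative choice), using the list printed in the question:

```python
# Values copied from the question's output; they are already strings.
data = ['63.190', '63.730', '59.935', '38.805', '61.416', '63.213']

# str.join concatenates the items, placing the separator between them.
joined = ', '.join(data)
print(joined)

# join only accepts strings, so convert first if the list holds numbers.
numbers = [63.19, 63.73]  # hypothetical numeric values
joined_numbers = ', '.join(str(n) for n in numbers)
print(joined_numbers)
```

Any string can serve as the separator, including an empty one.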

Related

Merge rows in a CSV to a column

I am new to Python. I have a CSV file with more than 1000 rows, and I want to merge particular rows and move them into another column. Can anyone help?
This is the source csv file I have:
I want to move the emails under the members column, separated by commas, like this image:
To read csv files in Python, you can use the csv module. This code does the merging you're looking for.
import csv
output = [] # this will store a list of new rows
with open('test.csv') as f:
    reader = csv.reader(f)
    # read the first line of the input as the headers
    header = next(reader)
    output.append(header)
    # we will build up groups and their emails
    emails = []
    group = []
    for row in reader:
        if len(row) > 1 and row[1]: # "UserGroup" is given
            if group:
                group[-1] = ','.join(emails)
            group = row
            output.append(group)
            emails = []
        else: # it isn't, assume this is an email
            emails.append(row[0])
    group[-1] = ','.join(emails)
# now write a new file
with open('new.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(output)

Efficient and Fast merger for multiple .CSV files

I have more than 15M tweets and I need to merge the ID and Text after dropping duplicates. I need the most efficient way to do this, as it is taking very long to complete.
frames = []
missed = 0
for q in query_list:
    hashtag = q + '.csv'
    try:
        file_data = pd.read_csv(path + hashtag ,encoding='utf-8')
        frames.append(file_data)
    except:
        missed+= 1
        continue
df = pd.concat(frames)
df = df[['id','text']]
df = df.drop_duplicates()
df.to_csv('row_tweets.csv',index=False)
If you want unique pairs of (id, text), I'd just do it in pure python using set for easy de-duplication, and csv readers/writers:
import csv
id_text_pairs = set() # set of (id, text) pairs
missed = 0
for q in query_list:
    hashtag = q + '.csv'
    try:
        with open(path + hashtag, 'r') as infile:
            reader = csv.DictReader(infile)
            for row in reader:
                id_text_pairs.add( (row['id'], row['text']) ) # this won't add duplicates
    except:
        missed += 1
        continue
with open('row_tweets.csv', 'w') as outfile:
    col_names = ['id', 'text']
    writer = csv.DictWriter(outfile, fieldnames=col_names)
    writer.writeheader() # First line is the 'id,text' header
    for id, text in id_text_pairs:
        writer.writerow({'id': id, 'text': text}) # write each id,text pair
That should do it, and I believe it will be more efficient at de-duping than one huge DataFrame call at the end. Note that if your texts contain commas, the csv writer quotes those fields by default; if you would rather avoid quoting, you can output tab-delimited data using the DictWriter argument delimiter='\t', or adjust the quotechar and quoting arguments (see the csv module documentation).
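To illustrate how the csv module treats commas inside a field, here is a small sketch in Python 3. It writes to an in-memory buffer instead of row_tweets.csv, and the sample row is made up for the demonstration; with the default settings the field containing a comma is wrapped in quotes and reads back intact.

```python
import csv
import io

# In-memory buffer standing in for the output file (illustrative only).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['id', 'text'])
writer.writeheader()
writer.writerow({'id': '1', 'text': 'hello, world'})  # text contains a comma

written = buffer.getvalue()
print(written)

# Reading the data back recovers the original text as a single field.
rows = list(csv.DictReader(io.StringIO(written)))
print(rows[0]['text'])
```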

How to find if any element within a list is present within a row in a CSV file when using a for loop

import csv
with open('example.csv', 'r') as f:
    csvfile = csv.reader(f, delimiter = ',')
    client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']
    for row in csvfile:
        if row not in client_email:
            print row
Assume the code is properly indented in blocks; the formatting is lost when I copy and paste. I've created a list of company email domain names (as seen in the example), and I've created a loop to print out every row in my CSV that is not present in the list. Other columns in the CSV file include first name, second name, company name etc., so it is not limited to only emails.
The problem is that when I'm testing, it prints rows containing emails from the list, e.g. jackson#example.co.uk.
Any ideas?
In your example, row refers to a list of strings. So each row is ['First name', 'Second name', 'Company Name'] etc.
You're currently checking whether any column is exactly one of the elements in your client_email.
I suspect you want to check whether the text of any column contains one of the elements in client_email.
You could use another loop:
for row in csvfile:
    for column in row:
        # check if the column contains any of the email domains
        if any(domain in column for domain in client_email):
            print row
            break # one matching column is enough; move on to the next row
To check if a string contains any strings in another list, I often find this approach useful:
s = "xxabcxx"
stop_list = ["abc", "def", "ghi"]
if any(elem in s for elem in stop_list):
    pass
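The difference between exact membership and a substring test can be seen in a short sketch (Python 3 syntax). The row below is hypothetical, modelled on the columns described in the question:

```python
# Hypothetical row and domain list, modelled on the question's example.
row = ['Jackson', 'Smith', 'Example Ltd', 'jackson#example.co.uk']
client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']

# Exact membership: is any domain equal to a whole cell? Not in this row.
exact_match = any(domain in row for domain in client_email)

# Substring test: does any cell contain one of the domains?
substring_match = any(domain in cell for cell in row for domain in client_email)

print(exact_match, substring_match)
```

Only the substring test detects the address, which is why the original `row not in client_email` check never filters anything out.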
One way to check is to see whether the set of client_email and the set of values in row have common elements (by changing the if condition in the loop):
import csv
with open('example.csv', 'r') as f:
    csvfile = csv.reader(f, delimiter = ',')
    client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']
    for row in csvfile:
        if (set(row) & set(client_email)):
            print (row)
You can also use any as follows:
import csv
with open('untitled.csv', 'r') as f:
    csvfile = csv.reader(f, delimiter = ',')
    client_email = ['#example.co.uk', '#moreexamples.com', 'lastexample.com']
    for row in csvfile:
        if any(item in row for item in client_email):
            print (row)
Another possible way,
import csv
data = csv.reader(open('example.csv', 'r'))
emails = {'#example.co.uk', '#moreexamples.com', 'lastexample.com'}
for row in data:
    if any(email in cell for cell in row for email in emails):
        print(row)

Python append column header & append column values from list to csv

I am trying to append column header (hard-coded) and append column values from list to an existing csv. I am not getting the desired result.
Method 1 is appending results on an existing csv file. Method 2 clones a copy of existing csv into temp.csv. Both methods don't get me the desired output I am looking for. In Results 1, it just appends after the last row cell. In results 2, all list values append on each row. Expected results is what I am looking for.
I have included my code below. Appreciate any input or guidance.
Existing CSV Test.csv
Type,Id,TypeId,CalcValues
B,111K,111Kequity(long) 111K,116.211768
C,111N,B(long) 111N,0.106559957
B,111J,c(long) 111J,20.061634
Code - Method 1 & 2
final_results = ['0.1065599566767107', '0.0038113334533441123', '20.061623176440904']
# Method1
csvfile = "test.csv"
with open(csvfile, "a") as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in final_results:
        writer.writerow([val])
# Method2
with open("test.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    all = []
    row = next(reader)
    row.insert(5, 'Results')
    all.append(row)
    for row in reader:
        for i in final_results:
            print type(i)
            row.insert(5, i)
        all.append(row)
    writer.writerows(all)
Results for Method 1
Type,Id,TypeId,CalcValues
B,111K,111Kequity(long) 111K,116.211768
C,111N,B(long) 111N,0.106559957
B,111J,c(long) 111J,20.0616340.1065599566767107
0.0038113334533441123
20.061623176440904
Results for Method 2
Type,Id,TypeId,CalcValues,Results
B,111K,111Kequity(long) 111K,116.211768,0.1065599566767107,20.061623176440904,0.0038113334533441123
C,111N,B(long) 111N,0.106559957,0.1065599566767107,20.061623176440904,0.0038113334533441123
B,111J,c(long) 111J,20.061634,0.1065599566767107,20.061623176440904,0.0038113334533441123
Expected Result
Type,Id,TypeId,CalcValues,ID
B,111K,111Kequity(long) 111K,116.211768,0.1065599566767107
C,111N,B(long) 111N,0.106559957,20.061623176440904
B,111J,c(long) 111J,20.061634,0.0038113334533441123
The first method is bound to fail: you don't want to add new lines but new columns. So back to the second method:
You insert the title correctly, but then you loop through all the results on every row, whereas you need to take one result per row.
For this, I create an iterator from the final_results list (with __iter__()), then call next() on it and append the value to each row (no need to insert at the end, just append).
I removed the big all list, because 1) you can write one line at a time, which saves memory, and 2) all is a built-in function, so avoid using it as a variable name.
final_results = ['0.1065599566767107', '0.0038113334533441123', '20.061623176440904']
# Method2
with open("test.csv", 'rb') as input, open('temp.csv', 'wb') as output:
    reader = csv.reader(input, delimiter = ',')
    writer = csv.writer(output, delimiter = ',')
    row = next(reader) # read title line
    row.append("Results")
    writer.writerow(row) # write enhanced title line
    it = final_results.__iter__() # create an iterator on the result
    for row in reader:
        if row: # avoid empty lines that usually lurk undetected at the end of the files
            try:
                row.append(next(it)) # add a result to current row
            except StopIteration:
                row.append("N/A") # not enough results: pad with N/A
            writer.writerow(row)
result:
Type,Id,TypeId,CalcValues,Results
B,111K,111Kequity(long) 111K,116.211768,0.1065599566767107
C,111N,B(long) 111N,0.106559957,0.0038113334533441123
B,111J,c(long) 111J,20.061634,20.061623176440904
Note: had we included "Results" as the first element of the final_results variable, we wouldn't even have needed to process the first line differently.
Note 2: the values seem wrong: final_results is not in the same order as the expected output. Also, the Results column is named ID in the expected output, but that's easy to correct.
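The same one-result-per-row idea can also be sketched without the try/except, by padding the results with itertools. This is only an alternative sketch in Python 3: it uses in-memory buffers as stand-ins for test.csv and temp.csv, and the "N/A" padding mirrors the answer above.

```python
import csv
import io
from itertools import chain, repeat

# In-memory stand-ins for test.csv and temp.csv (illustrative only).
source = io.StringIO(
    "Type,Id,TypeId,CalcValues\n"
    "B,111K,111Kequity(long) 111K,116.211768\n"
    "C,111N,B(long) 111N,0.106559957\n"
    "B,111J,c(long) 111J,20.061634\n"
)
final_results = ['0.1065599566767107', '0.0038113334533441123', '20.061623176440904']

target = io.StringIO()
reader = csv.reader(source)
writer = csv.writer(target, lineterminator='\n')

# Copy the header and add the new column name.
writer.writerow(next(reader) + ['Results'])

# Results followed by an endless supply of "N/A" for any extra rows.
padded = chain(final_results, repeat('N/A'))

# zip stops when the rows run out; blank lines are skipped.
for row, result in zip((r for r in reader if r), padded):
    writer.writerow(row + [result])

output_text = target.getvalue()
print(output_text)
```

Because zip stops at the shorter input, surplus results are ignored while missing ones are filled with "N/A".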
import csv
HEADER = "Type,Id,TypeId,CalcValues,ID"
final_results = ['0.1065599566767107', '20.061623176440904', '0.0038113334533441123']
with open("test.csv") as inputs, open("tmp.csv", "wb") as outputs:
    reader = csv.reader(inputs, delimiter=",")
    writer = csv.writer(outputs, delimiter=",")
    reader.next() # ignore header line
    writer.writerow(HEADER.split(","))
    for row in reader:
        writer.writerow(row + [final_results.pop(0)])
I store the header fields in HEADER, swap the 2nd and 3rd elements of final_results, and use pop(0) to remove and return the first element of final_results for each row.
output:
Type,Id,TypeId,CalcValues,ID
B,111K,111Kequity(long) 111K,116.211768,0.1065599566767107
C,111N,B(long) 111N,0.106559957,20.061623176440904
B,111J,c(long) 111J,20.061634,0.0038113334533441123

Reading column names alone in a csv file

I have a csv file with the following columns:
id,name,age,sex
Followed by a lot of values for the above columns.
I am trying to read the column names alone and put them inside a list.
I am using Dictreader and this gives out the correct details:
with open('details.csv') as csvfile:
    i=["name","age","sex"]
    re=csv.DictReader(csvfile)
    for row in re:
        for x in i:
            print row[x]
But what I want is for the list of columns ("i" in the above case) to be parsed automatically from the input csv rather than hardcoded in a list.
with open('details.csv') as csvfile:
    rows=iter(csv.reader(csvfile)).next()
    header=rows[1:]
    re=csv.DictReader(csvfile)
    for row in re:
        print row
        for x in header:
            print row[x]
This gives an error
KeyError: 'name'
in the line print row[x]. Where am I going wrong? Is it possible to fetch the column names using DictReader?
Though you already have an accepted answer, I figured I'd add this for anyone else interested in a different solution-
Python's DictReader object in the CSV module (as of Python 2.6 and above) has a public attribute called fieldnames.
https://docs.python.org/3.4/library/csv.html#csv.csvreader.fieldnames
An implementation could be as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
    d_reader = csv.DictReader(f)
    #get fieldnames from DictReader object and store in list
    headers = d_reader.fieldnames
    for line in d_reader:
        #print value in MyCol1 for each row
        print(line['MyCol1'])
In the above, d_reader.fieldnames returns a list of your headers (assuming the headers are in the top row).
Which allows...
>>> print(headers)
['MyCol1', 'MyCol2', 'MyCol3']
If your headers are in, say the 2nd row (with the very top row being row 1), you could do as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
    #you can eat the first line before creating DictReader.
    #if no "fieldnames" param is passed into
    #DictReader object upon creation, DictReader
    #will read the upper-most line as the headers
    f.readline()
    d_reader = csv.DictReader(f)
    headers = d_reader.fieldnames
    for line in d_reader:
        #print value in MyCol1 for each row
        print(line['MyCol1'])
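A likely cause of the KeyError in the question is that the plain csv.reader consumes the header line first, so the DictReader created afterwards takes the first data row as its field names. The following Python 3 sketch shows both behaviours; the sample contents are hypothetical and only match the question's column layout.

```python
import csv
import io

# Hypothetical file contents matching the question's column layout.
sample = "id,name,age,sex\n1,Ann,30,F\n2,Bob,25,M\n"

# Letting DictReader read the header itself: fieldnames holds the column names.
reader = csv.DictReader(io.StringIO(sample))
headers = reader.fieldnames
print(headers)

# If another reader consumes the header line first, DictReader treats the
# next line (a data row) as the header, so 'name' is no longer a key.
f = io.StringIO(sample)
next(csv.reader(f))  # header line consumed here
shifted = csv.DictReader(f)
print(shifted.fieldnames)
```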
You can read the header by using the next() function, which returns the next row of the reader's iterable object as a list. Then you can add the rest of the file's content to a list.
import csv
with open("C:/path/to/file.csv", "rb") as f:
    reader = csv.reader(f)
    i = reader.next()
    rest = list(reader)
Now i has the column's names as a list.
print i
>>>['id', 'name', 'age', 'sex']
Also note that reader.next() does not work in Python 3. Instead, use the built-in next() to get the first line of the csv immediately after creating the reader, and open the file in text mode rather than binary mode, like so:
import csv
with open("C:/path/to/file.csv", "r", newline="") as f:
    reader = csv.reader(f)
    i = next(reader)
    print(i)
>>>['id', 'name', 'age', 'sex']
The csv.DictReader object exposes an attribute called fieldnames, and that is what you'd use. Here's example code, followed by input and corresponding output:
import csv
file = "/path/to/file.csv"
with open(file, mode='r', encoding='utf-8') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print([col + '=' + row[col] for col in reader.fieldnames])
Input file contents:
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
00,01,02,03,04,05,06,07,08,09
10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29
30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49
50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69
70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89
90,91,92,93,94,95,96,97,98,99
Output of print statements:
['col0=00', 'col1=01', 'col2=02', 'col3=03', 'col4=04', 'col5=05', 'col6=06', 'col7=07', 'col8=08', 'col9=09']
['col0=10', 'col1=11', 'col2=12', 'col3=13', 'col4=14', 'col5=15', 'col6=16', 'col7=17', 'col8=18', 'col9=19']
['col0=20', 'col1=21', 'col2=22', 'col3=23', 'col4=24', 'col5=25', 'col6=26', 'col7=27', 'col8=28', 'col9=29']
['col0=30', 'col1=31', 'col2=32', 'col3=33', 'col4=34', 'col5=35', 'col6=36', 'col7=37', 'col8=38', 'col9=39']
['col0=40', 'col1=41', 'col2=42', 'col3=43', 'col4=44', 'col5=45', 'col6=46', 'col7=47', 'col8=48', 'col9=49']
['col0=50', 'col1=51', 'col2=52', 'col3=53', 'col4=54', 'col5=55', 'col6=56', 'col7=57', 'col8=58', 'col9=59']
['col0=60', 'col1=61', 'col2=62', 'col3=63', 'col4=64', 'col5=65', 'col6=66', 'col7=67', 'col8=68', 'col9=69']
['col0=70', 'col1=71', 'col2=72', 'col3=73', 'col4=74', 'col5=75', 'col6=76', 'col7=77', 'col8=78', 'col9=79']
['col0=80', 'col1=81', 'col2=82', 'col3=83', 'col4=84', 'col5=85', 'col6=86', 'col7=87', 'col8=88', 'col9=89']
['col0=90', 'col1=91', 'col2=92', 'col3=93', 'col4=94', 'col5=95', 'col6=96', 'col7=97', 'col8=98', 'col9=99']
How about
with open(csv_input_path + file, 'r') as ft:
    header = ft.readline() # read only first line; returns string
    header_list = header.rstrip('\r\n').split(',') # drop the line ending, then split; returns list
I am assuming your input file is CSV format.
If you use pandas instead, it takes more time on a large file, because by default read_csv loads the entire file into a DataFrame.
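One caveat with splitting the first line on commas: it assumes no column name contains a quoted comma. A short Python 3 sketch with a hypothetical header line shows where a plain split and the csv module disagree:

```python
import csv
import io

# Hypothetical header line in which one column name contains a comma.
header_line = 'id,"name, full",age\n'

# Plain split cuts the quoted name in two and keeps the line ending.
naive = header_line.split(',')

# csv.reader honours the quotes and strips the line ending.
parsed = next(csv.reader(io.StringIO(header_line)))

print(naive)
print(parsed)
```

For simple headers without quoting, both approaches give the same column names once the line ending is removed.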
Here is how to get all the column names from a csv file using the pandas library.
First we read the file.
import pandas as pd
file = pd.read_csv('details.csv')
Then, to get all the column names from the input file as a list, use:
columns = list(file.head(0))
Thanking Daniel Jimenez for his perfect solution to fetch column names alone from my csv, I extend his solution to use DictReader so we can iterate over the rows using column names as indexes. Thanks Jimenez.
with open('myfile.csv') as csvfile:
    rest = []
    with open("myfile.csv", "rb") as f:
        reader = csv.reader(f)
        i = reader.next()
        i=i[1:]
        re=csv.DictReader(csvfile)
        for row in re:
            for x in i:
                print row[x]
Here is code to print only the headers (column names) of the csv file.
import csv
HEADERS = next(csv.reader(open('filepath.csv')))
print (HEADERS)
Another method with pandas
import pandas as pd
HEADERS = list(pd.read_csv('filepath.csv').head(0))
print (HEADERS)
import pandas as pd
data = pd.read_csv("data.csv")
cols = data.columns
I literally just wanted the first row of my data which are the headers I need and didn't want to iterate over all my data to get them, so I just did this:
with open(data, 'r', newline='') as csvfile:
    t = 0
    for i in csv.reader(csvfile, delimiter=',', quotechar='|'):
        if t > 0:
            break
        else:
            dbh = i
            t += 1
Using pandas is also an option.
But instead of loading the full file into memory, you can retrieve only the first chunk of it to get the field names by passing iterator=True.
import pandas as pd
file = pd.read_csv('details.csv', iterator=True)
column_names_full=file.get_chunk(1)
column_names=[column for column in column_names_full]
print(column_names)
