Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
'NBN3W27800D1.NS'
3326.5
3515.6
3326.5
'2021-04-13'
3515.6
3904.0
3970.3
27800
'2021-04-15'
'NBN3W27800P1.NS'
6.55
13.0
4.1
'2021-04-13'
9.25
5.6
6.55
27800
'2021-04-15'
My text file contains the data above. I want to store it as 2 rows in a MySQL database.
Here is what I assume you want (read each line of text and store it in a separate row):
with open('your_file.txt') as file:
    data = file.readlines()
for line in data:
    print(line)
Replace the print with whatever function you use to add to the database, and use a counter or enumerate() to set the row number, like so:
for row, line in enumerate(data):
    print(row, line)
Again, replace that print with the insertion into the database.
You may also want to add .strip() to remove the newlines:
for line in data:
    line = line.strip()
    print(line)
and again add the rest of the database code (which I don't know, so I am giving you print functions).
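If the goal is really two rows (one per instrument) rather than one row per line, the lines can be grouped in chunks first. This is only a sketch under a few assumptions: every record spans exactly 10 consecutive lines, the values are valid Python literals, and the table and column names (quotes, c0 to c9) are made up. SQLite from the standard library stands in for MySQL here so the example runs anywhere; MySQL drivers generally use %s placeholders instead of ?.

```python
import ast
import sqlite3

FIELDS_PER_ROW = 10  # assumption: every record spans 10 consecutive lines


def lines_to_rows(lines, width=FIELDS_PER_ROW):
    """Parse each non-empty line as a Python literal and group them into tuples."""
    values = [ast.literal_eval(line.strip()) for line in lines if line.strip()]
    if len(values) % width != 0:
        raise ValueError("number of values is not a multiple of the row width")
    return [tuple(values[i:i + width]) for i in range(0, len(values), width)]


# The 20 lines from the question, inlined so the sketch runs without a file.
sample = """'NBN3W27800D1.NS'
3326.5
3515.6
3326.5
'2021-04-13'
3515.6
3904.0
3970.3
27800
'2021-04-15'
'NBN3W27800P1.NS'
6.55
13.0
4.1
'2021-04-13'
9.25
5.6
6.55
27800
'2021-04-15'
"""
rows = lines_to_rows(sample.splitlines())

# Stand-in database: an in-memory SQLite table with hypothetical column names.
conn = sqlite3.connect(":memory:")
columns = ", ".join(f"c{i}" for i in range(FIELDS_PER_ROW))
conn.execute(f"CREATE TABLE quotes ({columns})")
placeholders = ", ".join("?" * FIELDS_PER_ROW)
conn.executemany(f"INSERT INTO quotes VALUES ({placeholders})", rows)
conn.commit()
stored = conn.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
print(stored)
```

With a real file, pass the result of file.readlines() to lines_to_rows() instead of the inlined sample.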
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 days ago.
I am trying to validate the header column names of input.csv against an existing schema_info.csv file.
input.csv
emp_id,emp_name,salary
1,siva,1000
2,ravi,200
3,kiran,800
schema_info
file_name,column_name,column_sequence
input.csv,EMP_ID,1
input.csv,EMP_NAME,2
input.csv,SALARY,3
I am trying to read the header of input.csv and compare its column names and their order with the schema info data, but I am unable to get the column order from the input file header, so I cannot compare it with the schema file data. Any suggestions?
input = sc.textFile("examples/src/main/resources/people.txt")
input = input.first()
parts = input.map(lambda l: l.split(","))
# Each line is converted to a tuple.
header_data = parts.map(lambda p: (p[0], p[1].strip()))
schema_info = spark.read.option("header","true").option("inferSchema","true").csv("/schema_info.csv")
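The comparison itself does not need Spark, since a header is a single line. The following is a rough sketch in plain Python using the csv module rather than the RDD API from the question. It assumes the comparison should be case-insensitive (the schema uses EMP_ID while the file uses emp_id); validate_header is a hypothetical helper name, and the sample data is inlined from the question.

```python
import csv
import io


def validate_header(header_line, schema_rows, file_name):
    """Return a list of (position, expected, actual) tuples for every mismatch."""
    # Expected column names for this file, ordered by column_sequence.
    matching = [r for r in schema_rows if r["file_name"] == file_name]
    matching.sort(key=lambda r: int(r["column_sequence"]))
    expected = [r["column_name"] for r in matching]

    # Actual column names, in the order they appear in the header line.
    actual = [name.strip() for name in next(csv.reader([header_line]))]

    mismatches = []
    for pos in range(max(len(expected), len(actual))):
        exp = expected[pos] if pos < len(expected) else None
        act = actual[pos] if pos < len(actual) else None
        # Assumption: names are compared case-insensitively.
        if exp is None or act is None or exp.upper() != act.upper():
            mismatches.append((pos + 1, exp, act))
    return mismatches


schema_text = """file_name,column_name,column_sequence
input.csv,EMP_ID,1
input.csv,EMP_NAME,2
input.csv,SALARY,3
"""
schema_rows = list(csv.DictReader(io.StringIO(schema_text)))

ok = validate_header("emp_id,emp_name,salary", schema_rows, "input.csv")
swapped = validate_header("emp_name,emp_id,salary", schema_rows, "input.csv")
print(ok)
print(swapped)
```

An empty list means the header matches the schema in both names and order; each tuple otherwise names the 1-based position that differs.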
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 12 months ago.
I am following an article about an image captioning transformer model in TensorFlow (Python). When I run the following code, head() does not show any data.
import numpy as np
import pandas as pd

# dir_Flickr_text: path to the caption file
with open(dir_Flickr_text, 'r') as file:
    text = file.read()
datatxt = []
for line in text.split('\n'):
    col = line.split('\t')
    # Skip lines that have no tab-separated caption
    if len(col) == 1:
        continue
    w = col[0].split("#")
    datatxt.append(w + [col[1].lower()])
data = pd.DataFrame(datatxt, columns=["filename", "index", "caption"])
data = data.reindex(columns=['index', 'filename', 'caption'])
data = data[data.filename != '2258277193_586949ec62.jpg.1']
uni_filenames = np.unique(data.filename.values)
data.head()
After running this I see three columns (index, filename, caption) with no data at all, even though the real file contains plenty of data and the article displays the data too.
It doesn't show any data because the dataframe is empty, probably because datatxt is empty. Try adding a print() statement before data = pd.DataFrame(... to see what is going on.
It is hard for us to debug without the dataset.
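One way to see what is going on without the real dataset is to count how many lines the loop skips. The sketch below is a guess at a likely cause, not a diagnosis: if the file is not actually tab-separated, every line fails the len(col) check and nothing is appended. parse_captions and the two sample strings are hypothetical.

```python
def parse_captions(text):
    """Return (rows, skipped) so that silently dropped lines become visible."""
    rows, skipped = [], []
    for line in text.split("\n"):
        if not line.strip():
            continue  # ignore blank lines, e.g. the one after a trailing newline
        col = line.split("\t")
        # A usable line needs a tab-separated caption and a '#' in the first field.
        if len(col) < 2 or "#" not in col[0]:
            skipped.append(line)
            continue
        filename, index = col[0].split("#", 1)
        rows.append([filename, index, col[1].lower()])
    return rows, skipped


# Hypothetical samples: one tab-separated, one separated by a space instead.
tab_text = "a.jpg#0\tA dog runs .\na.jpg#1\tA DOG is running .\n"
space_text = "a.jpg#0 A dog runs .\na.jpg#1 A DOG is running .\n"

tab_rows, tab_skipped = parse_captions(tab_text)
space_rows, space_skipped = parse_captions(space_text)
print(len(tab_rows), len(tab_skipped))
print(len(space_rows), len(space_skipped))
```

If every line of the real file ends up in skipped, printing the first one with repr() shows which separator it actually uses.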
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Without using any modules (pandas, csv, etc.), I need to filter this CSV file (https://data.world/prasert/rotten-tomatoes-top-movies-by-genre/workspace/file?filename=rotten_tomatoes_top_movies_2019-01-15.csv). I would like to keep ONLY the movies that are in the animation genre and drop the other movies.
I have used open, split and a for loop to read the data, but I am struggling to filter the movies by genre.
I have created a list called genres and appended to it with genres.append(line.split(",")[4]), but this only gives a list of the values in the genre column rather than the full info for each movie in a particular genre.
I know it is crazy to attempt this without the modules (this is for school), but is it even possible to do this without them?
Thanks in advance.
Try this.
new_list = []
with open("file_name", "r", encoding="utf-8") as f:
    for line_number, line in enumerate(f):
        # Keep the header row, if the file has one.
        if line_number == 0:
            new_list.append(line)
            continue
        # Replace 'genre_name' with the genre to keep.
        if line.split(',')[4] == 'genre_name':
            new_list.append(line)
# Write the filtered lines to the output file.
# Each line already ends with a newline, so none is added here.
with open('output.txt', 'w', encoding="utf-8") as out_file:
    for element in new_list:
        out_file.write(element)
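One caveat with line.split(','): if a title contains a comma inside double quotes, the fields shift and index 4 is no longer the genre. Below is a minimal sketch of a quote-aware splitter that still uses no modules. It assumes standard double-quote CSV quoting and does not handle quoted fields that span several lines. The function names, the sample rows and their column names are made up for illustration; only the genre index 4 comes from the question.

```python
def split_csv_line(line):
    """Split one CSV line on commas, ignoring commas inside double quotes."""
    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # doubled quote means a literal quote
                    i += 1
                else:
                    in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == ',':
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append(''.join(current))
    return fields


def filter_by_genre(lines, genre, genre_index=4, has_header=True):
    """Keep the header (if any) and every line whose genre field equals genre."""
    kept = []
    for number, line in enumerate(lines):
        line = line.rstrip("\n")
        if has_header and number == 0:
            kept.append(line)
            continue
        fields = split_csv_line(line)
        if len(fields) > genre_index and fields[genre_index].strip() == genre:
            kept.append(line)
    return kept


# Hypothetical rows; the real file has its own columns and values.
sample_lines = [
    "rank,title,rating,reviews,genre\n",
    '1,"Up, Up and Away",95,10,Animation\n',
    "2,Some Drama,90,20,Drama\n",
]
animation = filter_by_genre(sample_lines, "Animation")
print(animation)
```

The kept lines have their trailing newline removed, so write each one followed by "\n" when saving them to a file.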
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
while True:
    if bbs_number > lately_number():
        sys.stdout = open('date.txt','a')
        bbs_lists = range(highest_number() +1, bbs_number +1)
        for item in bbs_lists:
            url_number = "url" + str(item)
            try:
                result = requests.get(url_number)
                bs_number = BeautifulSoup(result.content, "lxml")
                float_box = bs_number.find("div", {"class": "float_box"})
                parameter_script = float_box
                print("bs_obj()")
            except AttributeError as e:
                print("error")
        with open('lately_number.txt', 'w') as f_last:
            f_last.write(str(bbs_number))
The while statement above does not raise an error, but duplicate data is written to date.txt.
I want to fix this early, at the point where the range is set, rather than removing duplicates later when writing to date.txt.
One possible cause is that lately_number() returns a stale value, because the write to lately_number.txt sometimes does not complete correctly, so the same range is output to date.txt again.
I would be grateful if you could suggest a better function to add or use as a replacement.
The simplest way would be to read the date.txt into a set. Then, you can check the set to see if the date is already in there, and if it isn't, write the date to the date.txt file.
For example:
uniqueDates = set()
# Read the file contents into a set.
with open("date.txt", "r") as f:
    for line in f:
        uniqueDates.add(line.strip())  # strip off the line ending \n
# Ensure what we're writing to the date file isn't a duplicate.
with open("date.txt", "a") as f:
    if "bs_obj()" not in uniqueDates:
        f.write("bs_obj()\n")
You'll probably need to adjust the logic a bit to fit your needs, but, I believe this is what you're trying to accomplish?
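The same idea can be wrapped in a small helper so the check runs for every entry before it is written. This is a sketch only: append_unique is a hypothetical name, the entries are placeholder strings rather than real scraped values, and a temporary directory is used so that running it does not touch an existing date.txt.

```python
import os
import tempfile


def append_unique(path, entries):
    """Append only the entries that are not already a line in the file."""
    seen = set()
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            seen = {line.rstrip("\n") for line in f}
    written = []
    with open(path, "a", encoding="utf-8") as f:
        for entry in entries:
            if entry not in seen:
                f.write(entry + "\n")
                seen.add(entry)  # also guards against repeats within this call
                written.append(entry)
    return written


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "date.txt")
    first = append_unique(path, ["101", "102"])
    second = append_unique(path, ["102", "103", "103"])
    with open(path, encoding="utf-8") as f:
        contents = f.read().splitlines()

print(first, second, contents)
```

Because the file itself is the source of truth, a stale value in lately_number.txt would at worst cause extra requests, not duplicate lines.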
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am trying to access the Federal Reserve Bank data at https://fred.stlouisfed.org/series/FEDFUNDS
What code can I write to access this database and then put it in a dictionary? Or do I have to download the file first and save it on my computer?
The easiest way to pull that data in would be to download and parse the CSV file listed under the "Download" button.
You can use the Requests library to download the file, then parse it with Python's built-in csv module.
See https://stackoverflow.com/a/32400969/9214517 for how to do it.
If you are happy to keep the data in a pandas DataFrame (as the answer linked above does), this is the code:
import pandas as pd
import requests
import io
url = "https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=968&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=FEDFUNDS&scale=left&cosd=1954-07-01&coed=2018-10-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2009-06-01&line_index=1&transformation=lin&vintage_date=2018-11-28&revision_date=2018-11-28&nd=1954-07-01"
s = requests.get(url).content.decode("utf-8")
df = pd.read_csv(io.StringIO(s))
Then your df will be:
DATE FEDFUNDS
0 1954-07-01 0.80
1 1954-08-01 1.22
2 1954-09-01 1.06
3 1954-10-01 0.85
4 1954-11-01 0.83
....
And if you insist on a dict, use this instead of the last line above to convert your CSV data s:
mydict = dict([line.split(",") for line in s.splitlines()])
The key is how to get the URL: hit the Download button on the page you linked, and copy the link to the CSV.
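For the dict route, a slightly more careful variant could skip the header row and convert the values to numbers. This is a sketch that assumes the downloaded text has a two-column DATE,FEDFUNDS layout like the output shown above. The sample text is inlined from that output so nothing is downloaded here, and csv_text_to_dict is a hypothetical helper name.

```python
import csv
import io


def csv_text_to_dict(text):
    """Map each date string to its value as a float, skipping the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the "DATE,FEDFUNDS" header
    return {date: float(value) for date, value in reader}


# First rows of the data shown above, as they would appear in the CSV text.
sample_text = "DATE,FEDFUNDS\n1954-07-01,0.80\n1954-08-01,1.22\n1954-09-01,1.06\n"
funds = csv_text_to_dict(sample_text)
print(funds["1954-08-01"])
```

With the real download, pass the decoded response text s to csv_text_to_dict() in place of sample_text.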