[Library] Crawling with BeautifulSoup

차돌박이츄베릅 2023. 4. 14. 18:35

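Installing the BeautifulSoup library

In the terminal: $ pip install bs4 (this pulls in the beautifulsoup4 package). The setup below also uses requests: $ pip install requests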

Crawling

Picking out and retrieving specific parts of a web page's HTML.

Basic crawling setup

import requests
from bs4 import BeautifulSoup

URL = "https://movie.daum.net/ranking/reservation"
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}

# Download the page, then parse its HTML
data = requests.get(URL, headers=headers)
soup = BeautifulSoup(data.text, 'html.parser')
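
To see what the parsed soup object offers without making a network request, here is a minimal sketch that parses a small HTML string. The markup and the names sample_html, page_title and heading are made up for illustration; in the real setup, data.text takes the place of sample_html.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for data.text from the setup above
sample_html = """
<html>
  <head><title>Movie ranking</title></head>
  <body><h1 class="tit">  Reservation ranking  </h1></body>
</html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Tags can be reached as attributes, and .text returns the text inside them
page_title = soup.title.text

# select_one() takes a CSS selector and returns the first match (or None)
heading = soup.select_one('h1.tit').text.strip()

print(page_title)
print(heading)
```

Working on a fixed string like this is a convenient way to try selectors before pointing the code at a live page.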

Checking print() output in the terminal

$ python filename.py

Press ↑ and then Enter to rerun the previous command quickly.

How to use Beautiful Soup

Open the developer tools on the site with F12
Right-click the tag you want - Copy - Copy selector
Paste it into .select('')

lis = soup.select('#mainContent > div > div.box_ranking > ol > li')

for li in lis:
    rank = li.select_one('.rank_num').text
    title = li.select_one('.link_txt').text.strip()
    rate = li.select_one('.txt_grade').text
    print(rank, title, rate)

# To read an attribute value, index the tag itself (not its text), e.g. li.select_one('.link_txt')['href']
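
The loop above can be tried offline with a short sketch. The HTML below is invented and only mimics the class names used above (rank_num, link_txt, txt_grade); the real page's markup may differ and can change over time. The sketch also reads the href attribute from the tag itself and guards against select_one() returning None when an element is missing.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; only the class names come from the code above
sample_html = """
<ol>
  <li><span class="rank_num">1</span>
      <a class="link_txt" href="/moviedb/main?movieId=1"> Movie A </a>
      <span class="txt_grade">8.5</span></li>
  <li><span class="rank_num">2</span>
      <a class="link_txt" href="/moviedb/main?movieId=2"> Movie B </a></li>
</ol>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
rows = []

for li in soup.select('ol > li'):
    link = li.select_one('.link_txt')      # keep the tag, not just its text
    grade = li.select_one('.txt_grade')    # None when the element is missing

    rank = li.select_one('.rank_num').text
    title = link.text.strip()
    href = link['href']                    # attribute access works on the tag
    rate = grade.text if grade is not None else None

    rows.append((rank, title, href, rate))

for row in rows:
    print(row)
```

Keeping the tag in a variable (link) before calling .text makes both the text and the attributes available from one lookup.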

Text cleanup

.strip() : removes the whitespace (spaces, tabs, newlines) before and after the text

.replace(',', '') : removes a specific character (here, commas)
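
As a small worked example of these two methods, the sketch below cleans strings shaped like scraped text and converts one of them to a number. The sample values and variable names are invented for illustration.

```python
# Invented sample strings, shaped like text scraped from a page
raw_title = "\n   Movie A  \n"
raw_count = " 1,234,567 "

# Drop leading and trailing whitespace
title = raw_title.strip()

# Remove the commas, then convert the digits to an integer
count = int(raw_count.strip().replace(',', ''))

print(repr(title))
print(count)
```

Removing the commas first matters because int() does not accept thousands separators.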
