[Python] - Web Crawling (selenium/beautifulsoup4 example)

1984 2022. 9. 2. 22:59

import os
from urllib import request
from urllib.error import HTTPError
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://www.atlassian.com/software/confluence/download-archives"
site = "https://www.atlassian.com"
versionList = set()

# Selenium WebDriver
driver = webdriver.Chrome()
driver.get(url)

# Download helper
def get_download(url, fname, directory):
    try:
        # Create the target directory if it does not exist yet
        os.makedirs(directory, exist_ok=True)
        # Fetch the download link and save it as directory/fname
        request.urlretrieve(url, os.path.join(directory, fname))
        print(f'Download complete: {fname}\n')
    except HTTPError as e:
        print(f'Download error: {fname} ({e})\n')
        return

def main():
    # Collect the version strings from the anchors on the archive page
    a_download = driver.find_elements(By.XPATH, "//div[@class='download-one-version']/a")
    for dl in a_download:
        version = dl.get_attribute("data-version")
        # Skip anchors without a data-version attribute (get_attribute returns None)
        if version:
            versionList.add(version)
    # The browser is no longer needed once the versions are collected
    driver.quit()

    for v in versionList:
        # Example download URLs:
        # https://www.atlassian.com/software/confluence/downloads/binary/atlassian-confluence-7.19.1-x64.exe
        # https://www.atlassian.com/software/confluence/downloads/binary/atlassian-confluence-7.0.5.tar.gz
        downloadURL = "https://www.atlassian.com/software/confluence/downloads/binary/atlassian-confluence-" + v + ".tar.gz"
        downloadFileName = "atlassian-confluence-" + v + ".tar.gz"
        # Group the files by major version, e.g. .../Confluence/7
        downloadDir = "D:/Atlassian-download/Confluence/" + v.split('.')[0]
        get_download(downloadURL, downloadFileName, downloadDir)

if __name__ == "__main__":
    main()

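The download page is rendered dynamically, so it was hard to get all the way to the download links themselves.
I tried to crawl it dynamically with Selenium, but that failed. I think this can be improved.
beautifulsoup4 cannot crawl dynamic pages by itself (it only parses the HTML it is handed and does not run JavaScript), so use it together with selenium.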
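
One possible way to combine the two libraries is sketched below: Selenium loads the page and waits until the version links have been rendered, then the resulting HTML is handed to beautifulsoup4 for parsing. This is only a sketch under assumptions, not a confirmed fix for the failure described above. The function names `extract_versions` and `fetch_rendered_html`, the 20-second timeout, and `SAMPLE_HTML` are made up for illustration, and the `download-one-version` class and `data-version` attribute are taken from the script above, so they may not match the current page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the script above expects.
SAMPLE_HTML = """
<div class="download-one-version"><a data-version="7.19.1" href="#">7.19.1</a></div>
<div class="download-one-version"><a data-version="7.0.5" href="#">7.0.5</a></div>
<div class="other"><a data-version="0.0.0" href="#">ignored</a></div>
"""


def extract_versions(html):
    """Return the set of data-version values found in already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Roughly the same target as the XPath used above, written as a CSS
    # selector: direct-child anchors that carry a data-version attribute.
    anchors = soup.select("div.download-one-version > a[data-version]")
    return {a["data-version"] for a in anchors}


def fetch_rendered_html(url, timeout=20):
    """Load url in Chrome, wait for the version links, return the rendered HTML."""
    # Imported here so extract_versions() can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Explicit wait: block until at least one matching element is in the
        # DOM, or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div.download-one-version a")
            )
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Parsing works on any HTML string, so it can be checked without a browser.
versions = extract_versions(SAMPLE_HTML)
```

Against the real site, the two steps would be chained as `extract_versions(fetch_rendered_html(url))`. Keeping the parsing step separate from the browser step means the selector can be tried against saved HTML first, which makes it easier to tell a wrong selector apart from a page that has not finished rendering.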
