Network

intranet vs Internet vs internet

intranet: Internet의 www기술을 활용하여 특정 단체의 내부 정보시스템을 구축하는 것 혹은 그 네트워크
Internet(International Network): TCP/IP를 활용하여 정보를 주고 받는 통신 네트워크(www)
internet(internetwork): 패킷을 교환하는 방식으로 기기간의 정보를 주고 받는 방식

Network OSI 7 layer

국제 표준화 기구(ISO)가 통신이 일어나는 과정을 7단계로 구분
패킷(packet) : 데이터를 전송할 수 있는 가장 작은 단위
Data Encapsulation : 헤더 혹은 세그먼트가 데이터 앞, 뒤에 붙는 것

OSI 7 layer

TCP/IP Protocol

미국에서 개발한 인터넷 기본 통신 프로토콜

OSI 7 layer wiht tcpip

HTTP

HyperText Transfer Protocol
www상에서 정보를 주고받는 프로토콜
HTTP method: GET, POST, PUT, DELETE

FTP

File Transfer Protocol
서버와 클라이언트 사이에 파일전송을 위한 프로토콜
보안에 매우 취약하기 때문에 FTP만 사용하면 절대 안됨
현재는 FTPS(FTP-SSL), SFTP(simple FTP), SSH(Secure SHell) 등을 사용

SMTP

Simple Mail Transfer Protocol
Internet에서 메일을 보내기 위한 프로토콜

TCP/IP

Transmission Control Protocol / Internet Protocol
전송제어 프로토콜 + 송수신 호스트의 패킷교환을 위한 프로토콜

UDP

User(Universal) Datagram Protocol
빠른 속도
작은 header 크기

Web Programming

Architecture

Web architecture

웹 개발 패턴 변화

front-end : AngularJS, React, view.js
back-end : Node.js(빠르게 구현 가능하지만, 성능 단점이 많음), Java spring, python django

URI, URL, URN

url : www.blog.com/post , 디렉토리 까지 가는것을 의미
urn : www.blog.com/post/notice.html , 목적지 파일까지 접근하는것
url + urn = uri

RESTful API

uri를 이용해 데이터를 전송받는다.

REST API 설계시 주의할 점

버전 관리 https://api.google.com/v1
명사형 사용 https://api.google.com/v1/user
언어 코드 https://api.google.com/v1/user/kr
응답상태 코드 (200, 400, 500)

Respond code

200, 300번대는 성공
400번대는 Not found
500번대는 서버가 다운된 것

status codes

CRUD

대부분의 컴퓨터 소프트웨어가 가지는 기본적인 데이터 처리 기능인 Create(생성), Read(읽기), Update(갱신), Delete(삭제)를 묶어서 일컫는 말이다. 사용자 인터페이스가 갖추어야 할 기능(정보의 참조/검색/갱신)을 가리키는 용어로서도 사용된다.

Database

SQL(Structured Query Language)

1. 데이터 정의 언어

CREATE, DROP, ALTER

2. 데이터 조작 언어

INSERT, UPDATE, DELETE, SELECT

3. 데이터 제어 언어

GRANT, REVOKE

RDB vs NoSQA

RDB : 데이터가 쉽게 변하지 않을 경우 사용하는 것이 좋다. (예: user관리)
NoSQL : 오래저장하지 않는 데이터를 저장할 때 사용하는 것이 좋다. (예: 쇼핑물의 상품리스트)

구분	RDB	NoSQL
형태	Table	Key-Value, Document, Column
데이터	정형 데이터	비정형 데이터
성능	대용량 처리시 저하	잦은 수정시 저하
스키마	고정	Schemeless
유명	Mysql, MariaDB, PostgreSQL	MongoDB, CouchDB, Redis, Cassandra

Document vs Key-value

document
{
    key: value,
    key: {
        key: value,
        key: value
        }
}

key-value
{
    key: value,
    key: value,
    key: value
}

mongoDB keyword

Web Crawling with Python

Scraping vs Crawling vs Parsing

Scraping: 데이터를 수집하는 행위
Crawling: 조직적 자동화된 방법으로 월드 와이드 웹을 탐색하는 것 (자동으로 봇들이 무작위로 수집)
Parsing: 문장 혹은 문서를 구성 성분으로 분해하고 위계관계를 분석하여 문장의 구조를 결정하는 것

Install urllib3, beautifulsoup4

$ pip install urllib3
$ pip install beautifulsoup4

# 설치 확인
$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
beautifulsoup4 (4.5.3)
pip (9.0.1)
setuptools (20.10.1)
urllib3 (1.19.1)

prettify with Beautiful Soup

import urllib
from bs4 import BeautifulSoup


html = "<html><head>Welcome!</head><body>Hi, AEWSOME</body></html>"
# BeautifulSoup(markup, "html.parser"): Python’s html.parser
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())

Crawling with Beautiful Soup

import urllib.request
from bs4 import BeautifulSoup

url = "https://www.rottentomatoes.com"
html = urllib.request.urlopen(url)
source = html.read()
html.close()

soup = BeautifulSoup(source, "html.parser")
# print(soup)

# find : 가장 먼저 등장하는 태그 하나 리턴
table = soup.find(id="Top-Box-Office") #find는하나만 찾음
# print(table.prettify())

# find_all : 모두 (리스트로) 리턴
all_tr = table.find_all("tr")  #find_all은 전부찾는거
# print(all_tr)

for tr in all_tr:
    all_td = tr.find_all("td")
    score = all_td[0].find("span", attrs={"class":"tMeterScore"}).text
    movie_name = all_td[1].a.text
    amount = all_td[2].a.text
    print(score, movie_name, amount)

Beautiful Soup보다 scrapy를 더 많이 사용한다.

Selenium

Pre 5

Network

intranet vs Internet vs internet

Network OSI 7 layer

TCP/IP Protocol

HTTP

FTP

SMTP

TCP/IP

UDP

Web Programming

Architecture

웹 개발 패턴 변화

URI, URL, URN

RESTful API

REST API 설계시 주의할 점

Respond code

CRUD

Database

SQL(Structured Query Language)

1. 데이터 정의 언어

2. 데이터 조작 언어

3. 데이터 제어 언어

RDB vs NoSQA

Document vs Key-value

Web Crawling with Python

Scraping vs Crawling vs Parsing

Install urllib3, beautifulsoup4

prettify with Beautiful Soup

Crawling with Beautiful Soup

과제

Network

intranet vs Internet vs internet

Network OSI 7 layer

TCP/IP Protocol

HTTP

FTP

SMTP

TCP/IP

UDP

Web Programming

Architecture

웹 개발 패턴 변화

URI, URL, URN

RESTful API

REST API 설계시 주의할 점

Respond code

CRUD

Database

SQL(Structured Query Language)

1. 데이터 정의 언어

2. 데이터 조작 언어

3. 데이터 제어 언어

RDB vs NoSQA

Document vs Key-value

Web Crawling with Python

Scraping vs Crawling vs Parsing

Install urllib3, beautifulsoup4

prettify with Beautiful Soup

Crawling with Beautiful Soup

과제

Share on: