Monthly Archives: August 2012

New Protections for Phishing

Today, Facebook is proud to announce the launch of phish@fb.com, an email address available to the public for reporting phishing attempts against Facebook. Phishing is any attempt to acquire personal information, such as usernames, passwords, or financial details, via impersonation or spoofing.

By providing Facebook with reports, we can investigate and request browser blacklisting and site takedowns where appropriate. We will then work with our eCrime team to ensure we hold bad actors accountable. Additionally, in some cases, we’ll be able to identify victims and secure their accounts.

You might be wondering how to spot a suspected phishing email. Our partners at the Anti-Phishing Working Group have put together some helpful tips to avoid being deceived by these messages:

  1. Be suspicious of any email with urgent requests for login or financial information, and remember: unless the email is digitally signed, you can’t be sure it wasn’t forged or ‘spoofed’.
  2. Don’t use the links in an email, instant message, or chat to get to any web page if you suspect the message might not be authentic or you don’t trust the sender; instead, navigate to the website directly.
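Tip 2 can be illustrated in code: one classic phishing trick is a link whose visible text names one domain while the underlying href points to another. The sketch below is a hypothetical heuristic for illustration only, not a description of Facebook's actual detection systems:

```python
# Flag a link whose visible text names one domain while the href points
# to another -- a common phishing pattern. Illustrative heuristic only.
from urllib.parse import urlparse

def domain_of(url):
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()

def looks_spoofed(visible_text, actual_href):
    """True when the visible link text claims a different domain than
    the one the href actually points to."""
    shown = domain_of(visible_text if "://" in visible_text
                      else "http://" + visible_text)
    real = domain_of(actual_href)
    return bool(shown) and bool(real) and shown != real

# The text shows facebook.com, but the link goes somewhere else entirely.
print(looks_spoofed("www.facebook.com", "http://evil.example.net/login"))  # True
```

Real phishing filters combine many more signals (sender reputation, URL age, page content), but the mismatch check above catches a surprising share of crude attempts.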

This new reporting channel will complement the internal systems we have in place to detect phishing sites attempting to steal Facebook user login information. Those systems notify our team so we can gather information on the attack, take the phishing sites offline, and notify users. Affected users will be prompted to change their passwords and given guidance to better protect themselves in the future.

Phishing attempts against Facebook are rare, but we hope you will forward us any that you encounter. Together we can help keep these sites off the web and hold the bad guys responsible. As a reminder, you can visit www.facebook.com/hacked if you think your account may be compromised.

You can find out more about phishing in our Help Center. You can also forward phishing emails to any of the following: APWG (reportphishing@antiphishing.org), the FTC (spam@uce.gov), and the Internet Crime Complaint Center (www.ic3.gov).

https://www.facebook.com/notes/facebook-security/new-protections-for-phishing/10150960472905766

Exploit Magazine

The Exploit Magazine 01/2012

Dear Readers, we proudly present you The ExploitMag. We decided to launch an entirely new magazine devoted to exploits. In this very first issue we focus on the Metasploit Framework. In the near future, you can expect publications on DoS attacks, SOAP and WSDL hacking, and more. In this issue:

  • Abhinav Das demystified the Metasploit Framework
  • Sudhanshu Chauhan presented Metasploit exploitation samples

I hope you enjoy the reading. Stay tuned.

Security Tools

Here is a list and a brief description of commonly used security tools. These tools may be licensed as freeware, shareware or commercial products.


Information Gathering:

 

Domain
  • Sam Spade   Online tools for investigating IP addresses and tracking down spammers.
  • WHOIS Lookup   Online access to detailed domain contact information.
  • VisualRoute   An integrated ping, whois, and traceroute tool that displays results on a world map.

 

Network
  • Firewalk   Firewalk uses traceroute-like techniques to analyze IP packet responses to determine gateway ACL filters and map networks. The tool can also determine the filter rules in place on a packet-forwarding device. The newest version of the tool, firewalk/GTK introduces the option of using a graphical interface.
  • Hping2   Hping2 is a network tool that can send custom ICMP/UDP/TCP packets and display target replies the way ping does with ICMP replies. It handles fragmentation and arbitrary packet body and size, and can be used to transfer files under supported protocols. Using hping2 you can test firewall rules, perform [spoofed] port scanning, test network performance using different protocols, packet sizes, TOS (type of service) settings, and fragmentation, do path MTU discovery, transfer files (even through very restrictive firewall rules), perform traceroute-like actions under different protocols, fingerprint remote OSs, audit a TCP/IP stack, and more. Hping2 is a good tool for learning TCP/IP.
  • Nessus   A remote network security auditor that tests security modules in an attempt to find vulnerable spots that should be fixed. It is made up of two parts: a server and a client. The server/daemon, nessusd, is in charge of the attacks, whereas the client, nessus, interfaces with the user through a nice X11/GTK+ interface. Available on several platforms.
  • Ngrep   Ngrep strives to provide most of GNU grep’s common features, applying them to network traffic instead of files. Ngrep is a pcap-aware tool that allows you to specify extended regular expressions to match against the data payloads of packets. It currently recognizes TCP, UDP, and ICMP across Ethernet, PPP, SLIP, and null interfaces, and understands BPF filter logic in the same fashion as more common packet sniffers such as tcpdump and snoop.
  • Tcpdump   A powerful tool for network monitoring and data acquisition. Tcpdump allows you to dump the traffic of a network. It can be used to print out the headers of packets on a network interface that match a given expression. You can use this tool to track down network problems, to detect “ping attacks” or to monitor network activities.
  • Sguil   Sguil is an intuitive GUI that provides realtime events from snort/barnyard. It also includes other components which facilitate the practice of Network Security Monitoring and event driven analysis of IDS alerts. The sguil client is written in tcl/tk and can be run on any operating system that supports tcl/tk (including Linux, BSD, Solaris, MacOS, and Win32).
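The TTL-expiry trick that Firewalk and traceroute depend on is simple to show from the sending side: each probe carries a small IP TTL, so the router at that hop discards it and answers with an ICMP "time exceeded". Reading those ICMP replies requires a raw socket and root privileges, so this hypothetical sketch covers only the probe-sending half:

```python
# Traceroute/Firewalk rely on the IP TTL field: a probe whose TTL hits
# zero at hop N makes that hop's router return an ICMP "time exceeded".
# Setting the TTL on an ordinary UDP socket demonstrates the sending
# half; capturing the ICMP replies needs a raw socket and root.
import socket

def send_probe(host, ttl, port=33434):
    """Send one UDP probe toward `host` with the given IP TTL set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    s.sendto(b"probe", (host, port))
    s.close()

# A traceroute would call send_probe() with ttl = 1, 2, 3, ... and match
# each ICMP reply's source address to the hop that answered.
send_probe("127.0.0.1", 1)
```

Firewalk extends this idea by setting the TTL to one more than the hop count of the gateway under test: probes that pass the gateway's ACL expire just beyond it, revealing which ports the filter lets through.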

 

Machine
  • Cybercop Scanner   A pricey, popular commercial scanner that does not come with source code. However, a powerful demo version is available for testing.
  • NAT (NetBIOS Auditing Tool)  The NetBIOS Auditing Tool (NAT) is designed to explore the NETBIOS file-sharing services offered by the target system. It implements a stepwise approach to gather information and attempt to obtain file system-level access as though it were a legitimate local client.
  • Nmap   A classic high-speed TCP port scanner.
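Nmap's default technique, the TCP connect() scan, is simple enough to sketch: attempt a full handshake on each port and record which ones accept. A minimal illustration (scan only hosts you own or are authorized to test):

```python
# A TCP connect() scan in the spirit of Nmap's default mode: try to
# complete a handshake on each port and report which ones accept.
# A sketch only -- scan just hosts you are authorized to test.
import socket

def connect_scan(host, ports, timeout=0.5):
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an error
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    # Check a few well-known ports on the local machine.
    print(connect_scan("127.0.0.1", [22, 80, 443]))
```

Nmap itself goes far beyond this, with SYN (half-open) scans, OS fingerprinting, and timing controls, but every scan type answers the same question this loop does: which ports respond?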

 

Web Site
  • Grab-a-Site   Blue Squirrel’s very effective site grabbing software.
  • Netcraft   Online tool that queries site OS and webserver information.
  • Paros Proxy   Vulnerability scanner. Offers URL spidering and configurable scanning policies. Capable of identifying SQL injection, SQL fingerprinting, Cross site scripting, and various other vulnerabilities.

 


 

Vulnerability Identification:

 

Obtaining Access
  • Crack / Libcrack   Crack 5 is a local password cracker. A good tool for sysadmins to help verify that all users have strong passwords.
  • DSniff   A suite of powerful tools for sniffing networks for passwords and other information. Includes sophisticated techniques for defeating the “protection” of network switchers.
  • Ethereal   Ethereal is a network traffic analyzer, or “sniffer”, for Unix and Unix-like operating systems. It uses GTK+, a graphical user interface library, and libpcap, a packet capture and filtering library.
  • L0pht Crack   L0phtCrack is an NT password auditing tool. It will compute NT user passwords from the cryptographic hashes that are stored by the NT operating system. L0phtCrack can obtain the hashes through many sources (file, network sniffing, registry, etc.) and it has numerous methods of generating password guesses (dictionary, brute force, etc.).
  • Sniffit   A packet sniffer for TCP/UDP/ICMP packets. Sniffit can give you very detailed technical information on these packets (SEQ, ACK, TTL, window, etc.) as well as packet contents in different formats (hex, plain text, etc.).
  • Snort  Snort is a flexible libpcap-based packet sniffer/logger that can be used as a lightweight network intrusion detection system. It features rules based logging and can perform content searching/matching in addition to being used to detect a variety of other attacks and probes, such as buffer overflows, stealth port scans, CGI attacks, SMB probes, and much more. Snort has a real-time alerting capability, with alerts being sent to syslog, a separate “alert” file, or even to a Windows computer via Samba.
  • Wireshark   Network packet sniffer and protocol analyzer; the successor to Ethereal.
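Password crackers such as Crack, John the Ripper, and L0phtCrack all share the same core loop: hash each candidate word with the target system's algorithm and compare against the stored hash. A sketch using SHA-256 purely for illustration (real crackers target formats like crypt(3) or NT hashes):

```python
# The core loop shared by Crack, John the Ripper, and L0phtCrack: hash
# each candidate word and compare against the stored hash. SHA-256 is
# used here only for illustration; real crackers target crypt(3),
# NT hashes, and similar formats.
import hashlib

def dictionary_attack(target_hash, wordlist):
    """Return the word whose SHA-256 digest equals target_hash, else None."""
    for word in wordlist:
        if hashlib.sha256(word.encode()).hexdigest() == target_hash:
            return word
    return None

stored = hashlib.sha256(b"letmein").hexdigest()  # a weak password's hash
print(dictionary_attack(stored, ["password", "123456", "letmein"]))  # letmein
```

This is also why sysadmins run these tools against their own password files: any account the loop recovers quickly is an account an attacker could recover just as quickly.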

 

Maintaining Access
  • John The Ripper   An active password cracker.
  • Netcat  A simple Unix utility, which reads and writes data across network connections using TCP or UDP protocol. It is designed to be a reliable “back-end” tool that can be used directly or easily driven by other programs and scripts. It is also a good network debugging and exploration tool since it can create almost any kind of connection.
  • THC Hydra   A password brute-forcing tool that can be used against multiple protocols.
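Netcat's essential behavior, shoveling bytes between the network and a local program, can be sketched in a few lines. This hypothetical `nc_send` helper connects over TCP, sends one payload, and returns whatever the peer replies:

```python
# Netcat's essence: move bytes between a socket and the local program.
# This hypothetical helper sends one payload over TCP and returns the reply.
import socket

def nc_send(host, port, payload, timeout=2.0):
    """Connect to host:port, send `payload`, return everything sent back."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(payload)
        s.shutdown(socket.SHUT_WR)   # tell the peer we are done sending
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:             # peer closed the connection
                break
            chunks.append(data)
        return b"".join(chunks)
```

The real netcat adds UDP mode, a listen mode, and the ability to wire a socket to another program's stdin/stdout, which is what makes it such a versatile "back-end" tool.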

 


 

Prevention:

 

Personal Firewalls

 

 

Various Other Tools
  • Access Data   Computer Forensics Software
  • GPG/PGP   The GNU Privacy Guard (GnuPG) is a complete and free replacement for PGP, developed in Europe. Because it does not use IDEA or RSA it can be used without any restrictions. GnuPG is a RFC2440 (OpenPGP) compliant application. PGP is the famous encryption program which helps secure your data from eavesdroppers and other risks.
  • IPFilter   A TCP/IP packet filter, suitable for use in a firewall environment. The program can either be used as a loadable kernel module or incorporated into your UNIX kernel; a loadable kernel module is recommended. Scripts are also provided to install and patch system files.
  • IPLog   A TCP/IP traffic logger capable of logging TCP, UDP and ICMP traffic. The newest version also includes a packet filter and a scan and attack detector. It currently runs on Linux, FreeBSD, OpenBSD, BSDI and Solaris.
  • IPtables/netfilter/ipchains/ipfwadm   IP packet filter administration for 2.4.X kernels IPtables is used to set up, maintain, and inspect the tables of IP packet filter rules in the Linux kernel. The IPtables tool also supports configuration of dynamic and static network address translation.
  • Libnet   A program for the construction and handling of network packets. Libnet provides a portable framework for low-level network packet writing and handling. Libnet features portable packet creation interfaces at the IP layer and link layer, as well as a host of supplementary functionality. With experience, complex programs can be written.
  • Logcheck   A free program that finds problems and security violations in the system logfiles and emails the results to the administrator.
  • Ntop   Displays network usage in a top-like format (like the Unix top utility). Ntop can also be run in web mode, viewed through a web browser.
  • OpenSSH and SSH   The ssh.com version costs money for some uses, but source code is available. Secure rlogin/rsh/rcp replacement (OpenSSH) is derived from OpenBSD’s version of ssh. Secure Shell is a program for logging into a remote machine and for executing commands on a remote machine. It provides secure encrypted communications between two untrusted hosts over an insecure network. X11 connections and arbitrary TCP/IP ports can also be forwarded over the secure channel. It is intended as a replacement for rlogin, rsh and rcp, and can be used to provide rdist, and rsync with a secure communication channel.
  • SARA   The Security Auditor’s Research Assistant (SARA) is a third generation security analysis tool that is based on SATAN. Updated twice each month.
  • Scanlogd   A portscan detecting tool designed to detect portscan attacks on your machine.
  • Retina   Retina has the ability to scan, monitor, and fix vulnerabilities within a network’s Internet, Intranet, and Extranet, giving the network administrator complete control across all possible points of attack within an organization.
  • TCP wrappers  Also known as TCPD or LOG_TCP. These programs log the client host name of incoming telnet, ftp, rsh, rlogin, finger, etc. requests. Security options are: access control per host, domain and/or service; detection of host name spoofing or host address spoofing; booby traps to implement an early-warning system.
  • Tripwire   This tool may have expensive licensing fees associated with its use. Tripwire is a tool that aids system administrators and users in monitoring a designated set of files for any changes. Used with system files on a regular (e.g., daily) basis, Tripwire can notify system administrators of corrupted or tampered files, so damage control measures can be taken in a timely manner.
  • Curl   A command-line file transfer tool that supports many different protocols.
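Tripwire's core idea is easy to illustrate: record a cryptographic hash of each watched file, then re-hash later and report anything that changed. A minimal sketch (Tripwire itself also signs its database and checks ownership, permissions, and timestamps):

```python
# Tripwire's idea in miniature: snapshot a SHA-256 hash of each watched
# file, then re-hash later and report anything that changed.
import hashlib

def snapshot(paths):
    """Map each file path to the SHA-256 digest of its contents."""
    digests = {}
    for p in paths:
        with open(p, "rb") as f:
            digests[p] = hashlib.sha256(f.read()).hexdigest()
    return digests

def changed_files(baseline, paths):
    """Return the paths whose current hash differs from the baseline."""
    current = snapshot(paths)
    return [p for p in paths if current[p] != baseline.get(p)]
```

Run `snapshot()` on critical system files after a clean install, store the result somewhere read-only, and a later `changed_files()` call will flag any tampering, which is exactly the daily check Tripwire automates.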

 


 

Sites for Additional Tools

 

 

https://www.securitymetrics.com/securitytools.adp

Internet Security Port List

A list of the services normally running on each port, as well as Trojan applications known to use them:

1=TCP-MUX – TCP Port Service Multiplexer
2=COMPRESSNET – Management Utility
3=COMPRESSNET – Compression Process
5=RJE – Remote Job Entry
7=ECHO – Echo
9=DISCARD – Discard
11=SYSSTAT – System Status
13=DAYTIME – Daytime
15=NETSTAT – Network Status
17=QOTD – Quote of the Day
18=MSP – Message Send Protocol
19=CHARGEN – Character Generator
20=FTP-DATA – File Transfer Protocol [Default Data]
21=FTP – File Transfer Protocol [Control]
22=SSH – SSH (Secure Shell) Remote Login Protocol
23=TELNET – Telnet
24=PMS – Private Mail System
25=SMTP – Simple Mail Transfer Protocol
27=NSW-FE – NSW User System FE
29=MSG-ICP – Message ICP
31=MSG-AUTH – Message Authentication
33=DSP – Display Support Protocol
35=PPS – Private Printer Server
37=TIME – Time
38=RAP – Route Access Protocol
39=RLP – Resource Location Protocol
41=GRAPHICS – Graphics
42=NAMESERVER – Host Name Server
43=WHOIS – Who Is
44=MPM-FLAGS – MPM FLAGS Protocol
45=MPM – Message Processing Module [recv]
46=MPM-SND – MPM [default send]
47=NI-FTP – NI FTP (File Transfer Protocol)
48=AUDITD – Digital Audit Daemon
49=BBN-LOGIN – Login Host Protocol (TACACS)
50=RE-MAIL-CK – Remote Mail Checking Protocol
51=LA-MAINT – IMP Logical Address Maintenance
52=XNS-TIME – XNS Time Protocol
53=DOMAIN – Domain Name Server
54=XNS-CH – XNS Clearinghouse
55=ISI-GL – ISI Graphics Language
56=XNS-AUTH – XNS Authentication
57=MTP – Private terminal access
58=XNS-MAIL – XNS Mail
59=PFS – Private File System, mIRC DCC Server
60=Unassigned
61=NI-MAIL – NI MAIL
62=ACAS – ACA Services
63=WHOIS++ – whois++
64=COVIA – Communications Integrator (CI)
65=TACACS-DS – TACACS-Database Service
66=SQL*NET – Oracle SQL*NET
67=BOOTPS – Bootstrap Protocol Server
68=BOOTPC – Bootstrap Protocol Client
69=TFTP – Trivial File Transfer Protocol
70=GOPHER – Gopher
71=NETRJS-1 – Remote Job Service
72=NETRJS-2 – Remote Job Service
73=NETRJS-3 – Remote Job Service
74=NETRJS-4 – Remote Job Service
75=PDOS – Private dial out service
76=DEOS – Distributed External Object Store
77=RJE – Private RJE (Remote Job Entry) service
78=VETTCP – vettcp
79=FINGER – Finger
80=WWW-HTTP – World Wide Web HTTP (Hyper Text Transfer Protocol)
81=HOSTS2-NS – HOSTS2 Name Server
82=XFER – XFER Utility
83=MIT-ML-DEV – MIT ML Device
84=CTF – Common Trace Facility
85=MIT-ML-DEV – MIT ML Device
86=MFCOBOL – Micro Focus Cobol
87=LINK – Private terminal link
88=KERBEROS – Kerberos
89=SU-MIT-TG – SU/MIT Telnet Gateway
90=DNSIX – DNSIX Security Attribute Token Map
91=MIT-DOV – MIT Dover Spooler
92=NPP – Network Printing Protocol
93=DCP – Device Control Protocol
94=OBJCALL – Tivoli Object Dispatcher
95=SUPDUP – SUPDUP
96=DIXIE – DIXIE Protocol Specification
97=SWIFT-RVF – Swift Remote Virtual File Protocol
98=TACNEWS – TAC News
99=METAGRAM – Metagram Relay
100=NEWACCT – [unauthorized use]
101=HOSTNAMES – NIC Host Name Server
102=ISO-TSAP – ISO-TSAP Class 0
103=X400 – x400
104=X400-SND – x400-snd
105=CSNET-NS – Mailbox Name Nameserver
106=3COM-TSMUX – 3COM-TSMUX
107=RTELNET – Remote Telnet Service
108=SNAGAS – SNA Gateway Access Server
109=POP – Post Office Protocol – Version 2
110=POP3 – Post Office Protocol – Version 3
111=SUNRPC – SUN Remote Procedure Call
112=MCIDAS – McIDAS Data Transmission Protocol
113=IDENT – Authentication Service
114=AUDIONEWS – Audio News Multicast
115=SFTP – Simple File Transfer Protocol
116=ANSANOTIFY – ANSA REX Notify
117=UUCP-PATH – UUCP Path Service
118=SQLSERV – SQL Services
119=NNTP – Network News Transfer Protocol
120=CFDPTKT – CFDPTKT
121=ERPC – Encore Expedited Remote Pro.Call
122=SMAKYNET – SMAKYNET
123=NTP – Network Time Protocol
124=ANSATRADER – ANSA REX Trader
125=LOCUS-MAP – Locus PC-Interface Net Map Ser
126=UNITARY – Unisys Unitary Login
127=LOCUS-CON – Locus PC-Interface Conn Server
128=GSS-XLICEN – GSS X License Verification
129=PWDGEN – Password Generator Protocol
130=CISCO-FNA – cisco FNATIVE
131=CISCO-TNA – cisco TNATIVE
132=CISCO-SYS – cisco SYSMAINT
133=STATSRV – Statistics Service
134=INGRES-NET – INGRES-NET Service
135=RPC-LOCATOR – RPC (Remote Procedure Call) Location Service
136=PROFILE – PROFILE Naming System
137=NETBIOS-NS – NETBIOS Name Service
138=NETBIOS-DGM – NETBIOS Datagram Service
139=NETBIOS-SSN – NETBIOS Session Service
140=EMFIS-DATA – EMFIS Data Service
141=EMFIS-CNTL – EMFIS Control Service
142=BL-IDM – Britton-Lee IDM
143=IMAP – Interim Mail Access Protocol v2
144=NEWS – NewS
145=UAAC – UAAC Protocol
146=ISO-TP0 – ISO-IP0
147=ISO-IP – ISO-IP
148=CRONUS – CRONUS-SUPPORT
149=AED-512 – AED 512 Emulation Service
150=SQL-NET – SQL-NET
151=HEMS – HEMS
152=BFTP – Background File Transfer Program
153=SGMP – SGMP
154=NETSC-PROD – NETSC
155=NETSC-DEV – NETSC
156=SQLSRV – SQL Service
157=KNET-CMP – KNET/VM Command/Message Protocol
158=PCMAIL-SRV – PCMail Server
159=NSS-ROUTING – NSS-Routing
160=SGMP-TRAPS – SGMP-TRAPS
161=SNMP – SNMP (Simple Network Management Protocol)
162=SNMPTRAP – SNMPTRAP (Simple Network Management Protocol)
163=CMIP-MAN – CMIP/TCP Manager
164=CMIP-AGENT – CMIP/TCP Agent
165=XNS-COURIER – Xerox
166=S-NET – Sirius Systems
167=NAMP – NAMP
168=RSVD – RSVD
169=SEND – SEND
170=PRINT-SRV – Network PostScript
171=MULTIPLEX – Network Innovations Multiplex
172=CL/1 – Network Innovations CL/1
173=XYPLEX-MUX – Xyplex
174=MAILQ – MAILQ
175=VMNET – VMNET
176=GENRAD-MUX – GENRAD-MUX
177=XDMCP – X Display Manager Control Protocol
178=NEXTSTEP – NextStep Window Server
179=BGP – Border Gateway Protocol
180=RIS – Intergraph
181=UNIFY – Unify
182=AUDIT – Unisys Audit SITP
183=OCBINDER – OCBinder
184=OCSERVER – OCServer
185=REMOTE-KIS – Remote-KIS
186=KIS – KIS Protocol
187=ACI – Application Communication Interface
188=MUMPS – Plus Five’s MUMPS
189=QFT – Queued File Transport
190=GACP – Gateway Access Control Protocol
191=PROSPERO – Prospero Directory Service
192=OSU-NMS – OSU Network Monitoring System
193=SRMP – Spider Remote Monitoring Protocol
194=IRC – Internet Relay Chat Protocol
195=DN6-NLM-AUD – DNSIX Network Level Module Audit
196=DN6-SMM-RED – DNSIX Session Mgt Module Audit Redir
197=DLS – Directory Location Service
198=DLS-MON – Directory Location Service Monitor
199=SMUX – SMUX
200=SRC – IBM System Resource Controller
201=AT-RTMP – AppleTalk Routing Maintenance
202=AT-NBP – AppleTalk Name Binding
203=AT-3 – AppleTalk Unused
204=AT-ECHO – AppleTalk Echo
205=AT-5 – AppleTalk Unused
206=AT-ZIS – AppleTalk Zone Information
207=AT-7 – AppleTalk Unused
208=AT-8 – AppleTalk Unused
209=QMTP – The Quick Mail Transfer Protocol
210=Z39.50 – ANSI Z39.50
211=914C/G – Texas Instruments 914C/G Terminal
212=ANET – ATEXSSTR
213=IPX – IPX
214=VMPWSCS – VM PWSCS
215=SOFTPC – Insignia Solutions
216=CAILIC – Computer Associates Int’l License Server
217=DBASE – dBASE Unix
218=MPP – Netix Message Posting Protocol
219=UARPS – Unisys ARPs
220=IMAP3 – Interactive Mail Access Protocol v3
221=FLN-SPX – Berkeley rlogind with SPX auth
222=RSH-SPX – Berkeley rshd with SPX auth
223=CDC – Certificate Distribution Center
242=DIRECT –
243=SUR-MEAS – Survey Measurement
244=DAYNA –
245=LINK – LINK
246=DSP3270 – Display Systems Protocol
247=SUBNTBCST_TFTP –
248=BHFHS –
256=RAP –
257=SET – Secure Electronic Transaction
258=YAK-CHAT – Yak Winsock Personal Chat
259=ESRO-GEN – Efficient Short Remote Operations
260=OPENPORT –
261=NSIIOPS – IIOP Name Service Over TLS/SSL
262=ARCISDMS –
263=HDAP –
264=BGMP –
280=HTTP-MGMT –
281=PERSONAL-LINK –
282=CABLEPORT-AX – Cable Port A/X
308=NOVASTORBAKCUP – Novastor Backup
309=ENTRUSTTIME –
310=BHMDS –
311=ASIP-WEBADMIN – Appleshare IP Webadmin
312=VSLMP –
313=MAGENTA-LOGIC –
314=OPALIS-ROBOT –
315=DPSI –
316=DECAUTH –
317=ZANNET –
321=PIP –
344=PDAP – Prospero Data Access Protocol
345=PAWSERV – Perf Analysis Workbench
346=ZSERV – Zebra server
347=FATSERV – Fatmen Server
348=CSI-SGWP – Cabletron Management Protocol
349=MFTP –
350=MATIP-TYPE-A – MATIP Type A
351=MATIP-TYPE-B – MATIP Type B or bhoetty
352=DTAG-STE-SB – DTAG, or bhoedap4
353=NDSAUTH –
354=BH611 –
355=DATEX-ASN –
356=CLOANTO-NET-1 – Cloanto Net 1
357=BHEVENT –
358=SHRINKWRAP –
359=TENEBRIS_NTS – Tenebris Network Trace Service
360=SCOI2ODIALOG –
361=SEMANTIX –
362=SRSSEND – SRS Send
363=RSVP_TUNNEL –
364=AURORA-CMGR –
365=DTK – Deception Tool Kit
366=ODMR –
367=MORTGAGEWARE –
368=QBIKGDP –
369=RPC2PORTMAP –
370=CODAAUTH2 –
371=CLEARCASE – Clearcase
372=ULISTSERV – Unix Listserv
373=LEGENT-1 – Legent Corporation
374=LEGENT-2 – Legent Corporation
375=HASSLE – Hassle
376=NIP – Amiga Envoy Network Inquiry Proto
377=TNETOS – NEC Corporation
378=DSETOS – NEC Corporation
379=IS99C – TIA/EIA/IS-99 modem client
380=IS99S – TIA/EIA/IS-99 modem server
381=HP-COLLECTOR – HP Performance Data Collector
382=HP-MANAGED-NODE – HP Performance Data Managed Node
383=HP-ALARM-MGR – HP Performance Data Alarm Manager
384=ARNS – A Remote Network Server System
385=IBM-APP – IBM Application
386=ASA – ASA Message Router Object Def.
387=AURP – Appletalk Update-Based Routing Pro.
388=UNIDATA-LDM – Unidata LDM Version 4
389=LDAP – Lightweight Directory Access Protocol
390=UIS – UIS
391=SYNOTICS-RELAY – SynOptics SNMP Relay Port
392=SYNOTICS-BROKER – SynOptics Port Broker Port
393=DIS – Data Interpretation System
394=EMBL-NDT – EMBL Nucleic Data Transfer
395=NETCP – NETscout Control Protocol
396=NETWARE-IP – Novell Netware over IP
397=MPTN – Multi Protocol Trans. Net.
398=KRYPTOLAN – Kryptolan
399=ISO-TSAP-C2 – ISO Transport Class 2 Non-Control over TCP
400=WORK-SOL – Workstation Solutions
401=UPS – Uninterruptible Power Supply
402=GENIE – Genie Protocol
403=DECAP – decap
404=NCED – nced
405=NCLD – ncld
406=IMSP – Interactive Mail Support Protocol
407=TIMBUKTU – Timbuktu
408=PRM-SM – Prospero Resource Manager Sys. Man.
409=PRM-NM – Prospero Resource Manager Node Man.
410=DECLADEBUG – DECLadebug Remote Debug Protocol
411=RMT – Remote MT Protocol
412=SYNOPTICS-TRAP – Trap Convention Port
413=SMSP – SMSP
414=INFOSEEK – InfoSeek
415=BNET – BNet
416=SILVERPLATTER – Silverplatter
417=ONMUX – Onmux
418=HYPER-G – Hyper-G
419=ARIEL1 – Ariel
420=SMPTE – SMPTE
421=ARIEL2 – Ariel
422=ARIEL3 – Ariel
423=OPC-JOB-START – IBM Operations Planning and Control Start
424=OPC-JOB-TRACK – IBM Operations Planning and Control Track
425=ICAD-EL – ICAD
426=SMARTSDP – smartsdp
427=SVRLOC – Server Location
428=OCS_CMU – OCS_CMU
429=OCS_AMU – OCS_AMU
430=UTMPSD – UTMPSD
431=UTMPCD – UTMPCD
432=IASD – IASD
433=NNSP – NNSP
434=MOBILEIP-AGENT – MobileIP-Agent
435=MOBILIP-MN – MobilIP-MN
436=DNA-CML – DNA-CML
437=COMSCM – comscm
438=DSFGW – dsfgw
439=DASP – dasp
440=SGCP – sgcp
441=DECVMS-SYSMGT – decvms-sysmgt
442=CVC_HOSTD – cvc_hostd
443=HTTPS – HTTPS (Hyper Text Transfer Protocol Secure) – SSL (Secure Socket Layer)
444=SNPP – Simple Network Paging Protocol
445=MICROSOFT-DS – Microsoft-DS
446=DDM-RDB – DDM-RDB
447=DDM-DFM – DDM-DFM
448=DDM-BYTE – DDM-BYTE
449=AS-SERVERMAP – AS Server Mapper
450=TSERVER – TServer
451=SFS-SMP-NET – Cray Network Semaphore server
452=SFS-CONFIG – Cray SFS config server
453=CREATIVESERVER – CreativeServer
454=CONTENTSERVER – ContentServer
455=CREATIVEPARTNR – CreativePartnr
456=MACON-TCP – macon-tcp
457=SCOHELP – scohelp
458=APPLEQTC – Apple Quick Time
459=AMPR-RCMD – ampr-rcmd
460=SKRONK – skronk
461=DATASURFSRV – DataRampSrv
462=DATASURFSRVSEC – DataRampSrvSec
463=ALPES – alpes
464=KPASSWD – kpasswd
465=SSMTP – ssmtp
466=DIGITAL-VRC – digital-vrc
467=MYLEX-MAPD – mylex-mapd
468=PHOTURIS – Photuris
469=RCP – Radio Control Protocol
470=SCX-PROXY – scx-proxy
471=MONDEX – Mondex
472=LJK-LOGIN – ljk-login
473=HYBRID-POP – hybrid-pop
474=TN-TL-W1 – tn-tl-w1
475=TCPNETHASPSRV – tcpnethaspsrv
476=TN-TL-FD1 – tn-tl-fd1
477=SS7NS – ss7ns
478=SPSC – spsc
479=IAFSERVER – iafserver
480=IAFDBASE – iafdbase
481=PH – Ph service
482=BGS-NSI – bgs-nsi
483=ULPNET – ulpnet
484=INTEGRA-SME – Integra Software Management Environment
485=POWERBURST – Air Soft Power Burst
486=AVIAN – avian
487=SAFT – saft
488=GSS-HTTP – gss-http
489=NEST-PROTOCOL – nest-protocol
490=MICOM-PFS – micom-pfs
491=GO-LOGIN – go-login
492=TICF-1 – Transport Independent Convergence for FNA
493=TICF-2 – Transport Independent Convergence for FNA
494=POV-RAY – POV-Ray
495=INTECOURIER –
496=PIM-RP-DISC –
497=DANTZ –
498=SIAM –
499=ISO-ILL – ISO ILL Protocol
500=ISAKMP –
501=STMF –
502=ASA-APPL-PROTO –
503=INTRINSA –
504=CITADEL –
505=MAILBOX-LM –
506=OHIMSRV –
507=CRS –
508=XVTTP –
509=SNARE –
510=FCP – FirstClass Protocol
511=PASSGO –
512=EXEC – Remote Process Execution
513=LOGIN – Remote Login via Telnet;
514=SHELL – Automatic Remote Process Execution
515=PRINTER – Printer Spooler
516=VIDEOTEX –
517=TALK –
518=NTALK –
519=UTIME – Unix Time
520=EFS – Extended File Server
521=RIPNG –
522=ULP –
523=IBM-DB2 –
524=NCP –
525=TIMED – Time Server
526=TEMPO – newdate
527=STX – Stock IXChange
528=CUSTIX – Customer IXChange
529=IRC-SERV –
530=COURIER – rpc
531=CONFERENCE – chat
532=NETNEWS – readnews
533=NETWALL – Emergency Broadcasts
534=MM-ADMIN – MegaMedia Admin
535=IIOP –
536=OPALIS-RDV –
537=NMSP – Networked Media Streaming Protocol
538=GDOMAP –
539=APERTUS-LDP – Apertus Technologies Load Determination
540=UUCP – UUCPD (Unix to Unix Copy)
541=UUCP-RLOGIN – uucp (Unix to Unix Copy) – rlogin (Remote Login)
542=COMMERCE –
543=KLOGIN –
544=KSHELL – krcmd
545=APPLEQTCSRVR – Apple qtcsrvr
546=DHCP-CLIENT – DHCP (Dynamic Host Configuration Protocol) Client
547=DHCP-SERVER – DHCP (Dynamic Host Configuration Protocol) Server
548=AFPOVERTCP – AFP over TCP
549=IDFP –
550=NEW-RWHO – new-who
551=CYBERCASH – CyberCash
552=DEVICESHARE – deviceshare
553=PIRP – pirp
554=RTSP – Real Time Streaming Protocol
555=DSF –
556=REMOTEFS – rfs (Remote File System) server
557=OPENVMS-SYSIPC – openvms-sysipc
558=SDNSKMP – SDNSKMP
559=TEEDTAP – TEEDTAP
560=RMONITOR – rmonitord
561=MONITOR –
562=CHSHELL – chcmd
563=SNEWS – snews
564=9PFS – plan 9 file service
565=WHOAMI – whoami
566=STREETTALK – streettalk
567=BANYAN-RPC – banyan-rpc
568=MS-SHUTTLE – Microsoft Shuttle
569=MS-ROME – Microsoft Rome
570=METER – demon
571=METER – udemon
572=SONAR – sonar
573=BANYAN-VIP – banyan-vip
574=FTP-AGENT – FTP Software Agent System
575=VEMMI – VEMMI
576=IPCD –
577=VNAS –
578=IPDD –
579=DECBSRV –
580=SNTP-HEARTBEAT –
581=BDP – Bundle Discovery Protocol
582=SCC-SECURITY –
583=PHILIPS-VC – PHilips Video-Conferencing
584=KEYSERVER –
585=IMAP4-SSL – IMAP4+SSL
586=PASSWORD-CHG –
587=SUBMISSION –
588=CAL –
589=EYELINK –
590=TNS-CML –
591=HTTP-ALT – FileMaker, Inc. – HTTP Alternate
592=EUDORA-SET –
593=HTTP-RPC-EPMAP – HTTP RPC Ep Map
594=TPIP –
595=CAB-PROTOCOL –
596=SMSD –
597=PTCNAMESERVICE – PTC Name Service
598=SCO-WEBSRVRMG3 – SCO Web Server Manager 3
599=ACP – Aeolon Core Protocol
600=IPCSERVER – Sun IPC server
606=URM – Cray Unified Resource Manager
607=NQS – nqs
608=SIFT-UFT – Sender-Initiated/Unsolicited File Transfer
609=NPMP-TRAP – npmp-trap
610=NPMP-LOCAL – npmp-local
611=NPMP-GUI – npmp-gui
628=QMQP – Qmail Quick Mail Queueing
633=SERVSTAT – Service Status update (Sterling Software)
634=GINAD – ginad
635=MOUNT – NFS Mount Service
636=LDAPSSL – LDAP Over SSL
640=PCNFS – PC-NFS DOS Authentication
650=BWNFS – BW-NFS DOS Authentication
666=DOOM – doom Id Software
674=PORT
704=ELCSD – errlog copy/server daemon
709=ENTRUSTMANAGER – EntrustManager
729=NETVIEWDM1 – IBM NetView DM/6000 Server/Client
730=NETVIEWDM2 – IBM NetView DM/6000 send/tcp
731=NETVIEWDM3 – IBM NetView DM/6000 receive/tcp
737=SOMETIMES-RPC2 – Rusersd on my OpenBSD Box
740=NETCP – NETscout Control Protocol
741=NETGW – netGW
742=NETRCS – Network based Rev. Cont. Sys.
744=FLEXLM – Flexible License Manager
747=FUJITSU-DEV – Fujitsu Device Control
748=RIS-CM – Russell Info Sci Calendar Manager
749=KERBEROS-ADM – kerberos administration
750=KERBEROS-SEC –
751=KERBEROS_MASTER –
752=QRH –
753=RRH –
754=KRB5_PROP –
758=NLOGIN –
759=CON –
760=NS –
761=RXE –
762=QUOTAD –
763=CYCLESERV –
764=OMSERV –
765=WEBSTER –
767=PHONEBOOK – phone
769=VID –
770=CADLOCK –
771=RTIP –
772=CYCLESERV2 –
773=SUBMIT –
774=RPASSWD –
775=ENTOMB –
776=WPAGES –
780=WPGS –
781=HP-COLLECTOR – HP Performance Data Collector
782=HP-MANAGED-NODE – HP Performance Data Managed Node
783=HP-ALARM-MGR – HP Performance Data Alarm Manager
786=CONCERT – Concert
799=CONTROLIT –
800=MDBS_DAEMON –
801=DEVICE –
808=PORT
871=SUPFILESRV – SUP Server
888=CDDATABASE – CDDataBase
901=PORT
911=Dark Shadow
989=FTPS-DATA – FTP Over TLS/SSL
990=FTPS – FTP Control over TLS/SSL
992=TELNETS – telnet protocol over TLS/SSL
993=IMAPS – Imap4 protocol over TLS/SSL
995=POP3S – Pop3 (Post Office Protocol) over TLS/SSL
996=VSINET – vsinet
997=MAITRD –
998=BUSBOY –
999=PUPROUTER –
1000=CADLOCK –
1001=Silence
1008=UFSD – UFSD
1010=Doly-Trojan
1011=Doly-Trojan
1012=Doly-Trojan
1015=Doly-Trojan
1023=RESERVED – Reserved
1024=OLD_FINGER – old_finger
1025=LISTEN – listen
1026=NTERM – nterm
1027=NT
1028=NT
1029=NT
1030=IAD1 – BBN IAD
1031=IAD2 – BBN IAD
1032=IAD3 – BBN IAD
1033=NT
1034=NT
1035=NT
1036=NT
1037=NT
1038=NT
1039=NT
1040=NT
1041=NT
1042=Bla
1043=NT
1044=NT
1045=Rasmin
1046=NT
1047=NT
1048=NT
1049=NT
1058=NIM – nim
1059=NIMREG – nimreg
1067=INSTL_BOOTS – Installation Bootstrap Proto. Serv.
1068=INSTL_BOOTC – Installation Bootstrap Proto. Cli.
1080=SOCKS – Socks
1083=ANSOFT-LM-1 – Anasoft License Manager
1084=ANSOFT-LM-2 – Anasoft License Manager
1090=Xtreme
1103=XAUDIO – Xaserver
1109=KPOP – kpop
1110=NFSD-STATUS – Cluster Status Info
1112=MSQL – Mini-SQL Server
1127=SUPFILEDBG – SUP Debugging
1155=NFA – Network File Access
1167=PHONE – Conference Calling
1170=Psyber Stream Server, Streaming Audio trojan, Voice
1178=SKKSERV – SKK (Kanji Input)
1212=LUPA – lupa
1222=NERV – SNI R&D network
1234=Ultors Trojan
1241=MSG – Remote Message Server
1243=BackDoor-G, SubSeven, SubSeven Apocalypse
1245=Voodoo Doll
1248=HERMES – Multi Media Conferencing
1269=Mavericks Matrix
1330=PORT
1346=ALTA-ANA-LM – Alta Analytics License Manager
1347=BBN-MMC – Multi Media Conferencing
1348=BBN-MMX – Multi Media Conferencing
1349=SBOOK – Registration Network Protocol
1350=EDITBENCH – Registration Network Protocol
1351=EQUATIONBUILDER – Digital Tool Works (MIT)
1352=LOTUSNOTE – Lotus Note
1353=RELIEF – Relief Consulting
1354=RIGHTBRAIN – RightBrain Software
1355=INTUITIVE EDGE – Intuitive Edge
1356=CUILLAMARTIN – CuillaMartin Company
1357=PEGBOARD – Electronic PegBoard
1358=CONNLCLI – CONNLCLI
1359=FTSRV – FTSRV
1360=MIMER – MIMER
1361=LINX – LinX
1362=TIMEFLIES – TimeFlies
1363=NDM-REQUESTER – Network DataMover Requester
1364=NDM-SERVER – Network DataMover Server
1365=ADAPT-SNA – Network Software Associates
1366=NETWARE-CSP – Novell NetWare Comm Service Platform
1367=DCS – DCS
1368=SCREENCAST – ScreenCast
1369=GV-US – GlobalView to Unix Shell
1370=US-GV – Unix Shell to GlobalView
1371=FC-CLI – Fujitsu Config Protocol
1372=FC-SER – Fujitsu Config Protocol
1373=CHROMAGRAFX – Chromagrafx
1374=MOLLY – EPI Software Systems
1375=BYTEX – Bytex
1376=IBM-PPS – IBM Person to Person Software
1377=CICHLID – Cichlid License Manager
1378=ELAN – Elan License Manager
1379=DBREPORTER – Integrity Solutions
1380=TELESIS-LICMAN – Telesis Network License Manager
1381=APPLE-LICMAN – Apple Network License Manager
1382=UDT_OS –
1383=GWHA – GW Hannaway Network License Manager
1384=OS-LICMAN – Objective Solutions License Manager
1385=ATEX_ELMD – Atex Publishing License Manager
1386=CHECKSUM – CheckSum License Manager
1387=CADSI-LM – Computer Aided Design Software Inc LM
1388=OBJECTIVE-DBC – Objective Solutions DataBase Cache
1389=ICLPV-DM – Document Manager
1390=ICLPV-SC – Storage Controller
1391=ICLPV-SAS – Storage Access Server
1392=ICLPV-PM – Print Manager
1393=ICLPV-NLS – Network Log Server
1394=ICLPV-NLC – Network Log Client
1395=ICLPV-WSM – PC Workstation Manager software
1396=DVL-ACTIVEMAIL – DVL Active Mail
1397=AUDIO-ACTIVMAIL – Audio Active Mail
1398=VIDEO-ACTIVMAIL – Video Active Mail
1399=CADKEY-LICMAN – Cadkey License Manager
1400=CADKEY-TABLET – Cadkey Tablet Daemon
1401=GOLDLEAF-LICMAN – Goldleaf License Manager
1402=PRM-SM-NP – Prospero Resource Manager
1403=PRM-NM-NP – Prospero Resource Manager
1404=IGI-LM – Infinite Graphics License Manager
1405=IBM-RES – IBM Remote Execution Starter
1406=NETLABS-LM – NetLabs License Manager
1407=DBSA-LM – DBSA License Manager
1408=SOPHIA-LM – Sophia License Manager
1409=HERE-LM – Here License Manager
1410=HIQ – HiQ License Manager
1411=AF – AudioFile
1412=INNOSYS – InnoSys
1413=INNOSYS-ACL – Innosys-ACL
1414=IBM-MQSERIES – IBM MQSeries
1415=DBSTAR – DBStar
1416=NOVELL-LU6.2 – Novell LU6.2
1417=TIMBUKTU-SRV1 – Timbuktu Service 1 Port
1418=TIMBUKTU-SRV2 – Timbuktu Service 2 Port
1419=TIMBUKTU-SRV3 – Timbuktu Service 3 Port
1420=TIMBUKTU-SRV4 – Timbuktu Service 4 Port
1421=GANDALF-LM – Gandalf License Manager
1422=AUTODESK-LM – Autodesk License Manager
1423=ESSBASE – Essbase Arbor Software
1424=HYBRID – Hybrid Encryption Protocol
1425=ZION-LM – Zion Software License Manager
1426=SAIS – Satellite-data Acquisition System 1
1427=MLOADD – mloadd monitoring tool
1428=INFORMATIK-LM – Informatik License Manager
1429=NMS – Hypercom NMS
1430=TPDU – Hypercom TPDU
1431=RGTP – Reverse Gossip Transport
1432=BLUEBERRY-LM – Blueberry Software License Manager
1433=MS-SQL-S – Microsoft-SQL-Server
1434=MS-SQL-M – Microsoft-SQL-Monitor
1435=IBM-CICS – IBM CICS
1436=SAISM – Satellite-data Acquisition System 2
1437=TABULA – Tabula
1438=EICON-SERVER – Eicon Security Agent/Server
1439=EICON-X25 – Eicon X25/SNA Gateway
1440=EICON-SLP – Eicon Service Location Protocol
1441=CADIS-1 – Cadis License Management
1442=CADIS-2 – Cadis License Management
1443=IES-LM – Integrated Engineering Software
1444=MARCAM-LM – Marcam License Management
1445=PROXIMA-LM – Proxima License Manager
1446=ORA-LM – Optical Research Associates License Manager
1447=APRI-LM – Applied Parallel Research LM
1448=OC-LM – OpenConnect License Manager
1449=PEPORT – PEport
1450=DWF – Tandem Distributed Workbench Facility
1451=INFOMAN – IBM Information Management
1452=GTEGSC-LM – GTE Government Systems License Man
1453=GENIE-LM – Genie License Manager
1454=INTERHDL_ELMD – interHDL License Manager
1455=ESL-LM – ESL License Manager
1456=DCA – DCA
1457=VALISYS-LM – Valisys License Manager
1458=NRCABQ-LM – Nichols Research Corp.
1459=PROSHARE1 – Proshare Notebook Application
1460=PROSHARE2 – Proshare Notebook Application
1461=IBM_WRLESS_LAN – IBM Wireless LAN
1462=WORLD-LM – World License Manager
1463=NUCLEUS – Nucleus
1464=MSL_LMD – MSL License Manager
1465=PIPES – Pipes Platform
1466=OCEANSOFT-LM – Ocean Software License Manager
1467=CSDMBASE – CSDMBASE
1468=CSDM – CSDM
1469=AAL-LM – Active Analysis Limited License Manager
1470=UAIACT – Universal Analytics
1471=CSDMBASE – csdmbase
1472=CSDM – csdm
1473=OPENMATH – OpenMath
1474=TELEFINDER – Telefinder
1475=TALIGENT-LM – Taligent License Manager
1476=CLVM-CFG – clvm-cfg
1477=MS-SNA-SERVER – ms-sna-server
1478=MS-SNA-BASE – ms-sna-base
1479=DBEREGISTER – dberegister
1480=PACERFORUM – PacerForum
1481=AIRS – AIRS
1482=MITEKSYS-LM – Miteksys License Manager
1483=AFS – AFS License Manager
1484=CONFLUENT – Confluent License Manager
1485=LANSOURCE – LANSource
1486=NMS_TOPO_SERV – nms_topo_serv
1487=LOCALINFOSRVR – LocalInfoSrvr
1488=DOCSTOR – DocStor
1489=DMDOCBROKER – dmdocbroker
1490=INSITU-CONF – insitu-conf
1491=ANYNETGATEWAY – anynetgateway
1492=STONE-DESIGN-1 – stone-design-1
1493=NETMAP_LM – netmap_lm
1494=ICA – ica
1495=CVC – cvc
1496=LIBERTY-LM – liberty-lm
1497=RFX-LM – rfx-lm
1498=WATCOM-SQL – Watcom-SQL
1499=FHC – Federico Heinz Consultora
1500=VLSI-LM – VLSI License Manager
1501=SAISCM – Satellite-data Acquisition System 3
1502=SHIVADISCOVERY – Shiva
1503=IMTC-MCS – Databeam
1504=EVB-ELM – EVB Software Engineering License Manager
1505=FUNKPROXY – Funk Software Inc.
1506=UTCD – Universal Time daemon (utcd)
1507=SYMPLEX – symplex
1508=DIAGMOND – diagmond
1509=ROBCAD-LM – Robcad Ltd. License Manager
1510=MVX-LM – Midland Valley Exploration Ltd. Lic. Man.
1511=3L-L1 – 3l-l1
1512=WINS – Microsoft’s Windows Internet Name Service
1513=FUJITSU-DTC – Fujitsu Systems Business of America Inc
1514=FUJITSU-DTCNS – Fujitsu Systems Business of America Inc
1515=IFOR-PROTOCOL – ifor-protocol
1516=VPAD – Virtual Places Audio data
1517=VPAC – Virtual Places Audio control
1518=VPVD – Virtual Places Video data
1519=VPVC – Virtual Places Video control
1520=ATM-ZIP-OFFICE – atm zip office
1521=NCUBE-LM – nCube License Manager
1522=RNA-LM – Ricardo North America License Manager
1523=CICHILD-LM – cichild
1524=INGRESLOCK – ingres
1525=PROSPERO-NP – Prospero Directory Service non-priv
1526=PDAP-NP – Prospero Data Access Prot non-priv
1527=TLISRV – oracle
1528=MCIAUTOREG – mciautoreg
1529=COAUTHOR – oracle
1530=RAP-SERVICE – rap-service
1531=RAP-LISTEN – rap-listen
1532=MIROCONNECT – miroconnect
1533=VIRTUAL-PLACES – Virtual Places Software
1534=MICROMUSE-LM – micromuse-lm
1535=AMPR-INFO – ampr-info
1536=AMPR-INTER – ampr-inter
1537=SDSC-LM – isi-lm
1538=3DS-LM – 3ds-lm
1539=INTELLISTOR-LM – Intellistor License Manager
1540=RDS – rds
1541=RDS2 – rds2
1542=GRIDGEN-ELMD – gridgen-elmd
1543=SIMBA-CS – simba-cs
1544=ASPECLMD – aspeclmd
1545=VISTIUM-SHARE – vistium-share
1546=ABBACCURAY – abbaccuray
1547=LAPLINK – laplink
1548=AXON-LM – Axon License Manager
1549=SHIVAHOSE – Shiva Hose
1550=3M-IMAGE-LM – Image Storage license manager 3M Company
1551=HECMTL-DB – HECMTL-DB
1552=PCIARRAY – pciarray
1553=SNA-CS – sna-cs
1554=CACI-LM – CACI Products Company License Manager
1555=LIVELAN – livelan
1556=ASHWIN – AshWin CI Technologies
1557=ARBORTEXT-LM – ArborText License Manager
1558=XINGMPEG – xingmpeg
1559=WEB2HOST – web2host
1560=ASCI-VAL – asci-val
1561=FACILITYVIEW – facilityview
1562=PCONNECTMGR – pconnectmgr
1563=CADABRA-LM – Cadabra License Manager
1564=PAY-PER-VIEW – Pay-Per-View
1565=WINDDLB – WinDD
1566=CORELVIDEO – CORELVIDEO
1567=JLICELMD – jlicelmd
1568=TSSPMAP – tsspmap
1569=ETS – ets
1570=ORBIXD – orbixd
1571=RDB-DBS-DISP – Oracle Remote Data Base
1572=CHIP-LM – Chipcom License Manager
1573=ITSCOMM-NS – itscomm-ns
1574=MVEL-LM – mvel-lm
1575=ORACLENAMES – oraclenames
1576=MOLDFLOW-LM – moldflow-lm
1577=HYPERCUBE-LM – hypercube-lm
1578=JACOBUS-LM – Jacobus License Manager
1579=IOC-SEA-LM – ioc-sea-lm
1580=TN-TL-R1 – tn-tl-r1
1581=VMF-MSG-PORT – vmf-msg-port
1582=TAMS-LM – Toshiba America Medical Systems
1583=SIMBAEXPRESS – simbaexpress
1584=TN-TL-FD2 – tn-tl-fd2
1585=INTV – intv
1586=IBM-ABTACT – ibm-abtact
1587=PRA_ELMD – pra_elmd
1588=TRIQUEST-LM – triquest-lm
1589=VQP – VQP
1590=GEMINI-LM – gemini-lm
1591=NCPM-PM – ncpm-pm
1592=COMMONSPACE – commonspace
1593=MAINSOFT-LM – mainsoft-lm
1594=SIXTRAK – sixtrak
1595=RADIO – radio
1596=RADIO-SM – radio-sm
1597=ORBPLUS-IIOP – orbplus-iiop
1598=PICKNFS – picknfs
1599=SIMBASERVICES – simbaservices
1600=ISSD –
1601=AAS – aas
1602=INSPECT – inspect
1603=PICODBC – pickodbc
1604=ICABROWSER – icabrowser
1605=SLP – Salutation Manager (Salutation Protocol)
1606=SLM-API – Salutation Manager (SLM-API)
1607=STT – stt
1608=SMART-LM – Smart Corp. License Manager
1609=ISYSG-LM – isysg-lm
1610=TAURUS-WH – taurus-wh
1611=ILL – Inter Library Loan
1612=NETBILL-TRANS – NetBill Transaction Server
1613=NETBILL-KEYREP – NetBill Key Repository
1614=NETBILL-CRED – NetBill Credential Server
1615=NETBILL-AUTH – NetBill Authorization Server
1616=NETBILL-PROD – NetBill Product Server
1617=NIMROD-AGENT – Nimrod Inter-Agent Communication
1618=SKYTELNET – skytelnet
1619=XS-OPENBACKUP – xs-openbackup
1620=FAXPORTWINPORT – faxportwinport
1621=SOFTDATAPHONE – softdataphone
1622=ONTIME – ontime
1623=JALEOSND – jaleosnd
1624=UDP-SR-PORT – udp-sr-port
1625=SVS-OMAGENT – svs-omagent
1636=CNCP – CableNet Control Protocol
1637=CNAP – CableNet Admin Protocol
1638=CNIP – CableNet Info Protocol
1639=CERT-INITIATOR – cert-initiator
1640=CERT-RESPONDER – cert-responder
1641=INVISION – InVision
1642=ISIS-AM – isis-am
1643=ISIS-AMBC – isis-ambc
1644=SAISEH – Satellite-data Acquisition System 4
1645=DATAMETRICS – datametrics
1646=SA-MSG-PORT – sa-msg-port
1647=RSAP – rsap
1648=CONCURRENT-LM – concurrent-lm
1649=INSPECT – inspect
1650=NKD –
1651=SHIVA_CONFSRVR – shiva_confsrvr
1652=XNMP – xnmp
1653=ALPHATECH-LM – alphatech-lm
1654=STARGATEALERTS – stargatealerts
1655=DEC-MBADMIN – dec-mbadmin
1656=DEC-MBADMIN-H – dec-mbadmin-h
1657=FUJITSU-MMPDC – fujitsu-mmpdc
1658=SIXNETUDR – sixnetudr
1659=SG-LM – Silicon Grail License Manager
1660=SKIP-MC-GIKREQ – skip-mc-gikreq
1661=NETVIEW-AIX-1 – netview-aix-1
1662=NETVIEW-AIX-2 – netview-aix-2
1663=NETVIEW-AIX-3 – netview-aix-3
1664=NETVIEW-AIX-4 – netview-aix-4
1665=NETVIEW-AIX-5 – netview-aix-5
1666=NETVIEW-AIX-6 – netview-aix-6
1667=NETVIEW-AIX-7 – netview-aix-7
1668=NETVIEW-AIX-8 – netview-aix-8
1669=NETVIEW-AIX-9 – netview-aix-9
1670=NETVIEW-AIX-10 – netview-aix-10
1671=NETVIEW-AIX-11 – netview-aix-11
1672=NETVIEW-AIX-12 – netview-aix-12
1673=PROSHARE-MC-1 – Intel Proshare Multicast
1674=PROSHARE-MC-2 – Intel Proshare Multicast
1675=PDP – Pacific Data Products
1676=NEFCOMM1 – netcomm1
1677=GROUPWISE – groupwise
1723=PPTP – pptp
1807=SpySender
1812=RADIUS – RADIUS Authentication Protocol
1813=RADACCT – RADIUS Accounting Protocol
1827=PCM – PCM Agent
1981=Shockrave
1986=LICENSEDAEMON – cisco license management
1987=TR-RSRB-P1 – cisco RSRB Priority 1 port
1988=TR-RSRB-P2 – cisco RSRB Priority 2 port
1989=MSHNET – MHSnet system
1990=STUN-P1 – cisco STUN Priority 1 port
1991=STUN-P2 – cisco STUN Priority 2 port
1992=IPSENDMSG – IPsendmsg
1993=SNMP-TCP-PORT – cisco SNMP TCP port
1994=STUN-PORT – cisco serial tunnel port
1995=PERF-PORT – cisco perf port
1996=TR-RSRB-PORT – cisco Remote SRB port
1997=GDP-PORT – cisco Gateway Discovery Protocol
1998=X25-SVC-PORT – cisco X.25 service (XOT)
1999=TCP-ID-PORT – cisco identification port
2000=CALLBOOK –
2001=DC –
2002=GLOBE –
2003=CFINGER – cfinger
2004=MAILBOX –
2005=BERKNET –
2006=INVOKATOR –
2007=DECTALK –
2008=CONF –
2009=NEWS –
2010=SEARCH –
2011=RAID-CC – raid
2012=TTYINFO –
2013=RAID-AM –
2014=TROFF –
2015=CYPRESS –
2016=BOOTSERVER –
2017=CYPRESS-STAT –
2018=TERMINALDB –
2019=WHOSOCKAMI –
2020=XINUPAGESERVER –
2021=SERVEXEC –
2022=DOWN –
2023=XINUEXPANSION3 –
2024=XINUEXPANSION4 –
2025=ELLPACK –
2026=SCRABBLE –
2027=SHADOWSERVER –
2028=SUBMITSERVER –
2030=DEVICE2 –
2032=BLACKBOARD –
2033=GLOGGER –
2034=SCOREMGR –
2035=IMSLDOC –
2038=OBJECTMANAGER –
2040=LAM –
2041=INTERBASE –
2042=ISIS – isis
2043=ISIS-BCAST – isis-bcast
2044=RIMSL –
2045=CDFUNC –
2046=SDFUNC –
2047=DLS –
2048=DLS-MONITOR – dls-monitor
2064=DISTRIB-NET(__CENSORED__) – A group of lamers working on a closed-source client for solving the RSA cryptographic challenge
2065=DLSRPN – Data Link Switch Read Port Number
2067=DLSWPN – Data Link Switch Write Port Number
2080=Wingate Winsock Redirector Service
2103=ZEPHYR-CLT – Zephyr Serv-HM Connection
2104=Zephyr Host Manager
2105=EKLOGIN – Kerberos (v4) Encrypted RLogin
2106=EKSHELL – Kerberos (v4) Encrypted RShell
2108=RKINIT – Kerberos (v4) Remote Initialization
2111=KX – X Over Kerberos
2112=KIP – IP Over Kerberos
2115=Bugs
2120=KAUTH – Remote kauth
2140=Deep Throat, The Invasor
2155=Illusion Mailer
2201=ATS – Advanced Training System Program
2221=UNREG-AB1 – Allen-Bradley unregistered port
2222=UNREG-AB2 – Allen-Bradley unregistered port
2223=UNREG-AB3 – Allen-Bradley unregistered port
2232=IVS-VIDEO – IVS Video default
2241=IVSD – IVS Daemon
2283=HVL Rat5
2301=CIM – Compaq Insight Manager
2307=PEHELP – pehelp
2401=CVSPSERVER – CVS Network Server
2430=VENUS –
2431=VENUS-SE –
2432=CODASRV –
2433=CODASRV-SE –
2500=RTSSERV – Resource Tracking system server
2501=RTSCLIENT – Resource Tracking system client
2564=HP-3000-TELNET – HP 3000 NS/VT block mode telnet
2565=Striker
2583=WinCrash
2592=NETREK[GAME] – netrek[game]
2600=Digital Rootbeer
2601=ZEBRA – Zebra VTY
2602=RIPD – RIPd VTY
2603=RIPNGD – RIPngd VTY
2604=OSPFD – OSPFd VTY
2605=BGPD – BGPd VTY
2627=WEBSTER –
2638=Sybase Database
2700=TQDATA – tqdata
2766=LISTEN – System V Listener Port
2784=WWW-DEV – world wide web – development
2800=Phineas Phucker
2989=(UDP) – RAT
3000=UNKNOWN – Unknown Service
3001=NESSUSD – Nessus Security Scanner
3005=DESLOGIN – Encrypted Symmetric Telnet
3006=DESLOGIND –
3024=WinCrash
3049=NSWS –
3064=DISTRIB-NET-PROXY – Stupid closed source distrib.net proxy
3086=SJ3 – SJ3 (Kanji Input)
3128=RingZero –
3129=Masters Paradise –
3130=SQUID-IPC –
3141=VMODEM – VMODEM
3150=Deep Throat, The Invasor
3155=HTTP Proxy
3264=CCMAIL – cc:mail/lotus
3295=PORT
3306=MYSQL
3333=DEC-NOTES – DEC Notes
3421=BMAP – Bull Apprise portmapper
3454=MIRA – Apple Remote Access Protocol
3455=PRSVP – RSVP Port
3456=VAT – VAT default data
3457=VAT-CONTROL – VAT default control
3459=Eclipse 2000
3700=Portal of Doom
3791=Eclypse
3801=(UDP) – Eclypse
3871=PORT
3900=UDT_OS – Unidata UDT OS
3905=PORT
3908=PORT
3920=PORT
3921=PORT
3922=PORT
3923=PORT
3925=PORT
3975=PORT
3984=MAPPER-NODEMGR – MAPPER network node manager
3985=MAPPER-MAPETHD – MAPPER TCP/IP server
3986=MAPPER-WS_ETHD – MAPPER workstation server
3996=PORT
4000=UNKNOWN – Unknown Service
4001=PORT
4008=NETCHEQUE – NetCheque accounting
4045=LOCKD – NFS Lock Daemon
4092=WinCrash
4132=NUTS_DEM – NUTS Daemon
4133=NUTS_BOOTP – NUTS Bootp Server
4321=RWHOIS – Remote Who Is
4333=MSQL – Mini SQL Server
4343=UNICALL – UNICALL
4444=NV-VIDEO – NV Video default
4500=SAE-URN – sae-urn
4501=URN-X-CDCHOICE – urn-x-cdchoice
4557=FAX – fax
4559=HYLAFAX – HylaFAX cli-svr Protocol
4567=File Nail
4590=ICQTrojan
4672=RFA – remote file access server
4899=RAdmin – Remote Administrator
5000=UPnP – Universal Plug and Play
5001=COMMPLEX-LINK –
5002=RFE – radio free ethernet
5003=CLARIS-FMPRO – Claris FileMaker Pro
5004=AVT-PROFILE-1 – avt-profile-1
5005=AVT-PROFILE-2 – avt-profile-2
5010=TELELPATHSTART – TelepathStart
5011=TELELPATHATTACK – TelepathAttack
5031=NetMetro
5050=MMCC – multimedia conference control tool
5075=IISADMIN – IIS Administration Web Site
5145=RMONITOR_SECURE –
5190=AOL – America-Online
5191=AOL-1 – AmericaOnline1
5192=AOL-2 – AmericaOnline2
5193=AOL-3 – AmericaOnline3
5232=SGI-DGL – SGI Distributed Graphics
5236=PADL2SIM
5300=HACL-HB – HA Cluster Heartbeat
5301=HACL-GS – HA Cluster General Services
5302=HACL-CFG – HA Cluster Configuration
5303=HACL-PROBE – HA Cluster Probing
5304=HACL-LOCAL
5305=HACL-TEST
5308=CFENGINE –
5321=Firehotcker
5376=MS FTP
5400=Blade Runner, Back Construction
5401=Blade Runner, Back Construction
5402=Blade Runner, Back Construction
5432=POSTGRES – Postgres Database Server
5500=Hotline Server
5510=SECUREIDPROP – ACE/Server Services
5512=Illusion Maker
5520=SDLOG – ACE/Server Services
5530=SDSERV – ACE/Server Services
5540=SDXAUTHD – ACE/Server Services
5550=Xtcp
5555=ServeMe
5556=Bo
5557=Bo
5569=Robo-Hack
5631=PCANYWHEREDATA –
5632=PCANYWHERESTAT –
5650=MS FTP PORT
5680=CANNA – Canna (Jap Input)
5713=PROSHAREAUDIO – proshare conf audio
5714=PROSHAREVIDEO – proshare conf video
5715=PROSHAREDATA – proshare conf data
5716=PROSHAREREQUEST – proshare conf request
5717=PROSHARENOTIFY – proshare conf notify
5742=WinCrash
5800=VNC – Virtual Network Computing
5801=VNC – Virtual Network Computing
5858=NETREK[GAME] – netrek[game]
5900=VNC – Virtual Network Computing
5901=VNC-1 – Virtual Network Computing Display
5902=VNC-2 – Virtual Network Computing Display
5977=NCD-PREF-TCP – NCD Preferences
5978=NCD-DIAG-TCP – NCD Diagnostics
5979=NCD-CONF-TCP – NCD Configuration
5997=NCD-PREF – NCD Preferences Telnet
5998=NCD-DIAG – NCD Diagnostics Telnet
5999=NCD-CONF – NCD Configuration Telnet
6000=X11 – X Window System
6001=X11:1 – X Window Server
6002=X11:2 – X Window Server
6003=X11:3 – X Window Server
6004=X11:4 – X Window Server
6005=X11:5 – X Window Server
6006=X11:6 – X Window Server
6007=X11:7 – X Window Server
6008=X11:8 – X Window Server
6009=X11:9 – X Window Server
6110=SOFTCM – HP SoftBench CM
6111=SPC – HP SoftBench Sub-Process Control
6112=DTSPCD – dtspcd
6141=META-CORP – Meta Corporation License Manager
6142=ASPENTEC-LM – Aspen Technology License Manager
6143=WATERSHED-LM – Watershed License Manager
6144=STATSCI1-LM – StatSci License Manager – 1
6145=STATSCI2-LM – StatSci License Manager – 2
6146=LONEWOLF-LM – Lone Wolf Systems License Manager
6147=MONTAGE-LM – Montage License Manager
6148=RICARDO-LM – Ricardo North America License Manager
6149=TAL-POD – tal-pod
6400=The Thing
6455=SKIP-CERT-RECV – SKIP Certificate Receive
6456=SKIP-CERT-SEND – SKIP Certificate Send
6558=XDSXDM –
6660=IRC-SERV – irc-serv
6661=IRC-SERV – irc-serv
6662=IRC-SERV – irc-serv
6663=IRC-SERV – irc-serv
6664=IRC-SERV – irc-serv
6665=IRC-SERV – irc-serv
6666=IRC-SERV – irc-serv
6667=IRC – irc
6668=IRC – irc
6669=Vampyre
6670=DeepThroat
6671=IRC-SERV – irc-serv
6771=DeepThroat
6776=BackDoor-G, SubSeven
6912=Shit Heep
6939=Indoctrination
6969=ACMSODA – acmsoda
6970=GateCrasher, Priority, IRC 3
7000=AFSSERV – file server itself
7001=UNKNOWN – Unknown Service
7002=UNKNOWN – Unknown Service
7003=AFS3-VLSERVER – volume location database
7004=AFS3-KASERVER – AFS/Kerberos authentication service
7005=AFS3-VOLSER – volume management server
7006=AFS3-ERRORS – error interpretation service
7007=AFS3-BOS – basic overseer process
7008=AFS3-UPDATE – server-to-server updater
7009=AFS3-RMTSYS – remote cache manager service
7010=UPS-ONLINET – onlinet uninterruptible power supplies
7100=FONT-SERVICE – X Font Service
7120=IISADMIN – IIS Administration Web Site
7121=VIRPROT-LM – Virtual Prototypes License Manager
7200=FODMS – FODMS FLIP
7201=DLIP – DLIP
7300=NetMonitor
7301=NetMonitor
7306=NetMonitor
7307=NetMonitor
7308=NetMonitor
7309=NetMonitor
7326=ICB – Internet Citizen’s Band
7648=CUCME-1 – CucMe live video/Audio Server
7649=CUCME-2 – CucMe live video/Audio Server
7650=CUCME-3 – CucMe live video/Audio Server
7651=CUCME-4 – CucMe live video/Audio Server
7770=IRC
7777=CBT – cbt
7789=Back Door Setup, ICKiller
8000=Generic – Shared service port
8001=Generic – Shared service port
8002=Generic – Shared service port
8003=Generic – Shared service port
8004=Generic – Shared service port
8005=Generic – Shared service port
8006=Generic – Shared service port
8007=Generic – Shared service port
8008=Generic – Shared service port
8009=Generic – Shared service port
8010=Generic – Shared service port
8080=Generic – Shared service port
8081=Generic – Shared service port
8082=Generic – Shared service port
8083=Generic – Shared service port
8084=Generic – Shared service port
8085=Generic – Shared service port
8086=Generic – Shared service port
8087=Generic – Shared service port
8088=Generic – Shared service port
8100=Generic – Shared service port
8101=Generic – Shared service port
8102=Generic – Shared service port
8103=Generic – Shared service port
8104=Generic – Shared service port
8105=Generic – Shared service port
8106=Generic – Shared service port
8107=Generic – Shared service port
8108=Generic – Shared service port
8109=Generic – Shared service port
8110=Generic – Shared service port
8181=Generic – Shared service port
8383=Generic – Shared service port
8450=NPMP – npmp
8765=Ultraseek
8807=DEMOS NNTP
8888=SiteScope – SiteScope Remote Server Monitoring
8892=SEOSLOAD – eTrust ACX
9000=UNKNOWN – Unknown Service
9001=UNKNOWN
9010=SERVICE
9090=ZEUS-ADMIN – Zeus Admin Server
9095=SERVICE
9100=JETDIRECT – HP JetDirect Card
9200=WAP – Wireless Application Protocol
9201=WAP – Wireless Application Protocol
9202=WAP – Wireless Application Protocol
9203=WAP – Wireless Application Protocol
9400=InCommand
9443=MSN Messenger
9535=MAN –
9872=Portal of Doom
9873=Portal of Doom
9874=Portal of Doom
9875=Portal of Doom
9876=SD – Session Director
9989=iNi-Killer
9998=DEMOS SMTP
9999=DISTINCT – distinct
10005=STEL – Secure Telnet
10067=(UDP) – Portal of Doom
10080=AMANDA – Amanda Backup Util
10082=AMANDA-IDX – Amanda Indexing
10083=AMIDXTAPE – Amanda Tape Indexing
10101=BrainSpy
10167=(UDP) – Portal of Doom
10520=Acid Shivers
10607=Coma
11000=Senna Spy
11223=Progenic trojan
11371=PKSD – PGP Pub. Key Server
12067=Gjamer
12223=Hack’99 KeyLogger
12345=NB – NetBus
12346=GabanBus, NetBus, X-bill
12361=Whack-a-mole
12362=Whack-a-mole
12631=Whackjob
13000=Senna Spy
13326=CROSSFIRE[GAME] – crossfire[game]
16660=Stacheldraht Master Server
16969=Priority
17007=ISODE-DUA –
17300=Kuang2 The Virus
18000=BIIMENU – Beckman Instruments Inc.
20000=Millennium
20001=Millennium backdoor
20005=BTX – Xcept4
20034=Netbus 2 Pro
20203=Logged
21544=Girlfriend
21845=WEBPHONE – webphone
21846=INFO SERVER – info server
21847=CONNECT SERVER – connect server
22222=Prosiak
22273=WNN6 – Wnn6 (Jap. Input)
22289=WNN6_CN – Wnn6 (Chi. Input)
22305=WNN6_KR – Wnn6 (Kor. Input)
22321=WNN6_TW – Wnn6 (Tai. Input)
23456=Evil FTP, Ugly FTP, Whack Job
23476=Donald Dick
23477=Donald Dick
24326=Netscape Server
25000=ICL-TWOBASE1 – icl-twobase1
25001=ICL-TWOBASE2 – icl-twobase2
25002=ICL-TWOBASE3 – icl-twobase3
25003=ICL-TWOBASE4 – icl-twobase4
25004=ICL-TWOBASE5 – icl-twobase5
25005=ICL-TWOBASE6 – icl-twobase6
25006=ICL-TWOBASE7 – icl-twobase7
25007=ICL-TWOBASE8 – icl-twobase8
25008=ICL-TWOBASE9 – icl-twobase9
25009=ICL-TWOBASE10 – icl-twobase10
26000=QUAKEXX
26001=QUAKEXX
26002=QUAKEXX
26208=WNN6_DS – Wnn6 (Dserver)
26274=(UDP) – Delta Source
27119=QUAKEXX
27444=TRINOO_BCAST – Trinoo Attack Tool
27500=QUAKEXX
27501=QUAKEXX
27502=QUAKEXX
27665=TRINOO_MASTER – Trinoo Attack Tool
27910=QUAKEXX
27911=QUAKEXX
27912=QUAKEXX
27913=QUAKEXX
27920=QUAKEXX
27960=QUAKE3SERVER – Quake 3 Arena Server
29891=(UDP) – The Unexplained
29970=PORT
30029=AOL Trojan
30100=NetSphere
30101=Netsphere
30102=NetSphere
30303=Sockets de Troie
30999=Kuang2
31335=TRINOO_REGISTER – Trinoo Attack Tool
31336=Whack
31337=BO – BackOrifice
31338=NetSpy DK
31457=TETRINET (Tetris GAME)
31666=BO Whack
31785=Hack’a’Tack
31787=Hack’a’Tack
31788=Hack’a’Tack
31789=Hack’a’Tack (udp)
31791=Hack’a’Tack (udp)
31792=Hack’a’Tack
32000=Generic – Shared service port
33333=Prosiak
33911=Spirit 2001a
34324=BigGluck, TN
40193=NetWare
40412=The Spy
40421=Agent 40421, Masters Paradise
40422=Masters Paradise
40423=Masters Paradise
40426=Masters Paradise
43188=REACHOUT –
44333=WinRoute
47262=(UDP) – Delta Source
47557=DBBROWSE – Databeam Corporation
50505=Sockets de Troie
50766=Fore, Schwindler
53001=Remote Window Shutdown
54320=BO 2K
54321=SchoolBus
60000=Deep Throat
61466=TeleCommando
65000=Devil
65301=PCANYWHERE
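Every entry in the list above follows the same `port=NAME – description` layout, with an en dash as separator and the description sometimes absent. A minimal parsing sketch for this format (the function name is illustrative):

```python
def parse_port_entry(line):
    """Split a 'port=NAME \u2013 description' entry into its fields.

    The separator is the en dash used throughout the list; the
    description (and sometimes the trailing dash itself) may be absent.
    """
    port_str, _, rest = line.strip().partition("=")
    name, _, description = rest.partition(" \u2013 ")
    return int(port_str), name.strip(" \u2013"), description.strip()

# Example: parse a typical entry from the list.
port, name, desc = parse_port_entry("1433=MS-SQL-S \u2013 Microsoft-SQL-Server")
```

Entries with no description (e.g. `3306=MYSQL`) yield an empty description string.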

Public Key Cryptostructure – Hakin9 E-book 1/2012

Dear Readers, in our attempt to become more accessible to you and to meet the challenges and expectations of the ever-changing tech world, this E-book is available in both .pdf and .epub formats, contained within a single .rar file.

Data Encryption Software
by Hans Fouche
The need for encryption has existed ever since one man learned to comprehend another man’s written language. But it was not until World War I that encryption hardware was used on a massive scale. While we are quite used to the fact that encryption is fairly common on the Internet, we may not realize that in the physical world it can be found not only in military areas, government buildings, or corporate headquarters, but also in places as mundane as retail stores. But what are the most common rules governing encryption, and what makes good encryption software?

Replacing Tokens with Digital Certificates for User Authentication on Remote VPN. Is this a Bad Idea?
by Luciano Ferrari
Imagine that senior management has sent you a request; you have a new mission: reduce the cost of the token licenses, improve the user experience with something simpler, and keep the same level of security for your remote VPN users. Would you say no? Would you say that this is impossible to achieve? Or would you investigate and try to deliver a solution for the business? If you believe this is impossible, I can tell you that you can have something that comes very close.

SafeSlinger: Easy-to-Use and Secure Public-Key Exchange
by Michael W. Farb, Yue-Hsun Lin, Jonathan McCune and Adrian Perrig
For many current Internet applications, users experience a crisis of confidence. Did the email or message we received really come from the claimed individual, or was it sent by an impostor? Cryptography alone cannot address this problem. We have many useful protocols, such as SSL or PGP, for entities that already share authentic key material, but the root of the problem remains: how do we obtain the authentic public key of the intended resource or individual? The global certification process for SSL is not without drawbacks and weaknesses, and the usability challenges of decentralized mechanisms such as PGP are well known.
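The standard mitigation for this problem is out-of-band fingerprint comparison, and that is broadly the user experience SafeSlinger streamlines. A minimal sketch of the general idea only, not of the SafeSlinger protocol itself (the function name and grouping are illustrative):

```python
import hashlib

def key_fingerprint(public_key_bytes, groups=4):
    """Derive a short, human-comparable fingerprint from raw key bytes.

    Both parties compute this locally and compare the groups over an
    independent channel (in person, by phone); a mismatch indicates
    a tampered or impostor key.
    """
    digest = hashlib.sha256(public_key_bytes).hexdigest()
    return " ".join(digest[i * 4:(i + 1) * 4] for i in range(groups))

fp = key_fingerprint(b"example-public-key")  # four 4-hex-digit groups
```

Short fingerprints trade collision resistance for usability, which is exactly the tension that protocols like SafeSlinger are designed to manage.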

Choosing Algorithms to Standardise
by Chris J. Mitchell
The developers of the ISO/IEC standard on encryption, ISO/IEC 18033, are facing a dilemma. To maximise interoperability and make life as simple as possible for developers, the smallest possible number of algorithms should be standardised; however, despite this, there seems to be an inexorable growth in the number of standardised algorithms. We put this problem into historical context, and review efforts to devise ways of restricting the numbers of standardised algorithms dating back to the beginning of the development of ISO/IEC 18033. We then consider how and why these efforts have proved inadequate, leading to an almost uncontrollably large number of standardised algorithms. Finally, we discuss recent efforts to address this situation, which appear to have ramifications not only for ISO/IEC but for almost any body seeking to standardise a set of general purpose techniques.

The RSA Algorithm – The Ups and Downs
by Chuck Easttom
RSA is currently the most widely used asymmetric algorithm (Yeh, Huang, Lin & Chang, 2009; Ambedkar, Gupta, Gautam & Bedi, 2011; Stallings, 2010; Mao, 2011). The algorithm was publicly described in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman at MIT. The letters RSA are the initials of their surnames. The algorithm is based on some interesting relationships with prime numbers. The security of RSA derives from the fact that it is difficult to factor a large integer composed of two or more large prime factors.
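The prime-number relationships the abstract alludes to can be seen with deliberately tiny primes. A toy sketch (real RSA keys use primes hundreds of digits long, plus padding; this is insecure by construction and for illustration only):

```python
from math import gcd

# Toy RSA with tiny primes -- never use values like these in practice.
p, q = 61, 53
n = p * q                   # public modulus (3233)
phi = (p - 1) * (q - 1)     # totient used to derive the private key
e = 17                      # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)         # private exponent: modular inverse (Python 3.8+)

message = 42
ciphertext = pow(message, e, n)    # encrypt: m^e mod n
recovered = pow(ciphertext, d, n)  # decrypt: c^d mod n
assert recovered == message
```

An attacker who can factor n back into p and q can recompute phi and d, which is why RSA's security rests on the difficulty of factoring.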

How to Improve the Security of Your SSL/TLS Web Server
by Eric Tews
HTTPS – HTTP over the SSL/TLS protocol – is the de facto standard when it comes to securing websites on the Internet. HTTPS is used for online banking, social networks, live streaming of music and audio, email, instant messaging, and many more applications. The SSL/TLS protocol provides a secure tunnel through which the HTTP traffic between a web browser and a web server can be transported, if both sides support the protocol. SSL/TLS encrypts the traffic, providing confidentiality, and prevents unauthorized and unnoticed modification of the traffic, providing integrity protection. To ensure the authenticity of the website, a digital certificate conforming to the X.509 standard is used.
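In practice, much of the hardening such an article describes comes down to the TLS configuration the server offers. A rough, modern sketch using Python's standard `ssl` module (the certificate and key paths are placeholders): refuse legacy protocol versions outright.

```python
import ssl

# A hardened server-side TLS context: refuse SSLv3 and TLS 1.0/1.1.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.minimum_version = ssl.TLSVersion.TLSv1_2
# context.load_cert_chain("server.crt", "server.key")  # placeholder paths
```

Setting a floor on the protocol version is one knob among several; cipher-suite selection and certificate hygiene matter just as much.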

Stripping SSL Encryption
by Praful Agarwal and Sulabh Jain
Web servers and Web browsers rely on the Secure Sockets Layer (SSL) protocol to create a uniquely encrypted channel for private communications over the public Internet. Each SSL Certificate consists of a public key and a private key. The public key is used to encrypt information and the private key is used to decipher it. When a Web browser points to a secured domain, a level of encryption is established based on the type of SSL Certificate as well as the capabilities of the client Web browser, operating system, and host server. That is why SSL Certificates feature a range of encryption levels such as “up to 256-bit”.
We would like to introduce a tool called “SSL strip”, which is built around a Man-in-the-Middle (MitM) attack, in which users on a particular network can be forcibly redirected from the secure version (HTTPS) of a web page to the insecure version (HTTP).
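The core transformation behind such a tool is simple: HTML intercepted in transit has its secure links rewritten before it reaches the victim's browser. A deliberately minimal illustration of just that rewriting step (a real MitM proxy must also position itself in the traffic path and relay requests upstream):

```python
def strip_https_links(html):
    """Downgrade https:// links to http:// in intercepted HTML.

    Only the rewriting step is shown; an actual attack additionally
    relays the victim's now-plaintext requests upstream over HTTPS,
    so the server sees nothing unusual.
    """
    return html.replace("https://", "http://")

page = '<a href="https://bank.example/login">Log in</a>'
stripped = strip_https_links(page)  # the link now points at plain HTTP
```

Defenses such as HSTS exist precisely to make this downgrade visible to, or rejected by, the browser.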

Everyday Cryptography: Yet Another Book about Cryptography
by Keith Martin
Cryptography is a subject whose relevance to everyday life has undergone a dramatic transformation in the last few decades. It used to manifest itself in the public imagination through its historical use, primarily to protect military communications, and through recreational puzzles. However, largely due to the development of computer networks, particularly the Internet, most of
us now use cryptography on a daily basis.
As a result there is a substantial interest in cryptography amongst a wide audience that includes information security users and practitioners, researchers, students on university courses, and even the general public. This explains why there are so many books on the subject of cryptography.

http://hakin9.org/public-key-cryptostructure-hakin9-e-book-1-2012/

Bundestag Gesetze

German federal laws and regulations

This Git repository contains all German federal laws and regulations in Markdown format. The XML versions of the laws from www.gesetze-im-internet.de serve as the source.

Why Git?

Any citizen can easily find the current version of a law online. However, the origin, historical development, and updating of laws are not easy to trace freely. This is because laws are presented only in their most recent version, and amendments to them are not available in machine-readable form. This repository aims to change that: the most recent version of each law is stored here under Git version control. That makes it possible to apply the power of Git to laws and to the legislative process. The distant goal is to bring the complete history of German legislation into Git.

Why Markdown?

Laws are prose; they contain no machine-readable semantics. A markup language such as XML reduces human readability, makes it harder for machines to detect differences, and carries a great deal of superfluous syntax.

Markdown is an intuitive way of formatting text that people can read and write without additional software. That suits legal texts, which require only minimal formatting. Furthermore, Markdown can be converted into other formats such as HTML, and is therefore machine-formattable.
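The XML-to-Markdown direction can be sketched with a deliberately simplified, hypothetical law structure (the real gesetze-im-internet.de schema is considerably richer, which is exactly what makes the actual conversion error-prone):

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a law's XML markup, for illustration only.
xml_source = """
<norm>
  <titel>§ 1 Beispiel</titel>
  <absatz>(1) Erster Absatz.</absatz>
  <absatz>(2) Zweiter Absatz.</absatz>
</norm>
"""

def norm_to_markdown(xml_text):
    """Convert one <norm> element to a Markdown section."""
    root = ET.fromstring(xml_text)
    lines = ["## " + root.findtext("titel")]
    lines += [a.text for a in root.findall("absatz")]
    return "\n\n".join(lines)

markdown = norm_to_markdown(xml_source)
```

The Markdown output needs no toolchain to read, while the XML input does.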

Pull Requests

Pull requests are welcome. Naturally, only those that have actually been passed by the Bundestag and have become law will be merged.

Nevertheless, proposed amendments to laws from political parties or from civil society are useful as pull requests. The changes are easier to understand in context, can be discussed directly on the law itself, and can be revised transparently.

Official draft bills, where publicly available, will be submitted to this repository as pull requests from the federal government’s fork.

Errors and a Request for Help

No claim to correctness is made. Please rely only on official sources.

The XML source is not free of errors, and neither is the conversion from XML to Markdown. This is because the XML versions of the laws use markup for stylistic as well as purely semantic purposes. That complicates conversion and leads to faulty Markdown. Since faulty Markdown is still quite readable, this only causes problems in further processing.

Wherever possible, commits follow the publications in the Bundesgesetzblatt (Federal Law Gazette) and the official section of the Bundesanzeiger (Federal Gazette). This does not always work smoothly and requires human assistance. Tools that simplify updating can be found in the gesetze-tools repository. Help is welcome.

To make the best use of Git's capabilities, it will be necessary to distinguish commits that introduce legal changes from commits that, for example, fix syntax or modify the README. Help in working out a Git workflow for this repository is appreciated.
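One possible shape such a workflow could take (purely hypothetical, not an adopted convention of this repository): commits that introduce legal changes cite their source in a "BGBl:" trailer line, so tools can separate them from maintenance commits. A sketch in Python:

```python
# Hypothetical convention for separating substantive law changes from
# maintenance commits: a commit that introduces a legal change cites
# its Bundesgesetzblatt source in a "BGBl:" trailer line; everything
# else counts as maintenance. (Illustration only, not repository policy.)
def classify_commit(message):
    for line in message.splitlines():
        if line.startswith("BGBl:"):
            return "law-change"
    return "maintenance"

commits = [
    "Fix Markdown heading syntax",
    "Amend § 3 StVO\n\nBGBl: I 2012, 1234",  # invented citation
]
print([classify_commit(m) for m in commits])
```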

Legal

Laws are official works (amtliche Werke) and are not subject to copyright.

Contact

Twitter: @bundesgit

https://github.com/bundestag/gesetze

What is data science?

<p>We&rsquo;ve all heard it: according to Hal Varian,&nbsp;<a href="http://www.nytimes.com/2009/08/06/technology/06stats.html">statistics is the next sexy job</a>. Five years ago, in&nbsp;<a href="http://oreilly.com/web2/archive/what-is-web-20.html">What is Web 2.0</a>, Tim O&rsquo;Reilly said that &ldquo;data is the next Intel Inside.&rdquo; But what does that statement mean? Why do we suddenly care about statistics and about data?</p>
<p>In this post, I examine the many sides of data science &mdash; the technologies, the companies and the unique skill sets.</p>
<p>The web is full of &ldquo;data-driven apps.&rdquo; Almost any e-commerce application is a data-driven application. There&rsquo;s a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn&rsquo;t really what we mean by &ldquo;data science.&rdquo; A data application acquires its value from the data itself, and creates more data as a result. It&rsquo;s not just an application with data; it&rsquo;s a data product. Data science enables the creation of data products.</p>
<p>One of the earlier data products on the Web was the&nbsp;<a href="http://en.wikipedia.org/wiki/CDDB">CDDB database</a>. The developers of CDDB realized that any CD had a unique signature, based on the exact length (in samples) of each track on the CD. Gracenote built a database of track lengths, and coupled it to a database of album metadata (track titles, artists, album titles). If you&rsquo;ve ever used iTunes to rip a CD, you&rsquo;ve taken advantage of this database. Before it does anything else, iTunes reads the length of every track, sends it to CDDB, and gets back the track titles. If you have a CD that&rsquo;s not in the database (including a CD you&rsquo;ve made yourself), you can create an entry for an unknown album. While this sounds simple enough, it&rsquo;s revolutionary: CDDB views music as data, not as audio, and creates new value in doing so. Their business is fundamentally different from selling music, sharing music, or analyzing musical tastes (though these can also be &ldquo;data products&rdquo;). CDDB arises entirely from viewing a musical problem as a data problem.</p>
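<p>The signature idea can be sketched in a few lines of Python. This illustrates the principle only: the real CDDB disc ID is computed differently, from track offsets with a custom checksum, and the album data below is invented.</p>

```python
import hashlib

# Simplified illustration of the CDDB idea: treat a disc as data by
# deriving a lookup key from its track lengths alone. (The real CDDB
# disc ID uses track offsets and a checksum; this hash-based sketch
# just shows the "music as data" principle.)
def disc_signature(track_lengths_in_samples):
    payload = ",".join(str(n) for n in track_lengths_in_samples)
    return hashlib.sha1(payload.encode()).hexdigest()[:8]

# Two pressings with identical track lengths yield the same key,
# which a metadata database can map to album and track titles.
album = [13305600, 11289600, 15876000]  # invented track lengths
metadata = {disc_signature(album): {"artist": "Example", "title": "Example LP"}}
print(metadata[disc_signature(album)]["title"])
```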
<p>Google is a master at creating data products. Here are a few examples:</p>
<ul>
<li>Google&rsquo;s breakthrough was realizing that a search engine could use input other than the text on the page. Google&rsquo;s&nbsp;<a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a>&nbsp;algorithm was among the first to use data outside of the page itself, in particular, the number of links pointing to a page. Tracking links made Google searches much more useful, and PageRank has been a key ingredient to the company&rsquo;s success.</li>
<li>Spell checking isn&rsquo;t a terribly difficult problem, but by suggesting corrections to misspelled searches, and observing what the user clicks in response, Google made it much more accurate. They&rsquo;ve built a dictionary of common misspellings, their corrections, and the contexts in which they occur.</li>
<li>Speech recognition has always been a hard problem, and it remains difficult. But Google has made huge strides by using the voice data they&rsquo;ve collected, and has been able to&nbsp;<a href="http://gdgt.com/discuss/voice-recognition-is-amazing-ive-only-68e/">integrate voice search</a>&nbsp;into their core search engine.</li>
<li>During the Swine Flu epidemic of 2009, Google was able to track the progress of the epidemic&nbsp;<a href="http://www.google.org/flutrends/about/how.html">by following searches for flu-related topics</a>.</li>
</ul>
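<p>The misspelling-dictionary idea above can be sketched in Python. This toy version (my illustration, not Google&rsquo;s algorithm, which is far richer and context-sensitive) simply records which correction users accept for each misspelled query and replays the most popular one:</p>

```python
from collections import Counter, defaultdict

# Toy sketch of the click-feedback idea: every time a user accepts a
# suggested correction, record the (misspelling -> correction) pair;
# future queries are corrected with the most frequently accepted fix.
corrections = defaultdict(Counter)

def record_click(typed, accepted):
    corrections[typed][accepted] += 1

def correct(query):
    if corrections[query]:
        return corrections[query].most_common(1)[0][0]
    return query  # no evidence: leave the query unchanged

record_click("recieve", "receive")
record_click("recieve", "receive")
record_click("recieve", "relieve")
print(correct("recieve"))  # the majority of users accepted "receive"
```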
<div>
<p>Flu trends</p>
<p><img alt="" src="http://s.radar.oreilly.com/2010/06/01/datascience-swing-flu.png" width="580" /></p>
<p>Google was able to spot trends in the Swine Flu epidemic roughly two weeks before the Centers for Disease Control by analyzing searches that people were making in different regions of the country.</p>
</div>
<p>Google isn&rsquo;t the only company that knows how to use data.&nbsp;<a href="http://www.facebook.com/">Facebook</a>&nbsp;and&nbsp;<a href="http://www.linkedin.com/">LinkedIn</a>&nbsp;use patterns of friendship relationships to suggest other people you may know, or should know, with sometimes frightening accuracy.&nbsp;<a href="http://www.amazon.com/">Amazon</a>&nbsp;saves your searches, correlates what you search for with what other users search for, and uses it to create surprisingly appropriate recommendations. These recommendations are &ldquo;data products&rdquo; that help to drive Amazon&rsquo;s more traditional retail business. They come about because Amazon understands that a book isn&rsquo;t just a book, a camera isn&rsquo;t just a camera, and a customer isn&rsquo;t just a customer; customers generate a trail of &ldquo;data exhaust&rdquo; that can be mined and put to use, and a camera is a cloud of data that can be correlated with the customers&rsquo; behavior, the data they leave every time they visit the site.</p>
<p>The thread that ties most of these applications together is that data collected from users provides added value. Whether that data is search terms, voice samples, or product reviews, the users are in a feedback loop in which they contribute to the products they use. That&rsquo;s the beginning of data science.</p>
<p>In the last few years, there has been an explosion in the amount of data that&rsquo;s available. Whether we&rsquo;re talking about web server logs, tweet streams, online transaction records, &ldquo;citizen science,&rdquo; data from sensors, government data, or some other source, the problem isn&rsquo;t finding data, it&rsquo;s figuring out what to do with it. And it&rsquo;s not just companies using their own data, or the data contributed by their users. It&rsquo;s increasingly common to mash up data from a number of sources. &ldquo;<a href="http://oreilly.com/catalog/9780596804787">Data Mashups in R</a>&rdquo; analyzes mortgage foreclosures in Philadelphia County by taking a public report from the county sheriff&rsquo;s office, extracting addresses and using Yahoo to convert the addresses to latitude and longitude, then using the geographical data to place the foreclosures on a map (another data source), and group them by neighborhood, valuation, neighborhood per-capita income, and other socio-economic factors.</p>
<p>The question facing every company today, every startup, every non-profit, every project site that wants to attract a community, is how to use data effectively &mdash; not just their own data, but all the data that&rsquo;s available and relevant. Using data effectively requires something different from traditional statistics, where actuaries in business suits perform arcane but fairly well-defined kinds of analysis. What differentiates data science from statistics is that data science is a holistic approach. We&rsquo;re increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.</p>
<p>To get a sense for what skills are required, let&rsquo;s look at the data lifecycle: where it comes from, how you use it, and where it goes.</p>
<h2 id=”data-location”>Where data comes from</h2>
<p>Data is everywhere: your government, your web server, your business partners,&nbsp;<a href="http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html?ref=magazine">even your body</a>. While we aren&rsquo;t drowning in a sea of data, we&rsquo;re finding that almost everything can be (or has been) instrumented. At O&rsquo;Reilly, we frequently combine publishing industry data from&nbsp;<a href="http://en.wikipedia.org/wiki/Nielsen_BookScan">Nielsen BookScan</a>&nbsp;with our own sales data, publicly available Amazon data, and even job data to see what&rsquo;s happening in the publishing industry. Sites like&nbsp;<a href="http://infochimps.org/">Infochimps</a>&nbsp;and&nbsp;<a href="http://www.factual.com/">Factual</a>&nbsp;provide access to many large datasets, including climate data, MySpace activity streams, and game logs from sporting events. Factual enlists users to update and improve its datasets, which cover topics ranging from endocrinologists to hiking trails.</p>
<div>
<p>1956 disk drive</p>
<p><img alt="" src="http://s.radar.oreilly.com/2010/06/01/datascience-56-drive.jpg" /></p>
<p>One of the first commercial disk drives from IBM. It has a 5 MB capacity and it&rsquo;s stored in a cabinet roughly the size of a luxury refrigerator. In contrast, a 32 GB microSD card measures around 5/8 x 3/8 inch and weighs about 0.5 gram.</p>
<p>Photo: Mike Loukides. Disk drive on display at&nbsp;<a href="http://www.almaden.ibm.com/">IBM Almaden Research</a></p>
</div>
<p>Much of the data we currently work with is the direct consequence of Web 2.0, and of Moore&rsquo;s Law applied to data. The web has people spending more time online, and leaving a trail of data wherever they go. Mobile applications leave an even richer data trail, since many of them are annotated with geolocation, or involve video or audio, all of which can be mined. Point-of-sale devices and frequent-shopper cards make it possible to capture all of your retail transactions, not just the ones you make online. All of this data would be useless if we couldn&rsquo;t store it, and that&rsquo;s where Moore&rsquo;s Law comes in. Since the early &rsquo;80s, processor speed has increased from&nbsp;<a href="http://en.wikipedia.org/wiki/Motorola_68000">10 MHz</a>&nbsp;to 3.6 GHz &mdash; a 360-fold increase (not counting increases in word length and number of cores). But we&rsquo;ve seen much bigger increases in storage capacity, on every level. RAM has moved from $1,000/MB to roughly $25/GB &mdash; a price reduction of about 40,000, to say nothing of the reduction in size and increase in speed. Hitachi made the&nbsp;<a href="http://news.cnet.com/2300-1010_3-6031405-6.html">first gigabyte disk drives</a>&nbsp;in 1982, weighing in at roughly 250 pounds; now terabyte drives are consumer equipment, and a 32 GB microSD card weighs about half a gram. Whether you look at bits per gram, bits per dollar, or raw capacity, storage has more than kept pace with the increase of CPU speed.</p>
<p>The importance of Moore&rsquo;s law as applied to data isn&rsquo;t just geek pyrotechnics. Data expands to fill the space you have to store it. The more storage is available, the more data you will find to put into it. The data exhaust you leave behind whenever you surf the web, friend someone on Facebook, or make a purchase in your local supermarket, is all carefully collected and analyzed. Increased storage capacity demands increased sophistication in the analysis and use of that data. That&rsquo;s the foundation of data science.</p>
<p>So, how do we make that data useful? The first step of any data analysis project is &ldquo;data conditioning,&rdquo; or getting data into a state where it&rsquo;s usable. We are seeing more data in formats that are easier to consume: Atom data feeds, web services, microformats, and other newer technologies provide data in formats that are directly machine-consumable. But old-style&nbsp;<a href="http://en.wikipedia.org/wiki/Data_scraping#Screen_scraping">screen scraping</a>&nbsp;hasn&rsquo;t died, and isn&rsquo;t going to die. Many sources of &ldquo;wild data&rdquo; are extremely messy. They aren&rsquo;t well-behaved XML files with all the metadata nicely in place. The foreclosure data used in &ldquo;<a href="http://oreilly.com/catalog/9780596804787">Data Mashups in R</a>&rdquo; was posted on a public website by the Philadelphia county sheriff&rsquo;s office. This data was presented as an HTML file that was probably generated automatically from a spreadsheet. If you&rsquo;ve ever seen the HTML that&rsquo;s generated by Excel, you know that&rsquo;s going to be fun to process.</p>
<p>Data conditioning can involve cleaning up messy HTML with tools like&nbsp;<a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>, natural language processing to parse plain text in English and other languages, or even getting humans to do the dirty work. You&rsquo;re likely to be dealing with an array of data sources, all in different forms. It would be nice if there were a standard set of tools to do the job, but there isn&rsquo;t. To do data conditioning, you have to be ready for whatever comes, and be willing to use anything from ancient Unix utilities such as&nbsp;<a href="http://oreilly.com/catalog/9780596000707">awk</a>&nbsp;to XML parsers and machine learning libraries. Scripting languages, such as&nbsp;<a href="http://oreilly.com/perl/">Perl</a>&nbsp;and&nbsp;<a href="http://oreilly.com/python/">Python</a>, are essential.</p>
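<p>As a small taste of data conditioning, here is a sketch that scrapes rows out of messy, spreadsheet-generated HTML. It uses only Python&rsquo;s standard-library parser rather than Beautiful Soup (which is the more forgiving tool for real pages), and the sample markup is invented:</p>

```python
from html.parser import HTMLParser

# Sketch of screen scraping: pull table rows out of sloppy HTML using
# only the standard library. The sample markup below is invented.
class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":          # parser lowercases tag names for us
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

messy = "<TABLE><TR><TD>123 Main St</TD><TD>$85,000</TD></TR></TABLE>"
scraper = TableScraper()
scraper.feed(messy)
print(scraper.rows)
```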
<p>Once you&rsquo;ve parsed the data, you can start thinking about the quality of your data. Data is frequently missing or incongruous. If data is missing, do you simply ignore the missing points? That isn&rsquo;t always possible. If data is incongruous, do you decide that something is wrong with badly behaved data (after all, equipment fails), or that the incongruous data is telling its own story, which may be more interesting? It&rsquo;s reported that the discovery of ozone layer depletion was delayed because&nbsp;<a href="http://www.nas.nasa.gov/About/Education/Ozone/history.html">automated data collection tools discarded readings that were too low</a>&nbsp;<sup><a href="http://radar.oreilly.com/2010/06/what-is-data-science.html#footnote-1">1</a></sup>. In data science, what you have is frequently all you&rsquo;re going to get. It&rsquo;s usually impossible to get &ldquo;better&rdquo; data, and you have no alternative but to work with the data at hand.</p>
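<p>The ozone story suggests a middle path between ignoring and discarding: keep incongruous readings, but flag them for review. A sketch, with invented readings and an invented threshold:</p>

```python
# Rather than silently discarding incongruous readings (as the ozone
# software did), keep them but flag them, so an analyst can later
# decide whether they are noise or signal. Data and threshold invented.
def triage(readings, low=180):
    kept, flagged, missing = [], [], 0
    for r in readings:
        if r is None:
            missing += 1          # missing point: count it, can't use it
        elif r < low:
            flagged.append(r)     # suspicious, but preserved for review
        else:
            kept.append(r)
    return kept, flagged, missing

ozone = [310, 295, None, 120, 301]
print(triage(ozone))
```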
<p>If the problem involves human language, understanding the data adds another dimension to the problem. Roger Magoulas, who runs the data analysis group at O&rsquo;Reilly, was recently searching a database for Apple job listings requiring geolocation skills. While that sounds like a simple task, the trick was disambiguating &ldquo;Apple&rdquo; from many job postings in the growing Apple industry. To do it well you need to understand the grammatical structure of a job posting; you need to be able to parse the English. And that problem is showing up more and more frequently. Try using&nbsp;<a href="http://google.com/trends">Google Trends</a>&nbsp;to figure out what&rsquo;s happening with the&nbsp;<a href="http://google.com/trends?q=Cassandra">Cassandra</a>&nbsp;database or the&nbsp;<a href="http://google.com/trends?q=Python">Python</a>&nbsp;language, and you&rsquo;ll get a sense of the problem. Google has indexed many, many websites about large snakes. Disambiguation is never an easy task, but tools like the&nbsp;<a href="http://www.nltk.org/">Natural Language Toolkit</a>&nbsp;library can make it simpler.</p>
<p>When natural language processing fails, you can replace artificial intelligence with human intelligence. That&rsquo;s where services like Amazon&rsquo;s&nbsp;<a href="https://www.mturk.com/mturk/welcome">Mechanical Turk</a>&nbsp;come in. If you can split your task up into a large number of subtasks that are easily described, you can use Mechanical Turk&rsquo;s marketplace for cheap labor. For example, if you&rsquo;re looking at job listings, and want to know which originated with Apple, you can have real people do the classification for roughly $0.01 each. If you have already reduced the set to 10,000 postings with the word &ldquo;Apple,&rdquo; paying humans $0.01 to classify them only costs $100.</p>
<h2 id=”data-scale”>Working with data at scale</h2>
<p>We&rsquo;ve all heard a lot about &ldquo;big data,&rdquo; but &ldquo;big&rdquo; is really a red herring. Oil companies, telecommunications companies, and other data-centric industries have had huge datasets for a long time. And as storage capacity continues to expand, today&rsquo;s &ldquo;big&rdquo; is certainly tomorrow&rsquo;s &ldquo;medium&rdquo; and next week&rsquo;s &ldquo;small.&rdquo; The most meaningful definition I&rsquo;ve heard:&nbsp;<em>&ldquo;big data&rdquo; is when the size of the data itself becomes part of the problem</em>. We&rsquo;re discussing data problems ranging from gigabytes to petabytes of data. At some point, traditional techniques for working with data run out of steam.</p>
<p>What are we trying to do with data that&rsquo;s different? According to Jeff Hammerbacher&nbsp;<sup><a href="http://radar.oreilly.com/2010/06/what-is-data-science.html#footnote-2">2</a></sup>&nbsp;(<a href="http://twitter.com/hackingdata">@hackingdata</a>), we&rsquo;re trying to build information platforms or dataspaces. Information platforms are similar to traditional data warehouses, but different. They expose rich APIs, and are designed for exploring and understanding the data rather than for traditional analysis and reporting. They accept all data formats, including the most messy, and their schemas evolve as the understanding of the data changes.</p>
<p>Most of the organizations that have built data platforms have found it necessary to go beyond the relational database model. Traditional relational database systems stop being effective at this scale. Managing sharding and replication across a horde of database servers is difficult and slow. The need to define a schema in advance conflicts with the reality of multiple, unstructured data sources, in which you may not know what&rsquo;s important until after you&rsquo;ve analyzed the data. Relational databases are designed for consistency, to support complex transactions that can easily be rolled back if any one of a complex set of operations fails. While rock-solid consistency is crucial to many applications, it&rsquo;s not really necessary for the kind of analysis we&rsquo;re discussing here. Do you really care if you have 1,010 or 1,012 Twitter followers? Precision has an allure, but in most data-driven applications outside of finance, that allure is deceptive. Most data analysis is comparative: if you&rsquo;re asking whether sales to Northern Europe are increasing faster than sales to Southern Europe, you aren&rsquo;t concerned about the difference between 5.92 percent annual growth and 5.93 percent.</p>
<p>To store huge datasets effectively, we&rsquo;ve seen a new breed of databases appear. These are frequently called NoSQL databases, or Non-Relational databases, though neither term is very useful. They group together fundamentally dissimilar products by telling you what they aren&rsquo;t. Many of these databases are the logical descendants of Google&rsquo;s&nbsp;<a href="http://labs.google.com/papers/bigtable.html">BigTable</a>&nbsp;and Amazon&rsquo;s&nbsp;<a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">Dynamo</a>, and are designed to be distributed across many nodes, to provide &ldquo;eventual consistency&rdquo; but not absolute consistency, and to have very flexible schema. While there are two dozen or so products available (almost all of them open source), a few leaders have established themselves:</p>
<ul>
<li><a href="http://cassandra.apache.org/">Cassandra</a>: Developed at Facebook, in production use at Twitter, Rackspace, Reddit, and other large sites. Cassandra is designed for high performance, reliability, and automatic replication. It has a very flexible data model. A new startup,&nbsp;<a href="http://www.riptano.com/">Riptano</a>, provides commercial support.</li>
<li><a href="http://hadoop.apache.org/hbase/">HBase</a>: Part of the Apache Hadoop project, and modelled on Google&rsquo;s BigTable. Suitable for extremely large databases (billions of rows, millions of columns), distributed across thousands of nodes. Along with Hadoop, commercial support is provided by&nbsp;<a href="http://www.cloudera.com/">Cloudera</a>.</li>
</ul>
<p>Storing data is only part of building a data platform, though. Data is only useful if you can do something with it, and enormous datasets present computational problems. Google popularized the&nbsp;<a href="http://labs.google.com/papers/mapreduce.html">MapReduce</a>&nbsp;approach, which is basically a divide-and-conquer strategy for distributing an extremely large problem across an extremely large computing cluster. In the &ldquo;map&rdquo; stage, a programming task is divided into a number of identical subtasks, which are then distributed across many processors; the intermediate results are then combined by a single reduce task. In hindsight, MapReduce seems like an obvious solution to Google&rsquo;s biggest problem, creating large searches. It&rsquo;s easy to distribute a search across thousands of processors, and then combine the results into a single set of answers. What&rsquo;s less obvious is that MapReduce has proven to be widely applicable to many large data problems, ranging from search to machine learning.</p>
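<p>The map/shuffle/reduce pattern can be illustrated in-process with a word count, the canonical MapReduce example. This sketch runs on one machine; the point of the real frameworks is to spread the same three phases across a cluster:</p>

```python
from collections import defaultdict
from itertools import chain

# Minimal in-process illustration of MapReduce: a map phase emits
# (key, value) pairs, a shuffle groups them by key, and a reduce
# phase combines each group.
def map_phase(document):
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

docs = ["big data big problems", "big clusters"]  # invented corpus
grouped = shuffle(chain.from_iterable(map_phase(d) for d in docs))
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts["big"])  # → 3
```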
<p>The most popular open source implementation of MapReduce is the&nbsp;<a href="http://hadoop.apache.org/">Hadoop project</a>. Yahoo&rsquo;s claim that they had built the&nbsp;<a href="http://developer.yahoo.net/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html">world&rsquo;s largest production Hadoop application</a>, with 10,000 cores running Linux, brought it onto center stage. Many of the key Hadoop developers have found a home at&nbsp;<a href="http://www.cloudera.com/">Cloudera</a>, which provides commercial support. Amazon&rsquo;s&nbsp;<a href="http://aws.amazon.com/elasticmapreduce/">Elastic MapReduce</a>&nbsp;makes it much easier to put Hadoop to work without investing in racks of Linux machines, by providing preconfigured Hadoop images for its EC2 clusters. You can allocate and de-allocate processors as needed, paying only for the time you use them.</p>
<p><a href="http://oreilly.com/catalog/9780596521981">Hadoop</a>&nbsp;goes far beyond a simple MapReduce implementation (of which there are several); it&rsquo;s the key component of a data platform. It incorporates&nbsp;<a href="http://hadoop.apache.org/hdfs/">HDFS</a>, a distributed filesystem designed for the performance and reliability requirements of huge datasets; the HBase database;&nbsp;<a href="http://hadoop.apache.org/hive/">Hive</a>, which lets developers explore Hadoop datasets using SQL-like queries; a high-level dataflow language called&nbsp;<a href="http://hadoop.apache.org/pig/">Pig</a>; and other components. If anything can be called a one-stop information platform, Hadoop is it.</p>
<p>Hadoop has been instrumental in enabling &ldquo;agile&rdquo; data analysis. In software development, &ldquo;agile practices&rdquo; are associated with faster product cycles, closer interaction between developers and consumers, and testing. Traditional data analysis has been hampered by extremely long turn-around times. If you start a calculation, it might not finish for hours, or even days. But Hadoop (and particularly Elastic MapReduce) make it easy to build clusters that can perform computations on large datasets quickly. Faster computations make it easier to test different assumptions, different datasets, and different algorithms. It&rsquo;s easier to consult with clients to figure out whether you&rsquo;re asking the right questions, and it&rsquo;s possible to pursue intriguing possibilities that you&rsquo;d otherwise have to drop for lack of time.</p>
<p>Hadoop is essentially a batch system, but&nbsp;<a href="http://code.google.com/p/hop/">Hadoop Online Prototype (HOP)</a>&nbsp;is an experimental project that enables stream processing. HOP processes data as it arrives, and delivers intermediate results in (near) real-time. Near real-time data analysis enables features like&nbsp;<a href="http://search.twitter.com/">trending topics</a>&nbsp;on sites like&nbsp;<a href="http://twitter.com/">Twitter</a>. These features only require soft real-time; reports on trending topics don&rsquo;t require millisecond accuracy. As with the number of followers on Twitter, a &ldquo;trending topics&rdquo; report only needs to be current to within five minutes &mdash; or even an hour. According to Hilary Mason (<a href="http://twitter.com/hmason">@hmason</a>), data scientist at&nbsp;<a href="http://bit.ly/">bit.ly</a>, it&rsquo;s possible to precompute much of the calculation, then use one of the experiments in real-time MapReduce to get presentable results.</p>
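<p>The &ldquo;soft real-time&rdquo; idea can be sketched with a sliding window of per-interval counts: a report built from the last few buckets is at most one bucket stale, which is plenty for trending topics. The bucket size and the terms below are invented:</p>

```python
from collections import Counter, deque

# Sketch of soft real-time trending topics: keep counts for a sliding
# window of recent time buckets and report the current top terms.
# A report that lags the stream by one bucket is acceptable here.
class TrendingTopics:
    def __init__(self, window_buckets=3):
        self.window = deque(maxlen=window_buckets)  # old buckets expire

    def add_bucket(self, terms):
        self.window.append(Counter(terms))

    def top(self, n=1):
        totals = Counter()
        for bucket in self.window:
            totals += bucket
        return [term for term, _ in totals.most_common(n)]

t = TrendingTopics()
t.add_bucket(["hadoop", "flu", "flu"])     # invented search terms
t.add_bucket(["flu", "worldcup"])
print(t.top(1))
```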
<p>Machine learning is another essential tool for the data scientist. We now expect web and mobile applications to incorporate recommendation engines, and building a recommendation engine is a quintessential artificial intelligence problem. You don&rsquo;t have to look at many modern web applications to see classification, error detection, image matching (behind&nbsp;<a href="http://www.google.com/mobile/goggles/">Google Goggles</a>&nbsp;and&nbsp;<a href="http://www.snaptell.com/">SnapTell</a>) and even face detection &mdash; an ill-advised mobile application lets you take someone&rsquo;s picture with a cell phone, and look up that person&rsquo;s identity using photos available online.&nbsp;<a href="http://www.stanford.edu/class/cs229/">Andrew Ng&rsquo;s Machine Learning course</a>&nbsp;is one of the most popular courses in computer science at Stanford, with hundreds of students (<a href="http://www.youtube.com/watch?v=UzxYlbK2c7E">this video is highly recommended</a>).</p>
<p>There are many libraries available for machine learning:&nbsp;<a href="http://pybrain.org/">PyBrain</a>&nbsp;in Python,&nbsp;<a href="http://elefant.developer.nicta.com.au/">Elefant</a>,&nbsp;<a href="http://www.cs.waikato.ac.nz/ml/weka/">Weka</a>&nbsp;in Java, and&nbsp;<a href="http://lucene.apache.org/mahout/">Mahout</a>&nbsp;(coupled to Hadoop). Google has just announced their&nbsp;<a href="http://code.google.com/apis/predict/">Prediction API</a>, which exposes their machine learning algorithms for public use via a RESTful interface. For computer vision, the&nbsp;<a href="http://opencv.willowgarage.com/wiki/">OpenCV</a>&nbsp;library is a de-facto standard.</p>
<p><a href="https://www.mturk.com/mturk/welcome">Mechanical Turk</a>&nbsp;is also an important part of the toolbox. Machine learning almost always requires a &ldquo;training set,&rdquo; or a significant body of known data with which to develop and tune the application. The Turk is an excellent way to develop training sets. Once you&rsquo;ve collected your training data (perhaps a large collection of public photos from Twitter), you can have humans classify them inexpensively &mdash; possibly sorting them into categories, possibly drawing circles around faces, cars, or whatever interests you. It&rsquo;s an excellent way to classify a few thousand data points at a cost of a few cents each. Even a relatively large job only costs a few hundred dollars.</p>
<p>While I haven&rsquo;t stressed traditional statistics, building statistical models plays an important role in any data analysis. According to&nbsp;<a href="http://www.dataspora.com/">Mike Driscoll</a>&nbsp;(<a href="http://twitter.com/dataspora">@dataspora</a>), statistics is the &ldquo;grammar of data science.&rdquo; It is crucial to &ldquo;making data speak coherently.&rdquo; We&rsquo;ve all heard the joke that eating pickles causes death, because everyone who dies has eaten pickles. That joke doesn&rsquo;t work if you understand what correlation means. More to the point, it&rsquo;s easy to notice that one advertisement for&nbsp;<em><a href="http://oreilly.com/catalog/9780596801717/">R in a Nutshell</a></em>&nbsp;generated 2 percent more conversions than another. But it takes statistics to know whether this difference is significant, or just a random fluctuation. Data science isn&rsquo;t just about the existence of data, or making guesses about what that data might mean; it&rsquo;s about testing hypotheses and making sure that the conclusions you&rsquo;re drawing from the data are valid. Statistics plays a role in everything from traditional business intelligence (BI) to understanding how Google&rsquo;s ad auctions work. Statistics has become a basic skill. It isn&rsquo;t superseded by newer techniques from machine learning and other disciplines; it complements them.</p>
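<p>The conversion-rate question has a standard answer: a two-proportion z-test. A sketch in Python using the normal approximation (the counts below are invented for illustration):</p>

```python
import math

# Two-proportion z-test: did ad A really convert better than ad B,
# or is the gap a random fluctuation? Counts are invented.
def two_proportion_z(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# A small observed gap with samples this size is indistinguishable
# from noise (large p-value).
z, p = two_proportion_z(conv_a=102, n_a=1000, conv_b=100, n_b=1000)
print(round(p, 2))
```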
<p>While there are many commercial statistical packages, the open source&nbsp;<a href="http://www.r-project.org/">R language</a>&nbsp;&mdash; and its comprehensive package library,&nbsp;<a href="http://cran.r-project.org/">CRAN</a>&nbsp;&mdash; is an essential tool. Although R is an odd and quirky language, particularly to someone with a background in computer science, it comes close to providing &ldquo;one stop shopping&rdquo; for most statistical work. It has excellent graphics facilities; CRAN includes parsers for many kinds of data; and newer extensions extend R into distributed computing. If there&rsquo;s a single tool that provides an end-to-end solution for statistics work, R is it.</p>
<h2 id=”data-story”>Making data tell its story</h2>
<p>A picture may or may not be worth a thousand words, but a picture is certainly worth a thousand numbers. The problem with most data analysis algorithms is that they generate a set of numbers. To understand what the numbers mean, the stories they are really telling, you need to generate a graph. Edward Tufte&rsquo;s&nbsp;<a href="http://www.amazon.com/Visual-Display-Quantitative-Information-2nd/dp/0961392142/">Visual Display of Quantitative Information</a>&nbsp;is the classic for data visualization, and a foundational text for anyone practicing data science. But that&rsquo;s not really what concerns us here. Visualization is crucial to each stage of a data scientist&rsquo;s work. According to Martin Wattenberg (<a href="http://twitter.com/wattenberg">@wattenberg</a>, founder of Flowing Media), visualization is key to data conditioning: if you want to find out just how bad your data is, try plotting it. Visualization is also frequently the first step in analysis. Hilary Mason says that when she gets a new data set, she starts by making a dozen or more scatter plots, trying to get a sense of what might be interesting. Once you&rsquo;ve gotten some hints at what the data might be saying, you can follow it up with more detailed analysis.</p>
<p>There are many packages for plotting and presenting data.&nbsp;<a href="http://www.gnuplot.info/">GnuPlot</a>&nbsp;is very effective; R incorporates a fairly comprehensive graphics package; Casey Reas&rsquo; and Ben Fry&rsquo;s&nbsp;<a href="http://processing.org/">Processing</a>&nbsp;is the state of the art, particularly if you need to create animations that show how things change over time. At IBM&rsquo;s&nbsp;<a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">Many Eyes</a>, many of the visualizations are full-fledged interactive applications.</p>
<p>Nathan Yau&rsquo;s&nbsp;<a href="http://flowingdata.com/">FlowingData</a>&nbsp;blog is a great place to look for creative visualizations. One of my favorites is this animation of the&nbsp;<a href="http://flowingdata.com/2010/04/07/watching-the-growth-of-walmart-now-with-100-more-sams-club/">growth of Walmart</a>&nbsp;over time. And this is one place where &ldquo;art&rdquo; comes in: not just the aesthetics of the visualization itself, but how you understand it. Does it look like the spread of cancer throughout a body? Or the spread of a flu virus through a population? Making data tell its story isn&rsquo;t just a matter of presenting results; it involves making connections, then going back to other data sources to verify them. Does a successful retail chain spread like an epidemic, and if so, does that give us new insights into how economies work? That&rsquo;s not a question we could even have asked a few years ago. There was insufficient computing power, the data was all locked up in proprietary sources, and the tools for working with the data were insufficient. It&rsquo;s the kind of question we now ask routinely.</p>
<h2 id="data-scientists">Data scientists</h2>
<p>Data science requires skills ranging from traditional computer science to mathematics to art. Describing the data science group he put together at Facebook (possibly the first data science group at a consumer-oriented web property), Jeff Hammerbacher said:</p>
<blockquote>
<p>&hellip; on any given day, a team member could author a multistage processing pipeline in Python, design a hypothesis test, perform a regression analysis over data samples with R, design and implement an algorithm for some data-intensive product or service in Hadoop, or communicate the results of our analyses to other members of the organization&nbsp;<sup><a href="http://radar.oreilly.com/2010/06/what-is-data-science.html#footnote-3">3</a></sup></p>
</blockquote>
<p>Where do you find the people this versatile? According to DJ Patil, chief scientist at&nbsp;<a href="http://www.linkedin.com/">LinkedIn</a> (<a href="http://twitter.com/dpatil">@dpatil</a>), the best data scientists tend to be &ldquo;hard scientists,&rdquo; particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you&rsquo;ve just spent a lot of grant money generating data, you can&rsquo;t just throw the data out if it isn&rsquo;t as clean as you&rsquo;d like. You have to make it tell its story. You need some creativity for when the story the data is telling isn&rsquo;t what you think it&rsquo;s telling.</p>
<p>Scientists also know how to break large problems up into smaller problems. Patil described the process of creating the group recommendation feature at LinkedIn. It would have been easy to turn this into a high-ceremony development project that would take thousands of hours of developer time, plus thousands of hours of computing time to do massive correlations across LinkedIn&rsquo;s membership. But the process worked quite differently: it started out with a relatively small, simple program that looked at members&rsquo; profiles and made recommendations accordingly, asking things like: did you go to Cornell? Then you might like to join the Cornell Alumni group. It then branched out incrementally. In addition to looking at profiles, LinkedIn&rsquo;s data scientists started looking at events that members attended. Then at books members had in their libraries. The result was a valuable data product that analyzed a huge database &mdash; but it was never conceived as such. It started small, and added value iteratively. It was an agile, flexible process that built toward its goal incrementally, rather than tackling a huge mountain of data all at once.</p>
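<p>The incremental approach Patil describes can be sketched in a few lines. This is a hypothetical illustration &mdash; the function, field names, and rules below are mine, not LinkedIn&rsquo;s &mdash; but it shows why the iterative path is cheap: the first version is just profile-matching rules, and later iterations add rules derived from new data sources (events attended, books owned) without redesigning anything:</p>

```python
def recommend_groups(profile, rules):
    """Match profile fields against simple (field, value, group) rules."""
    return [group for field, value, group in rules
            if profile.get(field) == value]

# A deliberately small first rule set; later iterations append rules
# generated from other data sources (events, libraries, and so on).
RULES = [
    ("school", "Cornell", "Cornell Alumni"),
    ("school", "Stanford", "Stanford Alumni"),
    ("employer", "Acme Corp", "Acme Corp Employees"),
]
```

<p>Growing the system means appending rules (or rule generators) and measuring whether recommendations improve &mdash; a far smaller bet than a monolithic all-pairs correlation job over the whole membership.</p>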
<p>This is the heart of what Patil calls &ldquo;data jiujitsu&rdquo; &mdash; using smaller auxiliary problems to solve a large, difficult problem that appears intractable. CDDB is a great example of data jiujitsu: identifying music by analyzing an audio stream directly is a very difficult problem (though not unsolvable &mdash; see&nbsp;<a href="http://www.midomi.com/">midomi</a>, for example). But the CDDB staff used data creatively to solve a much more tractable problem that gave them the same result. Computing a signature based on track lengths, and then looking up that signature in a database, is trivially simple.</p>
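<p>The track-length trick is easy to see in code. The sketch below follows the spirit of the classic CDDB disc ID &mdash; a digit-sum checksum of track start times packed together with play length and track count into a 32-bit value &mdash; but treat the exact packing here as illustrative rather than a faithful reimplementation of the protocol:</p>

```python
def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))

def disc_signature(track_starts, disc_length):
    """Build a CDDB-style 32-bit signature from track start times
    and total disc length, both in seconds."""
    checksum = sum(digit_sum(t) for t in track_starts)
    play_time = disc_length - track_starts[0]
    # High byte: checksum; middle 16 bits: play time; low byte: track count
    return ((checksum % 0xFF) << 24) | (play_time << 8) | len(track_starts)
```

<p>Looking the signature up is then a single dictionary hit &mdash; the hard perceptual problem of recognizing audio has been traded for a trivial one.</p>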
<div>
<p>Hiring trends for data science</p>
<p><img alt="" src="http://s.radar.oreilly.com/2010/06/01/datascience-jobs.png" width="580" /></p>
<p>It&rsquo;s not easy to get a handle on jobs in data science. However, data from&nbsp;<a href="http://radar.oreilly.com/research/">O&rsquo;Reilly Research</a> shows a steady year-over-year increase in Hadoop and Cassandra job listings, which are good proxies for the &ldquo;data science&rdquo; market as a whole. This graph shows the increase in Cassandra jobs, and the companies listing Cassandra positions, over time.</p>
</div>
<p>Entrepreneurship is another piece of the puzzle. Patil&rsquo;s first flippant answer to &ldquo;what kind of person are you looking for when you hire a data scientist?&rdquo; was &ldquo;someone you would start a company with.&rdquo; That&rsquo;s an important insight: we&rsquo;re entering the era of products that are built on data. We don&rsquo;t yet know what those products are, but we do know that the winners will be the people, and the companies, that find those products. Hilary Mason came to the same conclusion. Her job as scientist at bit.ly is really to investigate the data that bit.ly is generating, and find out how to build interesting products from it. No one in the nascent data industry is trying to build the 2012 Nissan Stanza or Office 2015; they&rsquo;re all trying to find new products. In addition to being physicists, mathematicians, programmers, and artists, they&rsquo;re entrepreneurs.</p>
<p>Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: &ldquo;here&rsquo;s a lot of data, what can you make from it?&rdquo;</p>
<p>The future belongs to the companies that figure out how to collect and use data successfully. Google, Amazon, Facebook, and LinkedIn have all tapped into their datastreams and made that the core of their success. They were the vanguard, but newer companies like bit.ly are following their path. Whether it&rsquo;s mining your personal biology, building maps from the shared experience of millions of travellers, or studying the URLs that people pass to others, the next generation of successful businesses will be built around data.&nbsp;<a href="http://www.mckinseyquarterly.com/Hal_Varian_on_how_the_Web_challenges_managers_2286">The part of Hal Varian&rsquo;s quote that nobody remembers says it all</a>:</p>
<blockquote>
<p><strong>The ability to take data &mdash; to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it &mdash; that&rsquo;s going to be a hugely important skill in the next decades.</strong></p>
</blockquote>
<p>Data is indeed the new Intel Inside.</p>
<p>&nbsp;</p>
<hr />
<p><strong>O&rsquo;Reilly publications related to data science</strong></p>
<p><a href="http://oreilly.com/catalog/9780596801717/">R in a Nutshell</a><br />A quick and practical reference to learn what is becoming the standard for developing statistical software.</p>
<p><a href="http://oreilly.com/catalog/9780596510497/">Statistics in a Nutshell</a><br />An introduction and reference for anyone with no previous background in statistics.</p>
<p><a href="http://oreilly.com/catalog/9780596802363/">Data Analysis with Open Source Tools</a><br />This book shows you how to think about data and the results you want to achieve with it.</p>
<p><a href="http://oreilly.com/catalog/9780596529321/">Programming Collective Intelligence</a><br />Learn how to build web applications that mine the data created by people on the Internet.</p>
<p><a href="http://oreilly.com/catalog/9780596157128/">Beautiful Data</a><br />Learn from the best data practitioners in the field about how wide-ranging &mdash; and beautiful &mdash; working with data can be.</p>
<p><a href="http://oreilly.com/catalog/0636920000617/">Beautiful Visualization</a><br />This book demonstrates why visualizations are beautiful not only for their aesthetic design, but also for elegant layers of detail.</p>
<p><a href="http://oreilly.com/catalog/9780596527587/">Head First Statistics</a><br />This book teaches statistics through puzzles, stories, visual aids, and real-world examples.</p>
<p><a href="http://oreilly.com/catalog/9780596153946/">Head First Data Analysis</a><br />Learn how to collect your data, sort the distractions from the truth, and find meaningful patterns.</p>
<p>&nbsp;</p>
<hr />
<p><sup>1</sup>&nbsp;The NASA article denies this, but also says that in 1984, they decided that the low values (which went back to the 70s) were &ldquo;real.&rdquo; Whether humans or software decided to ignore anomalous data, it appears that data was ignored.</p>
<p><sup>2&nbsp;</sup>&ldquo;Information Platforms as Dataspaces,&rdquo; by Jeff Hammerbacher (in&nbsp;<em><a>Beautiful Data</a></em>)</p>
<p><sup>3&nbsp;</sup>&ldquo;Information Platforms as Dataspaces,&rdquo; by Jeff Hammerbacher (in&nbsp;<em><a>Beautiful Data</a></em>)</p>