Beautiful Soup Cheat Sheet
    Written on April 16, 2017
    
    
    
    
    
    Tweet
  Beautiful Soup Cheat Sheet
Navigating using tag name
.tag- The simplest way to navigate the parse tree is to say the name of the tag you want. However, using a tag name as an attribute will give you only the first tag by that name..contents- Gives a tag’s children as a list. The BeautifulSoup object itself has one child <html> tag..children- Gives a generator to iterate over a tag’s children..descendants- Lets you iterate over all of a tag’s children, recursively: its direct children, the children of its direct children, and so on..string- If a tag has only one child, and that child is a NavigableString, the child is made available as .string. If a tag’s only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child. f a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None..strings- If there’s more than one thing inside a tag, you can still look at just the strings using this generator..stipped_strings- Use this generator to remove extra white spaces..parent- Access an element’s parent..parents- Iterate over all of an element’s parents.next_siblingand.previous_sibling- Navigate between page elements that are on the same level of the parse tree..next_siblingsand.previous_siblings- Iterate over a tag’s siblings..next_elementand.previous_elements- points to whatever was parsed immediately afterwards or before.
Searching the tree
- Kinds of filters
    
- string Pass a string to a search method and Beautiful Soup will perform a match against that exact string.
 - regular expression If you pass in a regular expression object, Beautiful Soup will filter against that regular expression using its match() method.
 - list If you pass in a list, Beautiful Soup will allow a string match against any item in that list.
 - True The value True matches everything it can.
 - function Define a function that takes an element as its only argument. The function should return True if the argument matches, and False otherwise.
 
 find_all- The method looks through a tag’s descendants and retrieves all descendants that match your filters.- name Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names.
 - keyword Any argument that’s not recognized will be turned into a filter on one of a tag’s attributes.
 - class_ You can search by CSS class using the this keyword as argument.
 - string You can search for strings instead of tags.
 - limit If you don’t need all the results, you can pass in a number for limit.
 - recursive If you only want Beautiful Soup to consider direct children, you can pass in recursive=False.
 
find- Rather than passing in limit=1 every time you call find_all, you can use the find() method. The only difference is that find_all() returns a list containing the single result, and find() just returns the result. If find_all() can’t find anything, it returns an empty list. If find() can’t find anything, it returns None.find_parentsandfind_parent- These methods work their way up the tree, looking at a tag’s (or a string’s) parents.find_next_siblingsandfind_next_sibling- These methods use .next_siblings to iterate over the rest of an element’s siblings in the tree. The find_next_siblings() method returns all the siblings that match, and find_next_sibling() only returns the first one.find_previous_siblingsandfind_previous_sibling- These methods use .previous_siblings to iterate over an element’s siblings that precede it in the tree. The find_previous_siblings() method returns all the siblings that match, and find_previous_sibling() only returns the first one.
Output
prettify- turns a Beautiful Soup parse tree into a nicely formatted Unicode string, with each HTML/XML tag on its own line.get_text()- returns all the text in a document or beneath a tag, as a single Unicode string.
Parsing only part of a document
SoupStrainer- allows you to choose which parts of an incoming document are parsed. You just create a SoupStrainer and pass it in to the BeautifulSoup constructor as the parse_only argument.